AMD SEV Confidential Compute farming comes to GPUX.AI

vans163
Jul 9, 2021


So our crew has built a video transcoder, and we are now working on the hard task of decentralising it. That comes down to three things: honesty, quality, integrity.

The peer should be honest about the GPU they have, their internet speed, and how much work they did.

The peer should provide a quality job and not overprovision / overload the GPU or use inferior models.

The peer should not insert malicious watermarks or do other tampering with the video.

The fact that these questions are so hard to solve is part of the reason we still rely on centralised cloud services, which mark up things like IOPS and bandwidth 1000x over what they actually cost, making many software business models infeasible. Netflix on AWS, Twitch on AWS, Mixer on Azure, Stadia on GCP, GeForce Now on Nvidia Cloud.

Do you see the pattern? Video content, which is 82%+ of internet traffic, is dominated by a handful of players who have broad geographical coverage (the edge) that is essentially free of charge to them.

How do we build out this edge to transcode our videos?

Wait, stop. Why are we doing all this work just to transcode video? There is MUCH more potential here.

Being able to run ANYTHING, ANYWHERE, with honesty, quality and integrity is a feat of its own.

So let's do that, we said.

PRISM, our transcode app, is no longer all of GPUX.AI. It's now just a single app (out of potentially many) on top of GPUX (both operated by the GPUX Foundation), as GPUX shifts to become the base layer for decentralised confidential compute.

With that we present the farm (github.com/gpuedge/farm), our decentralised confidential compute software that gives you more ways to collect yields from your metal. You can farm out your CPUs, bandwidth, and GPUs for various jobs all the while building out what we call the golden edge; the bedrock for the new web (web 3.0 *cringe*).

You see, the solution for web 3.0 is not blockchain. No, it's the availability of planetary compute, bandwidth and GPUs at the edge. Why do so many of these new products have nearly all their nodes hosted on AWS? How are you making a decentralised product that is, yes, in many different stakeholders' hands, when all those stakeholders are just hosting on AWS? What happens if AWS goes night night? Or worse, what if Mr. Bezos decides he does not like your shitcoin when he wakes up in the morning aboard his spaceship (how do you tell when it's morning in space?) and shuts down all AWS instances running your node?

Let's digress and do a tech dive.

Confidential Compute — what is it?

The ability to run a piece of code anywhere in the world, knowing that only you have access to that code. Execution is delegated to another party, and that party is unable to see the actual code it is executing, yet the work gets done.

Here is a clear picture of how it works. A SoC is a System on a Chip (most processors do much more with those transistors than you realise). The DRAM of the confidential compute system is encrypted by the processor, leaving the processor itself as the only attack vector. (*disclaimer* there are side channel attacks, voltage attacks and more, but as time advances more of these attacks will be resolved.) The knowledge, complexity and reliability required to execute these attacks are high enough to give confidential compute systems the label of being beyond reasonably secure.
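To make the "host only sees ciphertext" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: `ToyProcessor` is a made-up class, and the cipher is a throwaway SHA-256 keystream, not the hardware AES memory encryption that AMD SEV actually uses. The only point is that the key stays inside the "processor" while the host reads scrambled DRAM.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT the cipher SEV uses."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with the toy keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class ToyProcessor:
    """Hypothetical model: the key lives inside the SoC and never leaves it."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # never handed to the host
        self.dram = b""                      # the bytes the host can read

    def store(self, plaintext: bytes) -> None:
        # Data is encrypted on its way out to DRAM.
        self.dram = xor_bytes(plaintext, self._key)

    def load(self) -> bytes:
        # Data is decrypted only on its way back into the processor.
        return xor_bytes(self.dram, self._key)


cpu = ToyProcessor()
cpu.store(b"guest secret: signing key material")
host_view = cpu.dram     # what a curious host sees: ciphertext
guest_view = cpu.load()  # what the guest sees: the original bytes
```

In this sketch the host can copy `host_view` all day long, but without the key held inside `ToyProcessor` it is just noise.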

4K Ultra HD Blu-ray movies only run in a confidential compute environment.

Apple uses confidential compute to store your lockscreen password.

In fact it's secure enough for Azure, known for being a top defence industry pick because of unparalleled security. Here is a link to gain access to their preview for confidential compute on AMD SEV: https://azure.microsoft.com/en-ca/blog/azure-and-amd-enable-lift-and-shift-confidential-computing/.

It is the same technology that GPUX uses.

How does github.com/gpuedge/farm work?

Beyond plugging in your NEAR key to collect your yields, know that not all hardware is supported yet. Please understand that practical confidential compute is very new and growing very rapidly (based on the velocity of GitHub commits to projects in the space).

Farm is currently able to run 2 kinds of jobs:

  • AMD EPYC confidential compute VM off an encrypted disk.
  • Any Processor plain text (Docker) sandbox.

On GPUX the security status is as follows:

  • AMD EPYC w/ SEV: Confidential 🔒
  • Intel Processors w/ SGX: Not Confidential (yet) 📄
  • AMD Ryzen|Threadripper: Not Confidential 📄
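The status list above can be sketched as a small host check. This is only a rough illustration, not how farm does it: `classify_host` and `Security` are made-up names, and the sample cpuinfo snippets are trimmed and invented. It assumes a Linux host, where `/proc/cpuinfo` lists feature flags such as `sev` and `sgx`. A flag is only a hint, so a real check would also have to confirm that SEV is enabled in firmware and in KVM.

```python
from enum import Enum


class Security(Enum):
    CONFIDENTIAL = "confidential"
    PLAIN_TEXT = "plain text"


def parse_cpuinfo(cpuinfo_text: str):
    """Return (model name, set of feature flags) from /proc/cpuinfo-style text."""
    model = ""
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        name = name.strip()
        if name == "model name" and not model:
            model = value.strip()
        elif name == "flags":
            flags.update(value.split())
    return model, flags


def classify_host(cpuinfo_text: str) -> Security:
    """Mirror the status list: only AMD EPYC with SEV counts as confidential."""
    model, flags = parse_cpuinfo(cpuinfo_text)
    if "EPYC" in model and "sev" in flags:
        return Security.CONFIDENTIAL
    # Intel SGX, Ryzen, Threadripper and everything else: plain text for now.
    return Security.PLAIN_TEXT


# Made-up, heavily trimmed cpuinfo snippets, for illustration only.
epyc = "model name : AMD EPYC 7402P\nflags : fpu sme sev\n"
intel = "model name : Intel(R) Xeon(R) E-2288G\nflags : fpu sgx\n"
ryzen = "model name : AMD Ryzen 9 3900X\nflags : fpu sme\n"

statuses = [classify_host(text) for text in (epyc, intel, ryzen)]
```

Checking the model name as well as the flag is a deliberate choice here: it keeps the sketch aligned with the list, where only EPYC is marked confidential.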

We thought hard and decided to provide support for plain text jobs. Plain text means there is no encryption and the host can peek at whatever your app is doing. We did this because we do not want to leave the largest chunk of the market out in the cold while we build support for Intel processors.

When you run on an AMD EPYC processor, farm is able to run your encrypted disks inside a VM. The host cannot peek at what happens inside your VM, so you can consider your application and code beyond reasonably secure.

Use cases:

  • Trusted oracle that can run anywhere in the world
  • Signing / Rollup transactions
  • IPFS Gateway / Sia Portal

You would not run the above on a random system with an untrusted host.

When you run on any other processor, farm is not able to encrypt anything, and the host sees exactly what you are running. If your app is compiled into a binary, it is harder for the host to steal your code (unless they are a skilled reverse engineer), but that does not protect your secrets, even if getting at them would still take a reasonable amount of effort.

Use cases:

  • Machine learning for public datasets
  • Miscellaneous GPU compute
  • Public Live Streaming
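Why does compiling to a binary not protect your secrets on a plain text host? A quick sketch, assuming nothing about farm itself: the function below does roughly what the Unix `strings` tool does, and `fake_binary` is a made-up blob standing in for a compiled app with a key baked in.

```python
import re


def printable_strings(blob: bytes, min_len: int = 8) -> list:
    """Return runs of printable ASCII at least min_len long, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, blob)]


# Made-up stand-in for a compiled binary: opaque bytes plus an embedded secret.
fake_binary = (
    b"\x7fELF\x02\x01\x01\x00"
    + b"\x00\x13\x9c" * 4
    + b"API_KEY=hunter2-not-real"
    + b"\x00\xff\x10"
)

found = printable_strings(fake_binary)
```

The code around the secret stays opaque, but the secret itself falls straight out, which is why the use cases above stick to public data.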

When submitting jobs you are given a clear choice of where you want to run, inside a confidential environment or as a plain text Docker image in a sandbox (GPUX does not actually use Docker to run the Dockerfile and the image runs rootless).
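That choice can be sketched as a tiny placement rule. All names here (`Job`, `validate`, `can_place`, the `"confidential"` / `"plaintext"` values) are hypothetical and are not the farm API. The sketch only encodes what is described above: confidential jobs need a confidential host, plain text jobs run on any processor, and secrets do not belong in a plain text sandbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    environment: str       # "confidential" or "plaintext" (hypothetical values)
    carries_secrets: bool  # e.g. signing keys, API tokens


def validate(job: Job) -> list:
    """Return problems with the submitter's choice; an empty list means fine."""
    problems = []
    if job.environment not in ("confidential", "plaintext"):
        problems.append(f"unknown environment: {job.environment}")
    if job.environment == "plaintext" and job.carries_secrets:
        problems.append("host can read secrets in a plain text sandbox")
    return problems


def can_place(job: Job, host_is_confidential: bool) -> bool:
    """Confidential jobs need a confidential host; plain text jobs run anywhere."""
    if validate(job):
        return False
    return host_is_confidential or job.environment == "plaintext"


oracle = Job("trusted-oracle", "confidential", carries_secrets=True)
public_ml = Job("public-dataset-ml", "plaintext", carries_secrets=False)
risky = Job("tx-signer", "plaintext", carries_secrets=True)

placements = {
    "oracle on EPYC w/ SEV": can_place(oracle, host_is_confidential=True),
    "oracle on Ryzen": can_place(oracle, host_is_confidential=False),
    "public ML on Ryzen": can_place(public_ml, host_is_confidential=False),
    "signer in plain text": can_place(risky, host_is_confidential=True),
}
```

Refusing the plain text signer outright is a design choice of this sketch. A real scheduler might only warn and let the submitter decide.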

Expect many things to be in a state of flux, aka broken. It is still very early, but with each passing execution the pieces align a little better.

Stay tuned.
