We have been increasingly crunched for compute capacity over the past few months. Everyone, from early-stage startups to established hyperscalers, shares this sentiment. Market pricing reflects this narrative: prices for 1-year H100 leases have risen 35% over the past four months.
Amidst this capacity crunch, market-making and brokering mechanisms have emerged spontaneously. From VC firms running compute pools to independent compute brokers slinging H100s, everyone has seemingly put “compute” on their agenda. Many cloud aggregators are also working on compute allocation in a more systematic way: Compute Exchange, vast.ai, and Prime Intellect, to name a few. All of these efforts are essentially attempts to provide transparency and price discovery, core functions of any exchange.
The other side of this shortage is price volatility. On that front, we have many startups promising to “financialize” compute and offer services ranging from market intelligence to financial derivatives. As their first product – the precursor to the financial derivatives – these companies have created GPU price indices. Market leaders Ornn and Silicon Data already have indices on Bloomberg, while Pluto is working toward a CFTC license. Others, like Squaretower and Compute Desk, are also competing.
Right now, these indices use different transaction data and methodologies, so they vary significantly. On April 28, the Ornn H100 index was marked at $1.92/hr, Silicon Data’s was $2.58/hr, and Squaretower’s was $2.71/hr. My own index was at $2.49/hr.
This simply isn’t tenable. Markets are opaque and compute is heterogeneous, so every index operator has incomplete information and applies different normalization techniques to get a headline index price. For exchange-traded financial derivatives to work, we need an authoritative, trustworthy index built on transparent market data, and that data can only come from a highly liquid compute market. Once again, we see the need for compute markets.
How should compute markets be designed?
I will start from the core design principle of maximally efficient allocation of compute. This means we are trying to maximize effective compute use: the economic value, or surplus, generated from the use of that compute, as determined by buyers’ willingness-to-pay. In other words, we are trying to maximally monetize the compute available.
As mentioned, an early, organic product-market fit for such a marketplace has been VC-backed compute pools. This looks like a degenerate case of the proposed market: multiple buyers, one supplier. Taking this degenerate case further, what is the most efficient way to allocate compute in this system instantaneously? We can model this mathematically.
Buyers ultimately think in units of workloads: inference, pre-training, or RL post-training. These are compute-agnostic to varying degrees – certain workloads, like running inference for Deepseek-V4, have a sharper Pareto frontier across GPUs than, say, running inference for gpt-oss-120b. Similarly, some workloads may be more memory-constrained than others. We ideally want our market mechanism to reflect this heterogeneity. We can do this by denominating everything in workloads instead of GPUs or racks, and so we work with functions of (a) how well a workload runs on a specific compute configuration, and (b) how much the buyer is willing to pay for some \(m\) units of that workload.
Moreover, in line with Semianalysis’ theory of goodput, we should also account for auxiliary costs beyond the base GPU – CPUs, networking, storage, and so on. Therefore, I consider “compute” to encompass not just GPUs, but these other elements too.
workload abstraction via benchmarking
Here’s how the (simplified) math looks:
Let \(X\) be the vector of compute characteristics. For example:

\[ X = (\text{GPU SKU and count},\ \text{VRAM},\ \text{interconnect bandwidth},\ \text{CPU cores},\ \text{storage},\ \ldots). \]

Each workload \(n\) has a performance function:

\[ f_n(X), \]

which maps the bundle \(X\) into units of useful work. The buyer’s willingness to pay for those units is:

\[ w_n\big(f_n(X)\big), \]

or, more simply:

\[ V_n(X) \equiv w_n\big(f_n(X)\big). \]

In the one-supplier case, the pool operator is solving:

\[ \max_{\{X_n\}} \ \sum_n V_n(X_n) \quad \text{subject to} \quad \sum_n X_n \le R, \]
where \(X_n\) is the bundle of compute characteristics given to workload \(n\), \(R\) is the pool’s inventory, and \(V_n(X_n)\) is the buyer’s value for that bundle.
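As a toy illustration of this one-supplier problem, here is a minimal Python sketch. It assumes compute is divisible into identical units and that each workload’s marginal value for successive units is non-increasing (a concave \(V_n\)), in which case handing each unit to the workload that values it most is surplus-maximizing. The workload names and dollar figures are made up.

```python
# Toy clearing for one supplier and many workloads. Assumes divisible, identical
# units of compute and non-increasing marginal values, so greedily giving each
# unit to the highest-marginal-value workload maximizes total surplus.

def clear_single_supplier(marginal_values, inventory):
    """marginal_values: {workload: [value of 1st unit, 2nd unit, ...]}, non-increasing per workload.
    inventory: total units R the pool operator holds.
    Returns ({workload: units allocated}, total surplus)."""
    allocation = {w: 0 for w in marginal_values}
    surplus = 0.0
    for _ in range(inventory):
        best, best_mv = None, 0.0
        for w, curve in marginal_values.items():
            if allocation[w] < len(curve) and curve[allocation[w]] > best_mv:
                best, best_mv = w, curve[allocation[w]]
        if best is None:          # nobody values another unit
            break
        allocation[best] += 1
        surplus += best_mv
    return allocation, surplus

bids = {
    "inference-A": [9, 7, 4, 2],   # marginal value ($/unit-hr) of each successive unit
    "pretrain-B":  [8, 8, 8, 8],
    "rl-post-C":   [6, 3, 1],
}
print(clear_single_supplier(bids, inventory=6))
```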
In a compute pool, we are working with one supplier with fixed supply. The supply can’t flee anywhere or serve customers outside the network, so it is passive; it is merely a price-taker. However, as we generalize this to many suppliers, we can no longer guarantee those assumptions. Moreover, different suppliers will have different cost structures – e.g., from differing electricity prices across datacenter regions. To model this, we can introduce a per-supplier function denoting the cost of supplying compute. This “cost” is not just the raw infrastructure cost, but also the opportunity cost of supplying this compute elsewhere.
With many suppliers, the market-clearing problem becomes:

\[ \max_{\{X_n\},\,\{Y_s\}} \ \sum_n V_n(X_n) \;-\; \sum_s c_s(Y_s), \]

subject to:

\[ \sum_n X_n \le \sum_s Y_s, \qquad Y_s \le R_s \quad \text{for each supplier } s. \]
Here, \(Y_s\) is the capacity supplier \(s\) makes available, \(R_s\) is its inventory, and \(c_s(Y_s)\) is its cost of exposing that capacity to the market.
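Here is a similarly stripped-down sketch of the many-supplier case. It collapses heterogeneous compute into a single homogeneous unit for illustration (the real problem clears bundles \(X_n\) against capacities \(Y_s\)), and clears where the sorted buyer marginal-value curve crosses the sorted supplier marginal-cost curve, with one uniform price for every cleared unit. The midpoint pricing rule is just one convention; the names and numbers are made up.

```python
# Toy market clearing with many buyers and many suppliers, treating compute as
# one homogeneous unit. Buyers submit per-unit marginal values, suppliers submit
# per-unit marginal costs; the market clears where the curves cross.

def clear_market(buyer_bids, supplier_asks):
    """buyer_bids: list of (buyer, marginal value) per unit demanded.
    supplier_asks: list of (supplier, marginal cost) per unit offered.
    Returns (list of (buyer, supplier) trades, uniform clearing price)."""
    demand = sorted(buyer_bids, key=lambda b: -b[1])    # most valuable uses first
    supply = sorted(supplier_asks, key=lambda a: a[1])  # cheapest capacity first
    trades, price = [], None
    for (buyer, value), (supplier, cost) in zip(demand, supply):
        if value < cost:                  # next unit would destroy surplus: stop
            break
        trades.append((buyer, supplier))
        price = (value + cost) / 2        # one possible uniform-price convention
    return trades, price

buyer_bids    = [("lab-A", 9), ("lab-A", 7), ("startup-B", 8), ("startup-B", 5)]
supplier_asks = [("neocloud-X", 3), ("neocloud-X", 4), ("neocloud-Y", 6), ("neocloud-Y", 10)]
print(clear_market(buyer_bids, supplier_asks))
```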
Next, we can generalize this to different service-level objectives. Ideally, our system should have both spot/interruptible capacity and on-demand/reserved capacity. We can generalize this binary into a spectrum with differing probabilities of interruption. When using spot, the expected units of the workload generated are a function of (a) the probability of interruption, and (b) interruption costs (e.g., having to recompute the KV cache). With this, we get:
Let \(k\) index the service tier, and let \(\pi_k\) be the probability of interruption or preemption over the relevant horizon. A buyer’s expected value can be written as:

\[ \mathbb{E}\big[V_{n,k}(X)\big] = (1-\pi_k)\,V_n(X) \;-\; \pi_k\,L_n(X), \]
where \(L_n(X)\) is the loss from interruption.
For inference, \(L_n(X)\) may be small. For a long training run, \(L_n(X)\) may be very large. On the supply side, the spot/on-demand spread can be interpreted as the value of the supplier’s preemption option under a simplified model; a more formal version introduces an overcommit factor \(\beta_{\text{spot}}\).
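As a minimal numeric sketch of the tier logic: buyers compare expected surplus \((1-\pi_k)V_n - \pi_k L_n - p_k\) across tiers, so an interruption-tolerant inference job drifts to spot while a long training run pays up for reserved capacity. The probabilities, tier prices, and the break-even at the end are illustrative assumptions; in particular, reading \(\beta_{\text{spot}}\) as “how many spot units can be sold against one physical unit” is my own simplification, not a claim about the exact model.

```python
# Toy tier choice plus a supplier-side break-even under an assumed reading of the
# overcommit factor. Buyers rank tiers by expected surplus
#   (1 - pi_k) * V - pi_k * L - price_k.
# All probabilities, prices, and the overcommit reading are illustrative.

def expected_surplus(value, loss, p_interrupt, price):
    return (1 - p_interrupt) * value - p_interrupt * loss - price

pi    = {"reserved": 0.00, "on-demand": 0.02, "spot": 0.20}   # interruption probability pi_k
price = {"reserved": 3.00, "on-demand": 2.50, "spot": 0.50}   # tier price per unit

for name, value, loss in [("inference", 10.0, 0.5), ("pretrain", 10.0, 40.0)]:
    surplus = {k: round(expected_surplus(value, loss, pi[k], price[k]), 2) for k in pi}
    print(name, surplus, "-> picks", max(surplus, key=surplus.get))

# Assumed reading of beta_spot: each physical unit can be sold beta_spot times on
# spot, so the supplier breaks even whenever beta_spot * p_spot matches the
# on-demand price it gives up.
p_on_demand, beta_spot = 2.50, 3.0
print("break-even spot price:", round(p_on_demand / beta_spot, 2))
```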
For each period \(T\) for which we have to allocate compute, we can run this market multiple times: at T-minus-1 day, T-minus-1 week, T-minus-1 month, and so on. A standard unit for \(T\) is 1 hour, but we can run this at shorter or longer cadences too. By running these markets multiple times, participants obtain clarity on market dynamics in advance, while retaining the opportunity to update plans and incorporate new information. We also get a natural, smooth pricing curve and term structure across time.
A major inspiration for this proposal has been Independent System Operators (ISOs) like ERCOT and PJM, which run and coordinate electricity markets. Electricity markets and compute markets have natural affinities: one is the input to the other, and both resources are ephemeral and perishable. The mathematics of how ERCOT clears its market and how this compute market would clear is also very similar – both essentially solve giant optimization problems through market-clearing solvers.
Given this, we can take inspiration from the history of ISOs, and introduce a couple more features.
Much like electricity, we can introduce a mechanism to ensure that the market always has sufficient reserves to clear in the short term. To do this, we can introduce a scarcity/reserve premium, denoted by a function \(A(r)\) that rises as reserves \(r\) tighten. This naturally encourages suppliers to keep spare capacity in the market in the short run. In expectation of the scarcity/reserve premium, the market is also encouraged to keep adding supply in the long term.
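Here is a sketch of what such a premium \(A(r)\) could look like, loosely in the spirit of an operating-reserve demand curve: zero when reserves are comfortable, rising steeply as the reserve margin approaches zero. The functional form and parameters are placeholders, not a calibration.

```python
# Toy scarcity/reserve premium A(r): zero while the reserve margin r is above a
# target, rising convexly toward a cap as reserves vanish. Shape and parameters
# are illustrative placeholders.

def reserve_premium(reserve_margin, cap=10.0, target=0.15):
    """Adder in $/unit-hr as a function of spare capacity / total capacity."""
    if reserve_margin >= target:
        return 0.0
    shortfall = (target - reserve_margin) / target   # 0 at the target, 1 at zero reserves
    return cap * shortfall ** 2                      # mild at first, steep near zero

for r in [0.30, 0.15, 0.10, 0.05, 0.01]:
    print(f"reserve margin {r:.0%} -> premium ${reserve_premium(r):.2f}/hr")
```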
Second, we can introduce a virtual bidding system. These are purely financial instruments which pay the difference between the realized price of compute in a period \(T\), and the predicted price of compute in that period in the day/week/month ahead market. This closes any arbitrage between those markets. This also provides a channel to speculate on compute prices.
term structure: forward markets clear ahead of T
ahead-market runs produce expected prices λ̂; virtual bids pay the basis λ − λ̂ at T.
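A small sketch of how a virtual bid would settle against that ahead-market term structure: the position pays the basis \(\lambda - \hat{\lambda}\) per unit, so any systematic gap between ahead and realized prices gets arbitraged away. All prices and horizons below are illustrative.

```python
# Toy settlement of a virtual bid: a purely financial position of q units bought
# at the ahead price lambda_hat settles against the realized price lambda at T,
# paying q * (lambda - lambda_hat).

def virtual_bid_payoff(quantity, ahead_price, realized_price):
    """Positive quantity = virtual demand (profits if realized > ahead); negative = virtual supply."""
    return quantity * (realized_price - ahead_price)

ahead = {"month-ahead": 2.10, "week-ahead": 2.30, "day-ahead": 2.45}   # lambda_hat per horizon
realized = 2.60                                                        # lambda at delivery

for horizon, lam_hat in ahead.items():
    print(horizon, "basis:", round(realized - lam_hat, 2),
          "| payoff on +100 units:", round(virtual_bid_payoff(100, lam_hat, realized), 2))
```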
What I’ve presented thus far is only a rough illustration. Most of the complexity lies not in the theoretical market design, but in actually configuring the market-clearing solver to handle this giant optimization problem and deal with non-convexities like discontinuous scale-up domains. The bidding/asking system – on what parameters and in what format buyers and suppliers provide their information and preferences – has also been handwaved away. These problems have been solved before (ERCOT and the FCC spectrum auctions), so I expect them to be solved again.
The big question left is longer-term hedging. Our current system provides clarity to market participants over the intervals where we can run markets. But markets run too far ahead are too noisy, as both the models and the chips will change by the time the market executes.
So, what do we do about these longer tenors? Currently, we have a crude system of simply holding or hedging with long-term GPU contracts. This is one option. We can refine it by splitting into narrower and broader contracts: narrow contracts are more fully specified (CPU/memory configs, region, and so on), while broad contracts track the price of a wide basket of compute configs.
If we’re really ambitious, we can try running the compute-equivalent of PJM’s capacity markets. Many companies in the semiconductor supply chain essentially have the job of predicting future workloads. Nvidia has to plan out its GPU roadmaps 3-4 years in advance, and it does so with the expectation of how its GPUs will be used. Downstream, TSMC, SK Hynix, and others do the same, all in anticipation of how the company above them in the chain will act.
We can formalize this process into a capacity market: buyers pre-sign with suppliers for Feynman and future compute platforms many years in advance, through an auction-like process. Buyers put up their workload performance functions and willingness-to-pay curves, and suppliers put up potential supply curves and cost curves. We run a set number of iterated auctions in which, each round, buyers and suppliers can modify their workload and supply curves, until the market reaches a collective understanding and surplus is maximized. Essentially, this is a decentralized, surplus-maximizing way of deciding how datacenters and hardware should look many years into the future.
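A toy version of that iterated auction, modeled as a simple price-adjustment (tâtonnement) loop: the auctioneer posts a price, buyers and suppliers respond with quantities from their curves, and the price moves with the imbalance until the two sides roughly agree. The linear demand and supply curves below are stand-ins for the workload and cost curves participants would actually submit.

```python
# Toy iterated capacity auction as a tatonnement: post a price, collect demand
# and supply from the submitted curves, then nudge the price with the imbalance
# until demand and supply roughly agree. Curves and step size are illustrative.

def demand_at(price):                 # aggregate future capacity buyers want at this price
    return max(0.0, 1000 - 180 * price)

def supply_at(price):                 # aggregate capacity suppliers will build at this price
    return max(0.0, 120 * price - 60)

price, step = 2.0, 0.002
for rnd in range(1, 51):
    imbalance = demand_at(price) - supply_at(price)
    if abs(imbalance) < 1.0:          # demand and supply agree: stop iterating
        break
    price += step * imbalance         # raise the price when demand exceeds supply
print(f"converged in round {rnd}: price ${price:.2f}/unit, capacity {supply_at(price):.0f} units")
```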
With information from the capacity market, the short-to-medium tenors can be better priced in expectation of what happens on those longer tenors.
Why is this design better?
Market design is based on the principle of minimizing the predictable failures of bilateral transactions, like allocation pressure from perishability or coordination gaps. It does this by designing a mechanism that aggregates siloed private information to produce prices or allocations that decentralized agents can respond to.
In this long march toward minimizing failure and maximizing surplus, we have only scratched the surface. The extent of compute markets today is only bidding and supplying GPUs across SKU, duration and region. This paradigm is too lossy! There is far, far more information we can aggregate to better solve for perishability and coordination.
For example, our design accounts for CPU-heavy, agentic, or RL workloads versus compute-bound prefill versus memory-bound decode. Each of these values things outside the GPU differently, and they shouldn’t be collapsed into the same \(\text{GPU/hour}\) number!
This design gives us extremely granular information. At any point, we can know exactly which constraints in the system are binding, and this instant, transparent feedback allows both buyers and suppliers to rapidly iterate and adjust their workloads and supply. Theoretically, we can know real-time demand for specific classes of workloads, specific interconnect topologies in specific regions, and so on. This would be a massive improvement over existing attempts to understand intelligence per watt, or “tokenomics” more generally.
This market intelligence translates well to financial instruments too. The structure of the market allows us to price financial instruments in terms of workload cost – the buyers can avoid the basis risk associated with hedging with an abstract, approximate GPU/hour index, which may or may not correspond to the specific compute config best suited to the workload. Beyond that, we can also create financial instruments based on approximate classes of workloads – inference, training, RL, and so on; or specific model architectures, like DiTs for world models. And with virtual bidding, we have ways to ensure the various ahead-markets stay tethered to the real-time, compute-allocating market.
Beyond this, it makes spot/interruptible compute usable. Spot compute is fragmented and poorly monetized right now. Unless they’re a hyperscaler, suppliers don’t have enough GPUs available on spot to keep the variance of spot interruptions small enough. If buyers knew with near-certainty that their spot instance has a 20% chance of interruption, as opposed to some unknown, fluctuating number, they would use it more. By pooling spot capacity across many suppliers, we can reduce that variance and make spot usable. The network effect makes spot compute more usable, which increases its price, which then encourages more supply, and so on.
aggregating spot capacity collapses interruption variance
fragmented per-supplier pools (grey) are too wide to plan around; one aggregated pool (terracotta) collapses to a narrow band.
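The variance argument can be made precise with a little arithmetic: if each spot instance is independently preempted with probability \(p\), the realized preemption fraction across a pool of \(n\) instances has standard deviation \(\sqrt{p(1-p)/n}\), so aggregating pools shrinks the uncertainty a buyer has to plan around. The pool sizes below are illustrative, and independence is of course a simplifying assumption.

```python
# How the realized interruption rate concentrates as the pool grows, assuming
# each spot instance is preempted independently with probability p. The fraction
# preempted across n instances has std sqrt(p * (1 - p) / n).

import math

def interruption_fraction_std(p, pool_size):
    return math.sqrt(p * (1 - p) / pool_size)

p = 0.20                                       # 20% chance of preemption per instance
for pool_size in [20, 200, 2000]:
    sigma = interruption_fraction_std(p, pool_size)
    lo, hi = max(0.0, p - 2 * sigma), p + 2 * sigma
    print(f"pool of {pool_size:4d}: mean 20%, ~95% band {lo:.0%} to {hi:.0%}")
```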
The usability of spot opens up many opportunities for neoclouds to monetize further. They can list unused but reserved capacity from bilateral agreements as spot capacity on the market. Buyers can do the same, and perhaps even lease their on-demand/reserved capacity to the market, akin to how SF Compute does it. Essentially, the market can interoperate with the current paradigm of long-tenor, bilateral deals.
Why is it better for the world?
The simplest framework to understand the financialization of compute is this:
- Why hasn’t compute “financialized”? Because it isn’t a commodity.
- Why isn’t it a commodity? Because it is messy and complex (i.e. heterogeneous and non-fungible).
- Why is it messy and complex? Because we didn’t make an effort to standardize it.
- Why didn’t we standardize it? Because compute is rapidly evolving, and we have had immense innovation in it. Standardization makes it hard to innovate – you only standardize when you are convinced that the common standard will actually be relevant for a significant amount of time.
In essence, the financialization of compute is intrinsically linked to innovation. This is the point I made in my previous essay. As long as there is innovation pressure, it will be hard to financialize. All of Coreweave’s, Nvidia’s or Google’s edge is essentially in this non-fungibility and heterogeneity: the fact that they can develop and deploy NVLink, OCS or ICI, or operate their datacenters with unparalleled uptime and reliability.
To ensure markets aren’t a tyrannical, territorializing force impinging on innovation and progress – the kind they can be if they impose onerous standards – they must be as passive and accepting of the currents of innovation as possible. They should build in all the heterogeneity that comes with innovation: the leviathan shouldn’t prevent Etched or any new chip startup from disrupting the status quo. If all of our markets and financial plumbing are denominated in specific GPU SKUs, we have essentially entrenched that technology paradigm. The switching costs to another technology paradigm increase, as it wouldn’t have the same financing or allocation mechanisms.
By denominating everything in workloads as opposed to specific GPUs, the suppliers are actively rewarded for performing on the workload. Innovation is encouraged. Likewise, it reduces entrenchment for software/models: the market rewards models which are performant and optimized for a variety of compute configs, as opposed to just Nvidia SKUs.
Most important, it democratizes and decentralizes compute. Many of the economies of scale accrued by hyperscalers and larger neoclouds, like bulk sales, are nullified to an extent. Similarly, smaller research labs and startups are able to procure scarce compute, as the effective available compute goes up as utilization increases.
Yet, in making AI libertarian, we have created an all-powerful, omniscient entity. We saw what damage such entities can do in the 2021 Texas energy crisis. Perhaps this is a necessary evil.
Who uses this market?
We saw that VC compute pools were an early sign of PMF for such markets. Here is Anjney Midha describing the ideal user of a compute “grid”:
“When independent teams pool their compute needs, they create an infrastructure layer whose sole function is to maximize utilization for each other, without compromising individual freedoms. Each member stays independent, retains full control over its own baseload, and gets access to automated infrastructure at a scale that would otherwise require becoming the kind of organization that produces fewer breakthroughs per unit of compute … The distinction is that a grid pools compute across providers in a way that makes compute access as flexible as possible for individual teams.”
In essence, the ideal buyers are research labs or application-layer companies with enough scale, complexity, and heterogeneity in their workloads for this mechanism to be useful to them, but not so large that they can build out their own datacenters and vertically integrate with the infrastructure layer. Likewise, the market is best suited to smaller, underpriced, but performant neoclouds.
At a large enough scale, companies are already running internal markets: matching heterogeneous workloads to heterogeneous chip supply. The complex and cutting-edge schedulers and load-balancers they operate are quasi-markets.
They might still procure some capacity on the markets – the OTC dealings between the hyperscalers/frontier labs and the neoclouds are a sign of this. Very recently, we saw SpaceX lease Colossus 1 to Anthropic, so customizability is not the end-all, be-all. In the long term, however, Anthropic is sticking to Trainiums and customized datacenters.
Who runs this market?
There are many options here:
- The current option is a VC or a capital allocator running it for their portfolio companies. This works in a smaller, limited setting and can’t generalize to an ISO-level market. The conflicts of interest would be too large at that level.
- Startups are the emerging option. For a nascent company, solving for trust is difficult. The shape of this problem is also extremely technical – the market design and auction work exceeds the complexity of the Nobel Prize-winning FCC spectrum auction design – so the initiative needs to be extremely well-funded and built in collaboration with existing stakeholders.
- SemiAnalysis-type companies are a good wildcard. SemiAnalysis already runs InferenceX, which benchmarks inference workloads across many SKUs. Such a service, in addition to extensive monitoring and telemetry, will have to be integrated into the market. They are a neutral third party which could potentially be trusted with such a grid.
- Clouds like Modal and SF Compute are also great options. Modal in particular already does workload-based, serverless jobs, so its expertise can naturally carry over to such a market. SF Compute already does leaseback into the market, and it can expand from there.
- The most tested option would be the non-profit consortium, ISO-style model. Other parallels to this model include The Clearing House for payment rails.
- An underrated option is Nvidia. Its incentives are to fragment the neocloud/datacenter layer, so as to prevent any individual operator from having too much pricing leverage. This market can do exactly that. Moreover, the capacity-market bits will require extensive coordination from Nvidia and other chip companies.
- A state-operated setup is always an option. I’m skeptical it can be agile enough.
- There might be some decentralized, crypto, Prime Intellect-esque angle. It seems difficult to work out, though, given how frequently the protocol might need to change.
Next Steps
I will try running an Aaru-style multi-agent simulation of a simplified version of this market to see how it works, and whether I catch any surprises. Given the vast combinatorial space, I expect a lot of the bidding/asking to be done by agents, even in the real market. Let’s see how it works out!
If you found these ideas interesting and would like to chat, feel free to contact me on kavishg@stanford.edu or @thatkavish on X.