A Layered Basis Risk Model for Compute Capacity
This draft proposes a modeling frame for compute exposure. It is intentionally simple enough to be criticized. The goal is not to define the correct compute unit, but to show how a layered model might expose mismatches that a raw GPU-hour view can hide.
Working Definition
Compute basis risk is the mismatch between the compute exposure a buyer, lender, insurer, or operator actually has and the compute exposure represented by a contract, index, hedge, budget, or headline.
A buyer might hedge H100 rental rates and still remain exposed to HBM supply, power delays, region-specific latency, software compatibility, or a model-efficiency shock. The hedge might be economically useful, but it is not the whole exposure.
Quality-Adjusted Compute Unit
A possible starting abstraction:

    QACU =
        raw_gpu_hours
        * utilization_factor
        * performance_factor
        * availability_factor
        * locality_factor
        * policy_factor
        * settlement_factor

Where:
| Factor | AILIS anchors | What it captures |
|---|---|---|
| raw_gpu_hours | L1 | Quantity of accelerator time before adjustment. |
| utilization_factor | L2-L3, L10-L12 | Scheduler efficiency, batch shape, prefill/decode split, orchestration. |
| performance_factor | L1-L4 | SKU, memory, interconnect, compiler, kernels, precision, quantization. |
| availability_factor | L0-L2, L13 | Uptime, interruptibility, maintenance windows, delivery confidence. |
| locality_factor | L0, L9, L13-L15 | Region, data gravity, latency, data movement, sovereignty. |
| policy_factor | L15 | Compliance, tenancy, auditability, safety and governance constraints. |
| settlement_factor | L11, L15 | Index fit, measurement quality, delivery verification, counterparty credit. |
This is not a pricing model by itself. It is a scaffold for asking which adjustments matter for a given exposure.
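As a minimal sketch of the scaffold, the product above can be written as a small Python function. The class name `QacuFactors`, the default of 1.0 for "no adjustment", and the example values are assumptions made for illustration; the draft does not specify factor ranges or calibration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QacuFactors:
    """Multiplicative adjustments from the table above; 1.0 means no adjustment."""
    utilization: float = 1.0
    performance: float = 1.0
    availability: float = 1.0
    locality: float = 1.0
    policy: float = 1.0
    settlement: float = 1.0


def qacu(raw_gpu_hours: float, factors: QacuFactors) -> float:
    """Multiply raw GPU-hours by every adjustment factor."""
    if raw_gpu_hours < 0:
        raise ValueError("raw_gpu_hours must be non-negative")
    result = raw_gpu_hours
    for f in fields(factors):
        value = getattr(factors, f.name)
        if value < 0:
            raise ValueError(f"{f.name} must be non-negative")
        result *= value
    return result


# Illustrative inputs only; none of these values come from market data.
example = QacuFactors(utilization=0.6, availability=0.95, locality=0.9)
print(round(qacu(1000.0, example), 1))  # 1000 * 0.6 * 0.95 * 0.9 = 513.0
```

Because the factors multiply, a single weak factor scales the whole unit, which is one reason the abstraction should be read as a prompt for questions rather than a valuation.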
Layer Shock Matrix
| Shock | Entry layer | Possible market effect | Who might misread it |
|---|---|---|---|
| Grid interconnection delay | L0 | Reserved capacity slips, bridge power cost rises, flexible-load value increases. | Traders watching only GPU spot rates. |
| HBM shortage or allocation change | L1 | High-memory workloads reprice; long-context inference becomes capacity-constrained. | Buyers hedged with generic GPU-hour curves. |
| New GPU generation ramps | L1 | Older fleets may fall in price, unless software support or availability keeps them valuable. | Buyers assuming all new-generation capacity substitutes one-for-one. |
| TensorRT/vLLM-style serving gain | L3 | Effective inference capacity rises on installed hardware. | Infrastructure investors modeling only physical supply. |
| Quantization breakthrough | L4 | Some workloads shift to cheaper devices; others expand demand because costs fall. | Commentators assuming efficiency always lowers aggregate demand. |
| Larger context windows become standard | L8-L9 | KV-cache and memory pressure can dominate raw FLOPs. | Contract designers ignoring memory and storage layers. |
| Sovereign/disconnected requirement | L15 | Public-cloud capacity becomes non-substitutable for some regulated workloads. | Index users applying a global cloud price to local obligations. |
| Benchmark methodology change | L11/L15 | Settlement value changes without any physical supply change. | Risk managers treating index governance as a back-office detail. |
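One way to make the matrix queryable is to expand the entry-layer labels into layer numbers and then ask which shocks enter at a layer a given hedge does not reference. The sketch below copies the shock names and entry layers from the table; the helper names `parse_layers` and `shocks_outside`, and the example hedge covering only L1, are hypothetical.

```python
import re

# Entry-layer labels copied from the matrix above.
SHOCK_ENTRY_LAYERS = {
    "Grid interconnection delay": "L0",
    "HBM shortage or allocation change": "L1",
    "New GPU generation ramps": "L1",
    "TensorRT/vLLM-style serving gain": "L3",
    "Quantization breakthrough": "L4",
    "Larger context windows become standard": "L8-L9",
    "Sovereign/disconnected requirement": "L15",
    "Benchmark methodology change": "L11/L15",
}


def parse_layers(label: str) -> set[int]:
    """Expand labels such as 'L8-L9' or 'L11/L15' into a set of layer numbers."""
    layers: set[int] = set()
    for part in re.split(r"[/,]", label):
        numbers = [int(n) for n in re.findall(r"L(\d+)", part)]
        if len(numbers) == 2:  # a range such as L8-L9
            layers.update(range(numbers[0], numbers[1] + 1))
        else:
            layers.update(numbers)
    return layers


def shocks_outside(covered: set[int]) -> list[str]:
    """Return shocks with at least one entry layer the hedge does not cover."""
    return [
        shock
        for shock, label in SHOCK_ENTRY_LAYERS.items()
        if not parse_layers(label) <= covered
    ]


# Hypothetical hedge that references only L1 hardware pricing.
print(shocks_outside({1}))
```

With the L1-only hedge, every shock except the two that enter at L1 is returned, which matches the matrix's point that a generic hardware hedge leaves most entry layers open.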
Scenario Sketches
Scenario 1: Power-Constrained Contango
If data center projects are delayed by grid interconnection queues, short-term delivered compute may become more valuable even if chip supply improves. The forward curve could steepen for regions where power and substations are binding. The arbitrage question is not "more GPUs or fewer GPUs?" but "which regions have deliverable powered racks?"
Relevant evidence: IEA projects data center electricity consumption roughly doubling from 485 TWh in 2025 to 950 TWh in 2030, while DOE/LBNL estimated U.S. data centers could rise from 4.4% of U.S. electricity in 2023 to 6.7-12% by 2028.
Scenario 2: Memory-Constrained Compute
If HBM and high-value server memory remain constrained, raw GPU capacity could be a misleading proxy. A workload with larger context windows or higher memory-bandwidth needs may price against HBM availability, not just accelerator count.
Relevant evidence: Micron described memory as a strategic asset in its fiscal Q2 2026 results, and Samsung's Q1 2026 update pointed to continued AI infrastructure demand, HBM4 sales, HBM4E sampling plans, and strong server-memory demand.
Scenario 3: Efficiency Shock
Software and numeric improvements can increase effective compute supply. vLLM reported 2-4x throughput improvements at similar latency in its evaluated serving workloads. AWQ reported strong 4-bit deployment performance and more than 3x TinyChat speedup versus Hugging Face FP16 on tested desktop and mobile GPUs. NVIDIA reported up to 2.8x per-GPU Blackwell throughput gains over three months from TensorRT-LLM optimizations in early 2026.
The net market effect is ambiguous. Efficiency could reduce demand for raw GPU-hours in a fixed workload. It could also increase demand by making AI features cheap enough to embed everywhere.
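The ambiguity can be illustrated with a constant-elasticity demand sketch. It assumes (none of this is in the draft) that the GPU-hour price stays fixed, so a throughput speedup of `s` cuts cost per token to `1/s`, and that token demand responds with a constant price elasticity `e`. Token demand then scales by `s**e` while GPU-hours per token scale by `1/s`, so GPU-hour demand scales by `s**(e - 1)`.

```python
def gpu_hours_after_speedup(
    baseline_gpu_hours: float, speedup: float, elasticity: float
) -> float:
    """GPU-hours demanded after an efficiency gain, under constant elasticity.

    Assumptions: fixed GPU-hour price, so cost per token falls by 1/speedup;
    token demand scales with cost**(-elasticity).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if elasticity < 0:
        raise ValueError("elasticity is expressed as a non-negative magnitude")
    return baseline_gpu_hours * speedup ** (elasticity - 1.0)


# Illustrative values only: a 2x serving speedup on a 1,000 GPU-hour baseline.
for e in (0.0, 1.0, 1.5):
    print(e, round(gpu_hours_after_speedup(1000.0, 2.0, e), 1))
```

An elasticity of 0 is the fixed-workload case, where GPU-hours halve. At 1 the two effects cancel, and above 1 raw GPU-hour demand rises despite the efficiency gain. Real demand is unlikely to follow a single constant elasticity, so this is only a way to frame the question.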
Scenario 4: Sovereign Non-Substitution
If a buyer needs disconnected, on-prem, or sovereign infrastructure, a global public-cloud GPU index might be a poor hedge. The contractual unit might need to include operational boundary, control plane location, identity/data residency, and local hardware rights.
Relevant evidence: Microsoft's February 2026 Sovereign Cloud announcement emphasized fully disconnected local operations and large-model support within customer-controlled boundaries. The EU AI Factories initiative similarly ties compute capacity to regional industrial, research, public-sector, and sovereignty goals.
Layered Exposure Record Schema
The following schema could support an internal research notebook, spreadsheet, or agent memory. It uses a JSON-Schema-style shape so the fields can be validated rather than read as shorthand.

    title: LayeredComputeExposure
    type: object
    required:
      - workload
      - capacity
      - layer_constraints
      - hedge_or_index
      - residual_basis_risks
    properties:
      workload:
        type: object
        required:
          - exposure_type
          - model_family
          - sequence_profile
          - latency_requirement
        properties:
          exposure_type:
            type: string
            enum:
              - training
              - inference
              - evaluation
              - batch
              - edge
              - sovereign
          model_family:
            type: string
          sequence_profile:
            type: object
            properties:
              input_tokens:
                type: integer
              output_tokens:
                type: integer
              context_window:
                type: integer
          latency_requirement:
            type: object
            properties:
              p50_ms:
                type: integer
              p95_ms:
                type: integer
              p99_ms:
                type: integer
      capacity:
        type: object
        required:
          - accelerator_sku
          - memory_gb
          - interconnect
          - region
          - tenancy
          - duration
        properties:
          accelerator_sku:
            type: string
          memory_gb:
            type: number
          interconnect:
            type: string
          region:
            type: string
          tenancy:
            type: string
            enum:
              - shared
              - dedicated
              - on_prem
              - sovereign
          duration:
            type: string
            enum:
              - spot
              - monthly
              - reserved
              - multi_year
      layer_constraints:
        type: object
        properties:
          l0_power:
            type: string
            enum:
              - grid
              - colocated
              - bridge
              - flexible
          l1_memory:
            type: string
            enum:
              - hbm
              - dram
              - kv_cache
              - storage
          l2_runtime:
            type: string
            enum:
              - cuda
              - rocm
              - custom
          l3_engine:
            type: string
          l4_precision:
            type: string
            enum:
              - fp16
              - fp8
              - fp4
              - int4
              - mixed
          l15_policy:
            type: string
            enum:
              - public_cloud
              - regulated
              - disconnected
              - classified
      hedge_or_index:
        type: object
        required:
          - reference_name
          - settlement_type
          - methodology_url
          - coverage_layers
        properties:
          reference_name:
            type: string
          settlement_type:
            type: string
            enum:
              - financial
              - physical
              - parametric
              - hybrid
          methodology_url:
            type: string
          coverage_layers:
            type: array
            items:
              type: string
              enum:
                - L0
                - L1
                - L2
                - L3
                - L4
                - L15
      residual_basis_risks:
        type: array
        items:
          type: object
          required:
            - risk_type
            - severity
            - note
          properties:
            risk_type:
              type: string
            severity:
              type: string
              enum:
                - low
                - medium
                - high
            note:
              type: string

Layer-Aware Arbitrage Questions
The word arbitrage should be used carefully. Some opportunities will be true arbitrage; many will be information advantages, procurement advantages, operational flexibility, or better hedging. AILIS can still help identify where the gap lives.
| If the market reacts to... | But the binding layer is... | Possible analytical edge |
|---|---|---|
| GPU spot prices | L0 power | Model power-backed capacity spreads by region. |
| H100 rental curve | L1 memory/topology | Compare memory-rich versus compute-rich capacity. |
| Chip shipments | L2-L3 software readiness | Track usable capacity, not shipped silicon. |
| Quantization news | L16 workload economics | Separate workloads where quality loss is acceptable from those where it is not. |
| Sovereign AI announcements | L15 governance | Estimate non-substitutable local demand versus global cloud oversupply. |
| Forward-curve publication | L11/L15 benchmark governance | Analyze methodology, data sufficiency, and manipulation risk before trusting settlement. |
How To Use The Model
- Start with the real exposure, not the available hedge.
- Map the exposure to layers.
- Map the index or contract to layers.
- Compare included and excluded attributes.
- Stress the excluded attributes.
- Describe the residual basis risk in plain language.
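The comparison and description steps above can be sketched as a set difference that emits records shaped like the `residual_basis_risks` items in the schema (`risk_type`, `severity`, `note`). The function name, the `"<layer>_uncovered"` naming for `risk_type`, the default of `high` for layers that have not been stressed yet, and the example exposure are all assumptions for illustration.

```python
SEVERITIES = ("low", "medium", "high")


def layer_number(layer: str) -> int:
    """Turn a label such as 'L15' into 15 for sorting."""
    return int(layer.lstrip("L"))


def residual_basis_risks(
    exposure_layers: set[str],
    hedge_layers: set[str],
    stress_results: dict[str, tuple[str, str]],
) -> list[dict[str, str]]:
    """Find layers the hedge excludes and attach a severity and a note."""
    excluded = sorted(exposure_layers - hedge_layers, key=layer_number)
    records = []
    for layer in excluded:
        # Layers with no stress result yet default to "high" so they stay visible.
        severity, note = stress_results.get(layer, ("high", "not yet stressed"))
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity}")
        records.append(
            {"risk_type": f"{layer}_uncovered", "severity": severity, "note": note}
        )
    return records


# Hypothetical exposure: long-context inference in a regulated region.
exposure = {"L0", "L1", "L3", "L15"}
# Hypothetical hedge: a rental-rate index assumed to reference only L1.
hedge = {"L1"}
stress = {
    "L0": ("medium", "Powered racks in the region may deliver late."),
    "L15": ("high", "A global index ignores local residency rules."),
}
for record in residual_basis_risks(exposure, hedge, stress):
    print(record)
```

Keeping the note as free text follows the last step in the list: the record is meant to be read by a person, not only filtered by severity.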
Open Questions
- Should AILIS define a "Layer Coverage Statement" for compute benchmarks?
- Could compute contracts expose both a raw unit and a quality-adjusted unit?
- How should model-efficiency improvements be represented in market data?
- Should power-flexible AI factories receive a different capacity category from static loads?
- What empirical data would make QACU less speculative?
Sources
- IEA, Key Questions on Energy and AI, accessed May 7, 2026.
- U.S. Department of Energy, DOE Releases New Report Evaluating Increase in Electricity Demand from Data Centers, Dec. 20, 2024.
- Micron, Fiscal Q2 2026 results, Mar. 18, 2026.
- Samsung, First Quarter 2026 Results, Apr. 30, 2026.
- vLLM, Efficient Memory Management for Large Language Model Serving with PagedAttention, 2023.
- AWQ, Activation-aware Weight Quantization for LLM Compression and Acceleration, 2023.
- NVIDIA, Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell, Jan. 8, 2026.
- Microsoft, Sovereign Cloud adds disconnected large-model support, Feb. 24, 2026.
- European Commission, AI Factories, accessed May 7, 2026.