Skip to content

A Layered Basis Risk Model for Compute Capacity

This draft proposes a modeling frame for compute exposure. It is intentionally simple enough to be criticized. The goal is not to define the correct compute unit, but to show how a layered model might expose mismatches that a raw GPU-hour view can hide.

Working Definition

Compute basis risk is the mismatch between the compute exposure a buyer, lender, insurer, or operator actually has and the compute exposure represented by a contract, index, hedge, budget, or headline.

A buyer might hedge H100 rental rates and still remain exposed to HBM supply, power delays, region-specific latency, software compatibility, or a model-efficiency shock. The hedge might be economically useful, but it is not the whole exposure.

Quality-Adjusted Compute Unit

A possible starting abstraction:

QACU =
  raw_gpu_hours
  * utilization_factor
  * performance_factor
  * availability_factor
  * locality_factor
  * policy_factor
  * settlement_factor

Where:

Factor AILIS anchors What it captures
raw_gpu_hours L1 Quantity of accelerator time before adjustment.
utilization_factor L2-L3, L10-L12 Scheduler efficiency, batch shape, prefill/decode split, orchestration.
performance_factor L1-L4 SKU, memory, interconnect, compiler, kernels, precision, quantization.
availability_factor L0-L2, L13 Uptime, interruptibility, maintenance windows, delivery confidence.
locality_factor L0, L9, L13-L15 Region, data gravity, latency, data movement, sovereignty.
policy_factor L15 Compliance, tenancy, auditability, safety and governance constraints.
settlement_factor L11, L15 Index fit, measurement quality, delivery verification, counterparty credit.

This is not a pricing model by itself. It is a scaffold for asking which adjustments matter for a given exposure.

Layer Shock Matrix

Shock Entry layer Possible market effect Who might misread it
Grid interconnection delay L0 Reserved capacity slips, bridge power cost rises, flexible-load value increases. Traders watching only GPU spot rates.
HBM shortage or allocation change L1 High-memory workloads reprice; long-context inference becomes capacity-constrained. Buyers hedged with generic GPU-hour curves.
New GPU generation ramps L1 Older fleets may fall in price, unless software support or availability keeps them valuable. Buyers assuming all new-generation capacity substitutes one-for-one.
TensorRT/vLLM-style serving gain L3 Effective inference capacity rises on installed hardware. Infrastructure investors modeling only physical supply.
Quantization breakthrough L4 Some workloads shift to cheaper devices; others expand demand because costs fall. Commentators assuming efficiency always lowers aggregate demand.
Larger context windows become standard L8-L9 KV-cache and memory pressure can dominate raw FLOPs. Contract designers ignoring memory and storage layers.
Sovereign/disconnected requirement L15 Public-cloud capacity becomes non-substitutable for some regulated workloads. Index users applying a global cloud price to local obligations.
Benchmark methodology change L11/L15 Settlement value changes without any physical supply change. Risk managers treating index governance as a back-office detail.

Scenario Sketches

Scenario 1: Power-Constrained Contango

If data center projects are delayed by grid interconnection queues, short-term delivered compute may become more valuable even if chip supply improves. The forward curve could steepen for regions where power and substations are binding. The arbitrage question is not "more GPUs or fewer GPUs?" but "which regions have deliverable powered racks?"

Relevant evidence: IEA projects data center electricity consumption roughly doubling from 485 TWh in 2025 to 950 TWh in 2030, while DOE/LBNL estimated U.S. data centers could rise from 4.4% of U.S. electricity in 2023 to 6.7-12% by 2028.

Scenario 2: Memory-Constrained Compute

If HBM and high-value server memory remain constrained, raw GPU capacity could be a misleading proxy. A model with larger context or memory bandwidth needs may price against HBM availability, not just accelerator count.

Relevant evidence: Micron described memory as a strategic asset in its fiscal Q2 2026 results, and Samsung's Q1 2026 update pointed to continued AI infrastructure demand, HBM4 sales, HBM4E sampling plans, and strong server-memory demand.

Scenario 3: Efficiency Shock

Software and numeric improvements can increase effective compute supply. vLLM reported 2-4x throughput improvements at similar latency in its evaluated serving workloads. AWQ reported strong 4-bit deployment performance and more than 3x TinyChat speedup versus Hugging Face FP16 on tested desktop and mobile GPUs. NVIDIA reported up to 2.8x per-GPU Blackwell throughput gains over three months from TensorRT-LLM optimizations in early 2026.

The market question is ambiguous. Efficiency could reduce demand for raw GPU-hours in a fixed workload. It could also increase demand by making AI features cheap enough to embed everywhere.

Scenario 4: Sovereign Non-Substitution

If a buyer needs disconnected, on-prem, or sovereign infrastructure, a global public-cloud GPU index might be a poor hedge. The contractual unit might need to include operational boundary, control plane location, identity/data residency, and local hardware rights.

Relevant evidence: Microsoft's February 2026 Sovereign Cloud announcement emphasized fully disconnected local operations and large-model support within customer-controlled boundaries. The EU AI Factories initiative similarly ties compute capacity to regional industrial, research, public-sector, and sovereignty goals.

Layered Exposure Record Schema

The following schema could support an internal research notebook, spreadsheet, or agent memory. It uses a JSON-Schema-style shape so the fields can be validated rather than read as shorthand.

title: LayeredComputeExposure
type: object
required:
  - workload
  - capacity
  - layer_constraints
  - hedge_or_index
  - residual_basis_risks
properties:
  workload:
    type: object
    required:
      - exposure_type
      - model_family
      - sequence_profile
      - latency_requirement
    properties:
      exposure_type:
        type: string
        enum:
          - training
          - inference
          - evaluation
          - batch
          - edge
          - sovereign
      model_family:
        type: string
      sequence_profile:
        type: object
        properties:
          input_tokens:
            type: integer
          output_tokens:
            type: integer
          context_window:
            type: integer
      latency_requirement:
        type: object
        properties:
          p50_ms:
            type: integer
          p95_ms:
            type: integer
          p99_ms:
            type: integer

  capacity:
    type: object
    required:
      - accelerator_sku
      - memory_gb
      - interconnect
      - region
      - tenancy
      - duration
    properties:
      accelerator_sku:
        type: string
      memory_gb:
        type: number
      interconnect:
        type: string
      region:
        type: string
      tenancy:
        type: string
        enum:
          - shared
          - dedicated
          - on_prem
          - sovereign
      duration:
        type: string
        enum:
          - spot
          - monthly
          - reserved
          - multi_year

  layer_constraints:
    type: object
    properties:
      l0_power:
        type: string
        enum:
          - grid
          - colocated
          - bridge
          - flexible
      l1_memory:
        type: string
        enum:
          - hbm
          - dram
          - kv_cache
          - storage
      l2_runtime:
        type: string
        enum:
          - cuda
          - rocm
          - custom
      l3_engine:
        type: string
      l4_precision:
        type: string
        enum:
          - fp16
          - fp8
          - fp4
          - int4
          - mixed
      l15_policy:
        type: string
        enum:
          - public_cloud
          - regulated
          - disconnected
          - classified

  hedge_or_index:
    type: object
    required:
      - reference_name
      - settlement_type
      - methodology_url
      - coverage_layers
    properties:
      reference_name:
        type: string
      settlement_type:
        type: string
        enum:
          - financial
          - physical
          - parametric
          - hybrid
      methodology_url:
        type: string
      coverage_layers:
        type: array
        items:
          type: string
          enum:
            - L0
            - L1
            - L2
            - L3
            - L4
            - L15

  residual_basis_risks:
    type: array
    items:
      type: object
      required:
        - risk_type
        - severity
        - note
      properties:
        risk_type:
          type: string
        severity:
          type: string
          enum:
            - low
            - medium
            - high
        note:
          type: string

Layer-Aware Arbitrage Questions

The word arbitrage should be used carefully. Some opportunities will be true arbitrage; many will be information advantages, procurement advantages, operational flexibility, or better hedging. AILIS can still help identify where the gap lives.

If the market reacts to... But the binding layer is... Possible analytical edge
GPU spot prices L0 power Model power-backed capacity spreads by region.
H100 rental curve L1 memory/topology Compare memory-rich versus compute-rich capacity.
Chip shipments L2-L3 software readiness Track usable capacity, not shipped silicon.
Quantization news L16 workload economics Separate workloads where quality loss is acceptable from those where it is not.
Sovereign AI announcements L15 governance Estimate non-substitutable local demand versus global cloud oversupply.
Forward-curve publication L11/L15 benchmark governance Analyze methodology, data sufficiency, and manipulation risk before trusting settlement.

How To Use The Model

  1. Start with the real exposure, not the available hedge.
  2. Map the exposure to layers.
  3. Map the index or contract to layers.
  4. Compare included and excluded attributes.
  5. Stress the excluded attributes.
  6. Describe the residual basis risk in plain language.

Open Questions

  • Should AILIS define a "Layer Coverage Statement" for compute benchmarks?
  • Could compute contracts expose both a raw unit and a quality-adjusted unit?
  • How should model-efficiency improvements be represented in market data?
  • Should power-flexible AI factories receive a different capacity category from static loads?
  • What empirical data would make QACU less speculative?

Sources