
L1 - Compute Fabric

L1 covers the hardware substrate exposed to system software: accelerators such as GPUs, TPUs, and NPUs, along with CPUs, the memory hierarchy, interconnects, and cluster topology. It is where AI capability becomes a physical allocation problem.

L1 Compute Fabric
  1. Accelerators
  2. Memory
  3. Interconnect
  4. Topology

What belongs here

L1 is concerned with the compute resources themselves. It does not decide how a model is compiled, quantized, routed, or governed, but it strongly shapes what those upper layers can achieve.

Representative projects and platforms

| Project or platform | Why it might fit | Adjacent layers |
|---|---|---|
| NVIDIA Blackwell | GPU architecture for large-scale training and inference systems. | L2 drivers, L3 compilation |
| AMD Instinct | Accelerator hardware used in AI and HPC deployments. | L2 ROCm, L7 serving |
| Google Cloud TPU | Tensor processing hardware exposed through Google Cloud. | L2 runtime, L3 XLA |
| AWS Trainium and Inferentia | Purpose-built AWS chips for model training and inference. | L2 runtime, L7 inference |
| Cerebras Wafer-Scale Engine | Wafer-scale AI compute that challenges typical cluster assumptions. | L1 compute, L3 compilation |
| Groq LPU | Inference-oriented processing architecture with distinct latency tradeoffs. | L7 decoding, L12 routing |

Boundary questions

  • Should interconnect protocols such as NVLink or InfiniBand be modeled inside L1, or as part of L13 transport when they affect distributed inference semantics?
  • When a cloud exposes "instances" rather than chips, should the layer describe the underlying fabric or the purchasable unit?
  • How much model-specific acceleration belongs in hardware before it becomes an L3 or L7 concern?

Signals to watch

  • Specialized inference chips changing routing economics.
  • Memory bandwidth, rather than raw FLOPS, becoming the limiting factor for more workloads.
  • Compute providers exposing topology-aware placement APIs to application teams.
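
One way to make the memory-bandwidth signal concrete is a roofline-style estimate: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs performed per byte moved). The sketch below is illustrative only. The `Accelerator` class, the chip name, and every number in it are hypothetical and do not describe any product listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator described by two headline figures."""
    name: str
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s


def ridge_point(acc: Accelerator) -> float:
    """Arithmetic intensity (FLOP/byte) at which compute and memory limits meet."""
    return acc.peak_flops / acc.mem_bandwidth


def attainable_flops(acc: Accelerator, arithmetic_intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(acc.peak_flops, acc.mem_bandwidth * arithmetic_intensity)


def is_memory_bound(acc: Accelerator, arithmetic_intensity: float) -> bool:
    """True when the workload sits to the left of the ridge point."""
    return arithmetic_intensity < ridge_point(acc)


# Assumed, made-up figures: 1 PFLOP/s of compute and 2 TB/s of memory bandwidth.
chip = Accelerator("hypothetical-chip", peak_flops=1.0e15, mem_bandwidth=2.0e12)

# Assumed intensities: a low-intensity workload versus a high-intensity one.
low_intensity = 2.0     # FLOP/byte
high_intensity = 800.0  # FLOP/byte

if __name__ == "__main__":
    for label, ai in [("low", low_intensity), ("high", high_intensity)]:
        bound = "memory" if is_memory_bound(chip, ai) else "compute"
        print(f"{label}: {attainable_flops(chip, ai):.2e} FLOP/s ({bound}-bound)")
```

Under these assumed numbers the ridge point is 500 FLOP/byte, so the low-intensity workload is capped by bandwidth at a small fraction of peak compute, while the high-intensity workload reaches the compute ceiling. Real hardware adds caches, overlap, and interconnect effects that this two-parameter model ignores.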