L1 - Compute Fabric¶
L1 covers the hardware substrate exposed to system software: GPUs, TPUs, NPUs, CPUs, memory hierarchy, accelerators, interconnects, and cluster topology. It is where AI capability becomes a physical allocation problem.
L1Compute Fabric
- Accelerators
- Memory
- Interconnect
- Topology
What belongs here¶
L1 is concerned with the compute resources themselves. It does not decide how a model is compiled, quantized, routed, or governed, but it strongly shapes what those upper layers can achieve.
Representative projects and platforms¶
| Project or platform | Why it might fit | Adjacent layers |
|---|---|---|
| NVIDIA Blackwell | GPU architecture for large-scale training and inference systems. | L2 drivers, L3 compilation |
| AMD Instinct | Accelerator hardware used in AI and HPC deployments. | L2 ROCm, L7 serving |
| Google Cloud TPU | Tensor processing hardware exposed through Google Cloud. | L2 runtime, L3 XLA |
| AWS Trainium and Inferentia | Purpose-built AWS chips for model training and inference. | L2 runtime, L7 inference |
| Cerebras Wafer-Scale Engine | Wafer-scale AI compute that challenges typical cluster assumptions. | L1 compute, L3 compilation |
| Groq LPU | Inference-oriented processing architecture with distinct latency tradeoffs. | L7 decoding, L12 routing |
Boundary questions¶
- Should interconnect protocols such as NVLink or InfiniBand be modeled inside L1, or as part of L13 transport when they affect distributed inference semantics?
- When a cloud exposes "instances" rather than chips, should the layer describe the underlying fabric or the purchasable unit?
- How much model-specific acceleration belongs in hardware before it becomes an L3 or L7 concern?
Signals to watch¶
- Specialized inference chips changing routing economics.
- Memory bandwidth becoming a stronger limiting factor than raw FLOPS.
- Compute providers exposing topology-aware placement APIs to application teams.