L2 - System & Driver Runtime
L2 covers the runtime bridge between hardware and ML frameworks: drivers, kernel APIs, device memory management, container integration, scheduling hooks, and low-level libraries. It is where hardware becomes programmable by the rest of the stack.
- Drivers
- Kernel APIs
- Memory management
- Cluster scheduling hooks
What belongs here
L2 sits below model graph compilation (L3). The question it asks is whether accelerator hardware can be addressed, scheduled, isolated, monitored, and shared safely by higher layers.
Representative projects
| Project | Why it might fit | Adjacent layers |
|---|---|---|
| NVIDIA CUDA | Programming model, libraries, and runtime for NVIDIA GPU computing. | L1 GPUs, L3 compilation |
| AMD ROCm | Open software stack for AMD GPU acceleration. | L1 accelerators, L3 frameworks |
| Apple Metal | Low-level graphics and compute API used by Apple silicon workloads. | L1 devices, L7 local inference |
| Intel oneAPI | Cross-architecture programming model and toolkits. | L1 CPU/GPU, L3 compilation |
| NVIDIA GPU Operator | Kubernetes integration for GPU drivers, runtime components, and monitoring. | L2 runtime, L12 policy |
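As a rough illustration of the "can the hardware be addressed" question, the sketch below checks whether a command-line tool that commonly ships with some of the stacks above is on the `PATH`. The mapping `RUNTIME_PROBES` and the function `detect_runtimes` are hypothetical names introduced here, and the choice of one tool per stack is an assumption. Finding a tool only suggests the stack is installed; it does not prove that a device is present or usable.

```python
import shutil

# Assumption: one CLI per stack as a cheap presence signal.
# Installations can differ, so treat a miss as "unknown", not "absent".
RUNTIME_PROBES = {
    "NVIDIA CUDA": "nvidia-smi",
    "AMD ROCm": "rocminfo",
    "Intel oneAPI": "sycl-ls",
}


def detect_runtimes(probes=RUNTIME_PROBES, which=shutil.which):
    """Map each stack name to the resolved tool path, or None if not found.

    `which` is injectable so the lookup can be exercised without any
    accelerator software installed.
    """
    return {name: which(tool) for name, tool in probes.items()}


if __name__ == "__main__":
    for name, path in detect_runtimes().items():
        status = path if path else "not found on PATH"
        print(f"{name}: {status}")
```

A real L2 health check would go further, for example by asking the driver to enumerate devices, but the presence probe is often enough to decide which code path to try first.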
Boundary questions
- Are scheduler plugins part of L2 because they expose devices, or L12 because they enforce placement policy?
- When a runtime library performs graph optimization, should it move up to L3?
- How should AILIS represent local-device inference where L1 and L2 are hidden inside the operating system?
Signals to watch
- Better accelerator virtualization and multi-tenant isolation.
- Runtime support for confidential or attested AI workloads.
- Scheduling APIs that expose power, memory, and accelerator topology to higher layers.
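To make the last signal concrete, here is a minimal sketch of how a higher layer might read per-device memory and power today on NVIDIA hardware, by parsing the CSV output of `nvidia-smi --query-gpu`. The helper names (`parse_gpu_csv`, `to_float`, `query_gpus`) and the selected fields are choices made for this example, not part of any framework. Topology is not covered here, and other vendors would need their own tooling.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory is reported
# in MiB and power in watts.
QUERY_FIELDS = ["index", "name", "memory.total", "memory.used", "power.draw"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(QUERY_FIELDS, (v.strip() for v in record))))
    return rows


def to_float(value):
    """Convert a reading to float; unsupported readings become None."""
    try:
        return float(value)
    except ValueError:
        return None


def query_gpus():
    """Run nvidia-smi and return parsed rows (requires an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    # Illustrative sample line, not a measurement from a real machine.
    sample = "0, Example GPU, 40960, 1024, 61.25\n"
    for gpu in parse_gpu_csv(sample):
        free_mib = to_float(gpu["memory.total"]) - to_float(gpu["memory.used"])
        print(gpu["name"], "free MiB:", free_mib, "power W:", to_float(gpu["power.draw"]))
```

Parsing is kept separate from the subprocess call so that the logic can be tested on machines without a GPU. Scraping CLI output is a stopgap, which is part of why structured scheduling APIs for this data are a signal worth watching.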