L2 - System & Driver Runtime

L2 covers the runtime bridge between hardware and ML frameworks: drivers, kernel APIs, device memory management, container integration, scheduling hooks, and low-level libraries. It is where hardware becomes programmable by the rest of the stack.

L2: System & Driver Runtime
  1. Drivers
  2. Kernel APIs
  3. Memory management
  4. Cluster scheduling hooks

What belongs here

L2 sits below model graph compilation. Its concern is whether accelerator hardware can be addressed, scheduled, isolated, monitored, and shared safely by higher layers.
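The first of those questions, whether hardware can be addressed at all, can be illustrated with a minimal sketch. It assumes a Linux host, where accelerator drivers commonly expose device nodes such as /dev/nvidia0 (NVIDIA), /dev/kfd (AMD ROCm), and /dev/dri/renderD* (DRM render nodes). The function name `visible_accelerator_nodes` and the pattern table are hypothetical, chosen for illustration, and the list of patterns is not exhaustive.

```python
from pathlib import Path

# Device-node patterns commonly created by accelerator drivers on Linux.
# Illustrative assumption only: real systems may expose other nodes.
DEVICE_NODE_PATTERNS = {
    "nvidia": "nvidia[0-9]*",      # per-GPU nodes such as nvidia0, nvidia1
    "amd_kfd": "kfd",              # ROCm kernel fusion driver node
    "drm_render": "dri/renderD*",  # DRM render nodes, e.g. renderD128
}


def visible_accelerator_nodes(dev_root="/dev"):
    """Return device nodes found under dev_root, grouped by pattern name.

    An empty list for a key means no matching node is visible, which
    usually indicates the driver is not loaded or the device is not
    passed through to this container or VM.
    """
    root = Path(dev_root)
    if not root.is_dir():
        return {name: [] for name in DEVICE_NODE_PATTERNS}
    return {
        name: sorted(str(p.relative_to(root)) for p in root.glob(pattern))
        for name, pattern in DEVICE_NODE_PATTERNS.items()
    }


if __name__ == "__main__":
    for name, nodes in visible_accelerator_nodes().items():
        print(f"{name}: {nodes or 'none visible'}")
```

A visible device node shows only that the driver has exposed the hardware. Whether it can be scheduled, isolated, or shared safely depends on the rest of the L2 runtime.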

Representative projects

Project | Why it might fit | Adjacent layers
NVIDIA CUDA | Programming model, libraries, and runtime for NVIDIA GPU computing. | L1 GPUs, L3 compilation
AMD ROCm | Open software stack for AMD GPU acceleration. | L1 accelerators, L3 frameworks
Apple Metal | Low-level graphics and compute API used by Apple silicon workloads. | L1 devices, L7 local inference
Intel oneAPI | Cross-architecture programming model and toolkits. | L1 CPU/GPU, L3 compilation
NVIDIA GPU Operator | Kubernetes integration for GPU drivers, runtime components, and monitoring. | L2 runtime, L12 policy

Boundary questions

  • Are scheduler plugins part of L2 because they expose devices, or L12 because they enforce placement policy?
  • When a runtime library performs graph optimization, should it move up to L3?
  • How should AILIS represent local-device inference where L1 and L2 are hidden inside the operating system?

Signals to watch

  • Better accelerator virtualization and multi-tenant isolation.
  • Runtime support for confidential or attested AI workloads.
  • Scheduling APIs that expose power, memory, and accelerator topology to higher layers.
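
As a rough illustration of the last signal, the sketch below shows what a higher layer could do if an L2 runtime reported power, memory, and topology per device. Everything here is hypothetical: `AcceleratorInfo`, its fields, and `pick_device` are invented for this example and do not correspond to any existing driver or scheduler API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AcceleratorInfo:
    """Hypothetical per-device record an L2 runtime might expose upward."""
    device_id: str
    free_memory_bytes: int
    power_draw_watts: float
    numa_node: int


def pick_device(
    devices: Sequence[AcceleratorInfo],
    required_bytes: int,
    preferred_numa: Optional[int] = None,
) -> Optional[AcceleratorInfo]:
    """Choose a device with enough free memory.

    Among devices that fit, prefer one on the requested NUMA node
    (if given), then the one with the lowest current power draw.
    Returns None when no device has enough free memory.
    """
    candidates = [d for d in devices if d.free_memory_bytes >= required_bytes]
    if not candidates:
        return None

    def rank(d: AcceleratorInfo):
        off_node = 0 if preferred_numa is None or d.numa_node == preferred_numa else 1
        return (off_node, d.power_draw_watts)

    return min(candidates, key=rank)


if __name__ == "__main__":
    GIB = 2**30
    fleet = [
        AcceleratorInfo("gpu0", 8 * GIB, 250.0, 0),
        AcceleratorInfo("gpu1", 16 * GIB, 120.0, 1),
        AcceleratorInfo("gpu2", 16 * GIB, 200.0, 0),
    ]
    print(pick_device(fleet, 10 * GIB, preferred_numa=0))
```

The ranking rule is a policy decision, so in layer terms the data record would sit in L2 while a function like `pick_device` arguably belongs to a higher, policy-oriented layer. That is the same boundary raised in the questions above.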