L3 - ML Graph & Compilation

L3 covers graph representation, lowering, optimization, and compilation. This layer transforms model programs into forms that can execute efficiently on specific runtimes and hardware.

L3 ML Graph & Compilation
  1. Graph IR
  2. Operator fusion
  3. Lowering
  4. Runtime targets
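
To make the first two items concrete, here is a minimal sketch of a graph IR and one operator-fusion pass. It is illustrative only: `Node`, `fuse_add_relu`, and the op names are hypothetical and do not come from any of the projects listed on this page. The sketch assumes nodes arrive in topological order and that only the final node's value is a graph output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # name of the value this node produces
    op: str        # operator kind, e.g. "matmul", "add", "relu"
    inputs: tuple  # names of the values this node consumes


def fuse_add_relu(nodes):
    """Rewrite add -> relu pairs into one "add_relu" node.

    Fusion is applied only when the relu is the sole consumer of the add,
    so no other node loses access to the intermediate value.
    """
    # Count how many nodes consume each value.
    uses = {}
    for n in nodes:
        for i in n.inputs:
            uses[i] = uses.get(i, 0) + 1

    by_name = {n.name: n for n in nodes}
    fused_away = set()
    out = []
    for n in nodes:
        if n.op == "relu" and len(n.inputs) == 1:
            producer = by_name.get(n.inputs[0])  # None for graph inputs
            if (producer is not None and producer.op == "add"
                    and uses[producer.name] == 1):
                fused_away.add(producer.name)
                out.append(Node(n.name, "add_relu", producer.inputs))
                continue
        out.append(n)
    # Drop the add nodes that were absorbed into a fused node.
    return [n for n in out if n.name not in fused_away]


# y = relu(matmul(x, w) + b), written as a three-node graph.
graph = [
    Node("t0", "matmul", ("x", "w")),
    Node("t1", "add", ("t0", "b")),
    Node("y", "relu", ("t1",)),
]
fused = fuse_add_relu(graph)
print([n.op for n in fused])
```

Production compilers express rewrites like this as pattern-matching passes over much richer IRs; the single-consumer check stands in for the legality analysis a real fusion pass would perform.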

What belongs here

L3 is not the model weights themselves, and it is not the serving engine. It is the translation layer that decides how a model computation graph can be represented, optimized, and mapped to execution backends.
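
The "mapped to execution backends" step can be sketched as a lookup from graph-level operators to backend kernels, with a fallback that decomposes an operator the backend lacks. Everything here is an assumption made for illustration: the `KERNELS` and `DECOMPOSE` tables, the kernel names, and the `lower` function are hypothetical, and the sketch tracks only the operator sequence, not data flow.

```python
# Hypothetical per-backend kernel tables: graph op name -> kernel identifier.
KERNELS = {
    "cpu": {"matmul": "cpu.gemm", "add": "cpu.vadd", "relu": "cpu.vrelu"},
    "gpu": {
        "matmul": "gpu.gemm",
        "add": "gpu.add",
        "relu": "gpu.relu",
        "add_relu": "gpu.fused_add_relu",
    },
}

# Hypothetical decompositions for ops a backend may not implement directly.
DECOMPOSE = {"add_relu": ("add", "relu")}


def lower(ops, backend):
    """Map a sequence of graph-level op names to kernel names for one backend."""
    table = KERNELS[backend]
    plan = []
    for op in ops:
        if op in table:
            # The backend has a native kernel for this op.
            plan.append(table[op])
        elif op in DECOMPOSE:
            # No native kernel: lower the simpler ops it decomposes into.
            plan.extend(lower(DECOMPOSE[op], backend))
        else:
            raise NotImplementedError(f"{backend} has no kernel for {op!r}")
    return plan


ops = ["matmul", "add_relu"]
gpu_plan = lower(ops, "gpu")
cpu_plan = lower(ops, "cpu")
print(gpu_plan)
print(cpu_plan)
```

The same graph yields a different plan per target, which is the sense in which this layer sits between the model and the serving engine: neither the weights nor the request handling changes, only how the computation is realized.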

Representative projects

| Project | Why it might fit | Adjacent layers |
| --- | --- | --- |
| ONNX Runtime | Runtime and tooling around the ONNX model representation and execution providers. | L3 graph, L7 inference |
| OpenXLA XLA | Compiler technology for optimizing ML workloads across backends. | L2 runtime, L3 compilation |
| Apache TVM | Compiler stack for deep learning models across hardware targets. | L1 hardware, L3 compilation |
| MLIR | Compiler infrastructure useful for representing and lowering domain-specific computations. | L3 graph, L4 numeric |
| TensorRT-LLM | Optimization and runtime path for LLM inference on NVIDIA platforms. | L3 compilation, L7 serving |
| OpenVINO | Toolkit for optimizing and deploying AI inference across Intel hardware. | L3 optimization, L7 inference |

Boundary questions

  • Does an inference framework belong in L3 when its main value is compilation and in L7 when its main value is serving?
  • Should model exchange formats be separated from compiler execution backends?
  • How should the layer handle dynamic agent graphs that are not pure tensor graphs?

Signals to watch

  • More attention to portable intermediate representations for AI workloads.
  • Compiler support for speculative decoding, mixture-of-experts routing, and sparse computation.
  • Cross-vendor pressure for optimization paths that do not lock models to one accelerator family.