L3 - ML Graph & Compilation¶

L3 covers graph representation, lowering, optimization, and compilation. This layer transforms model programs into forms that can execute efficiently on specific runtimes and hardware.

Topics: graph IR, operator fusion, lowering, runtime targets.

What belongs here¶

L3 is neither the model weights nor the serving engine. It is the translation layer that decides how a model's computation graph is represented, optimized, and mapped to execution backends.
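
The three responsibilities named above (represent, optimize, map to a backend) can be sketched in a few lines of plain Python. This is a toy illustration, not the design of any project listed on this page: the `Node` type, the `fuse_add_relu` pass, and the `REFERENCE_KERNELS` table are all hypothetical names, and the graph is assumed to be a topologically ordered list whose last node is the only output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # name of the value this node produces
    op: str        # operator name, e.g. "mul", "add", "relu"
    inputs: tuple  # names of the values this node consumes


def fuse_add_relu(graph):
    """Rewrite add -> relu pairs into one add_relu node (operator fusion)."""
    # Count consumers of each value so we only fuse an add that feeds one relu.
    uses = {}
    for node in graph:
        for inp in node.inputs:
            uses[inp] = uses.get(inp, 0) + 1
    producers = {n.name: n for n in graph}

    dropped, replacements = set(), {}
    for node in graph:
        if node.op != "relu":
            continue
        src = producers.get(node.inputs[0])
        if src is not None and src.op == "add" and uses[src.name] == 1:
            dropped.add(src.name)
            replacements[node.name] = Node(node.name, "add_relu", src.inputs)
    return [replacements.get(n.name, n) for n in graph if n.name not in dropped]


# "Lowering" here is only a lookup from operator name to a Python callable.
# A real backend would select or generate device-specific kernels instead.
REFERENCE_KERNELS = {
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "relu": lambda a: [max(x, 0.0) for x in a],
    "add_relu": lambda a, b: [max(x + y, 0.0) for x, y in zip(a, b)],
}


def run(graph, feeds, kernels=REFERENCE_KERNELS):
    """Execute nodes in order; the last node is treated as the graph output."""
    env = dict(feeds)
    for node in graph:
        env[node.name] = kernels[node.op](*(env[i] for i in node.inputs))
    return env[graph[-1].name]


graph = [
    Node("t0", "mul", ("x", "w")),
    Node("t1", "add", ("t0", "b")),
    Node("y", "relu", ("t1",)),
]
feeds = {"x": [1.0, -2.0], "w": [3.0, 4.0], "b": [0.5, 0.5]}

fused = fuse_add_relu(graph)
print([n.op for n in fused])  # ['mul', 'add_relu']
print(run(fused, feeds))      # [3.5, 0.0]
```

The fused graph has one node fewer and produces the same result as the original. That is the basic contract of an optimization pass at this layer: change the form of the computation without changing its meaning.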

Representative projects¶

Project	Why it might fit	Adjacent layers
ONNX Runtime	Runtime and tooling around the ONNX model representation and execution providers.	L3 graph, L7 inference
XLA (OpenXLA)	Compiler technology for optimizing ML workloads across backends.	L2 runtime, L3 compilation
Apache TVM	Compiler stack for deep learning models across hardware targets.	L1 hardware, L3 compilation
MLIR	Compiler infrastructure useful for representing and lowering domain-specific computations.	L3 graph, L4 numeric
TensorRT-LLM	Optimization and runtime path for LLM inference on NVIDIA platforms.	L3 compilation, L7 serving
OpenVINO	Toolkit for optimizing and deploying AI inference across Intel hardware.	L3 optimization, L7 inference

Boundary questions¶

Does an inference framework belong in L3 when its main value is compilation, and in L7 when its main value is serving?
Should model exchange formats be separated from compiler execution backends?
How should the layer handle dynamic agent graphs that are not pure tensor graphs?
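
To make the second question concrete, here is a minimal sketch of what separating an exchange format from execution backends could look like. Everything in it is an assumption made for illustration: the JSON layout, the function names `export_graph`, `interpret`, and `compile_to_python`, and the three scalar operators do not correspond to ONNX or to any other real format or API.

```python
import json
import operator

# Two independent "backends" that agree only on the operator names.
INTERP_OPS = {"add": operator.add, "mul": operator.mul, "neg": operator.neg}
CODEGEN_OPS = {"add": "{0} + {1}", "mul": "{0} * {1}", "neg": "-{0}"}


def export_graph(inputs, nodes, output):
    """Exchange format: a backend-neutral JSON description of the graph."""
    return json.dumps({"inputs": inputs, "nodes": nodes, "output": output})


def interpret(serialized, feeds):
    """Backend 1: walk the graph node by node at call time."""
    g = json.loads(serialized)
    env = dict(feeds)
    for n in g["nodes"]:
        env[n["name"]] = INTERP_OPS[n["op"]](*(env[i] for i in n["inputs"]))
    return env[g["output"]]


def compile_to_python(serialized):
    """Backend 2: generate Python source once, then return a callable."""
    g = json.loads(serialized)
    lines = [f"def compiled({', '.join(g['inputs'])}):"]
    for n in g["nodes"]:
        expr = CODEGEN_OPS[n["op"]].format(*n["inputs"])
        lines.append(f"    {n['name']} = {expr}")
    lines.append(f"    return {g['output']}")
    namespace = {}
    # exec is acceptable for a sketch; never do this with untrusted graphs.
    exec("\n".join(lines), namespace)
    return namespace["compiled"]


serialized = export_graph(
    inputs=["x", "y"],
    nodes=[
        {"name": "t0", "op": "mul", "inputs": ["x", "y"]},
        {"name": "t1", "op": "neg", "inputs": ["t0"]},
        {"name": "out", "op": "add", "inputs": ["t1", "x"]},
    ],
    output="out",
)

print(interpret(serialized, {"x": 2, "y": 3}))  # -4
print(compile_to_python(serialized)(2, 3))      # -4
```

Both backends consume the same serialized graph and neither knows about the other, which is the argument for treating the format and the backends as separate entries. The counterargument is that real formats and runtimes often ship and evolve together.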

Signals to watch¶

More attention to portable intermediate representations for AI workloads.
Compiler support for speculative decoding, mixture-of-experts routing, and sparse computation.
Cross-vendor pressure for optimization paths that do not lock models to one accelerator family.
