L5 - Tokenization & Encoders

L5 covers the conversion of raw inputs into model-facing representations: tokenization, image encoders, audio encoders, patchification, embeddings, and modality-specific preprocessing. It is where the world becomes model input.
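Patchification is the vision counterpart of tokenization. The sketch below shows the ViT-style idea in plain Python. It assumes a single-channel (grayscale) image stored as nested lists and a square patch size that divides both image dimensions; `patchify` is a hypothetical helper, not part of any library. A real image encoder would then project each flattened patch through a learned linear layer to get one embedding per patch.

```python
from typing import List

Image = List[List[float]]  # H rows of W grayscale pixel values (assumed layout)


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles and
    flatten each tile row-major into one vector (ViT-style patchification)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch size")
    patches: List[List[float]] = []
    for top in range(0, h, patch):        # walk tiles top-to-bottom
        for left in range(0, w, patch):   # then left-to-right
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


if __name__ == "__main__":
    # 4 x 4 image whose pixel value encodes its position: row * 4 + col
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    out = patchify(img, 2)
    print(len(out))  # 4 patches of 2 x 2 = 4 values each
    print(out[0])    # top-left tile: [0.0, 1.0, 4.0, 5.0]
```

The patch size fixes the sequence length the model sees: a 4 x 4 image with 2 x 2 patches becomes a 4-token sequence, so smaller patches trade more compute for finer detail.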

Diagram: L5 Tokenization & Encoders (text tokens, vision patches, audio frames, embedding spaces)
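Audio frames play the role for speech that patches play for images. The sketch below slices a mono waveform into overlapping fixed-length frames. The defaults (16 kHz audio, 25 ms window, 10 ms hop) are assumptions chosen to resemble common speech front ends such as the log-Mel features used by Whisper; `frame_signal` is a hypothetical helper. A real front end would go on to apply a window function, an FFT, and a Mel filterbank to each frame.

```python
from typing import List


def frame_signal(samples: List[float], sample_rate: int = 16_000,
                 window_ms: float = 25.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Slice a mono waveform into overlapping fixed-length frames.
    Trailing samples that do not fill a whole window are dropped."""
    win = int(sample_rate * window_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)     # 160 samples at 16 kHz
    if len(samples) < win:
        return []
    n_frames = 1 + (len(samples) - win) // hop
    return [samples[i * hop: i * hop + win] for i in range(n_frames)]


if __name__ == "__main__":
    one_second = [0.0] * 16_000
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Because frames overlap (hop smaller than window), one second of audio becomes roughly 100 frames, which is why audio inputs consume sequence length much faster than their duration might suggest.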

What belongs here

L5 sits below prompt construction. It does not decide which context is selected, but it does define how the selected context is represented and how much of it fits within a model's context window.
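To make "how much of it can fit" concrete, the sketch below counts the same sentence under two deliberately simple tokenizers and truncates it to a fixed token budget. Both tokenizers and the truncate-from-the-end policy in `fit_to_window` are hypothetical illustrations; production tokenizers use subword schemes that fall between these two extremes.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def char_tokenize(text: str) -> List[str]:
    # One token per character, spaces included.
    return list(text)


def word_tokenize(text: str) -> List[str]:
    # One token per whitespace-separated word.
    return text.split()


def fit_to_window(text: str, tokenize: Tokenizer, window: int) -> List[str]:
    """Keep only the tokens of `text` that fit in a `window`-token budget,
    truncating from the end (a simple, hypothetical policy)."""
    return tokenize(text)[:window]


if __name__ == "__main__":
    text = "the cat sat on the mat"
    print(len(char_tokenize(text)), len(word_tokenize(text)))  # 22 6
    print(fit_to_window(text, word_tokenize, 8))               # all 6 words fit
    print("".join(fit_to_window(text, char_tokenize, 8)))      # only "the cat "
```

With the same 8-token window, the word-level tokenizer keeps the whole sentence while the character-level one keeps only its first two words: the representation chosen at L5 directly sets the capacity that higher layers budget against.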

Representative projects

Project	Why it might fit	Adjacent layers
Hugging Face Tokenizers	Fast tokenization library used across transformer workflows.	L5 tokenization, L6 model compatibility
OpenAI tiktoken	Tokenizer library used for OpenAI model token accounting and encoding.	L5 tokenization, L8 prompting
SentencePiece	Unsupervised text tokenizer and detokenizer commonly used in NLP models.	L5 tokenization, L6 models
CLIP	Vision-text representation model that illustrates multimodal encoding boundaries.	L5 encoders, L6 architecture
Whisper	Speech recognition model with audio preprocessing and encoding concerns.	L5 audio, L16 applications
SigLIP	Vision-language encoder family useful for multimodal retrieval and classification.	L5 encoders, L9 retrieval
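The text tokenizers above are built around subword algorithms; byte-pair encoding (BPE) is one of them (tiktoken is BPE-based, while Hugging Face Tokenizers and SentencePiece support BPE alongside other models such as unigram). The toy sketch below learns BPE merge rules from a tiny word list and applies them to a new word. It is character-level with no byte fallback, special tokens, or pre-tokenization, and none of its function names correspond to any of these libraries' APIs.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def merge_pair(tokens: List[str], pair: Pair) -> List[str]:
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(words: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules. Each word starts as characters;
    ties in pair frequency go to the pair encountered first."""
    corpus = [list(w) for w in words]
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for toks in corpus:
            pairs.update(zip(toks, toks[1:]))  # count adjacent pairs
        if not pairs:
            break  # every word is already a single token
        best, _freq = pairs.most_common(1)[0]
        merges.append(best)
        corpus = [merge_pair(toks, best) for toks in corpus]
    return merges


def encode_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Tokenize a new word by replaying the learned merges in training order."""
    toks = list(word)
    for pair in merges:
        toks = merge_pair(toks, pair)
    return toks


if __name__ == "__main__":
    merges = train_bpe(["low", "low", "lower", "lowest"], 2)
    print(merges)                       # [('l', 'o'), ('lo', 'w')]
    print(encode_bpe("lowly", merges))  # ['low', 'l', 'y']
```

Frequent character sequences become single tokens, and unseen words still decompose into known pieces, which is why the vocabulary a tokenizer was trained on shapes both token counts and downstream behavior.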

Boundary questions

Does an embedding model belong here, in L6 model architecture, or in L9 retrieval when it is used for search?
Should token counting be modeled as L5 mechanics or L8 context budgeting?
How should AILIS represent multimodal systems where each modality has a different encoder stack?

Signals to watch

Longer-context models making tokenization less visible but still economically important, since API usage is typically metered and billed per token.
Multimodal encoders becoming more composable across products.
Tokenizer mismatch causing retrieval, evaluation, or governance failures.
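Tokenizer mismatch often fails silently: token IDs produced under one vocabulary still decode under another, just into the wrong text. The sketch below uses two hypothetical toy vocabularies to show this, plus a hypothetical `vocab_fingerprint` helper illustrating one possible mitigation: storing a hash of the vocabulary alongside any persisted token IDs so that a mismatch can be detected before reuse.

```python
import hashlib
import json
from typing import Dict, List

# Two hypothetical vocabularies covering the same strings with different IDs.
VOCAB_A: Dict[str, int] = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
VOCAB_B: Dict[str, int] = {"<unk>": 0, "sat": 1, "the": 2, "cat": 3}


def encode_ids(text: str, vocab: Dict[str, int]) -> List[int]:
    # Whitespace split; unknown words map to the vocabulary's <unk> ID.
    unk = vocab["<unk>"]
    return [vocab.get(w, unk) for w in text.split()]


def decode_ids(ids: List[int], vocab: Dict[str, int]) -> str:
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


def vocab_fingerprint(vocab: Dict[str, int]) -> str:
    """Short, order-independent hash of a vocabulary's token-to-ID mapping."""
    payload = json.dumps(sorted(vocab.items()), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    ids = encode_ids("the cat sat", VOCAB_A)
    print(ids)                          # [0, 1, 2]
    print(decode_ids(ids, VOCAB_A))     # the cat sat
    print(decode_ids(ids, VOCAB_B))     # <unk> sat the  (no error raised)
    print(vocab_fingerprint(VOCAB_A) == vocab_fingerprint(VOCAB_B))  # False
```

The wrong decode raises no exception, which is what makes mismatches dangerous in retrieval indexes, cached evaluations, or audit logs; an explicit fingerprint check turns the silent failure into a detectable one.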
