Model Registry

The model registry (v1/models/*.yaml) maps model identifiers to provider configurations, recording capabilities, context windows, and pricing for each model.

Models are organized by family (GPT, Claude, Gemini, etc.):

v1/models/
├── gpt.yaml       # OpenAI GPT models
├── claude.yaml    # Anthropic Claude models
├── gemini.yaml    # Google Gemini models
├── deepseek.yaml  # DeepSeek models
├── qwen.yaml      # Alibaba Qwen models
├── mistral.yaml   # Mistral models
├── llama.yaml     # Meta Llama models
└── ...            # 28+ model files
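
As a sketch of how a runtime might assemble the registry from these files (assuming PyYAML is available; load_registry is an illustrative name, not part of the spec):

from pathlib import Path
import yaml  # PyYAML

def load_registry(root: str = "v1/models") -> dict:
    """Merge every family file into one model-name -> entry mapping."""
    registry = {}
    for path in sorted(Path(root).glob("*.yaml")):
        data = yaml.safe_load(path.read_text())
        # Each family file nests its entries under a top-level "models:" key.
        registry.update(data.get("models", {}))
    return registry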

Each model entry includes:

models:
  gpt-4o:
    provider: openai
    model_id: "gpt-4o"
    context_window: 128000
    max_output_tokens: 16384
    capabilities:
      - chat
      - streaming
      - tools
      - vision
      - json_mode
    pricing:
      input_per_token: 0.0000025
      output_per_token: 0.00001
    release_date: "2024-05-13"
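
Once loaded, an entry is a plain mapping whose keys mirror the YAML above (continuing the hypothetical load_registry sketch):

entry = load_registry()["gpt-4o"]
assert entry["context_window"] == 128000
assert "tools" in entry["capabilities"]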

Runtimes use a provider/model format to identify models:

anthropic/claude-3-5-sonnet
openai/gpt-4o
deepseek/deepseek-chat
gemini/gemini-2.0-flash
qwen/qwen-plus

The runtime splits this identifier into two parts:

  1. Provider ID (anthropic) → loads provider manifest
  2. Model name (claude-3-5-sonnet) → looks up in model registry
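
A minimal sketch of that split (parse_model_ref is an illustrative helper, not a documented API):

def parse_model_ref(ref: str) -> tuple[str, str]:
    """Split 'provider/model' on the first slash."""
    provider, _, model = ref.partition("/")
    if not model:
        raise ValueError(f"expected provider/model, got {ref!r}")
    return provider, model

provider, model = parse_model_ref("anthropic/claude-3-5-sonnet")
# provider == "anthropic", model == "claude-3-5-sonnet"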

Standard capability flags:

Capability   Description
chat         Basic chat completions
streaming    Streaming responses
tools        Function/tool calling
vision       Image understanding
audio        Audio input/output
reasoning    Extended thinking (CoT)
agentic      Multi-step agent workflows
json_mode    Structured JSON output
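
These flags lend themselves to simple capability queries. The helper below is a sketch against the registry structure shown earlier, not a documented API:

def models_with(registry: dict, *caps: str) -> list[str]:
    """Return names of models whose capability list includes every flag."""
    return [name for name, entry in registry.items()
            if set(caps) <= set(entry.get("capabilities", []))]

# e.g. models_with(registry, "tools", "vision")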

Per-token pricing enables cost estimation in runtimes:

pricing:
  input_per_token: 0.000003           # $3 per 1M input tokens
  output_per_token: 0.000015          # $15 per 1M output tokens
  cached_input_per_token: 0.0000003   # Cached prompt discount

Both the Rust and Python runtimes use this data for CostEstimate calculations.
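
The arithmetic is a weighted sum of token counts. The sketch below mirrors that calculation in Python; estimate_cost is a hypothetical stand-in for the runtimes' actual CostEstimate logic:

def estimate_cost(entry: dict, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """USD cost of one request, given the entry's pricing block."""
    p = entry["pricing"]
    # Fall back to the full input rate if no cached discount is listed.
    cached_rate = p.get("cached_input_per_token", p["input_per_token"])
    return ((input_tokens - cached_tokens) * p["input_per_token"]
            + cached_tokens * cached_rate
            + output_tokens * p["output_per_token"])

# 10,000 input + 2,000 output tokens at $3/$15 per 1M tokens ≈ $0.06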

Models can include verification status for production deployments:

verification:
  status: "verified"
  last_checked: "2025-01-15"
  verified_capabilities:
    - chat
    - streaming
    - tools
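
A production runtime might gate model selection on this block; is_verified below is an illustrative helper only, checking for the "verified" status shown above:

def is_verified(entry: dict, capability: str) -> bool:
    """True if the model is verified and the capability was checked."""
    v = entry.get("verification", {})
    return (v.get("status") == "verified"
            and capability in v.get("verified_capabilities", []))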