Model routing

Sadie’s LLM calls never hardcode a vendor model. Every call declares a task class; the router resolves that class to a tier; the tier resolves to a provider + model at runtime. Swap vendors by changing env vars; no application code moves.

Implementation: packages/ai/src/model-router.ts and packages/ai/src/index.ts.

| Tier | Intent | Canonical Anthropic | Canonical OpenAI |
| --- | --- | --- | --- |
| tier0 | Deterministic code, no model needed | | |
| tier1 | Frontier-small, high-volume maintenance | claude-haiku-4-5-20251001 | gpt-4o-mini |
| tier2 | Workhorse synthesis, interactive chat | claude-sonnet-4-6 | gpt-4o |
| tier3 | Deliberate reasoning, expensive | claude-opus-4-7 | o1 |
| embeddings | Vector embeddings | | text-embedding-3-small |

These are the defaults in DEFAULT_MODELS when a raw ANTHROPIC_API_KEY or OPENAI_API_KEY is set without tier-specific overrides. Anthropic is preferred when both keys are present.
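The default table and the "prefer Anthropic" rule can be sketched roughly as follows. This is an illustrative shape only, assuming a simple vendor-keyed record; `pickDefaultVendor` and the exact `DEFAULT_MODELS` layout are assumptions, not the real internals of packages/ai/src/model-router.ts.

```typescript
// Illustrative sketch: default model per vendor per tier.
type Tier = "tier1" | "tier2" | "tier3" | "embeddings";
type Vendor = "anthropic" | "openai";

const DEFAULT_MODELS: Record<Vendor, Partial<Record<Tier, string>>> = {
  anthropic: {
    tier1: "claude-haiku-4-5-20251001",
    tier2: "claude-sonnet-4-6",
    tier3: "claude-opus-4-7",
  },
  openai: {
    tier1: "gpt-4o-mini",
    tier2: "gpt-4o",
    tier3: "o1",
    embeddings: "text-embedding-3-small",
  },
};

// Pick a vendor from whichever raw keys are present, preferring Anthropic.
function pickDefaultVendor(
  env: Record<string, string | undefined>,
): Vendor | null {
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OPENAI_API_KEY) return "openai";
  return null;
}
```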

Every LLM call in the codebase passes a TaskClass to getProviderForTask(taskClass, userPrefs). The task class determines the tier.

| Task class | Tier | Where it fires |
| --- | --- | --- |
| source_extraction | tier1 | Wiki compile, per-source theme / entity extraction. |
| candidate_page_matching | tier1 | Matching incoming material against existing wiki entries. |
| wiki_patch_draft | tier1 | Drafting a candidate patch. |
| contradiction_detection | tier1 | Lint: does this source disagree with an existing claim? |
| today_card_features | tier1 | Feature extraction for Today ranking. |
| preference_normalization | tier1 | Clustering raw preference signals into canonical kinds. |
| wiki_lint | tier1 | Wiki lint pass findings. |
| wiki_page_create | tier2 | First-time wiki page synthesis. |
| today_card_copy | tier2 | Final copy on a Today card. |
| brief_generation | tier2 | Brief prose. |
| grounded_chat | tier2 | Normal interactive chat. |
| studio_rewrite | tier2 | Studio rewrites. |
| contradiction_resolution | tier3 | Resolving a detected contradiction into a unified synthesis. |
| deep_synthesis_chat | tier3 | Opt-in slow chat mode. |

The full map lives in TASK_TIER_MAP in packages/ai/src/model-router.ts. Task classes are a closed set; adding one requires adding both the type and an entry in the map.
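A closed task-class set can be enforced at compile time. The sketch below is a hedged illustration of how a map like TASK_TIER_MAP could be typed (the real file may differ); only a few task classes are shown.

```typescript
// Illustrative: a closed union of task classes and a map that must
// cover every member. The `satisfies` check makes "add the type but
// forget the map entry" a compile error.
type TaskClass =
  | "source_extraction"
  | "wiki_page_create"
  | "grounded_chat"
  | "deep_synthesis_chat"; // remaining classes elided for brevity

type Tier = "tier1" | "tier2" | "tier3";

const TASK_TIER_MAP = {
  source_extraction: "tier1",
  wiki_page_create: "tier2",
  grounded_chat: "tier2",
  deep_synthesis_chat: "tier3",
} satisfies Record<TaskClass, Tier>;
```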

getProviderForTask walks five priorities:

  1. User-supplied API key for their preferred provider (from sadieSettings.payload.userApiKeys, decrypted server-side).
  2. User’s preferred provider with an env-var key (ANTHROPIC_API_KEY or OPENAI_API_KEY).
  3. AI_FRONTIER_* tier env vars. Each tier has its own _PROVIDER + _MODEL pair.
  4. Raw vendor key. Synthesize a provider from DEFAULT_MODELS for the resolved tier.
  5. Local stub. sadie-local deterministic provider. Development only (NODE_ENV !== "production"), and suppressible via SADIE_ALLOW_LOCAL_AI_STUB=0.

If none of the above resolve, getLlmProvider throws MissingLlmProviderError. The chat route catches this and surfaces the error to the user as a config-hint message rather than silently falling back.
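The five-step walk can be sketched as a single resolution function. Everything here except the names getProviderForTask mirrors and MissingLlmProviderError is an assumption made for illustration; the real resolver also handles per-tier model selection and the cascade described below.

```typescript
// Minimal sketch of the five-priority resolution order; not the real
// implementation. `Resolved.source` labels which priority matched.
class MissingLlmProviderError extends Error {}

interface Resolved { source: string; provider: string; model?: string }

function resolveProvider(opts: {
  userKey?: string;                       // decrypted user API key, if any
  userPreferred?: "anthropic" | "openai"; // from user prefs
  env: Record<string, string | undefined>;
  tierEnvPrefix: string;                  // e.g. "AI_FRONTIER_WORKHORSE"
}): Resolved {
  const { userKey, userPreferred, env, tierEnvPrefix } = opts;
  // 1. User-supplied key for their preferred provider.
  if (userKey && userPreferred) return { source: "user-key", provider: userPreferred };
  // 2. Preferred provider backed by an env-var key.
  if (userPreferred === "anthropic" && env.ANTHROPIC_API_KEY)
    return { source: "env-key", provider: "anthropic" };
  if (userPreferred === "openai" && env.OPENAI_API_KEY)
    return { source: "env-key", provider: "openai" };
  // 3. Tier-specific _PROVIDER + _MODEL override pair.
  const p = env[`${tierEnvPrefix}_PROVIDER`];
  const m = env[`${tierEnvPrefix}_MODEL`];
  if (p && m) return { source: "tier-env", provider: p, model: m };
  // 4. Raw vendor key, Anthropic preferred, models from DEFAULT_MODELS.
  if (env.ANTHROPIC_API_KEY) return { source: "default", provider: "anthropic" };
  if (env.OPENAI_API_KEY) return { source: "default", provider: "openai" };
  // 5. Local deterministic stub, development only and suppressible.
  if (env.NODE_ENV !== "production" && env.SADIE_ALLOW_LOCAL_AI_STUB !== "0")
    return { source: "stub", provider: "sadie-local" };
  throw new MissingLlmProviderError("no LLM provider configured");
}
```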

The tier env vars are all optional; setting a pair pins a specific vendor + model for that tier.

```sh
AI_FRONTIER_SMALL_PROVIDER=anthropic
AI_FRONTIER_SMALL_MODEL=claude-haiku-4-5-20251001
AI_FRONTIER_WORKHORSE_PROVIDER=anthropic
AI_FRONTIER_WORKHORSE_MODEL=claude-sonnet-4-6
AI_FRONTIER_REASONING_PROVIDER=anthropic
AI_FRONTIER_REASONING_MODEL=claude-opus-4-7
AI_FRONTIER_EMBEDDINGS_PROVIDER=openai
AI_FRONTIER_EMBEDDINGS_MODEL=text-embedding-3-small
```

When a tier env var is set but the matching vendor key is not, the router cascades to the next cheaper tier rather than failing. That keeps things running when a tier is partially misconfigured.
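The cheaper-tier cascade can be sketched as a walk down an ordered tier list, assuming a predicate that reports whether a tier is fully usable. TIER_ORDER and cascadeTier are illustrative names, not the router's real internals.

```typescript
// Illustrative: tiers ordered most to least expensive; cascade from the
// requested tier down to the first one that is actually usable.
const TIER_ORDER = ["tier3", "tier2", "tier1"] as const;
type CascadeTier = (typeof TIER_ORDER)[number];

function cascadeTier(
  requested: CascadeTier,
  usable: (t: CascadeTier) => boolean, // e.g. "vendor key present for this tier's config"
): CascadeTier | null {
  const start = TIER_ORDER.indexOf(requested);
  for (let i = start; i < TIER_ORDER.length; i++) {
    if (usable(TIER_ORDER[i])) return TIER_ORDER[i];
  }
  return null; // nothing usable at or below the requested tier
}
```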

The product surface changes more slowly than the frontier. Sadie’s cost profile is dominated by tier1 (high-volume, low-stakes) and tier2 (interactive). Tier3 is rare and deliberate. Locking a specific model at the call site would mean every vendor swap touches dozens of files; tiers mean one env var update.

The router also enables per-task-class escalation in the future. A low-confidence tier2 output could automatically retry at tier3, though that path is not wired yet.

Every routed call passes through withUsageLogging / logUsage in packages/ai/src/usage-logger.ts. That produces a UsageRecord with tier, provider, model, latency, token counts, and origin. The current implementation just console.logs, but the type is stable for persisting into a table when needed.
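A plausible shape for that record, assuming field names beyond those listed in the text (which are illustrative, not the real type):

```typescript
// Hypothetical sketch of UsageRecord; only tier, provider, model,
// latency, token counts, and origin are confirmed by the docs above.
interface UsageRecord {
  taskClass: string;
  tier: string;
  provider: string;
  model: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  origin: string; // e.g. a route path like "/api/chat"
}

// Current sink is the console; keeping the type stable makes a later
// swap to a database insert a one-function change.
function logUsage(record: UsageRecord): void {
  console.log("[llm-usage]", JSON.stringify(record));
}
```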

routeAndComplete is a one-call convenience that resolves the tier, creates the provider, calls completeOnce, and logs usage:

```ts
const { text, resolved } = await routeAndComplete(
  { taskClass: "source_extraction", userId, origin: "/api/compile/wiki" },
  { system: "...", messages: [{ role: "user", content: "..." }] },
);
```

Use it for background compile work. For streaming chat, call getProviderForTask("grounded_chat", userPrefs).streamChat(...) directly so you can pipe tokens.
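The streaming path can be sketched as a token-piping loop, assuming streamChat yields an async iterable of text chunks (the real signature may differ); fakeStream below is a stand-in for the provider, not part of the Sadie API.

```typescript
// Stand-in for a provider's token stream; illustrative only.
async function* fakeStream(): AsyncIterable<string> {
  yield "Hello, ";
  yield "world";
}

// Pipe each token chunk to a sink (e.g. an HTTP response) as it arrives,
// instead of waiting for the full completion.
async function pipeTokens(
  stream: AsyncIterable<string>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const chunk of stream) write(chunk);
}
```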