Model routing

Sadie’s LLM calls never hardcode a vendor model. Every call declares a task class; the router resolves that class to a tier; the tier resolves to a provider + model at runtime. Swap vendors by changing env vars; no application code moves.

Implementation: packages/ai/src/model-router.ts and packages/ai/src/index.ts.

| Tier | Intent | Canonical Anthropic | Canonical OpenAI |
| --- | --- | --- | --- |
| tier0 | Deterministic code, no model needed | | |
| tier1 | Frontier-small, high-volume maintenance | claude-haiku-4-5-20251001 | gpt-4o-mini |
| tier2 | Workhorse synthesis, interactive chat | claude-sonnet-4-6 | gpt-4o |
| tier3 | Deliberate reasoning, expensive | claude-opus-4-7 | o1 |
| embeddings | Vector embeddings | | text-embedding-3-small |

These are the defaults in DEFAULT_MODELS when a raw ANTHROPIC_API_KEY or OPENAI_API_KEY is set without tier-specific overrides. Anthropic is preferred when both keys are present.
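The default table and the "prefer Anthropic" rule can be sketched roughly as follows. This is an illustrative shape only, assuming a simple vendor-keyed record; `pickDefaultVendor` and the exact `DEFAULT_MODELS` layout are assumptions, not the real internals of packages/ai/src/model-router.ts.

```typescript
// Illustrative sketch: default model per vendor per tier.
type Tier = "tier1" | "tier2" | "tier3" | "embeddings";
type Vendor = "anthropic" | "openai";

const DEFAULT_MODELS: Record<Vendor, Partial<Record<Tier, string>>> = {
  anthropic: {
    tier1: "claude-haiku-4-5-20251001",
    tier2: "claude-sonnet-4-6",
    tier3: "claude-opus-4-7",
  },
  openai: {
    tier1: "gpt-4o-mini",
    tier2: "gpt-4o",
    tier3: "o1",
    embeddings: "text-embedding-3-small",
  },
};

// Pick a vendor from whichever raw keys are present, preferring Anthropic.
function pickDefaultVendor(
  env: Record<string, string | undefined>,
): Vendor | null {
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OPENAI_API_KEY) return "openai";
  return null;
}
```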

Every LLM call in the codebase passes a TaskClass to getProviderForTask(taskClass, userPrefs). The task class determines the tier.

| Task class | Tier | Where it fires |
| --- | --- | --- |
| source_extraction | tier1 | Wiki compile, per-source theme / entity extraction. |
| candidate_page_matching | tier1 | Matching incoming material against existing wiki entries. |
| wiki_patch_draft | tier1 | Drafting a candidate patch. |
| contradiction_detection | tier1 | Lint: does this source disagree with an existing claim? |
| today_card_features | tier1 | Feature extraction for Today ranking. |
| preference_normalization | tier1 | Clustering raw preference signals into canonical kinds. |
| wiki_lint | tier1 | Wiki lint pass findings. |
| wiki_page_create | tier2 | First-time wiki page synthesis. |
| today_card_copy | tier2 | Final copy on a Today card. |
| brief_generation | tier2 | Brief prose. |
| grounded_chat | tier2 | Normal interactive chat. |
| studio_rewrite | tier2 | Studio rewrites. |
| contradiction_resolution | tier3 | Resolving a detected contradiction into a unified synthesis. |
| deep_synthesis_chat | tier3 | Opt-in slow chat mode. |

The full map lives in TASK_TIER_MAP in packages/ai/src/model-router.ts. Task classes are a closed set; adding one requires adding both the type and an entry in the map.
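A closed task-class set can be enforced at compile time. The sketch below is a hedged illustration of how a map like TASK_TIER_MAP could be typed (the real file may differ); only a few task classes are shown.

```typescript
// Illustrative: a closed union of task classes and a map that must
// cover every member. The `satisfies` check makes "add the type but
// forget the map entry" a compile error.
type TaskClass =
  | "source_extraction"
  | "wiki_page_create"
  | "grounded_chat"
  | "deep_synthesis_chat"; // remaining classes elided for brevity

type Tier = "tier1" | "tier2" | "tier3";

const TASK_TIER_MAP = {
  source_extraction: "tier1",
  wiki_page_create: "tier2",
  grounded_chat: "tier2",
  deep_synthesis_chat: "tier3",
} satisfies Record<TaskClass, Tier>;
```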

getProviderForTask walks five priorities:

  1. User-supplied API key for their preferred provider (from sadieSettings.payload.userApiKeys, decrypted server-side).
  2. User’s preferred provider with an env-var key (ANTHROPIC_API_KEY or OPENAI_API_KEY).
  3. AI_FRONTIER_* tier env vars. Each tier has its own _PROVIDER + _MODEL pair.
  4. Raw vendor key. Synthesize a provider from DEFAULT_MODELS for the resolved tier.
  5. Local stub. sadie-local deterministic provider. Development only (NODE_ENV !== "production"), and suppressible via SADIE_ALLOW_LOCAL_AI_STUB=0.

If none of the above resolve, getLlmProvider throws MissingLlmProviderError. The chat route catches this and surfaces the error to the user as a config-hint message rather than silently falling back.
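The five-step walk can be sketched as a single resolution function. Everything here except the names getProviderForTask mirrors and MissingLlmProviderError is an assumption made for illustration; the real resolver also handles per-tier model selection and the cascade described below.

```typescript
// Minimal sketch of the five-priority resolution order; not the real
// implementation. `Resolved.source` labels which priority matched.
class MissingLlmProviderError extends Error {}

interface Resolved { source: string; provider: string; model?: string }

function resolveProvider(opts: {
  userKey?: string;                       // decrypted user API key, if any
  userPreferred?: "anthropic" | "openai"; // from user prefs
  env: Record<string, string | undefined>;
  tierEnvPrefix: string;                  // e.g. "AI_FRONTIER_WORKHORSE"
}): Resolved {
  const { userKey, userPreferred, env, tierEnvPrefix } = opts;
  // 1. User-supplied key for their preferred provider.
  if (userKey && userPreferred) return { source: "user-key", provider: userPreferred };
  // 2. Preferred provider backed by an env-var key.
  if (userPreferred === "anthropic" && env.ANTHROPIC_API_KEY)
    return { source: "env-key", provider: "anthropic" };
  if (userPreferred === "openai" && env.OPENAI_API_KEY)
    return { source: "env-key", provider: "openai" };
  // 3. Tier-specific _PROVIDER + _MODEL override pair.
  const p = env[`${tierEnvPrefix}_PROVIDER`];
  const m = env[`${tierEnvPrefix}_MODEL`];
  if (p && m) return { source: "tier-env", provider: p, model: m };
  // 4. Raw vendor key, Anthropic preferred, models from DEFAULT_MODELS.
  if (env.ANTHROPIC_API_KEY) return { source: "default", provider: "anthropic" };
  if (env.OPENAI_API_KEY) return { source: "default", provider: "openai" };
  // 5. Local deterministic stub, development only and suppressible.
  if (env.NODE_ENV !== "production" && env.SADIE_ALLOW_LOCAL_AI_STUB !== "0")
    return { source: "stub", provider: "sadie-local" };
  throw new MissingLlmProviderError("no LLM provider configured");
}
```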

The tier env vars are all optional; setting a pair pins a specific vendor + model for that tier.

```sh
AI_FRONTIER_SMALL_PROVIDER=anthropic
AI_FRONTIER_SMALL_MODEL=claude-haiku-4-5-20251001
AI_FRONTIER_WORKHORSE_PROVIDER=anthropic
AI_FRONTIER_WORKHORSE_MODEL=claude-sonnet-4-6
AI_FRONTIER_REASONING_PROVIDER=anthropic
AI_FRONTIER_REASONING_MODEL=claude-opus-4-7
AI_FRONTIER_EMBEDDINGS_PROVIDER=openai
AI_FRONTIER_EMBEDDINGS_MODEL=text-embedding-3-small
```

When a tier env var is set but the matching vendor key is not, the router cascades to the next cheaper tier rather than failing. That keeps things running when a tier is partially misconfigured.
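The cheaper-tier cascade can be sketched as a walk down an ordered tier list, assuming a predicate that reports whether a tier is fully usable. TIER_ORDER and cascadeTier are illustrative names, not the router's real internals.

```typescript
// Illustrative: tiers ordered most to least expensive; cascade from the
// requested tier down to the first one that is actually usable.
const TIER_ORDER = ["tier3", "tier2", "tier1"] as const;
type CascadeTier = (typeof TIER_ORDER)[number];

function cascadeTier(
  requested: CascadeTier,
  usable: (t: CascadeTier) => boolean, // e.g. "vendor key present for this tier's config"
): CascadeTier | null {
  const start = TIER_ORDER.indexOf(requested);
  for (let i = start; i < TIER_ORDER.length; i++) {
    if (usable(TIER_ORDER[i])) return TIER_ORDER[i];
  }
  return null; // nothing usable at or below the requested tier
}
```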

The product surface changes more slowly than the frontier. Sadie’s cost profile is dominated by tier1 (high-volume, low-stakes) and tier2 (interactive). Tier3 is rare and deliberate. Locking a specific model at the call site would mean every vendor swap touches dozens of files; tiers mean one env var update.

The router also enables per-task-class escalation in the future. A low-confidence tier2 output could automatically retry at tier3, though that path is not wired yet.

Every routed call passes through withUsageLogging / logUsage in packages/ai/src/usage-logger.ts. That produces a UsageRecord with tier, provider, model, latency, token counts, and origin. The current implementation just console.logs, but the type is stable for persisting into a table when needed.
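A plausible shape for that record, assuming field names beyond those listed in the text (which are illustrative, not the real type):

```typescript
// Hypothetical sketch of UsageRecord; only tier, provider, model,
// latency, token counts, and origin are confirmed by the docs above.
interface UsageRecord {
  taskClass: string;
  tier: string;
  provider: string;
  model: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  origin: string; // e.g. a route path like "/api/chat"
}

// Current sink is the console; keeping the type stable makes a later
// swap to a database insert a one-function change.
function logUsage(record: UsageRecord): void {
  console.log("[llm-usage]", JSON.stringify(record));
}
```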

routeAndComplete is a one-call convenience that resolves the tier, creates the provider, calls completeOnce, and logs usage:

```ts
const { text, resolved } = await routeAndComplete(
  { taskClass: "source_extraction", userId, origin: "/api/compile/wiki" },
  { system: "...", messages: [{ role: "user", content: "..." }] },
);
```

Use it for background compile work. For streaming chat, call getProviderForTask("grounded_chat", userPrefs).streamChat(...) directly so you can pipe tokens.
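The streaming path can be sketched as a token-piping loop, assuming streamChat yields an async iterable of text chunks (the real signature may differ); fakeStream below is a stand-in for the provider, not part of the Sadie API.

```typescript
// Stand-in for a provider's token stream; illustrative only.
async function* fakeStream(): AsyncIterable<string> {
  yield "Hello, ";
  yield "world";
}

// Pipe each token chunk to a sink (e.g. an HTTP response) as it arrives,
// instead of waiting for the full completion.
async function pipeTokens(
  stream: AsyncIterable<string>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const chunk of stream) write(chunk);
}
```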