Why Astrael routes between three Claude models
Most consumer AI products use a single model for every request. We don't. Astrael routes each turn to whichever Claude model best fits the task. This post explains how the routing decision is made and what we measured.
The three models we use
- Claude Haiku 4.5 — fast, cheap, excellent at structured short outputs. Good at: tarot card name picks, thread title generation, daily horoscope phrasing, classification tasks.
- Claude Sonnet 4.6 — the workhorse. Long-form reading interpretations, multi-turn conversation, the bulk of "real" reading output.
- Claude Opus 4.7 — the deepest reasoner we have access to. Reserved for cross-tradition synthesis (combining BaZi + tarot + I Ching into one coherent reading), nuanced relationship synastry, and ambiguous user questions that require disambiguation before answering.
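The three-way split above can be summarized as a small task-to-tier map. This is a hypothetical sketch for illustration only: the tier identifiers and task labels are shorthand, not real API model strings or our actual code.

```typescript
// Illustrative sketch of the three-tier lineup described above.
// Tier names and task labels are placeholders, not production values.
type ModelTier = "haiku" | "sonnet" | "opus";

const TIER_TASKS: Record<ModelTier, string[]> = {
  haiku:  ["card-pick", "thread-title", "horoscope-phrasing", "classification"],
  sonnet: ["long-form-reading", "multi-turn-chat"],
  opus:   ["cross-tradition-synthesis", "synastry", "disambiguation"],
};

// Returns the cheapest tier whose task list contains the given category,
// or null when the category is unknown.
function tierForTask(task: string): ModelTier | null {
  for (const tier of ["haiku", "sonnet", "opus"] as ModelTier[]) {
    if (TIER_TASKS[tier].includes(task)) return tier;
  }
  return null;
}
```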
How the router decides
The router runs server-side before each request hits the LLM. It looks at five signals:
- Task category — derived from the calling route. `/api/chat` (long-form reading) defaults to Sonnet, `/api/tarot/draw` (just pick a card) defaults to Haiku, and `/api/cross-tradition` (deep synthesis) defaults to Opus.
- Input length — short prompts (<200 tokens) are unlikely to need Opus.
- User tier — Premium/Founder/Mythic always get Opus on chat. Free tier always gets Haiku.
- Locale complexity — Chinese long-form benefits more from Opus because Sonnet's Chinese still lags slightly behind English. `zh-CN` plus long-form bumps the bucket up one tier.
- Explicit override — admins can pin a feature to a specific model via the `ai_routing_overrides` table without a redeploy. The table is cached for 30 seconds.
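Put together, the five signals above can be sketched as a pure function. This is a hedged sketch, not the production router: the route strings come from this post, but the types, the 200-token threshold handling, the signal ordering, and all helper names are illustrative assumptions.

```typescript
// Hypothetical sketch of the routing decision. Route strings match the
// post; everything else (types, ordering, helpers) is illustrative.
type Model = "haiku" | "sonnet" | "opus";

interface RouteContext {
  route: string;                                   // e.g. "/api/chat"
  inputTokens: number;                             // estimated prompt length
  tier: "free" | "premium" | "founder" | "mythic"; // user tier
  locale: string;                                  // e.g. "zh-CN"
  override?: Model;                                // from ai_routing_overrides (30 s cache)
}

const LADDER: Model[] = ["haiku", "sonnet", "opus"];

function defaultForRoute(route: string): Model {
  if (route === "/api/tarot/draw") return "haiku";     // just pick a card
  if (route === "/api/cross-tradition") return "opus"; // deep synthesis
  return "sonnet";                                     // long-form reading / chat
}

function pickModel(ctx: RouteContext): Model {
  if (ctx.override) return ctx.override;        // explicit admin pin wins outright
  if (ctx.tier === "free") return "haiku";      // free tier always gets Haiku
  if (ctx.route === "/api/chat") return "opus"; // paying tiers always get Opus on chat

  let model = defaultForRoute(ctx.route);
  // Short prompts are unlikely to need Opus: cap them at Sonnet.
  if (ctx.inputTokens < 200 && model === "opus") model = "sonnet";
  // zh-CN plus long-form bumps the bucket up one tier.
  const longForm = defaultForRoute(ctx.route) !== "haiku";
  if (ctx.locale === "zh-CN" && longForm) {
    model = LADDER[Math.min(LADDER.indexOf(model) + 1, LADDER.length - 1)];
  }
  return model;
}
```

Note the precedence choice here is an assumption: the override and tier checks short-circuit before the length and locale signals are ever consulted.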
What we measured
We A/B'd routing against a "Sonnet only" baseline over four weeks:
- Latency p50: routing 1.2 s vs. baseline 2.4 s (-50%)
- Latency p95: routing 4.1 s vs. baseline 8.3 s (-51%)
- API cost per active user/month: routing $0.41 vs. baseline $3.20 (-87%)
- User satisfaction (👍 vs. 👎 on each reply): routing 91% vs. baseline 92% (-1pp · within noise)
So we get roughly 2× faster responses at both p50 and p95 and nearly 8× cheaper output, with no measurable hit to quality. That gap is what funds the free tier.
When does the router get it wrong?
The router has been wrong in a handful of edge cases:
- Crisis disclosure: a user types "I keep thinking I should disappear." The router was initially classifying that as a short input → Haiku. We hard-coded a crisis classifier ahead of routing — anything matching that pattern bypasses the router and goes straight to Sonnet with the safety overlay.
- Multi-tradition implicit ask: a user types "my birth chart and BaZi" — both English and Chinese systems implied. The router treats English-leaning input as Sonnet, but cross-tradition demands Opus. We added a corpus-keyword pre-classifier for this.
We log every classifier disagreement and grade them weekly with Haiku as the cheap arbiter.
What we publish
The routing logic itself is not open-source yet, but the model versions, the override table contents, and the eval methodology are linked from /transparency. If you want to audit a specific reading's routing, the model used is recorded on the chat_messages row and visible to you in your own data export.
Posted 2026-05-12 · Astrael Engineering. We're an AI company; this is what we do.