Self-critique loop · how the Mythic tier sharpens before it streams
When a user picks "Mythic" depth, we don't just give them a more expensive model. We give them a different pipeline.
The three passes
- Primary model drafts: Opus 4.7 with adaptive thinking + xhigh effort writes a full reading. We buffer this output server-side instead of streaming it to the client. Tool-injected markers (the cards / chart / hexagram visuals) still go through immediately, so the user sees them within a few seconds.
- Haiku 4.5 reviews: the buffered draft goes to Haiku with a focused critique prompt asking it to identify 2–4 specific problems (translation-stiff Chinese, off-topic, over-hedged, generic, repetitive) and quote the offending fragments. Haiku writes notes, not rewrites.
- Primary model revises: the draft plus Haiku's notes go back to Opus 4.7, which rewrites the reading per the notes and streams the revised version to the user (sketched in the code below).
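Concretely, here's a minimal sketch of the loop in TypeScript using the Anthropic SDK. The model IDs, system prompts, and function names are illustrative placeholders, not our production code; the real pipeline also handles tool-marker passthrough and streams the final pass instead of returning it.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Placeholder model IDs; substitute the Opus / Haiku identifiers you deploy.
const PRIMARY_MODEL = "opus-primary-placeholder";
const CRITIC_MODEL = "haiku-critic-placeholder";

async function complete(model: string, system: string, user: string): Promise<string> {
  const res = await client.messages.create({
    model,
    max_tokens: 4096,
    system,
    messages: [{ role: "user", content: user }],
  });
  // Concatenate text blocks; tool-use blocks are ignored in this sketch.
  return res.content.map((b) => (b.type === "text" ? b.text : "")).join("");
}

export async function mythicReading(prompt: string): Promise<string> {
  // Pass 1: the draft is buffered server-side, not streamed to the client.
  const draft = await complete(PRIMARY_MODEL, "Write the full reading.", prompt);

  // Pass 2: the cheap critic writes notes, never a rewrite.
  const notes = await complete(
    CRITIC_MODEL,
    "Identify 2-4 specific problems with this reading (translation-stiff " +
      "Chinese, off-topic, over-hedged, generic, repetitive) and quote the " +
      "offending fragments. Write critique notes only; do not rewrite.",
    draft,
  );

  // Pass 3: the primary model revises per the notes; this is what streams.
  return complete(
    PRIMARY_MODEL,
    "Revise the draft reading so every critique note is addressed.",
    `DRAFT:\n${draft}\n\nCRITIQUE NOTES:\n${notes}`,
  );
}
```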
Why a different model for critique
Same-model self-critique has a known blind spot: the writer model tends to defend its own choices. A cheaper, faster, differently trained model reviewing the same output catches things the writer is structurally prone to miss: verbose paragraphs that read as padding, sentences that hedge into vagueness, clichéd closing remarks, and (the big one in Chinese) syntax that reads like a translation rather than the native register of Chinese metaphysics.
Haiku 4.5 is a good pick for this role: it costs ~$0.005 per critique, returns in under a second, and has different stylistic biases than its larger siblings.
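For scale, here's the back-of-envelope behind that figure, assuming Haiku-class pricing of roughly $1 per million input tokens and $5 per million output tokens, a ~3k-token buffered draft, and ~300 tokens of notes. All four numbers are assumptions for illustration, not measurements.

```typescript
// Rough cost model for one critique call. Prices and token counts are
// assumptions; check your actual rate card and telemetry.
const draftTokensIn = 3_000;   // buffered draft fed to the critic
const noteTokensOut = 300;     // short critique notes, not a rewrite
const usdPerInputTok = 1 / 1_000_000;
const usdPerOutputTok = 5 / 1_000_000;

const critiqueCostUsd =
  draftTokensIn * usdPerInputTok + noteTokensOut * usdPerOutputTok;
console.log(critiqueCostUsd.toFixed(4)); // ≈ 0.0045, about half a cent
```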
What it costs
Roughly 1.7× a single-pass Mythic call. Most of that is the second Opus pass; Haiku critique is rounding error. This is why we gate the loop to Mystic / Founder / Mythic tiers — their plan headroom absorbs the spend, and they're the cohort where premium quality is the product.
What it improves
We don't have published benchmarks yet; we're holding off on numbers until the eval framework ships and we've run a few hundred A/B comparisons. Anecdotally, during internal testing the revised output was meaningfully more specific to the user's actual question and noticeably less hedged. Eval data will replace this paragraph when it lands.
What it does NOT do
The critique pass is not a safety filter. Anthropic's built-in model safety and our own pre-stream gates (runSafetyCheck, the banned-words list, the crisis classifier) all run before the loop ever starts.
It's also not a fact-checker. Computational results (BaZi pillars, hexagram numbers, planetary positions) all come from our deterministic compute pipeline before the LLM is even called. The critique pass cares about voice and depth, not arithmetic.
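To make the ordering concrete, here's a stub-level sketch of a request's lifecycle. Every function body is a stand-in (only runSafetyCheck is a name from our actual code); the point is the sequence, not the internals.

```typescript
type ReadingRequest = { question: string; depth: string };

async function runSafetyCheck(_req: ReadingRequest): Promise<void> {
  // Banned-words list, crisis classifier, and other pre-stream gates.
  // Rejects (throws) before any model is called.
}

function computeChart(_req: ReadingRequest): Record<string, unknown> {
  // Deterministic: BaZi pillars, hexagram numbers, planetary positions.
  return {};
}

async function runModelPipeline(prompt: string): Promise<string> {
  // Single-pass or three-pass critique loop, selected per request.
  return prompt;
}

export async function handleReading(req: ReadingRequest): Promise<string> {
  await runSafetyCheck(req);        // safety gates first, always
  const chart = computeChart(req);  // arithmetic settled before any LLM call
  const prompt = `${req.question}\n${JSON.stringify(chart)}`;
  return runModelPipeline(prompt);  // the critique loop lives in here
}
```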
Where this fits in the stack
- Free / Wanderer / Pro / Premium users: single-pass.
- Mystic / Founder / Mythic users at depth = "mythic": three-pass loop.
- Mystic / Founder / Mythic users at depth = "deep": single-pass with adaptive thinking but no critique loop.
The pipeline switches at runtime per request. The Haiku critique runs only when the user actually opts into mythic depth via the depth picker.
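The switch itself reduces to a small pure function. The tier and depth names below mirror the list above; the fallback branch for other depth values on the top tiers is an assumption.

```typescript
type Tier =
  | "free" | "wanderer" | "pro" | "premium"
  | "mystic" | "founder" | "mythic";

type Pipeline =
  | "single-pass"
  | "single-pass-adaptive-thinking"
  | "three-pass-critique";

const CRITIQUE_TIERS = new Set<Tier>(["mystic", "founder", "mythic"]);

function selectPipeline(tier: Tier, depth: string): Pipeline {
  if (!CRITIQUE_TIERS.has(tier)) return "single-pass"; // Free / Wanderer / Pro / Premium
  if (depth === "mythic") return "three-pass-critique"; // the loop described above
  if (depth === "deep") return "single-pass-adaptive-thinking";
  return "single-pass"; // assumed fallback for any other depth value
}

console.log(selectPipeline("founder", "mythic")); // "three-pass-critique"
console.log(selectPipeline("pro", "mythic"));     // "single-pass" (tier not gated in)
```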
Posted 2026-05-15 · Astrael Engineering. We're an AI company — this is what we do.