
The Claude Advisor Strategy: What Practitioners Learned

Anthropic just named the advisor strategy: cheap executor, smart advisor. Here's how it works, the cross-vendor version, and when not to use it.

TecMinds Team · April 9, 2026 · 12 min read

The Claude Advisor Strategy, Explained by Practitioners Who Already Shipped It

Most teams pricing out an AI agent start with the same question: how big does the model on top need to be? Today, Anthropic published a post that reframes the question entirely.

The Claude advisor strategy puts a cheap model on top, either Sonnet or Haiku, and pulls in Opus only when the cheap one gets stuck. No end-to-end Opus bill. No solo Haiku weakness. The big model shows up only when it's actually needed.

If you've already tried running Opus end-to-end and watched the cost per task climb past your ceiling, this will sound right. If you've tried Haiku solo and watched it fumble the hard decisions, it will sound right too. We've been running a cross-vendor version of the same pattern for SME clients for months, with GPT-5.4-mini as the advisor and a mix of GPT-5.4-nano and Gemini 2.5 Flash Lite workers underneath. Anthropic's post is the first time we've seen someone name and benchmark the pattern cleanly.

This is the practitioner's read. You'll get the plain-English definition, Anthropic's published numbers, a comparison to the patterns it replaces, a decision framework for your own workload, and the failure modes Anthropic's post doesn't mention. If you want to shortcut all of it, you can book a free 30-minute AI Potenzial-Check and bring your agent use case to the call.

What the Claude Advisor Strategy Actually Is

The Claude advisor strategy is a multi-model agent pattern where a cheap executor runs the task end-to-end and calls a smarter model as an advisor only when it hits a decision it cannot solve. The advisor returns a plan, a correction, or a stop signal, and the executor resumes. The advisor never calls tools. It never produces user-facing output. It only nudges the executor.

In Anthropic's version, the executor is Sonnet or Haiku, the advisor is Opus, and the advisor is wired as a single tool named advisor_20260301 inside the executor's normal tool list. No orchestration framework, no extra round-trips, no separate API call to stitch together. You declare the advisor tool in the same request that starts the agent, and the executor decides on its own when to call it.

That last part is the important one. The advisor strategy is not a router. It is not an orchestrator deciding who does what. It is a dumb-until-proven-stuck executor that escalates on its own recognizance.
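
Anthropic's post names the tool but we haven't seen the full request schema. A hypothetical sketch of what the declaration might look like, modeled on Anthropic's other dated server-side tool types; every field here except the tool name and `max_uses` is an assumption, not a documented API:

```python
# Hypothetical request shape. Only "advisor_20260301" and "max_uses" come
# from the post; the rest is modeled on Anthropic's other server-side tools.
request = {
    "model": "claude-haiku-latest",  # placeholder executor model id
    "max_tokens": 4096,
    "tools": [
        {
            "type": "advisor_20260301",  # the advisor, declared as one tool
            "name": "advisor",
            "max_uses": 5,               # hard cap on escalations per task
        },
        # ... the executor's normal tools go here unchanged ...
    ],
    "messages": [{"role": "user", "content": "Triage this ticket backlog."}],
}
```

The point of the shape is that nothing else changes: same request, same tool list, one extra entry.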

The Numbers Anthropic Published

Anthropic shipped the post with two benchmark comparisons, and they're worth reading carefully before you copy the pattern.

  • Sonnet with an Opus advisor vs. Sonnet alone: +2.7 percentage points on SWE-bench Multilingual, and 11.9% cheaper per agentic task.
  • Haiku with an Opus advisor vs. Haiku alone on BrowseComp: 41.2% vs. 19.7%, more than double the score. About 85% cheaper per task than Sonnet solo.

Two things jump out. First, the Sonnet-plus-Opus combination is rare in AI cost curves: it's both cheaper and better than Sonnet alone.

Second, Haiku plus advisor is not better than Sonnet alone. Anthropic is honest about this. Haiku with an advisor still trails Sonnet solo by about 29% on BrowseComp score. What it has going for it is price: you're paying tiny-model rates for most of the task and advisor rates only for the escalations.

Typical advisor output is 400 to 700 tokens per invocation. That's a paragraph, not a plan document. The advisor is there to unstick the executor, not to rewrite the strategy.

For an SME running a high-volume agent where cost per task is the constraint, Haiku plus advisor is the interesting combination. For a team where the agent has to be as smart as possible on each individual task, Sonnet plus advisor wins outright.
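
The arithmetic behind that trade-off is worth running on your own numbers. A minimal sketch with hypothetical prices and token counts; these are illustrative figures, not Anthropic's price list:

```python
def blended_cost_per_task(worker_tokens, advisor_tokens, escalation_rate,
                          worker_price, advisor_price):
    """Expected cost per task: the worker runs everything, the advisor
    bills only on the fraction of tasks that escalate.

    Prices are per 1M tokens; escalation_rate is the share of tasks
    that trigger at least one advisor call.
    """
    worker_cost = worker_tokens / 1e6 * worker_price
    advisor_cost = escalation_rate * (advisor_tokens / 1e6 * advisor_price)
    return worker_cost + advisor_cost

# Hypothetical numbers: a 50k-token task on a $1/M worker, with a $15/M
# advisor reading ~5k tokens on 30% of tasks.
cheap = blended_cost_per_task(50_000, 5_000, 0.3, 1.0, 15.0)
solo = 50_000 / 1e6 * 15.0  # same task run end-to-end on the big model
```

With those inputs the blended cost is a fraction of the end-to-end figure, which is the whole Haiku-plus-advisor argument in one line of arithmetic.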

How the Claude Advisor Strategy Differs From Classic Orchestrator-Worker

The classic orchestrator-worker pattern puts the smart model on top. The orchestrator receives the task, decomposes it into subtasks, delegates each subtask to a cheaper worker, and stitches the results back together. It's the default architecture in most agent frameworks because it matches how humans delegate: the senior hire plans, the junior hires execute.

The advisor strategy inverts that. The cheap model runs the whole task. The smart model only speaks when spoken to. There's no decomposition step, no stitching, no plan-and-delegate ceremony. The executor just runs, and when it hits a wall, it asks.

There's also a third direction worth knowing about. NVIDIA recently published work on a small orchestration agent called Orchestrator-8B, which flips the classic pattern differently: small orchestrator on top, large workers underneath for specialized tasks. NVIDIA's post is worth the read if you want to see the full cost-quality trade space.

Three patterns, one decision:

Three LLM agent patterns compared: classic orchestrator-worker with a smart model on top, the Claude advisor strategy with a cheap executor on top and an advisor called on demand, and a small-orchestrator pattern with a small router directing large workers.

| Pattern | Who's on top | Who does the work | When it wins |
|---|---|---|---|
| Classic orchestrator-worker | Smart orchestrator | Cheap workers | Tasks that need decomposition before execution |
| Claude advisor strategy | Cheap executor | Cheap executor, with advisor escalations | Tasks where most steps are easy and a few are hard |
| Small orchestrator, large workers | Small orchestrator | Large specialized workers | Tasks where routing is hard and execution is specialized |

If you're writing a spec for a new agent this week, this table is the decision you need to make before you pick a model.

We've Been Running This Cross-Vendor With GPT-5.4-Mini and Gemini

A practitioner's note, because the news-wave reads are going to miss this part: the advisor strategy is not Anthropic-specific. The advisor_20260301 tool is specific to Claude. The pattern is not.

We've been running a cross-vendor version of this pattern for SME clients for months. GPT-5.4-mini is the advisor. GPT-5.4-nano and Gemini 2.5 Flash Lite are the workers. The mini advises everything.

When a worker hits a decision it can't solve, it hands the context to the mini, which returns a plan, a correction, or a stop signal. The worker then resumes and finishes the task.

Two things surprised us when we shipped the first version.

The first: you don't need a frontier model as the advisor. Anthropic uses Opus, their biggest model. We use GPT-5.4-mini, which is mid-tier.

For the SME workloads we typically handle (document classification, ticket triage, structured extraction, light browser tasks), the decisions the worker gets stuck on aren't frontier-hard. They're medium-hard. A mid-tier advisor is enough, and it pushes the cost curve further down than any Opus pairing ever could.

The second: the advisor doesn't care which lab built the worker. One ask_advisor(context) call works the same whether the caller is an OpenAI nano or a Google Flash Lite. You trade away the one-round-trip convenience of advisor_20260301 and pay a little extra latency on each escalation. In return you keep cross-vendor portability, which matters when your customers are watching three frontier labs leapfrog each other every quarter.
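
A sketch of that cross-vendor escalation call, assuming the advisor is reachable through any prompt-in, text-out callable (client wiring omitted). The JSON contract and the `ask_advisor` name are our convention, not any vendor's API:

```python
import json
from dataclasses import dataclass

@dataclass
class Guidance:
    kind: str   # "plan", "correction", or "stop"
    text: str

ADVISOR_INSTRUCTIONS = (
    "You advise a smaller agent that is stuck. Reply with JSON only: "
    '{"kind": "plan" | "correction" | "stop", "text": "..."}'
)

def ask_advisor(complete, context_summary, question):
    """Vendor-agnostic escalation: `complete` is any prompt -> text
    callable, so the same function fronts an OpenAI, Google, or
    Anthropic client."""
    raw = complete(
        f"{ADVISOR_INSTRUCTIONS}\n\nContext:\n{context_summary}"
        f"\n\nQuestion:\n{question}"
    )
    try:
        payload = json.loads(raw)
        return Guidance(payload["kind"], payload["text"])
    except (json.JSONDecodeError, KeyError):
        # A malformed advisor reply becomes a stop signal rather than
        # garbage guidance the worker would act on.
        return Guidance("stop", "advisor returned unparseable output")
```

Swapping advisors means swapping the `complete` callable and nothing else, which is the portability argument in code.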

If you want a read on whether this fits your stack, it's the kind of thing we cover in the custom AI agents we build for SMEs. Book a free 30-minute AI Potenzial-Check and we'll walk your use case through the decision framework below before you write a line of code.

When the Advisor Pattern Is the Right Call

A five-question decision framework, in the order we walk clients through it:

  1. Are most of your task steps solvable by a small model, with only a few hard decisions? If yes, a small executor can carry the load and escalate only on the hard parts.
  2. Can the hard decisions be named in advance? You need to be able to tell the executor "when you see X, ask the advisor." Vague escalation boundaries produce either constant escalation or none at all.
  3. Is the advisor's output bounded? Plan, correction, or stop signal. If you need the advisor to write full essays or call tools, you're building something else.
  4. Can you cap advisor invocations per task? max_uses in Anthropic's version, or a counter in yours. Without a cap, advisor thrash blows up your budget.
  5. Is the total token budget still lower than running the big model end-to-end? Add worker tokens and advisor tokens together and check. If not, just use the big model.

Five yeses and the advisor pattern is the right call. Any no and you should use something else. We've walked this checklist with multiple clients at the architecture stage, and every time it prevents a more expensive mistake later.

When It's the Wrong Call

The advisor pattern is wrong in four common situations.

Most of your steps are hard. If the worker escalates on every second step, you're paying advisor rates constantly and inheriting advisor latency on top of that. Just run the big model end-to-end. You'll be simpler and cheaper.

You can't name the escalation boundary. If the executor can't tell which decisions are hard and which are easy, it either never escalates (silent wrong answers) or always escalates (cost blowout). Classic orchestrator-worker is better here: the smart orchestrator makes the routing call instead of delegating that call to a cheap model.

You need user-facing reasoning. The advisor strategy hides the advisor's thinking inside the advisor turn. If your users need to see why the agent made a decision, put a reasoning model in front and let the advisor sit quietly behind it.

Your worker can't recognize its own stuck state. A worker that doesn't know it's stuck will ship wrong answers confidently. Before you deploy the advisor pattern, test your worker on tasks you know are hard and watch whether it escalates. If it doesn't, either the worker is too weak or the escalation prompt is too narrow.
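
That pre-deployment test can be a few lines of harness code. A minimal sketch, with a stub standing in for the real worker call; the `Step` shape mirrors the loop sketch later in this post:

```python
from collections import namedtuple

Step = namedtuple("Step", ["is_stuck", "result"])

def escalation_recall(worker_step, hard_tasks):
    """Run the worker on tasks you already know are hard and measure how
    often it raises its hand. Low recall means silent wrong answers."""
    escalated = sum(1 for task in hard_tasks if worker_step(task).is_stuck)
    return escalated / len(hard_tasks)

# Stub worker standing in for a real model call: it only notices it is
# stuck when the task text mentions ambiguity.
def stub_worker(task):
    return Step(is_stuck="ambiguous" in task, result=None)

recall = escalation_recall(stub_worker, [
    "ambiguous refund policy",
    "ambiguous duplicate ticket",
    "conflicting delivery dates",
])
```

If recall on your known-hard set is low, widen the escalation prompt or pick a stronger worker before you ship.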

How to Wire This Without Anthropic's API

A minimal sketch for implementing the advisor pattern with any two models:

Escalation flow for the Claude advisor strategy showing the executor loop calling a single advisor tool that returns a plan, a correction, or a stop signal, with a hard cap on advisor invocations per task.

def run_agent(task, worker, advisor, max_advisor_calls=5):
    """Cheap executor loop with a capped escalation path to the advisor."""
    context = new_context(task)
    advisor_calls = 0
    while not context.done:
        step = worker.next_step(context)
        if step.is_stuck and advisor_calls < max_advisor_calls:
            # Escalate: the advisor sees a summary, not the full context.
            guidance = advisor.ask(context.summary(), step.question)
            advisor_calls += 1
            if guidance.kind == "stop":
                return context.halt(reason=guidance.text)
            context.apply(guidance)  # plan or correction, then resume
        else:
            # Normal step, or stuck with the cap exhausted: best effort.
            context.apply(step.result)
    return context.result

That's the whole thing. The executor runs its normal loop, the advisor is a single function call, and a counter caps the advisor budget per task. Anthropic's advisor_20260301 tool saves you the round-trip inside Claude's stack because the executor doesn't have to exit and re-enter an API boundary. That's a latency and ergonomics win, not an architectural difference.

A few guardrails we always ship with this:

  • A hard cap on advisor calls per task, enforced in code, not just in the prompt.
  • Logging on every escalation and every non-escalation on a decision the prompt said was hard. You can't tune the escalation boundary without this data.
  • A total-token budget so the task halts cleanly if it runs away.
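
Those three guardrails fit in one small object. A sketch of the shape we mean, with the class name and limits as illustrative choices:

```python
import logging

logger = logging.getLogger("advisor")

class AdvisorBudget:
    """Hard limits enforced in code, not in the prompt: a cap on advisor
    calls plus a total-token budget that halts the task cleanly."""

    def __init__(self, max_calls=5, max_tokens=200_000):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def allow_call(self):
        # Checked by the executor loop before every escalation.
        return self.calls < self.max_calls

    def record(self, tokens_used):
        # Log every escalation; this is the data you tune the boundary with.
        self.calls += 1
        self.tokens += tokens_used
        logger.info("escalation %d, %d tokens total", self.calls, self.tokens)

    def exhausted(self):
        return self.tokens >= self.max_tokens
```

The executor checks `allow_call()` before escalating and `exhausted()` after every step, so a runaway task halts in code rather than in hope.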

Limitations and Failure Modes Anthropic Didn't Mention

Anthropic's post is a product post. It names the pattern, publishes the numbers, and ships the tool. It does not list the failure modes. Four are worth knowing before you deploy this in production.

Advisor thrash. The executor misreads easy decisions as hard and escalates on every step. Symptom: cost per task climbs past your ceiling within a few days of production traffic. Fix: tighten the prompt that decides "do I escalate?", and add an escalation-rate alarm so you catch it in hours instead of weeks.

Latency inflation. Every advisor call is a sequential LLM call. If the worker escalates twice per task and the advisor averages two seconds, that's four extra seconds on top of baseline. Fine for async agents. Painful for real-time UX.

Context-window pollution. The advisor reads the executor's context. Long-running agents with fat contexts make every escalation expensive. Budget for context compression before you ship a long-running agent, or you'll find out the hard way when the token bill arrives.
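
The cheapest form of that compression is structural: keep the head and tail of the transcript and elide the middle before every escalation. A minimal sketch, with the cutoffs as illustrative defaults:

```python
def compress_context(messages, head=2, tail=6, max_chars=4000):
    """Keep the first few and last few messages and elide the middle, so
    a long-running agent doesn't pay full-context prices on every
    escalation."""
    if len(messages) <= head + tail:
        kept = messages
    else:
        kept = messages[:head] + ["[... earlier steps elided ...]"] + messages[-tail:]
    summary = "\n".join(kept)
    return summary[:max_chars]  # final hard limit on what the advisor reads
```

A proper summarization pass beats truncation on quality, but this version costs zero tokens and is the floor we'd ship with on day one.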

The silent-non-escalation failure. The worst case, and the one we flag hardest for clients. The executor should have escalated, didn't, and ships a confident wrong answer with no alarm. The mitigation is boring but necessary: log every decision the prompt marked as hard, sample the non-escalated ones, and review them in your normal QA cadence. If you skip this, you will eventually ship the silent failure that erodes trust.

What the Claude Advisor Strategy Actually Changes

The Claude advisor strategy is a named, benchmarked, wired-up version of a pattern production teams have been running for a while. Anthropic's contribution is the naming, the benchmarks, and the in-API advisor_20260301 tool that collapses the whole thing into one API call inside their stack. That's a real win if you're already committed to Claude.

If you're not, the pattern still applies. Cheap executor, smart advisor, bounded escalation, hard cap on invocations. We've been running it cross-vendor with GPT-5.4-mini advising GPT-5.4-nano and Gemini 2.5 Flash Lite workers, and it's survived contact with real SME workloads. The architecture is what matters. The model choice is a knob you can turn.

If you're evaluating this for your own agent, book a free 30-minute AI Potenzial-Check. Bring the use case, we'll tell you whether the advisor pattern is the right shape before you write a line of code. For more on AI agent architecture for SMEs, our pillar on how to keep AI agents inside a vault by default covers the guardrails side, and the follow-up on what OpenAI's restricted cyber model means for SMEs covers the cross-vendor landscape.

The pattern has a name now. Use it or don't. Just evaluate it the next time you reach for an orchestrator.

#Claude Advisor Strategy #LLM Advisor Pattern #AI Agent Cost Optimization #Multi-Model AI Agent
