Claude Code Router: Cut Your Claude Bill 21x


Anthropic’s annualized run rate crossed $44 billion in May 2026, and Claude Code alone hit $2.5B annualized by February (Sacra, 2026). Most of that revenue comes from one tab in one terminal. And almost all of it could route somewhere cheaper — if you knew the switch existed.

Claude Code Router is that switch. It’s a small TypeScript proxy from a developer named musistudio that intercepts the requests Claude Code makes to api.anthropic.com and forwards them to whichever provider you want: DeepSeek, Gemini, GLM-4.5, OpenRouter, a local Ollama instance, or anything else that speaks the OpenAI or Gemini API format. The agent thinks it’s still talking to Claude. The bill says otherwise.

I’ve been running it for four months. This guide is the architecture, the cost math, the config I actually ship, and the tradeoffs nobody mentions in the README.


Key Takeaways

  • claude-code-router is a localhost proxy that swaps Claude Code’s backend model for any OpenAI- or Gemini-compatible API. The repo is at ~34,000 GitHub stars and MIT-licensed (musistudio/claude-code-router, 2026).
  • DeepSeek-V4-Flash costs $0.14/M input and $0.28/M output — 21x cheaper on input, 53x cheaper on output than Claude Sonnet 4.6 (DeepSeek, 2026). One Substack writeup tracked $1,200/yr to $60/yr after the swap (John Rodrigues, 2026).
  • It also unblocks Claude Code in mainland China, Russia, Iran, and the other ~15 regions Anthropic doesn’t serve directly (Anthropic, 2026).
  • The tradeoff is real: prompt caching, native computer use, and tool-use strictness degrade outside Anthropic. The fix is the Router config: keep Claude for code, route everything else.

What Is Claude Code Router and How Does the Proxy Architecture Work?

Claude Code Router is a localhost HTTP server. You install it with npm install -g @musistudio/claude-code-router and launch it with the ccr code command, which exports ANTHROPIC_BASE_URL=http://127.0.0.1:3456 before Claude Code spawns (musistudio/claude-code-router, 2026). Claude Code makes its usual Anthropic Messages API calls; the router rewrites them in flight to whichever provider format the destination wants, then translates the response back.

That’s the whole trick. Anthropic’s CLI doesn’t pin certificates or check the server identity beyond the env var. Set the variable, point it somewhere local, and the agent will happily stream tokens from DeepSeek thinking it’s chatting with Claude.

Figure: Architecture diagram showing how claude-code-router sits between Claude Code and multiple model providers. Claude Code sends Anthropic Messages API requests to localhost port 3456. The router parses each request, applies routing rules based on context length, prompt type, and model name, then forwards translated requests to DeepSeek, Gemini, GLM, OpenRouter, or Ollama. Responses flow back through the router and are reshaped to Anthropic format before reaching Claude Code.

The clever part is the routing layer. The Router config block in ~/.claude-code-router/config.json has five keys: default, background, think, longContext, and webSearch. Each maps a request shape to a <provider>,<model> pair. When a request’s prompt token count exceeds longContextThreshold (default 60,000), it goes to longContext. When Claude Code marks a request as “background” — file reads, status checks, summarization between tool calls — it goes to background. Everything else hits default.
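The rules above can be sketched in a few lines of Python. This is my own illustration, not the router's source: the function name, its arguments, and the precedence order (long context first, then background, then think) are assumptions. The route values are the ones from the minimal config in this post.

```python
# Route table mirroring the Router block from the minimal config in this post.
ROUTER = {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 60000,
}


def pick_route(prompt_tokens, is_background=False, wants_thinking=False, router=ROUTER):
    """Return the "<provider>,<model>" pair for one request.

    Hypothetical helper: the precedence order is an assumption made
    for illustration, not the router's documented behavior.
    """
    # Requests whose prompt exceeds the threshold go to the long-context route.
    if prompt_tokens > router["longContextThreshold"] and "longContext" in router:
        return router["longContext"]
    # Background work (file reads, status checks, summaries) goes to the cheap route.
    if is_background and "background" in router:
        return router["background"]
    # Reasoning-heavy requests go to the think route.
    if wants_thinking and "think" in router:
        return router["think"]
    # Everything else hits default.
    return router["default"]
```

Calling pick_route(70000) returns the longContext pair, while a small request with no flags falls through to default.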

The reason this works at all is that Claude Code’s tool-use protocol is a thin layer over a generic chat API. Once you can pass JSON-schema tool definitions and parse tool_use blocks in the response, almost any modern model can play. The router’s transformer plugins (deepseek, gemini, openrouter, tooluse, maxtoken, reasoning) handle the dialect differences — DeepSeek’s reasoning tokens, Gemini’s functionCall shape, OpenRouter’s quirks around streaming.
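To make "dialect differences" concrete, here is a rough Python sketch of the simplest translation a transformer has to do: turning an Anthropic tool definition into an OpenAI-style function tool, and turning an OpenAI-style tool call back into an Anthropic tool_use block. The function names are mine, and the real transformer plugins handle much more (streaming, reasoning tokens, provider quirks).

```python
import json


def anthropic_tool_to_openai(tool):
    """Convert an Anthropic tool definition to the OpenAI function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic calls the JSON schema "input_schema"; OpenAI calls it "parameters".
            "parameters": tool["input_schema"],
        },
    }


def openai_tool_call_to_anthropic(call):
    """Convert one OpenAI-style tool call into an Anthropic tool_use content block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI sends arguments as a JSON string; Anthropic expects a parsed object.
        "input": json.loads(call["function"]["arguments"]),
    }
```

The json.loads call is where loosely trained models cause trouble: if the upstream model emits partial JSON, this step raises instead of producing a tool_use block.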



How Much Money Can Claude Code Router Actually Save You?

The honest number depends on what you do, but the spread is brutal. Claude Sonnet 4.6 charges $3 per million input tokens and $15 per million output (Anthropic, 2026). DeepSeek-V4-Flash charges $0.14 and $0.28 — 21x and 53x cheaper respectively (DeepSeek, 2026). One developer documented dropping from $1,200/yr to $60/yr after routing routine work to DeepSeek (John Rodrigues, 2026). An independent review tracked $150–200/mo bills falling to $30–50/mo, a 75–80% reduction (AI Tool Analysis, 2026).
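If you want to plug in your own numbers, the arithmetic is short. The prices below are the list prices quoted above; the model keys, the helper names, and the 20M-input / 5M-output monthly workload are made up for illustration.

```python
# (input, output) list prices in USD per million tokens, as quoted in this post.
PRICES = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "deepseek-v4-flash": (0.14, 0.28),
}


def cost_usd(model, input_tokens, output_tokens):
    """Cost of a workload at list price, ignoring caching and discounts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_ratios(expensive, cheap):
    """Return (input ratio, output ratio) between two models."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


if __name__ == "__main__":
    in_ratio, out_ratio = price_ratios("claude-sonnet-4.6", "deepseek-v4-flash")
    print(f"input: {in_ratio:.1f}x, output: {out_ratio:.1f}x")
    # Hypothetical month: 20M input tokens, 5M output tokens.
    for model in PRICES:
        print(model, round(cost_usd(model, 20_000_000, 5_000_000), 2))
```

On that hypothetical month the Sonnet bill is about $135 and the DeepSeek bill about $4.20, before any caching discounts.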


The output-token spread is where the savings live. Coding agents are output-heavy: they emit diffs, write files, generate plans, and summarize after every tool call. One eight-month Claude Code marathon by Reddit user u/_atomicbomb burned 10 billion tokens, roughly $15,000 at Sonnet list price (Morph, 2026). The same token volume on DeepSeek-V4-Flash would have cost about $300, which is still a lot but a different conversation with your finance team.

Figure: Bar chart comparing output token prices per million tokens across six models. Claude Opus 4.7 at 25 dollars, Claude Sonnet 4.6 at 15 dollars, Gemini 2.5 Pro at 10 dollars, GLM-4.5 at 2.20 dollars, DeepSeek-R1 at 2.19 dollars, and DeepSeek-V4-Flash at 0.28 dollars. The cheapest option is 89 times less expensive than the most expensive.

Source: Anthropic, Google AI, DeepSeek, Z.ai pricing pages, May 2026.

A common objection here is that you get what you pay for, and the cheap models are dumb. That used to be true. On SWE-bench Verified in May 2026, DeepSeek-V4-Pro scores 73%, well behind Sonnet 4.6 at 79.6% but ahead of GPT-4o and last year’s Claude Opus. For 80% of what a coding agent actually does — file reads, regex finds, formatting fixes, doc lookups, dependency bumps — that’s enough. The router lets you reserve the expensive model for the 20% that needs it.

My own bill, four months in: $213 in March 2026 on direct Anthropic API → $41 in April after routing background, longContext, and “write a commit message” type calls to DeepSeek-V4-Flash. Default code edits still go to Sonnet. Quality on the work I actually ship hasn’t moved. The diff is entirely “stuff Claude was doing in the background that nobody needed to be Claude.”



Why Geographic Restrictions Make Claude Code Router Essential Outside the US

Anthropic’s API is unavailable in roughly fifteen countries and territories: mainland China, Russia, Iran, North Korea, Belarus, Cuba, Syria, Crimea, Donetsk, Luhansk, Kherson, Zaporizhzhia, and a handful of African and South Asian nations (Anthropic Supported Countries, 2026). Anthropic tightened the rules in September 2025 to block entities more than 50% owned by parties headquartered in unsupported regions, regardless of where those entities physically operate (Anthropic, 2025).

For a developer in Shanghai or Tehran, that’s the end of the conversation with Claude Code via the official path. A VPN doesn’t fix it — Anthropic terminates accounts that trip its fraud heuristics, and many corporate environments forbid VPNs anyway. The Claude Code repo on GitHub has open issues from Chinese developers hitting this wall on every fresh install (anthropics/claude-code#2656, 2025).

Claude Code Router solves it cleanly by routing through a provider that does serve the user’s region.

The legal nuance matters. Routing through a third-party provider that itself has access to the model is different from circumventing Anthropic. Many enterprises in restricted regions still need an AI coding workflow; the router lets them have one without forcing the user to perjure themselves on a signup form.

According to Menlo Ventures’ 2025 enterprise survey, 60% of enterprises now deploy three or more foundation models in production, and the share spent on Anthropic models rose to 40% globally (Menlo Ventures, 2025). The router is what makes those numbers reachable for the parts of the world where the official Anthropic path is closed.


How Do You Install and Configure Claude Code Router?

The install is two commands and a config file. From the repo’s current README (musistudio/claude-code-router, 2026):

# Prereq: Claude Code already installed
npm install -g @anthropic-ai/claude-code

# Install the router itself
npm install -g @musistudio/claude-code-router

# Launch Claude Code through the router
ccr code

That last command does three things: starts the local proxy on port 3456, exports ANTHROPIC_BASE_URL=http://127.0.0.1:3456 plus a dummy ANTHROPIC_API_KEY, then spawns Claude Code with those env vars. If you’ve never run ccr before, the first launch creates ~/.claude-code-router/config.json with a placeholder.


The config file has two top-level sections that matter: Providers (an array of upstream endpoints) and Router (the rules that map request shapes to providers). Here’s a minimal working version:

{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/v1/chat/completions",
      "api_key": "$DEEPSEEK_API_KEY",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    },
    {
      "name": "gemini",
      "api_base_url": "https://generativelanguage.googleapis.com/v1beta/models/",
      "api_key": "$GEMINI_API_KEY",
      "models": ["gemini-2.5-pro", "gemini-2.5-flash"],
      "transformer": { "use": ["gemini"] }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 60000,
    "think": "deepseek,deepseek-reasoner"
  }
}

The $ENV_VAR syntax for api_key is a 2026 addition — it pulls from your shell environment so the config file itself stays safe to commit (assuming you don’t commit the env vars). The transformer.use array is the dialect plugin; it reshapes each request body to whatever the upstream API expects.
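Conceptually, the interpolation is a lookup at startup. The sketch below is my own approximation in Python, not the router's implementation: the function name is hypothetical, and it only handles the bare $NAME form shown in this post.

```python
import os


def resolve_api_key(value, env=None):
    """Resolve an api_key config value.

    Values starting with "$" are treated as environment variable names;
    anything else is returned unchanged (a literal key).
    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    if value.startswith("$"):
        name = value[1:]
        if name not in env:
            # Fail loudly rather than sending an empty key upstream.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return value
```

Failing on a missing variable is a design choice I'd want in any version of this: a silent empty key turns into a confusing 401 from the provider several layers later.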

Inside Claude Code, the /model deepseek,deepseek-reasoner slash command switches the default route for the current session. There’s also a <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> prefix you can drop into a prompt to override the route for a single subagent call — useful when a subagent’s job is “summarize this PR” and you want it cheap.
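For intuition, here is roughly what honoring that prefix involves. This is a sketch under my own assumptions (the tag must lead the prompt, and it is stripped before forwarding); the function name is hypothetical and the router's actual parsing may differ.

```python
import re

# Matches a leading <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> tag.
SUBAGENT_TAG = re.compile(
    r"^\s*<CCR-SUBAGENT-MODEL>([^,<]+),([^<]+)</CCR-SUBAGENT-MODEL>"
)


def split_subagent_route(prompt):
    """Return ((provider, model), remaining_prompt), or (None, prompt) if untagged."""
    match = SUBAGENT_TAG.match(prompt)
    if match is None:
        return None, prompt
    route = (match.group(1).strip(), match.group(2).strip())
    # Drop the tag so the upstream model never sees it.
    return route, prompt[match.end():].lstrip()
```

A prompt that starts with the tag for deepseek,deepseek-chat followed by "Summarize this PR" yields the route ("deepseek", "deepseek-chat") and the bare instruction.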

For a visual editor, ccr ui opens a localhost web UI for managing providers, models, and routes — added in v1.0.30. For production usage, ccr start | stop | restart runs the proxy as a daemon so ccr code reuses it across sessions.



What Routing Rules Actually Work in Practice?

The default config that ships with the router treats every request the same. That’s the wrong move: you’ll either burn money on the expensive path or get bad output on the cheap one. The rule I’ve landed on after four months is to split by what the request is for, not what model the user asked for.

Here’s the actual Router block from my config:

"Router": {
  "default": "anthropic,claude-sonnet-4-6",
  "background": "deepseek,deepseek-chat",
  "longContext": "gemini,gemini-2.5-pro",
  "longContextThreshold": 80000,
  "think": "deepseek,deepseek-reasoner",
  "webSearch": "openrouter,perplexity/sonar-pro"
}

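The reasoning: code edits I actually ship stay on Sonnet, while background, think, long-context, and web-search requests go to cheaper or more specialized providers.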

Figure: Donut chart showing how request volume splits across providers in my config over a typical week. DeepSeek at 58 percent, Anthropic Sonnet at 27 percent, Gemini at 11 percent, OpenRouter at 4 percent.

Personal data, week of May 5–11, 2026, exported via ccr’s status-line monitoring beta.

The interesting inversion is between request share and spend share: 58% of requests go to DeepSeek, but it accounts for 8% of spend. Anthropic gets 27% of requests and 71% of spend. The router isn’t doing the savings work by being clever. It’s doing it by stopping the expensive model from getting requests it didn’t need to see.
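Dividing spend share by request share gives a crude cost-per-request index, which makes the gap easier to see. The shares are the ones quoted above; the index itself is just my back-of-envelope framing, and I only have both numbers for two of the four providers.

```python
# (request share, spend share) as fractions of the weekly total, from this post.
SHARES = {
    "deepseek": (0.58, 0.08),
    "anthropic": (0.27, 0.71),
}


def cost_per_request_index(provider):
    """Spend share divided by request share; 1.0 means an average-cost request."""
    request_share, spend_share = SHARES[provider]
    return spend_share / request_share


if __name__ == "__main__":
    for name in SHARES:
        print(name, round(cost_per_request_index(name), 2))
    gap = cost_per_request_index("anthropic") / cost_per_request_index("deepseek")
    print(f"an Anthropic request costs roughly {gap:.0f}x a DeepSeek request")
```

By this measure an average Anthropic request in my setup costs roughly 19 times what an average DeepSeek request does, which is why moving even low-value requests off the default route matters.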



What Tradeoffs Should You Expect When Routing Around Anthropic?

The router does break things, and the README undersells it. Three real problems show up in production.

Prompt caching dies on most non-Anthropic routes. Anthropic’s cache is 90% off on cached input — Sonnet 4.6 becomes $0.30 instead of $3 on repeat reads (Anthropic, 2026). DeepSeek has its own cache (the V4-Flash $0.0028 cache-hit rate is published), but the router’s transformer doesn’t always preserve the cache breakpoints Claude Code sets. If your repo is small and your sessions are long, this can wipe out the headline savings.
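A quick way to see how much caching changes the comparison is to blend the cached and uncached input prices by hit rate. The prices are the ones quoted above; the 90% hit rate is a hypothetical, and this sketch ignores any cache-write surcharge.

```python
def effective_input_price(base_price, cached_price, hit_rate):
    """Blended input price per million tokens for a given cache hit rate (0 to 1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


if __name__ == "__main__":
    # Sonnet 4.6 with a hypothetical 90% cache hit rate.
    sonnet = effective_input_price(3.00, 0.30, 0.90)
    # DeepSeek-V4-Flash with its cache breakpoints lost, so every read is uncached.
    deepseek = effective_input_price(0.14, 0.0028, 0.0)
    print(f"Sonnet: ${sonnet:.2f}/M, DeepSeek: ${deepseek:.2f}/M, gap: {sonnet / deepseek:.1f}x")
```

Under those assumptions the input-price gap shrinks from 21x to about 4x. DeepSeek is still cheaper per token, but the headline multiple is gone.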

Computer use, vision, and PDF support degrade. Anthropic’s computer-use tool is a model-trained capability; DeepSeek doesn’t have it. Gemini has its own vision but the request shape is different. If you use Claude Code for browser automation or PDF analysis, the router either drops those calls or returns errors the agent doesn’t know how to parse.

Tool-use strictness varies. Anthropic’s models are aggressively trained to emit valid tool_use JSON. DeepSeek and GLM are looser — they sometimes emit partial JSON, malformed name fields, or hallucinated tool names. The tooluse transformer plugin in the router papers over the worst of it, but I still see “Claude Code stuck because the model said it called a tool that doesn’t exist” maybe twice a week.
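The kind of check that catches these failures is not complicated. Here is a minimal Python sketch of one: reject unknown tool names, unparseable arguments, and missing required fields before the call reaches the agent. The function name and the shape of known_tools (tool name mapped to its JSON schema) are my assumptions, not the tooluse plugin's actual logic.

```python
import json


def check_tool_call(name, raw_arguments, known_tools):
    """Validate one model-emitted tool call. Returns (ok, reason)."""
    # Hallucinated tool names are the failure that leaves the agent stuck.
    if name not in known_tools:
        return False, f"unknown tool: {name!r}"
    # Partial or malformed JSON fails here.
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return False, f"arguments are not valid JSON: {exc.msg}"
    if not isinstance(arguments, dict):
        return False, "arguments must be a JSON object"
    # Only the schema's top-level "required" list is checked in this sketch.
    missing = [key for key in known_tools[name].get("required", []) if key not in arguments]
    if missing:
        return False, f"missing required fields: {missing}"
    return True, "ok"
```

A check like this lets a proxy return a clear error the agent can retry on, instead of passing a broken call through.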


A 2026 community guide on tokenmix.ai also notes that Opus 4.7’s tokenizer counts roughly 35% more tokens than competitors for the same prompt, which means per-token price comparisons actually understate the router’s savings. That only helps if you actually move the workload, not if you let Claude Code keep sending it to the default route (Finout, 2026).

What I learned the hard way: my first config sent everything to DeepSeek. Three commits in, the model invented a function name and Claude Code happily called it across four files before crashing on the test step. I rolled back, narrowed DeepSeek to background and think, and the failure rate dropped to noise. The savings dropped 15% and the regret dropped 100%.


Is It Safe to Route API Traffic Through a Local Proxy?

This is the question nobody else writing about claude-code-router seems to answer, and it has three real surfaces.

Supply-chain risk on the install. Sonatype tracked 454,648 malicious npm packages published in 2025, with the npm registry hosting more than 99% of all OSS malware (Sonatype 2026 State of the Software Supply Chain, 2026). The Shai-Hulud worm in September 2025 was the first self-replicating npm worm and hit 500+ packages (Sonatype, 2025). claude-code-router is installed with npm install -g, which means any install scripts run as your user. The repo is open, the maintainer is reputable, and 34,000 stars is some signal, but you should pin the version, audit the dependency tree at least once, and ideally install into a per-project node prefix rather than globally.

Blast radius on your provider keys. The config file at ~/.claude-code-router/config.json holds plaintext API keys for every provider you’ve added. If you’ve got DeepSeek, Gemini, Anthropic, OpenRouter, and a Z.ai key in one file, one machine compromise hands an attacker the lot. The 2026 env-var interpolation ($VAR_NAME) helps — store the keys in your shell environment or a secret manager and let the router read them at startup. Don’t commit the file.
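A cheap guard is to scan the config for keys that are not env-var references. The Python sketch below assumes the Providers layout shown in this post; the function name is mine, and it treats any api_key that doesn't start with "$" as plaintext.

```python
def providers_with_plaintext_keys(config):
    """Return names of providers whose api_key is a literal rather than a $VAR reference."""
    flagged = []
    for provider in config.get("Providers", []):
        key = provider.get("api_key")
        # Empty or missing keys are ignored; "$NAME" values are env-var references.
        if key and not str(key).startswith("$"):
            flagged.append(provider.get("name", "<unnamed>"))
    return flagged
```

Load the file with json.load, pass the result in, and fail your dotfiles check or pre-commit hook if the list comes back non-empty.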

Data residency and prompt flow. Every prompt you send goes to the provider you routed it to. If you’re a US developer routing background calls to DeepSeek, your code snippets and context are flowing through a Chinese-domiciled API. If your employer has any data classification policy more serious than “don’t paste passwords,” you should read it before turning this on. For non-confidential personal projects, it’s a non-issue. For client work, talk to your compliance lead first.

A useful framing: the router’s security posture is roughly that of any other developer CLI you’ve already installed (Vercel CLI, Supabase CLI, the Firebase tools). It’s not worse than those. It’s also not better, and the keys it holds are more valuable.



Frequently Asked Questions

Does claude-code-router work with Claude Code’s plan mode and subagents?

Yes. Plan mode is just a request flag; the router forwards it. Subagents work too, and the router supports a <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> prefix that lets you override the route per subagent call. I route my “summarize this PR” subagent to DeepSeek-V4-Flash and keep my “review this code” subagent on Sonnet 4.6.

Will the router break when Claude Code updates?

Sometimes briefly. The router translates Claude Code’s request shape, so when Anthropic changes the protocol (the September 2025 PostToolBatch event broke it for two days), there’s a lag while the maintainer ships a fix. Pinning to a known-good version and watching the GitHub issues is the practical mitigation.

Can I use Ollama locally and route everything offline?

Yes for some workloads. Add Ollama as a provider with api_base_url: http://localhost:11434/v1/chat/completions, point default and background at it, and Claude Code works fully offline. Quality on Qwen2.5-Coder or Llama 3.3 is solid for simple edits but degrades fast on complex multi-file refactors. Best as a fallback when your internet’s down, not a daily driver.

Does claude-code-router work with MCP servers?

Yes — and the confusion is understandable, because it sounds like two routing layers fighting. They don’t overlap. MCP configuration lives inside Claude Code itself (your .mcp.json and the servers Claude Code spawns); claude-code-router only swaps the model backend the request is sent to. The router sits between Claude Code and the model API, so your MCP tools, their tool_use calls, and the results all pass through it untouched. The one real caveat is tool-use strictness: if you route an MCP-heavy session to a looser model like DeepSeek or GLM, you’ll see more malformed tool_use JSON than you would on Sonnet, so keep MCP-heavy work on the default (Anthropic) route. If you haven’t set your servers up yet, start with my Claude Code MCP configuration guide — the router changes nothing about how MCP is wired.

Is the project legitimate or a stealthy way to harvest API keys?

The musistudio/claude-code-router source is fully open, MIT-licensed, and has 34,000 GitHub stars with hundreds of contributors as of May 2026. Outbound traffic only goes to the providers you configure. The bigger risk is install-time supply-chain compromise via the npm registry — pin the version, audit the lockfile, and re-audit on every upgrade.

Will Anthropic ban my account for using a router?

The router doesn’t touch Anthropic’s API unless you route to it, and Claude Code is the official client either way. The Terms of Service don’t prohibit routing requests through a local proxy. The actual risk is using a non-Anthropic provider whose ToS forbids competing with their own coding assistant — read the DeepSeek and Z.ai terms carefully if your usage is commercial.


Conclusion

Claude Code Router is a small piece of software that changes the economics of agentic coding. For US developers, it’s a 60–90% cost cut on the workloads that don’t need Anthropic’s lead. For developers outside the supported regions, it’s the only way to use Claude Code at all without misrepresenting where they are. The cost is real (degraded caching, looser tool use, a real-but-bounded security surface), but it is manageable if you split the routes by what the request is actually for.

The version I run today routes 58% of requests to DeepSeek and pays Anthropic for the 27% that matters. The bill dropped from $213 to $41 in one month. The shipped code didn’t change.

If you’ve been using Claude Code at scale and the bill is starting to bite, install the router, point background at DeepSeek, and watch what happens. The full config from this post is in my multi-model workflow guide. The next thing I’d read is the hook patterns post if you want the deterministic side of the same control story.

Author: Nishil Bhave — solo developer, four-month claude-code-router user, runs the maketocreate.com publishing stack on a mix of Anthropic, DeepSeek, and Gemini.

