Skip to main content
Automation

Claude Opus 4.7 is here: what changed for our AI stack

· 9 min read

Heads up: This article includes a Claude Code guest-pass link from our account. Anthropic gave us 3 passes to share. Each pass gives a friend a free week of Claude Code and Cowork. We don’t earn a commission. We share it because we use Claude Code every day.

Claude Opus 4.7 shipped today. Here’s what Anthropic actually changed, which breaking changes will silently ship degraded output in your integrations, and where 4.7 fits in our AI agent stack.

This post assumes you already use Claude via API or Claude Code. If you don’t, skip to the “Should you upgrade” section at the end.

What Anthropic actually shipped today

Claude Opus 4.7 is the successor to Opus 4.6. Same pricing ($5 input, $25 output per million tokens). Same 1M token context window, now confirmed at standard API pricing with no long-context premium. New tokenizer, new vision stack, a new task budget API, and three breaking changes that will bite you if you skip the release notes.

Model ID: claude-opus-4-7. Available today on Claude.ai, the Anthropic API, AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

The headline features per Anthropic’s official announcement:

  • High-resolution vision. Images up to 2576px (3.75 MP). That’s a 3x jump from the previous 1568px / 1.15 MP ceiling. Pixel coordinates map 1:1 with the model’s internal coordinates, so your computer-use agents don’t have to do scale math anymore.
  • New xhigh effort level. A tier above high for coding and agentic runs. Messages API only.
  • Task budgets (beta). An advisory token budget across a full agentic loop. The model sees a running countdown and paces itself.
  • Adaptive thinking only. Extended thinking with a fixed budget_tokens is gone. Adaptive thinking is the only thinking-on mode, and it’s off by default.

Anthropic reports in their release notes that adaptive thinking reliably outperforms the old extended-thinking budget in their internal evals. That matches Anthropic’s broader framing on adaptive thinking: let the model decide how much reasoning a given step deserves, instead of allocating a fixed budget upfront.

The benchmarks worth paying attention to

Opus 4.7’s headline jumps are in agentic coding and vision. CursorBench moves from 58% to 70%. Visual acuity on computer-use tasks jumps from 54.5% to 98.5%. Rakuten-SWE-Bench shows 3x more production task resolution. For long-horizon agent work and screen-reading, 4.7 is in a different weight class than 4.6.

BenchmarkOpus 4.6Opus 4.7What it measures
CursorBench58%70%Real coding tasks inside an IDE
Visual acuity (computer-use)54.5%98.5%Reading pixels in screenshots
Rakuten-SWE-Bench1x3xProduction task resolution
CodeRabbit recallbaseline+10%Code review coverage
Harvey BigLaw Benchn/a90.9%Legal reasoning (high effort)

The CursorBench and computer-use numbers are the ones that changed our minds. Our agents spend most of their day reading dashboards, parsing logs, and running linters through screen capture. A 44-point jump on visual acuity is not a rounding error.

What doesn’t change much: pure chat, single-turn Q&A, anything where the model just needs to generate a paragraph from a clear prompt. If that’s your whole workload, you can stay on 4.6 for another cycle and not notice.

The breaking changes that will silently hurt you

Three API-level changes will break your integrations if you don’t handle them. None error in obvious ways. All of them degrade output quality if you miss them.

Extended thinking budgets are dead

# Before (Opus 4.6): works
thinking = {"type": "enabled", "budget_tokens": 32000}

# After (Opus 4.7): 400 error
thinking = {"type": "adaptive"}
output_config = {"effort": "high"}

If you relied on fixed thinking budgets for cost predictability, you need a different knob. That knob is task_budget, which is advisory (not a hard cap) and requires the beta header task-budgets-2026-03-13. max_tokens is still the hard per-request ceiling.

Sampling parameters return 400 errors

temperature, top_p, top_k: all rejected on Opus 4.7. If you were setting temperature=0 for “determinism” (which was never real determinism anyway), remove it. Guide behavior through prompting instead.

Thinking content is omitted by default

Thinking blocks still stream, but their thinking field is empty unless you opt in:

thinking = {
    "type": "adaptive",
    "display": "summarized",  # default is "omitted"
}

This is the silent failure mode that will break any product streaming reasoning to users. The UI now shows a long pause before output begins, which can read as a dead API. If your UX streams thinking to end users, opt back in with display: 'summarized' before you flip the model ID.

The tokenizer change nobody is talking about

The new tokenizer uses 1.0 to 1.35x more tokens per prompt. That’s up to 35% more on identical inputs. Same wallet, more bytes.

Anthropic frames this as a net win: token efficiency on coding evals improved despite higher per-message token counts. We believe them for coding. For chat workloads where you pay for every token whether the model “used” them well or not, this is a 10 to 35% cost bump at the same price-per-token.

If you run structured agents with heavy tool use, bump your max_tokens parameters and compaction triggers by 20 to 25% before you flip the model ID. Anthropic explicitly flags this in the migration guide: triggers sized for 4.6’s token counts will start firing earlier on 4.7.

How we plan to use Claude Opus 4.7 at Hayka Pacha

We run an AI agent fleet across 400+ sites. The job: monitor, maintain, ship changes, respond to alerts. Here’s our migration plan for 4.7.

Site-maintenance agents: upgrading. The xhigh effort level on Claude Code looks worth the extra latency when an agent has to read a stacktrace, check git history, and write a patch. We’ll re-baseline task completion rates once it’s rolled out.

For our content automation pipeline: staying on 4.6 for now. It’s a narrow chat workload. No agentic loops, no vision, no long-horizon tasks. 4.7 would cost the same per token but burn roughly 10 to 35% more tokens per call thanks to the new tokenizer. Not worth it today.

The /ultrareview workflow: adding it. Anthropic shipped a dedicated code review command inside Claude Code. We’re wiring it into our pre-merge hook this week. We’ll report back once we have enough review cycles to compare it against our existing review agent.

Want to try Opus 4.7 inside Claude Code without paying for a seat? We have a Claude Code guest pass to share. It gives a friend a free week of Claude Code and Cowork. Three passes total. First come, first served.

The behavior changes you’ll feel in your prompts

These don’t error. They just change what you get back. All of them are documented in Anthropic’s migration guide.

  • More literal instruction following. The model no longer silently generalizes. If you wrote “fix the bug in auth.ts” and expected it to also touch auth.test.ts, say so.
  • Response length scales to task complexity. No more fixed verbosity. Short tasks get short answers.
  • Fewer tool calls at low effort. 4.7 reasons more before calling. Raise effort if you want more tool use.
  • Fewer subagents by default. If your workflow depended on aggressive fan-out, prompt for it explicitly.
  • More direct tone, fewer emoji. Less warmth, less validation-forward phrasing. If your UX relied on Opus 4.6’s friendlier tone, your users will notice.

If your prompts include self-checking scaffolding like “double-check the slide layout before returning,” Anthropic’s migration guide explicitly recommends removing it on 4.7: the model now handles that verification internally. The same advice applies to prompts that counted on silent generalization across similar files. 4.7 is more literal, so be explicit about every file you want touched.

How to migrate: the fast path

Anthropic ships an official Claude API skill that auto-applies the breaking changes to your codebase. Inside Claude Code, run:

/claude-api migrate this project to claude-opus-4-7

The skill handles the model ID swap, removes temperature/top_p/top_k, converts thinking: {type: "enabled", budget_tokens: N} to thinking: {type: "adaptive"}, cleans up now-GA beta headers (effort-2025-11-24, interleaved-thinking-2025-05-14), and recommends an effort starting point.

Per Anthropic’s skill documentation, the migration produces a checklist of items that need manual verification: length-control prompts, integration tests, and cost or rate-limit re-baselining. These are the same items you’d plan to verify on any model migration.

If you don’t use Claude Code, the skill is open source on GitHub and installable anywhere Agent Skills work.

Should you upgrade?

Upgrade today if you do any of these:

  • Agentic coding or code review (the CursorBench delta alone pays for the migration work)
  • Computer-use or screenshot-heavy workflows (98.5% vs 54.5% is not subtle)
  • Long-horizon agent loops with memory
  • Anything that reads charts, diagrams, or technical images

Stay on 4.6 for now if you do:

  • Pure chat workloads with no vision and no tool use
  • High-volume, cost-sensitive inference where a 10 to 35% token overhead actually matters
  • Products with tight latency budgets that can’t absorb the new default thinking latency

Either way, read the migration guide before you flip the model ID. The silent defaults (omitted thinking content, fewer tool calls, fewer subagents) will change your product’s behavior even if nothing errors.

What we’re watching next

The real story with 4.7 is agentic coding and computer-use. Not chat. That’s exactly where AI is getting economically interesting, and that’s where our stack lives. Our next post will track what happens when our agent fleet (400+ sites on Kubernetes, GitOps-managed) switches to 4.7 across every node. The plan: measure task completion, human-intervention events, and per-task token cost against our Opus 4.6 baselines.

If you’re building on Claude and want to compare notes, or you want us to help you migrate a production agent stack, get in touch. Got a friend who wants to try Claude Code? Claim one of our Claude Code guest passes. Three available, first come first served. Either way, read the full release notes before you ship it to your users.

Sources


FAQ

No. Opus 4.7 introduces three breaking changes: extended thinking budgets return 400 errors (only adaptive thinking works), sampling parameters like temperature and top_p are rejected, and thinking content is omitted from responses by default. Update your API client before switching the model ID.

The per-token price is identical ($5 input, $25 output per million). But the new tokenizer consumes 1.0 to 1.35x more tokens on the same prompt. Expect a 10 to 35% real-world cost increase on workloads you don't re-tune. Coding evals net out better.

Anthropic recommends xhigh as the starting point for coding and agentic use cases, and at least high for most intelligence-sensitive tasks. Lower effort levels produce fewer tool calls and shorter outputs. Raise effort if the agent is stopping early or skipping steps.

1M tokens at standard API pricing with no long-context premium. Max output tokens is 128k. Adaptive thinking is included but off by default, so set thinking: {type: 'adaptive'} explicitly if you want reasoning enabled on a request.

No. Extended thinking with budget_tokens is removed and returns a 400 error on Opus 4.7. Adaptive thinking is the only thinking-on mode. To control costs across an agentic loop, use the new task_budget output config (requires the task-budgets-2026-03-13 beta header).