Artificial Intelligence

Claude 3.7 Sonnet: the intermediate step toward the 4 family

Updated: 2026-05-03

Anthropic released Claude 3.7 Sonnet on February 24, 2025, and, ten days later, it’s becoming clearer where this model fits inside the Claude family and where it doesn’t. It isn’t a major leap in the 3-to-3.5 mold, but a careful refinement with two important new elements: an extended thinking mode activated on demand, and a command-line tool called Claude Code aimed at programmers.

Key takeaways

  • The same model can answer in two ways: standard mode (fast, low cost) or extended thinking mode (more tokens, more latency, better on complex problems).
  • Three clearly improved areas vs. 3.5: programming (SWE-bench substantially better), formal reasoning, and instruction following in long conversations.
  • Claude Code works well for scoped tasks on known codebases; falters on undefined architectures or open-ended tasks.
  • The context window remains at 200K tokens, at a disadvantage vs. Gemini’s 2M.
  • No Opus tier in 3.7, suggesting the 4 family will arrive with its own structure.

The hybrid model and extended thinking

The most interesting design decision in 3.7 is that the same model can answer in two ways. In standard mode it behaves as a normal Sonnet: fast, low-latency, good for most tasks. In extended thinking mode, the model internally generates a longer chain of reasoning before the final answer, charges more tokens, takes longer, but clearly improves on:

  • Competition-level math.
  • Complex logic debugging.
  • Planning tasks with several simultaneous requirements.

The elegant part of the design is that the choice stays on the user’s side. Through the API you can request a thinking token budget, and on claude.ai a toggle enables it. There is no separate reasoning model, as with OpenAI’s o1 alongside GPT-4o; there is one model with two modes, which simplifies application architectures.
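The two modes can be sketched as two parameter sets for the same Messages API call. This is a minimal sketch assuming the parameter names documented by Anthropic at the time of writing (`thinking` with a `budget_tokens` field); the prompts and budget values are illustrative, and `max_tokens` must exceed the thinking budget.

```python
def build_request(prompt: str, extended: bool, budget: int = 16_000) -> dict:
    """Return kwargs for client.messages.create(): same model, two modes."""
    params = {
        "model": "claude-3-7-sonnet-20250219",
        # max_tokens covers both the internal reasoning and the final answer,
        # so it has to be larger than the thinking budget.
        "max_tokens": budget + 4_000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if extended:
        # Extended thinking: the model reasons at length before answering.
        params["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return params

standard = build_request("Summarize this diff", extended=False)
deep = build_request("Debug this race condition", extended=True)
```

The point is that an application needs no routing between model families; it flips one flag on a single endpoint.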

What I’ve observed in use is that extended mode performs better on a relatively clear subset of tasks: programming problems spanning multiple files, non-trivial algorithmic design, analysis of third-party code with tangled dependencies. Outside that subset, standard mode does as well or better.

Claude Code as terminal companion

The second novelty is Claude Code, a command-line tool that opens an interactive session with access to your project, capable of reading files, proposing changes, and running commands under supervision. It’s in preview and installs via npm.
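Getting started takes two commands; this uses the install path documented by Anthropic for the preview (the package name is `@anthropic-ai/claude-code`, and the binary it installs is `claude`):

```shell
# Install the preview globally via npm, then launch it from a project root.
npm install -g @anthropic-ai/claude-code
cd my-project && claude
```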

In tests, its strength is operating over codebases you know well: refactoring specific functions, writing tests for existing code, exploring someone else’s project to understand data flows. Its weakness shows when the task is unscoped or involves open architectural decisions.

Two design elements feel right:

  1. Each modifying action asks for explicit confirmation by default. You can grant granular permissions, but by default the tool doesn’t run off.
  2. Project context is handled by reading files on demand rather than loading the whole repository. This keeps token cost bounded.

What practically improves over 3.5 Sonnet

Three areas where 3.7 clearly surpasses the prior version:

  1. Programming: SWE-bench scores reflect substantial improvement, and in practice it shows: generated code needs fewer on-the-fly corrections.
  2. Formal reasoning: improved both with and without extended thinking enabled.
  3. Instruction following in long conversations: 3.7 respects constraints given many turns earlier more reliably.

Areas where there’s no noticeable difference: creative writing, casual conversation, translation, text summarization.

The context window remains at 200K tokens. It hasn’t grown, which contrasts with Gemini’s 2M and OpenAI’s movements toward longer contexts.

What’s still pending

Several things 3.7 doesn’t solve:

  • Very long context management: the 200K limit is still the most visible constraint.
  • Sustained autonomy: the model still can’t work for hours without continuous supervision without drifting off task.
  • Uncertainty calibration: the model still answers with apparent confidence where it should say “I don’t know.”
  • External tool integration: the MCP ecosystem is advancing, but in Claude Code interaction with third-party services requires manual configuration.

My reading

Claude 3.7 Sonnet is a well-thought-out release that moves the programming state of the art and introduces two architectural patterns I believe will stick. The hybrid model with optional thinking is a sensible way to expose deep reasoning without maintaining separate models, and Claude Code points clearly toward assisted-development tools that go beyond editor autocomplete.

What it isn’t is a generational leap. Anthropic has preferred to consolidate what 3.5 already did well and add specific capabilities where there was room.

There’s no Opus tier in 3.7, something striking that suggests the 4 family will arrive with its own structure.

For now, 3.7 Sonnet in standard mode covers most day-to-day uses, extended thinking is reserved for problems that need it, and Claude Code is a tool worth trying if you program often. It’s the honest intermediate step, not the promise of the next jump.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.