Claude 3.5 Sonnet (Anthropic, June 2024) was the most surprising model in 2024. Quality at Claude 3 Opus level, Sonnet pricing, higher speed. Particularly strong in coding: HumanEval 92%, notably better real-world code generation. Introduced Artifacts in Claude.ai. This article covers why it changed the market and when to choose.
Specs
- Top-tier benchmarks: MMLU 88.7, HumanEval 92.0, GSM8K 95.0.
- 200k context window.
- Pricing: $3/1M input, $15/1M output.
- Vision: integrated, strong in OCR + diagrams.
- Tool use: excellent function calling.
- Speed: 2x faster than Claude 3 Opus.
Why It Changed the Market
Pre-3.5 Sonnet: Claude 3 Opus top, but $15/$75 pricing. GPT-4o won on price ($5/$15).
Post-3.5 Sonnet: Opus-quality at $3/$15. GPT-4o had to defend with more features (voice realtime, etc). Competition intensified — users win.
Coding: Where It Really Excels
Comparative benchmarks:
- HumanEval: 92.0 vs GPT-4o 90.2.
- SWE-bench: significant lead on real-world GitHub fixes.
- Function calling: arguably best-in-class.
- Long-context code: 200k = entire mid-codebases.
Developers adopted massively for coding assistants.
Artifacts
Feature in Claude.ai (not API):
- Generates code / documents in separate panel.
- Iterative refinement: “now add tests”, “refactor this”.
- Preview: runs simple HTML/SVG/etc inline.
UX step forward for AI-assisted creation.
API Usage
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=1024,
messages=[{"role": "user", "content": "Write a Python fibonacci"}]
)
print(message.content[0].text)
Clean SDK. Fast responses.
vs GPT-4o
| Aspect | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|
| MMLU | 88.7 | 88.7 |
| HumanEval | 92.0 | 90.2 |
| Speed | ~80 tok/s | ~80 tok/s |
| Context | 200k | 128k |
| Multimodal | Text, image | Text, image, audio, video |
| Input pricing | $3/1M | $5/1M |
| Output pricing | $15/1M | $15/1M |
| Realtime voice | No | Yes |
| Vision | Strong | Strong |
Claude 3.5 Sonnet: coding + writing + cheaper input. GPT-4o: multimodal (audio, voice realtime).
For most non-audio use cases, Claude 3.5 Sonnet won in 2024.
Availability
- Anthropic API: direct.
- AWS Bedrock: available.
- Google Vertex AI: available.
- Azure: NO (Azure is OpenAI-exclusive).
Multi-cloud for enterprises.
Claude.ai
Consumer Claude product:
- Free tier: limited chat.
- Pro: $20/mo, access to 3.5 Sonnet.
- Team: $25/user/mo.
- Enterprise: custom.
Artifacts + Projects makes it seriously useful workflow tool.
Projects
Claude.ai feature:
- Upload documents Claude remembers cross-conversations.
- Custom instructions per project.
- Separate chat histories.
Replaces some Custom GPTs use cases.
Limitations
- Audio/voice: absent vs GPT-4o.
- Image generation: no generate (vs GPT-4o-via-DALL-E).
- Multimodal video: limited vs GPT-4o.
- Plugin ecosystem: smaller than OpenAI.
- Tokenizer inefficiency: some tasks cost more in tokens.
Optimal Use Cases
- Coding assistants: Cursor, Aider default to Claude 3.5 Sonnet.
- Long-document analysis: 200k context.
- Creative writing: cleaner voice.
- Instruction following: better than GPT on complex instructions.
- Technical Q&A: strong reasoning.
Adoption
Post-launch:
- Cursor: prominently featured.
- Aider: default model.
- Many startups: migrate from GPT-4o.
- Enterprise: added to multi-provider.
Anthropic’s market share grew substantially.
Sonnet 3.5 v2
October 2024: “Claude 3.5 Sonnet (new)” — further improvements:
- Computer Use capability (experimental).
- Coding benchmarks even higher.
- Refined reasoning.
Same pricing. Continuous improvement.
Safety and Alignment
Anthropic’s focus:
- Constitutional AI approach.
- Reduced refusals vs Claude 2/3.0 (less overcautious).
- Harm prevention: honest about limitations.
For balanced tone, generally considered more “agreeable” than GPT-4o.
Prompt Engineering Tips
Working patterns:
- XML tags:
<example>,<task>— Claude handles well. - Explicit structure: “think step by step” works.
- Examples before task: few-shot strong.
- Roles: system prompt shapes behavior significantly.
Conclusion
Claude 3.5 Sonnet redefined 2024 in LLM market. For coding, writing, reasoning — frequently preferred model. For multimodal (audio, voice), GPT-4o still leads. For enterprises with multi-LLM stack, Claude 3.5 Sonnet must be included. Competitive pricing + top quality = winning combination. Sonnet demonstrated “cheaper tier” doesn’t mean “second-tier quality” — paradigm shift beneficial to ecosystem.
Follow us on jacar.es for more on Anthropic, frontier LLMs, and multi-model strategy.