Claude 3.5 Sonnet (Anthropic, June 2024) was the most surprising model of 2024. It delivered quality on par with Claude 3 Opus at Sonnet pricing and with higher speed. It is particularly strong at coding: 92% on HumanEval and noticeably better real-world code generation. It also introduced Artifacts in Claude.ai. This article covers why it changed the market and when to choose it.
Specs
- Top-tier benchmarks: MMLU 88.7, HumanEval 92.0, GSM8K 96.4.
- 200k context window.
- Pricing: $3/1M input, $15/1M output.
- Vision: built in, strong on OCR and diagrams.
- Tool use: excellent function calling.
- Speed: 2x faster than Claude 3 Opus.
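The tool-use bullet can be made concrete with a small sketch. It assumes the Messages API tool format (a name, a description, and a JSON Schema under `input_schema`) and the `tool_use` / `tool_result` content blocks. The `get_weather` tool, its stub, and the `run_tool_calls` helper are hypothetical and exist only for illustration; blocks are modeled as plain dicts so the example runs without network access.

```python
# Hypothetical tool: the name, schema, and stub implementation are illustrative only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> str:
    # Stub: a real implementation would query a weather service.
    return f"18 C in {city}"


# Maps tool names to the local functions that implement them.
HANDLERS = {"get_weather": get_weather}


def run_tool_calls(content_blocks: list[dict]) -> list[dict]:
    """Turn the model's tool_use blocks into tool_result blocks for the next user turn."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks carry no tool call
        output = HANDLERS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to the originating call
            "content": output,
        })
    return results
```

In a real loop the definition would be passed as `tools=[WEATHER_TOOL]` to `client.messages.create`, and the SDK returns typed objects, so `block.type`, `block.name`, `block.input`, and `block.id` would be read as attributes rather than dict keys.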
Why it changed the market
Before 3.5 Sonnet: Claude 3 Opus was Anthropic’s top model, but it was priced at $15/$75 per 1M input/output tokens. GPT-4o won on price ($5/$15).
After 3.5 Sonnet: Opus-level quality at $3/$15. GPT-4o had to compete on features instead (real-time voice and so on). Competition intensified, and users benefited.
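To see what those price points mean for a bill, here is a minimal sketch that applies the per-million-token prices quoted in this article to a made-up workload. The dictionary keys are plain labels rather than API model IDs, and the monthly token volumes are an assumption chosen only for illustration.

```python
# Prices in USD per 1M tokens as (input, output), as quoted in this article.
PRICES = {
    "claude-3-opus": (15.0, 75.0),
    "claude-3.5-sonnet": (3.0, 15.0),
    "gpt-4o": (5.0, 15.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload given token counts and per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens per month.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 10_000_000, 2_000_000):.2f}")
# claude-3-opus: $300.00
# claude-3.5-sonnet: $60.00
# gpt-4o: $80.00
```

On this input-heavy mix Sonnet comes out at a fifth of the Opus cost; the gap to GPT-4o narrows as the share of output tokens grows, since both charge $15 per 1M output tokens.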
Coding: where it really excels
Benchmark comparison:
- HumanEval: 92.0 vs GPT-4o 90.2.
- SWE-bench: a significant lead on real-world GitHub issue fixes.
- Function calling: arguably best-in-class.
- Long-context code: 200k tokens is enough to hold an entire mid-sized codebase.
Developers adopted it en masse for coding assistants.
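Whether a given codebase actually fits in the 200k window is easy to estimate before sending anything. The sketch below is a rough check, not a tokenizer: the 4-characters-per-token ratio is a common rule of thumb and an assumption here, and `codebase_fits`, its `extensions` filter, and the `reserve` budget are hypothetical names introduced for illustration.

```python
import os

CONTEXT_WINDOW = 200_000  # tokens, the context size cited in this article
CHARS_PER_TOKEN = 4       # rough heuristic for estimation, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def codebase_fits(root: str, extensions=(".py",), reserve: int = 8_000) -> tuple[int, bool]:
    """Estimate tokens for source files under root.

    reserve leaves room for the instructions and the model's reply.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total += estimate_tokens(fh.read())
    return total, total <= CONTEXT_WINDOW - reserve
```

Calling `codebase_fits("path/to/repo")` returns the estimated token total and a boolean. For a real budget, the token counts reported by the API are the number to trust, since code often tokenizes less efficiently than prose.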
Artifacts
A feature of Claude.ai (not the API):
- Generates code and documents in a separate panel.
- Iterative refinement: “now add tests”, “refactor this”.
- Preview: renders simple HTML, SVG, and similar output inline.
A step forward in UX for AI-assisted creation.
API usage
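import anthropic
# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a Python fibonacci"}],
)
# The reply is a list of content blocks; the first one holds the text.
print(message.content[0].text)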
Clean SDK. Fast responses.
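Most real calls also set a system prompt and join the text from several content blocks. The following is a minimal sketch under those assumptions: `build_request` and `ask` are hypothetical helpers, not part of the SDK, and the `anthropic` import is deferred so the request can be built and inspected without the package or an API key.

```python
MODEL = "claude-3-5-sonnet-20240620"


def build_request(prompt: str, system: str = "", max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for client.messages.create."""
    request = {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        # The system prompt is a top-level field, not a message with its own role.
        request["system"] = system
    return request


def ask(prompt: str, system: str = "") -> str:
    """Send one prompt and return the concatenated text of the reply."""
    import anthropic  # deferred: needs the SDK installed and ANTHROPIC_API_KEY set

    client = anthropic.Anthropic()
    message = client.messages.create(**build_request(prompt, system))
    # A reply may contain several content blocks; keep only the text ones.
    return "".join(block.text for block in message.content if block.type == "text")
```

Separating the request builder from the network call keeps the payload easy to log and unit test, for example `ask("Write a Python fibonacci", system="Reply with code only.")`.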
vs GPT-4o
| Aspect | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|
| MMLU | 88.7 | 88.7 |
| HumanEval | 92.0 | 90.2 |
| Speed | ~80 tok/s | ~80 tok/s |
| Context | 200k | 128k |
| Multimodal | Text, image | Text, image, audio, video |
| Pricing input | $3/1M | $5/1M |
| Pricing output | $15/1M | $15/1M |
| Voice realtime | No | Yes |
| Vision | Strong | Strong |
Claude 3.5 Sonnet: coding + writing + cheaper input. GPT-4o: multimodal (audio, voice realtime).
For most non-audio use cases, Claude 3.5 Sonnet won in 2024.
Availability
- Anthropic API: direct.
- AWS Bedrock: available.
- Google Vertex AI: available.
- Azure: not available (Azure’s flagship model offering is OpenAI’s).
Multi-cloud coverage for enterprises.
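For teams that switch between these platforms, the main practical difference is the client class and the model identifier. The sketch below assumes the `AnthropicBedrock` and `AnthropicVertex` clients that ship with the Anthropic Python SDK (installed through its `bedrock` and `vertex` extras) and the June 2024 snapshot IDs as each platform names them; `make_client` is a hypothetical helper, and regions, projects, and credentials are left to the caller.

```python
# The same June 2024 snapshot goes by a different identifier on each platform.
MODEL_IDS = {
    "anthropic": "claude-3-5-sonnet-20240620",
    "bedrock": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "vertex": "claude-3-5-sonnet@20240620",
}

# SDK client class name for each provider.
CLIENT_CLASSES = {
    "anthropic": "Anthropic",
    "bedrock": "AnthropicBedrock",
    "vertex": "AnthropicVertex",
}


def make_client(provider: str, **client_kwargs):
    """Return (client, model_id) for a provider; kwargs carry region or project settings."""
    if provider not in MODEL_IDS:
        raise ValueError(f"unsupported provider: {provider}")
    import anthropic  # deferred so the mapping can be used without the SDK installed

    client_cls = getattr(anthropic, CLIENT_CLASSES[provider])
    return client_cls(**client_kwargs), MODEL_IDS[provider]
```

Usage would look like `make_client("bedrock", aws_region="us-west-2")` or `make_client("vertex", region="us-east5", project_id="my-project")`, where the region and project values are placeholders. The returned client exposes the same `messages.create` call in each case.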
Claude.ai
Claude, the consumer product:
- Free tier: limited chat.
- Pro: $20/month, with access to 3.5 Sonnet.
- Team: $25/user/month.
- Enterprise: custom pricing.
Artifacts plus Projects make it a seriously useful workflow tool.
Projects
A Claude.ai feature:
- Upload documents that Claude remembers across conversations.
- Custom instructions per project.
- Separate chat histories.
It replaces some use cases of Custom GPTs.
Limitations
- Audio/voice: absent, unlike GPT-4o.
- Image generation: cannot generate images (GPT-4o can, via DALL-E).
- Multimodal video: limited compared with GPT-4o.
- Plugin ecosystem: smaller than OpenAI’s.
- Tokenizer inefficiency: some tasks cost more in tokens.
Best-fit use cases
- Coding assistants: Cursor, Aider default to Claude 3.5 Sonnet.
- Long-document analysis: 200k context.
- Creative writing: cleaner voice.
- Instruction following: better than GPT on complex instructions.
- Technical Q&A: strong reasoning.
Adoption
Post-launch:
- Cursor: prominently featured.
- Aider: default model.
- Many startups: migrated from GPT-4o.
- Enterprise: added to multi-provider stacks.
Anthropic’s market share grew substantially.
Sonnet 3.5 v2
October 2024: “Claude 3.5 Sonnet (new)” brought further improvements:
- Computer Use capability (experimental).
- Coding benchmarks even higher.
- Reasoning refined.
Same pricing. Continuous improvement.
Safety and alignment
Anthropic’s focus:
- Constitutional AI approach.
- Reduced refusals vs Claude 2/3.0 (less overcautious).
- Harm prevention: honest about limitations.
In tone, it is generally considered more balanced and more “agreeable” than GPT-4o.
Prompt engineering tips
Patterns that work:
- XML tags: <example>, <task>. Claude handles them well.
- Explicit structure: “think step by step” works.
- Examples before the task: few-shot prompting is strong.
- Roles: the system prompt shapes behavior significantly.
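These patterns combine naturally into a small prompt builder. The sketch below puts few-shot examples first, then optional context, then the task, each wrapped in its own XML tag. The `<example>` and `<task>` tags come from the tips above; `<input>`, `<output>`, and `<document>` are naming choices made here for illustration, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(task: str, examples: list[tuple[str, str]], document: str = "") -> str:
    """Compose a prompt: few-shot examples, optional context, then the task."""
    parts = []
    for source, target in examples:
        parts.append(
            f"<example>\n<input>{source}</input>\n<output>{target}</output>\n</example>"
        )
    if document:
        parts.append(f"<document>\n{document}\n</document>")
    # Explicit structure: ask for step-by-step reasoning inside the task itself.
    parts.append(f"<task>\n{task}\nThink step by step before answering.\n</task>")
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Classify the sentiment of: 'The update fixed everything.'",
    examples=[("I love it", "positive"), ("It keeps crashing", "negative")],
)
print(prompt)
```

The resulting string goes in the user message, while role and tone instructions belong in the system prompt.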
Conclusion
Claude 3.5 Sonnet redefined the LLM market in 2024. For coding, writing, and reasoning, it was frequently the preferred model. For multimodal work (audio, voice), GPT-4o still leads. Companies running a multi-LLM stack should include Claude 3.5 Sonnet. Competitive pricing plus top quality is a winning combination. Sonnet demonstrated that “cheaper tier” doesn’t mean “second-tier quality”, a paradigm shift that benefits the whole ecosystem.
Follow us at jacar.es for more on Anthropic, frontier LLMs, and multi-model strategy.