Claude Code vs Cursor vs GitHub Copilot in 2026: a comparison with measured tasks

Three production coding agents and five real tasks, measured by time, tokens, and PR quality. No marketing, just the actual measurements.

May 8, 2026 5 min 1.1K

Architecture

Skills and subagents: the agent reuse pattern

Skills package reusable capabilities; subagents run bounded tasks in isolation. Together they form the most effective pattern for composing complex agents in 2026.

April 28, 2026 3 min 583 4.3

Artificial Intelligence

Mature LLM-as-judge: when to trust and when not

Using an LLM to judge another LLM became widespread in 2024 and remains, in 2026, the only scalable way to evaluate qualitative aspects of LLM systems. The judge is reliable when its correlation with human ratings exceeds 0.7 on 30 cases and it is recalibrated quarterly; below that threshold, do not trust the number.

April 28, 2026 2 min 282 4.7

Artificial Intelligence

Claude Opus 4.7 and long-horizon tasks: real changes

Opus 4.7 launched as Anthropic's most capable model, with emphasis on long-horizon agentic work. After two months of intensive use, these are the practical changes versus Opus 4.6.

April 28, 2026 3 min 327 4.7

Artificial Intelligence

FinOps on agent tokens: the invoice that surprises

The first invoice for a production agent usually runs double or triple the estimate. This article walks through five real levers to cut cost without touching perceived quality, in priority order: caching, routing, context control, batching, and telemetry.

April 28, 2026 3 min 264 4.2

Artificial Intelligence

AI agent incidents: recovery runbooks that work

AI agents fail in production, and what matters is how you respond in the first twenty minutes. This runbook covers severity classification, isolating before investigating, purging contaminated memory, communicating without inventing facts, and turning every incident into a regression test before closing it as done.

April 28, 2026 4 min 229 4.7

Artificial Intelligence

LLM red teaming: a practical playbook

LLM red teaming has gone from an esoteric activity to a mandatory practice. With the OWASP Agentic Top 10 and the CSA Agentic AI Red Teaming Guide converging on shared vocabulary, this is the operational playbook any team deploying agents needs to have.

April 26, 2026 6 min 226 4.2

Artificial Intelligence

Production-grade agent evaluations: the framework that works

After a year and a half of building dashboards for agents in production, the question that separates teams that ship reliably from those flying blind is still the same: how do you measure whether the agent is working?

April 22, 2026 7 min 261 4.3

Architecture

Agent OS in production: real cases without the marketing

The Agent OS concept went from slide deck to deployment in 2025. Six months in production have made patterns visible: which architectures work, where the model breaks down, and what it adds compared with running agents on an existing stack.

April 13, 2026 6 min 270 4.3

Artificial Intelligence

Enterprise agent governance: the controls that are no longer optional

After two years of pilots and a year of agents in production, governance has moved from an aspirational committee to an operational control. What audits ask for, what broke in 2025, and which guardrails absorb most incidents.

April 1, 2026 6 min 252 4.4

Artificial Intelligence

Lessons from agents in production in 2025: summary for 2026

During 2025, hundreds of teams put AI agents into real production. In early 2026, with enough data in hand, consistent lessons are emerging about what fails, what works, what it costs, and which tasks are a poor fit. An orderly review for teams starting now.

March 26, 2026 7 min 247 4.7

Architecture

Consolidated MCP ecosystem: a quick map for 2026

Twenty months after the initial announcement, Model Context Protocol has gone from a curiosity to the de facto standard among agent clients and servers. What is available, which servers are worth it, which problems remain open, and how it compares to earlier protocol maps.

March 23, 2026 6 min 278 4.2

Artificial Intelligence

Claude Haiku 4.5: lightweight power for massive agent fleets

Anthropic released Haiku 4.5 in October 2025, and the model has matured quickly: performance close to Sonnet 4 on structured tasks at a third of the cost, a large context window, and low latency. It is the missing piece for deploying agents at scale without burning through the budget.

February 18, 2026 6 min 258

Artificial Intelligence

UX for agents: first design consensus

After two years of watching every product invent its own interface for talking to an agent, by January 2026 a stable design consensus is emerging about which patterns work, which do not, and what the average user already expects. Time to write down what has settled.

January 28, 2026 7 min 299 4.5

Architecture

Agent-to-agent protocols v1: what we have in hand

Six months after A2A landed at the Linux Foundation, and after several implementation cycles from Google, Microsoft, and open-source projects: what version 1 of the protocol means, and whether it is safe to build on yet.

January 25, 2026 5 min 225 4.4

Architecture

Agent-to-agent protocols: the next open layer

With MCP solving the agent-to-tool layer, a parallel problem surfaces: how two agents from different vendors communicate with each other. Google's Agent2Agent protocol, donated to the Linux Foundation in June 2025, tries to fill that gap with an open standard.

December 23, 2025 5 min 257 4.4

Artificial Intelligence

AI agent observability: what to instrument first

Agents that chain calls to models, tools, and memory are hard to debug without instrumentation designed for them. After a long year running agents in production, I cover what to measure first, which standards are consolidating, and which costly mistakes you avoid by getting the traces right from the start.

December 8, 2025 8 min 256

Artificial Intelligence

Computer Use in production: agents that drive the interface

Almost nine months after the launch of Computer Use, some teams have taken it to production for real tasks. Where it works, where it is still not advisable, and which patterns are emerging so that an agent driving the mouse and keyboard does not end up causing more problems than it solves.

July 14, 2025 7 min 240 4.5

Artificial Intelligence

MCP clients in editors: AI integrates where you already work

Code editors have started to ship MCP as a native client: VS Code, Zed, Cursor, and several Neovim forks. This changes how the agent accesses project context and raises practical questions about which servers to enable and how to configure them without opening security holes.

July 11, 2025 7 min 243 4.5

Artificial Intelligence

Continuous integration with AI agents: early patterns

AI agents are starting to earn a real place in continuous integration pipelines: reviewing diffs, proposing fixes, generating missing tests. Six months of real-world use to separate the patterns that work from the ones that end up costing more time than they save.

July 8, 2025 7 min 272 4.4

Artificial Intelligence

UI design for agents: principles we’re starting to understand

A year after chat stopped being the only acceptable way to talk to an agent, UI patterns built specifically for agent tasks are emerging. I go through the ones starting to stick and the ones that are just passing fads of the hype cycle.

June 14, 2025 6 min 267 4.1

Artificial Intelligence

Community MCP servers: which ones are worth it

Six months after MCP became the common integration protocol for agents, the community catalog exceeds a thousand servers. I review which ones I use daily, which ones are noise, and how to tell them apart without falling into the novelty trap.

June 11, 2025 6 min 227 4.2

Startup

Y Combinator 2025: trends from the AI cohorts

Y Combinator's W25 and S25 cohorts show a historic tilt toward vertical agents and developer tools, with outcome-based pricing emerging as a new model. I break down the visible patterns, the business models on display, and what founders operating outside Silicon Valley should copy from this reading of the batch.

May 24, 2025 6 min 696 4.3