Mature LLM-as-judge: when to trust and when not

Using an LLM to judge another LLM became widespread in 2024 and remains the only scalable way to evaluate qualitative output. The mature question is when to trust those numbers.

139 5 min April 28, 2026 4.7

Architecture

MCP as multi-vendor standard: patterns already mature

The Model Context Protocol, proposed by Anthropic in late 2024 and adopted through 2025-2026 by every major vendor, has proven operational patterns. This is the state of the art.

174 5 min April 28, 2026 4.5

Artificial Intelligence

Synthetic training data in 2026: when it works

Synthetic data has moved from a precarious substitute for real data to a central component of modern training. These are the patterns that work and the ones that still fail.

144 5 min April 28, 2026 4.3

Architecture

Skills and subagents: the agent reuse pattern

Skills package reusable capabilities; subagents isolate bounded-task execution. Together they form the most effective pattern for composing complex agents in 2026.

380 5 min April 28, 2026 4.5

Artificial Intelligence

DPO and alternatives to RLHF: practical state in 2026

Direct Preference Optimization and its relatives have displaced RLHF as the preferred alignment method in much of the ecosystem. This is the practical state of the field in 2026.

762 5 min April 28, 2026 4.7

User Experience

Runtime-generated UI: the first serious year

The idea that UI is generated on the fly rather than predesigned reached production in 2025. After a year of real cases, the verdict is more nuanced than the initial enthusiasm.

271 6 min April 28, 2026 4.2

Artificial Intelligence

AI agent incidents: recovery runbooks that work

Agents fail. The question is not whether, but how, and what you do in the first twenty minutes. This is the runbook that separates a contained incident from a damaged reputation.

120 8 min April 28, 2026 4.7

Artificial Intelligence

LLM red teaming: a practical playbook

Red teaming of language models has gone from an esoteric activity to a mandatory practice. With the OWASP Agentic Top 10 and the CSA Agentic AI Red Teaming Guide converging on a common vocabulary, this is the operational manual that any team deploying agents needs to have.

124 12 min April 26, 2026 4.3

Artificial Intelligence

Production-grade agent evaluations: the framework that works

After a year and a half of filling dashboards with agents in production, the question that separates teams that ship reliably from those flying blind remains the same: how do you measure whether the agent is working?

144 15 min April 22, 2026 4.3

Artificial Intelligence

Prompt Engineering: From Trick to Mature Discipline

Prompt engineering has gone from a collection of viral tricks to a discipline with reproducible patterns, dedicated libraries, and observability tooling.

151 10 min April 17, 2026 4.7

Architecture

Agent OS in production: real cases without the marketing

The Agent OS concept moved from slide to deployment in 2025. Six months in production reveal visible patterns: which architectures work, where the model breaks down, and what it adds over running agents on an existing stack.

164 13 min April 13, 2026 4.5

Artificial Intelligence

Deploy Llama 3.3 and Mistral locally with Ollama and Open WebUI on Ubuntu 24.04

Step-by-step tutorial in the established jacar.es series: install, GPU setup, quantized models, and secure exposure behind Traefik.

102 7 min April 12, 2026