Synthetic training data in 2026: when it works
Synthetic data has moved from a precarious substitute for real data to a central component of modern training. These are the patterns that work, and those that still fail.