DPO and other alternatives to RLHF: the practical state in 2026
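Direct Preference Optimization (DPO) and its relatives have displaced classic reinforcement learning from human feedback (RLHF), with its separate reward model and PPO-style RL loop, as the preferred alignment method in much of the ecosystem. This article surveys the practical state of the field as of 2026.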