Three years after RLHF became popular, the alignment landscape is richer. A review of RLHF, DPO, and recent methods like KTO and ORPO, with criteria for choosing among them.
Tag: fine-tuning
LoRA and QLoRA: Efficient Fine-Tuning on a Single Laptop
LoRA dramatically cuts the cost of fine-tuning, and QLoRA goes even further. How they work, when to use them, and what quality to expect.
LLM Fine-Tuning: When It’s Worth Training Your Own
Fine-tuning remains expensive and operationally complex. A guide to deciding among RAG, prompt engineering, and training your own model.