In 2023 the question “should we fine-tune our own LLM?” comes up in architecture discussions almost monthly. The short answer, almost always, is: not yet. The long answer is that legitimate cases exist, that costs have come down but remain considerable, and that alternatives like RAG or prompt engineering cover 80% of needs without the operational overhead of training.
The Three Customisation Levels
To frame the problem, there are three layers of LLM customisation, from lowest to highest cost:
- Prompt engineering: tune instructions, few-shot examples, chain-of-thought. Marginal cost, iteration in minutes. Covers the vast majority of well-defined tasks.
- Retrieval-Augmented Generation (RAG): retrieve relevant chunks from a knowledge base and pass them into the model’s context. Medium cost (embeddings + vector store), iteration in days.
- Fine-tuning: modify model weights with your own examples. High cost (data, GPUs, validation), iteration in weeks.
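To make the two cheap levels concrete, here is a minimal sketch of a prompt builder that combines few-shot examples (level 1) with retrieved context chunks (level 2). The template, the example tickets, and the function name are illustrative, not from any particular library:

```python
# Sketch of the two cheap customisation levels: few-shot prompting plus
# RAG-style context injection. Same base model, no weights touched.

FEW_SHOT = [
    ("Ticket: card blocked abroad", "category: payments"),
    ("Ticket: can't reset password", "category: auth"),
]

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble instructions + retrieved context + few-shot examples."""
    examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Examples:\n{examples}\n\n"
        f"Q: {question}\nA:"
    )

prompt = build_prompt("Ticket: refund not received",
                      ["Refunds take 5-7 days."])
print(prompt.splitlines()[0])  # → "Answer using only the context below."
```

Iterating on this costs minutes: change the template, rerun, compare. That iteration speed is exactly what fine-tuning gives up.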
Jumping straight to fine-tuning is the most common mistake. Most teams that try it could have achieved equivalent or better results with a well-designed RAG pipeline.
When Fine-Tuning Genuinely Makes Sense
Three cases where fine-tuning justifies its cost:
- Very specific style/voice. If you need the model to respond with an exact brand personality — idioms, grammar, a tone you can’t capture in a long system prompt — fine-tuning internalises it.
- Very structured output format. Models fine-tuned to always return a specific JSON shape, or to follow a proprietary markup schema, are more reliable than prompted ones — the format ends up “baked into” the model.
- Cost and latency reduction with small models. A 7B-parameter model fine-tuned on your domain can match or beat GPT-3.5 for that specific task, at 10-20% of the cost per token and with lower latency.
Outside these cases, RAG usually wins.
LoRA and QLoRA: Accessible Fine-Tuning
The big 2022-2023 shift is that fine-tuning went from “you need 8 A100s” to “you can do it on an RTX 4090”. The key technique is LoRA (Low-Rank Adaptation): instead of training all weights, you add low-rank matrices over the frozen model. The result is practically identical to full fine-tuning at 1% of the GPU cost.
QLoRA, published in May 2023, combines LoRA with 4-bit quantisation. It lets you fine-tune 65-billion-parameter models on a single GPU with 48 GB of VRAM. Six months ago this was unthinkable.
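The back-of-the-envelope maths behind that headline is straightforward — quantising the frozen base weights from 16 bits to 4 bits cuts their memory footprint by 4x (the figures below cover weights only; activations, the KV cache, and the LoRA adapter optimiser state add overhead on top):

```python
# VRAM arithmetic behind QLoRA's single-GPU claim for a 65B model.
params = 65e9
bytes_per_param_fp16 = 2     # 16-bit baseline
bytes_per_param_4bit = 0.5   # 4-bit quantisation

fp16_gb = params * bytes_per_param_fp16 / 1e9  # 130 GB: multi-GPU territory
q4_gb = params * bytes_per_param_4bit / 1e9    # 32.5 GB: fits in 48 GB VRAM

print(f"fp16 weights: {fp16_gb:.1f} GB, 4-bit weights: {q4_gb:.1f} GB")
```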
Libraries like PEFT from Hugging Face and axolotl wrap these methods with declarative config. A LoRA pipeline over Llama 2 7B fits in a 30-line YAML.
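The mechanics behind LoRA are simple enough to sketch in a few lines. This is the idea, not the PEFT API: the pretrained weight matrix stays frozen, and only two small low-rank matrices are trained; the shapes and rank below are illustrative:

```python
import numpy as np

# LoRA in miniature: W stays frozen; we train only A (r x d) and B (d x r)
# and apply W_eff = W + (alpha / r) * (B @ A).
d, r, alpha = 4096, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable
B = np.zeros((d, r))                     # trainable, zero-initialised

W_eff = W + (alpha / r) * (B @ A)        # at init, W_eff == W exactly

trainable = A.size + B.size
full = W.size
print(f"trainable: {trainable} vs full: {full} "
      f"({100 * trainable / full:.2f}%)")  # → ~0.39% of this layer's params
```

Initialising B to zero means the adapted model starts out behaving exactly like the base model, and training only nudges it away from there — which is why the trainable-parameter count, and with it the GPU cost, drops by roughly two orders of magnitude.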
Where the Real Cost Is
The real cost of fine-tuning isn’t GPUs — it’s everything else:
- Preparing the dataset. Between 500 and 5000 quality examples (prompt + ideal response) require substantial manual investment. Poorly designed examples poison the model with biases and failures.
- Iteration and evaluation. A bad fine-tune can look good on the happy path and fail catastrophically on edge cases. You need automated evals before and after.
- Production operation. Your own model means managing inference, updates, drift monitoring. This isn’t just “calling an API” anymore.
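A before/after eval harness doesn’t need to be fancy to catch the failure mode described above. A hedged sketch, where `model` is any prompt-to-answer callable and the toy eval set and slice names are made up for illustration:

```python
# Minimal eval harness: run the same frozen eval set through two models and
# compare exact-match accuracy per slice, including a dedicated edge slice.

EVAL_SET = [
    {"prompt": "2+2?", "expected": "4", "slice": "happy"},
    {"prompt": "", "expected": "(empty input)", "slice": "edge"},
]

def accuracy(model, cases):
    hits = sum(model(c["prompt"]).strip() == c["expected"] for c in cases)
    return hits / len(cases)

def eval_report(model):
    return {
        s: accuracy(model, [c for c in EVAL_SET if c["slice"] == s])
        for s in ("happy", "edge")
    }

# Toy stand-in models for illustration
baseline = lambda p: "4" if p == "2+2?" else "(empty input)"
fine_tuned = lambda p: "4"  # looks fine on the happy path, fails on edges

print(eval_report(baseline))    # → {'happy': 1.0, 'edge': 1.0}
print(eval_report(fine_tuned))  # → {'happy': 1.0, 'edge': 0.0}
```

The point is the per-slice breakdown: aggregate accuracy would hide exactly the edge-case regressions a bad fine-tune introduces.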
Realistic budget for a first serious fine-tune: 2-3 engineering weeks + 1-5k USD in GPU + a basic MLOps pipeline for evaluation.
Alternatives Before Deciding
Before fine-tuning, exhaust these options:
- RAG over your domain. With pgvector or Pinecone plus good reranking, you cover “the model needs to know company-specific data” without training anything.
- Longer prompts with careful examples. GPT-4 with 16 well-chosen few-shot examples often beats a fine-tuned 7B model.
- Function calling with structured response. If you’re after structured output, as we covered in “prompt engineering as a mature discipline”, function calling solves most cases without training.
- Existing specialised models. For common tasks (code, medical, legal) the community already has fine-tuned models: CodeLlama, Med-PaLM, and others.
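The retrieval step at the heart of the first option is conceptually tiny. A minimal sketch with cosine similarity over precomputed embeddings — in production this would be pgvector or Pinecone plus a reranker, and the 3-dimensional vectors below are toy stand-ins for real embedding-model output:

```python
import numpy as np

# Toy RAG retrieval: rank documents by cosine similarity to the query vector.
DOCS = ["Refunds take 5-7 business days.",
        "Password resets expire after 1 hour.",
        "Support is available 24/7."]
DOC_VECS = np.array([[1.0, 0.1, 0.0],
                     [0.0, 1.0, 0.2],
                     [0.1, 0.0, 1.0]])

def top_k(query_vec, k=1):
    q = query_vec / np.linalg.norm(query_vec)
    d = DOC_VECS / np.linalg.norm(DOC_VECS, axis=1, keepdims=True)
    scores = d @ q                        # cosine similarity per document
    idx = np.argsort(scores)[::-1][:k]    # highest-scoring docs first
    return [DOCS[i] for i in idx]

# A query embedded near the refunds document
print(top_k(np.array([0.9, 0.2, 0.1])))  # → ['Refunds take 5-7 business days.']
```

The retrieved chunks then go straight into the prompt — no weights change, and updating the knowledge base is an insert, not a training run.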
See also our vector database comparison — the foundation of the RAG pipeline that almost always solves the specific-knowledge case.
Conclusion
Fine-tuning has become technically democratised thanks to LoRA and QLoRA, but operationally it’s still a serious investment. For the vast majority of teams in 2023, starting with prompt engineering + RAG is the right path; fine-tuning is reserved for problems where the other two have clearly hit a ceiling.
Follow us on jacar.es for more on MLOps, production LLMs, and AI strategy.