Customizing a model: prompt, RAG, or fine-tune
There are three ways to bend an off-the-shelf model to your needs — write a better prompt, give it your sources with RAG, or fine-tune its weights. Each fixes a different gap, and reaching for the heaviest one first is the classic mistake. Learn what each lever does, when to use it, and the simple rule of thumb: start with prompting, add RAG, fine-tune only if needed.
01 What "customizing a model" means
Buying an off-the-rack suit and having a tailor take it in is far easier than sewing one from scratch, and you still end up with something that fits you. Customizing an AI model works the same way: rather than training from random weights, you start from a capable, ready-made model that's decent at many things and steer what it produces toward your specific use case. There are three levers to do that, from lightest to heaviest: prompting, RAG, and fine-tuning.
- Two of the three levers — prompting and RAG — change the output without touching the weights at all.
- Only fine-tuning changes the model itself, by continuing its training on examples.
- The art is matching the lever to the gap: is it a knowledge gap, or a behavior gap?
02 The three levers
Think of customization as three dials you can turn. Prompting shapes the output at request time with instructions, context, and examples. RAG adds knowledge at answer time by retrieving your documents. Fine-tuning changes the model's behavior by training it on examples.
Prompting
Shape the output at request time: write clearer instructions, add context, give a few examples (few-shot), or constrain the format. The model and its weights are unchanged. It's the cheapest and fastest lever — and the recommended first step for almost any task.
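A minimal sketch of few-shot prompting, assuming a plain-text "Input:/Output:" layout. The function name `build_few_shot_prompt`, the layout, and the sample reviews are illustrative choices, not a required format; the point is that everything happens in the request text and no weights change.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for source_text, desired_output in examples:
        parts.append(f"Input: {source_text}")
        parts.append(f"Output: {desired_output}")
        parts.append("")
    # End on an open "Output:" so the model continues the demonstrated pattern.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    instruction="Classify each review as positive or negative. Answer with one word.",
    examples=[
        ("The battery lasts all day.", "positive"),
        ("It broke after a week.", "negative"),
    ],
    query="Setup took five minutes and it just works.",
)
print(prompt)
```

Changing the behavior here means editing a string, which is why prompting is the cheapest lever to iterate on.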
03 Which method should you use?
The right lever depends on the gap you're closing. If the model lacks knowledge, such as your own documents or fast-changing facts, RAG is the usual fit; if it has the knowledge but needs a consistent tone, format, or skill that prompting alone can't hold, fine-tuning is. Treat these as teaching defaults, not hard rules: the levers can be combined.
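The rule of thumb (start with prompting, add RAG, fine-tune only if needed) can be written down as a tiny decision helper. This is a teaching sketch: the function name and its two boolean parameters are hypothetical, and real decisions weigh cost and data as well.

```python
def recommend_levers(knowledge_gap, behavior_gap_persists):
    """Return the levers to try, lightest first.

    knowledge_gap: the model is missing facts or documents it needs.
    behavior_gap_persists: tone, format, or skill is still off after prompting.
    """
    levers = ["prompting"]  # always the first step
    if knowledge_gap:
        levers.append("RAG")
    if behavior_gap_persists:
        levers.append("fine-tuning")
    return levers


print(recommend_levers(knowledge_gap=True, behavior_gap_persists=False))
```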
04 What fine-tuning actually does
Fine-tuning is the only lever that changes the model itself. It continues training the model on a curated set of example input/output pairs, so the desired pattern is learned into the weights and becomes part of the model. That's the sharp contrast with RAG: RAG leaves the weights untouched and instead adds retrieved knowledge to the prompt at answer time. Put simply — fine-tuning changes behavior; RAG changes available knowledge. One re-trains the model; the other just hands it better context when a question arrives.
- Fine-tuning bakes a tone, format, or skill into the weights — update it by training again.
- RAG bakes nothing into the model — update it by editing the knowledge base.
- Fast-changing facts suit RAG; consistent behavior suits fine-tuning. They combine cleanly.
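To make the RAG side of the contrast concrete, here is a toy sketch in which retrieval is plain word overlap and the "knowledge base" is a Python list. Both are simplifying assumptions (production systems typically use embeddings and a vector store), and the function names and policy sentences are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, knowledge_base, top_k=1):
    """Rank documents by how many words they share with the question (toy scoring)."""
    q_tokens = tokenize(question)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(q_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question, knowledge_base):
    """Put the retrieved text into the prompt; the model itself is untouched."""
    context = "\n".join(retrieve(question, knowledge_base))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


knowledge_base = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
]
question = "Within how many days of purchase are refunds accepted?"
before = build_rag_prompt(question, knowledge_base)

# Updating a fact means editing the knowledge base; no training run is involved.
knowledge_base[0] = "Refunds are accepted within 60 days of purchase."
after = build_rag_prompt(question, knowledge_base)
print(after)
```

The same question yields a prompt with the new policy as soon as the document is edited, which is the sense in which RAG keeps facts fresh.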
05 How fine-tuning is done: SFT, LoRA & PEFT
"Fine-tuning" is an umbrella term. The most common form is supervised fine-tuning (SFT). Fully fine-tuning every parameter is expensive, so many teams use parameter-efficient methods like LoRA instead.
SFT & instruction tuning — learn from examples
Supervised fine-tuning (SFT) trains the model on pairs of inputs and desired outputs — the simplest and most common way to adapt a model to a target dataset. Instruction tuning is SFT on instruction-and-response examples, which teaches a base model to follow instructions and hold a conversation.
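As a rough illustration of what "pairs of inputs and desired outputs" can look like on disk, the sketch below serializes a few examples as JSON Lines. The field names `prompt` and `completion` and the helper `to_jsonl` are assumptions for this example; each training framework or provider defines its own schema.

```python
import json

# Hypothetical instruction-and-response pairs; a real dataset needs far more,
# curated to show the exact tone and format you want the model to learn.
sft_examples = [
    {
        "prompt": "Summarize in one sentence: The meeting moved from Monday to Tuesday at 10am.",
        "completion": "The meeting is now on Tuesday at 10am.",
    },
    {
        "prompt": "Summarize in one sentence: The release slipped a week because of a failing test.",
        "completion": "The release is delayed one week due to a failing test.",
    },
]


def to_jsonl(examples):
    """Validate each pair and serialize the dataset, one JSON object per line."""
    lines = []
    for example in examples:
        if not example.get("prompt") or not example.get("completion"):
            raise ValueError("every example needs a non-empty prompt and completion")
        lines.append(json.dumps(example))
    return "\n".join(lines)


jsonl_text = to_jsonl(sft_examples)
print(jsonl_text)
```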
LoRA / PEFT — fine-tune without training everything
PEFT (parameter-efficient fine-tuning) adapts a model by training only a small number of (often added) parameters while freezing the rest, which cuts compute and storage cost, often with comparable quality. LoRA is a popular PEFT method: it freezes the pretrained weights and injects small trainable low-rank matrices, training only those. You can keep several lightweight adapters, one per task, and once an adapter is merged into the base weights it adds no inference latency.
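The low-rank idea can be sketched in a few lines of plain Python. The code below is a toy illustration, not a training loop: the matrix sizes are tiny, the "trained" values of `B` are random stand-ins, the usual LoRA scaling factor is omitted, and all names are chosen for this example. It shows the frozen weight `W`, the two small matrices `A` and `B`, and why merging `W + B·A` gives the same output with a single matrix.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 6, 8, 2
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]  # trainable; zeros make the adapter a no-op at start


def forward(x, W, A, B):
    """Compute W x + B (A x) for an input vector x."""
    x_col = [[v] for v in x]
    base = matmul(W, x_col)
    update = matmul(B, matmul(A, x_col))
    return [row[0] for row in add(base, update)]


x = [1.0] * d_in
y_base = [row[0] for row in matmul(W, [[v] for v in x])]
y_start = forward(x, W, A, B)  # equals y_base because B is all zeros

# Stand-in for training: only B (and A) would be updated, never W.
B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)]
y_adapter = forward(x, W, A, B)

# Merging folds the adapter into one matrix, so inference needs no extra matmul.
W_merged = add(W, matmul(B, A))
y_merged = [row[0] for row in matmul(W_merged, [[v] for v in x])]

print("full parameters:", d_out * d_in)
print("adapter parameters:", lora_param_count(d_out, d_in, rank))
```

At this toy size the saving is modest, but it grows with the layer: for a 4096 × 4096 weight and rank 8, the adapter has 65,536 trainable parameters against 16,777,216 in the full matrix.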
The tradeoffs — cost, data, maintenance
The levers line up along cost and upkeep. Prompting: lowest cost and data, instant to change, changes nothing in the model. RAG: moderate setup, update by editing the knowledge base — facts stay fresh and traceable. Fine-tuning: highest data, compute, and ongoing maintenance, and not suited to fast-changing facts. PEFT/LoRA lowers fine-tuning's compute cost, but doesn't erase the data and maintenance burden.