AI & ML Advanced · By Samson Tanimawo, PhD · Published Feb 24, 2026 · 7 min read

Fine-Tuning Llama and Mistral for Domain Tasks

The open-weight base models are great. They’re also generic. A few thousand domain examples and a weekend of fine-tuning turn them into your tool, not Meta’s.

When to fine-tune

Three signals you need fine-tuning, not just prompting:

  1. Careful prompting, including few-shot examples, still misses the required output format or style.
  2. The task depends on domain vocabulary or conventions the base model consistently gets wrong.
  3. Your prompts have grown long enough that cost or latency hurts, and fine-tuning would let you drop most of the instructions.

If those don’t apply, prompting + RAG is cheaper and faster.

Dataset preparation

Aim for 1,000-10,000 high-quality (input, ideal-output) pairs. Quality dominates quantity.
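A minimal sketch of the cleaning pass that pays for itself here: load a JSONL file of pairs and drop malformed rows, empty rows, and exact duplicates. The `{"input": ..., "output": ...}` schema and filename are illustrative assumptions, not a fixed format.

```python
import json

def validate_pairs(path):
    """Load a JSONL dataset of {"input": ..., "output": ...} pairs and
    drop rows that are malformed, empty on either side, or exact duplicates."""
    seen, pairs = set(), []
    with open(path) as f:
        for line in f:
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed line: drop it
            inp = str(row.get("input", "")).strip()
            out = str(row.get("output", "")).strip()
            if not inp or not out:
                continue  # empty side: drop it
            if (inp, out) in seen:
                continue  # exact duplicate: drop it
            seen.add((inp, out))
            pairs.append({"input": inp, "output": out})
    return pairs
```

Run it before every training job, not once; datasets rot as people append to them.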

LoRA recipe

Reasonable defaults:

  - Rank r = 16, alpha = 32, dropout 0.05.
  - Target the attention projection matrices (q_proj, k_proj, v_proj, o_proj).
  - Learning rate around 2e-4 with a cosine schedule, 2-3 epochs.
  - Batch size as large as VRAM allows, with gradient accumulation to reach an effective batch of 32-64.

A LoRA fine-tune of a 7B Llama fits on a single 24 GB consumer GPU. With 4-bit quantisation (QLoRA), a 70B fits on one 80 GB H100.
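A back-of-the-envelope check on why the adapter is so cheap to train and store. The shapes below are Llama-2-7B-like assumptions (hidden size 4096, 32 layers, four square attention projections per layer); the rank matches the defaults above.

```python
# Count the trainable parameters a LoRA adapter adds to frozen base weights.
def lora_param_count(hidden=4096, layers=32, rank=16, targets=4):
    # Each targeted square projection of shape (hidden, hidden) gets two
    # low-rank factors: A with shape (rank, hidden) and B with (hidden, rank).
    per_matrix = rank * hidden * 2
    return per_matrix * targets * layers

params = lora_param_count()
size_mb = params * 2 / 1024**2  # fp16: 2 bytes per parameter
print(f"{params:,} trainable params, about {size_mb:.0f} MB at fp16")
```

Roughly 17M trainable parameters against 7B frozen ones, which is why the optimizer state and gradients fit in consumer VRAM.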

Eval

Run the same benchmark before and after. If accuracy didn’t go up, the fine-tune is doing nothing useful (or worse). Common eval setups:

  - A held-out test split scored with exact match or another automatic metric.
  - LLM-as-judge comparisons of base vs. fine-tuned outputs on the same prompts.
  - A small regression suite of known-tricky inputs, checked by hand.
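The before/after comparison can be sketched as a small harness. `generate` here is a stand-in assumption for whatever inference call you actually use (vLLM, transformers, an HTTP endpoint); exact match is the simplest metric and only suits tasks with a single correct answer.

```python
# Run one test set through two models and compare exact-match accuracy.
def exact_match_accuracy(generate, testset):
    """testset: list of {"input": ..., "output": ...} pairs."""
    hits = sum(
        generate(ex["input"]).strip() == ex["output"].strip()
        for ex in testset
    )
    return hits / len(testset)

def compare(base_generate, tuned_generate, testset):
    before = exact_match_accuracy(base_generate, testset)
    after = exact_match_accuracy(tuned_generate, testset)
    return before, after
```

If `after` isn’t clearly above `before`, the fine-tune isn’t paying for itself.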

Deploy

LoRA adapters are tiny (50-500 MB). Serve via vLLM with multi-LoRA support, or merge into the base weights for slightly faster inference. Per-tenant LoRAs let you serve customised models without N base-model copies.
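The memory arithmetic behind multi-LoRA serving is worth making explicit. The sizes below are illustrative assumptions (7B base at fp16 around 14 GB, adapters around 0.1 GB each), not measured numbers.

```python
# GPU memory for N tenants: one shared base plus N adapters,
# versus N fully merged model copies.
def serving_memory_gb(tenants, base_gb=14.0, adapter_gb=0.1):
    multi_lora = base_gb + tenants * adapter_gb
    merged_copies = tenants * base_gb
    return multi_lora, merged_copies

multi, merged = serving_memory_gb(tenants=20)
print(f"20 tenants: {multi:.0f} GB shared-base vs {merged:.0f} GB merged")
```

At 20 tenants that is one GPU versus a fleet, which is the whole argument for keeping adapters unmerged in multi-tenant setups.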

Four mistakes

  1. Fine-tuning on too little data. Below 500 examples, you usually overfit. Get more data.
  2. Fine-tuning before trying prompting hard. The fine-tune is only worth it if prompting can’t close the gap.
  3. Catastrophic forgetting. Aggressive fine-tuning damages general capability. Mix in a small fraction of generic instruction-following data.
  4. Not versioning the dataset. Three months later you can’t reproduce or compare. Treat training data with the same discipline as production code.
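One lightweight way to kill mistake #4: fingerprint the exact training file and record the hash next to every run. This is a stdlib-only sketch; the `runs.jsonl` log path and record schema are illustrative assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(path):
    """SHA-256 of the training file, streamed in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def log_run(dataset_path, hyperparams, log_path="runs.jsonl"):
    """Append an auditable record tying this run to its exact dataset."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_sha256": dataset_fingerprint(dataset_path),
        "hyperparams": hyperparams,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Three months later, a changed hash tells you immediately that you are not comparing like with like.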