Reading a Model Name
Model names like qwen2.5:7b-instruct-q4_K_M or llama3.2:3b-instruct-q4_K_M look cryptic, but each part tells you something specific about the model. Once you know the pattern, you can tell immediately what you are getting before you download anything.
The anatomy of a model name
Taking qwen2.5:7b-instruct-q4_K_M as an example:
qwen2.5 : 7b - instruct - q4_K_M
   │      │       │         │
   │      │       │         └── quantization level
   │      │       └──────────── variant (how it was fine-tuned)
   │      └──────────────────── parameter count
   └─────────────────────────── model family and version
Model family and version
The first part identifies the model and its generation. qwen2.5 is version 2.5 of the Qwen family (from Alibaba). llama3.2 is Meta’s Llama 3, revision 2. Newer versions of the same family are usually more capable and more efficient.
Some well-known families:
- Llama (Meta)
- Qwen (Alibaba)
- Mistral (Mistral AI)
- Gemma (Google)
- Phi (Microsoft)
- DeepSeek (DeepSeek AI)
Parameter count
7b, 3b, or 70b gives the number of parameters in the model, in billions. Parameters are the values learned during training that encode the model’s knowledge and capabilities. They are numbers stored in large tensors (multi-dimensional arrays).
| Size | Typical use | What to expect |
|---|---|---|
| 1B – 3B | Edge devices, fast experimentation | Fast, lightweight, limited reasoning |
| 7B – 9B | Everyday local use | Good balance of speed and capability |
| 13B – 14B | Better quality on a single GPU | Noticeably more capable, slower |
| 70B+ | High-end workloads | Near-frontier quality, needs serious hardware |
More parameters generally mean better capability but slower inference and higher memory requirements. A 7B model typically fits in 4–8 GB of RAM, depending on quantization (explained below), while a 70B model needs 40 GB or more even at 4-bit quantization.
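The memory figures above follow from simple arithmetic: number of parameters times bits per weight. Here is a rough sketch in Python. The helper name `estimate_weight_memory_gb` is hypothetical, and the estimate covers the weights only; the context cache and runtime buffers add more on top.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / (1e9 bytes per GB)
    return params_billions * bytes_per_weight


for size in (7, 70):
    for bits in (16, 8, 4):
        gb = estimate_weight_memory_gb(size, bits)
        print(f"{size}B at {bits}-bit: ~{gb:.1f} GB")
```

At 4 bits per weight this gives about 3.5 GB for a 7B model and about 35 GB for a 70B model, which lines up with the ranges quoted above once overhead is included.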
Variant
The variant tells you what kind of fine-tuning was applied after the initial training.
| Variant | Meaning |
|---|---|
| (none) or base | Raw pretrained model, not fine-tuned for conversation |
| instruct | Fine-tuned to follow instructions |
| chat | Fine-tuned for conversational use (similar to instruct) |
| code | Fine-tuned for code generation and understanding |
| vision | Multimodal: can process images as well as text |
If you want a model you can talk to or give tasks to, you almost always want instruct or chat. A base model will continue your text rather than answer your question, which is useful for research and fine-tuning but rarely what you want for direct use.
Quantization
Quantization reduces the precision of the model’s weights to save memory and speed up inference, at the cost of some quality. This is why the same model can come in several sizes on disk.
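To make the trade-off concrete, here is a toy round-to-nearest quantizer in Python. It is a deliberate simplification: real schemes such as q4_K_M work block by block with more elaborate scaling. The function names and sample weights are made up for illustration.

```python
def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers of the given width, sharing one scale."""
    max_int = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / max_int if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def max_error(weights: list[float], bits: int) -> float:
    """Largest absolute difference after a quantize/dequantize round trip."""
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(weights, restored))


# Hypothetical sample weights, not taken from any real model.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.61, -0.12, 0.7]
for bits in (8, 4):
    print(f"{bits}-bit max error: {max_error(weights, bits):.4f}")
```

Fewer bits mean fewer representable values, so the round-trip error grows as the width shrinks. That error is the "quality cost" paid for the smaller file.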
| Level | Bits | Memory vs quality |
|---|---|---|
| f16 / fp16 | 16-bit | Full quality, highest memory usage |
| q8_0 | 8-bit | Near-identical quality, ~50% less memory than f16 |
| q4_K_M | 4-bit (medium) | Small quality loss, very practical for local use |
| q4_K_S | 4-bit (small) | Slightly lower quality than _M, smaller file |
| q3_K_M | 3-bit | Noticeable quality degradation, very small |
For most local use cases, q4_K_M is the standard starting point. It needs roughly a quarter of the memory of f16, with minimal quality loss. If you have enough VRAM or RAM, q8_0 is worth considering for tasks where quality matters most.
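One way to turn this advice into a rule of thumb is to pick the highest-precision level whose estimated size fits your memory budget. The sketch below uses the nominal bit widths from the table. The 1.2 overhead factor and the name `pick_quantization` are assumptions for illustration, not measured values or an existing API.

```python
from typing import Optional

# Nominal bits per weight, taken from the table above. Real files are
# somewhat larger because they also store scaling metadata.
NOMINAL_BITS = {"f16": 16, "q8_0": 8, "q4_K_M": 4, "q3_K_M": 3}

# Assumed headroom for context cache and runtime buffers (a guess; tune it).
OVERHEAD_FACTOR = 1.2


def pick_quantization(params_billions: float, budget_gb: float) -> Optional[str]:
    """Return the highest-precision level whose estimate fits the budget."""
    for level, bits in sorted(NOMINAL_BITS.items(), key=lambda kv: -kv[1]):
        needed_gb = params_billions * bits / 8 * OVERHEAD_FACTOR
        if needed_gb <= budget_gb:
            return level
    return None  # nothing fits; consider a smaller model


for budget in (4, 8, 16, 24):
    print(f"7B with {budget} GB free: {pick_quantization(7, budget)}")
```

Under these assumptions a 7B model lands on q4_K_M with 8 GB free and on q8_0 with 16 GB free, which matches the guidance above.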
Other parts you may see
- Context window: 128k in a name like phi3:mini-128k-instruct means the model can process up to roughly 128,000 tokens of context.
- Version tags: v0.2 or v2 mark minor model revisions within the same family.
- turbo: usually a faster, more efficient variant of a larger model.
- mini / small / large: informal size labels used by some families instead of exact parameter counts.
Examples
| Name | Family | Params | Variant | Quantization |
|---|---|---|---|---|
| qwen2.5:7b-instruct-q4_K_M | Qwen 2.5 | 7B | Instruct | 4-bit medium |
| llama3.2:3b-instruct-q4_K_M | Llama 3.2 | 3B | Instruct | 4-bit medium |
| mistral:7b-instruct-v0.2 | Mistral | 7B | Instruct | not specified in name |
| gemma2:9b-instruct-q8_0 | Gemma 2 | 9B | Instruct | 8-bit |
| phi3:mini-128k-instruct | Phi-3 | ~3.8B | Instruct | not specified in name |
| deepseek-r1:7b | DeepSeek R1 | 7B | (reasoning) | not specified in name |
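The breakdown in this table can be approximated with a small parser. The sketch below assumes names follow the `family:part-part-part` layout used throughout this page and recognises only the part types described above. The `ModelName` class and `parse_model_name` function are hypothetical names, and anything the heuristics do not recognise is silently ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional

VARIANTS = {"base", "instruct", "chat", "code", "vision"}
SIZE_LABELS = {"mini", "small", "large"}


@dataclass
class ModelName:
    family: str
    params: Optional[str] = None
    variant: Optional[str] = None
    quantization: Optional[str] = None
    context: Optional[str] = None
    revision: Optional[str] = None


def parse_model_name(name: str) -> ModelName:
    """Split a name like 'qwen2.5:7b-instruct-q4_K_M' into labeled parts."""
    family, _, tag = name.partition(":")
    parsed = ModelName(family=family)
    for part in (tag.split("-") if tag else []):
        lower = part.lower()
        if re.fullmatch(r"\d+(\.\d+)?b", lower) or lower in SIZE_LABELS:
            parsed.params = part          # e.g. 7b, 70b, mini
        elif lower in VARIANTS:
            parsed.variant = part         # e.g. instruct
        elif re.fullmatch(r"q\d\w*|f16|fp16", lower):
            parsed.quantization = part    # e.g. q4_K_M, q8_0
        elif re.fullmatch(r"\d+k", lower):
            parsed.context = part         # e.g. 128k
        elif re.fullmatch(r"v\d+(\.\d+)*", lower):
            parsed.revision = part        # e.g. v0.2
    return parsed


for example in ("qwen2.5:7b-instruct-q4_K_M", "phi3:mini-128k-instruct", "deepseek-r1:7b"):
    print(parse_model_name(example))
```

A field left as `None` means only that the name does not state it; it says nothing about what the downloaded file actually contains.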
A 7b-instruct-q4_K_M model is a good starting point on most machines: it handles a wide range of tasks, runs reasonably fast, and is light enough to fit on consumer hardware.