Reading a Model Name

Model names like qwen2.5:7b-instruct-q4_K_M or llama3.2:3b-instruct-q4_K_M look cryptic, but each part tells you something specific about the model. Once you know the pattern, you can tell what you are getting before you download anything.

The anatomy of a model name

Taking qwen2.5:7b-instruct-q4_K_M as an example:

qwen2.5 : 7b - instruct - q4_K_M
  │        │      │           │
  │        │      │           └── quantization level
  │        │      └────────────── variant (how it was fine-tuned)
  │        └───────────────────── parameter count
  └────────────────────────────── model family and version
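
The breakdown above can be sketched as a small parser. This is a minimal illustration that assumes names follow the family:params-variant-quantization pattern shown here; the `ModelName` type, the `parse_model_name` function, and both regular expressions are hypothetical helpers written for this example, and real names are not always this uniform.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    family: str
    params: Optional[str]
    variant: Optional[str]
    quantization: Optional[str]


# Assumed patterns for this sketch only; they are not an official grammar.
PARAMS_RE = re.compile(r"^\d+(\.\d+)?b$", re.IGNORECASE)      # e.g. 7b, 3b, 70b
QUANT_RE = re.compile(r"^(q\d\w*|f16|fp16)$", re.IGNORECASE)  # e.g. q4_K_M, q8_0


def parse_model_name(name: str) -> ModelName:
    """Split a name of the form family:params-variant-quantization."""
    family, _, tag = name.partition(":")
    parts = tag.split("-") if tag else []
    params = variant = quantization = None
    for part in parts:
        if PARAMS_RE.match(part):
            params = part
        elif QUANT_RE.match(part):
            quantization = part
        elif variant is None:
            # The first unrecognized part is treated as the variant.
            variant = part
    return ModelName(family, params, variant, quantization)


print(parse_model_name("qwen2.5:7b-instruct-q4_K_M"))
```

Parts that are missing from a name come back as `None`, so `deepseek-r1:7b` yields a family and a parameter count but no variant or quantization.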

Model family and version

The first part identifies the model family and its generation. qwen2.5 is version 2.5 of the Qwen family (from Alibaba). llama3.2 is version 3.2 of Meta’s Llama family. Newer versions of the same family are usually more capable and more efficient.

Some well-known families:

Llama (Meta)
Qwen (Alibaba)
Mistral (Mistral AI)
Gemma (Google)
Phi (Microsoft)
DeepSeek (DeepSeek AI)

Parameter count

7b, 3b, or 70b gives the number of parameters in the model, in billions. Parameters are the numerical values learned during training that encode the model’s knowledge and capabilities. They are stored in large tensors (multi-dimensional arrays).

Size	Typical use	What to expect
1B – 3B	Edge devices, fast experimentation	Fast, lightweight, limited reasoning
7B – 9B	Everyday local use	Good balance of speed and capability
13B – 14B	Better quality on a single GPU	Noticeably more capable, slower
70B+	High-end workloads	Near-frontier quality, needs serious hardware

More parameters generally mean better capability but slower inference and higher memory requirements. A 7B model typically fits in 4–8 GB of RAM, depending on quantization (explained below), while a 70B model needs 40 GB or more even at 4-bit quantization.
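
The memory figures follow from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below is a rough estimate for the weights alone; the function name `estimate_weight_memory_gb` is made up for this example, and real usage is higher because the runtime also needs room for the context cache and other overhead, and quantized formats store a little extra data beyond the nominal bit width.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare a 7B model at three precisions with a 70B model at 4-bit.
for params, bits in [(7, 16), (7, 8), (7, 4), (70, 4)]:
    size = estimate_weight_memory_gb(params, bits)
    print(f"{params}B at {bits}-bit: ~{size:.1f} GB for weights")
```

A 7B model at 4 bits comes out at about 3.5 GB of weights and a 70B model at about 35 GB, which is why the practical requirements land somewhat above those numbers.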

Variant

The variant tells you what kind of fine-tuning was applied after the initial training.

Variant	Meaning
(none) or `base`	Raw pretrained model, not fine-tuned for conversation
`instruct`	Fine-tuned to follow instructions
`chat`	Fine-tuned for conversational use (similar to instruct)
`code`	Fine-tuned for code generation and understanding
`vision`	Multimodal: can process images as well as text

If you want a model you can talk to or give tasks to, you almost always want instruct or chat. A base model will complete your text rather than answer your question, which is useful for research and fine-tuning but usually not for direct use.

Quantization

Quantization reduces the precision of the model’s weights to save memory and speed up inference, at the cost of some quality. This is why the same model can come in several sizes on disk.

Level	Bits	Memory vs quality
`f16` / `fp16`	16-bit	Full quality, highest memory usage
`q8_0`	8-bit	Near-identical quality, ~50% less memory than f16
`q4_K_M`	4-bit (medium)	Small quality loss, very practical for local use
`q4_K_S`	4-bit (small)	Slightly lower quality than `_M`, smaller file
`q3_K_M`	3-bit	Noticeable quality degradation, very small

For most local use cases, q4_K_M is the standard starting point. It needs roughly a quarter to a third of the memory that f16 does, with minimal quality loss. If you have enough VRAM or RAM, q8_0 is worth considering for tasks where quality matters most.
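
To make the trade-off concrete, here is a deliberately simplified sketch of the idea behind quantization: store small integers plus one shared scale factor instead of full-precision floats. It is not the actual algorithm behind formats such as q4_K_M, which use more elaborate block layouts; the function names and the sample weights are invented for illustration.

```python
def quantize_block(weights: list[float], bits: int = 8) -> tuple[float, list[int]]:
    """Map floats to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest > 0 else 1.0
    return scale, [round(w / scale) for w in weights]


def dequantize_block(scale: float, quantized: list[int]) -> list[float]:
    """Recover approximate floats from the stored integers and scale."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.31, 0.07, -0.98, 0.44, 0.002, -0.25]  # made-up values
for bits in (8, 4):
    scale, quantized = quantize_block(weights, bits)
    restored = dequantize_block(scale, quantized)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: ints={quantized}, worst error={worst:.4f}")
```

With fewer bits there are fewer integer levels to choose from, so the rounding error per weight grows. That is the quality cost the table describes.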

Other parts you may see

Context window: 128k in a name like phi3:mini-128k-instruct means the model can handle up to about 128,000 tokens of context (the prompt plus the generated output) at once.
Version tags: v0.2 or v2 marks a minor revision within the same family.
turbo: usually a faster, more efficient variant of a larger model.
mini / small / large: informal size labels used by some families instead of exact parameter counts.
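
These extra parts can be recognized with a few simple checks. The sketch below is illustrative only: `describe_extra_part` and its patterns are assumptions made for this example, and they cover just the labels listed above.

```python
import re

SIZE_LABELS = {"mini", "small", "large"}


def describe_extra_part(part: str) -> str:
    """Label one hyphen-separated piece of a model tag (illustrative only)."""
    lowered = part.lower()
    if m := re.fullmatch(r"(\d+)k", lowered):
        # "128k" is read as roughly 128 thousand tokens.
        return f"context window of about {int(m.group(1)) * 1000:,} tokens"
    if re.fullmatch(r"v\d+(\.\d+)*", lowered):
        return "version tag"
    if lowered == "turbo":
        return "faster variant"
    if lowered in SIZE_LABELS:
        return "informal size label"
    return "unrecognized"


for part in "mini-128k-instruct".split("-"):
    print(part, "->", describe_extra_part(part))
```

Anything the function does not recognize, such as instruct here, is left for the main parts of the name described earlier.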

Examples

Name	Family	Params	Variant	Quantization
`qwen2.5:7b-instruct-q4_K_M`	Qwen 2.5	7B	Instruct	4-bit medium
`llama3.2:3b-instruct-q4_K_M`	Llama 3.2	3B	Instruct	4-bit medium
`mistral:7b-instruct-v0.2`	Mistral	7B	Instruct	not stated in the name
`gemma2:9b-instruct-q8_0`	Gemma 2	9B	Instruct	8-bit
`phi3:mini-128k-instruct`	Phi-3	~3.8B	Instruct	not stated in the name
`deepseek-r1:7b`	DeepSeek R1	7B	(reasoning)	not stated in the name

A 7b-instruct-q4_K_M model is a good starting point on most machines. It suits a wide range of tasks, and it is fast and light enough to run on consumer hardware.