Hugging Face
Every time you run ollama pull llama3.2 or deploy a model with vLLM, that model came from somewhere. For the vast majority of open-source models, that place is Hugging Face.
Knowing how to navigate it saves time and avoids mistakes, especially around licensing, model size, and compatibility.
What Hugging Face is
Hugging Face is a platform with three main things:
- Models: a hub of hundreds of thousands of pre-trained models, openly downloadable
- Datasets: data used to train or evaluate models
- Spaces: hosted demos where you can try models directly in a browser
The model hub hosts everything from small embedding models to large language models, image generators, speech recognition systems, and more.
Finding a model
The model hub lets you filter by task, language, license, and size. For LLMs specifically, the most useful filters are:
- Task: “Text Generation” covers conversational and completion models
- Library: filter by `transformers`, `gguf`, or `ollama`, depending on how you need to run it
- License: important if you plan to use the model commercially
The search also supports filtering by organization. Most well-known models come from Meta (meta-llama), Google (google), Mistral AI (mistralai), and the Hugging Face team itself (HuggingFaceTB).
Reading a model card
Every model has a model card: a page describing what the model is, how it was trained, what it is good at, and how to use it. Before downloading anything, the model card is worth reading carefully.
Key things to check:
| Section | What to look for |
|---|---|
| License | Can you use it commercially? Does it restrict redistribution? |
| Model size | Number of parameters and disk size (affects memory requirements) |
| Context length | Maximum number of tokens the model can process at once |
| Intended use | What tasks it was trained for |
| Limitations | Known weaknesses, biases, failure modes |
| Evaluation | Benchmark scores to compare against other models |
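The model size row deserves special attention, because it determines whether a model fits in your memory at all. A common back-of-envelope rule is weight memory ≈ parameter count × bytes per parameter. The helper below is our own sketch of that arithmetic, not a Hugging Face API:

```python
# Rough rule of thumb (an illustration, not from any library):
# memory to load the weights ≈ parameter count × bytes per parameter.
# Common precisions: fp32 = 4 bytes, fp16/bf16 = 2 bytes, int8 = 1, 4-bit ≈ 0.5.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes."""
    return n_params * bytes_per_param / 1e9

# A 135M-parameter model in fp16 needs roughly 0.27 GB for weights alone:
print(round(weight_memory_gb(135e6, 2), 2))
# A 3B-parameter model in fp16 needs roughly 6 GB:
print(round(weight_memory_gb(3e9, 2), 1))
```

Actual runtime memory is higher because of activations and the KV cache, but this estimate is a quick first filter when browsing the hub.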
The SmolLM2-135M model used in the LLM Internals content is a good example: its card clearly states the architecture, training data, and benchmark results.
Downloading models
With the CLI
The hf CLI tool is the simplest way to download a model:
```shell
pip install huggingface_hub
```

Download a full model to a local directory:

```shell
hf download HuggingFaceTB/SmolLM2-135M --local-dir ./smollm
```

For large models you can download only the files you need rather than the full repository:

```shell
hf download meta-llama/Llama-3.2-3B-Instruct --include "*.safetensors" --local-dir ./llama
```

Most models are public and require no account. Some require accepting a license agreement on the website before downloading. For those, you need to authenticate first:
```shell
hf login
```

With the Python library
The huggingface_hub library gives programmatic access to the same functionality:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-135M",
    local_dir="./smollm",
)
```

This is useful when you need to automate downloads as part of a pipeline or initContainer.
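As one sketch of that initContainer pattern in Kubernetes, the weights can be fetched into a shared volume before the serving container starts. All names here (image, volume, mount path) are placeholders, not a prescribed setup:

```yaml
# Hypothetical initContainer that pre-downloads model weights into a
# shared volume; the serving container then reads them from /models.
initContainers:
  - name: fetch-model
    image: python:3.12-slim            # placeholder image
    command: ["sh", "-c"]
    args:
      - pip install huggingface_hub &&
        hf download HuggingFaceTB/SmolLM2-135M --local-dir /models/smollm
    volumeMounts:
      - name: model-cache               # shared with the serving container
        mountPath: /models
```

This keeps model download out of the serving image, so the same image can serve different models.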
Model formats
Models on Hugging Face are stored in different formats depending on how they are meant to be used:
| Format | Used by | Notes |
|---|---|---|
| safetensors | PyTorch, transformers | Standard format for most modern models |
| GGUF | Ollama, llama.cpp | Quantized format optimized for CPU inference |
| ONNX | Inference runtimes | Portable format for deployment |
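In practice you can usually tell the format from the weight file's extension. The mapping below is our own illustration of the table, not part of any Hugging Face API:

```python
# Illustrative mapping from weight-file extension to format/runtime.
# Mirrors the table above; not an official API.
FORMAT_BY_EXTENSION = {
    ".safetensors": "safetensors (PyTorch / transformers)",
    ".gguf": "GGUF (Ollama / llama.cpp)",
    ".onnx": "ONNX (portable inference runtimes)",
}

def guess_format(filename: str) -> str:
    """Return the likely format of a weight file, based on its extension."""
    for ext, fmt in FORMAT_BY_EXTENSION.items():
        if filename.lower().endswith(ext):
            return fmt
    return "unknown"

print(guess_format("model-00001-of-00002.safetensors"))
print(guess_format("SmolLM2-135M.Q4_K_M.gguf"))
```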
When you run ollama pull, Ollama fetches a GGUF version of the model from its own registry. Many of those models originate from Hugging Face, converted to GGUF format.