Hugging Face

Every time you run ollama pull llama3.2 or deploy a model with vLLM, that model came from somewhere. For the vast majority of open-source models, that place is Hugging Face.

Knowing how to navigate it saves time and avoids mistakes, especially around licensing, model size, and compatibility.

What Hugging Face is

Hugging Face is a platform built around three main components:

  • Models: a hub of hundreds of thousands of pre-trained models, openly downloadable
  • Datasets: data used to train or evaluate models
  • Spaces: hosted demos where you can try models directly in a browser

The model hub hosts everything from small embedding models to large language models, image generators, speech recognition systems, and more.

Finding a model

The model hub lets you filter by task, language, license, and size. For LLMs specifically, the most useful filters are:

  • Task: “Text Generation” covers conversational and completion models
  • Library: filter by transformers, gguf, or ollama depending on how you need to run it
  • License: important if you plan to use the model commercially

The search also supports filtering by organization. Most well-known models come from Meta (meta-llama), Google (google), Mistral AI (mistralai), and the Hugging Face team itself (HuggingFaceTB).

Reading a model card

Every model has a model card: a page describing what the model is, how it was trained, what it is good at, and how to use it. Before downloading anything, the model card is worth reading carefully.

Key things to check:

  • License: can you use it commercially? Does it restrict redistribution?
  • Model size: number of parameters and disk size (affects memory requirements)
  • Context length: maximum number of tokens the model can process at once
  • Intended use: what tasks it was trained for
  • Limitations: known weaknesses, biases, failure modes
  • Evaluation: benchmark scores to compare against other models
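Model size translates directly into memory: as a rough rule of thumb, holding the weights alone takes the parameter count times the bytes per parameter for the chosen precision. A minimal sketch of that back-of-envelope calculation (the helper name is illustrative, and the estimate ignores activation and KV-cache overhead):

```python
def weight_memory_gb(n_params: int, bytes_per_param: float) -> float:
    """Rough memory needed just to hold the model weights,
    ignoring activation and KV-cache overhead."""
    return n_params * bytes_per_param / 1024**3

# SmolLM2-135M in fp16 (2 bytes per parameter): ~0.25 GB
print(round(weight_memory_gb(135_000_000, 2), 2))

# Llama-3.2-3B in fp16: ~5.6 GB
print(round(weight_memory_gb(3_000_000_000, 2), 1))
```

This is why the parameter count on the model card is usually the first thing to check: a 3B model in fp16 already needs several gigabytes before any inference overhead.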

The SmolLM2-135M model used in the LLM Internals content is a good example: its card clearly states the architecture, training data, and benchmark results.

Downloading models

With the CLI

The hf CLI tool, installed as part of the huggingface_hub package, is the simplest way to download a model:

pip install huggingface_hub

Download a full model to a local directory:

hf download HuggingFaceTB/SmolLM2-135M --local-dir ./smollm

For large models you can download only the files you need rather than the full repository:

hf download meta-llama/Llama-3.2-3B-Instruct --include "*.safetensors" --local-dir ./llama

Most models are public and require no account. Some require accepting a license agreement on the website before downloading. For those, you need to authenticate first:

hf login

With the Python library

The huggingface_hub library gives programmatic access to the same functionality:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-135M",
    local_dir="./smollm"
)

This is useful when you need to automate downloads as part of a pipeline or initContainer.
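Such a pipeline step might wrap snapshot_download and restrict it to the weight and config files, mirroring the CLI's --include flag with the allow_patterns parameter. A sketch (the function name and pattern list are illustrative choices, not a fixed convention):

```python
from pathlib import Path

def fetch_weights(repo_id: str, dest: str) -> Path:
    """Download only the safetensors weights and JSON config/tokenizer
    files for repo_id into dest, returning the local path."""
    # Imported lazily so the sketch reads standalone;
    # requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        local_dir=dest,
        # Skip READMEs, original PyTorch checkpoints, etc.
        allow_patterns=["*.safetensors", "*.json"],
    )
    return Path(path)

# Usage (performs a real download):
# fetch_weights("HuggingFaceTB/SmolLM2-135M", "./smollm")
```

Filtering by pattern matters in automation: many repositories ship both safetensors and legacy PyTorch .bin checkpoints, and downloading both doubles the transfer for no benefit.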

Model formats

Models on Hugging Face are stored in different formats depending on how they are meant to be used:

  • safetensors: used by PyTorch and transformers; the standard format for most modern models
  • GGUF: used by Ollama and llama.cpp; a quantized format optimized for CPU inference
  • ONNX: used by inference runtimes; a portable format for deployment

When you run ollama pull, Ollama fetches a GGUF version of the model from its own registry. Many of those models originate from Hugging Face, converted to GGUF format.
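In practice, the difference between these formats comes down largely to bytes per parameter. A rough comparison of on-disk weight size (the bits-per-parameter figures are approximations: real files add metadata, and quantized GGUF files keep some tensors at higher precision):

```python
def approx_size_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate on-disk size of the weights alone."""
    return n_params * bits_per_param / 8 / 1024**3

n = 3_000_000_000  # a 3B-parameter model
# Bits per parameter are approximate averages for each format.
for name, bits in [("fp16 safetensors", 16),
                   ("Q8_0 GGUF", 8.5),
                   ("Q4_K_M GGUF", 4.8)]:
    print(f"{name}: ~{approx_size_gb(n, bits):.1f} GB")
```

The same 3B model shrinks from several gigabytes in fp16 to under 2 GB at 4-bit quantization, which is why GGUF is the format of choice for running models on CPUs and laptops.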

Resources