Hugging Face

Every time you run ollama pull llama3.2 or deploy a model with vLLM, that model came from somewhere. For the vast majority of open-source models, that place is Hugging Face.

Knowing how to navigate it saves time and avoids mistakes, especially around licensing, model size, and compatibility.

What Hugging Face is

Hugging Face is a platform built around three main components:

  • Models: a hub of hundreds of thousands of pre-trained models, openly downloadable
  • Datasets: data used to train or evaluate models
  • Spaces: hosted demos where you can try models directly in a browser

The model hub hosts everything from small embedding models to large language models, image generators, speech recognition systems, and more.

Finding a model

The model hub lets you filter by task, language, license, and size. For LLMs specifically, the most useful filters are:

  • Task: “Text Generation” covers conversational and completion models
  • Library: filter by transformers, gguf, or ollama depending on how you need to run it
  • License: important if you plan to use the model commercially

The search also supports filtering by organization. Most well-known models come from Meta (meta-llama), Google (google), Mistral AI (mistralai), and the Hugging Face team itself (HuggingFaceTB).

Reading a model card

Every model has a model card: a page describing what the model is, how it was trained, what it is good at, and how to use it. Before downloading anything, the model card is worth reading carefully.

Key things to check:

  • License: can you use it commercially? Does it restrict redistribution?
  • Model size: number of parameters and disk size (affects memory requirements)
  • Context length: maximum number of tokens the model can process at once
  • Intended use: what tasks it was trained for
  • Limitations: known weaknesses, biases, failure modes
  • Evaluation: benchmark scores to compare against other models
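Model size translates directly into memory: as a rough rule of thumb, holding the weights alone takes the parameter count times the bytes per parameter for the chosen precision. A minimal sketch of that back-of-envelope calculation (the helper name is illustrative, and the estimate ignores activation and KV-cache overhead):

```python
def weight_memory_gb(n_params: int, bytes_per_param: float) -> float:
    """Rough memory needed just to hold the model weights,
    ignoring activation and KV-cache overhead."""
    return n_params * bytes_per_param / 1024**3

# SmolLM2-135M in fp16 (2 bytes per parameter): ~0.25 GB
print(round(weight_memory_gb(135_000_000, 2), 2))

# Llama-3.2-3B in fp16: ~5.6 GB
print(round(weight_memory_gb(3_000_000_000, 2), 1))
```

This is why the parameter count on the model card is usually the first thing to check: a 3B model in fp16 already needs several gigabytes before any inference overhead.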

The SmolLM2-135M model used in the LLM Internals content is a good example: its card clearly states the architecture, training data, and benchmark results.

Downloading models

With the CLI

The hf CLI tool, installed as part of the huggingface_hub package, is the simplest way to download a model:

pip install huggingface_hub

Download a full model to a local directory:

hf download HuggingFaceTB/SmolLM2-135M --local-dir ./smollm

For large models you can download only the files you need rather than the full repository:

hf download meta-llama/Llama-3.2-3B-Instruct --include "*.safetensors" --local-dir ./llama

Most models are public and require no account. Some require accepting a license agreement on the website before downloading. For those, you need to authenticate first:

hf login

With the Python library

The huggingface_hub library gives programmatic access to the same functionality:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="HuggingFaceTB/SmolLM2-135M",
    local_dir="./smollm"
)

This is useful when you need to automate downloads as part of a pipeline or initContainer.
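Such a pipeline step might wrap snapshot_download and restrict it to the weight and config files, mirroring the CLI's --include flag with the allow_patterns parameter. A sketch (the function name and pattern list are illustrative choices, not a fixed convention):

```python
from pathlib import Path

def fetch_weights(repo_id: str, dest: str) -> Path:
    """Download only the safetensors weights and JSON config/tokenizer
    files for repo_id into dest, returning the local path."""
    # Imported lazily so the sketch reads standalone;
    # requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        local_dir=dest,
        # Skip READMEs, original PyTorch checkpoints, etc.
        allow_patterns=["*.safetensors", "*.json"],
    )
    return Path(path)

# Usage (performs a real download):
# fetch_weights("HuggingFaceTB/SmolLM2-135M", "./smollm")
```

Filtering by pattern matters in automation: many repositories ship both safetensors and legacy PyTorch .bin checkpoints, and downloading both doubles the transfer for no benefit.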

Model formats

Models on Hugging Face are stored in different formats depending on how they are meant to be used:

  • safetensors: used by PyTorch and transformers; the standard format for most modern models
  • GGUF: used by Ollama and llama.cpp; a quantized format optimized for CPU inference
  • ONNX: used by inference runtimes; a portable format for deployment

When you run ollama pull, Ollama fetches a GGUF version of the model from its own registry. Many of those models originate from Hugging Face, converted to GGUF format.
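In practice, the difference between these formats comes down largely to bytes per parameter. A rough comparison of on-disk weight size (the bits-per-parameter figures are approximations: real files add metadata, and quantized GGUF files keep some tensors at higher precision):

```python
def approx_size_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate on-disk size of the weights alone."""
    return n_params * bits_per_param / 8 / 1024**3

n = 3_000_000_000  # a 3B-parameter model
# Bits per parameter are approximate averages for each format.
for name, bits in [("fp16 safetensors", 16),
                   ("Q8_0 GGUF", 8.5),
                   ("Q4_K_M GGUF", 4.8)]:
    print(f"{name}: ~{approx_size_gb(n, bits):.1f} GB")
```

The same 3B model shrinks from several gigabytes in fp16 to under 2 GB at 4-bit quantization, which is why GGUF is the format of choice for running models on CPUs and laptops.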

Resources