# Tools Overview

The AI tooling landscape moves fast, and many tools seem to serve the same purpose. This page gives a quick summary of what each tool does and where it fits.
## Model hub
| Tool | What it does |
|---|---|
| HuggingFace | The main hub for open-source models. Most tools in this list can pull models directly from it. |
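As a quick illustration, here is a minimal sketch of pulling a model snapshot from the Hub with the `huggingface_hub` library; the repo id is an arbitrary example, not a recommendation:

```python
# Sketch: download all files of a model repo from the HuggingFace Hub.
# Assumes `pip install huggingface_hub`; the repo id is an illustrative example.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
print(f"Model files downloaded to: {local_dir}")
```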
## Running models locally

The simplest starting point: your local machine is all you need.
| Tool | What it does |
|---|---|
| Ollama | Runs open-source models locally with a simple CLI and API. |
| llama.cpp | Runs quantized models efficiently on CPU. This is the engine behind many local tools, including Ollama. |
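For instance, here is a minimal sketch of calling Ollama's local HTTP API, assuming the server is running on its default port and a model has already been pulled (`llama3` is just an example name):

```python
# Sketch: query a locally running Ollama instance over its HTTP API.
# Assumes Ollama is running on the default port 11434 and the model has
# been pulled beforehand (e.g. `ollama pull llama3`).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```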
## Serving models in production
When you need to handle multiple users at once, local tools are not enough.
| Tool | What it does |
|---|---|
| vLLM | High-throughput inference server built for GPU. The most widely used option for self-hosted production serving. |
| HuggingFace TGI | HuggingFace’s own inference server. Similar positioning to vLLM. |
| llm-d | Distributed inference on Kubernetes, designed for large models that need multiple GPUs or nodes. |
| LiteLLM | A proxy that exposes a unified API in front of many backends (OpenAI, Anthropic, vLLM, Ollama…). Useful when you want to switch providers without changing your application code. |
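One practical consequence: vLLM and a LiteLLM proxy both expose OpenAI-compatible endpoints, so the same client code can target either by changing the base URL. A minimal sketch, assuming a server is already running at `localhost:8000` (the URL and model name are illustrative):

```python
# Sketch: talk to an OpenAI-compatible server (vLLM, LiteLLM proxy, ...)
# with the standard OpenAI client. Assumes a server is already running at
# the base_url below; URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```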
## Building on top of models
Libraries and frameworks that help you build applications on top of models.
| Tool | What it does |
|---|---|
| LangChain | Orchestration framework for chaining model calls, managing prompts, and integrating tools. |
| LlamaIndex | Focused on retrieval-augmented generation (RAG): connecting models to your own data. |
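As a rough sketch of the RAG pattern LlamaIndex targets, the snippet below indexes a local folder and queries it. It assumes the library's default OpenAI-backed LLM and embeddings (an API key in the environment) and a hypothetical `./data` directory of documents:

```python
# Sketch: minimal RAG with LlamaIndex -- index local files, then query them.
# Assumes `pip install llama-index`, OPENAI_API_KEY set in the environment
# (the default backend), and a "./data" folder containing documents.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about pricing?"))
```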