Tools Overview

The AI tooling landscape moves fast, and many tools appear to serve the same purpose. This page summarizes what each tool does and where it fits.

Model hub

| Tool | What it does |
| --- | --- |
| Hugging Face | The main hub for open-source models. Most tools in this list can pull models directly from it. |

Running models locally

The simplest starting point: your local machine is all you need.

| Tool | What it does |
| --- | --- |
| Ollama | Runs open-source models locally with a simple CLI and API. |
| llama.cpp | Runs quantized models efficiently on CPU. This is the engine behind many local tools, including Ollama. |
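Beyond the CLI, Ollama exposes a local HTTP API (by default at `http://localhost:11434`). As a sketch, the snippet below builds a request for its `/api/generate` endpoint; the helper name and the model name are illustrative, and actually sending the request assumes an Ollama daemon is running.

```python
import json
import urllib.request

def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a request to Ollama's local generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ollama_request("llama3.2", "Why is the sky blue?")
# With a running Ollama daemon: urllib.request.urlopen(req)
```

Because the API is plain HTTP with JSON bodies, any language or tool that can make web requests can drive a local model this way.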

Serving models in production

When you need to handle multiple users at once, local tools are not enough.

| Tool | What it does |
| --- | --- |
| vLLM | High-throughput inference server built for GPUs. The most widely used option for self-hosted production serving. |
| Hugging Face TGI | Hugging Face's own inference server (Text Generation Inference). Similar positioning to vLLM. |
| llm-d | Distributed inference on Kubernetes, designed for large models that need multiple GPUs or nodes. |
| LiteLLM | A proxy that exposes a unified API in front of many backends (OpenAI, Anthropic, vLLM, Ollama…). Useful when you want to switch providers without changing your application code. |
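What makes switching between these servers practical is that vLLM, TGI, Ollama, and LiteLLM can all expose an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch, assuming the common default ports for each server (the helper name and model name are examples): the request body stays identical, and only the base URL changes.

```python
import json

def chat_request_body(model: str, user_message: str) -> str:
    """The OpenAI-style chat completion body these servers all accept."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })

# Same body, different servers (URLs are common defaults, not guarantees):
targets = {
    "vLLM":    "http://localhost:8000/v1/chat/completions",
    "Ollama":  "http://localhost:11434/v1/chat/completions",
    "LiteLLM": "http://localhost:4000/v1/chat/completions",
}
body = chat_request_body("llama3.2", "Hello!")
```

This shared request shape is why application code written against one backend can usually be pointed at another by changing a single configuration value.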

Building on top of models

Libraries and frameworks that help you build applications on top of models.

| Tool | What it does |
| --- | --- |
| LangChain | Orchestration framework for chaining model calls, managing prompts, and integrating tools. |
| LlamaIndex | Focused on retrieval-augmented generation (RAG): connecting models to your own data. |