AI & LLMs
This section collects what I learned while exploring the AI domain. Some articles started as a question I asked myself and grew out of many back-and-forth conversations with an AI: rephrasing, refining, and digging into the details. Others are hands-on tutorials built from experiments.
Understand the foundations
A plain-language overview: tokens, attention, MLP blocks, sampling, and what it all means for the tools you use
The deep dive: tensor inventory, Q/K/V mechanics, the full forward pass, KV cache, and prefill vs decode
A quick map of AI tooling: Ollama, vLLM, LiteLLM, llm-d, and where each one fits
Where open-source models live: how to find a model, read a model card, and download what you need
What qwen2.5:7b-instruct-q4_K_M actually means: family, parameter count, variant, and quantization explained
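As a small taste of that last topic, here is a minimal sketch that splits a tag like qwen2.5:7b-instruct-q4_K_M into the four parts named above. It assumes the positional layout family:size-variant-quantization, which is a convention rather than a specification, so real tags may omit or reorder fields. The names ModelTag and parse_model_tag are hypothetical and only serve the illustration.

```python
from typing import NamedTuple, Optional


class ModelTag(NamedTuple):
    """The four parts of a model tag, in the order they appear."""
    family: str
    parameters: Optional[str]
    variant: Optional[str]
    quantization: Optional[str]


def parse_model_tag(tag: str) -> ModelTag:
    """Split a tag, assuming the layout family:size-variant-quantization.

    This is a positional guess, not a validated grammar: any field
    missing from the tag is returned as None.
    """
    # Everything before the first colon is the model family.
    family, _, rest = tag.partition(":")
    # The remainder is dash-separated: size, variant, quantization.
    parts = rest.split("-") if rest else []
    parameters = parts[0] if len(parts) > 0 else None
    variant = parts[1] if len(parts) > 1 else None
    # Rejoin any leftover pieces so the quantization label stays whole.
    quantization = "-".join(parts[2:]) if len(parts) > 2 else None
    return ModelTag(family, parameters, variant, quantization)


if __name__ == "__main__":
    print(parse_model_tag("qwen2.5:7b-instruct-q4_K_M"))
```

Splitting by position keeps the sketch short, at the cost of misreading tags that skip a field in the middle, such as a tag with a size and a quantization but no variant.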
Run locally first
The fastest way to get started with LLMs is to run them on your own machine. No API keys, no subscription, full control over the model.