From Prompt to Tokens

This article explains what happens internally when an LLM performs inference, transforming a prompt into generated tokens.

Topics covered include:

- prefill and decode
- hidden states
- attention
- MLP
- KV cache
- transformer layers
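To make the prefill, decode, and KV cache topics concrete, here is a minimal sketch in pure Python. It is not taken from the article and is not how a real model is implemented. It assumes a hypothetical three-token vocabulary (`EMBED`), a single attention head, identity projections (query, key, and value all equal the embedding), and no MLP or stacked layers. A real engine also processes the whole prompt in parallel during prefill, whereas this toy loops over it token by token.

```python
import math

# Hypothetical toy vocabulary: token id -> 2-dimensional embedding.
EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def step(token, cache):
    """Process one token: store its key/value in the cache, then attend."""
    x = EMBED[token]
    # Assumption: identity projections, so key = value = query = embedding.
    cache["k"].append(x)
    cache["v"].append(x)
    # Attending only over the cache gives causal attention for free:
    # the token sees itself and everything before it, nothing after.
    return attend(x, cache["k"], cache["v"])


def prefill(prompt, cache):
    """Prefill: run every prompt token once, filling the KV cache."""
    hidden = None
    for token in prompt:
        hidden = step(token, cache)
    return hidden  # hidden state of the last prompt token


def pick_token(hidden):
    """Greedy choice: the token whose embedding best matches the hidden state."""
    return max(EMBED, key=lambda t: sum(h * e for h, e in zip(hidden, EMBED[t])))


def generate(prompt, n_new):
    """Prefill the prompt, then decode n_new tokens one at a time."""
    cache = {"k": [], "v": []}
    hidden = prefill(prompt, cache)
    generated = []
    for _ in range(n_new):
        token = pick_token(hidden)
        generated.append(token)
        # Decode: only the new token is processed; earlier keys/values
        # are reused from the cache instead of being recomputed.
        hidden = step(token, cache)
    return generated, len(cache["k"])


if __name__ == "__main__":
    tokens, cache_size = generate([0, 1], 3)
    print(tokens, cache_size)
```

The point of the sketch is the shape of the loop: the cache grows by one key/value pair per token, so each decode step costs one new query against the stored entries rather than a full pass over the sequence.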

This article was published on the Exoscale blog: Inside an LLM: From Prompt to Tokens