From Prompt to Tokens
This article explains what happens internally when an LLM performs inference, transforming a prompt into generated tokens.
Topics covered include:
- prefill and decode
- hidden states
- attention
- MLP
- KV cache
- transformer layers
This article was published on the Exoscale blog: Inside an LLM: From Prompt to Tokens
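
As a rough illustration of how prefill, decode, and the KV cache relate, here is a minimal NumPy sketch. It is not taken from the article: it assumes a single attention head in a single layer, and leaves out the MLP, residual connections, normalization, and token sampling. The names `prefill`, `decode_step`, and `attend`, the toy hidden size `d = 8`, and the random weights and inputs are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size (assumption, real models are far larger)

# Random projection matrices standing in for trained attention weights.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V, causal):
    scores = q @ K.T / np.sqrt(d)
    if causal:
        # Query i may only look at keys 0..i, so mask the upper triangle.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V

def prefill(h):
    """Process all prompt hidden states h of shape (T, d) in one pass."""
    K, V = h @ Wk, h @ Wv
    out = attend(h @ Wq, K, V, causal=True)
    return out, (K, V)  # (K, V) is the KV cache

def decode_step(h_new, cache):
    """Process one new hidden state h_new of shape (1, d), reusing the cache."""
    K = np.concatenate([cache[0], h_new @ Wk])
    V = np.concatenate([cache[1], h_new @ Wv])
    # The new token is last, so it may attend to every cached position.
    out = attend(h_new @ Wq, K, V, causal=False)
    return out, (K, V)

prompt = rng.standard_normal((5, d))   # stand-in for 5 prompt hidden states
new_tok = rng.standard_normal((1, d))  # stand-in for the next token's state

_, cache = prefill(prompt)
out_cached, cache = decode_step(new_tok, cache)

# Recomputing everything from scratch gives the same result for the last
# position, which is why caching K and V saves work without changing output.
out_full, _ = prefill(np.concatenate([prompt, new_tok]))
print(np.allclose(out_cached[0], out_full[-1]))
```

In this sketch, prefill computes keys and values for the whole prompt at once, while each decode step only projects the newest token and appends to the cache. A real model keeps one such cache per layer and per attention head.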