From Prompt to Tokens

This article explains what happens inside a large language model (LLM) during inference, as it turns a prompt into generated tokens.

Topics covered include:

  • prefill and decode
  • hidden states
  • attention
  • MLP
  • KV cache
  • transformer layers
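To make the split between prefill and decode, and the role of the KV cache, more concrete, here is a minimal Python sketch. It is a deliberately simplified toy that assumes one transformer layer with a single attention head and leaves out the MLP, residual connections and normalization. The embedding function, the weight matrices W_Q, W_K and W_V, the greedy output head and all helper names are hypothetical stand-ins for learned parameters. They are not taken from the article or from any real model.

```python
import math

D = 4      # toy hidden size (assumption; real models use thousands)
VOCAB = 8  # toy vocabulary size (assumption)

# Hypothetical fixed projection matrices standing in for learned weights.
W_Q = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W_K = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
W_V = [[0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 0.5, 0], [0, 0, 0, 0.5]]


def embed(token_id):
    """Hypothetical deterministic embedding: token id -> D-dim hidden state."""
    return [math.sin((token_id + 1) * (i + 1)) for i in range(D)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(D)]


def prefill(prompt):
    """Prefill: process every prompt token in one pass, fill the KV cache,
    and return the attention output at the last position. Position t only
    sees keys/values 0..t (causal mask)."""
    hidden = [embed(t) for t in prompt]
    cache = {"k": [matvec(W_K, h) for h in hidden],
             "v": [matvec(W_V, h) for h in hidden]}
    outputs = [attend(matvec(W_Q, h), cache["k"][:t + 1], cache["v"][:t + 1])
               for t, h in enumerate(hidden)]
    return outputs[-1], cache


def decode_step(token_id, cache):
    """Decode: process one new token, append its K/V to the cache and attend
    over everything cached so far. Earlier K/V are reused, not recomputed."""
    h = embed(token_id)
    cache["k"].append(matvec(W_K, h))
    cache["v"].append(matvec(W_V, h))
    return attend(matvec(W_Q, h), cache["k"], cache["v"])


def next_token(out):
    """Hypothetical output head: greedily pick the vocab entry whose
    embedding has the largest dot product with the final hidden state."""
    return max(range(VOCAB),
               key=lambda t: sum(a * b for a, b in zip(out, embed(t))))


def generate(prompt, n_new):
    out, cache = prefill(prompt)       # prefill phase: whole prompt at once
    tokens = []
    for _ in range(n_new):             # decode phase: one token per step
        tok = next_token(out)
        tokens.append(tok)
        out = decode_step(tok, cache)  # feed the new token back in
    return tokens, cache


if __name__ == "__main__":
    print(generate([1, 2, 3], 4)[0])
```

The useful property to notice is that a decode step using the cache gives the same result as re-running attention over the full sequence from scratch. The cache only avoids recomputing keys and values for tokens that were already processed.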

This article was published on the Exoscale blog as "Inside an LLM: From Prompt to Tokens".