Utilumo
Explainer · 1 min read · Updated June 25, 2026

What is a token in AI language models?

Short answer

A token is the unit of text a language model actually processes. Tokens are usually subword pieces, so a common word may be one token while a rare or long word splits into several. As a rough average, English text works out to about four characters per token.

Tokens are not words

Before a model reads text, a tokenizer splits it into tokens. A token is often a whole common word, but it can also be part of a word, a single character, or a piece of punctuation. Many tokenizers use byte pair encoding (BPE), which builds its vocabulary by starting from single characters or bytes and repeatedly merging the most frequent adjacent pair.

  • Short, common words are usually one token: the, and, cat
  • Longer or rarer words split into pieces: tokenization might be token + ization
  • Spaces and punctuation count too
  • Numbers and code often use more tokens than plain prose
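To make the byte pair encoding idea concrete, here is a minimal character-level sketch in Python. It counts adjacent symbol pairs, merges the most frequent one, and repeats; encoding a new word replays the learned merges in order. The corpus, the merge count, and every function name are made up for illustration. Real BPE tokenizers typically work on bytes, handle spaces specially, and learn many thousands of merges, so treat this as a toy rather than any model's actual tokenizer.

```python
from collections import Counter


def pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        words = merge_pair(words, best)
        merges.append(best)
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = {tuple(word): 1}
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(next(iter(symbols)))


corpus = "low low low low lower lower newest newest newest widest"
merges = learn_bpe(corpus, num_merges=4)
print(merges)                    # → [('l', 'o'), ('lo', 'w'), ('e', 's'), ('es', 't')]
print(encode("lowest", merges))  # → ['low', 'est']
```

Note that "lowest" never appears in the toy corpus, yet it still splits into two familiar pieces. This is why subword tokenizers can handle rare words without a giant whole-word vocabulary.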
A rough rule of thumb

For typical English, one token is about four characters, so 100 tokens is roughly 75 words. This is only an approximation; the exact split depends on the model's tokenizer.
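The rule of thumb translates directly into a quick estimator. This Python sketch hard-codes the two ratios from the text (about four characters per token and about 0.75 words per token); the function names are hypothetical, and the results are only ballpark figures, since a real count requires the model's own tokenizer.

```python
import math

# Rules of thumb for typical English; real counts depend on the tokenizer.
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tokens_to_words(tokens: int) -> float:
    """Approximate number of English words that fit in `tokens` tokens."""
    return tokens * WORDS_PER_TOKEN


print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 11
print(tokens_to_words(100))  # → 75.0
```

Rounding up is a deliberate choice: when you are checking a budget, a slight overestimate is safer than an underestimate.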
Try it: Token Counter. Estimate how many tokens your prompt uses and compare it against common context sizes.

Why token counts matter

Models read and generate text as tokens, and both their context limits and their pricing are usually measured in tokens. Knowing a prompt's token count helps you stay within a model's context window and predict what a request will cost.
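As a sketch of how a token count feeds into both checks, the Python below tests whether a prompt plus a reserved reply budget fits a context window, and estimates cost from per-million-token prices. The `ModelLimits` class and all numbers are hypothetical placeholders, not any provider's real limits or prices. It also assumes that the prompt and reply share one window and that input and output tokens are billed at separate rates, as many APIs do; check your provider's documentation for the actual rules.

```python
from dataclasses import dataclass


@dataclass
class ModelLimits:
    """Hypothetical model limits; substitute your provider's published values."""
    context_window: int            # max tokens for prompt + reply combined (assumed)
    usd_per_million_input: float   # price per 1M prompt tokens (made-up)
    usd_per_million_output: float  # price per 1M generated tokens (made-up)


def fits_in_context(prompt_tokens: int, max_output_tokens: int, limits: ModelLimits) -> bool:
    """True if the prompt plus the reserved reply budget fits the window."""
    return prompt_tokens + max_output_tokens <= limits.context_window


def estimate_cost(prompt_tokens: int, output_tokens: int, limits: ModelLimits) -> float:
    """Estimated cost in USD, billing input and output tokens at separate rates."""
    return (
        prompt_tokens * limits.usd_per_million_input
        + output_tokens * limits.usd_per_million_output
    ) / 1_000_000


limits = ModelLimits(context_window=8192, usd_per_million_input=1.0, usd_per_million_output=2.0)
print(fits_in_context(6000, 2000, limits))        # → True
print(fits_in_context(7000, 2000, limits))        # → False
print(estimate_cost(1_000_000, 500_000, limits))  # → 2.0
```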

Questions

How many tokens is a word?

On average, a little more than one. Common English words are often a single token, while long or unusual words split into several. A frequent estimate is about 0.75 words per token, which works out to roughly 1.3 tokens per word.

Why do different tools report different token counts?

Each model family uses its own tokenizer with its own vocabulary, so the same text can split differently. Counts are model-specific, which is why an estimate is only approximate.
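A toy example shows how the vocabulary alone changes the count. This Python sketch uses a simple greedy longest-match tokenizer, a swapped-in simplification that is neither BPE nor any real model's algorithm, with two made-up vocabularies standing in for two model families. The same word comes out as two tokens under one vocabulary and four under the other.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Any character not covered by the vocabulary becomes its own token.
    """
    max_len = max((len(v) for v in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Two made-up vocabularies standing in for two different model families.
vocab_a = {"token", "ization"}
vocab_b = {"tok", "en", "iz", "ation"}

print(greedy_tokenize("tokenization", vocab_a))  # → ['token', 'ization']
print(greedy_tokenize("tokenization", vocab_b))  # → ['tok', 'en', 'iz', 'ation']
```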

Does the token counter upload my prompt?

No. The estimate is calculated locally in your browser tab and your text is never uploaded.