Module 1, Lesson 2

Tokens & Context Windows

Learn about tokens, context windows, and why they determine what AI can and cannot do.

AI models don't read words; they read tokens. A token is a chunk of text, typically 3-4 characters in English. For example, a tokenizer might split "hamburger" into three tokens ("ham", "bur", "ger"), while a common word like "the" is usually a single token. The exact split depends on the model's tokenizer. Understanding tokens matters because cost, speed, and the amount of text the model can process are all measured in them.

English averages about 0.75 words per token (roughly 4 characters per token)
A page of roughly 225-300 words is about 300-400 tokens
Code is more token-dense: a single line of code may use 10-20 tokens
Non-English languages often use more tokens per word
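The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: `estimate_tokens` is a hypothetical helper that applies the ~4 characters per token and ~0.75 words per token averages for English and returns the larger of the two as a conservative guess. Actual counts depend on the model's tokenizer.

```python
import math

# Rules of thumb for English text; real tokenizers will differ.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Return a rough, conservative token estimate for English text."""
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Take the larger estimate so budgets err on the safe side.
    return math.ceil(max(by_chars, by_words))


if __name__ == "__main__":
    sample = "Tokens determine cost, speed, and how much text fits in the window."
    print(estimate_tokens(sample))
```

For billing or hard limits, the exact count reported by the provider's own tokenizer or API response is the number to trust; an estimate like this is only useful for quick planning.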

The context window is the total amount of text the model can "see" at once: your prompt (including any conversation history) plus its response. Think of it as the model's working memory. Once a conversation exceeds the context window, the earliest parts have to be dropped or truncated, and the model can no longer see them.
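One common way applications deal with this limit is to drop the oldest messages first. The sketch below illustrates that idea under stated assumptions: `trim_history`, `context_limit`, and `reserved_for_reply` are hypothetical names, and token counts are approximated at ~4 characters per token rather than measured with a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count at ~4 characters per token (English rule of thumb)."""
    return -(-len(text) // 4)  # ceiling division


def trim_history(messages: list[str], context_limit: int,
                 reserved_for_reply: int) -> list[str]:
    """Keep the most recent messages that fit; older ones fall out of view."""
    budget = context_limit - reserved_for_reply
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop when the budget is spent.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
    print(len(trim_history(history, context_limit=30, reserved_for_reply=8)))
```

Reserving part of the window for the reply matters because the response counts against the same limit as the prompt.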

Context window sizes vary dramatically across models:

Current GPT flagship tier: Large context window suitable for long documents, code, and multi-turn workflows
Current Claude flagship tier: Very large context window often favored for long-form analysis and writing
Current Gemini long-context tier: Ultra-long context windows designed for very large text, code, audio, and video inputs

Bigger context windows don't always mean better results. Models tend to pay more attention to the beginning and end of the context, sometimes losing information in the middle. This is called the "lost in the middle" effect.

Every token costs money when using AI APIs, and longer prompts take longer to process. Being concise isn't just about clarity; it's about cost and speed. A prompt that uses 500 tokens instead of 2,000 cuts the input cost to a quarter and is usually faster to process, although the tokens in the model's response are billed as well.
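The arithmetic behind that comparison can be worked through in a few lines. The prices below are hypothetical values chosen only for illustration, not any provider's actual rates, and `request_cost` is an assumed helper name. The sketch also shows that once response tokens are counted, the overall saving is smaller than the saving on the prompt alone.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices, for illustration only.
INPUT_PRICE = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_PRICE = 15.00  # USD per 1M output tokens (assumed)

# Prompt cost alone: 2,000 tokens versus 500 tokens.
long_prompt = request_cost(2000, 0, INPUT_PRICE, OUTPUT_PRICE)
short_prompt = request_cost(500, 0, INPUT_PRICE, OUTPUT_PRICE)

# Same prompts with a 300-token response included.
long_total = request_cost(2000, 300, INPUT_PRICE, OUTPUT_PRICE)
short_total = request_cost(500, 300, INPUT_PRICE, OUTPUT_PRICE)

if __name__ == "__main__":
    print(f"prompt only: ${long_prompt:.4f} vs ${short_prompt:.4f}")
    print(f"with reply:  ${long_total:.4f} vs ${short_total:.4f}")
```

Under these assumed prices, the shorter prompt is a quarter of the input cost, but the full request is only about 1.75 times cheaper, because the response costs the same either way.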

Put the most important information at the beginning and end of your prompt
Remove filler words and redundant instructions
For long documents, summarize or extract key sections before sending to the model
Track your token usage — most API dashboards show this
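The first tip in the list above can be applied mechanically when assembling a prompt. The following is a minimal sketch, assuming a hypothetical `build_prompt` helper: the task goes first, the bulky reference material sits in the middle, and the specific request for output goes last, so the two most important pieces occupy the positions the model tends to attend to most.

```python
def build_prompt(task: str, document: str, closing_request: str) -> str:
    """Place the task first, the long material in the middle, the request last."""
    parts = [
        f"Task: {task.strip()}",
        "Document:",
        document.strip(),
        closing_request.strip(),
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        task="Identify the contract's termination terms.",
        document="(long contract text goes here)",
        closing_request="List each termination term as a bullet point.",
    )
    print(prompt)
```

Whether this ordering helps for a particular model and task is worth checking empirically; it is a reasonable default, not a guarantee.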

Prompt Templates

Document Summarizer (Token-Efficient)

Extracts key information while keeping token usage low.

Summarize this document in under 200 words, focusing on: [SPECIFIC ASPECT]. Use bullet points for key facts. Skip background information I already know.

Document:
[PASTE TEXT]

Long Document Analyzer

Efficiently analyzes specific parts of long documents.

I'm going to give you a long document. Focus on these sections specifically:
1. [SECTION/TOPIC 1]
2. [SECTION/TOPIC 2]

For each, extract: the main claim, supporting evidence, and any caveats. Ignore everything else.

Document:
[PASTE TEXT]
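The Long Document Analyzer template pairs naturally with extracting key sections before sending. The sketch below fills the template with only those paragraphs that mention one of the topics. It is an illustration under simple assumptions: `select_paragraphs` and `fill_template` are hypothetical helpers, paragraphs are assumed to be separated by blank lines, and matching is plain case-insensitive substring search, which will miss paraphrases.

```python
TEMPLATE = """I'm going to give you a long document. Focus on these sections specifically:
{topics}

For each, extract: the main claim, supporting evidence, and any caveats. Ignore everything else.

Document:
{document}"""


def select_paragraphs(document: str, topics: list[str]) -> str:
    """Keep only paragraphs that mention at least one topic (case-insensitive)."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    wanted = [p for p in paragraphs
              if any(t.lower() in p.lower() for t in topics)]
    return "\n\n".join(wanted)


def fill_template(document: str, topics: list[str]) -> str:
    """Number the topics and insert the filtered document into the template."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(topics, start=1))
    return TEMPLATE.format(topics=numbered,
                           document=select_paragraphs(document, topics))


if __name__ == "__main__":
    text = ("Pricing went up in 2023.\n\n"
            "The office moved.\n\n"
            "Security audits found two issues.")
    print(fill_template(text, ["pricing", "security"]))
```

Filtering this way trades recall for token savings, so it suits cases where the relevant sections are easy to identify by keyword.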

Key Takeaways

✓ Tokens are the fundamental unit of AI text processing, typically 3-4 characters each
✓ The context window is the model's total working memory for your conversation
✓ Place critical information at the beginning and end of long prompts
✓ Concise prompts are cheaper, faster, and often more effective
✓ Different models have vastly different context window sizes
