When should you use shorter chunks (128-256 tokens)?

Shorter chunks work best when the expected answers are concise facts. They reduce noise in retrieved context. Longer chunks are better when answers require surrounding context to make sense.

Module 2, Lesson 2

Chunking Strategies

Learn how to split documents into chunks that maximize retrieval quality and minimize noise.

You can't embed an entire 200-page document as one vector — it would lose specificity. Instead, you split documents into chunks. The way you chunk directly impacts retrieval quality: too large and chunks contain too much noise; too small and chunks lose context.

Fixed-size chunking: Split every N tokens (e.g., 512) with overlap (e.g., 50 tokens). Simple but ignores document structure.
Recursive character splitting: Split by paragraphs first, then sentences, then characters — preserving natural boundaries.
Semantic chunking: Use embeddings to detect topic shifts and split where meaning changes. Higher quality but more complex.
Document-structure chunking: Split by headings, sections, or HTML tags. Best when documents have clear structure.
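As a rough illustration of recursive character splitting, the sketch below tries paragraph breaks first, then sentence breaks, then spaces, and only falls back to a hard character cut when no separator is left. The function name `recursive_split`, the character-based limit `max_chars`, and the separator list are assumptions made for this example, not part of any particular library.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", ". ", " ")):
    """Split text on the coarsest separator that keeps pieces within max_chars.

    Hypothetical helper for illustration. Sizes are measured in characters,
    not tokens, to keep the example dependency-free.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        finer = separators[i + 1:]  # separators to try on oversized pieces
        chunks, current = [], ""
        for piece in text.split(sep):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= max_chars:
                # Greedily merge neighbouring pieces while they still fit.
                current = candidate
            else:
                if current:
                    chunks.extend(recursive_split(current, max_chars, finer))
                current = piece
        if current:
            chunks.extend(recursive_split(current, max_chars, finer))
        return chunks

    # No separator left: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


sample = "Alpha one. Alpha two.\n\nBeta one. Beta two.\n\n" + "word " * 100
pieces = recursive_split(sample, max_chars=50)
```

Here the two short paragraphs fit together in one chunk, while the long run of words is broken on spaces. Note that a separator is dropped wherever a split actually happens, so a production splitter would usually re-attach sentence punctuation.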

There is no universal optimal chunk size; it depends on your content and queries. As a starting point, 256–512 tokens works well for many use cases. Shorter chunks (128–256 tokens) work better for precise factual retrieval, while longer chunks (512–1024 tokens) work better when answers require broader context. Treat these ranges as defaults to test against your own queries rather than fixed rules.

Always include overlap between chunks (10-20% of chunk size). Without overlap, information at chunk boundaries gets split across two chunks and may never be retrieved as a complete thought.
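A minimal sketch of fixed-size chunking with overlap might look like the following. It assumes the text has already been tokenized into a list; the function name `chunk_with_overlap` is hypothetical, and in practice you would use your embedding model's tokenizer rather than a simple list of integers or whitespace-split words.

```python
def chunk_with_overlap(tokens, chunk_size=512, overlap=50):
    """Return fixed-size windows over tokens; neighbours share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Toy input: 10 "tokens", windows of 4 that share 1 token with the next window.
windows = chunk_with_overlap(list(range(10)), chunk_size=4, overlap=1)
# windows == [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each window repeats the last token of the previous one, so a thought that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.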

Attach metadata to each chunk: source document, section title, page number, date, and any tags. This enables filtered retrieval (e.g., "only search documents from 2024") and helps the model cite sources.
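One way to picture this is a small record type that carries the metadata alongside the chunk text, plus a filter applied before similarity search. The `Chunk` class, its field names, and the `filter_chunks` and `cite` helpers are assumptions for this sketch; real vector stores expose their own metadata and filter syntax.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Chunk:
    """Hypothetical chunk record: the text plus metadata used for filtering and citation."""
    text: str
    source: str
    section: str
    page: int
    published: date
    tags: list = field(default_factory=list)


def filter_chunks(chunks, year=None, tag=None):
    """Keep only chunks matching the given publication year and/or tag."""
    result = []
    for chunk in chunks:
        if year is not None and chunk.published.year != year:
            continue
        if tag is not None and tag not in chunk.tags:
            continue
        result.append(chunk)
    return result


def cite(chunk):
    """Format a short source citation from the chunk's metadata."""
    return f"{chunk.source}, {chunk.section}, p. {chunk.page}"


corpus = [
    Chunk("Rotate API keys every 90 days.", "security-guide.pdf",
          "Key Management", 12, date(2024, 3, 1), ["security"]),
    Chunk("Legacy tokens never expire.", "old-handbook.pdf",
          "Tokens", 4, date(2021, 6, 15), ["security", "legacy"]),
]

recent = filter_chunks(corpus, year=2024)  # "only search documents from 2024"
citation = cite(recent[0])
```

Filtering first narrows the candidate set, so the similarity search only ranks chunks that already satisfy the user's constraints, and the same metadata can be passed to the model to support citations.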

Prompt Templates

Chunking Strategy Selector

Gets tailored chunking advice for your specific RAG use case.

I need to build a RAG system for [DOCUMENT TYPE, e.g., "technical documentation with clear section headings"]. My typical user queries are [QUERY TYPE, e.g., "specific how-to questions"].

Recommend: chunk method, chunk size, overlap, and metadata to attach. Explain your reasoning.

Chunk Quality Evaluator

Audits individual chunks for quality and coherence.

Here is a chunk from my RAG system:
---
[CHUNK TEXT]
---

Evaluate: (1) Is this chunk self-contained enough to answer a question without external context? (2) Does it contain a single coherent topic or multiple? (3) Suggest improvements to the chunking.

Test Your Knowledge

Why is chunk overlap important?

Key Takeaways

✓ Chunk size directly impacts retrieval quality: 256–512 tokens is a good default, with 10–20% overlap
✓ Structure-aware chunking (by sections, paragraphs) outperforms naive fixed-size splitting for structured documents
✓ Metadata enrichment enables filtered retrieval and source citation in generated answers
