How does DSPy differ from traditional prompt engineering?

DSPy separates the "what" (typed signatures defining inputs and outputs) from the "how" (prompt text and examples). Optimizers automatically search for the best prompt construction, making the process programmatic rather than manual.

Module 6, Lesson 1

APE & DSPy

Learn how Automatic Prompt Engineering and DSPy use AI to discover optimal prompts.


Automatic Prompt Engineering (APE) flips the script: instead of you writing prompts, an LLM generates and evaluates candidate prompts to find the best one. In the original APE paper (Zhou et al., 2022), generated instructions matched or outperformed human-written ones on most of the benchmark tasks tested. The approach works especially well for well-defined tasks with clear evaluation criteria.

1. Define your task with input-output examples (e.g., 10-20 examples of correct behavior)
2. Ask an LLM to generate diverse candidate prompts that could produce those outputs from those inputs
3. Evaluate each candidate prompt against your eval suite
4. Select the top performers, generate variations of them, and repeat
5. After several iterations, keep the highest-scoring prompt
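The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `ape_search`, `toy_variations`, and `toy_score` are hypothetical names, and the two toy functions stand in for what would normally be an LLM call (to propose variations) and an eval suite (to score a prompt).

```python
from typing import Callable, Dict, List, Tuple

def ape_search(
    seed_prompts: List[str],
    propose_variations: Callable[[str, int], List[str]],
    score: Callable[[str], float],
    rounds: int = 3,
    keep_top: int = 2,
    variations_per_prompt: int = 3,
) -> Tuple[str, float]:
    """Generate, evaluate, select, vary; return the best prompt and its score."""
    if not seed_prompts:
        raise ValueError("need at least one seed prompt")
    candidates = list(seed_prompts)
    scored: Dict[str, float] = {}
    for round_index in range(rounds):
        # Evaluate every candidate that has not been scored yet.
        for prompt in candidates:
            if prompt not in scored:
                scored[prompt] = score(prompt)
        if round_index == rounds - 1:
            break  # do not generate variations that would never be scored
        # Keep the top performers and ask for variations of each.
        survivors = sorted(scored, key=scored.get, reverse=True)[:keep_top]
        candidates = [
            variation
            for prompt in survivors
            for variation in propose_variations(prompt, variations_per_prompt)
        ]
    best = max(scored, key=scored.get)
    return best, scored[best]

# Toy stand-ins so the sketch runs without an LLM (assumptions, not real APE parts).
HINTS = [
    "Consider keywords like 'ASAP'.",
    "Answer with one word.",
    "Think about deadlines.",
]

def toy_variations(prompt: str, n: int) -> List[str]:
    """Pretend LLM: append up to n hints the prompt does not contain yet."""
    return [f"{prompt} {hint}" for hint in HINTS[:n] if hint not in prompt]

def toy_score(prompt: str) -> float:
    """Pretend eval suite: fraction of known-helpful hints present."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)

SEED = "Classify this email as urgent or normal."
best_prompt, best_score = ape_search([SEED], toy_variations, toy_score, rounds=4)
print(best_score, best_prompt)
```

In a real setup, `propose_variations` would wrap an LLM request and `score` would run the prompt over your labeled examples. Caching scores in a dictionary avoids paying for the same evaluation twice.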

APE in practice:

Task: Classify customer emails as "urgent" or "normal"
Examples: 15 labeled emails

APE generates candidates:
Prompt A: "Classify this email as urgent or normal." → Accuracy: 72%
Prompt B: "You are an email triage specialist..." → Accuracy: 85%
Prompt C: "Read this email and determine if it requires immediate attention..." → Accuracy: 89%

Iteration 2 (variations of Prompt C):
Prompt C1: Adds "Consider keywords like 'ASAP', 'deadline'..." → Accuracy: 93%
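Accuracy figures like these come from running each candidate prompt over the labeled examples and counting matches. The sketch below shows one way to do that. The four emails are made up for illustration, and `stub_classify` is a hypothetical stand-in for an LLM call (it only looks for keywords quoted in the prompt), so the scores it produces will not match the percentages above.

```python
import re
from typing import Callable, List, Tuple

LabeledEmail = Tuple[str, str]  # (email text, expected label)

def accuracy(
    prompt: str,
    examples: List[LabeledEmail],
    classify: Callable[[str, str], str],
) -> float:
    """Fraction of examples where classify(prompt, email) equals the label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(classify(prompt, email) == label for email, label in examples)
    return correct / len(examples)

def stub_classify(prompt: str, email: str) -> str:
    """Stand-in for an LLM: 'urgent' only if the email contains a keyword
    that appears in single quotes inside the prompt."""
    keywords = re.findall(r"'([^']+)'", prompt)
    text = email.lower()
    return "urgent" if any(k.lower() in text for k in keywords) else "normal"

# Made-up evaluation data (assumption, for illustration only).
EXAMPLES: List[LabeledEmail] = [
    ("Server is down, please fix ASAP", "urgent"),
    ("The deadline is tomorrow morning", "urgent"),
    ("Monthly newsletter attached", "normal"),
    ("Thanks for lunch last week", "normal"),
]

CANDIDATES = [
    "Classify this email as urgent or normal.",
    "Classify this email as urgent or normal. Consider keywords like 'ASAP'.",
    "Classify this email as urgent or normal. Consider keywords like 'ASAP', 'deadline'.",
]

scores = {p: accuracy(p, EXAMPLES, stub_classify) for p in CANDIDATES}
best = max(scores, key=scores.get)
for prompt, value in scores.items():
    print(f"{value:.0%}  {prompt}")
```

With a real model, `classify` would send the prompt plus the email to the LLM and normalize its reply to one of the two labels before comparing.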

DSPy (by Stanford) takes a more structured approach. Instead of treating prompts as text, you define your pipeline as a program with typed signatures (input → output). DSPy's optimizers then automatically find the best prompt, few-shot examples, and even fine-tuning data for each step.

Key DSPy concepts:
Signatures define input/output types.
Modules are composable pipeline steps.
Optimizers (called teleprompters in earlier DSPy releases) search for the best prompt + examples.
This declarative approach separates what you want from how to prompt for it.
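To make the what/how split concrete, here is a toy sketch in plain Python. It deliberately does not use DSPy's API: `Signature`, `Predict`, `optimize_demos`, and `label_coverage` are simplified, hypothetical stand-ins that only mirror the three ideas. The metric is also an assumption chosen so the code runs without a model; a real optimizer would score the program's actual outputs.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Demo = Dict[str, str]

@dataclass(frozen=True)
class Signature:
    """The "what": an instruction plus named inputs and outputs."""
    instruction: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]

@dataclass
class Predict:
    """The "how": turns a signature and few-shot demos into prompt text."""
    signature: Signature
    demos: List[Demo] = field(default_factory=list)

    def render(self, **inputs: str) -> str:
        sig = self.signature
        lines = [sig.instruction]
        for demo in self.demos:
            lines += [f"{name}: {demo[name]}" for name in sig.inputs + sig.outputs]
        lines += [f"{name}: {inputs[name]}" for name in sig.inputs]
        lines.append(f"{sig.outputs[0]}:")  # leave the first output open
        return "\n".join(lines)

def optimize_demos(
    module: Predict,
    trainset: List[Demo],
    metric: Callable[[Predict], float],
    max_demos: int = 2,
) -> Predict:
    """Toy optimizer: try every small demo subset and keep the best-scoring one."""
    best_demos: List[Demo] = list(module.demos)
    best_score = metric(module)
    for size in range(1, max_demos + 1):
        for subset in combinations(trainset, size):
            trial = Predict(module.signature, list(subset))
            trial_score = metric(trial)
            if trial_score > best_score:
                best_demos, best_score = list(subset), trial_score
    return Predict(module.signature, best_demos)

def label_coverage(module: Predict) -> float:
    """Hypothetical metric: how many distinct labels the demos cover."""
    return float(len({demo["label"] for demo in module.demos}))

SIGNATURE = Signature("Classify the email as urgent or normal.", ("email",), ("label",))
TRAINSET: List[Demo] = [
    {"email": "Server is down, fix ASAP", "label": "urgent"},
    {"email": "Contract deadline is today", "label": "urgent"},
    {"email": "Monthly newsletter attached", "label": "normal"},
]

tuned = optimize_demos(Predict(SIGNATURE), TRAINSET, label_coverage)
print(tuned.render(email="Call me ASAP"))
```

The signature never changes while the optimizer swaps demos in and out. Only the rendered prompt differs, which is the separation the concepts above describe.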

APE works best when you have clear evaluation criteria and at least 10-20 input/output examples. For open-ended creative tasks without clear "right answers," human prompt engineering still wins.

Prompt Templates

APE Prompt Generator

Generates diverse candidate prompts for APE-style optimization.

I need a prompt that takes [INPUT DESCRIPTION] and produces [OUTPUT DESCRIPTION].

Here are examples of correct input → output:
[EXAMPLE 1]
[EXAMPLE 2]
[EXAMPLE 3]

Generate 5 diverse candidate prompts that could produce these outputs. Vary the approach: try direct instruction, role-based, few-shot, step-by-step, and constraint-based styles. For each, explain your design rationale.
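If you run this template from code rather than pasting it by hand, the placeholders can be filled programmatically. The sketch below assumes examples arrive as (input, output) pairs. `build_generator_prompt` is a hypothetical helper name, and the returned string still has to be sent to whichever LLM client you use.

```python
from typing import List, Tuple

GENERATOR_TEMPLATE = (
    "I need a prompt that takes {input_description} and produces {output_description}.\n"
    "\n"
    "Here are examples of correct input → output:\n"
    "{examples}\n"
    "\n"
    "Generate {n} diverse candidate prompts that could produce these outputs. "
    "Vary the approach: try direct instruction, role-based, few-shot, step-by-step, "
    "and constraint-based styles. For each, explain your design rationale."
)

def build_generator_prompt(
    input_description: str,
    output_description: str,
    examples: List[Tuple[str, str]],
    n: int = 5,
) -> str:
    """Fill the APE Prompt Generator template with a task and its examples."""
    if not examples:
        raise ValueError("provide at least one input/output example")
    example_lines = "\n".join(f"{inp} → {out}" for inp, out in examples)
    return GENERATOR_TEMPLATE.format(
        input_description=input_description,
        output_description=output_description,
        examples=example_lines,
        n=n,
    )

prompt = build_generator_prompt(
    "a customer email",
    "the label urgent or normal",
    [("Server down, fix ASAP", "urgent"), ("Newsletter attached", "normal")],
)
print(prompt)
```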

Prompt Variation Generator

Creates targeted variations of a prompt for iterative optimization.

This prompt scores [SCORE] on my eval suite:

[CURRENT BEST PROMPT]

Generate 5 variations that might improve performance. Try:
1. More specific instructions
2. Different framing/role
3. Added constraints or examples
4. Restructured order
5. Simplified version

For each, explain what you changed and why it might help.


Key Takeaways

✓ APE uses LLMs to generate, evaluate, and iteratively improve prompts, often outperforming human-written ones
✓ DSPy separates intent (typed signatures) from implementation (prompt text), enabling programmatic optimization
✓ Both approaches require clear evaluation criteria and sufficient examples (10-20 minimum) to work well

