For a production chatbot answering 10,000 queries/day, what's the best model strategy?

At scale, cost matters enormously. Start with the cheapest model that meets quality requirements, then selectively use premium models only for queries that need them.

Module 2, Lesson 2

Model Selection Strategy

A practical framework for choosing the right model for every task.

6 min read

2 quiz questions, 2 templates

Most people pick one model and use it for everything. That's like using a hammer for every home repair. Different tasks have different requirements for quality, speed, cost, and capability — and different models optimize for different combinations of these factors.

Quality required — Is this a critical business document or a quick brainstorm? High-stakes tasks justify premium models.
Speed required — Do you need real-time responses (chatbot) or is batch processing acceptable? Smaller models respond faster.
Cost sensitivity — Are you making 10 queries a day or 10,000? At scale, model choice dramatically affects your bill.
Special capabilities — Do you need vision, long context, tool use, or real-time information? Not all models support all features.
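
To make the cost factor concrete, here is a rough sketch of how daily spend scales with volume. The tier names, the per-token prices, and the 1,000-tokens-per-query figure are all made-up placeholders, not real vendor pricing; substitute the numbers from your own provider.

```python
# Hypothetical prices in USD per 1,000 tokens. These are NOT real vendor prices.
PRICE_PER_1K_TOKENS = {"small": 0.0005, "premium": 0.01}


def daily_cost(queries_per_day: int, tokens_per_query: int, tier: str) -> float:
    """Estimated daily spend in USD for sending every query to one tier."""
    total_tokens = queries_per_day * tokens_per_query
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[tier]


# Compare the two volumes from the list above, assuming 1,000 tokens per query.
for volume in (10, 10_000):
    for tier in PRICE_PER_1K_TOKENS:
        cost = daily_cost(volume, 1_000, tier)
        print(f"{volume:>6} queries/day on {tier:<7}: ${cost:,.2f}")
```

With these placeholder prices, the gap between tiers is negligible at 10 queries a day and becomes the dominant line item at 10,000.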

Here is a practical decision framework for common task categories:

Quick Q&A or simple tasks → Use a fast, low-cost model tier first
Complex reasoning or math → Use a reasoning-oriented tier where accuracy matters more than speed
Long document analysis → Use the model family with the strongest long-context behavior for your stack
Creative writing → Test at least two strong general-purpose model families; preferences here are often subjective
Code generation → Start with a strong coding-capable family, then benchmark on your actual repository and task type
Multimodal (image/video input) → Prefer a family with strong native vision, audio, or video support
Production APIs at scale → Start with the cheapest model that meets your quality bar, upgrade only where needed

The model that's "best" on benchmarks isn't always best for your specific task. Always test your actual prompts across 2-3 models before committing to one for production use.
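
One way to run that comparison is a tiny harness that sends the same test cases to each candidate and scores the replies. The sketch below is illustrative only: the two "models" are stub functions standing in for real API clients, and the pass rule (expected substring appears in the reply) is a deliberately simple assumption that you would replace with your own quality criteria.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any callable that maps a prompt string to a reply string.
ModelFn = Callable[[str], str]


def compare_models(models: Dict[str, ModelFn],
                   cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per model, the fraction of cases whose reply contains the expected text."""
    if not cases:
        raise ValueError("need at least one test case")
    scores = {}
    for name, call in models.items():
        passed = sum(expected.lower() in call(prompt).lower()
                     for prompt, expected in cases)
        scores[name] = passed / len(cases)
    return scores


# Hypothetical stubs; swap in wrappers around your real model clients.
def stub_small(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I am not sure."


def stub_large(prompt: str) -> str:
    answers = {"France": "Paris", "Japan": "Tokyo"}
    return next((a for k, a in answers.items() if k in prompt), "I am not sure.")


cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
scores = compare_models({"small": stub_small, "large": stub_large}, cases)
print(scores)
```

Keeping the cases in a plain list makes it easy to grow the set from real production prompts over time.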

In production systems, a powerful pattern is to cascade: start with a fast, cheap model, and only escalate to a more capable one when needed. For example, use a low-cost model tier to classify incoming requests, then route only the hardest cases to a reasoning or flagship tier. This can dramatically reduce costs while preserving quality where it matters.
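
A minimal sketch of the escalation half of this pattern follows. Everything here is an assumption made for illustration: the stub models, the `UNSURE` marker the cheap model emits, and the `accept` check. In a real system the acceptance check might be a validator, a confidence score, or a format test.

```python
from typing import Callable, Tuple

ModelFn = Callable[[str], str]


def cascade(prompt: str, cheap: ModelFn, strong: ModelFn,
            is_good_enough: Callable[[str], bool]) -> Tuple[str, str]:
    """Try the cheap model first; call the strong model only if the draft fails the check.

    Returns (tier_used, reply).
    """
    draft = cheap(prompt)
    if is_good_enough(draft):
        return "cheap", draft
    return "strong", strong(prompt)


# Hypothetical stubs: the cheap model gives up on prompts longer than 8 words.
def cheap_model(prompt: str) -> str:
    if len(prompt.split()) > 8:
        return "UNSURE"
    return f"Short answer to: {prompt}"


def strong_model(prompt: str) -> str:
    return f"Detailed answer to: {prompt}"


def accept(reply: str) -> bool:
    # Assumed convention: the cheap model signals low confidence with "UNSURE".
    return "UNSURE" not in reply


easy = cascade("What time zone is Berlin in?", cheap_model, strong_model, accept)
hard = cascade(
    "Compare three database sharding strategies for a multi-region write-heavy workload",
    cheap_model, strong_model, accept,
)
print(easy[0], hard[0])
```

Note the trade-off: an escalated request pays for both calls, so this variant only saves money when most requests are accepted at the cheap tier.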

Task Complexity Classifier

Use a small model to route tasks to the appropriate model tier.

Classify this user request as SIMPLE, MODERATE, or COMPLEX based on these criteria:

- SIMPLE: Factual lookup, simple formatting, basic Q&A
- MODERATE: Requires some analysis, multiple steps, or domain knowledge
- COMPLEX: Requires deep reasoning, multi-step logic, or creative expertise

Request: "[USER REQUEST]"

Classification:
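
The template above could be wired into a router roughly as follows. This is a sketch under stated assumptions: the tier names are placeholders, the classifier is passed in as a plain callable (a stub here), and falling back to MODERATE when the reply cannot be parsed is a design choice of this example rather than part of the template.

```python
import re
from typing import Callable

PROMPT_TEMPLATE = """Classify this user request as SIMPLE, MODERATE, or COMPLEX based on these criteria:

- SIMPLE: Factual lookup, simple formatting, basic Q&A
- MODERATE: Requires some analysis, multiple steps, or domain knowledge
- COMPLEX: Requires deep reasoning, multi-step logic, or creative expertise

Request: "{request}"

Classification:"""

# Placeholder tier names; map these to whatever models you actually deploy.
TIER_FOR_LABEL = {
    "SIMPLE": "fast-tier",
    "MODERATE": "general-tier",
    "COMPLEX": "reasoning-tier",
}


def build_prompt(request: str) -> str:
    """Fill the classifier template with the user's request."""
    return PROMPT_TEMPLATE.format(request=request)


def parse_label(raw_reply: str) -> str:
    """Pull the first recognized label out of the reply; default to MODERATE."""
    match = re.search(r"\b(SIMPLE|MODERATE|COMPLEX)\b", raw_reply.upper())
    return match.group(1) if match else "MODERATE"


def route(request: str, classifier: Callable[[str], str]) -> str:
    """Classify the request with a small model and return the tier to use."""
    label = parse_label(classifier(build_prompt(request)))
    return TIER_FOR_LABEL[label]


# Stub classifier standing in for a call to a small model.
print(route("What is the capital of France?", lambda prompt: " simple"))
```

Parsing defensively matters because small models often add punctuation or extra words around the label.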

Prompt Templates

Model Evaluation Template

Standardized template for fair cross-model comparison.

I'm evaluating AI models for [USE CASE]. Please complete this task so I can compare your output with other models:

Task: [SPECIFIC TASK]
Quality criteria: [WHAT I'M EVALUATING]
Format: [EXACT OUTPUT FORMAT]

Please respond exactly in the specified format with no additional commentary.

Cost-Quality Analysis

Framework for balancing cost and quality at scale.

I need to process [VOLUME] of [TASK TYPE] per [TIME PERIOD]. Help me think through model selection:

1. What's the minimum model quality needed for this task?
2. What's the cost at this volume for different model tiers?
3. Where could I use a cheaper model without quality loss?
4. What percentage of requests likely need a premium model?

Assume standard API pricing.
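
Questions 2 and 4 in this template reduce to simple arithmetic once you have estimates. The sketch below computes a blended daily cost for a routed setup; the per-request prices and the 10% premium share are invented numbers for illustration, not real API pricing.

```python
def blended_daily_cost(requests_per_day: int, premium_share: float,
                       cheap_cost: float, premium_cost: float,
                       router_cost: float = 0.0) -> float:
    """Daily cost in USD when a share of requests goes to the premium tier.

    Every request pays router_cost (the classification call, if any);
    premium_share of requests pay premium_cost and the rest pay cheap_cost.
    """
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    per_request = (router_cost
                   + premium_share * premium_cost
                   + (1.0 - premium_share) * cheap_cost)
    return requests_per_day * per_request


# Hypothetical per-request costs in USD.
CHEAP, PREMIUM = 0.001, 0.02

all_premium = blended_daily_cost(10_000, 1.0, CHEAP, PREMIUM)
routed = blended_daily_cost(10_000, 0.10, CHEAP, PREMIUM)
print(f"all premium: ${all_premium:,.2f}  routed (10% premium): ${routed:,.2f}")
```

The `router_cost` parameter is there so the classification call itself is not forgotten when comparing a routed setup against a single-model one.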

Test Your Knowledge

What is the "cascade pattern" in model selection?

Key Takeaways

✓ Model selection is a core prompt engineering skill: the right model matters as much as the right prompt
✓ Evaluate models across four factors: quality, speed, cost, and special capabilities
✓ Use the cascade pattern in production to optimize cost without sacrificing quality
✓ Always test your specific prompts across multiple models before committing
✓ The cheapest model that meets your quality bar is the correct choice for production
