Build an AI Code Review Tool with API Integration
Create a Python-based code review assistant that uses LLM API calls to analyze code for bugs, security issues, style violations, and improvement opportunities.
Why AI Code Review Matters
Manual code reviews are essential but slow. Reviewers get fatigued, miss edge cases, and often focus on style over substance. AI code review automation does not replace human reviewers — it augments them by catching the obvious issues before a human ever looks at the code. This means human reviewers can focus on architecture decisions, business logic, and design patterns instead of spotting missing null checks.
In this project, you will build a Python tool that sends code to an LLM API and receives structured review feedback. You will design specialized prompts for different review concerns (bugs, security, readability) and learn how to parse and format the AI's feedback. This is a real tool you can integrate into your development workflow.
Project Overview
Level: intermediate. Time: 60 min.
Setting Up the Foundation
The tool uses the OpenAI Python SDK (which also works with compatible APIs). Here is the basic structure that sends code to an LLM and gets a review back. This forms the backbone of the tool — everything else builds on this pattern.
import json
from pathlib import Path

import openai

# GENERAL_REVIEW_PROMPT, SECURITY_REVIEW_PROMPT, and BUG_DETECTION_PROMPT are
# module-level strings holding the prompt texts from the sections below.

client = openai.OpenAI()  # Uses OPENAI_API_KEY env var


def review_code(code: str, filename: str, review_type: str = "general") -> dict:
    """Send code to LLM for review and return structured feedback."""
    system_prompt = get_review_prompt(review_type)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Review this file ({filename}):\n\n```\n{code}\n```"},
        ],
        response_format={"type": "json_object"},
        temperature=0.1,  # Low temperature for consistent, precise analysis
    )
    return json.loads(response.choices[0].message.content)


def get_review_prompt(review_type: str) -> str:
    """Return the appropriate system prompt for the review type."""
    prompts = {
        "general": GENERAL_REVIEW_PROMPT,
        "security": SECURITY_REVIEW_PROMPT,
        "bugs": BUG_DETECTION_PROMPT,
    }
    # Unknown review types fall back to the general prompt
    return prompts.get(review_type, GENERAL_REVIEW_PROMPT)

Prompt 1: General Code Quality Review
The general review prompt is the workhorse of the tool. It evaluates readability, naming, structure, and common anti-patterns. The key to making this work well is asking for structured JSON output with severity levels, line numbers, and specific suggestions — not just "this code could be better."
General Code Review Prompt
The system prompt for general code quality reviews. It enforces structured JSON output with severity levels and actionable suggestions.
You are an expert code reviewer. Analyze the provided code and return a JSON object with the following structure:
{
  "summary": "One-paragraph overall assessment",
  "score": <integer from 1 to 10>,
  "issues": [
    {
      "severity": "critical" | "warning" | "suggestion",
      "line": <line number or null>,
      "category": "bug" | "readability" | "performance" | "naming" | "structure" | "duplication",
      "description": "What the issue is",
      "suggestion": "Specific fix or improvement",
      "code_before": "problematic code snippet",
      "code_after": "suggested replacement"
    }
  ],
  "positives": ["List of things done well"]
}
Rules:
- Be specific: include line numbers and code snippets in every issue
- Prioritize: critical issues first, suggestions last
- Be constructive: every issue must include a concrete suggestion
- Recognize good patterns: the "positives" list matters for morale
- Do not flag style preferences (tabs vs spaces, bracket placement); focus on substance

Prompt 2: Security-Focused Review
Security review requires a different lens. The prompt needs to check for OWASP Top 10 vulnerabilities, injection risks, authentication flaws, and data exposure. Security prompts should be explicit about what to look for: LLMs tend to be more thorough when checking code against a named list than when asked to run an open-ended "find security issues" search.
Security Review Prompt
A security-focused review prompt that checks for OWASP vulnerabilities and returns structured findings with remediation steps.
You are a security-focused code reviewer. Analyze the provided code for security vulnerabilities.
Check for ALL of the following:
1. **Injection** — SQL injection, command injection, XSS, template injection
2. **Authentication/Authorization** — missing auth checks, privilege escalation, insecure session handling
3. **Data exposure** — logging sensitive data, hardcoded secrets, PII in error messages
4. **Input validation** — missing or insufficient validation, type confusion
5. **Cryptography** — weak algorithms, hardcoded keys, improper random number generation
6. **Dependencies** — known vulnerable patterns, unsafe deserialization
Return a JSON object:
{
  "risk_level": "high" | "medium" | "low" | "none",
  "vulnerabilities": [
    {
      "severity": "critical" | "high" | "medium" | "low",
      "type": "OWASP category or CWE ID",
      "line": <line number>,
      "description": "What the vulnerability is and how it could be exploited",
      "remediation": "Exact code change to fix this",
      "references": ["Link to relevant documentation"]
    }
  ],
  "secure_practices_found": ["List of security best practices already in the code"]
}
IMPORTANT: Do not flag theoretical risks with no practical exploit path. Every vulnerability must include a realistic attack scenario.

Prompt 3: Bug Detection
Bug Detection Prompt
Focuses exclusively on finding bugs, logic errors, and unhandled edge cases. Returns findings with specific trigger conditions.
You are a bug-hunting code reviewer. Your job is to find logic errors, edge cases, and runtime failures.
Analyze the code for:
1. **Off-by-one errors** in loops, array access, and string manipulation
2. **Null/undefined handling** — variables that could be null but are not checked
3. **Race conditions** — concurrent access to shared state
4. **Resource leaks** — unclosed files, connections, or streams
5. **Type mismatches** — implicit conversions that could fail
6. **Edge cases** — empty inputs, very large inputs, negative numbers, unicode
7. **Logic errors** — conditions that are always true/false, unreachable code, wrong operators
Return a JSON object:
{
  "bugs_found": [
    {
      "severity": "critical" | "probable" | "possible",
      "line": <line number>,
      "description": "What the bug is",
      "trigger": "Specific input or condition that would trigger this bug",
      "fix": "Code to fix the issue"
    }
  ],
  "edge_cases_to_test": ["List of edge cases the code should be tested against"]
}

Wiring It Together: The CLI Tool
Now let's combine the prompts into a usable command-line tool. The CLI accepts a file path and review type, sends the code for review, and outputs formatted results.
import argparse
import sys


def format_issues(review: dict) -> str:
    """Format review results for terminal output."""
    output = []
    if "summary" in review:
        output.append(f"\n📋 Summary: {review['summary']}")
        output.append(f" Score: {review.get('score', 'N/A')}/10\n")
    if "risk_level" in review:
        output.append(f"\n🔒 Risk Level: {review['risk_level'].upper()}\n")

    # Each review type stores its findings under a different key
    items = review.get("issues", review.get("vulnerabilities", review.get("bugs_found", [])))
    severity_icons = {
        "critical": "🔴", "high": "🟠",
        "warning": "🟡", "probable": "🟡", "medium": "🟡",
        "suggestion": "🔵", "possible": "🔵", "low": "🔵",
    }
    for i, item in enumerate(items, 1):
        sev = item.get("severity", "info")
        icon = severity_icons.get(sev, "⚪")
        # Only show a line reference when the model supplied one
        line = f" (line {item['line']})" if item.get("line") else ""
        output.append(f"{icon} #{i} [{sev.upper()}]{line}")
        output.append(f" {item.get('description', '')}")
        fix = item.get("suggestion", item.get("fix", item.get("remediation", "")))
        if fix:
            output.append(f" 💡 Fix: {fix}")
        output.append("")

    # Positives
    for positive in review.get("positives", review.get("secure_practices_found", [])):
        output.append(f"✅ {positive}")
    return "\n".join(output)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="AI Code Reviewer")
    parser.add_argument("file", help="Path to the file to review")
    parser.add_argument(
        "--type", "-t",
        choices=["general", "security", "bugs"],
        default="general",
        help="Type of review to perform",
    )
    args = parser.parse_args()

    path = Path(args.file)  # Path is imported in the foundation code
    if not path.is_file():
        sys.exit(f"Error: {args.file} is not a file")
    code = path.read_text(encoding="utf-8")

    print(f"Reviewing {args.file} ({args.type} review)...\n")
    result = review_code(code, args.file, args.type)
    print(format_issues(result))

Extending the Tool
Once the basic tool works, there are several ways to make it more powerful:
- Run all three review types in parallel using Python's asyncio and aggregate the results
- Add a --diff flag that reviews only changed lines from a git diff instead of the full file
- Integrate with GitHub Actions to run the review automatically on every pull request
- Add a --fix flag that applies suggested fixes automatically (with confirmation)
- Cache results so re-running on unchanged files is instant
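The first extension, running all three review types in parallel, could be sketched roughly as follows. This is a minimal sketch, not the tool's actual implementation: `review_all`, `ReviewFn`, and `fake_review` are hypothetical names, and the reviewer is passed in as an async callable so the example runs without network access. In a real setup that callable might wrap the synchronous `review_code` with `asyncio.to_thread`, or use the SDK's async client.

```python
import asyncio
from typing import Awaitable, Callable

REVIEW_TYPES = ("general", "security", "bugs")

# Hypothetical reviewer signature: (code, filename, review_type) -> dict
ReviewFn = Callable[[str, str, str], Awaitable[dict]]


async def review_all(code: str, filename: str, review_fn: ReviewFn) -> dict:
    """Run every review type concurrently and key the results by type."""
    tasks = [review_fn(code, filename, rt) for rt in REVIEW_TYPES]
    # return_exceptions=True keeps one failed call from discarding the others
    results = await asyncio.gather(*tasks, return_exceptions=True)
    combined = {}
    for review_type, result in zip(REVIEW_TYPES, results):
        if isinstance(result, Exception):
            combined[review_type] = {"error": str(result)}
        else:
            combined[review_type] = result
    return combined


async def fake_review(code: str, filename: str, review_type: str) -> dict:
    """Stand-in for a real API call so the sketch runs offline."""
    await asyncio.sleep(0)
    if review_type == "security":
        return {"risk_level": "none", "vulnerabilities": []}
    if review_type == "bugs":
        return {"bugs_found": []}
    return {"summary": "stub", "issues": []}


if __name__ == "__main__":
    print(asyncio.run(review_all("print('hi')", "demo.py", fake_review)))
```

Keying the combined result by review type keeps each report in the shape its prompt defines, so a formatter like `format_issues` can still be applied to each entry separately.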
Best Practices for AI Code Review
- Use structured JSON output — it makes results parseable and consistent across reviews
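- Set temperature low (0.0-0.2): you want precise, repeatable output rather than creative variation, though even a low setting does not make results fully deterministic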
- Include positive feedback — reviewers and authors both need to know what is going well
- Require specific line numbers — vague feedback like "consider improving error handling" is useless
- Validate the AI's findings — AI code review is a first pass, not the final word. Human review is still essential.
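One way to make the line-number requirement more dependable is to number the source lines before sending them, then discard findings that cite a line outside the file. The sketch below is an assumption about how that could look; `number_lines` and `drop_invalid_lines` are hypothetical helpers, not part of the tool above.

```python
def number_lines(code: str) -> str:
    """Prefix each line with its 1-based number so the model can cite it."""
    lines = code.splitlines()
    width = len(str(len(lines)))
    return "\n".join(f"{i:>{width}} | {line}" for i, line in enumerate(lines, 1))


def drop_invalid_lines(items: list[dict], code: str) -> list[dict]:
    """Keep findings whose "line" is null/missing or falls inside the file."""
    total = len(code.splitlines())
    kept = []
    for item in items:
        line = item.get("line")
        if line is None or (isinstance(line, int) and 1 <= line <= total):
            kept.append(item)
    return kept


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a - b"
    print(number_lines(sample))
    findings = [{"line": 2, "description": "wrong operator"}, {"line": 40}]
    print(drop_invalid_lines(findings, sample))
```

A filter like this only catches impossible line numbers. Whether a finding in range is actually correct still needs a human check.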
Key Takeaways
- ✓ AI code review augments human reviewers by catching obvious issues before a human ever looks at the code
- ✓ Use specialized prompts for different review types (general, security, bugs) rather than one combined prompt
- ✓ Require structured JSON output with line numbers and severity levels for parseable, actionable feedback
- ✓ Set temperature low (0.0-0.2) for consistent, precise analysis
- ✓ Always validate AI findings with human judgment: AI review is a first pass, not the final word
- ✓ Consider token costs: review only changed lines (diffs) when possible, and use cheaper models for initial screening
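To illustrate the diff-based approach from the last takeaway, here is a rough sketch of pulling only the added lines out of a unified diff, keyed by their line number in the new file. It assumes the diff covers a single file (for example the output of `git diff -- <file>`), and `changed_lines` is a hypothetical helper name.

```python
import re

# Matches hunk headers such as "@@ -10,6 +12,8 @@" and captures the new-file start line
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(diff_text: str) -> dict[int, str]:
    """Map new-file line numbers to the text of lines added in a unified diff."""
    added = {}
    new_line = None  # None until the first hunk header is seen
    for raw in diff_text.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # still in the file header ("diff --git", "---", "+++")
        if raw.startswith("+"):
            added[new_line] = raw[1:]
            new_line += 1
        elif raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline" markers are not in the new file
        else:
            new_line += 1  # context line: present in the new file but unchanged
    return added


if __name__ == "__main__":
    sample = (
        "--- a/app.py\n"
        "+++ b/app.py\n"
        "@@ -1,3 +1,4 @@\n"
        " import os\n"
        "-x = 1\n"
        "+x = 2\n"
        "+y = 3\n"
        " print(x)\n"
    )
    print(changed_lines(sample))
```

Sending only these lines, perhaps with a little surrounding context, keeps the request small. The trade-off is that the model sees less of the file and may miss issues that depend on unchanged code.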