Field Manual · AI Code-Quality

Qodo.

How a quality-first AI platform actually thinks — and what to steal from it.

Read time ~14 min · From CodiumAI, rebranded 2024 · Core is Apache-2.0 · Sourced from aggressive web research

00 / Executive summary

The 30-second version

Read this and you can hold a conversation. The rest of the page earns you the deep version.

1. Qodo is a "quality-first" AI dev platform: its bet is that AI should verify and explain code, not just generate it.
2. Three surfaces over one brain: Gen (IDE), Merge (PR review), Command (CLI agents), all fed by a cross-repo Context Engine.
3. Qodo Merge = the old PR-Agent, which is open source (Apache-2.0) and the single best thing to read in the whole Qodo universe.
4. Two transferable ideas: flow engineering (don't one-shot hard tasks) and agentic RAG (retrieve code by structure + intent, not raw tokens).
5. The craft secret everywhere: force the model into typed structured output, then render and gate on it.

01 / Mental model

Quality is the product, not speed.

Qodo (Tel Aviv, founded 2022 by Itamar Friedman & Dedy Kredo; rebranded from CodiumAI in September 2024 alongside a $40M Series A) sells a thesis: most AI tooling optimizes for writing code faster, and almost nothing optimizes for trusting it afterwards.

That framing is exactly why it's worth studying if you care about explanation and evaluation. Qodo's center of gravity — review, test, audit, justify — is precisely the territory of a tool that exists to make people want to read an analysis of their code.

Everything below is one of two things: a product you might use, or an idea you should internalize. The ideas outlast the products.

02 / The product map

Three surfaces, one brain.

Naming churns constantly — PR-Agent became Qodo Merge; Qodo Gen CLI became Qodo Command. Don't let the renames confuse you.

Qodo Merge (OSS · Apache-2.0)

★★★ your closest blueprint

Formerly PR-Agent. AI pull-request review & automation across GitHub, GitLab, Bitbucket, Azure. Open-source core + hosted Pro tier.

Git platforms · CLI · CI · webhook

Qodo Command (Credits)

★★ structured analysis on tap

A CLI to build, run and schedule custom AI agents defined as TOML files. Can emit validated JSON, write files, run as an MCP server.

Terminal · CI · MCP

Qodo Gen (Free + Teams)

low — IDE authoring

The IDE plugin (VS Code, JetBrains): code generation, test generation, chat, local review.

VS Code · JetBrains

Context Engine (Aware · Enterprise)

★★ feeds whole-repo context

Agentic RAG across thousands of repos (semantic + architectural + temporal). Open Aware is the free MCP variant for public repos.

API · MCP

03 / Two ideas that matter

Internalize these and the rest is detail.

Idea 01

Flow engineering

Their AlphaCodium research (arXiv 2401.08500) lifted GPT-4's pass@5 accuracy on the CodeContests validation set from 19% to 44%, not with a bigger model, but by structuring the task into verified stages: reflect → propose → generate tests → run & fix.

Lesson for you: a good audit is never "dump the diff, ask for a review." It's describe → analyze → evaluate → rank → explain, each stage feeding the next.
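
A minimal sketch of that staged flow, assuming a generic `llm` callable that maps a prompt string to a response string. The names `run_flow`, `STAGES` and `fake_llm` are hypothetical and not taken from any Qodo code. The sketch shows only the "each stage feeds the next" wiring; the per-stage verification that AlphaCodium adds is left out.

```python
from typing import Callable, Dict, List

# Stage names come from the text above; everything else here is illustrative.
STAGES: List[str] = ["describe", "analyze", "evaluate", "rank", "explain"]


def run_flow(diff: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run each stage in order, passing all earlier results into the next prompt."""
    results: Dict[str, str] = {}
    for stage in STAGES:
        # Everything produced so far becomes context for the current stage.
        context = "\n".join(f"[{name}]\n{text}" for name, text in results.items())
        prompt = f"Stage: {stage}\n\nDiff:\n{diff}\n\nEarlier stages:\n{context}"
        results[stage] = llm(prompt)
    return results


def fake_llm(prompt: str) -> str:
    """Stand-in model (an assumption): echoes which stage it was asked to run."""
    stage = prompt.splitlines()[0].removeprefix("Stage: ")
    return f"{stage} done"


if __name__ == "__main__":
    for name, text in run_flow("+ log.info(token)", fake_llm).items():
        print(name, "->", text)
```

Swapping `fake_llm` for a real client keeps the flow unchanged, which is the point: the structure lives in the orchestration, not in the model.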

Idea 02

Agentic RAG

You can't paste a repo into a prompt. Qodo indexes with AST-aware chunking (a method kept with its class signature + imports) and embeds an LLM-written description next to each chunk, so retrieval matches intent, not just tokens.

Three retrieval axes — semantic, architectural, temporal — map almost 1:1 onto your scopes: module/repo, dependencies, and commit/branch history.
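
To make "a method kept with its class signature + imports" concrete, here is a small sketch using Python's standard `ast` module. It is an illustration of the chunking idea under simplifying assumptions (top-level classes only, single-line class headers), not Qodo's indexer; `chunk_methods` and `SAMPLE` are hypothetical names. The LLM-written description that would be embedded next to each chunk is omitted.

```python
import ast
from typing import List

SAMPLE = '''import hashlib

class Session:
    def refresh(self, token):
        return hashlib.sha256(token).hexdigest()

    def revoke(self, token):
        return None
'''


def chunk_methods(source: str) -> List[str]:
    """Return one chunk per method: module imports + class header + method source."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # Module-level imports travel with every chunk so names stay resolvable.
    imports = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    chunks: List[str] = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        header = lines[node.lineno - 1]  # the "class ...:" line, bases included
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # padded=True keeps the method's original indentation.
                body = ast.get_source_segment(source, item, padded=True)
                chunks.append("\n".join(imports + [header, body]))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_methods(SAMPLE):
        print(chunk, end="\n---\n")
```

Each chunk is self-describing: a retriever that returns `refresh` also returns the class it belongs to and the imports it relies on, rather than an orphaned function body.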

04 / Inside PR-Agent

The reference implementation.

It's open source and it already does the "analyze a changeset" half of any code-audit tool. Here is how a /review actually flows from a diff to a comment.

Provider (github · gitlab · local) → Ingest (hunks + line numbers) → Compress (fit context window) → Prompt (toml + jinja2) → LLM (via litellm) → Parse (forgiving YAML) → Publish (markdown comment)

Two pieces are pure gold to copy. First, the diff is reformatted with explicit __new hunk__ / __old hunk__ markers plus line numbers — that's the trick that lets the model cite an exact file:line. Second, the git-provider abstraction includes a local provider, so it diffs a local branch, not only a hosted PR.
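
The hunk reformatting can be sketched in a few lines. This is a simplified illustration of the idea, not PR-Agent's actual code: `split_hunk` is a hypothetical helper that handles a single unified-diff hunk and ignores edge cases such as "\ No newline at end of file" markers.

```python
import re
from typing import List

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

HUNK = """@@ -87,3 +87,3 @@ def refresh(refresh_token):
     session = load()
-    log.debug("refreshing")
+    log.info("refreshing %s", refresh_token)
     return session"""


def split_hunk(hunk: str) -> str:
    """Rewrite one unified-diff hunk as a numbered new side plus an old side."""
    lines = hunk.splitlines()
    match = HUNK_HEADER.match(lines[0])
    if match is None:
        raise ValueError("expected a unified-diff hunk header")
    new_no = int(match.group(2))  # first line number on the new side
    new_side: List[str] = []
    old_side: List[str] = []
    for line in lines[1:]:
        if line.startswith("+"):
            new_side.append(f"{new_no} {line}")
            new_no += 1
        elif line.startswith("-"):
            old_side.append(line)  # removed lines carry no new-file number
        else:
            # Context lines appear on both sides.
            new_side.append(f"{new_no} {line}")
            old_side.append(line)
            new_no += 1
    out = [lines[0], "__new hunk__", *new_side]
    if any(entry.startswith("-") for entry in old_side):
        out += ["__old hunk__", *old_side]
    return "\n".join(out)


if __name__ == "__main__":
    print(split_hunk(HUNK))
```

Because every line on the new side carries its file line number, the model can quote `88` directly instead of counting lines from the hunk header.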

Run it in ten minutes

terminal

pip install pr-agent
export OPENAI_KEY=sk-...                       # or ANTHROPIC_KEY, via litellm

# review a hosted pull request
pr-agent --pr_url https://github.com/owner/repo/pull/123 review
pr-agent --pr_url ...                          describe
pr-agent --pr_url ...                          improve

# steer it inline
pr-agent --pr_url ... review --pr_reviewer.extra_instructions="focus on security"

# a local branch with no hosted PR: set config.git_provider="local"

Config lives in settings/configuration.toml, overridable per-repo with a root .pr_agent.toml. The gap worth noticing: PR-Agent emits Markdown only and has no single-commit / module / whole-repo audit mode. That gap is the opportunity.

05 / Concrete anatomy

What the artifacts actually look like.

Seeing the real shapes is what turns "I know what Qodo is" into "I know how Qodo thinks." Examples are faithful in shape, paraphrased for clarity.

A. The output of `/review`

Note it's a table of typed fields, not prose — typed fields render into UI and gate CI; free prose does neither.

PR comment · markdown

## PR Review

**Estimated effort to review [1-5]:** 3 — touches auth middleware + a refresh path.
**Score:** 72
**Relevant tests:** No — the token-refresh path has no coverage.
**Possible security concern:** Yes — `refresh_token` logged at INFO in `auth/session.py:88`.

### Key issues to review
| Issue | File | Why it matters |
|-------|------|----------------|
| Missing null-check on `user` | `api/handler.py:42` | `AttributeError` on anonymous requests |
| Broad `except:` swallows errors | `core/db.py:130` | Hides connection failures |
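
Why typed fields matter is easiest to see in code. The sketch below is an assumption-laden illustration, not PR-Agent's renderer: the `Review` and `Issue` dataclasses, `render`, and the `passes_gate` policy (block on any security concern or a score under 70) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    header: str
    file: str
    line: int
    why: str


@dataclass
class Review:
    effort: int             # 1-5
    score: int              # 0-100
    relevant_tests: bool
    security_concern: str   # empty string means "none found"
    issues: List[Issue] = field(default_factory=list)


def render(review: Review) -> str:
    """Turn the typed review into the Markdown comment shown above."""
    rows = [f"| {i.header} | `{i.file}:{i.line}` | {i.why} |" for i in review.issues]
    return "\n".join([
        "## PR Review",
        f"**Estimated effort to review [1-5]:** {review.effort}",
        f"**Score:** {review.score}",
        f"**Relevant tests:** {'Yes' if review.relevant_tests else 'No'}",
        f"**Possible security concern:** {review.security_concern or 'No'}",
        "### Key issues to review",
        "| Issue | File | Why it matters |",
        "|-------|------|----------------|",
        *rows,
    ])


def passes_gate(review: Review, min_score: int = 70) -> bool:
    """Hypothetical CI policy: fail on any security concern or a low score."""
    return not review.security_concern and review.score >= min_score
```

The same object drives both outputs: `render` feeds the UI and `passes_gate` feeds CI. Free prose would need to be re-parsed to do either.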

B. The most instructive artifact — a prompt

Prompts are Jinja2 templates stored inside .toml files. The schema is the prompt: define the YAML shape, demand conformance, parse defensively.

settings/pr_reviewer_prompts.toml

[pr_review_prompt]
system = """You are PR-Reviewer, an AI reviewing a Git Pull Request.
Respond in YAML conforming exactly to this schema:

review:
  estimated_effort_to_review_[1-5]: int   # 1 = trivial, 5 = very hard
  score: str                              # 0-100
  relevant_tests: str                     # "yes" / "no"
  security_concerns: str                  # "No" or describe + file:line
  key_issues_to_review:
    - relevant_file: str
      issue_header: str
      issue_content: str
      start_line: int
"""

user = """PR title: {{ title }}
The diff (line-numbered, with '__new hunk__' / '__old hunk__' markers):
{{ diff }}
Respond ONLY with valid YAML matching the schema above."""
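
"Parse defensively" usually starts before the YAML loader runs, because models tend to wrap output in code fences or add a sentence of preamble. The helper below is a hedged sketch of that cleanup step only; `extract_yaml` is a hypothetical name, it is not PR-Agent's own fallback logic, and it does not handle chatter after the YAML.

```python
FENCE_MARK = "`" * 3  # built at runtime so this example contains no literal fence


def extract_yaml(raw: str, root_key: str = "review") -> str:
    """Strip code-fence lines and any text before the top-level root key."""
    lines = [
        ln for ln in raw.strip().splitlines()
        if not ln.strip().startswith(FENCE_MARK)
    ]
    for index, line in enumerate(lines):
        if line.startswith(f"{root_key}:"):
            return "\n".join(lines[index:]).rstrip()
    raise ValueError(f"no top-level '{root_key}:' key found in model output")


if __name__ == "__main__":
    raw = (
        "Sure, here is the review:\n"
        + FENCE_MARK + "yaml\n"
        + "review:\n  score: 72\n"
        + FENCE_MARK + "\n"
    )
    print(extract_yaml(raw))
```

The cleaned string would then go to a safe YAML loader (for example PyYAML's `yaml.safe_load`, if that library is available), with a retry or repair pass when loading still fails.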

C. A Qodo Command agent

A "custom agent" is just a TOML block. output_schema forces validated JSON; execution_strategy="plan" is flow-engineering as a knob; exit_expression turns analysis into a CI gate.

agent.toml

[commands.audit_module]
description = "Audit a module and emit structured findings"
instructions = "Analyze files under {{ module_path }}; rank issues by severity."
arguments = [
  { name = "module_path", type = "string", required = true },
]
tools = ["filesystem", "ripgrep", "git", "qodo_aware_context_retriever"]
execution_strategy = "plan"            # multi-step  (vs "act" = one-shot)
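output_schema = """
{
  "type": "object",
  "properties": { "issues": { "type": "array" } },
  "required": ["issues"]
}
"""                                    # assumed shape (JSON Schema); defines the `issues` field gated below
exit_expression = "issues.length == 0" # CI pass / fail gate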

D. The pattern you'll reuse most — annotate the diff

The single most persuasive format: render the real change, drop a severity-colored callout at the exact line. Showing beats telling.

Critical · outage / exploit

Warning · measurable regression

Suggestion · improvement

auth/session.py · annotated

  def refresh(refresh_token):
-     log.debug("refreshing")
+     log.info("refreshing %s", refresh_token)

⚠ Critical · security · line 88

The raw refresh_token is now written to INFO-level logs, which are typically shipped off-box and retained. Log a redacted prefix instead, e.g. refresh_token[:6] + "…".
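
A plain-text version of this annotation pattern can be sketched as follows. The `Finding` type, the `annotate` function and the icon labels are hypothetical; a real renderer would emit HTML or Markdown with colors rather than a text marker.

```python
from dataclasses import dataclass
from typing import Dict, List

ICONS = {
    "critical": "⚠ Critical",
    "warning": "● Warning",
    "suggestion": "○ Suggestion",
}


@dataclass
class Finding:
    new_line: int   # line number in the new version of the file
    severity: str   # "critical" | "warning" | "suggestion"
    message: str


def annotate(diff_lines: List[str], start: int, findings: List[Finding]) -> str:
    """Render diff lines and place each callout under the new-file line it cites."""
    by_line: Dict[int, List[Finding]] = {}
    for finding in findings:
        by_line.setdefault(finding.new_line, []).append(finding)

    out: List[str] = []
    line_no = start  # new-file number of the first non-removed line
    for text in diff_lines:
        out.append(text)
        if text.startswith("-"):
            continue  # removed lines have no new-file line number
        for finding in by_line.get(line_no, []):
            out.append(f"    >> {ICONS[finding.severity]} · line {line_no}: {finding.message}")
        line_no += 1
    return "\n".join(out)


if __name__ == "__main__":
    diff = [
        "  def refresh(refresh_token):",
        '-     log.debug("refreshing")',
        '+     log.info("refreshing %s", refresh_token)',
    ]
    print(annotate(diff, 87, [Finding(88, "critical", "raw token written to INFO logs")]))
```

The callout is keyed on the new-file line number, so it stays attached to the right line even when removed lines sit between the context and the change.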

06 / Compression strategy

How a huge diff fits in a small window.

The hardest part of any code-analysis tool. Worth knowing cold.

Additions > deletions. Collapse deleted files to a one-line list; drop deletion-only hunks.
Token-cost sort. Group files by language, with the repo's most common languages first; within each language, estimate tokens per file with tiktoken and sort largest first.
Greedy packing. Add patches until a buffer threshold below the model max; clip any single oversized patch.
List the rest. Files that didn't fit are named ("other modified files") so the model knows they exist.
Reserve output budget. Hold back tokens so the response isn't truncated mid-YAML.
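
The packing steps above can be sketched as a greedy loop. This is a simplified illustration, not PR-Agent's implementation: `estimate_tokens` is a crude stand-in for a real tokenizer such as tiktoken (roughly four characters per token), the largest-first ordering is an assumption, and language grouping and clipping of oversized patches are left out.

```python
from typing import Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assume about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_patches(
    patches: Dict[str, str],
    max_tokens: int,
    output_reserve: int,
    buffer: int,
) -> Tuple[List[str], List[str]]:
    """Return (files whose patches fit, files to list as 'other modified files')."""
    # Hold back room for the response and a safety margin below the model max.
    budget = max_tokens - output_reserve - buffer
    ordered = sorted(
        patches, key=lambda name: estimate_tokens(patches[name]), reverse=True
    )
    included: List[str] = []
    leftover: List[str] = []
    used = 0
    for name in ordered:
        cost = estimate_tokens(patches[name])
        if used + cost <= budget:
            included.append(name)
            used += cost
        else:
            leftover.append(name)  # named in the prompt, patch omitted
    return included, leftover


if __name__ == "__main__":
    demo = {"a.py": "x" * 400, "b.py": "x" * 200, "c.py": "x" * 120}
    print(pack_patches(demo, max_tokens=200, output_reserve=40, buffer=20))
```

Files that land in `leftover` are not dropped silently: their names go into the prompt so the model knows the changeset is larger than what it can see.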

07 / Glossary

Lock in the vocabulary.

Flow engineering: Structuring an LLM task into verified multi-step stages instead of one prompt. AlphaCodium's core idea.

Agentic RAG: Retrieval where an agent iteratively re-queries and reasons over results, vs one-shot fetch-then-answer.

PR compression: The token-aware diff-packing that fits a large changeset into a context window.

Structured output: Forcing the model into a typed YAML/JSON schema so the result is machine-renderable and gate-able.

Context Engine: Qodo's multi-repo index: AST chunking + NL descriptions + semantic / architectural / temporal retrieval.

MCP: Model Context Protocol, the open standard Qodo uses both to expose its tools and context to clients such as Claude and Cursor and to consume tools that other MCP servers expose.

08 / Go deeper

The canonical sources, in order.

github.com/qodo-ai/pr-agent
Read settings/*_prompts.toml and algo/pr_processing.py. The highest-signal artifact in the whole Qodo universe.
qodo-merge-docs.qodo.ai
The "core-abilities" pages on compression strategy and metadata.
arXiv:2401.08500 — AlphaCodium
The intellectual foundation: flow engineering + test-based iteration.
qodo.ai/blog — "RAG for large-scale code repos"
The Context Engine internals: AST chunking, embedded descriptions, retrieval axes.
github.com/qodo-ai/agents
Example agents — look at ArchMind and Changelog Generator, the closest analogs to "audit → document."
deepwiki.com/qodo-ai/pr-agent
An auto-generated architecture map if you want orientation before the source.

09 / Pitfalls & honest caveats

What the marketing won't tell you.

Names drift constantly. PR-Agent ↔ Qodo Merge, Gen CLI ↔ Command. Always check which era a blog post is from.

Pricing numbers in third-party articles conflict ($19 vs $30/user; 75 vs 250 free credits). Trust only qodo.ai/pricing.

Some 2026 claims are unverified. Mentions of a "Series B", a "Qodo 2.0", and SWE-bench scores mostly trace to SEO sites, not primary sources.

Qodo is not deterministic SAST. It complements Sonar's rules and gates; it doesn't replace them. AI review is probabilistic — it can miss, and it can hallucinate.
