Single-repo LLM audit — landscape & recommendation

TL;DR

Scope: one repo at a time. Tooling is chosen by the repo's stack. Two outputs: a whole-repo audit and a per-MR review packet.

Reframe: Semgrep is AppSec, not architecture or quality. Use stack-native arch tools (dependency-cruiser, ArchUnitNET, import-linter, Conftest) and quality tools (Sonar, kube-linter, ruff, etc.) on their own axes.

Pipeline: detect stack → run stack-native analyzers → load CLAUDE.md rulebook → Claude-Agent-SDK triages SARIF + reasons about architecture → Markdown out → render to HTML packet.

Recommended path: extend sw-audit with two per-repo commands: audit and review <mr|range>. Lean on PR-Agent for review prompts; lean on tech-debt-skill / fan-out-audit / metis patterns for whole-repo. Markdown canonical → static HTML.

The reframe — four independent axes

The earlier report leaned on Semgrep across axes. Correction: Semgrep is one input to the security axis; each axis has its own canonical tooling per stack.

Axis	Canonical tooling (cross-stack)	LLM adds
Security (SAST + SCA + secrets + IaC)	Snyk / Trivy / Checkov / Bandit / Semgrep / Roslyn-security / kube-linter	Triage, dedup, exploitability rationale
Architecture / layering	dependency-cruiser, ts-arch, ArchUnitNET, NetArchTest, import-linter, Conftest/OPA	Narrative across modules; cite Clean-Arch rule violated
Quality / smells / debt	Sonar family, ESLint+SonarJS, Roslyn+SonarAnalyzer, ruff+pylint, tflint, kube-linter, CodeScene	Dedup overlapping rules; rank by blast radius
Custom rules / fitness functions	OPA/Conftest (IaC/K8s), Roslyn custom analyzers (.NET/Unity), dep-cruiser/ts-arch rules (TS), import-linter contracts (Py)	Reason about rules too soft for static encoding ("should-not-have-done")

Per-stack recipes

For each stack: minimum analyzer set, where to put custom rules, and what the LLM layer adds on top.

NestJS / TypeScript backend

Security: npm audit / Snyk / OSV; eslint-plugin-security, eslint-plugin-nestjs-security; Semgrep TS rules
Architecture: dependency-cruiser (SARIF), ts-arch, Madge (cycles), eslint-plugin-boundaries or Sheriff; Nx tags if monorepo
Quality: ESLint + @typescript-eslint + SonarJS + eslint-plugin-import + eslint-plugin-nestjs-typed; knip (dead exports); tsc --noEmit
Custom rules: dep-cruiser forbidden rules in .dependency-cruiser.cjs; ts-arch as Jest tests; ESLint rules for Nest decorator misuse

LLM: ingest SARIF (ESLint, SonarJS, Semgrep) + dep-cruiser JSON → cluster by module/controller, explain controller→repo bypasses, rank by blast radius.

C# / .NET backend

Security: Security Code Scan, Roslyn security rules, dotnet list package --vulnerable, Snyk/OSV
Architecture: ArchUnitNET (fluent xUnit assertions) and/or NetArchTest; NsDepCop (namespace deps); NDepend (CQLinq, commercial, deep)
Quality: SonarAnalyzer.CSharp, StyleCop.Analyzers, JetBrains InspectCode CLI (SARIF), dotnet format
Custom rules: Custom Roslyn analyzers for project Clean-Arch rules; CQLinq in NDepend; xUnit project hosting ArchUnitNET asserts

LLM: feed Roslyn/Sonar SARIF + ArchUnitNET text + NDepend XML → layered-violation narrative, dedup Sonar smells vs. ArchUnit findings, severity rationale.

Unity client C# / C++ · thin tooling

Security: Limited. Treat as C# (Roslyn security) + asset/secret scanning (Trivy fs, gitleaks)
Architecture: ArchUnitNET / NetArchTest against built assemblies in Edit-mode tests; NDepend on the generated SLN
Quality: Microsoft.Unity.Analyzers (official, active), JetBrains Rider Unity inspections via InspectCode, SonarQube on the generated SLN
Custom rules: Custom Roslyn analyzers for project conventions (no GameObject.Find, no Resources.Load, MonoBehaviour boundary rules); IL post-processors for deeper checks

LLM: cross-reference analyzer output with Unity Profiler/Memory Profiler exports → narrate "perf-architecture" issues (e.g., GC-alloc hotspots tied to MonoBehaviour anti-patterns). No silver bullet here.

Terraform infra

Security: Trivy (trivy config, tfsec merged in) and/or Checkov (broadest rules, native SARIF); KICS alternative
Architecture: Conftest + OPA/Rego on terraform show -json plan.json (module composition, tag/region/network constraints); terraform-compliance for BDD; CUE if schema-first
Quality: tflint (provider plugins), terraform fmt -check, terraform validate
Custom rules: Rego in policy/ via Conftest; Checkov custom YAML/Python; tflint custom rules

LLM: SARIF (Checkov/Trivy) + Conftest JSON + Infracost diff → triage by blast radius (public S3 in prod), explain drift, group remediations per module.

Helm / Kubernetes manifests

Security: kubescape (NSA/CIS/MITRE, SARIF), Trivy k8s/config, Checkov k8s
Architecture: Conftest/OPA Rego against rendered manifests (helm template | conftest test -); required labels, NetworkPolicy, namespace shape. Note: datree is sunset; use Monokle / kubescape.
Quality: kube-linter (StackRox), Polaris (Fairwinds), kube-score, kubeconform (kubeval deprecated), helm lint
Custom rules: Conftest Rego in policy/; kube-linter custom templates; Polaris custom checks

LLM: ingest kube-linter/Polaris/kubescape SARIF + rendered manifest tree → cluster per workload, explain prod-readiness gaps, propose chart-level (not per-manifest) fixes.

Python data platform · data

Security: bandit (or ruff S-rules), pip-audit / Snyk, Semgrep py rules
Architecture: import-linter (layers/forbidden/independence contracts in .importlinter), pydeps / grimp for graphs; lint-imports in CI
Quality: Ruff (replaces flake8/isort/pyupgrade/pydocstyle/black-format), pyright or mypy, pylint for deeper smells, vulture (dead code)
Custom rules: import-linter contracts; Ruff per-file selectors; Semgrep custom py rules; dbt-checkpoint / sqlfluff if dbt/SQL

LLM: SARIF (Ruff, Bandit, Pyright) + import-linter text → narrate layer violations ("domain/ importing infra/"), dedup Ruff/Pylint overlap, rank by pipeline criticality.

Stack detection (single-repo driver)

package.json → Node. dependencies contains @nestjs/core → NestJS recipe. nx.json → Nx variant.
*.sln / *.csproj → .NET. Presence of ProjectSettings/ProjectVersion.txt or Assets/ + Packages/manifest.json → Unity recipe.
*.tf / terragrunt.hcl → Terraform.
Chart.yaml → Helm. Else kustomization.yaml or any *.yaml with apiVersion:/kind: → K8s manifests.
pyproject.toml / requirements*.txt → Python. dbt_project.yml → dbt overlay.
Tie-break: largest LOC stack wins. Multi-stack repo → run multiple recipes, tag findings with stack.
Output: each tool writes to .audit/<stack>/<tool>.sarif; driver merges via SARIF multi-tool runs.
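
The fingerprint rules above can be sketched as a small detector. This is a minimal illustration, not the sw-audit implementation: the function name `detect_stacks` and the recipe labels (`nestjs`, `dotnet`, and so on) are hypothetical. It also makes two assumptions the list does not state: Unity takes precedence over plain .NET when both match, and the LOC tie-break is left out, so every matching recipe is returned.

```python
import json
from pathlib import Path


def _has(repo: Path, pattern: str) -> bool:
    """True if at least one path matching the glob exists anywhere in the repo."""
    return next(repo.rglob(pattern), None) is not None


def _is_k8s_manifest(path: Path) -> bool:
    """Cheap textual check: top-level apiVersion: and kind: keys are both present."""
    try:
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    except OSError:
        return False
    return (any(line.startswith("apiVersion:") for line in lines)
            and any(line.startswith("kind:") for line in lines))


def detect_stacks(repo: Path) -> list[str]:
    """Return every recipe label whose fingerprint is present (labels are hypothetical)."""
    stacks: list[str] = []

    pkg = repo / "package.json"
    if pkg.is_file():
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            data = {}
        deps = (data.get("dependencies") or {}) if isinstance(data, dict) else {}
        stacks.append("nestjs" if "@nestjs/core" in deps else "node")
        if (repo / "nx.json").is_file():
            stacks.append("nx")

    # Assumption: a Unity project is reported as "unity", not additionally as "dotnet".
    is_unity = (repo / "ProjectSettings" / "ProjectVersion.txt").is_file() or (
        (repo / "Assets").is_dir() and (repo / "Packages" / "manifest.json").is_file()
    )
    if is_unity:
        stacks.append("unity")
    elif _has(repo, "*.sln") or _has(repo, "*.csproj"):
        stacks.append("dotnet")

    if _has(repo, "*.tf") or _has(repo, "terragrunt.hcl"):
        stacks.append("terraform")

    if _has(repo, "Chart.yaml"):
        stacks.append("helm")
    elif _has(repo, "kustomization.yaml") or any(
        _is_k8s_manifest(p) for p in repo.rglob("*.yaml")
    ):
        stacks.append("k8s")

    if (repo / "pyproject.toml").is_file() or _has(repo, "requirements*.txt"):
        stacks.append("python")
    if _has(repo, "dbt_project.yml"):
        stacks.append("dbt")

    return stacks
```

A real driver would likely skip vendored directories such as node_modules before globbing; the sketch walks the whole tree for brevity.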

Canonical pipeline

1. Detect stack: Lockfiles + manifest fingerprints pick the recipe.

2. Run analyzers: Stack-native security / arch / quality tools emit SARIF.

3. Load rulebook: CLAUDE.md + .claude/skills/audit/SKILL.md injected at session start.

4. Index repo: Aider repo-map / Claude Code @codebase for navigation.

5. Agent fan-out: Per-module subagents apply rulebook over small file batches.

6. LLM triage: Dedup SARIF; explain; sketch fixes (Metis pattern).

7. Severity + cite: file:line citations, severity bucket, effort estimate.

8. Emit packet: AUDIT.md canonical → audit.html; audit.sarif sidecar.
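
Steps 6 and 7 have a deterministic part that can run before any model call: collapsing overlapping SARIF results and attaching file:line citations. The sketch below is one possible shape, assuming SARIF 2.1.0 logs already parsed into dicts. The `Finding` class, the `triage` function, and the dedup key (same file and start line) are illustrative choices, not something the tools above prescribe; a missing `level` is treated as `warning`.

```python
from dataclasses import dataclass, field

# SARIF result levels, ordered from least to most severe.
LEVEL_RANK = {"none": 0, "note": 1, "warning": 2, "error": 3}


@dataclass
class Finding:
    uri: str
    line: int
    level: str
    message: str
    sources: set = field(default_factory=set)  # "tool:ruleId" labels that reported it

    def cite(self) -> str:
        """file:line citation for the report."""
        return f"{self.uri}:{self.line}"


def triage(sarif_logs: list) -> list:
    """Merge results from several SARIF logs, deduplicating on (file, start line).

    Heuristic assumption: two tools flagging the same line describe the same
    issue. The most severe level and its message are kept.
    """
    merged = {}
    for log in sarif_logs:
        for run in log.get("runs", []):
            tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
            for result in run.get("results", []):
                locations = result.get("locations") or [{}]
                phys = locations[0].get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "<unknown>")
                line = phys.get("region", {}).get("startLine", 0)
                level = result.get("level", "warning")
                text = result.get("message", {}).get("text", "")
                label = f"{tool}:{result.get('ruleId', '?')}"

                existing = merged.get((uri, line))
                if existing is None:
                    merged[(uri, line)] = Finding(uri, line, level, text, {label})
                    continue
                existing.sources.add(label)
                if LEVEL_RANK.get(level, 0) > LEVEL_RANK.get(existing.level, 0):
                    existing.level, existing.message = level, text

    # Most severe first, then stable by location.
    return sorted(
        merged.values(),
        key=lambda f: (-LEVEL_RANK.get(f.level, 0), f.uri, f.line),
    )
```

The model then only sees the merged list, which keeps the prompt small and leaves it the parts that need judgment: exploitability, narrative, and fix sketches.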

Whole-repo audit — top picks

1. ksimback/tech-debt-skill · github.com/ksimback/tech-debt-skill

Claude Code skill; produces TECH_DEBT_AUDIT.md with file-cited findings, severity, effort. Per-repo override via .claude/skills/tech-debt-audit/SKILL.md. MIT.

Tradeoff: tech-debt slant; extend the skill for Clean-Arch rules.

2. lachiejames/fan-out-audit · github.com/lachiejames/fan-out-audit

Slash command spawning ~200 parallel Claude subagents over small file batches; explicit precision-vs-context-bloat design. Whole-repo, target-dir scopable.

Tradeoff: raw token cost; needs orchestration around it.

3. arm/metis · github.com/arm/metis

OSS AI security review with native SARIF triage (--triage); validates third-party SAST; supports vLLM / Ollama for local models.

Tradeoff: security-shaped; useful as the SARIF-triage component, not the whole audit.

Also: evilsocket/audit · github.com/evilsocket/audit

8-stage Claude-Agent-SDK pipeline with prompts + JSON schemas + SQLite state. Closest "single-repo audit driver" reference implementation.

MR / diff review packet — top picks

1. Qodo PR-Agent · github.com/qodo-ai/pr-agent · GitLab self-host docs

/describe /review /improve; hierarchical best_practices.md with per-folder/per-group scope; on-prem GitLab; CLI returns Markdown/JSON.

Tradeoff: no first-class standalone HTML — you wrap and template.

2. anthropics/claude-code-security-review · github.com/anthropics/claude-code-security-review

Official GitHub Action + /security-review slash command; CLAUDE.md-driven; reads pending diff.

Tradeoff: GitHub-bound, security-focused; pattern is reusable for GitLab via Claude Agent SDK.

3. GitLab Duo Code Review Flow · docs.gitlab.com/.../code_review/

GA from GitLab 18.8 (Jan 2026), Sonnet 4.6 in 19.1; supports repo-specific review instructions; self-managed via Duo Self-Hosted AI Gateway.

Tradeoff: UI-only output today, no portable review packet, no arbitrary commit range.

Also: codedog · Greptile · CodeRabbit · kodus/Kody

codedog emits a Markdown report; Greptile has rich .greptile/rules.md; CodeRabbit has .coderabbit.yaml learnings; Kody is OSS, model-agnostic.

Rulebook ingestion — pick one

Use `CLAUDE.md` (root) + `.claude/skills/<audit>/SKILL.md`

Two reasons:

Progressive loading in Claude Code: SKILL.md loads only when the audit runs, so the large rulebook doesn't bloat every session; the small CLAUDE.md only dispatches.
Co-located with the repo, version-controlled, reusable from CLI, GitHub/GitLab Action, or Claude Agent SDK.

For MR review only, Qodo's best_practices.md is more expressive — drive Qodo from the same Markdown via symlink / extra_instructions.

Report format — pick one

Markdown canonical → static HTML render

Markdown is what every agent (Claude, Qodo, Greptile, codedog, Metis) emits natively, and what GitLab/GitHub render inline.
Pipe through pandoc or a single marked + Prism + diff2html template for the standalone "review packet" HTML/PDF.
Generating HTML directly from the LLM wastes tokens, breaks diffability, and loses the GitLab MR-comment path.
Pair with audit.sarif sidecar for GitHub/GitLab Code Scanning ingestion.
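
The pandoc route can be wrapped in a few lines. This is a minimal sketch, assuming pandoc is installed and on PATH; the function names and the default title are hypothetical, and the marked + Prism + diff2html template route is not covered. Only long-standing pandoc options are used (`--from`, `--to`, `--standalone`, `--metadata`, `--output`).

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(markdown: Path, html: Path, title: str) -> list:
    """Build the argv for rendering GitHub-flavored Markdown to a standalone HTML page."""
    return [
        "pandoc",
        "--from", "gfm",          # GitHub-flavored Markdown input
        "--to", "html",
        "--standalone",           # full document with <head>, not a fragment
        "--metadata", f"title={title}",
        "--output", str(html),
        str(markdown),
    ]


def render(markdown: Path, html: Path, title: str = "Audit report") -> None:
    """Render AUDIT.md-style Markdown to HTML; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    subprocess.run(pandoc_command(markdown, html, title), check=True)
```

Keeping the command builder separate from the subprocess call makes the rendering step testable without pandoc installed, e.g. `render(Path("AUDIT.md"), Path("audit.html"))` in the driver and `pandoc_command(...)` in unit tests.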

Three ways forward

	A. Integrate heavy	B. Thin custom on top of OSS pieces (recommended)	C. All-custom from scratch
Whole-repo audit	Qodo self-host "rule system"; or commercial Sonar + CodeScene	tech-debt-skill / fan-out-audit pattern + Claude Agent SDK; analyzers per stack; Metis-style SARIF triage	Hand-rolled prompts + own orchestrator
MR / diff packet	GitLab Duo Code Review Flow inline; abandon HTML packet	GitLab API → PR-Agent CLI → diff2html + Jinja → static HTML	Hand-rolled prompts + UI
Rulebook	Per-tool config (Qodo, Sonar)	`CLAUDE.md` + `SKILL.md` (single source); also feeds PR-Agent `best_practices.md`	Custom prompt template
Engineering cost	Low	Medium	High
Lock-in / recurring $	High	Low (Apache-2.0 / MIT)	None
Risk	Vendor roadmap	PR-Agent / skill prompts (forkable)	Prompt-engineering bills + time

Recommendation

Path B — thin per-repo CLI on top of OSS pieces

Two new sw-audit commands, both single-repo:

sw-audit audit <repo> — detect stack, run stack-native analyzers, load CLAUDE.md rulebook, fan out Claude subagents (tech-debt-skill / fan-out-audit pattern), triage SARIF (Metis-style), emit AUDIT.md → render audit.html.
sw-audit review <repo> <mr|range> — fetch diff + commits + before/after trees via GitLab API, run PR-Agent CLI with project rulebook, render via diff2html + Jinja into a self-contained review.html packet.
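
The command surface for these two entry points could look like the following argparse skeleton. It is a sketch under stated assumptions: the positional name `target`, the `classify_target` helper, and the convention that a bare number or `!123` means a merge request while anything containing `..` means a commit range are all illustrative, not part of sw-audit today.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI skeleton with the two single-repo subcommands."""
    parser = argparse.ArgumentParser(prog="sw-audit")
    sub = parser.add_subparsers(dest="command", required=True)

    audit = sub.add_parser("audit", help="whole-repo audit, emits AUDIT.md and audit.html")
    audit.add_argument("repo", help="path to the repository")

    review = sub.add_parser("review", help="MR or commit-range review packet")
    review.add_argument("repo", help="path to the repository")
    review.add_argument("target", help="merge request ID (e.g. 123 or !123) or commit range (a..b)")
    return parser


def classify_target(target: str) -> str:
    """Return 'range' for a..b / a...b, 'mr' for a numeric MR reference."""
    if ".." in target:
        return "range"
    if target.lstrip("!").isdigit():
        return "mr"
    raise ValueError(f"not a merge request ID or commit range: {target!r}")


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.command == "review":
        print(args.command, args.repo, classify_target(args.target))
    else:
        print(args.command, args.repo)
```

Accepting both forms in one positional keeps the `review <mr|range>` shape from the recommendation while letting the driver decide whether to call the GitLab API or diff locally.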

Main tradeoff: you maintain the glue between detector, analyzers, and LLM layer. Mitigation: each piece is replaceable; rulebook lives in the repo, not in the tool.

Highest-value links

ksimback/tech-debt-skill — whole-repo Claude Code audit reference
lachiejames/fan-out-audit — parallel subagent pattern
arm/metis — SARIF-native LLM triage, local-model support
anthropics/claude-code-security-review — official diff-review action
qodo-ai/pr-agent · PR-Agent GitLab self-host
Qodo Rule System / best_practices.md
GitLab Duo Code Review Flow (GA 18.8, Sonnet 4.6 in 19.1)
dependency-cruiser — TS/JS arch SARIF
ArchUnitNET — .NET arch rules
import-linter — Python layered contracts
kube-linter · kubescape
Checkov · Trivy
codedog — Markdown review report
diff2html
evilsocket/audit — 8-stage Claude-Agent-SDK pipeline

Generated for the sw-audit project · 2026-05-23 · single-repo scope, two outputs, four axes