TL;DR
CLAUDE.md rulebook → Claude-Agent-SDK triages SARIF + reasons about architecture → Markdown out → render to HTML packet.
sw-audit with two per-repo commands: audit and review <mr|range>. Lean on PR-Agent for review prompts; lean on tech-debt-skill / fan-out-audit / metis patterns for whole-repo. Markdown canonical → static HTML.
The reframe: four independent axes
The earlier report leaned on Semgrep across axes. Correction: Semgrep is one input to the security axis; each axis has its own canonical tooling per stack.
| Axis | Canonical tooling (cross-stack) | LLM adds |
|---|---|---|
| Security (SAST + SCA + secrets + IaC) | Snyk / Trivy / Checkov / Bandit / Semgrep / Roslyn-security / kube-linter | Triage, dedup, exploitability rationale |
| Architecture / layering | dependency-cruiser, ts-arch, ArchUnitNET, NetArchTest, import-linter, Conftest/OPA | Narrative across modules; cite Clean-Arch rule violated |
| Quality / smells / debt | Sonar family, ESLint+SonarJS, Roslyn+SonarAnalyzer, ruff+pylint, tflint, kube-linter, CodeScene | Dedup overlapping rules; rank by blast radius |
| Custom rules / fitness functions | OPA/Conftest (IaC/K8s), Roslyn custom analyzers (.NET/Unity), dep-cruiser/ts-arch rules (TS), import-linter contracts (Py) | Reason about rules too soft for static encoding ("should-not-have-done") |
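
The table assigns dedup and ranking to the LLM layer. A cheap deterministic pre-pass could shrink what the model has to read; the following is a minimal sketch of that idea, assuming findings have already been normalized into a hypothetical `Finding` record (the record shape, `SEVERITY_RANK`, and the "same file and line" key are illustrative assumptions, not something the tools above define).

```python
from dataclasses import dataclass

# Hypothetical ordering over SARIF-style levels; real tools use differing scales.
SEVERITY_RANK = {"error": 3, "warning": 2, "note": 1, "none": 0}


@dataclass(frozen=True)
class Finding:
    tool: str
    rule_id: str
    path: str
    line: int
    level: str  # error | warning | note | none


def dedup_findings(findings):
    """Collapse findings that point at the same file and line.

    Keeps the highest-severity finding per location and records every
    tool:rule pair that reported it, so the overlap stays visible.
    """
    merged = {}
    for f in findings:
        entry = merged.setdefault((f.path, f.line), {"best": f, "sources": []})
        entry["sources"].append(f"{f.tool}:{f.rule_id}")
        if SEVERITY_RANK.get(f.level, 0) > SEVERITY_RANK.get(entry["best"].level, 0):
            entry["best"] = f
    # Highest severity first, then most-corroborated, then by location.
    return sorted(
        merged.values(),
        key=lambda e: (
            -SEVERITY_RANK.get(e["best"].level, 0),
            -len(e["sources"]),
            e["best"].path,
            e["best"].line,
        ),
    )
```

Ranking by blast radius or exploitability still needs the reasoning layer; this only removes exact-location duplicates before that step.
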
Per-stack recipes
For each stack: minimum analyzer set, where to put custom rules, and what the LLM layer adds on top.
NestJS / TypeScript backend
- Security: `npm audit` / Snyk / OSV; `eslint-plugin-security`, `eslint-plugin-nestjs-security`; Semgrep TS rules
- Architecture: dependency-cruiser (SARIF), ts-arch, Madge (cycles), `eslint-plugin-boundaries` or Sheriff; Nx tags if monorepo
- Quality: ESLint + `@typescript-eslint` + SonarJS + `eslint-plugin-import` + `eslint-plugin-nestjs-typed`; knip (dead exports); `tsc --noEmit`
- Custom rules: dep-cruiser `forbidden` in `.dependency-cruiser.cjs`; ts-arch as Jest tests; ESLint rules for Nest decorator misuse
C# / .NET backend
- Security: Security Code Scan, Roslyn security rules, `dotnet list package --vulnerable`, Snyk/OSV
- Architecture: ArchUnitNET (fluent xUnit assertions) and/or NetArchTest; NsDepCop (namespace deps); NDepend (CQLinq, commercial, deep)
- Quality: SonarAnalyzer.CSharp, StyleCop.Analyzers, JetBrains InspectCode CLI (SARIF), `dotnet format`
- Custom rules: Custom Roslyn analyzers for project Clean-Arch rules; CQLinq in NDepend; xUnit project hosting ArchUnitNET asserts
Unity client C# / C++ · thin tooling
- Security: Limited. Treat as C# (Roslyn security) + asset/secret scanning (Trivy fs, gitleaks)
- Architecture: ArchUnitNET / NetArchTest against built assemblies in Edit-mode tests; NDepend on the generated SLN
- Quality: Microsoft.Unity.Analyzers (official, active), JetBrains Rider Unity inspections via `InspectCode`, SonarQube on the generated SLN
- Custom rules: Custom Roslyn analyzers for project conventions (no `GameObject.Find`, no `Resources.Load`, MonoBehaviour boundary rules); IL post-processors for deeper checks
Terraform infra
- Security: Trivy (`trivy config`, tfsec merged in) and/or Checkov (broadest rules, native SARIF); KICS alternative
- Architecture: Conftest + OPA/Rego on `terraform show -json plan.json` (module composition, tag/region/network constraints); terraform-compliance for BDD; CUE if schema-first
- Quality: tflint (provider plugins), `terraform fmt -check`, `terraform validate`
- Custom rules: Rego in `policy/` via Conftest; Checkov custom YAML/Python; tflint custom rules
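
The recipe's tool of choice for tag constraints is Rego via Conftest. Purely to illustrate what such a policy inspects, here is a minimal Python sketch that walks the `resource_changes` array of a Terraform JSON plan and reports resources missing required tags. The `REQUIRED_TAGS` set is a hypothetical project convention, and this is an illustration of the check, not a substitute for the Rego policy.

```python
REQUIRED_TAGS = {"owner", "env"}  # hypothetical project convention


def missing_tags(plan: dict, required=REQUIRED_TAGS) -> dict:
    """Map resource address -> sorted list of required tags it lacks.

    `plan` is the parsed JSON plan representation. Resources being
    deleted (change.after is null) or without a `tags` attribute are skipped.
    """
    violations = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if not isinstance(after, dict) or "tags" not in after:
            continue
        tags = after.get("tags") or {}
        missing = sorted(set(required) - set(tags))
        if missing:
            violations[rc["address"]] = missing
    return violations
```

With a parsed plan in hand (for example via `json.load`), `missing_tags(plan)` returns an empty dict when every taggable resource carries the required keys.
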
Helm / Kubernetes manifests
- Security: kubescape (NSA/CIS/MITRE, SARIF), Trivy k8s/config, Checkov k8s
- Architecture: Conftest/OPA Rego against rendered manifests (`helm template | conftest test -`); required labels, NetworkPolicy, namespace shape. Note: datree is sunset; use Monokle / kubescape.
- Quality: kube-linter (StackRox), Polaris (Fairwinds), kube-score, kubeconform (kubeval deprecated), `helm lint`
- Custom rules: Conftest Rego in `policy/`; kube-linter custom templates; Polaris custom checks
Python data platform data
- Security: bandit (or ruff S-rules), `pip-audit` / Snyk, Semgrep py rules
- Architecture: import-linter (layers/forbidden/independence contracts in `.importlinter`), pydeps / grimp for graphs; `lint-imports` in CI
- Quality: Ruff (replaces flake8/isort/pyupgrade/pydocstyle/black-format), pyright or mypy, pylint for deeper smells, vulture (dead code)
- Custom rules: import-linter contracts; Ruff per-file selectors; Semgrep custom py rules; dbt-checkpoint / sqlfluff if dbt/SQL
domain/ importing infra/"), dedup Ruff/Pylint overlap, rank by pipeline criticality.
Stack detection (single-repo driver)
- `package.json` → Node. `dependencies` contains `@nestjs/core` → NestJS recipe. `nx.json` → Nx variant.
- `*.sln` / `*.csproj` → .NET. Presence of `ProjectSettings/ProjectVersion.txt` or `Assets/` + `Packages/manifest.json` → Unity recipe.
- `*.tf` / `terragrunt.hcl` → Terraform.
- `Chart.yaml` → Helm. Else `kustomization.yaml` or any `*.yaml` with `apiVersion:` / `kind:` → K8s manifests.
- `pyproject.toml` / `requirements*.txt` → Python. `dbt_project.yml` → dbt overlay.
- Tie-break: largest LOC stack wins. Multi-stack repo → run multiple recipes, tag findings with stack.
- Output: each tool writes to `.audit/<stack>/<tool>.sarif`; driver merges via SARIF multi-tool runs.
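
The marker-file rules above could be sketched roughly as follows. This is a hedged illustration, not the driver itself: the function name `detect_stacks` and the returned stack labels are hypothetical, Unity markers are assumed to take precedence over plain .NET, a YAML file is assumed to need both `apiVersion:` and `kind:` to count as a manifest, and the Nx variant and the LOC tie-break are left out. A real driver would also skip vendored directories such as `node_modules` when globbing.

```python
import json
from pathlib import Path


def _has(repo: Path, pattern: str) -> bool:
    """True if any path under repo matches the glob pattern (recursive)."""
    return next(repo.rglob(pattern), None) is not None


def _looks_like_k8s(path: Path) -> bool:
    """Heuristic: a YAML file carrying both apiVersion: and kind: keys."""
    try:
        text = path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return False
    return "apiVersion:" in text and "kind:" in text


def detect_stacks(repo) -> list[str]:
    """Return every stack whose marker files are present, in a fixed order."""
    repo = Path(repo)
    stacks = []

    pkg = repo / "package.json"
    if pkg.is_file():
        try:
            deps = json.loads(pkg.read_text(encoding="utf-8")).get("dependencies") or {}
        except (OSError, ValueError, AttributeError):
            deps = {}
        stacks.append("nestjs" if "@nestjs/core" in deps else "node")

    unity = (repo / "ProjectSettings" / "ProjectVersion.txt").is_file() or (
        (repo / "Assets").is_dir() and (repo / "Packages" / "manifest.json").is_file()
    )
    if unity:
        stacks.append("unity")  # assumption: Unity markers override plain .NET
    elif _has(repo, "*.sln") or _has(repo, "*.csproj"):
        stacks.append("dotnet")

    if _has(repo, "*.tf") or _has(repo, "terragrunt.hcl"):
        stacks.append("terraform")

    if _has(repo, "Chart.yaml"):
        stacks.append("helm")
    elif _has(repo, "kustomization.yaml") or any(
        _looks_like_k8s(p) for p in repo.rglob("*.yaml")
    ):
        stacks.append("k8s")

    if (repo / "pyproject.toml").is_file() or next(repo.glob("requirements*.txt"), None):
        stacks.append("python")
    if (repo / "dbt_project.yml").is_file():
        stacks.append("dbt")  # overlay on top of the Python recipe
    return stacks
```

Returning a list rather than a single winner matches the multi-stack rule: each detected stack gets its own recipe run, and findings can be tagged with the label.
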
Canonical pipeline
- `CLAUDE.md` + `.claude/skills/audit/SKILL.md` injected at session start.
- `@codebase` for navigation.
- `AUDIT.md` canonical → `audit.html`; `audit.sarif` sidecar.
Whole-repo audit: top picks
- `TECH_DEBT_AUDIT.md` with file-cited findings, severity, effort. Per-repo override via `.claude/skills/tech-debt-audit/SKILL.md`. MIT.
- --triage); validates third-party SAST; supports vLLM / Ollama for local models.
MR / diff review packet: top picks
- `/describe` `/review` `/improve`; hierarchical `best_practices.md` with per-folder/per-group scope; on-prem GitLab; CLI returns Markdown/JSON.
- `/security-review` slash command; `CLAUDE.md`-driven; reads pending diff.
- `.greptile/rules.md`; CodeRabbit has `.coderabbit.yaml` learnings; Kody is OSS, model-agnostic.
Rulebook ingestion: pick one
Use CLAUDE.md (root) + .claude/skills/<audit>/SKILL.md
Two reasons:
- Claude Code's progressive loading: `SKILL.md` loads only when the audit runs, so the big rulebook doesn't bloat every session; the small `CLAUDE.md` just dispatches.
- Co-located with the repo, version-controlled, reusable from CLI, GitHub/GitLab Action, or Claude Agent SDK.
For MR review only, Qodo's best_practices.md is more expressive — drive Qodo from the same Markdown via symlink / extra_instructions.
Report format — pick one
Markdown canonical → static HTML render
- Markdown is what every agent (Claude, Qodo, Greptile, codedog, Metis) emits natively, and what GitLab/GitHub render inline.
- Pipe through `pandoc` or a single `marked + Prism + diff2html` template for the standalone "review packet" HTML/PDF.
- Generating HTML directly from the LLM wastes tokens, breaks diffability, and loses the GitLab MR-comment path.
- Pair with an `audit.sarif` sidecar for GitHub/GitLab Code Scanning ingestion.
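
A SARIF log carries a `runs` array, so one plausible way to build the `audit.sarif` sidecar is to concatenate the runs from each per-tool file. The sketch below assumes the `.audit/<stack>/<tool>.sarif` layout described under stack detection; the function names are hypothetical, and whether a given Code Scanning backend accepts a multi-run file as-is is something to verify against that backend.

```python
import json
from pathlib import Path

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def merge_sarif(logs):
    """Combine several parsed SARIF logs into one log with one run per tool run."""
    runs = []
    for log in logs:
        runs.extend(log.get("runs", []))
    return {"$schema": SARIF_SCHEMA, "version": "2.1.0", "runs": runs}


def write_sidecar(audit_dir, out_path):
    """Merge every *.sarif under audit_dir (sorted for stable output) into out_path."""
    paths = sorted(Path(audit_dir).rglob("*.sarif"))
    logs = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    Path(out_path).write_text(json.dumps(merge_sarif(logs), indent=2), encoding="utf-8")
    return len(paths)
```

For example, `write_sidecar(".audit", "audit.sarif")` would fold all per-tool outputs into a single file and return how many inputs it read.
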
Three ways forward
| | A. Integrate heavy | B. Thin custom on top of OSS pieces (recommended) | C. All-custom from scratch |
|---|---|---|---|
| Whole-repo audit | Qodo self-host "rule system"; or commercial Sonar + CodeScene | tech-debt-skill / fan-out-audit pattern + Claude Agent SDK; analyzers per stack; Metis-style SARIF triage | Hand-rolled prompts + own orchestrator |
| MR / diff packet | GitLab Duo Code Review Flow inline; abandon HTML packet | GitLab API → PR-Agent CLI → diff2html + Jinja → static HTML | Hand-rolled prompts + UI |
| Rulebook | Per-tool config (Qodo, Sonar) | CLAUDE.md + SKILL.md (single source); also feeds PR-Agent best_practices.md | Custom prompt template |
| Engineering cost | Low | Medium | High |
| Lock-in / recurring $ | High | Low (Apache-2.0 / MIT) | None |
| Risk | Vendor roadmap | PR-Agent / skill prompts (forkable) | Prompt-engineering bills + time |
Recommendation
Path B — thin per-repo CLI on top of OSS pieces
Two new sw-audit commands, both single-repo:
- `sw-audit audit <repo>`: detect stack, run stack-native analyzers, load `CLAUDE.md` rulebook, fan out Claude subagents (tech-debt-skill / fan-out-audit pattern), triage SARIF (Metis-style), emit `AUDIT.md` → render `audit.html`.
- `sw-audit review <repo> <mr|range>`: fetch diff + commits + before/after trees via GitLab API, run PR-Agent CLI with project rulebook, render via diff2html + Jinja into a self-contained `review.html` packet.
Main tradeoff: you maintain the glue between detector, analyzers, and LLM layer. Mitigation: each piece is replaceable; rulebook lives in the repo, not in the tool.
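
The two-command surface could be wired up with a small `argparse` skeleton. This is only a sketch of the CLI shape: the handler bodies are placeholders that return a string instead of running analyzers or calling the GitLab API, and the argument name `target` is a hypothetical stand-in for `<mr|range>`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sw-audit")
    sub = parser.add_subparsers(dest="command", required=True)

    audit = sub.add_parser("audit", help="whole-repo audit")
    audit.add_argument("repo", help="path to the repository")

    review = sub.add_parser("review", help="MR / diff review packet")
    review.add_argument("repo", help="path to the repository")
    review.add_argument("target", metavar="mr|range", help="MR id or commit range")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "audit":
        # Placeholder for: detect stack, run analyzers, load rulebook,
        # fan out subagents, triage SARIF, emit AUDIT.md, render audit.html.
        return f"audit {args.repo}"
    # Placeholder for: fetch diff, run the review agent, render review.html.
    return f"review {args.repo} {args.target}"
```

Keeping each command a thin dispatcher is what makes the pieces replaceable: the detector, analyzers, and LLM layer sit behind the placeholders and can be swapped without touching the CLI.
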
Highest-value links
- ksimback/tech-debt-skill — whole-repo Claude Code audit reference
- lachiejames/fan-out-audit — parallel subagent pattern
- arm/metis — SARIF-native LLM triage, local-model support
- anthropics/claude-code-security-review — official diff-review action
- qodo-ai/pr-agent · PR-Agent GitLab self-host
- Qodo Rule System / best_practices.md
- GitLab Duo Code Review Flow (GA 18.8, Sonnet 4.6 in 19.1)
- dependency-cruiser — TS/JS arch SARIF
- ArchUnitNET — .NET arch rules
- import-linter — Python layered contracts
- kube-linter · kubescape
- Checkov · Trivy
- codedog — Markdown review report
- diff2html
- evilsocket/audit — 8-stage Claude-Agent-SDK pipeline