A chat mode for GitHub Copilot CLI

Your agent should prove its work. This one does.

Anvil verifies every change before you see it: it builds, tests, and lints, then has other AI models try to break it. You get proof, not promises.

$ copilot plugin install burkeholland/anvil

Includes the Context7 MCP server for up-to-date library docs.

Recommended: Add the frontend-design skill for beautiful UI generation:
npx skills add anthropics/skills/frontend-design

The problem with AI agents today is that they lie to you.

They write code and say "Done!" They claim the build passes without running it. They hallucinate test results. They never push back on a bad idea. And when something breaks in production, it's your fault, not theirs.

We got tired of it.

What Anvil Does

Adversarial Review

Every change gets reviewed by up to 3 different AI models (GPT, Gemini, Claude), each trying to find bugs, security holes, and logic errors. They disagree with each other. That's the point.
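Here is a minimal sketch, in Python, of one way verdicts from several reviewers could be combined. It is not Anvil's actual review logic, which this page doesn't describe. The `Verdict` type, the `adversarial_review` function, and the rule that one reviewer's objection is enough to block approval are all assumptions. The stub reviewers stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical reviewer signature: takes a diff, returns a list of issue descriptions.
Reviewer = Callable[[str], list[str]]


@dataclass
class Verdict:
    reviewer: str
    issues: list[str] = field(default_factory=list)


def adversarial_review(
    diff: str, reviewers: dict[str, Reviewer], max_reviewers: int = 3
) -> tuple[bool, list[Verdict]]:
    """Ask up to max_reviewers independent reviewers to attack the same diff.

    Assumption: the change is approved only if no reviewer reports an issue,
    so a single dissenting model is enough to block it.
    """
    verdicts = [
        Verdict(name, list(review(diff)))
        for name, review in list(reviewers.items())[:max_reviewers]
    ]
    approved = all(not v.issues for v in verdicts)
    return approved, verdicts


# Stub reviewers standing in for real model calls (hypothetical).
def strict_reviewer(diff: str) -> list[str]:
    return ["redirect target is not validated"] if "redirect(" in diff else []


def lenient_reviewer(diff: str) -> list[str]:
    return []


if __name__ == "__main__":
    ok, verdicts = adversarial_review(
        "return redirect(next_url)",
        {"reviewer-a": strict_reviewer, "reviewer-b": lenient_reviewer},
    )
    print(ok)  # False: reviewer-a found an issue, so the change is blocked
```

This sketch takes the union of findings instead of a majority vote. That is one way to make disagreement useful: a bug that only one model spots still has to be addressed.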

SQL Verification Ledger

Every build, test, and lint result is INSERTed into a SQLite table. The evidence bundle at the end is a SELECT, not prose. If the INSERT didn't happen, the verification didn't happen.
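The ledger idea can be sketched with Python's built-in `sqlite3` and `subprocess` modules. The `checks` table schema and the `open_ledger`, `run_and_record`, and `was_verified` helpers are hypothetical and do not reflect Anvil's actual schema. The point they illustrate is that a check counts as verified only if a row with a zero exit code exists.

```python
import shlex
import sqlite3
import subprocess
import sys


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical verification ledger."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS checks (
               id INTEGER PRIMARY KEY,
               phase TEXT NOT NULL,       -- 'baseline' or 'after'
               check_name TEXT NOT NULL,  -- e.g. 'build', 'tests', 'lint'
               command TEXT NOT NULL,
               exit_code INTEGER NOT NULL,
               output TEXT
           )"""
    )
    return conn


def run_and_record(conn, phase: str, check_name: str, argv: list[str]) -> int:
    """Run a real command, INSERT its exit code and output, and return the exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    conn.execute(
        "INSERT INTO checks (phase, check_name, command, exit_code, output) "
        "VALUES (?, ?, ?, ?, ?)",
        (phase, check_name, shlex.join(argv), proc.returncode, proc.stdout + proc.stderr),
    )
    conn.commit()
    return proc.returncode


def was_verified(conn, phase: str, check_name: str) -> bool:
    """A check counts only if its latest ledger row exists and exited with 0."""
    row = conn.execute(
        "SELECT exit_code FROM checks WHERE phase = ? AND check_name = ? "
        "ORDER BY id DESC LIMIT 1",
        (phase, check_name),
    ).fetchone()
    return row is not None and row[0] == 0


if __name__ == "__main__":
    ledger = open_ledger()
    # Stand-in for a real build command such as `npm run build`.
    run_and_record(ledger, "after", "build", [sys.executable, "-c", "print('ok')"])
    print(was_verified(ledger, "after", "build"))  # True
    print(was_verified(ledger, "after", "tests"))  # False: no row, no verification
```

If no tests row was ever written, `was_verified(..., "tests")` returns False. That holds even if the model claimed the tests passed.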

Baseline Snapshots

Before touching anything, Anvil records your project's state: build, diagnostics, and tests. After changes, it compares. Regressions are caught before you see the code, not after you ship it.
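The before/after comparison can be sketched as a pure function over two snapshots. The `Snapshot` fields and the regression rules below are assumptions chosen for illustration. The 247 and 249 test counts mirror the sample evidence bundle on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical project state captured before or after a change."""
    build_ok: bool
    tests_passed: int
    tests_failed: int
    diagnostics: int  # compiler/linter errors plus warnings


def find_regressions(before: Snapshot, after: Snapshot) -> list[str]:
    """Return human-readable regressions; an empty list means nothing got worse."""
    problems = []
    if before.build_ok and not after.build_ok:
        problems.append("build passed at baseline but fails now")
    if after.tests_failed > before.tests_failed:
        problems.append(
            f"test failures rose from {before.tests_failed} to {after.tests_failed}"
        )
    if after.tests_passed < before.tests_passed:
        # Fewer passing tests can also mean tests were deleted; flag it either way.
        problems.append(
            f"passing tests dropped from {before.tests_passed} to {after.tests_passed}"
        )
    if after.diagnostics > before.diagnostics:
        problems.append(
            f"diagnostics rose from {before.diagnostics} to {after.diagnostics}"
        )
    return problems


baseline = Snapshot(build_ok=True, tests_passed=247, tests_failed=0, diagnostics=0)
after = Snapshot(build_ok=True, tests_passed=249, tests_failed=0, diagnostics=0)
print(find_regressions(baseline, after))  # []
```

Comparing against the baseline, instead of demanding zero failures, keeps pre-existing breakage from being blamed on the new change.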

Pushback

Anvil is a senior engineer, not an order taker. Bad idea? Tech debt? Dangerous edge case? It tells you before it writes a single line. You can override it. But you'll know.

Session Memory

Broke a file last session? Anvil remembers. Learned your build command? Won't ask again. Context that actually persists across conversations.
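One simple way to make context persist across conversations is a small file on disk. Each session reads it at startup and writes to it as it learns. The JSON layout and the `SessionMemory` class are hypothetical, and so is the example file path. Anvil's own storage format isn't documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class SessionMemory:
    """Hypothetical per-project memory persisted as a JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {"facts": {}, "breakages": {}}

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")

    def remember(self, key: str, value: str) -> None:
        """Store a learned fact, such as the project's build command."""
        self.data["facts"][key] = value
        self._save()

    def recall(self, key: str) -> Optional[str]:
        return self.data["facts"].get(key)

    def note_breakage(self, file: str, note: str) -> None:
        """Record that a change to `file` broke something, for future sessions."""
        self.data["breakages"].setdefault(file, []).append(note)
        self._save()

    def history(self, file: str) -> list[str]:
        return self.data["breakages"].get(file, [])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "memory.json"
        SessionMemory(store).remember("build_command", "npm run build")
        # A later session reads the same file and doesn't need to ask again.
        print(SessionMemory(store).recall("build_command"))  # npm run build
```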

Git Autopilot

Before writing code, Anvil checks your git state: it stashes uncommitted work and creates a branch off main, so you never edit trunk directly. After verification passes, it commits with a clear message and gives you a one-line rollback command.
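These git steps map onto ordinary git commands. This Python sketch only builds the argument lists and does not run them. The `anvil/` branch prefix, the slug rule, and the choice of rollback command are assumptions, so Anvil's real branch names and rollback command may differ.

```python
import re
import shlex


def slugify(task: str) -> str:
    """Turn a task description into a branch-safe slug, e.g. 'fix-auth-redirect'."""
    return re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")


def prepare_branch(task: str, has_uncommitted: bool, base: str = "main"):
    """Return (branch, commands) to run before any code is written."""
    branch = f"anvil/{slugify(task)}"  # hypothetical naming scheme
    commands = []
    if has_uncommitted:
        # Set aside the user's in-progress work, including untracked files.
        commands.append(
            ["git", "stash", "push", "--include-untracked", "-m", f"before {branch}"]
        )
    # Create and switch to a new branch starting from base, never editing trunk.
    commands.append(["git", "switch", "-c", branch, base])
    return branch, commands


def commit_commands(message: str) -> list[list[str]]:
    """Commands to run only after verification has passed."""
    return [["git", "add", "--all"], ["git", "commit", "-m", message]]


def rollback_command(branch: str, base: str = "main") -> str:
    """One-line undo: return to base and delete the work branch (assumed approach)."""
    return f"git switch {base} && git branch -D {branch}"


if __name__ == "__main__":
    branch, cmds = prepare_branch("Fix auth redirect", has_uncommitted=True)
    for cmd in cmds + commit_commands("fix: correct auth redirect"):
        print(shlex.join(cmd))
    print(rollback_command(branch))
```

Deleting the branch does not touch the stash. If work was stashed, `git stash pop` restores it afterwards.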

What You Actually See

The Evidence Bundle.

Every task ends with this: a structured report generated from SQL, not written by the model. Baseline vs. after. Command outputs. Exit codes. Reviewer verdicts. Confidence level. Rollback command.

🔨 Anvil Evidence Bundle · task: fix-auth-redirect · size: M · risk: 🟡

Phase	Check	Result	Detail
baseline	build	✓	npm run build
baseline	tests	✓	247 passed
after	build	✓	npm run build
after	tests	✓	249 passed (+2 new)
after	diagnostics	✓	0 errors, 0 warnings
review	gpt-5.3-codex	✓	No issues
review	gemini-3-pro	✓	No issues
review	claude-opus	✓	No issues
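Because the bundle is a query result, rendering it takes no model-written prose. Below is a sketch using Python's `sqlite3`, seeded with a few rows from the sample above. The `ledger` table, its `passed` column, and `render_bundle` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ledger (
           id INTEGER PRIMARY KEY,
           phase TEXT NOT NULL,
           check_name TEXT NOT NULL,
           passed INTEGER NOT NULL,  -- 1 if the check's exit code was 0
           detail TEXT NOT NULL
       )"""
)
# Sample rows mirroring part of the evidence bundle, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO ledger (phase, check_name, passed, detail) VALUES (?, ?, ?, ?)",
    [
        ("after", "tests", 1, "249 passed (+2 new)"),
        ("baseline", "build", 1, "npm run build"),
        ("review", "gemini-3-pro", 1, "No issues"),
        ("baseline", "tests", 1, "247 passed"),
    ],
)


def render_bundle(conn: sqlite3.Connection) -> str:
    """Build the report from a SELECT; the model never writes these lines."""
    rows = conn.execute(
        """SELECT phase, check_name,
                  CASE WHEN passed = 1 THEN '✓' ELSE '✗' END,
                  detail
           FROM ledger
           ORDER BY CASE phase WHEN 'baseline' THEN 0 WHEN 'after' THEN 1 ELSE 2 END, id"""
    ).fetchall()
    lines = ["Phase\tCheck\tResult\tDetail"]
    lines += ["\t".join(row) for row in rows]
    return "\n".join(lines)


print(render_bundle(conn))
```

A failing check shows up as ✗ only because its row says so. There is no step where the model can rewrite the result.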

Under The Hood

Every task runs through the same loop.

Understand & Recall: Parses your request. Checks session history for past work on these files, including what broke.
Survey & Pushback: Searches for existing patterns to reuse. Tells you if the approach is wrong before writing code.
Baseline: Snapshots build, tests, and diagnostics into the SQL ledger. This is the "before" photo.
Implement: Makes changes following your codebase patterns. Extends what's there instead of inventing new abstractions.
The Forge: Builds, type checks, lints, and tests, recording every result. Then up to 3 AI models attack the code. Issues get fixed before you see anything.
Evidence & Commit: Generates a structured report from SQL. Auto-commits with a rollback command. Done.
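The Implement and Forge steps form a fix-and-recheck loop. Here is a minimal sketch of that loop. It assumes a hypothetical `implement` callback that receives the previous round's issues and a `verify` callback that returns the issues still open. The three-round cap is also an assumption.

```python
from typing import Callable


def forge_loop(
    implement: Callable[[list[str]], None],
    verify: Callable[[], list[str]],
    max_rounds: int = 3,
) -> tuple[bool, list[str]]:
    """Implement, verify, and feed issues back until clean or out of rounds.

    Returns (clean, open_issues). Only a clean result should reach the user
    as "done"; otherwise the open issues are reported instead.
    """
    issues: list[str] = []
    for _ in range(max_rounds):
        implement(issues)  # the first round gets no feedback
        issues = verify()  # build, type check, lint, test, then review
        if not issues:
            return True, []
    return False, issues


if __name__ == "__main__":
    pending = iter([["lint: unused import"], []])  # fake verifier: fails once, then passes
    print(forge_loop(lambda issues: None, lambda: next(pending)))  # (True, [])
```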

One command.