A chat mode for GitHub Copilot CLI

Your agent should prove its work. This one does.

Anvil verifies every change before you see it - builds, tests, lints, then has other AI models try to break it. You get proof, not promises.

$ copilot plugin install burkeholland/anvil

Includes the Context7 MCP server for up-to-date library docs.

Recommended: Add the frontend-design skill for beautiful UI generation:
npx skills add anthropics/skills/frontend-design

The problem with AI agents today is they lie to you.

They write code and say "Done!" They claim the build passes without running it. They hallucinate test results. They never push back on a bad idea. And when something breaks in production, it's your fault - not theirs.

We got tired of it.

Adversarial Review

Every change gets reviewed by up to 3 different AI models - GPT, Gemini, Claude - each trying to find bugs, security holes, and logic errors. They disagree with each other. That's the point.
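The aggregation rule can be sketched in a few lines. This is an illustrative model of multi-reviewer fan-out, not Anvil's actual internals; the function names and the "any dissent blocks" rule are assumptions:

```python
from typing import Callable

# A reviewer takes a diff and returns a list of issue descriptions.
Reviewer = Callable[[str], list[str]]

def adversarial_review(diff: str, reviewers: dict[str, Reviewer]) -> dict[str, list[str]]:
    """Fan the same diff out to every model and collect each verdict."""
    return {name: review(diff) for name, review in reviewers.items()}

def passes(verdicts: dict[str, list[str]]) -> bool:
    # A single dissenting reviewer blocks the change.
    return all(not issues for issues in verdicts.values())
```

Because every reviewer sees the same diff independently, one model catching an issue the others missed is enough to send the change back for fixes.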

SQL Verification Ledger

Every build, test, and lint result is INSERTed into a SQLite table. The evidence bundle at the end is a SELECT, not prose. If the INSERT didn't happen, the verification didn't happen.
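The idea can be sketched with Python's built-in sqlite3. The schema below is an assumption for illustration; Anvil's actual table layout may differ:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # Anvil persists to disk; in-memory for the sketch
db.execute("""CREATE TABLE IF NOT EXISTS ledger (
    phase TEXT, check_name TEXT, exit_code INTEGER, detail TEXT,
    ts TEXT DEFAULT CURRENT_TIMESTAMP)""")

# A check only counts as verified once its row exists:
db.execute(
    "INSERT INTO ledger (phase, check_name, exit_code, detail) VALUES (?, ?, ?, ?)",
    ("baseline", "tests", 0, "247 passed"),
)
db.commit()

# The evidence bundle is a SELECT over these rows, not free-form prose:
rows = db.execute("SELECT phase, check_name, exit_code FROM ledger").fetchall()
```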

Baseline Snapshots

Before touching anything, Anvil records your project's state - build, diagnostics, tests. After changes, it compares. Regressions are caught before you see the code, not after you ship it.
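The comparison step reduces to diffing two snapshots of check results. A minimal sketch, treating exit code 0 as a pass (the shape of the snapshot is an assumption):

```python
def regressions(baseline: dict[str, int], after: dict[str, int]) -> list[str]:
    """Checks that passed at baseline (exit code 0) but fail after the change."""
    return sorted(name for name, code in after.items()
                  if code != 0 and baseline.get(name, 1) == 0)
```

A check that was already failing before the change is not counted against it; only newly broken checks are flagged.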

Pushback

Anvil is a senior engineer, not an order taker. Bad idea? Tech debt? Dangerous edge case? It tells you before writing a single line. You can override it. But you'll know.

Session Memory

Broke a file last session? Anvil remembers. Learned your build command? Won't ask again. Context that actually persists across conversations.
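Persistent context can be as simple as a key-value table. This sqlite3 sketch is an assumption about the shape, not Anvil's actual storage:

```python
import sqlite3
from typing import Optional

mem = sqlite3.connect(":memory:")  # Anvil would persist this across sessions
mem.execute("CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT)")

def remember(key: str, value: str) -> None:
    mem.execute("INSERT OR REPLACE INTO facts VALUES (?, ?)", (key, value))
    mem.commit()

def recall(key: str) -> Optional[str]:
    row = mem.execute("SELECT value FROM facts WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```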

Git Autopilot

Before writing code, Anvil checks your git state - stashes uncommitted work, creates a branch off main so you're never editing trunk directly. After verification passes, it commits with a clear message and gives you a one-line rollback command.
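As a sketch, the pre-flight can be modeled as a command plan. The branch naming scheme and the rollback one-liner here are hypothetical, chosen only to illustrate the flow:

```python
def preflight_plan(task: str, dirty: bool) -> list[list[str]]:
    """Git commands to run before editing (hypothetical naming scheme)."""
    plan = []
    if dirty:  # stash uncommitted work first so nothing is lost
        plan.append(["git", "stash", "push", "-m", f"anvil-pre-{task}"])
    # Branch off main so trunk is never edited directly:
    plan.append(["git", "switch", "-c", f"anvil/{task}", "main"])
    return plan

def rollback_command(branch: str) -> str:
    # The one-liner handed back after a verified commit
    return f"git switch main && git branch -D {branch}"
```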

The Evidence Bundle

Every task ends with this - a structured report generated from SQL, not written by the model. Baseline vs. after. Command outputs. Exit codes. Reviewer verdicts. Confidence level. Rollback command.

🔨 Anvil Evidence Bundle    task: fix-auth-redirect · size: M · risk: 🟡

Phase     Check            Result  Detail
baseline  build            ✓       npm run build
baseline  tests            ✓       247 passed
after     build            ✓       npm run build
after     tests            ✓       249 passed (+2 new)
after     diagnostics      ✓       0 errors, 0 warnings
review    gpt-5.3-codex    ✓       No issues
review    gemini-3-pro     ✓       No issues
review    claude-opus      ✓       No issues
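Generating a bundle like the one above is just a query plus formatting. A minimal sketch, with the schema and column widths assumed for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ledger (phase TEXT, check_name TEXT, result TEXT, detail TEXT)")
db.executemany("INSERT INTO ledger VALUES (?, ?, ?, ?)", [
    ("baseline", "build", "ok", "npm run build"),
    ("after", "tests", "ok", "249 passed (+2 new)"),
])

def render_bundle(db: sqlite3.Connection) -> str:
    """The report is the query result -- the model never paraphrases it."""
    rows = db.execute(
        "SELECT phase, check_name, result, detail FROM ledger ORDER BY rowid"
    ).fetchall()
    return "\n".join(f"{p:<10}{c:<14}{r:<4}{d}" for p, c, r, d in rows)
```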

Every task runs through the same loop.

  1. Understand & Recall - Parses your request. Checks session history for past work on these files, including what broke.
  2. Survey & Pushback - Searches for existing patterns to reuse. Tells you if the approach is wrong before writing code.
  3. Baseline - Snapshots build, tests, diagnostics into the SQL ledger. This is the "before" photo.
  4. Implement - Makes changes following your codebase patterns. Extends what's there instead of inventing new abstractions.
  5. The Forge - Build, type check, lint, test - all recorded. Then up to 3 AI models attack the code. Issues get fixed before you see anything.
  6. Evidence & Commit - Structured report from SQL. Auto-commit with rollback command. Done.
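The six phases above can be strung together as a simple pipeline. The driver below is an illustrative sketch; the phase callables are placeholders, not Anvil's API:

```python
from typing import Callable

# Each phase mutates shared task state (request, snapshots, verdicts, ...).
Phase = Callable[[dict], None]

def run_task(request: str, phases: list[tuple[str, Phase]]) -> dict:
    """Run each phase in order, recording completion in shared state."""
    state: dict = {"request": request, "completed": []}
    for name, step in phases:
        step(state)
        state["completed"].append(name)
    return state
```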

One command.

$ copilot plugin install burkeholland/anvil

Then run copilot and select the Anvil agent.

Recommended: npx skills add anthropics/skills/frontend-design