Everything you need to ship software while you sleep.
Solopreneurs juggle product, code, design, QA, and deploy. You can't hire 7 people. But you can deploy 7 agents. Each one is a specialist. Together they're a full engineering team that works while you sleep.
Every session starts from zero. Agents report AUDITED work as DONE. The same bug ships again three weeks later. Without structure, AI agents are a liability pretending to be an asset.
Rules they can't forget. A verifier they can't override. Memory that compounds. The system gets smarter with every project. Every mistake becomes a rule. Rules compound. Mistakes don't repeat.
Seven specialized agents. Each with a distinct lens, a defined role, and the obligation to disagree when they see something others missed.
Fury: "I still believe in heroes."
Sees the whole board. Asks WHY until the real problem surfaces. Writes requirements that are testable, not aspirational.
Strange: "I went forward in time to view alternate futures."
14 million possible designs. Picks the one that ships. Every decision documented with context, alternatives, and consequences.
Shuri: "Just because something works doesn't mean it can't be improved."
Bridges innovation and usability. Error states aren't optional. Designs experiences, not screens.
Stark: "I am Iron Man."
Writes the code. Tests alongside. Never self-grades. Ultra-succinct -- speaks in file paths and acceptance criteria IDs.
Widow: "I've got red in my ledger."
Opens a real browser. Clicks real buttons. Finds what everyone missed. Never passes off a grep audit as testing.
Heimdall: "I can see all nine realms."
The all-seeing guardian. Cannot be overridden. DONE means runtime evidence. If Stark and Heimdall disagree, Heimdall wins.
Watcher: "I observe. I record. I do not interfere... unless the stakes demand it."
The chronicler. Maintains institutional memory. Distills knowledge from completed work. How the system remembers.
Three modes, one system. Each step has a gate. Nothing advances without verification.
Fury interviews you. Runs 4-5 elicitation methods. Asks why until the real problem surfaces. Produces a product brief capturing vision, users, success metrics, and scope.
Shuri maps every requirement to a human interaction. User journeys, screen specs, component strategy, accessibility. Error states aren't optional.
Strange designs the system architecture from the validated PRD. ADRs for every non-obvious choice. Data models, API contracts, integration boundaries.
Fury decomposes the work into epics and stories. Each story file contains everything an agent needs to implement it independently. Traceability to functional requirements (FRs) is required.
Stark reads the full story spec, referenced FRs, UX spec, and architecture before writing any code. Tests alongside. Never self-grades.
Widow opens a real browser. Clicks real buttons. Fills real forms. Screenshots every step. Golden path plus top error paths for every feature.
Heimdall emits structured verdicts: DONE, VERIFIED, or AUDITED. Cannot be overridden. If the evidence is AUDITED-level, the verdict is AUDITED -- even if Stark claims DONE.
Watcher compiles knowledge from the completed work. Wiki articles, decision records, incident patterns. The system's institutional memory.
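To make the gating idea concrete, here is a minimal sketch of a task that runs one phase at a time and advances only on explicit approval. It is an illustration, not Becky's implementation: the `Task` class, the `demo_agent` function, and every phase label other than "discovery" and "knowledge" are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Illustrative labels for the eight steps above; not Becky's real identifiers.
PHASES = ["discovery", "ux", "architecture", "stories",
          "build", "test", "verify", "knowledge"]


@dataclass
class Task:
    """A hypothetical task that moves through gated phases one at a time."""
    index: int = 0                                 # position of the current phase
    outputs: dict = field(default_factory=dict)    # phase name -> latest output
    feedback: list = field(default_factory=list)   # revision notes for this phase

    @property
    def phase(self) -> str:
        return PHASES[self.index]

    def run(self, produce) -> None:
        """Execute the current phase; the gate stays closed afterwards."""
        self.outputs[self.phase] = produce(self.phase, list(self.feedback))

    def revise(self, note: str, produce) -> None:
        """Record feedback and re-run the same phase without advancing."""
        self.feedback.append(note)
        self.run(produce)

    def approve(self) -> None:
        """Open the gate; refuse if the current phase has produced nothing."""
        if self.phase not in self.outputs:
            raise RuntimeError(f"cannot approve {self.phase!r}: nothing to review yet")
        if self.index < len(PHASES) - 1:
            self.index += 1
            self.feedback.clear()


def demo_agent(phase, feedback):
    """Stand-in for an agent: returns a string describing what it was asked."""
    return f"{phase} output (revisions: {len(feedback)})"


task = Task()
task.run(demo_agent)
task.revise("tighten the success metrics", demo_agent)
task.approve()
print(task.phase)  # ux
```

Approving before a phase has run raises an error, which is the property the gate is meant to protect: there is always an output to review before anything advances.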
16 commands organized into 4 groups. Each one does exactly one thing.
/becky-onboard
Interactive walkthrough of how Becky works. Covers folder structure, agents, commands, rules engine, and the learning loop. Run this first to understand the system before using it.
/becky-learn
Import existing documentation into Becky's knowledge base. Reads PRDs, architecture docs, UX specs from any path and populates the wiki so agents have project context from day one.
/becky-scan
Scans an existing project -- detects BMad, Hermes, and other agent frameworks. Reads planning artifacts, analyzes the codebase, and tells you what's done, what's pending, and where to start. Like a smart colleague who just spent an hour reading through everything.
/becky-greenfield
Start a new project from scratch. Creates an 8-phase task folder with discovery through knowledge. The full pipeline from "what are we building?" to "it's documented and verified."
/becky-brownfield
Work on existing code. Creates a 7-phase task starting with archaeology -- understand what exists before changing it. Audit, document, plan, intervene, test, verify, learn.
/becky-run
Execute the current phase of the active task. The owning agent reads inputs from known locations and writes outputs to known locations. One phase at a time, manually gated.
/becky-approve
Pass the current gate and advance to the next phase. This is the human checkpoint -- you review the output, decide it's good enough, and let the pipeline continue.
/becky-revise
Send feedback and re-run the current phase. The agent gets your feedback as context, goes back, and redoes the work. The gate stays closed until you approve.
/becky-autopilot
Run all remaining phases unattended. The orchestrator chains agents through their phases, passing outputs to inputs. You come back to a morning brief summarizing everything that happened.
/becky-assemble
All 7 agents in a war room. For when you're stuck, facing a high-stakes decision, or dealing with a P0. Structured protocol: situation brief, first reads, deep dives, debate, convergence.
/becky-retro
Retrospective on completed work. Extracts what worked, what failed, and what to encode as rules. Produces structured findings that feed back into the learning loop.
/becky-status
Dashboard of everything. Shows active tasks, current phase, rule count, wiki articles, memory entries. The single place to see what's happening across all your projects.
/becky-rules-add
Create a new rule with proper frontmatter. Rules live in core/rules/ and are the source of truth. CLAUDE.md and AGENTS.md are compiled outputs. Edit rules, not outputs.
becky compile
Generate CLAUDE.md and AGENTS.md from the rules in core/rules/. This is the compilation step that turns your rules into runtime-specific instructions for each agent runtime.
becky verify
Lint rules for issues. Checks for missing fields, duplicates, stale references, and inconsistencies. Think of it as the type checker for your rules engine.
becky init
Portable installation into any project. Creates a .becky/ workspace and copies all 13 slash commands to the target's .claude/commands/. Also scaffolds core/rules/, wiki/, memory/, tasks/, and agent configs. Detects existing agent frameworks (BMad, Hermes) and suggests running becky scan.
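The compile and verify steps described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the rule schema (`id`, `title`, `body`), the function names, and the choice to refuse compilation when linting fails are not taken from Becky's source, and real rules are files with frontmatter rather than dictionaries.

```python
REQUIRED_FIELDS = ("id", "title", "body")  # assumed schema, not Becky's actual frontmatter


def lint(rules):
    """Return a list of problems: missing required fields and duplicate ids."""
    problems = []
    seen = set()
    for position, rule in enumerate(rules):
        for name in REQUIRED_FIELDS:
            if not rule.get(name):
                problems.append(f"rule #{position}: missing {name}")
        rule_id = rule.get("id")
        if rule_id:
            if rule_id in seen:
                problems.append(f"rule #{position}: duplicate id {rule_id}")
            seen.add(rule_id)
    return problems


def compile_rules(rules):
    """Render one generated document per target file; reject rules that fail lint."""
    problems = lint(rules)
    if problems:
        raise ValueError("; ".join(problems))
    ordered = sorted(rules, key=lambda rule: rule["id"])
    body = "\n".join(f"- [{r['id']}] {r['title']}: {r['body']}" for r in ordered) + "\n"
    header = "<!-- generated from core/rules/; edit rules, not this file -->\n"
    return {target: header + body for target in ("CLAUDE.md", "AGENTS.md")}


rules = [
    {"id": "R002", "title": "Never self-grade", "body": "The builder does not issue verdicts."},
    {"id": "R001", "title": "Runtime proof", "body": "DONE requires a runtime artifact."},
]
compiled = compile_rules(rules)
print(sorted(compiled))                               # ['AGENTS.md', 'CLAUDE.md']
print(lint([{"id": "R001", "title": "No body"}]))     # ['rule #0: missing body']
```

Treating the generated files as pure functions of the rules is what makes "edit rules, not outputs" safe: regenerating them can never lose information.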
What happens when you type becky autopilot and go to sleep.
Read _morning-brief.md. "2 flags. 1 draft rule. Review when ready." All 8 phases documented. Every decision recorded. Every output verified.
Three tiers of truth. Why this matters for solopreneurs: you need to trust what the agents tell you. Self-grading is the root cause of inflated completion reports.
The agent that builds (Stark) should never be the agent that grades. Heimdall exists because without independent verification, agents consistently report AUDITED work as DONE. Heimdall cannot be overridden. If Stark and Heimdall disagree, Heimdall wins.
| State | Evidence Required |
|---|---|
| DONE | Runtime artifact: API response body, DB query result, browser screenshot proving the feature works against a real database |
| VERIFIED | Acceptance criteria reviewed line-by-line with code citations (file:line) for each point |
| AUDITED | File existence confirmed, LOC counted, function names match spec. This is the lowest tier -- grep matches, not runtime proof. |
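The table can be read as a function from evidence to verdict in which the builder's claim plays no part. The sketch below is one way to express that, under stated assumptions: the evidence kind names (`file_exists`, `api_response`, and so on) and the `verdict` function are hypothetical labels invented for this example, not Heimdall's actual inputs or API.

```python
from enum import IntEnum


class Verdict(IntEnum):
    AUDITED = 1    # static checks only: files exist, names match
    VERIFIED = 2   # acceptance criteria reviewed with file:line citations
    DONE = 3       # runtime artifact proves the feature works


# Hypothetical evidence kinds mapped to the tier each one can support.
EVIDENCE_TIER = {
    "file_exists": Verdict.AUDITED,
    "grep_match": Verdict.AUDITED,
    "code_citation": Verdict.VERIFIED,
    "api_response": Verdict.DONE,
    "db_query_result": Verdict.DONE,
    "browser_screenshot": Verdict.DONE,
}


def verdict(evidence, builder_claim=None):
    """Return the highest tier the evidence supports.

    builder_claim is accepted but deliberately ignored: the verdict
    depends on evidence alone. Unknown evidence kinds count for nothing.
    """
    tiers = [EVIDENCE_TIER[kind] for kind in evidence if kind in EVIDENCE_TIER]
    return max(tiers) if tiers else None


result = verdict(["file_exists", "grep_match"], builder_claim=Verdict.DONE)
print(result.name)  # AUDITED
```

Because the claim never enters the computation, a builder reporting DONE on grep-level evidence still receives AUDITED, which is the behaviour the text describes.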
```bash
git clone https://github.com/evanpaul90/becky.git
cd becky
npm install
npx becky init . # install slash commands + .becky/ workspace
npx becky scan . # analyze existing work
npx becky greenfield "my first project"
npx becky autopilot
```