BECKY
DOCUMENTATION v1.0

The Complete Guide

Everything you need to ship software while you sleep.

Why you're here

You're shipping alone

Solopreneurs juggle product, code, design, QA, and deploy. You can't hire 7 people. But you can deploy 7 agents. Each one is a specialist. Together they're a full engineering team that works while you sleep.

AI agents forget and lie

Every session starts from zero. Agents report AUDITED work as DONE. The same bug ships again three weeks later. Without structure, AI agents are a liability pretending to be an asset.

Becky gives agents a boss

Rules they can't forget. A verifier they can't override. Memory that compounds. The system gets smarter with every project. Every mistake becomes a rule. Rules compound. Mistakes don't repeat.

AGENT_ROSTER

Meet the team

Seven specialized agents. Each with a distinct lens, a defined role, and the obligation to disagree when they see something others missed.

Fury

DISCOVERY

"I still believe in heroes."

Sees the whole board. Asks WHY until the real problem surfaces. Writes requirements that are testable, not aspirational.

5 Whys / Elicitation / PRD

Strange

ARCHITECTURE

"I went forward in time to view alternate futures."

14 million possible designs. Picks the one that ships. Every decision documented with context, alternatives, and consequences.

Architecture Trace / ADRs

Shuri

EXPERIENCE

"Just because something works doesn't mean it can't be improved."

Bridges innovation and usability. Error states aren't optional. Designs experiences, not screens.

Broken Promise Audit / UX

Stark

BUILD

"I am Iron Man."

Writes the code. Tests alongside. Never self-grades. Ultra-succinct -- speaks in file paths and acceptance criteria IDs.

Code Trace / Implementation

Widow

TEST

"I've got red in my ledger."

Opens a real browser. Clicks real buttons. Finds what everyone missed. Never passes off a grep audit as testing.

Reproduction Protocol / Playwright

Heimdall

VERIFY

"I can see all nine realms."

The all-seeing guardian. Cannot be overridden. DONE means runtime evidence. If Stark and Heimdall disagree, Heimdall wins.

Pre-mortem / Evidence Escalation

Watcher

MEMORY

"I observe. I record. I do not interfere... unless the stakes demand it."

The chronicler. Maintains institutional memory. Distills knowledge from completed work. How the system remembers.

Pattern Match / Wiki Curation

The Three Workflows

Three modes, one system. Each step has a gate. Nothing advances without verification.

1
FURY | discovery

Product Brief

Fury interviews you. Runs 4-5 elicitation methods. Asks why until the real problem surfaces. Produces a product brief capturing vision, users, success metrics, scope, and the functional requirements (FRs) that every later phase traces back to.

GATE: Founder approves brief
Produces: brief.md
2
SHURI | experience

UX Design

Shuri maps every requirement to a human interaction. User journeys, screen specs, component strategy, accessibility. Error states aren't optional.

GATE: Fury confirms FR coverage
Produces: ux-spec.md
3
STRANGE | architecture

Architecture

Strange designs the system architecture from the validated PRD. ADRs (architecture decision records) for every non-obvious choice. Data models, API contracts, integration boundaries.

GATE: Fury + Shuri confirm alignment
Produces: architecture.md, ADRs
4
FURY | planning

Stories

Fury decomposes the work into epics and stories. Each story file contains everything an agent needs to implement it independently. Every story must trace back to the FRs it implements.

GATE: Readiness check passes
Produces: stories/
5
STARK | build

Implementation

Stark reads the full story spec, referenced FRs, UX spec, and architecture before writing any code. Tests alongside. Never self-grades.

GATE: Tests pass, push checklist clean
Produces: code + tests
6
WIDOW | test

Testing

Widow opens a real browser. Clicks real buttons. Fills real forms. Screenshots every step. Golden path plus top error paths for every feature.

GATE: All critical paths covered
Produces: test-report.md + screenshots
7
HEIMDALL | verify

Verification

Heimdall emits structured verdicts: DONE, VERIFIED, or AUDITED. Cannot be overridden. If the evidence is AUDITED-level, the verdict is AUDITED -- even if Stark claims DONE.

GATE: Verdict filed
Produces: verdict.yaml
8
WATCHER | memory

Knowledge

Watcher compiles knowledge from the completed work. Wiki articles, decision records, incident patterns. The system's institutional memory.

GATE: Index updated
Produces: wiki articles
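
The eight phases above behave like a gated pipeline: a phase can only start once every earlier gate has passed. A minimal TypeScript sketch of that rule, assuming a hypothetical in-memory `Phase` record and helper names (`currentPhase`, `canStart`) that are illustrative only, not Becky's actual API:

```typescript
// Hypothetical model of the greenfield pipeline; not Becky's real API.
type Phase = {
  id: number;
  owner: string;
  name: string;
  gate: string;        // human-readable gate condition
  produces: string;    // artifact written when the phase runs
  gatePassed: boolean;
};

const pipeline: Phase[] = [
  { id: 1, owner: "FURY", name: "Product Brief", gate: "Founder approves brief", produces: "brief.md", gatePassed: false },
  { id: 2, owner: "SHURI", name: "UX Design", gate: "Fury confirms FR coverage", produces: "ux-spec.md", gatePassed: false },
  { id: 3, owner: "STRANGE", name: "Architecture", gate: "Fury + Shuri confirm alignment", produces: "architecture.md, ADRs", gatePassed: false },
  { id: 4, owner: "FURY", name: "Stories", gate: "Readiness check passes", produces: "stories/", gatePassed: false },
  { id: 5, owner: "STARK", name: "Implementation", gate: "Tests pass, push checklist clean", produces: "code + tests", gatePassed: false },
  { id: 6, owner: "WIDOW", name: "Testing", gate: "All critical paths covered", produces: "test-report.md + screenshots", gatePassed: false },
  { id: 7, owner: "HEIMDALL", name: "Verification", gate: "Verdict filed", produces: "verdict.yaml", gatePassed: false },
  { id: 8, owner: "WATCHER", name: "Knowledge", gate: "Index updated", produces: "wiki articles", gatePassed: false },
];

// The current phase is the first one whose gate has not passed yet.
// Returns undefined once every gate has passed.
function currentPhase(phases: Phase[]): Phase | undefined {
  return phases.find((p) => !p.gatePassed);
}

// A phase may only start if every earlier gate has passed.
function canStart(phases: Phase[], id: number): boolean {
  return phases.filter((p) => p.id < id).every((p) => p.gatePassed);
}
```

Setting phase 1's `gatePassed` to true moves `currentPhase` to phase 2 while phase 3 stays locked: "nothing advances without verification" in miniature.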

Every Command Explained

16 commands organized into 4 groups: 13 slash commands plus the CLI-only becky compile, becky verify, and becky init. Each one does exactly one thing.

Getting Started

/becky-onboard

Interactive walkthrough of how Becky works. Covers folder structure, agents, commands, rules engine, and the learning loop. Run this first to understand the system before using it.

Watcher
$ becky onboard
/becky-learn

Import existing documentation into Becky's knowledge base. Reads PRDs, architecture docs, and UX specs from any path, then populates the wiki so agents have project context from day one.

Watcher
$ becky learn /path/to/existing/docs
/becky-scan

Scans an existing project -- detects BMad, Hermes, and other agent frameworks. Reads planning artifacts, analyzes the codebase, and tells you what's done, what's pending, and where to start. Like a smart colleague who just spent an hour reading through everything.

All agents
$ becky scan .

Day-to-Day Work

/becky-greenfield

Start a new project from scratch. Creates a task folder with 8 phases, from discovery through knowledge. The full pipeline from "what are we building?" to "it's documented and verified."

Fury starts
$ becky greenfield "checkout redesign"
/becky-brownfield

Work on existing code. Creates a 7-phase task that starts with archaeology -- understand what exists before changing it. The phases: audit, document, plan, intervene, test, verify, learn.

Strange and Stark start
$ becky brownfield "fix the checkout bug"
/becky-run

Execute the current phase of the active task. The owning agent reads inputs from known locations and writes outputs to known locations. One phase at a time, manually gated.

Current agent
$ becky run
/becky-approve

Pass the current gate and advance to the next phase. This is the human checkpoint -- you review the output, decide it's good enough, and let the pipeline continue.

You
$ becky approve
/becky-revise

Send feedback and re-run the current phase. The agent gets your feedback as context, goes back, and redoes the work. The gate stays closed until you approve.

Current agent
$ becky revise "add error states for payment failures"
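
Together, `becky run`, `becky approve`, and `becky revise` form a small state machine around a single gate. A minimal TypeScript sketch, assuming hypothetical state names (`pending`, `awaiting-review`, `approved`) and a hypothetical `PhaseRun` record; Becky's real task files may track this differently:

```typescript
// Hypothetical gate state for one phase; names are illustrative only.
type GateState = "pending" | "awaiting-review" | "approved";

interface PhaseRun {
  state: GateState;
  attempts: number;   // how many times the agent has run this phase
  feedback: string[]; // founder feedback passed back to the agent as context
}

// `becky run`: the owning agent produces output, then waits for review.
function run(p: PhaseRun): PhaseRun {
  if (p.state === "approved") throw new Error("phase already approved");
  return { ...p, state: "awaiting-review", attempts: p.attempts + 1 };
}

// `becky approve`: the human checkpoint. Only output awaiting review can pass.
function approve(p: PhaseRun): PhaseRun {
  if (p.state !== "awaiting-review") throw new Error("nothing to approve yet");
  return { ...p, state: "approved" };
}

// `becky revise "<feedback>"`: record the feedback and re-run; the gate stays closed.
function revise(p: PhaseRun, note: string): PhaseRun {
  if (p.state !== "awaiting-review") throw new Error("nothing to revise yet");
  return run({ ...p, state: "pending", feedback: [...p.feedback, note] });
}

// Usage: one revision round, then approval.
let phase: PhaseRun = { state: "pending", attempts: 0, feedback: [] };
phase = run(phase);
phase = revise(phase, "add error states for payment failures");
phase = approve(phase);
// phase.state is now "approved" after 2 attempts
```

The key property is that `revise` can never open the gate on its own; only `approve` moves a phase to `approved`.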

Automation

/becky-autopilot

Run all remaining phases unattended. The orchestrator chains agents through their phases, feeding each phase's outputs into the next phase's inputs. You come back to a morning brief summarizing everything that happened.

All agents
$ becky autopilot
/becky-assemble

All 7 agents in a war room. For when you're stuck, facing a high-stakes decision, or dealing with a P0. Structured protocol: situation brief, first reads, deep dives, debate, convergence.

All 7 agents
$ becky assemble "checkout flow is silently failing"
/becky-retro

Retrospective on completed work. Extracts what worked, what failed, and what to encode as rules. Produces structured findings that feed back into the learning loop.

Watcher and Fury
$ becky retro checkout-redesign
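
Autopilot can be pictured as a loop that hands each phase the artifacts produced so far and collects one summary line per phase for the morning brief. A minimal sketch under stated assumptions: the `PhaseFn` signature, the string-valued artifacts, and the brief format are hypothetical, and gate checks are omitted for brevity.

```typescript
// Hypothetical: a phase reads earlier artifacts and returns new ones plus a summary line.
type Artifacts = Record<string, string>;
type PhaseFn = (inputs: Artifacts) => { outputs: Artifacts; summary: string };

interface NamedPhase {
  agent: string;
  run: PhaseFn;
}

// Chain phases in order: every phase sees all artifacts produced before it.
function autopilot(phases: NamedPhase[]): { artifacts: Artifacts; brief: string } {
  let artifacts: Artifacts = {};
  const lines: string[] = [];
  phases.forEach((phase, i) => {
    const { outputs, summary } = phase.run(artifacts);
    artifacts = { ...artifacts, ...outputs };
    lines.push(`PHASE ${i + 1} ${phase.agent}: ${summary}`);
  });
  return { artifacts, brief: lines.join("\n") };
}

// Usage with two toy phases standing in for real agents.
const result = autopilot([
  { agent: "Fury", run: () => ({ outputs: { "brief.md": "FR-1, FR-2" }, summary: "brief.md written" }) },
  {
    agent: "Shuri",
    run: (inp) => ({
      outputs: { "ux-spec.md": `screens for ${inp["brief.md"]}` },
      summary: "ux-spec.md written",
    }),
  },
]);
console.log(result.brief);
// PHASE 1 Fury: brief.md written
// PHASE 2 Shuri: ux-spec.md written
```

Shuri's toy phase reads `brief.md` from the artifacts Fury left behind, which is the "outputs to inputs" hand-off in its simplest form.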

System

/becky-status

Dashboard of everything. Shows active tasks, current phase, rule count, wiki articles, memory entries. The single place to see what's happening across all your projects.

$ becky status
/becky-rules-add

Create a new rule with proper frontmatter. Rules live in core/rules/ and are the source of truth. CLAUDE.md and AGENTS.md are compiled outputs. Edit rules, not outputs.

$ becky rules add "no silent error drops"
becky compile

Generate CLAUDE.md and AGENTS.md from the rules in core/rules/. This is the compilation step that turns your rules into the instruction file each agent runtime reads.

$ becky compile
becky verify

Lint rules for issues. Checks for missing fields, duplicates, stale references, and inconsistencies. Think of it as the type checker for your rules engine.

$ becky verify
becky init

Portable installation into any project. Creates a .becky/ workspace and copies all 13 slash commands to the target's .claude/commands/. Also scaffolds core/rules/, wiki/, memory/, tasks/, and agent configs. Detects existing agent frameworks (BMad, Hermes) and suggests running becky scan.

$ becky init /path/to/project
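
The rules commands in this group (`becky rules add`, `becky compile`, `becky verify`) share one idea: rules are the source of truth, and generated files are derived from them. A minimal sketch, assuming hypothetical frontmatter fields (`id`, `title`, `severity`), hypothetical rule IDs, and rules already parsed from core/rules/; Becky's real schema, parser, and output format may differ.

```typescript
// Hypothetical rule shape; Becky's actual frontmatter fields may differ.
interface Rule {
  id?: string;
  title?: string;
  severity?: "error" | "warn";
  body: string;
}

const REQUIRED: (keyof Rule)[] = ["id", "title", "severity"];

// Lint pass in the spirit of `becky verify`: missing fields and duplicate IDs.
function lintRules(rules: Rule[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  rules.forEach((rule, i) => {
    for (const field of REQUIRED) {
      if (!rule[field]) problems.push(`rule #${i}: missing ${field}`);
    }
    if (rule.id) {
      if (seen.has(rule.id)) problems.push(`rule #${i}: duplicate id ${rule.id}`);
      seen.add(rule.id);
    }
  });
  return problems;
}

// Compile pass in the spirit of `becky compile`: rules in, one instructions file out.
// The same rules would be rendered once per target (e.g. CLAUDE.md, AGENTS.md).
function compileRules(rules: Rule[], target: string): string {
  const sections = rules.map((r) => `## ${r.id}: ${r.title}\n\n${r.body.trim()}`);
  return [`# ${target} (generated -- edit core/rules/, not this file)`, ...sections].join("\n\n");
}

// Hypothetical rules; the third one is deliberately broken.
const rules: Rule[] = [
  { id: "R-1", title: "Runtime evidence required for DONE", severity: "error", body: "Attach a runtime artifact." },
  { id: "R-2", title: "No silent error drops", severity: "error", body: "Log or rethrow every caught error." },
  { id: "R-2", title: "Duplicate", body: "Oops." },
];

console.log(lintRules(rules).join("\n"));
// rule #2: missing severity
// rule #2: duplicate id R-2
```

Because the generated file is rebuilt from the rule list every time, hand edits to it would be lost on the next compile, which is why the page says to edit rules, not outputs.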

The Overnight Flow

What happens when you type becky autopilot and go to sleep.

8:00 PM
You type the command
$ becky greenfield "checkout redesign"
>> Task folder created with 8 phases
8:01 PM
Autopilot engaged
$ becky autopilot
>> Orchestrator starts. Go to sleep.
8:05 PM
PHASE 1 Fury discovers
>> brief.md (12 functional requirements found)
8:45 PM
PHASE 2 Shuri designs
>> ux-spec.md (8 screens, 3 error states)
9:30 PM
PHASE 3 Strange architects
>> architecture.md + 4 ADRs
10:15 PM
PHASE 4 Fury creates stories
>> 14 stories across 3 epics
11:00 PM
PHASE 5 Stark builds
>> 14 stories implemented, 23 tests pass
1:00 AM
PHASE 6 Widow tests
>> 31 Playwright tests, 29 pass, 2 flagged
2:00 AM
PHASE 7 Heimdall verifies
>> VERIFIED (10 DONE, 3 VERIFIED, 1 AUDITED)
2:30 AM
PHASE 8 Watcher documents
>> 5 wiki articles, 1 draft rule
3:00 AM
Morning brief generated
>> _morning-brief.md complete
>> _status.md complete
>> Retro template created
7:00 AM
You open your laptop

Read _morning-brief.md. "2 flags. 1 draft rule. Review when ready." All 8 phases documented. Every decision recorded. Every story graded.

The Anti-Inflation Standard

Three tiers of truth. Why this matters for solopreneurs: you need to trust what the agents tell you. Self-grading is the root cause of inflated completion reports.

Heimdall holds the line

The agent that builds (Stark) should never be the agent that grades. Heimdall exists because without independent verification, agents consistently report AUDITED work as DONE. Heimdall cannot be overridden. If Stark and Heimdall disagree, Heimdall wins.

State | Evidence required
DONE | Runtime artifact (API response body, DB query result, or browser screenshot) proving the feature works against a real database
VERIFIED | Acceptance criteria reviewed line-by-line, with a code citation (file:line) for each point
AUDITED | File existence confirmed, LOC counted, function names match spec. This is the lowest tier -- grep matches, not runtime proof.
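
Because the three tiers are strictly ordered, an inflated claim can be caught mechanically: the kind of evidence supplied caps the tier a story may claim. A minimal sketch, assuming hypothetical evidence kinds (`grep`, `code-citation`, `runtime`) that mirror the rows of the table above:

```typescript
// Tiers from weakest to strongest; the array index doubles as a rank.
const TIERS = ["AUDITED", "VERIFIED", "DONE"] as const;
type Tier = (typeof TIERS)[number];

// Hypothetical evidence kinds, one per row of the table.
type EvidenceKind = "grep" | "code-citation" | "runtime";

// Highest tier each kind of evidence can support.
const CEILING: Record<EvidenceKind, Tier> = {
  grep: "AUDITED",             // file exists, LOC counted, names match spec
  "code-citation": "VERIFIED", // AC reviewed line-by-line with file:line
  runtime: "DONE",             // API response, DB row, or screenshot
};

const rank = (t: Tier): number => TIERS.indexOf(t);

// A claim is inflated when it outranks what the evidence can support.
function isInflated(claimed: Tier, evidence: EvidenceKind): boolean {
  return rank(claimed) > rank(CEILING[evidence]);
}

console.log(isInflated("DONE", "grep"));              // true: "verified via grep" is AUDITED
console.log(isInflated("VERIFIED", "code-citation")); // false
```

Under-claiming is never flagged: reporting runtime-backed work as AUDITED is conservative, not inflated.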

Violations

  • x Reporting AUDITED stories as DONE or VERIFIED
  • x "Verified via grep" is AUDITED, not VERIFIED
  • x "tsc clean + tests pass" is not sufficient for DONE
  • x Claiming DONE for a migration file on disk with no staging verification

The Standard

  • + Runtime evidence is required for DONE -- always
  • + Three tiers reported separately, never combined
  • + Verifier is independent from builder
  • + Apply migrations, then verify, then claim done
verdict.yaml
story: "S-042"
verdict: VERIFIED
evidence:
  - ac: "FR-12: Guest can check in"
    tier: DONE
    proof: "Screenshot of check-in flow, DB row status=checked_in"
  - ac: "FR-13: Folio created at check-in"
    tier: VERIFIED
    proof: "billing-engine.ts:142 -- createFolio called in handleCheckIn"
violations:
  - rule: "D-1"
    file: "billing-engine.ts:89"
    detail: "as any cast without migration reference"
verifier: HEIMDALL
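
The example verdict is consistent with a simple aggregation rule: a story's verdict is the weakest tier among its acceptance criteria (one DONE plus one VERIFIED gives VERIFIED). The sketch below assumes that rule, which this page implies but does not state outright, and represents the parsed YAML as a plain TypeScript object instead of reading the file. `expectedVerdict` and `checkVerdict` are hypothetical names.

```typescript
// Mirrors the shape of verdict.yaml; a parsed file would look like this object.
const TIERS = ["AUDITED", "VERIFIED", "DONE"] as const; // weakest to strongest
type Tier = (typeof TIERS)[number];

interface Evidence { ac: string; tier: Tier; proof: string }
interface Verdict { story: string; verdict: Tier; evidence: Evidence[] }

// Assumed rule: a story is only as strong as its weakest acceptance criterion.
function expectedVerdict(evidence: Evidence[]): Tier {
  if (evidence.length === 0) throw new Error("no evidence: nothing to verify");
  const ranks = evidence.map((e) => TIERS.indexOf(e.tier));
  return TIERS[Math.min(...ranks)];
}

// Flags a filed verdict that claims more than its evidence supports.
function checkVerdict(v: Verdict): string | null {
  const expected = expectedVerdict(v.evidence);
  return TIERS.indexOf(v.verdict) > TIERS.indexOf(expected)
    ? `${v.story}: claims ${v.verdict}, evidence supports ${expected}`
    : null;
}

const s042: Verdict = {
  story: "S-042",
  verdict: "VERIFIED",
  evidence: [
    { ac: "FR-12: Guest can check in", tier: "DONE", proof: "Screenshot of check-in flow, DB row status=checked_in" },
    { ac: "FR-13: Folio created at check-in", tier: "VERIFIED", proof: "billing-engine.ts:142 -- createFolio called in handleCheckIn" },
  ],
};

console.log(checkVerdict(s042));                        // null: VERIFIED is consistent
console.log(checkVerdict({ ...s042, verdict: "DONE" })); // S-042: claims DONE, evidence supports VERIFIED
```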

Get started

$ git clone https://github.com/evanpaul90/becky.git
$ cd becky
$ npm install
$ npx becky init .        # install slash commands + .becky/ workspace
$ npx becky scan .        # analyze existing work
$ npx becky greenfield "my first project"
$ npx becky autopilot
>> Squad deployed. 7 agents online. Go to sleep.