DOCUMENTATION v1.0

The Complete Guide

Everything you need to ship software while you sleep.

Why you're here

You're shipping alone

Solopreneurs juggle product, code, design, QA, and deployment. You can't hire 7 people. But you can deploy 7 agents. Each one is a specialist. Together they're a full engineering team that works while you sleep.

AI agents forget and lie

Every session starts from zero. Agents report AUDITED work (files exist, nothing was run) as DONE. The same bug ships again three weeks later. Without structure, AI agents are a liability pretending to be an asset.

Becky gives agents a boss

Rules they can't forget. A verifier they can't override. Memory that compounds. The system gets smarter with every project. Every mistake becomes a rule. Rules compound. Mistakes don't repeat.

AGENT_ROSTER

Meet the team

Seven specialized agents. Each with a distinct lens, a defined role, and the obligation to disagree when they see something others missed.

Fury

DISCOVERY

"I still believe in heroes."

Sees the whole board. Asks WHY until the real problem surfaces. Writes requirements that are testable, not aspirational.

5 Whys / Elicitation / PRD

Strange

ARCHITECTURE

"I went forward in time to view alternate futures."

14 million possible designs. Picks the one that ships. Every decision documented with context, alternatives, and consequences.

Architecture Trace / ADRs

Shuri

EXPERIENCE

"Just because something works doesn't mean it can't be improved."

Bridges innovation and usability. Error states aren't optional. Designs experiences, not screens.

Broken Promise Audit / UX

Stark

BUILD

"I am Iron Man."

Writes the code. Tests alongside. Never self-grades. Ultra-succinct -- speaks in file paths and acceptance criteria IDs.

Code Trace / Implementation

Widow

TEST

"I've got red in my ledger."

Opens a real browser. Clicks real buttons. Finds what everyone missed. Never passes off a grep audit as testing.

Reproduction Protocol / Playwright

Heimdall

VERIFY

"I can see all nine realms."

The all-seeing guardian. Cannot be overridden. DONE means runtime evidence. If Stark and Heimdall disagree, Heimdall wins.

Pre-mortem / Evidence Escalation

Watcher

MEMORY

"I observe. I record. I do not interfere... unless the stakes demand it."

The chronicler. Maintains institutional memory. Distills knowledge from completed work. How the system remembers.

Pattern Match / Wiki Curation

The Three Workflows

Three modes, one system. Each step has a gate. Nothing advances without verification.

Greenfield

1

FURY | discovery

Product Brief

Fury interviews you. Runs 4-5 elicitation methods. Asks why until the real problem surfaces. Produces a product brief capturing vision, users, success metrics, and scope.

GATE: Founder approves brief

Produces: brief.md

2

SHURI | experience

UX Design

Shuri maps every requirement to a human interaction. User journeys, screen specs, component strategy, accessibility. Error states aren't optional.

GATE: Fury confirms FR coverage

Produces: ux-spec.md

3

STRANGE | architecture

Architecture

Strange designs the system architecture from the validated PRD. ADRs for every non-obvious choice. Data models, API contracts, integration boundaries.

GATE: Fury + Shuri confirm alignment

Produces: architecture.md, ADRs

4

FURY | planning

Stories

Fury decomposes the work into epics and stories. Each story file contains everything an agent needs to implement it independently. FR traceability required.

GATE: Readiness check passes

Produces: stories/

5

STARK | build

Implementation

Stark reads the full story spec, referenced FRs, UX spec, and architecture before writing any code. Tests alongside. Never self-grades.

GATE: Tests pass, push checklist clean

Produces: code + tests

6

WIDOW | test

Testing

Widow opens a real browser. Clicks real buttons. Fills real forms. Screenshots every step. Golden path plus top error paths for every feature.

GATE: All critical paths covered

Produces: test-report.md + screenshots

7

HEIMDALL | verify

Verification

Heimdall emits structured verdicts: DONE, VERIFIED, or AUDITED. Cannot be overridden. If the evidence is AUDITED-level, the verdict is AUDITED -- even if Stark claims DONE.

GATE: Verdict filed

Produces: verdict.yaml

8

WATCHER | memory

Knowledge

Watcher compiles knowledge from the completed work. Wiki articles, decision records, incident patterns. The system's institutional memory.

GATE: Index updated

Produces: wiki articles
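
The "FR traceability required" check in step 4 can be pictured as a coverage test: every functional requirement in the brief should be referenced by at least one story. Here is a minimal sketch, assuming a hypothetical `Story` shape with an `frs` field. Becky's real story file format is not shown in this guide, so treat the names as illustrative.

```typescript
// Hypothetical shape of a story; the real story file format may differ.
interface Story {
  id: string;
  frs: string[]; // functional requirement IDs this story implements
}

// Return the requirement IDs that no story claims to cover.
function uncoveredRequirements(requirements: string[], stories: Story[]): string[] {
  return requirements.filter(
    (fr) => !stories.some((story) => story.frs.indexOf(fr) >= 0),
  );
}

// Usage: FR-13 is not referenced by any story, so the readiness check should fail.
const briefRequirements = ["FR-12", "FR-13"];
const plannedStories: Story[] = [{ id: "S-042", frs: ["FR-12"] }];
const missing = uncoveredRequirements(briefRequirements, plannedStories);
const readyForBuild = missing.length === 0;
```

An empty `missing` list is what a passing readiness gate would look like under this assumption.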

Brownfield

1

STRANGE + STARK | archaeology

Discover

Audit the codebase. Map dependencies. Identify dead routes, broken features, missing migrations. Query information_schema to see what actually exists, not what migration files claim.

GATE: Audit complete

Produces: codebase audit, dependency map, dead-route register

2

WATCHER | documentation

Document

Watcher takes the audit output and creates wiki articles for everything discovered. This is the system's memory of what existed before intervention.

GATE: Index covers existing system

Produces: wiki articles for existing architecture

3

FURY | planning

Plan

Fury creates an intervention plan -- not a PRD, but a plan for what to change and in what order. References existing features, applicable rules, and dependency constraints.

GATE: Founder approves plan

Produces: intervention plan

4

STARK | build

Intervene

Stark makes targeted code changes with tests. Follows all rules without exception. Documents decisions and deviations.

GATE: Tests pass, push checklist clean

Produces: code changes + tests

5

WIDOW | regression

Test

Widow validates no regressions. Existing features still work after intervention. QA report with evidence for every change.

GATE: No regressions, features verified

Produces: regression report + feature report

6

HEIMDALL | verify

Verify

Same standard as greenfield. Structured verdicts per story. Runtime evidence for DONE, code citations for VERIFIED.

GATE: Verdict filed

Produces: verdict per story

7

WATCHER | memory

Knowledge

Updated wiki articles reflecting the changes. The system now knows what changed, why, and what the new architecture looks like.

GATE: Index reflects changes

Produces: updated wiki articles
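
The advice in step 1 to query information_schema rather than trust migration files comes down to comparing two lists of tables. The sketch below assumes a PostgreSQL-style database, where `SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'` lists the live tables. Running the query and parsing migration files are left out, and the function and table names are hypothetical.

```typescript
// Result of comparing what migrations claim against what the database reports.
interface SchemaDrift {
  missingInDatabase: string[]; // claimed by migrations, absent at runtime
  undocumented: string[]; // present at runtime, claimed by no migration
}

// `claimed` would come from parsing migration files; `actual` from a query such as
//   SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';
// Both are plain arrays here so the comparison stays self-contained.
function findSchemaDrift(claimed: string[], actual: string[]): SchemaDrift {
  return {
    missingInDatabase: claimed.filter((table) => actual.indexOf(table) === -1),
    undocumented: actual.filter((table) => claimed.indexOf(table) === -1),
  };
}

// Usage with made-up table names.
const drift = findSchemaDrift(["guests", "folios"], ["guests", "legacy_rates"]);
// drift.missingInDatabase is ["folios"]; drift.undocumented is ["legacy_rates"]
```

Anything in `missingInDatabase` is a candidate entry for the "missing migrations" part of the audit.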

Assemble

"There was an idea... to bring together a group of remarkable agents."

Invoke when you're stuck, facing a critical decision, or dealing with a P0 incident. All 7 agents join a single discussion. Each speaks from their expertise and challenges the others. The value is in the tension between perspectives.

1

Situation Brief

You state the problem. What you're seeing, what you expected, what you've tried, and how urgent it is (P0 active / P1 blocking / P2 need direction).

2

First Reads (All 7 Agents)

Each agent gives their immediate reaction from their specific expertise. No waiting. All 7 speak simultaneously.

Fury: "Who is affected? Since when?"

Strange: "What's the data flow?"

Shuri: "What does the user see?"

Stark: "Which file, which line?"

Widow: "Can I reproduce this?"

Heimdall: "What does DONE look like?"

Watcher: "Has this happened before?"

3

Deep Dives

Each agent probes deeper using their specialized elicitation technique: Fury runs 5 Whys, Strange traces the architecture, Shuri audits broken promises, Stark traces the code path, Widow designs a reproduction protocol, Heimdall locks exit criteria, Watcher pattern-matches from the wiki.

4

Debate + Convergence

Agents challenge each other. Strange says Stark's fix is too narrow. Shuri says the backend fix isn't enough. Heimdall rejects "it's just a one-line fix." Debate continues until convergence on: root cause, fix plan, UX guard, test plan, exit criteria, blast radius check.

5

Action + Assign

Each agent takes their piece: Stark implements. Widow runs reproduction + fix verification. Heimdall verifies against locked exit criteria. Watcher writes the incident article. Fury checks if a new rule is needed. Strange reviews for systemic fixes. Shuri checks for UX guards.
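
Step 4 names six things the debate has to settle before anyone acts. One way to picture "convergence" is as a checklist with no empty entries. This is a sketch only: the `WarRoomNotes` type and the sample notes are hypothetical, and Becky may track convergence differently.

```typescript
// The six items the debate must settle, taken from step 4 above.
const CONVERGENCE_ITEMS = [
  "rootCause",
  "fixPlan",
  "uxGuard",
  "testPlan",
  "exitCriteria",
  "blastRadiusCheck",
] as const;

type ConvergenceItem = (typeof CONVERGENCE_ITEMS)[number];

// Hypothetical record of what the war room has agreed on so far.
type WarRoomNotes = Partial<Record<ConvergenceItem, string>>;

// List the items still open; an empty list means the debate has converged.
function openItems(notes: WarRoomNotes): ConvergenceItem[] {
  return CONVERGENCE_ITEMS.filter((item) => {
    const value = notes[item];
    return value === undefined || value.trim() === "";
  });
}

// Usage with made-up notes: two items settled, one left blank, three never raised.
const notes: WarRoomNotes = {
  rootCause: "errors from the payment step are swallowed",
  fixPlan: "surface the error and retry once",
  testPlan: "",
};
const stillOpen = openItems(notes);
// stillOpen is ["uxGuard", "testPlan", "exitCriteria", "blastRadiusCheck"]
```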

Every Command Explained

16 commands organized into 4 groups. Each one does exactly one thing.

Getting Started

/becky-onboard

Interactive walkthrough of how Becky works. Covers folder structure, agents, commands, rules engine, and the learning loop. Run this first to understand the system before using it.

Watcher

$ becky onboard

/becky-learn

Import existing documentation into Becky's knowledge base. Reads PRDs, architecture docs, UX specs from any path and populates the wiki so agents have project context from day one.

Watcher

$ becky learn /path/to/existing/docs

/becky-scan

Scans an existing project -- detects BMad, Hermes, and other agent frameworks. Reads planning artifacts, analyzes the codebase, and tells you what's done, what's pending, and where to start. Like a smart colleague who just spent an hour reading through everything.

All agents

$ becky scan .

Day-to-Day Work

/becky-greenfield

Start a new project from scratch. Creates an 8-phase task folder with discovery through knowledge. The full pipeline from "what are we building?" to "it's documented and verified."

Fury starts

$ becky greenfield "checkout redesign"

/becky-brownfield

Work on existing code. Creates a 7-phase task starting with archaeology -- understand what exists before changing it. Audit, document, plan, intervene, test, verify, learn.

Strange + Stark start

$ becky brownfield "fix the checkout bug"

/becky-run

Execute the current phase of the active task. The owning agent reads inputs from known locations and writes outputs to known locations. One phase at a time, manually gated.

Current agent

$ becky run

/becky-approve

Pass the current gate and advance to the next phase. This is the human checkpoint -- you review the output, decide it's good enough, and let the pipeline continue.

You

$ becky approve

/becky-revise

Send feedback and re-run the current phase. The agent gets your feedback as context, goes back, and redoes the work. The gate stays closed until you approve.

Current agent

$ becky revise "add error states for payment failures"
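
The run, approve, and revise commands above behave like a small state machine: a phase runs, its gate waits for you, and only an approval moves the task forward. The sketch below models that loop under stated assumptions. The `Task` shape, the state names, and the error messages are hypothetical and are not Becky's internals.

```typescript
type GateState = "ready_to_run" | "awaiting_approval" | "finished";

// Hypothetical in-memory view of a task.
interface Task {
  phases: string[];
  current: number; // index into phases
  gate: GateState;
  feedback: string[]; // notes collected by revise for the current phase
}

function startTask(phases: string[]): Task {
  return { phases, current: 0, gate: "ready_to_run", feedback: [] };
}

// becky run: the owning agent does the work, then the gate waits for a human.
function run(task: Task): Task {
  if (task.gate !== "ready_to_run") throw new Error("nothing to run");
  return { ...task, gate: "awaiting_approval" };
}

// becky approve: pass the gate and move to the next phase.
function approve(task: Task): Task {
  if (task.gate !== "awaiting_approval") throw new Error("no output to approve");
  const next = task.current + 1;
  if (next >= task.phases.length) return { ...task, gate: "finished", feedback: [] };
  return { ...task, current: next, gate: "ready_to_run", feedback: [] };
}

// becky revise: the same phase is redone with feedback; the gate stays closed.
function revise(task: Task, note: string): Task {
  if (task.gate !== "awaiting_approval") throw new Error("no output to revise");
  return { ...task, feedback: task.feedback.concat(note) };
}

// Usage: one revision round on discovery, then approval advances to experience.
let task = startTask(["discovery", "experience", "architecture"]);
task = run(task);
task = revise(task, "add error states for payment failures");
task = approve(task);
// task.phases[task.current] is now "experience"
```

Keeping `approve` as the only transition that changes `current` is what makes the gate a real checkpoint in this model.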

Automation

/becky-autopilot

Run all remaining phases unattended. The orchestrator chains agents through their phases, passing outputs to inputs. You come back to a morning brief summarizing everything that happened.

All agents

$ becky autopilot

/becky-assemble

All 7 agents in a war room. For when you're stuck, facing a high-stakes decision, or dealing with a P0. Structured protocol: situation brief, first reads, deep dives, debate, convergence.

All 7 agents

$ becky assemble "checkout flow is silently failing"

/becky-retro

Retrospective on completed work. Extracts what worked, what failed, and what to encode as rules. Produces structured findings that feed back into the learning loop.

Watcher + Fury

$ becky retro checkout-redesign

System

/becky-status

Dashboard of everything. Shows active tasks, current phase, rule count, wiki articles, memory entries. The single place to see what's happening across all your projects.

$ becky status

/becky-rules-add

Create a new rule with proper frontmatter. Rules live in core/rules/ and are the source of truth. CLAUDE.md and AGENTS.md are compiled outputs. Edit rules, not outputs.

$ becky rules add "no silent error drops"

becky compile

Generate CLAUDE.md and AGENTS.md from the rules in core/rules/. This is the compilation step that turns your rules into instructions tailored to each agent runtime.

$ becky compile

becky verify

Lint rules for issues. Checks for missing fields, duplicates, stale references, and inconsistencies. Think of it as the type checker for your rules engine.

$ becky verify

becky init

Portable installation into any project. Creates a .becky/ workspace and copies all 13 slash commands to the target's .claude/commands/. Also scaffolds core/rules/, wiki/, memory/, tasks/, and agent configs. Detects existing agent frameworks (BMad, Hermes) and suggests running becky scan.

$ becky init /path/to/project
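
To make the "type checker for your rules engine" idea behind becky verify concrete, here is a small lint sketch covering two of the checks it describes: missing fields and duplicates. The `Rule` fields (`id`, `title`, `body`) are assumptions made for illustration. This guide does not list Becky's actual frontmatter fields.

```typescript
// Hypothetical rule shape; the real frontmatter fields are not listed in this guide.
interface Rule {
  id?: string;
  title?: string;
  body?: string;
}

const REQUIRED_FIELDS: Array<keyof Rule> = ["id", "title", "body"];

// Return human-readable problems: missing required fields and duplicate IDs.
function lintRules(rules: Rule[]): string[] {
  const problems: string[] = [];
  const seenIds: string[] = [];
  rules.forEach((rule, index) => {
    const label = rule.id ?? `rule #${index + 1}`;
    for (const field of REQUIRED_FIELDS) {
      const value = rule[field];
      if (value === undefined || value.trim() === "") {
        problems.push(`${label}: missing ${field}`);
      }
    }
    if (rule.id !== undefined) {
      if (seenIds.indexOf(rule.id) >= 0) problems.push(`${rule.id}: duplicate id`);
      seenIds.push(rule.id);
    }
  });
  return problems;
}

// Usage with made-up rules: the second entry reuses an ID and has an empty body.
const findings = lintRules([
  { id: "R-1", title: "no silent error drops", body: "Log or rethrow every caught error." },
  { id: "R-1", title: "copy of R-1", body: "" },
]);
// findings is ["R-1: missing body", "R-1: duplicate id"]
```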

The Overnight Flow

What happens when you type becky autopilot and go to sleep.

8:00 PM

You type the command

$ becky greenfield "checkout redesign"
>> Task folder created with 8 phases

8:01 PM

Autopilot engaged

$ becky autopilot
>> Orchestrator starts. Go to sleep.

8:05 PM

PHASE 1 Fury discovers

>> brief.md (12 functional requirements found)

8:45 PM

PHASE 2 Shuri designs

>> ux-spec.md (8 screens, 3 error states)

9:30 PM

PHASE 3 Strange architects

>> architecture.md + 4 ADRs

10:15 PM

PHASE 4 Fury creates stories

>> 14 stories across 3 epics

11:00 PM

PHASE 5 Stark builds

>> 14 stories implemented, 23 tests pass

1:00 AM

PHASE 6 Widow tests

>> 31 Playwright tests, 29 pass, 2 flagged

2:00 AM

PHASE 7 Heimdall verifies

>> VERIFIED (10 DONE, 3 VERIFIED, 1 AUDITED)

2:30 AM

PHASE 8 Watcher documents

>> 5 wiki articles, 1 draft rule

3:00 AM

Morning brief generated

>> _morning-brief.md complete
>> _status.md complete
>> Retro template created

7:00 AM

You open your laptop

Read _morning-brief.md. "2 flags. 1 draft rule. Review when ready." All 8 phases documented. Every decision recorded. Every output verified.
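
The overnight run above is a chain: each phase receives the artifacts produced before it, adds its own, and leaves a line for the morning brief. A minimal sketch of that chaining follows. The `Phase` and `BriefEntry` types are hypothetical, the phases are stand-ins, and a real orchestrator would presumably run agents asynchronously and enforce each gate.

```typescript
// A phase takes the artifacts produced so far and returns the artifacts it adds.
interface Phase {
  agent: string;
  run: (inputs: string[]) => string[];
}

interface BriefEntry {
  agent: string;
  produced: string[];
}

// Run every phase in order; each phase sees all artifacts produced before it.
function autopilot(phases: Phase[]): { artifacts: string[]; brief: BriefEntry[] } {
  let artifacts: string[] = [];
  const brief: BriefEntry[] = [];
  for (const phase of phases) {
    const produced = phase.run(artifacts);
    artifacts = artifacts.concat(produced);
    brief.push({ agent: phase.agent, produced });
  }
  return { artifacts, brief };
}

// Usage with stand-in phases; Shuri only produces a spec if the brief exists.
const overnight = autopilot([
  { agent: "Fury", run: () => ["brief.md"] },
  { agent: "Shuri", run: (inputs) => (inputs.indexOf("brief.md") >= 0 ? ["ux-spec.md"] : []) },
]);
// overnight.artifacts is ["brief.md", "ux-spec.md"]
```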

The Anti-Inflation Standard

Three tiers of truth. Why this matters for solopreneurs: you need to trust what the agents tell you. Self-grading is the root cause of inflated completion reports.

Heimdall holds the line

The agent that builds (Stark) should never be the agent that grades. Heimdall exists because without independent verification, agents consistently report AUDITED work as DONE. Heimdall cannot be overridden. If Stark and Heimdall disagree, Heimdall wins.

State	Evidence Required	Example
DONE	Runtime artifact: API response body, DB query result, browser screenshot proving the feature works against a real database	"Called API, got 200, verified row in DB"
VERIFIED	Acceptance criteria reviewed line-by-line with code citations (file:line) for each point	"AC says X -- see file.ts:42"
AUDITED	File existence confirmed, LOC counted, function names match spec. This is the lowest tier -- grep matches, not runtime proof.	"Function exists at line 608, 90 LOC"
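
Read as code, the table maps kinds of evidence to the strongest state they can support for a single acceptance criterion. The sketch below is one possible reading, not Heimdall's actual logic. The `EvidenceKind` names are invented labels for the evidence types the table lists.

```typescript
type Tier = "DONE" | "VERIFIED" | "AUDITED";

// Invented labels for the kinds of evidence named in the table above.
type EvidenceKind =
  | "api_response"
  | "db_query_result"
  | "browser_screenshot"
  | "code_citation"
  | "file_exists"
  | "loc_count"
  | "grep_match";

// Only runtime artifacts can justify DONE.
const RUNTIME_KINDS: EvidenceKind[] = ["api_response", "db_query_result", "browser_screenshot"];

// Strongest tier the supplied evidence can support for one acceptance criterion.
function highestTier(evidence: EvidenceKind[]): Tier {
  if (evidence.length === 0) throw new Error("no evidence supplied");
  if (evidence.some((kind) => RUNTIME_KINDS.indexOf(kind) >= 0)) return "DONE";
  if (evidence.indexOf("code_citation") >= 0) return "VERIFIED";
  return "AUDITED"; // static checks only: file exists, LOC counted, grep matched
}

// Usage: a grep match plus a file:line citation reaches VERIFIED, not DONE.
const citedOnly = highestTier(["grep_match", "code_citation"]);
```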

Violations

x Reporting AUDITED stories as DONE or VERIFIED
x "Verified via grep" is AUDITED, not VERIFIED
x "tsc clean + tests pass" is not sufficient for DONE
x Counting a migration file on disk as applied with no staging verification

The Standard

+ Runtime evidence is required for DONE -- always
+ Three tiers reported separately, never combined
+ Verifier is independent from builder
+ Apply migrations, then verify, then claim done
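
"Reported separately, never combined" suggests a per-tier count rather than a single completion number. A short sketch of such a tally follows, with hypothetical function names:

```typescript
type Tier = "DONE" | "VERIFIED" | "AUDITED";

// Count stories per tier so the report never collapses them into one number.
function tallyTiers(storyTiers: Tier[]): Record<Tier, number> {
  const counts: Record<Tier, number> = { DONE: 0, VERIFIED: 0, AUDITED: 0 };
  for (const tier of storyTiers) {
    counts[tier] += 1;
  }
  return counts;
}

// Render the tally in the "N DONE, N VERIFIED, N AUDITED" style used in this guide.
function formatTally(counts: Record<Tier, number>): string {
  return `${counts.DONE} DONE, ${counts.VERIFIED} VERIFIED, ${counts.AUDITED} AUDITED`;
}

const summary = formatTally(tallyTiers(["DONE", "DONE", "VERIFIED", "AUDITED"]));
// summary is "2 DONE, 1 VERIFIED, 1 AUDITED"
```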

verdict.yaml

story: "S-042"
verdict: VERIFIED
evidence:
  - ac: "FR-12: Guest can check in"
    tier: DONE
    proof: "Screenshot of check-in flow, DB row status=checked_in"
  - ac: "FR-13: Folio created at check-in"
    tier: VERIFIED
    proof: "billing-engine.ts:142 -- createFolio called in handleCheckIn"
violations:
  - rule: "D-1"
    file: "billing-engine.ts:89"
    detail: "as any cast without migration reference"
verifier: HEIMDALL
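
The sample verdict can be mirrored as a typed structure, which also makes an inflation check easy to express. In the sketch below the field names follow the sample file. The rule that a story's verdict may not be stronger than its weakest acceptance criterion is an assumption inferred from that sample, not a documented Becky rule.

```typescript
type Tier = "DONE" | "VERIFIED" | "AUDITED";

interface EvidenceItem {
  ac: string;
  tier: Tier;
  proof: string;
}

interface Violation {
  rule: string;
  file: string;
  detail: string;
}

interface Verdict {
  story: string;
  verdict: Tier;
  evidence: EvidenceItem[];
  violations: Violation[];
  verifier: string;
}

// Strength order: a higher number means stronger evidence.
const STRENGTH: Record<Tier, number> = { AUDITED: 0, VERIFIED: 1, DONE: 2 };

// Assumption: a verdict is inflated if it is stronger than its weakest criterion.
function isInflated(v: Verdict): boolean {
  if (v.evidence.length === 0) return true; // a verdict with no evidence is unsupported
  const weakest = Math.min(...v.evidence.map((item) => STRENGTH[item.tier]));
  return STRENGTH[v.verdict] > weakest;
}

// The sample verdict from this guide: one DONE and one VERIFIED criterion.
const sample: Verdict = {
  story: "S-042",
  verdict: "VERIFIED",
  evidence: [
    { ac: "FR-12: Guest can check in", tier: "DONE", proof: "Screenshot of check-in flow, DB row status=checked_in" },
    { ac: "FR-13: Folio created at check-in", tier: "VERIFIED", proof: "billing-engine.ts:142 -- createFolio called in handleCheckIn" },
  ],
  violations: [{ rule: "D-1", file: "billing-engine.ts:89", detail: "as any cast without migration reference" }],
  verifier: "HEIMDALL",
};
const sampleInflated = isInflated(sample); // false: VERIFIED matches the weakest tier
```

Under this assumption, the same evidence filed with a DONE verdict would be flagged.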

Get started

terminal -- becky_setup

$ git clone https://github.com/evanpaul90/becky.git
$ cd becky
$ npm install
$ npx becky init .    # install slash commands + .becky/ workspace
$ npx becky scan .    # analyze existing work
$ npx becky greenfield "my first project"
$ npx becky autopilot
>> Squad deployed. 7 agents online. Go to sleep.
