Gru

CI License: Apache 2.0

Gru turns GitHub issues into merged PRs — autonomously, locally, with the AI coding agent of your choice.

Point it at an issue and it handles the rest: implementation, PR, code review, CI fixes, rebases — all in an isolated worktree that never touches your working directory.

Animated terminal demo: running "gru do <issue>" — Gru fetches the issue, spawns an agent, and opens a PR autonomously

Gru is agent-agnostic. It ships with backends for Claude Code and OpenAI Codex, and its pluggable architecture makes it straightforward to add more.

Quick Start

# Install prebuilt binary (macOS Apple Silicon)
curl -fL https://github.com/fotoetienne/gru/releases/latest/download/gru-aarch64-apple-darwin.tar.gz | tar xz
sudo mv gru /usr/local/bin/

# Or install from crates.io (all platforms, requires Rust 1.73+)
cargo install gru

# Initialize a repo
gru init owner/repo

# Fix an issue — Gru handles the rest
gru do 42

Prebuilt binaries for macOS x86_64 and Linux are on the Releases page.

For a full walkthrough, see Getting Started.

How It Works

  1. gru init owner/repo creates a bare git mirror at ~/.gru/repos/
  2. gru do 42 creates an isolated worktree, spawns the agent, and monitors progress via streaming JSON
  3. The agent reads the issue, explores the code, makes changes, and runs tests
  4. After committing, a code-reviewer subagent checks for correctness, security, and convention issues before the PR is created
  5. Gru opens a PR, watches CI, and feeds failures back to the agent for auto-fix (up to 2 attempts before escalating)
  6. Review comments are forwarded to the agent for responses
  7. Labels (gru:todo → gru:in-progress → gru:done / gru:failed) track state on GitHub

Getting Started with Gru

This guide walks you from zero to your first autonomous PR in about 10 minutes.

Prerequisites

Before you start, you need:

  1. Rust (1.73 or later)

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    rustc --version  # should print 1.73+
    
  2. GitHub CLI (gh) — authenticated

    brew install gh       # or see https://cli.github.com/
    gh auth login         # follow the prompts
    gh auth status        # confirm: "Logged in to github.com"
    
  3. Claude Code (default agent backend)

    npm install -g @anthropic-ai/claude-code
    claude --version      # confirm it's installed
    

    Using OpenAI Codex instead? See AGENTS.md.

Install Gru

git clone https://github.com/fotoetienne/gru.git
cd gru
cargo install --path .
gru --version   # confirm: prints version number

The gru binary is now at ~/.cargo/bin/gru. Make sure ~/.cargo/bin is on your $PATH.

Initialize a Repo

Tell Gru about the repo you want to automate:

gru init owner/repo

This creates a bare git mirror at ~/.gru/repos/owner/repo.git/. Gru uses bare repos so Minions can work in isolated worktrees without interfering with each other or your own working directory.

You only need to run gru init once per repo.

Label an Issue

Gru picks up issues labeled gru:todo. Find a small, well-scoped issue in your repo and add the label:

gh issue edit 42 --add-label "gru:todo" --repo owner/repo

Note: gru init automatically creates all required gru:* labels when you initialize a repo, so you normally don't need to create them manually.

Tip: Start with a small bug fix or docs update. A focused issue with clear acceptance criteria gives Gru the best chance of getting it right on the first try.

Run gru do

Now kick off the agent:

# From inside the repo
gru do 42

# Or from anywhere with a full URL
gru do https://github.com/owner/repo/issues/42

Here's what happens:

  1. Worktree created — Gru creates an isolated git checkout at ~/.gru/work/owner/repo/minion/issue-42-M001/checkout/. Your working directory is untouched.
  2. Agent spawned — Claude Code starts in that worktree with the issue as its task. It has full tool access: reading/writing files, running tests, calling the GitHub API.
  3. PR created — When the agent is satisfied, it opens a draft PR and continues monitoring.
  4. CI and reviews handled — Gru watches for CI failures and review comments, feeding them back to the agent for fixes.

The terminal shows a live progress stream. You'll see tool calls, test runs, and status updates as they happen.

Monitor Progress

You can check in on running Minions at any time:

gru status          # list all active Minions

Output looks like:

MINION   AGENT    REPO        ISSUE    TASK  PR    BRANCH                MODE                   UPTIME   TOKENS
M001     claude   owner/repo  #42      -     #43   minion/issue-42-M001  monitoring (PR ready)  5m       1.2M

Attach to a running Minion to see the live stream:

gru attach M001

Press Ctrl-C to exit the interactive session; this pauses the Minion. Run gru resume M001 to let it continue working autonomously.

If you need to pause:

gru stop M001       # stop the Minion
gru resume M001     # pick up where it left off

See the Result

Once the agent finishes its work, you'll have a PR open against your repo. Gru marks it ready for review when CI passes and any requested changes are addressed.

gh pr view 43 --repo owner/repo --web   # open in browser

The PR description includes a summary of what changed and a test plan generated by the agent. Review it, request changes if needed — Gru will handle the responses.

When the PR is merged, the worktree is cleaned up automatically. Or clean up manually:

gru clean   # removes worktrees for merged/closed PRs

Next Steps

  • GitHub Enterprise Server — Using GHES instead of github.com? See docs/GHES_SETUP.md for a step-by-step walkthrough.
  • Configure defaults — ~/.gru/config.toml lets you set a default agent backend, polling intervals, merge thresholds, and more. Copy docs/config.example.toml to ~/.gru/config.toml and uncomment the sections you need. The example file includes annotated explanations of every option.
  • Lab mode — Run gru lab to let Gru continuously pick up gru:todo issues and work on them unattended. Useful for letting it run overnight.
  • Multiple backends — Use gru do 42 --agent codex to use OpenAI Codex instead of Claude. See docs/AGENTS.md for setup.
  • Architecture — Curious how it all fits together? docs/DESIGN.md covers the full system design.

Gru Concepts

Gru runs AI agents that autonomously fix GitHub issues end-to-end. Here is the mental model.


Minions

A Minion is an agent session with a unique ID (M000, M001, M002…). When you run gru do 42, Gru creates a Minion, assigns it issue #42, and lets it run autonomously. Each Minion handles the full lifecycle: claim the issue, implement a fix, open a PR, monitor CI, and respond to review comments — with no further input from you.

If a Minion fails and you retry, a fresh Minion with a new ID takes over.

Worktrees

Each Minion works in an isolated git worktree under ~/.gru/work/. Your main working directory is never touched. Multiple Minions can work in parallel on different issues without stepping on each other.

~/.gru/work/owner/repo/minion/issue-42-M001/
└── checkout/   ← the actual git worktree

The worktree's branch is named minion/issue-42-M001. Worktrees persist after the PR merges; run gru clean to remove them.
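
The branch naming convention above is mechanical. As an illustrative sketch (the helper name is hypothetical, not Gru's actual code):

```rust
// Illustrative helper mirroring the minion/issue-<n>-<id> branch convention.
fn branch_name(issue: u32, minion_id: &str) -> String {
    format!("minion/issue-{issue}-{minion_id}")
}

fn main() {
    assert_eq!(branch_name(42, "M001"), "minion/issue-42-M001");
    println!("{}", branch_name(42, "M001"));
}
```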

Labels as State

GitHub labels are Gru's state machine. Gru looks for labels to decide what to do next.

Issue labels:

Label             Meaning
gru:todo          Ready for a Minion to pick up
gru:in-progress   Claimed by a Minion, work underway
gru:done          PR opened, agent finished
gru:failed        Minion gave up, needs human review
gru:blocked       Minion hit a wall, needs human input

PR labels:

Label                    Meaning
gru:ready-to-merge       All checks passed, awaiting merge
gru:auto-merge           Merge will happen automatically when checks pass
gru:needs-human-review   Minion escalated; human sign-off required

Add gru:todo to an issue to queue it. Remove it to dequeue.
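
Since labels are the state machine, reading an issue's state means reducing its label set to one value. A minimal sketch, assuming later lifecycle labels take precedence when several are present (the enum and helper are illustrative, not Gru's actual code):

```rust
// Hypothetical sketch: map gru:* labels to a single issue state.
#[derive(Debug, PartialEq)]
enum IssueState {
    Todo,
    InProgress,
    Done,
    Failed,
    Blocked,
}

fn issue_state(labels: &[&str]) -> Option<IssueState> {
    let has = |l: &str| labels.contains(&l);
    // Later lifecycle states win if multiple gru:* labels are present.
    if has("gru:failed") {
        Some(IssueState::Failed)
    } else if has("gru:blocked") {
        Some(IssueState::Blocked)
    } else if has("gru:done") {
        Some(IssueState::Done)
    } else if has("gru:in-progress") {
        Some(IssueState::InProgress)
    } else if has("gru:todo") {
        Some(IssueState::Todo)
    } else {
        None
    }
}

fn main() {
    assert_eq!(issue_state(&["gru:todo"]), Some(IssueState::Todo));
    assert_eq!(issue_state(&["bug", "docs"]), None);
    println!("ok");
}
```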

GitHub as Database

Gru has no external database. Everything lives in GitHub:

  • Issues are the task queue
  • Labels are the state
  • PRs are the output
  • Comments are the logs

This means Gru's task state is visible in the GitHub UI, survives restarts, and requires no setup beyond gh auth. (Some local runtime metadata — Minion IDs, registry — lives under ~/.gru/state/.)

Lab Mode

gru lab runs Gru as a daemon. It polls your configured repositories every 30 seconds by default, picks up any issue labeled gru:todo, and spawns a Minion for it. Once running, it claims and works any gru:todo issue without further input.

gru lab   # watches all repos in ~/.gru/config.toml

You configure which repositories to watch in ~/.gru/config.toml. Run gru init in a repo directory to register it.

Agent Backends

Gru's orchestration is backend-agnostic. The default backend is Claude Code CLI, but you can configure others (e.g., OpenAI Codex) in ~/.gru/config.toml. The same lifecycle — claim, implement, PR, monitor, review — runs regardless of which LLM is doing the work.

The Lifecycle

Issue labeled gru:todo
        │
        ▼
Minion claims issue (gru:in-progress)
        │
        ▼
Worktree created, agent implements fix
        │
        ▼
PR opened (gru:done) ──► CI monitored ◄──────────────────────┐
                                │                             │
                                ├─ CI fails ──► auto-fix      │
                                │              attempted      │
                                │              (2x max)       │
                                │              │              │
                                │              └─ still fails │
                                │                 ──► gru:blocked
                                │
                                ▼
                        Review comments handled
                                │
                                ├─ Blocked ──► gru:blocked + escalation
                                │
                                ├─ Changes requested ──► push fix ──────┘
                                │                        (re-run CI)
                                │
                                ▼
                        All checks pass + approved
                                │
                                ▼
                            PR merged

If anything goes unresolvably wrong, the Minion labels the issue gru:blocked or gru:failed and typically leaves a comment explaining what it needs.

Agent Backends

Gru uses a pluggable agent architecture. Each backend implements the AgentBackend trait, which normalizes different CLI tools into a common event stream that Gru can monitor, log, and act on.

Available Backends

Backend        CLI Tool   Flag Value       Status
Claude Code    claude     --agent claude   Default
OpenAI Codex   codex      --agent codex    Supported

Claude Code (default)

Claude Code is the default backend.

Install

npm install -g @anthropic-ai/claude-code

Verify

claude --version
claude --help

Configure

No configuration is required — Gru uses Claude Code by default. Optionally override the binary path in ~/.gru/config.toml:

[agent.claude]
binary = "/usr/local/bin/claude"

How Gru Uses It

Gru spawns Claude Code in non-interactive mode with stream JSON output:

claude --print --verbose --session-id <uuid> --output-format stream-json --dangerously-skip-permissions --include-partial-messages "<prompt>"

Key flags:

  • --print — non-interactive (prints to stdout and exits)
  • --verbose — include tool calls in output
  • --output-format stream-json — real-time event stream
  • --dangerously-skip-permissions — autonomous operation
  • --session-id <uuid> — maintain context across resumes
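
Each stdout line is one JSON event carrying a type field. As a feel for consuming the stream, here is a deliberately minimal sketch that extracts that field without a JSON library (a real consumer would use a proper parser; this helper is illustrative only):

```rust
// Illustrative only: pull the "type" field out of a stream-json line by
// string search. Gru's real parser uses structured JSON deserialization.
fn event_type(line: &str) -> Option<&str> {
    let key = "\"type\":\"";
    let start = line.find(key)? + key.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

fn main() {
    let line = r#"{"type":"tool_use","name":"bash"}"#;
    assert_eq!(event_type(line), Some("tool_use"));
    println!("{:?}", event_type(line));
}
```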

OpenAI Codex

Codex CLI is an alternative backend using OpenAI models.

Install

npm install -g @openai/codex

Authenticate

Set your OpenAI API key:

export OPENAI_API_KEY="sk-..."

Verify

codex --version
codex --help

How Gru Uses It

Gru spawns Codex in full-auto mode with JSON output:

codex exec --json --full-auto "<prompt>"

Resume support uses:

codex exec resume --last --json --full-auto "<prompt>"

Note: Codex does not support interactive resume (gru attach will not work with Codex Minions). Codex also ignores the session_id parameter — it relies on its own session persistence for both new and resumed sessions.

Selecting a Backend

Per-command

Use the --agent flag on any command that spawns an agent:

gru do 42 --agent codex
gru review 42 --agent codex
gru prompt my-prompt --agent codex

As default

Set the default in ~/.gru/config.toml:

[agent]
default = "codex"

The --agent flag always overrides the config default.
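
The precedence rule is small enough to state as code. A tiny sketch (hypothetical helper, not Gru's actual implementation):

```rust
// Sketch of the precedence rule: the --agent CLI flag wins over the
// config.toml default when both are present.
fn effective_agent<'a>(cli_flag: Option<&'a str>, config_default: &'a str) -> &'a str {
    cli_flag.unwrap_or(config_default)
}

fn main() {
    assert_eq!(effective_agent(Some("codex"), "claude"), "codex");
    assert_eq!(effective_agent(None, "claude"), "claude");
    println!("ok");
}
```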

Feature Comparison

Feature                           Claude Code   Codex
Autonomous work (gru do)          Yes           Yes
PR review (gru review)            Yes           Yes
Custom prompts (gru prompt)       Yes           Yes
Session resume (gru resume)       Yes           Yes (non-interactive)
Interactive attach (gru attach)   Yes           No
Token usage tracking              Yes           Yes
Stream monitoring                 Yes           Yes

Adding a New Backend

To add a new agent backend:

  1. Create src/<name>_backend.rs implementing the AgentBackend trait from src/agent.rs
  2. Register it in src/agent_registry.rs (add to AVAILABLE_AGENTS and the match in resolve_backend)
  3. Map the backend's output format to AgentEvent variants in parse_events()

The AgentBackend trait requires:

  • name() — human-readable identifier
  • build_command() — construct the CLI command for a new session
  • parse_events() — convert stdout lines to normalized AgentEvents
  • build_resume_command() — (optional) construct command to resume a session
  • build_interactive_resume_command() — (optional) construct command for gru attach; return None to disable attach support
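
To make the shape concrete, here is a hypothetical sketch of what such a trait could look like, with a toy backend; the real trait in src/agent.rs may differ in signatures, and parse_events() is omitted because AgentEvent is not defined in this sketch:

```rust
// Hypothetical shape of the AgentBackend trait described above.
trait AgentBackend {
    fn name(&self) -> &'static str;
    // Returns argv for a fresh session, e.g. ["claude", "--print", ...].
    fn build_command(&self, prompt: &str) -> Vec<String>;
    // Optional: resume an existing session; None if unsupported.
    fn build_resume_command(&self, _prompt: &str) -> Option<Vec<String>> {
        None
    }
    // Optional: command for `gru attach`; returning None disables attach.
    fn build_interactive_resume_command(&self) -> Option<Vec<String>> {
        None
    }
}

struct EchoBackend; // toy backend for illustration only

impl AgentBackend for EchoBackend {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn build_command(&self, prompt: &str) -> Vec<String> {
        vec!["echo".into(), prompt.into()]
    }
}

fn main() {
    let b = EchoBackend;
    assert_eq!(b.name(), "echo");
    assert_eq!(b.build_command("hi"), vec!["echo".to_string(), "hi".to_string()]);
    println!("ok");
}
```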

GitHub Enterprise Server (GHES) Setup

This guide walks you through configuring Gru to work with a GitHub Enterprise Server instance. If you're using github.com, see GETTING_STARTED.md instead.

Prerequisites

Complete the basic Getting Started prerequisites first (Rust, Claude Code). Then come back here for GHES-specific setup.

Step 1: Authenticate gh CLI with your GHES host

Gru delegates all GitHub API calls to the gh CLI, so you need gh authenticated against your GHES instance.

gh auth login --hostname github.netflix.com

Follow the prompts to authenticate. When asked about protocol, HTTPS is recommended for GHES.

Verify it worked:

gh auth status --hostname github.netflix.com

You should see output like:

github.netflix.com
  ✓ Logged in to github.netflix.com account yourname

Multi-host authentication

The gh CLI supports being logged into multiple hosts simultaneously. If you use both github.com and a GHES instance:

# Authenticate to both
gh auth login                                  # github.com
gh auth login --hostname github.netflix.com       # GHES

# Verify both
gh auth status

Gru sets GH_HOST on every gh CLI invocation, so the correct host is always targeted — you don't need to worry about which is the "default".
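
The per-invocation pattern looks roughly like the sketch below, where `sh` stands in for `gh` so the example is runnable anywhere; the helper is illustrative, not Gru's actual code:

```rust
use std::process::Command;

// Illustrative only: pin GH_HOST on each subprocess invocation instead of
// relying on the gh CLI's default host. `sh` is a stand-in for `gh`.
fn run_with_gh_host(host: &str) -> String {
    let out = Command::new("sh")
        .arg("-c")
        .arg("echo \"$GH_HOST\"") // a real wrapper would exec `gh` here
        .env("GH_HOST", host)
        .output()
        .expect("failed to spawn sh");
    String::from_utf8_lossy(&out.stdout).trim().to_string()
}

fn main() {
    assert_eq!(run_with_gh_host("github.netflix.com"), "github.netflix.com");
    println!("ok");
}
```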

Token scope requirements

Your token needs the following scopes:

Scope      Why
repo       Read/write access to repositories, issues, and PRs
read:org   Discover repos and check org membership

If you authenticate interactively with gh auth login, the default scopes are usually sufficient. If you're using a personal access token, make sure it includes the scopes above.

To verify your token's scopes without exposing the token value, use the GitHub API — the response headers include X-OAuth-Scopes:

gh api --hostname github.netflix.com /user --include 2>&1 | grep -i x-oauth-scopes

Alternatively, navigate to your GHES instance → Settings → Developer settings → Personal access tokens to view scopes in the UI.

Step 2: Configure ~/.gru/config.toml

Create or edit ~/.gru/config.toml and add a named host entry:

[github_hosts.netflix]
host = "github.netflix.com"

The name (netflix in this example) is your shorthand — you'll use it when referencing repos.

Optional: web_url

If the GHES web UI lives on a different domain than the API/git host (uncommon), set web_url. Note that host is a bare hostname while web_url is a full URL including scheme:

[github_hosts.netflix]
host = "github.netflix.com"
web_url = "https://github-web.netflix.com"

Most GHES installations don't need this.

Step 3: Reference repos using the named host

In your config's daemon.repos, reference GHES repos with the name:owner/repo format (this shorthand is for config files only, not CLI arguments):

[daemon]
repos = [
    "netflix:myteam/myapp",
    "netflix:myteam/mylib",
]

This tells Gru that myteam/myapp lives on the host defined in [github_hosts.netflix].

Repo format reference

Format            Example                      Resolves to
owner/repo        myorg/app                    github.com/myorg/app
name:owner/repo   netflix:myteam/myapp         Uses [github_hosts.netflix]
host/owner/repo   github.netflix.com/org/svc   Legacy — use name:owner/repo instead

The name:owner/repo format is recommended for GHES.
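
Splitting the shorthand is a one-liner. An illustrative sketch (the helper is hypothetical, not Gru's parser):

```rust
// Illustrative parse of the name:owner/repo shorthand. With no colon, the
// spec is a plain owner/repo on the default host.
fn split_host_name(spec: &str) -> (Option<&str>, &str) {
    match spec.split_once(':') {
        Some((name, repo)) => (Some(name), repo),
        None => (None, spec),
    }
}

fn main() {
    assert_eq!(
        split_host_name("netflix:myteam/myapp"),
        (Some("netflix"), "myteam/myapp")
    );
    assert_eq!(split_host_name("myorg/app"), (None, "myorg/app"));
    println!("ok");
}
```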

Step 4: Initialize and run

Initialize your GHES repo. If you're passing owner/repo explicitly, use --host:

gru init myteam/myapp --host github.netflix.com

Alternatively, if you run gru init from inside an already-cloned GHES repository, Gru can infer the host from the git remote automatically — no --host flag needed in that case.

Then use Gru normally:

# Work on a single issue (use the full URL for GHES issues)
gru do https://github.netflix.com/myteam/myapp/issues/42

# Or from within the worktree, just use the issue number
gru do 42

# Run lab mode to poll for gru:todo issues
gru lab

Full config example

Here's a complete config for a team using GHES with two repos:

[github_hosts.netflix]
host = "github.netflix.com"

[daemon]
repos = [
    "netflix:myteam/api-gateway",
    "netflix:myteam/web-ui",
]
poll_interval_secs = 60
max_slots = 4

[agent]
default = "claude"

[merge]
confidence_threshold = 9

See config.example.toml for all available options with explanations.

Troubleshooting

gh auth status fails for GHES host

github.netflix.com
  X Not logged in to github.netflix.com

Fix: Run gh auth login --hostname github.netflix.com and authenticate.

"Unknown host name" error

Unknown host name 'netflix' in repo 'netflix:myteam/myapp'.
Add a [github_hosts.netflix] section to config.toml

Fix: Add the missing host entry to ~/.gru/config.toml:

[github_hosts.netflix]
host = "github.netflix.com"

403 Forbidden on API calls

Your token likely lacks required scopes. Re-authenticate with appropriate scopes:

gh auth login --hostname github.netflix.com --scopes repo,read:org

Or generate a new personal access token in your GHES settings with the scopes listed in Step 1.

SSL/TLS certificate errors

If your GHES instance uses a self-signed or internal CA certificate, the most reliable fix is to add the certificate to your system trust store. The gh CLI uses Go's standard TLS stack, which defers to the OS trust store.

On macOS:

sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain /path/to/ca-cert.crt

On Linux (Debian/Ubuntu):

sudo cp /path/to/ca-cert.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

For git operations only (does not affect gh API calls):

git config --global http.https://github.netflix.com/.sslCAInfo /path/to/ca-bundle.crt

gru init hangs or times out

Check network connectivity to your GHES host:

gh api --hostname github.netflix.com /meta

If this fails, the issue is network-level (VPN, firewall, DNS).

Issue comments or labels not appearing

Verify your token has repo scope and that you have write access to the repository:

gh api --hostname github.netflix.com repos/myteam/myapp

If you get a 404, you either don't have access or the repo path is wrong.

Gru: Design Document

Version: 1.0 (Single-Lab MVP)
Last Updated: 2025-12-02

Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Core Components
  4. Session Management & Attach
  5. Data Model
  6. GitHub Integration
  7. Minion Lifecycle
  8. State Management
  9. CI/CD Integration
  10. Error Handling & Recovery
  11. Security Model
  12. API Specification
  13. File System Layout
  14. Configuration
  15. Observability
  16. Future Roadmap

Introduction

What is Gru?

Gru is a local-first orchestrator that runs LLM-powered agents (called Minions) to autonomously work on GitHub issues. Each Gru instance (a Lab) continuously monitors GitHub repositories for issues labeled gru:todo, claims them, implements solutions, runs tests via GitHub Actions, opens pull requests, and responds to code review feedback.

Design Philosophy

  1. Local-first: Works offline except for GitHub API calls
  2. GitHub as database: No separate state store; GitHub is source of truth
  3. Simple before clever: Start with polling + labels, optimize later
  4. Autonomous agents: Minions handle the full lifecycle from claim to merge
  5. Human-in-the-loop: Clear escalation paths when Minions need help

V1 Scope

This design describes the single-Lab MVP:

  • ✅ One Lab instance (no multi-Lab coordination)
  • ✅ Multi-repo support (Lab watches multiple repos)
  • ✅ Simple label state machine (gru:todo → gru:in-progress → gru:done/gru:failed)
  • ✅ Local testing via pre-commit hooks + GitHub Actions for verification
  • ✅ In-memory state (no SQLite), file-based cursors
  • ✅ CLI-only (no web UI/Tower)
  • ✅ Polling (30s interval, no webhooks yet)
  • ✅ Tokens via environment variables only

Core Technology Stack

  • Language: Rust
  • Runtime: Tokio (async)
  • CLI Framework: Clap
  • GraphQL: async-graphql (for Tower in future)
  • Web: Axum (for Tower in future)
  • GitHub API: gh CLI wrappers

Rationale: See docs/DECISIONS.md for quantitative DMX analysis. Rust scored 0.890 vs Python 0.110 for this use case.


Implementation Approach

Minion Execution: CLI + Stream Parsing

Each Minion is a Claude Code CLI process spawned with stream JSON output:

claude --print \
  --session-id <UUID> \
  --output-format stream-json \
  --dangerously-skip-permissions \
  "<task prompt>"

Key Flags:

  • --print: Non-interactive mode (no TTY needed)
  • --session-id: Maintains conversation context across restarts
  • --output-format stream-json: Real-time event stream on stdout
  • --dangerously-skip-permissions: Autonomous operation (no approval prompts)

Stream Monitoring

Lab parses JSON events from Claude's stdout in real-time:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum ClaudeEvent {
    #[serde(rename = "thinking")]
    Thinking { content: String },

    #[serde(rename = "tool_use")]
    ToolUse {
        name: String,           // "bash", "write", "read", etc.
        input: serde_json::Value,
    },

    #[serde(rename = "message")]
    Message { content: String },

    #[serde(rename = "complete")]
    Complete,

    #[serde(rename = "error")]
    Error { message: String },
}

Benefits:

  • Real-time state tracking (thinking → tool_use → message → complete)
  • Event-based stuck detection (no activity timeout)
  • Structured logging to events.jsonl
  • JSON parsing is stable (no regex fragility)
  • No subprocess complexity (no tmux/zellij dependency)

Why This Approach

After evaluating 6 approaches via spike testing and DMX analysis:

Approach               Score   Status
CLI + Stream Parsing   0.735   Selected
ACP Integration        0.706   Future (V2+)
Agent SDK (Python)     0.688   Too complex
Pure CLI               0.559   No monitoring
Rust + tmux            0.466   High fragility
Zellij                 0.210   Failed tests

Key Advantages:

  • Perfect monitoring (10/10) via structured events
  • Proven in tests (10/10) - production-ready code exists
  • Balanced complexity (~300 LOC for full implementation)
  • Simple deployment (single binary, no tmux dependency)
  • Low maintenance (JSON format stable)

See: experiments/DMX_ANALYSIS.md for full quantitative comparison.


Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│                           Lab Process                         │
│                                                               │
│  ┌─────────────┐      ┌──────────────┐    ┌──────────────┐  │
│  │   Poller    │─────>│ Scheduler    │───>│ Minion Pool  │  │
│  │             │      │              │    │              │  │
│  │ - Fetch     │      │ - Claim      │    │ - M1 (run)   │  │
│  │   issues    │      │   issues     │    │ - M2 (run)   │  │
│  │ - Check     │      │ - Assign     │    │ - M3 (pause) │  │
│  │   PRs       │      │   slots      │    │              │  │
│  └─────────────┘      └──────────────┘    └──────────────┘  │
│         │                     │                    │         │
└─────────┼─────────────────────┼────────────────────┼─────────┘
          │                     │                    │
          └─────────────────────┴────────────────────┘
                                │
                         GitHub REST/GraphQL API
                                │
          ┌─────────────────────┴────────────────────┐
          │                                          │
    ┌─────▼──────┐                          ┌───────▼──────┐
    │   Issues   │                          │  Workflows   │
    │            │                          │              │
    │ - Labels   │◄────────────────────────►│ - Checks API │
    │ - Comments │                          │ - Status     │
    │ - Timeline │                          │              │
    └────────────┘                          └──────────────┘
          │
          │
    ┌─────▼──────┐
    │  Pull      │
    │  Requests  │
    │            │
    │ - Reviews  │
    │ - Comments │
    └────────────┘

Key Architectural Decisions

Decision                 Rationale
Single binary            Easy deployment, no dependencies
GitHub as state store    Eliminates DB complexity, provides audit trail
Git worktrees            Isolated workspaces per Minion
Draft PR early           Natural lock mechanism, visible progress
GitHub Actions for CI    Reuses existing infra, proper isolation
YAML comments            Human-readable structured data
Polling (V1)             Simple, reliable; webhooks deferred to V2

Core Components

Lab

The Lab is the main process that orchestrates Minions.

Responsibilities:

  • Poll GitHub for gru:todo issues
  • Manage Minion slots (max concurrent Minions)
  • Monitor PRs for review feedback and CI failures
  • Persist Minion state to disk
  • Expose GraphQL API for introspection (future)

Configuration:

# ~/.gru/config.toml
[daemon]
repos = ["owner/repo1", "owner/repo2"]
poll_interval_secs = 30       # How often to check for new issues
max_slots = 2                 # Max concurrent Minions
label = "gru:todo"             # Label to watch for

Minion

A Minion is a Claude Code session working on a single issue.

Key Insight: Each Minion IS a Claude Code session. One-to-one mapping.

Minion M042  =  Claude Code session in worktree ~/.gru/work/owner/repo/minion/issue-42-M042/checkout/
Minion M043  =  Claude Code session in worktree ~/.gru/work/owner/repo/minion/issue-43-M043/checkout/

How Lab manages Minions:

  1. Lab spawns Claude Code process with initial prompt
  2. Claude Code session runs autonomously in dedicated worktree
  3. Lab monitors Claude Code stdout/stderr for events
  4. Lab handles GitHub integration (labels, comments, PR operations)
  5. When complete, Lab terminates session and archives logs

Lifecycle States:

  • InProgress - Actively working (planning, implementing, testing, reviewing)
  • Failed - Exceeded retry limit (10-15 attempts), paused for human help
  • Done - PR merged or issue closed, session terminated and archived
  • Orphaned - Issue closed while Minion was running, kept alive for inspection

Note: Detailed sub-states (planning, testing, review) tracked internally and in comment events, not as explicit enum values.

Minion Structure:

use chrono::{DateTime, Utc};

struct Minion {
    id: String,              // Base36: "M001", "M002", ..., "M0zz"
    lab_id: String,          // hostname of Lab
    repo: String,            // "owner/repo"
    issue_number: i32,       // 123
    branch: String,          // e.g., "minion/issue-123-M007"
    state: MinionState,      // InProgress, Failed, Done, Orphaned

    worktree_path: String,   // ~/.gru/work/owner/repo/minion/issue-123-M042/checkout/
    pr_number: Option<i32>,  // None until PR created

    started_at: DateTime<Utc>,
    last_activity: DateTime<Utc>,

    metrics: MinionMetrics,
}

struct MinionMetrics {
    tokens_used: i32,
    commits_created: i32,
    ci_runs: i32,
    retry_count: i32,
    duration_seconds: i32,
}
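
The Base36 ID scheme noted in the struct comment ("M001" through "M0zz") can be sketched as follows; the actual generator in Gru may differ:

```rust
// Sketch of 3-digit Base36 Minion IDs: M000, M001, ..., M0zz, ...
fn minion_id(n: u32) -> String {
    const DIGITS: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyz";
    let mut n = n;
    let mut buf = [b'0'; 3];
    for i in (0..3).rev() {
        buf[i] = DIGITS[(n % 36) as usize];
        n /= 36;
    }
    format!("M{}", std::str::from_utf8(&buf).unwrap())
}

fn main() {
    assert_eq!(minion_id(1), "M001");
    assert_eq!(minion_id(1295), "M0zz"); // 35 * 36 + 35
    println!("ok");
}
```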

Initial Prompt Template:

You are Minion M042 working on issue #123 in owner/repo.

## Issue
{issue_title}

{issue_body}

## Your Mission
1. Understand the issue requirements
2. Explore the codebase to identify relevant files
3. Implement the requested changes
4. Commit after each logical unit of work (tests run automatically via pre-commit hook)
5. Push commits to trigger GitHub Actions verification
6. Monitor CI, stale branches, merge conflicts - keep PR up to date
7. Respond to review feedback autonomously
8. Mark ready for review when complete

## Working Environment
- Directory: {worktree_path}
- Branch: {branch_name} (branched from {default_branch})
- Commit prefix: [minion:{minion_id}]

## Guidelines
- Commit after each logical unit of work (tests run automatically via pre-commit hook)
- Use descriptive commit messages
- You are autonomous - implement review suggestions, decline with reasoning, or create follow-up issues
- Resolve merge conflicts yourself (run tests to verify, only push if tests pass)
- If blocked after 10+ retry attempts, pause and request human help
- Keep the PR updated - rebase when stale, resolve conflicts proactively

## Context
Everything else (README, CONTRIBUTING.md, git history) is in the repository. 
Explore as needed. Use subagents for test execution to avoid context bloat.

Start working now.

Poller

The Poller monitors GitHub for work.

Issue Polling:

query FindReadyIssues($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(
      first: 20
      states: OPEN
      labels: ["gru:todo"]
      orderBy: {field: CREATED_AT, direction: ASC}
    ) {
      nodes {
        number
        title
        body
        labels(first: 10) { nodes { name } }
        createdAt
      }
    }
  }
}

PR Polling (for active Minions):

  • Check for new review comments
  • Check for failed check runs
  • Check for merge events

Polling Strategy:

use tokio::select;
use tokio::time::{interval, Duration};

impl Poller {
    async fn run(&self, mut shutdown: tokio::sync::watch::Receiver<bool>) {
        let mut ticker = interval(self.poll_interval);

        loop {
            select! {
                _ = ticker.tick() => {
                    // Poll for new issues
                    if self.lab.has_available_slots() {
                        let issues = self.fetch_ready_issues().await;
                        for issue in issues {
                            self.scheduler.enqueue(issue).await;
                        }
                    }

                    // Poll active PRs for updates
                    for minion in self.lab.active_minions() {
                        let updates = self.fetch_pr_updates(minion.pr_number).await;
                        minion.handle_updates(updates).await;
                    }
                }

                _ = shutdown.changed() => {
                    return;
                }
            }
        }
    }
}

Scheduler

The Scheduler assigns issues to available Minion slots.

Prioritization (V1 - simple):

  1. Issues with priority:critical label
  2. Issues with priority:high label
  3. Issues with priority:medium label
  4. Unlabeled issues (fall between medium and low)
  5. Issues with priority:low label

Within the same tier, oldest issues first (FIFO).
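
The tiers above reduce to a rank function, sketched here (the helper is illustrative; the scheduler would sort by rank, then creation time):

```rust
// Sketch of the V1 priority tiers: lower rank is scheduled first, and
// unlabeled issues rank between medium and low.
fn priority_rank(labels: &[&str]) -> u8 {
    let has = |l: &str| labels.contains(&l);
    if has("priority:critical") {
        0
    } else if has("priority:high") {
        1
    } else if has("priority:medium") {
        2
    } else if has("priority:low") {
        4
    } else {
        3 // unlabeled
    }
}

fn main() {
    assert!(priority_rank(&["priority:critical"]) < priority_rank(&["priority:high"]));
    assert!(priority_rank(&["priority:medium"]) < priority_rank(&[]));
    assert!(priority_rank(&[]) < priority_rank(&["priority:low"]));
    println!("ok");
}
```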

Slot Management:

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

struct Scheduler {
    max_slots: usize,
    active: Arc<RwLock<HashMap<String, Arc<Minion>>>>, // minion_id -> Minion
}

impl Scheduler {
    async fn try_claim_issue(&self, issue: Issue) -> Result<Arc<Minion>, SchedulerError> {
        let active = self.active.read().await;
        if active.len() >= self.max_slots {
            return Err(SchedulerError::NoSlotsAvailable);
        }
        drop(active);

        let minion = self.create_minion(issue);

        // Attempt to claim on GitHub
        self.claim_issue(&minion).await?;

        let minion_arc = Arc::new(minion);
        self.active.write().await.insert(minion_arc.id.clone(), minion_arc.clone());

        let minion_clone = minion_arc.clone();
        tokio::spawn(async move {
            minion_clone.run().await;
        });

        Ok(minion_arc)
    }
}

Session Management & Attach

Claude Code Session Status

Each Minion (Claude Code session) has an internal status that the Lab tracks:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq)]
enum SessionStatus {
    Thinking,          // Processing/reasoning
    UsingTool,         // Executing tool
    Responding,        // Generating response
    WaitingInput,      // Needs user input
    WaitingPermission, // Needs approval
    Idle,              // Ready for instruction
    Complete,          // Task finished
    Error,             // Encountered error
}
}

Lab monitors status by:

  • Parsing Claude Code's JSON output stream
  • Detecting tool invocation events
  • Tracking output timestamps (idle detection)
  • Watching for completion/error signals

Status transition examples:

idle → thinking → using_tool(git_commit) → thinking → responding → idle
idle → thinking → using_tool(bash) → waiting_permission → idle
idle → thinking → responding → complete
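
The classification step behind these transitions can be sketched as follows — the `"type"` values below are illustrative placeholders, not Claude Code's actual stream-json schema, and the substring matching stands in for real JSON parsing:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum SessionStatus {
    Thinking,
    UsingTool,
    Responding,
    Complete,
    Error,
}

/// Map one line of the agent's streaming JSON output to a SessionStatus.
/// Unknown events return None, meaning "keep the previous status".
fn classify_line(line: &str) -> Option<SessionStatus> {
    if line.contains(r#""type":"tool_use""#) {
        Some(SessionStatus::UsingTool)
    } else if line.contains(r#""type":"thinking""#) {
        Some(SessionStatus::Thinking)
    } else if line.contains(r#""type":"text""#) {
        Some(SessionStatus::Responding)
    } else if line.contains(r#""type":"result""#) {
        Some(SessionStatus::Complete)
    } else if line.contains(r#""type":"error""#) {
        Some(SessionStatus::Error)
    } else {
        None
    }
}

fn main() {
    assert_eq!(
        classify_line(r#"{"type":"tool_use","name":"bash"}"#),
        Some(SessionStatus::UsingTool)
    );
    assert_eq!(classify_line("heartbeat"), None);
}
```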

Attach Architecture

Users can attach to running Minions to observe or interact:

┌─────────────┐
│  User TTY   │  gru attach M42
└──────┬──────┘
       │
       ▼
┌─────────────────────────────┐
│          Lab                │
│                             │
│  ┌───────────────────┐      │
│  │  AttachManager    │      │
│  │  - sessions       │      │
│  │  - multiplex I/O  │      │
│  └────────┬──────────┘      │
└───────────┼─────────────────┘
            │
            │ Bidirectional stream
            ▼
┌─────────────────────────────┐
│   Claude Code Session       │
│   (Minion M42)              │
│                             │
│   stdin  ← Lab + Attachers  │
│   stdout → Lab + Attachers  │
│   stderr → Lab + Attachers  │
└─────────────────────────────┘

Attach Modes

#![allow(unused)]
fn main() {
use tokio::sync::mpsc;
use chrono::{DateTime, Utc};

#[derive(Debug, Clone, PartialEq, Eq)]
enum AttachMode {
    ReadOnly,     // Observe only
    Interactive,  // Can send input
}

struct AttachSession {
    id: String,
    minion_id: String,
    user_id: String,
    mode: AttachMode,
    started_at: DateTime<Utc>,
    expires_at: DateTime<Utc>,  // Max 30 minutes

    input: mpsc::Sender<Vec<u8>>,   // User → Minion
    output: mpsc::Receiver<Vec<u8>>, // Minion → User
}
}

Attach Commands

# List active Minions
gru minions list

# Attach read-only (watch mode)
gru attach M42

# Attach read-only with live streaming
gru attach M42 --follow

# Attach interactive (can send input)
gru attach M42 --interactive

# Detach (Minion continues)
<Ctrl+D> or type 'detach'

Attach Session Management

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::time::{timeout, Duration};
use tracing::{info, error};

struct AttachManager {
    sessions: Arc<RwLock<HashMap<String, AttachSession>>>,
}

async fn attach(&self, minion_id: String, mode: AttachMode) -> Result<AttachSession, AttachError> {
    let minion = self.lab.get_minion(&minion_id)
        .ok_or(AttachError::MinionNotFound)?;

    // Check interactive attach limit
    if mode == AttachMode::Interactive {
        if self.has_interactive_session(&minion_id).await {
            return Err(AttachError::InteractiveSessionExists);
        }
    }

    let (input_tx, input_rx) = mpsc::channel(100);
    let (output_tx, output_rx) = mpsc::channel(1000);

    let session = AttachSession {
        id: generate_session_id(),
        minion_id: minion_id.clone(),
        user_id: String::new(),
        mode: mode.clone(),
        started_at: Utc::now(),
        expires_at: Utc::now() + chrono::Duration::minutes(30),
        input: input_tx,
        output: output_rx,
    };

    // Start multiplexing Minion output to this session
    let minion_clone = minion.clone();
    let output_tx_clone = output_tx.clone();
    tokio::spawn(async move {
        self.stream_output(minion_clone, output_tx_clone).await;
    });

    // If interactive, forward user input to Minion
    if mode == AttachMode::Interactive {
        let minion_clone = minion.clone();
        tokio::spawn(async move {
            self.stream_input(input_rx, minion_clone).await;
        });
    }

    // In a real implementation the map stores clonable session metadata,
    // since the mpsc::Receiver inside AttachSession cannot be cloned
    self.sessions.write().await.insert(session.id.clone(), session.clone());
    info!(
        session_id = %session.id,
        minion_id = %minion_id,
        mode = ?mode,
        "user attached to minion"
    );

    Ok(session)
}

async fn stream_output(&self, minion: Arc<Minion>, output: mpsc::Sender<Vec<u8>>) {
    let mut reader = BufReader::new(minion.claude_session.stdout);

    loop {
        let mut line = Vec::new();
        match reader.read_until(b'\n', &mut line).await {
            Ok(0) | Err(_) => return, // Session ended
            Ok(_) => {}
        }

        // Send to attached session
        if timeout(Duration::from_secs(1), output.send(line)).await.is_err() {
            // Session not consuming, drop line
        }
    }
}

async fn stream_input(&self, mut input: mpsc::Receiver<Vec<u8>>, minion: Arc<Minion>) {
    while let Some(data) = input.recv().await {
        // Forward to Minion's stdin
        if let Err(e) = minion.claude_session.stdin.write_all(&data).await {
            error!(error = %e, "failed to send input to minion");
            return;
        }
    }
}
}

Attach UI

╭──────────────────────────────────────────────────╮
│ 🤖 Attached to Minion M42                        │
│ Issue: #123 - Add user authentication            │
│ Status: using_tool(bash)                         │
│ Uptime: 15m 32s | Commits: 2 | Tokens: 12.5k    │
╰──────────────────────────────────────────────────╯

[M42 15:23:45] Running: npm test
[M42 15:23:45] 
[M42 15:23:47] > test
[M42 15:23:47] > jest
[M42 15:23:48] 
[M42 15:23:50]  PASS  src/auth.test.js
[M42 15:23:50]    ✓ should generate JWT token (45ms)
[M42 15:23:50]    ✓ should validate token (12ms)
[M42 15:23:50] 
[M42 15:23:50] Tests: 2 passed, 2 total
[M42 15:23:50] Status: thinking
[M42 15:23:52] All tests passed! Committing changes...
[M42 15:23:52] Status: using_tool(git_commit)

╭──────────────────────────────────────────────────╮
│ Mode: readonly | Press Ctrl+D to detach          │
╰──────────────────────────────────────────────────╯

Security & Constraints

Session limits:

  • Maximum 30-minute duration (auto-expire)
  • Only 1 interactive session per Minion
  • Unlimited read-only sessions per Minion
  • Session activity logged for audit

Permissions:

  • Lab always retains control
  • User input mediated through Lab
  • Can't force-terminate Minion from attach
  • Can suggest actions but Lab decides

Use cases:

  1. Debugging - Attach interactive to unstick blocked Minion
  2. Monitoring - Watch Minion work in real-time
  3. Learning - Observe agent reasoning and tool use
  4. Intervention - Provide clarification when Minion asks questions

Data Model

Issue States (Labels)

Label state machine:

┌──────────┐
│ gru:todo │  (user adds this when issue is ready)
└────┬─────┘
     │
     │ Lab claims issue
     ▼
┌─────────────────┐
│ gru:in-progress │  (Minion actively working)
└────┬────────────┘
     │
     ├───────────────┐
     │               │
     │               │ Max retries exceeded
     │               ▼
     │         ┌──────────────┐
     │         │ gru:failed   │  (agent encountered failure)
     │         └──────────────┘
     │
     │ PR merged or issue closed
     ▼
┌──────────┐
│ gru:done │  (archived, cleaned up)
└──────────┘

Additional labels:

  • gru:blocked — Minion needs human help
  • gru:ready-to-merge — PR passes checks and is ready
  • gru:auto-merge — Auto-merge enabled on PR
  • gru:needs-human-review — PR requires human review before merge

Note: Detailed states (planning, implementing, testing, review, blocked) are tracked in YAML comment events, not labels. Labels only reflect high-level lifecycle.
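
The label-to-lifecycle collapse can be sketched as below (the precedence order — terminal labels winning over gru:in-progress — is an assumption for illustration):

```rust
/// Derive the high-level lifecycle state from an issue's gru:* labels.
/// Terminal states take precedence, then in-progress, then todo.
fn lifecycle_state(labels: &[&str]) -> &'static str {
    let has = |name: &str| labels.contains(&name);
    if has("gru:done") {
        "done"
    } else if has("gru:failed") {
        "failed"
    } else if has("gru:in-progress") {
        "in-progress"
    } else if has("gru:todo") {
        "todo"
    } else {
        "untracked"
    }
}

fn main() {
    assert_eq!(lifecycle_state(&["bug", "gru:in-progress"]), "in-progress");
    assert_eq!(lifecycle_state(&["gru:done"]), "done");
}
```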

Event Types (YAML Comments)

All structured Minion events are posted as GitHub comments with YAML frontmatter:

---
event: <event_type>
minion_id: <string>
timestamp: <ISO8601>
# ... event-specific fields
---

Event Types:

Event           | Fields                               | Posted When
----------------|--------------------------------------|--------------------------
minion:claim    | lab_id, branch                       | Issue claimed
minion:plan     | plan_summary, estimated_tokens       | Execution plan generated
minion:progress | phase, commits, tests_passing        | Periodic updates
minion:commit   | sha, message, ci_run_id              | Code committed
minion:blocked  | reason, question                     | Needs human input
minion:failed   | failure_reason, attempts, last_error | Escalation
minion:done     | pr_number, commits, total_cost       | Completion

Example:

🤖 **Minion M42 claimed this issue**

---
event: minion:claim
minion_id: M42
lab_id: macbook-pro.local
branch: minion/issue-123-M42
timestamp: 2025-01-30T14:23:45Z
---

Starting work on this issue. I'll create a draft PR shortly.

📋 **Execution Plan**
1. Implement user authentication endpoints
2. Add JWT token generation
3. Write unit tests
4. Update API documentation
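
Consumers of these comments can recover the structured event with a small stdlib-only parser — a sketch that handles only flat `key: value` pairs; a real implementation would use a YAML crate:

```rust
use std::collections::HashMap;

/// Extract the YAML frontmatter between the first pair of `---` fences
/// and parse its top-level `key: value` pairs.
fn extract_frontmatter(body: &str) -> Option<HashMap<String, String>> {
    let mut parts = body.splitn(3, "---");
    let _prefix = parts.next()?; // text before the opening fence
    let yaml = parts.next()?;    // the frontmatter body
    parts.next()?;               // require a closing fence

    let mut fields = HashMap::new();
    for line in yaml.lines() {
        if line.trim_start().starts_with('#') {
            continue; // YAML comment
        }
        if let Some((key, value)) = line.split_once(':') {
            if !key.trim().is_empty() {
                fields.insert(key.trim().to_string(), value.trim().to_string());
            }
        }
    }
    Some(fields)
}

fn main() {
    let comment = "🤖 **Minion M42 claimed this issue**\n\n---\nevent: minion:claim\nminion_id: M42\nbranch: minion/issue-123-M42\n---\n\nStarting work.";
    let fields = extract_frontmatter(comment).expect("frontmatter present");
    assert_eq!(fields["event"], "minion:claim");
    assert_eq!(fields["branch"], "minion/issue-123-M42");
}
```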

GitHub Integration

Authentication

GitHub App (preferred for production):

  • Scoped permissions: contents:write, issues:write, pull_requests:write, checks:read
  • Per-repository installation
  • Audit trail via GitHub Apps

Personal Access Token (V1 approach):

  • Classic token with repo and workflow scopes
  • Stored in environment variable GRU_GITHUB_TOKEN (not in config file)

API Usage Patterns

Label Operations:

# Add label
gh issue edit {issue_number} --repo {owner}/{repo} --add-label "gru:in-progress"

# Remove label
gh issue edit {issue_number} --repo {owner}/{repo} --remove-label "gru:todo"

Comment Posting:

#![allow(unused)]
fn main() {
// POST /repos/{owner}/{repo}/issues/{issue_number}/comments
// Body: {
//     "body": "🤖 **Minion M42**\n\n---\nevent: minion:progress\n..."
// }
}

Timeline Retrieval:

#![allow(unused)]
fn main() {
// GET /repos/{owner}/{repo}/issues/{issue_number}/timeline
// Accept: application/vnd.github.mockingbird-preview+json

// Parse YAML frontmatter from comments
for event in timeline {
    if event.event_type == "commented" {
        let yaml_data = extract_frontmatter(&event.body);
        let minion_event = parse_yaml(&yaml_data)?;
    }
}
}

Draft PR Creation:

#![allow(unused)]
fn main() {
// POST /repos/{owner}/{repo}/pulls
// Body: {
//     "title": "[DRAFT] Fixes #123: Add user authentication",
//     "head": "minion/issue-123-M42",
//     "base": "main",
//     "body": "🤖 This PR is being worked on by Minion M42...",
//     "draft": true
// }
}

Check Runs Monitoring:

#![allow(unused)]
fn main() {
use tokio::time::{sleep, Duration};

// GET /repos/{owner}/{repo}/commits/{sha}/check-runs

// Subscribe to check run completion
loop {
    let check_runs = fetch_check_runs(&commit_sha).await?;
    if all_complete(&check_runs) {
        if all_passed(&check_runs) {
            minion.on_ci_pass().await;
        } else {
            minion.on_ci_fail(&check_runs).await;
        }
        break;
    }
    sleep(Duration::from_secs(10)).await;
}
}

Rate Limiting

GitHub REST API rate limits:

  • Authenticated: 5,000 requests/hour
  • GraphQL: 5,000 points/hour (cost varies by query)

Mitigation strategies:

#![allow(unused)]
fn main() {
use chrono::{DateTime, Utc};
use std::collections::HashMap;

struct RateLimiter {
    remaining: i32,
    reset_at: DateTime<Utc>,
}

impl RateLimiter {
    async fn wait(&self) {
        if self.remaining < 100 {
            // Sleep until the rate-limit window resets
            let until_reset = (self.reset_at - Utc::now()).to_std().unwrap_or_default();
            tokio::time::sleep(until_reset).await;
        }
    }
}

// Use conditional requests where possible
let mut headers = HashMap::new();
headers.insert("If-None-Match", etag);
headers.insert("If-Modified-Since", last_modified);
// Returns 304 Not Modified (doesn't count against quota)
}

Minion Lifecycle

1. Claim Phase

#![allow(unused)]
fn main() {
async fn claim(&mut self, issue: Issue) -> Result<(), MinionError> {
    // 1. Add 'gru:in-progress' label
    self.github.add_label(&issue, "gru:in-progress").await?;

    // 2. Post claim comment
    let comment = format_claim_comment(&self.id, &self.lab_id);
    self.github.post_comment(&issue, &comment).await?;

    // 3. Create branch
    self.branch = format!("minion/issue-{}-{}", issue.number, self.id);
    self.git.create_branch(&self.branch, "main").await?;

    // 4. Create draft PR (acts as lock)
    match self.github.create_draft_pr(&issue, &self.branch).await {
        Ok(pr) => {
            self.pr_number = Some(pr.number);
        }
        Err(e) if e.is_pr_exists() => {
            // Another Lab may have claimed this issue
            self.cleanup().await;
            return Err(MinionError::LostRace);
        }
        Err(e) => return Err(e.into()),
    }

    // 5. Remove 'gru:todo' label
    self.github.remove_label(&issue, "gru:todo").await?;

    Ok(())
}
}

2. Planning Phase

#![allow(unused)]
fn main() {
async fn generate_plan(&self) -> Result<Plan, MinionError> {
    // Read issue context
    let issue = self.github.get_issue(self.issue_number).await?;

    // Fetch relevant codebase context
    let files = self.identify_relevant_files(&issue).await;
    let code_context = self.read_files(&files).await;

    // Generate execution plan via LLM
    let prompt = format_planning_prompt(&issue, &code_context);
    let plan = self.llm.generate(&prompt).await?;

    // Post plan to issue
    let plan_comment = format_plan_comment(&self.id, &plan);
    self.github.post_comment(&issue, &plan_comment).await?;

    // Save plan locally
    self.save_plan(&plan).await?;

    Ok(plan)
}
}

3. Implementation Phase

#![allow(unused)]
fn main() {
async fn implement(&mut self, plan: Plan) -> Result<(), MinionError> {
    for step in &plan.steps {
        // Generate code changes
        let changes = self.generate_changes(step).await;

        // Apply changes to worktree
        for change in changes {
            self.apply_change(&change).await;
        }

        // Run local validation (optional)
        if self.run_local_checks().await.is_err() {
            // Local checks failed — skip committing this step
            // (a fuller implementation would repair and retry it)
            continue;
        }

        // Commit changes
        let commit_msg = format_commit_message(&self.id, step);
        let sha = self.git.commit(&commit_msg, &[]).await?;

        // Push to trigger CI
        self.git.push(&self.branch).await?;

        // Wait for CI
        if let Err(e) = self.wait_for_ci(&sha).await {
            // CI failed — hand the failure (check runs + logs) to the fixer
            if let Err(_) = self.fix_ci_failure(&e).await {
                self.metrics.retry_count += 1;
                if self.metrics.retry_count > MAX_RETRIES {
                    return self.escalate("CI failures exceeded max retries").await;
                }
            }
        }

        // Post progress update
        self.post_progress_update(step).await;
    }

    Ok(())
}
}

4. Pre-PR Self-Review Phase

After the implementation is committed, and before the PR is opened for review, the Minion spawns a code-reviewer subagent to act as an independent reviewer. The subagent examines the diff for:

  • Code correctness and logic errors
  • Security vulnerabilities
  • Error handling gaps and edge cases
  • Adherence to project conventions
  • Test coverage

Any issues identified are addressed before the PR is created, which may involve additional commits. This step acts as a quality gate that catches problems before they reach human reviewers or CI.
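
A sketch of the prompt such a reviewer subagent might receive — the wording and delimiters here are illustrative, not Gru's actual template:

```rust
/// Build the prompt handed to the code-reviewer subagent.
/// The checklist mirrors the review criteria above.
fn review_prompt(diff: &str) -> String {
    format!(
        "You are an independent code reviewer for this repository.\n\
         Examine the diff below for:\n\
         - code correctness and logic errors\n\
         - security vulnerabilities\n\
         - error handling gaps and edge cases\n\
         - adherence to project conventions\n\
         - test coverage\n\
         Report each finding with file, line, and severity.\n\n\
         === DIFF ===\n{diff}\n=== END DIFF ==="
    )
}

fn main() {
    let prompt = review_prompt("+ fn add(a: i32, b: i32) -> i32 { a + b }");
    assert!(prompt.contains("security vulnerabilities"));
    assert!(prompt.contains("=== DIFF ==="));
}
```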

5. Review Phase

#![allow(unused)]
fn main() {
use tokio::time::{sleep, Duration};

async fn handle_review(&mut self) -> Result<(), MinionError> {
    // Convert draft to ready for review
    self.github.mark_pr_ready(self.pr_number.unwrap()).await?;

    // Update labels
    self.github.replace_labels(&self.issue, &["review"]).await?;

    // Post summary comment
    let summary = self.generate_summary();
    self.github.post_comment(&self.issue, &summary).await?;

    // Monitor for review feedback
    loop {
        // Check for new review comments
        let comments = self.github.get_review_comments(
            self.pr_number.unwrap(),
            self.last_seen_comment_id
        ).await?;

        for comment in comments {
            if self.can_handle_autonomously(&comment) {
                // Implement requested changes
                self.implement_review_feedback(&comment).await?;
            } else {
                // Ask for clarification or escalate
                self.respond_to_reviewer(&comment).await?;
            }
        }

        // Check if merged
        let pr = self.github.get_pr(self.pr_number.unwrap()).await?;
        if pr.merged {
            return self.complete().await;
        }

        // Check if closed without merge
        if pr.state == "closed" && !pr.merged {
            return self.abandon().await;
        }

        sleep(Duration::from_secs(30)).await;
    }
}
}

6. Completion Phase

#![allow(unused)]
fn main() {
async fn complete(&mut self) -> Result<(), MinionError> {
    // Add done label
    self.github.add_label(&self.issue, "gru:done").await?;

    // Post completion comment with metrics
    self.post_completion_comment().await?;

    // Archive logs and events
    self.archive_logs().await?;

    // Cleanup worktree
    self.git.remove_worktree(&self.worktree_path).await?;

    // Remove from active Minions
    self.lab.remove_minion(&self.id).await?;

    Ok(())
}
}

State Management

Local State (SQLite)

-- ~/.gru/state/minions.db

CREATE TABLE minions (
    id TEXT PRIMARY KEY,
    lab_id TEXT NOT NULL,
    repo TEXT NOT NULL,
    issue_number INTEGER NOT NULL,
    branch TEXT NOT NULL,
    state TEXT NOT NULL,
    pr_number INTEGER,
    
    started_at TIMESTAMP NOT NULL,
    last_activity TIMESTAMP NOT NULL,
    
    tokens_used INTEGER DEFAULT 0,
    commits_created INTEGER DEFAULT 0,
    ci_runs INTEGER DEFAULT 0,
    retry_count INTEGER DEFAULT 0,
    
    UNIQUE(repo, issue_number)
);

CREATE TABLE timeline_cursors (
    repo TEXT NOT NULL,
    issue_number INTEGER NOT NULL,
    cursor TEXT NOT NULL,
    last_checked TIMESTAMP NOT NULL,
    
    PRIMARY KEY (repo, issue_number)
);

Why SQLite?

  • Single file, no daemon process
  • ACID transactions
  • Efficient queries for active Minions
  • Simple backup (copy file)

GitHub State (Source of Truth)

GitHub stores the canonical state via:

  • Labels - Current state visible in UI
  • Timeline - Complete event history
  • PR - Work artifacts and review discussions

State Reconciliation:

#![allow(unused)]
fn main() {
async fn reconcile_state(&self) -> Result<(), LabError> {
    // On startup, rebuild state from GitHub
    let local_minions = self.db.get_active_minions().await?;

    for mut minion in local_minions {
        // Fetch issue timeline
        let timeline = self.github.get_timeline(&minion.issue).await?;

        // Reconstruct state from events
        let actual_state = derive_state_from_timeline(&timeline);

        if actual_state != minion.state {
            // GitHub is source of truth
            minion.state = actual_state.clone();
            self.db.update_minion(&minion).await?;
        }

        // Resume or abandon based on state
        match actual_state.as_str() {
            "gru:in-progress" => {
                let minion_clone = minion.clone();
                tokio::spawn(async move {
                    minion_clone.resume().await;
                });
            }
            "review" => {
                let minion_clone = minion.clone();
                tokio::spawn(async move {
                    minion_clone.monitor_review().await;
                });
            }
            "gru:done" | "gru:failed" => {
                minion.cleanup().await?;
            }
            _ => {}
        }
    }

    Ok(())
}
}

CI/CD Integration

GitHub Actions Workflow

Minions rely on the repository's existing workflows:

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: ['**']  # Run on all branches including minion branches
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup
        run: make setup
      - name: Lint
        run: make lint
      - name: Test
        run: make test
      - name: Build
        run: make build

Monitoring Check Runs

#![allow(unused)]
fn main() {
use tokio::time::{timeout, sleep, Duration};

async fn wait_for_ci(&mut self, commit_sha: &str) -> Result<(), MinionError> {
    let timeout_duration = Duration::from_secs(30 * 60);
    let mut interval = tokio::time::interval(Duration::from_secs(10));

    timeout(timeout_duration, async {
        loop {
            interval.tick().await;

            let check_runs = self.github.get_check_runs(commit_sha).await?;

            if !all_complete(&check_runs) {
                continue;
            }

            if all_passed(&check_runs) {
                self.metrics.ci_runs += 1;
                return Ok(());
            }

            // CI failed - fetch logs
            let logs = self.github.get_check_run_logs(&check_runs).await?;
            return Err(MinionError::CIFailure {
                check_runs,
                logs,
            });
        }
    })
    .await
    .map_err(|_| MinionError::CITimeout)?
}
}

Handling CI Failures

#![allow(unused)]
fn main() {
async fn fix_ci_failure(&mut self, err: &CIFailureError) -> Result<(), MinionError> {
    // Analyze failure logs
    let analysis = self.analyze_failure(&err.logs).await;

    // Classify failure type
    match analysis.failure_type {
        FailureType::FlakyTest => {
            // Retry without changes
            self.git.commit("Retry flaky tests", &["--allow-empty"]).await?;
            self.git.push(&self.branch).await?;
            Ok(())
        }

        FailureType::TestFailure => {
            // Generate fix
            let fix = self.llm.generate_fix(&analysis).await?;
            self.apply_fix(&fix).await;
            self.git.commit(&format!("[minion:{}] Fix test failures", self.id), &[]).await?;
            self.git.push(&self.branch).await?;
            Ok(())
        }

        FailureType::BuildError => {
            // Attempt to fix build
            let fix = self.llm.generate_build_fix(&analysis).await?;
            self.apply_fix(&fix).await;
            self.git.commit(&format!("[minion:{}] Fix build errors", self.id), &[]).await?;
            self.git.push(&self.branch).await?;
            Ok(())
        }

        FailureType::Timeout => {
            // Escalate - likely infrastructure issue
            self.escalate("CI timeout - may need human intervention").await
        }

        _ => {
            self.escalate(&format!("Unknown CI failure: {}", analysis.summary)).await
        }
    }
}
}

Error Handling & Recovery

Retry Strategy

#![allow(unused)]
fn main() {
use tokio::time::{sleep, Duration};
use rand::Rng;

struct RetryConfig {
    max_attempts: usize,        // Default: 5
    initial_backoff: Duration,  // Default: 5s
    max_backoff: Duration,      // Default: 5m
    backoff_factor: f64,        // Default: 2.0
}

async fn retry_with_backoff<F, Fut, T, E>(
    &self,
    operation: F,
    config: RetryConfig,
) -> Result<T, E>
where
    F: Fn() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::error::Error,
{
    let mut backoff = config.initial_backoff;
    let mut rng = rand::thread_rng();

    for attempt in 0..config.max_attempts {
        match operation().await {
            Ok(result) => return Ok(result),
            Err(err) => {
                // Check if error is retryable
                if !is_retryable(&err) {
                    return Err(err);
                }

                if attempt == config.max_attempts - 1 {
                    return Err(err);
                }

                // Exponential backoff with jitter
                let jitter_ms = rng.gen_range(0..=(backoff.as_millis() / 10)) as u64;
                let jitter = Duration::from_millis(jitter_ms);
                sleep(backoff + jitter).await;

                backoff = Duration::from_secs_f64(backoff.as_secs_f64() * config.backoff_factor);
                if backoff > config.max_backoff {
                    backoff = config.max_backoff;
                }
            }
        }
    }

    // The final attempt returns from inside the loop (max_attempts >= 1)
    unreachable!("every attempt either returned Ok or Err above")
}
}

Escalation to Humans

#![allow(unused)]
fn main() {
use chrono::Utc;

async fn escalate(&mut self, reason: &str) -> Result<(), MinionError> {
    // Update state
    self.state = MinionState::Blocked;
    self.db.update_minion(self).await?;

    // Add label
    self.github.add_label(&self.issue, "gru:blocked").await?;

    // Post escalation comment
    let comment = format!(
        r#"❌ **Minion {} needs help**

---
event: minion:blocked
minion_id: {}
reason: {}
attempts: {}
timestamp: {}
---

I've encountered an issue I can't resolve on my own. Could a human take a look?

**What I tried:**
{}

**Logs:** See [workflow run]({})

cc @repo-maintainer
"#,
        self.id,
        self.id,
        reason,
        self.metrics.retry_count,
        Utc::now().to_rfc3339(),
        self.format_attempt_history(),
        self.get_workflow_run_url()
    );

    self.github.post_comment(&self.issue, &comment).await?;

    // Pause Minion (don't cleanup - human may resume)
    self.pause().await?;

    Ok(())
}
}

Crash Recovery

#![allow(unused)]
fn main() {
use chrono::Utc;

async fn recover(&self) -> Result<(), LabError> {
    // On startup, check for orphaned Minions
    let active_minions = self.db.get_active_minions().await?;

    for mut minion in active_minions {
        let last_activity = minion.last_activity;

        // If no activity for > 1 hour, likely crashed
        if Utc::now().signed_duration_since(last_activity).num_hours() > 1 {
            // Check GitHub state
            let issue = self.github.get_issue(minion.issue_number).await?;
            let labels = &issue.labels;

            if has_label(labels, "gru:in-progress") {
                // Still marked as in-progress on GitHub
                // Try to resume or fail gracefully
                if minion.can_resume() {
                    let minion_clone = minion.clone();
                    tokio::spawn(async move {
                        minion_clone.resume().await;
                    });
                } else {
                    minion.escalate("Lab crashed, unable to resume").await?;
                }
            }
        }
    }

    Ok(())
}
}

Security Model

Threat Model

Trusted:

  • Lab operator (has filesystem access)
  • GitHub (source of truth)
  • LLM provider (Anthropic, OpenAI)

Untrusted:

  • Issue authors (may request malicious code)
  • PR reviewers (compromise unlikely but possible)
  • Dependencies installed during CI (supply chain risk)

Mitigations

1. GitHub Token Security

#![allow(unused)]
fn main() {
// GitHub auth: gh/ghe CLI token (primary) or GRU_GITHUB_TOKEN env var (fallback)
// Tokens are never stored in config files

// Never log tokens
impl std::fmt::Display for Config {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "token=***REDACTED***")
    }
}
}

2. Secret Detection

#![allow(unused)]
fn main() {
use regex::Regex;

// Pre-commit hook
async fn pre_commit_check(&self, files: &[String]) -> Result<(), MinionError> {
    for file in files {
        if contains_secrets(file)? {
            return Err(MinionError::SecretsDetected(file.clone()));
        }
    }
    Ok(())
}

lazy_static::lazy_static! {
    static ref SECRET_PATTERNS: Vec<Regex> = vec![
        Regex::new(r#"(?i)api[_-]?key\s*=\s*['"][a-zA-Z0-9]{20,}['"]"#).unwrap(),
        Regex::new(r#"(?i)password\s*=\s*['"][^'"]+['"]"#).unwrap(),
        Regex::new(r"ghp_[a-zA-Z0-9]{36}").unwrap(),
        Regex::new(r"-----BEGIN PRIVATE KEY-----").unwrap(),
    ];
}
}

3. Worktree Isolation

# Each Minion gets an isolated worktree
~/.gru/work/owner/repo/minion/issue-42-M042/  # no access to other Minions' worktrees
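
A runnable local sketch of this isolation, with a throwaway repo standing in for GitHub (paths mirror the File System Layout section; `gru` automates all of this):

```shell
set -e
rm -rf /tmp/gru-demo && mkdir -p /tmp/gru-demo && cd /tmp/gru-demo
git init -q -b main origin
git -C origin -c user.email=gru@example.com -c user.name=gru \
    commit -q --allow-empty -m "initial commit"

# What `gru init owner/repo` does: create a bare mirror under repos/
git clone -q --mirror origin repos/owner/repo.git

# Per-Minion worktree on its own branch; no Minion touches another's checkout
git -C repos/owner/repo.git worktree add \
    /tmp/gru-demo/work/owner/repo/minion/issue-42-M042/checkout \
    -b minion/issue-42-M042 main
```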

4. CI Sandboxing

  • Tests run in GitHub Actions (containerized)
  • No local code execution beyond git operations
  • Minion has no access to repo secrets (only CI does)

5. Rate Limiting

#![allow(unused)]
fn main() {
// Prevent resource exhaustion
struct ResourceLimits {
    max_tokens_per_issue: i32,   // 100k tokens
    max_commits_per_issue: i32,  // 50 commits
    max_ci_runs_per_issue: i32,  // 20 runs
    max_retries_per_issue: i32,  // 5 attempts
}
}
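
A sketch of how these caps might be enforced before each expensive step — the function name and error strings are illustrative; the limit values mirror the comments above:

```rust
struct ResourceLimits {
    max_tokens_per_issue: i32,
    max_commits_per_issue: i32,
    max_ci_runs_per_issue: i32,
    max_retries_per_issue: i32,
}

struct MinionMetrics {
    tokens_used: i32,
    commits_created: i32,
    ci_runs: i32,
    retry_count: i32,
}

/// Returns which cap (if any) the Minion has hit; callers escalate on Some.
fn exceeded_limit(m: &MinionMetrics, l: &ResourceLimits) -> Option<&'static str> {
    if m.tokens_used >= l.max_tokens_per_issue {
        Some("token budget exhausted")
    } else if m.commits_created >= l.max_commits_per_issue {
        Some("commit cap reached")
    } else if m.ci_runs >= l.max_ci_runs_per_issue {
        Some("CI run cap reached")
    } else if m.retry_count >= l.max_retries_per_issue {
        Some("retry cap reached")
    } else {
        None
    }
}

fn main() {
    let limits = ResourceLimits {
        max_tokens_per_issue: 100_000,
        max_commits_per_issue: 50,
        max_ci_runs_per_issue: 20,
        max_retries_per_issue: 5,
    };
    let metrics = MinionMetrics { tokens_used: 12_500, commits_created: 2, ci_runs: 3, retry_count: 0 };
    assert_eq!(exceeded_limit(&metrics, &limits), None);
}
```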

API Specification

GraphQL Schema (Future - V2)

schema {
  query: Query
  mutation: Mutation
  subscription: Subscription
}

type Query {
  lab: Lab!
  minion(id: ID!): Minion
  minions(state: [MinionState!], repo: String): [Minion!]!
  issue(repo: String!, number: Int!): Issue
}

type Mutation {
  claimIssue(repo: String!, number: Int!): Minion
  pauseMinion(id: ID!): Boolean!
  resumeMinion(id: ID!): Boolean!
  abandonMinion(id: ID!): Boolean!
}

type Subscription {
  minionEvents(id: ID!): MinionEvent!
  labEvents: LabEvent!
}

type Lab {
  id: ID!
  hostname: String!
  version: String!
  slots: Int!
  activeMinions: Int!
  startedAt: DateTime!
}

type Minion {
  id: ID!
  labId: ID!
  repo: String!
  issueNumber: Int!
  branch: String!
  state: MinionState!
  prNumber: Int
  
  startedAt: DateTime!
  lastActivity: DateTime!
  
  metrics: MinionMetrics!
  events: [MinionEvent!]!
}

type MinionMetrics {
  tokensUsed: Int!
  commitsCreated: Int!
  ciRuns: Int!
  retryCount: Int!
  durationSeconds: Int!
}

enum MinionState {
  CLAIMED
  PLANNING
  IMPLEMENTING
  TESTING
  BLOCKED
  REVIEW
  DONE
  FAILED
}

type MinionEvent {
  type: String!
  timestamp: DateTime!
  data: JSON!
}

REST Endpoints (V1)

GET  /health                  # Health check
GET  /metrics                 # Prometheus metrics
GET  /api/lab                 # Lab info
GET  /api/minions             # List active Minions
GET  /api/minions/:id         # Minion details
POST /api/minions/:id/pause   # Pause Minion
POST /api/minions/:id/resume  # Resume Minion
POST /api/minions/:id/abandon # Abandon Minion

File System Layout

~/.gru/
├── config.toml                      # Lab configuration
├── repos/                           # Bare repository mirrors
│   └── owner/
│       └── repo.git/                # Bare clone
├── work/                            # Active worktrees
│   └── owner/
│       └── repo/
│           └── minion/
│               └── issue-42-M042/   # Minion directory (branch name)
│                   ├── events.jsonl
│                   ├── .gru_pr_state.json
│                   └── checkout/    # Git worktree (repo files)
│                       ├── .git
│                       └── <repo files>
├── state/
│   ├── next_id.txt                  # Monotonic counter for Minion IDs (base36)
│   ├── minions.json                 # Persistent Minion registry
│   └── minions.json.lock            # Registry file lock
└── archive/                         # Completed work (future)

Configuration

Environment Variables

GitHub authentication priority:

  1. gh/ghe CLI token (gh auth token --hostname <host>)
  2. GRU_GITHUB_TOKEN environment variable (fallback)

# Optional fallback if gh CLI auth is not configured
export GRU_GITHUB_TOKEN="ghp_xxxxxxxxxxxx"

# Claude CLI handles its own authentication

config.toml

Non-sensitive settings only:

# ~/.gru/config.toml

[daemon]
# Repositories to monitor (multi-repo support)
repos = ["owner/repo1", "owner/repo2", "anotherowner/repo3"]

# Max concurrent Minions
max_slots = 2

# Polling interval (seconds)
poll_interval_secs = 30

# Label to watch for issues (default: "gru:todo")
label = "gru:todo"

Observability

Structured Logging

use tracing::{info, error};

info!(
    minion_id = %self.id,
    issue = self.issue_number,
    repo = %self.repo,
    "minion claimed issue"
);

error!(
    minion_id = %self.id,
    commit_sha = %sha,
    attempts = self.metrics.retry_count,
    error = %err,
    "CI failed"
);

Metrics (Prometheus)

# HELP gru_minions_active Number of active Minions
# TYPE gru_minions_active gauge
gru_minions_active{state="implementing"} 2
gru_minions_active{state="review"} 1

# HELP gru_issues_claimed_total Total issues claimed
# TYPE gru_issues_claimed_total counter
gru_issues_claimed_total 42

# HELP gru_prs_merged_total Total PRs merged
# TYPE gru_prs_merged_total counter
gru_prs_merged_total 35

# HELP gru_ci_runs_total Total CI runs triggered
# TYPE gru_ci_runs_total counter
gru_ci_runs_total{result="pass"} 120
gru_ci_runs_total{result="fail"} 15

# HELP gru_tokens_used_total Total LLM tokens consumed
# TYPE gru_tokens_used_total counter
gru_tokens_used_total 1523421

# HELP gru_issue_duration_seconds Time from claim to completion
# TYPE gru_issue_duration_seconds histogram
gru_issue_duration_seconds_bucket{le="300"} 5
gru_issue_duration_seconds_bucket{le="1800"} 15
gru_issue_duration_seconds_bucket{le="3600"} 30
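A Lab could render such counters in the Prometheus text exposition format with plain string formatting. This is a sketch; the real server presumably uses a metrics crate rather than hand-rolled formatting.

```rust
/// Render one labeled gauge family in Prometheus text exposition format.
fn render_gauge(name: &str, help: &str, samples: &[(&str, u64)]) -> String {
    let mut out = format!("# HELP {name} {help}\n# TYPE {name} gauge\n");
    for (state, value) in samples {
        out.push_str(&format!("{name}{{state=\"{state}\"}} {value}\n"));
    }
    out
}
```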

Event Log (events.jsonl)

{"event":"claimed","minion_id":"M42","issue":123,"timestamp":"2025-01-30T12:34:56Z"}
{"event":"plan_generated","minion_id":"M42","tokens":450,"steps":4}
{"event":"commit","minion_id":"M42","sha":"abc123","message":"Add auth"}
{"event":"ci_triggered","minion_id":"M42","workflow":"test","run_id":123456}
{"event":"ci_passed","minion_id":"M42","duration_ms":45000}
{"event":"pr_ready","minion_id":"M42","pr_number":789}
{"event":"review_comment","minion_id":"M42","comment_id":999}
{"event":"merged","minion_id":"M42","pr_number":789}
{"event":"completed","minion_id":"M42","total_tokens":15234,"commits":5,"duration_s":1800}
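Each entry is one serialized JSON object per line. A minimal line builder (sketch: values are assumed not to need JSON escaping; a real implementation would use serde_json):

```rust
/// Build one events.jsonl line (sketch: raw values are spliced in as-is,
/// so callers must pass already-valid JSON fragments; a real
/// implementation would serialize with serde_json instead).
fn event_line(event: &str, minion_id: &str, fields: &[(&str, &str)]) -> String {
    let mut line = format!("{{\"event\":\"{event}\",\"minion_id\":\"{minion_id}\"");
    for (key, value) in fields {
        line.push_str(&format!(",\"{key}\":{value}"));
    }
    line.push('}');
    line
}
```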

Future Roadmap

V2: Multi-Lab Coordination

  • Distributed locking via GitHub Projects v2
  • Heartbeat protocol for liveness detection
  • Stale issue reclamation (Labs can pick up abandoned work)

V3: Tower & Web UI

  • Central dashboard for all Labs
  • Real-time attach sessions (PTY streaming)
  • Live event subscriptions via WebSocket
  • OAuth authentication for users

V4: Advanced Features

  • Issue dependency DAG (wait for blockers)
  • Codebase RAG (semantic search via embeddings)
  • Learned prioritization (predict issue complexity)
  • Cost optimization (model selection, prompt caching)
  • Webhook support (replace polling)
  • Slack/email notifications
  • Multi-repo orchestration
  • Review feedback learning (improve from past PRs)

References

External Documentation

  • Sweep - AI junior developer
  • Devin - AI software engineer
  • AutoGPT - Autonomous agents

Inspirational Resources


Last Updated: 2025-01-30
Version: 1.0 (Single-Lab MVP)

Gru Design Decisions

Last Updated: 2025-12-02
Status: Final decisions made via quantitative DMX analysis


Critical Implementation Decisions (2025-12-02)

Decision 1: Architecture Approach

Question: How to spawn and manage autonomous Claude Code agents?

Answer: CLI + Stream Parsing (scored 0.735/1.0)

After evaluating 6 approaches through spike testing and DMX analysis:

Approach               Score   Verdict
CLI + Stream Parsing   0.735   ✅ SELECTED
ACP Integration        0.706   Future (V2+)
Agent SDK (Python)     0.688   Too complex
Pure CLI               0.559   No monitoring
Rust + tmux            0.466   High fragility
Zellij                 0.210   Failed tests

Implementation:

claude --print \
  --session-id <UUID> \
  --output-format stream-json \
  --dangerously-skip-permissions

Parse JSON events from stdout for real-time monitoring.
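Each stdout line is a standalone JSON object. A naive extractor for a top-level string field, shown here without a JSON dependency (illustration only; the real parser presumably uses serde_json):

```rust
/// Pull the value of a top-level string field like "type":"assistant"
/// out of one stream-json line. Naive string scan, illustration only;
/// it does not handle escapes or nested objects.
fn extract_string_field(line: &str, field: &str) -> Option<String> {
    let needle = format!("\"{field}\":\"");
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(line[start..end].to_string())
}
```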

See: experiments/DMX_ANALYSIS.md for full analysis.

Decision 2: Implementation Language

Question: Python or Rust?

Answer: Rust (scored 0.890 vs Python 0.110 - 8x advantage)

Rust scored perfectly on all high-priority criteria:

  • Single Binary Deployment: 10/10 (Python: 4/10)
  • Daemon Reliability: 10/10 (Python: 6/10)
  • Concurrency: 10/10 (Python: 6/10)
  • Type Safety: 10/10 (Python: 3/10)

Rationale: The vision is "single-binary, local-first" (mentioned 3x in docs). Architecture requires 24/7 daemon with true concurrency for 10+ minions. Rust is the only logical choice for the production system.

See: experiments/LANGUAGE_DECISION.md for full analysis.

Technology Stack

Core:

  • Language: Rust
  • Runtime: Tokio async
  • CLI: Clap
  • GraphQL: async-graphql
  • Web: Axum
  • GitHub: octocrab

Key Dependencies:

[dependencies]
tokio = { version = "1", features = ["full"] }
async-graphql = "7"
axum = "0.7"
clap = { version = "4", features = ["derive"] }
octocrab = "0.38"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"

Timeline: 6-8 weeks to production-ready V1


V1: Simplified Single-Lab Design

Core Principles

  1. Single Lab assumption - No distributed coordination needed initially
  2. Simple labels for state - gru:todo, gru:in-progress, gru:done, gru:failed
  3. Comments as event log - GitHub timeline API provides complete audit trail
  4. Early draft PR - Create as soon as branch exists, provides natural lock mechanism
  5. GitHub Actions for CI - Delegate test execution to existing infrastructure
  6. Claude Code for agents (V1) - Use Claude Code as the initial agent runtime, design for pluggable agents later

Agent Runtime

V1: Claude Code

Decision: Start with Claude Code as the agent runtime for Minions.

Rationale:

  • Built-in tool use - Git, file operations, bash commands already available
  • Agentic by default - Designed for autonomous multi-step tasks
  • MCP support - Can integrate with external tools and services
  • Proven - Battle-tested for coding tasks
  • Fast iteration - Focus on orchestration, not building agent infrastructure

Integration:

# config.yaml
agent:
  runtime: claude-code
  model: claude-sonnet-4-5
  tools:
    - git
    - bash
    - file_operations
  mcp_servers:
    - github  # For GitHub API operations

Minion = Claude Code Session:

Each Minion is a Claude Code session. One-to-one mapping.

Minion M42  =  Claude Code session working in worktree ~/.gru/work/owner/repo/M42
Minion M43  =  Claude Code session working in worktree ~/.gru/work/owner/repo/M43

How it works:

  1. Lab spawns Claude Code process for each Minion (separate session per worktree)
  2. Passes initial prompt with issue description and codebase context
  3. Claude Code autonomously works: reads code, makes changes, commits, pushes
  4. Lab monitors Claude Code output for events and state changes
  5. Lab handles GitHub integration layer (labels, comments, PR creation)
  6. When issue complete, Lab terminates Claude Code session and cleans up

Example Minion initialization:

# Lab spawns Claude Code for Minion M42
cd ~/.gru/work/owner/repo/M42
claude --session M42 --context issue-123.md --autonomous

Prompt template:

You are Minion M42 working on issue #123 in owner/repo.

## Issue
[issue description]

## Your Task
1. Read the issue carefully
2. Explore the codebase to understand relevant code
3. Implement the requested changes
4. Commit when a logical unit of work is complete AND tests pass
5. Push commits to trigger CI
6. Monitor CI results and fix failures
7. When ready, notify the Lab that you're done

## Guidelines
- Commit frequently (after each successful CI run)
- Use commit messages: [minion:M42] <description>
- If stuck or tests fail repeatedly, escalate
- Working directory: ~/.gru/work/owner/repo/M42
- Branch: minion/issue-123-M42

## Tools Available
- Git operations (commit, push, diff)
- File read/write
- Bash commands
- GitHub API (via MCP)

Begin work now.
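Filling that template is ordinary string formatting. A sketch, with illustrative parameter names and an abbreviated template body:

```rust
/// Render the Minion prompt from the template above (abbreviated sketch;
/// parameter names are illustrative).
fn format_prompt(minion_id: &str, issue_number: u32, repo: &str, issue_body: &str) -> String {
    format!(
        "You are Minion {minion_id} working on issue #{issue_number} in {repo}.\n\n\
         ## Issue\n{issue_body}\n\n\
         ## Your Task\n1. Read the issue carefully\n...\n\n\
         Begin work now."
    )
}
```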

Session Status Tracking:

Claude Code sessions have internal states that Lab must track:

#[derive(Debug, Clone, PartialEq, Eq)]
enum SessionStatus {
    Thinking,          // Claude processing
    UsingTool,         // Executing tool (git, bash, etc)
    Responding,        // Generating response
    WaitingInput,      // Needs user input
    WaitingPermission, // Needs approval for action
    Idle,              // Waiting for next instruction
    Complete,          // Task finished
    Error,             // Encountered error
}

struct Minion {
    // ... existing fields ...

    session_status: SessionStatus,
    current_tool: Option<String>, // Which tool is being used (if UsingTool)
    last_output: DateTime<Utc>,   // Last time session produced output
}

Lab monitors session status by:

  1. Parsing Claude Code's JSON output stream
  2. Watching for status indicators in stdout/stderr
  3. Detecting tool use events
  4. Tracking idle time (no output = potentially stuck)

Example status transitions:

idle → thinking → using_tool(git) → thinking → responding → idle
idle → thinking → using_tool(bash) → waiting_permission → idle
idle → thinking → responding → complete
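The Lab can sanity-check observed transitions against the expected graph. A sketch; the transition table below is inferred from the examples above and is not exhaustive:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Idle,
    Thinking,
    UsingTool,
    Responding,
    WaitingPermission,
    Complete,
}

/// Whether a status transition matches the expected graph (illustrative;
/// derived from the example traces, not a complete specification).
fn is_expected_transition(from: Status, to: Status) -> bool {
    use Status::*;
    matches!(
        (from, to),
        (Idle, Thinking)
            | (Thinking, UsingTool)
            | (Thinking, Responding)
            | (UsingTool, Thinking)
            | (UsingTool, WaitingPermission)
            | (WaitingPermission, Idle)
            | (Responding, Idle)
            | (Responding, Complete)
    )
}
```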

User Attach Sessions:

Users can attach to any active Minion to observe or intervene:

# Attach to Minion M42 (read-only by default)
gru attach M42

# Attach with interactive mode (can send input)
gru attach M42 --interactive

# Attach to see live output
gru attach M42 --follow

Architecture:

┌──────────────┐
│   User TTY   │
└──────┬───────┘
       │
       │ gru attach M42
       ▼
┌──────────────┐
│     Lab      │
│              │
│  ┌────────┐  │
│  │ Attach │  │
│  │Manager │  │
│  └───┬────┘  │
└──────┼───────┘
       │
       │ Multiplex session I/O
       ▼
┌──────────────────────┐
│ Claude Code Session  │
│ (Minion M42)         │
│                      │
│ stdin  ←─────────────┼─── Lab + Attached users
│ stdout ─────────────→│──→ Lab + Attached users
│ stderr ─────────────→│──→ Lab + Attached users
└──────────────────────┘

Attach Session Management:

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};
use chrono::{DateTime, Utc, Duration};

struct AttachSession {
    id: String,
    minion_id: String,
    user_id: String,
    mode: AttachMode,
    started_at: DateTime<Utc>,
    expires_at: DateTime<Utc>, // Max 30 minutes

    // I/O streams
    input: mpsc::Sender<Vec<u8>>,    // User → Minion
    output: mpsc::Receiver<Vec<u8>>, // Minion → User
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum AttachMode {
    ReadOnly,    // Just observe
    Interactive, // Can send input
}

struct AttachManager {
    lab: Arc<Lab>, // handle to the Lab for Minion lookup
    sessions: Arc<RwLock<HashMap<String, AttachSession>>>,
}

impl AttachManager {
    async fn attach(&self, minion_id: &str, mode: AttachMode) -> Result<AttachSession, Error> {
        let minion = self.lab.get_minion(minion_id)
            .ok_or(Error::MinionNotFound)?;

        let (input_tx, input_rx) = mpsc::channel(100);
        let (output_tx, output_rx) = mpsc::channel(100);

        let session = AttachSession {
            id: generate_id(),
            minion_id: minion_id.to_string(),
            user_id: String::new(), // set from the authenticated user in a real implementation
            mode: mode.clone(),
            started_at: Utc::now(),
            expires_at: Utc::now() + Duration::minutes(30),
            input: input_tx,
            output: output_rx,
        };

        // Start multiplexing minion output to this session
        let minion_clone = minion.clone();
        let output_tx_clone = output_tx.clone();
        tokio::spawn(async move {
            Self::stream_output(minion_clone, output_tx_clone).await;
        });

        // If interactive, also forward input
        if mode == AttachMode::Interactive {
            let minion_clone = minion.clone();
            tokio::spawn(async move {
                Self::stream_input(input_rx, minion_clone).await;
            });
        }

        // Hand the session (with its stream handles) back to the caller;
        // a real implementation would also register its id in `sessions`.
        Ok(session)
    }
}

Security considerations:

  • Attach sessions timeout after 30 minutes
  • Only one interactive attach per Minion at a time
  • Multiple read-only attaches allowed
  • Lab retains full control; user input is mediated
  • Attach sessions preserved in audit log

Use cases:

# Watch Minion work in real-time
gru attach M42 --follow

# Debug stuck Minion
gru attach M42 --interactive
> # Can send commands to help unstick

# Review what Minion is doing
gru attach M42
[Shows current status: "using_tool(git)", last 100 lines of output]

# Detach but leave Minion running
Ctrl+D or type 'detach'

UI Display:

╭─────────────────────────────────────────────╮
│ Attached to Minion M42                      │
│ Issue: #123 - Add user authentication       │
│ Status: using_tool(bash)                    │
│ Uptime: 15m 32s                             │
│ Commits: 2                                  │
╰─────────────────────────────────────────────╯

[M42] Running: npm test
[M42] 
[M42] > test
[M42] > jest
[M42] 
[M42]  PASS  src/auth.test.js
[M42]    ✓ should generate JWT token (45ms)
[M42]    ✓ should validate token (12ms)
[M42] 
[M42] Tests: 2 passed, 2 total
[M42] Status: thinking
[M42] All tests passed! Committing changes...

Press Ctrl+D to detach (Minion continues running)

Implementation: tmux vs Custom (HISTORICAL — superseded by CLI + stream-json)

Note: This entire section is historical. The tmux approach was evaluated but never shipped. V1 uses CLI + stream-json parsing instead. Kept for decision context only.

Option A: Use tmux

Each Minion runs in a dedicated tmux session:

# Lab spawns Minion M42
tmux new-session -d -s "minion-M42" -c ~/.gru/work/owner/repo/M42
tmux send-keys -t "minion-M42" "claude --session M42 --context issue-123.md" Enter

# User attaches
gru attach M42
# → Lab runs: tmux attach-session -t "minion-M42" -r  (read-only)

# Interactive attach
gru attach M42 --interactive
# → Lab runs: tmux attach-session -t "minion-M42"

Pros:

  • Battle-tested - tmux handles all session management, multiplexing
  • Free scrollback - Built-in history buffer
  • Multiple attaches - tmux natively supports many viewers
  • Survives Lab restart - Sessions persist if Lab crashes
  • Standard tooling - Users already know tmux commands
  • Copy/paste - tmux copy mode works out of the box
  • Session recording - tmux pipe-pane for logging

Cons:

  • ⚠️ External dependency - Requires tmux installed
  • ⚠️ Less control - Harder to intercept/mediate I/O
  • ⚠️ Platform dependency - tmux not available on Windows (WSL only)
  • ⚠️ Session pollution - Orphaned tmux sessions if cleanup fails
  • ⚠️ Abstraction leakage - Users see raw tmux, not Gru's semantics

Option B: Custom I/O multiplexing

Lab manages stdin/stdout/stderr directly:

use std::sync::Arc;
use std::process::Stdio;
use tokio::io::{AsyncBufReadExt, AsyncRead, BufReader};
use tokio::process::Command;
use tokio::sync::RwLock;

struct Minion {
    // ... existing fields ...

    claude_session: Option<ClaudeCodeSession>,
    output_buffer: Arc<RwLock<RingBuffer>>, // Last N lines for late attachers
    attachments: Arc<RwLock<Vec<AttachSession>>>, // Currently attached users
}

impl Minion {
    async fn start(&mut self) -> Result<(), Error> {
        let mut child = Command::new("claude")
            .arg("--session")
            .arg(&self.id)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()?;

        // Capture I/O
        let stdin = child.stdin.take().ok_or(Error::StdinMissing)?;
        let stdout = child.stdout.take().ok_or(Error::StdoutMissing)?;
        let stderr = child.stderr.take().ok_or(Error::StderrMissing)?;

        // stdout/stderr are consumed by the multiplexer tasks below,
        // so the session keeps only the process handle and stdin
        self.claude_session = Some(ClaudeCodeSession {
            process: child,
            stdin,
        });

        // Start output multiplexing
        let minion_clone = self.clone();
        tokio::spawn(async move {
            minion_clone.multiplex_output(stdout).await;
        });

        let minion_clone = self.clone();
        tokio::spawn(async move {
            minion_clone.multiplex_output(stderr).await;
        });

        Ok(())
    }

    async fn multiplex_output<R: AsyncRead + Unpin>(&self, reader: R) {
        let mut lines = BufReader::new(reader).lines();

        while let Ok(Some(line)) = lines.next_line().await {
            let line_bytes = line.as_bytes().to_vec();

            // Save to buffer (for late attachers)
            self.output_buffer.write().await.write(&line_bytes);

            // Send to all attached sessions
            let attachments = self.attachments.read().await;
            for attach in attachments.iter() {
                // Non-blocking send
                let _ = attach.output.try_send(line_bytes.clone());
            }

            // Save to archive
            self.archive_output(&line_bytes).await;
        }
    }
}

Pros:

  • Full control - Lab can intercept/filter/log everything
  • No dependencies - Pure Rust, works everywhere
  • Tight integration - Easy to parse status, trigger events
  • Clean semantics - Abstract away raw terminal details
  • Cross-platform - Works on Windows, Linux, macOS

Cons:

  • ⚠️ More code - Need to implement multiplexing, buffering
  • ⚠️ Terminal quirks - Have to handle PTY, control sequences
  • ⚠️ Less resilient - Sessions die with Lab (unless we persist)
  • ⚠️ Reinventing wheel - Solving problems tmux already solved

Decision: V1 uses CLI + Stream Parsing (tmux superseded)

IMPORTANT UPDATE (2025-12-02): After DMX analysis and spike testing, tmux approach has been superseded by CLI + stream-json parsing. This provides better monitoring with less complexity.

Original V1 tmux rationale (now superseded):

  1. Speed to MVP - tmux gives us attach functionality for free
  2. Reliability - tmux is rock-solid, handles edge cases we'd miss
  3. User familiarity - Developers already know tmux
  4. Easy migration - Can swap implementation later without changing API

Why CLI + stream-json is better:

  • No subprocess complexity (tmux sessions)
  • No fragile regex parsing
  • JSON events provide structured monitoring
  • Claude CLI has --output-format stream-json natively
  • Simpler deployment (no tmux dependency)
  • See experiments/DMX_ANALYSIS.md for quantitative comparison

Historical tmux implementation sketch (superseded, never shipped):

use tokio::process::Command;
use std::path::Path;

impl Minion {
    async fn start(&mut self) -> Result<(), Error> {
        let session_name = format!("gru-minion-{}", self.id);

        // Create tmux session
        Command::new("tmux")
            .args(["new-session", "-d", "-s", &session_name, "-c", &self.worktree_path])
            .output()
            .await?;

        // Start Claude Code in tmux
        let command = format!("claude --session {} --context {}", self.id, self.context_file);
        Command::new("tmux")
            .args(["send-keys", "-t", &session_name, &command, "Enter"])
            .output()
            .await?;

        // Enable logging
        let log_path = Path::new(&self.archive_path).join("session.log");
        let log_command = format!("cat >> {}", log_path.display());
        Command::new("tmux")
            .args(["pipe-pane", "-t", &session_name, &log_command])
            .output()
            .await?;

        self.tmux_session = Some(session_name);
        Ok(())
    }
}

impl AttachManager {
    async fn attach(&self, minion_id: &str, mode: AttachMode) -> Result<(), Error> {
        let minion = self.lab.get_minion(minion_id)
            .ok_or(Error::MinionNotFound)?;

        let mut args = vec!["attach-session", "-t", minion.tmux_session.as_ref().unwrap()];

        if mode == AttachMode::ReadOnly {
            args.push("-r"); // read-only flag
        }

        Command::new("tmux")
            .args(&args)
            .stdin(std::process::Stdio::inherit())
            .stdout(std::process::Stdio::inherit())
            .stderr(std::process::Stdio::inherit())
            .spawn()?
            .wait()
            .await?;

        Ok(())
    }
}

When to build custom (historical — no longer applicable):

  • Multiple users request Windows support (tmux unavailable)
  • Need fine-grained I/O interception for features
  • Want to eliminate external dependencies
  • Performance issues with tmux overhead

Note: The CLI + stream-json approach eliminated the need for both tmux and custom I/O multiplexing.

Alternative: Zellij (historical)

Zellij is a modern Rust-based terminal multiplexer gaining popularity:

Pros:

  • Better defaults - Sessions auto-managed, better UX out of the box
  • Plugin system - Native WASM plugins for extensibility
  • Modern codebase - Written in Rust, active development
  • Better UI - Context-aware bottom bar, easier discoverability
  • Simpler API - Cleaner command structure

Cons:

  • ⚠️ Less mature - Newer project, smaller ecosystem
  • ⚠️ Lower adoption - Not universally installed like tmux/screen
  • ⚠️ API stability - May change more rapidly than tmux

Zellij's Rust API:

If Gru is written in Rust, Zellij offers interesting integration possibilities:

Available Rust crates:

  • zellij-tile - Plugin API for extending Zellij
  • zellij-utils - CLI enums including Sessions and SessionCommand
  • zellij-client - Client library for programmatic control

Potential advantages:

// Hypothetical: Direct Rust API instead of shelling out
use zellij_client::Session;

let session = Session::new("gru-minion-M42")
    .working_dir(&minion.worktree_path)
    .create()?;

session.send_keys("claude --session M42...")?;
session.attach(AttachMode::ReadOnly)?;

However, research shows:

  • ⚠️ Plugin-focused - Zellij's Rust API primarily for WASM plugins, not embedding
  • ⚠️ Still CLI-based - Session control via zellij attach, zellij list-sessions commands
  • ⚠️ Not a library - No documented API for embedding Zellij as a library
  • ⚠️ Similar to tmux - Would still shell out to zellij binary

Reality check: Both tmux and Zellij are external binaries you invoke via CLI. Neither offers a true embeddable library API. The integration code looks nearly identical:

// tmux
Command::new("tmux")
    .args(["new-session", "-d", "-s", session_name])
    .spawn()?;

// zellij
Command::new("zellij")
    .args(["--session", session_name])
    .spawn()?;

Historical verdict: tmux was initially favored for V1, but CLI + stream-json parsing proved superior and was selected instead. Neither tmux, Zellij, nor GNU Screen are used in the shipped implementation.


Implementation Language: Go vs Rust

Context

Gru is a CLI tool that:

  • Manages processes (Claude Code CLI sessions)
  • Makes HTTP calls (GitHub API)
  • Does file I/O (git worktrees, logs)
  • Provides CLI interface
  • Potentially exposes GraphQL API (future)

Option A: Go

Pros:

  • Fast to ship - Simpler syntax, faster compile times
  • Better for services - Excellent HTTP/gRPC libraries, proven for APIs
  • Easy concurrency - Goroutines + channels are simple and powerful
  • Great CLI libraries - cobra, viper mature and widely used
  • Deployment - Single static binary, cross-compile trivial
  • GitHub integrations - go-github library is comprehensive
  • Process management - os/exec is straightforward
  • Familiar - More developers know Go than Rust

Cons:

  • ⚠️ Error handling - Verbose if err != nil everywhere
  • ⚠️ Type safety - Weaker than Rust (no sum types, nil pointers)
  • ⚠️ Memory usage - Larger binaries, GC overhead
  • ⚠️ Less trendy - Rust has more mindshare in 2024/2025

Good fit for Gru because:

  • Orchestration layer (not performance-critical compute)
  • Heavy I/O and API calls (Go's sweet spot)
  • Need to ship quickly
  • Service patterns well-established

Option B: Rust

Pros:

  • Type safety - Sum types, no null, exhaustive matching
  • Performance - Zero-cost abstractions, no GC
  • Modern tooling - cargo, clippy, rustfmt excellent
  • Small binaries - More compact than Go
  • Growing ecosystem - tokio, serde, clap mature
  • Memory safety - Prevents entire classes of bugs

Cons:

  • ⚠️ Slower to ship - Longer compile times, steeper learning curve
  • ⚠️ Complexity - Lifetimes, ownership, async can be hard
  • ⚠️ Smaller ecosystem - Fewer libraries than Go for some domains
  • ⚠️ Harder onboarding - Contributors need Rust knowledge
  • ⚠️ Async ecosystem - Still evolving, some rough edges

Good fit for Gru because:

  • CLI tools are Rust's sweet spot
  • Type safety helps with state machine complexity
  • Modern developers prefer Rust
  • Can use Zellij plugin system (future)

Comparison for Gru's Specific Needs

Aspect            Go                      Rust
HTTP client       net/http ★★★★★          reqwest ★★★★☆
GitHub API        go-github ★★★★★         octocrab ★★★☆☆
CLI framework     cobra ★★★★★             clap ★★★★★
Process mgmt      os/exec ★★★★☆           std::process ★★★★☆
GraphQL           gqlgen ★★★★★            async-graphql ★★★★☆
SQLite            go-sqlite3 ★★★★★        rusqlite ★★★★★
YAML parsing      gopkg.in/yaml.v3 ★★★★★  serde_yaml ★★★★★
Time to MVP       ★★★★★                   ★★★☆☆
Type safety       ★★★☆☆                   ★★★★★
Contributor pool  ★★★★★                   ★★★☆☆

Decision: Rust for V1 ✅

CONFIRMED (2025-12-02): DMX analysis strongly validates Rust choice with 0.890 score vs Python 0.110.

Rationale:

  1. Single Binary - Emphasized 3x in product docs; Rust delivers perfectly (10/10)
  2. Daemon Reliability - 24/7 operation requires Rust's stability (10/10)
  3. True Concurrency - Managing 10+ minions needs no-GIL parallelism (10/10)
  4. Type Safety - State machine + lifecycle management benefit from compile-time guarantees (10/10)
  5. Production Polish - "It just works" deployment experience (10/10)
  6. Modern tooling - cargo, clippy, rustfmt are excellent
  7. CLI sweet spot - Rust excels at CLI tools (ripgrep, fd, bat, etc.)

Key Insight: The architecture was designed for Rust's strengths. Not a preference, but an alignment with requirements.

Code structure:

gru/
├── src/
│   ├── main.rs
│   ├── cli/
│   │   ├── mod.rs      # CLI setup (clap)
│   │   ├── lab.rs      # gru lab command
│   │   └── attach.rs   # gru attach command
│   ├── lab/
│   │   ├── mod.rs      # Lab orchestrator
│   │   ├── scheduler.rs
│   │   └── poller.rs
│   ├── minion/
│   │   ├── mod.rs      # Minion state machine
│   │   └── session.rs  # tmux session wrapper
│   ├── github/
│   │   ├── mod.rs      # GitHub API client (octocrab)
│   │   └── events.rs   # Timeline, labels, PRs
│   └── attach/
│       └── manager.rs  # Attach session management
├── Cargo.toml
└── Cargo.lock

Key dependencies:

[dependencies]
clap = { version = "4.5", features = ["derive"] }
tokio = { version = "1.40", features = ["full"] }
octocrab = "0.40"  # GitHub API
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"
serde_json = "1.0"
sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = "0.3"

When to reconsider Rust:

  • Performance becomes measurable bottleneck
  • Want to leverage Zellij plugin system
  • Type safety bugs become significant
  • Rust ecosystem catches up for GitHub/API work
  • Team composition shifts to Rust-heavy

Hybrid approach (unlikely):

  • Core in Go for speed
  • Performance-critical pieces in Rust (via FFI)
  • Probably overkill for Gru's use case

Alternative: Why Not TypeScript/Python?

TypeScript:

  • ❌ Runtime overhead (Node.js)
  • ❌ Deployment complexity (node_modules)
  • ✅ Good for Tower web UI (future)

Python:

  • ❌ Deployment (virtualenv, dependencies)
  • ❌ Slower performance
  • ❌ Weak static typing (mypy is optional and unenforced)
  • ✅ Quick prototyping

Implementation Notes:

  • Rust + Tokio provides excellent async foundation
  • CLI + stream-json parsing eliminates need for complex session management
  • No tmux/zellij dependency reduces complexity
  • Single binary deployment aligns with product vision
  • See experiments/ for working prototypes validating approach

Future: Pluggable Agent Architecture

Goal: Support multiple agent runtimes as ecosystem evolves.

Design for extensibility:

use async_trait::async_trait;
use tokio::sync::mpsc;

// Agent interface
#[async_trait]
trait Agent {
    // Initialize agent with context
    async fn initialize(&mut self, ctx: AgentContext) -> Result<(), Error>;

    // Execute task
    async fn execute(&mut self, task: Task) -> Result<TaskResult, Error>;

    // Stream events during execution
    fn events(&self) -> mpsc::Receiver<AgentEvent>;

    // Pause/Resume/Stop
    async fn pause(&mut self) -> Result<(), Error>;
    async fn resume(&mut self) -> Result<(), Error>;
    async fn stop(&mut self) -> Result<(), Error>;
}

// Agent runtime types
#[derive(Debug, Clone, PartialEq, Eq)]
enum AgentRuntime {
    ClaudeCode,   // claude-code
    OpenAIAgents, // openai-agents (Future)
    Custom,       // custom (Future)
    LocalLLM,     // local-llm (Future)
}

Potential future runtimes:

  • OpenAI Agents - When OpenAI ships their agent framework
  • Devin API - If Cognition Labs opens API access
  • Local LLMs - For privacy-sensitive codebases (Llama, Mistral)
  • Custom agents - User-defined agent scripts/binaries
  • Hybrid - Combine multiple agents (Claude for planning, specialist for security, etc.)

Adapter pattern:

struct ClaudeCodeAdapter {
    session: Option<ClaudeCodeSession>,
}

#[async_trait]
impl Agent for ClaudeCodeAdapter {
    async fn execute(&mut self, task: Task) -> Result<TaskResult, Error> {
        // Translate task to Claude Code prompt
        let prompt = format_prompt(&task);

        // Execute via Claude Code API/CLI
        let session = self.session.as_mut().ok_or(Error::SessionNotInitialized)?;
        let result = session.run(&prompt).await?;

        // Parse result and events
        parse_result(&result)
    }

    // ... other trait methods
}

Why defer this:

  • Premature abstraction - Only one runtime for V1, wait for real use cases
  • Ecosystem immature - Agent frameworks still evolving rapidly
  • YAGNI - May never need multiple runtimes if Claude Code sufficient

When to build pluggable system:

  • Multiple users requesting alternative runtimes
  • Cost optimization needs (cheaper local models for simple tasks)
  • Privacy requirements (can't use cloud LLMs)
  • Performance needs (local agents faster for certain tasks)

State Management

Labels

Issue States:

  • gru:todo - Issue is ready to be claimed
  • gru:in-progress - Minion actively working
  • gru:done - Completed successfully
  • gru:failed - Failed after retries
  • gru:blocked - Minion needs human help
  • gru:ready-to-merge - PR passes checks and is ready
  • gru:auto-merge - Auto-merge enabled on PR
  • gru:needs-human-review - PR requires human review before merge

Rationale: Simple, visible in UI, easy to filter. The gru: prefix namespaces labels to avoid collisions with user labels.

Event Log via Comments

Use GitHub Timeline API (GET /repos/:owner/:repo/issues/:number/timeline) to:

  • Reconstruct full Minion lifecycle
  • Detect state transitions
  • Build audit trail
  • Track all labeled/commented/cross-referenced events

Comment Format (YAML):

---
event: minion:claim
minion_id: M42
lab_id: lab-hostname
branch: minion/issue-123-M42
timestamp: 2025-01-30T12:34:56Z
---

Example claim comment:

🤖 **Minion M42 claimed this issue**

---
event: minion:claim
minion_id: M42
lab_id: lab-macbook-pro.local
branch: minion/issue-123-M42
timestamp: 2025-01-30T12:34:56Z
---

I'll start working on this now. You can track progress in the draft PR.

Example progress update:

🔄 **Progress Update**

---
event: minion:progress
minion_id: M42
phase: implementation
commits: 2
tests_passing: true
duration_minutes: 15
---

Completed authentication endpoints. Running CI checks now.

Example failure report:

❌ **Minion M42 needs help**

---
event: minion:failed
minion_id: M42
failure_reason: ci_failed_max_retries
attempts: 5
last_error: "Test suite timeout after 10 minutes"
---

I've tried fixing the test failures 5 times but keep hitting timeouts. 
Could you take a look at the CI logs? @human-reviewer

Advantages:

  • ✅ Immutable, ordered history
  • ✅ Accessible via REST and GraphQL
  • ✅ No external database needed
  • ✅ Human-readable in UI
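When replaying the timeline, the Lab needs to pull the machine-readable block back out of each comment. A minimal front-matter extractor (sketch; a real parser would use a YAML crate):

```rust
use std::collections::HashMap;

/// Extract `key: value` pairs from the first `---`-delimited block
/// inside a comment body (sketch; no nested YAML support).
fn parse_event_block(comment: &str) -> Option<HashMap<String, String>> {
    let mut lines = comment.lines().skip_while(|l| l.trim() != "---");
    lines.next()?; // consume the opening ---
    let mut fields = HashMap::new();
    for line in lines {
        if line.trim() == "---" {
            return Some(fields);
        }
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    None // unterminated block
}
```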

Draft PR as Lock Mechanism

Workflow

  1. Lab claims issue → add claimed label, post claim comment
  2. Lab creates branchminion/issue-123-M42
  3. Lab immediately creates draft PR → Title: [DRAFT] Fixes #123
  4. If PR creation succeeds → Lab owns the issue, proceed with work
  5. If PR creation fails (duplicate branch) → Another Lab won, abort gracefully
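The claim race in steps 4-5 reduces to a small decision function. A hedged sketch, not Gru's actual code — the two closures stand in for the GitHub API calls, which are assumed to fail when the branch or draft PR already exists:

```rust
#[derive(Debug, PartialEq)]
enum Claim {
    Won,      // this Lab owns the issue
    LostRace, // another Lab created the branch/PR first — abort gracefully
}

/// Branch + draft-PR creation act as the lock: GitHub rejects duplicates,
/// so the first Lab to complete both calls wins.
fn try_claim<FB, FP>(create_branch: FB, create_draft_pr: FP) -> Claim
where
    FB: FnOnce() -> Result<(), String>,
    FP: FnOnce() -> Result<u64, String>,
{
    if create_branch().is_err() {
        return Claim::LostRace;
    }
    match create_draft_pr() {
        Ok(_pr_number) => Claim::Won,
        Err(_) => Claim::LostRace,
    }
}
```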

Commit Strategy

Rule of thumb: Minion commits when:

  • Set of related changes complete AND
  • CI passes (tests green)

Benefits:

  • ✅ Checkpoints progress (can resume on crash)
  • ✅ Incremental review possible
  • ✅ CI runs on each checkpoint
  • ✅ Clear history of Minion's work

Commit Message Format:

[minion:M42] Add user authentication

- Implement login endpoint
- Add JWT token generation
- Add password hashing

Tests: ✓ (12 passed)

GitHub Actions Integration

Lab Responsibilities

  • Trigger workflows via repository dispatch or workflow dispatch
  • Monitor check runs via GitHub Checks API
  • React to failures by fetching logs and attempting fixes

Minion Workflow

  1. Push commit to branch
  2. GitHub Actions workflow triggers automatically
  3. Minion polls check runs status
  4. On success → proceed to next task or mark ready for review
  5. On failure → fetch failure logs, analyze, attempt fix, commit retry

Advantages over Local Testing

  • ✅ No local test dependencies/setup
  • ✅ Proper isolation (containers/VMs)
  • ✅ Reuses existing CI configuration
  • ✅ Matches human developer workflow
  • ✅ No resource limits on Lab host

Issue Dependency DAG (Future)

Support dependencies via issue body metadata:

## Dependencies
- depends-on: #123
- blocks: #456

Implementation:

  • Parse issue body on claim
  • Check dependency status before starting
  • Wait if dependencies not resolved
  • Add blocked label with reason

Projects v2 Integration (Future)

When multi-Lab or advanced tracking needed:

Custom Fields

  • Status (single-select): Ready | Claimed | In Progress | Review | Done | Failed
  • Minion ID (text)
  • Lab (text)
  • Cost (number) - LLM tokens/dollars
  • Started (date)
  • Retries (number)

Advantages

  • Visual Kanban board
  • Rich querying capabilities
  • Better UX for filtering/sorting
  • Built-in cost tracking

Note: Defer until single-Lab proven and multi-Lab coordination needed.


File Layout

~/.gru/
  repos/
    <owner>/
      <repo>.git                    # Bare repository mirror
  work/
    <owner>/
      <repo>/
        <MINION_ID>/                # Git worktree for active Minion
          .git                      # Worktree metadata
          <repo files>              # Working copy
  archive/
    <MINION_ID>/
      events.jsonl                  # Structured event log
      plan.md                       # Minion's execution plan
      commits.log                   # Git commit history
      ci-results.json               # CI check run results
  state/
    minions.db                      # SQLite: active Minions state
    cursors.json                    # GitHub timeline cursors per issue
  config.yaml                       # Lab configuration

Simplified Lifecycle (V1)

Issue Claim

  1. Poll GitHub for issues with gru:todo label
  2. Select highest priority — four labeled tiers (priority:critical > priority:high > priority:medium > priority:low; unlabeled falls between medium and low); ties broken oldest-first (FIFO)
  3. Add gru:in-progress label, remove gru:todo
  4. Post structured claim comment with Minion ID, timestamp
  5. Create branch minion/issue-123-M42
  6. Create draft PR immediately

Minion Work Loop

  1. Read issue description and comments
  2. Generate execution plan
  3. Implement changes in worktree
  4. Run local validation (lint, type check)
  5. Commit changes
  6. Push to branch → triggers CI
  7. Wait for CI results
  8. If CI passes → continue or mark ready for review
  9. If CI fails → analyze logs, attempt fix, goto step 4
  10. Max retries exceeded → add gru:failed label, request human help

PR Submission

  1. Convert draft PR to ready for review
  2. Post summary comment with:
    • Changes made
    • Test results
    • Cost estimate (tokens used)
    • Confidence score
  3. Subscribe to PR review events
  4. Monitor for review comments and check failures

Post-PR Monitoring

  1. Poll PR for new review comments
  2. Respond to review feedback:
    • Simple changes → implement and push
    • Unclear requests → ask clarifying questions
    • Complex refactors → create handoff for human
  3. Monitor check runs for failures
  4. On merge → add gru:done, archive logs, cleanup worktree

Error Handling

Retry Strategy

  • Flaky tests → retry up to 3 times with exponential backoff
  • CI failures → analyze logs, attempt fix, max 5 iterations
  • Rate limits → exponential backoff, switch to lower priority work
  • Network errors → retry with jitter
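The backoff-with-jitter policy can be sketched as a deterministic helper. `rand01` (a uniform sample in [0, 1]) is injected so the function stays testable; a real caller would draw it from an RNG. This illustrates "full jitter", one common variant — the document doesn't pin down which variant Gru uses:

```rust
/// Exponential backoff: base * 2^attempt, capped at `cap_ms`,
/// then scaled by full jitter.
fn backoff_delay_ms(attempt: u32, base_ms: u64, cap_ms: u64, rand01: f64) -> u64 {
    // Saturate instead of overflowing for large attempt counts.
    let exp = base_ms.saturating_mul(1u64.checked_shl(attempt).unwrap_or(u64::MAX));
    let capped = exp.min(cap_ms);
    (capped as f64 * rand01).round() as u64
}
```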

Escalation

After exhausting retries:

  1. Add gru:failed label
  2. Post detailed failure report comment
  3. Tag human for assistance
  4. Park Minion in paused state (don't clean up)
  5. Human can resume via attach session or abandon

Observability

Structured Events (events.jsonl)

{"event":"claimed","minion_id":"M42","issue":123,"timestamp":"2025-01-30T12:34:56Z"}
{"event":"plan_generated","tokens":450,"plan":"..."}
{"event":"commit","sha":"abc123","message":"Add auth","tests_passed":true}
{"event":"ci_triggered","workflow":"test","run_id":123456}
{"event":"ci_passed","duration_ms":45000}
{"event":"pr_created","pr_number":789,"draft":false}
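Appending to events.jsonl is trivial once the JSON line is built. A std-only sketch — a real implementation would use serde_json; this one renders every value as a string and escapes only quotes and backslashes:

```rust
fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build one JSONL event line: an `event` field plus arbitrary extras.
fn event_line(event: &str, extras: &[(&str, &str)]) -> String {
    let mut line = format!("{{\"event\":\"{}\"", json_escape(event));
    for (k, v) in extras {
        line.push_str(&format!(",\"{}\":\"{}\"", json_escape(k), json_escape(v)));
    }
    line.push('}');
    line
}
```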

Metrics to Track

  • Issues claimed per hour
  • Time to first commit
  • Time to PR submission
  • CI pass rate
  • Review response time
  • Cost per issue (LLM tokens)
  • Success rate (merged vs abandoned)

Security

Token Scoping

  • Lab GitHub token requires: repo, workflow, read:org
  • Store in ~/.gru/config.yaml with restricted permissions (0600)

Sandbox Considerations

  • CI runs in GitHub Actions (already isolated)
  • Local worktrees isolated per Minion
  • No network access during local validation (future: use network namespace)

Secrets Handling

  • Never commit secrets (pre-commit hook checks)
  • Minion has no access to repo secrets (only CI does)
  • Redact sensitive data from logs and comments
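Redaction before posting can start as simple prefix filtering. A minimal sketch — the token prefixes below are illustrative examples, not an exhaustive or official list, and a production version would use proper secret-scanning patterns:

```rust
/// Replace whitespace-delimited words that look like known token formats.
/// Note: collapses runs of whitespace, which is acceptable for log text.
fn redact(text: &str) -> String {
    text.split_whitespace()
        .map(|w| {
            if w.starts_with("ghp_") || w.starts_with("sk-ant-") {
                "[REDACTED]"
            } else {
                w
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```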

Future Optimizations

Event-Driven Architecture

  • Replace polling with GitHub webhooks
  • Labs listen for issues.labeled, pull_request.review_requested, check_run.completed
  • Reduce API calls and latency

Caching & RAG

  • Local embedding index of codebase for semantic search
  • Cache GitHub API responses with ETags
  • Reuse test results across similar changes

Multi-Lab Coordination

  • First-PR-wins for conflict resolution
  • Heartbeat comments for liveness detection
  • Stale issue reclamation (after 1 hour no activity)

Cost Optimization

  • Model selection based on task complexity (Haiku for simple, Sonnet for complex)
  • Prompt caching for repeated codebase context
  • Incremental context updates (don't resend entire codebase)

Design Constraints

What We're NOT Building (Yet)

  • ❌ Multi-Lab distributed locking
  • ❌ Real-time collaboration between Minions
  • ❌ Custom test execution environments
  • ❌ Local LLM support (cloud-only for V1)
  • ❌ Web UI (Tower deferred to V2)
  • ❌ Slack/notifications
  • ❌ Learning from past PRs
  • ❌ Code review quality scoring

V1 Scope

  • ✅ Single Lab, local execution
  • ✅ Multi-repo support (one Lab watches multiple repos)
  • ✅ Simple label-based state machine (3 states + done/failed)
  • ✅ GitHub Actions for CI
  • ✅ Local testing via pre-commit hooks
  • ✅ Draft PR workflow
  • ✅ Basic error handling and retries
  • ✅ Event log in comments + local files
  • ✅ CLI-only interface (gru lab)
  • ✅ Manual issue prioritization
  • ✅ No SQLite (in-memory state, file-based cursors)

Additional V1 Design Decisions

State Management

No SQLite database:

  • In-memory state for active Minions
  • Simple JSON file for timeline cursors (~/.gru/state/cursors.json)
  • Recovery on restart: check Minion registry, fetch issue state from GitHub
  • Archive logs to disk for completed/failed Minions

Labels (simplified to 3 states):

  • gru:todo → gru:in-progress → gru:done / gru:failed
  • No claimed intermediate state (goes directly to in-progress)
  • Detailed state (review, blocked, testing) in YAML comment events
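The simplified state machine can be validated with a single match. A sketch — the transition set is inferred from the labels described here and should be treated as illustrative:

```rust
/// Allowed label transitions in the simplified V1 state machine:
/// todo → in-progress, then in-progress → done or failed.
fn valid_transition(from: &str, to: &str) -> bool {
    matches!(
        (from, to),
        ("gru:todo", "gru:in-progress")
            | ("gru:in-progress", "gru:done")
            | ("gru:in-progress", "gru:failed")
    )
}
```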

Minion lifecycle:

  • Stay alive indefinitely until PR merged/closed
  • No timeout for inactive PRs (occupies slot but ensures responsiveness)
  • Failed Minions stay alive for debugging (gru attach to inspect)
  • Orphaned Minions (issue closed while running) marked as Orphaned state, kept alive

Testing & CI

Local testing via pre-commit hooks:

  • Lab runs repo-init to install git hooks
  • Pre-commit hook runs tests automatically before allowing commit
  • Minion commits frequently (each logical unit of work)
  • Tests run automatically, blocking bad commits
  • GitHub Actions runs as secondary verification

CI monitoring (30s poll interval):

  • Poll check runs every 30 seconds
  • High retry limit (10-15 attempts) before escalating
  • On max retries: pause (not fail), request human review
  • Minion monitors: failed checks, pending checks, stale branch, merge conflicts

Conflict resolution:

  • Minion attempts to resolve merge conflicts
  • Runs tests locally to verify resolution
  • Only pushes if tests pass
  • If tests fail after resolution: pause and request human help

Minion Behavior

Review autonomy:

  • Maximally autonomous - implements changes, answers questions, refactors
  • Can decline suggestions with reasoning
  • Can create follow-up issues for out-of-scope work (creates immediately, links in comment)
  • Only escalates when truly stuck

Minion ID format:

  • Sequential base36 with padding: M001, M002, ..., M00z, M010, ..., Mzzz
  • Compact, human-readable, sortable
  • Monotonic counter stored in ~/.gru/state/next_id.txt
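The ID scheme is straightforward with std only. A sketch of the encoder — uppercase "M" prefix plus three lowercase base36 digits, as in the examples above:

```rust
/// Format a monotonic counter as a padded base36 Minion ID:
/// 1 -> "M001", 35 -> "M00z", 36 -> "M010", 46655 -> "Mzzz".
fn format_minion_id(mut n: u32) -> String {
    const DIGITS: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyz";
    let mut buf = [b'0'; 3];
    for slot in buf.iter_mut().rev() {
        *slot = DIGITS[(n % 36) as usize];
        n /= 36;
    }
    format!("M{}", std::str::from_utf8(&buf).unwrap())
}
```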

Branch management:

  • Format: minion/issue-<number>-<minion-id>
  • Examples: minion/issue-123-M007, minion/issue-456-M00a
  • Branches from repository default branch (main/master/develop, detected via API)
  • On PR merge: delete both local and remote branch

Branch naming logic:

fn generate_branch_name(issue_number: i32, minion_id: &str) -> String {
    format!("minion/issue-{}-{}", issue_number, minion_id)
}

Configuration

Tokens in environment variables only:

export GRU_GITHUB_TOKEN="ghp_..."
export ANTHROPIC_API_KEY="sk-ant-..."

Config file for non-sensitive settings:

# ~/.gru/config.yaml
repos:
  - owner/repo1
  - owner/repo2

lab:
  slots: 2
  poll_interval: 30s

Multi-repo support:

  • One Lab instance watches all configured repos
  • Slots shared across all repos
  • Scheduler prioritizes across repos

Global config only (V1):

  • Per-repo overrides deferred to V2
  • Single ~/.gru/config.yaml

Draft PR Workflow

Initial creation:

  • Title: [DRAFT] Fixes #123: <issue title>
  • Body: Template with "🤖 Minion M042 is working on this..."
  • Created immediately after branch creation

Ready for review:

  • Title: Fixes #123: <descriptive title> (remove DRAFT prefix)
  • Body: Updated with proper description, changes, approach
  • Convert from draft to ready

Prompt Configuration

Minimal initial context:

  • Issue description only
  • Working directory and branch name
  • Guidelines about committing and testing
  • Everything else (README, CONTRIBUTING, git history) available in worktree

Commit guideline:

  • "Commit after each logical unit of work (tests run automatically via pre-commit hook)"

Archives

Retention policy:

  • Keep forever by default (V1)
  • Just text files, minimal space
  • User can manually clean ~/.gru/archive/
  • Add retention config later if needed

CLI Output

Human-readable by default:

$ gru minions list

ID    Issue  Repo          State        Uptime    Commits
───────────────────────────────────────────────────────────
M001  #123   owner/repo    in-progress  15m 32s   2
M002  #456   owner/other   review       2h 14m    5

Add --json flag in future if needed for scripting.

Onboarding

gru init command:

First-time setup wizard that:

  1. Creates ~/.gru/ directory structure
  2. Generates template config.yaml with comments
  3. Checks for required environment variables (GRU_GITHUB_TOKEN, ANTHROPIC_API_KEY)
  4. Validates GitHub token scopes (requires repo, workflow)
  5. Tests GitHub API connectivity
  6. Optionally clones/mirrors configured repos
  7. Sets up git config (user.name, user.email for commits)

Example flow:

$ gru init

🤖 Gru Setup Wizard

Checking environment variables...
✓ GRU_GITHUB_TOKEN found
✓ ANTHROPIC_API_KEY found

Validating GitHub token...
✓ Token has required scopes: repo, workflow
✓ Connected to GitHub as: username

Creating directory structure...
✓ Created ~/.gru/repos
✓ Created ~/.gru/work
✓ Created ~/.gru/archive
✓ Created ~/.gru/state
✓ Created ~/.gru/logs

Generated config file: ~/.gru/config.yaml
Edit this file to configure repositories and settings.

Repository setup:
? Clone repositories now? (y/n): y
✓ Cloned owner/repo1
✓ Cloned owner/repo2

✓ Setup complete! Run 'gru lab' to start.

Subsequent runs:

  • gru lab checks if ~/.gru/ exists
  • If missing, suggests running gru init first
  • Auto-creates missing subdirectories if root exists

Self-Review Strategy: Prompt-Based over Structured Loop

Question: How should Minions self-review their work before opening a PR?

Answer: Prompt-based self-review (Option A) — defer structured enforcement loop (Option B)

Context: Issue #515 proposed two main options for self-review:

  • Option A (Prompt-based): Add review instructions to the task prompt, relying on the code-reviewer agent
  • Option B (Structured loop): Add a dedicated review phase in the orchestration layer with DONE/ITERATE gating

Issue #655 revisited #515 after observing real-world Minion behavior and found Option A already working well.

Data (from #655, estimated from manual audit of Minion runs):

  • ~92% of Minion runs invoke the code-reviewer agent via prompt instructions
  • When consumed, 100% of reviews find actionable issues (~63% high-priority)
  • Minions consistently address review findings when they read them
  • The ~8% that skip review have legitimate reasons (duplicate issues, mechanical changes, post-review fixups)

Decision: Option A is sufficient. The prompt-based approach in src/prompt_loader.rs (Section 4: Code Review) already achieves high review rates without orchestration complexity. The main gap was an async fire-and-forget problem where reviews were triggered but not consumed — a prompt-level fix, not an orchestration change. Role guardrails for the review prompt were addressed in #649.

Why not Option B:

  • Adds orchestration complexity (DONE/ITERATE parsing, iteration caps, extra agent calls)
  • 2-3x token cost increase per Minion
  • Solves a problem that prompt engineering handles at ~92% rate
  • The remaining gap is addressable by improving prompt reliability, not adding a structured loop
  • Option B is additive and can be implemented incrementally if needed later

Revisit when:

  • Prompt-based review rate drops below 80%
  • Review quality degrades (reviews stop finding actionable issues)
  • Multi-agent workflows require explicit review gating

See: #515 (original proposal), #655 (data-driven update), #649 (review role guardrails)


Open Questions (Deferred)

  1. Comment rate limiting: How often should Minions post progress updates? (Freely vs batched vs significant events only)
  2. Cost limits: Max tokens per issue before pausing? (Default: unlimited for V1?)
  3. Per-repo config overrides: When to add support?
  4. Archive retention: When to add configurable cleanup?

References

Competitive Landscape

Last updated: 2026-03-21

This document tracks Gru's competitive positioning over time. Each section captures a point-in-time snapshot so we can see how the landscape evolves.


March 2026 Update

Added: 2026-03-21

Five new projects researched and added: Symphony (OpenAI), Barnum, Cook, Paperclip, and GitHub Spec-Kit. The space has exploded — OpenAI and GitHub have both entered, and community projects are attracting massive attention (Paperclip: 31k stars in 3 weeks).

Market Overview (March 2026)

The market has expanded beyond just "agent orchestrators" into several distinct categories:

  1. Agent Orchestrators: Tools that run multiple AI agents in parallel (Conductor, Emdash, Symphony, Gru)
  2. Task Executors: Tools that enhance individual agent task quality (Cook, Barnum)
  3. Specification Tools: Tools that generate specs/prompts for agents (Spec-Kit)
  4. Business Orchestrators: Platforms that coordinate agents toward business goals (Paperclip)
  5. Remote Control Layers: Mobile/remote interfaces for existing agents (Happy)
  6. IDE-Integrated Agents: Extensions and built-in coding assistants (Copilot, Cursor - not covered here)

New Entries (March 2026)

Symphony by OpenAI

GitHub: https://github.com/openai/symphony
Status: Engineering preview, 13.7k stars
License: Apache 2.0

What It Does

Long-running automation service that polls Linear for issues, creates isolated workspaces, and runs OpenAI Codex agents to autonomously implement work. Teams define agent behavior via WORKFLOW.md files (YAML front matter + Markdown prompt) checked into their repo.

Architecture

  • Language: Elixir/OTP (reference implementation); spec-first design encourages reimplementation in any language
  • Components: Workflow Loader, Issue Tracker Client (Linear), Orchestrator (poll loop), Workspace Manager, Agent Runner
  • Dashboard: Optional Phoenix LiveView web UI + JSON API
  • Concurrency: Configurable max concurrent agents (default 10), retry with exponential backoff

Business Model

  • Free, open source (Apache 2.0)
  • Drives OpenAI Codex adoption/revenue

Platform Support

  • Any platform with Elixir/Erlang runtime
  • SSH remote workers for distributed execution

Strengths

  • ✅ Spec-first design (portable across languages)
  • ✅ WORKFLOW.md — repo-owned prompt + config pattern
  • ✅ Built-in web dashboard (LiveView)
  • ✅ SSH remote workers for distributed execution
  • ✅ Strong community momentum (13.7k stars)
  • ✅ OpenAI brand backing

Weaknesses

  • ❌ Linear-only (no GitHub Issues, Jira)
  • ❌ Codex-only agent support
  • ❌ No GitHub-native workflow (doesn't use labels/PRs as state)
  • ❌ Explicitly "prototype software intended for evaluation only"
  • ❌ Elixir runtime requirement (niche language)
  • ❌ No built-in PR lifecycle (agent handles all writes)

Overlap with Gru

🔴 HIGH - Closest architectural match among new competitors:

  • Both: Autonomous polling daemon, per-issue workspace isolation, stuck detection, retry/backoff
  • Different: Symphony requires Linear + Codex; Gru uses GitHub as state store + supports multiple backends
  • Gru advantage: Full PR lifecycle (review response, CI fix, merge), single binary, GitHub-native
  • Symphony advantage: Spec-first portability, web dashboard, SSH remote workers

Barnum

GitHub: https://github.com/barnum-circus/barnum
Status: Early stage, 2 contributors
License: MIT

What It Does

Task queue orchestrator for AI agents using type-safe state machines. Workflows are defined as JSON configs with explicit states and transitions. Uses "progressive disclosure" — each agent step receives only the context it needs.

Architecture

  • Language: Rust monorepo
  • Components: Barnum CLI (orchestrator), Task Queue Library (state machines), Troupe (daemon managing agent pools)
  • Protocol: File-based dispatch between orchestrator and agent workers

Business Model

  • Free, open source (MIT)

Platform Support

  • Cross-platform (Rust binary), also distributed via npm/pnpm

Strengths

  • ✅ Type-safe Rust with schema validation
  • ✅ Progressive disclosure limits agent context per step
  • ✅ Persistent worker pools (no cold-start costs)
  • ✅ Explicit state transition logging and auditability

Weaknesses

  • ❌ Very early (2 contributors, brand new)
  • ❌ Generic orchestration — no built-in GitHub/PR/issue awareness
  • ❌ Steeper learning curve (state machines + JSON config)
  • ❌ File-based protocol may limit scaling

Overlap with Gru

🟡 MODERATE - Different philosophy, some shared goals:

  • Both: Rust, multi-agent parallelism, workspace isolation
  • Different: Barnum is domain-agnostic workflow engine; Gru is opinionated GitHub lifecycle manager
  • Barnum could theoretically serve as an orchestration layer under Gru, but has no built-in git/GitHub/PR understanding

Cook

Website: https://rjcorwin.github.io/cook/
GitHub: https://github.com/rjcorwin/cook
Status: Early stage (293 stars), no license file
License: None specified (legally ambiguous)

What It Does

CLI for orchestrating Claude Code, Codex, and OpenCode using composable operators. Chain iteration, parallelization, and quality gates: cook "Add dark mode" review v3 "cleanest result" races 3 implementations with review loops, then picks the best.

Core Operators

  • Loop: xN (repeat), review (critique-and-refine), ralph (task-list progression)
  • Composition: vN (race N approaches), vs (compare strategies)
  • Resolvers: pick, merge, compare

Architecture

  • Language: TypeScript/Node.js
  • Isolation: Git worktrees for parallel branches
  • Install: npm global or as a Claude Code skill

Business Model

  • Free, open source (but no license file)

Platform Support

  • macOS, Linux, Windows via Node.js

Strengths

  • ✅ Elegant composable operator syntax
  • ✅ Multi-agent support (Claude, Codex, OpenCode)
  • ✅ Parallel racing for quality (vN operator)
  • ✅ Built-in review loops
  • ✅ Can embed as a Claude Code skill

Weaknesses

  • ❌ No issue/project management or PR lifecycle
  • ❌ Stateless — no daemon mode, registry, or tracking
  • ❌ No license file (legally risky)
  • ❌ Very new (3 weeks old)
  • ❌ Requires human to initiate each task

Overlap with Gru

🟢 LOW - Complementary, not competitive:

  • Cook is a "task executor with quality multipliers" (racing, review loops)
  • Gru is an "autonomous project worker" (full issue-to-merge lifecycle)
  • Cook's racing/review operators could be interesting to integrate into Gru's agent execution layer

Paperclip

GitHub: https://github.com/paperclipai/paperclip
Status: Early stage (31k stars in 3 weeks)
License: MIT

What It Does

Orchestration platform for coordinating multiple AI agents to run autonomous businesses. Models companies with org charts, budgets, goals, governance, and agent coordination. Tagline: "If OpenClaw is an employee, Paperclip is the company."

Architecture

  • Language: TypeScript (96.7%)
  • Backend: Node.js server with embedded or external PostgreSQL
  • Frontend: React UI dashboard
  • Agent support: OpenClaw, Claude Code, Codex, Cursor, Bash, HTTP agents

Key Features

  • Hierarchical goal alignment (goals cascade through tasks)
  • Per-agent monthly budgets with automatic throttling
  • Multi-company data isolation
  • Governance (approval gates, config versioning, rollback)
  • Built-in ticket system with full audit logs

Business Model

  • Free, open source (MIT)
  • ClipMart (coming soon) — marketplace for pre-built company templates

Platform Support

  • Linux/macOS/Windows via Node.js + Docker
  • Mobile-responsive web UI

Strengths

  • ✅ Explosive community growth (31k stars in 3 weeks)
  • ✅ Agent-agnostic (BYO agent)
  • ✅ Cost tracking with per-agent budget enforcement
  • ✅ Governance and audit capabilities
  • ✅ Simple onboarding (npx paperclipai onboard --yes)

Weaknesses

  • ❌ No GitHub-native workflow (own ticket system, no PR lifecycle)
  • ❌ Overkill for single-agent or small coding setups
  • ❌ Requires PostgreSQL
  • ❌ Not a code review tool (explicitly stated)
  • ❌ Very early (3 weeks old, 917 open issues)

Overlap with Gru

🟢 LOW - Different abstraction levels:

  • Paperclip is a "company orchestrator" for business goals across many agent types
  • Gru is a "coding agent orchestrator" for GitHub issue lifecycle
  • Paperclip could theoretically use Gru as one of its agents
  • Paperclip's cost tracking/budgets are a feature Gru lacks

GitHub Spec-Kit

GitHub: https://github.com/github/spec-kit
Status: Shipped, backed by GitHub
License: MIT

What It Does

"Spec-Driven Development" toolkit that guides developers through a structured workflow: Constitution → Specify → Plan → Tasks → Implement. Specifications become executable — they generate working implementations via AI coding agents.

Architecture

  • Language: Python (installed via uv)
  • Interface: CLI with slash commands (/speckit.specify, /speckit.plan, etc.)
  • Template System: 3-tier hierarchy (Core → Extensions → Presets → Project overrides) in .specify/ directories
  • Agent Support: 25+ agents (Claude Code, Copilot, Cursor, Gemini, Windsurf, Codex, etc.)

Business Model

  • Free, open source (MIT, Copyright GitHub, Inc.)

Platform Support

  • Cross-platform (Linux, macOS, Windows)
  • Enterprise/air-gapped deployment via wheel bundles

Strengths

  • ✅ Agent-agnostic (25+ supported agents)
  • ✅ Backed by GitHub (credibility, ecosystem integration path)
  • ✅ Strong specification methodology
  • ✅ Highly customizable (presets, extensions, overrides)
  • ✅ Enterprise-ready (air-gapped install, proxy support)

Weaknesses

  • ❌ Not an orchestrator — requires human to drive each step
  • ❌ No PR lifecycle management
  • ❌ No issue-to-task pipeline or autonomous claiming
  • ❌ No daemon/polling mode
  • ❌ Stateless — no persistent tracking

Overlap with Gru

🟢 LOW - Different layer entirely:

  • Spec-Kit answers "how do I write a good spec for an AI agent?"
  • Gru answers "how do I autonomously manage AI agents working on GitHub issues?"
  • Potentially complementary: Spec-Kit could generate specs that Gru's Minions implement
  • Risk: If GitHub extends Spec-Kit into full autonomous execution via GitHub Actions

Competitive Positioning Matrix (March 2026)

| Feature | Gru | Conductor | Emdash | Symphony | Cook | Barnum | Paperclip | Spec-Kit | Happy |
|---|---|---|---|---|---|---|---|---|---|
| Platform | Cross-platform | Mac only | Cross-platform | Cross-platform | Cross-platform | Cross-platform | Cross-platform | Cross-platform | Mobile |
| Open Source | ✅ MIT/Apache | ❓ No license | ✅ Yes | ✅ Apache 2.0 | ❓ No license | ✅ MIT | ✅ MIT | ✅ MIT | ✅ Yes |
| Pricing | Free | Free | Free | Free | Free | Free | Free | Free | Free |
| Agent Model | Persistent (PR lifecycle) | One-shot (PR creation) | Task-based | Autonomous (poll loop) | Composable operators | State machine tasks | Goal-aligned agents | Human-driven specs | Manual control |
| Post-PR Handling | ✅ Reviews, CI fixes | ❌ Done after PR | ❌ Task complete | ❌ Agent handles all | ❌ None | ❌ None | ❌ None | ❌ None | N/A |
| GitHub Integration | GitHub as database | GitHub-native | GitHub/Linear/Jira | ❌ Linear only | ❌ None | ❌ None | ❌ Own ticket system | ❌ None | None |
| Architecture | Single binary | Desktop app | Desktop app | Elixir/OTP service | Node.js CLI | Rust CLI + daemon | Node.js + Postgres | Python CLI | Mobile app |
| Multi-provider | Claude + Codex | Claude only | ✅ 15+ providers | Codex only | Claude/Codex/OpenCode | Agent-agnostic | ✅ BYO agent | ✅ 25+ agents | Claude only |
| Daemon/Polling | ✅ Lab mode | — | — | ✅ Poll loop | — | ✅ Troupe daemon | ✅ Heartbeat | — | N/A |
| Web Dashboard | Tower (Phase 3+) | Desktop only | Desktop only | ✅ LiveView | — | — | ✅ React UI | — | ✅ Mobile |
| Server Deployment | ✅ Headless Labs | ❌ Desktop app | ❌ Desktop app | ✅ Headless | — | ✅ Headless | ✅ Docker | — | N/A |
| Cost Tracking | — | — | — | — | — | — | ✅ Per-agent budgets | — | N/A |
| Maturity | V1 | Shipped | Shipped | Preview | Early | Early | Early | Shipped | Shipped |


December 2025 Analysis

Original analysis from 2025-12-13. Covered: Conductor, Emdash, Happy.

Direct Competitors (Dec 2025)

Conductor by Melty Labs

Website: https://conductor.build/
GitHub: https://github.com/ryanmac/code-conductor
Status: Shipped, Y Combinator backed

What It Does

Run multiple Claude Code agents in parallel with GitHub-native orchestration. Each agent works in isolated git worktrees to avoid merge conflicts.

Business Model

  • Free - No separate pricing
  • Uses your existing Claude subscription (Pro, Max, or API)
  • Likely VC-funded with future enterprise/cloud monetization

Platform Support

  • Mac only (requires Apple Silicon)
  • Desktop application

Open Source Status

  • GitHub repository is public but has no LICENSE file
  • Technically source-available but not legally open source
  • Cannot be legally forked/modified/distributed without permission

Strengths

  • ✅ First mover advantage in this niche
  • ✅ Y Combinator backing (resources, credibility)
  • ✅ Polished desktop app UX
  • ✅ Free to use
  • ✅ Already has production users

Weaknesses

  • ❌ Mac-only (excludes Linux/Windows users)
  • ❌ Unclear licensing (no LICENSE file)
  • ❌ Appears to be one-shot execution (PR created, agent done)
  • ❌ Desktop app only (no headless/server deployment)
  • ❌ Requires Apple Silicon (excludes Intel Macs)

Overlap with Gru

🔴 HIGH - This is our closest competitor. Both tools:

  • Orchestrate multiple AI agents on GitHub issues
  • Use git worktrees for isolation
  • Aim for autonomous claim → implement → PR workflow
  • Support parallel execution

Emdash by General Action

Website: https://www.emdash.sh/
GitHub: https://github.com/generalaction/emdash
Status: Shipped, open source

What It Does

Coding agent orchestration layer that supports 15+ AI CLI tools (Claude Code, Qwen Code, Amp, Codex, etc.) with parallel execution in git worktrees.

Business Model

  • Open source (appears to be MIT/Apache based on GitHub)
  • Free to use
  • Monetization model unclear

Platform Support

  • Cross-platform (Windows, Mac, Linux)
  • Packaged as desktop app (.exe, .dmg, .AppImage, .deb)

Provider Support

  • Multi-provider - 15+ different AI CLIs supported
  • Provider-agnostic architecture
  • Users can choose models per task

Strengths

  • ✅ Multi-provider support (not locked to Claude)
  • ✅ Cross-platform
  • ✅ Appears to be truly open source
  • ✅ Good diff review UI
  • ✅ Linear, GitHub, Jira integration

Weaknesses

  • ❌ Desktop app packaging complexity
  • ❌ Task-based model (not persistent agents)
  • ❌ More complex setup (supports many providers)

Overlap with Gru

🟡 MODERATE - Similar orchestration goals but different architecture philosophy:

  • Both: Git worktrees, parallel execution, GitHub integration
  • Different: Emdash is multi-provider, Gru focuses on Claude with persistent agents

Adjacent Tools (Dec 2025)

Happy

Website: https://happy.engineering/
Status: Shipped, open source (npm install -g happy-coder)

What It Does

Mobile client for remotely controlling Claude Code instances running on your computers. Not an orchestrator itself, but a remote control layer.

Business Model

  • Free and open source
  • No monetization

Platform Support

  • Mobile app (iOS/Android)
  • Controls Claude Code on any machine (Mac, Linux, Windows)

Strengths

  • ✅ Mobile-first design
  • ✅ End-to-end encryption
  • ✅ Multiple concurrent sessions
  • ✅ Remote workflow ("fire off tasks while away from desk")

Weaknesses

  • ❌ Not an orchestrator (requires manual task assignment)
  • ❌ No GitHub issue queue management
  • ❌ No autonomous claiming/execution

Overlap with Gru

🟢 LOW - Complementary, not competitive. Happy is a UI layer; Gru is an orchestrator. Could potentially use Happy to monitor Gru Labs remotely.


Competitive Positioning Matrix (Dec 2025)

| Feature | Gru | Conductor | Emdash | Happy |
|---|---|---|---|---|
| Platform | Cross-platform | Mac only | Cross-platform | Mobile (controls any) |
| Open Source | ✅ MIT/Apache | ❓ No license | ✅ Yes | ✅ Yes |
| Pricing | Free | Free | Free | Free |
| Agent Model | Persistent (PR lifecycle) | One-shot (PR creation) | Task-based | Manual control |
| Post-PR Handling | ✅ Reviews, CI fixes | ❌ Done after PR | ❌ Task complete | N/A |
| GitHub Integration | GitHub as database | GitHub-native | GitHub/Linear/Jira | None (controls Claude) |
| Architecture | Single binary (Lab+Tower) | Desktop app | Desktop app | Mobile app |
| Multi-provider | Claude focused (P1), pluggable later (P3+) | Claude only | ✅ 15+ providers | Claude only |
| Remote UI | Tower (Phase 3+) | Desktop only | Desktop only | ✅ Mobile |
| Server Deployment | ✅ Headless Labs | ❌ Desktop app | ❌ Desktop app | N/A |
| Maturity | V1 (feature-complete) | Shipped | Shipped | Shipped |

Gru's Strategic Advantages (Dec 2025)

1. 🎯 Persistent Minions (Killer Feature)

What it means: Gru's agents stay alive after creating a PR to handle:

  • Review comment responses
  • Failed CI check fixes
  • Feedback iterations
  • Multiple review rounds

Why it matters: Competitors treat agents as one-shot tasks. Gru treats them as autonomous collaborators for the full PR lifecycle.

Marketing message: "Autonomous collaborators, not just code generators"


2. ✅ Cross-Platform

What it means: Gru works on Mac, Linux, Windows—anywhere Rust compiles.

Why it matters:

  • Conductor is Mac-only (excludes ~70% of developers)
  • Emdash requires desktop environment
  • Gru can run headless on servers ($5/month VPS vs. $2000 Mac)

User stories:

  • "Run Gru Labs on cheap Linux VPS"
  • "Works on your company's Windows laptop"
  • "Deploy in Docker containers"

3. ✅ Clear Open Source License

What it means: MIT or Apache 2.0 license (assuming we choose this).

Why it matters:

  • Conductor has no LICENSE file (legally murky)
  • Enterprises need clear licensing for compliance
  • Developers can fork, modify, contribute freely

Marketing message: "Fully open source with clear MIT/Apache licensing, no gray areas"


4. ✅ GitHub as Database

What it means: No separate database—issues are queue, labels are state, PRs are results.

Why it matters:

  • Simpler architecture (no PostgreSQL/Redis to maintain)
  • Inspectable state (everything visible in GitHub)
  • Rebuild state from GitHub on restart
  • No vendor lock-in

User benefit: "Your orchestration state is just GitHub—no black box database"
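The label-driven state can be sketched as a tiny state machine. The label names come from Gru's docs; the transition events below (claim, merge, fail) are illustrative assumptions, not Gru's actual code:

```shell
# Hypothetical sketch of Gru's label-based state machine. Labels (gru:todo,
# gru:in-progress, gru:done, gru:failed) are from the docs; the events are
# illustrative assumptions.
advance() {
  case "$1/$2" in
    "gru:todo/claim")        echo "gru:in-progress" ;;
    "gru:in-progress/merge") echo "gru:done" ;;
    "gru:in-progress/fail")  echo "gru:failed" ;;
    *) echo "illegal transition: $2 from $1" >&2; return 1 ;;
  esac
}

advance "gru:todo" "claim"   # prints gru:in-progress
```

Because the current label on each issue *is* the orchestrator's state for it, restarting a Lab is just re-reading labels from GitHub, with no local database to reconcile.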


5. ✅ Lab + Tower Architecture

What it means: Labs run autonomously; Tower is optional stateless UI relay.

Why it matters:

  • Labs work offline (local-first principle)
  • Tower can crash without affecting Labs
  • Multiple Labs can coordinate via GitHub
  • Remote access without VPN (Labs dial out to Tower)

Unique capability: "Multi-Lab coordination across machines/teams"


6. ✅ Single Binary

What it means: gru lab, gru tower, gru fix all from one executable.

Why it matters:

  • Simple installation (cargo install gru or download binary)
  • No desktop app packaging complexity
  • Works in CI/CD, scripts, automation
  • Easy to update

Developer benefit: "One command to install, no app store required"


Competitive Threats (Updated March 2026)

1. 🔴 Conductor Has First-Mover Advantage (Dec 2025)

  • Already shipped and has users
  • Y Combinator visibility and resources
  • Refined UX from user feedback
  • Production-tested

Mitigation:

  • Move fast to Phase 2 (autonomous orchestration)
  • Emphasize cross-platform and open source
  • Target Linux/Windows users Conductor can't serve

2. 🟡 Emdash Has Multi-Provider Support (Dec 2025)

  • Users may want choice in AI models
  • Cost optimization (cheaper models for simple tasks)
  • Avoid vendor lock-in to Anthropic

Mitigation:

  • Phase 3+ roadmap includes multi-provider support
  • Focus on doing Claude really well first
  • Plugin architecture for future extensibility

3. 🔴 Symphony Has OpenAI Backing (Mar 2026)

  • OpenAI brand and distribution (13.7k stars immediately)
  • Spec-first approach encourages ecosystem of implementations
  • WORKFLOW.md pattern is elegant and could become a standard
  • SSH remote workers enable distributed execution

Mitigation:

  • Symphony is Linear-only and Codex-only; Gru is GitHub-native with multiple backends
  • Symphony delegates PR lifecycle to agent; Gru manages it end-to-end
  • Symphony requires Elixir runtime; Gru is a single binary
  • Adopt good ideas: consider WORKFLOW.md-style repo-owned config

4. 🟡 All Tools Are Free (Dec 2025)

  • No pricing moat or defensibility
  • Competition is purely on features/UX
  • Hard to out-spend VC/corporate-backed competitors (YC, OpenAI, GitHub)

Mitigation:

  • Open source creates community moat
  • Cross-platform serves wider market
  • Persistent Minions = unique feature
  • Future: Hosted Tower convenience (optional paid tier)

5. 🟡 Paperclip Has Explosive Community Growth (Mar 2026)

  • 31k stars in 3 weeks signals massive interest in agent orchestration
  • Agent-agnostic model attracts broader audience
  • Cost tracking/budgets address a real gap in Gru

Mitigation:

  • Paperclip targets business orchestration, not coding agent lifecycle
  • No GitHub-native workflow or PR lifecycle
  • Gru could adopt cost tracking as a feature

Market Insights (Dec 2025)

Validated Demand

The market exists: Conductor and Emdash prove developers want GitHub agent orchestration.

Users will adopt: Production usage of competitors shows willingness to trust AI agents.

Free model works: All competitors are free, using existing Claude/AI subscriptions.

User Needs (Confirmed)

  1. Parallel execution: Tackle backlog with multiple agents simultaneously
  2. Merge conflict avoidance: Git worktree isolation is table stakes
  3. GitHub-native: Developers want orchestration integrated with existing workflow
  4. Simple setup: Must be easy to install and configure
  5. Visibility: Need to see what agents are doing (dashboard, logs, diffs)

Unmet Needs (Gru's Opportunity)

  1. Post-PR handling: No competitor does PR review response well
  2. Cross-platform: Conductor is Mac-only, Emdash requires desktop
  3. Server deployment: Run agents on cheap VPS, not expensive Mac
  4. Multi-Lab: Coordinate agents across machines/teams
  5. Clear licensing: Enterprises need MIT/Apache, not "no license"

Strategic Recommendations (Dec 2025)

Phase 1-2: Catch Up

Goal: Match Conductor's core orchestration by Q1 2026

Priorities:

  1. Issue claiming via labels (gru:todo)
  2. Git worktree management
  3. Parallel Minion execution
  4. PR creation workflow
  5. Basic status dashboard
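Priorities 1-3 rest on git's worktree mechanism. Here is a local sketch of that isolation, with a throwaway bare repo standing in for the ~/.gru/repos/ mirror (all paths and names hypothetical):

```shell
set -e
# Throwaway bare repo standing in for the GitHub mirror at ~/.gru/repos/
rm -rf /tmp/gru-demo && mkdir -p /tmp/gru-demo
git init --bare /tmp/gru-demo/repo.git
# Seed it with one commit so there is a branch to base worktrees on
git clone /tmp/gru-demo/repo.git /tmp/gru-demo/seed
git -C /tmp/gru-demo/seed -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial"
git -C /tmp/gru-demo/seed push origin HEAD:main
# One isolated worktree per claimed issue, each on its own branch
git -C /tmp/gru-demo/repo.git worktree add /tmp/gru-demo/issue-42 -b gru/issue-42 main
```

Each Minion gets its own worktree and branch, so parallel agents never trample each other's changes or the user's working directory.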

Messaging: Focus on cross-platform and open source


Phase 2-3: Differentiate

Goal: Ship features competitors don't have

Priorities:

  1. Persistent Minions (respond to reviews, fix CI)
  2. Tower UI (web-based, mobile-responsive)
  3. Multi-Lab (coordinate across machines)

Messaging: "Full PR lifecycle, not just code generation"


Phase 3+: Expand

Goal: Build moats through ecosystem

Priorities:

  1. Multi-provider support (match Emdash)
  2. Plugin architecture
  3. Hosted Tower convenience tier ($10-20/month optional)
  4. Enterprise features (SSO, audit, multi-org)

Messaging: "Open platform for autonomous development"


Positioning Statement (Dec 2025)

For developers who manage large GitHub backlogs, Gru is an autonomous agent orchestrator that handles the full PR lifecycle from issue claim to merge. Unlike Conductor (Mac-only, one-shot PRs) and Emdash (desktop app, task-based), Gru runs anywhere (Linux VPS, Windows, Mac), is fully open source, and its persistent Minions handle reviews, CI failures, and iterations, making them true autonomous collaborators.


Opportunities to Learn From Competitors (Updated March 2026)

From Conductor

  • ✅ UX patterns for agent orchestration
  • ✅ GitHub-native workflow design
  • ✅ Messaging around "self-managing agents"
  • ❓ How they handle edge cases (duplicate claims, stuck agents)

From Emdash

  • ✅ Multi-provider plugin architecture
  • ✅ Side-by-side diff UI patterns
  • ✅ Task assignment UX
  • ✅ Support for external trackers (Linear, Jira)

From Symphony

  • ✅ WORKFLOW.md as repo-owned prompt + runtime config (versioned with code)
  • ✅ Spec-first design (portable, encourages ecosystem)
  • ✅ Phoenix LiveView dashboard patterns
  • ✅ SSH remote workers for distributed execution

From Cook

  • ✅ Composable operator syntax for chaining agent behaviors
  • ✅ Racing (vN) multiple implementations for quality
  • ✅ Built-in review loops as first-class workflow concept

From Barnum

  • ✅ Progressive disclosure (limit context per agent step)
  • ✅ Type-safe state machine workflow definitions
  • ✅ Persistent worker pools to avoid cold-start costs

From Paperclip

  • ✅ Per-agent cost tracking and budget enforcement
  • ✅ Goal alignment (agents see "what" and "why")
  • ✅ Governance patterns (approval gates, rollback)
  • ✅ Audit logging for agent actions

From Spec-Kit

  • ✅ Specification-driven development methodology
  • ✅ Template hierarchy (core → extensions → presets → project overrides)
  • ✅ Enterprise deployment patterns (air-gapped, proxy support)

From Happy

  • ✅ Mobile-responsive thinking for Tower UI
  • ✅ Multiple concurrent session management
  • ✅ Real-time monitoring UX
  • ✅ "Remote workflow" messaging

Questions to Investigate (Dec 2025)

  1. Conductor's PR lifecycle: Do they handle review responses? CI failures? How?
  2. Emdash architecture: How does their multi-provider plugin system work?
  3. User feedback: What do users love/hate about these tools? (Check Discord, GitHub issues)
  4. Enterprise adoption: Are companies using these in production? What blockers exist?
  5. Collaboration potential: Would Melty Labs or General Action be open to coordination?

Conclusion

December 2025 Assessment

Gru is NOT redundant. It occupies a unique position with:

  • Cross-platform support (not Mac-only)
  • Persistent agent model (full PR lifecycle)
  • GitHub-as-database architecture (no separate state)
  • Clear open source licensing (MIT/Apache)
  • Lab + Tower split (local-first with optional remote UI)

The opportunity is real: Conductor and Emdash validate market demand for GitHub agent orchestration.

Gru's path to win:

  1. Ship Phase 2 quickly (match competitors' core features)
  2. Emphasize cross-platform and open source (serve wider market)
  3. Nail persistent Minions (unique differentiator)
  4. Build community moat (open source, clear license, extensible)

Competition is healthy: Validates the problem, raises awareness, improves the category. Gru's principles (local-first, one binary, GitHub as state) position us well for the long term.

March 2026 Assessment

The landscape has grown significantly. OpenAI entered with Symphony, GitHub shipped Spec-Kit, and community projects like Paperclip (31k stars in 3 weeks) show explosive interest in agent orchestration. The field has expanded from 3 tools to 8.

Gru's position has strengthened:

  • Full PR lifecycle — Still the only tool that manages claim → implement → PR → review response → CI fix → merge autonomously. Every other tool either stops at PR creation (Conductor, Cook) or delegates lifecycle management to the agent (Symphony).
  • GitHub-native — No external tracker dependency. Symphony requires Linear; Paperclip requires Postgres. Gru uses GitHub as both code host and state store.
  • Single binary — No runtime dependencies. Competitors require Elixir (Symphony), Node.js (Cook, Paperclip), or Python (Spec-Kit).
  • Cross-platform — Conductor is still Mac-only.
  • Clear open source licensing — MIT/Apache. Conductor and Cook still have no license file.

New competitive dynamics:

  • Symphony is architecturally the closest new competitor (autonomous poll-loop, workspace isolation, stuck detection). But it's Linear-only, Codex-only, and Elixir-only.
  • Paperclip validates massive interest in agent orchestration but operates at a different abstraction level (business orchestration vs coding lifecycle).
  • Cook and Barnum are complementary tools, not direct competitors.
  • Spec-Kit is a specification layer, not an orchestrator — but watch for GitHub extending it.

Ideas worth adopting from competitors:

  1. WORKFLOW.md-style repo-owned config (from Symphony)
  2. Cost tracking and per-agent budgets (from Paperclip)
  3. Web dashboard for observability (from Symphony, Paperclip)
  4. Racing/review operators for quality (from Cook)