Gru Design Decisions
Last Updated: 2025-12-02
Status: Final decisions made via quantitative DMX analysis
Critical Implementation Decisions (2025-12-02)
Decision 1: Architecture Approach
Question: How to spawn and manage autonomous Claude Code agents?
Answer: CLI + Stream Parsing (scored 0.735/1.0)
After evaluating 6 approaches through spike testing and DMX analysis:
| Approach | Score | Verdict |
|---|---|---|
| CLI + Stream Parsing | 0.735 | ✅ SELECTED |
| ACP Integration | 0.706 | Future (V2+) |
| Agent SDK (Python) | 0.688 | Too complex |
| Pure CLI | 0.559 | No monitoring |
| Rust + tmux | 0.466 | High fragility |
| Zellij | 0.210 | Failed tests |
Implementation:
claude --print \
--session-id <UUID> \
--output-format stream-json \
--dangerously-skip-permissions
Parse JSON events from stdout for real-time monitoring.
See: experiments/DMX_ANALYSIS.md for full analysis.
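As a rough illustration of the monitoring side, the sketch below builds the argument list shown above and pulls the first `type` value out of each stdout line. It assumes one JSON object per line with a string `type` field; the event schema is not specified in this document, and the helper names (`claude_args`, `event_type`, `collect_event_types`) are hypothetical. The naive string scan only keeps the example std-only; a real implementation would more likely deserialize each line with `serde_json`.

```rust
use std::io::BufRead;

/// Arguments for the invocation shown above; `session_id` is a UUID string.
fn claude_args(session_id: &str) -> Vec<String> {
    vec![
        "--print".to_string(),
        "--session-id".to_string(),
        session_id.to_string(),
        "--output-format".to_string(),
        "stream-json".to_string(),
        "--dangerously-skip-permissions".to_string(),
    ]
}

/// Naive extraction of the first `"type": "..."` value in a line.
/// Assumption: events are single-line JSON objects with a string `type` field.
fn event_type(line: &str) -> Option<&str> {
    let key = "\"type\"";
    let after_key = &line[line.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Reads lines from any buffered reader (for example a child's stdout) and
/// returns the event types seen, skipping lines that do not match.
fn collect_event_types<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .filter_map(|line| line.ok())
        .filter_map(|line| event_type(&line).map(str::to_string))
        .collect()
}
```

In the Lab, `collect_event_types` would be replaced by a loop that reacts to each event as it arrives; the reader would be a `BufReader` around the child process's piped stdout.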
Decision 2: Implementation Language
Question: Python or Rust?
Answer: Rust (scored 0.890 vs Python 0.110 - 8x advantage)
Rust scored perfectly on all high-priority criteria:
- Single Binary Deployment: 10/10 (Python: 4/10)
- Daemon Reliability: 10/10 (Python: 6/10)
- Concurrency: 10/10 (Python: 6/10)
- Type Safety: 10/10 (Python: 3/10)
Rationale: The vision is "single-binary, local-first" (mentioned 3x in docs). Architecture requires 24/7 daemon with true concurrency for 10+ minions. Rust is the only logical choice for the production system.
See: experiments/LANGUAGE_DECISION.md for full analysis.
Technology Stack
Core:
- Language: Rust
- Runtime: Tokio async
- CLI: Clap
- GraphQL: async-graphql
- Web: Axum
- GitHub: octocrab
Key Dependencies:
[dependencies]
tokio = { version = "1", features = ["full"] }
async-graphql = "7"
axum = "0.7"
clap = { version = "4", features = ["derive"] }
octocrab = "0.38"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"
Timeline: 6-8 weeks to production-ready V1
V1: Simplified Single-Lab Design
Core Principles
- Single Lab assumption - No distributed coordination needed initially
- Simple labels for state - gru:todo, gru:in-progress, gru:done, gru:failed
- Comments as event log - GitHub timeline API provides complete audit trail
- Early draft PR - Create as soon as branch exists, provides natural lock mechanism
- GitHub Actions for CI - Delegate test execution to existing infrastructure
- Claude Code for agents (V1) - Use Claude Code as the initial agent runtime, design for pluggable agents later
Agent Runtime
V1: Claude Code
Decision: Start with Claude Code as the agent runtime for Minions.
Rationale:
- ✅ Built-in tool use - Git, file operations, bash commands already available
- ✅ Agentic by default - Designed for autonomous multi-step tasks
- ✅ MCP support - Can integrate with external tools and services
- ✅ Proven - Battle-tested for coding tasks
- ✅ Fast iteration - Focus on orchestration, not building agent infrastructure
Integration:
# config.yaml
agent:
runtime: claude-code
model: claude-sonnet-4-5
tools:
- git
- bash
- file_operations
mcp_servers:
- github # For GitHub API operations
Minion = Claude Code Session:
Each Minion is a Claude Code session. One-to-one mapping.
Minion M42 = Claude Code session working in worktree ~/.gru/work/owner/repo/M42
Minion M43 = Claude Code session working in worktree ~/.gru/work/owner/repo/M43
How it works:
- Lab spawns Claude Code process for each Minion (separate session per worktree)
- Passes initial prompt with issue description and codebase context
- Claude Code autonomously works: reads code, makes changes, commits, pushes
- Lab monitors Claude Code output for events and state changes
- Lab handles GitHub integration layer (labels, comments, PR creation)
- When issue complete, Lab terminates Claude Code session and cleans up
Example Minion initialization:
# Lab spawns Claude Code for Minion M42 (flags are illustrative; Decision 1 gives the selected invocation)
cd ~/.gru/work/owner/repo/M42
claude --session M42 --context issue-123.md --autonomous
Prompt template:
You are Minion M42 working on issue #123 in owner/repo.
## Issue
[issue description]
## Your Task
1. Read the issue carefully
2. Explore the codebase to understand relevant code
3. Implement the requested changes
4. Commit when a logical unit of work is complete AND tests pass
5. Push commits to trigger CI
6. Monitor CI results and fix failures
7. When ready, notify the Lab that you're done
## Guidelines
- Commit frequently (after each successful CI run)
- Use commit messages: [minion:M42] <description>
- If stuck or tests fail repeatedly, escalate
- Working directory: ~/.gru/work/owner/repo/M42
- Branch: minion/issue-123-M42
## Tools Available
- Git operations (commit, push, diff)
- File read/write
- Bash commands
- GitHub API (via MCP)
Begin work now.
Session Status Tracking:
Claude Code sessions have internal states that Lab must track:
use chrono::{DateTime, Utc};

#[derive(Debug, Clone, PartialEq, Eq)]
enum SessionStatus {
    Thinking,          // Claude processing
    UsingTool,         // Executing tool (git, bash, etc)
    Responding,        // Generating response
    WaitingInput,      // Needs user input
    WaitingPermission, // Needs approval for action
    Idle,              // Waiting for next instruction
    Complete,          // Task finished
    Error,             // Encountered error
}

struct Minion {
    // ... existing fields ...
    session_status: SessionStatus,
    current_tool: Option<String>, // Which tool is being used (if UsingTool)
    last_output: DateTime<Utc>,   // Last time session produced output
}
Lab monitors session status by:
- Parsing Claude Code's JSON output stream
- Watching for status indicators in stdout/stderr
- Detecting tool use events
- Tracking idle time (no output = potentially stuck)
Example status transitions:
idle → thinking → using_tool(git) → thinking → responding → idle
idle → thinking → using_tool(bash) → waiting_permission → idle
idle → thinking → responding → complete
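The transition notation and the idle-time check can be sketched as follows. This is a minimal, std-only illustration: the enum mirrors the `SessionStatus` variants above, while `status_label`, `is_potentially_stuck`, and the idea of a caller-supplied threshold are assumptions (this document does not specify how long a session may stay silent before it counts as stuck).

```rust
use std::time::{Duration, Instant};

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum SessionStatus {
    Thinking,
    UsingTool,
    Responding,
    WaitingInput,
    WaitingPermission,
    Idle,
    Complete,
    Error,
}

/// Renders a status in the snake_case notation used in the transitions above,
/// e.g. `using_tool(git)` when a tool name is known.
fn status_label(status: &SessionStatus, current_tool: Option<&str>) -> String {
    match status {
        SessionStatus::Thinking => "thinking".to_string(),
        SessionStatus::UsingTool => match current_tool {
            Some(tool) => format!("using_tool({tool})"),
            None => "using_tool".to_string(),
        },
        SessionStatus::Responding => "responding".to_string(),
        SessionStatus::WaitingInput => "waiting_input".to_string(),
        SessionStatus::WaitingPermission => "waiting_permission".to_string(),
        SessionStatus::Idle => "idle".to_string(),
        SessionStatus::Complete => "complete".to_string(),
        SessionStatus::Error => "error".to_string(),
    }
}

/// True when no output has been seen for at least `threshold`.
/// The threshold is a hypothetical tuning parameter chosen by the caller.
fn is_potentially_stuck(last_output: Instant, now: Instant, threshold: Duration) -> bool {
    now.saturating_duration_since(last_output) >= threshold
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside the function, keeps the check deterministic in tests.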
User Attach Sessions:
Users can attach to any active Minion to observe or intervene:
# Attach to Minion M42 (read-only by default)
gru attach M42
# Attach with interactive mode (can send input)
gru attach M42 --interactive
# Attach to see live output
gru attach M42 --follow
Architecture:
┌──────────────┐
│ User TTY │
└──────┬───────┘
│
│ gru attach M42
▼
┌──────────────┐
│ Lab │
│ │
│ ┌────────┐ │
│ │ Attach │ │
│ │Manager │ │
│ └───┬────┘ │
└──────┼───────┘
│
│ Multiplex session I/O
▼
┌──────────────────────┐
│ Claude Code Session │
│ (Minion M42) │
│ │
│ stdin ←─────────────┼─── Lab + Attached users
│ stdout ─────────────→│──→ Lab + Attached users
│ stderr ─────────────→│──→ Lab + Attached users
└──────────────────────┘
Attach Session Management:
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};
use chrono::{DateTime, Duration, Utc};

struct AttachSession {
    id: String,
    minion_id: String,
    user_id: String,
    mode: AttachMode,
    started_at: DateTime<Utc>,
    expires_at: DateTime<Utc>, // Max 30 minutes
    // I/O streams
    input: mpsc::Sender<Vec<u8>>,    // User → Minion
    output: mpsc::Receiver<Vec<u8>>, // Minion → User
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum AttachMode {
    ReadOnly,    // Just observe
    Interactive, // Can send input
}

struct AttachManager {
    lab: Arc<Lab>, // Owning Lab (type assumed to be defined elsewhere)
    sessions: Arc<RwLock<HashMap<String, AttachSession>>>,
}

impl AttachManager {
    async fn attach(
        &self,
        minion_id: &str,
        user_id: &str,
        mode: AttachMode,
    ) -> Result<AttachSession, Error> {
        let minion = self.lab.get_minion(minion_id)
            .ok_or(Error::MinionNotFound)?;

        let (input_tx, input_rx) = mpsc::channel(100);
        let (output_tx, output_rx) = mpsc::channel(100);

        let session = AttachSession {
            id: generate_id(),
            minion_id: minion_id.to_string(),
            user_id: user_id.to_string(),
            mode: mode.clone(),
            started_at: Utc::now(),
            expires_at: Utc::now() + Duration::minutes(30),
            input: input_tx,
            output: output_rx,
        };

        // Start multiplexing minion output to this session
        let minion_clone = minion.clone();
        tokio::spawn(async move {
            Self::stream_output(minion_clone, output_tx).await;
        });

        // If interactive, also forward input
        if mode == AttachMode::Interactive {
            let minion_clone = minion.clone();
            tokio::spawn(async move {
                Self::stream_input(input_rx, minion_clone).await;
            });
        }

        // NOTE: mpsc::Receiver is not Clone, so registering a copy assumes
        // AttachSession wraps its channel ends in shared handles.
        self.sessions.write().await.insert(session.id.clone(), session.clone());
        Ok(session)
    }
}
Security considerations:
- Attach sessions timeout after 30 minutes
- Only one interactive attach per Minion at a time
- Multiple read-only attaches allowed
- Lab retains full control; user input is mediated
- Attach sessions preserved in audit log
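The first three rules above can be expressed as two small checks. This is a sketch under stated assumptions: `can_attach` and `is_expired` are hypothetical helper names, the caller is assumed to pass the modes of the sessions currently attached to one Minion, and `SystemTime` stands in for the `chrono` timestamps used in `AttachSession` to keep the example std-only.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttachMode {
    ReadOnly,
    Interactive,
}

/// Attach sessions time out after 30 minutes.
const MAX_ATTACH: Duration = Duration::from_secs(30 * 60);

/// `active` lists the modes of sessions currently attached to one Minion.
/// Read-only attaches are always allowed; only one interactive attach may exist.
fn can_attach(active: &[AttachMode], requested: AttachMode) -> bool {
    match requested {
        AttachMode::ReadOnly => true,
        AttachMode::Interactive => !active.contains(&AttachMode::Interactive),
    }
}

/// True once a session has been attached for `MAX_ATTACH` or longer.
fn is_expired(started_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(started_at) {
        Ok(elapsed) => elapsed >= MAX_ATTACH,
        // The clock moved backwards; treat the session as still valid.
        Err(_) => false,
    }
}
```

A manager would call `can_attach` before creating the channels and run `is_expired` from a periodic sweep that detaches stale sessions.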
Use cases:
# Watch Minion work in real-time
gru attach M42 --follow
# Debug stuck Minion
gru attach M42 --interactive
> # Can send commands to help unstick
# Review what Minion is doing
gru attach M42
[Shows current status: "using_tool(git)", last 100 lines of output]
# Detach but leave Minion running
Ctrl+D or type 'detach'
UI Display:
╭─────────────────────────────────────────────╮
│ Attached to Minion M42 │
│ Issue: #123 - Add user authentication │
│ Status: using_tool(bash) │
│ Uptime: 15m 32s │
│ Commits: 2 │
╰─────────────────────────────────────────────╯
[M42] Running: npm test
[M42]
[M42] > test
[M42] > jest
[M42]
[M42] PASS src/auth.test.js
[M42] ✓ should generate JWT token (45ms)
[M42] ✓ should validate token (12ms)
[M42]
[M42] Tests: 2 passed, 2 total
[M42] Status: thinking
[M42] All tests passed! Committing changes...
Press Ctrl+D to detach (Minion continues running)
Implementation: tmux vs Custom (HISTORICAL — superseded by CLI + stream-json)
Note: This entire section is historical. The tmux approach was evaluated but never shipped. V1 uses CLI + stream-json parsing instead. Kept for decision context only.
Option A: Use tmux
Each Minion runs in a dedicated tmux session:
# Lab spawns Minion M42
tmux new-session -d -s "minion-M42" -c ~/.gru/work/owner/repo/M42
tmux send-keys -t "minion-M42" "claude --session M42 --context issue-123.md" Enter
# User attaches
gru attach M42
# → Lab runs: tmux attach-session -t "minion-M42" -r (read-only)
# Interactive attach
gru attach M42 --interactive
# → Lab runs: tmux attach-session -t "minion-M42"
Pros:
- ✅ Battle-tested - tmux handles all session management, multiplexing
- ✅ Free scrollback - Built-in history buffer
- ✅ Multiple attaches - tmux natively supports many viewers
- ✅ Survives Lab restart - Sessions persist if Lab crashes
- ✅ Standard tooling - Users already know tmux commands
- ✅ Copy/paste - tmux copy mode works out of the box
- ✅ Session recording - tmux pipe-pane for logging
Cons:
- ⚠️ External dependency - Requires tmux installed
- ⚠️ Less control - Harder to intercept/mediate I/O
- ⚠️ Platform dependency - tmux not available on Windows (WSL only)
- ⚠️ Session pollution - Orphaned tmux sessions if cleanup fails
- ⚠️ Abstraction leakage - Users see raw tmux, not Gru's semantics
Option B: Custom I/O multiplexing
Lab manages stdin/stdout/stderr directly:
use std::process::Stdio;
use std::sync::Arc;
use tokio::io::{AsyncBufReadExt, AsyncRead, BufReader};
use tokio::process::Command;
use tokio::sync::RwLock;

struct Minion {
    // ... existing fields ...
    claude_session: Option<ClaudeCodeSession>,
    output_buffer: Arc<RwLock<RingBuffer>>,       // Last N lines for late attachers
    attachments: Arc<RwLock<Vec<AttachSession>>>, // Currently attached users
}

impl Minion {
    async fn start(&mut self) -> Result<(), Error> {
        let mut child = Command::new("claude")
            .arg("--session")
            .arg(&self.id)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()?;

        // Capture I/O
        let stdin = child.stdin.take().ok_or(Error::StdinMissing)?;
        let stdout = child.stdout.take().ok_or(Error::StdoutMissing)?;
        let stderr = child.stderr.take().ok_or(Error::StderrMissing)?;

        // stdout/stderr are handed to the multiplexing tasks below, so the
        // session keeps only the process handle and stdin.
        self.claude_session = Some(ClaudeCodeSession {
            process: child,
            stdin,
        });

        // Start output multiplexing
        let minion_clone = self.clone();
        tokio::spawn(async move {
            minion_clone.multiplex_output(stdout).await;
        });
        let minion_clone = self.clone();
        tokio::spawn(async move {
            minion_clone.multiplex_output(stderr).await;
        });

        Ok(())
    }

    async fn multiplex_output<R: AsyncRead + Unpin>(&self, reader: R) {
        let mut lines = BufReader::new(reader).lines();
        while let Ok(Some(line)) = lines.next_line().await {
            let line_bytes = line.as_bytes().to_vec();

            // Save to buffer (for late attachers)
            self.output_buffer.write().await.write(&line_bytes);

            // Send to all attached sessions
            let attachments = self.attachments.read().await;
            for attach in attachments.iter() {
                // Non-blocking send
                let _ = attach.output.try_send(line_bytes.clone());
            }

            // Save to archive
            self.archive_output(&line_bytes).await;
        }
    }
}
Pros:
- ✅ Full control - Lab can intercept/filter/log everything
- ✅ No dependencies - Pure Rust, works everywhere
- ✅ Tight integration - Easy to parse status, trigger events
- ✅ Clean semantics - Abstract away raw terminal details
- ✅ Cross-platform - Works on Windows, Linux, macOS
Cons:
- ⚠️ More code - Need to implement multiplexing, buffering
- ⚠️ Terminal quirks - Have to handle PTY, control sequences
- ⚠️ Less resilient - Sessions die with Lab (unless we persist)
- ⚠️ Reinventing wheel - Solving problems tmux already solved
Decision: V1 uses CLI + Stream Parsing (tmux superseded)
IMPORTANT UPDATE (2025-12-02): After DMX analysis and spike testing, tmux approach has been superseded by CLI + stream-json parsing. This provides better monitoring with less complexity.
Original V1 tmux rationale (now superseded):
- Speed to MVP - tmux gives us attach functionality for free
- Reliability - tmux is rock-solid, handles edge cases we'd miss
- User familiarity - Developers already know tmux
- Easy migration - Can swap implementation later without changing API
Why CLI + stream-json is better:
- No subprocess complexity (tmux sessions)
- No fragile regex parsing
- JSON events provide structured monitoring
- Claude CLI has --output-format stream-json natively
- Simpler deployment (no tmux dependency)
- See experiments/DMX_ANALYSIS.md for quantitative comparison
Original tmux-based V1 implementation (superseded, kept for context):
use std::path::Path;
use tokio::process::Command;

impl Minion {
    async fn start(&mut self) -> Result<(), Error> {
        let session_name = format!("gru-minion-{}", self.id);

        // Create tmux session
        Command::new("tmux")
            .args(["new-session", "-d", "-s", &session_name, "-c", &self.worktree_path])
            .output()
            .await?;

        // Start Claude Code in tmux
        let command = format!("claude --session {} --context {}", self.id, self.context_file);
        Command::new("tmux")
            .args(["send-keys", "-t", &session_name, &command, "Enter"])
            .output()
            .await?;

        // Enable logging
        let log_path = Path::new(&self.archive_path).join("session.log");
        let log_command = format!("cat >> {}", log_path.display());
        Command::new("tmux")
            .args(["pipe-pane", "-t", &session_name, &log_command])
            .output()
            .await?;

        self.tmux_session = Some(session_name);
        Ok(())
    }
}

impl AttachManager {
    async fn attach(&self, minion_id: &str, mode: AttachMode) -> Result<(), Error> {
        let minion = self.lab.get_minion(minion_id)
            .ok_or(Error::MinionNotFound)?;

        // Return an error instead of panicking if the Minion has no tmux session yet
        let session = minion.tmux_session.as_deref()
            .ok_or(Error::SessionNotInitialized)?;

        let mut args = vec!["attach-session", "-t", session];
        if mode == AttachMode::ReadOnly {
            args.push("-r"); // read-only flag
        }

        Command::new("tmux")
            .args(&args)
            .stdin(std::process::Stdio::inherit())
            .stdout(std::process::Stdio::inherit())
            .stderr(std::process::Stdio::inherit())
            .spawn()?
            .wait()
            .await?;

        Ok(())
    }
}
When to build custom (historical — no longer applicable):
- Multiple users request Windows support (tmux unavailable)
- Need fine-grained I/O interception for features
- Want to eliminate external dependencies
- Performance issues with tmux overhead
Note: The CLI + stream-json approach eliminated the need for both tmux and custom I/O multiplexing.
Alternative: Zellij (historical)
Zellij is a modern Rust-based terminal multiplexer gaining popularity:
Pros:
- ✅ Better defaults - Sessions auto-managed, better UX out of the box
- ✅ Plugin system - Native WASM plugins for extensibility
- ✅ Modern codebase - Written in Rust, active development
- ✅ Better UI - Context-aware bottom bar, easier discoverability
- ✅ Simpler API - Cleaner command structure
Cons:
- ⚠️ Less mature - Newer project, smaller ecosystem
- ⚠️ Lower adoption - Not universally installed like tmux/screen
- ⚠️ API stability - May change more rapidly than tmux
Zellij's Rust API:
If Gru is written in Rust, Zellij offers interesting integration possibilities:
Available Rust crates:
- zellij-tile - Plugin API for extending Zellij
- zellij-utils - CLI enums including Sessions and SessionCommand
- zellij-client - Client library for programmatic control
Potential advantages:
// Hypothetical: Direct Rust API instead of shelling out
use zellij_client::Session;

let session = Session::new("gru-minion-M42")
    .working_dir(&minion.worktree_path)
    .create()?;
session.send_keys("claude --session M42...")?;
session.attach(AttachMode::ReadOnly)?;
However, research shows:
- ⚠️ Plugin-focused - Zellij's Rust API primarily for WASM plugins, not embedding
- ⚠️ Still CLI-based - Session control via zellij attach, zellij list-sessions commands
- ⚠️ Not a library - No documented API for embedding Zellij as a library
- ⚠️ Similar to tmux - Would still shell out to zellij binary
Reality check: Both tmux and Zellij are external binaries you invoke via CLI. Neither offers a true embeddable library API. The integration code looks nearly identical:
// tmux
Command::new("tmux")
    .args(["new-session", "-d", "-s", session_name])
    .spawn()?;

// zellij
Command::new("zellij")
    .args(["--session", session_name])
    .spawn()?;
Historical verdict: tmux was initially favored for V1, but CLI + stream-json parsing proved superior and was selected instead. None of tmux, Zellij, or GNU Screen is used in the shipped implementation.
Implementation Language: Go vs Rust
Context
Gru is a CLI tool that:
- Manages processes (Claude Code CLI sessions)
- Makes HTTP calls (GitHub API)
- Does file I/O (git worktrees, logs)
- Provides CLI interface
- Potentially exposes GraphQL API (future)
Option A: Go
Pros:
- ✅ Fast to ship - Simpler syntax, faster compile times
- ✅ Better for services - Excellent HTTP/gRPC libraries, proven for APIs
- ✅ Easy concurrency - Goroutines + channels are simple and powerful
- ✅ Great CLI libraries - cobra, viper mature and widely used
- ✅ Deployment - Single static binary, cross-compile trivial
- ✅ GitHub integrations - go-github library is comprehensive
- ✅ Process management - os/exec is straightforward
- ✅ Familiar - More developers know Go than Rust
Cons:
- ⚠️ Error handling - Verbose if err != nil everywhere
- ⚠️ Type safety - Weaker than Rust (no sum types, nil pointers)
- ⚠️ Memory usage - Larger binaries, GC overhead
- ⚠️ Less trendy - Rust has more mindshare in 2024/2025
Good fit for Gru because:
- Orchestration layer (not performance-critical compute)
- Heavy I/O and API calls (Go's sweet spot)
- Need to ship quickly
- Service patterns well-established
Option B: Rust
Pros:
- ✅ Type safety - Sum types, no null, exhaustive matching
- ✅ Performance - Zero-cost abstractions, no GC
- ✅ Modern tooling - cargo, clippy, rustfmt excellent
- ✅ Small binaries - More compact than Go
- ✅ Growing ecosystem - tokio, serde, clap mature
- ✅ Memory safety - Prevents entire classes of bugs
Cons:
- ⚠️ Slower to ship - Longer compile times, steeper learning curve
- ⚠️ Complexity - Lifetimes, ownership, async can be hard
- ⚠️ Smaller ecosystem - Fewer libraries than Go for some domains
- ⚠️ Harder onboarding - Contributors need Rust knowledge
- ⚠️ Async ecosystem - Still evolving, some rough edges
Good fit for Gru because:
- CLI tools are Rust's sweet spot
- Type safety helps with state machine complexity
- Modern developers prefer Rust
- Can use Zellij plugin system (future)
Comparison for Gru's Specific Needs
| Aspect | Go | Rust |
|---|---|---|
| HTTP client | net/http ★★★★★ | reqwest ★★★★☆ |
| GitHub API | go-github ★★★★★ | octocrab ★★★☆☆ |
| CLI framework | cobra ★★★★★ | clap ★★★★★ |
| Process mgmt | os/exec ★★★★☆ | std::process ★★★★☆ |
| GraphQL | gqlgen ★★★★★ | async-graphql ★★★★☆ |
| SQLite | go-sqlite3 ★★★★★ | rusqlite ★★★★★ |
| YAML parsing | gopkg.in/yaml.v3 ★★★★★ | serde_yaml ★★★★★ |
| Time to MVP | ★★★★★ | ★★★☆☆ |
| Type safety | ★★★☆☆ | ★★★★★ |
| Contributor pool | ★★★★★ | ★★★☆☆ |
Decision: Rust for V1 ✅
CONFIRMED (2025-12-02): DMX analysis strongly validates Rust choice with 0.890 score vs Python 0.110.
Rationale:
- Single Binary - Emphasized 3x in product docs; Rust delivers perfectly (10/10)
- Daemon Reliability - 24/7 operation requires Rust's stability (10/10)
- True Concurrency - Managing 10+ minions needs no-GIL parallelism (10/10)
- Type Safety - State machine + lifecycle management benefit from compile-time guarantees (10/10)
- Production Polish - "It just works" deployment experience (10/10)
- Modern tooling - cargo, clippy, rustfmt are excellent
- CLI sweet spot - Rust excels at CLI tools (ripgrep, fd, bat, etc.)
Key Insight: The architecture was designed for Rust's strengths. Not a preference, but an alignment with requirements.
Code structure:
gru/
├── src/
│ ├── main.rs
│ ├── cli/
│ │ ├── mod.rs # CLI setup (clap)
│ │ ├── lab.rs # gru lab command
│ │ └── attach.rs # gru attach command
│ ├── lab/
│ │ ├── mod.rs # Lab orchestrator
│ │ ├── scheduler.rs
│ │ └── poller.rs
│ ├── minion/
│ │ ├── mod.rs # Minion state machine
│ │ └── session.rs # Claude CLI session wrapper (stream-json)
│ ├── github/
│ │ ├── mod.rs # GitHub API client (octocrab)
│ │ └── events.rs # Timeline, labels, PRs
│ └── attach/
│ └── manager.rs # Attach session management
├── Cargo.toml
└── Cargo.lock
Key dependencies:
[dependencies]
clap = { version = "4.5", features = ["derive"] }
tokio = { version = "1.40", features = ["full"] }
octocrab = "0.40" # GitHub API
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"
serde_json = "1.0"
sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = "0.3"
When to reconsider Rust:
- Performance becomes measurable bottleneck
- Want to leverage Zellij plugin system
- Type safety bugs become significant
- Rust ecosystem catches up for GitHub/API work
- Team composition shifts to Rust-heavy
Hybrid approach (unlikely):
- Core in Go for speed
- Performance-critical pieces in Rust (via FFI)
- Probably overkill for Gru's use case
Alternative: Why Not TypeScript/Python?
TypeScript:
- ❌ Runtime overhead (Node.js)
- ❌ Deployment complexity (node_modules)
- ✅ Good for Tower web UI (future)
Python:
- ❌ Deployment (virtualenv, dependencies)
- ❌ Slower performance
- ❌ No static typing (even with mypy)
- ✅ Quick prototyping
Implementation Notes:
- Rust + Tokio provides excellent async foundation
- CLI + stream-json parsing eliminates need for complex session management
- No tmux/zellij dependency reduces complexity
- Single binary deployment aligns with product vision
- See experiments/ for working prototypes validating approach
Future: Pluggable Agent Architecture
Goal: Support multiple agent runtimes as ecosystem evolves.
Design for extensibility:
use async_trait::async_trait;
use tokio::sync::mpsc;

// Agent interface
#[async_trait]
trait Agent {
    // Initialize agent with context
    async fn initialize(&mut self, ctx: AgentContext) -> Result<(), Error>;

    // Execute task
    async fn execute(&mut self, task: Task) -> Result<Result, Error>;

    // Stream events during execution
    fn events(&self) -> mpsc::Receiver<AgentEvent>;

    // Pause/Resume/Stop
    async fn pause(&mut self) -> Result<(), Error>;
    async fn resume(&mut self) -> Result<(), Error>;
    async fn stop(&mut self) -> Result<(), Error>;
}

// Agent runtime types
#[derive(Debug, Clone, PartialEq, Eq)]
enum AgentRuntime {
    ClaudeCode,   // claude-code
    OpenAIAgents, // openai-agents (Future)
    Custom,       // custom (Future)
    LocalLLM,     // local-llm (Future)
}
Potential future runtimes:
- OpenAI Agents - When OpenAI ships their agent framework
- Devin API - If Cognition Labs opens API access
- Local LLMs - For privacy-sensitive codebases (Llama, Mistral)
- Custom agents - User-defined agent scripts/binaries
- Hybrid - Combine multiple agents (Claude for planning, specialist for security, etc.)
Adapter pattern:
struct ClaudeCodeAdapter {
    session: Option<ClaudeCodeSession>,
}

#[async_trait]
impl Agent for ClaudeCodeAdapter {
    async fn execute(&mut self, task: Task) -> Result<Result, Error> {
        // Translate task to Claude Code prompt
        let prompt = format_prompt(&task);

        // Execute via Claude Code API/CLI
        let session = self.session.as_mut().ok_or(Error::SessionNotInitialized)?;
        let result = session.run(&prompt).await?;

        // Parse result and events
        parse_result(&result)
    }

    // ... other trait methods
}
Why defer this:
- ❌ Premature abstraction - Only one runtime for V1, wait for real use cases
- ❌ Ecosystem immature - Agent frameworks still evolving rapidly
- ❌ YAGNI - May never need multiple runtimes if Claude Code sufficient
When to build pluggable system:
- Multiple users requesting alternative runtimes
- Cost optimization needs (cheaper local models for simple tasks)
- Privacy requirements (can't use cloud LLMs)
- Performance needs (local agents faster for certain tasks)
State Management
Labels
Issue States:
- gru:todo - Issue is ready to be claimed
- gru:in-progress - Minion actively working
- gru:done - Completed successfully
- gru:failed - Failed after retries
- gru:blocked - Minion needs human help
- gru:ready-to-merge - PR passes checks and is ready
- gru:auto-merge - Auto-merge enabled on PR
- gru:needs-human-review - PR requires human review before merge
Rationale: Simple, visible in UI, easy to filter. The gru: prefix namespaces labels to avoid collisions with user labels.
Event Log via Comments
Use GitHub Timeline API (GET /repos/:owner/:repo/issues/:number/timeline) to:
- Reconstruct full Minion lifecycle
- Detect state transitions
- Build audit trail
- Track all labeled/commented/cross-referenced events
Comment Format (YAML):
---
event: minion:claim
minion_id: M42
lab_id: lab-hostname
branch: minion/issue-123-M42
timestamp: 2025-01-30T12:34:56Z
---
Example claim comment:
🤖 **Minion M42 claimed this issue**
---
event: minion:claim
minion_id: M42
lab_id: lab-macbook-pro.local
branch: minion/issue-123-M42
timestamp: 2025-01-30T12:34:56Z
---
I'll start working on this now. You can track progress in the draft PR.
Example progress update:
🔄 **Progress Update**
---
event: minion:progress
minion_id: M42
phase: implementation
commits: 2
tests_passing: true
duration_minutes: 15
---
Completed authentication endpoints. Running CI checks now.
Example failure report:
❌ **Minion M42 needs help**
---
event: minion:failed
minion_id: M42
failure_reason: ci_failed_max_retries
attempts: 5
last_error: "Test suite timeout after 10 minutes"
---
I've tried fixing the test failures 5 times but keep hitting timeouts.
Could you take a look at the CI logs? @human-reviewer
Advantages:
- ✅ Immutable, ordered history
- ✅ Accessible via REST and GraphQL
- ✅ No external database needed
- ✅ Human-readable in UI
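To make the comment format concrete, the sketch below renders a claim block and reads the key/value pairs back out of a comment body. It is a minimal illustration, not a YAML parser: `claim_block` and `parse_event_block` are hypothetical names, values are returned as raw strings (quotes are not stripped), and only the first `---`-delimited block in the body is considered.

```rust
/// Renders the delimited block for a claim event, using the field names
/// from the comment format above.
fn claim_block(minion_id: &str, lab_id: &str, branch: &str, timestamp: &str) -> String {
    format!(
        "---\nevent: minion:claim\nminion_id: {minion_id}\nlab_id: {lab_id}\nbranch: {branch}\ntimestamp: {timestamp}\n---"
    )
}

/// Extracts `key: value` pairs from the first `---`-delimited block.
/// Returns `None` if the body has no opening or no closing delimiter.
fn parse_event_block(body: &str) -> Option<Vec<(String, String)>> {
    let mut lines = body.lines().skip_while(|l| l.trim() != "---");
    lines.next()?; // consume the opening delimiter

    let mut pairs = Vec::new();
    for line in lines {
        if line.trim() == "---" {
            return Some(pairs);
        }
        // Split on the first colon only, so values such as `minion:claim`
        // or ISO timestamps stay intact.
        if let Some((key, value)) = line.split_once(':') {
            pairs.push((key.trim().to_string(), value.trim().to_string()));
        }
    }
    None
}
```

Because the human-readable text sits outside the delimiters, the Lab can reconstruct state from the block while the surrounding prose stays free-form.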
Draft PR as Lock Mechanism
Workflow
- Lab claims issue → add claimed label, post claim comment
- Lab creates branch → minion/issue-123-M42
- Lab immediately creates draft PR → Title: [DRAFT] Fixes #123
- If PR creation succeeds → Lab owns the issue, proceed with work
- If PR creation fails (duplicate branch) → Another Lab won, abort gracefully
Commit Strategy
Rule of thumb: Minion commits when:
- Set of related changes complete AND
- CI passes (tests green)
Benefits:
- ✅ Checkpoints progress (can resume on crash)
- ✅ Incremental review possible
- ✅ CI runs on each checkpoint
- ✅ Clear history of Minion's work
Commit Message Format:
[minion:M42] Add user authentication
- Implement login endpoint
- Add JWT token generation
- Add password hashing
Tests: ✓ (12 passed)
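A small sketch of producing and recognizing this format follows. The function names are hypothetical, and the layout (subject line, one bullet per detail, then the test summary, with no blank lines) simply mirrors the example as shown above.

```rust
/// Builds a commit message in the format shown above.
fn commit_message(minion_id: &str, summary: &str, details: &[&str], tests_passed: u32) -> String {
    let mut msg = format!("[minion:{minion_id}] {summary}\n");
    for detail in details {
        msg.push_str("- ");
        msg.push_str(detail);
        msg.push('\n');
    }
    msg.push_str(&format!("Tests: ✓ ({tests_passed} passed)"));
    msg
}

/// Recovers the Minion ID from a commit subject, if it carries the
/// `[minion:<ID>]` prefix.
fn minion_id_from_subject(subject: &str) -> Option<&str> {
    let rest = subject.strip_prefix("[minion:")?;
    let end = rest.find(']')?;
    Some(&rest[..end])
}
```

The second helper lets tooling attribute commits to a Minion when scanning a branch's history.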
GitHub Actions Integration
Lab Responsibilities
- Trigger workflows via repository dispatch or workflow dispatch
- Monitor check runs via GitHub Checks API
- React to failures by fetching logs and attempting fixes
Minion Workflow
- Push commit to branch
- GitHub Actions workflow triggers automatically
- Minion polls check runs status
- On success → proceed to next task or mark ready for review
- On failure → fetch failure logs, analyze, attempt fix, commit retry
Advantages over Local Testing
- ✅ No local test dependencies/setup
- ✅ Proper isolation (containers/VMs)
- ✅ Reuses existing CI configuration
- ✅ Matches human developer workflow
- ✅ No resource limits on Lab host
Issue Dependency DAG (Future)
Support dependencies via issue body metadata:
## Dependencies
- depends-on: #123
- blocks: #456
Implementation:
- Parse issue body on claim
- Check dependency status before starting
- Wait if dependencies not resolved
- Add blocked label with reason
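The parsing step might look like the sketch below. It assumes the metadata lines have exactly the `depends-on: #N` / `blocks: #N` shape shown above; `IssueLinks`, `parse_issue_links`, and `parse_issue_ref` are hypothetical names, and for brevity the scan covers the whole issue body rather than only the lines under the Dependencies heading.

```rust
#[derive(Debug, Default, PartialEq, Eq)]
struct IssueLinks {
    depends_on: Vec<u64>,
    blocks: Vec<u64>,
}

/// Parses an issue reference such as `#123` into its number.
fn parse_issue_ref(text: &str) -> Option<u64> {
    text.trim().strip_prefix('#')?.parse().ok()
}

/// Collects dependency metadata from an issue body.
/// Lines that do not match either form are ignored.
fn parse_issue_links(body: &str) -> IssueLinks {
    let mut links = IssueLinks::default();
    for line in body.lines() {
        // Drop the list marker: "- depends-on: #123" becomes "depends-on: #123".
        let item = line.trim().trim_start_matches('-').trim();
        if let Some(rest) = item.strip_prefix("depends-on:") {
            if let Some(number) = parse_issue_ref(rest) {
                links.depends_on.push(number);
            }
        } else if let Some(rest) = item.strip_prefix("blocks:") {
            if let Some(number) = parse_issue_ref(rest) {
                links.blocks.push(number);
            }
        }
    }
    links
}
```

Checking whether each number in `depends_on` is resolved would then be a separate GitHub lookup before the Minion starts work.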
Projects v2 Integration (Future)
When multi-Lab or advanced tracking needed:
Custom Fields
- Status (single-select): Ready | Claimed | In Progress | Review | Done | Failed
- Minion ID (text)
- Lab (text)
- Cost (number) - LLM tokens/dollars
- Started (date)
- Retries (number)
Advantages
- Visual Kanban board
- Rich querying capabilities
- Better UX for filtering/sorting
- Built-in cost tracking
Note: Defer until single-Lab proven and multi-Lab coordination needed.
File Layout
~/.gru/
repos/
<owner>/
<repo>.git # Bare repository mirror
work/
<owner>/
<repo>/
<MINION_ID>/ # Git worktree for active Minion
.git # Worktree metadata
<repo files> # Working copy
archive/
<MINION_ID>/
events.jsonl # Structured event log
plan.md # Minion's execution plan
commits.log # Git commit history
ci-results.json # CI check run results
state/
minions.db # SQLite: active Minions state
cursors.json # GitHub timeline cursors per issue
config.yaml # Lab configuration
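The layout above maps directly onto a few path helpers. This is a sketch with hypothetical function names; `root` is assumed to be the resolved `~/.gru` directory, since Rust's `Path` does not expand `~` on its own.

```rust
use std::path::{Path, PathBuf};

/// `<root>/repos/<owner>/<repo>.git`: the bare repository mirror.
fn bare_repo_path(root: &Path, owner: &str, repo: &str) -> PathBuf {
    root.join("repos").join(owner).join(format!("{repo}.git"))
}

/// `<root>/work/<owner>/<repo>/<MINION_ID>`: the worktree of an active Minion.
fn worktree_path(root: &Path, owner: &str, repo: &str, minion_id: &str) -> PathBuf {
    root.join("work").join(owner).join(repo).join(minion_id)
}

/// `<root>/archive/<MINION_ID>`: logs and results kept after the Minion finishes.
fn archive_path(root: &Path, minion_id: &str) -> PathBuf {
    root.join("archive").join(minion_id)
}
```

Keeping these in one place means the directory scheme can change without touching the code that spawns or archives Minions.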
Simplified Lifecycle (V1)
Issue Claim
- Poll GitHub for issues with gru:todo label
- Select highest priority: four labeled tiers (priority:critical > priority:high > priority:medium > priority:low; unlabeled falls between medium and low); ties broken by oldest-first (FIFO)
- Add gru:in-progress label, remove gru:todo
- Post structured claim comment with Minion ID, timestamp
- Create branch minion/issue-123-M42
- Create draft PR immediately
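The selection rule in the second step can be sketched as a sort key. `Candidate`, `priority_rank`, and `pick_next` are hypothetical names; `created_at` is assumed to be a Unix timestamp in seconds, and when an issue carries several priority labels the sketch assumes the most urgent one wins.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    number: u64,
    labels: Vec<String>,
    created_at: u64, // Unix seconds (assumed representation)
}

/// Lower rank means more urgent. Unlabeled issues sit between medium and low.
fn priority_rank(labels: &[String]) -> u8 {
    let has = |name: &str| labels.iter().any(|l| l == name);
    if has("priority:critical") {
        0
    } else if has("priority:high") {
        1
    } else if has("priority:medium") {
        2
    } else if has("priority:low") {
        4
    } else {
        3
    }
}

/// Picks the most urgent issue; ties go to the oldest issue (FIFO).
fn pick_next(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates
        .iter()
        .min_by_key(|c| (priority_rank(&c.labels), c.created_at))
}
```

Using a `(rank, created_at)` tuple as the key gives the tie-break for free, because tuples compare element by element.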
Minion Work Loop
1. Read issue description and comments
2. Generate execution plan
3. Implement changes in worktree
4. Run local validation (lint, type check)
5. Commit changes
6. Push to branch → triggers CI
7. Wait for CI results
8. If CI passes → continue or mark ready for review
9. If CI fails → analyze logs, attempt fix, go to step 4
10. Max retries exceeded → add gru:failed label, request human help
PR Submission
- Convert draft PR to ready for review
- Post summary comment with:
- Changes made
- Test results
- Cost estimate (tokens used)
- Confidence score
- Subscribe to PR review events
- Monitor for review comments and check failures
Post-PR Monitoring
- Poll PR for new review comments
- Respond to review feedback:
- Simple changes → implement and push
- Unclear requests → ask clarifying questions
- Complex refactors → create handoff for human
- Monitor check runs for failures
- On merge → add gru:done, archive logs, clean up worktree
Error Handling
Retry Strategy
- Flaky tests → retry up to 3 times with exponential backoff
- CI failures → analyze logs, attempt fix, max 5 iterations
- Rate limits → exponential backoff, switch to lower priority work
- Network errors → retry with jitter
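A minimal sketch of the backoff arithmetic follows. The retry limits come from the list above; the base delay, the cap, and the function names are assumptions, and the random fraction for jitter is supplied by the caller so the example stays std-only and deterministic.

```rust
use std::time::Duration;

/// Limits taken from the retry strategy above.
const FLAKY_TEST_RETRIES: u32 = 3;
const CI_FIX_ITERATIONS: u32 = 5;

/// Exponential backoff: `base * 2^attempt`, never exceeding `cap`.
/// Overflow at large attempt counts saturates to `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Scales a delay by a fraction in [0, 1] ("full jitter" style).
/// `fraction` would come from a random number generator in practice;
/// non-finite values fall back to the unscaled delay.
fn with_jitter(delay: Duration, fraction: f64) -> Duration {
    let f = if fraction.is_finite() {
        fraction.clamp(0.0, 1.0)
    } else {
        1.0
    };
    delay.mul_f64(f)
}

/// True while another attempt is allowed under the given limit.
fn may_retry(attempts_so_far: u32, limit: u32) -> bool {
    attempts_so_far < limit
}
```

For example, a caller could loop while `may_retry(n, CI_FIX_ITERATIONS)` holds and sleep for `with_jitter(backoff_delay(n, base, cap), r)` between attempts.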
Escalation
After exhausting retries:
- Add gru:failed label
- Post detailed failure report comment
- Tag human for assistance
- Park Minion in paused state (don't cleanup)
- Human can resume via attach session or abandon
Observability
Structured Events (events.jsonl)
{"event":"claimed","minion_id":"M42","issue":123,"timestamp":"2025-01-30T12:34:56Z"}
{"event":"plan_generated","tokens":450,"plan":"..."}
{"event":"commit","sha":"abc123","message":"Add auth","tests_passed":true}
{"event":"ci_triggered","workflow":"test","run_id":123456}
{"event":"ci_passed","duration_ms":45000}
{"event":"pr_created","pr_number":789,"draft":false}
Metrics to Track
- Issues claimed per hour
- Time to first commit
- Time to PR submission
- CI pass rate
- Review response time
- Cost per issue (LLM tokens)
- Success rate (merged vs abandoned)
Security
Token Scoping
- Lab GitHub token requires: repo, workflow, read:org
- Store in ~/.gru/config.yaml with restricted permissions (0600)
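The 0600 requirement could be verified at startup along these lines. This sketch is Unix-only because it relies on std::os::unix; is_owner_only and restrict_to_owner are hypothetical helper names.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Returns true when the file's permission bits are exactly 0600
/// (read/write for the owner, nothing for group or others).
fn is_owner_only(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    // Mask off the file-type bits and keep only the permission bits.
    Ok(mode & 0o777 == 0o600)
}

/// Tightens the file to 0600.
fn restrict_to_owner(path: &Path) -> io::Result<()> {
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))
}
```

Whether the Lab should refuse to start or silently fix the mode when the check fails is a policy choice this sketch leaves open.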
Sandbox Considerations
- CI runs in GitHub Actions (already isolated)
- Local worktrees isolated per Minion
- No network access during local validation (future: use network namespace)
Secrets Handling
- Never commit secrets (pre-commit hook checks)
- Minion has no access to repo secrets (only CI does)
- Redact sensitive data from logs and comments
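Log redaction could start as a simple prefix scan. The two prefixes below come from the token examples elsewhere in this document; the prefix list, the redact function, and the [REDACTED] placeholder are assumptions, and a real deployment would likely need more patterns.

```rust
/// Secret prefixes taken from the token examples in this document.
const SECRET_PREFIXES: [&str; 2] = ["ghp_", "sk-ant-"];

/// Replaces every token that starts with a known prefix with "[REDACTED]".
/// A token is the prefix plus the following run of [A-Za-z0-9_-] characters.
fn redact(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(prefix) = SECRET_PREFIXES.iter().find(|p| rest.starts_with(**p)) {
            // Skip the prefix, then consume the token body that follows it.
            let body = &rest[prefix.len()..];
            let end = body
                .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '-'))
                .unwrap_or(body.len());
            out.push_str("[REDACTED]");
            rest = &body[end..];
        } else {
            // Copy one character through unchanged and advance.
            let c = rest.chars().next().unwrap();
            out.push(c);
            rest = &rest[c.len_utf8()..];
        }
    }
    out
}
```

The scan matches prefixes anywhere in the text, so it errs toward over-redaction (a word that merely contains sk-ant- is also masked), which seems the safer failure mode for logs and comments.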
Future Optimizations
Event-Driven Architecture
- Replace polling with GitHub webhooks
- Labs listen for issues.labeled, pull_request.review_requested, check_run.completed
- Reduce API calls and latency
Caching & RAG
- Local embedding index of codebase for semantic search
- Cache GitHub API responses with ETags
- Reuse test results across similar changes
Multi-Lab Coordination
- First-PR-wins for conflict resolution
- Heartbeat comments for liveness detection
- Stale issue reclamation (after 1 hour no activity)
Cost Optimization
- Model selection based on task complexity (Haiku for simple, Sonnet for complex)
- Prompt caching for repeated codebase context
- Incremental context updates (don't resend entire codebase)
Design Constraints
What We're NOT Building (Yet)
- ❌ Multi-Lab distributed locking
- ❌ Real-time collaboration between Minions
- ❌ Custom test execution environments
- ❌ Local LLM support (cloud-only for V1)
- ❌ Web UI (Tower deferred to V2)
- ❌ Slack/notifications
- ❌ Learning from past PRs
- ❌ Code review quality scoring
V1 Scope
- ✅ Single Lab, local execution
- ✅ Multi-repo support (one Lab watches multiple repos)
- ✅ Simple label-based state machine (3 states + done/failed)
- ✅ GitHub Actions for CI
- ✅ Local testing via pre-commit hooks
- ✅ Draft PR workflow
- ✅ Basic error handling and retries
- ✅ Event log in comments + local files
- ✅ CLI-only interface (gru lab)
- ✅ Manual issue prioritization
- ✅ No SQLite (in-memory state, file-based cursors)
Additional V1 Design Decisions
State Management
No SQLite database:
- In-memory state for active Minions
- Simple JSON file for timeline cursors (~/.gru/state/cursors.json)
- Recovery on restart: check Minion registry, fetch issue state from GitHub
- Archive logs to disk for completed/failed Minions
Labels (simplified to 3 states):
- gru:todo → gru:in-progress → gru:done / gru:failed
- No claimed intermediate state (goes directly to in-progress)
- Detailed state (review, blocked, testing) in YAML comment events
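The label flow can be modeled as a small state machine. This is a sketch: the Label enum and its methods are hypothetical, and treating gru:done and gru:failed as terminal (no resume transition out of failed) is an assumption the document does not settle.

```rust
/// The four labels used by the V1 state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Label {
    Todo,
    InProgress,
    Done,
    Failed,
}

impl Label {
    /// The label text as it would appear on the GitHub issue.
    fn as_str(self) -> &'static str {
        match self {
            Label::Todo => "gru:todo",
            Label::InProgress => "gru:in-progress",
            Label::Done => "gru:done",
            Label::Failed => "gru:failed",
        }
    }

    /// Allowed moves: todo -> in-progress, in-progress -> done | failed.
    /// Done and Failed are treated as terminal in this sketch.
    fn can_transition_to(self, next: Label) -> bool {
        matches!(
            (self, next),
            (Label::Todo, Label::InProgress)
                | (Label::InProgress, Label::Done)
                | (Label::InProgress, Label::Failed)
        )
    }
}
```

Checking transitions before calling the GitHub API would catch bugs such as relabeling a finished issue back to gru:todo.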
Minion lifecycle:
- Stay alive indefinitely until PR merged/closed
- No timeout for inactive PRs (occupies slot but ensures responsiveness)
- Failed Minions stay alive for debugging (gru attach to inspect)
- Orphaned Minions (issue closed while running) marked as Orphaned state, kept alive
Testing & CI
Local testing via pre-commit hooks:
- Lab runs repo-init to install git hooks
- Pre-commit hook runs tests automatically before allowing commit
- Minion commits frequently (each logical unit of work)
- Tests run automatically, blocking bad commits
- GitHub Actions runs as secondary verification
CI monitoring (30s poll interval):
- Poll check runs every 30 seconds
- High retry limit (10-15 attempts) before escalating
- On max retries: pause (not fail), request human review
- Minion monitors: failed checks, pending checks, stale branch, merge conflicts
Conflict resolution:
- Minion attempts to resolve merge conflicts
- Runs tests locally to verify resolution
- Only pushes if tests pass
- If tests fail after resolution: pause and request human help
Minion Behavior
Review autonomy:
- Maximally autonomous - implements changes, answers questions, refactors
- Can decline suggestions with reasoning
- Can create follow-up issues for out-of-scope work (creates immediately, links in comment)
- Only escalates when truly stuck
Minion ID format:
- Sequential base36 with padding: M001, M002, ..., M00z, M010, ..., Mzzz
- Compact, human-readable, sortable
- Monotonic counter stored in ~/.gru/state/next_id.txt
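The ID format above can be sketched as a fixed-width base36 encoder. The function name minion_id and the choice to return None once the three-digit space is exhausted are assumptions; the document does not say what happens after Mzzz.

```rust
/// Formats a counter as "M" plus three lowercase base36 digits.
/// Returns None once the counter exceeds "zzz" (36^3 - 1 = 46655).
fn minion_id(counter: u32) -> Option<String> {
    const DIGITS: &[u8; 36] = b"0123456789abcdefghijklmnopqrstuvwxyz";
    if counter >= 36 * 36 * 36 {
        return None;
    }
    let mut id = String::from("M");
    // Emit the most significant digit first so the padding comes out naturally.
    for place in [36 * 36, 36, 1] {
        id.push(DIGITS[((counter / place) % 36) as usize] as char);
    }
    Some(id)
}
```

Because digits sort before lowercase letters in ASCII and the width is fixed, plain string comparison orders these IDs the same way as the underlying counter.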
Branch management:
- Format: minion/issue-<number>-<minion-id>
- Examples: minion/issue-123-M007, minion/issue-456-M00a
- Branches from repository default branch (main/master/develop, detected via API)
- On PR merge: delete both local and remote branch
Branch naming logic:
fn generate_branch_name(issue_number: i32, minion_id: &str) -> String {
    format!("minion/issue-{}-{}", issue_number, minion_id)
}
Configuration
Tokens in environment variables only:
export GRU_GITHUB_TOKEN="ghp_..."
export ANTHROPIC_API_KEY="sk-ant-..."
Config file for non-sensitive settings:
# ~/.gru/config.yaml
repos:
- owner/repo1
- owner/repo2
lab:
slots: 2
poll_interval: 30s
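A value such as 30s has to be turned into a duration somewhere. The sketch below shows one way to parse it with the standard library; parse_interval is a hypothetical helper, and support for the m and h suffixes is an assumption, since the document only shows seconds.

```rust
use std::time::Duration;

/// Parses strings like "30s", "2m", or "1h" into a Duration.
/// Returns None for anything else (missing unit, unknown unit, overflow).
fn parse_interval(text: &str) -> Option<Duration> {
    let text = text.trim();
    // Split off the last character as the unit, respecting UTF-8 boundaries.
    let (idx, _) = text.char_indices().last()?;
    let (number, unit) = text.split_at(idx);
    let value: u64 = number.parse().ok()?;
    let seconds = match unit {
        "s" => value,
        "m" => value.checked_mul(60)?,
        "h" => value.checked_mul(3600)?,
        _ => return None,
    };
    Some(Duration::from_secs(seconds))
}
```

Rejecting a bare number such as 30 forces the config to state its unit explicitly, which avoids seconds-versus-milliseconds confusion.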
Multi-repo support:
- One Lab instance watches all configured repos
- Slots shared across all repos
- Scheduler prioritizes across repos
Global config only (V1):
- Per-repo overrides deferred to V2
- Single ~/.gru/config.yaml
Draft PR Workflow
Initial creation:
- Title: [DRAFT] Fixes #123: <issue title>
- Body: Template with "🤖 Minion M042 is working on this..."
- Created immediately after branch creation
Ready for review:
- Title: Fixes #123: <descriptive title> (remove DRAFT prefix)
- Body: Updated with proper description, changes, approach
- Convert from draft to ready
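The two title formats can be sketched as a pair of helpers. draft_title and ready_title are hypothetical names, and this version only strips the prefix; swapping in a more descriptive title at review time would be a separate step.

```rust
/// Title used when the draft PR is first opened.
fn draft_title(issue: u64, issue_title: &str) -> String {
    format!("[DRAFT] Fixes #{}: {}", issue, issue_title)
}

/// Removes the "[DRAFT] " prefix when the PR is marked ready for review.
/// Titles without the prefix are returned unchanged.
fn ready_title(title: &str) -> &str {
    title.strip_prefix("[DRAFT] ").unwrap_or(title)
}
```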
Prompt Configuration
Minimal initial context:
- Issue description only
- Working directory and branch name
- Guidelines about committing and testing
- Everything else (README, CONTRIBUTING, git history) available in worktree
Commit guideline:
- "Commit after each logical unit of work (tests run automatically via pre-commit hook)"
Archives
Retention policy:
- Keep forever by default (V1)
- Just text files, minimal space
- User can manually clean ~/.gru/archive/
- Add retention config later if needed
CLI Output
Human-readable by default:
$ gru minions list
ID    Issue  Repo         State        Uptime   Commits
───────────────────────────────────────────────────────────
M001  #123   owner/repo   in-progress  15m 32s  2
M002  #456   owner/other  review       2h 14m   5
Add --json flag in future if needed for scripting.
Onboarding
gru init command:
First-time setup wizard that:
- Creates ~/.gru/ directory structure
- Generates template config.yaml with comments
- Checks for required environment variables (GRU_GITHUB_TOKEN, ANTHROPIC_API_KEY)
- Validates GitHub token scopes (requires repo, workflow)
- Tests GitHub API connectivity
- Optionally clones/mirrors configured repos
- Sets up git config (user.name, user.email for commits)
Example flow:
$ gru init
🤖 Gru Setup Wizard
Checking environment variables...
✓ GRU_GITHUB_TOKEN found
✓ ANTHROPIC_API_KEY found
Validating GitHub token...
✓ Token has required scopes: repo, workflow
✓ Connected to GitHub as: username
Creating directory structure...
✓ Created ~/.gru/repos
✓ Created ~/.gru/work
✓ Created ~/.gru/archive
✓ Created ~/.gru/state
✓ Created ~/.gru/logs
Generated config file: ~/.gru/config.yaml
Edit this file to configure repositories and settings.
Repository setup:
? Clone repositories now? (y/n): y
✓ Cloned owner/repo1
✓ Cloned owner/repo2
✓ Setup complete! Run 'gru lab' to start.
Subsequent runs:
- gru lab checks if ~/.gru/ exists
- If missing, suggests running gru init first
- Auto-creates missing subdirectories if root exists
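The startup check described here might look like the following. The subdirectory names are taken from the gru init example output; ensure_layout and its boolean return convention are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Subdirectories listed in the `gru init` example output.
const SUBDIRS: [&str; 5] = ["repos", "work", "archive", "state", "logs"];

/// Returns Ok(false) when the root is missing, in which case the caller
/// should suggest running `gru init`. Otherwise creates any missing
/// subdirectories and returns Ok(true).
fn ensure_layout(root: &Path) -> io::Result<bool> {
    if !root.is_dir() {
        return Ok(false);
    }
    for name in SUBDIRS {
        // create_dir_all is a no-op for directories that already exist.
        fs::create_dir_all(root.join(name))?;
    }
    Ok(true)
}
```

Returning a flag instead of an error for the missing-root case lets the CLI print a friendly hint while still surfacing real I/O failures.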
Self-Review Strategy: Prompt-Based over Structured Loop
Question: How should Minions self-review their work before opening a PR?
Answer: Prompt-based self-review (Option A) — defer structured enforcement loop (Option B)
Context: Issue #515 proposed two main options for self-review:
- Option A (Prompt-based): Add review instructions to the task prompt, relying on the code-reviewer agent
- Option B (Structured loop): Add a dedicated review phase in the orchestration layer with DONE/ITERATE gating
Issue #655 revisited #515 after observing real-world Minion behavior and found Option A already working well.
Data (from #655, estimated from manual audit of Minion runs):
- ~92% of Minion runs invoke the code-reviewer agent via prompt instructions
- When consumed, 100% of reviews find actionable issues (~63% high-priority)
- Minions consistently address review findings when they read them
- The ~8% that skip review have legitimate reasons (duplicate issues, mechanical changes, post-review fixups)
Decision: Option A is sufficient. The prompt-based approach in src/prompt_loader.rs (Section 4: Code Review) already achieves high review rates without orchestration complexity. The main gap was an async fire-and-forget problem where reviews were triggered but not consumed — a prompt-level fix, not an orchestration change. Role guardrails for the review prompt were addressed in #649.
Why not Option B:
- Adds orchestration complexity (DONE/ITERATE parsing, iteration caps, extra agent calls)
- 2-3x token cost increase per Minion
- Solves a problem that prompt engineering handles at ~92% rate
- The remaining gap is addressable by improving prompt reliability, not adding a structured loop
- Option B is additive and can be implemented incrementally if needed later
Revisit when:
- Prompt-based review rate drops below 80%
- Review quality degrades (reviews stop finding actionable issues)
- Multi-agent workflows require explicit review gating
See: #515 (original proposal), #655 (data-driven update), #649 (review role guardrails)
Open Questions (Deferred)
- Comment rate limiting: How often should Minions post progress updates? (Freely vs batched vs significant events only)
- Cost limits: Max tokens per issue before pausing? (Default: unlimited for V1?)
- Per-repo config overrides: When to add support?
- Archive retention: When to add configurable cleanup?