mattpocock/sandcastle

Stars: 5782 | Forks: 576

Sandcastle
## What Is Sandcastle?

A TypeScript library for orchestrating AI coding agents in isolated sandboxes:

1. You invoke agents with a single `sandcastle.run()`.
2. Sandcastle handles sandboxing the agent with a configurable branch strategy.
3. The commits made on the branches get merged back.

Sandcastle is provider-agnostic: it ships with built-in providers for Docker, Podman, and Vercel, and you can create your own.

Great for parallelizing multiple AFK agents, creating review pipelines, or even just orchestrating your own agents.

## Prerequisites

- [Git](https://git-scm.com/)
- A sandbox provider. Sandcastle needs an isolated environment to run agents in. Built-in options:
  - [Docker Desktop](https://www.docker.com/): most common for local development
  - [Podman](https://podman.io/): rootless alternative to Docker
  - [Vercel](https://vercel.com/): cloud-based Firecracker microVMs via `@vercel/sandbox`
  - Or [create your own](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`

## Quick start

1. Install the package:

   ```bash
   npm install --save-dev @ai-hero/sandcastle
   ```

2. Run `npx @ai-hero/sandcastle init`. This scaffolds a `.sandcastle` directory with all the files needed.

   ```bash
   npx @ai-hero/sandcastle init
   ```

3. Copy the example env file to `.sandcastle/.env` and fill in your value for `ANTHROPIC_API_KEY`. If you want to use your Claude subscription instead of an API key, see [#191](https://github.com/mattpocock/sandcastle/issues/191).

   ```bash
   cp .sandcastle/.env.example .sandcastle/.env
   ```

4. Run the `.sandcastle/main.ts` (or `main.mts`) file with `npx tsx`:

   ```bash
   npx tsx .sandcastle/main.ts
   ```

```ts
// Run the agent via the JS API
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(), // or podman(), vercel(), or your own provider
  promptFile: ".sandcastle/prompt.md",
});
```

## Sandbox Providers

Sandcastle uses a `SandboxProvider` to create isolated environments. The `sandbox` option on `run()`, `interactive()`, and `createSandbox()` accepts any provider, including `noSandbox()`, which opts in to running the agent directly on the host when container isolation is undesired.

Built-in providers:

| Provider | Import path | Type | Accepted by |
| --- | --- | --- | --- |
| Docker | `@ai-hero/sandcastle/sandboxes/docker` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Podman | `@ai-hero/sandcastle/sandboxes/podman` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Vercel | `@ai-hero/sandcastle/sandboxes/vercel` | Isolated | `run()`, `createSandbox()`, `interactive()` |
| No-sandbox | `@ai-hero/sandcastle/sandboxes/no-sandbox` | None | `run()`, `createSandbox()`, `interactive()` |

Worktree methods (`wt.run()`, `wt.interactive()`, `wt.createSandbox()`) accept the same providers as their top-level counterparts. `wt.interactive()` defaults to `noSandbox()` when no sandbox is specified.

```ts
import { run, interactive, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { podman } from "@ai-hero/sandcastle/sandboxes/podman";
import { vercel } from "@ai-hero/sandcastle/sandboxes/vercel";
import { noSandbox } from "@ai-hero/sandcastle/sandboxes/no-sandbox";

// Docker, Podman, and Vercel are interchangeable in run() and createSandbox():
await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  prompt: "...",
});

// No-sandbox runs the agent directly on the host. Accepted by run(),
// createSandbox(), and interactive(). Skips container isolation entirely:
await interactive({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: noSandbox(),
  prompt: "...", // optional: omit to launch the TUI with no initial prompt
  cwd: "/path/to/other-repo", // optional: defaults to process.cwd()
});
```

You can also [create your own provider](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`.

## API

Sandcastle exports a programmatic `run()` function for use in scripts, CI pipelines, or custom tooling. The examples below use `docker()`, but any `SandboxProvider` works in its place.

```ts
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  promptFile: ".sandcastle/prompt.md",
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.iterations); // per-iteration results with optional sessionId
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```

### All options

```ts
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  // Agent provider (required). Pass a model string to claudeCode().
  // Optional second arg for provider-specific options like effort level.
  agent: claudeCode("claude-opus-4-7", { effort: "high" }),

  // Sandbox provider (required). Any SandboxProvider works (docker, podman, vercel, or custom).
  // Provider-specific config (like imageName, mounts) lives inside the provider factory call.
  sandbox: docker({
    imageName: "sandcastle:local",
    // Optional: override the UID/GID used for --user flag (defaults to host UID/GID).
    // Must match the UID baked into the image. Pre-flight check catches mismatches.
    // containerUid: 1000,
    // containerGid: 1000,
    // Optional: mount host directories into the sandbox (e.g. package manager caches)
    // hostPath supports absolute, tilde-expanded (~), and relative paths (resolved from cwd).
    // sandboxPath supports absolute and relative paths (resolved from the sandbox repo directory).
    mounts: [
      { hostPath: "~/.npm", sandboxPath: "/home/agent/.npm", readonly: true },
      { hostPath: "data", sandboxPath: "data" }, // mounts <cwd>/data → <sandbox repo>/data
    ],
    // Optional: SELinux volume label: "z" (default, shared), "Z" (private), or false (none).
    // No-op on non-SELinux systems (Docker Desktop on macOS/Windows, Linux without SELinux).
    selinuxLabel: "z",
    // Optional: provider-level env vars merged at launch time
    env: { DOCKER_SPECIFIC: "value" },
    // Optional: attach container to Docker network(s), as a string or string[]
    network: "my-network",
    // Optional: add the container user to supplementary groups via --group-add.
    // Accepts group names or numeric GIDs (e.g. for a bind-mounted Docker socket).
    groups: ["docker", 999],
    // Optional: expose host devices via --device. Each entry is a full device
    // spec in host[:container[:permissions]] form (e.g. "/dev/kvm").
    devices: ["/dev/kvm"],
    // Optional: limit CPU resources via --cpus. Fractional values allowed (e.g. 1.5).
    // cpus: 2,
  }),

  // Host repo directory. Replaces process.cwd() as the anchor for
  // .sandcastle/ artifacts (worktrees, logs, env, patches) and git operations.
  // Relative paths resolve against process.cwd(). Defaults to process.cwd().
  cwd: "../other-repo",

  // Branch strategy: controls how the agent's changes relate to branches.
  // Defaults to { type: "head" } for bind-mount and { type: "merge-to-head" } for isolated providers.
  branchStrategy: { type: "branch", branch: "agent/fix-42" },

  // Prompt source: provide one of these, not both.
  // Note: promptFile resolves against process.cwd(), NOT cwd.
  promptFile: ".sandcastle/prompt.md", // path to a prompt file
  // prompt: "Fix issue #42 in this repo", // OR an inline prompt string

  // Values substituted for {{KEY}} placeholders in the prompt.
  promptArgs: {
    ISSUE_NUMBER: "42",
  },

  // Maximum number of agent iterations to run before stopping. Default: 1
  maxIterations: 5,

  // Display name for this run, shown as a prefix in log output.
  name: "fix-issue-42",

  // Lifecycle hooks grouped by where they run: host or sandbox.
  hooks: {
    host: {
      onWorktreeReady: [{ command: "cp .env.example .env" }],
      onSandboxReady: [{ command: "echo setup done" }],
    },
    sandbox: {
      onSandboxReady: [{ command: "npm install" }],
    },
  },

  // Host-relative file paths to copy into the sandbox before the container starts.
  // Not supported with branchStrategy: { type: "head" }.
  copyToWorktree: [".env"],

  // Override default timeouts for built-in lifecycle steps.
  // Unset keys keep their defaults.
  timeouts: {
    copyToWorktreeMs: 120_000, // default: 60_000
    gitSetupMs: 30_000, // default: 10_000
    commitCollectionMs: 60_000, // default: 30_000
    mergeToHostMs: 60_000, // default: 30_000
  },

  // How to record progress. Default: write to a file under .sandcastle/logs/
  logging: {
    type: "file",
    path: ".sandcastle/logs/my-run.log",
    // Optional: forward the agent's output stream to your own observability system.
    // Fires for each text chunk and tool call the agent produces. Errors thrown
    // by the callback are swallowed so a broken forwarder cannot kill the run.
    onAgentStreamEvent: (event) => {
      // event is { type: "text" | "toolCall", iteration, timestamp, ... }
      // myLogger stands in for your own logger; it is not part of Sandcastle.
      myLogger.info(event);
    },
  },
  // logging: { type: "stdout" }, // OR render an interactive UI in the terminal

  // String (or array of strings) the agent emits to end the iteration loop early.
  // Default: "COMPLETE"
  completionSignal: "COMPLETE",

  // Idle timeout in seconds. Resets whenever the agent produces output. Default: 600 (10 minutes)
  idleTimeoutSeconds: 600,

  // Grace window in seconds after the agent emits a completion signal but
  // before its process has exited (a "hanging process", typically a spawned
  // `gh`/git child or MCP server keeping stdout open). Resets on every
  // subsequent output line so trailing data is still captured. Default: 60
  completionTimeoutSeconds: 60,

  // Structured output: extract a typed payload from the agent's stdout.
  // Requires maxIterations === 1 and the tag must appear in the prompt.
  // output: Output.object({ tag: "result", schema: z.object({ answer: z.number() }) }),
  // output: Output.string({ tag: "summary" }),
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.completionSignal); // matched signal string, or undefined if none fired
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```

### `createSandbox()`: reusable sandbox

Use `createSandbox()` when you need to run multiple agents (or multiple rounds of the same agent) inside a single sandbox. It creates the sandbox once, and you call `sandbox.run()` as many times as you need. This avoids repeated container startup costs and keeps all runs on the same branch.

Use `run()` instead when you only need a single one-shot invocation, since it handles the sandbox lifecycle automatically.

#### Basic single-run usage

```ts
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
});

const result = await sandbox.run({
  agent: claudeCode("claude-opus-4-7"),
  prompt: "Fix issue #42 in this repo.",
});

console.log(result.commits); // [{ sha: "abc123" }]
```

#### Multi-run implement-then-review

```ts
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

// Step 1: implement
const implResult = await sandbox.run({
  agent: claudeCode("claude-opus-4-7"),
  promptFile: ".sandcastle/implement.md",
  maxIterations: 5,
});

// Step 2: review on the same branch, same container
const reviewResult = await sandbox.run({
  agent: claudeCode("claude-sonnet-4-6"),
  prompt: "Review the changes and fix any issues.",
});
```

Commits from all `run()` calls accumulate on the same branch. The sandbox container stays alive between runs, so installed dependencies and build artifacts persist.

#### Automatic cleanup with `await using`

`await using` calls `sandbox.close()` automatically when the block exits. If the sandbox has uncommitted changes, the worktree is preserved on disk; if clean, both container and worktree are removed.

#### Manual `close()` with `CloseResult`

```ts
const sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
});

// ... run agents ...

const closeResult = await sandbox.close();
if (closeResult.preservedWorktreePath) {
  console.log(`Worktree preserved at ${closeResult.preservedWorktreePath}`);
}
```

#### `CreateSandboxOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `branch` | string | - | **Required.** Explicit branch for the sandbox |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (e.g. `docker()`, `podman()`) |
| `cwd` | string | `process.cwd()` | Host repo directory. Relative paths resolve against `process.cwd()` |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`), run once at creation time |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the sandbox at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

#### `Sandbox`

| Property / Method | Type | Description |
| --- | --- | --- |
| `branch` | string | The branch the sandbox is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(SandboxRunOptions) => Promise` | Invoke an agent inside the existing sandbox |
| `interactive(options)` | `(SandboxInteractiveOptions) => Promise` | Launch an interactive session in the sandbox |
| `close()` | `() => Promise` | Tear down the container and sandbox |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto teardown via `await using` |

#### `SandboxRunOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider (e.g. `claudeCode("claude-opus-4-7")`) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file (mutually exclusive with `prompt`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `maxIterations` | number | `1` | Maximum iterations to run |
| `completionSignal` | string \| string[] | `COMPLETE` | String(s) the agent emits to stop the iteration loop early |
| `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds. Resets on each agent output event |
| `completionTimeoutSeconds` | number | `60` | Grace window after the completion signal is seen but the agent process hasn't exited |
| `name` | string | - | Display name for the run |
| `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` |
| `signal` | AbortSignal | - | Cancels the run when aborted; handle stays usable afterward |

#### `SandboxRunResult`

| Field | Type | Description |
| --- | --- | --- |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired |
| `stdout` | string | Combined agent output from all iterations |
| `commits` | `{ sha }[]` | Commits created during the run |
| `logFilePath` | string? | Path to the log file (only when logging to a file) |

#### `CloseResult`

| Field | Type | Description |
| --- | --- | --- |
| `preservedWorktreePath` | string? | Host path to the preserved worktree, set when it had uncommitted changes |

### `createWorktree()`: independent worktree lifecycle

Use `createWorktree()` when you need a worktree (git worktree) as an independent, first-class concept, separate from any sandbox. This is useful when you want to run an interactive session first and then hand the same worktree to a sandboxed AFK agent.

Only `branch` and `merge-to-head` strategies are accepted; `head` is a compile-time type error since it means no worktree.

Pass `cwd` to target a repo other than `process.cwd()`. Relative paths resolve against `process.cwd()`; absolute paths pass through. A `CwdError` is thrown if the path does not exist or is not a directory.

```ts
import { createWorktree, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using wt = await createWorktree({
  branchStrategy: { type: "branch", branch: "agent/fix-42" },
  copyToWorktree: ["node_modules"],
  cwd: "/path/to/other-repo", // optional: defaults to process.cwd()
});

console.log(wt.worktreePath); // host path to the worktree
console.log(wt.branch); // "agent/fix-42"

// Run an interactive session in the worktree (defaults to noSandbox)
await wt.interactive({
  agent: claudeCode("claude-opus-4-7"),
  prompt: "Explore the codebase and understand the bug.",
});

// Run an AFK agent in the worktree (sandbox is required)
const result = await wt.run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker({ imageName: "sandcastle:myrepo" }),
  prompt: "Fix issue #42.",
  maxIterations: 3,
});
console.log(result.commits); // commits made during the run

// Create a long-lived sandbox from the worktree
await using sandbox = await wt.createSandbox({
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

// sandbox.close() tears down the container only; the worktree stays
await sandbox.close();
// wt.close() cleans up the worktree
```

`wt.close()` checks for uncommitted changes: if the worktree is dirty, it's preserved on disk; if clean, it's removed. `await using` calls `close()` automatically.

The worktree persists after `run()`, `interactive()`, and `createSandbox()` complete, so you can hand it to another agent or inspect it.

**Split ownership**: When a sandbox is created via `wt.createSandbox()`, `sandbox.close()` tears down the container only and the worktree remains. `wt.close()` is responsible for worktree cleanup. This differs from the top-level `createSandbox()`, where `sandbox.close()` owns both container and worktree.

#### `CreateWorktreeOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `branchStrategy` | WorktreeBranchStrategy | - | **Required.** `{ type: "branch", branch }` or `{ type: "merge-to-head" }` |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

#### `Worktree`

| Property / Method | Type | Description |
| --- | --- | --- |
| `branch` | string | The branch the worktree is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(options: WorktreeRunOptions) => Promise` | Run an AFK agent in the worktree (sandbox required) |
| `interactive(options)` | `(options: WorktreeInteractiveOptions) => Promise` | Run an interactive agent session in the worktree |
| `createSandbox(options)` | `(options: WorktreeCreateSandboxOptions) => Promise` | Create a long-lived sandbox backed by this worktree |
| `close()` | `() => Promise` | Clean up the worktree (preserves if dirty) |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto cleanup via `await using` |

#### `WorktreeInteractiveOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider |
| `sandbox` | AnySandboxProvider | `noSandbox()` | Sandbox provider (defaults to no sandbox) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file |
| `name` | string | - | Optional session name |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | - | Environment variables to inject into the sandbox |
| `signal` | AbortSignal | - | Cancel the session when aborted. The worktree is preserved on disk. Rejects with `signal.reason`. |

#### `WorktreeRunOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (AFK agents must be sandboxed) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file |
| `maxIterations` | number | 1 | Maximum iterations to run |
| `completionSignal` | string \| string[] | - | Substring(s) to stop the iteration loop early |
| `idleTimeoutSeconds` | number | 600 | Idle timeout in seconds |
| `completionTimeoutSeconds` | number | 60 | Grace window after completion signal is seen but agent process hasn't exited |
| `name` | string | - | Optional run name |
| `logging` | LoggingOption | file | Logging mode |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | - | Environment variables to inject into the sandbox |
| `resumeSession` | string | - | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | - | Cancel the run when aborted. Kills the in-flight agent subprocess; the worktree is preserved on disk. Rejects with `signal.reason`. |

#### `WorktreeRunResult`

| Property | Type | Description |
| --- | --- | --- |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string | The matched completion signal, or undefined |
| `stdout` | string | Combined stdout output from all agent iterations |
| `commits` | `{ sha: string }[]` | List of commits made by the agent during the run |
| `branch` | string | The branch name the agent worked on |
| `logFilePath` | string | Path to the log file, if logging was drained to a file |

#### `WorktreeCreateSandboxOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (e.g. `docker()`) |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

## How it works

Sandcastle uses a **branch strategy** configured on the sandbox provider to control how the agent's changes relate to branches. There are three strategies:

- **Head** (`{ type: "head" }`): The agent writes directly to the host working directory. No worktree, no branch indirection. This is the default for bind-mount providers like `docker()`.
- **Merge-to-head** (`{ type: "merge-to-head" }`): Sandcastle creates a temporary branch in a git worktree. The agent works on the temp branch, and changes are merged back to HEAD when done. The temp branch is cleaned up after merge.
- **Branch** (`{ type: "branch", branch: "foo" }`): Commits land on an explicitly named branch in a git worktree. Re-running with the same branch reuses the existing worktree and fast-forwards it from `origin` when safe; see [ADR 0003](docs/adr/0003-reuse-worktree-by-default.md).

For bind-mount providers (like Docker), the worktree directory is bind-mounted into the container. The agent writes directly to the host filesystem through the mount, so no sync is needed.

From your point of view, you just configure `branchStrategy: { type: 'branch', branch: 'foo' }` on `run()`, and get a commit on branch `foo` once it's complete. All 100% local.

## Prompts

Sandcastle uses a flexible prompt system. You write the prompt, and the engine executes it; no opinions about workflow, task management, or context sources are imposed.

### Prompt resolution

You must provide exactly one of:

1. `prompt: "inline string"`: pass an inline prompt directly via `RunOptions`
2. `promptFile: "./path/to/prompt.md"`: point to a specific file via `RunOptions`

`prompt` and `promptFile` are mutually exclusive, so providing both is an error. If neither is provided, `run()` throws an error asking you to supply one.

**Inline prompts (`prompt: "..."`) are passed to the agent literally.** No `{{KEY}}` substitution, no `` !`command` `` expansion, no built-in `{{SOURCE_BRANCH}}` / `{{TARGET_BRANCH}}` injection. If you need values interpolated into an inline prompt, build the string in JavaScript (`` `Work on ${branch}…` ``). Passing `promptArgs` alongside an inline prompt is an error; switch to `promptFile` to use substitution.

The substitution and expansion features below apply **only** to prompts sourced from `promptFile`.

### Dynamic context with `` !`command` ``

Use `` !`command` `` expressions in your prompt to pull in dynamic context. Each expression is replaced with the command's stdout before the prompt is sent to the agent. All expressions in a prompt run **in parallel** for faster expansion.

Commands run **inside the sandbox** after `sandbox.onSandboxReady` hooks complete, so they see the same repo state the agent sees (including installed dependencies).

```md
# Open issues

!`gh issue list --state open --label Sandcastle --json number,title,body,comments,labels --limit 100`

# Recent commits

!`git log --oneline -10`
```

If any command exits with a non-zero code, the run fails immediately with an error.

### Prompt arguments with `{{KEY}}`

Use `{{KEY}}` placeholders in your prompt to inject values from the `promptArgs` option. This is useful for reusing the same prompt file across multiple runs with different parameters.

```ts
import { run } from "@ai-hero/sandcastle";

await run({
  // ... agent, sandbox
  promptFile: "./my-prompt.md",
  promptArgs: { ISSUE_NUMBER: 42, PRIORITY: "high" },
});
```

In the prompt file:

```md
Work on issue #{{ISSUE_NUMBER}} (priority: {{PRIORITY}}).
```

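To make the `{{KEY}}` rules in this section concrete, here is a minimal sketch of how such substitution could behave. It is not Sandcastle's implementation: the helper name `substitutePromptArgs`, the `[A-Z0-9_]+` key pattern, and the returned `unusedArgs` list are all assumptions made for illustration. It treats a placeholder with no matching argument as an error and reports arguments that were never referenced.

```typescript
// Hypothetical helper for illustration only; not part of @ai-hero/sandcastle.
type PromptArgs = Record<string, string | number | boolean>;

interface SubstitutionResult {
  prompt: string; // template with every {{KEY}} replaced
  unusedArgs: string[]; // keys that were passed but never referenced
}

function substitutePromptArgs(template: string, args: PromptArgs): SubstitutionResult {
  const used = new Set<string>();
  // Assumed placeholder syntax: {{KEY}} with upper-case letters, digits, underscores.
  const prompt = template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (_match: string, key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(args, key)) {
      // A placeholder without a matching argument is treated as an error.
      throw new Error(`No prompt argument provided for {{${key}}}`);
    }
    used.add(key);
    return String(args[key]);
  });
  const unusedArgs = Object.keys(args).filter((key) => !used.has(key));
  return { prompt, unusedArgs };
}

const substituted = substitutePromptArgs(
  "Work on issue #{{ISSUE_NUMBER}} (priority: {{PRIORITY}}).",
  { ISSUE_NUMBER: 42, PRIORITY: "high", EXTRA: "ignored" },
);
console.log(substituted.prompt); // Work on issue #42 (priority: high).
console.log(substituted.unusedArgs); // [ 'EXTRA' ]
```

A caller could surface `unusedArgs` as a warning rather than failing, which mirrors the error/warning split described for prompt arguments.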
Prompt argument substitution runs on the host before shell expression expansion, so `{{KEY}}` placeholders inside `` !`command` `` expressions are replaced first:

```md
!`gh issue view {{ISSUE_NUMBER}} --json body -q .body`
```

A `{{KEY}}` placeholder with no matching prompt argument is an error. Unused prompt arguments produce a warning.

`` !`command` `` expansion only runs on shell blocks written in the prompt file itself. Any `` !`…` `` pattern that appears inside an argument value is treated as inert text and won't be executed against the host shell. This makes it safe to pass user-authored content (issue titles, PR descriptions, docs excerpts) through `promptArgs`.

### Built-in prompt arguments

Sandcastle automatically injects two built-in prompt arguments into every prompt:

| Placeholder | Value |
| --- | --- |
| `{{SOURCE_BRANCH}}` | The branch the agent works on (determined by the branch strategy) |
| `{{TARGET_BRANCH}}` | The host's active branch at `run()` time |

Use them in your prompt without passing them via `promptArgs`:

```md
You are working on {{SOURCE_BRANCH}}. When diffing, compare against {{TARGET_BRANCH}}.
```

Passing `SOURCE_BRANCH` or `TARGET_BRANCH` in `promptArgs` is an error, because built-in prompt arguments cannot be overridden.

### Early termination with `COMPLETE`

When the agent emits the completion signal (`COMPLETE` by default), the orchestrator ends the iteration loop early. This is useful for task-based workflows where the agent should stop once it has finished, rather than running all remaining iterations.

You can override the default signal by passing `completionSignal` to `run()`. It accepts a single string or an array of strings:

```ts
await run({
  // ...
  completionSignal: "DONE",
});

// Or pass multiple signals. The loop stops on the first match:
await run({
  // ...
  completionSignal: ["TASK_COMPLETE", "TASK_ABORTED"],
});
```

Tell the agent to output your chosen string(s) in the prompt, and the orchestrator will stop when it detects any of them.
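
As a rough illustration of the early-termination loop, the sketch below scans each iteration's output for any configured signal and stops at the first hit. The function names, the plain substring check, and the "first signal in list order wins" rule are assumptions for this example, not Sandcastle's actual matching logic.

```typescript
// Hypothetical helpers for illustration only; not part of @ai-hero/sandcastle.
function findCompletionSignal(
  output: string,
  completionSignal: string | string[] = "COMPLETE",
): string | undefined {
  const signals = Array.isArray(completionSignal) ? completionSignal : [completionSignal];
  // Assumption: a simple substring check, first configured signal wins.
  return signals.find((signal) => output.includes(signal));
}

interface LoopOutcome {
  iterations: number;
  completionSignal: string | undefined;
}

// `outputs` stands in for the text each agent iteration would produce.
function runIterationLoop(
  outputs: string[],
  maxIterations: number,
  completionSignal?: string | string[],
): LoopOutcome {
  let iterations = 0;
  let matched: string | undefined;
  for (const output of outputs.slice(0, maxIterations)) {
    iterations++;
    matched = findCompletionSignal(output, completionSignal);
    if (matched !== undefined) break; // stop early, skip remaining iterations
  }
  return { iterations, completionSignal: matched };
}

const outcome = runIterationLoop(
  ["still working", "all tests pass. TASK_COMPLETE", "never reached"],
  5,
  ["TASK_COMPLETE", "TASK_ABORTED"],
);
console.log(outcome); // { iterations: 2, completionSignal: 'TASK_COMPLETE' }
```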
The matched signal is returned as `result.completionSignal`.

#### Hanging processes after the completion signal

The agent process is expected to exit shortly after emitting the completion signal. When a child it spawned (a `gh`/git subprocess, a long-lived MCP server, etc.) inherits the agent's stdout pipe and keeps it open, the parent process can linger long past its logical end. Sandcastle would otherwise wait for the full `idleTimeoutSeconds` and fail with `AgentIdleTimeoutError`, throwing away the commits the agent already made.

Instead, once the completion signal is observed in the output buffer, Sandcastle swaps in a short **completion timeout** (default 60 s). When it expires, the run resolves successfully with a warning that the process was hanging; `result.commits` and `result.completionSignal` are populated as if the process had exited cleanly. The timer resets on every subsequent output line, so trailing data emitted after the signal (token-usage events, terminal `result` events, a structured-output tag) is still captured.

A clean process exit always wins the race, so healthy runs gain zero added latency. The completion timeout only matters when the process hangs.

Tune the window with `completionTimeoutSeconds`:

```ts
await run({
  // ...
  completionTimeoutSeconds: 30, // shorter grace window
});
```

This is independent of `idleTimeoutSeconds`. They cover different phases: `idleTimeoutSeconds` runs **before** any signal is seen (genuinely stuck agent → fail); `completionTimeoutSeconds` runs **after** the signal is seen (hanging process → succeed with warning). See [ADR 0019](docs/adr/0019-completion-timeout-for-hanging-process.md).

### Structured output

Use `Output.object()` to extract a typed, schema-validated JSON payload from the agent's stdout. The agent emits its answer inside an XML tag you specify, and Sandcastle parses, validates, and returns it on `result.output`.
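
The extraction step can be pictured with a small stand-alone sketch: pull the contents of an XML-style tag out of a stdout string, parse it as JSON, and hand it to a validator. Everything here (`extractTag`, `extractObject`, the regex approach, the hand-written validator) is a hypothetical illustration and not how Sandcastle implements `Output.object()`.

```typescript
// Hypothetical helpers for illustration only; not part of @ai-hero/sandcastle.
function extractTag(stdout: string, tag: string): string | undefined {
  // Escape regex metacharacters so the tag name is matched literally.
  const escaped = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = new RegExp(`<${escaped}>([\\s\\S]*?)</${escaped}>`).exec(stdout);
  return match?.[1]?.trim();
}

function extractObject<T>(stdout: string, tag: string, validate: (value: unknown) => T): T {
  const raw = extractTag(stdout, tag);
  if (raw === undefined) {
    throw new Error(`No <${tag}> block found in agent output`);
  }
  // JSON.parse throws on malformed JSON; validate throws on a shape mismatch.
  return validate(JSON.parse(raw));
}

// A hand-written validator standing in for a schema library.
function validateReview(value: unknown): { summary: string; score: number } {
  const candidate = value as { summary?: unknown; score?: unknown } | null;
  if (
    candidate === null ||
    typeof candidate !== "object" ||
    typeof candidate.summary !== "string" ||
    typeof candidate.score !== "number"
  ) {
    throw new Error("Payload does not match { summary: string; score: number }");
  }
  return { summary: candidate.summary, score: candidate.score };
}

const agentStdout = 'Done.\n<result>{"summary":"Looks fine","score":8}</result>\n';
const review = extractObject(agentStdout, "result", validateReview);
console.log(review.summary); // Looks fine
console.log(review.score); // 8
```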
The schema can be any [Standard Schema](https://standardschema.dev) validator. The examples below use [Zod](https://zod.dev), but Valibot, ArkType, and others work identically. See [ADR 0010](docs/adr/0010-structured-output.md) for design rationale.

```typescript
import { run, Output, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { z } from "zod";

const result = await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  prompt: `Analyze the code, and output the result as JSON inside <result> tags.
    The result must match this schema:
    { summary: string; score: number }
  `,
  output: Output.object({
    tag: "result",
    schema: z.object({ summary: z.string(), score: z.number() }),
  }),
});

console.log(result.output.summary); // typed as string
console.log(result.output.score); // typed as number
```

`Output.string({ tag })` extracts the tag contents as a plain string (trimmed, no JSON parsing). Both helpers require `maxIterations` to be `1` (the default). The resolved prompt must contain the configured opening tag literal.

When extraction or validation fails, `run()` throws a `StructuredOutputError`. Alongside `tag`, `rawMatched`, `cause`, `commits`, `branch`, and `preservedWorktreePath`, the error carries the `sessionId` (and `sessionFilePath`, when the session was captured) of the run that produced the bad output. You can resume that session to ask the agent to re-emit corrected output, without repeating the work:

```typescript
import { run, Output, StructuredOutputError } from "@ai-hero/sandcastle";

// Runs inside an async function where `opts` and `output` are already defined.
try {
  return await run({ ...opts, output });
} catch (e) {
  if (e instanceof StructuredOutputError && e.sessionId) {
    // Resume the failed session and ask only for corrected output.
    return await run({
      ...opts,
      output,
      resumeSession: e.sessionId,
      prompt: `Your previous output failed: ${e.message}. Re-emit it inside <${e.tag}> tags.`,
    });
  }
  throw e;
}
```

### Templates

| Template | Description |
| --- | --- |
| `blank` | Bare scaffold: write your own prompt and orchestration |
| `simple-loop` | Picks issues one by one and closes them |
| `sequential-reviewer` | Implements issues one by one, with a code review step after each |
| `parallel-planner` | Plans parallelizable issues, executes on separate branches, then merges |
| `parallel-planner-with-review` | Plans parallelizable issues, executes with per-branch review, then merges |

Select a template during `sandcastle init` when prompted, or re-run init in a fresh repo to try a different one.

## CLI commands

### `sandcastle init`

Scaffolds the `.sandcastle/` config directory and builds the container image. This is the first command you run in a new repo. You choose a sandbox provider (Docker or Podman) during init. Selecting Podman writes a `Containerfile` instead of a `Dockerfile` and uses `sandcastle podman build-image` for the build step.

Init detects your host package manager (npm, pnpm, yarn, or bun) from a `packageManager` field or lockfile, defaulting to npm. Some templates have a `main` file that imports a host dependency; for example, the planner templates import [Zod](https://zod.dev) for their output schema. When that dependency isn't already in your `package.json`, init prompts you to install it with the detected package manager, so the first `npx tsx .sandcastle/main.ts` doesn't fail with `ERR_MODULE_NOT_FOUND`.

Every interactive prompt has a paired `--flag` so the entire init can run non-interactively (e.g. in CI or a scripted setup). When stdin is not a TTY and a required flag is missing, init fails fast with a clear error rather than wedging on a prompt.
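The package-manager detection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Sandcastle's implementation: `detectPackageManager` is a hypothetical helper, and both the precedence (the `packageManager` field first, then lockfiles) and the lockfile check order are guesses.

```typescript
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Lockfile names conventionally written by each package manager.
const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lockb", "bun"],
  ["bun.lock", "bun"],
  ["package-lock.json", "npm"],
];

// Hypothetical helper: checks the `packageManager` field, then lockfiles,
// then falls back to npm. The precedence is an assumption.
function detectPackageManager(
  pkg: { packageManager?: string },
  filesInRepoRoot: string[],
): PackageManager {
  // The field uses the form "<name>@<version>", e.g. "pnpm@9.1.0".
  const name = pkg.packageManager?.split("@")[0];
  if (name === "npm" || name === "pnpm" || name === "yarn" || name === "bun") {
    return name;
  }
  for (const [file, manager] of LOCKFILES) {
    if (filesInRepoRoot.includes(file)) return manager;
  }
  return "npm";
}

console.log(detectPackageManager({ packageManager: "pnpm@9.1.0" }, []));
console.log(detectPackageManager({}, ["yarn.lock", "package.json"]));
console.log(detectPackageManager({}, ["package.json"]));
```

Keeping the function pure (it takes the parsed `package.json` and a file listing rather than touching the disk) makes the fallback order easy to test.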
| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--image-name` | No | `sandcastle:` | Docker image name |
| `--agent` | No | Interactive prompt | Agent to use (`claude-code`, `pi`, `codex`, `cursor`, `opencode`, `copilot`) |
| `--model` | No | Agent's default model | Model to use (e.g. `claude-sonnet-4-6`) |
| `--sandbox` | No | Interactive prompt | Sandbox provider to use (`docker`, `podman`) |
| `--template` | No | Interactive prompt | Template to scaffold (e.g. `blank`, `simple-loop`) |
| `--issue-tracker` | No | Interactive prompt | Issue tracker to use (`github-issues`, `beads`, `custom`) |
| `--create-label` | No | Interactive prompt | `true` / `false`: whether to create the `Sandcastle` GitHub label (only with `--issue-tracker github-issues`) |
| `--build-image` | No | Interactive prompt | `true` / `false`: whether to build the sandbox image now (silently ignored with `--issue-tracker custom`) |
| `--install-template-deps` | No | Interactive prompt | `true` / `false`: whether to install template host deps (e.g. `zod` for the planner templates) |

Creates the following files:

```
.sandcastle/
├── Dockerfile      # Sandbox environment (customize as needed)
├── prompt.md       # Agent instructions
├── .env.example    # Token placeholders
└── .gitignore      # Ignores .env, logs/
```

Errors if `.sandcastle/` already exists to prevent overwriting customizations.

### `sandcastle docker build-image`

Rebuilds the Docker image from an existing `.sandcastle/` directory. Use this after modifying the Dockerfile.

On Linux/macOS, the build automatically passes `--build-arg AGENT_UID=$(id -u)` and `AGENT_GID=$(id -g)` so the image's `agent` user matches the host UID. This prevents permission errors on image-built files without a runtime chown.
| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--image-name` | No | `sandcastle:` | Docker image name |
| `--dockerfile` | No | - | Path to a custom Dockerfile (build context will be the current working directory) |

### `sandcastle docker remove-image`

Removes the Docker image.

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--image-name` | No | `sandcastle:` | Docker image name |

### `sandcastle podman build-image`

Builds the Podman image from an existing `.sandcastle/` directory. Use this after modifying the Containerfile.

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--image-name` | No | `sandcastle:` | Podman image name |
| `--containerfile` | No | - | Path to a custom Containerfile (build context will be the current working directory) |

### `sandcastle podman remove-image`

Removes the Podman image.

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--image-name` | No | `sandcastle:` | Podman image name |

### `RunOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider (e.g. `claudeCode("claude-opus-4-7")`, `pi("claude-sonnet-4-6")`, `codex("gpt-5.4-mini")`, `cursor("composer-2")`, `opencode("opencode/big-pickle")`, `copilot("claude-sonnet-4.5")`) |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (e.g. `docker()`, `podman()`, `docker({ imageName: "sandcastle:local" })`) |
| `cwd` | string | `process.cwd()` | Host repo directory: the anchor for `.sandcastle/` artifacts and git operations. Relative paths resolve against `process.cwd()`. |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file (mutually exclusive with `prompt`). Resolves against `process.cwd()`, **not** `cwd`. |
| `maxIterations` | number | `1` | Maximum iterations to run |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `name` | string | - | Display name for the run, shown as a prefix in log output |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `branchStrategy` | BranchStrategy | per-provider default | Branch strategy: `{ type: 'head' }`, `{ type: 'merge-to-head' }`, or `{ type: 'branch', branch: '…' }` |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the sandbox before start (not supported with `branchStrategy: { type: 'head' }`) |
| `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` |
| `completionSignal` | string \| string[] | `COMPLETE` | String or array of strings the agent emits to stop the iteration loop early |
| `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds; resets on each agent output event |
| `completionTimeoutSeconds` | number | `60` | Grace window in seconds after the completion signal is observed but the agent process has not exited (hanging process). See [Hanging processes after the completion signal](#hanging-processes-after-the-completion-signal). |
| `resumeSession` | string | - | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | - | Cancel the run when aborted. Kills the in-flight agent subprocess and cancels lifecycle hooks; the worktree is preserved on disk. Rejects with `signal.reason`. |
| `timeouts` | Timeouts | - | Override default timeouts for built-in lifecycle steps: `copyToWorktreeMs` (60 000), `gitSetupMs` (10 000), `commitCollectionMs` (30 000), `mergeToHostMs` (30 000). |
| `output` | OutputDefinition | - | Structured output definition (`Output.object(…)` or `Output.string(…)`). Requires `maxIterations === 1`. See [Structured output](#structured-output). |

### `RunResult`

| Field | Type | Description |
| --- | --- | --- |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired |
| `stdout` | string | Agent output |
| `commits` | `{ sha }[]` | Commits created during the run |
| `branch` | string | Target branch name |
| `logFilePath` | string? | Path to the log file (only when logging to a file) |
| `output` | T? | Typed structured output (only present when `output` option is set) |

### `IterationResult`

| Field | Type | Description |
| --- | --- | --- |
| `sessionId` | string? | Agent session ID from the provider stream, or `undefined` if the provider does not emit one |
| `sessionFilePath` | string? | Absolute host path to the captured session JSONL, or `undefined` when capture is off |
| `usage` | `IterationUsage`? | Token usage snapshot from the last assistant message, or `undefined` when capture is off or the provider does not support usage parsing |

### `IterationUsage`

| Field | Type | Description |
| --- | --- | --- |
| `inputTokens` | number | Input tokens consumed |
| `cacheCreationInputTokens` | number | Tokens used to create prompt cache entries |
| `cacheReadInputTokens` | number | Tokens read from prompt cache |
| `outputTokens` | number | Output tokens generated |

### Session capture

After each resumable provider iteration, Sandcastle automatically captures the agent's session file from the sandbox to the host. Claude Code sessions are stored under `~/.claude/projects//.jsonl`; Codex sessions are stored under `~/.codex/sessions/YYYY/MM/DD/rollout-*-.jsonl`; Pi sessions are stored under `~/.pi/agent/sessions/----/_.jsonl`. Any provider-specific `cwd` fields are rewritten to match the host repo root, so the provider's native resume command works.

Session capture is enabled by default for `claudeCode()`, `codex()`, and `pi()` and can be opted out via `captureSessions: false`. Providers without `sessionStorage` do not attempt capture. Capture failure fails the run.

### Session resume

Pass `resumeSession` to `run()` to continue a prior Claude Code, Codex, or Pi conversation inside a new sandbox:

```typescript
const result = await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  prompt: "Continue where you left off",
  resumeSession: "abc-123-def",
});
```

You can also continue the last captured session from a result:

```typescript
const first = await run({
  agent: codex("gpt-5.4-mini"),
  sandbox: docker(),
  prompt: "Draft a plan",
});

const second = await first.resume?.("Now implement the plan");
```

`resume` is present only on results from resumable providers (Claude Code, Codex, Pi), hence the optional-chaining call.
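As a small illustration of why the optional-chaining call matters, the sketch below falls back to a fresh run when `resume` is absent. `ResumableResult`, `continueOrRestart`, and `fresh` are hypothetical stand-ins, not Sandcastle types; only the optional `resume` method mirrors the behaviour described above.

```typescript
// Hypothetical result shape: only the optional `resume` method mirrors the text.
interface ResumableResult {
  stdout: string;
  resume?: (prompt: string) => Promise<ResumableResult>;
}

// Continue the previous session when the provider supports it; otherwise
// start over with the supplied fallback.
function continueOrRestart(
  previous: ResumableResult,
  prompt: string,
  restart: (prompt: string) => Promise<ResumableResult>,
): Promise<ResumableResult> {
  // `previous.resume?.(prompt)` evaluates to undefined when `resume` is absent.
  return previous.resume?.(prompt) ?? restart(prompt);
}

// Stand-in for starting a brand-new run.
const fresh = (prompt: string): Promise<ResumableResult> =>
  Promise.resolve({ stdout: `fresh: ${prompt}` });

const nonResumable: ResumableResult = { stdout: "done" };
const resumable: ResumableResult = {
  stdout: "done",
  resume: (prompt) => Promise.resolve({ stdout: `resumed: ${prompt}` }),
};

continueOrRestart(nonResumable, "next step", fresh).then((r) =>
  console.log(r.stdout),
);
continueOrRestart(resumable, "next step", fresh).then((r) =>
  console.log(r.stdout),
);
```

Whether a silent fallback is appropriate depends on the workflow; in some pipelines it is better to fail loudly when a session cannot be resumed.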
Before the sandbox starts, Sandcastle validates that the session file exists on the host and transfers it into the sandbox with `cwd` fields rewritten to match the sandbox-side path. Claude Code receives `--resume `; Codex receives `codex exec resume ` with the prompt piped over stdin; Pi receives `--session `.

Constraints:

- `resumeSession` is incompatible with `maxIterations > 1` (throws before sandbox creation).
- The provider's host session file must exist (throws before sandbox creation).
- Only iteration 1 receives the resume flag; subsequent iterations (if any) start fresh.
- Providers without resume support reject `resumeSession`.

### Session fork

`RunResult.fork(prompt, options?)` is the sibling of `.resume()`: it continues from the last captured session but leaves the parent session JSONL untouched and writes the child under a new session id. The mechanism is the agent's native fork flag: `claude --resume --fork-session` for Claude Code, `codex exec fork ` for Codex.

Fork enables fan-out workflows where a single parent run is the starting point for several independent children:

```typescript
const parent = await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  prompt: "Read the codebase and summarise the data model",
});

const [reviewA, reviewB] = await Promise.all([
  parent.fork?.("Review the migration plan", {
    branchStrategy: { type: "branch", branch: "review-a" },
  }),
  parent.fork?.("Audit the auth layer", {
    branchStrategy: { type: "branch", branch: "review-b" },
  }),
]);
```

**Fork is session-only.** `--fork-session` and `codex exec fork` isolate the agent session JSONL; they do **not** isolate the branch, worktree, or sandbox. Safe concurrent fan-out (`Promise.all([r.fork(a), r.fork(b)])`) requires the caller to give each child a distinct `branch` via `branchStrategy: { type: "branch", branch: "..." }`. The default `head` and `merge-to-head` strategies are **not** safe for concurrent forks: `head` shares the host working directory across all children, and `merge-to-head` races `git merge` against the same HEAD. See [ADR 0018](docs/adr/0018-fork-is-session-only.md).

`fork` is present only on results from providers with `sessionStorage` (Claude Code, Codex), hence the optional-chaining call. The same single-iteration and session-file constraints as `.resume()` apply.

### `ClaudeCodeOptions`

The `claudeCode()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: claudeCode("claude-opus-4-7", { effort: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` \| `"max"` | - | Claude Code reasoning effort level (`max` is Opus only) |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture agent session JSONL to host for `claude --resume` |

### `CodexOptions`

The `codex()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: codex("gpt-5.4", { effort: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | - | Codex reasoning effort level via `model_reasoning_effort` |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture Codex rollout JSONL to host for resume |

### `PiOptions`

The `pi()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: pi("claude-sonnet-4-6", { thinking: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `thinking` | `"off"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | - | Pi reasoning effort level via the `--thinking` flag |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture pi session JSONL to host for `pi --session ` |

### Provider `env`

Both **agent providers** and **sandbox providers** accept an optional `env: Record<string, string>` in their options. These environment variables are merged with the `.sandcastle/.env` resolver output at launch time:

```typescript
await run({
  agent: claudeCode("claude-opus-4-7", {
    env: { ANTHROPIC_API_KEY: "sk-ant-..." },
  }),
  sandbox: docker({
    env: { DOCKER_SPECIFIC_VAR: "value" },
  }),
  prompt: "Fix issue #42",
});
```

**Merge rules:**

- Provider env (agent + sandbox) overrides `.sandcastle/.env` resolver output for shared keys
- Agent provider env and sandbox provider env **must not overlap**: if they share any key, `run()` throws an error
- When `env` is not provided, it defaults to `{}`

Environment variables are also resolved automatically from `.sandcastle/.env` and `process.env`, so there is no need to pass them to the API. The required variables depend on the **agent provider** (see `sandcastle init` output for details).

## Custom Sandbox Providers

Sandcastle ships with built-in providers for Docker, Podman, and Vercel, but you can create your own. A sandbox provider tells Sandcastle how to execute commands in an isolated environment. There are two kinds:

- **Bind-mount**: the sandbox can mount a host directory. Sandcastle creates a worktree on the host and the provider mounts it in. No file sync needed. Use this for Docker, Podman, or any local container runtime.
- **Isolated**: the sandbox has its own filesystem (e.g. a cloud VM). The provider handles syncing code in and out via `copyIn` and `copyFileOut`. Use this when the sandbox cannot access the host filesystem.

### The sandbox handle contract

Both provider types return a **sandbox handle** from their `create()` function. The handle exposes:

| Method | Required by | Description |
| --- | --- | --- |
| `exec` | Both | Run a command, optionally streaming stdout line-by-line via `options.onLine` |
| `close` | Both | Tear down the sandbox |
| `copyFileIn` | Bind-mount | Copy a single file from the host into the sandbox |
| `copyFileOut` | Both | Copy a single file from the sandbox to the host |
| `copyIn` | Isolated | Copy a file or directory from the host into the sandbox |
| `worktreePath` | Both | Absolute path to the repo directory inside the sandbox |

### `ExecResult`

Every `exec` call returns an `ExecResult`:

```typescript
interface ExecResult {
  readonly stdout: string;
  readonly stderr: string;
  readonly exitCode: number;
}
```

### Bind-mount provider example

A minimal bind-mount provider that shells out to local processes (no container):

```typescript
import {
  createBindMountSandboxProvider,
  type BindMountCreateOptions,
  type BindMountSandboxHandle,
  type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
import { copyFile as fsCopyFile, mkdir as fsMkdir } from "node:fs/promises";
import { dirname } from "node:path";
import { createInterface } from "node:readline";

const localProcess = () =>
  createBindMountSandboxProvider({
    name: "local-process",
    create: async (
      options: BindMountCreateOptions,
    ): Promise<BindMountSandboxHandle> => {
      const worktreePath = options.worktreePath;
      return {
        worktreePath,
        exec: (
          command: string,
          opts?: { onLine?: (line: string) => void; cwd?: string },
        ): Promise<ExecResult> => {
          // Streaming path: spawn the command and forward stdout line by line.
          if (opts?.onLine) {
            const onLine = opts.onLine;
            return new Promise((resolve, reject) => {
              const proc = spawn("sh", ["-c", command], {
                cwd: opts?.cwd ?? worktreePath,
                stdio: ["ignore", "pipe", "pipe"],
              });
              const stdoutChunks: string[] = [];
              const stderrChunks: string[] = [];
              const rl = createInterface({ input: proc.stdout! });
              rl.on("line", (line) => {
                stdoutChunks.push(line);
                onLine(line); // forward each line to Sandcastle
              });
              proc.stderr!.on("data", (chunk: Buffer) => {
                stderrChunks.push(chunk.toString());
              });
              proc.on("error", (err) => reject(err));
              proc.on("close", (code) => {
                resolve({
                  stdout: stdoutChunks.join("\n"),
                  stderr: stderrChunks.join(""),
                  exitCode: code ?? 0,
                });
              });
            });
          }
          // Buffered path: collect all output, then resolve once.
          return new Promise((resolve, reject) => {
            execFile(
              "sh",
              ["-c", command],
              { cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
              (error, stdout, stderr) => {
                if (error && error.code === undefined) {
                  reject(new Error(`exec failed: ${error.message}`));
                } else {
                  resolve({
                    stdout: stdout.toString(),
                    stderr: stderr.toString(),
                    exitCode: typeof error?.code === "number" ? error.code : 0,
                  });
                }
              },
            );
          });
        },
        copyFileIn: async (hostPath: string, sandboxPath: string) => {
          await fsMkdir(dirname(sandboxPath), { recursive: true });
          await fsCopyFile(hostPath, sandboxPath);
        },
        copyFileOut: async (sandboxPath: string, hostPath: string) => {
          await fsMkdir(dirname(hostPath), { recursive: true });
          await fsCopyFile(sandboxPath, hostPath);
        },
        close: async () => {
          // nothing to tear down for a local process
        },
      };
    },
  });
```

### Isolated provider example

A minimal isolated provider using a temp directory:

```typescript
import {
  createIsolatedSandboxProvider,
  type IsolatedSandboxHandle,
  type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
// `cp` and `stat` are needed by `copyIn` below.
import { copyFile, cp, mkdir, mkdtemp, rm, stat } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { createInterface } from "node:readline";

const tempDir = () =>
  createIsolatedSandboxProvider({
    name: "temp-dir",
    create: async (): Promise<IsolatedSandboxHandle> => {
      const root = await mkdtemp(join(tmpdir(), "sandbox-"));
      const worktreePath = join(root, "workspace");
      await mkdir(worktreePath, { recursive: true });
      return {
        worktreePath,
        exec: (
          command: string,
          opts?: { onLine?: (line: string) => void; cwd?: string },
        ): Promise<ExecResult> => {
          // Streaming path: spawn the command and forward stdout line by line.
          if (opts?.onLine) {
            const onLine = opts.onLine;
            return new Promise((resolve, reject) => {
              const proc = spawn("sh", ["-c", command], {
                cwd: opts?.cwd ?? worktreePath,
                stdio: ["ignore", "pipe", "pipe"],
              });
              const stdoutChunks: string[] = [];
              const stderrChunks: string[] = [];
              const rl = createInterface({ input: proc.stdout! });
              rl.on("line", (line) => {
                stdoutChunks.push(line);
                onLine(line);
              });
              proc.stderr!.on("data", (chunk: Buffer) => {
                stderrChunks.push(chunk.toString());
              });
              proc.on("error", (err) => reject(err));
              proc.on("close", (code) => {
                resolve({
                  stdout: stdoutChunks.join("\n"),
                  stderr: stderrChunks.join(""),
                  exitCode: code ?? 0,
                });
              });
            });
          }
          // Buffered path: collect all output, then resolve once.
          return new Promise((resolve, reject) => {
            execFile(
              "sh",
              ["-c", command],
              { cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
              (error, stdout, stderr) => {
                if (error && error.code === undefined) {
                  reject(new Error(`exec failed: ${error.message}`));
                } else {
                  resolve({
                    stdout: stdout.toString(),
                    stderr: stderr.toString(),
                    exitCode: typeof error?.code === "number" ? error.code : 0,
                  });
                }
              },
            );
          });
        },
        copyIn: async (hostPath: string, sandboxPath: string) => {
          // Directories are copied recursively; single files are copied directly.
          const info = await stat(hostPath);
          if (info.isDirectory()) {
            await cp(hostPath, sandboxPath, { recursive: true });
          } else {
            await mkdir(dirname(sandboxPath), { recursive: true });
            await copyFile(hostPath, sandboxPath);
          }
        },
        copyFileOut: async (sandboxPath: string, hostPath: string) => {
          await mkdir(dirname(hostPath), { recursive: true });
          await copyFile(sandboxPath, hostPath);
        },
        close: async () => {
          await rm(root, { recursive: true, force: true });
        },
      };
    },
  });
```

### Branch strategies

A branch strategy controls where the agent's commits land.
Set it with the `branchStrategy` option on `run()`; the default depends on the provider type:

| Strategy | Behavior | Bind-mount | Isolated |
| --- | --- | --- | --- |
| `head` | Agent writes directly to the host working directory. No worktree created | Default | N/A |
| `merge-to-head` | Sandcastle creates a temp branch, merges back to HEAD when done | Supported | Default |
| `branch` | Commits land on an explicit named branch you provide | Supported | Supported |

**When to use each:**

- **`head`**: fast iteration during development. No branch indirection, no merge step. Only works with bind-mount providers since the agent needs direct host filesystem access.
- **`merge-to-head`**: safe default for automation. The agent works on a throwaway branch; if something goes wrong, HEAD is untouched. Use this for CI or unattended runs.
- **`branch`**: when you want commits on a specific branch (e.g. for a PR). Pass `{ type: "branch", branch: "agent/fix-42" }`.
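The strategy shapes and per-provider defaults above can be modelled as a discriminated union. The sketch below is illustrative only: `ProviderKind`, `resolveBranchStrategy`, and `describeStrategy` are hypothetical names, and throwing for `head` on an isolated provider is an assumption based on the N/A cell in the table, not documented Sandcastle behaviour.

```typescript
// Strategy shapes as listed for the `branchStrategy` option.
type BranchStrategy =
  | { type: "head" }
  | { type: "merge-to-head" }
  | { type: "branch"; branch: string };

// Hypothetical label for the two provider kinds.
type ProviderKind = "bind-mount" | "isolated";

// Hypothetical helper: pick the per-provider default when no strategy is
// requested, and reject the one combination the table marks as N/A.
function resolveBranchStrategy(
  kind: ProviderKind,
  requested?: BranchStrategy,
): BranchStrategy {
  if (!requested) {
    return kind === "bind-mount" ? { type: "head" } : { type: "merge-to-head" };
  }
  if (requested.type === "head" && kind === "isolated") {
    throw new Error("head requires a bind-mount provider");
  }
  return requested;
}

// Exhaustive switch: the compiler flags any strategy type left unhandled.
function describeStrategy(strategy: BranchStrategy): string {
  switch (strategy.type) {
    case "head":
      return "write directly to the host working directory";
    case "merge-to-head":
      return "work on a temp branch, then merge back to HEAD";
    case "branch":
      return `commit to ${strategy.branch}`;
  }
}

console.log(describeStrategy(resolveBranchStrategy("bind-mount")));
console.log(describeStrategy(resolveBranchStrategy("isolated")));
console.log(
  describeStrategy(
    resolveBranchStrategy("isolated", { type: "branch", branch: "agent/fix-42" }),
  ),
);
```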
Branch strategy is now configured on `run()`, not on the provider:

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

// head: direct write, bind-mount only (default for bind-mount providers)
await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  prompt: "…",
});

// merge-to-head: temp branch, merge back (default for isolated providers)
await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: tempDir(),
  prompt: "…",
});

// branch: explicit named branch
await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: docker(),
  branchStrategy: { type: "branch", branch: "agent/fix-42" },
  prompt: "…",
});
```

### Passing to `run()`

Pass your custom provider via the `sandbox` option. It works the same as the built-in `docker()` provider:

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";

const result = await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: localProcess(), // your custom provider
  prompt: "Fix issue #42 in this repo.",
});
```

### Reference implementations

For real-world examples, see:

## Configuration

### Config directory (`.sandcastle/`)

All per-repo sandbox configuration lives in `.sandcastle/`. Run `sandcastle init` to create it.

### Custom Dockerfile

The `.sandcastle/Dockerfile` controls the sandbox environment. The default template installs:

- **Node.js 22** (base image)
- **git**, **curl**, **jq** (system dependencies)
- **GitHub CLI** (`gh`)
- **Claude Code CLI**
- A non-root `agent` user (required: Claude runs as this user)

When customizing the Dockerfile, ensure you keep:

- A non-root user (the default `agent` user) for Claude to run as
- `git` (required for commits and branch operations)
- `gh` (required for issue fetching)
- Claude Code CLI installed and on PATH

Add your project-specific dependencies (e.g., language runtimes, build tools) to the Dockerfile as needed.
### Hooks

Hooks are grouped by **where** they run: `host` (on the developer's machine) or `sandbox` (inside the container):

```typescript
hooks: {
  host: {
    onWorktreeReady: [{ command: "cp .env.example .env" }],
    onSandboxReady: [{ command: "echo sandbox is up" }],
  },
  sandbox: {
    onSandboxReady: [
      { command: "npm install", timeoutMs: 300_000 },
      { command: "apt-get install -y ffmpeg", sudo: true },
    ],
  },
}
```

| Hook | Runs on | When | Working directory |
| --- | --- | --- | --- |
| `host.onWorktreeReady` | Host | After `copyToWorktree`, before sandbox start | Worktree path (host repo root under `head`) |
| `host.onSandboxReady` | Host | After sandbox is up | Worktree path (host repo root under `head`) |
| `sandbox.onSandboxReady` | Sandbox | After sandbox is up | Sandbox repo directory |

**Ordering:** `copyToWorktree` -> `host.onWorktreeReady` (sequential) -> sandbox created -> `host.onSandboxReady` + `sandbox.onSandboxReady` (parallel).

- **Host hooks** accept `{ command: string; timeoutMs?: number }`, with no `sudo` and no `cwd`. Use `cd` or inline env in the command string.
- **Sandbox hooks** accept `{ command: string; sudo?: boolean; timeoutMs?: number }`. Set `sudo: true` for elevated privileges.
- **`timeoutMs`** overrides the default 60 s per-hook timeout. Useful for long-running setup commands like dependency installs (e.g. `timeoutMs: 300_000` for 5 minutes).
- Within each hook point, sandbox hooks run in parallel; host hooks within `onSandboxReady` also run in parallel with sandbox hooks. `host.onWorktreeReady` hooks run sequentially in declared order.
- If any hook exits non-zero, setup fails fast.
- When a `signal` is passed to `run()`, it is threaded to all hooks, so aborting the signal cancels any in-flight hook commands.

## Development

```bash
npm install
npm run build      # Bundle with tsup
npm test           # Run tests with vitest
npm run typecheck  # Type-check
```

## License

MIT