mattpocock/sandcastle

GitHub: mattpocock/sandcastle

A TypeScript library for orchestrating AI coding agents in isolated sandboxes, with automatic management of Git branch strategies and commit merging.

Stars: 6952 | Forks: 705

## What Is Sandcastle?

A TypeScript library for orchestrating AI coding agents in isolated sandboxes:

1. You invoke agents with a single `sandcastle.run()`.
2. Sandcastle handles sandboxing the agent with a configurable branch strategy.
3. The commits made on the branches get merged back.

Sandcastle is provider-agnostic: it ships with built-in providers for Docker, Podman, and Vercel, and you can create your own.

Great for parallelizing multiple AFK agents, creating review pipelines, or even just orchestrating your own agents.

## Prerequisites

- [Git](https://git-scm.com/)
- A sandbox provider. Sandcastle needs an isolated environment to run agents in. Built-in options:
  - [Docker Desktop](https://www.docker.com/): most common for local development
  - [Podman](https://podman.io/): rootless alternative to Docker
  - [Vercel](https://vercel.com/): cloud-based Firecracker microVMs via `@vercel/sandbox`
  - Or [create your own](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`

## Quick start

1. Install the package:

   ```bash
   npm install --save-dev @ai-hero/sandcastle
   ```

2. Run `npx @ai-hero/sandcastle init`. This scaffolds a `.sandcastle` directory with all the files needed.

   ```bash
   npx @ai-hero/sandcastle init
   ```

3. Edit `.sandcastle/.env` and fill in your default values for `CLAUDE_CODE_OAUTH_TOKEN` (run `claude setup-token` on your host to get one). To use an Anthropic API key instead, uncomment and fill in `ANTHROPIC_API_KEY`.

   ```bash
   cp .sandcastle/.env.example .sandcastle/.env
   ```

4. Run the `.sandcastle/main.ts` (or `main.mts`) file with `npx tsx`:

   ```bash
   npx tsx .sandcastle/main.ts
   ```

```typescript
// Run the agent via the JS API
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(), // or podman(), vercel(), or your own provider
  promptFile: ".sandcastle/prompt.md",
});
```

## Sandbox Providers

Sandcastle uses a `SandboxProvider` to create isolated environments. The `sandbox` option on `run()`, `interactive()`, and `createSandbox()` accepts any provider, including `noSandbox()`, which opts in to running the agent directly on the host when container isolation is undesired.

Built-in providers:

| Provider | Import path | Type | Accepted by |
| --- | --- | --- | --- |
| Docker | `@ai-hero/sandcastle/sandboxes/docker` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Podman | `@ai-hero/sandcastle/sandboxes/podman` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Vercel | `@ai-hero/sandcastle/sandboxes/vercel` | Isolated | `run()`, `createSandbox()`, `interactive()` |
| No-sandbox | `@ai-hero/sandcastle/sandboxes/no-sandbox` | None | `run()`, `createSandbox()`, `interactive()` |

Worktree methods (`wt.run()`, `wt.interactive()`, `wt.createSandbox()`) accept the same providers as their top-level counterparts. `wt.interactive()` defaults to `noSandbox()` when no sandbox is specified.

```typescript
import { run, interactive, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { podman } from "@ai-hero/sandcastle/sandboxes/podman";
import { vercel } from "@ai-hero/sandcastle/sandboxes/vercel";
import { noSandbox } from "@ai-hero/sandcastle/sandboxes/no-sandbox";

// Docker, Podman, and Vercel are interchangeable in run() and createSandbox():
await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  prompt: "...",
});

// No-sandbox runs the agent directly on the host. It is accepted by run(),
// createSandbox(), and interactive(), and skips container isolation entirely:
await interactive({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: noSandbox(),
  prompt: "...", // optional: omit to launch the TUI with no initial prompt
  cwd: "/path/to/other-repo", // optional: defaults to process.cwd()
});
```

You can also [create your own provider](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`.

## API

Sandcastle exports a programmatic `run()` function for use in scripts, CI pipelines, or custom tooling. The examples below use `docker()`, but any `SandboxProvider` works in its place.

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  promptFile: ".sandcastle/prompt.md",
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.iterations); // per-iteration results with optional sessionId
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```

### All options

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  // Agent provider (required). Pass a model string to claudeCode().
  // Optional second arg for provider-specific options like effort level.
  agent: claudeCode("claude-opus-4-8", { effort: "high" }),

  // Sandbox provider (required). Any SandboxProvider works (docker, podman, vercel, or custom).
  // Provider-specific config (like imageName, mounts) lives inside the provider factory call.
  sandbox: docker({
    imageName: "sandcastle:local",
    // Optional: override the UID/GID used for --user flag (defaults to host UID/GID).
    // Must match the UID baked into the image. Pre-flight check catches mismatches.
    // containerUid: 1000,
    // containerGid: 1000,
    // Optional: mount host directories into the sandbox (e.g. package manager caches)
    // hostPath supports absolute, tilde-expanded (~), and relative paths (resolved from cwd).
    // sandboxPath supports absolute and relative paths (resolved from the sandbox repo directory).
    mounts: [
      { hostPath: "~/.npm", sandboxPath: "/home/agent/.npm", readonly: true },
      // Relative on both sides: "data" under cwd is mounted at "data" under the sandbox repo directory
      { hostPath: "data", sandboxPath: "data" },
    ],
    // Optional: SELinux volume label, one of "z" (default, shared), "Z" (private), or false (none).
    // No-op on non-SELinux systems (Docker Desktop on macOS/Windows, Linux without SELinux).
    selinuxLabel: "z",
    // Optional: provider-level env vars merged at launch time
    env: { DOCKER_SPECIFIC: "value" },
    // Optional: attach container to Docker network(s), as a string or string[]
    network: "my-network",
    // Optional: add the container user to supplementary groups via --group-add.
    // Accepts group names or numeric GIDs (e.g. for a bind-mounted Docker socket).
    groups: ["docker", 999],
    // Optional: expose host devices via --device. Each entry is a full device
    // spec in host[:container[:permissions]] form (e.g. "/dev/kvm").
    devices: ["/dev/kvm"],
    // Optional: limit CPU resources via --cpus. Fractional values allowed (e.g. 1.5).
    // cpus: 2,
  }),

  // Host repo directory. Replaces process.cwd() as the anchor for
  // .sandcastle/ artifacts (worktrees, logs, env, patches) and git operations.
  // Relative paths resolve against process.cwd(). Defaults to process.cwd().
  cwd: "../other-repo",

  // Branch strategy: controls how the agent's changes relate to branches.
  // Defaults to { type: "head" } for bind-mount and { type: "merge-to-head" } for isolated providers.
  branchStrategy: { type: "branch", branch: "agent/fix-42" },

  // Prompt source: provide one of these, not both.
  // Note: promptFile resolves against process.cwd(), NOT cwd.
  promptFile: ".sandcastle/prompt.md", // path to a prompt file
  // prompt: "Fix issue #42 in this repo", // OR an inline prompt string

  // Values substituted for {{KEY}} placeholders in the prompt.
  promptArgs: {
    ISSUE_NUMBER: "42",
  },

  // Maximum number of agent iterations to run before stopping. Default: 1
  maxIterations: 5,

  // Display name for this run, shown as a prefix in log output.
  name: "fix-issue-42",

  // Lifecycle hooks grouped by where they run: host or sandbox.
  hooks: {
    host: {
      onWorktreeReady: [{ command: "cp .env.example .env" }],
      onSandboxReady: [{ command: "echo setup done" }],
    },
    sandbox: {
      onSandboxReady: [{ command: "npm install" }],
    },
  },

  // Host-relative file paths to copy into the sandbox before the container starts.
  // Not supported with branchStrategy: { type: "head" }.
  copyToWorktree: [".env"],

  // Override default timeouts for built-in lifecycle steps.
  // Unset keys keep their defaults.
  timeouts: {
    copyToWorktreeMs: 120_000, // default: 60_000
    gitSetupMs: 30_000, // default: 10_000
    commitCollectionMs: 60_000, // default: 30_000
    mergeToHostMs: 60_000, // default: 30_000
  },

  // How to record progress. Default: write to a file under .sandcastle/logs/
  logging: {
    type: "file",
    path: ".sandcastle/logs/my-run.log",
    // Optional: forward the agent's output stream to your own observability system.
    // Fires for each text chunk, tool call, and raw stdout line the agent
    // produces. Errors thrown by the callback are swallowed so a broken
    // forwarder cannot kill the run.
    onAgentStreamEvent: (event) => {
      // event is { type: "text" | "toolCall" | "raw", iteration, timestamp, ... }
      myLogger.info(event);
    },
    // Optional: append every raw stdout line the agent emits to the same
    // log file, interleaved with the human-readable output. Includes lines
    // the provider's stream parser would otherwise drop. Intended for
    // debugging stuck or unexpected agent behaviour.
    verbose: true,
  },
  // logging: { type: "stdout", verbose: true }, // OR terminal mode (verbose: raw lines to stdout)

  // String (or array of strings) the agent emits to end the iteration loop early.
  // Default: "COMPLETE"
  completionSignal: "COMPLETE",

  // Idle timeout in seconds; resets whenever the agent produces output. Default: 600 (10 minutes)
  idleTimeoutSeconds: 600,

  // Grace window in seconds after the agent emits a completion signal but
  // before its process has exited (a "hanging process", typically a spawned
  // `gh`/git child or MCP server keeping stdout open). Resets on every
  // subsequent output line so trailing data is still captured. Default: 60
  completionTimeoutSeconds: 60,

  // Structured output: extract a typed payload from the agent's stdout.
  // Requires maxIterations === 1 and the tag must appear in the prompt.
  // output: Output.object({ tag: "result", schema: z.object({ answer: z.number() }) }),
  // output: Output.string({ tag: "summary" }),
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.completionSignal); // matched signal string, or undefined if none fired
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```

### `createSandbox()`: reusable sandbox

Use `createSandbox()` when you need to run multiple agents (or multiple rounds of the same agent) inside a single sandbox. It creates the sandbox once, and you call `sandbox.run()` as many times as you need. This avoids repeated container startup costs and keeps all runs on the same branch.

Use `run()` instead when you only need a single one-shot invocation, since it handles the sandbox lifecycle automatically.

#### Basic single-run usage

```typescript
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
});

const result = await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  prompt: "Fix issue #42 in this repo.",
});

console.log(result.commits); // [{ sha: "abc123" }]
```

#### Multi-run implement-then-review

```typescript
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

// Step 1: implement
const implResult = await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  promptFile: ".sandcastle/implement.md",
  maxIterations: 5,
});

// Step 2: review on the same branch, same container
const reviewResult = await sandbox.run({
  agent: claudeCode("claude-sonnet-4-6"),
  prompt: "Review the changes and fix any issues.",
});
```

Commits from all `run()` calls accumulate on the same branch. The sandbox container stays alive between runs, so installed dependencies and build artifacts persist.

`sandbox.exec()` lets the harness run shell commands directly in the same warm sandbox, which is handy for gating an implement step on a quick verification before kicking off the review:

```typescript
await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  promptFile: ".sandcastle/implement.md",
  maxIterations: 5,
});

// Verify before review. A non-zero exitCode is returned, not thrown.
const tests = await sandbox.exec("npm test");
if (tests.exitCode !== 0) {
  throw new Error(`Tests failed:\n${tests.stdout}\n${tests.stderr}`);
}

await sandbox.run({
  agent: claudeCode("claude-sonnet-4-6"),
  prompt: "Review the changes and fix any issues.",
});
```

For `sandbox.exec()`, `cwd` defaults to the sandbox repo path, matching `interactive()`. Pass `cwd` to override.

#### Automatic cleanup with `await using`

`await using` calls `sandbox.close()` automatically when the block exits. If the sandbox has uncommitted changes, the worktree is preserved on disk; if clean, both container and worktree are removed.

#### Manual `close()` with `CloseResult`

```typescript
const sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
});

// ... run agents ...

const closeResult = await sandbox.close();
if (closeResult.preservedWorktreePath) {
  console.log(`Worktree preserved at ${closeResult.preservedWorktreePath}`);
}
```

#### `CreateSandboxOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `branch` | string | - | **Required.** Explicit branch for the sandbox |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (e.g. `docker()`, `podman()`) |
| `cwd` | string | `process.cwd()` | Host repo directory; relative paths resolve against `process.cwd()` |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`), run once at creation time |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the sandbox at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

#### `Sandbox`

| Property / Method | Type | Description |
| --- | --- | --- |
| `branch` | string | The branch the sandbox is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(SandboxRunOptions) => Promise` | Invoke an agent inside the existing sandbox |
| `interactive(options)` | `(SandboxInteractiveOptions) => Promise` | Launch an interactive session in the sandbox |
| `exec(cmd, options?)` | `(command: string, options?: SandboxExecOptions) => Promise` | Run a shell command in the sandbox. `cwd` defaults to the sandbox repo path. Non-zero `exitCode` is returned, not thrown. |
| `close()` | `() => Promise` | Tear down the container and sandbox |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto teardown via `await using` |

#### `SandboxRunOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider (e.g. `claudeCode("claude-opus-4-8")`) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file (mutually exclusive with `prompt`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `maxIterations` | number | `1` | Maximum iterations to run |
| `completionSignal` | string \| string[] | `COMPLETE` | String(s) the agent emits to stop the iteration loop early |
| `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds; resets on each agent output event |
| `completionTimeoutSeconds` | number | `60` | Grace window after the completion signal is seen but the agent process hasn't exited |
| `name` | string | - | Display name for the run |
| `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` |
| `resumeSession` | string | - | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | - | Cancels the run when aborted; handle stays usable afterward |

#### `SandboxRunResult`

| Field | Type | Description |
| --- | --- | --- |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired |
| `stdout` | string | Combined agent output from all iterations |
| `commits` | `{ sha }[]` | Commits created during the run |
| `logFilePath` | string? | Path to the log file (only when logging to a file) |
| `resume(prompt, options?)` | `(prompt: string, options?: ResumeSandboxRunResultOptions) => Promise` | Continue the captured session for one iteration inside the same warm sandbox. Present only when the provider captured a session id. |
| `fork(prompt, options?)` | `(prompt: string, options?: ResumeSandboxRunResultOptions) => Promise` | Fork the captured session for one iteration inside the same warm sandbox. The parent session is left intact (ADR 0018). |

#### `CloseResult`

| Field | Type | Description |
| --- | --- | --- |
| `preservedWorktreePath` | string? | Host path to the preserved worktree, set when it had uncommitted changes |

### `createWorktree()`: independent worktree lifecycle

Use `createWorktree()` when you need a worktree (git worktree) as an independent, first-class concept, separate from any sandbox. This is useful when you want to run an interactive session first and then hand the same worktree to a sandboxed AFK agent.

Only the `branch` and `merge-to-head` strategies are accepted; `head` is a compile-time type error since it means no worktree.

Pass `cwd` to target a repo other than `process.cwd()`. Relative paths resolve against `process.cwd()`; absolute paths pass through. A `CwdError` is thrown if the path does not exist or is not a directory.

```typescript
import { createWorktree, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using wt = await createWorktree({
  branchStrategy: { type: "branch", branch: "agent/fix-42" },
  copyToWorktree: ["node_modules"],
  cwd: "/path/to/other-repo", // optional: defaults to process.cwd()
});

console.log(wt.worktreePath); // host path to the worktree
console.log(wt.branch); // "agent/fix-42"

// Run an interactive session in the worktree (defaults to noSandbox)
await wt.interactive({
  agent: claudeCode("claude-opus-4-8"),
  prompt: "Explore the codebase and understand the bug.",
});

// Run an AFK agent in the worktree (sandbox is required)
const result = await wt.run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker({ imageName: "sandcastle:myrepo" }),
  prompt: "Fix issue #42.",
  maxIterations: 3,
});
console.log(result.commits); // commits made during the run

// Create a long-lived sandbox from the worktree
await using sandbox = await wt.createSandbox({
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

// sandbox.close() tears down the container only; the worktree stays
await sandbox.close();
// wt.close() cleans up the worktree
```

`wt.close()` checks for uncommitted changes: if the worktree is dirty, it's preserved on disk; if clean, it's removed. `await using` calls `close()` automatically.

The worktree persists after `run()`, `interactive()`, and `createSandbox()` complete, so you can hand it to another agent or inspect it.

With `branchStrategy: { type: "merge-to-head" }`, each `wt.run()` / `wt.interactive()` merges the agent's commits back to the host's current branch before returning, and the worktree's source branch is preserved across calls so subsequent ones can reuse the same handle. (This differs from top-level `run()`, where the temp branch is deleted after the merge.)

**Split ownership**: When a sandbox is created via `wt.createSandbox()`, `sandbox.close()` tears down the container only, and the worktree remains. `wt.close()` is responsible for worktree cleanup. This differs from the top-level `createSandbox()`, where `sandbox.close()` owns both container and worktree.

#### `CreateWorktreeOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `branchStrategy` | WorktreeBranchStrategy | - | **Required.** `{ type: "branch", branch }` or `{ type: "merge-to-head" }` |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

#### `Worktree`

| Property / Method | Type | Description |
| --- | --- | --- |
| `branch` | string | The branch the worktree is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(options: WorktreeRunOptions) => Promise` | Run an AFK agent in the worktree (sandbox required) |
| `interactive(options)` | `(options: WorktreeInteractiveOptions) => Promise` | Run an interactive agent session in the worktree |
| `createSandbox(options)` | `(options: WorktreeCreateSandboxOptions) => Promise` | Create a long-lived sandbox backed by this worktree |
| `close()` | `() => Promise` | Clean up the worktree (preserves if dirty) |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto cleanup via `await using` |

#### `WorktreeInteractiveOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider |
| `sandbox` | AnySandboxProvider | `noSandbox()` | Sandbox provider (defaults to no sandbox) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file |
| `name` | string | - | Optional session name |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | - | Environment variables to inject into the sandbox |
| `signal` | AbortSignal | - | Cancel the session when aborted. The worktree is preserved on disk. Rejects with `signal.reason`. |

#### `WorktreeRunOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `agent` | AgentProvider | - | **Required.** Agent provider |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (AFK agents must be sandboxed) |
| `prompt` | string | - | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | - | Path to prompt file |
| `maxIterations` | number | 1 | Maximum iterations to run |
| `completionSignal` | string \| string[] | - | Substring(s) to stop the iteration loop early |
| `idleTimeoutSeconds` | number | 600 | Idle timeout in seconds |
| `completionTimeoutSeconds` | number | 60 | Grace window after completion signal is seen but agent process hasn't exited |
| `name` | string | - | Optional run name |
| `logging` | LoggingOption | file | Logging mode |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | - | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | - | Environment variables to inject into the sandbox |
| `resumeSession` | string | - | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | - | Cancel the run when aborted. Kills the in-flight agent subprocess; the worktree is preserved on disk. Rejects with `signal.reason`. |

#### `WorktreeRunResult`

| Property | Type | Description |
| --- | --- | --- |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string | The matched completion signal, or undefined |
| `stdout` | string | Combined stdout output from all agent iterations |
| `commits` | `{ sha: string }[]` | List of commits made by the agent during the run |
| `branch` | string | The branch name the agent worked on |
| `logFilePath` | string | Path to the log file, if logging was drained to a file |

#### `WorktreeCreateSandboxOptions`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `sandbox` | SandboxProvider | - | **Required.** Sandbox provider (e.g. `docker()`) |
| `hooks` | SandboxHooks | - | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `copyToWorktree` | string[] | - | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | - | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |

## How it works

Sandcastle uses a **branch strategy**, set with the `branchStrategy` option, to control how the agent's changes relate to branches. There are three strategies:

- **Head** (`{ type: "head" }`): The agent writes directly to the host working directory. No worktree, no branch indirection. This is the default for bind-mount providers like `docker()`.
- **Merge-to-head** (`{ type: "merge-to-head" }`): Sandcastle creates a temporary branch in a git worktree. The agent works on the temp branch, and changes are merged back to HEAD when done. The temp branch is cleaned up after merge. This is the default for isolated providers.
- **Branch** (`{ type: "branch", branch: "foo" }`): Commits land on an explicitly named branch in a git worktree. Re-running with the same branch reuses the existing worktree and fast-forwards it from `origin` when safe; see [ADR 0003](docs/adr/0003-reuse-worktree-by-default.md).

For bind-mount providers (like Docker), the worktree directory is bind-mounted into the container. The agent writes directly to the host filesystem through the mount, so no sync is needed.

From your point of view, you just configure `branchStrategy: { type: 'branch', branch: 'foo' }` on `run()`, and get a commit on branch `foo` once it's complete. All 100% local.

## Prompts

Sandcastle uses a flexible prompt system. You write the prompt, and the engine executes it. No opinions about workflow, task management, or context sources are imposed.

### Prompt resolution

You must provide exactly one of:

1. `prompt: "inline string"`: pass an inline prompt directly via `RunOptions`
2. `promptFile: "./path/to/prompt.md"`: point to a specific file via `RunOptions`

`prompt` and `promptFile` are mutually exclusive, and providing both is an error. If neither is provided, `run()` throws an error asking you to supply one.

**Inline prompts (`prompt: "..."`) are passed to the agent literally.** No `{{KEY}}` substitution, no `` !`command` `` expansion, no built-in `{{SOURCE_BRANCH}}` / `{{TARGET_BRANCH}}` injection. If you need values interpolated into an inline prompt, build the string in JavaScript (`` `Work on ${branch}…` ``). Passing `promptArgs` alongside an inline prompt is an error; switch to `promptFile` to use substitution.

The substitution and expansion features below apply **only** to prompts sourced from `promptFile`.

### Dynamic context with `` !`command` ``

Use `` !`command` `` expressions in your prompt to pull in dynamic context. Each expression is replaced with the command's stdout before the prompt is sent to the agent.

All expressions in a prompt run **in parallel** for faster expansion. Commands run **inside the sandbox** after `sandbox.onSandboxReady` hooks complete, so they see the same repo state the agent sees (including installed dependencies).

```markdown
# Open issues

!`gh issue list --state open --label Sandcastle --json number,title,body,comments,labels --limit 100`

# Recent commits

!`git log --oneline -10`
```

If any command exits with a non-zero code, the run fails immediately with an error.

### Prompt arguments with `{{KEY}}`

Use `{{KEY}}` placeholders in your prompt to inject values from the `promptArgs` option. This is useful for reusing the same prompt file across multiple runs with different parameters.

```typescript
import { run } from "@ai-hero/sandcastle";

await run({
  // ... agent and sandbox omitted for brevity
  promptFile: "./my-prompt.md",
  promptArgs: { ISSUE_NUMBER: 42, PRIORITY: "high" },
});
```

In the prompt file:

```markdown
Work on issue #{{ISSUE_NUMBER}} (priority: {{PRIORITY}}).
```

Prompt argument substitution runs on the host before shell expression expansion, so `{{KEY}}` placeholders inside `` !`command` `` expressions are replaced first:

```markdown
!`gh issue view {{ISSUE_NUMBER}} --json body -q .body`
```

A `{{KEY}}` placeholder with no matching prompt argument is an error. Unused prompt arguments produce a warning.

`` !`command` `` expansion only runs on shell blocks written in the prompt file itself. Any `` !`…` `` pattern that appears inside an argument value is treated as inert text and won't be executed against the host shell. This makes it safe to pass user-authored content (issue titles, PR descriptions, docs excerpts) through `promptArgs`.

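The substitution rules described above (a missing argument is an error, an unused argument is only a warning) can be sketched in a few lines. This is an illustrative approximation, not Sandcastle's implementation: `substitutePromptArgs`, its return shape, and the `EXTRA` key are hypothetical names chosen for the example, and shell-expression expansion is deliberately left out.

```typescript
// Illustrative sketch only; names here are hypothetical, not part of the Sandcastle API.
type PromptArgs = Record<string, string | number>;

interface SubstitutionResult {
  resolved: string; // template with every {{KEY}} replaced
  unusedArgs: string[]; // keys that were supplied but never referenced
}

function substitutePromptArgs(
  template: string,
  args: PromptArgs,
): SubstitutionResult {
  const used = new Set<string>();
  const resolved = template.replace(
    /\{\{(\w+)\}\}/g,
    (_match: string, key: string) => {
      if (!Object.prototype.hasOwnProperty.call(args, key)) {
        // Mirrors the documented behaviour: an unmatched placeholder is an error.
        throw new Error(`No prompt argument supplied for {{${key}}}`);
      }
      used.add(key);
      // A replacer function's return value is inserted as plain text.
      return String(args[key]);
    },
  );
  const unusedArgs = Object.keys(args).filter((key) => !used.has(key));
  return { resolved, unusedArgs };
}

const demo = substitutePromptArgs(
  "Work on issue #{{ISSUE_NUMBER}} (priority: {{PRIORITY}}).",
  { ISSUE_NUMBER: 42, PRIORITY: "high", EXTRA: "never referenced" },
);
console.log(demo.resolved); // "Work on issue #42 (priority: high)."
console.log(demo.unusedArgs); // ["EXTRA"], which would be reported as a warning
```

Returning the unused keys instead of logging them keeps the helper free of side effects, so a caller can decide how to surface the warning.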
### Built-in prompt arguments

Sandcastle automatically injects two built-in prompt arguments into every prompt:

| Placeholder | Value |
| --- | --- |
| `{{SOURCE_BRANCH}}` | The branch the agent works on (determined by the branch strategy) |
| `{{TARGET_BRANCH}}` | The host's active branch at `run()` time |

Use them in your prompt without passing them via `promptArgs`:

```markdown
You are working on {{SOURCE_BRANCH}}. When diffing, compare against {{TARGET_BRANCH}}.
```

Passing `SOURCE_BRANCH` or `TARGET_BRANCH` in `promptArgs` is an error, because built-in prompt arguments cannot be overridden.

### Early termination with `COMPLETE`

When the agent outputs `COMPLETE`, the orchestrator stops the iteration loop early. This is a convention you document in your prompt for the agent to follow; the engine never injects it.

This is useful for task-based workflows where the agent should stop once it has finished, rather than running all remaining iterations.

You can override the default signal by passing `completionSignal` to `run()`. It accepts a single string or an array of strings:

```typescript
await run({
  // ...
  completionSignal: "DONE",
});

// Or pass multiple signals; the loop stops on the first match:
await run({
  // ...
  completionSignal: ["TASK_COMPLETE", "TASK_ABORTED"],
});
```

Tell the agent to output your chosen string(s) in the prompt, and the orchestrator will stop when it detects any of them. The matched signal is returned as `result.completionSignal`.

#### Hanging processes after the completion signal

The agent process is expected to exit shortly after emitting the completion signal. When a child it spawned (a `gh`/git subprocess, a long-lived MCP server, etc.) inherits the agent's stdout pipe and keeps it open, the parent process can linger long past its logical end. Sandcastle would otherwise wait for the full `idleTimeoutSeconds` and fail with `AgentIdleTimeoutError`, throwing away the commits the agent already made.

Instead, once the completion signal is observed in the output buffer, Sandcastle swaps in a short **completion timeout** (default 60 s). When it expires, the run resolves successfully with a warning that the process was hanging; `result.commits` and `result.completionSignal` are populated as if the process had exited cleanly. The timer resets on every subsequent output line, so trailing data emitted after the signal (token-usage events, terminal `result` events, a structured-output tag) is still captured.

A clean process exit always wins the race, so healthy runs gain zero added latency. The completion timeout only matters when the process hangs.

Tune the window with `completionTimeoutSeconds`:

```typescript
await run({
  // ...
  completionTimeoutSeconds: 30, // shorter grace window
});
```

This is independent of `idleTimeoutSeconds`. They cover different phases: `idleTimeoutSeconds` runs **before** any signal is seen (genuinely stuck agent → fail); `completionTimeoutSeconds` runs **after** the signal is seen (hanging process → succeed with warning). See [ADR 0019](docs/adr/0019-completion-timeout-for-hanging-process.md).

### Structured output

Use `Output.object()` to extract a typed, schema-validated JSON payload from the agent's stdout. The agent emits its answer inside an XML tag you specify, and Sandcastle parses, validates, and returns it on `result.output`.

The schema can be any [Standard Schema](https://standardschema.dev) validator. The examples below use [Zod](https://zod.dev), but Valibot, ArkType, and others work identically. See [ADR 0010](docs/adr/0010-structured-output.md) for design rationale.

```typescript
import { run, Output, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { z } from "zod";

const result = await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  // The prompt must contain the opening tag configured in `output.tag` below.
  prompt: `Analyze the code, and output the result as JSON inside <result> tags.
The result must match this schema: { summary: string; score: number }
`,
  output: Output.object({
    tag: "result",
    schema: z.object({ summary: z.string(), score: z.number() }),
  }),
});

console.log(result.output.summary); // typed as string
console.log(result.output.score); // typed as number
```

`Output.string({ tag })` extracts the tag contents as a plain string (trimmed, no JSON parsing). Both helpers require `maxIterations` to be `1` (the default). The resolved prompt must contain the configured opening tag literal.

When extraction or validation fails, `run()` throws a `StructuredOutputError`. Alongside `tag`, `rawMatched`, `cause`, `commits`, `branch`, and `preservedWorktreePath`, the error carries the `sessionId` (and `sessionFilePath`, when the session was captured) of the run that produced the bad output.

Pass `maxRetries` to have Sandcastle handle the retry loop for you. Each retry resumes the same agent session and feeds back a token-efficient description of the error, so the agent can re-emit a corrected tag without redoing the work. Retries require an agent provider that supports session resumption (`claudeCode`, `codex`, `pi`). Calling `run()` with `maxRetries > 0` against a non-resumable provider (`cursor`, `opencode`, `copilot`) throws immediately.
```typescript
const result = await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  prompt: "Analyze the code and emit JSON inside <result> tags.",
  output: Output.object({
    tag: "result",
    schema: z.object({ summary: z.string(), score: z.number() }),
    maxRetries: 2, // 2 retries on top of the initial attempt
  }),
});
```

If you need to drive the retry loop manually (for example, to customise the feedback prompt or rotate models on each attempt), leave `maxRetries` at its default of `0` and resume the failed session yourself:

```typescript
import { run, Output, StructuredOutputError } from "@ai-hero/sandcastle";

// Runs inside an async function. `opts` holds your usual run() options
// and `output` is your Output definition.
try {
  return await run({ ...opts, output });
} catch (e) {
  // Only retry when the failed run left a session that can be resumed.
  if (e instanceof StructuredOutputError && e.sessionId) {
    return await run({
      ...opts,
      output,
      resumeSession: e.sessionId,
      prompt: `Your previous output failed: ${e.message}. Re-emit it inside <${e.tag}> tags.`,
    });
  }
  throw e;
}
```

### Templates

`sandcastle init` prompts you to choose a sandbox provider (Docker or Podman), an issue tracker (GitHub Issues, Beads, or Custom), and a template. The template scaffolds a ready-to-use prompt and a `main.mts` suited to a specific workflow. If your project's `package.json` has `"type": "module"`, the file is named `main.ts` instead.

Choosing **Custom** scaffolds the project in a deliberately broken-until-configured state, plus a `.sandcastle/SETUP_ISSUE_TRACKER.md` prompt that you feed to your coding agent. The agent then wires up your own tracker by editing the scaffolded files in place.
Five templates are available: | Template | Description | | ------------------------------ | ------------------------------------------------------------------------- | | `blank` | Bare scaffold — write your own prompt and orchestration | | `simple-loop` | Picks issues one by one and closes them | | `sequential-reviewer` | Implements issues one by one, with a code review step after each | | `parallel-planner` | Plans parallelizable issues, executes on separate branches, then merges | | `parallel-planner-with-review` | Plans parallelizable issues, executes with per-branch review, then merges | Select a template during `sandcastle init` when prompted, or re-run init in a fresh repo to try a different one. ## CLI commands ### `sandcastle init` Scaffolds the `.sandcastle/` config directory and builds the container image. This is the first command you run in a new repo. You choose a sandbox provider (Docker or Podman) during init — selecting Podman writes a `Containerfile` instead of `Dockerfile` and uses `sandcastle podman build-image` for the build step. Init detects your host package manager (npm, pnpm, yarn, or bun) from a `packageManager` field or lockfile, defaulting to npm. Templates whose `main` file imports a host dependency — the planner templates import [Zod](https://zod.dev) for their `` output schema — prompt you to install it with that package manager when it isn't already in your `package.json`, so the first `npx tsx .sandcastle/main.ts` doesn't fail with `ERR_MODULE_NOT_FOUND`. Every interactive prompt has a paired `--flag` so the entire init can run non-interactively (e.g. in CI or a scripted setup). When stdin is not a TTY and a required flag is missing, init fails fast with a clear error rather than wedging on a prompt. 
| Option | Required | Default | Description | | ------------------------- | -------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------- | | `--image-name` | No | `sandcastle:` | Docker image name | | `--agent` | No | Interactive prompt | Agent to use (`claude-code`, `pi`, `codex`, `cursor`, `opencode`, `copilot`) | | `--model` | No | Agent's default model | Model to use (e.g. `claude-sonnet-4-6`). Defaults to agent's default | | `--sandbox` | No | Interactive prompt | Sandbox provider to use (`docker`, `podman`) | | `--template` | No | Interactive prompt | Template to scaffold (e.g. `blank`, `simple-loop`) | | `--issue-tracker` | No | Interactive prompt | Issue tracker to use (`github-issues`, `beads`, `custom`) | | `--create-label` | No | Interactive prompt | `true` / `false` — whether to create the `Sandcastle` GitHub label (only with `--issue-tracker github-issues`) | | `--build-image` | No | Interactive prompt | `true` / `false` — whether to build the sandbox image now (silently ignored with `--issue-tracker custom`) | | `--install-template-deps` | No | Interactive prompt | `true` / `false` — whether to install template host deps (e.g. `zod` for the planner templates) | Creates the following files: .sandcastle/ ├── Dockerfile # Sandbox environment (customize as needed) ├── prompt.md # Agent instructions ├── .env.example # Token placeholders └── .gitignore # Ignores .env, logs/ Errors if `.sandcastle/` already exists to prevent overwriting customizations. ### `sandcastle docker build-image` Rebuilds the Docker image from an existing `.sandcastle/` directory. Use this after modifying the Dockerfile. On Linux/macOS, the build automatically passes `--build-arg AGENT_UID=$(id -u)` and `AGENT_GID=$(id -g)` so the image's `agent` user matches the host UID — this prevents permission errors on image-built files without runtime chown. 
| Option | Required | Default | Description | | -------------- | -------- | ---------------------------- | --------------------------------------------------------------------------------- | | `--image-name` | No | `sandcastle:` | Docker image name | | `--dockerfile` | No | — | Path to a custom Dockerfile (build context will be the current working directory) | ### `sandcastle docker remove-image` Removes the Docker image. | Option | Required | Default | Description | | -------------- | -------- | ---------------------------- | ----------------- | | `--image-name` | No | `sandcastle:` | Docker image name | ### `sandcastle podman build-image` Builds the Podman image from an existing `.sandcastle/` directory. Use this after modifying the Containerfile. | Option | Required | Default | Description | | ----------------- | -------- | ---------------------------- | ------------------------------------------------------------------------------------ | | `--image-name` | No | `sandcastle:` | Podman image name | | `--containerfile` | No | — | Path to a custom Containerfile (build context will be the current working directory) | ### `sandcastle podman remove-image` Removes the Podman image. | Option | Required | Default | Description | | -------------- | -------- | ---------------------------- | ----------------- | | `--image-name` | No | `sandcastle:` | Podman image name | ### `RunOptions` | Option | Type | Default | Description | | -------------------------- | ------------------ | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `agent` | AgentProvider | — | **Required.** Agent provider (e.g. 
`claudeCode("claude-opus-4-8")`, `pi("claude-sonnet-4-6")`, `codex("gpt-5.4")`, `cursor("composer-2")`, `opencode("opencode/big-pickle")`, `copilot("claude-sonnet-4.5")`) | | `sandbox` | SandboxProvider | — | **Required.** Sandbox provider (e.g. `docker()`, `podman()`, `docker({ imageName: "sandcastle:local" })`) | | `cwd` | string | `process.cwd()` | Host repo directory — anchor for `.sandcastle/` artifacts and git operations. Relative paths resolve against `process.cwd()`. | | `prompt` | string | — | Inline prompt (mutually exclusive with `promptFile`) | | `promptFile` | string | — | Path to prompt file (mutually exclusive with `prompt`). Resolves against `process.cwd()`, **not** `cwd`. | | `maxIterations` | number | `1` | Maximum iterations to run | | `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) | | `name` | string | — | Display name for the run, shown as a prefix in log output | | `promptArgs` | PromptArgs | — | Key-value map for `{{KEY}}` placeholder substitution | | `branchStrategy` | BranchStrategy | per-provider default | Branch strategy: `{ type: 'head' }`, `{ type: 'merge-to-head' }`, or `{ type: 'branch', branch: '…' }` | | `copyToWorktree` | string[] | — | Host-relative file paths to copy into the sandbox before start (not supported with `branchStrategy: { type: 'head' }`) | | `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` | | `completionSignal` | string \| string[] | `COMPLETE` | String or array of strings the agent emits to stop the iteration loop early | | `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds — resets on each agent output event | | `completionTimeoutSeconds` | number | `60` | Grace window in seconds after the completion signal is observed but the agent process has not exited (hanging process). See [Hanging processes after the completion signal](#hanging-processes-after-the-completion-signal). 
| | `resumeSession` | string | — | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. | | `signal` | AbortSignal | — | Cancel the run when aborted. Kills the in-flight agent subprocess and cancels lifecycle hooks; the worktree is preserved on disk. Rejects with `signal.reason`. | | `timeouts` | Timeouts | — | Override default timeouts for built-in lifecycle steps: `copyToWorktreeMs` (60 000), `gitSetupMs` (10 000), `commitCollectionMs` (30 000), `mergeToHostMs` (30 000). | | `output` | OutputDefinition | — | Structured output definition (`Output.object(…)` or `Output.string(…)`). Requires `maxIterations === 1`. See [Structured output](#structured-output). | ### `RunResult` | Field | Type | Description | | ------------------ | ------------------- | ------------------------------------------------------------------ | | `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) | | `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired | | `stdout` | string | Agent output | | `commits` | `{ sha }[]` | Commits created during the run | | `branch` | string | Target branch name | | `logFilePath` | string? | Path to the log file (only when logging to a file) | | `output` | T? | Typed structured output (only present when `output` option is set) | ### `IterationResult` | Field | Type | Description | | ----------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------- | | `sessionId` | string? | Agent session ID from the provider stream, or `undefined` if the provider does not emit one | | `sessionFilePath` | string? | Absolute host path to the captured session JSONL, or `undefined` when capture is off | | `usage` | `IterationUsage`? 
| Token usage snapshot from the last assistant message, or `undefined` when capture is off or provider does not support usage parsing | ### `IterationUsage` | Field | Type | Description | | -------------------------- | ------ | ------------------------------------------ | | `inputTokens` | number | Input tokens consumed | | `cacheCreationInputTokens` | number | Tokens used to create prompt cache entries | | `cacheReadInputTokens` | number | Tokens read from prompt cache | | `outputTokens` | number | Output tokens generated | ### Session capture After each resumable provider iteration, Sandcastle automatically captures the agent's session file from the sandbox to the host. Claude Code sessions are stored under `~/.claude/projects//.jsonl`; Codex sessions are stored under `~/.codex/sessions/YYYY/MM/DD/rollout-*-.jsonl`; Pi sessions are stored under `~/.pi/agent/sessions/----/_.jsonl`. Any provider-specific `cwd` fields are rewritten to match the host repo root, so the provider's native resume command works. For Claude Code, any `Agent`-tool or `Workflow`-tool subagent transcripts written under `/subagents/agent-*.jsonl` are captured alongside the main session. Subagent capture is best-effort: a failure on an individual transcript logs a warning and lets siblings and the main session through. Main-session capture failure still fails the run (see below). Session capture is enabled by default for `claudeCode()`, `codex()`, and `pi()` and can be opted out via `captureSessions: false`. Providers without `sessionStorage` do not attempt capture. Capture failure fails the run. 
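To work with the per-iteration fields documented above, it can help to collapse `result.iterations` into a small summary. The sketch below redeclares `IterationResult` and `IterationUsage` locally from the tables so that it is self-contained. The helper name `summarizeIterations` is hypothetical, and summing the per-iteration usage snapshots is an assumption about what you want to measure, not something Sandcastle does for you.

```typescript
// Local mirrors of the documented shapes (redeclared here for a standalone sketch).
interface IterationUsage {
  inputTokens: number;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  outputTokens: number;
}

interface IterationResult {
  sessionId?: string;
  sessionFilePath?: string;
  usage?: IterationUsage;
}

// Hypothetical helper: lists captured session files and adds up usage snapshots.
function summarizeIterations(iterations: IterationResult[]) {
  // Iterations without a captured session have no sessionFilePath.
  const sessionFiles = iterations
    .map((iteration) => iteration.sessionFilePath)
    .filter((path): path is string => path !== undefined);

  const tokens: IterationUsage = {
    inputTokens: 0,
    cacheCreationInputTokens: 0,
    cacheReadInputTokens: 0,
    outputTokens: 0,
  };
  for (const { usage } of iterations) {
    if (usage === undefined) continue; // capture off or provider lacks usage parsing
    tokens.inputTokens += usage.inputTokens;
    tokens.cacheCreationInputTokens += usage.cacheCreationInputTokens;
    tokens.cacheReadInputTokens += usage.cacheReadInputTokens;
    tokens.outputTokens += usage.outputTokens;
  }

  return { sessionFiles, tokens };
}
```

With a real run you would call `summarizeIterations(result.iterations)`. Fields that are `undefined` (for example when `captureSessions` is `false`) are skipped.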
### Session resume Pass `resumeSession` to `run()` to continue a prior Claude Code, Codex, or Pi conversation inside a new sandbox: const result = await run({ agent: claudeCode("claude-opus-4-8"), sandbox: docker(), prompt: "Continue where you left off", resumeSession: "abc-123-def", }); You can also continue the last captured session from a result: const first = await run({ agent: codex("gpt-5.4"), sandbox: docker(), prompt: "Draft a plan", }); const second = await first.resume?.("Now implement the plan"); `resume` is present only on results from resumable providers (Claude Code, Codex, Pi) — hence the optional-chaining call. Before the sandbox starts, Sandcastle validates that the session file exists on the host and transfers it into the sandbox with `cwd` fields rewritten to match the sandbox-side path. Claude Code receives `--resume `; Codex receives `codex exec resume ` with the prompt piped over stdin; Pi receives `--session `. Constraints: - `resumeSession` is incompatible with `maxIterations > 1` (throws before sandbox creation). - The provider's host session file must exist (throws before sandbox creation). - Only iteration 1 receives the resume flag; subsequent iterations (if any) start fresh. - Providers without resume support reject `resumeSession`. ### Session fork `RunResult.fork(prompt, options?)` is the sibling of `.resume()`: it continues from the last captured session but leaves the parent session JSONL untouched and writes the child under a new session id. The mechanism is the agent's native fork flag — `claude --resume --fork-session` for Claude Code, `codex exec fork ` for Codex. 
Fork enables fan-out workflows where a single parent run is the starting point for several independent children: const parent = await run({ agent: claudeCode("claude-opus-4-8"), sandbox: docker(), prompt: "Read the codebase and summarise the data model", }); const [reviewA, reviewB] = await Promise.all([ parent.fork?.("Review the migration plan", { branchStrategy: { type: "branch", branch: "review-a" }, }), parent.fork?.("Audit the auth layer", { branchStrategy: { type: "branch", branch: "review-b" }, }), ]); **Fork is session-only.** `--fork-session` and `codex exec fork` isolate the agent session JSONL — they do **not** isolate the branch, worktree, or sandbox. Safe concurrent fan-out (`Promise.all([r.fork(a), r.fork(b)])`) requires the caller to give each child a distinct `branch` via `branchStrategy: { type: "branch", branch: "..." }`. The default `head` and `merge-to-head` strategies are **not** safe for concurrent forks: `head` shares the host working directory across all children, and `merge-to-head` races `git merge` against the same HEAD. See [ADR 0018](docs/adr/0018-fork-is-session-only.md). `fork` is present only on results from providers with `sessionStorage` (Claude Code, Codex) — hence the optional-chaining call. The same single-iteration and session-file constraints as `.resume()` apply. 
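Because concurrent forks are only safe when every child has its own branch, one option is to validate the branch names before fanning out. The following is a minimal sketch under that assumption. `ForkChild` and `planForks` are hypothetical names, and the returned `options` object simply mirrors the `branchStrategy` shape documented above.

```typescript
// Hypothetical description of one child fork: what to ask, and where commits go.
interface ForkChild {
  prompt: string;
  branch: string;
}

interface ForkCall {
  prompt: string;
  options: { branchStrategy: { type: "branch"; branch: string } };
}

// Builds fork arguments and rejects duplicate branches before anything runs.
function planForks(children: ForkChild[]): ForkCall[] {
  const seen = new Set<string>();
  return children.map(({ prompt, branch }) => {
    if (seen.has(branch)) {
      throw new Error(`Duplicate fork branch: ${branch}`);
    }
    seen.add(branch);
    return {
      prompt,
      options: { branchStrategy: { type: "branch" as const, branch } },
    };
  });
}

const calls = planForks([
  { prompt: "Review the migration plan", branch: "review-a" },
  { prompt: "Audit the auth layer", branch: "review-b" },
]);

// With a real parent result, the fan-out would then look like:
// await Promise.all(calls.map((c) => parent.fork?.(c.prompt, c.options)));
```

Failing early on a duplicate name keeps two children from racing on the same branch, which session forking alone does not prevent.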
### `ClaudeCodeOptions`

The `claudeCode()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: claudeCode("claude-opus-4-8", { effort: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` \| `"max"` | — | Claude Code reasoning effort level (`max` is Opus only) |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture agent session JSONL to host for `claude --resume` |
| `permissionMode` | `"default"` \| `"acceptEdits"` \| `"plan"` \| `"auto"` \| `"dontAsk"` \| `"bypassPermissions"` | — | Maps to Claude's `--permission-mode` flag. When set, replaces Sandcastle's default `--dangerously-skip-permissions` on AFK runs. Use `"auto"` for AI-mediated per-tool approve/deny without bypass. |

### `CodexOptions`

The `codex()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: codex("gpt-5.4", { effort: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | — | Codex reasoning effort level via `model_reasoning_effort` |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture Codex rollout JSONL to host for resume |
| `approvalsReviewer` | `"user"` \| `"auto_review"` | — | Maps to Codex's `approvals_reviewer` config. When `"auto_review"`, swaps `--dangerously-bypass-approvals-and-sandbox` for `-a on-request -s danger-full-access` so the reviewer agent evaluates each approval prompt. |

### `PiOptions`

The `pi()` factory accepts an optional second argument for provider-specific options:

```typescript
agent: pi("claude-sonnet-4-6", { thinking: "high" });
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `thinking` | `"off"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | — | Pi reasoning effort level via the `--thinking` flag |
| `env` | `Record<string, string>` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture pi session JSONL to host for `pi --session ` |

### Provider `env`

Both **agent providers** and **sandbox providers** accept an optional `env: Record<string, string>` in their options. These environment variables are merged with the `.sandcastle/.env` resolver output at launch time:

```typescript
await run({
  agent: claudeCode("claude-opus-4-8", {
    env: { ANTHROPIC_API_KEY: "sk-ant-..." },
  }),
  sandbox: docker({
    env: { DOCKER_SPECIFIC_VAR: "value" },
  }),
  prompt: "Fix issue #42",
});
```

**Merge rules:**

- Provider env (agent + sandbox) overrides `.sandcastle/.env` resolver output for shared keys
- Agent provider env and sandbox provider env **must not overlap**: if they share any key, `run()` throws an error
- When `env` is not provided, it defaults to `{}`

Environment variables are also resolved automatically from `.sandcastle/.env` and `process.env`, so there is no need to pass them to the API. The required variables depend on the **agent provider** (see `sandcastle init` output for details).

## Custom Sandbox Providers

Sandcastle ships with built-in providers for Docker, Podman, and Vercel, but you can create your own. A sandbox provider tells Sandcastle how to execute commands in an isolated environment. There are two kinds:

- **Bind-mount**: the sandbox can mount a host directory. Sandcastle creates a worktree on the host and the provider mounts it in. No file sync needed. Use this for Docker, Podman, or any local container runtime.
- **Isolated**: the sandbox has its own filesystem (e.g. a cloud VM). The provider handles syncing code in and out via `copyIn` and `copyFileOut`. Use this when the sandbox cannot access the host filesystem.

### The sandbox handle contract

Both provider types return a **sandbox handle** from their `create()` function.
The handle exposes:

| Member | Required for | Description |
| --- | --- | --- |
| `exec` | Both | Run a command, optionally streaming stdout line-by-line via `options.onLine` |
| `close` | Both | Tear down the sandbox |
| `copyFileIn` | Bind-mount | Copy a single file from the host into the sandbox |
| `copyFileOut` | Both | Copy a single file from the sandbox to the host |
| `copyIn` | Isolated | Copy a file or directory from the host into the sandbox |
| `worktreePath` | Both | Absolute path to the repo directory inside the sandbox |

### `ExecResult`

Every `exec` call returns an `ExecResult`:

```typescript
interface ExecResult {
  readonly stdout: string;
  readonly stderr: string;
  readonly exitCode: number;
}
```

### Bind-mount provider example

A minimal bind-mount provider that shells out to local processes (no container):

```typescript
import {
  createBindMountSandboxProvider,
  type BindMountCreateOptions,
  type BindMountSandboxHandle,
  type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
import { copyFile as fsCopyFile, mkdir as fsMkdir } from "node:fs/promises";
import { dirname } from "node:path";
import { createInterface } from "node:readline";

const localProcess = () =>
  createBindMountSandboxProvider({
    name: "local-process",
    create: async (
      options: BindMountCreateOptions,
    ): Promise<BindMountSandboxHandle> => {
      const worktreePath = options.worktreePath;
      return {
        worktreePath,
        exec: (
          command: string,
          opts?: { onLine?: (line: string) => void; cwd?: string },
        ): Promise<ExecResult> => {
          // Streaming path: spawn the command and forward stdout line by line.
          if (opts?.onLine) {
            const onLine = opts.onLine;
            return new Promise((resolve, reject) => {
              const proc = spawn("sh", ["-c", command], {
                cwd: opts?.cwd ?? worktreePath,
                stdio: ["ignore", "pipe", "pipe"],
              });
              const stdoutChunks: string[] = [];
              const stderrChunks: string[] = [];
              const rl = createInterface({ input: proc.stdout! });
              rl.on("line", (line) => {
                stdoutChunks.push(line);
                onLine(line); // forward each line to Sandcastle
              });
              proc.stderr!.on("data", (chunk: Buffer) => {
                stderrChunks.push(chunk.toString());
              });
              proc.on("error", (err) => reject(err));
              proc.on("close", (code) => {
                resolve({
                  stdout: stdoutChunks.join("\n"),
                  stderr: stderrChunks.join(""),
                  exitCode: code ?? 0,
                });
              });
            });
          }
          // Buffered path: collect all output, then resolve once.
          return new Promise((resolve, reject) => {
            execFile(
              "sh",
              ["-c", command],
              { cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
              (error, stdout, stderr) => {
                // No exit code means the command never ran to completion.
                if (error && error.code === undefined) {
                  reject(new Error(`exec failed: ${error.message}`));
                } else {
                  resolve({
                    stdout: stdout.toString(),
                    stderr: stderr.toString(),
                    exitCode: typeof error?.code === "number" ? error.code : 0,
                  });
                }
              },
            );
          });
        },
        copyFileIn: async (hostPath: string, sandboxPath: string) => {
          await fsMkdir(dirname(sandboxPath), { recursive: true });
          await fsCopyFile(hostPath, sandboxPath);
        },
        copyFileOut: async (sandboxPath: string, hostPath: string) => {
          await fsMkdir(dirname(hostPath), { recursive: true });
          await fsCopyFile(sandboxPath, hostPath);
        },
        close: async () => {
          // nothing to tear down for a local process
        },
      };
    },
  });
```

### Isolated provider example

A minimal isolated provider using a temp directory:

```typescript
import {
  createIsolatedSandboxProvider,
  type IsolatedSandboxHandle,
  type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
import { copyFile, cp, mkdir, mkdtemp, rm, stat } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { createInterface } from "node:readline";

const tempDir = () =>
  createIsolatedSandboxProvider({
    name: "temp-dir",
    create: async (): Promise<IsolatedSandboxHandle> => {
      // Each sandbox gets its own throwaway directory tree.
      const root = await mkdtemp(join(tmpdir(), "sandbox-"));
      const worktreePath = join(root, "workspace");
      await mkdir(worktreePath, { recursive: true });
      return {
        worktreePath,
        exec: (
          command: string,
          opts?: { onLine?: (line: string) => void; cwd?: string },
        ): Promise<ExecResult> => {
          // Streaming path: forward stdout line by line.
          if (opts?.onLine) {
            const onLine = opts.onLine;
            return new Promise((resolve, reject) => {
              const proc = spawn("sh", ["-c", command], {
                cwd: opts?.cwd ?? worktreePath,
                stdio: ["ignore", "pipe", "pipe"],
              });
              const stdoutChunks: string[] = [];
              const stderrChunks: string[] = [];
              const rl = createInterface({ input: proc.stdout! });
              rl.on("line", (line) => {
                stdoutChunks.push(line);
                onLine(line);
              });
              proc.stderr!.on("data", (chunk: Buffer) => {
                stderrChunks.push(chunk.toString());
              });
              proc.on("error", (err) => reject(err));
              proc.on("close", (code) => {
                resolve({
                  stdout: stdoutChunks.join("\n"),
                  stderr: stderrChunks.join(""),
                  exitCode: code ?? 0,
                });
              });
            });
          }
          // Buffered path: collect all output, then resolve once.
          return new Promise((resolve, reject) => {
            execFile(
              "sh",
              ["-c", command],
              { cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
              (error, stdout, stderr) => {
                if (error && error.code === undefined) {
                  reject(new Error(`exec failed: ${error.message}`));
                } else {
                  resolve({
                    stdout: stdout.toString(),
                    stderr: stderr.toString(),
                    exitCode: typeof error?.code === "number" ? error.code : 0,
                  });
                }
              },
            );
          });
        },
        copyIn: async (hostPath: string, sandboxPath: string) => {
          // Directories are copied recursively; single files are copied directly.
          const info = await stat(hostPath);
          if (info.isDirectory()) {
            await cp(hostPath, sandboxPath, { recursive: true });
          } else {
            await mkdir(dirname(sandboxPath), { recursive: true });
            await copyFile(hostPath, sandboxPath);
          }
        },
        copyFileOut: async (sandboxPath: string, hostPath: string) => {
          await mkdir(dirname(hostPath), { recursive: true });
          await copyFile(sandboxPath, hostPath);
        },
        close: async () => {
          await rm(root, { recursive: true, force: true });
        },
      };
    },
  });
```

### Branch strategies

A branch strategy controls where the agent's commits land. Configure it with the `branchStrategy` option on `run()`. The default depends on the provider type:

| Strategy | Behavior | Bind-mount | Isolated |
| --- | --- | --- | --- |
| `head` | Agent writes directly to the host working directory. No worktree created | Default | N/A |
| `merge-to-head` | Sandcastle creates a temp branch, merges back to HEAD when done | Supported | Default |
| `branch` | Commits land on an explicit named branch you provide | Supported | Supported |

**When to use each:**

- **`head`**: fast iteration during development. No branch indirection, no merge step. Only works with bind-mount providers since the agent needs direct host filesystem access.
- **`merge-to-head`**: safe default for automation. The agent works on a throwaway branch; if something goes wrong, HEAD is untouched. Use this for CI or unattended runs.
- **`branch`**: when you want commits on a specific branch (e.g. for a PR). Pass `{ type: "branch", branch: "agent/fix-42" }`.

Branch strategy is configured on `run()`, not on the provider:

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

// head: direct write, bind-mount only (default for bind-mount providers)
await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  prompt: "…",
});

// merge-to-head: temp branch, merge back (default for isolated providers)
// tempDir() is the isolated provider from the example above.
await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: tempDir(),
  prompt: "…",
});

// branch: explicit named branch
await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  branchStrategy: { type: "branch", branch: "agent/fix-42" },
  prompt: "…",
});
```

### Passing to `run()`

Pass your custom provider via the `sandbox` option. It works the same as the built-in `docker()` provider:

```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";

const result = await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: localProcess(), // your custom provider
  prompt: "Fix issue #42 in this repo.",
});
```

### Reference implementations

For real-world examples, see:

- [`src/sandboxes/docker.ts`](src/sandboxes/docker.ts): bind-mount provider using Docker containers (with SELinux label support)
- [`src/sandboxes/vercel.ts`](src/sandboxes/vercel.ts): isolated provider using Vercel Firecracker microVMs via `@vercel/sandbox`
- [`src/sandboxes/podman.ts`](src/sandboxes/podman.ts): bind-mount provider using Podman containers (with SELinux label support)
- [`src/sandboxes/test-isolated.ts`](src/sandboxes/test-isolated.ts): isolated provider using temp directories (used in tests)

## Configuration

### Config directory (`.sandcastle/`)

All per-repo sandbox configuration lives in `.sandcastle/`. Run `sandcastle init` to create it.

### Custom Dockerfile

The `.sandcastle/Dockerfile` controls the sandbox environment. The default template installs:

- **Node.js 22** (base image)
- **git**, **curl**, **jq** (system dependencies)
- **GitHub CLI** (`gh`)
- **Claude Code CLI**
- A non-root `agent` user (required, because Claude runs as this user)

When customizing the Dockerfile, ensure you keep:

- A non-root user (the default `agent` user) for Claude to run as
- `git` (required for commits and branch operations)
- `gh` (required for issue fetching)
- Claude Code CLI installed and on PATH

Add your project-specific dependencies (e.g., language runtimes, build tools) to the Dockerfile as needed.
### Hooks

Hooks are grouped by **where** they run, either `host` (on the developer's machine) or `sandbox` (inside the container):

```ts
hooks: {
  host: {
    onWorktreeReady: [{ command: "cp .env.example .env" }],
    onSandboxReady: [{ command: "echo sandbox is up" }],
  },
  sandbox: {
    onSandboxReady: [
      { command: "npm install", timeoutMs: 300_000 },
      { command: "apt-get install -y ffmpeg", sudo: true },
    ],
  },
}
```

| Hook                     | Runs on | When                                         | Working directory                           |
| ------------------------ | ------- | -------------------------------------------- | ------------------------------------------- |
| `host.onWorktreeReady`   | Host    | After `copyToWorktree`, before sandbox start | Worktree path (host repo root under `head`) |
| `host.onSandboxReady`    | Host    | After sandbox is up                          | Worktree path (host repo root under `head`) |
| `sandbox.onSandboxReady` | Sandbox | After sandbox is up                          | Sandbox repo directory                      |

**Ordering:** `copyToWorktree` -> `host.onWorktreeReady` (sequential) -> sandbox created -> `host.onSandboxReady` + `sandbox.onSandboxReady` (parallel).

- **Host hooks** accept `{ command: string; timeoutMs?: number }` (no `sudo`, no `cwd`). Use `cd` or inline env in the command string.
- **Sandbox hooks** accept `{ command: string; sudo?: boolean; timeoutMs?: number }`. Set `sudo: true` for elevated privileges.
- **`timeoutMs`** overrides the default 60 s per-hook timeout. Useful for long-running setup commands like dependency installs (e.g. `timeoutMs: 300_000` for 5 minutes).
- Within each hook point, sandbox hooks run in parallel; host hooks within `onSandboxReady` also run in parallel with sandbox hooks. `host.onWorktreeReady` hooks run sequentially in declared order.
- If any hook exits non-zero, setup fails fast.
- When a `signal` is passed to `run()`, it is threaded to all hooks, so aborting the signal cancels any in-flight hook commands.

## Development

```bash
npm install
npm run build      # Bundle with tsup
npm test           # Run tests with vitest
npm run typecheck  # Type-check
```

## License

MIT
