## What Is Sandcastle?
A TypeScript library for orchestrating AI coding agents in isolated sandboxes:
1. You invoke agents with a single `sandcastle.run()`.
2. Sandcastle handles sandboxing the agent with a configurable branch strategy.
3. The commits made on the branches get merged back.
Sandcastle is provider-agnostic — it ships with built-in providers for Docker, Podman, and Vercel, and you can create your own. Great for parallelizing multiple AFK agents, creating review pipelines, or even just orchestrating your own agents.
## Prerequisites
- [Git](https://git-scm.com/)
- A sandbox provider — Sandcastle needs an isolated environment to run agents in. Built-in options:
- [Docker Desktop](https://www.docker.com/) — most common for local development
- [Podman](https://podman.io/) — rootless alternative to Docker
- [Vercel](https://vercel.com/) — cloud-based Firecracker microVMs via `@vercel/sandbox`
- Or [create your own](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`
## Quick start
1. Install the package:
npm install --save-dev @ai-hero/sandcastle
2. Run `npx @ai-hero/sandcastle init`. This scaffolds a `.sandcastle` directory with all the files needed.
npx @ai-hero/sandcastle init
3. Edit `.sandcastle/.env` and fill in your default values for `ANTHROPIC_API_KEY`. If you want to use your Claude subscription instead of an API key, see [#191](https://github.com/mattpocock/sandcastle/issues/191).
cp .sandcastle/.env.example .sandcastle/.env
4. Run the `.sandcastle/main.ts` (or `main.mts`) file with `npx tsx`
npx tsx .sandcastle/main.ts
// 3. Run the agent via the JS API
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(), // or podman(), vercel(), or your own provider
promptFile: ".sandcastle/prompt.md",
});
## Sandbox Providers
Sandcastle uses a `SandboxProvider` to create isolated environments. The `sandbox` option on `run()`, `interactive()`, and `createSandbox()` accepts any provider, including `noSandbox()` — opt in to running the agent directly on the host when container isolation is undesired. Built-in providers:
| Provider | Import path | Type | Accepted by |
| ---------- | ------------------------------------------ | ---------- | ------------------------------------------- |
| Docker | `@ai-hero/sandcastle/sandboxes/docker` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Podman | `@ai-hero/sandcastle/sandboxes/podman` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Vercel | `@ai-hero/sandcastle/sandboxes/vercel` | Isolated | `run()`, `createSandbox()`, `interactive()` |
| No-sandbox | `@ai-hero/sandcastle/sandboxes/no-sandbox` | None | `run()`, `createSandbox()`, `interactive()` |
Worktree methods (`wt.run()`, `wt.interactive()`, `wt.createSandbox()`) accept the same providers as their top-level counterparts. `wt.interactive()` defaults to `noSandbox()` when no sandbox is specified.
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { podman } from "@ai-hero/sandcastle/sandboxes/podman";
import { vercel } from "@ai-hero/sandcastle/sandboxes/vercel";
import { noSandbox } from "@ai-hero/sandcastle/sandboxes/no-sandbox";
// Docker, Podman, and Vercel are interchangeable in run() and createSandbox():
await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
prompt: "...",
});
// No-sandbox runs the agent directly on the host — accepted by run(),
// createSandbox(), and interactive(). Skips container isolation entirely:
await interactive({
agent: claudeCode("claude-opus-4-7"),
sandbox: noSandbox(),
prompt: "...", // optional — omit to launch the TUI with no initial prompt
cwd: "/path/to/other-repo", // optional — defaults to process.cwd()
});
You can also [create your own provider](#custom-sandbox-providers) using `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`.
## API
Sandcastle exports a programmatic `run()` function for use in scripts, CI pipelines, or custom tooling. The examples below use `docker()`, but any `SandboxProvider` works in its place.
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
const result = await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
promptFile: ".sandcastle/prompt.md",
});
console.log(result.iterations.length); // number of iterations executed
console.log(result.iterations); // per-iteration results with optional sessionId
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
### All options
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
const result = await run({
// Agent provider — required. Pass a model string to claudeCode().
// Optional second arg for provider-specific options like effort level.
agent: claudeCode("claude-opus-4-7", { effort: "high" }),
// Sandbox provider — required. Any SandboxProvider works (docker, podman, vercel, or custom).
// Provider-specific config (like imageName, mounts) lives inside the provider factory call.
sandbox: docker({
imageName: "sandcastle:local",
// Optional: override the UID/GID used for --user flag (defaults to host UID/GID).
// Must match the UID baked into the image. Pre-flight check catches mismatches.
// containerUid: 1000,
// containerGid: 1000,
// Optional: mount host directories into the sandbox (e.g. package manager caches)
// hostPath supports absolute, tilde-expanded (~), and relative paths (resolved from cwd).
// sandboxPath supports absolute and relative paths (resolved from the sandbox repo directory).
mounts: [
{ hostPath: "~/.npm", sandboxPath: "/home/agent/.npm", readonly: true },
{ hostPath: "data", sandboxPath: "data" }, // mounts /data → /data
],
// Optional: SELinux volume label — "z" (default, shared), "Z" (private), or false (none).
// No-op on non-SELinux systems (Docker Desktop on macOS/Windows, Linux without SELinux).
selinuxLabel: "z",
// Optional: provider-level env vars merged at launch time
env: { DOCKER_SPECIFIC: "value" },
// Optional: attach container to Docker network(s) — string or string[]
network: "my-network",
// Optional: add the container user to supplementary groups via --group-add.
// Accepts group names or numeric GIDs (e.g. for a bind-mounted Docker socket).
groups: ["docker", 999],
// Optional: expose host devices via --device. Each entry is a full device
// spec in host[:container[:permissions]] form (e.g. "/dev/kvm").
devices: ["/dev/kvm"],
// Optional: limit CPU resources via --cpus. Fractional values allowed (e.g. 1.5).
// cpus: 2,
}),
// Host repo directory — replaces process.cwd() as the anchor for
// .sandcastle/ artifacts (worktrees, logs, env, patches) and git operations.
// Relative paths resolve against process.cwd(). Defaults to process.cwd().
cwd: "../other-repo",
// Branch strategy — controls how the agent's changes relate to branches.
// Defaults to { type: "head" } for bind-mount and { type: "merge-to-head" } for isolated providers.
branchStrategy: { type: "branch", branch: "agent/fix-42" },
// Prompt source — provide one of these, not both.
// Note: promptFile resolves against process.cwd(), NOT cwd.
promptFile: ".sandcastle/prompt.md", // path to a prompt file
// prompt: "Fix issue #42 in this repo", // OR an inline prompt string
// Values substituted for {{KEY}} placeholders in the prompt.
promptArgs: {
ISSUE_NUMBER: "42",
},
// Maximum number of agent iterations to run before stopping. Default: 1
maxIterations: 5,
// Display name for this run, shown as a prefix in log output.
name: "fix-issue-42",
// Lifecycle hooks grouped by where they run: host or sandbox.
hooks: {
host: {
onWorktreeReady: [{ command: "cp .env.example .env" }],
onSandboxReady: [{ command: "echo setup done" }],
},
sandbox: {
onSandboxReady: [{ command: "npm install" }],
},
},
// Host-relative file paths to copy into the sandbox before the container starts.
// Not supported with branchStrategy: { type: "head" }.
copyToWorktree: [".env"],
// Override default timeouts for built-in lifecycle steps.
// Unset keys keep their defaults.
timeouts: {
copyToWorktreeMs: 120_000, // default: 60_000
gitSetupMs: 30_000, // default: 10_000
commitCollectionMs: 60_000, // default: 30_000
mergeToHostMs: 60_000, // default: 30_000
},
// How to record progress. Default: write to a file under .sandcastle/logs/
logging: {
type: "file",
path: ".sandcastle/logs/my-run.log",
// Optional: forward the agent's output stream to your own observability system.
// Fires for each text chunk and tool call the agent produces. Errors thrown
// by the callback are swallowed so a broken forwarder cannot kill the run.
onAgentStreamEvent: (event) => {
// event is { type: "text" | "toolCall", iteration, timestamp, ... }
myLogger.info(event);
},
},
// logging: { type: "stdout" }, // OR render an interactive UI in the terminal
// String (or array of strings) the agent emits to end the iteration loop early.
// Default: "COMPLETE"
completionSignal: "COMPLETE",
// Idle timeout in seconds — resets whenever the agent produces output. Default: 600 (10 minutes)
idleTimeoutSeconds: 600,
// Grace window in seconds after the agent emits a completion signal but
// before its process has exited (a "hanging process" — typically a spawned
// `gh`/git child or MCP server keeping stdout open). Resets on every
// subsequent output line so trailing data is still captured. Default: 60
completionTimeoutSeconds: 60,
// Structured output — extract a typed payload from the agent's stdout.
// Requires maxIterations === 1 and the tag must appear in the prompt.
// output: Output.object({ tag: "result", schema: z.object({ answer: z.number() }) }),
// output: Output.string({ tag: "summary" }),
});
console.log(result.iterations.length); // number of iterations executed
console.log(result.completionSignal); // matched signal string, or undefined if none fired
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
### `createSandbox()` — reusable sandbox
Use `createSandbox()` when you need to run multiple agents (or multiple rounds of the same agent) inside a single sandbox. It creates the sandbox once, and you call `sandbox.run()` as many times as you need. This avoids repeated container startup costs and keeps all runs on the same branch.
Use `run()` instead when you only need a single one-shot invocation — it handles sandbox lifecycle automatically.
#### Basic single-run usage
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
await using sandbox = await createSandbox({
branch: "agent/fix-42",
sandbox: docker(),
});
const result = await sandbox.run({
agent: claudeCode("claude-opus-4-7"),
prompt: "Fix issue #42 in this repo.",
});
console.log(result.commits); // [{ sha: "abc123" }]
#### Multi-run implement-then-review
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
await using sandbox = await createSandbox({
branch: "agent/fix-42",
sandbox: docker(),
hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});
// Step 1: implement
const implResult = await sandbox.run({
agent: claudeCode("claude-opus-4-7"),
promptFile: ".sandcastle/implement.md",
maxIterations: 5,
});
// Step 2: review on the same branch, same container
const reviewResult = await sandbox.run({
agent: claudeCode("claude-sonnet-4-6"),
prompt: "Review the changes and fix any issues.",
});
Commits from all `run()` calls accumulate on the same branch. The sandbox container stays alive between runs, so installed dependencies and build artifacts persist.
#### Automatic cleanup with `await using`
`await using` calls `sandbox.close()` automatically when the block exits. If the sandbox has uncommitted changes, the worktree is preserved on disk; if clean, both container and worktree are removed.
#### Manual `close()` with `CloseResult`
const sandbox = await createSandbox({
branch: "agent/fix-42",
sandbox: docker(),
});
// ... run agents ...
const closeResult = await sandbox.close();
if (closeResult.preservedWorktreePath) {
console.log(`Worktree preserved at ${closeResult.preservedWorktreePath}`);
}
#### `CreateSandboxOptions`
| Option | Type | Default | Description |
| ---------------- | --------------- | --------------- | ------------------------------------------------------------------------------------------------------------------- |
| `branch` | string | — | **Required.** Explicit branch for the sandbox |
| `sandbox` | SandboxProvider | — | **Required.** Sandbox provider (e.g. `docker()`, `podman()`) |
| `cwd` | string | `process.cwd()` | Host repo directory — relative paths resolve against `process.cwd()` |
| `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) — run once at creation time |
| `copyToWorktree` | string[] | — | Host-relative file paths to copy into the sandbox at creation time |
| `timeouts` | Timeouts | — | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |
#### `Sandbox`
| Property / Method | Type | Description |
| ----------------------- | ------------------------------------------------------------------ | -------------------------------------------- |
| `branch` | string | The branch the sandbox is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(SandboxRunOptions) => Promise` | Invoke an agent inside the existing sandbox |
| `interactive(options)` | `(SandboxInteractiveOptions) => Promise` | Launch an interactive session in the sandbox |
| `close()` | `() => Promise` | Tear down the container and sandbox |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto teardown via `await using` |
#### `SandboxRunOptions`
| Option | Type | Default | Description |
| -------------------------- | ------------------ | ----------------------------- | ------------------------------------------------------------------------------------ |
| `agent` | AgentProvider | — | **Required.** Agent provider (e.g. `claudeCode("claude-opus-4-7")`) |
| `prompt` | string | — | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | — | Path to prompt file (mutually exclusive with `prompt`) |
| `promptArgs` | PromptArgs | — | Key-value map for `{{KEY}}` placeholder substitution |
| `maxIterations` | number | `1` | Maximum iterations to run |
| `completionSignal` | string \| string[] | `COMPLETE` | String(s) the agent emits to stop the iteration loop early |
| `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds — resets on each agent output event |
| `completionTimeoutSeconds` | number | `60` | Grace window after the completion signal is seen but the agent process hasn't exited |
| `name` | string | — | Display name for the run |
| `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` |
| `signal` | AbortSignal | — | Cancels the run when aborted; handle stays usable afterward |
#### `SandboxRunResult`
| Field | Type | Description |
| ------------------ | ------------------- | ------------------------------------------------------------------ |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired |
| `stdout` | string | Combined agent output from all iterations |
| `commits` | `{ sha }[]` | Commits created during the run |
| `logFilePath` | string? | Path to the log file (only when logging to a file) |
#### `CloseResult`
| Field | Type | Description |
| ----------------------- | ------- | ------------------------------------------------------------------------ |
| `preservedWorktreePath` | string? | Host path to the preserved worktree, set when it had uncommitted changes |
### `createWorktree()` — independent worktree lifecycle
Use `createWorktree()` when you need a worktree (git worktree) as an independent, first-class concept — separate from any sandbox. This is useful when you want to run an interactive session first and then hand the same worktree to a sandboxed AFK agent.
Only `branch` and `merge-to-head` strategies are accepted; `head` is a compile-time type error since it means no worktree.
Pass `cwd` to target a repo other than `process.cwd()`. Relative paths resolve against `process.cwd()`; absolute paths pass through. A `CwdError` is thrown if the path does not exist or is not a directory.
import { createWorktree } from "@ai-hero/sandcastle";
await using wt = await createWorktree({
branchStrategy: { type: "branch", branch: "agent/fix-42" },
copyToWorktree: ["node_modules"],
cwd: "/path/to/other-repo", // optional — defaults to process.cwd()
});
console.log(wt.worktreePath); // host path to the worktree
console.log(wt.branch); // "agent/fix-42"
// Run an interactive session in the worktree (defaults to noSandbox)
await wt.interactive({
agent: claudeCode("claude-opus-4-7"),
prompt: "Explore the codebase and understand the bug.",
});
// Run an AFK agent in the worktree (sandbox is required)
const result = await wt.run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker({ imageName: "sandcastle:myrepo" }),
prompt: "Fix issue #42.",
maxIterations: 3,
});
console.log(result.commits); // commits made during the run
// Create a long-lived sandbox from the worktree
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
await using sandbox = await wt.createSandbox({
sandbox: docker(),
hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});
// sandbox.close() tears down the container only — the worktree stays
await sandbox.close();
// wt.close() cleans up the worktree
`wt.close()` checks for uncommitted changes: if the worktree is dirty, it's preserved on disk; if clean, it's removed. `await using` calls `close()` automatically. The worktree persists after `run()`, `interactive()`, and `createSandbox()` complete, so you can hand it to another agent or inspect it.
**Split ownership**: When a sandbox is created via `wt.createSandbox()`, `sandbox.close()` tears down the container only — the worktree remains. `wt.close()` is responsible for worktree cleanup. This differs from the top-level `createSandbox()`, where `sandbox.close()` owns both container and worktree.
#### `CreateWorktreeOptions`
| Option | Type | Default | Description |
| ---------------- | ---------------------- | ------- | ------------------------------------------------------------------------------------------------------------------- |
| `branchStrategy` | WorktreeBranchStrategy | — | **Required.** `{ type: "branch", branch }` or `{ type: "merge-to-head" }` |
| `copyToWorktree` | string[] | — | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | — | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |
#### `Worktree`
| Property / Method | Type | Description |
| ------------------------ | --------------------------------------------------------------------- | --------------------------------------------------- |
| `branch` | string | The branch the worktree is on |
| `worktreePath` | string | Host path to the worktree |
| `run(options)` | `(options: WorktreeRunOptions) => Promise` | Run an AFK agent in the worktree (sandbox required) |
| `interactive(options)` | `(options: WorktreeInteractiveOptions) => Promise` | Run an interactive agent session in the worktree |
| `createSandbox(options)` | `(options: WorktreeCreateSandboxOptions) => Promise` | Create a long-lived sandbox backed by this worktree |
| `close()` | `() => Promise` | Clean up the worktree (preserves if dirty) |
| `[Symbol.asyncDispose]` | `() => Promise` | Auto cleanup via `await using` |
#### `WorktreeInteractiveOptions`
| Option | Type | Default | Description |
| ------------ | ---------------------- | ------------- | ------------------------------------------------------------------------------------------------- |
| `agent` | AgentProvider | — | **Required.** Agent provider |
| `sandbox` | AnySandboxProvider | `noSandbox()` | Sandbox provider (defaults to no sandbox) |
| `prompt` | string | — | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | — | Path to prompt file |
| `name` | string | — | Optional session name |
| `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | — | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | — | Environment variables to inject into the sandbox |
| `signal` | AbortSignal | — | Cancel the session when aborted. The worktree is preserved on disk. Rejects with `signal.reason`. |
#### `WorktreeRunOptions`
| Option | Type | Default | Description |
| -------------------------- | ---------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `agent` | AgentProvider | — | **Required.** Agent provider |
| `sandbox` | SandboxProvider | — | **Required.** Sandbox provider (AFK agents must be sandboxed) |
| `prompt` | string | — | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | — | Path to prompt file |
| `maxIterations` | number | 1 | Maximum iterations to run |
| `completionSignal` | string \| string[] | — | Substring(s) to stop the iteration loop early |
| `idleTimeoutSeconds` | number | 600 | Idle timeout in seconds |
| `completionTimeoutSeconds` | number | 60 | Grace window after completion signal is seen but agent process hasn't exited |
| `name` | string | — | Optional run name |
| `logging` | LoggingOption | file | Logging mode |
| `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `promptArgs` | PromptArgs | — | Key-value map for `{{KEY}}` placeholder substitution |
| `env` | Record | — | Environment variables to inject into the sandbox |
| `resumeSession` | string | — | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | — | Cancel the run when aborted. Kills the in-flight agent subprocess; the worktree is preserved on disk. Rejects with `signal.reason`. |
#### `WorktreeRunResult`
| Property | Type | Description |
| ------------------ | ------------------- | ------------------------------------------------------ |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string | The matched completion signal, or undefined |
| `stdout` | string | Combined stdout output from all agent iterations |
| `commits` | { sha: string }[] | List of commits made by the agent during the run |
| `branch` | string | The branch name the agent worked on |
| `logFilePath` | string | Path to the log file, if logging was drained to a file |
#### `WorktreeCreateSandboxOptions`
| Option | Type | Default | Description |
| ---------------- | --------------- | ------- | ------------------------------------------------------------------------------------------------------------------- |
| `sandbox` | SandboxProvider | — | **Required.** Sandbox provider (e.g. `docker()`) |
| `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `copyToWorktree` | string[] | — | Host-relative file paths to copy into the worktree at creation time |
| `timeouts` | Timeouts | — | Override built-in lifecycle step timeouts (`copyToWorktreeMs`, `gitSetupMs`, `commitCollectionMs`, `mergeToHostMs`) |
## How it works
Sandcastle uses a **branch strategy** configured on the sandbox provider to control how the agent's changes relate to branches. There are three strategies:
- **Head** (`{ type: "head" }`) — The agent writes directly to the host working directory. No worktree, no branch indirection. This is the default for bind-mount providers like `docker()`.
- **Merge-to-head** (`{ type: "merge-to-head" }`) — Sandcastle creates a temporary branch in a git worktree. The agent works on the temp branch, and changes are merged back to HEAD when done. The temp branch is cleaned up after merge.
- **Branch** (`{ type: "branch", branch: "foo" }`) — Commits land on an explicitly named branch in a git worktree. Re-running with the same branch reuses the existing worktree and fast-forwards it from `origin` when safe — see [ADR 0003](docs/adr/0003-reuse-worktree-by-default.md).
For bind-mount providers (like Docker), the worktree directory is bind-mounted into the container — the agent writes directly to the host filesystem through the mount, so no sync is needed.
From your point of view, you just configure `branchStrategy: { type: 'branch', branch: 'foo' }` on `run()`, and get a commit on branch `foo` once it's complete. All 100% local.
## Prompts
Sandcastle uses a flexible prompt system. You write the prompt, and the engine executes it — no opinions about workflow, task management, or context sources are imposed.
### Prompt resolution
You must provide exactly one of:
1. `prompt: "inline string"` — pass an inline prompt directly via `RunOptions`
2. `promptFile: "./path/to/prompt.md"` — point to a specific file via `RunOptions`
`prompt` and `promptFile` are mutually exclusive — providing both is an error. If neither is provided, `run()` throws an error asking you to supply one.
**Inline prompts (`prompt: "..."`) are passed to the agent literally.** No `{{KEY}}` substitution, no `` !`command` `` expansion, no built-in `{{SOURCE_BRANCH}}` / `{{TARGET_BRANCH}}` injection. If you need values interpolated into an inline prompt, build the string in JavaScript (`` `Work on ${branch}…` ``). Passing `promptArgs` alongside an inline prompt is an error — switch to `promptFile` to use substitution.
The substitution and expansion features below apply **only** to prompts sourced from `promptFile`.
### Dynamic context with `` !`command` ``
Use `` !`command` `` expressions in your prompt to pull in dynamic context. Each expression is replaced with the command's stdout before the prompt is sent to the agent. All expressions in a prompt run **in parallel** for faster expansion.
Commands run **inside the sandbox** after `sandbox.onSandboxReady` hooks complete, so they see the same repo state the agent sees (including installed dependencies).
# Open issues
!`gh issue list --state open --label Sandcastle --json number,title,body,comments,labels --limit 100`
# Recent commits
!`git log --oneline -10`
If any command exits with a non-zero code, the run fails immediately with an error.
### Prompt arguments with `{{KEY}}`
Use `{{KEY}}` placeholders in your prompt to inject values from the `promptArgs` option. This is useful for reusing the same prompt file across multiple runs with different parameters.
import { run } from "@ai-hero/sandcastle";
await run({
promptFile: "./my-prompt.md",
promptArgs: { ISSUE_NUMBER: 42, PRIORITY: "high" },
});
In the prompt file:
Work on issue #{{ISSUE_NUMBER}} (priority: {{PRIORITY}}).
Prompt argument substitution runs on the host before shell expression expansion, so `{{KEY}}` placeholders inside `` !`command` `` expressions are replaced first:
!`gh issue view {{ISSUE_NUMBER}} --json body -q .body`
A `{{KEY}}` placeholder with no matching prompt argument is an error. Unused prompt arguments produce a warning.
`` !`command` `` expansion only runs on shell blocks written in the prompt file itself. Any `` !`…` `` pattern that appears inside an argument value is treated as inert text — it won't be executed against the host shell. This makes it safe to pass user-authored content (issue titles, PR descriptions, docs excerpts) through `promptArgs`.
### Built-in prompt arguments
Sandcastle automatically injects two built-in prompt arguments into every prompt:
| Placeholder | Value |
| ------------------- | ----------------------------------------------------------------- |
| `{{SOURCE_BRANCH}}` | The branch the agent works on (determined by the branch strategy) |
| `{{TARGET_BRANCH}}` | The host's active branch at `run()` time |
Use them in your prompt without passing them via `promptArgs`:
You are working on {{SOURCE_BRANCH}}. When diffing, compare against {{TARGET_BRANCH}}.
Passing `SOURCE_BRANCH` or `TARGET_BRANCH` in `promptArgs` is an error — built-in prompt arguments cannot be overridden.
### Early termination with `COMPLETE`
This is useful for task-based workflows where the agent should stop once it has finished, rather than running all remaining iterations.
You can override the default signal by passing `completionSignal` to `run()`. It accepts a single string or an array of strings:
await run({
// ...
completionSignal: "DONE",
});
// Or pass multiple signals — the loop stops on the first match:
await run({
// ...
completionSignal: ["TASK_COMPLETE", "TASK_ABORTED"],
});
Tell the agent to output your chosen string(s) in the prompt, and the orchestrator will stop when it detects any of them. The matched signal is returned as `result.completionSignal`.
#### Hanging processes after the completion signal
The agent process is expected to exit shortly after emitting the completion signal. When a child it spawned — a `gh`/git subprocess, a long-lived MCP server, etc. — inherits the agent's stdout pipe and keeps it open, the parent process can linger long past its logical end. Sandcastle would otherwise wait for the full `idleTimeoutSeconds` and fail with `AgentIdleTimeoutError`, throwing away the commits the agent already made.
Instead, once the completion signal is observed in the output buffer, Sandcastle swaps in a short **completion timeout** (default 60 s). When it expires, the run resolves successfully with a warning that the process was hanging; `result.commits` and `result.completionSignal` are populated as if the process had exited cleanly. The timer resets on every subsequent output line, so trailing data emitted after the signal — token-usage events, terminal `result` events, a structured-output `` — is still captured.
A clean process exit always wins the race, so healthy runs gain zero added latency. The completion timeout only matters when the process hangs.
Tune the window with `completionTimeoutSeconds`:
await run({
// ...
completionTimeoutSeconds: 30, // shorter grace window
});
This is independent of `idleTimeoutSeconds`. They cover different phases: `idleTimeoutSeconds` runs **before** any signal is seen (genuinely stuck agent → fail); `completionTimeoutSeconds` runs **after** the signal is seen (hanging process → succeed with warning). See [ADR 0019](docs/adr/0019-completion-timeout-for-hanging-process.md).
### Structured output
Use `Output.object()` to extract a typed, schema-validated JSON payload from the agent's stdout. The agent emits its answer inside an XML tag you specify, and Sandcastle parses, validates, and returns it on `result.output`. The schema can be any [Standard Schema](https://standardschema.dev) validator — the examples below use [Zod](https://zod.dev), but Valibot, ArkType, and others work identically. See [ADR 0010](docs/adr/0010-structured-output.md) for design rationale.
import { run, Output, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { z } from "zod";
const result = await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
prompt: `Analyze the code, and output the result as JSON inside tags.
The result must match this schema:
{ summary: string; score: string }
`,
output: Output.object({
tag: "result",
schema: z.object({ summary: z.string(), score: z.number() }),
}),
});
console.log(result.output.summary); // typed as string
console.log(result.output.score); // typed as number
`Output.string({ tag })` extracts the tag contents as a plain string (trimmed, no JSON parsing). Both helpers require `maxIterations` to be `1` (the default). The resolved prompt must contain the configured opening tag literal.
When extraction or validation fails, `run()` throws a `StructuredOutputError`. Alongside `tag`, `rawMatched`, `cause`, `commits`, `branch`, and `preservedWorktreePath`, the error carries the `sessionId` (and `sessionFilePath`, when the session was captured) of the run that produced the bad output. You can resume that session to ask the agent to re-emit corrected output, without repeating the work:
import { run, Output, StructuredOutputError } from "@ai-hero/sandcastle";
try {
return await run({ ...opts, output });
} catch (e) {
if (e instanceof StructuredOutputError && e.sessionId) {
return await run({
...opts,
output,
resumeSession: e.sessionId,
prompt: `Your previous output failed: ${e.message}. Re-emit it inside <${e.tag}> tags.`,
});
}
throw e;
}
### Templates
| Template | Description |
| ------------------------------ | ------------------------------------------------------------------------- |
| `blank` | Bare scaffold — write your own prompt and orchestration |
| `simple-loop` | Picks issues one by one and closes them |
| `sequential-reviewer` | Implements issues one by one, with a code review step after each |
| `parallel-planner` | Plans parallelizable issues, executes on separate branches, then merges |
| `parallel-planner-with-review` | Plans parallelizable issues, executes with per-branch review, then merges |
Select a template during `sandcastle init` when prompted, or re-run init in a fresh repo to try a different one.
## CLI commands
### `sandcastle init`
Scaffolds the `.sandcastle/` config directory and builds the container image. This is the first command you run in a new repo. You choose a sandbox provider (Docker or Podman) during init — selecting Podman writes a `Containerfile` instead of `Dockerfile` and uses `sandcastle podman build-image` for the build step.
Init detects your host package manager (npm, pnpm, yarn, or bun) from a `packageManager` field or lockfile, defaulting to npm. Templates whose `main` file imports a host dependency — the planner templates import [Zod](https://zod.dev) for their `` output schema — prompt you to install it with that package manager when it isn't already in your `package.json`, so the first `npx tsx .sandcastle/main.ts` doesn't fail with `ERR_MODULE_NOT_FOUND`.
Every interactive prompt has a paired `--flag` so the entire init can run non-interactively (e.g. in CI or a scripted setup). When stdin is not a TTY and a required flag is missing, init fails fast with a clear error rather than wedging on a prompt.
| Option | Required | Default | Description |
| ------------------------- | -------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `--image-name` | No | `sandcastle:` | Docker image name |
| `--agent` | No | Interactive prompt | Agent to use (`claude-code`, `pi`, `codex`, `cursor`, `opencode`, `copilot`) |
| `--model` | No | Agent's default model | Model to use (e.g. `claude-sonnet-4-6`). Defaults to agent's default |
| `--sandbox` | No | Interactive prompt | Sandbox provider to use (`docker`, `podman`) |
| `--template` | No | Interactive prompt | Template to scaffold (e.g. `blank`, `simple-loop`) |
| `--issue-tracker` | No | Interactive prompt | Issue tracker to use (`github-issues`, `beads`, `custom`) |
| `--create-label` | No | Interactive prompt | `true` / `false` — whether to create the `Sandcastle` GitHub label (only with `--issue-tracker github-issues`) |
| `--build-image` | No | Interactive prompt | `true` / `false` — whether to build the sandbox image now (silently ignored with `--issue-tracker custom`) |
| `--install-template-deps` | No | Interactive prompt | `true` / `false` — whether to install template host deps (e.g. `zod` for the planner templates) |
Creates the following files:
.sandcastle/
├── Dockerfile # Sandbox environment (customize as needed)
├── prompt.md # Agent instructions
├── .env.example # Token placeholders
└── .gitignore # Ignores .env, logs/
Errors if `.sandcastle/` already exists to prevent overwriting customizations.
### `sandcastle docker build-image`
Rebuilds the Docker image from an existing `.sandcastle/` directory. Use this after modifying the Dockerfile. On Linux/macOS, the build automatically passes `--build-arg AGENT_UID=$(id -u)` and `AGENT_GID=$(id -g)` so the image's `agent` user matches the host UID — this prevents permission errors on image-built files without runtime chown.
| Option | Required | Default | Description |
| -------------- | -------- | ---------------------------- | --------------------------------------------------------------------------------- |
| `--image-name` | No | `sandcastle:` | Docker image name |
| `--dockerfile` | No | — | Path to a custom Dockerfile (build context will be the current working directory) |
### `sandcastle docker remove-image`
Removes the Docker image.
| Option | Required | Default | Description |
| -------------- | -------- | ---------------------------- | ----------------- |
| `--image-name` | No | `sandcastle:` | Docker image name |
### `sandcastle podman build-image`
Builds the Podman image from an existing `.sandcastle/` directory. Use this after modifying the Containerfile.
| Option | Required | Default | Description |
| ----------------- | -------- | ---------------------------- | ------------------------------------------------------------------------------------ |
| `--image-name` | No | `sandcastle:` | Podman image name |
| `--containerfile` | No | — | Path to a custom Containerfile (build context will be the current working directory) |
### `sandcastle podman remove-image`
Removes the Podman image.
| Option | Required | Default | Description |
| -------------- | -------- | ---------------------------- | ----------------- |
| `--image-name` | No | `sandcastle:` | Podman image name |
### `RunOptions`
| Option | Type | Default | Description |
| -------------------------- | ------------------ | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `agent` | AgentProvider | — | **Required.** Agent provider (e.g. `claudeCode("claude-opus-4-7")`, `pi("claude-sonnet-4-6")`, `codex("gpt-5.4-mini")`, `cursor("composer-2")`, `opencode("opencode/big-pickle")`, `copilot("claude-sonnet-4.5")`) |
| `sandbox` | SandboxProvider | — | **Required.** Sandbox provider (e.g. `docker()`, `podman()`, `docker({ imageName: "sandcastle:local" })`) |
| `cwd` | string | `process.cwd()` | Host repo directory — anchor for `.sandcastle/` artifacts and git operations. Relative paths resolve against `process.cwd()`. |
| `prompt` | string | — | Inline prompt (mutually exclusive with `promptFile`) |
| `promptFile` | string | — | Path to prompt file (mutually exclusive with `prompt`). Resolves against `process.cwd()`, **not** `cwd`. |
| `maxIterations` | number | `1` | Maximum iterations to run |
| `hooks` | SandboxHooks | — | Lifecycle hooks (`host.*`, `sandbox.*`) |
| `name` | string | — | Display name for the run, shown as a prefix in log output |
| `promptArgs` | PromptArgs | — | Key-value map for `{{KEY}}` placeholder substitution |
| `branchStrategy` | BranchStrategy | per-provider default | Branch strategy: `{ type: 'head' }`, `{ type: 'merge-to-head' }`, or `{ type: 'branch', branch: '…' }` |
| `copyToWorktree` | string[] | — | Host-relative file paths to copy into the sandbox before start (not supported with `branchStrategy: { type: 'head' }`) |
| `logging` | object | file (auto-generated) | `{ type: 'file', path }` or `{ type: 'stdout' }` |
| `completionSignal` | string \| string[] | `COMPLETE` | String or array of strings the agent emits to stop the iteration loop early |
| `idleTimeoutSeconds` | number | `600` | Idle timeout in seconds — resets on each agent output event |
| `completionTimeoutSeconds` | number | `60` | Grace window in seconds after the completion signal is observed but the agent process has not exited (hanging process). See [Hanging processes after the completion signal](#hanging-processes-after-the-completion-signal). |
| `resumeSession` | string | — | Resume a prior session by ID for agents that support resume. Incompatible with `maxIterations > 1`. Session file must exist on host. |
| `signal` | AbortSignal | — | Cancel the run when aborted. Kills the in-flight agent subprocess and cancels lifecycle hooks; the worktree is preserved on disk. Rejects with `signal.reason`. |
| `timeouts` | Timeouts | — | Override default timeouts for built-in lifecycle steps: `copyToWorktreeMs` (60 000), `gitSetupMs` (10 000), `commitCollectionMs` (30 000), `mergeToHostMs` (30 000). |
| `output` | OutputDefinition | — | Structured output definition (`Output.object(…)` or `Output.string(…)`). Requires `maxIterations === 1`. See [Structured output](#structured-output). |
### `RunResult`
| Field | Type | Description |
| ------------------ | ------------------- | ------------------------------------------------------------------ |
| `iterations` | `IterationResult[]` | Per-iteration results (use `.length` for the count) |
| `completionSignal` | string? | The matched completion signal string, or `undefined` if none fired |
| `stdout` | string | Agent output |
| `commits` | `{ sha }[]` | Commits created during the run |
| `branch` | string | Target branch name |
| `logFilePath` | string? | Path to the log file (only when logging to a file) |
| `output` | T? | Typed structured output (only present when `output` option is set) |
### `IterationResult`
| Field | Type | Description |
| ----------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `sessionId` | string? | Agent session ID from the provider stream, or `undefined` if the provider does not emit one |
| `sessionFilePath` | string? | Absolute host path to the captured session JSONL, or `undefined` when capture is off |
| `usage` | `IterationUsage`? | Token usage snapshot from the last assistant message, or `undefined` when capture is off or provider does not support usage parsing |
### `IterationUsage`
| Field | Type | Description |
| -------------------------- | ------ | ------------------------------------------ |
| `inputTokens` | number | Input tokens consumed |
| `cacheCreationInputTokens` | number | Tokens used to create prompt cache entries |
| `cacheReadInputTokens` | number | Tokens read from prompt cache |
| `outputTokens` | number | Output tokens generated |
### Session capture
After each resumable provider iteration, Sandcastle automatically captures the agent's session file from the sandbox to the host. Claude Code sessions are stored under `~/.claude/projects//.jsonl`; Codex sessions are stored under `~/.codex/sessions/YYYY/MM/DD/rollout-*-.jsonl`; Pi sessions are stored under `~/.pi/agent/sessions/----/_.jsonl`. Any provider-specific `cwd` fields are rewritten to match the host repo root, so the provider's native resume command works.
Session capture is enabled by default for `claudeCode()`, `codex()`, and `pi()` and can be opted out via `captureSessions: false`. Providers without `sessionStorage` do not attempt capture. Capture failure fails the run.
### Session resume
Pass `resumeSession` to `run()` to continue a prior Claude Code, Codex, or Pi conversation inside a new sandbox:
const result = await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
prompt: "Continue where you left off",
resumeSession: "abc-123-def",
});
You can also continue the last captured session from a result:
const first = await run({
agent: codex("gpt-5.4-mini"),
sandbox: docker(),
prompt: "Draft a plan",
});
const second = await first.resume?.("Now implement the plan");
`resume` is present only on results from resumable providers (Claude Code, Codex, Pi) — hence the optional-chaining call.
Before the sandbox starts, Sandcastle validates that the session file exists on the host and transfers it into the sandbox with `cwd` fields rewritten to match the sandbox-side path. Claude Code receives `--resume `; Codex receives `codex exec resume ` with the prompt piped over stdin; Pi receives `--session `.
Constraints:
- `resumeSession` is incompatible with `maxIterations > 1` (throws before sandbox creation).
- The provider's host session file must exist (throws before sandbox creation).
- Only iteration 1 receives the resume flag; subsequent iterations (if any) start fresh.
- Providers without resume support reject `resumeSession`.
### Session fork
`RunResult.fork(prompt, options?)` is the sibling of `.resume()`: it continues from the last captured session but leaves the parent session JSONL untouched and writes the child under a new session id. The mechanism is the agent's native fork flag — `claude --resume --fork-session` for Claude Code, `codex exec fork ` for Codex.
Fork enables fan-out workflows where a single parent run is the starting point for several independent children:
const parent = await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
prompt: "Read the codebase and summarise the data model",
});
const [reviewA, reviewB] = await Promise.all([
parent.fork?.("Review the migration plan", {
branchStrategy: { type: "branch", branch: "review-a" },
}),
parent.fork?.("Audit the auth layer", {
branchStrategy: { type: "branch", branch: "review-b" },
}),
]);
**Fork is session-only.** `--fork-session` and `codex exec fork` isolate the agent session JSONL — they do **not** isolate the branch, worktree, or sandbox. Safe concurrent fan-out (`Promise.all([r.fork(a), r.fork(b)])`) requires the caller to give each child a distinct `branch` via `branchStrategy: { type: "branch", branch: "..." }`. The default `head` and `merge-to-head` strategies are **not** safe for concurrent forks: `head` shares the host working directory across all children, and `merge-to-head` races `git merge` against the same HEAD. See [ADR 0018](docs/adr/0018-fork-is-session-only.md).
`fork` is present only on results from providers with `sessionStorage` (Claude Code, Codex) — hence the optional-chaining call. The same single-iteration and session-file constraints as `.resume()` apply.
### `ClaudeCodeOptions`
The `claudeCode()` factory accepts an optional second argument for provider-specific options:
agent: claudeCode("claude-opus-4-7", { effort: "high" });
| Option | Type | Default | Description |
| ----------------- | --------------------------------------------------------- | ------- | --------------------------------------------------------- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` \| `"max"` | — | Claude Code reasoning effort level (`max` is Opus only) |
| `env` | `Record` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture agent session JSONL to host for `claude --resume` |
### `CodexOptions`
The `codex()` factory accepts an optional second argument for provider-specific options:
agent: codex("gpt-5.4", { effort: "high" });
| Option | Type | Default | Description |
| ----------------- | ---------------------------------------------- | ------- | --------------------------------------------------------- |
| `effort` | `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | — | Codex reasoning effort level via `model_reasoning_effort` |
| `env` | `Record` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture Codex rollout JSONL to host for resume |
### `PiOptions`
The `pi()` factory accepts an optional second argument for provider-specific options:
agent: pi("claude-sonnet-4-6", { thinking: "high" });
| Option | Type | Default | Description |
| ----------------- | ------------------------------------------------------------------------ | ------- | -------------------------------------------------------- |
| `thinking` | `"off"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | — | Pi reasoning effort level via the `--thinking` flag |
| `env` | `Record` | `{}` | Environment variables injected by this agent provider |
| `captureSessions` | `boolean` | `true` | Capture pi session JSONL to host for `pi --session ` |
### Provider `env`
Both **agent providers** and **sandbox providers** accept an optional `env: Record` in their options. These environment variables are merged with the `.sandcastle/.env` resolver output at launch time:
await run({
agent: claudeCode("claude-opus-4-7", {
env: { ANTHROPIC_API_KEY: "sk-ant-..." },
}),
sandbox: docker({
env: { DOCKER_SPECIFIC_VAR: "value" },
}),
prompt: "Fix issue #42",
});
**Merge rules:**
- Provider env (agent + sandbox) overrides `.sandcastle/.env` resolver output for shared keys
- Agent provider env and sandbox provider env **must not overlap** — if they share any key, `run()` throws an error
- When `env` is not provided, it defaults to `{}`
Environment variables are also resolved automatically from `.sandcastle/.env` and `process.env` — no need to pass them to the API. The required variables depend on the **agent provider** (see `sandcastle init` output for details).
## Custom Sandbox Providers
Sandcastle ships with built-in providers for Docker, Podman, and Vercel, but you can create your own. A sandbox provider tells Sandcastle how to execute commands in an isolated environment. There are two kinds:
- **Bind-mount** — the sandbox can mount a host directory. Sandcastle creates a worktree on the host and the provider mounts it in. No file sync needed. Use this for Docker, Podman, or any local container runtime.
- **Isolated** — the sandbox has its own filesystem (e.g. a cloud VM). The provider handles syncing code in and out via `copyIn` and `copyFileOut`. Use this when the sandbox cannot access the host filesystem.
### The sandbox handle contract
Both provider types return a **sandbox handle** from their `create()` function. The handle exposes:
| Method | Required | Description |
| -------------- | ---------- | ---------------------------------------------------------------------------- |
| `exec` | Both | Run a command, optionally streaming stdout line-by-line via `options.onLine` |
| `close` | Both | Tear down the sandbox |
| `copyFileIn` | Bind-mount | Copy a single file from the host into the sandbox |
| `copyFileOut` | Both | Copy a single file from the sandbox to the host |
| `copyIn` | Isolated | Copy a file or directory from the host into the sandbox |
| `worktreePath` | Both | Absolute path to the repo directory inside the sandbox |
### `ExecResult`
Every `exec` call returns an `ExecResult`:
interface ExecResult {
readonly stdout: string;
readonly stderr: string;
readonly exitCode: number;
}
### Bind-mount provider example
A minimal bind-mount provider that shells out to local processes (no container):
import {
createBindMountSandboxProvider,
type BindMountCreateOptions,
type BindMountSandboxHandle,
type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
import { copyFile as fsCopyFile, mkdir as fsMkdir } from "node:fs/promises";
import { dirname } from "node:path";
import { createInterface } from "node:readline";
const localProcess = () =>
createBindMountSandboxProvider({
name: "local-process",
create: async (
options: BindMountCreateOptions,
): Promise => {
const worktreePath = options.worktreePath;
return {
worktreePath,
exec: (
command: string,
opts?: { onLine?: (line: string) => void; cwd?: string },
): Promise => {
if (opts?.onLine) {
const onLine = opts.onLine;
return new Promise((resolve, reject) => {
const proc = spawn("sh", ["-c", command], {
cwd: opts?.cwd ?? worktreePath,
stdio: ["ignore", "pipe", "pipe"],
});
const stdoutChunks: string[] = [];
const stderrChunks: string[] = [];
const rl = createInterface({ input: proc.stdout! });
rl.on("line", (line) => {
stdoutChunks.push(line);
onLine(line); // forward each line to Sandcastle
});
proc.stderr!.on("data", (chunk: Buffer) => {
stderrChunks.push(chunk.toString());
});
proc.on("error", (err) => reject(err));
proc.on("close", (code) => {
resolve({
stdout: stdoutChunks.join("\n"),
stderr: stderrChunks.join(""),
exitCode: code ?? 0,
});
});
});
}
return new Promise((resolve, reject) => {
execFile(
"sh",
["-c", command],
{ cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
(error, stdout, stderr) => {
if (error && error.code === undefined) {
reject(new Error(`exec failed: ${error.message}`));
} else {
resolve({
stdout: stdout.toString(),
stderr: stderr.toString(),
exitCode: typeof error?.code === "number" ? error.code : 0,
});
}
},
);
});
},
copyFileIn: async (hostPath: string, sandboxPath: string) => {
await fsMkdir(dirname(sandboxPath), { recursive: true });
await fsCopyFile(hostPath, sandboxPath);
},
copyFileOut: async (sandboxPath: string, hostPath: string) => {
await fsMkdir(dirname(hostPath), { recursive: true });
await fsCopyFile(sandboxPath, hostPath);
},
close: async () => {
// nothing to tear down for a local process
},
};
},
});
### Isolated provider example
A minimal isolated provider using a temp directory:
import {
createIsolatedSandboxProvider,
type IsolatedSandboxHandle,
type ExecResult,
} from "@ai-hero/sandcastle";
import { execFile, spawn } from "node:child_process";
import { copyFile, mkdir, mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { createInterface } from "node:readline";
const tempDir = () =>
createIsolatedSandboxProvider({
name: "temp-dir",
create: async (): Promise => {
const root = await mkdtemp(join(tmpdir(), "sandbox-"));
const worktreePath = join(root, "workspace");
await mkdir(worktreePath, { recursive: true });
return {
worktreePath,
exec: (
command: string,
opts?: { onLine?: (line: string) => void; cwd?: string },
): Promise => {
if (opts?.onLine) {
const onLine = opts.onLine;
return new Promise((resolve, reject) => {
const proc = spawn("sh", ["-c", command], {
cwd: opts?.cwd ?? worktreePath,
stdio: ["ignore", "pipe", "pipe"],
});
const stdoutChunks: string[] = [];
const stderrChunks: string[] = [];
const rl = createInterface({ input: proc.stdout! });
rl.on("line", (line) => {
stdoutChunks.push(line);
onLine(line);
});
proc.stderr!.on("data", (chunk: Buffer) => {
stderrChunks.push(chunk.toString());
});
proc.on("error", (err) => reject(err));
proc.on("close", (code) => {
resolve({
stdout: stdoutChunks.join("\n"),
stderr: stderrChunks.join(""),
exitCode: code ?? 0,
});
});
});
}
return new Promise((resolve, reject) => {
execFile(
"sh",
["-c", command],
{ cwd: opts?.cwd ?? worktreePath, maxBuffer: 10 * 1024 * 1024 },
(error, stdout, stderr) => {
if (error && error.code === undefined) {
reject(new Error(`exec failed: ${error.message}`));
} else {
resolve({
stdout: stdout.toString(),
stderr: stderr.toString(),
exitCode: typeof error?.code === "number" ? error.code : 0,
});
}
},
);
});
},
copyIn: async (hostPath: string, sandboxPath: string) => {
const info = await stat(hostPath);
if (info.isDirectory()) {
await cp(hostPath, sandboxPath, { recursive: true });
} else {
await mkdir(dirname(sandboxPath), { recursive: true });
await copyFile(hostPath, sandboxPath);
}
},
copyFileOut: async (sandboxPath: string, hostPath: string) => {
await mkdir(dirname(hostPath), { recursive: true });
await copyFile(sandboxPath, hostPath);
},
close: async () => {
await rm(root, { recursive: true, force: true });
},
};
},
});
### Branch strategies
A branch strategy controls where the agent's commits land. Configure it when constructing the provider:
| Strategy | Behavior | Bind-mount | Isolated |
| --------------- | ------------------------------------------------------------------------ | ---------- | --------- |
| `head` | Agent writes directly to the host working directory. No worktree created | Default | N/A |
| `merge-to-head` | Sandcastle creates a temp branch, merges back to HEAD when done | Supported | Default |
| `branch` | Commits land on an explicit named branch you provide | Supported | Supported |
**When to use each:**
- **`head`** — fast iteration during development. No branch indirection, no merge step. Only works with bind-mount providers since the agent needs direct host filesystem access.
- **`merge-to-head`** — safe default for automation. The agent works on a throwaway branch; if something goes wrong, HEAD is untouched. Use this for CI or unattended runs.
- **`branch`** — when you want commits on a specific branch (e.g. for a PR). Pass `{ type: "branch", branch: "agent/fix-42" }`.
Branch strategy is now configured on `run()`, not on the provider:
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
// head — direct write, bind-mount only (default for bind-mount providers)
await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
prompt: "…",
});
// merge-to-head — temp branch, merge back (default for isolated providers)
await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: tempDir(),
prompt: "…",
});
// branch — explicit named branch
await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: docker(),
branchStrategy: { type: "branch", branch: "agent/fix-42" },
prompt: "…",
});
### Passing to `run()`
Pass your custom provider via the `sandbox` option — it works the same as the built-in `docker()` provider:
import { run, claudeCode } from "@ai-hero/sandcastle";
const result = await run({
agent: claudeCode("claude-opus-4-7"),
sandbox: localProcess(), // your custom provider
prompt: "Fix issue #42 in this repo.",
});
### Reference implementations
For real-world examples, see:
## Configuration
### Config directory (`.sandcastle/`)
All per-repo sandbox configuration lives in `.sandcastle/`. Run `sandcastle init` to create it.
### Custom Dockerfile
The `.sandcastle/Dockerfile` controls the sandbox environment. The default template installs:
- **Node.js 22** (base image)
- **git**, **curl**, **jq** (system dependencies)
- **GitHub CLI** (`gh`)
- **Claude Code CLI**
- A non-root `agent` user (required — Claude runs as this user)
When customizing the Dockerfile, ensure you keep:
- A non-root user (the default `agent` user) for Claude to run as
- `git` (required for commits and branch operations)
- `gh` (required for issue fetching)
- Claude Code CLI installed and on PATH
Add your project-specific dependencies (e.g., language runtimes, build tools) to the Dockerfile as needed.
### Hooks
Hooks are grouped by **where** they run — `host` (on the developer's machine) or `sandbox` (inside the container):
hooks: {
host: {
onWorktreeReady: [{ command: "cp .env.example .env" }],
onSandboxReady: [{ command: "echo sandbox is up" }],
},
sandbox: {
onSandboxReady: [
{ command: "npm install", timeoutMs: 300_000 },
{ command: "apt-get install -y ffmpeg", sudo: true },
],
},
}
| Hook | Runs on | When | Working directory |
| ------------------------ | ------- | -------------------------------------------- | ------------------------------------------- |
| `host.onWorktreeReady` | Host | After `copyToWorktree`, before sandbox start | Worktree path (host repo root under `head`) |
| `host.onSandboxReady` | Host | After sandbox is up | Worktree path (host repo root under `head`) |
| `sandbox.onSandboxReady` | Sandbox | After sandbox is up | Sandbox repo directory |
**Ordering:** `copyToWorktree` -> `host.onWorktreeReady` (sequential) -> sandbox created -> `host.onSandboxReady` + `sandbox.onSandboxReady` (parallel).
- **Host hooks** accept `{ command: string; timeoutMs?: number }` — no `sudo`, no `cwd`. Use `cd` or inline env in the command string.
- **Sandbox hooks** accept `{ command: string; sudo?: boolean; timeoutMs?: number }` — set `sudo: true` for elevated privileges.
- **`timeoutMs`** overrides the default 60 s per-hook timeout. Useful for long-running setup commands like dependency installs (e.g. `timeoutMs: 300_000` for 5 minutes).
- Within each hook point, sandbox hooks run in parallel; host hooks within `onSandboxReady` also run in parallel with sandbox hooks. `host.onWorktreeReady` hooks run sequentially in declared order.
- If any hook exits non-zero, setup fails fast.
- When a `signal` is passed to `run()`, it is threaded to all hooks — aborting the signal cancels any in-flight hook commands.
## Development
npm install
npm run build # Bundle with tsup
npm test # Run tests with vitest
npm run typecheck # Type-check
## License
MIT