AgentBox - SDK to run coding agents in any sandbox
One SDK to run Claude Code, Codex, or OpenCode inside Docker, E2B, Modal, Daytona, or Vercel sandboxes - boots each agent's native server (JSON-RPC, HTTP/SSE) instead of using non-interactive --print mode.
If you've ever tried to "run Claude Code in a sandbox" and ended up shelling out to claude --print in non-interactive mode, you've felt the limit AgentBox is built around. Non-interactive mode strips most of what makes the agent useful: approval flows, tool-use control, streaming events. AgentBox does it differently - it boots each agent's native server inside the sandbox and talks to it over WebSocket or HTTP. Full interactive capabilities, intact.
The other half of the value: agent and sandbox are both pluggable. Swap providers and your application code stays the same.
The minimal example
    import { Agent, Sandbox } from "agentbox-sdk";

    // Constructing a Sandbox only stores configuration.
    const sandbox = new Sandbox("local-docker", {
      workingDir: "/workspace",
      image: process.env.IMAGE_ID!,
      env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY! },
    });

    // Attach to an existing sandbox or create one before running anything.
    await sandbox.findOrProvision();

    // Start a streaming run of Claude Code inside the sandbox.
    const run = new Agent("claude-code", {
      sandbox,
      cwd: "/workspace",
      approvalMode: "auto",
    }).stream({
      model: "sonnet",
      input: "Create a hello world Express server in /workspace/server.ts",
    });

    // Print text as it arrives.
    for await (const event of run) {
      if (event.type === "text.delta") process.stdout.write(event.delta);
    }

    await sandbox.delete();
That's the whole shape: construct, provision, run, stream, delete.
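The loop in the example writes each delta straight to stdout. If you want the full reply as a string instead, the same loop can be wrapped in a small helper. This is a minimal sketch that assumes only the event shape shown above (a type of "text.delta" carrying a delta string); collectText and StreamEvent are hypothetical names, not SDK exports.

```typescript
// Hypothetical event type: only the "text.delta" shape comes from the example above.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Drains an async stream of events and concatenates the text deltas.
async function collectText(events: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "text.delta" && typeof event.delta === "string") {
      text += event.delta;
    }
  }
  return text;
}
```

Assuming the run object is compatible with this event type, collectText(run) would take the place of the for await loop.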
Install and image setup
    npm install agentbox-sdk
Requires Node >= 20. The agent CLI you want to run (claude, opencode, codex) needs to be installed inside your sandbox image - AgentBox boots the CLI's server, so the binary has to be there.
For each sandbox provider, build a base image from one of the bundled presets:
    npx agentbox image build --provider local-docker --preset browser-agent
The build prints the provider's native image reference - a Docker tag, Modal image ID, E2B template, or Daytona snapshot. Set it as IMAGE_ID.
Agents
Three providers, all running their CLI inside the sandbox:
| Provider | CLI | Model format |
|---|---|---|
| claude-code | claude | sonnet, opus, haiku |
| opencode | opencode | anthropic/claude-sonnet-4-6, openai/gpt-4.1, ... |
| codex | codex | gpt-5.3-codex, gpt-5.4 |
A reasoning level can be passed alongside model: low | medium | high | xhigh. AgentBox maps it to each provider's native control - Codex's effort on turn/start, Claude Code's --effort flag, OpenCode's reasoningEffort agent variant. xhigh requires a model that supports it (Opus 4.7+, Codex gpt-5.4).
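To make the mapping concrete, here is an illustrative sketch of how one reasoning level could fan out to the three native controls named above. The targets (effort on turn/start, the --effort flag, the reasoningEffort variant) come from the text; the object shapes, the toNativeReasoning name, and the assumption that the level string passes through unchanged are invented for illustration and are not the SDK's internals.

```typescript
type AgentProvider = "claude-code" | "opencode" | "codex";
type ReasoningLevel = "low" | "medium" | "high" | "xhigh";

// Hypothetical shapes: one variant per provider-native control.
type NativeReasoning =
  | { provider: "codex"; turnStartParams: { effort: ReasoningLevel } }
  | { provider: "claude-code"; cliArgs: string[] }
  | { provider: "opencode"; agentVariant: { reasoningEffort: ReasoningLevel } };

// Translates a single level into the control each provider understands.
function toNativeReasoning(
  provider: AgentProvider,
  level: ReasoningLevel,
): NativeReasoning {
  switch (provider) {
    case "codex":
      return { provider, turnStartParams: { effort: level } };
    case "claude-code":
      return { provider, cliArgs: ["--effort", level] };
    case "opencode":
      return { provider, agentVariant: { reasoningEffort: level } };
  }
}
```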
Sandboxes
Five providers, same interface:
| Provider | What it is | Auth |
|---|---|---|
| local-docker | Local Docker container | Docker daemon |
| e2b | Cloud micro-VM | E2B_API_KEY |
| modal | Cloud container | MODAL_TOKEN_ID + MODAL_TOKEN_SECRET |
| daytona | Cloud dev environment | DAYTONA_API_KEY |
| vercel | Ephemeral cloud VM | VERCEL_TOKEN + team + project |
Every sandbox supports findOrProvision, run, runAsync, gitClone, uploadAndRun, openPort, getPreviewLink, snapshot, stop, delete.
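The Auth column in the table above can be turned into a preflight check that fails fast before any provisioning starts. A minimal sketch: missingAuth and REQUIRED_ENV are hypothetical helpers, and only the variables the table names are checked (the Vercel team and project settings are left out because the table does not give variable names for them).

```typescript
type SandboxProvider = "local-docker" | "e2b" | "modal" | "daytona" | "vercel";

// Environment variables named in the Auth column. local-docker relies on the
// Docker daemon, so it needs none.
const REQUIRED_ENV: Record<SandboxProvider, readonly string[]> = {
  "local-docker": [],
  e2b: ["E2B_API_KEY"],
  modal: ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
  daytona: ["DAYTONA_API_KEY"],
  vercel: ["VERCEL_TOKEN"],
};

// Returns the names of required variables that are unset or empty.
function missingAuth(
  provider: SandboxProvider,
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV[provider].filter((name) => !env[name]);
}
```

In a Node program you would typically pass process.env as the second argument and abort if the returned list is not empty.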
The lifecycle quirk worth knowing: new Sandbox(...) only stores configuration. It doesn't create or attach to a real sandbox. You have to call findOrProvision() once before any operation that needs a live sandbox, including agent runs. Calling anything else first throws a clear error rather than silently lazy-creating - which makes the (potentially slow) attach/create step explicit.
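The behaviour described above can be pictured with a toy class. This is a model of the pattern only, not the SDK's implementation: ToySandbox, its fields, and the error message are all made up for illustration.

```typescript
// Toy model of explicit provisioning; not the SDK's code.
class ToySandbox {
  private live = false;
  readonly provider: string;

  constructor(provider: string) {
    // Constructing only records configuration. Nothing is created yet.
    this.provider = provider;
  }

  async findOrProvision(): Promise<void> {
    // A real provider would attach to or create a sandbox here.
    this.live = true;
  }

  async run(command: string): Promise<string> {
    // Fail loudly instead of lazily creating a sandbox behind the caller's back.
    if (!this.live) {
      throw new Error("Sandbox is not provisioned: call findOrProvision() first");
    }
    return `ran: ${command}`;
  }
}
```

The point of the design is that the slow step has exactly one call site, so you can time it, retry it, or show progress around it.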
Vercel is the odd one out. Two specifics:
- It uses runtime snapshots, not pre-built images. Call sandbox.snapshot() to capture state and pass the returned id via provider.snapshotId on the next run.
- Ports must be declared at create time via provider.ports. Calling openPort() is a no-op at runtime, so any port the agent or your code will listen on must be listed up front (e.g. opencode uses 4096; codex/claude-code use 43180).
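Because the port list has to be complete before the sandbox exists, it helps to compute it in one place. A small sketch using the port numbers quoted above; portsToDeclare and AGENT_PORT are hypothetical helpers, and how the resulting array is passed to provider.ports is left to the SDK's own documentation.

```typescript
type AgentProvider = "claude-code" | "opencode" | "codex";

// Agent server ports quoted in the text above.
const AGENT_PORT: Record<AgentProvider, number> = {
  opencode: 4096,
  codex: 43180,
  "claude-code": 43180,
};

// Returns the de-duplicated list of ports to declare at create time:
// the agent's server port first, then any ports your own code listens on.
function portsToDeclare(agent: AgentProvider, appPorts: number[] = []): number[] {
  return Array.from(new Set([AGENT_PORT[agent], ...appPorts]));
}
```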
Skills, sub-agents, MCPs, custom commands - all addressable
The SDK exposes the parts you'd otherwise reach into provider-specific config to set:
- Skills - attach GitHub repos as agent skills (cloned into the sandbox), or embed inline with a SKILL.md string.
- Sub-agents - declare named delegates with their own instructions and tool allowlist.
- MCP servers - both local (spawn a process inside the sandbox) and remote (URL with SSE).
- Custom commands - register slash commands (or $-prefixed for Codex) the agent can invoke.
- Multimodal input - mix text, images, and PDFs (provider-dependent: opencode does text/images/files, claude-code does text/images/PDFs, codex does text/images).
- Custom images - define your own image with a small .mjs config and npx agentbox image build --file ./my-image.mjs.
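The multimodal bullet above is the one most likely to bite when you swap agents, so it can be worth encoding as data and checking inputs before a run. A sketch under stated assumptions: the support lists are transcribed from the text, while InputKind, INPUT_SUPPORT, and unsupportedInputs are hypothetical names.

```typescript
type AgentProvider = "claude-code" | "opencode" | "codex";
type InputKind = "text" | "image" | "pdf" | "file";

// Transcribed from the list above. Whether opencode's "files" covers PDFs is
// not stated, so "pdf" appears only where the text names it.
const INPUT_SUPPORT: Record<AgentProvider, readonly InputKind[]> = {
  opencode: ["text", "image", "file"],
  "claude-code": ["text", "image", "pdf"],
  codex: ["text", "image"],
};

// Returns the input kinds the chosen agent is not documented to accept.
function unsupportedInputs(agent: AgentProvider, kinds: InputKind[]): InputKind[] {
  return kinds.filter((kind) => !INPUT_SUPPORT[agent].includes(kind));
}
```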
Hooks
Each provider's native hook format is exposed - Claude Code's PostToolUse/PreToolUse hook config maps directly, OpenCode and Codex have their own equivalents. The SDK doesn't try to invent a unified hook abstraction; it forwards each provider's native shape.
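For reference, this is roughly what a native Claude Code hook configuration looks like when written as a typed object. The types are declared locally to mirror the shape of Claude Code's hook settings (an event name mapping to matchers, each with a list of command hooks); they are not imported from AgentBox, and the option AgentBox uses to accept this object is not shown in this entry. The prettier command is only an example.

```typescript
// Local types for illustration; not AgentBox exports.
interface CommandHook {
  type: "command";
  command: string;
}

interface HookMatcher {
  // Pattern matched against the tool name, e.g. "Bash" or "Edit|Write".
  matcher: string;
  hooks: CommandHook[];
}

type ClaudeCodeHooks = Partial<Record<"PreToolUse" | "PostToolUse", HookMatcher[]>>;

// Run a formatter after every file edit or write.
const hooks: ClaudeCodeHooks = {
  PostToolUse: [
    {
      matcher: "Edit|Write",
      hooks: [{ type: "command", command: "npx prettier --write ." }],
    },
  ],
};
```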
When to reach for it
- You're building a product on top of coding agents and need provider portability without rewriting your app code.
- You want full interactive capabilities (approval flows, streaming, tool-use control) inside a sandbox.
- You need to mix sandbox providers - dev on local-docker, staging on E2B, production on Modal - without forking application logic.
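The last case in the list above, one provider per environment, can be reduced to a lookup. A minimal sketch: the stage names, the fallback to local-docker, and the providerFor helper are assumptions made for illustration.

```typescript
type SandboxProvider = "local-docker" | "e2b" | "modal" | "daytona" | "vercel";

// Stage names are hypothetical; the provider split follows the example in the text.
const PROVIDER_BY_STAGE = new Map<string, SandboxProvider>([
  ["development", "local-docker"],
  ["staging", "e2b"],
  ["production", "modal"],
]);

// Unknown or missing stages fall back to the local provider.
function providerFor(stage: string | undefined): SandboxProvider {
  return PROVIDER_BY_STAGE.get(stage ?? "") ?? "local-docker";
}
```

The result would be passed as the first argument to new Sandbox(...), as in the minimal example. Provider-specific options such as the image reference and credentials still need to be supplied per provider.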
When not to
- One-off scripts. The SDK is overkill if you just want to run an agent against a local repo.
- Workflows where the non-interactive --print mode is genuinely sufficient. AgentBox's value is interactive parity; if you don't need it, you're paying complexity for nothing.
Trade-offs
The "boot the agent's server inside the sandbox" approach is the right call for capability but not for cold start - each new sandbox has to spin up the agent CLI inside it before you can issue the first turn. For high-volume, sub-second workloads, pair AgentBox with a sandbox provider that snapshots fast (CubeSandbox, E2B, or Modal) and reuse where possible.
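One way to "reuse where possible" within a single process is to cache the provisioning promise per key, so concurrent callers share one cold start instead of paying for several. This is a generic sketch, not an SDK feature: memoizeAsync is a hypothetical helper, and in practice the create callback would construct a Sandbox and await findOrProvision().

```typescript
// Caches the promise returned by `create` per key. Failed attempts are evicted
// so that a later call can retry.
function memoizeAsync<K, V>(
  create: (key: K) => Promise<V>,
): (key: K) => Promise<V> {
  const cache = new Map<K, Promise<V>>();
  return (key: K): Promise<V> => {
    let pending = cache.get(key);
    if (!pending) {
      pending = create(key).catch((err: unknown) => {
        cache.delete(key);
        throw err;
      });
      cache.set(key, pending);
    }
    return pending;
  };
}
```

Keying by something stable, such as a project or user id, means the second request for the same key skips the cold start as long as the process and the sandbox are both still alive.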
The provider-portable surface is real but not absolute. Multimodal capability differs by provider; reasoning levels map differently; some hook shapes are provider-specific. Read the matrix before assuming a feature works everywhere.