
AgentBox - SDK to run coding agents in any sandbox

One SDK to run Claude Code, Codex, or OpenCode inside Docker, E2B, Modal, Daytona, or Vercel sandboxes - boots each agent's native server (JSON-RPC, HTTP/SSE) instead of using non-interactive --print mode.


If you've ever tried to "run Claude Code in a sandbox" and ended up shelling out to claude --print in non-interactive mode, you've felt the limit AgentBox is built around. Non-interactive mode strips most of what makes the agent useful: approval flows, tool-use control, streaming events. AgentBox does it differently - it boots each agent's native server inside the sandbox and talks to it over WebSocket or HTTP. Full interactive capabilities, intact.

The other half of the value: agent and sandbox are both pluggable. Swap providers and your application code stays the same.

The minimal example

import { Agent, Sandbox } from "agentbox-sdk";

const sandbox = new Sandbox("local-docker", {
  workingDir: "/workspace",
  image: process.env.IMAGE_ID!,
  env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY! },
});

await sandbox.findOrProvision();

const run = new Agent("claude-code", {
  sandbox,
  cwd: "/workspace",
  approvalMode: "auto",
}).stream({
  model: "sonnet",
  input: "Create a hello world Express server in /workspace/server.ts",
});

for await (const event of run) {
  if (event.type === "text.delta") process.stdout.write(event.delta);
}

await sandbox.delete();

That's the whole shape: construct, provision, run, stream, delete.

Install and image setup

npm install agentbox-sdk

Requires Node >= 20. The agent CLI you want to run (claude, opencode, codex) needs to be installed inside your sandbox image - AgentBox boots the CLI's server, so the binary has to be there.

For each sandbox provider, build a base image from one of the bundled presets:

npx agentbox image build --provider local-docker --preset browser-agent

The build prints the provider's native image reference - a Docker tag, Modal image ID, E2B template, or Daytona snapshot. Set it as IMAGE_ID.

Agents

Three providers, all running their CLI inside the sandbox:

Provider     | CLI      | Model format
------------ | -------- | ------------
claude-code  | claude   | sonnet, opus, haiku
opencode     | opencode | anthropic/claude-sonnet-4-6, openai/gpt-4.1, ...
codex        | codex    | gpt-5.3-codex, gpt-5.4

A reasoning level can be passed alongside model: low | medium | high | xhigh. AgentBox maps it to each provider's native control - Codex's effort on turn/start, Claude Code's --effort flag, OpenCode's reasoningEffort agent variant. xhigh requires a model that supports it (Opus 4.7+, Codex gpt-5.4).
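The mapping described above can be sketched as a small function. The field names here (turnStart.effort, cliFlags, agent.reasoningEffort) are illustrative re-statements of the prose, not AgentBox's actual internals:

```typescript
type ReasoningLevel = "low" | "medium" | "high" | "xhigh";
type Provider = "claude-code" | "opencode" | "codex";

// Sketch: one unified reasoning level fanned out to each provider's
// native control, per the description above. Shapes are assumptions.
function mapReasoning(provider: Provider, level: ReasoningLevel): object {
  switch (provider) {
    case "codex":
      // Codex: effort sent on the turn/start request
      return { turnStart: { effort: level } };
    case "claude-code":
      // Claude Code: passed through as the CLI's --effort flag
      return { cliFlags: ["--effort", level] };
    case "opencode":
      // OpenCode: set via the agent's reasoningEffort variant
      return { agent: { reasoningEffort: level } };
  }
}
```

The point of the pattern is that callers only ever pass one of four levels; the per-provider translation stays inside the SDK.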

Sandboxes

Five providers, same interface:

Provider     | What it is             | Auth
------------ | ---------------------- | ----
local-docker | Local Docker container | Docker daemon
e2b          | Cloud micro-VM         | E2B_API_KEY
modal        | Cloud container        | MODAL_TOKEN_ID + MODAL_TOKEN_SECRET
daytona      | Cloud dev environment  | DAYTONA_API_KEY
vercel       | Ephemeral cloud VM     | VERCEL_TOKEN + team + project

Every sandbox supports findOrProvision, run, runAsync, gitClone, uploadAndRun, openPort, getPreviewLink, snapshot, stop, delete.

The lifecycle quirk worth knowing: new Sandbox(...) only stores configuration. It doesn't create or attach to a real sandbox. You have to call findOrProvision() once before any operation that needs a live sandbox, including agent runs. Calling anything else first throws a clear error rather than silently lazy-creating - which makes the (potentially slow) attach/create step explicit.
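The contract is easy to see in a toy re-implementation: the constructor only stores config, and any operation before findOrProvision() throws rather than lazily creating the sandbox. This is an illustrative sketch, not AgentBox's actual code:

```typescript
// Toy version of the lifecycle contract described above.
class SandboxHandle {
  private live = false;
  constructor(private config: { image: string }) {
    // stores configuration only; nothing is created here
  }

  async findOrProvision(): Promise<void> {
    // attach to an existing sandbox or create one -
    // the potentially slow step, made explicit
    this.live = true;
  }

  async run(cmd: string): Promise<string> {
    if (!this.live) {
      throw new Error("Sandbox not provisioned: call findOrProvision() first");
    }
    return `ran: ${cmd}`;
  }
}
```

Calling run() on a fresh handle fails loudly; after one findOrProvision() it works. Agent runs sit behind the same gate.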

Vercel is the odd one out. Two specifics:

  • It uses runtime snapshots, not pre-built images. Call sandbox.snapshot() to capture state and pass the returned id via provider.snapshotId next run.
  • Ports must be declared at create time via provider.ports. openPort() is a no-op at runtime, so any port the agent or your code will listen on must be listed up front (e.g. opencode uses 4096; codex/claude-code use 43180).
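Put together, a Vercel config honoring both constraints might look like the following. The option names (provider.ports, provider.snapshotId) are assumptions taken from the prose above, not a confirmed API:

```typescript
// Sketch of a Vercel sandbox config: every port declared up front
// (openPort() is a no-op at runtime), state carried between runs via
// a runtime snapshot id returned by a previous sandbox.snapshot().
const vercelConfig = {
  provider: {
    // agent server ports: opencode on 4096, codex/claude-code on 43180
    ports: [4096, 43180],
    // id from the last sandbox.snapshot(); undefined on the first run
    snapshotId: process.env.VERCEL_SNAPSHOT_ID,
  },
  workingDir: "/workspace",
};
```

Forgetting a port here is the Vercel-specific failure mode: the agent boots, but nothing can reach it.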

Skills, sub-agents, MCPs, custom commands - all addressable

The SDK exposes the parts you'd otherwise reach into provider-specific config to set:

  • Skills - attach GitHub repos as agent skills (cloned into the sandbox), or embed inline with a SKILL.md string.
  • Sub-agents - declare named delegates with their own instructions and tool allowlist.
  • MCP servers - both local (spawn a process inside the sandbox) and remote (URL with SSE).
  • Custom commands - register slash commands (or $-prefixed for Codex) the agent can invoke.
  • Multimodal input - mix text, images, and PDFs (provider-dependent: opencode does text/images/files, claude-code does text/images/PDFs, codex does text/images).
  • Custom images - define your own image with a small .mjs config and npx agentbox image build --file ./my-image.mjs.
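As a rough picture of how these surfaces might sit together in one agent configuration, here is an illustrative object. Every option name below (skills, subAgents, mcpServers, commands) is an assumption for the sake of the sketch, not confirmed AgentBox API:

```typescript
// Hypothetical agent configuration touching each surface listed above.
const agentConfig = {
  skills: [
    // GitHub repo cloned into the sandbox as a skill (URL is a placeholder)
    { repo: "https://github.com/acme/review-skill" },
    // or embedded inline as a SKILL.md string
    { inline: "# SKILL.md\nAlways run the test suite before committing." },
  ],
  subAgents: {
    // named delegate with its own instructions and tool allowlist
    tester: { instructions: "Run and fix failing tests only.", tools: ["bash", "read"] },
  },
  mcpServers: {
    // local: a process spawned inside the sandbox
    local: { command: "npx", args: ["-y", "@example/mcp-server"] },
    // remote: a URL reached over SSE
    remote: { url: "https://mcp.example.com/sse" },
  },
  commands: {
    // invoked as /deploy (or $deploy under Codex)
    deploy: "Build the project and report the artifact path.",
  },
};
```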

Hooks

Each provider's native hook format is exposed - Claude Code's PostToolUse/PreToolUse hook config maps directly, OpenCode and Codex have their own equivalents. The SDK doesn't try to invent a unified hook abstraction; it forwards each provider's native shape.
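For Claude Code, the forwarded shape is the same one its settings.json uses. Below is that native hook format as a TypeScript object; how exactly it is handed to the Agent constructor is an assumption, but the inner schema (matcher plus a list of type: "command" hooks) is Claude Code's own:

```typescript
// Claude Code's native PostToolUse hook shape, forwarded as-is.
const claudeHooks = {
  PostToolUse: [
    {
      matcher: "Write|Edit", // fire after file-writing tools
      hooks: [{ type: "command", command: "npx prettier --write ." }],
    },
  ],
};
```

OpenCode and Codex would get their own native shapes in the same spirit; nothing is normalized across providers.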

When to reach for it

  • You're building a product on top of coding agents and need provider portability without rewriting your app code.
  • You want full interactive capabilities (approval flows, streaming, tool-use control) inside a sandbox.
  • You need to mix sandbox providers - dev on local-docker, staging on E2B, production on Modal - without forking application logic.

When not to

  • One-off scripts. The SDK is overkill if you just want to run an agent against a local repo.
  • Workflows where the non-interactive --print mode is genuinely sufficient. AgentBox's value is interactive parity; if you don't need it, you're paying complexity for nothing.

Trade-offs

The "boot the agent's server inside the sandbox" approach is the right call for capability but not for cold start - each new sandbox has to spin up the agent CLI inside it before you can issue the first turn. For high-volume, sub-second workloads, pair AgentBox with a sandbox provider that snapshots fast (CubeSandbox, E2B, or Modal) and reuse where possible.

The provider-portable surface is real but not absolute. Multimodal capability differs by provider; reasoning levels map differently; some hook shapes are provider-specific. Read the matrix before assuming a feature works everywhere.
