SmolVM - one-command sandbox for Claude Code and Codex
Pre-installed sandboxed VM with Claude and Codex ready to run, plus git credentials wired up. Removes the 'press enter to accept' loop without exposing the host.
This entry doesn't have a long-form writeup yet. Follow the source link above for the full context.
Recent discussion
From the wider webFeatured in
Claude Code tools, plugins, and integrations
The best tools, MCP servers, and harnesses for getting more out of Claude Code - orchestration, observability, telemetry, and remote control.
Security tools for AI coding agents
Sandboxes, scanners, proxies, and governance toolkits that keep autonomous agents from doing damage.
Tools for OpenAI Codex CLI
The Codex-aware slice of the directory: orchestration, observability, sandboxes, and bridges built specifically for the OpenAI Codex runtime.
Related entries
AgentBox - SDK to run coding agents in any sandbox
One SDK to run Claude Code, Codex, or OpenCode inside Docker, E2B, Modal, Daytona, or Vercel sandboxes - boots each agent's native server (JSON-RPC, HTTP/SSE) instead of using non-interactive --print mode.
wanman - worktree-isolated multi-agent runtime for Claude Code and Codex
Multi-agent runtime that spawns each Claude Code or Codex agent in its own git worktree and home directory. JSON-RPC subprocess control, task pooling, artifact storage. Solves the share-a-directory failure mode that breaks most multi-agent harnesses.
PostTrainBench - can a CLI agent post-train a base LLM in 10 hours?
Benchmark measuring whether Claude Code, Codex CLI, Gemini CLI, and OpenCode can autonomously improve 4 small base models (Qwen3-1.7B/4B, SmolLM3-3B, Gemma-3-4B) on 7 evals (AIME, BFCL, GPQA, GSM8K, HealthBench, HumanEval, Arena Hard) within a single H100 GPU and 10 hours. Includes agent-as-judge anti-reward-hacking and baseline-replacement penalties for tampering.
pentest-ai-agents - Claude Code subagents for offensive security
Specialized Claude Code subagents that turn the CLI into a pentest assistant: plan engagements, analyze recon, research exploits, build detections, audit STIGs, and write reports.