Discovery
Back to browse

SmolVM - one-command sandbox for Claude Code and Codex

Pre-installed sandboxed VM with Claude and Codex ready to run, plus git credentials wired up. Removes the 'press enter to accept' loop without exposing the host.

View source ↗

This entry doesn't have a long-form writeup yet. Follow the source link above for the full context.

Recent discussion

From the wider web

Featured in

Related entries

GitHubToolFeatured

wanman - worktree-isolated multi-agent runtime for Claude Code and Codex

Multi-agent runtime that spawns each Claude Code or Codex agent in its own git worktree and home directory. JSON-RPC subprocess control, task pooling, artifact storage. Solves the share-a-directory failure mode that breaks most multi-agent harnesses.

Why I saved this - The 'one-man train' framing is load-bearing: humans observe rather than approve every step. Worktree-per-agent isolation is the upgrade most multi-agent harnesses skip.
GitHubToolFeatured

PostTrainBench - can a CLI agent post-train a base LLM in 10 hours?

Benchmark measuring whether Claude Code, Codex CLI, Gemini CLI, and OpenCode can autonomously improve 4 small base models (Qwen3-1.7B/4B, SmolLM3-3B, Gemma-3-4B) on 7 evals (AIME, BFCL, GPQA, GSM8K, HealthBench, HumanEval, Arena Hard) within a single H100 GPU and 10 hours. Includes agent-as-judge anti-reward-hacking and baseline-replacement penalties for tampering.

Why I saved this - Current leader: Opus 4.6 via Claude Code at 23.2 average. The reward-hacking safeguards (eval tampering and model-substitution detection, baseline-replacement penalty) are the part most agent benchmarks skip.