
llm-openai-via-codex - reuse a Codex subscription as an LLM backend

Simon Willison's plugin for the `llm` CLI that routes calls through an existing OpenAI Codex subscription. Lets you use Codex-tier models from any `llm`-aware tool.


A small, very Simon-Willison plugin that does exactly one thing: lets the llm CLI route calls through your existing OpenAI Codex subscription. If you already pay for Codex, this turns that subscription into a backend for any tool that speaks llm.

Worth saying up front: per Simon's blog post and the linked tweet from Romain Huet, this is explicitly OK with OpenAI - it's not a workaround or a grey-zone hack. You're still using your own subscription against your own account, just from a different client.

What it actually does

It registers a model namespace (openai-codex/...) in the llm CLI. Calls to that namespace go through the local Codex CLI's authentication, so there's no separate API key to manage. Whatever models your Codex subscription gives you become callable from any llm-aware tool - one-shot prompts, pipelines, scripts, llm chat, the works.
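
Because the models come in through llm's normal plugin machinery, they're reachable from llm's Python API as well as the CLI. A minimal sketch - the model id is the same one used in the quick start below; nothing else here comes from the plugin's docs:

import llm

# Look up the Codex-backed model by its registered id and run a one-shot prompt.
model = llm.get_model("openai-codex/gpt-5.5")
response = model.prompt("Summarise this repo's README in two sentences")
print(response.text())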

Quick start

The plugin lives on PyPI and installs into the same environment as llm:

llm install llm-openai-via-codex

You also need the OpenAI Codex CLI installed and authenticated separately - that's where the actual auth lives. The plugin doesn't ship its own credentials.
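
If the Codex CLI isn't on your machine yet, setup looks roughly like this - these are the Codex CLI's own commands, not the plugin's, so check OpenAI's docs for the current flow:

npm install -g @openai/codex
codex login   # sign in with the account that holds the subscription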

List the models your Codex subscription gives access to:

llm models -q openai-codex

Then prompt:

llm -m openai-codex/gpt-5.5 'Generate an SVG of a pelican riding a bicycle'

That's the whole interface.

Why it's interesting

Two reasons.

First, the llm CLI is the cleanest way to compose model calls into shell pipelines. Once Codex is reachable from llm, it's reachable from everything that already uses llm - templated prompts, background jobs, the llm chat REPL, the SQLite log database, the plugin ecosystem. You don't have to pick which interface gets your Codex subscription; they all do.
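
As a concrete example of that composability, llm reads its input from stdin, so a Codex-backed model drops straight into a pipe. The file names here are made up; the model id is the one from the quick start:

cat build.log | llm -m openai-codex/gpt-5.5 'Summarise the failures and suggest a likely root cause'
llm -m openai-codex/gpt-5.5 -s 'You are a terse code reviewer' < changes.diff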

Second, it's a useful pattern for any subscription-backed model. The plugin source is short enough to read in a sitting and adapt - if you want to do the same trick for some other CLI-authenticated model provider, this is the template.
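
For a sense of the shape involved, this is the general skeleton of an llm model plugin. It's a sketch of the pattern, not this plugin's actual source - the real thing delegates prompting and auth to the local Codex CLI instead of returning canned text, and the class and model names below are made up:

import llm

@llm.hookimpl
def register_models(register):
    # Register one id per model the backing subscription exposes.
    register(CodexBackedModel("openai-codex/example-model"))

class CodexBackedModel(llm.Model):
    def __init__(self, model_id):
        self.model_id = model_id

    def execute(self, prompt, stream, response, conversation):
        # A real plugin would call the subscription-authenticated backend here
        # and yield the text it returns.
        yield f"echo: {prompt.prompt}"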

Development

The repo uses uv:

cd llm-openai-via-codex
uv run pytest
uv run llm models
uv run llm -m openai-codex/gpt-5.5 'Talk to me in Swedish'

When to reach for it

  • You already pay for Codex and want to use it from anything that isn't the Codex CLI.
  • You're building shell pipelines or batch jobs that call models, and llm is your interface of choice.
  • You want a stable, auditable way to prompt Codex models from a script without juggling API keys.

When not to

  • You don't have a Codex subscription. The plugin doesn't issue one - it forwards yours.
  • You need streaming UIs, tool-use, or long-running agent loops. llm is a CLI; this isn't the right shape for an interactive coding agent.

Limits

The plugin is small and the README is short for a reason - it does one thing well and doesn't try to be a Codex client. Anything fancier (multi-turn tool use, file editing, MCP servers) belongs in the Codex CLI itself, not here. Apache 2.0 licensed.


Related entries


wanman - worktree-isolated multi-agent runtime for Claude Code and Codex

Multi-agent runtime that spawns each Claude Code or Codex agent in its own git worktree and home directory. JSON-RPC subprocess control, task pooling, artifact storage. Solves the share-a-directory failure mode that breaks most multi-agent harnesses.

Why I saved this - The 'one-man train' framing is load-bearing: humans observe rather than approve every step. Worktree-per-agent isolation is the upgrade most multi-agent harnesses skip.

PostTrainBench - can a CLI agent post-train a base LLM in 10 hours?

Benchmark measuring whether Claude Code, Codex CLI, Gemini CLI, and OpenCode can autonomously improve 4 small base models (Qwen3-1.7B/4B, SmolLM3-3B, Gemma-3-4B) on 7 evals (AIME, BFCL, GPQA, GSM8K, HealthBench, HumanEval, Arena Hard) on a single H100 GPU within 10 hours. Includes agent-as-judge anti-reward-hacking checks and baseline-replacement penalties for tampering.

Why I saved this - Current leader: Opus 4.6 via Claude Code at 23.2 average. The reward-hacking safeguards (eval tampering and model-substitution detection, baseline-replacement penalty) are the part most agent benchmarks skip.