ESP-Claw - AI agent framework for IoT
Espressif's chat-coding agent framework for ESP32 devices. Brings tool-calling LLM agents to embedded targets with C-level memory budgets.
ESP-Claw is Espressif's chat-coding agent framework for ESP32-class microcontrollers. The premise is bold and the execution is unusual: take the OpenClaw "agents talking to tools, evolving over time" pattern, reimplement it in C with embedded-friendly memory budgets, and run the whole loop - sensing, decision-making, and execution - locally on a $5 chip.
The framing the project leads with is the right one: traditional IoT stops at connectivity. Devices can connect, but they cannot think; they can execute commands, but they cannot decide. ESP-Claw moves the agent runtime down onto the device itself, turning passive executors into active decision-making nodes.
Four design ideas
The README organizes the project around four pillars worth understanding before you flash a board:
- Chat as Creation - device behavior is defined through conversation. IM chat plus dynamic Lua loading lets non-programmers describe what they want and have the device adopt the new behavior live.
- Event Driven - any sensor reading, MQTT message, or external event can trigger the agent loop. Response latency is in the milliseconds, not seconds, because the model call isn't always in the loop.
- Structured Memory - memories are organized in a structured form, not dumped into a prompt. Privacy stays on-device.
- MCP Communication - speaks the standard Model Context Protocol both as server (exposes the device's tools to upstream agents) and as client (consumes tools from other MCP servers).
Quick start: flash from the browser
Several supported boards (M5Stack CoreS3, common breadboards, and others under application/edge_agent/boards/) can be flashed entirely from the browser - configuration and flash, no toolchain install required.
The flow:
- Visit the project's online flashing page.
- Plug your ESP32-S3 board in via USB.
- Pick the right board profile.
- Configure your LLM provider and IM platform.
- Flash. Done.
For unsupported boards or for ESP32-P4 (and friends), build locally from ./application/edge_agent/ using the standard ESP-IDF flow. The README points to the project docs for board adaptation specifics.
Supported LLMs and IMs
LLM providers - OpenAI-style and Anthropic-style APIs both work natively. Tested providers include OpenAI's GPT models, Alibaba Cloud Bailian's Qwen, Anthropic's Claude, and DeepSeek. Custom endpoints are supported.
The README's recommendation for the self-programming loop to actually work: a model with strong tool use and instruction following. They suggest gpt-5.4, qwen3.6-plus, claude4.6-sonnet, deepseek-v4-pro, or anything of comparable capability. Smaller models can drive the simpler event loops but won't reliably do the chat-as-creation Lua-modification path.
IM platforms - Telegram, QQ, Feishu, and WeChat are supported out of the box. The IM layer is extensible, so adding others is a known path.
What "Chat as Creation" actually means
The interesting bit. Most IoT-meets-LLM projects use the LLM to translate intent into pre-baked function calls. ESP-Claw goes further: when the user describes a behavior the device doesn't already implement, the agent generates new Lua to define it, and that Lua is loaded into the device runtime live - no firmware reflash, no compile cycle.
That's the part that's hard to do in C on an ESP32-class device with a few hundred KB of RAM. The Lua VM, the agent loop, the memory store, and the IM transport all have to fit in budget while still leaving headroom for whatever the user invents in chat.
Why this is interesting beyond IoT
It's a real-world demonstration that the "agent as runtime" pattern scales down to embedded targets. If it works on an ESP32-S3, the same architecture is viable on every embedded surface that has enough flash to hold a Lua VM and a network stack - which is most of them, at this point.
The companion projects worth knowing about:
- OpenClaw - the original concept ESP-Claw inherits the vocabulary and design from.
- MimiClaw - separate ESP32 implementation that ESP-Claw drew on for agent loop, IM, and related capabilities on-chip.
When to reach for it
- You're building consumer IoT products where the LLM-as-controller experience is the differentiator.
- You want a research platform for "agents on the edge" without building a firmware framework from scratch.
- You're already in the Espressif ecosystem and want first-party agent tooling.
When not to
- Devices that aren't ESP32 series. The implementation is C plus Espressif-specific drivers; porting is non-trivial.
- Workloads where you need the agent to operate without a network. The LLM call still goes out to a cloud provider unless you front-end it with a local model gateway.
- Anything where guaranteed latency or hard-real-time behavior matters. The agent loop fires fast in normal operation, but a model call is still a model call.
Trade-offs and rough edges
The project is under active development - the README warns that the demo is migrating from basic_demo to edge_agent, and PRs against the new path are preferred. Expect the API surface to keep moving.
There's a privacy story baked in (structured memory stays on-device), but the LLM call itself is whatever the configured provider sees. If you care about not exposing prompts to a cloud provider, point ESP-Claw at a self-hosted endpoint. The custom-endpoint support is exactly that escape hatch.
Espressif maintains it. License is in-repo (Apache-style based on the badge). For a category that's ~98% prototypes, having a chip vendor ship a real, supported framework with browser-flashing and chat-as-creation working out of the box is the part that makes this worth taking seriously.
Related entries
Foundry - foundation layer for agentic intelligence
Python agent runtime and framework aimed at production agentic systems. Early but already has 800+ stars and a clear shape around runtime primitives.
OQP - verification protocol for AI agents
MCP-compatible spec defining four endpoints (capabilities, workflows, execute, assess-risk) so agents can prove a shipped change satisfies business requirements before it goes live.
LABE - legal action boundary eval
Public benchmark that tests an agent at the moment it's about to take a high-impact legal action. Same harness, baseline vs verified, measures unjustified action drops and goal-completion gains.
a2a-java - Java SDK for Agent2Agent protocol
Official Java SDK implementing the Agent2Agent (A2A) protocol for inter-agent messaging and capability discovery. Provides client and server implementations for JVM agent stacks.