pentest-ai-agents - Claude Code subagents for offensive security
Specialized Claude Code subagents that turn the CLI into a pentest assistant: plan engagements, analyze recon, research exploits, build detections, audit STIGs, and write reports.
Tag
68 entries tagged with #ai-agent.
Tools and libraries that build, run, or interface with autonomous AI agents.
Self-hostable platform for building, running, and sharing AI workspace agents and apps with any model. No vendor lock-in - bring your own LLM provider or run local.
Python agent runtime and framework aimed at production agentic systems. Early but already has 800+ stars and a clear shape around runtime primitives.
Open-source hub that connects to Claude Code, Codex, Hermes, OpenClaw, and other agent runtimes - local or on remote machines - through a single chat UI. Less workflow-tied than Conductor.
Cross-tool plugin for Claude Code, Codex CLI, Cursor, and OpenCode CLI that injects an optional 'inner monologue' track alongside normal output. The model decides whether and how to use it.
Always-on personal agent harness powered by Claude Code with Discord, Telegram, and built-in web UI front-ends. The 'phone in your pocket runs an agent' setup.
Chrome extension and MCP server that lets coding agents drive web tasks by calling site APIs instead of clicking through the DOM. Targets the brittleness of Playwright-style browser automation.
MCP server that drives an actual Chrome instance via the Chrome DevTools Protocol with page scanning, screenshots, and physical input simulation for agents.
All-in-one network protocol toolkit with browser capture, MITM proxy, JS hooks, fingerprint spoofing, and an MCP server so agents can drive the analysis directly.
MCP-compatible spec defining four endpoints (capabilities, workflows, execute, assess-risk) so agents can prove a shipped change satisfies business requirements before it goes live.
Public benchmark that tests an agent at the moment it's about to take a high-impact legal action. Runs the same harness in baseline and verified modes, measuring the drop in unjustified actions and the gain in goal completion.
Go-based agent firewall that controls egress from MCP servers, blocking SSRF, DLP leaks, and prompt-injection vectors at the network layer. Acts as a fetch proxy for tool calls.
TypeScript CLI exposing 100+ Figma read/write commands, giving AI agents full control to create shapes, components, styles, and exports without the Figma plugin sandbox.
Experiment from the Browser Use team that replaces Playwright with raw Chrome DevTools Protocol and lets the agent write its own tools. ~600 lines, no framework lock-in.
Lets you control AI agents running on your computer from your phone, and gives those agents access to phone-side capabilities (push, SMS, calendar, contacts, location). Supports 15+ agent CLIs across Linux, Windows, and macOS.
Chrome extension paired with a CLI that gives AI agents full browser control: tabs, DOM, navigation, and automation. Aimed at agent-driven web tasks rather than human-recorded scripts.
Zero-config CLI that exposes Chrome to AI agents over a uniform interface. Designed to plug into any agent runtime without per-agent configuration.
Official Java SDK implementing the Agent2Agent (A2A) protocol for inter-agent messaging and capability discovery. Provides client and server implementations for JVM agent stacks.
Curated list of agent harnesses, orchestrators, and coding-agent runtimes. Useful index for evaluating multi-agent infrastructure projects.
Self-hostable gateway that routes requests across Anthropic, OpenAI, and other LLM providers with API-key management, analytics, and per-team policies. Designed for multi-provider agent deployments.
Aggregated repo of 200+ production open-source Rails apps and engines, intended as a corpus for AI agents to search for real-world architectural patterns. Acts as a grounding dataset rather than a tutorial.
30+ tools that extend Xcode's iOS Simulator: testing, debugging, network monitoring, captures, accessibility, and a CLI that lets AI agents drive simulator actions. Used by 80k+ iOS developers.
Open-source coding agent that scored 65.2% on TerminalBench with Gemini 3 Flash, beating Junie CLI and Google's official harness. The run was leaderboard-compliant, with full transcripts and no AGENTS.md tricks.
CLI that gives AI coding agents persistent recall across sessions through progressive memory snapshots. Aimed at workflows where context is lost between agent runs.
Security agent that runs scanner agents to surface candidate vulnerabilities, then has validator agents reproduce each one against a running instance. Outputs only confirmed exploitable findings.
Open-source Playwright library for AI-driven browser regression testing with intelligent caching, auto-healing locators, and multi-model verification. Designed to keep flaky AI tests stable across model versions.
High-performance Bun job queue with SQLite persistence, dead-letter queue, cron scheduling, and S3 backups. Marketed as a BullMQ alternative for AI agent workloads.
Open-source harness that pulls coding tasks from Linear, runs them in isolated cloud sandboxes, and opens PRs for human review. Built to manage many concurrent agent jobs without local worktree juggling.
A deliberately wasteful Claude Code skill for stress testing, inflating metrics, or just burning budget. Useful for testing observability dashboards and rate-limit handling.
Nix-based Linux distribution purpose-built for running AI agents. Hardened defaults and an immutable base aimed at sandboxing autonomous coding agents.
React component library for visualizing distributed traces from AI agents. Drop-in widgets for timelines, span trees, and tool-call breakdowns from LangChain or custom runtimes.
Rust SDK for driving Windows applications with native UI Automation, designed as a Playwright-style API for AI agents. Lets LLMs click, type, and read state across desktop apps.
Zero-dependency Go dashboard for OpenClaw AI agents covering cost tracking, token usage, and per-agent monitoring. Single-binary deploy.
CLI that exports SF Symbols as true vector SVG, PDF, or PNG by walking macOS's private symbol renderer. Designed so an agent can fetch icon assets autonomously during design sessions.
Swift CLI that lets your agent read and send iMessages and SMS through Apple's Messages.app. Useful for routing notifications or two-factor codes back into a coding session.
TypeScript browser harness that lets an LLM complete arbitrary tasks with automatic recovery when selectors break or pages restructure. From the browser-use team.
TypeScript agent framework built around DeepSeek with cache-first loops, R1 thought harvesting, and tool-call repair. Ships with an Ink-based TUI for runtime inspection.
Terminal-native coding agent powered by Moonshot's Kimi K2.6 model. TypeScript-based alternative to Claude Code or Codex CLI for users who want to drive Kimi from the shell.
Chrome extension for clicking elements in a running app, leaving comments, and shipping the annotated context back to an AI agent for fixes. Closes the loop between UI bug reports and code edits.
Push-to-talk dictation app for macOS that runs entirely on local models, so no data leaves the machine. Designed to drive coding agents and email by voice.
Graph-native infrastructure for storing, enriching, and retrieving structured agent context. Provides semantic retrieval and portable context cores you can move between agent runtimes.
TypeScript harness that wraps Claude Code with spec-driven plans, enforced quality gates, and persistent project knowledge. Targets teams shipping production code with the agent rather than prototyping.
Documentation system designed to be read and rewritten by coding agents instead of humans. Stores knowledge in a format that survives long agent sessions.
Curated list of self-improvement loops, research agents, and autoresearch systems following Karpathy's framing. Useful index when designing multi-step agent harnesses.
MCP server that records user browser interactions and lets agents replay them as automation scripts. Bridge between human demonstrations and agent execution.
Python SDK for authoring MCP servers with batteries for tool registration, auth, and apps-sdk style flows. Aims to be the universal scaffold for new MCPs.
Rust-based desktop app that lets Claude drive your terminal, browser, mouse, and keyboard via the Anthropic computer-use API. Single binary, multi-model.
Docker's collection of ready-to-use Compose stacks for orchestrating open-source LLMs, tools, and agent runtimes. Useful starting points for self-hosted setups.
Shell installer that deploys n8n, Ollama, Flowise, Supabase, a RAG stack, and 30+ tools behind auto-HTTPS. Self-hosted Zapier or Make alternative.
Lightweight library for sandboxing Node.js code execution from agents without containers or VMs, using runtime isolation. Built for code interpreter use cases.
Lightweight Go CLI for defining focused AI agents in TOML and triggering them from pipes, git hooks, cron, or the terminal. No framework, just Unix.
Rust gateway and debugger for AI agent traffic across Anthropic, OpenAI, Azure, Gemini, DeepSeek, and others. Trace and inspect tool calls in flight.
NASA JPL agent that lets developers inspect, diagnose, and operate ROS1/ROS2 robotics systems through natural language. Bridges LLMs with the ROS toolchain.
TypeScript SDK for building agents with a code-first composition model: tools, skills, and MCP servers wire together as plain modules. Ships an agent loop you control.
Vulnerability research framework that parses, decompiles, and analyzes Windows kernel drivers for exploitable IOCTLs using AI agents. Sleep through fuzzing campaigns.
Lets agents interact with programs that expect a human at the keyboard - REPLs, debuggers, TUI apps - things bash pipes cannot reach. Fills the gap between shell and full computer-use.
Small TypeScript library for ReAct-style agent loops on the Bun stack. Tools, skills, and a coding-focused harness in a minimal package.
Wraps any command-line tool as a typed JavaScript API agents can call directly. Saves writing a custom MCP for every CLI you want to expose.
Self-hosted email client with an embedded AI agent, running entirely on Cloudflare Workers. No backend to manage, edge-distributed by default.
Open-source Rust headless browser built for AI agents and scraping. Lower memory and faster cold starts than Chromium-based stacks like Puppeteer and Playwright.
Aggregates many MCP servers behind one endpoint. Acts as an MCP gateway/proxy so clients only configure a single server.
MCP client with persistent sessions, stdio + HTTP transports, OAuth 2.1, JSON output for code mode, and a sandbox proxy. Calls any MCP server from a shell.
Runtime adapter that exposes any MCP, OpenAPI, or GraphQL server as a flat CLI. Zero codegen, zero rebuild - handy for shell scripts and agent toolchains.
AI-agent engine for Java apps. CLI plus REST API that wraps the Claude Code execution model so you can drop it into any JVM service.
Multi-bot, multi-engine Telegram bridge with per-bot personality, budget caps, streaming, session resume, and an Agent Bus for parallel pipelines.
Espressif's chat-coding agent framework for ESP32 devices. Brings tool-calling LLM agents to embedded targets with C-level memory budgets.
Go-based MCP server that gives agents read/write Figma access without rate limits. Text-to-design and design-to-code in one binary.
C# CLI built specifically for agents to read, edit, and automate Office files. Single binary, no Office install required.