DeepZero - automated kernel driver vuln research

Vulnerability research framework that parses, decompiles, and analyzes Windows kernel drivers for exploitable IOCTLs using AI agents. Sleep through fuzzing campaigns.

Saved Apr 28, 2026 · 4 min read

#python #agent-security #ai-agent #reverse-engineering #automation

DeepZero is a vulnerability research pipeline engine - the kind of tool you'd want to build once and reuse forever, but most teams never get around to building. Define your pipeline as YAML, point it at a corpus of files (Windows kernel drivers, in the canonical example), and DeepZero handles orchestration, parallelism, fault tolerance, and resumable state. The actual analysis - decompilation, Semgrep rules, LLM assessment - is plugged in as composable stages.

The framing in the discovery entry is the right one: sleep through fuzzing campaigns. The point of the engine is that the human-attention parts (writing the rules, designing the prompts) and the patient-machine parts (running them across thousands of files) get cleanly separated.

What it does

Six properties stack together:

Pipeline-as-YAML - chain ingest, filter, transform, and LLM-assess stages declaratively. Same shape across different research projects.
Parallel execution - ThreadPoolExecutor with configurable concurrency per stage.
Resumable runs - atomic per-sample state on disk. Ctrl+C the run, come back later, re-execute the same command to pick up where you left off. The right behaviour for any pipeline that takes hours.
LLM integration via LiteLLM - any provider works. Jinja2 prompt templates, swappable models.
REST API for querying run state and sample data over HTTP (the README marks it as work-in-progress).
Extensible - custom processors are plain Python classes, referenced by path in YAML.
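The parallel-execution property can be sketched with nothing but the standard library. This is a minimal illustration of running one stage over a batch of samples with a configurable worker count, not DeepZero's own scheduler: the function name run_stage, the dict-shaped samples, and the tag_size example stage are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def run_stage(
    process: Callable[[dict], dict],
    samples: Iterable[dict],
    max_workers: int = 4,
) -> tuple[list[dict], list[tuple[dict, Exception]]]:
    """Apply `process` to every sample in parallel (hypothetical helper).

    Returns (succeeded, failed) so that one bad sample does not abort
    the whole stage.
    """
    succeeded: list[dict] = []
    failed: list[tuple[dict, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so failures can be reported.
        futures = {pool.submit(process, sample): sample for sample in samples}
        for future in as_completed(futures):
            sample = futures[future]
            try:
                succeeded.append(future.result())
            except Exception as exc:  # isolate per-sample failures
                failed.append((sample, exc))
    return succeeded, failed


def tag_size(sample: dict) -> dict:
    """Toy stage: flag large files, reject nonsensical sizes."""
    if sample["size"] < 0:
        raise ValueError(f"negative size for {sample['name']}")
    return {**sample, "large": sample["size"] > 1000}


if __name__ == "__main__":
    corpus = [
        {"name": "a.sys", "size": 10},
        {"name": "b.sys", "size": 5000},
        {"name": "c.sys", "size": -1},
    ]
    ok, bad = run_stage(tag_size, corpus, max_workers=2)
    print(len(ok), "succeeded;", len(bad), "failed")
```

Threads suit this kind of stage because most of the work is waiting on subprocesses (a decompiler) or network calls (an LLM provider), so the worker count is mainly a throttle on how many of those run at once.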

The shape is more "Apache Airflow for vulnerability research" than "yet another fuzzer." That distinction is what makes it interesting - the actual research happens in the processors and prompt templates, not in DeepZero itself.

Quick start

git clone https://github.com/416rehman/DeepZero.git
cd DeepZero
pip install -e .
cp .env.example .env

deepzero run C:\drivers -p .\pipelines\loldrivers\pipeline.yaml

Python 3.11+ required. The shipped example pipeline (pipelines/loldrivers/) is a real BYOVD - "bring your own vulnerable driver" - kernel-driver research workflow. It's the canonical case the project is built around.

What ships in the box

The shipped processors are themselves a useful map of how the engine is meant to be used:

pe_ingest/ - PE header parser and driver metadata extractor. The IngestProcessor shape - reads input files, emits structured samples downstream.
loldrivers_filter/ - hash exclusion filter against the loldrivers.io database. The MapProcessor shape - drops samples whose hashes match known-vulnerable drivers, because someone else already found them.
ghidra_decompile/ - Ghidra headless decompiler. MapProcessor again. Decompiled code becomes the input to the LLM stage.
semgrep_scanner/ - batch Semgrep with bundled rules. The BulkMapProcessor shape - operates on batches rather than one sample at a time.
pipelines/loldrivers/ - the assembled pipeline. pipeline.yaml, assessment.j2 (the Jinja2 LLM prompt), and a rules/ directory of Semgrep rules.
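To make the hash-exclusion idea concrete, here is a rough standalone sketch of that kind of filter. It is not the code in loldrivers_filter/: the function names are hypothetical, SHA-256 is an assumed choice of digest, and the set of known hashes is passed in by the caller rather than fetched from anywhere.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_known(paths: Iterable[Path], known_hashes: set[str]) -> list[Path]:
    """Keep only files whose SHA-256 is NOT in the known set (hypothetical helper)."""
    # Normalise case: hex digests from external lists are sometimes upper-case.
    known = {h.lower() for h in known_hashes}
    return [p for p in paths if sha256_of(p) not in known]
```

Running the cheap filter before decompilation is the point of the stage ordering: a hash lookup costs milliseconds, while every sample that survives it pays for a decompiler run and possibly an LLM call.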

That set is enough to do real research on Windows kernel drivers without writing a line of glue. It's also a reasonable template for anything else that fits the "ingest, filter, transform, assess" shape - smart contract audits, firmware analysis, large-scale codebase reviews.

The processor protocol

Custom processors are referenced by path in YAML and instantiated as Python classes. The README's repo structure shows three flavours:

IngestProcessor - turns raw inputs into samples.
MapProcessor - one sample in, one sample out.
BulkMapProcessor - batch in, batch out.

Plus a reduce shape for aggregation. Each is a Python class you write once and reference from any pipeline that needs it.
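As a rough picture of what those three flavours might look like in code, here is a sketch under heavy assumptions. The class names come from the README, but the method names (ingest, process, process_batch), the dict-based sample type, and the "module:ClassName" path format used by load_processor are all hypothetical; the project docs are the authority on the real interfaces.

```python
from __future__ import annotations

import importlib
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator

Sample = dict  # hypothetical: a sample is modelled as a plain dict here


class IngestProcessor(ABC):
    """Turns raw inputs into samples (method name is an assumption)."""

    @abstractmethod
    def ingest(self, root: Path) -> Iterator[Sample]: ...


class MapProcessor(ABC):
    """One sample in, one sample out (method name is an assumption)."""

    @abstractmethod
    def process(self, sample: Sample) -> Sample: ...


class BulkMapProcessor(ABC):
    """Batch in, batch out (method name is an assumption)."""

    @abstractmethod
    def process_batch(self, samples: list[Sample]) -> list[Sample]: ...


class ExtensionTagger(MapProcessor):
    """Toy processor: marks samples whose path ends in .sys."""

    def process(self, sample: Sample) -> Sample:
        is_driver = str(sample["path"]).lower().endswith(".sys")
        return {**sample, "is_driver": is_driver}


def load_processor(spec: str) -> type:
    """Resolve a 'package.module:ClassName' string to a class object.

    The spec format is illustrative, not DeepZero's documented syntax.
    """
    module_name, _, class_name = spec.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Resolving classes from strings is what lets a pipeline file name its stages without the engine importing them up front; a teammate only needs the YAML and the module on their import path.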

Why the resumability matters

Vulnerability research pipelines are almost never short. A typical BYOVD run touches thousands of drivers, decompiles each, runs Semgrep rules across the decompiled output, then prompts an LLM for an assessment of every survivor. Costs and runtime grow with corpus size. The right answer for that shape is atomic per-sample state and idempotent re-runs - which is what DeepZero gives you.

Press Ctrl+C mid-run, re-run the same command, and the engine skips everything already complete. Add a new stage to the pipeline and only the new stage runs against existing samples. Replace a prompt and the LLM stage reruns, but the decompilation stage doesn't.
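One way to get that behaviour, sketched here with the standard library only, is to keep a small state file per sample, write it atomically, and record a fingerprint of each stage's configuration next to its result. This is an assumption about how such a scheme could work, not a description of DeepZero's on-disk format; every name below is hypothetical.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def fingerprint(stage_config: dict) -> str:
    """Stable hash of a stage's config, so a changed prompt invalidates the stage."""
    blob = json.dumps(stage_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def load_state(path: Path) -> dict:
    """Read the per-sample state file, or start empty if none exists yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state_atomic(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    os.replace swaps the file in one step, so an interrupted run leaves
    either the old state or the new one, never a half-written file.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def run_stage_once(
    state_path: Path,
    stage_name: str,
    stage_config: dict,
    work: Callable[[], object],
) -> bool:
    """Run `work` unless this stage already completed with the same config.

    Returns True if the stage ran, False if it was skipped.
    `work` must return something JSON-serialisable.
    """
    state = load_state(state_path)
    fp = fingerprint(stage_config)
    if state.get(stage_name, {}).get("fingerprint") == fp:
        return False  # already done with identical config: skip
    state[stage_name] = {"fingerprint": fp, "result": work()}
    save_state_atomic(state_path, state)
    return True
```

The sketch deliberately leaves out downstream invalidation: if an early stage reruns, a real engine also has to decide whether the stages that consumed its output are stale.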

When to reach for it

You have a corpus of files and a research question that needs LLM assessment at the end of a longer pipeline.
You've been running ad-hoc Bash + Python scripts and they keep failing partway through and losing state.
You want to share a pipeline with a teammate as a YAML + processor-class bundle, not a README.md of "run these commands in order."

When not to

One-off analysis. Writing the YAML and the processor classes is overhead that doesn't pay off for a single run.
Workflows that don't have a structured corpus to iterate over. DeepZero's value is the per-sample resumability; if you're not iterating over samples, you're paying complexity for nothing.
Production exploitation tooling. This is research scaffolding - it ends with "here's a candidate finding to investigate by hand," not "here's an exploit."

Trade-offs

Windows and Linux are the documented platforms; macOS isn't called out specifically. The kernel-driver focus of the shipped example pipeline is real, but the engine itself is platform-agnostic - if your corpus is Linux ELF binaries or Solidity contracts, the same shape applies and you just write different processors.

The REST API is marked work-in-progress. Don't depend on it for anything that needs to be reliable today.

CI runs on Python 3.11 and 3.12. MIT licensed. Documentation lives at blog.ahmadz.ai/DeepZero/ and goes deeper on the architecture, schemas, CLI, and processor authoring than the README does. If you're evaluating whether DeepZero fits a research project, the docs are the right next step.
