DeepZero - automated kernel driver vuln research
Vulnerability research framework that parses, decompiles, and analyzes Windows kernel drivers for exploitable IOCTLs using AI agents. Sleep through fuzzing campaigns.
DeepZero is a vulnerability research pipeline engine - the kind of tool you'd want to build once and reuse forever, but most teams never get around to building. Define your pipeline as YAML, point it at a corpus of files (Windows kernel drivers, in the canonical example), and DeepZero handles orchestration, parallelism, fault tolerance, and resumable state. The actual analysis - decompilation, semgrep rules, LLM assessment - is plugged in as composable stages.
The framing in the discovery entry is the right one: sleep through fuzzing campaigns. The point of the engine is that the human-attention parts (writing the rules, designing the prompts) and the patient-machine parts (running them across thousands of files) get cleanly separated.
What it does
Six properties stack:
- Pipeline-as-YAML - chain ingest, filter, transform, and LLM-assess stages declaratively. Same shape across different research projects.
- Parallel execution - ThreadPoolExecutor with configurable concurrency per stage.
- Resumable runs - atomic per-sample state on disk. Ctrl+C the run, come back later, re-execute the same command to pick up where you left off. The right behaviour for any pipeline that takes hours.
- LLM integration via LiteLLM - any provider works. Jinja2 prompt templates, swappable models.
- REST API for querying run state and sample data over HTTP (the README marks it as work-in-progress).
- Extensible - custom processors are plain Python classes, referenced by path in YAML.
The shape is more "Apache Airflow for vulnerability research" than "yet another fuzzer." That distinction is what makes it interesting - the actual research happens in the processors and prompt templates, not in DeepZero itself.
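To make the "configurable concurrency per stage" idea concrete, here is a minimal sketch using the standard library's ThreadPoolExecutor. It is not DeepZero's code: the names run_stage, run_pipeline, measure, and flag, and the tuple layout of a stage, are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Hypothetical stage description: (name, per-sample function, worker count).
Stage = tuple[str, Callable[[dict], dict], int]


def run_stage(name: str, fn: Callable[[dict], dict],
              samples: list[dict], workers: int) -> list[dict]:
    """Run one stage over every sample in its own thread pool.

    A failing sample is recorded with its error instead of aborting the run.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn, s): s for s in samples}
        for fut in as_completed(futures):
            sample = futures[fut]
            try:
                results.append(fut.result())
            except Exception as exc:
                results.append({**sample, "failed_stage": name, "error": str(exc)})
    return results


def run_pipeline(stages: list[Stage], samples: list[dict]) -> list[dict]:
    """Run stages in order; each stage gets its own concurrency setting."""
    for name, fn, workers in stages:
        # Samples that failed earlier are carried along but not processed again.
        live = [s for s in samples if "error" not in s]
        dead = [s for s in samples if "error" in s]
        samples = run_stage(name, fn, live, workers) + dead
    return samples


# Toy stages standing in for real analysis work.
def measure(sample: dict) -> dict:
    return {**sample, "size": len(sample["name"])}


def flag(sample: dict) -> dict:
    if sample["name"] == "bad.sys":
        raise ValueError("unparseable")
    return {**sample, "flagged": sample["size"] > 5}


if __name__ == "__main__":
    corpus = [{"name": "a.sys"}, {"name": "bad.sys"}, {"name": "longname.sys"}]
    for row in run_pipeline([("measure", measure, 4), ("flag", flag, 2)], corpus):
        print(row)
```

Because results arrive in completion order, the output order is not guaranteed. Key results by sample name if order matters.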
Quick start
git clone https://github.com/416rehman/DeepZero.git
cd DeepZero
pip install -e .
cp .env.example .env
deepzero run C:\drivers -p .\pipelines\loldrivers\pipeline.yaml
Python 3.11+ required. The shipped example pipeline (pipelines/loldrivers/) is a real BYOVD - "bring your own vulnerable driver" - kernel-driver research workflow. It's the canonical case the project is built around.
What ships in the box
The shipped processors are themselves a useful map of how the engine is meant to be used:
- pe_ingest/ - PE header parser and driver metadata extractor. The IngestProcessor shape - reads input files, emits structured samples downstream.
- loldrivers_filter/ - hash exclusion filter against the loldrivers.io database. The MapProcessor shape - drop samples that match known-vulnerable signatures because someone else already found them.
- ghidra_decompile/ - Ghidra headless decompiler. MapProcessor again. Decompiled code becomes the input to the LLM stage.
- semgrep_scanner/ - batch Semgrep with bundled rules. BulkMapProcessor shape - operates on batches rather than one sample at a time.
- pipelines/loldrivers/ - the assembled pipeline. pipeline.yaml, assessment.j2 (the Jinja2 LLM prompt), and a rules/ directory of Semgrep rules.
That set is enough to do real research on Windows kernel drivers without writing a line of glue. It's also a reasonable template for anything else that fits the "ingest, filter, transform, assess" shape - smart contract audits, firmware analysis, large-scale codebase reviews.
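The hash-exclusion step is the easiest of these to picture in code. The sketch below is a guess at the general technique, not the shipped loldrivers_filter: the function names, the sample dictionary layout, and the choice of SHA-256 are all assumptions, and the known-hash set is supplied by the caller rather than fetched from loldrivers.io.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large drivers are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exclude_known(sample: dict, known_hashes: set[str]) -> Optional[dict]:
    """Return None for an already-known file, else the sample annotated with its hash.

    `sample` is a hypothetical dict with a "path" key; `known_hashes` holds
    lowercase hex SHA-256 digests.
    """
    digest = sha256_of(Path(sample["path"]))
    if digest in known_hashes:
        return None  # someone else already found this one
    return {**sample, "sha256": digest}
```

Returning None to mean "drop this sample" is a convention chosen for the sketch. A real engine might mark the sample as filtered instead, so the decision stays visible in the run state.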
The processor protocol
Custom processors are referenced by path in YAML and instantiated as Python classes. The README's repo structure shows three flavours:
- IngestProcessor - turns raw inputs into samples.
- MapProcessor - one sample in, one sample out.
- BulkMapProcessor - batch in, batch out.
Plus a reduce shape for aggregation. Each is a Python class you write once and reference from any pipeline that needs it.
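The three shapes can be sketched as plain Python protocols. Everything here is hypothetical: the protocol names, the method names (ingest, process, process_batch), and the dict-based sample are invented for illustration and are not DeepZero's base classes or signatures. The project docs describe the real interface.

```python
from typing import Iterator, Protocol

Sample = dict  # hypothetical: a sample is a plain dict in this sketch


class IngestShape(Protocol):
    def ingest(self) -> Iterator[Sample]: ...          # raw inputs -> samples


class MapShape(Protocol):
    def process(self, sample: Sample) -> Sample: ...   # one in, one out


class BulkMapShape(Protocol):
    def process_batch(self, samples: list[Sample]) -> list[Sample]: ...  # batch in, batch out


class ListIngest:
    """Toy ingest: turns a list of file names into samples."""
    def __init__(self, names: list[str]) -> None:
        self.names = names

    def ingest(self) -> Iterator[Sample]:
        for name in self.names:
            yield {"name": name}


class AddLength:
    """Toy map: annotates each sample independently."""
    def process(self, sample: Sample) -> Sample:
        return {**sample, "length": len(sample["name"])}


class RankByLength:
    """Toy bulk map: needs the whole batch to assign ranks."""
    def process_batch(self, samples: list[Sample]) -> list[Sample]:
        ordered = sorted(samples, key=lambda s: s["length"], reverse=True)
        return [{**s, "rank": i} for i, s in enumerate(ordered, start=1)]


def run(ingest: IngestShape, mapper: MapShape, bulk: BulkMapShape) -> list[Sample]:
    mapped = [mapper.process(s) for s in ingest.ingest()]
    return bulk.process_batch(mapped)


if __name__ == "__main__":
    print(run(ListIngest(["a.sys", "longer.sys"]), AddLength(), RankByLength()))
```

The distinction worth noticing is that the bulk shape exists for work that cannot be done one sample at a time, such as ranking here or a batched scanner invocation.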
Why the resumability matters
Vulnerability research pipelines are almost never short. A typical BYOVD run touches thousands of drivers, decompiles each, runs Semgrep rules across the decompiled output, then prompts an LLM for an assessment of every survivor. Costs and runtime grow with corpus size. The right answer for that shape is atomic per-sample state and idempotent re-runs - which is what DeepZero gives you.
Hit Ctrl+C in the middle and re-run the same command: the engine skips everything already complete. Add a new stage to the pipeline and only the new stage runs against existing samples. Replace a prompt and the LLM stage reruns, but the decompilation stage doesn't.
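One way to get that behaviour is to keep a small state file per sample, write it atomically, and record a fingerprint of each stage's configuration. The sketch below shows that general technique under stated assumptions. The on-disk layout, the function names, and the fingerprinting scheme are all hypothetical and may differ from what DeepZero does.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stage: (name, config, fn(previous_outputs, config) -> JSON-serializable value)
Stage = tuple[str, dict, Callable[[dict, dict], object]]


def fingerprint(payload: dict) -> str:
    """Stable short hash of a JSON-serializable dict."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()[:16]


def load_state(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {"data": {}, "done": {}}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial write


def run_sample(state_dir: Path, sample_id: str, stages: list[Stage]) -> list[str]:
    """Run only the stages whose fingerprint is not already recorded; return their names."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{sample_id}.json"
    state = load_state(path)
    executed, upstream = [], ""
    for name, config, fn in stages:
        # Chain in the upstream fingerprint so changing an early stage invalidates later ones.
        fp = fingerprint({"config": config, "upstream": upstream})
        upstream = fp
        if state["done"].get(name) == fp:
            continue  # finished in an earlier run with the same config
        state["data"][name] = fn(state["data"], config)
        state["done"][name] = fp
        save_state(path, state)  # persist after every stage so Ctrl+C loses at most one
        executed.append(name)
    return executed
```

With this scheme, re-running an unchanged pipeline executes nothing, appending a stage runs only that stage, and editing the prompt in the last stage's config reruns only that stage. Chaining fingerprints is a design choice of the sketch: it also means a stage inserted in the middle forces everything after it to rerun.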
When to reach for it
- You have a corpus of files and a research question that needs LLM assessment at the end of a longer pipeline.
- You've been running ad-hoc Bash + Python scripts and they keep failing partway through and losing state.
- You want to share a pipeline with a teammate as a YAML + processor-class bundle, not a README.md of "run these commands in order."
When not to
- One-off analysis. Writing the YAML and the processor classes is overhead that doesn't pay off for a single run.
- Workflows that don't have a structured corpus to iterate over. DeepZero's value is the per-sample resumability; if you're not iterating over samples, you're paying complexity for nothing.
- Production exploitation tooling. This is research scaffolding - it ends with "here's a candidate finding to investigate by hand," not "here's an exploit."
Trade-offs
Windows + Linux are the documented platforms; macOS isn't called out specifically. The kernel-driver focus of the shipped example pipeline is real, but the engine itself is platform-agnostic - if your corpus is Linux ELF binaries or Solidity contracts, the same shape applies, you just write different processors.
The REST API is marked work-in-progress. Don't depend on it for anything that needs to be reliable today.
CI runs on Python 3.11 and 3.12. MIT licensed. Documentation lives at blog.ahmadz.ai/DeepZero/ and goes deeper on the architecture, schemas, CLI, and processor authoring than the README does. If you're evaluating whether DeepZero fits a research project, the docs are the right next step.