
Obscura - Rust headless browser for AI agents

Open-source Rust headless browser built for AI agents and scraping. Lower memory and faster cold starts than Chromium-based stacks like Puppeteer and Playwright.


Obscura is a headless browser engine written in Rust, built specifically for the workloads that wreck Chrome - large-scale scraping, AI agent automation, and stealthy reconnaissance. It runs real JavaScript via V8, speaks the Chrome DevTools Protocol, and works as a drop-in replacement for headless Chrome behind both Puppeteer and Playwright.

The pitch in numbers (from the project's own benchmarks):

Metric        Obscura    Headless Chrome
Memory        30 MB      200+ MB
Binary size   70 MB      300+ MB
Anti-detect   built-in   none
Page load     85 ms      ~500 ms
Startup       instant    ~2 s
Puppeteer     yes        yes
Playwright    yes        yes

The 6-7x memory reduction is what makes it practical to run dozens of concurrent browser sessions on a single VM - which is exactly the pattern you hit the moment you start scraping at any meaningful scale or running browsing agents in parallel.
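
As a rough illustration of that claim, the sketch below divides a VM's usable memory by the per-session footprint. The 30 MB and 200 MB figures are the project's own benchmark numbers quoted above; the 4 GB VM and the 512 MB reserve are assumptions picked for the example, and real sessions will use more memory on heavy pages.

```javascript
// How many browser sessions fit in a VM, given a per-session memory cost.
function sessionsPerVm(vmMemoryMb, perSessionMb, reservedMb = 0) {
  const usableMb = vmMemoryMb - reservedMb;
  if (usableMb <= 0 || perSessionMb <= 0) return 0;
  return Math.floor(usableMb / perSessionMb);
}

const vmMb = 4096;      // hypothetical 4 GB VM
const reservedMb = 512; // hypothetical headroom for the OS and the agent

const obscuraSessions = sessionsPerVm(vmMb, 30, reservedMb);  // 30 MB per session
const chromeSessions = sessionsPerVm(vmMb, 200, reservedMb);  // 200 MB lower bound
console.log({ obscuraSessions, chromeSessions });
```

With these assumed numbers the script prints { obscuraSessions: 119, chromeSessions: 17 }, which is the same 6-7x gap expressed as session counts.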

Install

Single binary. No Chrome, no Node.js, no dependencies.

# Linux x86_64
curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-x86_64-linux.tar.gz
tar xzf obscura-x86_64-linux.tar.gz
./obscura fetch https://example.com --eval "document.title"

# macOS Apple Silicon
curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-aarch64-macos.tar.gz

# macOS Intel
curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-x86_64-macos.tar.gz

Windows builds are available as a .zip from the releases page.
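
If you script the install, picking the right archive can be done with a small lookup. This is a minimal sketch that uses only the three asset names listed above; the function name releaseAssetUrl is made up for the example, and Windows returns null because the .zip file name is not given here.

```javascript
const RELEASE_BASE =
  'https://github.com/h4ckf0r0day/obscura/releases/latest/download';

// Asset names exactly as listed in this article, keyed by
// Node's process.platform and process.arch values.
const ASSETS = {
  'linux-x64': 'obscura-x86_64-linux.tar.gz',
  'darwin-arm64': 'obscura-aarch64-macos.tar.gz',
  'darwin-x64': 'obscura-x86_64-macos.tar.gz',
};

// Return the download URL for a platform/arch pair, or null when the
// article does not name an asset for it (for example Windows).
function releaseAssetUrl(platform = process.platform, arch = process.arch) {
  const asset = ASSETS[`${platform}-${arch}`];
  return asset ? `${RELEASE_BASE}/${asset}` : null;
}

console.log(releaseAssetUrl('darwin', 'arm64'));
```

Returning null rather than guessing a file name keeps the script honest: for unlisted platforms, fall back to the releases page.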

Or build from source (requires Rust 1.75+; the first build takes about 5 minutes because V8 compiles from source, and is cached afterwards):

# Default build
cargo build --release
# Build with the stealth feature compiled in
cargo build --release --features stealth

Three usage modes

One-shot fetch for quick scrapes:

obscura fetch https://example.com --eval "document.title"
obscura fetch https://example.com --dump links
obscura fetch https://news.ycombinator.com --dump html
obscura fetch https://example.com --wait-until networkidle0

CDP server when Puppeteer or Playwright drive the session:

obscura serve --port 9222
obscura serve --port 9222 --stealth
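
The Puppeteer and Playwright examples in this article connect to slightly different URLs for the same server: Puppeteer uses the /devtools/browser path, while the Playwright example uses the bare host and port. A small helper keeps the two in sync with whatever port you pass to obscura serve. The helper name cdpEndpoints is hypothetical, and the URL shapes are simply copied from those examples rather than from a documented contract.

```javascript
// Derive the client endpoints for a given `obscura serve --port` value.
// URL shapes mirror the Puppeteer and Playwright examples in this article.
function cdpEndpoints(port = 9222, host = '127.0.0.1') {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  const base = `ws://${host}:${port}`;
  return {
    puppeteer: `${base}/devtools/browser`, // for puppeteer.connect()
    playwright: base,                      // for chromium.connectOverCDP()
  };
}

console.log(cdpEndpoints(9222));
```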

Parallel scrape when you have a list of URLs and want structured output:

obscura scrape url1 url2 url3 ... \
  --concurrency 25 \
  --eval "document.querySelector('h1')?.textContent" \
  --format json
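
To drive the same command from a Node script, one option is to build the argument list in code and hand it to execFile, which avoids shell-quoting problems in the --eval expression. This is a sketch under assumptions: the function names and the ./obscura binary path are hypothetical, only the flags shown above are used, and stdout is returned unparsed because the JSON shape is not described here.

```javascript
// Build the argv for `obscura scrape` from a URL list.
function buildScrapeArgs(urls, { concurrency = 25, evalExpr, format = 'json' } = {}) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('at least one URL is required');
  }
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error('concurrency must be a positive integer');
  }
  const args = ['scrape', ...urls, '--concurrency', String(concurrency)];
  if (evalExpr) args.push('--eval', evalExpr);
  args.push('--format', format);
  return args;
}

// Run the binary and resolve with its raw stdout.
// `binary` is an assumed path; adjust it to where you unpacked Obscura.
async function runScrape(urls, options, binary = './obscura') {
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');
  const { stdout } = await promisify(execFile)(
    binary,
    buildScrapeArgs(urls, options),
    { maxBuffer: 64 * 1024 * 1024 }, // allow large scrape output
  );
  return stdout;
}
```

A call such as runScrape(['https://example.com'], { evalExpr: 'document.title' }) passes each argument to the process verbatim, so quotes inside the expression need no escaping.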

Drop into Puppeteer or Playwright

Use the same code you'd write for headless Chrome; just point it at the CDP endpoint of a running obscura serve instance.

Puppeteer:

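import puppeteer from 'puppeteer-core';

// Attach to a running `obscura serve` instance instead of launching Chrome.
const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser',
});

const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
// Collect the title and link of each story on the front page.
const stories = await page.evaluate(() =>
  Array.from(document.querySelectorAll('.titleline > a'))
    .map(a => ({ title: a.textContent, url: a.href }))
);
console.log(stories);

// Detach from the CDP connection when done.
await browser.disconnect();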
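
One way to make the extraction step testable without any browser is to pull the page.evaluate callback into a named function whose document parameter defaults to the page's own document. The sketch below is not part of Obscura or Puppeteer; extractStories and fakeDoc are names invented for the example, and the trimming and empty-entry filter are small additions to the original callback.

```javascript
// In the page (via page.evaluate) `doc` defaults to the real document;
// in a Node test you pass a stand-in object instead.
function extractStories(doc = document) {
  return Array.from(doc.querySelectorAll('.titleline > a'))
    .map(a => ({ title: (a.textContent || '').trim(), url: a.href }))
    .filter(story => story.title && story.url);
}

// Minimal stand-in for a document, just enough for the selector used above.
const fakeDoc = {
  querySelectorAll: selector =>
    selector === '.titleline > a'
      ? [
          { textContent: ' Show HN: Example ', href: 'https://example.com/' },
          { textContent: '', href: '' }, // dropped by the filter
        ]
      : [],
};

console.log(extractStories(fakeDoc));
```

In the Puppeteer script the call becomes const stories = await page.evaluate(extractStories). Puppeteer sends the function's source to the page, so the function must stay self-contained and not close over Node-side variables.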

Playwright:

import { chromium } from 'playwright-core';

const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());

CDP coverage is the part to verify if you depend on anything exotic. The supported domains are Target, Page, Runtime, DOM, Network, Fetch (with live request interception), Storage, and Input, plus an LP.getMarkdown extension that returns the DOM as Markdown, which is useful for feeding pages to an LLM. Most Puppeteer/Playwright code paths just work.
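
A quick first pass at that verification is to list the CDP methods your tooling calls and flag any whose domain is not among those named above. This is a domain-level sketch only: a covered domain does not guarantee every method in it is implemented, and the function name uncoveredMethods is made up for the example.

```javascript
// Domains this article lists as covered, plus the LP extension domain.
const COVERED_DOMAINS = new Set([
  'Target', 'Page', 'Runtime', 'DOM', 'Network',
  'Fetch', 'Storage', 'Input', 'LP',
]);

// Return the CDP methods whose domain is not in the covered list.
function uncoveredMethods(methods) {
  return methods.filter(
    method => !COVERED_DOMAINS.has(method.split('.')[0]),
  );
}

console.log(uncoveredMethods([
  'Page.navigate',
  'Emulation.setDeviceMetricsOverride',
  'LP.getMarkdown',
]));
```

Here only Emulation.setDeviceMetricsOverride is reported, since Emulation is not in the list above. Anything this check flags is worth testing against a live Obscura instance before you commit to it.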

Stealth mode

Build with --features stealth or run with --stealth and you get two layers of evasion:

Anti-fingerprinting

  • Per-session randomization of GPU, screen, canvas, audio, and battery fingerprints
  • Realistic navigator.userAgentData (Chrome 145, high-entropy values)
  • event.isTrusted = true for dispatched events
  • Hidden internal properties so Object.keys(window) is safe
  • Function.prototype.toString() returns [native code] for patched functions, so they look native
  • navigator.webdriver = undefined, matching real Chrome

Tracker blocking

3,520 domains are blocked by default. Analytics, ads, telemetry, and fingerprinting scripts are all blocked at the network layer, so trackers never load.
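
A few of the anti-fingerprinting claims listed above can be spot-checked from inside a page. The sketch below looks at three of them; which signals real detectors actually inspect is an assumption on my part, the function name stealthSelfCheck is hypothetical, and a clean result here does not show that a session is undetectable.

```javascript
// Inspect a window-like object for three signals named in the list above.
// In a page `win` defaults to the global object; tests can pass a stub.
function stealthSelfCheck(win = globalThis) {
  const nav = win.navigator || {};
  return {
    // The article says navigator.webdriver is undefined in stealth mode.
    webdriverUndefined: nav.webdriver === undefined,
    // navigator.userAgentData should be present as an object.
    hasUserAgentData:
      typeof nav.userAgentData === 'object' && nav.userAgentData !== null,
    // A built-in such as fetch should stringify to "[native code]".
    fetchLooksNative:
      typeof win.fetch === 'function' &&
      Function.prototype.toString.call(win.fetch).includes('[native code]'),
  };
}

// Stub that mimics an obviously automated environment.
console.log(stealthSelfCheck({
  navigator: { webdriver: true },
  fetch: function patchedFetch() {},
}));
```

With Puppeteer you could run it as await page.evaluate(stealthSelfCheck) once with and once without --stealth and compare the two results.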

This is the part to use ethically. Anti-detection is fine for sites you operate or have a legitimate research relationship with; using it to bypass rate limits on someone else's product is a different conversation.

Quick CLI reference

obscura serve flags:

Flag            Default   Description
--port          9222      WebSocket port
--proxy         -         HTTP/SOCKS5 proxy URL
--stealth       off       anti-detection + tracker blocking
--workers       1         parallel worker processes
--obey-robots   off       respect robots.txt

obscura fetch <URL> flags:

Flag           Default   Description
--dump         html      output: html, text, or links
--eval         -         JS expression to evaluate
--wait-until   load      load, domcontentloaded, networkidle0
--selector     -         wait for CSS selector
--stealth      off       anti-detection mode
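
If you generate obscura fetch commands programmatically, the table above can double as a lint. The sketch below encodes it as a lookup and reports unknown flags or disallowed values before anything is spawned. It assumes the listed values are exhaustive and that --stealth is a switch with no value, neither of which the table states outright; checkFetchFlags is a name invented for the example.

```javascript
// Flag table for `obscura fetch`, copied from this article.
// Array = allowed values, null = free-form value, false = switch (no value).
const FETCH_FLAGS = new Map([
  ['--dump', ['html', 'text', 'links']],
  ['--wait-until', ['load', 'domcontentloaded', 'networkidle0']],
  ['--eval', null],
  ['--selector', null],
  ['--stealth', false], // assumed to take no value
]);

// Return a list of human-readable problems; empty means the args look fine.
function checkFetchFlags(args) {
  const problems = [];
  for (let i = 0; i < args.length; i++) {
    const flag = args[i];
    if (!flag.startsWith('--')) continue; // positional, e.g. the URL
    if (!FETCH_FLAGS.has(flag)) {
      problems.push(`unknown flag ${flag}`);
      continue;
    }
    const allowed = FETCH_FLAGS.get(flag);
    if (allowed === false) continue; // switch, nothing to consume
    const value = args[++i]; // consume the flag's value
    if (value === undefined) {
      problems.push(`${flag} needs a value`);
    } else if (allowed && !allowed.includes(value)) {
      problems.push(`${flag} must be one of ${allowed.join(', ')}`);
    }
  }
  return problems;
}

console.log(checkFetchFlags(['https://example.com', '--dump', 'pdf', '--verbose']));
```

The example call reports two problems: the unsupported pdf value for --dump and the unknown --verbose flag.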

When to reach for it

  • High-concurrency scraping where Chrome's memory footprint is the bottleneck.
  • AI agents that need a real browser at scale - dozens of concurrent sessions per VM is realistic.
  • Workloads where startup time matters (per-task browsers, ephemeral lambdas, sandbox-per-request architectures).

When not to

  • Sites that need exotic Chrome features Obscura's CDP coverage doesn't include yet. Check the project's CDP methods table before betting on it.
  • Workflows where you want the full Chromium DevTools UI for debugging. This is a headless engine; there's no UI surface.
  • One-off scraping where Puppeteer + system Chrome is already running and the memory isn't a problem.

Licensed under Apache 2.0. The --obey-robots flag is opt-in (off by default), which is a practical choice for the use case but worth knowing: turn it on whenever robots.txt compliance is expected of you.
