
ThinkWatch - enterprise AI and MCP bastion host

Rust gateway in front of OpenAI, Anthropic, Gemini, and self-hosted LLMs (plus MCP servers) with RBAC, audit logs, rate limits, and cost tracking. The boring layer enterprises actually need.


Most "AI gateway" projects on GitHub are LLM proxies with a logging tab. ThinkWatch is the boring, enterprise-shaped version: a Rust gateway that sits in front of OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, and your MCP servers, and refuses to be the weak link in any of them.

The framing the README uses is the right one: "Just as an SSH bastion is the single gateway through which all server access must flow, ThinkWatch is the single gateway through which all AI access must flow. Every model request. Every tool call. Every token. Authenticated, authorized, rate-limited, logged, and accounted for."

It's the layer most companies discover they need around month four of agent rollout, after API keys have been scattered across .env files, no one can explain the monthly bill, and someone in legal asks for an audit trail.

Architecture in two ports

                    ┌──────────────────────────────────────┐
 Claude Code ─────> │                                      │ ─> OpenAI
 Cursor ─────────>  │   Gateway  :3000                     │ ─> Anthropic
 Custom Agent ─────>│   AI API + MCP Unified Proxy         │ ─> Google Gemini
 CI/CD Pipeline ─>  │                                      │ ─> Azure OpenAI / AWS Bedrock
                    └──────────────────────────────────────┘
                    ┌──────────────────────────────────────┐
 Admin Browser ───> │   Console  :3001                     │
                    │   Management UI + Admin API          │
                    └──────────────────────────────────────┘

The dual-port design is deliberate. Port 3000 is public-facing model traffic. Port 3001 is the admin console, internal-only, with CSP headers, session-IP binding, and JWT entropy enforcement. You expose 3000 to your developers and your CI; you keep 3001 inside the VPC.

What the gateway does on a single request

Drop-in compatible with the OpenAI and Anthropic SDKs - it serves OpenAI Chat Completions (/v1/chat/completions), Anthropic Messages (/v1/messages), and OpenAI Responses (/v1/responses) on the same port, and converts between formats automatically for Gemini, Azure, and Bedrock. Cursor, Continue, Cline, and Claude Code all see what looks like the upstream they expect.
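If the compatibility claim holds, pointing an existing client at the gateway is a base-URL change and nothing more. A minimal sketch with the official OpenAI SDK - the host and key below are placeholders, not values ThinkWatch ships:

  # Point the stock OpenAI SDK at the gateway instead of api.openai.com.
  # "gateway.internal" and the tw- key are hypothetical.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://gateway.internal:3000/v1",
      api_key="tw-your-virtual-key",
  )

  resp = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "ping"}],
  )
  print(resp.choices[0].message.content)

The Anthropic SDK can be redirected the same way, since /v1/messages is served on the same port.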

Issued credentials are virtual tw- keys with a surfaces allowlist - the same token can be valid on the AI gateway, the MCP gateway, both, or neither. Rotation has grace periods; expiry warnings fire ahead of time; inactivity timeouts close abandoned keys. Plaintext is shown exactly once at issuance; storage is SHA-256 hashed.
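The show-once, hash-at-rest scheme is a standard pattern; a minimal sketch of how it works (my illustration, not ThinkWatch's code):

  import hashlib, secrets

  def issue_key() -> tuple[str, str]:
      # The plaintext exists only in the issuance response; the database
      # keeps a SHA-256 digest for lookups.
      plaintext = "tw-" + secrets.token_urlsafe(32)
      return plaintext, hashlib.sha256(plaintext.encode()).hexdigest()

  def verify(presented: str, stored_digest: str) -> bool:
      # Lookup is by digest, so a database leak never exposes usable keys.
      return hashlib.sha256(presented.encode()).hexdigest() == stored_digest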

The rate limit and budget engine

This is the part most homegrown gateways get wrong. ThinkWatch runs two parallel quota systems on every request:

  • Sliding-window rate limits - rolling 60-bucket windows at 1m / 5m / 1h / 5h / 1d / 1w. Counts requests OR weighted tokens. Pre-flight for requests, post-flight for tokens.
  • Natural-period budget caps - calendar-aligned daily / weekly / monthly. Counts weighted tokens only. Soft cap (one request can push you past the line).

Subjects stack. A single request resolves to multiple (subject_kind, subject_id) tuples - typically api_key + user + provider for AI traffic, user + mcp_server for MCP traffic - and every enabled rule runs against all of them in one atomic Lua check. Any rule rejecting kills the whole request. The rejection includes the offending rule label in the body (user:requests/1m, provider:requests/1h, etc.).
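The bucketed window itself is simple enough to sketch. This is an illustrative single-rule Python version of the 60-bucket idea - not the gateway's Rust, and it elides the part where every rule for every resolved subject runs inside one atomic Redis Lua script:

  import time
  from collections import defaultdict

  class SlidingWindow:
      """Rolling window split into 60 buckets: old traffic ages out
      gradually instead of resetting at a hard boundary."""

      def __init__(self, window_secs: float, limit: int, buckets: int = 60):
          self.bucket_secs = window_secs / buckets
          self.buckets = buckets
          self.limit = limit
          self.counts: dict[int, int] = defaultdict(int)

      def allow(self, weight: int = 1) -> bool:
          now_bucket = int(time.time() // self.bucket_secs)
          horizon = now_bucket - self.buckets
          # Drop buckets that have aged out of the window.
          self.counts = defaultdict(
              int, {b: c for b, c in self.counts.items() if b > horizon}
          )
          if sum(self.counts.values()) + weight > self.limit:
              return False                  # pre-flight rejection
          self.counts[now_bucket] += weight
          return True

  # requests/1m pre-flight uses weight=1; tokens/1d runs post-flight with
  # the weighted token count once the response size is known.
  user_rpm = SlidingWindow(window_secs=60, limit=60)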

The example from the README spells out a realistic policy:

On the developer USER subject:
  rate_limit_rule  ai_gateway / requests / 60s   -> 60
  rate_limit_rule  ai_gateway / tokens   / 1d    -> 1_000_000
  budget_cap       monthly                       -> 20_000_000

On the OpenAI PROVIDER subject:
  rate_limit_rule  ai_gateway / requests / 1h    -> 100_000

Three different "tokens" - don't confuse them

  Number           Source                                                  Used for
  Raw tokens       provider-billed input_tokens / output_tokens            analytics, cost reports
  Weighted tokens  raw × input_multiplier / output_multiplier per model    quota accounting (rate limits + budgets)
  USD cost         raw × input_price / output_price per model              billing

The multipliers default to 1.0 (quotas count raw tokens), but the right move is to tune them so a single gpt-4o burst can't blow through a monthly cap meant for everyday work.
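A worked example makes the split concrete. The multipliers and prices below are invented for illustration; real values are per-model settings in the console:

  raw_in, raw_out = 12_000, 3_000        # provider-billed token counts

  # Weighted tokens: what rate limits and budget caps actually count.
  in_mult, out_mult = 5.0, 5.0           # penalize an expensive model 5x
  weighted = raw_in * in_mult + raw_out * out_mult    # = 75_000

  # USD cost: what the billing report shows, independent of multipliers.
  in_price, out_price = 2.50 / 1_000_000, 10.00 / 1_000_000
  usd = raw_in * in_price + raw_out * out_price       # = $0.06

At 5.0, the same burst eats five times as much of a 1,000,000-token daily allowance as it would at the 1.0 default, while the dollar figure is untouched.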

Failure mode

Redis is the rate-limiter's backing store. When it's unavailable, the engine fails open by default and bumps gateway_rate_limiter_fail_open_total so you can see it happened. Operators who'd rather refuse traffic than miss accounting can flip security.rate_limit_fail_closed = true on the Settings page; the gateway then returns 429 (rate_limiter_unavailable) for any request it couldn't check. Pick the failure mode that matches your incident posture - the right answer differs by team.
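The decision tree is small enough to spell out. An illustrative Python sketch of the documented behavior (the counter variable stands in for the real Prometheus metric):

  import redis

  fail_open_total = 0   # stand-in for gateway_rate_limiter_fail_open_total

  def guarded_check(run_check, fail_closed: bool):
      """Returns (allowed, status, reason) per the documented failure policy."""
      global fail_open_total
      try:
          return run_check()              # normal path: atomic check in Redis
      except redis.ConnectionError:
          if fail_closed:                 # security.rate_limit_fail_closed = true
              return (False, 429, "rate_limiter_unavailable")
          fail_open_total += 1            # default: count it, let it through
          return (True, 200, "")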

Budget alerts fire at 50% / 80% / 95% / 100% of any cap, each at most once per period bucket. If a single response takes you from 60% straight past 100%, the 80%, 95%, and 100% alerts all log on that one response - subsequent requests in the same period don't re-fire them.
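The once-per-period semantics fall out of a small dedup set. An illustrative sketch:

  THRESHOLDS = (0.50, 0.80, 0.95, 1.00)

  def alerts_to_fire(before: float, after: float, fired: set) -> list:
      """All thresholds crossed by this response that haven't fired yet."""
      crossed = [t for t in THRESHOLDS if before < t <= after and t not in fired]
      fired.update(crossed)   # `fired` resets when the period bucket rolls over
      return crossed

  fired = set()
  print(alerts_to_fire(0.60, 1.04, fired))   # [0.8, 0.95, 1.0]
  print(alerts_to_fire(1.04, 1.10, fired))   # [] - nothing re-fires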

Security and compliance

The README's checklist is unusually thorough for an open-source project:

  • 5-tier RBAC (Super Admin / Admin / Team Manager / Developer / Viewer)
  • SSO/OIDC via Zitadel, Okta, Azure AD, or any OIDC-compliant provider
  • AES-256-GCM encryption for provider keys at rest (sketched after this list)
  • Soft-delete with 30-day automatic purge for users, providers, and API keys
  • Distroless containers (~2MB runtime image, no shell)
  • Session IP binding so admin tokens can't be replayed from another network
  • Password complexity enforcement, JWT entropy enforcement, CSP headers, security headers, CORS whitelisting
  • Startup dependency validation - PostgreSQL, Redis, encryption key, all checked before traffic is accepted
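For the encryption-at-rest item, the shape is standard AEAD. A sketch in the AES-256-GCM style the checklist names, using Python's cryptography package (my illustration, not ThinkWatch's code):

  import os
  from cryptography.hazmat.primitives.ciphers.aead import AESGCM

  master_key = AESGCM.generate_key(bit_length=256)  # stands in for the validated startup key
  aead = AESGCM(master_key)

  def encrypt_provider_key(plaintext: str) -> bytes:
      nonce = os.urandom(12)                # fresh 96-bit nonce per encryption
      return nonce + aead.encrypt(nonce, plaintext.encode(), None)

  def decrypt_provider_key(blob: bytes) -> str:
      nonce, ciphertext = blob[:12], blob[12:]
      return aead.decrypt(nonce, ciphertext, None).decode()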

Observability story

Prometheus metrics are served on the gateway port, but stay disabled until you set METRICS_BEARER_TOKEN. While it's unset, the route returns 404 and the metrics recorder isn't even installed - zero memory or CPU cost. Your scraper passes the same value as Authorization: Bearer <token>.
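Wiring a scraper up is one authenticated GET. A sketch assuming the conventional /metrics path (the README specifies the token, not the route, so treat the path and host as placeholders):

  import os
  import requests

  resp = requests.get(
      "http://gateway.internal:3000/metrics",    # hypothetical host and path
      headers={"Authorization": f"Bearer {os.environ['METRICS_BEARER_TOKEN']}"},
      timeout=5,
  )
  resp.raise_for_status()   # with the token unset on the gateway: 404, not 401
  print(resp.text.splitlines()[0])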

Audit logs land in ClickHouse, so analytics are plain SQL over columnar storage, and they forward over multiple channels - UDP/TCP Syslog (RFC 5424), Kafka, or HTTP webhooks - to your SIEM, data lake, or alerting pipeline. Health probes are split three ways: /health/live for liveness, /health/ready for readiness (including a check that at least one provider is active, so Kubernetes won't route traffic to a fresh pod with an empty router), and /api/health for detailed latency.
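The readiness semantics suggest a deploy gate: don't shift traffic until /health/ready goes green. A sketch, with the host as a placeholder:

  import time
  import requests

  def wait_until_ready(base="http://gateway.internal:3000", timeout=120):
      deadline = time.time() + timeout
      while time.time() < deadline:
          try:
              # Ready implies at least one provider is active in the router.
              if requests.get(f"{base}/health/ready", timeout=2).status_code == 200:
                  return
          except requests.ConnectionError:
              pass    # still booting; /health/live is the liveness signal
          time.sleep(2)
      raise TimeoutError("gateway never reported ready")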

When to reach for it

  • Multi-team or multi-tenant agent rollouts. Anything where "who used which model and how much" needs an answer that's not "look at the credit card statement."
  • Compliance-driven environments needing audit trails for AI-assisted code generation or data access.
  • Cost-attribution problems - team budgets, per-key caps, per-provider ceilings.

When not to

  • Solo developers or small teams. The setup overhead (PostgreSQL + Redis + ClickHouse + the gateway itself) is real, and the value scales with the number of users you're governing.
  • Teams that need a fully managed gateway with someone else's SLA. ThinkWatch is self-hosted.

Summary

The Rust + React + PostgreSQL + Redis + ClickHouse stack is heavy, but every piece is justified by a feature you'd otherwise end up building worse yourself. If you're at the point of considering writing a gateway, this is the one to compare against before you spend a quarter doing it.
