MCPMark - stress-testing MCP benchmark
Benchmark harness that evaluates models and agents on real-world MCP usage, producing scores that are comparable across servers and frontier models.
Related entries
trace-mcp - framework-aware codebase MCP for coding agents
MCP server with 138 tools and cross-language framework awareness (58 integrations across 81 languages). Indexes Laravel/Inertia/Vue, Rails/Hotwire, Django/HTMX edges so agents skip re-deriving call graphs. Decision memory links architectural choices to the code they're about. Local-first ONNX embeddings, optional LSP enrichment.
mcptube - Karpathy-style LLM wiki for YouTube
MCP server that turns YouTube videos into a persistent, merging wiki rather than ephemeral vector chunks. Scene-change frame extraction + vision analysis captures slides, code, and diagrams that transcripts miss. 25+ MCP tools, FTS5+LLM hybrid retrieval, version history with source attribution per claim.
ThinkWatch - enterprise AI and MCP bastion host
Rust gateway in front of OpenAI, Anthropic, Gemini, and self-hosted LLMs (plus MCP servers) with RBAC, audit logs, rate limits, and cost tracking. The boring layer enterprises actually need.
OpenTabs - API-driven browser agent
Chrome extension and MCP server that lets coding agents drive web tasks by calling site APIs instead of clicking through the DOM. Targets the brittleness of Playwright-style browser automation.