MCPMark - stress-testing MCP benchmark

Benchmark harness that evaluates models and agents on real-world MCP usage, producing scores that are comparable across servers and frontier models.

Saved Apr 25, 2026 | View source ↗

#mcp #benchmark #evaluation #agent-eval #tool-use

Featured in

MCP servers and Model Context Protocol tools
Production MCP servers, gateways, frameworks, and clients - everything in this directory that speaks the Model Context Protocol.

Related entries

GitHub Tool - Featured

trace-mcp - framework-aware codebase MCP for coding agents

MCP server with 138 tools and cross-language framework awareness (58 integrations across 81 languages). Indexes Laravel/Inertia/Vue, Rails/Hotwire, Django/HTMX edges so agents skip re-deriving call graphs. Decision memory links architectural choices to the code they're about. Local-first ONNX embeddings, optional LSP enrichment.

Why I saved this - Distinct from Qartez: Qartez is structural (PageRank, blast radius), while trace-mcp is framework-semantic. The cross-language edges (Laravel controller -> Vue page via Inertia) are the differentiating bit.

#mcp #claude-code #code-intelligence #typescript #static-analysis

GitHub Tool - Featured

mcptube - Karpathy-style LLM wiki for YouTube

MCP server that turns YouTube videos into a persistent, merging wiki rather than ephemeral vector chunks. Scene-change frame extraction + vision analysis captures slides, code, and diagrams that transcripts miss. 25+ MCP tools, FTS5+LLM hybrid retrieval, version history with source attribution per claim.

Why I saved this - The wiki-merge design is the differentiator vs RAG-over-YouTube clones - one MCP article with citations, not ten near-duplicate chunks. Scene-change extraction is what makes visual-heavy talks usable.

#mcp #claude-code #knowledge-graph #agent-memory #python

GitHub Tool - Featured

ThinkWatch - enterprise AI and MCP bastion host

Rust gateway in front of OpenAI, Anthropic, Gemini, and self-hosted LLMs (plus MCP servers) with RBAC, audit logs, rate limits, and cost tracking. The boring layer enterprises actually need.

#ai-gateway #mcp #security #observability #rust

GitHub Tool

OpenTabs - API-driven browser agent

Chrome extension and MCP server that lets coding agents drive web tasks by calling site APIs instead of clicking through the DOM. Targets the brittleness of Playwright-style browser automation.

#claude-code #mcp #browser-automation #chrome-extension #ai-agent