LABE - legal action boundary eval
Public benchmark that tests an agent at the moment it's about to take a high-impact legal action. Same harness, baseline vs verified, measures unjustified action drops and goal-completion gains.
This entry doesn't have a long-form writeup yet. Follow the source link above for the full context.
Featured in
Related entries
OQP - verification protocol for AI agents
MCP-compatible spec defining four endpoints (capabilities, workflows, execute, assess-risk) so agents can prove a shipped change satisfies business requirements before it goes live.
secure-exec - npm-compatible Node sandboxing
Lightweight library for sandboxing Node.js code execution from agents without containers or VMs, using runtime isolation. Built for code interpreter use cases.
pipelock - MCP firewall for AI agents
Go-based agent firewall that controls egress from MCP servers, blocking SSRF, DLP leaks, and prompt-injection vectors at the network layer. Acts as a fetch proxy for tool calls.
RedAI - validate vulnerabilities in live targets
Security agent that runs scanner agents to surface candidate vulnerabilities, then has validator agents reproduce each one against a running instance. Outputs only confirmed exploitable findings.