Discovery
Back to browse

LABE - legal action boundary eval

Public benchmark that tests an agent at the moment it's about to take a high-impact legal action. Same harness, baseline vs verified, measures unjustified action drops and goal-completion gains.

View source ↗

This entry doesn't have a long-form writeup yet. Follow the source link above for the full context.

Featured in

Related entries