Open research on how to evaluate, train, and control agents. With academic and industry partners.
Eight scoring dimensions, calibration, integrity safeguards.
Academic, Interface, enterprise partners.
Why you can trust the scores.
Where the research ships.
Collaborate or cite.
We use strictly necessary storage to remember this choice. If you accept, we load Google Analytics 4 (GA4) for aggregate traffic metrics only — see our Privacy Policy.