But can it pass 1,000 real calls?
We test across 11 dimensions — comprehension, logic, safety, bias, and beyond — on real calls, automatically, at scale.
A complete rubric for production voice agents — grouped into four themes that mirror how teams actually ship.
Building-block library: personalities, environments, demographics, and scenario templates, combined into thousands of realistic test conversations.
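To show how a handful of building blocks multiplies into thousands of conversations, here is a minimal sketch that takes the Cartesian product of four block lists. Every block value, the `TestScenario` type, and `compose_scenarios` are hypothetical illustrations, not the product's actual library or API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TestScenario:
    # Hypothetical record: one caller persona placed in one situation.
    personality: str
    environment: str
    demographic: str
    template: str


# Hypothetical building blocks; a real library would hold far more entries.
PERSONALITIES = ["impatient", "chatty", "confused", "polite", "terse"]
ENVIRONMENTS = ["quiet room", "busy street", "car speakerphone", "bad connection"]
DEMOGRAPHICS = [
    "age 18-25",
    "age 26-40",
    "age 41-64",
    "age 65+",
    "non-native speaker",
    "strong regional accent",
]
TEMPLATES = [f"template_{i}" for i in range(10)]


def compose_scenarios(personalities, environments, demographics, templates):
    """Yield one TestScenario for every combination of building blocks."""
    for p, e, d, t in product(personalities, environments, demographics, templates):
        yield TestScenario(p, e, d, t)


scenarios = list(compose_scenarios(PERSONALITIES, ENVIRONMENTS, DEMOGRAPHICS, TEMPLATES))
print(len(scenarios))  # 5 * 4 * 6 * 10 = 1200
```

Four short lists already yield 1,200 distinct scenarios; because each new block multiplies the total rather than adding to it, coverage scales quickly as the library grows.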
Tester agent places real calls over PSTN to the agent under test. Black-box — no SDK, no access to internals.
ASR + structured rubric scoring across all 11 dimensions. Reports update on every release.
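One way per-call rubric scores could roll up into a per-release report is sketched below, assuming each call receives a 0 to 1 score per dimension and a dimension passes when its mean clears a threshold. Only four of the 11 dimensions are named above, so only those appear; `score_release`, the 0.8 threshold, and the sample scores are hypothetical.

```python
from statistics import mean

# Only the four dimensions named in the text; the other seven are omitted.
DIMENSIONS = ("comprehension", "logic", "safety", "bias")


def score_release(call_scores, pass_threshold=0.8):
    """Average each dimension's 0-1 scores across calls and flag dimensions below threshold.

    call_scores: list of dicts mapping dimension name -> score for one call (hypothetical format).
    """
    report = {}
    for dim in DIMENSIONS:
        avg = mean(call[dim] for call in call_scores)
        report[dim] = {"mean": round(avg, 3), "passed": avg >= pass_threshold}
    return report


# Hypothetical scores for three calls in one release.
CALLS = [
    {"comprehension": 0.9, "logic": 0.9, "safety": 1.0, "bias": 0.6},
    {"comprehension": 0.8, "logic": 0.9, "safety": 1.0, "bias": 0.7},
    {"comprehension": 1.0, "logic": 0.9, "safety": 1.0, "bias": 0.8},
]

report = score_release(CALLS)
for dim, result in report.items():
    print(dim, result)
```

Re-running the same aggregation on each release's calls produces a comparable report, so a regression (here, `bias` falling below the threshold) shows up as a flipped `passed` flag.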
Scenarios adapt in real time based on prior agent responses — coverage grows with every run.
Real calls placed over PSTN. No SDK, no instrumentation, no access to agent-side logs.