11 dimensions · Real calls · Fully automated

Your voice AI agent
passed the demo.

But can it pass 1,000 real calls?

We test across 11 dimensions — comprehension, logic, safety, bias, and beyond — on real calls, automatically, at scale.

§ 01 · What we measure

Eleven dimensions. Eighty-nine metrics and counting...

Voice AI Evaluation

A complete rubric for production voice agents — grouped into four themes that mirror how teams actually ship.

Core Technical
4 rubrics · 33 metrics
Real-World Reliability
2 rubrics · 13 metrics
Trust & Risk
3 rubrics · 32 metrics
Cross-Cutting
2 rubrics · 11 metrics
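The four themes roll up exactly to the totals quoted above. A quick arithmetic check, using only the counts stated on this page:

```python
# Theme groupings as stated above: (rubrics, metrics) per theme.
themes = {
    "Core Technical": (4, 33),
    "Real-World Reliability": (2, 13),
    "Trust & Risk": (3, 32),
    "Cross-Cutting": (2, 11),
}

total_rubrics = sum(r for r, _ in themes.values())
total_metrics = sum(m for _, m in themes.values())
print(total_rubrics, total_metrics)  # 11 89
```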
§ 02 · How we measure

100% automated. Real calls. No agent-side logs.

01

Scenarios

A building-block library of personalities, environments, demographics, and scenario templates composes into thousands of realistic test conversations.

Personalities · Environments · Demographics · Templates
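The composition step can be sketched as a Cartesian product over the building blocks. All names below are illustrative placeholders, not the platform's actual library or API:

```python
from itertools import product

# Hypothetical building-block pools; the real library is far larger.
personalities = ["impatient", "chatty", "confused"]
environments = ["quiet room", "street noise", "car speakerphone"]
demographics = ["elderly caller", "non-native speaker"]
templates = ["book an appointment", "dispute a charge"]

# Every combination yields one test-conversation spec.
scenarios = [
    {"personality": p, "environment": e, "demographic": d, "template": t}
    for p, e, d, t in product(personalities, environments, demographics, templates)
]

print(len(scenarios))  # 3 * 3 * 2 * 2 = 36 specs from just a few blocks
```

Even small pools multiply quickly, which is how a modest library composes into thousands of distinct conversations.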
02

Automated Calls

A tester agent places real calls over the PSTN to the agent under test. Black-box — no SDK, no access to internals.

Real PSTN · No SDK · Black-box
03

Metrics

ASR transcription plus structured rubric scoring across all 11 dimensions. Reports refresh on every release.

ASR · 11 rubrics · 89 metrics
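One way to picture the rubric scoring step is rolling per-metric scores up into one score per dimension. The metric names and keying scheme below are assumptions for illustration only:

```python
from statistics import mean

# Hypothetical per-call metric scores (0-1), keyed "dimension/metric".
metric_scores = {
    "comprehension/intent_accuracy": 0.92,
    "comprehension/entity_recall": 0.88,
    "safety/refusal_on_harmful_request": 1.0,
    "safety/pii_leak_avoided": 0.75,
}

# Group metric scores under their parent dimension.
dimensions: dict[str, list[float]] = {}
for key, score in metric_scores.items():
    dim = key.split("/", 1)[0]
    dimensions.setdefault(dim, []).append(score)

# One averaged score per dimension for the release report.
report = {dim: round(mean(scores), 2) for dim, scores in dimensions.items()}
print(report)
```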
Dynamic test generation

Scenarios adapt in real time based on prior agent responses — coverage grows with every run.
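An adaptive loop like the one described above might bias the next scenario toward the dimension scoring worst so far. This is a minimal sketch under that assumption; the function and library names are hypothetical:

```python
import random

def next_scenario(history, library):
    """Pick the next scenario, biased toward the dimension with the
    lowest average score so far (probe the agent's weak spots)."""
    if not history:
        return random.choice(list(library.values()))
    averages = {dim: sum(s) / len(s) for dim, s in history.items()}
    weakest = min(averages, key=averages.get)
    return library[weakest]

# Hypothetical mapping from a dimension to a probing scenario.
library = {
    "safety": "caller requests another customer's account details",
    "logic": "caller changes the appointment date mid-sentence",
}
history = {"safety": [0.6, 0.7], "logic": [0.95]}
print(next_scenario(history, library))  # probes the weaker "safety" dimension
```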

100% automated, black-box

Real calls placed over PSTN. No SDK, no instrumentation, no access to agent-side logs.

§ 04 · Get started

Measure what matters.
Ship with confidence.

Open the dashboard
Voice AI Testing Platform · BabbleLabs