11 dimensions · Real calls · Fully automated

Your voice AI agent
passed the demo.

But can it pass 1,000 real calls?

We test across 11 dimensions — comprehension, logic, safety, bias, and beyond — on real calls, automatically, at scale.

§ 01 · What we measure

Eleven dimensions. Eighty-nine metrics and counting...

Voice AI Evaluation

A complete rubric for production voice agents — grouped into four themes that mirror how teams actually ship.

Core Technical
4 rubrics · 33 metrics
Real-World Reliability
2 rubrics · 13 metrics
Trust & Risk
3 rubrics · 32 metrics
Cross-Cutting
2 rubrics · 11 metrics
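The four themes roll up exactly to the totals quoted above. A quick arithmetic check, using only the counts stated on this page:

```python
# Theme groupings as stated above: (rubrics, metrics) per theme.
themes = {
    "Core Technical": (4, 33),
    "Real-World Reliability": (2, 13),
    "Trust & Risk": (3, 32),
    "Cross-Cutting": (2, 11),
}

total_rubrics = sum(r for r, _ in themes.values())
total_metrics = sum(m for _, m in themes.values())
print(total_rubrics, total_metrics)  # 11 89
```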
§ 02 · How we measure

100% automated. Real calls. No agent-side logs.

01

Scenarios

A building-block library of personalities, environments, demographics, and scenario templates composes into thousands of realistic test conversations.

Personalities · Environments · Demographics · Templates
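The composition step can be sketched as a Cartesian product over the building blocks. All names below are illustrative placeholders, not the platform's actual library or API:

```python
from itertools import product

# Hypothetical building-block pools; the real library is far larger.
personalities = ["impatient", "chatty", "confused"]
environments = ["quiet room", "street noise", "car speakerphone"]
demographics = ["elderly caller", "non-native speaker"]
templates = ["book an appointment", "dispute a charge"]

# Every combination yields one test-conversation spec.
scenarios = [
    {"personality": p, "environment": e, "demographic": d, "template": t}
    for p, e, d, t in product(personalities, environments, demographics, templates)
]

print(len(scenarios))  # 3 * 3 * 2 * 2 = 36 specs from just a few blocks
```

Even small pools multiply quickly, which is how a modest library composes into thousands of distinct conversations.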
02

Automated Calls

A tester agent places real calls over the PSTN to the agent under test. Black-box — no SDK, no access to internals.

Real PSTN · No SDK · Black-box
03

Metrics

ASR transcription plus structured rubric scoring across all 11 dimensions. Reports refresh on every release.

ASR · 11 rubrics · 89 metrics
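One way to picture the rubric scoring step is rolling per-metric scores up into one score per dimension. The metric names and keying scheme below are assumptions for illustration only:

```python
from statistics import mean

# Hypothetical per-call metric scores (0-1), keyed "dimension/metric".
metric_scores = {
    "comprehension/intent_accuracy": 0.92,
    "comprehension/entity_recall": 0.88,
    "safety/refusal_on_harmful_request": 1.0,
    "safety/pii_leak_avoided": 0.75,
}

# Group metric scores under their parent dimension.
dimensions: dict[str, list[float]] = {}
for key, score in metric_scores.items():
    dim = key.split("/", 1)[0]
    dimensions.setdefault(dim, []).append(score)

# One averaged score per dimension for the release report.
report = {dim: round(mean(scores), 2) for dim, scores in dimensions.items()}
print(report)
```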
Dynamic test generation

Scenarios adapt in real time based on prior agent responses — coverage grows with every run.
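An adaptive loop like the one described above might bias the next scenario toward the dimension scoring worst so far. This is a minimal sketch under that assumption; the function and library names are hypothetical:

```python
import random

def next_scenario(history, library):
    """Pick the next scenario, biased toward the dimension with the
    lowest average score so far (probe the agent's weak spots)."""
    if not history:
        return random.choice(list(library.values()))
    averages = {dim: sum(s) / len(s) for dim, s in history.items()}
    weakest = min(averages, key=averages.get)
    return library[weakest]

# Hypothetical mapping from a dimension to a probing scenario.
library = {
    "safety": "caller requests another customer's account details",
    "logic": "caller changes the appointment date mid-sentence",
}
history = {"safety": [0.6, 0.7], "logic": [0.95]}
print(next_scenario(history, library))  # probes the weaker "safety" dimension
```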

100% automated, black-box

Real calls placed over PSTN. No SDK, no instrumentation, no access to agent-side logs.

§ 04 · Get started

Measure what matters.
Ship with confidence.

Open the dashboard
Voice AI Testing Platform · BabbleLabs