Benchmark Framework
How to benchmark voice workflows without vanity metrics
This framework focuses on end-to-end writing outcomes, not isolated transcription speed claims. Use these dimensions to compare tools on realistic team tasks.
1. Capture latency
Measure the time from the capture trigger to the first usable text in the target app's field. Record p50 and p95 latencies across at least 100 runs.
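The aggregation step can be sketched as follows. This assumes latency samples (in milliseconds) are collected elsewhere by instrumented runs; the function name and the 100-run floor are taken from the guidance above.

```python
# Sketch of capture-latency aggregation; sample collection is assumed
# to happen elsewhere (instrumented runs logging one value per trial).
import statistics

def latency_percentiles(samples_ms, min_runs=100):
    """Return (p50, p95) for a list of latency samples in milliseconds."""
    if len(samples_ms) < min_runs:
        raise ValueError(f"need at least {min_runs} runs, got {len(samples_ms)}")
    # quantiles with n=20 yields 19 cut points: 5%, 10%, ..., 95%.
    # Index 9 is the 50th percentile, index 18 the 95th.
    cuts = statistics.quantiles(samples_ms, n=20)
    return cuts[9], cuts[18]
```

Reporting both percentiles matters because p95 surfaces the slow tail that a median alone hides.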
2. Cleanup quality
Score whether cleaned output preserves intent, language, and actionability. Use domain-specific prompts from engineering, support, and product teams.
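One minimal way to turn reviewer judgments into a pass rate is an all-criteria-must-hold rubric; the field names below are illustrative assumptions, not a fixed schema.

```python
# Hypothetical rubric aggregation: each human review scores a cleaned
# transcript on three binary criteria (names are assumptions).
def passes_cleanup(review):
    """A sample passes only if intent, language, and actionability all hold."""
    return review["intent"] and review["language"] and review["actionable"]

def pass_rate(reviews):
    """Fraction of reviewed samples that pass all criteria."""
    if not reviews:
        return 0.0
    return sum(passes_cleanup(r) for r in reviews) / len(reviews)
```

Requiring all three criteria keeps the metric conservative: a transcript that reads fluently but loses the original intent still counts as a failure.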
3. Workflow completion speed
Track how long a full task takes: dictate, clean up, review, and send. This is usually the most practical metric for team productivity impact.
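A per-stage timer makes this measurable without guessing where time went. The sketch below is one possible harness, assuming each stage is started and stopped manually around the user's action.

```python
import time

class WorkflowTimer:
    """Record per-stage durations for one dictate -> cleanup -> review -> send run."""

    def __init__(self):
        self.stages = {}      # stage name -> duration in seconds
        self._name = None
        self._start = None

    def start(self, name):
        """Begin timing a named stage."""
        self._name = name
        self._start = time.perf_counter()

    def stop(self):
        """End the current stage and record its duration."""
        self.stages[self._name] = time.perf_counter() - self._start

    def total(self):
        """Total workflow completion time across recorded stages."""
        return sum(self.stages.values())
```

Collecting the `total()` values across many runs feeds directly into the p50/p95 reporting suggested below.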
4. Error recovery behavior
Test degraded-network and malformed-audio paths. Measure how quickly users can recover and finish the workflow.
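Recovery time can be measured with a small retry harness. Here `operation` is a hypothetical stand-in for any flaky step (e.g. a transcription call over a bad network); the return shape is an assumption for illustration.

```python
import time

def measure_recovery(operation, max_attempts=3, backoff_s=0.0):
    """Retry a flaky zero-arg callable; report attempts and elapsed recovery time."""
    start = time.perf_counter()
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()  # succeeds once the degraded path clears
            return {"ok": True, "attempts": attempt,
                    "recovery_s": time.perf_counter() - start, "result": result}
        except Exception:
            time.sleep(backoff_s)  # simple fixed backoff for the sketch
    return {"ok": False, "attempts": max_attempts,
            "recovery_s": time.perf_counter() - start, "result": None}
```

Comparing `recovery_s` across tools shows whether failures cost the user seconds or force a full restart of the workflow.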
Suggested reporting format
- Benchmark date and environment details.
- Task mix (chat, docs, tickets, email).
- p50/p95 workflow completion time.
- Intent-preservation pass rate from human review.
- Top 3 failure modes and mitigation plan.
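The report items above can be captured as machine-readable JSON so runs are comparable over time. Field names and every value below are placeholders, not real measurements or a standard schema.

```python
# One way to structure the suggested report; all values are placeholders.
import json

report = {
    "benchmark_date": "YYYY-MM-DD",                      # placeholder
    "environment": {"os": "example OS", "network": "example network"},
    "task_mix": ["chat", "docs", "tickets", "email"],
    "workflow_completion_s": {"p50": None, "p95": None},  # fill from runs
    "intent_preservation_pass_rate": None,                # fill from review
    "top_failure_modes": [
        {"mode": "example failure 1", "mitigation": "example plan"},
        {"mode": "example failure 2", "mitigation": "example plan"},
        {"mode": "example failure 3", "mitigation": "example plan"},
    ],
}
print(json.dumps(report, indent=2))
```

Keeping the schema stable across benchmark dates is what makes regressions visible from one report to the next.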