We’re in the midst of a healthcare AI boom, with voice technology at its center.
The pitch is persuasive: automatic transcriptions, instant summaries, reduced admin.
But when voice AI gets things wrong, the consequences aren’t just frustrating; they can be life-threatening.
This is what makes digital health strategist John Nosta’s recent framework so necessary. His latency-error quadrant finally gives language to something many of us working in the space have long intuited.
It’s a simple idea: speed and accuracy are not the same, and the real value of voice AI lies in the balance between the two.
That balance matters. While Silicon Valley races to automate care with promises of efficiency and scale, Nosta reminds us of the human cost of technical failure.
A smart speaker misinterpreting “coffee pods” as “dog food” is annoying. A medical assistant mixing up similar-sounding medications is something else entirely.
Nosta’s quadrant lays out four possible outcomes:

1. Fast and accurate: optimized.
2. Fast but wrong: dangerous.
3. Slow but accurate: inefficient.
4. Slow and wrong: unacceptable.
It’s not a particularly flattering framework for the current state of play, but that’s the point. It gives us a tool to evaluate these systems honestly and without hype.
Each quadrant describes a different kind of system, and each carries its own cognitive and clinical implications.
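To make the framework concrete, here is a minimal sketch in Python of how a team might bucket a voice system from its measured latency and word error rate. The `VoiceSystemMetrics` type, the `quadrant` function, and the thresholds (two seconds, 1% WER) are illustrative assumptions, not clinical standards or anyone’s production code.

```python
from dataclasses import dataclass

@dataclass
class VoiceSystemMetrics:
    latency_seconds: float   # time from speech to usable text
    word_error_rate: float   # fraction of words transcribed incorrectly

# Illustrative thresholds only; real clinical cut-offs would be set per use case.
MAX_SAFE_LATENCY_S = 2.0      # seconds
MAX_SAFE_ERROR_RATE = 0.01    # 1% word error rate

def quadrant(m: VoiceSystemMetrics) -> str:
    """Map measured latency and accuracy onto the four Nosta-style quadrants."""
    fast = m.latency_seconds <= MAX_SAFE_LATENCY_S
    accurate = m.word_error_rate <= MAX_SAFE_ERROR_RATE
    if fast and accurate:
        return "optimized"      # fast and accurate: the clinical sweet spot
    if fast:
        return "dangerous"      # fast but wrong
    if accurate:
        return "inefficient"    # slow but accurate
    return "unacceptable"       # slow and wrong

if __name__ == "__main__":
    print(quadrant(VoiceSystemMetrics(latency_seconds=0.8, word_error_rate=0.005)))  # optimized
    print(quadrant(VoiceSystemMetrics(latency_seconds=0.8, word_error_rate=0.074)))  # dangerous
```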
The uptake of AI in healthcare is accelerating.
In 2024, 2 in 3 physicians reported using AI tools (up 78% from the year before!). Meanwhile, 75% of US healthcare providers and payers increased their IT budgets over the past year, with much of that spend directed at AI.
The speed of change is impressive. So are the benefits, when the tech works.
The most advanced systems can reduce documentation time, identify who is speaking during a consultation, and enable an LLM to generate real-time alerts based on what's said. The idea is to let doctors talk naturally while AI listens, captures, and supports.
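As a rough illustration of that listen-capture-support loop, here is a hedged sketch of a streaming pipeline in Python. Everything in it (`TranscriptSegment`, `transcribe_stream`, `ask_llm`, and the medication watchlist) is a hypothetical stand-in rather than any vendor’s real API; in practice the placeholder functions would be wired to a streaming speech-to-text service and an LLM.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class TranscriptSegment:
    speaker: str   # label from speaker diarization, e.g. "clinician" or "patient"
    text: str      # transcribed words for this segment

# Hypothetical watchlist of similar-sounding medications that are easy to confuse.
CONFUSABLE_MEDS = {"hydroxyzine", "hydralazine", "clonidine", "klonopin"}

def transcribe_stream(audio_chunks: Iterable[bytes]) -> Iterator[TranscriptSegment]:
    """Placeholder for a real streaming speech-to-text + diarization service.
    Yields canned segments so the sketch runs end to end."""
    yield TranscriptSegment("clinician", "Let's start you on hydroxyzine at night.")
    yield TranscriptSegment("patient", "Okay, that sounds fine to me.")

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call that drafts an alert for clinician review."""
    return f"DRAFT ALERT (for clinician review): {prompt}"

def monitor_consultation(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Listen, capture, and surface real-time alerts as the conversation unfolds."""
    for segment in transcribe_stream(audio_chunks):
        words = {w.strip(".,!?").lower() for w in segment.text.split()}
        mentioned = words & CONFUSABLE_MEDS
        if mentioned:
            yield ask_llm(
                f"The {segment.speaker} mentioned {', '.join(sorted(mentioned))}, "
                "which is easily confused with a similar-sounding drug. "
                "Flag it for verification before the note is finalized."
            )

if __name__ == "__main__":
    for alert in monitor_consultation(audio_chunks=[]):
        print(alert)
```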
When it works, it’s a clear win. But the risks of premature deployment remain high. Initial error rates for medical speech recognition can be as high as 7.4%, though this drops to 0.3% with human correction.
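For context, error rates like these are usually word error rates: the word-level edit distance between what was said and what was transcribed, divided by the number of words spoken. Whether the studies behind these specific figures use exactly this formulation is an assumption; the sketch below simply shows the standard calculation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    # One substituted word in a ten-word instruction is a 10% error rate.
    print(word_error_rate(
        "start the patient on hydroxyzine twenty five milligrams at night",
        "start the patient on hydralazine twenty five milligrams at night",
    ))  # 0.1
```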
That still leaves room for dangerous missteps. The Joint Commission has already cited speech recognition as a contributing factor in patient safety incidents, particularly around medications.
We also know the tech performs inconsistently across different contexts. Accents, background noise, and complex medical vocabulary can trip up even the most advanced models. And many of the systems currently on the market haven’t been trained with the diversity of real-world healthcare in mind.
At Speechmatics, we’ve taken that balance seriously, and we have the data to prove it. John Nosta’s quadrant provides the perfect frame for what’s at stake in healthcare voice AI: speed without accuracy is dangerous, and accuracy without speed is unusable. The graph below shows exactly how we measure up.
Figure: the latency/accuracy trade-off for several providers on the Kincaid test set (2 hours of audio). Note the logarithmic x-axis, spanning 0.35 s to 20 s.
Built from real-time benchmarks, it shows that Speechmatics doesn’t just fit the quadrant; we outperform competitors across both axes. It’s a clear validation that the “clinical sweet spot” isn’t theoretical. It’s achievable – and we’re already there. We’re also at the forefront of the movement toward foundational models that can be fine-tuned with specialty-specific language, and we offer features like industry-leading speaker diarization, which makes conversations more usable.
This is why Nosta’s quadrant matters. It gives healthcare leaders, product teams, and policymakers a shared vocabulary for making difficult choices. It acknowledges a reality we too often ignore: that faster isn’t always better, and slower isn’t always safer. The only systems we should be deploying are the ones that get both right.
In medicine, words matter. They aren’t just communication. They’re diagnosis, treatment, and care. As voice AI systems increasingly mediate those words, the difference between fast and wrong, and fast and right, becomes more than technical. It becomes clinical.
We shouldn’t accept anything less than the sweet spot.