We're delighted to announce that we've once again pushed the boundaries of speech-to-text technology. Our commitment to advancing speech to text for the Romanian language has yielded significant results with the latest upgrade from v5 to v6 of our model.
With substantial improvements in accuracy, the v6 model stands as a testament to our continuous endeavor to provide the best in-class solution for our users.
A Deeper Dive into the Results:
The most impressive metric showcasing our model's evolution is the Word Error Rate (WER). For those new to the world of speech recognition, WER is a standard metric used to measure the performance of a speech-to-text conversion. It calculates the ratio of incorrect words (substitutions, insertions, deletions) to the total number of words spoken. A lower WER indicates better accuracy. For instance, if WER is 0.1, it implies a 90% accuracy rate.
Let’s delve into the numbers:
Overall WER improvement from v5 to v6: +8%
- v5 WER: 0.064488
- v6 WER: 0.059555
This signifies that we've successfully consolidated our transcription model for the Romanian language, consistently achieving an impressive 95% accuracy across diverse datasets and challenging audio types.
Spotlight on Specific Improvements:
Phone Calls: one of the most dynamic environments for speech-to-text technology is in phone call transcriptions. Varied clarity, different accents, and background noises can present challenges. Our v6 model proudly showcases a substantial 20% reduction in error rates.
- v5 WER for Phone Calls: 0.07226 (implying 92.77% accuracy)
- v6 WER for Phone Calls: 0.05806 (implying 94.19% accuracy)
Legal Audios: in a sector where precision is paramount, our model has achieved a 17% improvement in error rate when transcribing legal documentation.
- v5 WER for Legal Audios: 0.07166 (implying 92.83% accuracy)
- v6 WER for Legal Audios: 0.05984 (implying 94.01% accuracy)
How Can One Calculate Accuracy Based on WER?
To put it simply, Accuracy can be calculated as:
Accuracy = (1 − WER) × 100
So, if you have a WER of 0.1 (or 10%), the accuracy of the speech-to-text model would be:
Accuracy = (1 − 0.1) × 100 = 90
Robust Evaluation Using Extensive Datasets
To ensure the effectiveness of our upgrades, we employed 50 datasets for evaluation. These datasets comprised a whopping 100,000 data samples, guaranteeing a comprehensive and exhaustive assessment. Such thorough testing not only validates our results but also provides users with the assurance that our improvements are genuinely beneficial in real-world scenarios.
At Vatis Tech, we're driven by the desire to innovate and refine our solutions. Our Romanian speech-to-text solution's latest upgrade is a clear manifestation of this commitment.
With the v6 model now available, we invite you to experience its heightened accuracy firsthand. Stay tuned for more advancements in the near future!