Adrian Ispas

June 27, 2025

Vatis Tech Takes a Leap Forward: Announcing Our v6 Transcription Model

We're delighted to announce that we've once again pushed the boundaries of speech-to-text technology. Our commitment to advancing speech-to-text for the Romanian language has yielded significant results with the latest upgrade of our model from v5 to v6.

With substantial improvements in accuracy, the v6 model reflects our continued effort to provide a best-in-class solution for our users.

A Deeper Dive into the Results

The clearest measure of our model's evolution is the Word Error Rate (WER). For those new to the world of speech recognition, WER is the standard metric for measuring speech-to-text performance. It is the number of word errors (substitutions, insertions, and deletions) divided by the total number of words in the reference transcript. A lower WER indicates better accuracy. For instance, a WER of 0.1 implies a 90% accuracy rate.
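As a rough illustration of the definition above, here is a minimal sketch of a WER calculation using word-level edit distance. It assumes whitespace tokenization and exact, case-sensitive word matching; the function name `word_error_rate` and the sample sentences are hypothetical and are not part of any Vatis Tech API. Real evaluations usually normalize casing and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of ten reference words.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog today"
print(word_error_rate(reference, hypothesis))  # 0.1
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 1.0 on very poor transcripts.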

Let’s delve into the numbers

Overall relative WER reduction from v5 to v6: about 8% (7.6% before rounding)

v5 WER: 0.064488
v6 WER: 0.059555

This shows that our transcription model for the Romanian language has matured: the overall v6 WER corresponds to roughly 94% accuracy (1 − 0.059555 ≈ 0.94) across diverse datasets and challenging audio types.

Spotlight on Specific Improvements

Phone Calls: phone call transcription is one of the most demanding environments for speech-to-text technology. Varying audio clarity, different accents, and background noise all present challenges. Here the v6 model reduces the error rate by about 20% relative to v5.

v5 WER for Phone Calls: 0.07226 (implying 92.77% accuracy)
v6 WER for Phone Calls: 0.05806 (implying 94.19% accuracy)

Phone Calls WER for v6 Transcription Model

Legal Audios: in a sector where precision is paramount, our model has achieved a relative error rate reduction of roughly 17% (16.5% before rounding) when transcribing legal audio.

v5 WER for Legal Audios: 0.07166 (implying 92.83% accuracy)
v6 WER for Legal Audios: 0.05984 (implying 94.02% accuracy)

Legal Audios WER for v6 Transcription Model
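The improvement percentages quoted in this post are relative reductions in WER, not differences in percentage points. The sketch below re-derives them from the v5 and v6 values reported above; the helper name `relative_wer_reduction` and the dictionary layout are illustrative assumptions, not part of any published tooling.

```python
def relative_wer_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which the WER dropped, relative to the old WER."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


# (v5 WER, v6 WER) pairs as reported in this post.
reported = {
    "Overall": (0.064488, 0.059555),
    "Phone Calls": (0.07226, 0.05806),
    "Legal Audios": (0.07166, 0.05984),
}

for name, (v5_wer, v6_wer) in reported.items():
    print(f"{name}: {relative_wer_reduction(v5_wer, v6_wer):.1%}")
# Overall: 7.6%
# Phone Calls: 19.7%
# Legal Audios: 16.5%
```

Rounded to whole numbers, these come out at the roughly 8%, 20%, and 17% figures given in the text.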

How Can One Calculate Accuracy Based on WER?

Put simply, accuracy can be calculated as:

Accuracy = (1 − WER) × 100%

So, if you have a WER of 0.1 (or 10%), the accuracy of the speech-to-text model would be:

Accuracy = (1 − 0.1) × 100% = 90%
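The same conversion can be sketched in a few lines of Python and applied to the v6 figures reported in this post. The function name `accuracy_from_wer` is an illustrative assumption.

```python
def accuracy_from_wer(wer: float) -> float:
    """Convert a WER given as a fraction (e.g. 0.1) to an accuracy percentage."""
    return (1.0 - wer) * 100.0


print(f"{accuracy_from_wer(0.1):.2f}%")       # 90.00%
print(f"{accuracy_from_wer(0.059555):.2f}%")  # 94.04% (overall, v6)
print(f"{accuracy_from_wer(0.05806):.2f}%")   # 94.19% (phone calls, v6)
print(f"{accuracy_from_wer(0.05984):.2f}%")   # 94.02% (legal audios, v6)
```

Note that this conversion is only meaningful for WER values between 0 and 1; a WER above 1.0, which can happen when a transcript contains many inserted words, would yield a negative number.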

Robust Evaluation Using Extensive Datasets

To validate the upgrade, we evaluated both models on 50 datasets comprising 100,000 data samples. Testing at this scale gives us, and our users, confidence that the measured improvements carry over to real-world scenarios.

Wrapping Up

At Vatis Tech, we're driven by the desire to innovate and refine our solutions. Our Romanian speech-to-text solution's latest upgrade is a clear manifestation of this commitment.

We extend our gratitude to our dedicated team, our partners, and most importantly, our users, who continually motivate us to strive for excellence.

With the v6 model now available, we invite you to experience its heightened accuracy firsthand. Stay tuned for more advancements in the near future!
