Speech to text API

Integrate Vatis Tech's Speech-to-Text APIs into your application in minutes with a single API call and easy-to-follow API docs.

import requests

url = "https://vatis.tech/api/v1/files/transcribe/file"

# Language of the audio to transcribe.
payload = {
    "language": "ro_RO",
}

headers = {
    "Authorization": "Bearer *your_api_key_here*",
}

# Open the media file in binary mode; the context manager closes it after the upload.
with open("/path/to/your_file.mp4", "rb") as media:
    files = [("file", media)]
    response = requests.post(url, headers=headers, data=payload, files=files)

print(response.text)

What makes Vatis Tech Speech-to-Text API a compelling choice for development?

Over 90% Accuracy 

Our Automatic Speech Recognition model consistently achieves a speech-to-text accuracy exceeding 90%, and approaches an impressive 99% when transcribing high-quality audio—reaching a level of accuracy comparable to human transcription.
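The page does not state how accuracy is measured. A common convention, assumed here, is accuracy = 1 - word error rate (WER), where WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. A minimal sketch under that assumption (the function name is illustrative and not part of the Vatis Tech API):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "i want to sign up for the wi-fi service"
hypothesis = "i want to sign up for a wi-fi service"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}, accuracy: {1 - wer:.2%}")
```

Running the same function over your own reference transcripts is one way to check an accuracy figure against your audio conditions.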

Highly Scalable 

We are able to handle large volumes of audio data. You can test the infrastructure’s accuracy and scalability by performing extensive load testing and simultaneously sending thousands of files for transcription.
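A load test of this kind could be driven by a small thread pool. The sketch below is an assumption about how a client might do it, not an official client: `transcribe_many` and `fake_upload` are hypothetical names, and `fake_upload` stands in for the real HTTP upload so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    upload: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run `upload` for every path in parallel and map each path to its result.

    A failed upload is recorded as an "error: ..." string instead of
    aborting the whole batch.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one file fails
                results[path] = f"error: {exc}"
    return results


def fake_upload(path: str) -> str:
    # Stand-in for the real HTTP call (hypothetical, for illustration only).
    return f"transcript of {path}"


if __name__ == "__main__":
    batch = [f"/path/to/call_{n}.mp3" for n in range(100)]
    done = transcribe_many(batch, fake_upload)
    print(len(done), "files processed")
```

To send real requests, replace `fake_upload` with a function that posts one file to the transcription endpoint and returns the response text.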

Unbeatable pricing

Top-tier automatic speech-to-text services at a very competitive price. We make no compromises on quality or turnaround time.

Easy-to-follow API docs

Streamlined implementation is facilitated by our comprehensive documentation, enabling you to be operational in a matter of hours.

Trustworthy Service

Experience peace of mind with our service's exceptional reliability, ensuring a constant 99.9% uptime.

Security & Confidentiality

Your files are encrypted and protected from unauthorized access. Only you have the encryption key. We use bank-grade security and have strict data storage policies to keep your files safe.

Flexible Deployment

Deploy in the cloud or on premises, with your data fully secured. Your model will be accessible through our API platform, integrating seamlessly into your application within minutes through a single API call.

Customisable Models

Tailored speech-to-text models can achieve an accuracy improvement ranging from 10% to 20%, outperforming generic models. 

Explore the flexibility of our highly scalable speech-to-text API with impactful applications spanning key business sectors, including but not limited to:

Transcription Features

Real-Time Transcription

Provides immediate access to accurate transcriptions, which is particularly useful for live events, meetings, conferences, and customer service interactions.

Async Transcription

Transcribe pre-recorded audio or video files with high accuracy using our highly scalable infrastructure.

Multiple Formats Supported

You don't have to worry about file format or sampling rates. We can handle anything — from MP3 to FLAC, MP4 to MKV, and everything in between.

Easy Exporting

When it's time to wrap up, effortlessly export your polished transcript into PDF, DOCX, TXT, or SRT formats.
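For readers unfamiliar with SRT, the sketch below shows what that export format looks like: numbered cues, each with a start and end timestamp in HH:MM:SS,mmm form. The segment tuples and timings are illustrative assumptions, not the shape of the Vatis Tech response.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)


# Illustrative timings only.
segments = [
    (0.0, 6.5, "Good afternoon, my name is Aisha."),
    (7.0, 10.25, "Hi Aisha. Well, I want to sign up for the Wi-Fi service."),
]
print(to_srt(segments))
```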

Punctuation & Capitalization

Identify a wide range of entities like people and company names, dates, or locations from your audio files.

Numeral Conversion

Automatically convert numbers written out as words into digits.
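To illustrate the idea, here is a toy converter for English number phrases. It is a sketch of the general technique, not the Vatis Tech implementation, which is not described on this page; `words_to_number` is a hypothetical helper.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(phrase: str) -> int:
    """Convert an English number phrase below one million to an integer."""
    total = 0    # completed thousands
    current = 0  # value being built below the next multiplier
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("nineteen hundred sixty-eight"))
```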

Find and Replace

The "Find and Replace" function allows you to replace specific text expressions with other terms. This feature seamlessly operates in both async and live transcription modes.
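The effect of such rules can be sketched on the client side with a regular expression. The rule format below (a plain mapping from expression to replacement) is an assumption made for illustration; the page does not specify how rules are passed to the API.

```python
import re
from typing import Dict


def apply_replacements(text: str, rules: Dict[str, str]) -> str:
    """Replace each whole-word, case-insensitive match of a rule key with its value."""
    if not rules:
        return text
    # Longest keys first so a longer expression wins over its own prefix.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in rules.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


# Hypothetical rules for illustration.
rules = {"wi fi": "Wi-Fi", "vatis tek": "Vatis Tech"}
print(apply_replacements("I want the wi fi service from Vatis Tek.", rules))
```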

Word Timings

Every word in the transcript has an associated timestamp, so you can quickly find what you need.

Profanity Filtering

Swearing is rarely appropriate, so we automatically filter bad words out of your transcript. And yes, you can turn it off if needed.
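As an illustration of the behavior described, including the off switch, a filter can be sketched as a blocklist mask. The blocklist, the masking style, and the `enabled` flag are assumptions for this example and do not reflect how the service implements filtering.

```python
import re
from typing import Iterable


def mask_profanity(text: str, blocklist: Iterable[str], enabled: bool = True) -> str:
    """Mask blocklisted words, keeping the first letter; return text unchanged if disabled."""
    words = [re.escape(word) for word in blocklist]
    if not enabled or not words:
        return text
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", flags=re.IGNORECASE)
    return pattern.sub(
        lambda match: match.group(0)[0] + "*" * (len(match.group(0)) - 1), text
    )


print(mask_profanity("Well, darn it, the router broke.", ["darn"]))
```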

Speaker Diarization 

Speaker diarization distinguishes and segments the different speakers in an audio recording. The goal is to identify "who spoke when" in a given audio file, assigning each speaker a distinct label.
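Once each word carries a speaker label, building a readable dialogue is a matter of merging consecutive words from the same speaker. The sketch below assumes a hypothetical list of (speaker, word) pairs; the actual response format is not shown on this page.

```python
from itertools import groupby
from typing import List, Tuple

LabeledWord = Tuple[str, str]  # (speaker label, word)


def group_into_turns(words: List[LabeledWord]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda item: item[0]):
        turns.append((speaker, " ".join(word for _, word in group)))
    return turns


words = [
    ("Speaker 1", "What's"), ("Speaker 1", "your"), ("Speaker 1", "name?"),
    ("Speaker 2", "George"), ("Speaker 2", "Washington."),
    ("Speaker 1", "Thank"), ("Speaker 1", "you."),
]
for speaker, text in group_into_turns(words):
    print(f"{speaker}: {text}")
```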

Multiple Channels

Multichannel transcription handles audio with multiple tracks, each representing a distinct speaker or source. The result is an accurate representation of the dialogue from each source within the recording.

Confidence Scores

We like transparency — we show a confidence score of our algorithms for each word in the transcript.
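One practical use of per-word confidence is flagging words for human review. The sketch below assumes scores between 0.0 and 1.0 and a hypothetical list of (word, score) pairs; the scale, the sample scores, and the 0.8 threshold are illustrative choices, not values taken from the API.

```python
from typing import List, Tuple

ScoredWord = Tuple[str, float]  # (word, confidence between 0.0 and 1.0)


def flag_low_confidence(words: List[ScoredWord], threshold: float = 0.8) -> str:
    """Return the transcript with words below `threshold` wrapped in [brackets]."""
    return " ".join(
        f"[{word}]" if score < threshold else word for word, score in words
    )


# Illustrative scores only.
words = [("Thank", 0.99), ("you", 0.98), ("for", 0.97), ("calling", 0.95), ("ISIN", 0.41)]
print(flag_low_confidence(words))
```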

COMING SOON

Automatic Translation

Vatis Tech lets you translate your audio into multiple languages. Integrate translation into your application with a single API call. Explore our translation feature for free in the Vatis Tech Transcription Software, no coding necessary.

Translation can be activated when transcribing either a pre-recorded file or a real-time stream, whether through Vatis Tech SaaS or an on-premises deployment.

If you're new to Vatis Tech, refer to our documentation on Transcribing a File. Once set up, incorporate the following configuration to enable translation. Set the languages you need in "target_languages".

{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"]
  }
}
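As a sketch, the translation configuration shown above can be assembled in Python and serialized to JSON. The field names are copied from that configuration; the helper `build_translation_request` is hypothetical, and because the feature is marked as coming soon, the final request shape may differ.

```python
import json
from typing import List


def build_translation_request(language: str, target_languages: List[str]) -> str:
    """Serialize a transcription request that also asks for translations."""
    if not target_languages:
        raise ValueError("at least one target language is required")
    config = {
        "type": "transcription",
        "transcription_config": {
            "operating_point": "enhanced",
            "language": language,
        },
        "translation_config": {
            "target_languages": target_languages,
        },
    }
    return json.dumps(config, indent=2)


print(build_translation_request("en", ["es", "de"]))
```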

COMING SOON

Audio Intelligence Features

Summarization 

Transcript summarization is the process of creating a brief and coherent summary that captures the main ideas and key concepts of a lengthy transcript, enabling quick understanding or reference.

Sentiment Analysis 

Sentiment analysis identifies whether the emotions conveyed in the transcript are positive, negative, or neutral, offering insights for applications like customer feedback, social media monitoring, and content evaluation.

PII Redaction

The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
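To make the idea concrete, the sketch below redacts one kind of PII, US-style phone numbers, with a regular expression. This is a deliberately simple stand-in and an assumption for illustration: the PII Redaction feature is described as a model, and nothing here reflects how it works internally.

```python
import re

# Matches US-style phone numbers such as 414-263-0157 (an illustrative pattern only).
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def redact_phone_numbers(text: str, placeholder: str = "[PHONE]") -> str:
    """Replace every phone number matched by PHONE with a placeholder."""
    return PHONE.sub(placeholder, text)


print(redact_phone_numbers("My number is 414-263-0157."))
```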

Topic Detection 

The Topic Detection model lets you identify and label different topics in your audio or video files.

Ask Anything 

The "Ask Anything" function allows users to pose a wide range of questions related to their files, and the AI Assistant will provide informative responses based on its knowledge.

Auto Chapters

"Auto Chapters" automatically divides audio files into sections by detecting content changes or pauses, making it easy to skip between sections.

See For Yourself

00:00:00

Speaker 1

Good afternoon, my name is Aisha. Thank you for calling for ISIN today. How may I be of assistance?

00:00:07

Speaker 2

Hi Aisha. Well, I want to sign up for the Wi-Fi service.

00:00:05

Speaker 1

Sure, we can definitely sign you up. What's your phone number?

00:00:21

Speaker 2

414-263-0157.

00:00:25

Speaker 1

Thank you very much. And can I have your full name?

00:00:30

Speaker 2

George Washington.

00:00:33

Speaker 1

Thank you, George Washington. What is your date of birth?

00:00:37

Speaker 2

It is July 1st, 1968.

00:00:46

Speaker 1

Thank you very much. And can you confirm the address that we have for you here in the system?

00:00:54

Speaker 2

The address is 2898 Atwood Terrace, Columbus, Ohio, 432-24

Person

Numeral Conversion

Date

Address

Punctuation

00:00:00

Speaker 1

Bună ziua, numele meu este Aisha. Vă mulțumesc că ați sunat astăzi la ISIN. Cu ce vă pot fi de folos?

00:00:07

Speaker 2

Bună, Aisha. Ei bine, vreau să mă abonez la serviciul de Wi-Fi.

00:00:05

Speaker 1

Bineînțeles, cu siguranță vă putem înscrie. Care este numărul dvs. de telefon?

00:00:21

Speaker 2

414-263-0157.

00:00:25

Speaker 1

Mulțumesc frumos. Și îmi puteți spune numele dvs. complet?

00:00:30

Speaker 2

George Washington.

00:00:33

Speaker 1

Mulțumesc, George Washington. Care este data dvs. de naștere?

00:00:37

Speaker 2

Este 1 iulie 1968.

00:00:46

Speaker 1

Mulțumesc foarte mult. Puteți confirma adresa pe care o avem în sistem?

00:00:54

Speaker 2

Adresa este 2898 Atwood Terrace, Columbus, Ohio, 432-24.

Languages & Formats

Supported Formats

Experience the Future of Speech Recognition Today

Try Vatis now, no credit card required.
