Use the benchmarking functionality of the Cloud Speech-to-Text Console to measure the accuracy of any of the transcription models used in the Speech-to-Text V2 API.
The Cloud Speech-to-Text Console provides visual benchmarking for pre-trained and Custom Speech-to-Text models. You can inspect recognition quality by comparing word error rate (WER) evaluation metrics across multiple transcription models to help you decide which model best fits your application.
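To make the metric concrete, here is a minimal, self-contained sketch of how a WER score is computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the ground truth. The Console computes this for you; the `wer` helper below is illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25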
Before you begin
Ensure that you have signed up for a Google Cloud account, created a project, trained a custom speech model, and deployed it to an endpoint.
Create a ground-truth dataset
To create a custom benchmarking dataset, gather audio samples that accurately reflect the type of traffic the transcription model will encounter in a production environment. The aggregate duration of these audio files should ideally span a minimum of 30 minutes and not exceed 10 hours. To assemble the dataset, you will need to:
- Create a directory in a Cloud Storage bucket of your choice to store the audio and text files for the dataset.
- For every audio file in the dataset, create a reasonably accurate transcription. Each audio file (such as example_audio_1.wav) must have a corresponding ground-truth text file (example_audio_1.txt). The service uses these audio-text pairs in the Cloud Storage bucket to assemble the dataset, as sketched after this list.
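The following Python sketch shows one way to stage such a dataset, assuming the google-cloud-storage client library and authenticated application-default credentials. The bucket name, dataset prefix, and local directory are placeholders.

```python
# Sketch: upload paired audio/transcript files into one dataset directory in
# Cloud Storage. Bucket, prefix, and local directory names are placeholders.
from pathlib import Path

from google.cloud import storage

BUCKET_NAME = "my-benchmark-bucket"    # placeholder bucket name
DATASET_PREFIX = "benchmark-dataset/"  # directory within the bucket

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

for audio_path in Path("local_dataset").glob("*.wav"):
    transcript_path = audio_path.with_suffix(".txt")
    if not transcript_path.exists():
        raise FileNotFoundError(f"Missing ground truth for {audio_path.name}")
    # Upload each audio file next to its same-named .txt transcript.
    for path in (audio_path, transcript_path):
        bucket.blob(DATASET_PREFIX + path.name).upload_from_filename(str(path))
```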
Benchmark the model
To assess the accuracy of your Custom Speech-to-Text model against your benchmarking dataset, follow the Measure and improve accuracy guide.
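Before running the Console benchmark, it can help to sanity-check a single audio-text pair programmatically. The sketch below assumes the google-cloud-speech client library and reuses the wer helper defined earlier; it transcribes one short sample with two pre-trained Speech-to-Text V2 models and prints a WER for each. PROJECT_ID, the model IDs, and the file names are placeholders, and model availability varies by language and region.

```python
# Sketch: compare WER across two pre-trained V2 models for one sample.
# Assumes the wer() helper defined earlier in this page is in scope.
from google.cloud import speech_v2
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"  # placeholder project ID

client = speech_v2.SpeechClient()

with open("example_audio_1.wav", "rb") as f:
    audio_bytes = f.read()  # synchronous recognize handles short audio only
with open("example_audio_1.txt") as f:
    ground_truth = f.read()

for model in ("long", "short"):  # placeholder pre-trained model IDs
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model=model,
    )
    response = client.recognize(
        request=cloud_speech.RecognizeRequest(
            # "_" uses the default recognizer with an inline config.
            recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
            config=config,
            content=audio_bytes,
        )
    )
    hypothesis = " ".join(
        result.alternatives[0].transcript for result in response.results
    )
    print(f"{model}: WER = {wer(ground_truth, hypothesis):.3f}")
```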