Migrate from classic to Conformer models

To improve speech-recognition accuracy and performance, the Cloud Speech-to-Text V1 API is updating its remaining classic speech models to state-of-the-art Conformer-based models in a way that doesn't break API functionality. Classic models are those exposed in the V1 API under the command_and_search, default, phone_call, and video model identifiers. They are based on discrete acoustic and language models and have supported selected Speech-to-Text API use cases.

Since the Conformer architecture was introduced at Google Brain in 2020, we have tested Conformer-based models and gradually replaced our Speech-to-Text V1 API models with them. Doing so has increased in-domain accuracy, robustness, and performance across a range of use cases. This page explains how you can benefit from the migration and how you can opt in to migrate earlier or opt out to migrate later, depending on your needs.

What is changing

After the migration deadline, we will start routing traffic away from the currently exposed models. These model identifiers will remain valid and continue to serve traffic, because the redirection happens internally.

The following table shows how traffic is routed once the migration takes effect. Redirection happens only between model identifiers that are already exposed. Code changes aren't required, but you can update your code and test the model behavior at your own pace.

| BCP-47 code | Current model identifier | single_utterance | Model identifier traffic is directed to |
|---|---|---|---|
| en-US | command_and_search | false | latest_long |
| | command_and_search | true | latest_short |
| | default | false | telephony |
| | phone_call | false | telephony |
| | phone_call(use_enhanced=true) | true | telephony_short |
| | video | false | telephony |
| de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | command_and_search | false | latest_long |
| | command_and_search | true | latest_short |
| | default | false | latest_long |
| | phone_call(use_enhanced=true) | true | latest_short |
| | phone_call | false | latest_long |
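
To check which identifier a given request will be routed to, the table can be encoded as a lookup. The following is a minimal sketch, not part of the Speech-to-Text API: the function name `target_model` and the data layout are hypothetical, and combinations that the table does not list return `None` rather than a guess.

```python
from typing import Optional

# BCP-47 codes that share the second group of routing rules in the table.
OTHER_LANGUAGES = frozenset({
    "de-DE", "en-AU", "en-GB", "en-IN", "es-ES", "es-US",
    "fr-CA", "fr-FR", "it-IT", "ja-JP", "nl-NL", "pt-BR",
})

# Keys are (current model identifier, single_utterance).
EN_US_ROUTES = {
    ("command_and_search", False): "latest_long",
    ("command_and_search", True): "latest_short",
    ("default", False): "telephony",
    ("phone_call", False): "telephony",
    ("phone_call", True): "telephony_short",  # listed with use_enhanced=true
    ("video", False): "telephony",
}

OTHER_ROUTES = {
    ("command_and_search", False): "latest_long",
    ("command_and_search", True): "latest_short",
    ("default", False): "latest_long",
    ("phone_call", False): "latest_long",
    ("phone_call", True): "latest_short",  # listed with use_enhanced=true
}


def target_model(
    language_code: str,
    model: str,
    single_utterance: bool = False,
    use_enhanced: bool = False,
) -> Optional[str]:
    """Return the identifier traffic is directed to, or None if not listed."""
    if language_code == "en-US":
        routes = EN_US_ROUTES
    elif language_code in OTHER_LANGUAGES:
        routes = OTHER_ROUTES
    else:
        return None
    # The table lists phone_call with single_utterance=true only together
    # with use_enhanced=true, so other variants are treated as not listed.
    if model == "phone_call" and single_utterance and not use_enhanced:
        return None
    return routes.get((model, single_utterance))


print(target_model("en-US", "video"))
print(target_model("ja-JP", "command_and_search", single_utterance=True))
```

Returning `None` for unlisted combinations keeps the sketch from implying routing behavior that the table does not document.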

Timeline

You have three migration options, outlined on this page. Starting in January 2024, we gradually shift traffic from the classic models to the Conformer-based ones project by project, and we contact each project individually before its migration. By June 2024, we expect all traffic to be served by the Conformer models. Requests that still specify a classic model are automatically rerouted to the corresponding Conformer-based model.

Migration mechanism

Customers can opt in earlier or opt out and migrate later by following these instructions:

Preferred: Opt in and migrate earlier

If you want to opt in proactively, replace the model identifier that you have been using in the Speech-to-Text V1 API with the updated one, as indicated in the preceding table. Migrating your project proactively gives you time to test the models and lets you take advantage of the improved accuracy and robustness earlier.
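
As an illustration of how small the change is, the sketch below builds a JSON body for the V1 REST `speech:recognize` method and swaps only the `model` field. The helper names and the Cloud Storage URI are hypothetical placeholders. It also assumes a non-streaming request is treated like the `single_utterance` = false rows of the table; verify that against your own traffic before relying on it.

```python
import json


def build_recognize_request(language_code: str, model: str, audio_uri: str) -> dict:
    """Build a request body for the V1 speech:recognize REST method."""
    return {
        "config": {"languageCode": language_code, "model": model},
        "audio": {"uri": audio_uri},
    }


def with_model(request: dict, new_model: str) -> dict:
    """Return a copy of the request in which only the model identifier differs."""
    updated = dict(request)
    updated["config"] = {**request["config"], "model": new_model}
    return updated


# Hypothetical bucket and object name, for illustration only.
old_request = build_recognize_request(
    "en-US", "video", "gs://example-bucket/audio.wav"
)
# For en-US, the table directs the video identifier to telephony.
new_request = with_model(old_request, "telephony")

print(json.dumps(new_request, indent=2))
```

Keeping the original request unchanged makes it straightforward to send the same audio with both identifiers and compare the transcripts during testing.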

Opt out and migrate later

If you find any issues with the updated models and would like to temporarily opt out of the migration, create a Google Cloud support case. When creating the support case, use the title "Opt out from Speech-to-Text conformer migration" and provide your project IDs and the reason for opting out.