To improve transcription and speech-recognition accuracy and performance, the Cloud Speech-to-Text V1 API is updating the remaining classic speech models to state-of-the-art Conformer-based models in a way that doesn't break API functionality. Classic models are those exposed in the V1 API under the command_and_search, default, phone_call, and video model flags. They are based on discrete acoustic and language models and have supported selected Speech-to-Text API use cases.
Since the inception of the Conformer architecture in 2020 at Google Brain, we have tested our solution and gradually replaced our Speech-to-Text V1 API models. Doing so has increased in-domain accuracy, robustness, and performance across a range of use cases. This page explains how you can benefit from the migration and how to opt in to migrate earlier or opt out and migrate later, depending on your needs.
What is changing
After the migration deadline, we will start routing traffic away from the currently exposed classic models. Requests that specify a classic model identifier remain valid and continue to be served, because the redirection happens internally.
The following table shows the routing that applies when the migration takes effect. Traffic is redirected only between model identifiers that are already exposed. While not required, you can make code changes and test the model behavior at your own pace.
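BCP-47 code | Current model identifier | single_utterance | Model identifier traffic is directed to |
---|---|---|---|
en-US | command_and_search | false | latest_long |
en-US | command_and_search | true | latest_short |
en-US | default | false | telephony |
en-US | phone_call | false | telephony |
en-US | phone_call(use_enhanced=true) | true | telephony_short |
en-US | video | false | telephony |
de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | command_and_search | false | latest_long |
de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | command_and_search | true | latest_short |
de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | default | false | latest_long |
de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | phone_call(use_enhanced=true) | true | latest_short |
de-DE, en-AU, en-GB, en-IN, es-ES, es-US, fr-CA, fr-FR, it-IT, ja-JP, nl-NL, pt-BR | phone_call | false | latest_long |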
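To check which model a given request will be routed to, the routing table can be transcribed into a small lookup. The following is a minimal sketch, not part of the API: the function name `routed_model` and the data layout are hypothetical, and the model identifier is passed exactly as written in the table (including `phone_call(use_enhanced=true)`). Combinations that the table does not list return `None` rather than a guess.

```python
from typing import Optional

# Language codes that share the second block of the routing table.
OTHER_LOCALES = frozenset({
    "de-DE", "en-AU", "en-GB", "en-IN", "es-ES", "es-US",
    "fr-CA", "fr-FR", "it-IT", "ja-JP", "nl-NL", "pt-BR",
})

# (identifier as written in the table, single_utterance) -> target model
EN_US_ROUTES = {
    ("command_and_search", False): "latest_long",
    ("command_and_search", True): "latest_short",
    ("default", False): "telephony",
    ("phone_call", False): "telephony",
    ("phone_call(use_enhanced=true)", True): "telephony_short",
    ("video", False): "telephony",
}

OTHER_ROUTES = {
    ("command_and_search", False): "latest_long",
    ("command_and_search", True): "latest_short",
    ("default", False): "latest_long",
    ("phone_call(use_enhanced=true)", True): "latest_short",
    ("phone_call", False): "latest_long",
}


def routed_model(language_code: str, model: str,
                 single_utterance: bool = False) -> Optional[str]:
    """Return the target model from the table, or None if the row is not listed."""
    if language_code == "en-US":
        routes = EN_US_ROUTES
    elif language_code in OTHER_LOCALES:
        routes = OTHER_ROUTES
    else:
        return None  # language not covered by the table
    return routes.get((model, single_utterance))
```

For example, `routed_model("en-US", "default")` looks up the en-US `default` row, while `routed_model("ja-JP", "video")` returns `None` because the table has no `video` row for that language group.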
Timeline
You have three migration options, outlined on this page. In January 2024, we begin gradually shifting traffic from the classic models to the Conformer-based ones project by project, with individual communication prior to the migration. By June 2024, we expect to shift all traffic to only the Conformer models. Anyone still requesting the classic models will automatically get rerouted to the corresponding Conformer-based models.
Migration mechanism
Customers can opt in earlier or opt out and migrate later by following these instructions:
Preferred: Opt in and migrate earlier
If you want to opt in proactively, replace the model identifier that you have been using in the Speech-to-Text V1 API with the updated one, as indicated in the preceding table. Migrating your project proactively gives you time to test the models and take advantage of the improved accuracy and robustness earlier.
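As an illustration of the opt-in change, the sketch below swaps the model in a JSON request body of the kind sent to the V1 REST `speech:recognize` method, where the model is set in `config.model`. The helper name `with_model` and the Cloud Storage URI are hypothetical; the example only builds the request body and does not call the API.

```python
import copy


def with_model(request_body: dict, new_model: str) -> dict:
    """Return a copy of a recognize request body with config.model replaced."""
    updated = copy.deepcopy(request_body)  # leave the caller's request untouched
    updated.setdefault("config", {})["model"] = new_model
    return updated


# A request that still names a classic model.
classic_request = {
    "config": {"languageCode": "en-US", "model": "command_and_search"},
    "audio": {"uri": "gs://example-bucket/audio.flac"},  # hypothetical URI
}

# Per the routing table, en-US command_and_search without single_utterance
# is directed to latest_long, so opting in means requesting it directly.
migrated_request = with_model(classic_request, "latest_long")
```

Keeping the original request unchanged makes it straightforward to send both versions against the same audio and compare transcripts before switching production traffic.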
Opt out and migrate later
If you find any issues with the updated models and would like to opt out from the migration temporarily, create a Google Cloud support case. When creating the support case, use the title "Opt out from Speech-to-Text conformer migration" and provide your project IDs and the reason for opting out.