Vertex AI pricing

Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

This page covers pricing for Generative AI on Vertex AI. For all other Vertex AI pricing, including ML Platform and MLOps services, refer to the Vertex AI pricing page.

Google models

Gemini

With the multimodal models in Vertex AI, you can input either text or media (images, video, audio). Text is charged per 1,000 characters of input (prompt) and per 1,000 characters of output (response). Characters are counted as UTF-8 code points, excluding white space, which works out to approximately 4 characters per token. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent. Media input is charged per image or per second (video and audio). If your request fails with a 400 or 500 error, you won't be charged for the tokens used.
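The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the billing implementation: the function names are hypothetical, and treating `str.isspace()` as the definition of "white space" is an assumption.

```python
def billable_characters(text: str) -> int:
    """Count code points in `text`, skipping white space.

    Iterating over a Python str yields Unicode code points. Using
    str.isspace() to decide what counts as white space is an assumption.
    """
    return sum(1 for ch in text if not ch.isspace())


def estimated_tokens(text: str, chars_per_token: float = 4.0) -> float:
    """Rough token estimate based on the ~4 characters per token figure."""
    return billable_characters(text) / chars_per_token


prompt = "Summarize the  attached\treport in two sentences.\n"
print(billable_characters(prompt))  # 41
print(estimated_tokens(prompt))     # 10.25
```

The token estimate is only a rule of thumb; actual token counts depend on the model's tokenizer.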

Model	Feature	Type	Price (<= 128K input tokens)	Price (> 128K input tokens)
Gemini 1.5 Flash	Multimodal	Image Input	$0.00002 / image	$0.00004 / image
		Video Input	$0.00002 / second	$0.00004 / second
		Text Input	$0.00001875 / 1k characters	$0.0000375 / 1k characters
		Audio Input	$0.000002 / second	$0.000004 / second
		Text Output	$0.000075 / 1k characters	$0.00015 / 1k characters
	Tuning*	Training Token	$8 / M tokens
Gemini 1.5 Pro	Multimodal	Image Input	$0.00032875 / image	$0.0006575 / image
		Video Input	$0.00032875 / second	$0.0006575 / second
		Text Input	$0.0003125 / 1k characters	$0.000625 / 1k characters
		Audio Input	$0.00003125 / second	$0.0000625 / second
		Text Output	$0.00125 / 1k characters	$0.0025 / 1k characters
	Tuning*	Training Token	$80 / M tokens
Gemini 1.0 Pro	Multimodal	Image Input	$0.0025 / image
		Video Input	$0.002 / second
		Text Input	$0.000125 / 1k characters
		Text Output	$0.000375 / 1k characters
Grounding with Google Search	Text	Grounding requests	$35 / 1k requests (for up to 1M requests per day). Please contact your account team if you require more than 1M requests per day.

* If a query context is longer than 128K, all tokens are charged at long context rates.
* Gemini models are available in batch mode at 50% discount.
* Gemini 1.0 Pro supports only up to a 32K context window.
* PDFs are billed as image input, with one PDF page equivalent to one image.
* Tuned model endpoint has the same prediction price as the base model.
* Grounding with Google Search: If you are using dynamic retrieval to optimize costs, only requests that contain at least one grounding support URL from the web in their response are charged for Grounding with Google Search. Costs for Gemini always apply.
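As a rough illustration of how the long-context and batch notes combine for text, here is a minimal sketch. The rates are copied from the Gemini table; the dictionary keys, the function name, and the reading of "128K" as 128,000 tokens are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: "128K" is read as 128,000 tokens; the page does not say
# whether it means 128,000 or 131,072.
LONG_CONTEXT_TOKENS = 128_000
BATCH_MULTIPLIER = 0.5  # batch mode is listed at a 50% discount


@dataclass(frozen=True)
class TextRates:
    """USD per 1,000 characters, copied from the Gemini table."""
    input_short: float
    input_long: float
    output_short: float
    output_long: float


# The keys are illustrative labels, not official model IDs.
RATES = {
    "gemini-1.5-flash": TextRates(0.00001875, 0.0000375, 0.000075, 0.00015),
    "gemini-1.5-pro": TextRates(0.0003125, 0.000625, 0.00125, 0.0025),
}


def text_request_cost(model: str, input_chars: int, output_chars: int,
                      input_tokens: int, batch: bool = False) -> float:
    """Estimate the text cost of one request in USD."""
    rates = RATES[model]
    # If the context exceeds the threshold, input and output are both
    # charged at the long-context rates.
    long_context = input_tokens > LONG_CONTEXT_TOKENS
    in_rate = rates.input_long if long_context else rates.input_short
    out_rate = rates.output_long if long_context else rates.output_short
    cost = input_chars / 1000 * in_rate + output_chars / 1000 * out_rate
    return cost * BATCH_MULTIPLIER if batch else cost
```

For example, `text_request_cost("gemini-1.5-pro", 4000, 1000, input_tokens=1000)` evaluates to about $0.0025, and the same request costs twice as much once `input_tokens` exceeds the threshold. Media inputs, tuning, and grounding charges are not modeled here.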

Imagen

With Imagen on Vertex AI, you can generate novel images and edit images based on text prompts you provide, or edit only parts of images using a mask area you define along with a host of other capabilities.

Model	Feature	Description	Input	Output	Price
Imagen 3	Image generation	Generate an image Edit an image Customize an image	Text prompt	Image	$0.04 per image
Imagen 3 Fast	Image generation	Generate an image	Text prompt	Image	$0.02 per image
Imagen 2, Imagen	Image generation	Generate an image	Text prompt	Image	$0.020 per image
	Image editing	Edit an image using mask free or mask approach	Image/Text prompt	Image	$0.020 per image
	Upscaling	Increase resolution of a generated image to 2k and 4k	Image	Image	$0.003 per image
	Fine-tuning	Enable a "subject" provided by the user to be used in Imagen prompts (few-shot training)	Subject(s) with text identifier and 4-8 images per subject	Fine-tuned model (after training with user-provided subjects)	$ per node hour (Vertex AI custom training pricing)
	Visual Captioning	Generate a short or long text caption for an image	Image	Text caption	$0.0015/image
	Visual Q&A	Provide an answer based on a question referencing an image	Image/Text prompt	Text answer	$0.0015/image

Embedding

Model	Feature	Description	Input	Output	Price
multimodalembedding	Embeddings for Multimodal: Text	Generate embeddings using text as an input	Text	Embeddings	$0.0002 / 1k characters input
	Embeddings for Multimodal: Image	Generate embeddings using image as an input	Image	Embeddings	$0.0001 / image input
	Embeddings for Multimodal: Video Plus	Video Plus	Video	Embeddings (up to 15 embeddings per min of video)	$0.0020 per second of video
	Embeddings for Multimodal: Video Standard	Video Standard	Video	Embeddings (up to 8 embeddings per min of video)	$0.0010 per second of video
	Embeddings for Multimodal: Video Essential	Video Essential	Video	Embeddings (up to 4 embeddings per min of video)	$0.0005 per second of video
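To compare the three video embedding modes for a given clip, a small lookup is enough. This is a sketch only: the prices come from the table above, while the mode keys and the function name are illustrative and not API parameter values.

```python
# USD per second of video, copied from the table above.
# Keys are illustrative labels, not API parameter values.
VIDEO_EMBEDDING_PRICE_PER_SECOND = {
    "plus": 0.0020,
    "standard": 0.0010,
    "essential": 0.0005,
}


def video_embedding_cost(video_seconds: float, mode: str) -> float:
    """Estimate the embedding cost in USD for a video of the given length."""
    return video_seconds * VIDEO_EMBEDDING_PRICE_PER_SECOND[mode]


# Compare the three modes for a 10-minute (600-second) video.
for mode in VIDEO_EMBEDDING_PRICE_PER_SECOND:
    print(mode, round(video_embedding_cost(600, mode), 4))
```

The cheaper modes also return fewer embeddings per minute of video, so the right choice depends on how fine-grained the search over the video needs to be.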

Model	Type	Region	Price per 1,000 characters
Embeddings for Text	Input	Global	Online requests: $0.000025 Batch requests: $0.00002
Embeddings for Text	Output	Global	Online requests: No charge Batch requests: No charge

Code completion

Generative AI on Vertex AI charges per 1,000 characters of input (prompt) and per 1,000 characters of output (response). Characters are counted as UTF-8 code points, excluding white space. During the Preview stage, charges are 100% discounted. Prediction requests that lead to filtered responses are charged for the input only. At the end of each billing cycle, fractions of one cent ($0.01) are rounded to one cent.

Model	Type	Region	Price per 1,000 characters
Codey for Code Completion	Input	Global	Online requests: $0.00025
Codey for Code Completion	Output	Global	Online requests: $0.0005
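The filtered-response and rounding rules can be sketched with Python's `decimal` module. This is an illustration under assumptions: the function names are hypothetical, and "fractions of one cent are rounded to one cent" is interpreted here as rounding the cycle total up to the next whole cent. The Preview-stage discount is not modeled.

```python
from decimal import Decimal, ROUND_CEILING

# USD per 1,000 characters, copied from the Code completion table.
INPUT_PER_1K = Decimal("0.00025")
OUTPUT_PER_1K = Decimal("0.0005")
CENT = Decimal("0.01")


def request_cost(input_chars: int, output_chars: int,
                 filtered: bool = False) -> Decimal:
    """Cost of one request; filtered responses pay for the input only."""
    cost = Decimal(input_chars) / 1000 * INPUT_PER_1K
    if not filtered:
        cost += Decimal(output_chars) / 1000 * OUTPUT_PER_1K
    return cost


def cycle_total(request_costs) -> Decimal:
    """Sum a billing cycle and round up to a whole cent.

    Assumption: any fraction of a cent is rounded up.
    """
    raw = sum(request_costs, Decimal("0"))
    return raw.quantize(CENT, rounding=ROUND_CEILING)


costs = [request_cost(200, 400) for _ in range(5)]
print(sum(costs, Decimal("0")))  # unrounded cycle total
print(cycle_total(costs))        # cycle total rounded up to a whole cent
```

`Decimal` is used instead of `float` so that sub-cent amounts add up exactly before the rounding step.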

Translation (Text)

Use the Vertex AI API and the translation LLM to translate text. LLM translations tend to be more fluent and human-sounding than classic translation models, but support fewer languages.

Model	Method	Usage	Price per million characters
LLM	Text translation (Preview)^*	The number of input characters per month	$10 per million characters^*
	Text translation (Preview)^*	The number of output characters per month	$10 per million characters^*

^* Price is per character processed by the model. For details about counted characters, see Charged characters.

Context Caching

With context caching, you can reduce the cost of Gemini input token processing by 75% and latency of content generation by caching the context portion of your input text or media to Gemini models. The amount of time data is stored in the cache, which can be controlled by the user, determines the "Context Cache Storage" charges. When creating a cached context, users will be charged the standard input token cost. Cache hits on input data are charged at a reduced rate, "Cached Input", instead of the normal input cost. The data size for both storage and input is calculated in the same way as Gemini input pricing.

Model	Feature	Type	Price (<= 128K input tokens)	Price (> 128K input tokens)
Gemini 1.5 Flash	Cached Input	Image Input	$0.000005 / image	$0.00001 / image
		Video Input	$0.000005 / second	$0.00001 / second
		Text Input	$0.0000046875 / 1k characters	$0.000009375 / 1k characters
		Audio Input	$0.0000005 / second	$0.000001 / second
	Context Cache Storage	Image Input	$0.000263 / image / hr
		Video Input	$0.000263 / second / hr
		Text Input	$0.00025 / 1k characters / hr
		Audio Input	$0.000025 / second / hr
Gemini 1.5 Pro	Cached Input	Image Input	$0.0000821875 / image	$0.000164375 / image
		Video Input	$0.0000821875 / second	$0.000164375 / second
		Text Input	$0.000078125 / 1k characters	$0.00015625 / 1k characters
		Audio Input	$0.0000078125 / second	$0.000015625 / second
	Context Cache Storage	Image Input	$0.0011835 / image / hr
		Video Input	$0.0011835 / second / hr
		Text Input	$0.001125 / 1k characters / hr
		Audio Input	$0.0001125 / second / hr

Example cached cost calculation

If a user creates a 250,000 character cached context with a TTL of 2 hours and subsequently sends twenty separate requests to the Gemini 1.5 Pro model during those 2 hours, and each request has a 200-character query added to the cached context and 400 character output, the total charge is calculated as follows:

Cache creation cost:
250,000 input characters x ($0.0003125 / 1000) = $0.078125 cache creation cost.

Cache storage cost:
250,000 characters x 2 hours = 500,000 total character hours;
500,000 total character hours x ($0.001125 / 1000) = $0.5625 cache storage cost.

Requests using cache cost:
200 characters x 20 requests = 4,000 total character inputs;
250,000 cached characters x 20 requests = 5,000,000 total cached character inputs;
4,000 total character inputs x ($0.0003125 / 1000) = $0.00125 character input cost;
5,000,000 total cached character inputs x ($0.000078125 / 1000) = $0.390625 cached input cost;
$0.00125 character input cost + $0.390625 cached input cost = $0.391875 total input cost.

Output cost:
400 output characters x 20 requests = 8,000 total output characters;
8,000 total output characters x ($0.00125 / 1000) = $0.01 output cost.

Total cost:
$0.078125 cache creation cost + $0.5625 cache storage cost + $0.391875 input cost + $0.01 output cost = $1.0425 total cost.
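The same breakdown can be expressed as a small function, which makes it easier to try other cache sizes, TTLs, or request counts. This is a sketch, not an official calculator: the names are hypothetical, the rates are the Gemini 1.5 Pro prices for requests of up to 128K input tokens, and the output rate defaults to the text output price from the Gemini table ($0.00125 per 1k characters) but can be overridden.

```python
from decimal import Decimal

# Gemini 1.5 Pro rates for requests of <= 128K input tokens,
# in USD per 1,000 characters, copied from the tables on this page.
INPUT_RATE = Decimal("0.0003125")
CACHED_INPUT_RATE = Decimal("0.000078125")
STORAGE_RATE_PER_HOUR = Decimal("0.001125")
OUTPUT_RATE = Decimal("0.00125")  # text output price from the Gemini table


def cached_session_cost(cached_chars: int, ttl_hours: int, requests: int,
                        query_chars: int, output_chars: int,
                        output_rate: Decimal = OUTPUT_RATE) -> dict:
    """Break down the cost of one cached context and the requests using it.

    All counts are expected to be ints (or Decimals), not floats.
    """
    per_1k = Decimal(1000)
    costs = {
        # Creating the cache is charged at the standard input rate.
        "cache_creation": cached_chars / per_1k * INPUT_RATE,
        # Storage is charged per 1k characters per hour of TTL.
        "cache_storage": cached_chars * ttl_hours / per_1k * STORAGE_RATE_PER_HOUR,
        # The new query text in each request pays the standard input rate.
        "uncached_input": query_chars * requests / per_1k * INPUT_RATE,
        # Each request re-reads the cached context at the reduced rate.
        "cached_input": cached_chars * requests / per_1k * CACHED_INPUT_RATE,
        "output": output_chars * requests / per_1k * output_rate,
    }
    costs["total"] = sum(costs.values(), Decimal(0))
    return costs


breakdown = cached_session_cost(250_000, 2, 20, 200, 400)
for name, value in breakdown.items():
    print(f"{name}: ${value}")
```

Returning the individual components rather than only the total makes it easy to see that, for a short-lived cache with few requests, storage can dominate the bill.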

Example cost calculation

If a user sends five separate requests to the PaLM Text Bison model, and each request has a 200-character input and 400-character output, the total charge is calculated as follows:

Input cost:
200 input characters x 5 prompts = 1,000 total input characters;
1,000 total input characters x ($0.00025 / 1000) = $0.00025 input cost.

Output cost:
400 output characters x 5 prompts = 2,000 total output characters;
2,000 total output characters x ($0.0005 / 1000) = $0.001 output cost.

Total cost:
$0.00025 input cost + $0.001 output cost = $0.00125 total cost.
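The three steps above generalize to any character-priced model. The helper below is a minimal sketch with hypothetical names; the rates are passed in per 1,000 characters, so the caller decides which price table applies.

```python
def batch_of_requests_cost(requests: int, input_chars: int, output_chars: int,
                           input_rate_per_1k: float,
                           output_rate_per_1k: float) -> dict:
    """Return the input, output, and total cost in USD for identical requests."""
    total_input_chars = requests * input_chars
    total_output_chars = requests * output_chars
    input_cost = total_input_chars / 1000 * input_rate_per_1k
    output_cost = total_output_chars / 1000 * output_rate_per_1k
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


# Five requests, 200 input and 400 output characters each,
# at $0.00025 and $0.0005 per 1,000 characters.
print(batch_of_requests_cost(5, 200, 400, 0.00025, 0.0005))
```

Floating-point arithmetic is fine for a quick estimate like this; for amounts that must reconcile to the cent, exact decimal arithmetic is the safer choice.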

Partner models

Partner models are a curated list of generative AI models developed by Google partners. Partner models are offered as managed APIs. For more information, see Overview of partner models. The following sections list pricing details for Google partner models.

AI21 Labs' models

Model	Pricing
Jamba 1.5 Large	Input: $2 / million tokens Output: $8 / million tokens
Jamba 1.5 Mini	Input: $0.20 / million tokens Output: $0.40 / million tokens

Anthropic’s Claude models

Model	Pricing
Claude 3.5 Haiku	Input: $0.80 / million tokens Output: $4 / million tokens Batch Input: $0.40 / million tokens Batch Output: $2 / million tokens Cache Write: $1 / million tokens Cache Hit: $0.08 / million tokens Batch Cache Write: $0.50 / million tokens Batch Cache Hit: $0.04 / million tokens
Claude 3.5 Sonnet v2	Input: $3 / million tokens Output: $15 / million tokens Batch Input: $1.50 / million tokens Batch Output: $7.50 / million tokens Cache Write: $3.75 / million tokens Cache Hit: $0.30 / million tokens Batch Cache Write: $1.875 / million tokens Batch Cache Hit: $0.15 / million tokens
Claude 3.5 Sonnet	Input: $3 / million tokens Output: $15 / million tokens Cache Write: $3.75 / million tokens Cache Hit: $0.30 / million tokens
Claude 3 Haiku	Input: $0.25 / million tokens Output: $1.25 / million tokens Cache Write: $0.30 / million tokens Cache Hit: $0.03 / million tokens
Claude 3 Sonnet (deprecated)	Input: $3 / million tokens Output: $15 / million tokens
Claude 3 Opus	Input: $15 / million tokens Output: $75 / million tokens Cache Write: $18.75 / million tokens Cache Hit: $1.50 / million tokens
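Partner models are priced per million tokens rather than per 1,000 characters. As a hedged sketch of how the online, batch, and cache rates fit together, the example below uses the Claude 3.5 Sonnet v2 row. The names are hypothetical, and it assumes that regular input, cache-write, and cache-hit tokens are counted separately, with each token falling into exactly one bucket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """USD per million tokens."""
    input: float
    output: float
    cache_write: float
    cache_hit: float


# Claude 3.5 Sonnet v2 rates, copied from the table above.
SONNET_V2 = {
    "online": TokenRates(3.00, 15.00, 3.75, 0.30),
    "batch": TokenRates(1.50, 7.50, 1.875, 0.15),
}


def claude_request_cost(input_tokens: int, output_tokens: int,
                        cache_write_tokens: int = 0,
                        cache_hit_tokens: int = 0,
                        mode: str = "online") -> float:
    """Estimate the cost in USD of one request.

    Assumption: each token is billed in exactly one of the four buckets.
    """
    rates = SONNET_V2[mode]
    total = (input_tokens * rates.input
             + output_tokens * rates.output
             + cache_write_tokens * rates.cache_write
             + cache_hit_tokens * rates.cache_hit)
    return total / 1_000_000


print(claude_request_cost(1_000_000, 100_000))                # online
print(claude_request_cost(1_000_000, 100_000, mode="batch"))  # batch
```

With these rates, a cached prefix costs 25% more than regular input when written and 90% less on each later hit, so caching pays off once the same prefix is reused in a second request.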

Meta's Llama models

Model	Pricing
Llama 3.1 405B	Input: $5.00 / million tokens Output: $16.00 / million tokens

Mistral AI’s models

Model	Pricing
Mistral Large (24.11)	Input: $2.00 / million tokens Output: $6.00 / million tokens
Mistral Large (24.07) (deprecated)	Input: $2.00 / million tokens Output: $6.00 / million tokens
Mistral Nemo	Input: $0.15 / million tokens Output: $0.15 / million tokens
Codestral (25.01)	Input: $0.30 / million tokens Output: $0.90 / million tokens
Codestral (24.05) (deprecated)	Input: $0.20 / million tokens Output: $0.60 / million tokens

Request a custom quote

With Google Cloud's pay-as-you-go pricing, you only pay for the services you use. Connect with our sales team to get a custom quote for your organization.
