Run LLM inference on Cloud Run GPUs with vLLM (services)
The following codelab shows how to deploy a backend service that runs vLLM, an
inference engine for production systems, along with Google's Gemma 2, a
2-billion-parameter instruction-tuned model.
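To give a sense of the shape of such a service, here is a minimal sketch of a container that serves Gemma 2 with vLLM's OpenAI-compatible server. It assumes the upstream `vllm/vllm-openai` image and the `google/gemma-2-2b-it` checkpoint from Hugging Face; the cache path is illustrative, and because Gemma is a gated model a Hugging Face access token (for example, an `HF_TOKEN` environment variable) would need to be supplied at deploy time. The codelab's own container may differ.

```dockerfile
FROM vllm/vllm-openai:latest

# Store downloaded model weights under a dedicated cache dir (illustrative path).
ENV HF_HOME=/model-cache

# Serve the 2B instruction-tuned Gemma 2 checkpoint on port 8080,
# the port Cloud Run routes incoming requests to by default.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", \
            "--model", "google/gemma-2-2b-it", \
            "--port", "8080"]
```

A container like this would then be deployed as a Cloud Run service with a GPU attached, for example by passing the `--gpu=1` and `--gpu-type=nvidia-l4` flags to `gcloud run deploy`; the exact resource settings (CPU, memory, concurrency) depend on the model and traffic, and the codelab walks through them in detail.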