Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make the process of creating and discovering LLMs easier. Model Garden supports all models on Hugging Face that are supported by Text Generation Inference (TGI).
Deployment options
You can deploy TGI-supported models on either Vertex AI or Google Kubernetes Engine (GKE). To deploy a Hugging Face text generation model, go to Model Garden and click Deploy from Hugging Face.
Deploy in Vertex AI
Vertex AI offers a managed platform for building and scaling machine learning projects without in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.
To get started, see the following examples:
- Some models have detailed model cards and verified deployment settings, such as google/gemma-7b-it, meta-llama/Llama-2-7b-chat-hf, and mistralai/Mistral-7B-v0.1.
- Some models have verified deployment settings, but no detailed model cards, such as NousResearch/Genstruct-7B.
- Some models have unverified deployment settings that are calculated automatically, such as ai4bharat/Airavata.
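Beyond the console flow, a deployment like the ones above can also be sketched with the Vertex AI Python SDK by uploading the TGI serving container as a custom model and deploying it to an endpoint. This is a minimal sketch, not a verified deployment setting: the image URI, machine type, accelerator type, and model ID below are illustrative assumptions you should replace with values from the model card.

```python
def deploy_tgi_model(
    project: str,
    location: str,
    model_id: str = "mistralai/Mistral-7B-v0.1",  # assumed example model
):
    """Sketch: upload a TGI serving container and deploy it on Vertex AI.

    Requires the google-cloud-aiplatform package and a project with
    sufficient GPU quota; all resource values below are assumptions.
    """
    from google.cloud import aiplatform  # imported here so the sketch is self-contained

    aiplatform.init(project=project, location=location)

    # Placeholder TGI serving image; use the image referenced by Model Garden
    # or Hugging Face for your region and CUDA version.
    tgi_image = "ghcr.io/huggingface/text-generation-inference:2.0"

    model = aiplatform.Model.upload(
        display_name="tgi-" + model_id.split("/")[-1].lower(),
        serving_container_image_uri=tgi_image,
        # TGI reads the model to serve from the MODEL_ID environment variable.
        serving_container_environment_variables={"MODEL_ID": model_id},
        serving_container_ports=[8080],
    )

    # Machine and accelerator choices are illustrative; a 7B model typically
    # needs at least one GPU with enough memory for its weights.
    endpoint = model.deploy(
        machine_type="g2-standard-12",
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
    )
    return endpoint


# Example invocation (requires credentials and quota, so not run here):
# endpoint = deploy_tgi_model("my-project", "us-central1")
```

Once deployed, the endpoint can be queried with `endpoint.predict(...)` like any other Vertex AI endpoint.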
Deploy in GKE
Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.
To get started, see the following examples:
- Some models have detailed model cards and verified deployment settings, such as google/gemma-7b-it, meta-llama/Llama-2-7b-chat-hf, and mistralai/Mistral-7B-v0.1.
- Some models have verified deployment settings, but no detailed model cards, such as NousResearch/Genstruct-7B.
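On GKE, serving one of these models comes down to a Kubernetes Deployment that runs the TGI container with the Hugging Face model ID as an argument and a GPU resource request. The helper below is a hedged sketch that builds such a manifest as a Python dict (ready to serialize and apply with kubectl); the image tag, port, and GPU count are assumptions to adapt to your cluster and node pools.

```python
import json


def tgi_deployment(model_id: str, gpu_count: int = 1) -> dict:
    """Build a minimal Kubernetes Deployment manifest for a TGI server.

    Sketch only: image tag, labels, and resource limits are assumed values.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "tgi-server"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "tgi-server"}},
            "template": {
                "metadata": {"labels": {"app": "tgi-server"}},
                "spec": {
                    "containers": [
                        {
                            "name": "tgi",
                            # Placeholder image tag; pin the TGI version you need.
                            "image": "ghcr.io/huggingface/text-generation-inference:2.0",
                            # TGI selects the model to serve via --model-id.
                            "args": ["--model-id", model_id],
                            # Request GPUs so GKE schedules onto a GPU node pool.
                            "resources": {"limits": {"nvidia.com/gpu": str(gpu_count)}},
                            # The TGI container listens on port 80 by default.
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


# Serialize the manifest; pipe this JSON to `kubectl apply -f -` to deploy.
manifest = tgi_deployment("mistralai/Mistral-7B-v0.1")
print(json.dumps(manifest, indent=2))
```

Gated models (such as the Llama and Gemma families) additionally need a Hugging Face access token, typically injected into the container as a Kubernetes Secret.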