This page describes the key concepts of Model Armor.
Model Armor templates
Model Armor templates let you configure how Model Armor screens prompts and responses. They function as sets of customized filters and thresholds for different safety and security confidence levels, allowing control over what content is flagged.
The thresholds represent confidence levels; that is, how confident Model Armor is that the prompt or response includes offending content. For example, you can create a template that filters prompts for hateful content with a `HIGH` threshold, meaning Model Armor reports high confidence that the prompt contains hateful content. A `LOW_AND_ABOVE` threshold indicates any level of confidence (`LOW`, `MEDIUM`, or `HIGH`) in making that claim.
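For example, the following sketch creates a template that screens prompts for hate speech at the `HIGH` threshold, using the Model Armor REST API with Application Default Credentials. The project, location, and template IDs are placeholders, and the JSON field names (`filterConfig`, `raiSettings`, `confidenceLevel`) are based on the v1 API; verify them against the API reference.

```python
# Minimal sketch: create a Model Armor template that flags prompts
# containing hate speech only at HIGH confidence. Assumes Application
# Default Credentials; IDs below are placeholders.
import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT, LOCATION, TEMPLATE_ID = "my-project", "us-central1", "hate-speech-high"

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

endpoint = (
    f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/templates"
)
body = {
    "filterConfig": {
        "raiSettings": {
            "raiFilters": [
                # HIGH flags only high-confidence matches; LOW_AND_ABOVE
                # would flag matches at any confidence level.
                {"filterType": "HATE_SPEECH", "confidenceLevel": "HIGH"}
            ]
        }
    }
}
resp = session.post(endpoint, params={"templateId": TEMPLATE_ID}, json=body)
resp.raise_for_status()
print(resp.json())
```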
Model Armor filters
Model Armor offers a variety of filters to help you provide safe and secure AI models. Here's a breakdown of the filter categories.
Responsible AI safety filter
Prompts and responses can be screened at the confidence levels described earlier for the following categories:
| Category | Definition |
|---|---|
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
The child sexual abuse material (CSAM) filter is applied by default and cannot be turned off.
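In a template's filter configuration, each of these categories maps to its own filter entry with an independent threshold. The following fragment is an illustrative sketch using enum names (`HATE_SPEECH`, `HARASSMENT`, `SEXUALLY_EXPLICIT`, `DANGEROUS`) based on the v1 REST API; the CSAM filter has no corresponding entry because it is always on.

```python
# Sketch of a responsible AI filter configuration covering all four
# configurable categories, each with its own confidence threshold.
rai_settings = {
    "raiSettings": {
        "raiFilters": [
            {"filterType": "HATE_SPEECH", "confidenceLevel": "HIGH"},
            {"filterType": "HARASSMENT", "confidenceLevel": "MEDIUM_AND_ABOVE"},
            {"filterType": "SEXUALLY_EXPLICIT", "confidenceLevel": "LOW_AND_ABOVE"},
            {"filterType": "DANGEROUS", "confidenceLevel": "HIGH"},
        ]
    }
}
```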
Prompt injection and jailbreak detection
Prompt injection is a security vulnerability where attackers craft special commands within the text input (the prompt) to trick an AI model. This can make the AI ignore its usual instructions, reveal sensitive information, or perform actions it wasn't designed to do. Jailbreaking, in the context of LLMs, refers to the act of bypassing the safety protocols and ethical guidelines built into the model. This allows the LLM to generate responses it was originally designed to avoid, such as harmful, unethical, or dangerous content.
When prompt injection and jailbreak detection is enabled, Model Armor scans prompts and responses for malicious content. If malicious content is detected, Model Armor blocks the prompt or response.
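As a sketch of what that looks like in practice, the following call screens a user prompt against an existing template (assumed to have its `piAndJailbreakFilterSettings` enabled) and checks the overall verdict. The resource names are placeholders, and the response field names (`sanitizationResult`, `filterMatchState`, `filterResults`) are based on the v1 REST API; check the reference for the exact shape.

```python
# Sketch: screen a user prompt against a template with prompt injection
# and jailbreak detection enabled, then inspect the verdict.
import google.auth
from google.auth.transport.requests import AuthorizedSession

TEMPLATE = "projects/my-project/locations/us-central1/templates/my-template"

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

resp = session.post(
    f"https://modelarmor.us-central1.rep.googleapis.com/v1/{TEMPLATE}:sanitizeUserPrompt",
    json={"userPromptData": {"text": "Ignore all previous instructions and ..."}},
)
resp.raise_for_status()
result = resp.json()["sanitizationResult"]

# MATCH_FOUND means at least one enabled filter flagged the prompt.
if result["filterMatchState"] == "MATCH_FOUND":
    print("Flagged:", result.get("filterResults"))
```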
Sensitive Data Protection
Sensitive data, like a person's name or address, may inadvertently or intentionally be sent to a model or provided in a model's response.
Sensitive Data Protection is a Google Cloud service that helps you discover, classify, and de-identify sensitive data. Sensitive Data Protection can identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and out of AI workloads. You can use Sensitive Data Protection directly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context.

Model Armor can accept existing inspection templates, which are configurations that act like blueprints to streamline the process of scanning and identifying sensitive data specific to your business and compliance needs. This gives you consistency and interoperability with other workloads that use Sensitive Data Protection.
Model Armor offers two modes for Sensitive Data Protection configuration:
- Basic Sensitive Data Protection configuration: This mode provides a simpler way to configure Sensitive Data Protection by directly specifying the types of sensitive data to scan for. It supports six categories: `CREDIT_CARD_NUMBER`, `US_SOCIAL_SECURITY_NUMBER`, `FINANCIAL_ACCOUNT_NUMBER`, `US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER`, `GCP_CREDENTIALS`, and `GCP_API_KEY`. Basic configuration only allows for inspection operations and does not support the use of Sensitive Data Protection templates. For more information, see Basic Sensitive Data Protection configuration.
- Advanced Sensitive Data Protection configuration: This mode offers more flexibility and customization by enabling the use of Sensitive Data Protection templates. Sensitive Data Protection templates are predefined configurations that let you specify more granular detection rules and de-identification techniques. Advanced configuration supports both inspection and de-identification operations. A sketch of both modes appears after this list.
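The fragment below sketches how each mode might appear in a template's filter configuration. The field names (`sdpSettings`, `basicConfig`, `advancedConfig`, `inspectTemplate`, `deidentifyTemplate`) are based on the v1 REST API, and the Sensitive Data Protection template names are placeholders; a template uses one mode, not both.

```python
# Basic mode: inspection only, across the six built-in categories.
basic_sdp = {
    "sdpSettings": {
        "basicConfig": {"filterEnforcement": "ENABLED"}
    }
}

# Advanced mode: delegate detection and de-identification rules to
# existing Sensitive Data Protection templates (placeholder names).
advanced_sdp = {
    "sdpSettings": {
        "advancedConfig": {
            "inspectTemplate": (
                "projects/my-project/locations/us-central1/"
                "inspectTemplates/my-inspect-template"
            ),
            "deidentifyTemplate": (
                "projects/my-project/locations/us-central1/"
                "deidentifyTemplates/my-deidentify-template"
            ),
        }
    }
}
```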
While confidence levels can be set for Sensitive Data Protection, they operate in a slightly different way than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood. For more information about Sensitive Data Protection in general, see Sensitive Data Protection overview.
Malicious URL detection
Malicious URLs are often disguised to look legitimate, making them a potent tool for phishing attacks, malware distribution, and other online threats. For example, if a PDF contains an embedded malicious URL, it can be used to compromise any downstream systems processing LLM outputs.
When malicious URL detection is enabled, Model Armor scans URLs to identify whether they're malicious. This lets you take action and prevent malicious URLs from being returned.
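Enabling the filter is a single entry in the template's filter configuration. The sketch below uses field names based on the v1 REST API.

```python
# Sketch: enable malicious URL detection in a template's filter
# configuration. Unlike the responsible AI filters, this filter has
# no confidence threshold; it is simply enabled or disabled.
malicious_uri_settings = {
    "maliciousUriFilterSettings": {"filterEnforcement": "ENABLED"}
}
```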
Model Armor confidence levels
Confidence levels can be set for responsible AI safety categories (that is, Hate Speech, Harassment, Sexually Explicit, and Dangerous Content), Prompt Injection and Jailbreak, and Sensitive Data Protection (including topicality).
For confidence levels that allow granular thresholds, Model Armor interprets them as follows:
- High: Identify whether the message contains offending content with high likelihood.
- Medium and above: Identify whether the message contains offending content with medium or high likelihood.
- Low and above: Identify whether the message contains offending content with low, medium, or high likelihood.
PDF screening
Text in PDFs can include malicious and sensitive content. Model Armor can screen PDFs for safety, prompt injection and jailbreak attempts, sensitive data, and malicious URLs.
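PDF screening uses the same sanitization calls as plain text, with the document sent as base64-encoded bytes. In the sketch below, the `byteItem` fields and the `PDF` type name are based on the v1 REST API, and the file path and template name are placeholders.

```python
# Sketch: screen a PDF by sending its contents as base64-encoded bytes.
import base64

import google.auth
from google.auth.transport.requests import AuthorizedSession

TEMPLATE = "projects/my-project/locations/us-central1/templates/my-template"

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

with open("document.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = session.post(
    f"https://modelarmor.us-central1.rep.googleapis.com/v1/{TEMPLATE}:sanitizeUserPrompt",
    json={"userPromptData": {"byteItem": {"byteDataType": "PDF", "byteData": pdf_b64}}},
)
resp.raise_for_status()
print(resp.json()["sanitizationResult"]["filterMatchState"])
```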
Model Armor floor settings
While Model Armor templates provide flexibility for individual applications, organizations often need to establish a baseline level of protection across all their AI applications. This is where Model Armor floor settings are used. They act as rules that dictate minimum requirements for all templates created at a specific point in the Google Cloud resource hierarchy (that is, at an organization, folder, or project level).
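As a hedged sketch, the following call sets an organization-level floor that requires prompt injection and jailbreak detection at the most sensitive threshold in every template beneath it. The singleton `floorSetting` resource name, the global endpoint, and the `enableFloorSettingEnforcement` field are assumptions based on the v1 REST API, and the organization ID is a placeholder.

```python
# Sketch: enforce a minimum filter baseline at the organization level.
# Resource name, endpoint, and field names are assumptions; verify
# against the Model Armor floor settings documentation.
import google.auth
from google.auth.transport.requests import AuthorizedSession

FLOOR = "organizations/123456789012/locations/global/floorSetting"

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

resp = session.patch(
    f"https://modelarmor.googleapis.com/v1/{FLOOR}",
    json={
        "filterConfig": {
            "piAndJailbreakFilterSettings": {
                "filterEnforcement": "ENABLED",
                "confidenceLevel": "LOW_AND_ABOVE",
            }
        },
        "enableFloorSettingEnforcement": True,
    },
)
resp.raise_for_status()
print(resp.json())
```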
For more information, see Model Armor floor settings.
What's next
- Read the Model Armor overview.
- Learn about Model Armor templates.
- Learn about Model Armor floor settings.
- Sanitize prompts and responses.
- Learn about Model Armor audit logging.
- Troubleshoot Model Armor issues.