Package google.cloud.language.v1beta2

LanguageService

Provides text analysis operations such as sentiment analysis and entity recognition.

AnalyzeEntities

rpc AnalyzeEntities(AnalyzeEntitiesRequest) returns (AnalyzeEntitiesResponse)

Finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

AnalyzeEntitySentiment

rpc AnalyzeEntitySentiment(AnalyzeEntitySentimentRequest) returns (AnalyzeEntitySentimentResponse)

Finds entities in the text, similar to AnalyzeEntities, and analyzes the sentiment associated with each entity and its mentions.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

AnalyzeSentiment

rpc AnalyzeSentiment(AnalyzeSentimentRequest) returns (AnalyzeSentimentResponse)

Analyzes the sentiment of the provided text.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

AnalyzeSyntax

rpc AnalyzeSyntax(AnalyzeSyntaxRequest) returns (AnalyzeSyntaxResponse)

Analyzes the syntax of the text and provides sentence boundaries and tokenization along with part of speech tags, dependency trees, and other properties.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

AnnotateText

rpc AnnotateText(AnnotateTextRequest) returns (AnnotateTextResponse)

A convenience method that provides all syntax, sentiment, entity, and classification features in one call.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ClassifyText

rpc ClassifyText(ClassifyTextRequest) returns (ClassifyTextResponse)

Classifies a document into categories.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ModerateText

rpc ModerateText(ModerateTextRequest) returns (ModerateTextResponse)

Moderates a document for harmful and sensitive categories.

Authorization scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-language
  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

AnalyzeEntitiesRequest

The entity analysis request message.

Fields
document

Document

Required. Input document.

encoding_type

EncodingType

The encoding type used by the API to calculate offsets.

AnalyzeEntitiesResponse

The entity analysis response message.

Fields
entities[]

Entity

The recognized entities in the input document.

language

string

The language of the text, which will be the same as the language specified in the request or, if not specified, the automatically-detected language. See Document.language field for more details.

AnalyzeEntitySentimentRequest

The entity-level sentiment analysis request message.

Fields
document

Document

Required. Input document.

encoding_type

EncodingType

The encoding type used by the API to calculate offsets.

AnalyzeEntitySentimentResponse

The entity-level sentiment analysis response message.

Fields
entities[]

Entity

The recognized entities in the input document with associated sentiments.

language

string

The language of the text, which will be the same as the language specified in the request or, if not specified, the automatically-detected language. See Document.language field for more details.

AnalyzeSentimentRequest

The sentiment analysis request message.

Fields
document

Document

Required. Input document.

encoding_type

EncodingType

The encoding type used by the API to calculate sentence offsets for the sentence sentiment.

AnalyzeSentimentResponse

The sentiment analysis response message.

Fields
document_sentiment

Sentiment

The overall sentiment of the input document.

language

string

The language of the text, which will be the same as the language specified in the request or, if not specified, the automatically-detected language. See Document.language field for more details.

sentences[]

Sentence

The sentiment for all the sentences in the document.

AnalyzeSyntaxRequest

The syntax analysis request message.

Fields
document

Document

Required. Input document.

encoding_type

EncodingType

The encoding type used by the API to calculate offsets.

AnalyzeSyntaxResponse

The syntax analysis response message.

Fields
sentences[]

Sentence

Sentences in the input document.

tokens[]

Token

Tokens, along with their syntactic information, in the input document.

language

string

The language of the text, which will be the same as the language specified in the request or, if not specified, the automatically-detected language. See Document.language field for more details.

AnnotateTextRequest

The request message for the text annotation API, which can perform multiple analysis types (sentiment, entities, and syntax) in one call.

Fields
document

Document

Required. Input document.

features

Features

Required. The enabled features.

encoding_type

EncodingType

The encoding type used by the API to calculate offsets.

Features

All available features for sentiment, syntax, and semantic analysis. Setting each one to true will enable that specific analysis for the input.

Fields
extract_syntax

bool

Extract syntax information.

extract_entities

bool

Extract entities.

extract_document_sentiment

bool

Extract document-level sentiment.

extract_entity_sentiment

bool

Extract entities and their associated sentiment.

classify_text

bool

Classify the full document into categories. If this is true, the API will use the default model which classifies into a predefined taxonomy.

moderate_text

bool

Moderate the document for harmful and sensitive categories.

classification_model_options

ClassificationModelOptions

Optional. The model options to use for classification. Defaults to v1 options if not specified. Only used if classify_text is set to true.
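As a rough illustration of how the flags above fit together, the following sketch builds a Features message as a plain Python dictionary and places it in an AnnotateTextRequest-shaped dictionary. The helper name build_features is hypothetical, no client library or network call is involved, and rejecting classification_model_options when classify_text is false is a choice made by this sketch rather than documented API behavior.

```python
FEATURE_FLAGS = (
    "extract_syntax",
    "extract_entities",
    "extract_document_sentiment",
    "extract_entity_sentiment",
    "classify_text",
    "moderate_text",
)


def build_features(*enabled, classification_model_options=None):
    """Return a dict shaped like Features with the named flags set to True."""
    unknown = set(enabled) - set(FEATURE_FLAGS)
    if unknown:
        raise ValueError(f"unknown feature flags: {sorted(unknown)}")
    features = {flag: flag in enabled for flag in FEATURE_FLAGS}
    if classification_model_options is not None:
        # The options are only used when classify_text is true, so this
        # sketch (an assumption, not API behavior) refuses the combination.
        if not features["classify_text"]:
            raise ValueError("classification_model_options requires classify_text")
        features["classification_model_options"] = classification_model_options
    return features


features = build_features("extract_entities", "extract_document_sentiment")
request = {
    "document": {"type": "PLAIN_TEXT", "content": "The food was great."},
    "features": features,
    "encoding_type": "UTF8",
}
```

Keeping every flag explicit in the dictionary makes it easy to see at a glance which analyses a request enables.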

AnnotateTextResponse

The text annotations response message.

Fields
sentences[]

Sentence

Sentences in the input document. Populated if the user enables AnnotateTextRequest.Features.extract_syntax.

tokens[]

Token

Tokens, along with their syntactic information, in the input document. Populated if the user enables AnnotateTextRequest.Features.extract_syntax.

entities[]

Entity

Entities, along with their semantic information, in the input document. Populated if the user enables AnnotateTextRequest.Features.extract_entities.

document_sentiment

Sentiment

The overall sentiment for the document. Populated if the user enables AnnotateTextRequest.Features.extract_document_sentiment.

language

string

The language of the text, which will be the same as the language specified in the request or, if not specified, the automatically-detected language. See Document.language field for more details.

categories[]

ClassificationCategory

Categories identified in the input document.

moderation_categories[]

ClassificationCategory

Harmful and sensitive categories identified in the input document.

ClassificationCategory

Represents a category returned from the text classifier.

Fields
name

string

The name of the category representing the document.

confidence

float

The classifier's confidence in the category. The number represents how certain the classifier is that this category represents the given text.
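One way a caller might use the confidence field is to drop low-confidence categories before acting on them. The sketch below works on plain dictionaries shaped like ClassificationCategory; the helper name confident_categories, the 0.5 cutoff, and the sample category names are all hypothetical and are not taken from the API.

```python
def confident_categories(categories, min_confidence=0.5):
    """Keep categories at or above the cutoff, most confident first."""
    kept = [c for c in categories if c["confidence"] >= min_confidence]
    return sorted(kept, key=lambda c: c["confidence"], reverse=True)


# Hand-written sample data, not real classifier output.
sample = [
    {"name": "/Example/Category A", "confidence": 0.25},
    {"name": "/Example/Category B", "confidence": 0.75},
    {"name": "/Example/Category C", "confidence": 0.5},
]

for category in confident_categories(sample):
    print(category["name"], category["confidence"])
```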

ClassificationModelOptions

Model options available for classification requests.

Fields
Union field model_type. If this field is not set, then the v1_model will be used by default. model_type can be only one of the following:
v1_model

V1Model

Setting this field will use the V1 model and V1 content categories version. The V1 model is a legacy model; support for this will be discontinued in the future.

v2_model

V2Model

Setting this field will use the V2 model with the appropriate content categories version. The V2 model is a better performing model.

V1Model

Options for the V1 model.

This type has no fields.

V2Model

Options for the V2 model.

Fields
content_categories_version

ContentCategoriesVersion

The content categories used for classification.

ContentCategoriesVersion

The content categories used for classification.

Enums
CONTENT_CATEGORIES_VERSION_UNSPECIFIED If ContentCategoriesVersion is not specified, this option will default to V1.
V1 Legacy content categories of our initial launch in 2017.
V2 Updated content categories in 2022.
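The defaults described for ClassificationModelOptions and ContentCategoriesVersion can be summarized in a short sketch. It operates on a plain dictionary shaped like ClassificationModelOptions; the function name resolve_classification_model is hypothetical, and the code only restates the defaults listed above rather than reproducing the service's implementation.

```python
UNSPECIFIED = "CONTENT_CATEGORIES_VERSION_UNSPECIFIED"


def resolve_classification_model(options=None):
    """Return (model, content_categories_version) after applying defaults."""
    options = options or {}
    if "v1_model" in options and "v2_model" in options:
        # model_type is a union field, so only one member may be set.
        raise ValueError("set only one of v1_model or v2_model")
    if "v2_model" in options:
        version = options["v2_model"].get("content_categories_version", UNSPECIFIED)
        if version == UNSPECIFIED:
            version = "V1"  # an unspecified version defaults to V1
        return ("v2_model", version)
    # No model_type set, or v1_model set: V1 model with V1 categories.
    return ("v1_model", "V1")


print(resolve_classification_model())
print(resolve_classification_model({"v2_model": {"content_categories_version": "V2"}}))
```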

ClassifyTextRequest

The document classification request message.

Fields
document

Document

Required. Input document.

classification_model_options

ClassificationModelOptions

Optional. Model options to use for classification. Defaults to v1 options if not specified.

ClassifyTextResponse

The document classification response message.

Fields
categories[]

ClassificationCategory

Categories representing the input document.

DependencyEdge

Represents dependency parse tree information for a token.

Fields
head_token_index

int32

Represents the head of this token in the dependency tree. This is the index of the token which has an arc going to this token. The index is the position of the token in the array of tokens returned by the API method. If this token is a root token, then the head_token_index is its own index.

label

Label

The parse label for the token.
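Because head_token_index points at a position in the returned token array, the dependency tree can be rebuilt with a single pass. The sketch below assumes tokens are available as plain dictionaries with the field names used in this reference; the function name build_dependency_tree and the three-token sample are hypothetical and hand-written, not API output.

```python
from collections import defaultdict


def build_dependency_tree(tokens):
    """Return (root_indices, children) where children maps head -> dependents."""
    roots = []
    children = defaultdict(list)
    for index, token in enumerate(tokens):
        head = token["dependency_edge"]["head_token_index"]
        if head == index:
            # A root token's head_token_index is its own index.
            roots.append(index)
        else:
            children[head].append(index)
    return roots, dict(children)


tokens = [
    {"text": {"content": "Dogs"}, "dependency_edge": {"head_token_index": 1, "label": "NSUBJ"}},
    {"text": {"content": "bark"}, "dependency_edge": {"head_token_index": 1, "label": "ROOT"}},
    {"text": {"content": "loudly"}, "dependency_edge": {"head_token_index": 1, "label": "ADVMOD"}},
]
roots, children = build_dependency_tree(tokens)
print(roots, children)
```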

Label

The parse label enum for the token.

Enums
UNKNOWN Unknown
ABBREV Abbreviation modifier
ACOMP Adjectival complement
ADVCL Adverbial clause modifier
ADVMOD Adverbial modifier
AMOD Adjectival modifier of an NP
APPOS Appositional modifier of an NP
ATTR Attribute dependent of a copular verb
AUX Auxiliary (non-main) verb
AUXPASS Passive auxiliary
CC Coordinating conjunction
CCOMP Clausal complement of a verb or adjective
CONJ Conjunct
CSUBJ Clausal subject
CSUBJPASS Clausal passive subject
DEP Dependency (unable to determine)
DET Determiner
DISCOURSE Discourse
DOBJ Direct object
EXPL Expletive
GOESWITH Goes with (part of a word in a text not well edited)
IOBJ Indirect object
MARK Marker (word introducing a subordinate clause)
MWE Multi-word expression
MWV Multi-word verbal expression
NEG Negation modifier
NN Noun compound modifier
NPADVMOD Noun phrase used as an adverbial modifier
NSUBJ Nominal subject
NSUBJPASS Passive nominal subject
NUM Numeric modifier of a noun
NUMBER Element of compound number
P Punctuation mark
PARATAXIS Parataxis relation
PARTMOD Participial modifier
PCOMP The complement of a preposition is a clause
POBJ Object of a preposition
POSS Possession modifier
POSTNEG Postverbal negative particle
PRECOMP Predicate complement
PRECONJ Preconjunct
PREDET Predeterminer
PREF Prefix
PREP Prepositional modifier
PRONL The relationship between a verb and verbal morpheme
PRT Particle
PS Associative or possessive marker
QUANTMOD Quantifier phrase modifier
RCMOD Relative clause modifier
RCMODREL Complementizer in relative clause
RDROP Ellipsis without a preceding predicate
REF Referent
REMNANT Remnant
REPARANDUM Reparandum
ROOT Root
SNUM Suffix specifying a unit of number
SUFF Suffix
TMOD Temporal modifier
TOPIC Topic marker
VMOD Clause headed by a non-finite form of the verb that modifies a noun
VOCATIVE Vocative
XCOMP Open clausal complement
SUFFIX Name suffix
TITLE Name title
ADVPHMOD Adverbial phrase modifier
AUXCAUS Causative auxiliary
AUXVV Helper auxiliary
DTMOD Rentaishi (Prenominal modifier)
FOREIGN Foreign words
KW Keyword
LIST List for chains of comparable items
NOMC Nominalized clause
NOMCSUBJ Nominalized clausal subject
NOMCSUBJPASS Nominalized clausal passive
NUMC Compound of numeric modifier
COP Copula
DISLOCATED Dislocated relation (for fronted/topicalized elements)
ASP Aspect marker
GMOD Genitive modifier
GOBJ Genitive object
INFMOD Infinitival modifier
MES Measure
NCOMP Nominal complement of a noun

Document

Represents the input to API methods.

Fields
type

Type

Required. If the type is not set or is TYPE_UNSPECIFIED, returns an INVALID_ARGUMENT error.

language

string

The language of the document (if not specified, the language is automatically detected). Both ISO and BCP-47 language codes are accepted.
Language Support lists currently supported languages for each API method. If the language (either specified by the caller or automatically detected) is not supported by the called API method, an INVALID_ARGUMENT error is returned.

reference_web_uri

string

The web URI where the document comes from. This URI is not used for fetching the content, but as a hint for analyzing the document.

boilerplate_handling

BoilerplateHandling

Indicates how detected boilerplate (e.g. advertisements, copyright declarations, banners) should be handled for this document. If not specified, boilerplate will be treated the same as content.

Union field source. The source of the document: a string containing the content or a Google Cloud Storage URI. source can be only one of the following:
content

string

The content of the input in string format. Cloud audit logging exempt since it is based on user data.

gcs_content_uri

string

The Google Cloud Storage URI where the file content is located. This URI must be of the form: gs://bucket_name/object_name. For more details, see https://cloud.google.com/storage/docs/reference-uris. NOTE: Cloud Storage object versioning is not supported.
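The constraints on Document (a required type, and a source union of content or gcs_content_uri) can be checked locally before a request is sent. The following is a minimal sketch over plain dictionaries; the helper name build_document is hypothetical, and requiring exactly one source to be set is an assumption of this sketch rather than a statement about how the service validates input.

```python
def build_document(doc_type, content=None, gcs_content_uri=None, language=None):
    """Return a dict shaped like Document after basic local checks."""
    if doc_type not in ("PLAIN_TEXT", "HTML"):
        # TYPE_UNSPECIFIED (or a missing type) is rejected by the API.
        raise ValueError("type must be PLAIN_TEXT or HTML")
    if (content is None) == (gcs_content_uri is None):
        raise ValueError("set exactly one of content or gcs_content_uri")
    document = {"type": doc_type}
    if content is not None:
        document["content"] = content
    else:
        # Expected form: gs://bucket_name/object_name
        if not gcs_content_uri.startswith("gs://"):
            raise ValueError("gcs_content_uri must start with gs://")
        bucket, _, object_name = gcs_content_uri[len("gs://"):].partition("/")
        if not bucket or not object_name:
            raise ValueError("gcs_content_uri must name a bucket and an object")
        document["gcs_content_uri"] = gcs_content_uri
    if language is not None:
        document["language"] = language
    return document


print(build_document("PLAIN_TEXT", content="Hello, world.", language="en"))
```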

BoilerplateHandling

Ways of handling boilerplate detected in the document.

Enums
BOILERPLATE_HANDLING_UNSPECIFIED The boilerplate handling is not specified.
SKIP_BOILERPLATE Do not analyze detected boilerplate. Reference web URI is required for detecting boilerplate.
KEEP_BOILERPLATE Treat boilerplate the same as content.

Type

The document types enum.

Enums
TYPE_UNSPECIFIED The content type is not specified.
PLAIN_TEXT Plain text
HTML HTML

EncodingType

Represents the text encoding that the caller uses to process the output. Providing an EncodingType is recommended because the API provides the beginning offsets for various outputs, such as tokens and mentions, and languages that natively use different text encodings may access offsets differently.

Enums
NONE If EncodingType is not specified, encoding-dependent information (such as begin_offset) will be set at -1.
UTF8 Encoding-dependent information (such as begin_offset) is calculated based on the UTF-8 encoding of the input. C++ and Go are examples of languages that use this encoding natively.
UTF16 Encoding-dependent information (such as begin_offset) is calculated based on the UTF-16 encoding of the input. Java and JavaScript are examples of languages that use this encoding natively.
UTF32 Encoding-dependent information (such as begin_offset) is calculated based on the UTF-32 encoding of the input. Python is an example of a language that uses this encoding natively.
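To see why the choice of EncodingType matters, the sketch below computes where the same substring begins under each encoding. It assumes offsets are counted in code units of the chosen encoding (bytes for UTF8, 16-bit units for UTF16, code points for UTF32); the function name begin_offset and the sample string are hypothetical, and the calculation is done locally rather than by the API.

```python
def begin_offset(text, char_index, encoding_type):
    """Offset of text[char_index], counted in units of the given EncodingType."""
    prefix = text[:char_index]
    if encoding_type == "UTF8":
        return len(prefix.encode("utf-8"))  # bytes
    if encoding_type == "UTF16":
        return len(prefix.encode("utf-16-le")) // 2  # 16-bit code units
    if encoding_type == "UTF32":
        return len(prefix)  # Unicode code points
    return -1  # NONE: encoding-dependent information is set to -1


# The sample contains a two-byte character and a character outside the BMP.
text = "caf\u00e9 \U0001F600 ok"
index = text.index("ok")

for encoding_type in ("NONE", "UTF8", "UTF16", "UTF32"):
    print(encoding_type, begin_offset(text, index, encoding_type))
```

The three offsets diverge as soon as the text contains characters that need more than one code unit, which is why the offset is only meaningful when read in the same encoding the caller requested.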

Entity

Represents a phrase in the text that is a known entity, such as a person, an organization, or location. The API associates information, such as salience and mentions, with entities.

Fields
name

string

The representative name for the entity.

type

Type

The entity type.

metadata

map<string, string>

Metadata associated with the entity.

For most entity types, the metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid), if they are available. For the metadata associated with other entity types, see the Type table below.

salience

float

The salience score associated with the entity in the [0, 1.0] range.

The salience score for an entity provides information about the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient.

mentions[]

EntityMention

The mentions of this entity in the input document. The API currently supports proper noun mentions.

sentiment

Sentiment

For calls to AnalyzeEntitySentiment, or if AnnotateTextRequest.Features.extract_entity_sentiment is set to true, this field will contain the aggregate sentiment expressed for this entity in the provided document.
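Since salience ranks entities by their importance to the whole document, a caller will often want only the most salient ones. A minimal sketch over plain dictionaries shaped like Entity follows; the helper name top_entities and the sample values are hypothetical and hand-written, not API output.

```python
def top_entities(entities, limit=3):
    """Return up to `limit` entities, most salient first."""
    return sorted(entities, key=lambda e: e["salience"], reverse=True)[:limit]


entities = [
    {"name": "London", "type": "LOCATION", "salience": 0.25},
    {"name": "Ada Lovelace", "type": "PERSON", "salience": 0.625},
    {"name": "engine", "type": "OTHER", "salience": 0.125},
]

for entity in top_entities(entities, limit=2):
    print(entity["name"], entity["type"], entity["salience"])
```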

Type

The type of the entity. For most entity types, the associated metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid). The table below lists the associated fields for entities that have different metadata.

Enums
UNKNOWN Unknown
PERSON Person
LOCATION Location
ORGANIZATION Organization
EVENT Event
WORK_OF_ART Artwork
CONSUMER_GOOD Consumer product
OTHER Other types of entities
PHONE_NUMBER

Phone number

The metadata lists the phone number, formatted according to local convention, plus whichever additional elements appear in the text:

  • number - the actual number, broken down into sections as per local convention
  • national_prefix - country code, if detected
  • area_code - region or area code, if detected
  • extension - phone extension (to be dialed after connection), if detected
ADDRESS

Address

The metadata identifies the street number and locality plus whichever additional elements appear in the text:

  • street_number - street number
  • locality - city or town
  • street_name - street/route name, if detected
  • postal_code - postal code, if detected
  • country - country, if detected
  • broad_region - administrative area, such as the state, if detected
  • narrow_region - smaller administrative area, such as county, if detected
  • sublocality - used in Asian addresses to demark a district within a city, if detected
DATE

Date

The metadata identifies the components of the date:

  • year - four digit year, if detected
  • month - two digit month number, if detected
  • day - two digit day number, if detected
NUMBER

Number

The metadata is the number itself.

PRICE

Price

The metadata identifies the value and currency.
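Entity metadata is a map of strings, and for types such as DATE some keys may be absent when a component was not detected. The sketch below formats DATE metadata defensively; the helper name date_from_metadata and the use of question marks for missing parts are hypothetical choices made for illustration.

```python
def date_from_metadata(metadata):
    """Format DATE entity metadata as YYYY-MM-DD, with ? for missing parts."""
    year = metadata.get("year", "????")
    # Values are strings; zfill pads a single-digit value defensively.
    month = metadata.get("month", "??").zfill(2)
    day = metadata.get("day", "??").zfill(2)
    return f"{year}-{month}-{day}"


print(date_from_metadata({"year": "2017", "month": "03", "day": "14"}))
print(date_from_metadata({"year": "2017"}))
```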

EntityMention

Represents a mention for an entity in the text. Currently, proper noun mentions are supported.

Fields
text

TextSpan

The mention text.

type

Type

The type of the entity mention.

sentiment

Sentiment

For calls to AnalyzeEntitySentiment, or if AnnotateTextRequest.Features.extract_entity_sentiment is set to true, this field will contain the sentiment expressed for this mention of the entity in the provided document.

Type

The supported types of mentions.

Enums
TYPE_UNKNOWN Unknown
PROPER Proper name
COMMON Common noun (or noun compound)

ModerateTextRequest

The document moderation request message.

Fields
document

Document

Required. Input document.

ModerateTextResponse

The document moderation response message.

Fields
moderation_categories[]

ClassificationCategory

Harmful and sensitive categories representing the input document.

PartOfSpeech

Represents part of speech information for a token.

Fields
tag

Tag

The part of speech tag.

aspect

Aspect

The grammatical aspect.

case

Case

The grammatical case.

form

Form

The grammatical form.

gender

Gender

The grammatical gender.

mood

Mood

The grammatical mood.

number

Number

The grammatical number.

person

Person

The grammatical person.

proper

Proper

The grammatical properness.

reciprocity

Reciprocity

The grammatical reciprocity.

tense

Tense

The grammatical tense.

voice

Voice

The grammatical voice.

Aspect

The characteristic of a verb that expresses time flow during an event.

Enums
ASPECT_UNKNOWN Aspect is not applicable in the analyzed language or is not predicted.
PERFECTIVE Perfective
IMPERFECTIVE Imperfective
PROGRESSIVE Progressive

Case

The grammatical function performed by a noun or pronoun in a phrase, clause, or sentence. In some languages, other parts of speech, such as adjective and determiner, take case inflection in agreement with the noun.

Enums
CASE_UNKNOWN Case is not applicable in the analyzed language or is not predicted.
ACCUSATIVE Accusative
ADVERBIAL Adverbial
COMPLEMENTIVE Complementive
DATIVE Dative
GENITIVE Genitive
INSTRUMENTAL Instrumental
LOCATIVE Locative
NOMINATIVE Nominative
OBLIQUE Oblique
PARTITIVE Partitive
PREPOSITIONAL Prepositional
REFLEXIVE_CASE Reflexive
RELATIVE_CASE Relative
VOCATIVE Vocative

Form

Depending on the language, Form categorizes different forms of verbs, adjectives, adverbs, etc. For example, it can categorize inflected endings of verbs and adjectives, or distinguish between short and long forms of adjectives and participles.

Enums
FORM_UNKNOWN Form is not applicable in the analyzed language or is not predicted.
ADNOMIAL Adnominal
AUXILIARY Auxiliary
COMPLEMENTIZER Complementizer
FINAL_ENDING Final ending
GERUND Gerund
REALIS Realis
IRREALIS Irrealis
SHORT Short form
LONG Long form
ORDER Order form
SPECIFIC Specific form

Gender

Gender classes of nouns reflected in the behaviour of associated words.

Enums
GENDER_UNKNOWN Gender is not applicable in the analyzed language or is not predicted.
FEMININE Feminine
MASCULINE Masculine
NEUTER Neuter

Mood

The grammatical feature of verbs, used for showing modality and attitude.

Enums
MOOD_UNKNOWN Mood is not applicable in the analyzed language or is not predicted.
CONDITIONAL_MOOD Conditional
IMPERATIVE Imperative
INDICATIVE Indicative
INTERROGATIVE Interrogative
JUSSIVE Jussive
SUBJUNCTIVE Subjunctive

Number

Count distinctions.

Enums
NUMBER_UNKNOWN Number is not applicable in the analyzed language or is not predicted.
SINGULAR Singular
PLURAL Plural
DUAL Dual

Person

The distinction between the speaker, second person, third person, etc.

Enums
PERSON_UNKNOWN Person is not applicable in the analyzed language or is not predicted.
FIRST First
SECOND Second
THIRD Third
REFLEXIVE_PERSON Reflexive

Proper

This category shows if the token is part of a proper name.

Enums
PROPER_UNKNOWN Proper is not applicable in the analyzed language or is not predicted.
PROPER Proper
NOT_PROPER Not proper

Reciprocity

Reciprocal features of a pronoun.

Enums
RECIPROCITY_UNKNOWN Reciprocity is not applicable in the analyzed language or is not predicted.
RECIPROCAL Reciprocal
NON_RECIPROCAL Non-reciprocal

Tag

The part of speech tags enum.

Enums
UNKNOWN Unknown
ADJ Adjective
ADP Adposition (preposition and postposition)
ADV Adverb
CONJ Conjunction
DET Determiner
NOUN Noun (common and proper)
NUM Cardinal number
PRON Pronoun
PRT Particle or other function word
PUNCT Punctuation
VERB Verb (all tenses and modes)
X Other: foreign words, typos, abbreviations
AFFIX Affix

Tense

Time reference.

Enums
TENSE_UNKNOWN Tense is not applicable in the analyzed language or is not predicted.
CONDITIONAL_TENSE Conditional
FUTURE Future
PAST Past
PRESENT Present
IMPERFECT Imperfect
PLUPERFECT Pluperfect

Voice

The relationship between the action that a verb expresses and the participants identified by its arguments.

Enums
VOICE_UNKNOWN Voice is not applicable in the analyzed language or is not predicted.
ACTIVE Active
CAUSATIVE Causative
PASSIVE Passive

Sentence

Represents a sentence in the input document.

Fields
text

TextSpan

The sentence text.

sentiment

Sentiment

For calls to AnalyzeSentiment, or if AnnotateTextRequest.Features.extract_document_sentiment is set to true, this field will contain the sentiment for the sentence.

Sentiment

Represents the feeling associated with the entire text or entities in the text.

Fields
magnitude

float

A non-negative number in the [0, +inf) range, which represents the absolute magnitude of sentiment regardless of score (positive or negative).

score

float

Sentiment score between -1.0 (negative sentiment) and 1.0 (positive sentiment).
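Score and magnitude are usually read together: score gives the direction and magnitude gives the overall strength. The sketch below shows one possible reading, in which a near-zero score with a high magnitude is treated as mixed and a near-zero score with a low magnitude as neutral. The function name describe_sentiment and both cutoffs are hypothetical and would need tuning for real content.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Map a Sentiment's score and magnitude to a coarse label."""
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: a large magnitude suggests positive and negative
    # passages cancelling out, a small one suggests little emotion at all.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


for score, magnitude in [(0.8, 3.0), (-0.6, 4.0), (0.0, 4.0), (0.1, 0.0)]:
    print(score, magnitude, describe_sentiment(score, magnitude))
```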

TextSpan

Represents a text span in the input document.

Fields
content

string

The content of the text span, which is a substring of the document.

begin_offset

int32

The API calculates the beginning offset of the content in the original document according to the EncodingType specified in the API request.
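A TextSpan can be checked against the original document by slicing at begin_offset in the same encoding that was requested. The sketch below assumes the request used UTF8, so the offset counts bytes; the helper name span_matches and the sample spans are hypothetical and hand-written.

```python
def span_matches(document_text, span):
    """Check that span['content'] sits at span['begin_offset'] in the UTF-8 bytes.

    Returns None when the offset is negative, meaning it was not calculated.
    """
    offset = span["begin_offset"]
    if offset < 0:
        return None
    raw = document_text.encode("utf-8")
    expected = span["content"].encode("utf-8")
    return raw[offset:offset + len(expected)] == expected


text = "caf\u00e9 ok"
print(span_matches(text, {"content": "ok", "begin_offset": 6}))
print(span_matches(text, {"content": "ok", "begin_offset": -1}))
```

Slicing the Python string directly at a UTF8 offset would land on the wrong character once the text contains multi-byte characters, which is why the sketch slices the encoded bytes instead.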

Token

Represents the smallest syntactic building block of the text.

Fields
text

TextSpan

The token text.

part_of_speech

PartOfSpeech

Part of speech tag for this token.

dependency_edge

DependencyEdge

Dependency tree parse for this token.

lemma

string

Lemma of the token.
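As a small illustration of how the Token fields combine, the sketch below collects the lemmas of all tokens carrying a given part of speech tag. It works on plain dictionaries using the field names in this reference; the helper name lemmas_by_tag and the sample tokens are hypothetical and hand-written, not API output.

```python
def lemmas_by_tag(tokens, tag):
    """Return the lemma of every token whose part of speech tag equals `tag`."""
    return [t["lemma"] for t in tokens if t["part_of_speech"]["tag"] == tag]


tokens = [
    {"text": {"content": "Dogs", "begin_offset": 0},
     "part_of_speech": {"tag": "NOUN"}, "lemma": "dog"},
    {"text": {"content": "barked", "begin_offset": 5},
     "part_of_speech": {"tag": "VERB"}, "lemma": "bark"},
    {"text": {"content": ".", "begin_offset": 11},
     "part_of_speech": {"tag": "PUNCT"}, "lemma": "."},
]

print(lemmas_by_tag(tokens, "VERB"))
```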