Audience
This tutorial is intended to let you quickly start exploring and developing applications with the Cloud Natural Language API. It is designed for people familiar with basic programming, though even without much programming knowledge you should be able to follow along. Having walked through this tutorial, you should be able to use the Reference documentation to create your own basic applications.
This tutorial steps through a Natural Language application using Python code. The purpose here is not to explain the Python client libraries, but to explain how to make calls to the Natural Language API. Applications in Java and Node.js are essentially similar. Consult the Natural Language API Samples for samples in other languages (including the sample in this tutorial).
Prerequisites
This tutorial has several prerequisites:
- You've set up a Cloud Natural Language project in the Google Cloud console.
- You've set up your environment using Application Default Credentials in the Google Cloud console.
- You're familiar with basic Python programming.
- You've set up your Python development environment. We recommend that you have the latest version of Python, pip, and virtualenv installed on your system. For instructions, see the Python Development Environment Setup Guide for Google Cloud Platform.
- You've installed the Google Cloud Client Library for Python.
Overview
This tutorial walks you through a basic Natural Language application that uses classifyText requests to classify content into categories, each with a confidence score, such as:
category: "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons"
confidence: 0.6499999761581421
To see the list of all available category labels, see Categories.
In this tutorial, you will create an application to perform the following tasks:
- Classify multiple text files and write the result to an index file.
- Process input query text to find similar text files.
- Process input query category labels to find similar text files.
The tutorial uses content from Wikipedia. You could create a similar application to process news articles, online comments, and so on.
Source Files
You can find the tutorial source code in the Python Client Library Samples on GitHub.
This tutorial uses sample source text from Wikipedia. You can find the sample text files in the resources/texts folder of the GitHub project.
Importing libraries
To use the Cloud Natural Language API, you must import the language module from the google-cloud-language library. The language.types module contains classes that are required for creating requests. The language.enums module is used to specify the type of the input text. This tutorial classifies plain text content (language.enums.Document.Type.PLAIN_TEXT).

To calculate the similarity between texts based on their resulting content classification, this tutorial uses numpy for vector calculations.
Python
To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Python API reference documentation.
To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
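As a rough sketch, the imports for such a script could look like the following. This assumes the pre-2.0 google-cloud-language package, which exposes the language.types and language.enums modules described above:

# Standard-library modules used for reading files and storing the index.
import json
import os

# Client library for the Natural Language API (pre-2.0 interface with
# language.types and language.enums).
from google.cloud import language

# Used for the vector math in the similarity calculations.
import numpy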
Step 1. Classify content
You can use the Python client library to make a request to the Natural Language API to classify content. The Python client library encapsulates the details for requests to and responses from the Natural Language API.
The classify function in the tutorial calls the Natural Language API classifyText method, by first creating an instance of the LanguageServiceClient class, and then calling the classify_text method of the LanguageServiceClient instance.

The tutorial classify function only classifies text content for this example. You can also classify the content of a web page by passing in the source HTML of the web page as the text and by setting the type parameter to language.enums.Document.Type.HTML.
For more information, see Classifying Content. For details about the structure of requests to the Natural Language API, see the Natural Language Reference.
Python
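The following is a minimal sketch of such a classify function, assuming the pre-2.0 google-cloud-language interface. It is not the exact tutorial code, but it follows the steps described above:

def classify(text, verbose=True):
    """Classify the input text into categories."""
    language_client = language.LanguageServiceClient()

    document = language.types.Document(
        content=text,
        type=language.enums.Document.Type.PLAIN_TEXT)
    response = language_client.classify_text(document)

    # Convert the response into a dictionary of the form
    # {category name: confidence score}.
    result = {}
    for category in response.categories:
        result[category.name] = category.confidence

    if verbose:
        for name, confidence in result.items():
            print("category: {}".format(name))
            print("confidence: {}".format(confidence))

    return result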
The returned result is a dictionary with the category labels as keys, and confidence scores as values, such as:
{
"/Computers & Electronics": 0.800000011920929,
"/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421
}
The tutorial Python script is organized so that it can be run from the command line for quick experiments. For example, you can run:
python classify_text_tutorial.py classify "Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice. "
Step 2. Index multiple text files
The index function in the tutorial script takes, as input, a directory containing multiple text files, and the path to a file where it stores the indexed output (the default file name is index.json).

The index function reads the content of each text file in the input directory, and then passes the text files to the Cloud Natural Language API to be classified into content categories.
Python
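A minimal sketch of the index function might look like the following; it reuses the classify function from Step 1 and is an illustration rather than the exact sample code:

def index(path, index_file="index.json"):
    """Classify each text file in a directory and store the results."""
    result = {}
    for filename in os.listdir(path):
        file_path = os.path.join(path, filename)
        if not os.path.isfile(file_path):
            continue
        with open(file_path, "r") as f:
            text = f.read()
        # classify() is the function from Step 1.
        result[filename] = classify(text, verbose=False)

    # Serialize the combined results as JSON and write them to the index file.
    with open(index_file, "w") as f:
        json.dump(result, f)

    print("Texts indexed in file: {}".format(index_file))
    return result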
The results from the Cloud Natural Language API for each file are organized into a single dictionary, serialized as a JSON string, and then written to a file. For example:
{
"android.txt": {
"/Computers & Electronics": 0.800000011920929,
"/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421
},
"google.txt": {
"/Internet & Telecom": 0.5799999833106995,
"/Business & Industrial": 0.5400000214576721
}
}
To index text files from the command line with the default output filename index.json, run the following command:
python classify_text_tutorial.py index resources/texts
Step 3. Query the index
Query with category labels
Once the index file (default file name = index.json) has been created, we can make queries to the index to retrieve some of the filenames and their confidence scores.

One way to do this is to use a category label as the query, which the tutorial accomplishes with the query_category function. The implementation of the helper functions, such as similarity, can be found in the classify_text_tutorial.py file. In your applications, the similarity scoring and ranking should be carefully designed around specific use cases.
Python
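The following sketch shows one way the similarity helper and the query_category function could be implemented with numpy. The actual tutorial code may differ in detail (for example, in how hierarchical labels are split), so the similarity values it produces will not necessarily match the output shown below:

def split_labels(categories):
    """Flatten hierarchical labels, e.g. {"/a/b": 0.6} -> {"a": 0.6, "b": 0.6},
    so that partially overlapping categories still contribute to the score."""
    flattened = {}
    for name, confidence in categories.items():
        for label in name.split("/"):
            if label:
                flattened[label] = confidence
    return flattened


def similarity(categories1, categories2):
    """Cosine similarity between two sparse category vectors."""
    categories1 = split_labels(categories1)
    categories2 = split_labels(categories2)

    norm1 = numpy.linalg.norm(list(categories1.values()))
    norm2 = numpy.linalg.norm(list(categories2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0

    dot = sum(confidence * categories2.get(label, 0.0)
              for label, confidence in categories1.items())
    return dot / (norm1 * norm2)


def query_category(index_file, category_string, n_top=3):
    """Find the indexed files most similar to the query category label."""
    with open(index_file, "r") as f:
        index = json.load(f)

    # Treat the query label as a category with full confidence.
    query_categories = {category_string: 1.0}

    similarities = sorted(
        ((filename, similarity(query_categories, categories))
         for filename, categories in index.items()),
        key=lambda pair: pair[1], reverse=True)

    print("Query: {}\n".format(category_string))
    print("Most similar {} indexed texts:".format(n_top))
    for filename, sim in similarities[:n_top]:
        print("\tFilename: {}".format(filename))
        print("\tSimilarity: {}".format(sim))

    return similarities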
For a list of all of the available categories, see Categories.
As before, you can call the query_category function from the command line:
python classify_text_tutorial.py query-category index.json "/Internet & Telecom/Mobile & Wireless"
You should see output similar to the following:
Query: /Internet & Telecom/Mobile & Wireless
Most similar 3 indexed texts:
Filename: android.txt
Similarity: 0.665573579045
Filename: google.txt
Similarity: 0.517527175966
Filename: gcp.txt
Similarity: 0.5
Query with text
Alternatively, you can query with text that may not be part of the indexed text. The tutorial query function is similar to the query_category function, with the added step of making a classifyText request for the text input, and using the results to query the index file.
Python
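A corresponding sketch of the query function, reusing classify from Step 1 and the similarity helper above (again an illustration, not the exact sample code):

def query(index_file, text, n_top=3):
    """Find the indexed files most similar to the query text."""
    with open(index_file, "r") as f:
        index = json.load(f)

    # Classify the query text first, then compare it against the index.
    query_categories = classify(text, verbose=False)

    similarities = sorted(
        ((filename, similarity(query_categories, categories))
         for filename, categories in index.items()),
        key=lambda pair: pair[1], reverse=True)

    print("Query: {}\n".format(text))
    for name, confidence in query_categories.items():
        print("\tCategory: {}, confidence: {}".format(name, confidence))
    print("\nMost similar {} indexed texts:".format(n_top))
    for filename, sim in similarities[:n_top]:
        print("\tFilename: {}".format(filename))
        print("\tSimilarity: {}".format(sim))

    return similarities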
To do this from the command line, run:
python classify_text_tutorial.py query index.json "Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice. "
This prints something similar to the following:
Query: Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice.
Category: /Internet & Telecom, confidence: 0.509999990463
Category: /Computers & Electronics/Software, confidence: 0.550000011921
Most similar 3 indexed texts:
Filename: android.txt
Similarity: 0.600579500049
Filename: google.txt
Similarity: 0.401314790229
Filename: gcp.txt
Similarity: 0.38772339779
What's next
With the content classification API you can create other applications. For example:
- Classify every paragraph in an article to see the transition between topics.
- Classify timestamped content and analyze the trend of topics over time.
- Compare content categories with content sentiment using the analyzeSentiment method.
- Compare content categories with entities mentioned in the text.
Additionally, other Google Cloud Platform products can be used to streamline your workflow:
- In the sample application for this tutorial, we processed local text files, but you can modify the code to process text files stored in a Google Cloud Storage bucket by passing a Google Cloud Storage URI to the classify_text method, as shown in the sketch after this list.
- In the sample application for this tutorial, we stored the index file locally, and each query is processed by reading through the whole index file. This means high latency if you have a large amount of indexed data or if you need to process numerous queries. Datastore is a natural and convenient choice for storing the index data.