Wrappers for Document AI Document type.
Classes
Document
Document(
shards: List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: Optional[str] = None,
gcs_prefix: Optional[str] = None,
gcs_input_uri: Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexities of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods and implements convenience methods for searching and extracting information within the Document.
gcs_bucket_name (Optional[str]): Optional. The name of the GCS bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name=bucket.
(List[Entity]): A list of Entities in the Document.
Functions
_bigquery_column_name
_bigquery_column_name(input_string: str)
Converts a string into a BigQuery column name. See https://cloud.google.com/bigquery/docs/schemas#column_names
Parameter | |
---|---|
Name | Description |
input_string | str: Required. The string to convert. |
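As a rough illustration of what such a conversion involves, the sketch below sanitizes a string under the classic BigQuery column-name rules (letters, digits, and underscores only; first character a letter or underscore; at most 300 characters). The helper name `to_bigquery_column_name` and its replacement policy are assumptions made for illustration and are not the toolbox's actual implementation.

```python
import re

MAX_COLUMN_NAME_LENGTH = 300  # assumed limit, per the classic column-name rules


def to_bigquery_column_name(input_string: str) -> str:
    """Hypothetical helper: sanitize a string into a column-name-safe form."""
    # Collapse each run of disallowed characters into a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", input_string.strip())
    # A column name may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    return name.lower()[:MAX_COLUMN_NAME_LENGTH]


print(to_bigquery_column_name("Invoice Date"))
print(to_bigquery_column_name("2nd line #"))
```

Lowercasing is a stylistic choice in this sketch; it keeps the generated names uniform regardless of how the source field was capitalized.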
_convert_to_vision_annotate_file_response
_convert_to_vision_annotate_file_response(
text: str, pages: List[google.cloud.documentai_toolbox.wrappers.page.Page]
)
Converts OCR data from Document.proto to AnnotateFileResponse.proto for the Vision API.
Parameters | |
---|---|
Name | Description |
text | str: Required. Contents of document. |
pages | List[Page]: Required. A list of pages. |
Returns | |
---|---|
Type | Description |
AnnotateFileResponse | Proto with TextAnnotations. |
_dict_to_bigquery
_dict_to_bigquery(
dic: Dict, dataset_name: str, table_name: str, project_id: Optional[str]
)
Loads a dictionary into a BigQuery table.
Parameters | |
---|---|
Name | Description |
dic | Dict: Required. The dictionary to insert. |
dataset_name | str: Required. Name of the BigQuery dataset. |
table_name | str: Required. Name of the BigQuery table. |
project_id | Optional[str]: Optional. Project ID containing the BigQuery table. If not passed, falls back to the default inferred from the environment. |
Returns | |
---|---|
Type | Description |
bigquery.job.LoadJob | The BigQuery LoadJob for adding the dictionary. |
_entities_from_shards
_entities_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Entities from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]: Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Entity] | A list of Entities. |
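The idea of gathering entities across shards can be sketched as a simple flattening step. The `FakeEntity` and `FakeShard` classes below are hypothetical stand-ins for the real protobuf and wrapper types, and the function is an assumption about the general approach, not the library's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeEntity:
    """Hypothetical stand-in for a Document AI entity."""
    type_: str
    mention_text: str


@dataclass
class FakeShard:
    """Hypothetical stand-in for one documentai.Document shard."""
    entities: List[FakeEntity] = field(default_factory=list)


def entities_from_shards(shards: List[FakeShard]) -> List[FakeEntity]:
    # Walk the shards in order and collect every entity into one flat list.
    result: List[FakeEntity] = []
    for shard in shards:
        result.extend(shard.entities)
    return result


example_shards = [
    FakeShard([FakeEntity("invoice_id", "A-17")]),
    FakeShard([FakeEntity("total", "42.00"), FakeEntity("currency", "USD")]),
]
print([e.type_ for e in entities_from_shards(example_shards)])
```

Preserving shard order keeps the entities in the order they appear in the original document, assuming the shards themselves are listed in order.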
_get_batch_process_metadata
_get_batch_process_metadata(location: str, operation_name: str)
Gets BatchProcessMetadata from a batch_process_documents() long-running operation.
Parameters | |
---|---|
Name | Description |
location | str: Required. The location of the processor used for |
operation_name | str: Required. The fully qualified operation name for a |
Returns | |
---|---|
Type | Description |
documentai.BatchProcessMetadata | Metadata from batch process. |
_get_shards
_get_shards(gcs_bucket_name: str, gcs_prefix: str)
Returns a list of documentai.Document shards from a Cloud Storage folder.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name | str: Required. The name of the gcs bucket. Format: |
gcs_prefix | str: Required. The prefix of the json files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
List[google.cloud.documentai.Document] | A list of documentai.Documents. |
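Because this function takes the bucket name and prefix separately, callers holding a full URI in the gs://{bucket_name}/{optional_folder}/{target_folder}/ form need to split it first. The helper below is a minimal sketch of that split; `split_gcs_uri` is a hypothetical name that is not part of this module, and the prefix layout it returns is an assumption.

```python
from typing import Tuple


def split_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split a gs:// URI into (bucket name, prefix)."""
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    # Everything up to the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = gcs_uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in URI: {gcs_uri!r}")
    return bucket, prefix


print(split_gcs_uri("gs://bucket/optional_folder/target_folder/"))
```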
_insert_into_dictionary_with_list
_insert_into_dictionary_with_list(dic: Dict, key: str, value: str)
Inserts a value into a dictionary that can contain lists.
Parameters | |
---|---|
Name | Description |
dic | Dict: Required. The dictionary to insert into. |
key | str: Required. The key to be created or inserted into. |
value | str: Required. The value to be inserted. |
Returns | |
---|---|
Type | Description |
Dict | The dictionary after adding the key-value pair. |
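One plausible reading of "a dictionary that can contain lists" is that a repeated key promotes its value to a list instead of overwriting it. The sketch below shows that behavior under that assumption; `insert_with_list` is a hypothetical name and may differ from the library's actual logic.

```python
from typing import Any, Dict


def insert_with_list(dic: Dict[str, Any], key: str, value: str) -> Dict[str, Any]:
    """Hypothetical helper: keep all values seen for a repeated key."""
    if key not in dic:
        # First occurrence: store the bare value.
        dic[key] = value
    elif isinstance(dic[key], list):
        # Already a list: append to it.
        dic[key].append(value)
    else:
        # Second occurrence: promote the existing value to a list.
        dic[key] = [dic[key], value]
    return dic


row: Dict[str, Any] = {}
insert_with_list(row, "line_item", "pen")
insert_with_list(row, "line_item", "paper")
insert_with_list(row, "total", "12.50")
print(row)
```

This shape is convenient when a document has several entities of the same type, since none of them is silently dropped.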
_pages_from_shards
_pages_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Pages from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]: Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Page] | A list of Pages. |
_text_from_shards
_text_from_shards(shards: List[google.cloud.documentai_v1.types.document.Document])
Gets text from shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]: Required. List of document shards. |
Returns | |
---|---|
Type | Description |
str | Text in all shards. |
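A minimal sketch of combining shard text follows, assuming each shard carries only its own slice of the document text so that plain concatenation in shard order reconstructs the whole. Both that assumption and the `TextShard` stand-in class are hypothetical; the real function may handle shards differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextShard:
    """Hypothetical stand-in for a documentai.Document shard with a text field."""
    text: str


def text_from_shards(shards: List[TextShard]) -> str:
    # Assumption: shards do not overlap, so joining them in order is sufficient.
    return "".join(shard.text for shard in shards)


text_shards = [TextShard("Invoice 17\n"), TextShard("Total: 42.00\n")]
print(text_from_shards(text_shards))
```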