Wrappers for Document AI Document type.
Classes
Document
Document(
shards: List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: Optional[str] = None,
gcs_prefix: Optional[str] = None,
gcs_input_uri: Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexity of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods, and provides convenience methods for searching and extracting information within the Document.
Optional. The name of the GCS bucket.
Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name=bucket.
:type: Optional[str]
(List[Entity]): A list of Entities in the Document.
Modules Functions
_convert_to_vision_annotate_file_response
_convert_to_vision_annotate_file_response(
text: str, pages: List[google.cloud.documentai_toolbox.wrappers.page.Page]
)
Converts OCR data from Document.proto to AnnotateFileResponse.proto for the Vision API.
Parameters | |
---|---|
Name | Description |
text | str. Required. Contents of document. |
pages | List[Page]. Required. A list of pages. |
Returns | |
---|---|
Type | Description |
AnnotateFileResponse | Proto with TextAnnotations. |
_entities_from_shards
_entities_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Entities from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]. Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Entity] | A list of Entities. |
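
As a rough illustration of what collecting entities across shards might look like, here is a minimal sketch. It does not use the real library types: `FakeEntity`, `FakeShard`, and `entities_from_shards` are hypothetical stand-ins, and the sketch assumes that entities are gathered in shard order. The actual function may wrap or transform each entity differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeEntity:
    """Hypothetical stand-in for an entity found in a Document shard."""

    type_: str
    mention_text: str


@dataclass
class FakeShard:
    """Hypothetical stand-in for documentai.Document; only `entities` is modeled."""

    entities: List[FakeEntity] = field(default_factory=list)


def entities_from_shards(shards: List[FakeShard]) -> List[FakeEntity]:
    """Collect the entities of every shard into one flat list, in shard order."""
    return [entity for shard in shards for entity in shard.entities]


shards = [
    FakeShard([FakeEntity("invoice_id", "INV-001")]),
    FakeShard([FakeEntity("total_amount", "42.00"), FakeEntity("currency", "USD")]),
]
entities = entities_from_shards(shards)
print([entity.type_ for entity in entities])
```
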
_get_batch_process_metadata
_get_batch_process_metadata(location: str, operation_name: str)
Gets BatchProcessMetadata from a batch_process_documents() long-running operation.
Parameters | |
---|---|
Name | Description |
location | str. Required. The location of the processor used for |
operation_name | str. Required. The fully qualified operation name for a |
Returns | |
---|---|
Type | Description |
documentai.BatchProcessMetadata | Metadata from batch process. |
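
The parameter descriptions above are cut off, so the exact expected format is not stated here. As an assumption, long-running operation names in Google Cloud APIs commonly follow the pattern `projects/{project}/locations/{location}/operations/{operation_id}`. The sketch below checks a name against that assumed pattern before it is passed on; `parse_operation_name` is a hypothetical helper, not part of the library.

```python
import re
from typing import Dict

# Assumed pattern for a fully qualified operation name; verify it against the
# names returned by your own batch_process_documents() calls.
_OPERATION_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/operations/(?P<operation_id>[^/]+)$"
)


def parse_operation_name(operation_name: str) -> Dict[str, str]:
    """Split an operation name into its project, location, and operation ID."""
    match = _OPERATION_NAME_RE.match(operation_name)
    if match is None:
        raise ValueError(f"Unexpected operation name format: {operation_name!r}")
    return match.groupdict()


parts = parse_operation_name("projects/123/locations/us/operations/456")
print(parts["location"])
```

A check like this can catch a mismatch between the `location` argument and the location embedded in the operation name before any request is made.
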
_get_shards
_get_shards(gcs_bucket_name: str, gcs_prefix: str)
Returns a list of documentai.Document shards from a Cloud Storage folder.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name | str. Required. The name of the GCS bucket. Format: |
gcs_prefix | str. Required. The prefix of the JSON files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
List[google.cloud.documentai.Document] | A list of documentai.Documents. |
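
The two arguments correspond to the parts of a `gs://` URI such as `gs://{bucket_name}/{optional_folder}/{target_folder}/`. Here is a minimal sketch of splitting such a URI into a bucket name and a prefix, assuming the prefix is everything after the first slash that follows the bucket name. `split_gcs_uri` is a hypothetical helper, and the bucket and folder names in the example are made up.

```python
from typing import Tuple


def split_gcs_uri(gcs_uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, prefix)."""
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the prefix.
    bucket_name, _, prefix = gcs_uri[len(scheme):].partition("/")
    if not bucket_name:
        raise ValueError(f"Missing bucket name in: {gcs_uri!r}")
    return bucket_name, prefix


bucket_name, prefix = split_gcs_uri("gs://my-bucket/outputs/batch-1/")
print(bucket_name, prefix)
```
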
_pages_from_shards
_pages_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Pages from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]. Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Page] | A list of Pages. |
_text_from_shards
_text_from_shards(shards: List[google.cloud.documentai_v1.types.document.Document])
Gets text from shards.
Parameter | |
---|---|
Name | Description |
shards | List[google.cloud.documentai.Document]. Required. List of document shards. |
Returns | |
---|---|
Type | Description |
str | Text in all shards. |
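
One plausible reading of "text in all shards" is a simple concatenation of each shard's text in order. The sketch below shows that reading with a hypothetical `FakeShard` stand-in in place of documentai.Document. It is an assumption, not the library's implementation, which may treat shards that repeat the same text differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeShard:
    """Hypothetical stand-in for documentai.Document; only `text` is modeled."""

    text: str = ""


def text_from_shards(shards: List[FakeShard]) -> str:
    """Concatenate the text of each shard in the order given (assumed behavior)."""
    return "".join(shard.text for shard in shards)


full_text = text_from_shards([FakeShard("Invoice 001\n"), FakeShard("Total: 42.00\n")])
print(full_text)
```
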