Wrappers for the Document AI Document type.
Classes
Document
Document(
shards: List[google.cloud.documentai_v1.types.document.Document],
gcs_bucket_name: Optional[str] = None,
gcs_prefix: Optional[str] = None,
)
Represents a wrapped Document.
This class hides the complexity of working with the Document protobuf response returned by the BatchProcessDocuments or ProcessDocument methods, and provides convenience methods for searching and extracting information within the Document.
(Optional[str]): Optional. The name of the gcs bucket. Format: gs://{bucket_name}/{optional_folder}/{target_folder}/ where gcs_bucket_name=bucket.
(List[Entity]): A list of Entities in the Document.
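The bucket format shown for this class can be taken apart with ordinary string handling. The following is a minimal sketch, not toolbox code: `split_gcs_uri` is a hypothetical helper, and treating everything after the bucket name as the prefix is an assumption made for illustration.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://{bucket_name}/{optional_folder}/{target_folder}/ into parts.

    Hypothetical helper for illustration; not part of the toolbox.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Expected a URI starting with {scheme!r}, got {uri!r}")
    # Everything up to the first "/" after the scheme is the bucket name.
    # The remainder (possibly empty) is treated as the prefix (an assumption).
    bucket_name, _, prefix = uri[len(scheme):].partition("/")
    if not bucket_name:
        raise ValueError("Missing bucket name")
    return bucket_name, prefix


bucket, prefix = split_gcs_uri("gs://bucket/optional_folder/target_folder/")
print(bucket)  # bucket
print(prefix)  # optional_folder/target_folder/
```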
Modules Functions
_convert_to_vision_annotate_file_response
_convert_to_vision_annotate_file_response(
text: str, pages: List[google.cloud.documentai_toolbox.wrappers.page.Page]
)
Converts OCR data from Document.proto to AnnotateFileResponse.proto for the Vision API.
Parameters | |
---|---|
Name | Description |
text | (str): Required. Contents of document. |
pages | (List[Page]): Required. A list of pages. |
Returns | |
---|---|
Type | Description |
AnnotateFileResponse | Proto with TextAnnotations. |
_entities_from_shards
_entities_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Entities from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | (List[google.cloud.documentai.Document]): Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Entity] | A list of Entities. |
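Collecting entities from several shards amounts to flattening one list per shard into a single list. The sketch below shows that idea only. `FakeEntity` and `FakeShard` are stand-in dataclasses invented here so the example runs without the client library, and preserving shard order is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeEntity:
    # Stand-in for a Document AI entity; field names are illustrative.
    type_: str
    mention_text: str


@dataclass
class FakeShard:
    # Stand-in for one documentai.Document shard.
    entities: List[FakeEntity] = field(default_factory=list)


def entities_from_shards(shards: List[FakeShard]) -> List[FakeEntity]:
    """Collect the entities of every shard into one flat list, in shard order."""
    return [entity for shard in shards for entity in shard.entities]


shards = [
    FakeShard([FakeEntity("invoice_id", "A-1")]),
    FakeShard([FakeEntity("total", "9.50")]),
]
print([e.type_ for e in entities_from_shards(shards)])  # ['invoice_id', 'total']
```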
_get_bytes
_get_bytes(gcs_bucket_name: str, gcs_prefix: str)
Returns the contents of the JSON files in Cloud Storage as a list of bytes objects.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name | (str): Required. The name of the gcs bucket. Format: |
gcs_prefix | (str): Required. The prefix of the json files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
List[bytes] | A list of bytes. |
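One way to picture this step is as a filter over the objects in a bucket followed by a download of each match. The sketch below is written against a small assumed interface (`BlobLike`) instead of a real Cloud Storage client, so it runs offline. `json_bytes_from_blobs` and `InMemoryBlob` are hypothetical names, and filtering on the `.json` suffix is an assumption drawn from the description of the prefix as pointing at json files.

```python
from typing import Iterable, List, Protocol


class BlobLike(Protocol):
    # Minimal interface assumed here: a name and a way to read the contents.
    name: str

    def download_as_bytes(self) -> bytes: ...


def json_bytes_from_blobs(blobs: Iterable[BlobLike], prefix: str) -> List[bytes]:
    """Return the contents of every .json object whose name starts with prefix."""
    return [
        blob.download_as_bytes()
        for blob in blobs
        if blob.name.startswith(prefix) and blob.name.endswith(".json")
    ]


class InMemoryBlob:
    """Fake blob so the sketch runs without Cloud Storage access."""

    def __init__(self, name: str, data: bytes) -> None:
        self.name = name
        self._data = data

    def download_as_bytes(self) -> bytes:
        return self._data


blobs = [
    InMemoryBlob("out/0/doc-0.json", b'{"text": "a"}'),
    InMemoryBlob("out/0/notes.txt", b"ignore me"),
    InMemoryBlob("other/doc-0.json", b'{"text": "x"}'),
]
print(json_bytes_from_blobs(blobs, "out/0"))  # [b'{"text": "a"}']
```

Blobs returned by the google-cloud-storage client expose a `name` attribute and a `download_as_bytes()` method, so they should fit the same interface.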
_get_shards
_get_shards(gcs_bucket_name: str, gcs_prefix: str)
Returns a list of documentai.Document shards from a Cloud Storage folder.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name | (str): Required. The name of the gcs bucket. Format: |
gcs_prefix | (str): Required. The prefix of the json files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
List[google.cloud.documentai.Document] | A list of documentai.Documents. |
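Turning downloaded JSON payloads into shards is essentially one parse per file. The sketch below uses plain dictionaries as stand-ins for documentai.Document objects so that it runs with the standard library alone. `shards_from_json_bytes` is a hypothetical name, and the payloads are made-up examples.

```python
import json
from typing import Any, Dict, List


def shards_from_json_bytes(byte_list: List[bytes]) -> List[Dict[str, Any]]:
    """Decode each JSON payload into a dict, one dict per shard."""
    # json.loads accepts bytes directly and detects the UTF encoding.
    return [json.loads(raw) for raw in byte_list]


payloads = [b'{"text": "Hello,"}', b'{"text": "world"}']
shards = shards_from_json_bytes(payloads)
print(len(shards))        # 2
print(shards[0]["text"])  # Hello,
```

With the real client library, a proto-aware parser such as the proto-plus `Document.from_json` class method would likely take the place of `json.loads`. That substitution is an assumption about the implementation, not something this page states.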
_get_storage_client
_get_storage_client()
Returns a Storage client with a custom user agent header.
_pages_from_shards
_pages_from_shards(
shards: List[google.cloud.documentai_v1.types.document.Document],
)
Returns a list of Pages from a list of documentai.Document shards.
Parameter | |
---|---|
Name | Description |
shards | (List[google.cloud.documentai.Document]): Required. List of document shards. |
Returns | |
---|---|
Type | Description |
List[Page] | A list of Pages. |
_text_from_shards
_text_from_shards(shards: List[google.cloud.documentai_v1.types.document.Document])
Gets text from shards.
Parameter | |
---|---|
Name | Description |
shards | (List[google.cloud.documentai.Document]): Required. List of document shards. |
Returns | |
---|---|
Type | Description |
str | Text in all shards. |
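Gathering the text of all shards can be sketched as a simple concatenation. This example assumes that each shard carries only its own slice of the document text and that shards arrive in reading order. Neither assumption is confirmed by this page, and `FakeShard` is a stand-in dataclass, not the real documentai.Document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeShard:
    # Stand-in for one documentai.Document shard; only the text is modeled.
    text: str = ""


def text_from_shards(shards: List[FakeShard]) -> str:
    """Join the text of every shard, assuming each holds a distinct slice."""
    return "".join(shard.text for shard in shards)


shards = [FakeShard("Invoice 42\n"), FakeShard("Total: 9.50\n")]
print(repr(text_from_shards(shards)))  # 'Invoice 42\nTotal: 9.50\n'
```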