Document AI utilities.
Functions
create_batches
create_batches(
gcs_bucket_name: str, gcs_prefix: str, batch_size: Optional[int] = 50
)
Create batches of documents in Cloud Storage to process with batch_process_documents().
Parameters | |
---|---|
Name | Description |
gcs_bucket_name (str) | Required. The name of the Cloud Storage bucket. Format: |
gcs_prefix (str) | Required. The prefix of the JSON files in the |
batch_size (Optional[int]) | Optional. Size of each batch of documents. Default is 50. |
Returns | |
---|---|
Type | Description |
List[documentai.BatchDocumentsInputConfig] | A list of BatchDocumentsInputConfig, each corresponding to one batch. |
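The batching behavior described above can be pictured with a small standalone sketch. It is not the library's implementation: `chunk_uris` is a hypothetical helper, the bucket and file names are made up, and the sketch only assumes that documents are grouped into consecutive batches of at most `batch_size` items.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int = 50) -> List[List[str]]:
    """Split a flat list of URIs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Hypothetical bucket and file names, used only for illustration.
uris = [f"gs://example-bucket/invoices/doc-{n}.pdf" for n in range(120)]
batches = chunk_uris(uris, batch_size=50)

# 120 documents with batch_size=50 give two full batches and one partial batch.
print([len(batch) for batch in batches])  # [50, 50, 20]
```

In the sketch each inner list plays the role that one BatchDocumentsInputConfig plays in the real return value: one entry per batch.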
list_gcs_document_tree
list_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)
Returns the paths to the files in a Cloud Storage folder and prints the tree to the terminal.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name (str) | Required. The name of the Cloud Storage bucket. Format: |
gcs_prefix (str) | Required. The prefix of the JSON files in the target_folder. Format: |
Returns | |
---|---|
Type | Description |
Dict[str, List[str]] | The paths to documents in gs://{gcs_bucket_name}/{gcs_prefix}. |
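One plausible reading of the Dict[str, List[str]] return type is a mapping from each directory to the file names it contains. The sketch below builds such a mapping from a flat list of object names. It is an assumption about the shape of the result, not the library's code: `group_by_directory` is a hypothetical helper and the object names are invented.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_directory(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group object names by their parent "directory" (assumed key layout)."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        directory, filename = posixpath.split(name)
        # Names ending in "/" are folder placeholders and carry no file name.
        if filename:
            tree[directory].append(filename)
    return dict(tree)


# Hypothetical object names under a prefix such as "documents/".
names = [
    "documents/",
    "documents/a.json",
    "documents/b.json",
    "documents/archive/c.json",
]
print(group_by_directory(names))
```

Cloud Storage has a flat namespace, so the "directories" here are derived purely from the slashes in each object name.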
print_gcs_document_tree
print_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str)
Prints a tree of filenames in a Cloud Storage folder.
Parameters | |
---|---|
Name | Description |
gcs_bucket_name (str) | Required. The name of the Cloud Storage bucket. Format: |
gcs_prefix (str) | Required. The prefix of the JSON files in the target_folder. Format: |
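To give a feel for what a printed tree of filenames might look like, here is a minimal standalone sketch that renders a directory-to-files mapping as text. The layout (box-drawing connectors, one directory per heading line) is an assumption for illustration, and `format_tree` is a hypothetical helper, not part of the library.

```python
from typing import Dict, List


def format_tree(tree: Dict[str, List[str]]) -> str:
    """Render a {directory: [filenames]} mapping as an indented text tree."""
    lines: List[str] = []
    for directory in sorted(tree):
        lines.append(directory)
        files = tree[directory]
        for index, filename in enumerate(files):
            # The last file in a directory gets a closing connector.
            connector = "└──" if index == len(files) - 1 else "├──"
            lines.append(f"{connector}{filename}")
    return "\n".join(lines)


# Hypothetical directory contents, used only for illustration.
sample = {
    "documents": ["a.json", "b.json"],
    "documents/archive": ["c.json"],
}
print(format_tree(sample))
```

Returning the rendered string and printing it separately keeps the formatting logic easy to check without capturing terminal output.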