This page documents production updates to Document AI. We recommend that Document AI developers periodically check this list for any new announcements.
October 22, 2024
The Document AI section of the Google Cloud console now allows you to configure property descriptions as part of the Custom extractor processor-creation process.
Property description allows you to provide additional context, insights, and prior knowledge for each entity to improve extraction accuracy.
Property descriptions can be edited after schema creation. After you update the property descriptions, you will need to either call the pretrained models or create or fine-tune a new processor version for the changes to take effect.
October 01, 2024
Custom Extractor pretrained-foundation-model-v1.2-2024-05-10 and pretrained-foundation-model-v1.3-2024-08-31 are now Stable versions.
v1.2 and v1.3 now have the following features:
- Fine-tuning is now available in Public preview.
- They were internally upgraded to a higher quality model.
- The labeling system has been upgraded to use the latest version of the OCR model.
v1.2 is recommended for the best quality. v1.3 is recommended for the lowest latency.
We recommend creating a new processor and relabeling the training and evaluation documents to benefit from both the improved quality with the new processor versions of Custom Extractor (v1.2 and v1.3) and the enhanced labeling system.
September 26, 2024
Effective April 9, 2025, the following Custom Extractor versions will no longer be accessible:
pretrained-foundation-model-v1.0-2023-08-22
pretrained-foundation-model-v1.1-2024-03-12
To avoid service disruptions, you will need to migrate to a later version, such as pretrained-foundation-model-v1.2-2024-05-10 or pretrained-foundation-model-v1.3-2024-08-31. These versions provide improved quality from the latest proprietary vision models and foundation models.
We understand that this update requires planning, but we're here to support you during this process. If you have questions or need assistance, contact Google Cloud support.
The following earlier versions of Document AI Enterprise Document Optical Character Recognition (OCR) and Expense Parser will be discontinued in the United States (US) and European Union (EU) starting April 30, 2025.
Enterprise Document OCR:
pretrained-ocr-v1.0-2020-09-23
pretrained-ocr-v1.1-2022-09-12
Expense Parser:
pretrained-expense-v1.2-2022-02-18
pretrained-expense-v1.3-2022-07-15
pretrained-expense-v1.4-2022-11-18
To ensure uninterrupted service and benefit from improved extraction quality, we recommend you migrate to the following later versions before April 30, 2025:
Enterprise Document OCR (US and EU):
- Migrate to the latest version of OCR processors: pretrained-ocr-v2.0-2023-06-02 or pretrained-ocr-v2.1-2024-08-07.
Expense Parser (US and EU):
- Migrate to the later versions of Expense Parser: pretrained-expense-v1.3.2-2024-09-11 or pretrained-expense-v1.4.2-2024-09-12.
- Or migrate to the latest versions of Custom Extractor: pretrained-foundation-model-v1.2-2024-05-10 or pretrained-foundation-model-v1.3-2024-08-31.
To learn more about the migration process, refer to our Manage processor versions documentation.
If you have any questions or require assistance, contact us at Google Cloud support.
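The migration paths listed in this entry can be captured in a small lookup table. The following is a minimal sketch, not an official tool: the version IDs are copied from the announcement above, while the helper name suggest_replacements is hypothetical.

```python
from typing import Dict, List

# Replacement targets named in the announcement above.
_OCR = ["pretrained-ocr-v2.0-2023-06-02", "pretrained-ocr-v2.1-2024-08-07"]
_EXPENSE = ["pretrained-expense-v1.3.2-2024-09-11", "pretrained-expense-v1.4.2-2024-09-12"]
_CUSTOM = [
    "pretrained-foundation-model-v1.2-2024-05-10",
    "pretrained-foundation-model-v1.3-2024-08-31",
]

# Discontinued version -> suggested later versions, in the order listed above.
REPLACEMENTS: Dict[str, List[str]] = {
    "pretrained-foundation-model-v1.0-2023-08-22": _CUSTOM,
    "pretrained-foundation-model-v1.1-2024-03-12": _CUSTOM,
    "pretrained-ocr-v1.0-2020-09-23": _OCR,
    "pretrained-ocr-v1.1-2022-09-12": _OCR,
    # Expense Parser users can also move to Custom Extractor.
    "pretrained-expense-v1.2-2022-02-18": _EXPENSE + _CUSTOM,
    "pretrained-expense-v1.3-2022-07-15": _EXPENSE + _CUSTOM,
    "pretrained-expense-v1.4-2022-11-18": _EXPENSE + _CUSTOM,
}


def suggest_replacements(version_id: str) -> List[str]:
    """Return suggested later versions, or an empty list if none are needed."""
    return list(REPLACEMENTS.get(version_id, []))
```

A script could run this check over the processor versions used in each environment to find workloads that still need to migrate before the cutoff dates.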
September 23, 2024
Models pretrained-expense-v1.3.2-2024-09-11 and pretrained-expense-v1.4.2-2024-09-12 are available as Release Candidates (RC) for Expense Parser. They are upgrades over v1.3 and v1.4 with an enhanced underlying vision model.
For more information about available models, see Expense parser processor versions.
September 20, 2024
Custom extractor now features property descriptions.
Property description allows you to provide additional context, insights, and prior knowledge for each entity to improve extraction accuracy.
Good examples of property descriptions include location information and text patterns of the property values. These help disambiguate potential sources of confusion in the document and guide the model with rules that make extractions more reliable and consistent, regardless of the specific document structure or content variations.
August 23, 2024
Model pretrained-foundation-model-v1.3-2024-08-31 is available as a Release Candidate (RC) for custom extractor. Recommended for those who want the lowest latency and best speed.
For more information about available models, see Custom extractor model versions.
Model pretrained-ocr-v2.1-2024-08-07 is available as an RC version of the Document AI OCR 2.1 processor. It has three key improvements:
- Better printed text recognition.
- More precise checkbox detection.
- More accurate reading order.
August 21, 2024
Date and Currency Normalization for custom extractor
With this release, the model deduces region information from the document and uses it to disambiguate date and currency formats:
- This release enables region-based date and currency normalization for entities with datetime and currency data types in Custom Document Extractor (CDE) generative AI-based processor versions v1.1 and v1.2.
- Currently, the CDE generative AI-based processor supports date and currency normalization, but it defaults to the US date format and USD, respectively, when values are ambiguous. In other words, if a date can be parsed in both mm/dd/yyyy and dd/mm/yyyy formats, the mm/dd/yyyy format is used for normalization. Similarly, if $ can mean USD or CAD, it defaults to USD.
For more information, go to the Entity Normalization page.
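To illustrate the ambiguity described above, here is a minimal sketch of region-aware date parsing. It is an illustration only and not how the processor is implemented: the function normalize_date and the DAY_FIRST_REGIONS set are hypothetical, and the region codes are example values.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical example set of regions that write dates day-first.
DAY_FIRST_REGIONS = {"GB", "FR", "DE", "IN"}


def normalize_date(text: str, region: Optional[str] = None) -> date:
    """Parse an nn/nn/yyyy string, using a region hint when the order is ambiguous.

    Without a region hint, the US month-first order is tried first, which
    mirrors the default behavior described in the release note.
    """
    formats = ["%m/%d/%Y", "%d/%m/%Y"]
    if region in DAY_FIRST_REGIONS:
        formats.reverse()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # Not valid in this order; try the other one.
    raise ValueError(f"unrecognized date: {text!r}")


print(normalize_date("03/04/2024"))        # month-first default
print(normalize_date("03/04/2024", "GB"))  # day-first with a region hint
```

The same string yields March 4 by default and April 3 with a day-first region hint, which is the kind of difference that region deduction is meant to resolve.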
July 18, 2024
For custom extractor with generative AI, model pretrained-foundation-model-v1.1-2024-03-12 provides fine-tuning for US/EU in Public preview. For more information about custom extractor models, see Custom extractor model versions.
June 04, 2024
Layout Parser in Document AI is generally available. The Document AI Layout Parser transforms documents in various formats into structured representations. It makes content like paragraphs, tables, lists, and structural elements like headings, page headers, and footers easily accessible. It also creates context-aware chunks that facilitate information retrieval in a range of generative AI and discovery applications.
For more information, see Process documents with Layout Parser.
May 28, 2024
Model pretrained-foundation-model-v1.2-2024-05-10 is available for custom extractor. Recommended for using the largest supported token limits, employing the best quality in identifying entities, or experimenting with newer models.
For more information about available models, see Custom extractor model versions.
May 06, 2024
Batch processing with Layout Parser is available. For more about Layout Parser, see Process documents with Layout Parser.
Model pretrained-foundation-model-v1.1-2024-03-12 is available for custom extractor. For more information about available models, see Custom extractor model versions.
May 01, 2024
Online processing is available for Layout Parser in Document AI. The Document AI Layout Parser transforms documents in various formats into structured representations, making content like paragraphs, tables, lists, and structural elements like headings, page headers, and footers easily accessible, and creating context-aware chunks that facilitate information retrieval in a range of generative AI and discovery applications. For more information, see Process documents with Layout Parser.
April 02, 2024
Fine tuning generative AI models within the Custom Extractor is now supported in GA. For more information, see custom processors and fine tuning pricing.
February 29, 2024
The Custom Extractor supports three levels of nesting so you can easily extract structured data from complex documents and tables (earnings reports, tax forms, invoices, resumes, etc.). Learn how to use three levels of nesting.
The Custom Extractor with generative AI is now available in the asia-southeast1 (Singapore) region. For more information, see Custom processors.
See the model type, generative or custom, powering a Custom Extractor processor version by getting the model type from the processorVersions API.
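As a rough sketch of reading the model type, the helpers below build the REST resource URL for a processor version and inspect a response that has already been parsed into a dictionary. The field name modelType and the value MODEL_TYPE_GENERATIVE reflect my reading of the v1 REST reference and should be verified there; the helper names and the sample IDs are hypothetical.

```python
def processor_version_url(project: str, location: str, processor: str, version: str) -> str:
    """Build the v1 REST URL for a processorVersions.get call."""
    return (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"processors/{processor}/processorVersions/{version}"
    )


def is_generative(processor_version: dict) -> bool:
    """Return True if the (assumed) modelType field reports a generative model."""
    return processor_version.get("modelType") == "MODEL_TYPE_GENERATIVE"


# Example with a hand-written response body (not real API output).
sample = {"displayName": "my-version", "modelType": "MODEL_TYPE_GENERATIVE"}
print(processor_version_url("my-project", "us", "abc123", "my-version-id"))
print(is_generative(sample))
```

Sending the GET request itself requires an authenticated HTTP client, which is left out here.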
February 16, 2024
Enterprise Document OCR version 2.0, pretrained-ocr-v2.0-2023-06-02, is now Generally Available and ready for production workloads.
Please migrate OCR workloads to this new processor version.
January 09, 2024
The Custom Extractor with generative AI is now Generally Available and ready for production workloads. For more information, see the Custom Extractor with generative AI or check out the demo.
- As foundation models evolve, so will versions available within the Custom Extractor. For more information, see Managing processor versions.
- Fine tuning the foundation model within the Custom Extractor is still available, in Preview. For more information, see Fine tune and train by document type.
To better support production workloads, we reduced prices for the Custom Extractor, Custom Classifier, Custom Splitter, and Form Parser. For more information, see Document AI pricing.
Developers can now specify pages Document AI should process within a document. For more information, see IndividualPageSelector within V1 API ProcessOptions.
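A process request that selects individual pages might look like the sketch below, which builds the JSON body as a Python dictionary. The camelCase field names (rawDocument, processOptions, individualPageSelector, pages) and the 1-based page numbering follow my understanding of the v1 REST API and should be checked against the reference; build_process_request is a hypothetical helper.

```python
import base64
import json
from typing import Iterable


def build_process_request(pdf_bytes: bytes, pages: Iterable[int]) -> dict:
    """Build a process request body that asks for specific pages only."""
    return {
        "rawDocument": {
            # Raw content is sent base64-encoded in JSON requests.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        "processOptions": {
            # Assumed to be 1-based page numbers; duplicates are dropped.
            "individualPageSelector": {"pages": sorted(set(pages))},
        },
    }


request_body = build_process_request(b"%PDF-1.4 placeholder", [3, 1, 3])
print(json.dumps(request_body["processOptions"]))
```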
December 20, 2023
Custom Extractor supports fine tuning (Preview) so that you can customize foundation model results for user specific documents. This feature is available in the US region. For more information, see Fine tune and train by document type.
Custom Extractor with genAI is now available in the EU and northamerica-northeast1 regions. For more information, see Custom processors.
You can now demo genAI-powered extraction results within Custom Extractor along with output from other Document AI products such as OCR, Form Parser, and ID processing.
December 07, 2023
Enterprise Document OCR version 2.0, pretrained-ocr-v2.0-2023-06-02, has an upgraded OCR engine and model improvements. This upgrade better supports high-volume workloads.
September 25, 2023
We are launching an RC version of the pretrained-invoice-v1.5-2023-09-15 invoice processor. It includes:
- Improved base-entity extraction model for documents in English.
- Line-item grouping quality improvements.
- Better support for multi-line, multi-segment entities such as addresses and line-item descriptions.
- Enforcement of occurrence type OPTIONAL_ONCE/REQUIRED_ONCE for properties of nested entities.
- Updated OCR engine.
September 21, 2023
Launched Document AI Enterprise Document OCR v2.0 and OCR add ons in Preview.
Enterprise Document OCR launched a Release Candidate, pretrained-ocr-v2.0-2023-06-02, which includes:
- Upgraded OCR model, optimized for various document use cases.
- Visual-element detector for boxed characters, which can increase quality up to 10% for documents with text boxes.
For more details, see the documentation, including the user guide.
OCR add ons are available from the Enterprise Document OCR processor when using pretrained-ocr-v2.0-2023-06-02. These include:
- Checkbox extraction: Detects and extracts status (marked/unmarked) in the Enterprise Document OCR response.
- Math OCR: Identifies, recognizes, and extracts formulas from documents in LaTeX output format.
- Font-style detection: Identifies word-level font properties, including type, style, handwriting, weight, and color.
For more details, see the documentation.
August 25, 2023
Document AI Workbench is now powered by generative AI with two feature launches:
Document AI Workbench Summarizer is in Preview:
- The Summarizer provides summaries for documents up to 250 pages long.
- You can customize summaries based on your preferences for length (brief, moderate, comprehensive) and format (paragraph, bullet points).
- See the user guide for more information.
Document AI Workbench custom extractor is in preview:
- Custom extractor with generative AI can help extract data from documents with free-form text (e.g., contracts) and complex layouts (e.g., invoices, W2s, bills of lading).
- The pretrained processor version, which uses generative AI, can be used out of the box without any training. Post a document to the endpoint with a list of fields to get structured data.
- Customize results by confirming content in about five documents. Workbench leverages the examples to improve accuracy using few-shot prediction.
- Extract information from documents up to 200 pages long through the asynchronous API.
- To get started, create or use an existing custom extractor to leverage a processor version.
- See the how-to guide, labeling best practices, and training use cases.
- Current limitations of generative AI extraction within the custom extractor:
- Only the English language is supported.
- Region availability is currently only in the US.
- While in preview, we recommend that you only extract up to 50 entities per endpoint with generative AI.
- When uploading a sample document to define fields and preview results on the Get started page, there can be long latencies. We're working to reduce this latency.
In addition, template-based training is available in GA within the custom extractor:
- Template-based training provides accurate predictions for documents with no layout variation (such as an application form).
- Only six labeled documents are needed to train and use a template-based processor version.
- See the user guide and training use cases.
August 18, 2023
Expense processor
A new RC version, pretrained-expense-v1.3.1-2023-08-11, of the Expense processor is now available in the asia-southeast1 region for Expense Parser customers.
This release includes improved region-based normalization, which results in an average accuracy improvement of up to 15% on normalized date and currency entities over the current stable version.
August 01, 2023
Launched the following Document AI Workbench features:
Create and train models programmatically with more public APIs, including:
- DatasetSchema APIs: UpdateDatasetSchema, GetDatasetSchema. To create a schema, use the UpdateDatasetSchema API (a singleton resource).
- Dataset APIs: UpdateDataset, ImportDocuments, GetDocument, BatchDeleteDocuments. Until we release a ListDocument API, use confirmations from the ImportDocuments API to create a list of documents in your dataset.
Selective labeling within Custom Document Extractor (CDE) helps you prepare a diverse set of training and test documents. Import 125+ documents to a CDE dataset, then CDE recommends documents you should label based on clustering results. Selective labeling only supports new documents imported into your dataset. If you would like recommendations for documents already in your dataset, then delete and re-import them.
Quick Tables within CDE helps you train models faster by labeling a table in bulk by applying the first row pattern to the rest of the table.
Get started with a new processor by using a default storage option for your dataset. You can still configure your own Cloud Storage location using advanced options.
July 28, 2023
Launched an update to the RC release pretrained-invoice-v1.4-2022-10-21 of the invoice processor, available to all Document AI users.
This release includes the features of the Stable version, with an average improvement of 0.03 to 0.05 micro F1 (approximately 5% to 9%) on line item entities.
July 18, 2023
The following Form Parser (pretrained-form-parser-v2.0-2022-11-10) features are Generally Available (GA):
- General field extraction: You can extract 11 different types of entities from documents.
- Enhanced check-box detection.
- Internationalization support that covers more than 200 languages.
- Upgraded key-value pair (KVP) detection model.
Form Parser v2.1 (pretrained-form-parser-v2.1-2023-06-26) is in Public Preview. It uses our native PDF text extraction model on PDF documents.
The Form Parser features have the following limitations:
- Checkbox detection doesn't support radio buttons and might not reliably parse all selection marks or keyless checkboxes.
- If there is a key without a value, the model might not parse it.
- The quality of KVP parsing might be higher for Latin languages than others.
- For tables, only simple tables are supported (no support for merged cells).
July 17, 2023
The Custom Document Splitter (CDS) within Document AI Workbench is now Generally Available (GA) for production use cases to split and classify multiple documents within a single file. With this release, all Workbench processors currently offered (Custom Document Extractor, Custom Document Classifier, and Custom Document Splitter) are available in GA.
Launched the following features for CDS:
- CDS now supports up to 1,000-page documents for async/batch prediction and up to 200-page documents when importing, labeling, training, or evaluating.
- CDS model evaluation for document split and classification
- Prepare a CDS training dataset faster by bulk labeling documents at import across multiple folders.
Released the following enhancements for CDS:
- Improved labeling and evaluation experience with the ability to review overall document splits and classifications while viewing individual pages in a side-by-side view.
- Document names are now used in error messaging to improve troubleshooting.
- Hyphens are allowed in schema names.
June 29, 2023
Identity Document AI (IDAI) photo copy detection in ID proofing (Preview)
Updated the pretrained-id-proofing-v1.1-2023-05-18 ID proofing processor for all Document AI users.
This processor includes a new output entity, fraud_signals_photocopy_detection, that signals if an attached image might be a photocopy. The entity can be one of the following values: POSSIBLE_PHOTOCOPY, PASS, or INCONCLUSIVE.
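A downstream check for this signal could look like the following sketch, which scans the entities of a processed document that has been converted to a dictionary. The entity type and the three values come from the note above; the assumption that the value is carried in the mentionText field, and the helper name photocopy_signal, are mine and should be verified against real processor output.

```python
from typing import Optional

PHOTOCOPY_ENTITY = "fraud_signals_photocopy_detection"
KNOWN_VALUES = {"POSSIBLE_PHOTOCOPY", "PASS", "INCONCLUSIVE"}


def photocopy_signal(document: dict) -> Optional[str]:
    """Return the photocopy-detection value, or None if the entity is absent."""
    for entity in document.get("entities", []):
        if entity.get("type") == PHOTOCOPY_ENTITY:
            value = entity.get("mentionText")  # assumed location of the value
            return value if value in KNOWN_VALUES else None
    return None


# Hand-written sample, not real API output.
sample = {"entities": [{"type": PHOTOCOPY_ENTITY, "mentionText": "POSSIBLE_PHOTOCOPY"}]}
if photocopy_signal(sample) == "POSSIBLE_PHOTOCOPY":
    print("Route this ID to manual review.")
```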
June 28, 2023
The following document OCR features are Generally Available (GA). Use document OCR's configurations to optimize for stability, quality, and specific response requirements.
- Intelligent document-quality analysis
- Native text from digital PDF
- Symbol-level extraction
- Language hints
Support for DOCX is in Preview. You can synchronously process DOCX files that are up to 15 pages, or asynchronously process DOCX files that are up to 30 pages. For access, send us a request.
Added fixes to our doc.proto-to-vision.proto conversion tool, which facilitates migration from Vision API TextDetection to document OCR.
The document OCR native text from digital PDF feature contains the following known issues:
- For a small number of documents, word order in lines of text that are reported by native text extraction might be inaccurate.
- Invisible text that is embedded in a native PDF might be extracted.
- In Japanese documents, currency symbols such as Yen might be incorrectly extracted as /.
- Apostrophe symbols might be missing in word and/or line results.
- Native text extraction might report different word and/or line results compared to image-based OCR on an identical document.
April 25, 2023
Launched the following features to improve the usability of the Document AI Workbench Custom Document Extractor (CDE):
- CDE now supports an additional 42 global languages.
- CDE lets you import processor versions across projects and processors to easily manage development and production environments.
- CDE can automatically label documents in a dataset by using a deployed processor version to help you quickly prepare training data.
Document AI Workbench Custom Document Extractor (CDE) has also made the following enhancements:
- The asynchronous prediction API can now extract data from documents up to 200 pages long.
- Improved the accuracy of extracting checkboxes.
April 17, 2023
Identity Document AI (IDAI) pricing change
We are changing the price of our US-related Identity Document processors. The new price is on the pricing page.
March 27, 2023
The Document AI OCR Processor (Doc OCR) now has the following features:
- The OCR Processor supports language hints. The OCR engine prefers your specified languages over inferred languages. To use this feature, set process_options.ocr_config.hints.language_hints with a list of BCP-47 language codes in your API request to the OCR Processor.
- The OCR Processor supports the option to populate symbol-level data in the document response. If enabled, the field document.pages.symbols is populated. To use this feature, set process_options.ocr_config.enable_symbol=true in your API request to the OCR Processor.
- A proto converter tool that converts a Document proto to an AnnotateFileResponse proto. This conversion lets you compare the responses between the Document AI OCR processor with the Vision API, which can help you migrate to the Document AI OCR processor from Vision API with minimal downstream changes. For details, see Document AI Toolbox.
- The OCR Processor supports a heuristics layout detection algorithm, which serves as an alternative to the current ML-based layout detection algorithm. You can choose the layout algorithm that best suits your needs. To use this feature, set process_options.ocr_config.advanced_ocr_options=legacy_layout in your API request to the OCR Processor.
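The options in this list can be assembled into one request fragment. The sketch below uses the field paths exactly as written above (snake_case); representing advanced_ocr_options as a list of strings is my assumption, and build_ocr_process_options is a hypothetical helper name.

```python
from typing import Iterable, Optional


def build_ocr_process_options(
    language_hints: Optional[Iterable[str]] = None,
    enable_symbol: bool = False,
    legacy_layout: bool = False,
) -> dict:
    """Build the process_options fragment for an OCR Processor request."""
    ocr_config: dict = {}
    if language_hints:
        # BCP-47 language codes, for example "en" or "ja".
        ocr_config["hints"] = {"language_hints": list(language_hints)}
    if enable_symbol:
        ocr_config["enable_symbol"] = True
    if legacy_layout:
        # Assumed to be a repeated string field.
        ocr_config["advanced_ocr_options"] = ["legacy_layout"]
    return {"process_options": {"ocr_config": ocr_config}}


print(build_ocr_process_options(language_hints=["en", "ja"], enable_symbol=True))
```

Only the options that are explicitly requested appear in the fragment, so the service defaults apply to everything else.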
For the Document AI OCR Processor (Doc OCR), you can enable document quality assessments for all processor versions instead of a specific processor version, such as pretrained-ocr-v1.1-2022-09-12. If you enable document quality assessment, Doc OCR produces a quality score that's based on the document's readability. Quality scores range from 0 to 1, where 1 is perfect quality. Quality scores are returned in the image_quality_scores field on the Page object. All detected issues are labeled as quality or defect and sorted in descending order by confidence value. To use this feature, set process_options.ocr_config.enable_image_quality_scores=true in your API request to the OCR Processor.
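Once quality scores are enabled, a caller might flag pages that fall below a chosen score. The sketch below works on a response already converted to a dictionary. The image_quality_scores field is named in the note above, but the nested names quality_score, detected_defects, type, and confidence are my assumptions about the response shape, and the 0.5 threshold is an arbitrary example.

```python
from typing import List, Optional, Tuple


def low_quality_pages(document: dict, threshold: float = 0.5) -> List[Tuple[int, float, Optional[str]]]:
    """Return (page number, score, most confident defect) for pages under the threshold."""
    flagged = []
    for number, page in enumerate(document.get("pages", []), start=1):
        scores = page.get("image_quality_scores") or {}
        score = scores.get("quality_score")
        if score is None or score >= threshold:
            continue
        # Sort defensively, even though defects are described as pre-sorted.
        defects = sorted(
            scores.get("detected_defects", []),
            key=lambda d: d.get("confidence", 0.0),
            reverse=True,
        )
        top_defect = defects[0].get("type") if defects else None
        flagged.append((number, score, top_defect))
    return flagged
```

Pages without a score are skipped rather than flagged, which keeps the check safe to run on responses where the feature was not enabled.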
February 21, 2023
This launch upgrades the lifecycle stage of the Custom Document Extractor (CDE) component of the DocAI Workbench from Public Preview to Generally Available (GA). CDE covers essential workflows for developing custom document extraction processors with end-to-end UI support:
- Data import
- Schema creation and annotation
- Processor model training
- Evaluation and troubleshooting
- Model deployment and version management
- Human-in-the-loop (HITL) integration for "last-mile" processor quality assurance
Notable new Generally Available Custom Document Extractor (CDE) features include:
- Public APIs
- Automatic schema label creation from pre-labeled documents
- Schema label data type and occurrence editable pre-training
- New DocAI Toolkit with a labeled document converter
The following features have been upgraded:
- Processor gallery
- Schema editor
- Labeling UI
- Training pipeline
- Manage versions table
January 10, 2023
The Form Parser Release Candidate version has been renamed to pretrained-form-parser-v2.0-2022-11-10. See Document AI release notes--December 12, 2022 for more information about this release.
December 22, 2022
We are launching a public preview version of the purchase order processor, pretrained-purchase-order-v1.1-2022-06-17, with the following features:
- Support for uptraining to improve, add, and remove entities in the schema.
- Support for uptraining to add support for unsupported languages.
- Improvements to overall performance.
December 19, 2022
The Document AI OCR Processor has the following new features:
The OCR Processor now supports extracting embedded text from digital PDFs in public preview. A fallback to the optical OCR model is automatically triggered to extract text in the regions when the PDF being processed contains non-digital text. To opt into this feature, set process_options.ocr_config.enable_native_pdf_parsing=true in your API request to the OCR Processor.
Added advanced versioning support to the Document AI OCR, which enables OCR users to pin to a historical model version. When enabled, OCR outputs are guaranteed to be consistent and virtually frozen, with zero behavioral drifts. To enable advanced versioning, select the release candidate version pretrained-ocr-v1.2-2022-11-10 in your Document AI console.
Known issues with the digital PDF feature of the Document AI OCR Processor:
On a small number of documents, the word ordering within lines of text as reported by native text extraction might be wrong.
On certain documents, invisible text embedded in a native PDF may be reported.
On certain Japanese documents, currency symbols such as Yen might be incorrectly extracted as /.
On certain documents, apostrophe symbols may be missing in word/line results.
On certain documents, native text extraction might report different word/line results than those obtained by image-based OCR on an identical document.
December 15, 2022
We are launching the Release Candidate version, pretrained-utility-v1.2-2022-12-15, of the utility processor. This version includes the following features:
- Removal of three entities from schema: delivery_date, receiver_email, receiver_phone
- Improvements to overall performance
The utility processor version pretrained-utility-v1.1-2021-04-09 continues to serve as the current, latest Stable version.
December 12, 2022
The Form Parser now supports Generic Entity Extraction in Public Preview, covering the following entity types:
- email: email address
- phone: phone number
- url: website URLs
- date_time: partial or full date/time/period
- address: full address or street address in a single line
- person: partial or full name of a person
- organization: full name of an organization
- quantity: a number specifying quantity or percentage
- price: a number specifying monetary amount
- id: a number specifying identity
- page_number: a number specifying page number
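To show how the entity types above might be consumed, the sketch below groups generic entities by type from a Form Parser response converted to a dictionary. The eleven type names come from the list above; the assumption that each entity carries type and mentionText fields, and the helper name group_generic_entities, are mine.

```python
from collections import defaultdict
from typing import Dict, List

# Entity types listed in the release note above.
GENERIC_TYPES = {
    "email", "phone", "url", "date_time", "address", "person",
    "organization", "quantity", "price", "id", "page_number",
}


def group_generic_entities(document: dict) -> Dict[str, List[str]]:
    """Group the text of generic entities by entity type, ignoring other types."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in document.get("entities", []):
        entity_type = entity.get("type")
        if entity_type in GENERIC_TYPES:
            grouped[entity_type].append(entity.get("mentionText", ""))
    return dict(grouped)


# Hand-written sample, not real API output.
sample = {
    "entities": [
        {"type": "email", "mentionText": "jane@example.com"},
        {"type": "price", "mentionText": "$12.50"},
        {"type": "email", "mentionText": "joe@example.com"},
    ]
}
print(group_generic_entities(sample))
```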
The Form Parser has the following feature enhancements:
The Form Parser key-value pair (entity and checkbox) extraction and table extraction now support 200+ languages that are supported by the underlying multi-language OCR model. This language expansion is in Public Preview, with key-value pair internationalization backed by quality benchmarks in selected languages such as Simplified Chinese, Traditional Chinese, Japanese, and Korean.
Table extraction in Form Parser is now powered by an enhanced vision-based table parsing model.
These enhanced features are automatically enabled for Form Parser processor version pretrained-parser-v2.0-2022-11-10 and all future versions. Note that this is a Release Candidate version, which is subject to further changes before graduating to the Stable version.
December 08, 2022
Invoice Parser and Expense Parser are now available in the following single-region locations:
- australia-southeast1 (Sydney)
- northamerica-northeast1 (Montréal)
November 21, 2022
Expense Parser Releases
As of November 18, 2022, for the Expense Parser, we have promoted our v1.3 Release Candidate version to a Stable version so that more customers can use it confidently.
New Stable version
Features in the new Stable Expense Parser, pretrained-expense-v1.3-2022-07-15:
- Support for a new language, Japanese, which has been requested by multiple customers.
- Better entity performance
- Addition of 3 new entity types (line_item/quantity, payment_type, credit_card_last_four_digits)
- Better support for hotel and car-rental related expenses
New Release Candidate version
Along with this Stable version, we are also launching a new Release Candidate version of the Expense Parser, pretrained-expense-v1.4-2022-11-18, with the following new features, in addition to the features in the Stable version:
- Improvements to overall performance
- Support for two (2) new languages, Italian and Portuguese
- Support for Uptraining to improve or add/remove entities in the schema
- Support for Uptraining to add support for unsupported languages
- Addition of 3 new entity types (traveler_name, reservation_id, line_item/transaction_date)
- Maximum pages (online/synchronous requests) limit has been increased to 15.
Deprecation of the old Stable version
The pretrained-expense-v1.1-2021-04-09 version of the Expense Parser will be deprecated following this release.
Invoice Parser Updates
The previous Stable invoice processor version, pretrained-invoice-v1.1-2021-04-09, is deprecated as of November 22, 2022.
The Invoice Parser, for v1.3 and v1.4, now has the following quotas and limits:
- Maximum pages (online/synchronous requests): 15
- Maximum pages (batch/offline/asynchronous requests): 200
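The page limits above suggest a simple routing rule for callers: send short documents online and longer ones through batch processing. The following is a minimal sketch of that rule. The two limits come from the list above, while the function name choose_request_mode and the decision to raise an error above the batch limit are my own choices.

```python
ONLINE_MAX_PAGES = 15   # online/synchronous limit stated above
BATCH_MAX_PAGES = 200   # batch/offline/asynchronous limit stated above


def choose_request_mode(page_count: int) -> str:
    """Pick "online" or "batch" for a document, based on its page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= ONLINE_MAX_PAGES:
        return "online"
    if page_count <= BATCH_MAX_PAGES:
        return "batch"
    raise ValueError(
        f"{page_count} pages exceeds the batch limit of {BATCH_MAX_PAGES}; "
        "split the document first"
    )


for pages in (3, 15, 16, 200):
    print(pages, choose_request_mode(pages))
```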
November 16, 2022
The Identity Document Proofing Processor is now available in Public Preview.
The Identity Document Proofing Processor is designed to help predict the validity of ID documents with four different signals:
- is_identity_document detection: predict whether an image contains a recognized identity document.
- suspicious_words detection: predict whether words are present that aren't typical on IDs.
- image_manipulation detection: predict whether the image was altered or tampered via an image editing tool.
- online_duplicate detection: predict whether the image can be found online.
November 11, 2022
New stable W2 processor version with the following enhancements:
- Breaks down long entities such as addresses into fine-grained sub-entities: StreetAddressOrPostalBox, AdditionalStreetAddressOrPostalBox, City, State, and Zip. This modification improves both accuracy and entity specificity.
- Handles wider variations of W2 forms including multi-copies (2,3,4-ups) issued by various payroll vendors.
- Introduces 8 new entities for Box 12 that represent both codes and values.
New stable Payslip processor version with the following enhancements:
- Bonus, commissions, holiday, overtime, regular pay and vacation are now part of earning_item/earning_this_period and earning_item/earning_ytd. Captures all types of earnings beyond those categories, and maps them to their respective earning rates, hours and pay (both for the period and year-to-date).
- Returns year-to-date and current period taxes and deductions.
- Direct deposits are linked to the corresponding bank account numbers.
- Returns page numbers, state and federal tax exemptions and filing statuses.
October 31, 2022
A new Release Candidate (RC) version of the Document OCR Processor, pretrained-ocr-v1.1-2022-09-12, is available in the US and EU. This RC can detect document defects.
- If the document is considered to be defective, the API now returns the same 5 document defect types supported by the Intelligent Document Quality Processor: quality/defect_blurry, quality/defect_noisy, quality/defect_dark, quality/defect_faint, quality/defect_text_too_small.
- In addition, it now supports 3 more defect types: quality/defect_document_cutoff, quality/defect_text_cutoff, quality/defect_glare.
- The defect detection results are in the image_quality_scores field on the Page object in the returned JSON. This additional feature adds latency comparable to OCR processing to the process call.
October 21, 2022
As of October 21, 2022, we have promoted our v1.3 Release Candidate version to a Stable version of the Invoice processor.
Features in the new Stable Invoice processor, version pretrained-invoice-v1.3-2022-07-15:
- Support for seven new languages: Italian, Portuguese, Romanian, Swedish, Estonian, Latvian, and Lithuanian.
- Support for uptraining using Document AI Workbench. See Uptrain a specialized processor.
- Improvements to currency and date normalization.
- Improvements to line item extraction.
Quotas and limits
- Maximum pages (online/synchronous requests): 15
- Maximum pages (batch/offline/asynchronous requests): 200
Regional availability
- US (Multi-region), Europe (Multi-region)
Alongside this Stable version, we are launching a Release Candidate version, pretrained-invoice-v1.4-2022-10-21.
October 10, 2022
Known issue (Document Labeling)
If you delete one or more documents, and these documents selected for deletion are all associated with an active labeling job, then all documents in that dataset will also be deleted, even if you did not select them for deletion. This is true regardless of the number of documents selected.
Workaround: Do not delete documents during an active labeling job. You can track active labeling jobs on the Dataset management page, under the category Labeling tasks, located on the right side of the page. If you absolutely must delete documents during an active labeling job, ensure that you also select at least one document that is NOT part of this active labeling job. Then, only the non-associated documents will be deleted, and the remaining documents in the dataset will be preserved.
September 29, 2022
This launch upgrades the lifecycle stage of the Custom Document Extractor (CDE) component of the DocAI Workbench from Private Preview to Public Preview. CDE covers essential workflows for developing custom document extraction processors with E2E UI support:
- Data import
- Schema creation and annotation
- Processor model training
- Evaluation and troubleshooting
- Model deployment and version management
- Human-in-the-loop (HITL) integration for "last-mile" processor quality assurance
Notable new Public Preview Custom Document Extractor (CDE) features include:
- Progressive data import
- Direct import of annotated .json files as training or test datasets
- Data labeling platform integration
- Dataset export with metadata preserved
- Auto-labeling using a trained processor version at import to minimize manual annotation efforts
- Tabular entity end-to-end support
- Checkbox boolean annotation, training, and extraction
- Fuzzy matching for more flexible model evaluation
- Exportable / downloadable model evaluation metrics
The following features have been upgraded:
- Data import
- Schema creation and annotation
- Processor model training
- Evaluation and troubleshooting
- Model deployment and version management
- Human-in-the-loop (HITL) integration for "last-mile" processor quality assurance
Known issues
During labeling, checkboxes default to a state that does not reflect the selected or unselected states in the corresponding documents.
Workaround: Label all checkboxes within a schema, regardless of whether they are selected or unselected, for optimal checkbox extraction quality and accurate evaluation. When annotating a checkbox by drawing its bounding box, all checkboxes in the CDE annotation user interface will default to an unselected state, and you must manually update the selected or unselected state as necessary.
If a processor is deleted when there is an active labeling task ongoing, the task does not stop automatically. Labelers and Labeler Managers will still see the task in their labeling and manager consoles.
Workaround: Cancel the active labeling task before you delete the processor. Otherwise, navigate to the manager console to pause or delete the corresponding task queue. Any documents labeled from that task after the processor is deleted will be unretrievable.
Some discrepancies might exist between pretrained and uptrained versions of a processor.
On the Evaluate & Test tab, the evaluation does not show as complete until you click Refresh Table.
If you cannot choose a specialist pool because no chooser for the specialists is available, clicking Continue causes an error.
Workaround: Click Task Details to go back and fill in remaining details.
Support for nested entities is limited to data arranged in a tabular format. Other layouts are currently not supported.
Any mention of "nested entities" in previous versions of the Document AI Workbench documentation has been replaced with "tabular entities" to reflect the table-based nesting capabilities for this launch.
If you see this message, "This processor does not currently support uptraining or evaluation capabilities" on a processor in the Processor gallery, this means that only the prediction endpoint is available at this time.
In some cases, required_once and optional_once entities appear multiple times in a document, with the same value copied to multiple locations. While one annotation is sufficient when annotating a test set for evaluation, all copies should be annotated to ensure higher recall for trained models.
Workaround: Annotate all instances of an entity within a document. This supports both evaluation and training without any infrastructure changes.
Existing Human in the Loop (HITL) configurations for Label-level filters do not automatically populate to new processor configurations.
Workaround: If you require changes to the configuration, navigate to the Human-In-The-Loop tab. Under Set filters, select Label-level filters and click Set Label Filters. A Label-level filters page opens. Manually edit this table to configure all of the labels needed for validation and review.
Support for handwritten entity detection, such as signatures, dates, and initials, is limited and may require additional configuration for proper evaluation.
Workaround: When evaluating pretrained processors, manually update the Value to YES if the entity is present, or NO otherwise. For uptrained processor versions, keep the Value as detected by OCR. This issue affects W9, HUD92900B, SSA-1099, and VBA26-0551 processors.
When you attempt to select a specialist pool for a labeling task while fields in the form are missing, and then click Continue, you might get an error.
Workaround: Click Task Details, and enter the missing details in the form. You can then successfully create a labeling task.
September 15, 2022
Schema support for checkboxes and nested entities
- Customers using Document AI Workbench, and processors for Purchase Order (PO), Invoice, or Expense, now have access to a new schema. This schema enables customers to label checkboxes, if they are defined in the schema, and to accurately represent nested entities, such as parent-child relationships, on the HITL annotation and review console. As additional processors adopt the new schema, these release notes will be updated to include them.
Nested entities
- The Annotation console now supports labeling for nested entities. The left panel is refreshed with a new look for nested rows to represent nested entities. The value of "parent" will now be the concatenation of all its "children". The parent is effectively a container for all of its children.
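The parent-as-container behavior can be pictured with a small sketch. It assumes that child values are joined in order with a single space; the release note does not state the separator, so that choice and the function name are assumptions made for illustration.

```python
def parent_value(child_values: list[str], separator: str = " ") -> str:
    """Build a parent's value by concatenating its children's values in order.

    The separator is an assumption; empty child values are skipped.
    """
    return separator.join(value for value in child_values if value)


# Invented line item with three children: description, quantity, amount.
line_item = parent_value(["Widget", "2", "19.98"])
```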
September 01, 2022
We are standardizing our release processes and naming conventions for processor versions. For more information, see Manage processor versions.
July 27, 2022
New Release Candidate (RC) versions for PDAI Invoice and Expense processors - July 2022
We have launched new RC versions of Invoice parser and Expense parser on July 15, 2022. These can be accessed in the following way:
- Invoice parser:
pretrained-next-uptrainable
- Expense parser:
pretrained-next
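One way to target these versions is to address the processor version by its resource name. The sketch below assumes the usual Document AI resource name pattern (projects, locations, processors, processorVersions); the project and processor IDs are placeholders, and the exact path should be confirmed in Manage processor versions.

```python
def processor_version_name(project: str, location: str, processor_id: str, version: str) -> str:
    """Build the resource name for a specific processor version (assumed pattern)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version}"
    )


# Placeholder IDs; only the version aliases come from this release note.
invoice_rc = processor_version_name("my-project", "us", "my-invoice-processor", "pretrained-next-uptrainable")
expense_rc = processor_version_name("my-project", "us", "my-expense-processor", "pretrained-next")
```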
Here are the details about the contents of the RC version updates:
Processor | New Languages | New Entities |
---|---|---|
Invoice: pretrained-next-uptrainable | Italian, Portuguese, Romanian, Swedish | N/A |
Expense: pretrained-next | Japanese | Support for hotel and car rental folios. Payment information entities: last 4 digits of credit card, payment type |
The current limit for uptrainable processors is as follows (it is different from the pre-trained version). We are gathering customer feedback to increase the async limit.
Quotas and limits
Maximum pages (online/synchronous requests): | 10 |
Maximum pages (batch/offline/asynchronous requests): | 15 |
June 30, 2022
VPC Service Control support
Document AI VPC Service Controls provide additional security for your resources and services. To learn more about VPC Service Controls, see the VPC Service Controls overview.
To learn about the limitations when using Document AI with VPC Service Controls, see the supported products and limitations.
June 13, 2022
Document AI is now generally available (GA) in the following new locations:
asia-south1 (Mumbai)
australia-southeast1 (Sydney)
You must request access to use the new locations. For more information, see Regional and multi-regional support.
New Identity Processor (Preview)
The France Passport Parser is now available in limited preview.
June 10, 2022
The Contract Parser is now more accurate, can extract more fields and supports higher page limits.
June 01, 2022
Identity DocAI General availability (GA) release
The following Identity DocAI processors are now Generally Available (GA).
For more information, see Document AI for Identity.
April 21, 2022
Document OCR processor
The changes from the Google Default Next version have been applied to the Google default version.
The previous Google default version can still be accessed until July 21, 2022 as pretrained-legacy. After July 21, 2022, that version will be removed.
For more information about using different versions of the processor, see Managing processor versions.
For the original announcement of this change, see the January 14, 2022 release note.
April 08, 2022
New Version of Lending W2 Processor
We have released a new Release Candidate version of the W2 Processor. This version is experimental and has the following features:
- Quality improvement on SSN and EIN fields.
- Support for box 12 fields, including both codes and values.
- Fine-grained predictions of EmployeeName, EmployeeAddress, and EmployerNameAndAddress, which are no longer part of the output and are replaced with additional fields.
March 25, 2022
New & Updated processors available
The following Lending DocAI processors are now available for trusted testers. Access to the trusted testers program is limited and granted on a case-by-case basis. If you would like to be considered, fill out the DocAI Processor Access Request Form:
New Experimental processors to support new document types:
- Form VA Loan Discharge Statement Processor
- Form USDA Conditional Statement Processor
- Form 1017 Processor
- Form Biweekly Payment Rider Processor
- Form VBA26 1805 Processor
- Form VBA26 6393 Processor
- Form MERS Rider Processor
Updated Experimental processors:
- Form 4506-T Processor
- Form 4506-C Processor
- Form HUD54114 Processor
- Form HUD92900WS Processor
- Form HUD92800 Processor
- Form 1040-NR Processor
- Form HUD92900LT Processor
- Form VBA26 8923 Processor
- Form HUD92900A Processor
- Form 1005 Processor
March 10, 2022
Document AI is now generally available (GA) in the following new locations:
europe-west3
asia-southeast1
You must request access to use the new locations. For more information, see Regional and multi-regional support.
February 18, 2022
New Versions of Procurement Processors
We have launched a new Google Pretrained version of the following procurement processors with various quality improvements:
The changes from the old Google default next version have been applied to the new Google Pretrained version. The old Google default version is still available and will not be deprecated for at least 180 days.
January 26, 2022
Enrichment using the Knowledge Graph is now Generally Available.
For more information, see Enterprise Knowledge Graph field enrichment.
January 14, 2022
Document OCR processor
We have updated the Google default next version with quality improvements. Consequently, you have 90 days from today to test the new model before the changes are applied to the Google default version. After that, the original Google default version will be available for another 90 days as legacy. For more information about the processor and its versions, see the Document OCR processor.
For more information about using different versions of the processor, see Managing processor versions.
For the original announcement of this change, see the November 5, 2021 release note.
December 15, 2021
New Lending Processors (Preview)
The following new processors are now available in Preview. To request API access, fill out and submit the Document AI limited access customer request form.
- 1040 Schedule D Parser
- HOA Statement Parser
- HUD-92900B Parser
- SSA-89 Parser
- VBA26-0551 Parser
The following is available through the Cloud console.
- Investment and retirement statement parser
New Versions of Lending Processors
We have launched new versions of the following lending processors, in General Availability (GA).
These new versions use a new lending document splitting and classification model with improved quality and support for more document types.
November 10, 2021
We have lowered the price for many processors. For more information, see the Pricing page.
November 05, 2021
The following procurement processors are now publicly accessible:
We have released a new version of the Document OCR Processor called Google default next. This version changes the distribution of confidence scores in the response. You have 90 days from today to test the new model before the changes are applied to the Google default version. After that, the original version will still be available for another 90 days as legacy. For more information about using different versions of the processor, see Managing processor versions.
New Lending Processor (Preview)
The Mortgage statement parser is now available in limited preview.
October 15, 2021
Contract DocAI (Preview) released
The Contract parser is now available.
October 06, 2021
Document AI is now generally available (GA) in the following new locations:
europe-west2
northamerica-northeast1
You must request access to use the new locations. For more information, see Regional and multi-regional support.
September 01, 2021
Document AI now supports Data Residency, VPC-SC, Access Transparency, and CMEK.
August 20, 2021
Managing processor versions
You can now switch between different versions of a processor. For more information, see Managing processor versions.
New processor versions
We have added new versions of the following processors:
- Bank statement parser: improved model quality
- Pay slip parser: improved model quality and extraction of three additional fields: net_pay, net_pay_ytd, and employee_account_number.
New Lending DocAI processors
The following Lending DocAI (LDAI) processors are now available in limited Preview:
- 1065 parser
- 1099-NEC parser
- 1099-R parser
- 1120 parser
- 1120-S parser
- SSA-1099 parser
Additionally, the LDAI Document Splitter and Classifier has been updated to support the new LDAI processors as well as the following processors:
- US Driver License Parser
- US Passport Parser
Human in the Loop (HITL) support for Lending DocAI processors
The following Lending DocAI processors now support Human in the Loop (HITL):
- 1003 parser
- 1040 Parser
- 1040 Schedule C parser
- 1040 Schedule E parser
- 1099-DIV parser
- 1099-G parser
- 1099-INT parser
- 1099-MISC parser
- Bank Statement parser
- Pay Stub parser
- W2 parser
- W9 parser
Knowledge Graph support
The following processors now support Knowledge Graph enrichment:
- Bank Statement
- Pay Slip
- W2 Parser
- W9 Parser
July 30, 2021
The Invoice Parser now extracts a new field, invoice_type, that indicates the type of the input document.
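A response can be checked for the new field with a lookup like the one sketched below. It assumes the response has been parsed into a dictionary whose entities carry type and mentionText keys; the sample value is a placeholder and not a documented invoice_type value.

```python
from typing import Any, Optional


def find_entity_text(document: dict[str, Any], entity_type: str) -> Optional[str]:
    """Return the text of the first entity with the given type, or None if absent."""
    for entity in document.get("entities", []):
        if entity.get("type") == entity_type:
            return entity.get("mentionText")
    return None


# Invented sample; "placeholder_type" stands in for whatever the parser returns.
sample_invoice = {
    "entities": [
        {"type": "invoice_id", "mentionText": "INV-001"},
        {"type": "invoice_type", "mentionText": "placeholder_type"},
    ]
}

invoice_type = find_entity_text(sample_invoice, "invoice_type")
```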
July 02, 2021
Change in processor documentation
The location of individual processor information has changed. You can now find individual processor documentation for all solutions (General, Procurement, Lending) in the following locations:
Human in the Loop (HITL) now supports priority queues for each processor, based on the urgency of each document. For more information, see HITL.
June 09, 2021
VPC Service Controls
Integration with Document AI VPC Service Controls is now generally available.
April 09, 2021
Procurement DocAI General availability (GA) release
Procurement DocAI (PDAI) solution is now available in private General Availability (GA).
This includes the following processors:
- Invoice parser
- Expense parser (formerly Receipt parser)
- Procurement document splitter & classifier
- Utility parser
Human in the Loop (HITL) support for Procurement DocAI processors
Procurement DocAI processors now support Human in the Loop (HITL) AI platform functionality supporting human revisions of predictions.
Invoice parser behavior update
The invoice parser behavior has been updated to include the following features:
- Offers extended support for the following languages (in addition to English):
- French
- Dutch
- German
- Spanish
- Improves supplier parsing accuracy with Knowledge Graph support.
- Improves prediction quality (accuracy).
- Extends the header and line item fields extracted by the parser.
- Increases the number of pages for online processing (10 pages) and offline processing (200 pages).
- Increases the number of documents per batch in offline processing (50 documents).
Expense parser (Receipt parser) behavior update
The expense parser behavior has been updated to include the following features:
- Renamed Receipt parser to Expense parser.
- Improved prediction quality.
- Improved prediction quality for English, French, and Dutch for more expense types (for example hotel statements).
Human in the Loop (HITL) AI General Availability (GA) released
HITL AI is now available in Private General Availability (GA) for human review of Invoice, Expense, and Utility parser predictions.
Features:
- HITL configuration enhanced to designate which fields need review and whether a field is mandatory, saving review time.
- Labeler UI highlights the fields below a confidence score and supports single-click confirmation to improve review efficiency.
- Labeling Manager shows analytics and metrics by task and by labeler to streamline HITL operations.
April 02, 2021
Lending DocAI General Availability (GA) released
Lending DocAI is now Generally Available (GA). See the documentation for more information.
Lending DocAI processors added
The following Lending DocAI processors are now available:
March 31, 2021
Document AI General availability (GA) released
Document AI is now Generally Available (GA).
January 14, 2021
New Procurement DocAI processor released in limited Preview
The following Procurement DocAI processor is now available in limited Preview:
- Procurement document splitter
For more information, see the processor documentation.
January 11, 2021
Lending processors behavior update
The behavior of the following processors has been updated:
- 1003 parser
- 1040 parser
- 1099-MISC parser
- W2 parser
- W9 parser
Now, if one of these processors is given a multi-page input file that contains a page of the correct document type in one of the supported versions, the processor performs entity extraction for that page; subsequent applicable pages are not processed. If the processor doesn't find any applicable documents in the input file, it returns an error message.
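The selection rule described here can be sketched as follows. This only illustrates the described behavior and is not the service's implementation; the is_applicable callback is a hypothetical stand-in for the processor's own check of document type and version.

```python
from typing import Callable, Sequence


def first_applicable_page(pages: Sequence[str], is_applicable: Callable[[str], bool]) -> int:
    """Return the index of the first applicable page; later matches are ignored."""
    for index, page in enumerate(pages):
        if is_applicable(page):
            return index
    # Mirrors the error returned when no applicable document is found.
    raise ValueError("no applicable document found in the input file")


# Invented example: pages are labeled by type, and only "W2" pages are applicable.
pages = ["cover letter", "W2", "W2"]
selected = first_applicable_page(pages, lambda page: page == "W2")
```

In this example only the page at index 1 would be processed, even though the page at index 2 is also applicable.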
October 29, 2020
Document AI Preview released
The following beta and preview features are available in API version v1beta3:
- Procurement DocAI processors: Invoice parser and receipt parser.
October 16, 2020
Document AI Preview released
The following beta and preview features are available in API version v1beta3:
- General processors: Document OCR (Optical Character Recognition), form parser, and document splitter.
- Lending processors: W9, 1040, W2, 1099-MISC, and 1003 parsers, as well as lending document splitter & classifier.
uri field unavailable
- Sending a request with the uri field is currently not supported for v1beta3. Any updates to the availability of the uri field will be announced here.
Workaround: Send requests with image information in the content field (base64-encoded information).
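The workaround amounts to base64-encoding the file bytes and placing them in the content field. The sketch below shows one way to build such a request body; the surrounding key names (document, mimeType) are assumptions, so check the v1beta3 reference for the exact request shape.

```python
import base64
import json


def build_request_body(file_bytes: bytes, mime_type: str) -> str:
    """Serialize a request body that carries the file inline as base64 text."""
    document = {
        "mimeType": mime_type,  # assumed key name
        "content": base64.b64encode(file_bytes).decode("ascii"),
    }
    return json.dumps({"document": document})  # assumed wrapper key


# Invented stand-in bytes; in practice, read them from the input file.
body = build_request_body(b"%PDF-1.4 example bytes", "application/pdf")
```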
August 24, 2020
Form Parser model updates
The Form Parser model has been updated. The model update includes the following features:
- Improved OCR quality for English detection.
- Improved key-value pair, checkbox, and table parsing detection quality, particularly for rotated images and handwritten text.
- Decreased latency for complex tables.
August 20, 2020
Invoice Parsing updates
- Document AI now supports normalized values for certain entities returned from Invoice Parsing requests.
- We have improved confidence scores for entities returned from Invoice Parsing requests.
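When normalized values are present, a client might prefer them over the raw text. The following is a minimal sketch, assuming each entity is a dictionary with mentionText and an optional normalizedValue.text; these key names and the sample entities are assumptions for illustration.

```python
from typing import Any


def display_value(entity: dict[str, Any]) -> str:
    """Prefer the normalized text when available, else fall back to the raw text."""
    normalized = entity.get("normalizedValue", {}).get("text")
    return normalized or entity.get("mentionText", "")


# Invented entities: one with a normalized value, one without.
with_normalized = {
    "type": "invoice_date",
    "mentionText": "Aug 20, 2020",
    "normalizedValue": {"text": "2020-08-20"},
}
without_normalized = {"type": "supplier_name", "mentionText": "Example Supplier"}

values = [display_value(with_normalized), display_value(without_normalized)]
```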
July 04, 2020
Invoice Parsing Beta model upgrade
The Invoice Parsing Beta model has been upgraded. This model upgrade results in higher quality results for the entities and entityRelations. There is no API change.
See the product documentation for more information.
April 14, 2020
Document AI Beta released
The following beta features are available in API version v1beta2:
- Document processing: You can use the API to parse forms or tables from PDF, TIFF, or GIF documents.
- Regional support: The API now offers multi-regional support (us and eu) for all features. Using a multi-region endpoint enables you to configure the API to store and process your data in the United States or European Union.
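Selecting a multi-region endpoint can be reduced to a small lookup. The sketch below assumes the host follows the pattern of a location prefix on documentai.googleapis.com; that pattern is an assumption here and should be verified against the regional support documentation.

```python
SUPPORTED_MULTI_REGIONS = ("us", "eu")  # the two multi-regions named in this release note


def endpoint_for(location: str) -> str:
    """Return the API host for a multi-region (assumed host pattern)."""
    if location not in SUPPORTED_MULTI_REGIONS:
        raise ValueError(f"unsupported location: {location!r}")
    return f"{location}-documentai.googleapis.com"


eu_endpoint = endpoint_for("eu")
```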
Invoice processing Beta
- Invoice processing is now available as a restricted feature. See Parsing invoices for more information.