
Document AI in SnapApp

on 05-08-2024 12:00 AM by SnapApp by BlueVector AI


Document AI is a platform for processing and understanding documents, transforming unstructured data into structured formats with specific fields suitable for databases. This makes data easier to comprehend, analyze, and use. Utilizing machine learning and Google Cloud, Document AI enables the creation of scalable, end-to-end, cloud-based document processing applications.

For example, Document AI can extract data from an image, a document, or handwritten text.

Document AI builds on Vertex AI products and incorporates generative AI, so these applications can be built without specialized machine learning expertise.

To learn more, see Document AI overview


Document AI Parser

A Document AI parser acts as an intermediary between a document file and a machine learning model, enabling document processing and understanding tasks such as classification, splitting, parsing, and analysis.

Types of parser

  • Document OCR: This parser uses optical character recognition (OCR) to extract text and layout, with add-ons such as image quality detection (for readability) and fully automatic deskewing.
  • Form Parser: This parser identifies key-value pairs in structured forms and simple tables, and extracts form elements such as text and checkboxes.
  • Layout Parser: This parser identifies and extracts document layouts and chunks.
  • Specialized Parser: Schematized processors for domain-specific documents.

| Parser | Description |
| --- | --- |
| 1003 Parser | Extract over 50 fields from Fannie Mae Form 1003 (URLA). |
| 1040 Parser | Extract from Form 1040, including name, filing status, amounts, etc. |
| 1040-C Parser | Extract from Form 1040 Schedule C, including name, year, etc. |
| 1040-E Parser | Extract from Form 1040 Schedule E, including name, year, etc. |
| 1065 Parser | Extract from Form 1065, partnership name, address, etc. |
| 1099-G Parser | Extract from Form 1099-G, including payer, recipient, etc. |
| 1099-INT Parser | Extract from Form 1099-INT, including payer, recipient, etc. |
| 1099-MISC Parser | Extract from Form 1099-MISC, including payer, recipient, etc. |
| 1099-NEC Parser | Extract from Form 1099-NEC, including payer, recipient, etc. |
| 1099-R Parser | Extract from Form 1099-R, including payer, recipient, etc. |
| 1120 Parser | Extract from Form 1120, partnership name, address, etc. |
| 1120S Parser | Extract from Form 1120S, name, address, assets, etc. |
| Bank Statement Parser | Extract from Bank Statements, including name, transactions, etc. |
| Expense Parser | Extracts from Receipts, including supplier, total amount, tip, etc. |
| France Driver License Parser | Extract fields from France Driver License, including names and dates. |
| France National ID Parser | Extract fields from France ID, including names, dates, etc. |
| France Passport Parser | Extract fields from France Passport, including names, dates. |
| Identity Document Proofing | Predict the validity of ID documents using multiple signals. |
| India Aadhar Card Parser | Extract fields from India Aadhar Card, including names, dates. |
| India Driver License Parser | Extract fields from India Driver License, including names, dates. |
| India Passport Parser | Extract fields from India Passport, including names, dates, etc. |
| Invoice Parser | Extracts 30+ fields from Invoices: ID, amount, line items, etc. |
| Lending Doc Splitter/Classifier | Identify documents in a large file and classify known lending doc types. |
| Mortgage Statement Parser | Extract from mortgage statements, including names, amounts due, etc. |
| Pay Slip Parser | Extract from Pay Slips, including name, business, amounts, etc. |
| Procurement Doc Splitter | Split procurement documents bundled into a single PDF file. |
| Receipt Parser | Extracts from Receipts, including supplier, total amount, tip, etc. |
| SSA-1099 Parser | Extract from Form SSA-1099, name, address, SSN, etc. |
| US Driver License Parser | Extract fields from US Driver License, including names and dates. |
| US Passport Parser | Extract fields from US Passport, including names, dates, etc. |
| Utility Parser | Extracts 30+ fields from Utility statements: amount, line items, etc. |
| W2 Parser | Extract from Form W2, including employee, employer, wages, etc. |
| W9 Parser | Extract from Form W9, including name, address, TIN, etc. |

  • Custom Parser: A parser that you train from scratch to extract the values you need.
  • Extractor: This parser extracts text and layout information from document files and normalizes entities.
  • Classifier: This parser groups the documents into categories.
  • Splitter: This parser splits and classifies documents by type. It identifies document boundaries in a large file.
  • Summarizer: This parser generates summaries for short and long documents.

To learn more, see Parser List
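
For readers who want to call a parser directly rather than through SnapApp, here is a minimal sketch of how a request to a Document AI processor could be assembled with the Python standard library. It assumes the public Document AI REST `process` method. The project, location, and processor IDs are placeholders, and the helper name `build_process_request` is hypothetical. The code only builds the URL and JSON body; it sends nothing, so authentication is left out.

```python
import base64
import json


def build_process_request(project_id, location, processor_id, content, mime_type):
    """Return (url, body) for a Document AI `process` call.

    Hypothetical helper: it only assembles the request and sends nothing.
    """
    # Processor resource name, following the Document AI naming scheme.
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    # Regional endpoint, for example us-documentai.googleapis.com.
    url = f"https://{location}-documentai.googleapis.com/v1/{name}:process"
    body = {
        "rawDocument": {
            # The REST API expects the file bytes base64-encoded.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, json.dumps(body)


# Placeholder values, not real resources.
url, body = build_process_request(
    "my-project", "us", "abc123", b"%PDF-1.4 ...", "application/pdf"
)
print(url)
```

The same request shape applies regardless of parser type; only the processor ID changes, which is why the parser choice is kept as a single parameter here.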

How to create a custom Document AI Parser?

  1. Create
    • Go to the Google Cloud Console and navigate to the Document AI dashboard.
    • Click on “Create Processor” and select “Custom Processor”.
  2. Define
    • Specify the document types and formats your parser will support.
    • Define the extraction rules and entity types using JSON or YAML.
    • Configure any pre-processing or post-processing steps.
  3. Upload
    • Provide a sample document that represents the type of documents your parser will process.
    • This document will be used for training and testing.
  4. Train
    • Click on “Train” to train the parser using the sample document and configuration.
    • The training process may take a few minutes to complete.
  5. Test
    • Click on “Test” to test the parser with a new document.
    • Verify that the parser extracts the expected data and entities.
  6. Deploy
    • Once the parser is trained and tested, click on “Deploy” to deploy it to your Google Cloud account.
    • The parser will be available for use in your Document AI workflows.
**Note**: Ensure you have the required permissions and access to create and deploy custom parsers in Google Cloud. You can also utilize the Document AI API for programmatic creation and management of custom parsers.
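
As a rough illustration of the programmatic route mentioned in the note, the sketch below assembles a "create processor" request for the Document AI REST API. The processor type identifiers in `CUSTOM_TYPES` are assumptions based on the public API and should be confirmed against the types available to your project. The helper name and all IDs are hypothetical, and the code does not send the request or handle authentication.

```python
import json

# Processor type identifiers (assumed; confirm the list available to your
# project before relying on them).
CUSTOM_TYPES = {
    "extractor": "CUSTOM_EXTRACTION_PROCESSOR",
    "classifier": "CUSTOM_CLASSIFICATION_PROCESSOR",
    "splitter": "CUSTOM_SPLITTING_PROCESSOR",
}


def build_create_processor_request(project_id, location, kind, display_name):
    """Return (url, body) for creating a custom processor. Sends nothing."""
    if kind not in CUSTOM_TYPES:
        raise ValueError(f"unknown custom parser kind: {kind!r}")
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/processors"
    )
    body = {"type": CUSTOM_TYPES[kind], "displayName": display_name}
    return url, json.dumps(body)


# Placeholder values, not real resources.
url, body = build_create_processor_request("my-project", "us", "extractor", "w9-extractor")
print(url)
print(body)
```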

To learn more, see Document AI Create Processor

custom-parser-creation

How to import a Document AI parser?

This is typically used to import a custom Document AI parser.

  1. Navigate to Settings from the User menu of the top bar.
  2. Tap on Import Doc AI from the Data menu of the left navigation bar.
  3. Enter the name of the parser you want to import in the Parser Name field.
  4. Enter the entities you want the parser to extract in the Parser Expected Entities field.

Here is a simple example of a custom parser import in SnapApp:

custom-parser-details

import-docai

How to do real time extraction and validation using Doc AI?

| Setting | Description |
| --- | --- |
| Quality Validation | Checks the quality of the document and discards it if the quality is poor. |
| Classify and Validate | Detects the correct document type and discards the others. You must choose a classifier parser to proceed. |
| Entity Validation | Validates the entities of the uploaded document. |

**Preconditions**:
  • An object must be created with Elevated Admin Access permissions.
  • Ensure the “Track Attachments” checkbox is selected.
  • Include a field of Document AI data type with the appropriate parser selected, ensuring checkboxes are ticked for Quality Validation, Classify and Validate, and Entity Validation.

For example:

DocAI-field-with-parser
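
To make the three settings concrete, here is a minimal sketch of how such checks could be chained. It is not SnapApp's implementation: the `Upload` structure, the quality scale, the threshold, and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    quality_score: float  # 0.0 (unreadable) to 1.0 (clean); hypothetical scale
    document_type: str    # label a classifier parser would assign
    entities: dict = field(default_factory=dict)


def validate_upload(upload, expected_type, required_entities, min_quality=0.5):
    """Run the three checks in order and return (accepted, reason)."""
    # Quality Validation: discard poor-quality documents first.
    if upload.quality_score < min_quality:
        return False, "quality too low"
    # Classify and Validate: discard documents of the wrong type.
    if upload.document_type != expected_type:
        return False, "unexpected document type"
    # Entity Validation: every required entity must be present and non-empty.
    missing = [name for name in required_entities if not upload.entities.get(name)]
    if missing:
        return False, "missing entities: " + ", ".join(missing)
    return True, "ok"


doc = Upload(0.9, "W9", {"name": "Ada Lovelace"})
print(validate_upload(doc, "W9", ["name", "tin"]))  # (False, 'missing entities: tin')
```

Running the cheapest check first means a blurry scan is rejected before any classification or entity comparison is attempted.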

Create another field for real-time data extraction and validation:

  1. Navigate to Settings from the User menu of the top bar.
  2. Tap on Fields from the Data menu of the left navigation bar.
  3. Tap on + Add New to add a new field.
  4. Configure the Basic Properties with the Object ID of the recently created object.
  5. Under the Display Properties section, select Formula to open the expression builder.
  6. Select the Object and Record, then enter the formula in the Expression box to see the Output.

Expression:

```
=LOOKUP(CONCAT("gs://",[[docAI_type_fieldname]]), "object_of_selected_parser", "_gcs_uri", "data_extraction_field_name")
```

Output:

```
Extracted value (from data_extraction_field_name)
```

For example:

lookup-formula-configuration
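
One way to read the LOOKUP expression is as a keyed search: build the `gs://` URI from the Document AI field, find the record in the parser's object whose `_gcs_uri` matches, and return the chosen extraction field. The Python sketch below mirrors that reading. The argument order, the sample records, and the `ein` field are assumptions for illustration; this is not how SnapApp evaluates formulas internally.

```python
def lookup(search_value, records, match_field, return_field):
    """Return `return_field` from the first record whose `match_field` equals `search_value`."""
    for record in records:
        if record.get(match_field) == search_value:
            return record.get(return_field)
    return None  # no matching record


# Hypothetical records of the object that stores the parser's output.
parser_records = [
    {"_gcs_uri": "gs://bucket/docs/w9-001.pdf", "ein": "12-3456789"},
    {"_gcs_uri": "gs://bucket/docs/w9-002.pdf", "ein": "98-7654321"},
]

# The Document AI field is assumed to hold the path without the "gs://" prefix,
# which is why the expression wraps it in CONCAT("gs://", ...).
doc_ai_field_value = "bucket/docs/w9-002.pdf"

extracted = lookup("gs://" + doc_ai_field_value, parser_records, "_gcs_uri", "ein")
print(extracted)  # 98-7654321
```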

Extract and validate data from a document:

  1. Navigate to Settings from the User menu of the top bar.
  2. Tap on Objects from the Data menu of the left navigation bar.
  3. Tap on the object in the list that you want to preview.
  4. Tap on the Preview button.
  5. Tap on + Add New to enter records.
  6. Upload the document and tap on Save.
  7. The extracted data is displayed.

    For example:

    ein-generation

    validation-successful

    validation-failed

How to validate rules using Document AI?

A validation rule compares the extracted data against a condition you provide. The extraction counts as successfully validated only when the condition is met.

  1. Check the box for Document AI validation and Publish in the view builder of Inspector View.
  2. The Validity column becomes visible.
  3. Navigate to Edit Fields from the User menu of the top bar.
  4. Tap on the field you want to edit.
  5. Under the Display Properties section, enter an expression that sets the condition. For example: validate-condition
  6. Tap on Insert Expression.
  7. Enter the text to show as the error message when the value is invalid, then tap on Save.

For example:

valid-check

But if the condition isn’t met, then:

invalid-condition validation-rules-unsuccessful
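
The behavior of a validation rule can be sketched in a few lines: a condition is evaluated against the extracted value, and the configured error text is returned when it fails. The EIN format rule and the function names below are hypothetical examples, not rules shipped with SnapApp.

```python
import re

# Hypothetical rule: an EIN must look like NN-NNNNNNN.
EIN_PATTERN = re.compile(r"\d{2}-\d{7}")


def is_ein(value):
    """Condition: True when the value is a string in EIN format."""
    return isinstance(value, str) and EIN_PATTERN.fullmatch(value) is not None


def apply_rule(value, condition, error_message):
    """Return (is_valid, message); message is empty when the condition holds."""
    if condition(value):
        return True, ""
    return False, error_message


print(apply_rule("12-3456789", is_ein, "Invalid EIN"))  # (True, '')
print(apply_rule("123456789", is_ein, "Invalid EIN"))   # (False, 'Invalid EIN')
```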

Now, your Document AI parser is ready to extract and validate your data.


Thank you for following these steps to configure your SnapApp components effectively. If you have any questions or need further assistance, please don’t hesitate to reach out to our support team. We’re here to help you make the most out of your SnapApp experience.

For support, email us at snapapp@bluevector.ai

