Which of the following are unstructured documents?
D
Explanation:
Unstructured documents are those that do not have a predefined format or layout, and therefore
cannot be easily processed by traditional methods. They often contain free-form text, images, tables,
and other elements that vary from document to document. Examples of unstructured documents
include contracts, agreements, emails, letters, reports, articles, and so on.
UiPath Document Understanding is a solution that enables the processing of unstructured documents
using AI-powered models and RPA workflows [1].
The other options are not correct because they are examples of structured or semi-structured
documents. Structured documents are those that have a fixed format or layout, and can be easily
processed by rules-based methods. They often contain fields, labels, and values that are consistent
across documents. Examples of structured documents include banking forms, tax forms, surveys,
identity cards, and so on. Semi-structured documents are those that have some elements of
structure, but also contain variations or unstructured content. They often require a combination of
rules-based and AI-powered methods to process.
Examples of semi-structured documents include invoices, receipts, purchase orders, medical bills,
and so on [2].
References:
[1] Unstructured Data Analysis with AI, RPA, and OCR | UiPath
[2] Structured, semi-structured, unstructured sample documents for UiPath document understanding - Studio - UiPath Community Forum
When creating a training dataset, what is the recommended number of samples for the Classification
fields?
C
Explanation:
According to the UiPath documentation, the recommended number of samples for the classification
fields depends on the number of document types and layouts that you want to classify. The more
document types and layouts you have, the more samples you need to cover the diversity of your
data.
However, a general guideline is to have at least 20-50 document samples from each class, as this
would provide enough data for the classifiers to learn from [1][2]. A large number of samples per
layout is not mandatory, as the classifiers can generalize from other layouts as well [3].
References:
[1] Document Classification Training Overview
[2] Document Classification Training Related Activities
[3] Training High Performing Models
What is one of the purposes of the Config file in the UiPath Document Understanding Template?
B
Explanation:
The Config file in the UiPath Document Understanding Template is a JSON file that contains various
parameters and values that control the behavior and functionality of the template. One of the
purposes of the Config file is to store the API keys and authentication credentials for accessing
external services, such as the Document Understanding API, the Computer Vision API, the Form
Recognizer API, and the Text Analysis API. These services are used by the template to perform
document classification, data extraction, and data validation tasks. The Config file also allows the
user to customize the template according to their needs, such as enabling or disabling human-in-the-
loop validation, setting the retry mechanism, defining the custom success logic, and specifying the
taxonomy of document types.
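As a rough illustration of the kind of settings such a file holds, the sketch below parses a hypothetical config fragment; the key names are invented for the example and are not the template's actual schema:

```python
import json

# Hypothetical config fragment -- the key names below are illustrative
# only, not the actual schema of the Document Understanding Template.
config_text = """
{
    "DocumentUnderstandingApiKey": "<your-api-key>",
    "EnableHumanInTheLoopValidation": true,
    "MaxRetryCount": 3,
    "TaxonomyPath": "Taxonomy/taxonomy.json"
}
"""

config = json.loads(config_text)

# A workflow would read settings like these at startup to decide
# whether to create validation actions and how often to retry.
if config["EnableHumanInTheLoopValidation"]:
    print("Validation actions will be created in Action Center")
print("Retries:", config["MaxRetryCount"])
```

Centralizing such values in one file means the workflow logic never hard-codes credentials or behavior flags.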
References:
Document Understanding Process: Studio Template
Automation Suite - Document Understanding configuration file
Which of the following file types are supported for the DocumentPath property in the Classify
Document Scope activity?
B
Explanation:
According to the UiPath documentation portal [1], the DocumentPath property in the Classify
Document Scope activity accepts the path to the document you want to validate. This field supports
only strings and String variables. The supported file types for this property field are .png, .gif, .jpe,
.jpg, .jpeg, .tiff, .tif, .bmp, and .pdf. Therefore, option B is the correct answer, as it contains four of the
supported file types. Option A is incorrect, as .psd is not a supported file type. Option C is incorrect,
as .raw is not a supported file type. Option D is incorrect, as .eps is not a supported file type.
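The extension check described above can be sketched in a few lines of Python (the helper function is an illustration of the rule, not part of the UiPath activity itself):

```python
from pathlib import Path

# File types the explanation lists as supported by the DocumentPath
# property of the Classify Document Scope activity.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg",
    ".tiff", ".tif", ".bmp", ".pdf",
}

def is_supported_document(path: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_document("invoice.pdf"))  # True
print(is_supported_document("layers.psd"))   # False: .psd is not supported
```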
References:
[1] Activities - Classify Document Scope - UiPath Documentation Portal
When processing a document type that comes in a high variety of layouts, what is the recommended
data extraction methodology?
B
Explanation:
Based on the classification of documents, there are two common types of data extraction
methodologies: rule-based data extraction and model-based data extraction [1]. Rule-based data
extraction targets structured documents, while model-based data extraction is used to process
semi-structured and unstructured documents [1]. However, neither of these methods alone can handle
the high variety of layouts that some document types may have. Therefore, a hybrid data extraction
approach is recommended, which combines the strengths of both methods and allows for more
flexibility and accuracy [2][3]. A hybrid data extraction approach can use one or more extractors,
such as RegEx Based Extractor, Form Extractor, Intelligent Form Extractor, Machine Learning
Extractor, or FlexiCapture Extractor, depending on the document type and the fields of interest [3].
The Data Extraction Scope activity in UiPath enables the configuration and execution of a hybrid
data extraction methodology by letting the user customize which fields are requested from each
extractor, the minimum confidence threshold for a given data point extracted by each extractor, the
taxonomy mapping, at field level, between the project taxonomy and the extractor's internal
taxonomy (if any), and how to implement "fall-back" rules for data extraction [2].
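The fall-back idea can be sketched outside UiPath as well: try extractors in priority order and accept the first result whose confidence clears the field's threshold. The extractor names, result shape, and thresholds below are assumptions made for the sketch, not the Data Extraction Scope activity's actual API:

```python
# Illustrative fall-back logic: accept the first extractor whose
# confidence for the field meets the minimum threshold.
def extract_with_fallback(field, extractor_results, min_confidence):
    """Return (extractor_name, value) for the first extractor whose
    prediction for `field` meets the confidence threshold."""
    for extractor_name, results in extractor_results:
        candidate = results.get(field)
        if candidate and candidate["confidence"] >= min_confidence:
            return extractor_name, candidate["value"]
    return None, None  # no extractor was confident enough

# Hypothetical predictions from two extractors, tried in this order.
results = [
    ("RegexBasedExtractor",
     {"invoice-no": {"value": "INV-001", "confidence": 0.55}}),
    ("MachineLearningExtractor",
     {"invoice-no": {"value": "INV-0012", "confidence": 0.92}}),
]

source, value = extract_with_fallback("invoice-no", results, min_confidence=0.80)
print(source, value)  # MachineLearningExtractor INV-0012
```

Lowering the threshold would let the earlier, cheaper extractor win; results below every threshold are the cases typically routed to human validation.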
References:
[1] Document Processing with Improved Data Extraction
[2] Data Extraction Overview
[3] Data Extraction
Which is a high-level view of the tabs within an AI Center project?
D
Explanation:
A high-level view of the tabs within an AI Center project is as follows:
Dashboard: This tab provides an overview of the project's status, such as the number of datasets, pipelines, packages, skills, and logs, as well as the AI Units consumption and quota.
Datasets: This tab enables you to upload, view, and manage the datasets that are used for training and evaluating the ML models within the project. A dataset is a folder of storage containing arbitrary files and sub-folders [1].
Data Labeling: This tab enables you to upload raw data, annotate text data in the labeling tool (for classification or entity recognition), and use the labeled data to train ML models. It is also used by the human reviewer to re-label incorrect predictions as part of the feedback process [2].
ML Packages: This tab enables you to upload, view, and manage the ML packages and package versions within the project. An ML package is a group of package versions of the same package type, and a package version is a trained model that can be deployed to a skill [3].
Pipelines: This tab enables you to create, view, and manage the pipelines and pipeline runs within the project. A pipeline is a description of an ML workflow, including the functions and their order of execution, and a pipeline run is an execution of a pipeline based on code provided by the user [4].
ML Skills: This tab enables you to deploy, view, and manage the ML skills within the project. An ML skill is a live deployment of a package version, which can be consumed by an RPA workflow using an ML skill activity in UiPath Studio [5].
ML Logs: This tab enables you to view and filter the logs related to the project, such as the events, messages, and errors that occurred during the pipeline runs, skill deployments, and skill executions [6].
References:
[1] About Datasets
[2] About Data Labeling
[3] About ML Packages
[4] About Pipelines
[5] About ML Skills
[6] About ML Logs
Can you use Queues in the Document Understanding Process?
B
Explanation:
The Document Understanding Process is a fully functional UiPath Studio project template based on a
document processing flowchart. It supports both attended and unattended robots with human-in-
the-loop validation via Action Center. The process uses queues to store and process the input files,
one file per queue item. However, the Auto Retry Functionality should be disabled on queues,
because it can interfere with the human validation step and cause errors or duplicates. The process
handles the retry mechanisms internally, using the Try/Catch and Error management features.
References:
Document Understanding Process: Studio Template
Document Understanding Process - New Studio Template
How long does the typical Machine Learning model deployment process take in UiPath AI Center?
C
Explanation:
The typical machine learning model deployment process in UiPath AI Center usually takes between
10 and 15 minutes [1]. This process involves wrapping the model in UiPath's serving framework and
deploying it within a namespace on AI Fabric's Kubernetes cluster that is only accessible by your
tenant [1]. Please note that the actual time may vary depending on the complexity of the model and
other factors.
References:
[1] AI Center - Managing ML Skills (uipath.com)
What are the available options for Scoring in Document Manager, that apply only to string content
type?
C
Explanation:
According to the UiPath documentation, the available options for Scoring in Document Manager, that
apply only to string content type, are exact match and Levenshtein. Exact match is a scoring strategy
that considers a prediction to be correct only if it exactly matches the true value. Levenshtein is a
scoring strategy that measures the similarity between two strings by counting the minimum number
of edits (insertions, deletions, or substitutions) required to transform one string into another. The
lower the Levenshtein distance, the higher the score. These options can be configured in the
Advanced tab of the Edit Field window for string fields.
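The edit-distance measure described above is the standard Levenshtein algorithm; a straightforward dynamic-programming sketch (not UiPath's internal implementation) looks like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to transform string `a` into string `b`."""
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[len(b)]

print(levenshtein("INV-1001", "INV-1001"))  # 0 -> an exact match
print(levenshtein("INV-1001", "INV-I00l"))  # 2 -> two substituted characters
```

A distance of 0 corresponds to the exact-match strategy; the Levenshtein strategy instead converts small distances into high (but not perfect) scores, which is more forgiving of OCR noise.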
References:
Document Understanding - Create and Configure Fields
Document Understanding - Training High Performing Models
What is the name of the web application that allows users to prepare, review, and make corrections
to datasets required for Machine Learning models?
C
Explanation:
Data Manager is a web application that allows users to prepare, review, and make corrections to
datasets required for Machine Learning models. Data Manager enables users to create and manage
datasets, label data, validate and export data, and monitor data quality and progress. Data Manager
supports various types of data, such as documents, images, text, and tables.
Data Manager is integrated with AI Center, where users can train and deploy Machine Learning models
using the datasets created or modified in Data Manager [1][2].
References:
[1] Data Manager Overview
[2] AI Center - About Datasets