Preparing data for text extraction

Updated on March 11, 2021

In the Source selection step of the text extraction model creation wizard, select the extraction type and provide the data for training and testing of your text extraction model.

1. In the Extraction type section, select a recognizer type:
   - To detect word-level entities, such as a person or location, select Default entity recogniser.
   - To detect paragraph-level entities, such as an email disclaimer, select Paragraph entity recogniser.
2. Optional: To view the template for testing and training data, click Download template.
   An example training data record is: Hi, this is <START:name> Bart <END>, where:
   - <START:name> – Marks the start and type of the entity. In the preceding example, the model detects the string Bart as an entity of type name.
   - <END> – Marks the end of the entity.
3. To select and upload a CSV, XLS, or XLSX file that contains training and testing data for your text extraction model, click Choose file.
   After you select a valid file, you can preview the identified entity types and the size of the training and testing data. Depending on your business needs, you can exclude entity types from the training data. Additionally, you can view errors, for example, missing <START> or <END> tags.
4. If your file contains errors, perform one of the following actions:
   - Exclude errors from the model by selecting the Exclude below error records and build model check box.
   - Correct errors in the file and repeat step 3.
5. Click Next.
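
Before you upload a file, it can help to check the records for unbalanced tags yourself. The following Python sketch is one possible pre-check, not the validation that the wizard performs. It assumes that tags look exactly like those in the example record (<START:type> and <END>) and that entities are not nested. The names parse_record and summarize are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Tag shapes follow the example record; the exact pattern is an assumption.
TAG = re.compile(r"<START:(?P<type>[^>\s]+)>|<END>")


def parse_record(text):
    """Return (entities, errors) for one annotated training record."""
    entities, errors = [], []
    open_type, open_pos = None, None
    for match in TAG.finditer(text):
        if match.group("type") is not None:  # a <START:type> tag
            if open_type is not None:
                errors.append(f"missing <END> for '{open_type}'")
            open_type, open_pos = match.group("type"), match.end()
        elif open_type is None:  # an <END> tag with no open entity
            errors.append("missing <START> before <END>")
        else:  # an <END> tag that closes the open entity
            value = text[open_pos:match.start()].strip()
            entities.append((open_type, value))
            open_type = None
    if open_type is not None:
        errors.append(f"missing <END> for '{open_type}'")
    return entities, errors


def summarize(records):
    """Count entity types in clean records and list records with errors."""
    counts, bad = Counter(), []
    for number, record in enumerate(records, start=1):
        entities, errors = parse_record(record)
        if errors:
            bad.append((number, errors))
        else:
            counts.update(entity_type for entity_type, _ in entities)
    return counts, bad


if __name__ == "__main__":
    sample = [
        "Hi, this is <START:name> Bart <END>",
        "Hi, this is <START:name> Bart",
    ]
    counts, bad = summarize(sample)
    print(dict(counts))  # entity types found in clean records
    print(bad)           # (record number, errors) for flagged records
```

Records that the sketch flags correspond to the kind of error that the preview reports, such as a missing <START> or <END> tag, so you can correct them in the file before you click Choose file.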

