Text extraction analysis

Text extraction analysis is the process of extracting named entities from unstructured text, such as press articles, Facebook posts, or tweets, and categorizing them. Typically, a named entity is a proper noun that falls into a commonly understood category, such as a person, organization, or location. An entity can also be a structured value, such as a Social Security number, an email address, or a ZIP code.

Auto tags

You can configure a Text Analyzer to automatically detect and mark the most important concepts that are expressed in a document. This option is useful when you want to tag a document with the most relevant words or phrases, create word clouds, or perform faceted search according to semantic categories.

Keyword-based text extraction

You can specify a list of key terms, together with their synonyms, that belong to a particular domain. For example, you can create a keyword-based text extraction model to track social media messages that pertain to the latest release of a competitor's product or group of products. You create keyword-based text extraction models in the Analytics Center.
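The idea of matching key terms and their synonyms can be illustrated with a short Python sketch. This is not how the Analytics Center stores or applies a keyword model; the `KEYWORD_MODEL` dictionary, the product names, and the function names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical keyword model: canonical term -> synonyms (illustrative data only).
KEYWORD_MODEL = {
    "Phone X": ["phone x", "phonex", "px"],
    "Tablet Y": ["tablet y", "tab y"],
}


def build_matchers(model):
    """Compile one case-insensitive regex per canonical term."""
    matchers = {}
    for canonical, synonyms in model.items():
        # Try longer synonyms first so that a longer phrase wins over its prefix.
        alternatives = sorted(synonyms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b"
        matchers[canonical] = re.compile(pattern, re.IGNORECASE)
    return matchers


def extract_keywords(text, matchers):
    """Return (canonical term, matched text, start offset) tuples ordered by position."""
    hits = []
    for canonical, regex in matchers.items():
        for match in regex.finditer(text):
            hits.append((canonical, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


matchers = build_matchers(KEYWORD_MODEL)
post = "Just tried the new PhoneX, way better than my old Tab Y."
print(extract_keywords(post, matchers))
```

Mapping every synonym back to one canonical term is what lets mentions written in different ways be counted as the same entity.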

Machine learning-based text extraction models

Use machine learning-based text extraction models to identify entities that belong to an open dictionary of terms, for example, products, people's names, or locations. You can select one of the default entity extraction models or create custom models in the Analytics Center by using the conditional random fields (CRF) algorithm.
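To give a sense of what a CRF-based extractor works with, the following Python sketch builds per-token feature dictionaries and pairs them with BIO-style labels, which is a common input shape for linear-chain CRF taggers. It does not train a model, and it is only an assumption about the general technique: the features that the default or custom models actually use are not described here, and the feature names, the sample sentence, and the labels are hypothetical.

```python
def token_features(tokens, i):
    """Describe token i with simple surface features and its immediate neighbors."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "suffix3": token[-3:].lower(),
        # Context features let the model use neighboring words as evidence.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Build one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical training example: B- marks the first token of an entity,
# I- marks a continuation token, and O marks a token outside any entity.
tokens = "Maria joined Acme Corp in Boston".split()
labels = ["B-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]

for feats, label in zip(sentence_features(tokens), labels):
    print(label, feats["lower"], feats["is_capitalized"], feats["prev_lower"])
```

Because the model learns from features such as capitalization and surrounding words rather than from a fixed list, it can recognize names that never appeared in the training data.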

Pattern extraction models

Use pattern extraction models to extract entities whose structure matches a specific pattern, for example, ZIP codes, case numbers, or email addresses. You can select one of the default pattern extraction models or create custom models in the Analytics Center by using the Rule-based Text Annotation (Ruta) script language.
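The following Python sketch illustrates the general idea of pattern-based extraction with regular expressions. It is a stand-in for illustration, not Ruta script and not the default models: the pattern names are hypothetical, the email pattern is deliberately simplified, and the `CASE-` number format is an assumed example format.

```python
import re

# Illustrative patterns only; real models may use stricter or different rules.
PATTERNS = {
    # US ZIP code, with an optional ZIP+4 extension.
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    # Simplified email address pattern.
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Hypothetical case-number format, for example "CASE-1042".
    "case_number": re.compile(r"\bCASE-\d+\b"),
}


def extract_patterns(text):
    """Return every pattern match as a dictionary, ordered by position in the text."""
    entities = []
    for entity_type, regex in PATTERNS.items():
        for match in regex.finditer(text):
            entities.append(
                {"type": entity_type, "value": match.group(0), "start": match.start()}
            )
    return sorted(entities, key=lambda entity: entity["start"])


message = "Re CASE-1042: ship to Cambridge, MA 02142 and confirm with jo@example.com."
for entity in extract_patterns(message):
    print(entity["type"], entity["value"])
```

Pattern-based extraction suits entities with a fixed structure because the pattern itself defines the entity, so no training data is required.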