Document Processing Service custom parameters

You can optimize how your application runs the Document Processing Service (DPS) component by specifying profiles for the optical character recognition (OCR) and highlighting services. You can also edit additional DPS parameters to tailor the OCR and highlighting services to your requirements. For example, you can set custom DPS parameters so that the system automatically corrects page orientation during document processing and recognizes text in image-based documents in English, French, and Spanish.
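
As a rough illustration of the example above, the two settings can be written down as name/value pairs before you enter them in the configureDPSABBYY data transform. The Python dictionary below is only a notation for this page, not the format of the data transform. It assumes that the comma-separated textLanguage value is written without spaces, which this page does not state.

```python
# Hypothetical notation: plain name/value pairs, not the data transform format.
example_settings = {
    # Rotate pages whose detected orientation is different from normal.
    "correctOrientation": "VARIANT_TRUE",
    # Recognize English, French, and Spanish text.
    # Assumption: the list is joined with commas and no spaces.
    "textLanguage": ",".join(["English", "French", "Spanish"]),
}

for name, value in example_settings.items():
    print(f"{name} = {value}")
```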

Main parameters

The following table lists the main DPS parameters that you modify in the configureDPSABBYY data transform:

| Parameter name | Description | Values |
| --- | --- | --- |
| exactMatch | Specifies whether to perform exact matching of text during document processing. | true, false |
| highlightFileExportFormat | Specifies the file format for the highlighting service. | FEF_RTF, FEF_HTMLVersion10Defaults, FEF_HTMLUnicodeDefaults, FEF_PDF, FEF_TextVersion10Defaults, FEF_TextUnicodeDefaults, FEF_XML, FEF_DOCX, FEF_XLSX, FEF_PPTX, FEF_ALTO, FEF_EPUB, FEF_FB2, FEF_ODT |
| ocrFileExportFormat | Specifies the file format for the OCR service. | FEF_RTF, FEF_HTMLVersion10Defaults, FEF_HTMLUnicodeDefaults, FEF_PDF, FEF_TextVersion10Defaults, FEF_TextUnicodeDefaults, FEF_XML, FEF_DOCX, FEF_XLSX, FEF_PPTX, FEF_ALTO, FEF_EPUB, FEF_FB2, FEF_ODT |
| textLanguage | Specifies the languages of the text to be recognized, including programming languages, as a comma-separated list. Define no more than three languages for this property, because additional languages can degrade system performance. | Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Arabic, ArmenianEast, ArmenianGrabar, ArmenianWestern, Anwar, Aymara, AzeriCyrillic, AzeriLatin, Bashkir, Basic, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Burmese, Buryat, C++, Catalan, Chamorro, Chechen, Chemistry, ChinesePRC, ChineseTaiwan, Chukcha, Chuvash, CMC7, Cobol, Corsican, CrimeanTatar, Croatian, Crow, Czech, Danish, Dargwa, Digits, Dungan, Dutch, DutchBelgian, E13B, English, EskimoCyrillic, EskimoLatin, Esperanto, Estonian, Even, Evenki, Faeroese, Farsi, Fijian, Finnish, Fortran, French, Frisian, Friulian, GaelicScottish, Gagauz, Galician, Ganda, German, GermanLuxembourg, GermanNewSpelling, Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, JapaneseModern, Java, Kabardian, Kalmyk, KarachayBalkar, Karakalpak, Kasub, Kawa, Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, KoreanHangul, Koryak, Kpelle, Kumyk, Kurdish, Lak, Lappish, Latin, Latvian, LatvianGothic, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian, NorwegianBokmal, NorwegianNynorsk, Nyanja, Occidental, OcrA, OcrB, Ojibway, OldEnglish, OldFrench, OldGerman, OldItalian, OldSlavonic, OldSpanish, Ossetic, Papiamento, Pascal, Pashto, PidginEnglish, Polish, PortugueseBrazilian, PortugueseStandard, Provencal, Quechua, RhaetoRomanic, Romanian, RomanianMoldavia, Romany, Ruanda, Rundi, RussianOldSpelling, Russian, RussianWithAccent, Samoan, Selkup, SerbianCyrillic, SerbianLatin, Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, TurkmenLatin, Tuvin, Udmurt, UighurCyrillic, UighurLatin, Ukrainian, Urdu, UzbekCyrillic, UzbekLatin, Vietnamese, Visayan, Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu |
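
Because textLanguage is a comma-separated list with a recommended limit of three languages, it can help to check a value before you enter it in the data transform. The following is a minimal sketch of such a check. The function names are hypothetical helpers, not part of DPS, and the tolerance for spaces around the commas is an assumption.

```python
from typing import List

# Limit recommended in the textLanguage description.
RECOMMENDED_MAX_LANGUAGES = 3


def parse_text_languages(value: str) -> List[str]:
    """Split a comma-separated textLanguage value into trimmed language names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def exceeds_recommended_limit(value: str) -> bool:
    """Return True when the value names more languages than recommended."""
    return len(parse_text_languages(value)) > RECOMMENDED_MAX_LANGUAGES


print(parse_text_languages("English, French, Spanish"))
print(exceeds_recommended_limit("English,French,Spanish,German"))
```

The check only counts the entries. It does not confirm that each name appears in the list of supported values.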

Page preprocessing parameters

The table below lists the DPS parameters that the system uses during page preprocessing, which you can modify in the configureDPSABBYY data transform:

| Parameter name | Description | Values |
| --- | --- | --- |
| applySigmaFilter | Specifies whether to apply the noise reduction filter to the image during page preprocessing. If you specify TSPV_Auto, the system determines automatically whether to use the noise reduction filter. | TSPV_Yes, TSPV_No, TSPV_Auto |
| correctOrientation | Specifies whether to automatically rotate the image during page preprocessing if the detected page orientation is different from normal. | VARIANT_TRUE, VARIANT_FALSE |
| correctInvertedImage | Specifies whether to automatically invert the image if the detected image is inverted, that is, white text is displayed on a black background. | VARIANT_TRUE, VARIANT_FALSE |
| correctResolution | Specifies whether to correct the resolution of the image during page preprocessing. If you specify TSPV_Auto, the system corrects the image resolution automatically when it finds the resolution to be insufficient. | TSPV_Yes, TSPV_No, TSPV_Auto |
| correctShadowsAndHighlights | Specifies whether to improve recognition quality by correcting excessive shadows and highlights in the image during page preprocessing. Use this property with photo images. If you specify TSPV_Auto, the system automatically determines whether to correct excessive shadows and highlights. | TSPV_Yes, TSPV_No, TSPV_Auto |
| correctSkew | Specifies whether image skew is corrected during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to perform image skew correction. | TSPV_Yes, TSPV_No, TSPV_Auto |
| correctGeometry | Specifies whether geometrical distortions in photo images are removed during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to remove geometrical distortions in photo images. | TSPV_Yes, TSPV_No, TSPV_Auto |
| cropImage | Specifies whether document edges are detected in the image and whether the image is cropped during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to detect edges and crop the image. | TSPV_Yes, TSPV_No, TSPV_Auto |
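
The page preprocessing parameters use two different value sets (TSPV_* and VARIANT_*), which are easy to mix up. The sketch below checks a set of name/value pairs against the values listed in the table. It is an illustrative helper with hypothetical names, it is not part of DPS, and it says nothing about how the data transform itself validates input.

```python
TSPV_VALUES = {"TSPV_Yes", "TSPV_No", "TSPV_Auto"}
VARIANT_VALUES = {"VARIANT_TRUE", "VARIANT_FALSE"}

# Allowed values per parameter, copied from the table above.
ALLOWED_VALUES = {
    "applySigmaFilter": TSPV_VALUES,
    "correctOrientation": VARIANT_VALUES,
    "correctInvertedImage": VARIANT_VALUES,
    "correctResolution": TSPV_VALUES,
    "correctShadowsAndHighlights": TSPV_VALUES,
    "correctSkew": TSPV_VALUES,
    "correctGeometry": TSPV_VALUES,
    "cropImage": TSPV_VALUES,
}


def find_invalid_settings(settings):
    """Return {name: reason} for every entry that the table does not allow."""
    problems = {}
    for name, value in settings.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is None:
            problems[name] = "not a page preprocessing parameter"
        elif value not in allowed:
            problems[name] = "expected one of " + ", ".join(sorted(allowed))
    return problems


# correctOrientation takes VARIANT_* values, so TSPV_Yes is reported.
print(find_invalid_settings({"correctSkew": "TSPV_Auto", "correctOrientation": "TSPV_Yes"}))
```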

PDF export parameters

The table below lists the DPS parameters that the system uses for PDF exporting, which you can modify in the configureDPSABBYY data transform:

| Parameter name | Description | Values |
| --- | --- | --- |
| jpgQuality | Specifies the quality of images saved in PDF files, as a percentage. | A number between 1 and 100. |
| pdfExportScenario | Specifies the scenario used to export images to PDF, which determines the trade-off between image quality, file size, and export speed. | PES_MaxQuality, PES_Balanced, PES_MinSize, PES_MaxSpeed |
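
A quick range check can catch an out-of-range jpgQuality or a mistyped scenario name before you edit the data transform. The sketch below assumes that jpgQuality is a whole number and that the bounds 1 and 100 are inclusive. Neither detail is stated on this page, and check_pdf_export is a hypothetical helper, not a DPS function.

```python
PDF_EXPORT_SCENARIOS = ("PES_MaxQuality", "PES_Balanced", "PES_MinSize", "PES_MaxSpeed")


def check_pdf_export(jpg_quality, scenario):
    """Return a list of problems; an empty list means both values pass."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(jpg_quality, bool) or not isinstance(jpg_quality, int):
        problems.append("jpgQuality must be a whole number")
    elif not 1 <= jpg_quality <= 100:  # assumption: bounds are inclusive
        problems.append("jpgQuality must be between 1 and 100")
    if scenario not in PDF_EXPORT_SCENARIOS:
        problems.append(
            "pdfExportScenario must be one of " + ", ".join(PDF_EXPORT_SCENARIOS)
        )
    return problems


print(check_pdf_export(85, "PES_Balanced"))
print(check_pdf_export(0, "PES_Fast"))
```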

Object extraction parameters

The table below lists the DPS parameters that the system uses during object extraction, which you can modify in the configureDPSABBYY data transform:

| Parameter name | Description | Values |
| --- | --- | --- |
| sourceContentReuseMode | Specifies how the text and image layers of the source PDF file are reused: automatically, not at all, or content only. | CRM_Auto, CRM_DoNotReuse, CRM_ContentOnly |