Data analysis settings

Updated on March 11, 2021

You can change these settings to affect the Data analysis step of the predictive model configuration process that is described in Analyzing data. The settings include the names of the sections that are displayed for the step and the default values for particular options.

Setting: Description
Label
Wide of scheme: Change the label for cases not found in the development sample.
Missing: Change the label for missing values.
Residual group: Change the label for the intervals that are too small for their behavior to be a reliable basis for grouping them into another interval.
Remaining symbols: Change the label for the symbols that are too small for their behavior to be a reliable basis for grouping them into another category.
Ignored: Change the label for fields that are excluded from subsequent analysis and modeling.
Binning and grouping settings
Number of bins for numeric fields: Set the initial number of bins used to analyze the values of each numeric field.
Number of bins for symbolic fields: Set the initial number of bins used to analyze the symbols of each symbolic field.
Create equal width intervals: Select this option to create equal-width intervals by default.
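
As an illustration of what equal-width intervals mean, the following minimal sketch splits the range of a numeric field into a chosen number of intervals of identical width. The function names `equal_width_edges` and `assign_bin` and the sample data are hypothetical; this is not how Prediction Studio implements binning.

```python
def equal_width_edges(values, n_bins):
    """Return n_bins + 1 boundaries that split [min, max] into equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Use hi itself as the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(n_bins)] + [hi]


def assign_bin(value, edges):
    """Return the zero-based interval index; the last interval includes its upper edge."""
    n_bins = len(edges) - 1
    span = edges[-1] - edges[0]
    if span == 0:
        return 0  # all values identical, so everything falls in the first interval
    idx = int((value - edges[0]) / span * n_bins)
    return min(max(idx, 0), n_bins - 1)


# Hypothetical sample: customer ages binned into 4 equal-width intervals.
ages = [18, 22, 25, 31, 40, 47, 58]
edges = equal_width_edges(ages, 4)
counts = [0] * 4
for age in ages:
    counts[assign_bin(age, edges)] += 1
```

Equal-width intervals can hold very different numbers of cases, which is why the minimum size setting matters for sparsely populated intervals.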
Ignore ordering: This option applies to symbolic predictors only and is enabled by default. Select this option to combine a category with the categories that are most similar in behavior. When this option is disabled, the order of the symbolic categories is assumed to be meaningful, and only neighboring categories are grouped.
Use z-score instead of student's test: The z-score and Student's t-test methods determine whether the behavior in different bins is similar. Student's t-test is the most widely used statistical method for checking whether two sets of data differ significantly. Select this option for compatibility with previous Prediction Studio versions.
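
To make the difference between the two methods concrete, the sketch below computes a two-sample z statistic and a pooled-variance Student's t statistic for the outcomes observed in two bins. The exact formulas that Prediction Studio uses are not stated here, so treat the textbook definitions, the function names, and the 0/1 sample outcomes as assumptions.

```python
import math


def bin_stats(outcomes):
    """Return (count, mean, sample variance) for the outcomes in one bin."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    var = sum((x - mean) ** 2 for x in outcomes) / (n - 1)
    return n, mean, var


def z_score(a, b):
    """Two-sample z statistic with a separate variance estimate per bin."""
    n1, m1, v1 = bin_stats(a)
    n2, m2, v2 = bin_stats(b)
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0:
        raise ValueError("both bins have zero variance")
    return (m1 - m2) / se


def student_t(a, b):
    """Student's t statistic with pooled variance, plus its degrees of freedom."""
    n1, m1, v1 = bin_stats(a)
    n2, m2, v2 = bin_stats(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both bins have zero variance")
    return (m1 - m2) / se, df


# Hypothetical outcomes (1 = positive behavior) for two adjacent bins.
bin_a = [1, 1, 1, 0]
bin_b = [0, 0, 1, 0]
z = z_score(bin_a, bin_b)
t, df = student_t(bin_a, bin_b)
```

For bins of equal size the two statistics coincide, and the methods then differ only in the reference distribution (normal versus t with `df` degrees of freedom). They diverge when the bins differ in both size and variance.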

Auto grouping: Select this option to make auto grouping the default setting. For more information, see Auto grouping option for predictors.
Granularity: Set the highest acceptable probability that the difference in behavior between two adjacent intervals is spurious. Reducing the granularity reduces the number of intervals.
Minimum size (% of the sample): Set the minimum size of each interval as a percentage of the sample cases. Use this setting to ensure that an interval contains enough cases for its behavior to be used in grouping. Intervals with too few cases are combined with their nearest neighbor.
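
The effect of the minimum size can be sketched as follows. The code repeatedly merges the smallest undersized interval into an adjacent interval until every interval meets the threshold. Interpreting "nearest neighbor" as the adjacent interval with fewer cases is an assumption made for this illustration, as are the function name and the sample counts.

```python
def merge_small_intervals(counts, min_pct):
    """Merge ordered intervals until each holds at least min_pct percent of all cases.

    counts: number of cases per interval, in interval order.
    Returns a list of (first_index, last_index, cases) tuples.
    """
    total = sum(counts)
    min_cases = total * min_pct / 100
    groups = [[i, i, c] for i, c in enumerate(counts)]
    while len(groups) > 1:
        # Find the smallest group; stop when even it is large enough.
        k = min(range(len(groups)), key=lambda g: groups[g][2])
        if groups[k][2] >= min_cases:
            break
        # Assumption: merge into the adjacent group that has fewer cases.
        neighbors = [g for g in (k - 1, k + 1) if 0 <= g < len(groups)]
        j = min(neighbors, key=lambda g: groups[g][2])
        left, right = sorted((k, j))
        groups[left] = [
            groups[left][0],
            groups[right][1],
            groups[left][2] + groups[right][2],
        ]
        del groups[right]
    return [tuple(g) for g in groups]


# Hypothetical sample of 100 cases in five intervals, minimum size 5%.
merged = merge_small_intervals([40, 3, 2, 35, 20], min_pct=5)
```

Here the intervals with 3 and 2 cases are combined into one interval of 5 cases, which meets the 5% minimum.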
Merge bins below minimum size in one residual bin: This option applies to symbolic predictors only. Bins below the minimum size are combined into a residual bin on the assumption that they contain too few cases for their behavior to be a basis for predictor grouping.
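
For symbolic predictors, the residual bin can be pictured with the minimal sketch below, which counts the cases per symbol and moves every symbol below the minimum size into a single bin. The function name, the default bin label, and the sample data are hypothetical.

```python
from collections import Counter


def group_with_residual(symbols, min_pct, residual_label="Residual group"):
    """Count cases per symbol and pool symbols below min_pct percent into one bin."""
    counts = Counter(symbols)
    min_cases = len(symbols) * min_pct / 100
    grouped = Counter()
    for symbol, n in counts.items():
        # Symbols with too few cases lose their own bin and join the residual bin.
        key = symbol if n >= min_cases else residual_label
        grouped[key] += n
    return dict(grouped)


# Hypothetical symbolic field with 100 cases, minimum size 5%.
channel = ["Web"] * 50 + ["Email"] * 44 + ["Fax"] * 4 + ["Post"] * 2
bins = group_with_residual(channel, min_pct=5)
```

In this sample, "Fax" and "Post" each fall below 5% of the cases, so their 6 cases end up together in the residual bin.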

Deselect predictors with performance below: Set the minimum level of predictive power that a field must have to remain a predictor.
Display settings
Use scientific notation: Select this option to display values in scientific notation.
Real value precision: Set the number of decimal places used to display real values.
Performance difference threshold: Set the maximum value for the Performance difference column in the Data analysis step. When you change a predictor's role and its performance difference value is higher than the threshold, the value is highlighted in red. This setting applies to samples constructed with a validation set.
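
The display settings can be illustrated with a short sketch that formats a real value with a given precision, optionally in scientific notation, and checks a performance difference against a threshold. The function names and sample values are hypothetical, and the actual rendering in Prediction Studio may differ.

```python
def format_real(value, precision, scientific=False):
    """Format a real value with the given number of decimal places."""
    spec = f".{precision}e" if scientific else f".{precision}f"
    return format(value, spec)


def is_highlighted(performance_difference, threshold):
    """Return True when the difference is higher than the configured threshold."""
    return performance_difference > threshold


plain = format_real(12345.6789, precision=2)
sci = format_real(12345.6789, precision=2, scientific=True)
flag = is_highlighted(0.07, threshold=0.05)
```

With a precision of 2, the value 12345.6789 is shown as `12345.68` in plain notation and as `1.23e+04` in scientific notation.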