Genetic algorithm settings
In the Genetic algorithm section of a predictive model, you can change the default values of selected options for creating these algorithms.
Genetic algorithm tab settings
 Add pool
 Add a pool (or population) of predictive models under the control of the genetic algorithm. Click the name of the pool to change its settings.
 Default settings
 Define the settings for the construction and operation of a pool.
Pool details dialog box settings
 Pool details
 Pool name
 Enter a name for the pool.
 Pool size
 Enter the number of models in the pool.
 Technique
 Technique

Select the type of genetic algorithm that is used for developing the pool:
 Generational
 Each generation creates an entirely new pool of models by selecting the fittest ones from the original pool as parents, and recombining them to produce new offspring.
 Steady state
 Each generation replaces a certain number of models from the pool. In each generation, the fittest models from the original pool are selected as parents and recombined to produce new offspring. The new offspring replace the worst models in the original pool. This algorithm tends to converge faster than the generational algorithm.
 Hill climbing
 Each generation uses every model as a parent. After randomly selecting another parent, the offspring are created by recombining the parents. The offspring replace the parents only if they are fitter than the parents. This ensures a monotonically increasing average fitness.
 Simulated annealing

This algorithm uses mutation to create offspring. Each generation mutates every model to create new offspring. If the fitness of an offspring is better than that of its parent, the offspring replaces the parent. Otherwise, the offspring can still be accepted with a probability that is determined by the Boltzmann equation, which depends on the difference in fitness divided by the current temperature. After each generation, the temperature is decreased by using the specified decrease factor. The simulated annealing algorithm is designed to circumvent premature convergence at early stages.
If the best and average performance in a pool have not improved for several generations, try switching to this technique to produce new models and, after some time, select one of the other genetic algorithm techniques.
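The acceptance rule of the simulated annealing technique can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names are hypothetical, higher fitness is assumed to be better, and the Boltzmann probability is assumed to take the common form exp(-(fitness loss) / temperature).

```python
import math
import random

def acceptance_probability(parent_fitness, child_fitness, temperature):
    """Probability that an offspring replaces its parent (assumed Boltzmann form)."""
    if child_fitness > parent_fitness:
        return 1.0  # a fitter offspring always replaces its parent
    # Fitness loss divided by the current temperature (temperature must be > 0).
    loss = parent_fitness - child_fitness
    return math.exp(-loss / temperature)

def accept_offspring(parent_fitness, child_fitness, temperature, rng=random):
    """Draw a random number and compare it with the acceptance probability."""
    return rng.random() < acceptance_probability(
        parent_fitness, child_fitness, temperature
    )
```

Under these assumptions, a worse offspring is accepted often while the temperature is high and rarely once the temperature has dropped, which is what lets the pool escape an early plateau.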
 Optimize new (sub)models
 Select this option to optimize the parameters of each part of the model as it is changed. This ensures the best use of the predictive information is made throughout the model.
 Sampling mechanism

Select the sampling mechanism that is used for developing the pool:
 Stochastic universal
 The relative fitness of a model, when compared to all other models in the pool, determines the probability of its selection as a parent. This is known as the stochastic universal version of the roulette wheel selection. The stochastic universal mechanism produces a selection that is more accurate in reflecting the relative fitness of the models than the roulette wheel mechanism.
 Roulette wheel
 The relative fitness of a model, when compared to all other models in the pool, determines the probability of its selection as a parent. This is known as the roulette wheel selection because the process is similar to spinning a roulette wheel in which fitter models have more numbers on the wheel relative to less fit models. There is a greater probability of selecting a highly fit model. The wheel is spun to select each parent.
 Tournament
 This method randomly picks a certain number of models as contestants for a tournament. The fittest model in this collection wins the tournament and is selected as a parent.
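The difference between the roulette wheel and stochastic universal mechanisms can be sketched as follows. This is an illustration under stated assumptions, not the product's code: the function names are hypothetical, fitness values are assumed to be non-negative, and the stochastic universal variant is assumed to be the textbook form with one spin and equally spaced pointers.

```python
import random

def roulette_wheel(fitness, n_parents, rng):
    """Spin the wheel once for every parent that is needed."""
    total = sum(fitness)
    picks = []
    for _ in range(n_parents):
        spin = rng.random() * total
        cumulative = 0.0
        for index, value in enumerate(fitness):
            cumulative += value
            if spin < cumulative:
                picks.append(index)
                break
        else:
            picks.append(len(fitness) - 1)  # guard against rounding at the edge
    return picks

def stochastic_universal(fitness, n_parents, rng):
    """Spin once, then read n_parents equally spaced pointers off the wheel."""
    total = sum(fitness)
    step = total / n_parents
    start = rng.random() * step
    picks = []
    cumulative = 0.0
    index = -1
    for k in range(n_parents):
        pointer = start + k * step
        # Advance to the wheel segment that contains this pointer.
        while cumulative <= pointer and index < len(fitness) - 1:
            index += 1
            cumulative += fitness[index]
        picks.append(index)
    return picks
```

Because the pointers are equally spaced, each model in this sketch is picked a number of times that differs from its expected count by less than one, whereas independent spins can over- or under-select a model by chance.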
 Scaling method

Select the scaling method that is used for developing the pool:
 No scaling
 The raw fitness values are used to determine the selection probabilities of models. Using raw fitness values can lead to premature convergence when some of the models have exceptionally high fitness values. To avoid this, rescale the fitness values by using one of the alternative scaling methods.
 Rank linear

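Using this method, the fittest model is given a fitness $s$ between 1 and 2. The worst model is given a fitness of $2 - s$. Intermediate models get the fitness value given by the following interpolation formula:
$$f(i) = s - \frac{2(i - 1)(s - 1)}{N - 1}$$
where $i = 1, \dots, N$ is the rank of the model (1 is the fittest) and $N$ is the pool size.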
 Rank exponential
 Exponential ranking gives more chance to the worst models at the expense of those above average. The fittest model gets a fitness of 1.0, the second best is given a fitness of $s$ (typically, about 0.99). The third best is assigned $s^2$, and so on. The last one receives $s^{N-1}$, where $N$ is the pool size.
 Linear

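Linear scaling adjusts the fitness values of all models in such a way that models with average fitness get a fixed number of expected offspring. In the following formulas, $f$ is the raw fitness, $f'$ is the scaled fitness, $f_{min}$, $f_{avg}$, and $f_{max}$ are the minimum, average, and maximum raw fitness in the pool, and $s$ is the scaling parameter.
If the minimum yields a positive scaled value:
$$f' = 1 + (s - 1)\,\frac{f - f_{avg}}{f_{max} - f_{avg}}$$
Otherwise:
$$f' = \frac{f - f_{min}}{f_{avg} - f_{min}}$$
In both cases, the average always gets a scaled value of 1. In the first case, the maximum is assigned a scaled value of $s$, whereas in the second case, the minimum is mapped to 0.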
 Windowing
 This scaling method introduces a moving baseline. The worst value observed in the $w$ most recent generations is subtracted from the fitness values, where $w$ is known as the window size, typically between 2 and 10. This scaling method increases the chance of selecting the worst model, which prevents the pool from prematurely optimizing around the current best model.
 Sigma
 This scaling method dynamically determines a baseline based on standard deviation. It sets the baseline $s$ standard deviations below the mean, where $s$ is the scaling factor, typically between 2 and 5.
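The rank-based and linear scaling methods can be sketched as follows. This is a hedged illustration, not the product's implementation: the function names are hypothetical, `s` stands for the scaling parameter, the rank linear sketch assumes the standard linear ranking in which the worst model receives 2 - s, and the linear scaling sketch is derived only from the stated constraints (average maps to 1, maximum maps to s or minimum maps to 0).

```python
def rank_linear(fitness, s=1.5):
    """Assumed linear ranking: fittest gets s, worst gets 2 - s (needs >= 2 models)."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    scaled = [0.0] * n
    for rank, i in enumerate(order):  # rank 0 is the fittest model
        scaled[i] = s - 2.0 * (s - 1.0) * rank / (n - 1)
    return scaled

def rank_exponential(fitness, s=0.99):
    """Fittest gets 1.0, second best s, third best s**2, and so on."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    scaled = [0.0] * len(fitness)
    for rank, i in enumerate(order):
        scaled[i] = s ** rank
    return scaled

def linear_scaling(fitness, s=2.0):
    """Map the average to 1 and the maximum to s, or the minimum to 0."""
    f_min, f_max = min(fitness), max(fitness)
    f_avg = sum(fitness) / len(fitness)
    if f_max == f_avg:  # all models are equally fit
        return [1.0] * len(fitness)
    slope = (s - 1.0) / (f_max - f_avg)
    if 1.0 + slope * (f_min - f_avg) > 0.0:
        return [1.0 + slope * (f - f_avg) for f in fitness]
    # The first mapping would push the minimum to zero or below.
    return [(f - f_min) / (f_avg - f_min) for f in fitness]
```

For example, `rank_linear([10, 30, 20], s=1.5)` ignores how far apart the raw values are and spaces the scaled values evenly by rank, which is what limits the influence of a single exceptionally fit model.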
 Elite size
 Enter the number of top-performing models in one generation that are carried over to the next generation. Enter 1 to prevent the pool from losing its best model.
 Replacement count
 Enter the number of models to replace at each generation of the steady state algorithm.
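The effect of the replacement count in the steady state algorithm can be sketched as follows. This is a simplified illustration with hypothetical names: models are represented by arbitrary objects, `fitness_of` is an assumed scoring function, and the offspring are assumed to have been created already.

```python
def steady_state_step(pool, offspring, fitness_of, replacement_count):
    """Replace the worst `replacement_count` models in the pool with offspring."""
    ranked = sorted(pool, key=fitness_of, reverse=True)  # fittest first
    survivors = ranked[: len(pool) - replacement_count]
    return survivors + list(offspring[:replacement_count])

# Usage: numbers stand in for models, and the number itself is the fitness.
new_pool = steady_state_step([3, 1, 4, 1, 5], [9, 2], lambda m: m, 2)
```

In this sketch the pool size stays constant, and a replacement count equal to the pool size would discard every existing model in each generation.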
 Tournament size
 Enter the number of tournament contestants for the tournament sampling.
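Tournament sampling with a configurable tournament size can be sketched as follows. The function name is hypothetical, and drawing contestants without replacement is an assumption of this sketch rather than documented behavior.

```python
import random

def tournament_select(fitness, tournament_size, rng):
    """Pick random contestants and return the index of the fittest one."""
    contestants = rng.sample(range(len(fitness)), tournament_size)
    return max(contestants, key=lambda i: fitness[i])
```

A larger tournament size makes it more likely that a highly fit model is among the contestants, so it increases selection pressure. A size of 1 amounts to picking a parent uniformly at random.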
 Scaling parameter
 Enter the number for the parameter or parameters that are used in each scaling method for fine-tuning.
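How a scaling parameter might feed into the sigma and windowing methods can be sketched as follows. The function names are hypothetical, `s` is taken to be the sigma scaling factor and `w` the window size, and clamping negative scaled values to 0 in the sigma sketch is an assumption that the source does not state.

```python
import statistics

def sigma_scaling(fitness, s=2.0):
    """Subtract a baseline that lies s standard deviations below the mean."""
    baseline = statistics.fmean(fitness) - s * statistics.pstdev(fitness)
    # Assumption: values below the baseline are clamped to 0.
    return [max(f - baseline, 0.0) for f in fitness]

def windowing(fitness, worst_per_generation, w=5):
    """Subtract the worst fitness observed in the w most recent generations.

    `worst_per_generation` is an assumed history list with one entry per
    generation, oldest first, including the current generation.
    """
    baseline = min(worst_per_generation[-w:])
    return [f - baseline for f in fitness]
```

In both sketches a larger parameter lowers the baseline, so the scaled values move closer together in relative terms and selection pressure drops.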
 Model construction
 Use bivariate statistics
 Select this option to use the operators and their parameters that are identified as best at modeling the interactions between predictors when you create a bivariate model.
 Use predictor groups
 Select this option to use one predictor from each of the groups that are identified during predictor grouping and only replace a predictor with another one from the same group. This option prevents the inclusion of duplicate predictors and minimizes the size of the model that is required to incorporate all information. Clear this option to increase model depth and allow more freedom to the genetic algorithm.
 Enable intelligent genetics
 Enable intelligent genetics to develop nonlinear models (where nonlinearity is assumed from the outset) that might outperform models that are developed by structural genetics. This strategy initially generates models with lower performance, and it is slower and computationally more expensive. The result is models of identical size that, if the relationship between data and behavior is nonlinear, have greater predictive power.
 Enable structural genetics
 Structural genetics is the default strategy to develop near-linear models that are at least as powerful as regression models. Nonlinear operators are introduced only where they improve performance. Initially, structural genetics generates models with higher performance, and model generation is faster. The result is models of variable size with greater data efficiency, which translates into more predictive power from the same data. The models are easier to understand because they are more linear and robust, and they are more likely to perform as expected on different data.
 Maximum tree depth

Specify the maximum number of levels in the models. For balanced models, the minimum is given by the following formula:
 Crossover and mutation
 Crossover probability
 Specify the probability of crossover occurrence during the creation of the offspring. Crossover is the process of creating models by exchanging branches of parent trees.
 Mutation probability
 Specify the probability of mutation occurrence on the created offspring. Mutation is the random alteration of a (randomly selected) node in a model.
 Branch replacement
 Specify the probability of replacing whole branches with randomly created ones during mutation.
 Node replacement
 Specify the probability of changing only the type of a node in a model.
 Argument swapping
 Specify the probability of changing the child order (argument order) of a node in a model.
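The three mutation types can be sketched on a small expression tree as follows. Everything here is hypothetical: the tree representation (a nested list `[operator, left, right]` with predictor names as leaves), the operator and predictor names, and the way the three probabilities are combined into a single random draw. The source does not describe how the product combines them.

```python
import random

OPERATORS = ["add", "sub", "mul"]          # hypothetical node types
PREDICTORS = ["age", "income", "tenure"]   # hypothetical predictor names

def random_branch(rng, depth=2):
    """Create a random branch; leaves are predictor names."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(PREDICTORS)
    return [rng.choice(OPERATORS),
            random_branch(rng, depth - 1),
            random_branch(rng, depth - 1)]

def mutate_node(node, rng, p_branch, p_node, p_swap):
    """Apply at most one mutation type to an operator node and return a new node."""
    if isinstance(node, str):
        return node  # leaves are left unchanged in this sketch
    operator, left, right = node
    roll = rng.random()
    if roll < p_branch:                      # branch replacement
        return random_branch(rng)
    if roll < p_branch + p_node:             # node replacement: new type, same children
        return [rng.choice(OPERATORS), left, right]
    if roll < p_branch + p_node + p_swap:    # argument swapping: reverse child order
        return [operator, right, left]
    return [operator, left, right]
```

Argument swapping only changes the model when the operator is not commutative. In this sketch, swapping the children of `["sub", "age", "income"]` changes the result, while swapping the children of an `"add"` node does not.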
 Simulated annealing
 Initial temperature
 Specify the initial value of the temperature that controls the amount of change to models.
 Temperature decrease
 Specify the rate at which the temperature decreases with each generation.
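The interaction between the initial temperature and the temperature decrease can be sketched as follows. The function name is hypothetical, and the sketch assumes a geometric schedule in which the temperature is multiplied by the decrease factor after each generation. The source states only that the temperature decreases by using the specified factor.

```python
def temperature_schedule(initial_temperature, decrease_factor, generations):
    """Return the temperature used in each generation (assumed geometric decay)."""
    temperatures = []
    temperature = initial_temperature
    for _ in range(generations):
        temperatures.append(temperature)
        temperature *= decrease_factor  # assumed: multiplicative decrease
    return temperatures

# Usage: start at 1.0 and halve the temperature after every generation.
schedule = temperature_schedule(1.0, 0.5, 4)
```

Under this assumption, a decrease factor close to 1 keeps the pool exploratory for many generations, while a small factor quickly makes the algorithm accept only improvements.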