Options object for the random forest. More...
#include <vigra/random_forest/rf_common.hxx>
Public Member Functions

RandomForestOptions ()
 create a RandomForestOptions object with default initialisation.
RandomForestOptions & features_per_node (RF_OptionTag in)
 use a built-in mapping to calculate mtry
RandomForestOptions & features_per_node (int(*in)(int))
 use an external function to calculate mtry
RandomForestOptions & features_per_node (int in)
 Set mtry to a constant value.
RandomForestOptions & featuresPerNode (unsigned int n)
RandomForestOptions & min_split_node_size (int in)
 Number of examples required for a node to be split.
RandomForestOptions & minSplitNodeSize (unsigned int n)
RandomForestOptions & predict_weighted ()
 weight each tree with the number of samples in that node
RandomForestOptions & sample_with_replacement (bool in)
 sample from the training population with or without replacement?
RandomForestOptions & sampleClassesIndividually (bool s)
RandomForestOptions & samples_per_tree (int(*in)(int))
 use an external function to calculate the number of samples each tree should be learnt with
RandomForestOptions & samples_per_tree (double in)
 specify the fraction of the total number of samples used per tree for learning
RandomForestOptions & samples_per_tree (int in)
 directly specify the number of samples per tree
RandomForestOptions & sampleWithReplacement (bool r)
RandomForestOptions & trainingSetSizeAbsolute (unsigned int s)
RandomForestOptions & trainingSetSizeProportional (double p)
RandomForestOptions & tree_count (int in)
RandomForestOptions & use_stratification (RF_OptionTag in)
 specify the stratification strategy
template<class WeightIterator >
RandomForestOptions & weights (WeightIterator weights, unsigned int classCount)
Public Attributes

sampling options
double | training_set_proportion_ |
int | training_set_size_ |
int(* | training_set_func_ )(int) |
RF_OptionTag | training_set_calc_switch_ |
bool | sample_with_replacement_ |
RF_OptionTag | stratification_method_ |
general random forest options | |
these usually will be used by most split functors and stopping predicates | |
RF_OptionTag | mtry_switch_ |
int | mtry_ |
int(* | mtry_func_ )(int) |
bool | predict_weighted_ |
int | tree_count_ |
int | min_split_node_size_ |
bool | prepare_online_learning_ |
Options object for the random forest.
usage:
    RandomForestOptions a = RandomForestOptions()
                                .param1(value1)
                                .param2(value2) ...;
This class only contains options/parameters that are not problem dependent. The ProblemSpec class contains methods to set class weights if necessary.
Note that all methods return *this, which makes chaining options as shown above possible.
RandomForestOptions ( )
create a RandomForestOptions object with default initialisation.
Initializes all options with default values; look at the other member functions for more information on the defaults.
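The chaining shown in the usage example works because every setter returns a reference to the object itself. A minimal self-contained sketch of that return-*this idiom (a simplified stand-in with a few illustrative setters, not the actual VIGRA class):

```cpp
// Simplified stand-in (NOT the actual VIGRA class) illustrating the
// return-*this idiom that makes option chaining possible.
struct Options
{
    int    tree_count_              = 255;   // defaults as documented below
    double training_set_proportion_ = 1.0;
    bool   sample_with_replacement_ = true;

    // each setter returns a reference to the object, so calls can be chained
    Options & tree_count(int n)               { tree_count_ = n; return *this; }
    Options & samples_per_tree(double p)      { training_set_proportion_ = p; return *this; }
    Options & sample_with_replacement(bool b) { sample_with_replacement_ = b; return *this; }
};

// usage, mirroring the pattern above:
//     Options a = Options().tree_count(100).samples_per_tree(0.8);
```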
RandomForestOptions& use_stratification ( RF_OptionTag in )
specify stratification strategy
default: RF_NONE
possible values: RF_EQUAL, RF_PROPORTIONAL, RF_EXTERNAL, RF_NONE
RF_EQUAL: draw an equal number of samples per class.
RF_PROPORTIONAL: sample proportionally to the fraction of class samples in the population.
RF_EXTERNAL: the strata_weights_ field of the ProblemSpec_t object has been set externally. (defunct)
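What the two main strategies imply for per-class sample counts can be sketched as follows. This is a hypothetical illustration of the arithmetic, not VIGRA's implementation; the integer truncation is my assumption:

```cpp
#include <vector>

// Hypothetical illustration of RF_EQUAL vs. RF_PROPORTIONAL stratification
// (not VIGRA code; the rounding behaviour here is an assumption).
std::vector<int> strata_counts(std::vector<int> const & class_sizes,
                               int total_samples, bool equal)
{
    int population = 0;
    for (int s : class_sizes)
        population += s;

    std::vector<int> counts;
    for (int s : class_sizes)
    {
        if (equal)  // RF_EQUAL: the same number of samples for every class
            counts.push_back(total_samples / int(class_sizes.size()));
        else        // RF_PROPORTIONAL: proportional to the class frequency
            counts.push_back(total_samples * s / population);
    }
    return counts;
}
```

For a population with class sizes {60, 30} and 30 samples to draw, RF_EQUAL would yield {15, 15} while RF_PROPORTIONAL would yield {20, 10}.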
RandomForestOptions& sample_with_replacement ( bool in )
sample from the training population with or without replacement?
Default: true
RandomForestOptions& samples_per_tree ( double in )
specify the fraction of the total number of samples used per tree for learning.
This value should be in [0.0, 1.0] if sampling without replacement has been specified.
default : 1.0
RandomForestOptions& samples_per_tree ( int(*)(int) in )
use external function to calculate the number of samples each tree should be learnt with.
in: function pointer that takes the number of rows in the learning data and returns the number of samples per tree.
RandomForestOptions& features_per_node ( RF_OptionTag in )
use a built-in mapping to calculate mtry
Use one of the built-in mappings to calculate mtry from the number of columns in the input feature data.
in: possible values: RF_LOG, RF_SQRT or RF_ALL. Default: RF_SQRT.
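The conventional formulas behind such mappings can be sketched as below. Note this is only a sketch: the exact rounding VIGRA uses for RF_SQRT and RF_LOG lives in rf_common.hxx and may differ.

```cpp
#include <cmath>

// Conventional mtry mappings (a sketch; VIGRA's exact rounding may differ).
int mtry_sqrt(int columns)   // RF_SQRT: rounded square root of the column count
{
    return int(std::floor(std::sqrt(double(columns)) + 0.5));
}

int mtry_log(int columns)    // RF_LOG: Breiman-style log_2(columns) + 1
{
    return 1 + int(std::log(double(columns)) / std::log(2.0));
}

int mtry_all(int columns)    // RF_ALL: consider every column in every node
{
    return columns;
}
```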
RandomForestOptions& features_per_node ( int in )
Set mtry to a constant value.
mtry is the number of columns/variates/variables randomly chosen from which the best split is selected.
RandomForestOptions& features_per_node ( int(*)(int) in )
use an external function to calculate mtry
in: function pointer that takes an int (the number of columns of the feature data) and returns an int (mtry).
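A sketch of such a callback; the name my_mtry and the one-third rule are made up for illustration, only the int(int) signature is prescribed:

```cpp
// The callback must have the signature int(int): it receives the number of
// feature columns and returns mtry. Name and rule here are illustrative only.
int my_mtry(int columns)
{
    int m = columns / 3;         // e.g. use a third of the features
    return m > 0 ? m : 1;        // but always at least one
}

// It would then be passed as a plain function pointer, e.g.
//     RandomForestOptions().features_per_node(&my_mtry);
```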
RandomForestOptions& tree_count ( int in )
How many trees to create?
Default: 255.
RandomForestOptions& min_split_node_size ( int in )
Number of examples required for a node to be split.
When the number of examples in a node is below this number, the node is not split even if class separation is not yet perfect. Instead, the node returns the proportion of each class (among the remaining examples) during the prediction phase.
Default: 1 (complete growing)
RandomForestOptions& featuresPerNode ( unsigned int n )
Number of features considered in each node.
If n is 0 (the default), the number of features tried in every node is determined by the square root of the total number of features. According to Breiman, this quantity should always be optimized by means of the out-of-bag error.
Default: 0 (use sqrt(columnCount(featureMatrix)))
RandomForestOptions& sampleWithReplacement ( bool r )
How to sample the subset of the training data for each tree.
Each tree is only trained with a subset of the entire training data. If r is true, this subset is sampled from the entire training set with replacement.
Default: true (use sampling with replacement)
RandomForestOptions& trainingSetSizeProportional ( double p )
Proportion of training examples used for each tree.
If p is 1.0 (the default) and samples are drawn with replacement, the training set of each tree will contain as many examples as the entire training set, but some are drawn multiple times and others not at all. On average, each tree is then trained on about 63% (a fraction of 1 − 1/e ≈ 0.632) of the distinct examples in the full training set. Changing the proportion mainly makes sense when sampleWithReplacement() is set to false. trainingSetSizeProportional() gets overridden by trainingSetSizeAbsolute().
Default: 1.0
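The roughly-63% figure can be checked empirically. This standalone sketch (not VIGRA code) draws n indices from n examples with replacement and measures the fraction that is distinct; the theoretical limit is 1 − (1 − 1/n)^n → 1 − 1/e ≈ 0.632:

```cpp
#include <random>
#include <vector>

// Fraction of distinct examples in a bootstrap sample of size n drawn
// from n examples with replacement (expected limit: 1 - 1/e ~ 0.632).
double unique_fraction(int n, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);

    std::vector<char> seen(n, 0);
    for (int i = 0; i < n; ++i)
        seen[pick(rng)] = 1;                 // mark each drawn example

    int distinct = 0;
    for (char s : seen)
        distinct += s;
    return double(distinct) / n;
}
```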
RandomForestOptions& trainingSetSizeAbsolute ( unsigned int s )
Size of the training set for each tree.
If this option is set, it overrides the proportion set by trainingSetSizeProportional(). When classes are sampled individually, the number of examples is divided by the number of classes (rounded upwards) to determine the number of examples drawn from every class.
Default: 0 (determine size by proportion)
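The rounded-upwards division mentioned above is plain ceiling division; a short sketch (the helper name is mine, not part of the API):

```cpp
// Per-class share of an absolute training set size s when classes are
// sampled individually: s divided by the class count, rounded upwards.
unsigned examples_per_class(unsigned s, unsigned class_count)
{
    return (s + class_count - 1) / class_count;   // ceiling division
}
```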
RandomForestOptions& sampleClassesIndividually ( bool s )
Are the classes sampled individually?
If s is false (the default), the training set for each tree is sampled without considering class labels. Otherwise, samples are drawn from each class independently. The latter is especially useful in connection with the specification of an absolute training set size: then, the same number of examples is drawn from every class. This can be used as a counter-measure when the classes are very unbalanced in size.
Default: false
RandomForestOptions& minSplitNodeSize ( unsigned int n )
Number of examples required for a node to be split.
When the number of examples in a node is below this number, the node is not split even if class separation is not yet perfect. Instead, the node returns the proportion of each class (among the remaining examples) during the prediction phase.
Default: 1 (complete growing)
RandomForestOptions& weights ( WeightIterator weights, unsigned int classCount )
Use a weighted random forest.
This is usually used to penalize the errors for the minority class. Weights must be convertible to double, and the array of weights must contain as many entries as there are classes.
Default: do not use weights
© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)