
RandomForestOptions Class Reference

Options object for the random forest. More...

#include <vigra/random_forest/rf_common.hxx>


Public Member Functions

RandomForestOptions & features_per_node (RF_OptionTag in)
 use a built-in mapping to calculate mtry
RandomForestOptions & features_per_node (int(*in)(int))
 use an external function to calculate mtry
RandomForestOptions & features_per_node (int in)
 Set mtry to a constant value.
RandomForestOptions & featuresPerNode (unsigned int n)
RandomForestOptions & min_split_node_size (int in)
 Number of examples required for a node to be split.
RandomForestOptions & minSplitNodeSize (unsigned int n)
RandomForestOptions & predict_weighted ()
 weight each tree with the number of samples in that node
 RandomForestOptions ()
 create a RandomForestOptions object with default initialisation.
RandomForestOptions & sample_with_replacement (bool in)
 sample from the training population with or without replacement?
RandomForestOptions & sampleClassesIndividually (bool s)
RandomForestOptions & samples_per_tree (int(*in)(int))
 use an external function to calculate the number of samples each tree should be learnt with.
RandomForestOptions & samples_per_tree (double in)
 specify the fraction of the total number of samples used per tree for learning.
RandomForestOptions & samples_per_tree (int in)
 directly specify the number of samples per tree
RandomForestOptions & sampleWithReplacement (bool r)
RandomForestOptions & trainingSetSizeAbsolute (unsigned int s)
RandomForestOptions & trainingSetSizeProportional (double p)
RandomForestOptions & tree_count (int in)
RandomForestOptions & use_stratification (RF_OptionTag in)
 specify the stratification strategy
template<class WeightIterator >
RandomForestOptions & weights (WeightIterator weights, unsigned int classCount)

Public Attributes

sampling options
double training_set_proportion_
int training_set_size_
int(* training_set_func_ )(int)
RF_OptionTag training_set_calc_switch_
bool sample_with_replacement_
RF_OptionTag stratification_method_
general random forest options

these usually will be used by most split functors and stopping predicates

RF_OptionTag mtry_switch_
int mtry_
int(* mtry_func_ )(int)
bool predict_weighted_
int tree_count_
int min_split_node_size_
bool prepare_online_learning_

Detailed Description

Options object for the random forest.

usage: RandomForestOptions a = RandomForestOptions() .param1(value1) .param2(value2) ...

This class only contains options/parameters that are not problem dependent. The ProblemSpec class contains methods to set class weights if necessary.

Note that the return value of all methods is *this, which makes chaining of options as shown above possible.
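Chained in this way, a complete options object might be built as follows (a minimal sketch; the concrete values shown are arbitrary, and the VIGRA headers are assumed to be on the include path):

```cpp
#include <vigra/random_forest/rf_common.hxx>

int main()
{
    using namespace vigra;

    // Every setter returns *this, so the calls can be chained.
    RandomForestOptions options = RandomForestOptions()
                                      .tree_count(255)
                                      .features_per_node(RF_SQRT)
                                      .sample_with_replacement(true)
                                      .min_split_node_size(1);

    // The options object is then typically passed to the random
    // forest on construction.
    return 0;
}
```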


Constructor & Destructor Documentation

create a RandomForestOptions object with default initialisation.

look at the other member functions for more information on default values

Initialize all options with default values.


Member Function Documentation

RandomForestOptions& use_stratification ( RF_OptionTag  in)

specify stratification strategy

default: RF_NONE
possible values: RF_EQUAL, RF_PROPORTIONAL, RF_EXTERNAL, RF_NONE
RF_EQUAL: draw an equal number of samples from each class.
RF_PROPORTIONAL: sample proportionally to the fraction of samples of each class in the population.
RF_EXTERNAL: the strata_weights_ field of the ProblemSpec_t object has been set externally. (defunct)

RandomForestOptions& sample_with_replacement ( bool  in)

sample from training population with or without replacement?


Default: true

RandomForestOptions& samples_per_tree ( double  in)

specify the fraction of the total number of samples used per tree for learning.

This value should be in [0.0, 1.0] if sampling without replacement has been specified.


default : 1.0

RandomForestOptions& samples_per_tree ( int(*)(int)  in)

use external function to calculate the number of samples each tree should be learnt with.

Parameters:
 in: function pointer that takes the number of rows in the learning data and returns the number of samples per tree.
RandomForestOptions& features_per_node ( RF_OptionTag  in)

use built in mapping to calculate mtry

Use one of the built in mappings to calculate mtry from the number of columns in the input feature data.

Parameters:
 in: one of RF_LOG, RF_SQRT or RF_ALL
default: RF_SQRT.
RandomForestOptions& features_per_node ( int  in)

Set mtry to a constant value.

mtry is the number of columns/variates/variables randomly chosen from which the best split is selected.

RandomForestOptions& features_per_node ( int(*)(int)  in)

use an external function to calculate mtry

Parameters:
 in: function pointer that takes an int (the number of columns of the feature data) and returns an int (mtry).
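Such a function is an ordinary free function with signature int(int). The `logMtry` rule below is a hypothetical example (not part of VIGRA) that uses the base-2 logarithm of the column count:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical mtry rule: log2 of the column count, but at least 1.
int logMtry(int columnCount)
{
    return std::max(1, static_cast<int>(std::log2(static_cast<double>(columnCount))));
}

// It would be passed to the options object as:
//   RandomForestOptions().features_per_node(&logMtry);
```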
RandomForestOptions& tree_count ( int  in)

How many trees to create?


Default: 255.

RandomForestOptions& min_split_node_size ( int  in)

Number of examples required for a node to be split.

When the number of examples in a node is below this number, the node is not split even if class separation is not yet perfect. Instead, the node returns the proportion of each class (among the remaining examples) during the prediction phase.
Default: 1 (complete growing)

RandomForestOptions& featuresPerNode ( unsigned int  n)

Number of features considered in each node.

If n is 0 (the default), the number of features tried in every node is determined by the square root of the total number of features. According to Breiman, this quantity should always be optimized by means of the out-of-bag error.
Default: 0 (use sqrt(columnCount(featureMatrix)))

RandomForestOptions& sampleWithReplacement ( bool  r)

How to sample the subset of the training data for each tree.

Each tree is only trained with a subset of the entire training data. If r is true, this subset is sampled from the entire training set with replacement.
Default: true (use sampling with replacement)

RandomForestOptions& trainingSetSizeProportional ( double  p)

Proportion of training examples used for each tree.

If p is 1.0 (the default) and samples are drawn with replacement, the training set of each tree will contain as many examples as the entire training set, but some are drawn multiple times and others not at all. On average, each tree is then actually trained on about 63% of the distinct examples in the full training set. Changing the proportion mainly makes sense when sampleWithReplacement() is set to false. trainingSetSizeProportional() is overridden by trainingSetSizeAbsolute().
Default: 1.0
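The proportion of distinct examples in each bootstrap sample can be computed exactly: when n examples are drawn from a pool of n with replacement, a given example is included with probability 1 - (1 - 1/n)^n, which approaches 1 - 1/e, roughly 0.632, for large n. A small sketch independent of VIGRA:

```cpp
#include <cmath>

// Probability that a given example appears at least once when n
// examples are drawn from a pool of n with replacement:
//     1 - (1 - 1/n)^n,  which tends to 1 - 1/e (about 0.632).
double bootstrapCoverage(int n)
{
    return 1.0 - std::pow(1.0 - 1.0 / n, n);
}
```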

RandomForestOptions& trainingSetSizeAbsolute ( unsigned int  s)

Size of the training set for each tree.

If this option is set, it overrides the proportion set by trainingSetSizeProportional(). When classes are sampled individually, the number of examples is divided by the number of classes (rounded upwards) to determine the number of examples drawn from every class.
Default: 0 (determine size by proportion)

RandomForestOptions& sampleClassesIndividually ( bool  s)

Are the classes sampled individually?

If s is false (the default), the training set for each tree is sampled without considering class labels. Otherwise, samples are drawn from each class independently. The latter is especially useful in connection with the specification of an absolute training set size: then, the same number of examples is drawn from every class. This can be used as a counter-measure when the classes are very unbalanced in size.
Default: false
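For a heavily unbalanced problem, individual class sampling might be combined with an absolute training set size as follows (a sketch; the size of 1000 is an arbitrary made-up value):

```cpp
#include <vigra/random_forest/rf_common.hxx>

int main()
{
    using namespace vigra;

    // Draw the same absolute number of examples from each class,
    // so the minority class is not swamped by the majority class.
    RandomForestOptions options = RandomForestOptions()
                                      .sampleClassesIndividually(true)
                                      .trainingSetSizeAbsolute(1000);
    return 0;
}
```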

RandomForestOptions& minSplitNodeSize ( unsigned int  n)

Number of examples required for a node to be split.

When the number of examples in a node is below this number, the node is not split even if class separation is not yet perfect. Instead, the node returns the proportion of each class (among the remaining examples) during the prediction phase.
Default: 1 (complete growing)

RandomForestOptions& weights ( WeightIterator  weights,
unsigned int  classCount 
)

Use a weighted random forest.

This is usually used to penalize the errors for the minority class. Weights must be convertible to double, and the array of weights must contain as many entries as there are classes.
Default: do not use weights
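Since any iterator over doubles is accepted, the weights can come from a plain array. A sketch for a hypothetical two-class problem (the weight values are made up):

```cpp
#include <vigra/random_forest/rf_common.hxx>

int main()
{
    using namespace vigra;

    // Penalize errors on the minority class (class 1) more heavily;
    // the array must have one entry per class.
    double classWeights[] = { 1.0, 5.0 };
    RandomForestOptions options =
        RandomForestOptions().weights(classWeights, 2);
    return 0;
}
```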


The documentation for this class was generated from the following file:
vigra/random_forest/rf_common.hxx

© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

html generated using doxygen and Python
vigra 1.7.0 (Thu Aug 25 2011)