8.6.6. sklearn.ensemble.GradientBoostingRegressor¶
- class sklearn.ensemble.GradientBoostingRegressor(loss='ls', learn_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=1, min_samples_leaf=1, max_depth=3, init=None, random_state=None)¶
Gradient Boosting for regression.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
Parameters : loss : {‘ls’, ‘lad’}, optional (default=’ls’)
loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables.
learn_rate : float, optional (default=0.1)
learning rate shrinks the contribution of each tree by learn_rate. There is a trade-off between learn_rate and n_estimators.
n_estimators : int (default=100)
The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
max_depth : integer, optional (default=3)
maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
subsample : float, optional (default=1.0)
The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators.
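For illustration, a minimal sketch of constructing the estimator with the parameters above; the particular values are assumptions chosen only for the example, not recommendations:
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> # a smaller learn_rate is typically paired with a larger n_estimators,
>>> # and subsample < 1.0 switches to Stochastic Gradient Boosting
>>> est = GradientBoostingRegressor(loss='lad', learn_rate=0.05, n_estimators=500,
...                                 subsample=0.5, max_depth=3, random_state=0)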
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
J. Friedman, Stochastic Gradient Boosting, 1999.
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Examples
>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> gb = GradientBoostingRegressor().fit(samples, labels)
>>> print gb.predict([[0, 0, 0]])
[ 1.32806997e-05]
Attributes
feature_importances_ : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : array, shape = [n_estimators]
Score of the training dataset obtained using an out-of-bag estimate. The i-th score oob_score_[i] is the deviance (= loss) of the model at iteration i on the out-of-bag sample.
train_score_ : array, shape = [n_estimators]
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.
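A sketch of inspecting these attributes after fitting; the synthetic data below is an assumption made only for the example:
>>> import numpy as np
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(100, 3)
>>> y = X[:, 0] + 2 * X[:, 1]
>>> est = GradientBoostingRegressor(subsample=0.5, random_state=0).fit(X, y)
>>> importances = est.feature_importances_   # one value per feature; higher means more important
>>> train_deviance = est.train_score_        # in-bag deviance at each boosting stage
>>> oob_deviance = est.oob_score_            # out-of-bag deviance, meaningful here since subsample < 1.0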
Methods
fit(X, y) Fit the gradient boosting model.
fit_stage(i, X, X_argsorted, y, y_pred, ...) Fit another stage of n_classes_ trees to the boosting model.
get_params([deep]) Get parameters for the estimator.
predict(X) Predict regression target for X.
score(X, y) Returns the coefficient of determination R^2 of the prediction.
set_params(**params) Set the parameters of the estimator.
staged_decision_function(X) Compute decision function for X.
staged_predict(X) Predict regression target at each stage for X.
- __init__(loss='ls', learn_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=1, min_samples_leaf=1, max_depth=3, init=None, random_state=None)¶
- fit(X, y)¶
Fit the gradient boosting model.
Parameters : X : array-like, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features. Use Fortran-style (column-major) arrays to avoid memory copies (see the sketch below).
y : array-like, shape = [n_samples]
Target values (integers in classification, real numbers in regression). For classification, labels must correspond to classes 0, 1, ..., n_classes_-1.
Returns : self : object
Returns self.
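Since the description above recommends Fortran-ordered input to avoid memory copies, a minimal sketch using NumPy's asfortranarray; the data itself is made up for the example:
>>> import numpy as np
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> rng = np.random.RandomState(0)
>>> X = np.asfortranarray(rng.rand(50, 4))   # column-major layout, as suggested above
>>> y = rng.rand(50)
>>> est = GradientBoostingRegressor(n_estimators=50).fit(X, y)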
- fit_stage(i, X, X_argsorted, y, y_pred, sample_mask)¶
Fit another stage of n_classes_ trees to the boosting model.
- get_params(deep=True)¶
Get parameters for the estimator.
Parameters : deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
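For example, the returned dict exposes the constructor parameters by name (learn_rate here follows the signature above):
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> est = GradientBoostingRegressor(learn_rate=0.05)
>>> est.get_params()['learn_rate']
0.05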
- predict(X)¶
Predict regression target for X.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : y : array of shape = [n_samples]
The predicted values.
- score(X, y)¶
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0, lower values are worse.
Parameters : X : array-like, shape = [n_samples, n_features]
Training set.
y : array-like, shape = [n_samples]
True target values for X.
Returns : z : float
R^2 of self.predict(X) wrt. y.
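A sketch checking score against the definition above on synthetic data; the data and model settings are assumptions for the example:
>>> import numpy as np
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> rng = np.random.RandomState(0)
>>> X, y = rng.rand(80, 3), rng.rand(80)
>>> est = GradientBoostingRegressor(random_state=0).fit(X, y)
>>> y_pred = est.predict(X)
>>> u = ((y - y_pred) ** 2).sum()        # residual sum of squares
>>> v = ((y - y.mean()) ** 2).sum()      # total sum of squares
>>> bool(np.allclose(est.score(X, y), 1 - u / v))
True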
- set_params(**params)¶
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns : self
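For instance, on this (non-nested) estimator the call simply updates a constructor parameter and returns the estimator itself:
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> est = GradientBoostingRegressor().set_params(n_estimators=200)
>>> est.n_estimators
200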
- staged_decision_function(X)¶
Compute decision function for X.
This method allows monitoring (i.e. determining the error on a test set) after each stage.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : f : array of shape = [n_samples, n_classes]
The decision function of the input samples. Classes are ordered by arithmetical order. Regression and binary classification are special cases with n_classes == 1.
- staged_predict(X)¶
Predict regression target at each stage for X.
This method allows monitoring (i.e. determining the error on a test set) after each stage (see the sketch below).
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : y : array of shape = [n_samples]
The predicted value of the input samples.
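A sketch of the monitoring use case mentioned above, tracking a test-set error after each boosting stage; the data and the squared-error metric are assumptions for the example:
>>> import numpy as np
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> rng = np.random.RandomState(0)
>>> X_train, y_train = rng.rand(100, 3), rng.rand(100)
>>> X_test, y_test = rng.rand(50, 3), rng.rand(50)
>>> est = GradientBoostingRegressor(n_estimators=100).fit(X_train, y_train)
>>> test_mse = [np.mean((y_test - y_pred) ** 2)    # one error value per boosting stage
...             for y_pred in est.staged_predict(X_test)]
>>> len(test_mse)
100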