logitboost {LogitBoost}                R Documentation
Description

An implementation of the LogitBoost classification algorithm with decision stumps as weak learners. In addition, a feature preselection method for handling datasets with many explanatory variables and an estimation of the stopping parameter via v-fold cross-validation are provided.
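At each iteration, LogitBoost (Friedman, Hastie and Tibshirani, 2000) fits a regression stump by weighted least squares to a working response derived from the current class probabilities. The following is a minimal Python sketch of the binary case for illustration only; it omits the package's feature preselection, multiclass one-against-all handling and cross-validation, and the names `logitboost_binary` and `fit_stump` are hypothetical, not part of the package.

```python
import numpy as np

def fit_stump(x, z, w):
    """Weighted least-squares decision stump over all features and split points.
    Returns (feature, threshold, left_value, right_value)."""
    best, best_err = None, np.inf
    n, d = x.shape
    for j in range(d):
        order = np.argsort(x[:, j])
        xs, zs, ws = x[order, j], z[order], w[order]
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue                       # no split between equal values
            thr = 0.5 * (xs[i] + xs[i - 1])
            cl = (ws[:i] * zs[:i]).sum() / ws[:i].sum()   # left fitted value
            cr = (ws[i:] * zs[i:]).sum() / ws[i:].sum()   # right fitted value
            err = (ws[:i] * (zs[:i] - cl) ** 2).sum() + \
                  (ws[i:] * (zs[i:] - cr) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, thr, cl, cr)
    return best

def logitboost_binary(x, y, mfinal):
    """Binary LogitBoost with stumps; y is coded 0/1 as in the help page."""
    n = x.shape[0]
    F = np.zeros(n)                # additive model, built up over iterations
    p = np.full(n, 0.5)            # current P(y=1|x)
    stumps = []
    for _ in range(mfinal):
        w = np.clip(p * (1 - p), 1e-10, None)          # working weights
        z = np.clip((y - p) / w, -4, 4)                # working response, capped
        j, thr, cl, cr = fit_stump(x, z, w)
        f = np.where(x[:, j] <= thr, cl, cr)
        F += 0.5 * f
        p = 1.0 / (1.0 + np.exp(-2.0 * F))             # update probabilities
        stumps.append((j, thr, cl, cr))
    return stumps, p
```

On linearly separable toy data, a handful of iterations already drives the fitted probabilities to the correct side of 1/2.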
Usage

logitboost(xlearn, ylearn, xtest, mfinal, presel = 0, estimate = 0, verbose = FALSE)
Arguments

xlearn    A matrix whose n rows contain the training instances.

ylearn    A vector of length n containing the class labels of individuals from K different classes. The labels must be coded as consecutive integers from 0 to (K-1).

xtest     A matrix whose rows contain the test instances.

mfinal    An integer giving the number of iterations for which boosting is run.

presel    An integer giving the number of features to be used for classification. If presel=0, no feature preselection is carried out.

estimate  An integer specifying the v of an additional, internal v-fold cross-validation on the respective training data for stopping parameter estimation. Note that this is extremely time consuming, especially for larger values of estimate. The default estimate=0 means no stopping parameter estimation.

verbose   Logical, indicating whether progress comments should be printed.
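The help page does not spell out the preselection criterion behind presel. One common choice for high-dimensional data of this kind is to rank features by a simple two-sample score and keep the top presel of them; the Python sketch below uses a standardized mean difference purely for illustration, and the name `preselect` is hypothetical, not the package's actual function or criterion.

```python
import numpy as np

def preselect(xlearn, ylearn, presel):
    """Rank features by a standardized mean difference between the two classes
    and return the indices of the top `presel` features (binary case, labels 0/1).
    Illustrative only; the package's actual preselection criterion may differ."""
    y = np.asarray(ylearn)
    diff = xlearn[y == 1].mean(axis=0) - xlearn[y == 0].mean(axis=0)
    scores = np.abs(diff) / (xlearn.std(axis=0) + 1e-12)  # avoid divide-by-zero
    keep = np.argsort(scores)[::-1][:presel]              # highest scores first
    return np.sort(keep)                                  # column order preserved
```

Boosting would then be run on `xlearn[:, keep]` and `xtest[:, keep]` rather than the full matrices.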
Value

probs     An array whose rows contain, for every boosting iteration, the out-of-sample probabilities that the class label is predicted as 1. For multiclass problems, the third dimension of the array holds the probabilities for the K binary one-against-all partitions of the data.

loglikeli An array containing the log-likelihood across the training instances, used to determine the stopping parameter when estimate>0. For multiclass problems, the third dimension of the array contains the values for the K binary one-against-all partitions of the data.
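Given the probs array, hard class labels at a chosen iteration m can be recovered by thresholding at 1/2 in the binary case, or by taking the most probable one-against-all partition in the multiclass case. The Python sketch below assumes the array shapes described above (test instances in rows, iterations in columns, classes along the third dimension); `predict_labels` is a hypothetical helper, not part of the package.

```python
import numpy as np

def predict_labels(probs, m):
    """Hard labels at boosting iteration m (1-based).
    Binary: probs has shape (ntest, mfinal).
    Multiclass: probs has shape (ntest, mfinal, K) with one-against-all
    probabilities, resolved by argmax over the K partitions."""
    p = probs[:, m - 1]
    if p.ndim == 1:
        return (p > 0.5).astype(int)   # binary: threshold at 1/2
    return p.argmax(axis=1)            # multiclass: most probable partition
```

With stopping parameter estimation (estimate>0), m would be chosen as the iteration maximizing the cross-validated log-likelihood in loglikeli.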
Author(s)

Marcel Dettling
References

See "Boosting for Tumor Classification of Gene Expression Data", Dettling and Bühlmann (2002), available at http://stat.ethz.ch/~dettling/boosting.html
Examples

data(leukemia)

## Divide the leukemia dataset into training and test data
xlearn <- leukemia.x[c(1:20, 34:38), ]
ylearn <- leukemia.y[c(1:20, 34:38)]
xtest  <- leukemia.x[21:33, ]
ytest  <- leukemia.y[21:33]

## An example without stopping parameter estimation
fit <- logitboost(xlearn, ylearn, xtest, mfinal = 100, presel = 75, verbose = TRUE)
summarize(fit, ytest)

## Now with stopping parameter estimation by 4-fold cross validation
fit <- logitboost(xlearn, ylearn, xtest, mfinal = 100, presel = 75, estimate = 4, verbose = TRUE)
summarize(fit, ytest)