caret

Introduction#

caret is an R package that aids in data processing needed for machine learning problems. It stands for classification and regression training. When building models for a real dataset, there are some tasks other than the actual learning algorithm that need to be performed, such as cleaning the data, dealing with incomplete observations, validating our model on a test set, and compare different models.

caret helps in these scenarios, independent of the actual learning algorithms used.

Preprocessing

Pre-processing in caret is done through the preProcess() function. Given a matrix or data frame type object x, preProcess() applies transformations on the training data which can then be applied to testing data.

The heart of the preProcess() function is the method argument. Method operations are applied in this order:

Zero-variance filter
Near-zero variance filter
Box-Cox/Yeo-Johnson/exponential transformation
Centering
Scaling
Range
Imputation
PCA
ICA
Spatial Sign

Below, we take the mtcars data set and perform centering, scaling, and a spatial sign transform.

auto_index <- createDataPartition(mtcars$mpg, p = .8,
                                  list = FALSE,
                                  times = 1)

mt_train <- mtcars[auto_index,]
mt_test <- mtcars[-auto_index,]

process_mtcars <- preProcess(mt_train, method = c("center","scale","spatialSign"))

mtcars_train_transf <- predict(process_mtcars, mt_train)
mtcars_test_tranf <- predict(process_mtcars,mt_test)