R Language

caret

Introduction#

caret is an R package that aids in data processing needed for machine learning problems. It stands for classification and regression training. When building models for a real dataset, there are some tasks other than the actual learning algorithm that need to be performed, such as cleaning the data, dealing with incomplete observations, validating our model on a test set, and compare different models.

caret helps in these scenarios, independent of the actual learning algorithms used.

Preprocessing

Pre-processing in caret is done through the preProcess() function. Given a matrix or data frame type object x, preProcess() applies transformations on the training data which can then be applied to testing data.

The heart of the preProcess() function is the method argument. Method operations are applied in this order:

  1. Zero-variance filter
  2. Near-zero variance filter
  3. Box-Cox/Yeo-Johnson/exponential transformation
  4. Centering
  5. Scaling
  6. Range
  7. Imputation
  8. PCA
  9. ICA
  10. Spatial Sign

Below, we take the mtcars data set and perform centering, scaling, and a spatial sign transform.

auto_index <- createDataPartition(mtcars$mpg, p = .8,
                                  list = FALSE,
                                  times = 1)

mt_train <- mtcars[auto_index,]
mt_test <- mtcars[-auto_index,]

process_mtcars <- preProcess(mt_train, method = c("center","scale","spatialSign"))

mtcars_train_transf <- predict(process_mtcars, mt_train)
mtcars_test_tranf <- predict(process_mtcars,mt_test)

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow