caret
Introduction
caret is an R package that supports the data-processing tasks surrounding machine learning problems. Its name stands for Classification And REgression Training. When building models for a real dataset, several tasks besides the learning algorithm itself need to be performed, such as cleaning the data, dealing with incomplete observations, validating the model on a test set, and comparing different models. caret helps with these tasks independently of the learning algorithm used.
Preprocessing
Preprocessing in caret is done through the preProcess() function. Given a matrix or data frame x, preProcess() estimates the transformation parameters from the training data. The resulting object can then be applied to the training data, the testing data, or any new data with predict().
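As a minimal sketch of this estimate-then-apply pattern, the example below fits preProcess() on a fixed subset of mtcars and applies the result to the remaining rows. The row split (first 24 rows for training) and all variable names are illustrative assumptions, not something caret prescribes.

```r
library(caret)

# Illustrative, deterministic split: first 24 rows train, remaining 8 test
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Estimate centering and scaling parameters from the training rows only
pp <- preProcess(train, method = c("center", "scale"))

# The fitted object stores the training means and standard deviations
train_means <- pp$mean[names(train)]
train_sds   <- pp$std[names(train)]

# predict() applies the stored parameters to any data with the same columns
train_std <- predict(pp, train)
test_std  <- predict(pp, test)

# Training columns are centered at zero; test columns generally are not,
# because they were shifted by the training means rather than their own
round(colMeans(train_std), 10)
round(colMeans(test_std), 2)
```

Keeping the test rows out of the preProcess() call means no information from the test set leaks into the transformation.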
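The heart of the preProcess() function is the method argument, a character vector naming the operations to perform. Regardless of the order in which they are listed there, the operations are applied in this order: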
- Zero-variance filter
- Near-zero variance filter
- Box-Cox/Yeo-Johnson/exponential transformation
- Centering
- Scaling
- Range
- Imputation
- PCA
- ICA
- Spatial Sign
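The fixed ordering above can be checked with a small experiment. The sketch below lists the same three operations in two different orders and compares the results; the object names are illustrative, and the whole of mtcars is used only to keep the example short.

```r
library(caret)

# Same operations, listed in two different orders
pp_a <- preProcess(mtcars, method = c("center", "scale", "spatialSign"))
pp_b <- preProcess(mtcars, method = c("spatialSign", "scale", "center"))

out_a <- predict(pp_a, mtcars)
out_b <- predict(pp_b, mtcars)

# TRUE when both orderings produce the same transformed data
all.equal(out_a, out_b)
```

If the operations were applied in the order written, the second call would take the spatial sign before centering and scaling and would give a different result.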
Below, we take the mtcars data set and perform centering, scaling, and a spatial sign transform.
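library(caret)

set.seed(42)  # make the random partition reproducible
# Hold out roughly 20% of the rows for testing
train_index <- createDataPartition(mtcars$mpg, p = .8,
                                   list = FALSE,
                                   times = 1)
mt_train <- mtcars[train_index, ]
mt_test <- mtcars[-train_index, ]
# Estimate the transformation parameters from the training set only
process_mtcars <- preProcess(mt_train, method = c("center", "scale", "spatialSign"))
# Apply the same fitted transformation to both sets
mtcars_train_transf <- predict(process_mtcars, mt_train)
mtcars_test_transf <- predict(process_mtcars, mt_test)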
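One way to sanity-check the output, sketched here as a self-contained example: the spatial sign transform projects each observation onto the unit sphere, so every transformed row should have Euclidean length 1 in both the training and the test set. The seed value and the variable names below are arbitrary choices made for this illustration.

```r
library(caret)

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
idx <- createDataPartition(mtcars$mpg, p = .8, list = FALSE, times = 1)
mt_train <- mtcars[idx, ]
mt_test  <- mtcars[-idx, ]

pp <- preProcess(mt_train, method = c("center", "scale", "spatialSign"))
train_transf <- predict(pp, mt_train)
test_transf  <- predict(pp, mt_test)

# Euclidean length of each transformed row
train_norms <- sqrt(rowSums(train_transf^2))
test_norms  <- sqrt(rowSums(test_transf^2))

# Both ranges should collapse to 1 (up to floating-point error)
range(train_norms)
range(test_norms)
```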