R Language

Random Forest Algorithm


Random forest is an ensemble method for classification and regression that combines the predictions of many decision trees, which reduces the chance of overfitting the data compared with a single tree. Details of the method can be found in the Wikipedia article on Random Forests. The main implementation for R is the randomForest package, but there are other implementations; see the CRAN Task View on Machine Learning.

Basic examples - Classification and Regression

    ######  Used for both Classification and Regression examples
    library(randomForest)   ## For randomForest()
    library(e1071)          ## For naiveBayes()
    library(car)            ## For the Soils data
    ##    RF Classification Example
    set.seed(656)            ## for reproducibility
    S_RF_Class = randomForest(Gp ~ ., data=Soils[,c(4,6:14)])
    Gp_RF = predict(S_RF_Class, Soils[,6:14])
    length(which(Gp_RF != Soils$Gp))            ## No Errors

    ## Naive Bayes for comparison
    S_NB  = naiveBayes(Soils[,6:14], Soils[,4]) 
    Gp_NB = predict(S_NB, Soils[,6:14], type="class")
    length(which(Gp_NB != Soils$Gp))            ## 6 Errors

This example evaluates both models on the same data they were trained on, so the error counts are optimistic. It still illustrates that RF can fit the data very closely.
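Because the comparison above is made on the training data, a less optimistic check may be useful. The sketch below refits the same classification model and looks at the out-of-bag (OOB) predictions, which `predict()` returns for a `randomForest` object when no new data is supplied. The names `Gp_OOB`, `oob_errors`, and `oob_rate` are introduced here for illustration only and are not part of the original example.

```r
library(randomForest)
library(car)             ## For the Soils data

set.seed(656)            ## for reproducibility
S_RF_Class <- randomForest(Gp ~ ., data = Soils[, c(4, 6:14)])

## With no newdata, predict() returns the out-of-bag predictions:
## each row is predicted only by trees that did not see it in training
Gp_OOB <- predict(S_RF_Class)

oob_errors <- sum(Gp_OOB != Soils$Gp)    ## number of OOB misclassifications
oob_rate   <- mean(Gp_OOB != Soils$Gp)   ## OOB error rate

## Cross-tabulate actual groups against OOB predictions
print(table(Actual = Soils$Gp, OOB = Gp_OOB))
print(oob_rate)
```

With only 48 rows spread over 12 groups, the OOB error is likely to be noticeably higher than the training error, which is the point of the comparison.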

    ##    RF Regression Example
    set.seed(656)            ## for reproducibility
    S_RF_Reg = randomForest(pH ~ ., data=Soils[,6:14])
    pH_RF = predict(S_RF_Reg, Soils[,6:14])

    ## Compare Predictions with Actual values for RF and Linear Model
    S_LM = lm(pH ~ ., data=Soils[,6:14])
    pH_LM = predict(S_LM, Soils[,6:14])
    par(mfrow=c(1,2))        ## draw the two plots side by side
    plot(Soils$pH, pH_RF, pch=20, xlab="Actual", ylab="Predicted", main="Random Forest")
    plot(Soils$pH, pH_LM, pch=20, xlab="Actual", ylab="Predicted", main="Linear Model")

Predicted Values vs Actuals for RF and Linear model
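The plots give a visual comparison; a numeric summary can complement them. The following is a minimal sketch, assuming root mean squared error (RMSE) is an acceptable measure here. The helper `rmse()` and the variables `rmse_rf_train`, `rmse_rf_oob`, and `rmse_lm_train` are hypothetical names introduced for this illustration. The out-of-bag figure for the random forest is included because the training-data figure tends to be optimistic.

```r
library(randomForest)
library(car)             ## For the Soils data

set.seed(656)            ## for reproducibility
S_RF_Reg <- randomForest(pH ~ ., data = Soils[, 6:14])
S_LM     <- lm(pH ~ ., data = Soils[, 6:14])

## Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

## RF error on the training data, and on out-of-bag predictions
rmse_rf_train <- rmse(Soils$pH, predict(S_RF_Reg, Soils[, 6:14]))
rmse_rf_oob   <- rmse(Soils$pH, predict(S_RF_Reg))

## Linear model error on the training data
rmse_lm_train <- rmse(Soils$pH, predict(S_LM, Soils[, 6:14]))

print(c(RF_train = rmse_rf_train, RF_OOB = rmse_rf_oob, LM_train = rmse_lm_train))
```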

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0.