# To Trust or Not to Trust?

There is no simple rule for deciding whether someone, or something, is trustworthy. Unlike the little ditty for poison ivy, "leaflets three, let it be," you need to examine multiple features of an alien lifeform to determine whether it is friendly.

This slide presentation shows the results of classifying the Boozonians of planet Hamiltus, in the Allen galaxy, as friendly or dangerous. Truth be told, this data set is an adaptation of the UCI Mushroom Data Set. Because so many web-based solutions for UCI datasets already exist, the team decided to obscure the data without affecting its underlying predictive value.

The dataset provides 8,123 observations, collected by a planetary probe sent ahead of the mission, with 21 features each. The classes are fairly balanced, which makes for a more forgiving environment for predictive modeling:

• friendly lifeforms: 4,208 (51.8%)
• dangerous lifeforms: 3,915 (48.2%)
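The balance check above can be reproduced in a couple of lines of base R. Here the counts are typed in directly from this slide; in practice they would come from `table()` on the outcome column (named `result` later in the deck):

```r
# Class counts reported above (4,208 friendly, 3,915 dangerous)
counts <- c(friendly = 4208, dangerous = 3915)
round(100 * prop.table(counts), 1)   # percent of each class
# friendly = 51.8, dangerous = 48.2
```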

This deck is written in R Slidy to demonstrate modeling approaches that accurately classify each observation into one of the two classes.

Machine learning can keep you alive during interstellar exploration. Protecting yourself from the many other ways of becoming alien chow is on you.

Good luck, live long and prosper using this model.

# Three classification methods compared

• Classification Tree (less than 1% error)
• Conditional Inference Tree (less than 1% error)
• Random Forest (perfect classifier)

If you had to choose your friends, which ones “work for you?”
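For reference, the error rates behind the bullets above can be checked by hand from the confusion matrices shown later in the deck; each model is scored on the same 2,437 held-out aliens:

```r
# Misclassified test cases per model (counts copied from the
# confusion matrices later in this deck), out of 2,437 total
errors <- c(rpart = 11, ctree = 6, randomForest = 0)
round(100 * errors / 2437, 2)   # percent error: 0.45, 0.25, 0.00
```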

# Classification Tree

Let’s split the data: 70% goes into a training set for fitting the model, and the remaining 30% is held out to test it.

library(dplyr)   # sample_frac()
library(rpart)   # classification tree

alien <- read.csv("./data/mushroomUCI_adapted.csv")

set.seed(524)
train <- sample_frac(alien, 0.7, replace = FALSE)   # 70% training sample
rows  <- as.numeric(row.names(train))               # row indices used for training
test  <- alien[-rows, ]                             # remaining 30% held out

fit <- rpart(result ~ ., data = train, method = "class")
predicted <- predict(fit, newdata = test, type = "class")
table(predicted, test$result)
##
## predicted   dangerous friendly
##   dangerous      1164        0
##   friendly         11     1262

Not bad…unless you meet one of those 11 aliens.

# Classification Tree

library(rattle)   # fancyRpartPlot()

fancyRpartPlot(fit, main = "Alien Lifeform Classification Results",
               sub = "Variable Feature legend is available in the data dictionary")

# Conditional Inference Tree

library(party)    # ctree()

fitC <- ctree(result ~ ., data = train)
table(predict(fitC, newdata = test), test$result)
##
##             dangerous friendly
##   dangerous      1169        0
##   friendly          6     1262

This is a better predictor, but you still have 6 chances to…well…

                     (*_*)  -->  (X_X)

# Random Forest

library(randomForest)

fitRF <- randomForest(result ~ ., data = train)
predictedRF <- predict(fitRF, newdata = test, type = "class")
table(predictedRF, test$result)
##
## predictedRF dangerous friendly
##   dangerous      1175        0
##   friendly          0     1262

This table shows the results of a random forest model.

In this case it is a perfect classifier: a random forest aggregates the votes of many trees, each grown on a bootstrap sample of the training data, which typically outperforms any single tree.

A forest of hundreds of trees cannot be drawn as one diagram, so the confusion matrix is the way to read its results.

# Random Forest

…and here is the relative importance of each variable:
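A sketch of how a table like the one below can be built, assuming the `fitRF` model fitted on the previous slide. The `importance()` accessor is part of the randomForest package; for a classification forest its default measure is the mean decrease in Gini impurity:

```r
# Extract and rank variable importance from the fitted forest
imp <- randomForest::importance(fitRF)           # matrix, one row per variable
imp_df <- data.frame(variable   = rownames(imp),
                     importance = imp[, "MeanDecreaseGini"],
                     row.names  = NULL)
imp_df[order(-imp_df$importance), ]              # most important first
```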

##                   variable importance
## 1                     odor 977.350903
## 2       tattoo.print.color 466.871794
## 3               gill.color 242.682321
## 4                gill.size 178.217424
## 5  body.surface.above.neck 137.263940
## 6  body.surface.below.neck 131.236102
## 7                 eye.type 130.456605
## 8               population 109.929538
## 9                  habitat  75.269221
## 10            gill.spacing  67.248376
## 11                 lesions  64.421793
## 21               foot.type   0.000000