Explainable AI Visualization (modelStudio)

Machine Learning modelStudio tidyverse DALEX

This post is inspired by the model and post written by Matt Dancho, replicated here on another data set with the goal of interpreting the outputs of an ML model in a visual and easily understandable way. "Machine learning is great, until you have to explain it," as Matt Dancho has said.

RPubs blog posts modelStudio

Overview

This short blog post aims to visually explain the output of a trained machine learning model using the modelStudio library created by Matt Dancho's recommended toolset and hosted on CRAN. modelStudio is a new R package that makes it easy to interactively explain machine learning models using state-of-the-art techniques like Shapley Values, Break Down plots, and Partial Dependence (Matt Dancho, 2022).

Steps and Workflow

In this blog, we will learn how to make four of the most important Explainable AI plots.

Step 1: Loading Data and Libraries

library(modelStudio)
library(tidyverse)
library(DALEX)
library(tidymodels)

data <- rio::import("C:/Users/jmurera/Desktop/Blog/myblog/data/Breast_cancer_data.csv")

data_tbl <- data %>% mutate_if(is.integer, as.factor) %>% as_tibble()

The data to be used looks like this:

library(flextable)
ft <- flextable(head(data_tbl))
ft <- autofit(ft)

flextable_to_rmd(ft)

We want to understand how breast cancer diagnosis status can be estimated based on the remaining 5 columns.

Step 2: Make a Predictive Model

The best way to understand what affects a cancer diagnosis decision is to build a predictive model (and then explain it). Let's build an xgboost model using the tidymodels ecosystem. If you've never heard of tidymodels, it is to R what Scikit-Learn is to Python, and it is the successor to the caret ecosystem.

fit_xgboost <- boost_tree(learn_rate = 0.3) %>%
  set_mode("classification") %>%
  set_engine("xgboost") %>%
  fit(diagnosis ~ ., data = data_tbl)
#fit_xgboost
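Before explaining the model, it can be worth a quick sanity check that it produces sensible predictions. This is an optional sketch using parsnip's standard predict() interface; the probability column names depend on the factor levels of the diagnosis column (assumed here to be 0/1).

```r
# Predict class probabilities with the fitted parsnip model.
# `data_tbl` is the tibble prepared in Step 1.
preds <- predict(fit_xgboost, new_data = data_tbl, type = "prob")

# One probability column per class level, e.g. .pred_0 and .pred_1
head(preds)
```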

Step 3: Make an Explainer

With the above predictive model, we are ready to create an explainer. In basic terms, an explainer is a consistent and unified way to explain predictive models. The explainer can accept many different model types, including tidymodels/parsnip, caret, mlr, H2O, and plain xgboost models.

Now, below is the code to create the explainer.

# --------- Explainer
explainer <- DALEX::explain(
  model = fit_xgboost,
  data  = data_tbl[, -6],
  y     = as.numeric(unlist(data_tbl[, 6])),
  label = "Extreme Gradient Boosting Machine (XGBoost)"
)

Preparation of a new explainer is initiated
  -> model label       : Extreme Gradient Boosting Machine (XGBoost)
  -> data              : 569 rows 5 cols
  -> data              : tibble converted into a data.frame
  -> target variable   : 569 values
  -> predict function  : yhat.model_fit will be used ( default )
  -> predicted values  : No value for predict function target column. ( default )
  -> model_info        : package parsnip , ver. 1.0.3 , task classification ( default )
  -> predicted values  : numerical, min = 0.00912416 , mean = 0.6253965 , max = 0.9911299
  -> residual function : difference between y and yhat ( default )
  -> residuals         : numerical, min = 0.1481095 , mean = 1.00202 , max = 1.515258
A new explainer has been created!
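The explanation types that modelStudio renders interactively (Break Down, Shapley values, Partial Dependence) can also be produced one at a time directly from the explainer with DALEX. This is a sketch; row 1 of the data is used as an arbitrary observation to explain.

```r
# Local explanations for a single observation (features only, target dropped)
new_obs <- data_tbl[1, -6]

bd   <- DALEX::predict_parts(explainer, new_observation = new_obs,
                             type = "break_down")   # Break Down plot
shap <- DALEX::predict_parts(explainer, new_observation = new_obs,
                             type = "shap")         # Shapley values

# Global explanation: Partial Dependence profiles across all features
pdp  <- DALEX::model_profile(explainer)

plot(bd)
plot(shap)
plot(pdp)
```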

Step 4: Run modelStudio

modStudio <- modelStudio::modelStudio(explainer = explainer)
#modStudio
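The object returned by modelStudio() is an interactive D3 dashboard built on the r2d3 package, so it can be exported as a standalone HTML file and shared. A minimal sketch, assuming the r2d3 package is installed; the file name is arbitrary.

```r
# Save the interactive dashboard as a self-contained HTML page
r2d3::save_d3_html(modStudio, file = "modelstudio_dashboard.html")
```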

Acknowledgement

We gratefully acknowledge the developers of modelStudio, Hubert Baniecki and Przemyslaw Biecek. This package is part of the Dr. Why ecosystem of R packages, a collection of tools for Visual Exploration, Explanation and Debugging of Predictive Models. Thank you for everything you do; we owe you much respect for simplifying our work.