Module `connectome.visualization.lgb_shapley`

This module contains a wrapper to compute and visualise SHAP values for a given model. So far it is implemented only for gradient boosting models (LightGBM).

Classes

class ShapleyLGB (model: Union[lightgbm.sklearn.LGBMModel, GB], data: pandas.core.frame.DataFrame)

Wrapper to compute, inspect and visualise Shapley values for a given LightGBM model.

Examples:

>>> from connectome.visualization.lgb_shapley import ShapleyLGB
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>>
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # shapley values and analysis for the classification case
>>> sh_class = ShapleyLGB(lgb_class, X)
>>> sh_class.summ_plot(5)
>>> class_imp = sh_class.shapley_importance()
>>> print(class_imp)
>>> sh_class.depend_plot(class_imp.iloc[0, 2])
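For intuition about what the class computes, independent of `ShapleyLGB` itself: the Shapley value of a feature is its marginal contribution to the prediction, averaged over all coalitions of the other features with the weights shown below. This brute-force sketch assumes that a feature "absent" from a coalition takes its value from a single baseline point. Tree-specific algorithms such as SHAP's TreeSHAP treat absent features differently, and `exact_shapley` is a hypothetical helper, not part of this module.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, using one baseline point for absent features.

    Features in a coalition take their value from x, all others from baseline.
    Cost grows as 2**p model evaluations, so only use this for a few features.
    """
    p = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        point = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return f(point)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            # Probability that exactly this coalition precedes i in a random ordering.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model: a main effect for feature 0 and an interaction of features 1 and 2.
phi = exact_shapley(lambda v: 2.0 * v[0] + v[1] * v[2], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, 3.0, 3.0]; they sum to f(x) - f(baseline) = 8.0
```

The interaction term is split equally between features 1 and 2, and the values add up to the difference between the prediction and the baseline prediction. The 2**p cost is why dedicated algorithms are used for real models.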

Methods

def depend_plot(self, feature_ind: int) -> None

Visualises the dependence plot for the selected feature, i.e. the feature's values plotted against its Shapley values across all observations.

Args

feature_ind: index of the feature (column) in `data`

Returns

None
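A dependence plot has one point per observation: the feature's value on the x-axis and that feature's Shapley value on the y-axis (SHAP's own `dependence_plot` additionally colours the points by a second feature). The NumPy sketch below only assembles those pairs; it assumes the Shapley values form an (n_observations, n_features) array aligned with `data`, and `dependence_data` is a hypothetical helper name.

```python
from typing import Tuple

import numpy as np


def dependence_data(
    features: np.ndarray, shap_values: np.ndarray, feature_ind: int
) -> Tuple[np.ndarray, np.ndarray]:
    """Return the (x, y) pairs a dependence plot draws for one feature.

    x holds the feature's value for every observation and y that feature's
    Shapley value for the same observation; both are sorted by x.
    """
    if features.shape != shap_values.shape:
        raise ValueError("features and shap_values must have the same shape")
    x = features[:, feature_ind]
    y = shap_values[:, feature_ind]
    order = np.argsort(x, kind="stable")
    return x[order], y[order]
```

The pairs can then be drawn with, for example, `matplotlib.pyplot.scatter(x, y)`.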

def explain_prediction(self, ind: int) -> pandas.core.frame.DataFrame

Returns the computed Shapley values for a single observation.

Args

ind: index of the observation (row)

Returns

DataFrame containing the feature names and the associated Shapley values for the chosen observation
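A minimal sketch of what a per-observation explanation can look like, assuming the Shapley values are an (n_observations, n_features) NumPy array aligned with `data.columns` and that `ind` is a positional row index. The helper `explain_row`, the column names `feature` and `shapley_value`, and the sort by absolute value are illustrative choices, not necessarily what `explain_prediction` returns.

```python
import numpy as np
import pandas as pd


def explain_row(shap_values: np.ndarray, columns: pd.Index, ind: int) -> pd.DataFrame:
    """Per-feature Shapley values for observation `ind`, largest impact first."""
    row = pd.DataFrame({"feature": columns, "shapley_value": shap_values[ind]})
    # Sort by magnitude so the most influential features come first,
    # whether they push the prediction up or down.
    return row.sort_values(
        "shapley_value", key=lambda s: s.abs(), ascending=False
    ).reset_index(drop=True)
```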

def get_shapley_values(self) -> numpy.ndarray

Returns the computed Shapley values as a NumPy array.

Returns

NumPy array containing the Shapley values
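One way such an array can be obtained for a LightGBM model, sketched here as an assumption since this page does not say how `get_shapley_values` computes it: LightGBM's `predict(X, pred_contrib=True)` returns, for regression and binary models, one column of Shapley contributions per feature followed by a final column holding the expected value. The hypothetical helpers below split that layout and check additivity on synthetic numbers.

```python
from typing import Tuple

import numpy as np


def split_contributions(contrib: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Split an (n_samples, n_features + 1) contribution matrix.

    Assumes the layout LightGBM uses for pred_contrib=True on regression
    and binary models: one column per feature, then the expected value.
    """
    if contrib.ndim != 2 or contrib.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least one feature column")
    shap_values = contrib[:, :-1]
    expected_value = contrib[:, -1]
    return shap_values, expected_value


def raw_prediction(shap_values: np.ndarray, expected_value: np.ndarray) -> np.ndarray:
    """Additivity: base value plus all contributions gives the raw model output."""
    return expected_value + shap_values.sum(axis=1)
```

With the fitted model from the class example, `split_contributions(lgb_class.predict(X, pred_contrib=True))` would give the per-feature values, and `raw_prediction(...)` should match `lgb_class.predict(X, raw_score=True)`, i.e. the log-odds rather than the probabilities. Multiclass models use a wider layout with one block of columns per class.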

def get_shapley_values_df(self) -> pandas.core.frame.DataFrame

Returns the computed Shapley values as a DataFrame.

Returns

DataFrame containing the Shapley values
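A minimal sketch, assuming the DataFrame mirrors the index and columns of the `data` passed to the constructor (this page does not specify it); `shap_frame` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def shap_frame(shap_values: np.ndarray, data: pd.DataFrame) -> pd.DataFrame:
    """Wrap an (n_observations, n_features) Shapley array with the labels of `data`.

    Reusing data's index and columns keeps every Shapley value aligned with
    the observation and feature it explains, so results can be joined back.
    """
    if shap_values.shape != data.shape:
        raise ValueError("shap_values must have the same shape as data")
    return pd.DataFrame(shap_values, index=data.index, columns=data.columns)
```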

def plot_n_imp(self, n: int = 10) -> None

Plots the dependence plots for the n most important features, ranked by the sum of the absolute Shapley values.

Args

n: number of most important features to plot

Returns

None
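The selection-and-loop logic can be sketched as follows, assuming a 2-D Shapley array with one column per feature. `plot_top_n` and its `plot_one` callback are hypothetical, and ties are broken by column order, which may differ from the actual method.

```python
from typing import Callable, List

import numpy as np


def plot_top_n(shap_values: np.ndarray, n: int, plot_one: Callable[[int], None]) -> List[int]:
    """Call plot_one for the n features with the largest sum of |Shapley values|.

    Returns the positional feature indices in the order they were plotted.
    """
    importance = np.abs(shap_values).sum(axis=0)
    # argsort on the negated scores gives descending order; the stable sort
    # keeps tied features in column order.
    top = np.argsort(-importance, kind="stable")[:n]
    for feature_ind in top:
        plot_one(int(feature_ind))
    return [int(i) for i in top]
```

Passing the instance's own method, as in `plot_top_n(sh_class.get_shapley_values(), 5, sh_class.depend_plot)`, would mirror `plot_n_imp(5)` under the same assumption about the array's shape.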

def plot_single_prediction(self, ind: int) -> shap.plots._force.AdditiveForceVisualizer

Visualises the computed Shapley values for a single observation.

Args

ind: index of the observation (row)

Returns

Force plot of the Shapley values for the chosen observation
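The return type suggests a SHAP force plot. Such a plot starts at the explainer's base (expected) value; features with positive Shapley values push the output up, negative ones push it down, and together they end at the model output for the observation. The NumPy sketch below reproduces only that bookkeeping, assuming base value and contributions are in the same output space (for a LightGBM classifier this is usually the raw log-odds score); `force_summary` is a hypothetical helper.

```python
from typing import Dict

import numpy as np


def force_summary(base_value: float, contributions: np.ndarray) -> Dict[str, object]:
    """Summarise one observation the way a force plot arranges it.

    Feature indices are listed by decreasing |Shapley value| within each side.
    """
    order = np.argsort(-np.abs(contributions), kind="stable")
    return {
        # Additivity: the arrows meet at base value + sum of contributions.
        "output": float(base_value + contributions.sum()),
        "pushes_up": [int(i) for i in order if contributions[i] > 0],
        "pushes_down": [int(i) for i in order if contributions[i] < 0],
    }
```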

def shapley_importance(self, n: int = 10)

Returns the n most important features, ranked by the sum of the absolute Shapley values of each feature over all observations.

Args

n: number of most important features to return

Returns

DataFrame of the n most important features, including their Shapley values and the index of the respective column
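A sketch of the ranking with pandas, assuming an (n_observations, n_features) Shapley array. The column names and their order (feature name, score, positional column index) are assumptions, chosen so that the column index sits at position 2, which is what the class example's `class_imp.iloc[0, 2]` appears to rely on; `shap_importance` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def shap_importance(shap_values: np.ndarray, columns: pd.Index, n: int = 10) -> pd.DataFrame:
    """Rank features by the sum over observations of their absolute Shapley values."""
    scores = np.abs(shap_values).sum(axis=0)
    table = pd.DataFrame(
        {
            "feature": columns,
            "sum_abs_shap": scores,
            "column_index": np.arange(len(columns)),
        }
    )
    # Stable sort keeps tied features in their original column order.
    return (
        table.sort_values("sum_abs_shap", ascending=False, kind="stable")
        .head(n)
        .reset_index(drop=True)
    )
```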

def summ_plot(self, max_features: int = 25) -> None

Visualises the Shapley values of each feature together with the values of that feature (summary plot).

Args

max_features: maximum number of features to display

Returns

None
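Assuming `summ_plot` draws SHAP's summary (beeswarm) plot: each displayed feature gets a row of points, one per observation, placed by its Shapley value and coloured by the feature's own value from low to high, with rows ordered by importance. The sketch below prepares that data with NumPy for aligned (n_observations, n_features) arrays. `summary_plot_data` is hypothetical, and the plain min-max colour scaling is a simplification (SHAP's own plot uses a percentile-based colour range).

```python
from typing import List, Tuple

import numpy as np


def summary_plot_data(
    features: np.ndarray, shap_values: np.ndarray, max_features: int = 25
) -> List[Tuple[int, np.ndarray, np.ndarray]]:
    """Per-feature rows of a summary plot, most important first.

    Each entry is (feature index, Shapley values, colour values in [0, 1]),
    where the colour encodes the feature's own value (low to high).
    """
    order = np.argsort(-np.abs(shap_values).mean(axis=0), kind="stable")
    rows = []
    for j in order[:max_features]:
        col = features[:, j].astype(float)
        span = col.max() - col.min()
        # Min-max scaling; a constant feature gets a neutral mid colour.
        colour = (col - col.min()) / span if span > 0 else np.full_like(col, 0.5)
        rows.append((int(j), shap_values[:, j], colour))
    return rows
```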