Module connectome.visualization.lgb_shapley
this module contains a wrapper to compute and visualize the SHAP values of a given model; so far it is implemented only for gradient boosting models (lightgbm)
Classes
class ShapleyLGB (model: Union[lightgbm.sklearn.LGBMModel, GB], data: pandas.core.frame.DataFrame)
wrapper to compute and visualise shapley values for a given lgb model
Examples:
>>> from connectome.visualization.lgb_shapley import ShapleyLGB
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>>
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # shapley values and analysis for the classification case
>>> sh_class = ShapleyLGB(lgb_class, X)
>>> sh_class.summ_plot(5)
>>> class_imp = sh_class.shapley_importance()
>>> print(class_imp)
>>> sh_class.depend_plot(class_imp.iloc[0, 2])
Methods
def depend_plot(self, feature_ind: int) -> None
visualises the dependence plot for the selected feature
Args
feature_ind: index of the feature/column
Returns
None
def explain_prediction(self, ind: int) -> pandas.core.frame.DataFrame
returns the computed shapley values for a given observation
Args
ind: index of the observation / row
Returns
dataframe containing the feature names and associated shapley values for the chosen observation
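The shape of such a per-observation result can be sketched as follows. This is a minimal illustration on a hand-written SHAP matrix, not the class's actual implementation; the helper `explain_row` and the column names `feature` and `shap_value` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: a SHAP matrix with one row per observation
# and one column per feature.
shap_values = np.array([
    [0.5, -0.1, 0.0],
    [-0.4, 0.2, 0.1],
])
feature_names = ["feature_0", "feature_1", "feature_2"]


def explain_row(values, names, ind):
    """Return the feature names next to the SHAP values of observation `ind`."""
    return pd.DataFrame({"feature": names, "shap_value": values[ind]})


row_explanation = explain_row(shap_values, feature_names, ind=1)
print(row_explanation)
```

Each row of the resulting frame pairs one feature with its contribution to the chosen observation's prediction.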
def get_shapley_values(self) -> numpy.ndarray
returns the computed shapley values as a numpy array
Returns
numpy array containing the shapley values
def get_shapley_values_df(self) -> pandas.core.frame.DataFrame
returns the computed shapley values as a DataFrame
Returns
DataFrame containing the shapley values
def plot_n_imp(self, n: int = 10) -> None
plots the dependence plots for the n most important features (based on the sum of the absolute shapley values)
Args
n: number of most important features to plot
Returns
None
def plot_single_prediction(self, ind: int) -> shap.plots._force.AdditiveForceVisualizer
visualises the computed shapley values for a single observation
Args
ind: index of the observation / row
Returns
plot of the shapley values
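A force plot of this kind relies on the additivity of SHAP values: a base value plus the per-feature contributions gives the model's raw output for the observation. The sketch below illustrates that arithmetic with made-up numbers. The variable names are hypothetical, and the sigmoid step assumes a binary classifier explained in log-odds space.

```python
import numpy as np

# Hypothetical numbers for one observation; a real run would take them
# from a SHAP explainer.
base_value = -0.2                       # average raw model output
row_shap = np.array([0.5, -0.1, 0.3])   # one SHAP value per feature

# Additivity: base value plus all contributions gives the raw model output.
raw_output = base_value + row_shap.sum()

# Assumption: the raw output is a log-odds score, so the sigmoid maps it
# to a probability.
probability = 1.0 / (1.0 + np.exp(-raw_output))

print(raw_output, probability)
```

Positive contributions push the prediction above the base value and negative ones pull it below.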
def shapley_importance(self, n: int = 10)
returns the n most important features based on the sum of the absolute value of the shapley values
Args
n: the n most important features to return
Returns
DataFrame of the n most important features - including shapley values and the index of the respective column
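The ranking rule described above (sum of absolute shapley values per feature, keep the top n) can be sketched like this. It is an illustration on a hand-written SHAP matrix, not the class's own code. The helper `top_n_importance` and the column names are hypothetical; the column index is placed third only because the class example reads it with `class_imp.iloc[0, 2]`.

```python
import numpy as np
import pandas as pd

# Hypothetical SHAP matrix: 4 observations (rows) x 3 features (columns).
shap_values = np.array([
    [0.5, -0.1, 0.0],
    [-0.4, 0.2, 0.1],
    [0.3, -0.3, -0.1],
    [-0.6, 0.1, 0.2],
])
feature_names = ["feature_0", "feature_1", "feature_2"]


def top_n_importance(values, names, n=10):
    """Rank features by the sum of absolute SHAP values and keep the top n."""
    abs_sum = np.abs(values).sum(axis=0)  # one total per feature/column
    table = pd.DataFrame({
        "feature": names,
        "abs_shap_sum": abs_sum,
        "column_index": np.arange(len(names)),
    })
    ranked = table.sort_values("abs_shap_sum", ascending=False)
    return ranked.head(n).reset_index(drop=True)


importance = top_n_importance(shap_values, feature_names, n=2)
print(importance)
```

Summing absolute values keeps positive and negative contributions from cancelling out, so a feature that pushes predictions strongly in both directions still ranks high.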
def summ_plot(self, max_features: int = 25) -> None
visualises the shapley values depending on the values of the respective features
Args
max_features: number of features to display
Returns
None