Module connectome.visualization.fi_rf_gb

Functions to compute and retrieve the basic feature importances and permutation feature importances from gradient boosting (LightGBM) and random forest models.

Functions

def get_fi(model: Union[lightgbm.sklearn.LGBMClassifier, lightgbm.sklearn.LGBMRegressor, sklearn.ensemble._forest.RandomForestClassifier, sklearn.ensemble._forest.RandomForestRegressor, GB], data: pandas.core.frame.DataFrame, feature_names: list = None, n: int = 10) ‑> pandas.core.frame.DataFrame

Obtains the feature importances from a LightGBM or random forest model.

Examples:

>>> from connectome.visualization.fi_rf_gb import get_fi
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize the model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # obtain FIs
>>> fis = get_fi(lgb_class, X)
>>> fis.plot.bar(x='features', y='importances')

Args

model
LightGBM or random forest model whose feature importances should be obtained
data
DataFrame from which the column / feature names are taken
feature_names
list of feature names; pass this when data is not a pd.DataFrame
n
number of top feature importances to return, in descending order

Returns

A pandas DataFrame containing the feature importances and feature names.
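For intuition, a helper like get_fi can be reproduced with plain pandas and scikit-learn: pair each feature name with the fitted model's feature_importances_, sort descending, and keep the top n. This is a minimal sketch of the idea, not the package's actual implementation; the column names "features" and "importances" are taken from the plotting example above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic data with named features
X, y = make_classification(n_informative=10, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

rf = RandomForestClassifier(random_state=0).fit(X, y)

# sketch of get_fi: top-n impurity-based importances, descending
n = 10
fis = (
    pd.DataFrame({"features": X.columns, "importances": rf.feature_importances_})
    .sort_values("importances", ascending=False)
    .head(n)
    .reset_index(drop=True)
)
```

The same pattern works for any estimator exposing feature_importances_, including LGBMClassifier and LGBMRegressor.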

def get_pfi(model: Union[lightgbm.sklearn.LGBMClassifier, lightgbm.sklearn.LGBMRegressor, sklearn.ensemble._forest.RandomForestClassifier, sklearn.ensemble._forest.RandomForestRegressor, GB], x_val: Union[numpy.ndarray, pandas.core.frame.DataFrame], y_val: numpy.ndarray, feature_names: list = None, n: int = 10, repeats: int = 30) ‑> pandas.core.frame.DataFrame

Obtains the permutation feature importances (PFIs) from a LightGBM or random forest model.

Examples:

>>> from connectome.visualization.fi_rf_gb import get_pfi
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize the model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # obtain PFIs
>>> pfis = get_pfi(lgb_class, X, y, repeats=2)
>>> pfis.plot.bar(x='features', y='importances_mean')

Args

model
LightGBM or random forest model whose PFIs should be obtained
x_val
DataFrame or array containing the features of held-out data
y_val
target variable of the held-out data
feature_names
list of feature names; pass this when x_val is not a pd.DataFrame
n
number of top PFIs to return, in descending order
repeats
number of times the permutation is repeated for each feature

Returns

A pandas DataFrame containing the PFIs and feature names.
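Permutation feature importance itself is model-agnostic: each feature is shuffled on held-out data and the mean drop in score over the repeats is recorded. A minimal sketch of what get_pfi likely wraps, using scikit-learn's permutation_importance (the "importances_mean" column name matches the plotting example above; this is an illustration, not the package's actual code):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_informative=10, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(random_state=0).fit(X, y)

# sketch of get_pfi: shuffle each feature n_repeats times (the `repeats`
# argument) and average the resulting score drops
result = permutation_importance(rf, X, y, n_repeats=2, random_state=0)
pfis = (
    pd.DataFrame({"features": feature_names,
                  "importances_mean": result.importances_mean})
    .sort_values("importances_mean", ascending=False)
    .head(10)
    .reset_index(drop=True)
)
```

In practice x_val and y_val should be data the model was not trained on; evaluating on the training set (as done here for brevity) overstates the importances.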