Module connectome.visualization.fi_rf_gb
Functions to compute and obtain the basic feature importances (FIs) and permutation feature importances (PFIs) from gradient boosting and random forest models.
Functions
def get_fi(model: Union[lightgbm.sklearn.LGBMClassifier, lightgbm.sklearn.LGBMRegressor, sklearn.ensemble._forest.RandomForestClassifier, sklearn.ensemble._forest.RandomForestRegressor, GB], data: pandas.core.frame.DataFrame, feature_names: list = None, n: int = 10) ‑> pandas.core.frame.DataFrame
Obtains the feature importances from a LightGBM or random forest model.
Examples:
>>> from connectome.visualization.fi_rf_gb import get_fi
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize the model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>>
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # obtain FIs
>>> fis = get_fi(lgb_class, X)
>>> fis.plot.bar(x='features', y='importances')
Args
model: LightGBM or random forest model whose feature importances should be obtained
data: DataFrame providing the column / feature names
feature_names: list of feature names, for when data is not a pd.DataFrame
n: the first n FIs to obtain, in descending order
Returns
a pandas DataFrame containing the feature importances and names
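The source of the module is not shown here, but the behavior described above (read the model's built-in importances, sort them descending, keep the top n alongside their feature names) can be sketched as follows. The function name `get_fi_sketch` and the `FakeModel` stand-in are hypothetical; any estimator exposing a `feature_importances_` attribute (LightGBM or scikit-learn forests) would work in place of the mock.

```python
import numpy as np
import pandas as pd

def get_fi_sketch(model, data, feature_names=None, n=10):
    # Hypothetical re-implementation for illustration: resolve feature
    # names from the DataFrame columns, or fall back to feature_names.
    names = list(data.columns) if isinstance(data, pd.DataFrame) else list(feature_names)
    imp = np.asarray(model.feature_importances_)
    # indices of the n largest importances, in descending order
    order = np.argsort(imp)[::-1][:n]
    return pd.DataFrame({
        "features": [names[i] for i in order],
        "importances": imp[order],
    })

class FakeModel:
    # mock estimator exposing the attribute both LightGBM and
    # scikit-learn tree ensembles provide after fitting
    feature_importances_ = np.array([0.1, 0.5, 0.4])

X = pd.DataFrame(np.zeros((2, 3)), columns=["a", "b", "c"])
fis = get_fi_sketch(FakeModel(), X, n=2)
print(fis.to_dict("list"))  # {'features': ['b', 'c'], 'importances': [0.5, 0.4]}
```

The resulting frame has the `features` / `importances` columns the example plots with `fis.plot.bar(x='features', y='importances')`.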
def get_pfi(model: Union[lightgbm.sklearn.LGBMClassifier, lightgbm.sklearn.LGBMRegressor, sklearn.ensemble._forest.RandomForestClassifier, sklearn.ensemble._forest.RandomForestRegressor, GB], x_val: Union[numpy.ndarray, pandas.core.frame.DataFrame], y_val: numpy.ndarray, feature_names: list = None, n: int = 10, repeats: int = 30) ‑> pandas.core.frame.DataFrame
Obtains the permutation feature importances from a LightGBM or random forest model.
Examples:
>>> from connectome.visualization.fi_rf_gb import get_pfi
>>> from sklearn.datasets import make_classification
>>> import pandas as pd
>>> from lightgbm import LGBMClassifier
>>>
>>> # initialize the model
>>> lgb_class = LGBMClassifier()
>>>
>>> # create synthetic data
>>> X, y = make_classification(n_informative=10)
>>> X = pd.DataFrame(X, columns=["feature_" + str(i) for i in range(X.shape[1])])
>>>
>>> # fit the model
>>> lgb_class.fit(X, y)
>>>
>>> # obtain PFIs
>>> pfis = get_pfi(lgb_class, X, y, repeats=2)
>>> pfis.plot.bar(x='features', y='importances_mean')
Args
model: LightGBM or random forest model whose PFIs should be obtained
x_val: DataFrame or array containing the features of held-out data
y_val: target variable for the held-out data
feature_names: list of feature names, for when x_val is not a pd.DataFrame
n: the first n PFIs to obtain, in descending order
repeats: how often the PFI calculation should be repeated
Returns
a pandas DataFrame containing the PFIs and feature names
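Permutation feature importance is model-agnostic: each feature column of the held-out data is shuffled `repeats` times and the resulting drop in score is recorded. A plausible sketch of `get_pfi` on top of scikit-learn's `permutation_importance` is shown below; the function name `get_pfi_sketch` and the choice of a `DecisionTreeClassifier` for the demo are assumptions, not the module's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

def get_pfi_sketch(model, x_val, y_val, feature_names=None, n=10, repeats=30):
    # Hypothetical sketch: shuffle each feature `repeats` times on held-out
    # data and record the mean/std drop in the model's score.
    names = list(x_val.columns) if isinstance(x_val, pd.DataFrame) else list(feature_names)
    r = permutation_importance(model, x_val, y_val, n_repeats=repeats, random_state=0)
    order = np.argsort(r.importances_mean)[::-1][:n]
    return pd.DataFrame({
        "features": [names[i] for i in order],
        "importances_mean": r.importances_mean[order],
        "importances_std": r.importances_std[order],
    })

# demo: only column "a" determines the target, so it should rank first
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 4)), columns=list("abcd"))
y = (X["a"] > 0).astype(int)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pfis = get_pfi_sketch(clf, X, y, n=4, repeats=5)
print(pfis["features"].iloc[0])
```

Unlike the built-in importances returned by `get_fi`, these values are computed on held-out data, which is why `get_pfi` takes `x_val` and `y_val` rather than a single DataFrame.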