Module connectome.preprocessing.data_loader
Several preprocessing and data transformation helpers.
Functions
def create_target(dataset: pandas.core.frame.DataFrame) ‑> None-
This function creates the target variable based on the prmdiag column
Args
dataset- The dataset for which the target variable should be created
Returns
None
def drop_cases(dataset: pandas.core.frame.DataFrame) ‑> pandas.core.frame.DataFrame-
Drops the observations whose prmdiag value is 1 or 4
Args
dataset- The dataset from which the observations should be dropped
Returns
pandas.DataFrame - the dataset without the dropped observations, per the annotated return type
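The row filtering described for drop_cases can be sketched as follows. This is a minimal illustration, not the module's implementation: the name drop_cases_sketch and the values parameter are hypothetical, and only the prmdiag column name and the values 1 and 4 come from the description above.

```python
import pandas as pd


def drop_cases_sketch(dataset: pd.DataFrame, values=(1, 4)) -> pd.DataFrame:
    """Hypothetical stand-in for drop_cases: remove rows whose prmdiag is in `values`."""
    # Boolean mask that is True for the rows to keep
    keep = ~dataset["prmdiag"].isin(values)
    # Return a filtered copy; the input frame is left untouched
    return dataset.loc[keep].copy()


# Toy data; the column "x" is made up for illustration
df = pd.DataFrame({"prmdiag": [0, 1, 2, 4, 3], "x": [10, 11, 12, 13, 14]})
result = drop_cases_sketch(df)
print(result)
```

Whether the real function filters in place or returns a copy is not stated in the docstring; the sketch returns a copy to match the annotated return type.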
def drop_cols(dataset: pandas.core.frame.DataFrame, cols: tuple = ('ConnID', 'Repseudonym', 'siteid', 'visdat', 'IDs', 'prmdiag')) ‑> pandas.core.frame.DataFrame-
Drops the columns which are not needed for further modelling
Args
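dataset- The dataset from which the columns should be dropped
cols- Names of the columns to drop
Returns
pandas.DataFrame - the dataset without the dropped columns, per the annotated return type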
Raises
KeyError- …
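A possible shape for drop_cols, assuming it delegates to pandas.DataFrame.drop, which raises KeyError by default when a requested column is missing. The names drop_cols_sketch and feature_a are hypothetical; the default column tuple is copied from the signature above.

```python
import pandas as pd

DEFAULT_COLS = ("ConnID", "Repseudonym", "siteid", "visdat", "IDs", "prmdiag")


def drop_cols_sketch(dataset: pd.DataFrame, cols: tuple = DEFAULT_COLS) -> pd.DataFrame:
    """Hypothetical stand-in for drop_cols: return a copy without the given columns."""
    # DataFrame.drop uses errors="raise" by default, so a missing column raises KeyError
    return dataset.drop(columns=list(cols))


# Toy frame holding every default column plus one made-up feature column
data = {name: [1, 2] for name in DEFAULT_COLS}
data["feature_a"] = [0.1, 0.2]
df = pd.DataFrame(data)

features = drop_cols_sketch(df)
print(list(features.columns))

# Dropping the same columns again fails because they no longer exist
try:
    drop_cols_sketch(features)
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```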
def flat_to_mat(x: numpy.ndarray) ‑> numpy.ndarray-
Converts a flat np.array into a symmetric matrix. The array holds only the off-diagonal values, so the diagonal is excluded.
Examples:
>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat
>>> k = 50
>>> m = int((k*k)/2 - k/2)
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat(x)
>>> print(mat)
Args
x- 1D array which should be turned into symmetric matrix
Returns
np.ndarray - matrix
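One way such a conversion could work is sketched below. It rests on assumptions that the docstring does not confirm: the flat values are taken to fill the upper triangle in row-major order, and the diagonal is filled with zeros. The name flat_to_mat_sketch is hypothetical. A vector of length m = k(k-1)/2 maps to a k x k matrix.

```python
import math

import numpy as np


def flat_to_mat_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for flat_to_mat (row-major upper triangle, zero diagonal)."""
    m = x.shape[0]
    # Solve m = k * (k - 1) / 2 for the matrix size k
    k = (1 + math.isqrt(1 + 8 * m)) // 2
    if k * (k - 1) // 2 != m:
        raise ValueError(f"length {m} is not of the form k*(k-1)/2")
    mat = np.zeros((k, k), dtype=x.dtype)
    # Indices of the strict upper triangle (offset 1 skips the diagonal)
    rows, cols = np.triu_indices(k, k=1)
    mat[rows, cols] = x
    mat[cols, rows] = x  # mirror into the lower triangle
    return mat


x = np.arange(1.0, 7.0)  # 6 values give a 4 x 4 matrix
mat = flat_to_mat_sketch(x)
print(mat)
```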
def flat_to_mat_aggregation(x: numpy.ndarray) ‑> numpy.ndarray-
Converts a flat np.array into a symmetric matrix. The input length used in the example, k(k+1)/2, covers the diagonal as well.
Examples:
>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat_aggregation
>>> k = 8
>>> m = int((k*k)/2 + k/2)
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat_aggregation(x)
>>> print(mat)
Args
x- 1D array which should be turned into symmetric matrix
Returns
np.ndarray - matrix
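The variant with the diagonal included might look like the sketch below. As before, the row-major ordering of the upper triangle is an assumption, and flat_to_mat_aggregation_sketch is a hypothetical name. Here a vector of length m = k(k+1)/2 maps to a k x k matrix.

```python
import math

import numpy as np


def flat_to_mat_aggregation_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for flat_to_mat_aggregation (upper triangle incl. diagonal)."""
    m = x.shape[0]
    # Solve m = k * (k + 1) / 2 for the matrix size k
    k = (math.isqrt(1 + 8 * m) - 1) // 2
    if k * (k + 1) // 2 != m:
        raise ValueError(f"length {m} is not of the form k*(k+1)/2")
    mat = np.zeros((k, k), dtype=x.dtype)
    # Indices of the upper triangle, diagonal included
    rows, cols = np.triu_indices(k)
    mat[rows, cols] = x
    mat[cols, rows] = x  # mirror; diagonal entries are rewritten with the same values
    return mat


x = np.arange(1.0, 7.0)  # 6 values give a 3 x 3 matrix
mat = flat_to_mat_aggregation_sketch(x)
print(mat)
```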
def preprocess_data(dataset: pandas.core.frame.DataFrame) ‑> tuple-
Combines several preprocessing steps to be performed on the given dataset. The result is returned split into target and features.
Args
dataset- The dataset on which the preprocessing should be performed
Returns
tuple of (target, features)
def split_target_data(dataset: pandas.core.frame.DataFrame) ‑> tuple-
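Splits the given dataset into the target variable and the features
Args
dataset- The dataset to split
Returns
tuple - the target variable and the features, per the annotated return type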
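A minimal sketch of such a split, under stated assumptions: the target is taken to live in a column named "target" and the tuple order follows the (target, features) order documented for preprocess_data. Neither the column name nor the name split_target_data_sketch comes from the module.

```python
import pandas as pd


def split_target_data_sketch(dataset: pd.DataFrame, target_col: str = "target") -> tuple:
    """Hypothetical stand-in for split_target_data; `target_col` is an assumed name."""
    # The target is a single column, returned as a Series
    target = dataset[target_col]
    # The features are all remaining columns
    features = dataset.drop(columns=[target_col])
    return target, features


# Toy data with made-up feature columns
df = pd.DataFrame({"target": [0, 1, 1], "f1": [0.5, 0.1, 0.9], "f2": [3, 4, 5]})
target, features = split_target_data_sketch(df)
print(target.tolist())
print(list(features.columns))
```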