Module `connectome.preprocessing.data_loader`

Preprocessing and data transformation helpers.

Functions

def create_target(dataset: pandas.core.frame.DataFrame) ‑> None

Creates the target variable based on the prmdiag column.

Args

dataset: The dataset for which the target variable should be created

Returns

None

def drop_cases(dataset: pandas.core.frame.DataFrame) ‑> pandas.core.frame.DataFrame

Drops the observations whose prmdiag value is 1 or 4.

Args

dataset: dataset from which the observations should be dropped

Returns

pd.DataFrame - the dataset without the dropped observations
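The filtering rule described above can be illustrated with plain pandas. This is a minimal sketch, not the module's implementation: `drop_cases_sketch` and the toy frame are hypothetical names introduced here, and only the prmdiag column and the values 1 and 4 come from the description.

```python
import pandas as pd


def drop_cases_sketch(dataset: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: keep rows whose prmdiag is neither 1 nor 4."""
    keep = ~dataset["prmdiag"].isin([1, 4])
    return dataset.loc[keep]


# Toy data; the "feature" column is made up for illustration.
toy = pd.DataFrame(
    {"prmdiag": [0, 1, 2, 4, 3], "feature": [0.1, 0.2, 0.3, 0.4, 0.5]}
)
filtered = drop_cases_sketch(toy)
print(filtered)
```

Whether the original resets the index or modifies the frame in place is not stated, so the sketch simply returns a filtered copy.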

def drop_cols(dataset: pandas.core.frame.DataFrame, cols: tuple = ('ConnID', 'Repseudonym', 'siteid', 'visdat', 'IDs', 'prmdiag')) ‑> pandas.core.frame.DataFrame

Drops the columns that are not needed for further modelling.

Args

dataset: dataset from which the columns should be dropped

cols: names of the columns to drop

Returns

pd.DataFrame - the dataset without the dropped columns

Raises

KeyError: …
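As an illustration of the column drop, the sketch below uses `DataFrame.drop`, which raises a `KeyError` by default when a requested label is missing. That the original raises its `KeyError` for this reason is an assumption; `drop_cols_sketch` and the `edge_1` column are hypothetical, while the default column names are taken from the signature above.

```python
import pandas as pd

DEFAULT_COLS = ("ConnID", "Repseudonym", "siteid", "visdat", "IDs", "prmdiag")


def drop_cols_sketch(dataset: pd.DataFrame, cols: tuple = DEFAULT_COLS) -> pd.DataFrame:
    """Hypothetical stand-in: drop the given columns and return the rest."""
    # errors="raise" is the pandas default, so a missing label raises KeyError.
    return dataset.drop(columns=list(cols))


# Toy frame holding every default column plus one made-up feature column.
toy = pd.DataFrame({**{name: [1, 2] for name in DEFAULT_COLS}, "edge_1": [0.3, 0.7]})
reduced = drop_cols_sketch(toy)
print(list(reduced.columns))

# Dropping again fails because the default columns are already gone.
try:
    drop_cols_sketch(reduced)
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```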

def flat_to_mat(x: numpy.ndarray) ‑> numpy.ndarray

Converts a flat 1D np.array into a symmetric matrix. The array values fill the off-diagonal entries; the diagonal is excluded.

Examples:

>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat
>>> k = 50  # number of rows/columns of the resulting matrix
>>> m = k * (k - 1) // 2  # number of off-diagonal values in one triangle
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat(x)
>>> print(mat)

Args

x: 1D array which should be turned into symmetric matrix

Returns

np.ndarray - symmetric 2D matrix
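The conversion can be sketched with `np.triu_indices`. This is one possible reading of the description, not the module's code: `flat_to_mat_sketch` is a hypothetical name, and both the row-major fill order of the upper triangle and the zero diagonal are assumptions. The matrix size k is recovered by solving m = k(k-1)/2 for k.

```python
import numpy as np


def flat_to_mat_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: symmetric matrix from off-diagonal values."""
    m = x.shape[0]
    # Solve m = k * (k - 1) / 2 for the matrix size k.
    k = int(round((1 + np.sqrt(1 + 8 * m)) / 2))
    if k * (k - 1) // 2 != m:
        raise ValueError(f"length {m} does not match any k * (k - 1) / 2")
    upper = np.zeros((k, k), dtype=x.dtype)
    rows, cols = np.triu_indices(k, 1)  # strictly upper triangle
    upper[rows, cols] = x
    # Mirror the upper triangle; the diagonal stays zero (assumption).
    return upper + upper.T


x = np.arange(1, 7, dtype=float)  # 6 values -> 4 x 4 matrix
mat = flat_to_mat_sketch(x)
print(mat)
```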

def flat_to_mat_aggregation(x: numpy.ndarray) ‑> numpy.ndarray

Converts a flat 1D np.array into a symmetric matrix. In the example below the input has k(k+1)/2 values, which corresponds to one triangle of a k x k matrix with the diagonal included.

Examples:

>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat_aggregation
>>> k = 8  # number of rows/columns of the resulting matrix
>>> m = k * (k + 1) // 2  # number of values in one triangle, diagonal included
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat_aggregation(x)
>>> print(mat)

Args

x: 1D array which should be turned into symmetric matrix

Returns

np.ndarray - symmetric 2D matrix
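A variant that also fills the diagonal can be sketched the same way. Again this is an assumption-laden illustration rather than the module's implementation: `flat_to_mat_aggregation_sketch` is a hypothetical name and the row-major fill order of the upper triangle is assumed. Here k is recovered from m = k(k+1)/2.

```python
import numpy as np


def flat_to_mat_aggregation_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: symmetric matrix from triangle values incl. diagonal."""
    m = x.shape[0]
    # Solve m = k * (k + 1) / 2 for the matrix size k.
    k = int(round((-1 + np.sqrt(1 + 8 * m)) / 2))
    if k * (k + 1) // 2 != m:
        raise ValueError(f"length {m} does not match any k * (k + 1) / 2")
    upper = np.zeros((k, k), dtype=x.dtype)
    rows, cols = np.triu_indices(k)  # upper triangle including the diagonal
    upper[rows, cols] = x
    # Mirror the upper triangle without counting the diagonal twice.
    return upper + upper.T - np.diag(np.diag(upper))


x_agg = np.arange(1, 7, dtype=float)  # 6 values -> 3 x 3 matrix
mat_agg = flat_to_mat_aggregation_sketch(x_agg)
print(mat_agg)
```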

def preprocess_data(dataset: pandas.core.frame.DataFrame) ‑> tuple

Combines several preprocessing steps that are performed on the given dataset. The result is returned split into target and features.

Args

dataset: The dataset on which the preprocessing should be performed

Returns

tuple of (target, features)

def split_target_data(dataset: pandas.core.frame.DataFrame) ‑> tuple

Splits the given dataset into target variable and features.

Args

dataset: the dataset to split into target variable and features

Returns

tuple containing the target variable and the features
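A split of this kind might look like the sketch below. The name of the target column is not given in this reference, so `target_col="target"` is purely a placeholder, as are `split_target_data_sketch` and the toy columns; the (target, features) order follows what `preprocess_data` documents.

```python
import pandas as pd


def split_target_data_sketch(dataset: pd.DataFrame, target_col: str = "target") -> tuple:
    """Hypothetical stand-in: separate one target column from the remaining features."""
    y = dataset[target_col]
    X = dataset.drop(columns=[target_col])
    return y, X


# Toy data with made-up feature columns.
toy = pd.DataFrame({"target": [0, 1], "edge_1": [0.3, 0.7], "edge_2": [1.2, 0.4]})
y, X = split_target_data_sketch(toy)
print(y.tolist())
print(list(X.columns))
```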