Module connectome.preprocessing.data_loader

Several preprocessing and data transformation helpers.

Functions

def create_target(dataset: pandas.core.frame.DataFrame) ‑> None

This function creates the target variable based on the prmdiag column

Args

dataset
The dataset for which the target variable should be created

Returns

None

def drop_cases(dataset: pandas.core.frame.DataFrame) ‑> pandas.core.frame.DataFrame

Drops the observations whose prmdiag value is 1 or 4

Args

dataset
Dataset from which the observations should be dropped

Returns

pandas.core.frame.DataFrame - the dataset without the dropped observations
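
A minimal sketch of the row filter described above, assuming it amounts to a boolean mask on the prmdiag column. The name drop_cases_sketch and the toy frame are hypothetical and only illustrate the idea; the actual implementation in data_loader may differ.

```python
import pandas as pd


def drop_cases_sketch(dataset: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for drop_cases: remove rows with prmdiag 1 or 4."""
    # True for every row that should be kept
    keep = ~dataset["prmdiag"].isin([1, 4])
    return dataset.loc[keep]


# Toy data, invented for illustration only
toy = pd.DataFrame({"prmdiag": [0, 1, 2, 4, 3], "feat": [0.1, 0.2, 0.3, 0.4, 0.5]})
filtered = drop_cases_sketch(toy)
print(filtered)
```

The sketch returns a new frame and leaves the input untouched; whether drop_cases itself works in place is not stated in this documentation.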

def drop_cols(dataset: pandas.core.frame.DataFrame, cols: tuple = ('ConnID', 'Repseudonym', 'siteid', 'visdat', 'IDs', 'prmdiag')) ‑> pandas.core.frame.DataFrame

Drops the columns which are not needed for further modelling

Args

dataset
Dataset from which the columns should be dropped

cols
Names of the columns to drop; defaults to the tuple shown in the signature

Returns

pandas.core.frame.DataFrame - the dataset without the dropped columns

Raises

KeyError
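
A minimal sketch of how such a column drop and the KeyError could arise, assuming the function delegates to pandas' DataFrame.drop. The names drop_cols_sketch and DEFAULT_COLS and the toy frame are hypothetical; the actual implementation may differ.

```python
import pandas as pd

# Default tuple copied from the documented signature
DEFAULT_COLS = ("ConnID", "Repseudonym", "siteid", "visdat", "IDs", "prmdiag")


def drop_cols_sketch(dataset: pd.DataFrame, cols: tuple = DEFAULT_COLS) -> pd.DataFrame:
    """Hypothetical stand-in for drop_cols."""
    # DataFrame.drop raises KeyError if any requested column is missing
    return dataset.drop(columns=list(cols))


# Toy data, invented for illustration only
data = {c: [1, 2] for c in DEFAULT_COLS}
data["feat"] = [0.5, 0.7]
toy = pd.DataFrame(data)

reduced = drop_cols_sketch(toy)
print(list(reduced.columns))

# Dropping the same columns a second time fails, since they are gone
try:
    drop_cols_sketch(reduced)
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```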

def flat_to_mat(x: numpy.ndarray) ‑> numpy.ndarray

Converts a flat np.array into a symmetric matrix: the values of the array become the off-diagonal entries, and the diagonal is excluded.

Examples:

>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat
>>> k = 50  # number of rows and columns of the resulting matrix
>>> m = k * (k - 1) // 2  # number of entries above the diagonal
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat(x)
>>> print(mat)

Args

x
1D array which should be turned into symmetric matrix

Returns

np.ndarray - matrix
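
A minimal sketch of one way to build such a matrix with NumPy. It assumes the flat values fill the upper triangle in row-major order and that the diagonal is left at zero; neither detail is confirmed by this documentation, and flat_to_mat_sketch is a hypothetical name.

```python
import math

import numpy as np


def flat_to_mat_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for flat_to_mat (diagonal excluded)."""
    m = x.shape[0]
    # Solve m = k * (k - 1) / 2 for the matrix size k
    k = (1 + math.isqrt(1 + 8 * m)) // 2
    if k * (k - 1) // 2 != m:
        raise ValueError("length of x is not of the form k * (k - 1) / 2")
    mat = np.zeros((k, k), dtype=x.dtype)
    # Indices strictly above the diagonal (assumed ordering: row-major)
    rows, cols = np.triu_indices(k, k=1)
    mat[rows, cols] = x
    mat[cols, rows] = x  # mirror to make the matrix symmetric
    return mat


k = 50
x = np.random.standard_normal(size=k * (k - 1) // 2)
mat = flat_to_mat_sketch(x)
print(mat.shape)
```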

def flat_to_mat_aggregation(x: numpy.ndarray) ‑> numpy.ndarray
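
Converts a flat np.array into a symmetric matrix by turning the values of the array into its entries. In the example, the array length k*(k+1)/2 matches the number of entries on and above the diagonal of a k x k matrix, so the diagonal appears to be included.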

Examples:

>>> import numpy as np
>>> from connectome.preprocessing.data_loader import flat_to_mat_aggregation
>>> k = 8  # number of rows and columns of the resulting matrix
>>> m = k * (k + 1) // 2  # number of entries on and above the diagonal
>>> x = np.random.standard_normal(size=m)
>>> mat = flat_to_mat_aggregation(x)
>>> print(mat)

Args

x
1D array which should be turned into symmetric matrix

Returns

np.ndarray - matrix
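
A minimal sketch of the variant that includes the diagonal, under the same assumptions as before: the flat values are taken to fill the upper triangle (diagonal included) in row-major order. The ordering and the name flat_to_mat_aggregation_sketch are hypothetical and not confirmed by this documentation.

```python
import math

import numpy as np


def flat_to_mat_aggregation_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for flat_to_mat_aggregation (diagonal included)."""
    m = x.shape[0]
    # Solve m = k * (k + 1) / 2 for the matrix size k
    k = (math.isqrt(1 + 8 * m) - 1) // 2
    if k * (k + 1) // 2 != m:
        raise ValueError("length of x is not of the form k * (k + 1) / 2")
    mat = np.zeros((k, k), dtype=x.dtype)
    # Indices on and above the diagonal (assumed ordering: row-major)
    rows, cols = np.triu_indices(k)
    mat[rows, cols] = x
    mat[cols, rows] = x  # mirror; diagonal entries are rewritten with the same values
    return mat


k = 8
x = np.random.standard_normal(size=k * (k + 1) // 2)
mat = flat_to_mat_aggregation_sketch(x)
print(mat.shape)
```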

def preprocess_data(dataset: pandas.core.frame.DataFrame) ‑> tuple

Combines several preprocessing steps which are to be performed on the given dataset. The results are then split and returned as target and features.

Args

dataset
The dataset on which the preprocessing should be performed

Returns

tuple of (target, features)

def split_target_data(dataset: pandas.core.frame.DataFrame) ‑> tuple

Splits the given dataset into target variable and features

Args

dataset
The dataset which should be split

Returns

tuple - the target variable and the features
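
A minimal sketch of such a split, assuming the target lives in a single column. The column name "target", the (target, features) ordering, and the name split_target_sketch are assumptions made for illustration; the documentation does not state them for this function.

```python
import pandas as pd


def split_target_sketch(dataset: pd.DataFrame, target_col: str = "target") -> tuple:
    """Hypothetical stand-in for split_target_data."""
    target = dataset[target_col]
    # Everything except the target column is treated as a feature
    features = dataset.drop(columns=[target_col])
    return target, features


# Toy data, invented for illustration only
toy = pd.DataFrame({"target": [0, 1, 1], "f1": [0.2, 0.4, 0.6], "f2": [1.0, 2.0, 3.0]})
y, X = split_target_sketch(toy)
print(y.tolist())
print(list(X.columns))
```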