Module connectome.preprocessing.preprocessing_matlab_files

This module is a framework for preparing raw connectivity matrices for analysis.

Important: In your Excel sheet with subject information, name the ID column "ConnID". This column is used to merge the sheet with the MATLAB files.

Functions

def col_names_conn_matrix(n: int, preprocessing_type: str = 'conn')

creates the column names for the flattened connectivity matrix

Args

n
number of columns in connectivity matrix
preprocessing_type
"conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices

Returns

Column names for connectivity matrix
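
The naming scheme itself is not documented here. As a minimal sketch, assuming the matrix is flattened along its upper triangle and each column is labelled with a 1-based "i_j" region pair (both assumptions, as is the helper name sketch_col_names), the names could be generated like this:

```python
import numpy as np


def sketch_col_names(n: int) -> list:
    """Hypothetical naming scheme: one "i_j" label per entry above the diagonal."""
    rows, cols = np.triu_indices(n, k=1)  # index pairs with row < col
    return [f"{i + 1}_{j + 1}" for i, j in zip(rows, cols)]


names = sketch_col_names(4)
print(names)
```

For an n x n matrix this yields n * (n - 1) / 2 names, which matches the number of entries above the diagonal.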

def col_names_final_df(data_from_excel: pandas.core.frame.DataFrame, shape: int = 246, preprocessing_type: str = 'conn') ‑> list

creates the column names for the final data frame based on the shape (number of columns) of the connectivity matrix used

Args

data_from_excel
A pd.DataFrame
shape
number of columns in connectivity matrix
preprocessing_type
"conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices

Returns

A list of column names for the final dataset

def create_final_df(file_names: list, final_columns: list, stacked_matrices: numpy.ndarray, data_from_excel: pandas.core.frame.DataFrame) ‑> pandas.core.frame.DataFrame

merges the connectivity matrices, the Excel data, and the subject IDs

Args

file_names
list of matlab file names
final_columns
list of final column names
stacked_matrices
a stacked connectivity matrix
data_from_excel
a pd.DataFrame with extra information on patients

Returns

A merged DataFrame of connectivity matrices and patient information
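
The merge can be pictured with a small pandas sketch. It assumes the join key is the "ConnID" column mentioned in the module description and that an inner join is used; the toy values, the column names, and the join type are illustrative assumptions, not the module's confirmed behaviour.

```python
import numpy as np
import pandas as pd

# Toy inputs; all values are made up for illustration.
subject_ids = np.array([1, 2, 3])
stacked = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6],
                    [0.7, 0.8, 0.9]])
conn_cols = ["1_2", "1_3", "2_3"]  # hypothetical column names
excel = pd.DataFrame({"ConnID": [1, 2, 3], "age": [70, 65, 72]})

# One row per subject, with the subject ID as the first column.
conn_df = pd.DataFrame(stacked, columns=conn_cols)
conn_df.insert(0, "ConnID", subject_ids)

# Inner join keeps only subjects present in both sources (an assumption).
merged = excel.merge(conn_df, on="ConnID", how="inner")
print(merged.shape)
```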

def create_train_test_split(data: pandas.core.frame.DataFrame, split_size: float = 0.8, seed: int = 42) ‑> list

takes the final dataset and splits it into random train and test subsets

Args

data
dataset to be split into train/test
split_size
the size of the train dataset (default .8)
seed
pass an int for reproducibility purposes

Returns

A list containing train-test split of inputs
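
The wording of the return value resembles scikit-learn's train_test_split, but whether the module uses it is not stated here. The following dependency-light sketch, using only pandas and a hypothetical helper sketch_split, shows one way to produce a reproducible random split with the same parameters:

```python
import pandas as pd


def sketch_split(data: pd.DataFrame, split_size: float = 0.8, seed: int = 42) -> list:
    """Random row-wise split; a hypothetical stand-in, not the module's code."""
    train = data.sample(frac=split_size, random_state=seed)  # seeded random rows
    test = data.drop(train.index)  # everything not drawn into train
    return [train, test]


df = pd.DataFrame({"ConnID": range(10), "x": range(10)})
train, test = sketch_split(df)
print(len(train), len(test))
```

Passing the same seed gives the same split on every run, which is the purpose of the seed argument described above.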

def flatten_conn_matrix(matrix: numpy.ndarray, upper: bool = True, preprocessing_type: str = 'conn') ‑> numpy.ndarray

turns the connectivity matrix into a 1d array

Args

matrix
A connectivity matrix
upper
whether only the entries above the diagonal should be considered
preprocessing_type
"conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices

Returns

flattened connectivity matrix as a 1d array
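
A minimal sketch of the flattening step, assuming "above the diagonal" means the strict upper triangle (diagonal excluded). The helper sketch_flatten is hypothetical and ignores preprocessing_type.

```python
import numpy as np


def sketch_flatten(matrix: np.ndarray, upper: bool = True) -> np.ndarray:
    """Flatten a square matrix; with upper=True keep only entries above the diagonal."""
    if upper:
        return matrix[np.triu_indices_from(matrix, k=1)]
    return matrix.flatten()


m = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(sketch_flatten(m))         # the three entries above the diagonal
print(sketch_flatten(m, False))  # all nine entries
```

For a symmetric connectivity matrix the upper triangle holds every unique pairwise value, so dropping the rest avoids duplicated features.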

def get_subject_ids(file_names: list) ‑> numpy.ndarray

gets the subject IDs, provided the file names follow the expected format: resultsROI_Subject006_Condition001.mat corresponds to subject ID 6

Args

file_names
list of matlab file names

Returns

A np.ndarray of the subject IDs in a readable format
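
One way to parse the documented file name format is a regular expression that captures the digits after "Subject". This is a sketch with a hypothetical helper name; raising ValueError on a non-matching name is an assumption.

```python
import re

import numpy as np


def sketch_subject_ids(file_names: list) -> np.ndarray:
    """Extract the integer after "Subject" from each file name."""
    pattern = re.compile(r"Subject(\d+)")
    ids = []
    for name in file_names:
        match = pattern.search(name)
        if match is None:
            raise ValueError(f"no subject ID found in {name!r}")
        ids.append(int(match.group(1)))  # int() drops the leading zeros
    return np.array(ids)


print(sketch_subject_ids(["resultsROI_Subject006_Condition001.mat"]))
```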

def grouped_conn_df(data: pandas.core.frame.DataFrame, regions: list = None, cols: list = None, return_arrays: bool = True, stack_options: dict = None, **kwargs) ‑> Union[list, pandas.core.frame.DataFrame]

function to compute the grouped / aggregated conn matrices from a pd.DataFrame

Args

data
dataFrame containing the conn data
regions
list of region names of the conn matrix, in case it has been reordered. IMPORTANT: the region names AFTER aggregation are needed
cols
list of columns of the DataFrame data which contain conn data
return_arrays
whether the aggregated data should be returned in the form of arrays or a dataframe
stack_options
options passed to "stack_matrices"
**kwargs
anything that's passed to "grouped_conn_mat"

Returns

A list containing the grouped connectivity matrices, or a DataFrame
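
The aggregation itself is delegated to "grouped_conn_mat", which is not documented in this section. As a rough sketch of the idea, assuming each region carries a network label and that the "mean" statistic averages all entries of each network-by-network block, a single matrix could be aggregated as follows. The helper name and the label list are hypothetical.

```python
import numpy as np


def sketch_group_mean(matrix: np.ndarray, labels: list) -> np.ndarray:
    """Average a region-by-region matrix into a network-by-network matrix.

    `labels` assigns each region to a network; this mapping is hypothetical.
    """
    labels = np.asarray(labels)
    networks = np.unique(labels)
    out = np.zeros((len(networks), len(networks)))
    for a, net_a in enumerate(networks):
        for b, net_b in enumerate(networks):
            # Select the block of rows in net_a and columns in net_b.
            block = matrix[np.ix_(labels == net_a, labels == net_b)]
            out[a, b] = block.mean()
    return out


m = np.arange(16, dtype=float).reshape(4, 4)
grouped = sketch_group_mean(m, ["A", "A", "B", "B"])
print(grouped)
```

Note that this sketch includes the diagonal entries in the within-network blocks; the module may treat them differently.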

def load_matlab_files(directory: str, mat_key: str = 'Z') ‑> tuple

imports all matlab files from specified directory

Args

directory
Path to Matlab Files
mat_key
the key under which the connectivity data is saved in the matlab files

Returns

A tuple whose first element is the collection of connectivity matrices and whose second element is the names of the connectivity matrices

Raises

KeyError
FileNotFoundError
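
A minimal sketch of such a loader, using scipy.io.loadmat. The helper name, the sorting of file names, and the exact conditions under which the two exceptions are raised are assumptions made for illustration.

```python
import os

from scipy.io import loadmat


def sketch_load(directory: str, mat_key: str = "Z") -> tuple:
    """Load every .mat file in `directory`; a hypothetical stand-in for the module's loader."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory not found: {directory}")
    file_names = sorted(f for f in os.listdir(directory) if f.endswith(".mat"))
    matrices = []
    for name in file_names:
        contents = loadmat(os.path.join(directory, name))  # dict of variables
        if mat_key not in contents:
            raise KeyError(f"{mat_key!r} not found in {name}")
        matrices.append(contents[mat_key])
    return matrices, file_names
```

The returned file names can then be used to recover the subject IDs, so the order of the two lists must stay aligned.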

def main()

def preprocess_mat_files(matlab_dir: str = None, excel_path: str = None, export_file: bool = False, write_dir: str = None, preprocessing_type: str = 'conn', network: str = 'yeo7', upper: bool = True, statistic: str = 'mean', mat_key: str = 'Z', file_format: str = 'csv') ‑> pandas.core.frame.DataFrame

Final function which combines all the other functions to read in and transform the data.

Examples:

>>> # Preprocess Connectivity Matrices to aggregated matrices
>>> matlab_dir = r"./Data/MatLab" # Enter the directory for the matlab files
>>> excel_path = r"./Data/DELCODE_dataset_910.xlsx" # Enter the directory for the corresponding excel sheet
>>> preprocessing_type = 'aggregation'
>>> write_dir = r"./path_to_save" # ...
>>> export_file = True # set to True to export the resulting dataset
>>> statistic = 'greater_zero'
>>> preprocess_mat_files(matlab_dir=matlab_dir, excel_path=excel_path, preprocessing_type=preprocessing_type,
...                      write_dir=write_dir, export_file=export_file, statistic=statistic)

Args

matlab_dir
path to matlab files
excel_path
path to excel list
export_file
whether to write the dataset to disk; if False, it is only returned as a pd.DataFrame
write_dir
path to write the dataset to if export_file = True
preprocessing_type
"conn" for connectivity matrix, "aggregation" for aggregated conn matrix
network
yeo7 or yeo17 network (only applicable if preprocessing_type = aggregation)
statistic
summary statistic to be applied (only applicable if preprocessing_type = aggregation); one of mean, max, min and greater_zero
upper
boolean; whether only the upper diagonal elements of the connectivity matrices should be used
mat_key
the key under which the connectivity data is saved in the matlab files
file_format
str. Pass "h5" for further modelling in python or "csv" for R (default "csv")

Returns

  • DataFrame containing the processed MATLAB files and the Excel file
  • optionally saves a file (a train/test split of datasets) for further use in modelling.

def stack_matrices(matrices: list, upper: bool = True, preprocessing_type: str = 'conn') ‑> numpy.ndarray

stacks the subjects' connectivity matrices on top of each other so they can be used in a DataFrame

Args

matrices
list of connectivity matrices
upper
whether only upper diagonal values should be considered
preprocessing_type
"conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices

Returns

A flattened np.ndarray of connectivity matrices
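
The stacking can be sketched as flattening each subject's matrix and placing the results in the rows of a 2D array. This assumes one row per subject and strict upper-triangle flattening; the helper sketch_stack is hypothetical and ignores preprocessing_type.

```python
import numpy as np


def sketch_stack(matrices: list, upper: bool = True) -> np.ndarray:
    """Flatten each matrix and stack the results row-wise (one row per subject)."""
    rows = []
    for m in matrices:
        if upper:
            rows.append(m[np.triu_indices_from(m, k=1)])  # entries above the diagonal
        else:
            rows.append(m.flatten())
    return np.vstack(rows)


subjects = [np.full((4, 4), 0.1), np.full((4, 4), 0.2)]
stacked = sketch_stack(subjects)
print(stacked.shape)
```

With two 4 x 4 matrices the result has two rows and six columns, one column per region pair.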

def test_grouped_conn_df()

def write_to_dir(dataset: pandas.core.frame.DataFrame, t_direct: str = None, file_format: str = 'csv') ‑> None

writes the list of train/test splits into the specified directory, either as HDF files for later use in Python or as CSV files for later use in R

Args

dataset
the final dataset to save
t_direct
path where to save the dataframes to
file_format
the file format the data should be saved as (csv or hdf); input must be "csv" or "h5"

Returns

None - saves a csv or hdf file

Raises

FileNotFoundError
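
A minimal sketch of the format dispatch, following the signature above and saving a single DataFrame. The helper name, the output file name "preprocessed", the HDF key "data", and the ValueError for unsupported formats are all assumptions.

```python
import os

import pandas as pd


def sketch_write(dataset: pd.DataFrame, t_direct: str, file_format: str = "csv") -> None:
    """Save `dataset` as csv or h5; the file name "preprocessed" is made up."""
    if not os.path.isdir(t_direct):
        raise FileNotFoundError(f"directory not found: {t_direct}")
    if file_format == "csv":
        dataset.to_csv(os.path.join(t_direct, "preprocessed.csv"), index=False)
    elif file_format == "h5":
        # to_hdf needs the optional PyTables dependency.
        dataset.to_hdf(os.path.join(t_direct, "preprocessed.h5"), key="data", mode="w")
    else:
        raise ValueError("file_format must be 'csv' or 'h5'")
```

CSV is readable from R without extra packages, while HDF preserves dtypes for further work in Python, which fits the two use cases named above.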