Module connectome.preprocessing.preprocessing_matlab_files
This module provides a framework for preparing raw connectivity matrices for analysis.
Important: In your Excel sheet with subject information, name the ID column "ConnID". This column is used to merge the subject information with the MATLAB files.
Functions
def col_names_conn_matrix(n: int, preprocessing_type: str = 'conn')
Creates the column names for the flattened connectivity matrix.
Args
n- number of columns in connectivity matrix
preprocessing_type- "conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices
Returns
Column names for connectivity matrix
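The naming scheme itself is not documented here. As a rough sketch only, assuming one name per entry above the diagonal in the made-up form `row_col` with 1-based indices (the helper `upper_triangle_names` is hypothetical and not part of this module), the names could be generated like this:

```python
def upper_triangle_names(n: int) -> list:
    """Hypothetical helper: one 'i_j' label per entry above the diagonal."""
    # Row-major order, the same order numpy.triu_indices(n, k=1) would use.
    return [f"{i}_{j}" for i in range(1, n + 1) for j in range(i + 1, n + 1)]


names = upper_triangle_names(4)
print(names)       # ['1_2', '1_3', '1_4', '2_3', '2_4', '3_4']
print(len(names))  # 6, i.e. n * (n - 1) / 2
```

With n = 246 this yields 246 * 245 / 2 = 30135 names.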
def col_names_final_df(data_from_excel: pandas.core.frame.DataFrame, shape: int = 246, preprocessing_type: str = 'conn') ‑> list
Creates the column names for the final data frame, based on the shape (number of columns) of the connectivity matrix used.
Args
data_from_excel- a pd.DataFrame
shape- number of columns in connectivity matrix
preprocessing_type- "conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices
Returns
A list of column names for the final dataset
def create_final_df(file_names: list, final_columns: list, stacked_matrices: numpy.ndarray, data_from_excel: pandas.core.frame.DataFrame) ‑> pandas.core.frame.DataFrame
Merges the connectivity matrices, the Excel data and the subject IDs.
Args
file_names- list of matlab file names
final_columns- list of final column names
stacked_matrices- a stacked connectivity matrix
data_from_excel- a pd.DataFrame with extra information on patients
Returns
A merged DataFrame of connectivity matrices and patient information
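The merge on "ConnID" can be illustrated with toy data. This is a minimal sketch, not the module's implementation: the values, the column names `1_2`, `1_3`, `2_3`, the `age` column and the choice of an inner join are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real inputs (all values are made up for illustration).
subject_ids = np.array([6, 7])
stacked = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])
excel = pd.DataFrame({"ConnID": [7, 6], "age": [71, 68]})

# One row per subject: the ID first, then the flattened connectivity values.
conn = pd.DataFrame(stacked, columns=["1_2", "1_3", "2_3"])
conn.insert(0, "ConnID", subject_ids)

# Join on the shared ID column; an inner join is an assumption of this sketch.
merged = excel.merge(conn, on="ConnID", how="inner")
print(merged.columns.tolist())  # ['ConnID', 'age', '1_2', '1_3', '2_3']
```

Because the join is keyed on "ConnID", the row order of the Excel sheet does not need to match the order of the MATLAB files.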
def create_train_test_split(data: pandas.core.frame.DataFrame, split_size: float = 0.8, seed: int = 42) ‑> list
Takes the final dataset and splits it into random train and test subsets.
Args
data- dataset to be split into train/test
split_size- the size of the train dataset as a fraction of the data (default 0.8)
seed- pass an int for reproducibility purposes
Returns
A list containing train-test split of inputs
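How such a seeded random split behaves can be sketched with plain pandas. The helper `split_frame` is hypothetical; the actual function may use a different mechanism and may lay out the returned list differently.

```python
import pandas as pd


def split_frame(data: pd.DataFrame, split_size: float = 0.8, seed: int = 42) -> list:
    """Hypothetical stand-in: random train/test split of the rows."""
    # Draw the training rows at random; the seed makes the draw repeatable.
    train = data.sample(frac=split_size, random_state=seed)
    # Everything that was not drawn forms the test set.
    test = data.drop(train.index)
    return [train, test]


df = pd.DataFrame({"ConnID": range(10), "value": range(10)})
train, test = split_frame(df)
print(len(train), len(test))  # 8 2
```

Calling the helper again with the same seed returns the same rows, which is the reproducibility the seed argument is meant to provide.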
def flatten_conn_matrix(matrix: numpy.ndarray, upper: bool = True, preprocessing_type: str = 'conn') ‑> numpy.ndarray
Turns the connectivity matrix into a 1d array.
Args
matrix- A connectivity matrix
upper- whether only the entries above the diagonal should be considered
preprocessing_type- "conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices
Returns
Flattened connectivity matrix as a 1d array
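The effect of the upper flag can be shown with NumPy. This is a sketch under the assumption that "above the diagonal" excludes the diagonal itself; `flatten_upper` is a hypothetical stand-in and ignores preprocessing_type.

```python
import numpy as np


def flatten_upper(matrix: np.ndarray, upper: bool = True) -> np.ndarray:
    """Hypothetical stand-in: flatten a square matrix to a 1d array."""
    if upper:
        # k=1 skips the diagonal, so only entries strictly above it are kept.
        rows, cols = np.triu_indices(matrix.shape[0], k=1)
        return matrix[rows, cols]
    # Otherwise keep every entry, in row-major order.
    return matrix.flatten()


m = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(flatten_upper(m))               # [0.2 0.3 0.5]
print(flatten_upper(m, False).shape)  # (9,)
```

For a symmetric matrix the upper triangle already holds every distinct off-diagonal value, so dropping the rest loses no information.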
def get_subject_ids(file_names: list) ‑> numpy.ndarray
Extracts the subject IDs from the file names, provided they follow the expected format. For example, resultsROI_Subject006_Condition001.mat corresponds to subject ID 6.
Args
file_names- list of matlab file names
Returns
A np.ndarray of subject IDs in a readable format
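The file name convention can be parsed with a regular expression. This is a sketch only: the pattern is inferred from the single example file name, the helper `parse_subject_ids` is hypothetical, and it returns a plain list rather than a np.ndarray.

```python
import re

# Assumed pattern, derived only from the example file name above.
SUBJECT_PATTERN = re.compile(r"Subject(\d+)")


def parse_subject_ids(file_names: list) -> list:
    """Hypothetical helper: pull the integer after 'Subject' from each name."""
    ids = []
    for name in file_names:
        match = SUBJECT_PATTERN.search(name)
        if match is None:
            raise ValueError(f"no subject ID found in {name!r}")
        ids.append(int(match.group(1)))  # int() drops the leading zeros
    return ids


print(parse_subject_ids(["resultsROI_Subject006_Condition001.mat"]))  # [6]
```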
def grouped_conn_df(data: pandas.core.frame.DataFrame, regions: list = None, cols: list = None, return_arrays: bool = True, stack_options: dict = None, **kwargs) ‑> Union[list, pandas.core.frame.DataFrame]
Computes the grouped / aggregated conn matrices from a pd.DataFrame.
Args
data- DataFrame containing the conn data
regions- list of names of the regions of the conn matrix, in case they were reordered. IMPORTANT: the region names AFTER aggregation are needed
cols- list of columns of the DataFrame data which contain conn data
return_arrays- whether the aggregated data should be returned in the form of arrays or a dataframe
stack_options- options passed to "stack_matrices"
**kwargs- anything that is passed to "grouped_conn_mat"
Returns
List containing the grouped connectivity matrices, or a DataFrame
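What aggregation means for a single matrix can be sketched as block-wise averaging. This is an illustration under assumptions, not the behaviour of "grouped_conn_mat": the helper `aggregate_blocks`, the group labels and the decision to include diagonal entries in the within-group mean are all hypothetical.

```python
import numpy as np


def aggregate_blocks(matrix: np.ndarray, labels: list) -> np.ndarray:
    """Hypothetical helper: mean connectivity for every pair of region groups."""
    groups = sorted(set(labels))
    labels = np.asarray(labels)
    out = np.zeros((len(groups), len(groups)))
    for a, group_a in enumerate(groups):
        for b, group_b in enumerate(groups):
            # Select the block of rows in group_a and columns in group_b.
            # Note: within-group blocks include the diagonal in this sketch.
            block = matrix[np.ix_(labels == group_a, labels == group_b)]
            out[a, b] = block.mean()
    return out


m = np.arange(16, dtype=float).reshape(4, 4)
grouped = aggregate_blocks(m, ["A", "A", "B", "B"])
print(grouped.shape)  # (2, 2)
```

A 4x4 region-level matrix with two groups collapses to a 2x2 group-level matrix; other summary statistics would replace the call to `mean`.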
def load_matlab_files(directory: str, mat_key: str = 'Z') ‑> tuple
Imports all MATLAB files from the specified directory.
Args
directory- Path to Matlab Files
mat_key- the key under which the connectivity data is saved in the matlab files
Returns
A tuple whose first element is the collection of connectivity matrices and whose second element is the list of their file names
Raises
KeyError
FileNotFoundError
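The role of mat_key and the two exceptions can be sketched with `scipy.io.loadmat`, which returns a dict keyed by MATLAB variable name. The helper `load_mat_dir` is a hypothetical stand-in; when exactly the real function raises each exception is an assumption.

```python
from pathlib import Path

from scipy.io import loadmat


def load_mat_dir(directory: str, mat_key: str = "Z") -> tuple:
    """Hypothetical stand-in: read one array per .mat file in a directory."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"directory not found: {directory}")
    matrices, names = [], []
    # Sorting keeps the matrices and file names in a stable, matching order.
    for path in sorted(folder.glob("*.mat")):
        content = loadmat(str(path))
        if mat_key not in content:
            raise KeyError(f"{path.name} has no variable named {mat_key!r}")
        matrices.append(content[mat_key])
        names.append(path.name)
    return matrices, names
```

If the connectivity data is stored under a different variable name, pass that name as mat_key instead of the default "Z".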
def main()
def preprocess_mat_files(matlab_dir: str = None, excel_path: str = None, export_file: bool = False, write_dir: str = None, preprocessing_type: str = 'conn', network: str = 'yeo7', upper: bool = True, statistic: str = 'mean', mat_key: str = 'Z', file_format: str = 'csv') ‑> pandas.core.frame.DataFrame
Final function which combines all the other functions to read in and transform the data.
Examples:
>>> # Preprocess Connectivity Matrices to aggregated matrices
>>> matlab_dir = r"./Data/MatLab" # Enter the directory for the matlab files
>>> excel_path = r"./Data/DELCODE_dataset_910.xlsx" # Enter the directory for the corresponding excel sheet
>>> preprocessing_type = 'aggregation'
>>> write_dir = r"./path_to_save" # ...
>>> export_file = True # rename to export file
>>> statistic = 'greater_zero'
>>> preprocess_mat_files(matlab_dir=matlab_dir, excel_path=excel_path, preprocessing_type=preprocessing_type,
...                      write_dir=write_dir, export_file=export_file, statistic=statistic)
Args
matlab_dir- path to matlab files
excel_path- path to excel list
export_file- if False, the result is only returned as a pd.DataFrame; if True, it is also written to write_dir
write_dir- path where the dataset is written if export_file = True
preprocessing_type- "conn" for connectivity matrix, "aggregation" for aggregated conn matrix
network- yeo7 or yeo17 network (only applicable if preprocessing_type = aggregation)
statistic- summary statistic to be applied (only applicable if preprocessing_type = aggregation); one of mean, max, min and greater_zero
upper- boolean, whether only the upper diagonal elements of the connectivity matrices should be used
mat_key- the key under which the connectivity data is saved in the matlab files
file_format- str. Pass "h5" for further modelling in python or "csv" for R (default "csv")
Returns
- DataFrame containing the processed matlab files + excel file
- optionally saves a file (a train/test split of datasets) for further use in modelling.
def stack_matrices(matrices: list, upper: bool = True, preprocessing_type: str = 'conn') ‑> numpy.ndarray
Stacks the connectivity matrices of the subjects on top of each other so they can be used in a DataFrame.
Args
matrices- list of connectivity matrices
upper- whether only upper diagonal values should be considered
preprocessing_type- "conn" for connectivity matrix, "aggregation" for aggregated conn matrix, "graph" for graph matrices
Returns
A flattened np.ndarray of connectivity matrices
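The stacking step can be pictured as flattening each subject's matrix and placing the results row by row. This is a sketch with toy values, assuming upper = True means the entries strictly above the diagonal; it does not cover the "aggregation" or "graph" types.

```python
import numpy as np

# Two toy 3x3 connectivity matrices, one per subject (values are made up).
matrices = [np.full((3, 3), 0.1), np.full((3, 3), 0.2)]

# Indices of the entries strictly above the diagonal.
rows, cols = np.triu_indices(3, k=1)

# Flatten each matrix and stack the results row-wise: one row per subject.
stacked = np.vstack([m[rows, cols] for m in matrices])
print(stacked.shape)  # (2, 3)
```

Each row of the result then lines up with one subject ID, which is the layout a DataFrame needs.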
def test_grouped_conn_df()
def write_to_dir(dataset: pandas.core.frame.DataFrame, t_direct: str = None, file_format: str = 'csv') ‑> None
Writes the final dataset into the specified directory, as an hdf file for future use in Python or as a csv file for future use in R.
Args
dataset- the final dataset to save
t_direct- path where to save the dataframes to
file_format- the file format the data should be saved as (csv or hdf); the input must be "csv" or "h5"
Returns
None - saves a csv or hdf file
Raises
FileNotFoundError
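The choice between the two formats can be sketched with pandas. This is a minimal illustration, not the module's code: the helper `save_frame`, the output file names `dataset.csv` and `dataset.h5`, the HDF key and the ValueError for unknown formats are assumptions.

```python
from pathlib import Path

import pandas as pd


def save_frame(dataset: pd.DataFrame, t_direct: str, file_format: str = "csv") -> None:
    """Hypothetical stand-in: save a DataFrame as csv or h5."""
    target = Path(t_direct)
    if not target.is_dir():
        raise FileNotFoundError(f"directory not found: {t_direct}")
    if file_format == "csv":
        dataset.to_csv(str(target / "dataset.csv"), index=False)
    elif file_format == "h5":
        # to_hdf needs the optional PyTables dependency to be installed.
        dataset.to_hdf(str(target / "dataset.h5"), key="dataset")
    else:
        raise ValueError("file_format must be 'csv' or 'h5'")
```

Checking the directory first gives a clear error before any file is written.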