# Network medicine strategies for disease module characterization (Estrategias de la medicina de redes para la caracterización de módulos de enfermedad)
This repository contains the data and code generated for the TFG (*Trabajo Fin de Grado*, bachelor's thesis) '**Estrategias de la medicina de redes para la caracterización de módulos de enfermedad (Network medicine strategies for disease module characterization)**' by Antonio Gil Hoed, supervised by Lucía Prieto Santamaría and co-supervised by Alejandro Rodríguez González.
The main objectives of this study are to identify which seed disease genes should be included, to determine which parameters contribute to the significance of the disease module, and to compare five disease module discovery algorithms: **LCC** (Largest Connected Component), **DIAMOnD** (DIseAse MOdule Detection), **DOMINO**, **ROBUST** and **TOPAS** (TOP-down Attachment of Seeds).

The Gene-Disease Association (GDA) score is obtained from DisGeNET; its calculation is explained in [GDA score calculation](GDA_score_calculation.pdf).

The graphical abstract below summarizes the methodology employed in this research.

![](figure_summary_1.svg)
![](figure_summary_2.svg)


# Repository contents

## Data
Files containing the protein-protein interactions, the gene-protein relationships, and the disease-gene associations. These data were previously gathered from the platform of the DISNET project.

## Disease Module obtention
Jupyter Notebooks used in this study to generate the representative sample, construct disease modules for each method, and apply filtering based on GDA scores.
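
As a rough illustration of the simplest of these steps, the sketch below filters seed genes by GDA score and then takes the largest connected component of the seed-induced subgraph, which is the idea behind the LCC method. It is not the code from the notebooks: the function names, the toy edge list, the scores, and the `0.3` threshold are all hypothetical, and only the Python standard library is used.

```python
from collections import deque


def filter_seeds_by_gda(gda_scores, threshold):
    """Keep the genes whose GDA score is at least `threshold` (hypothetical helper)."""
    return {gene for gene, score in gda_scores.items() if score >= threshold}


def largest_connected_component(edges, seeds):
    """Return the largest connected component of the subgraph induced by `seeds`."""
    seeds = set(seeds)
    # Build an adjacency map restricted to edges whose two endpoints are seeds.
    adjacency = {seed: set() for seed in seeds}
    for u, v in edges:
        if u in seeds and v in seeds and u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)

    best, visited = set(), set()
    for start in sorted(seeds):  # sorted for a deterministic result on ties
        if start in visited:
            continue
        # Breadth-first search collects the component that contains `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Toy data, invented for illustration only.
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("D", "G")]
gda = {"A": 0.9, "B": 0.5, "C": 0.4, "E": 0.7, "F": 0.1}

seeds = filter_seeds_by_gda(gda, threshold=0.3)
module = largest_connected_component(ppi_edges, seeds)
print(sorted(module))  # ['A', 'B', 'C']
```

With the threshold at `0.3`, gene `F` is dropped, so `E` becomes an isolated seed and the module reduces to the connected seeds `A`, `B` and `C`.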

## Significance analysis
Jupyter Notebooks used in this study to assess whether the obtained disease modules are significant. Significance was evaluated for the LCC method on the entire dataset, and for the five disease module discovery algorithms on the representative sample.
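
The notebooks define the exact procedure used in the study. As an illustration only, one common way to test a module in network medicine is to compare the size of its largest connected component against the sizes obtained from random gene sets of the same cardinality. The sketch below assumes that approach; the function names, the number of random draws, and the toy path network are hypothetical, and uniform sampling is a simplification (degree-preserving sampling is often preferred).

```python
import random
from collections import deque
from statistics import mean, pstdev


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours map."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def lcc_size(adjacency, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes) & set(adjacency)
    visited, best = set(), 0
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        queue, size = deque([start]), 1
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour in nodes and neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
                    size += 1
        best = max(best, size)
    return best


def lcc_significance(adjacency, seeds, n_random=1000, rng_seed=0):
    """Return (observed LCC size, z-score, empirical p-value) under uniform sampling."""
    rng = random.Random(rng_seed)  # fixed seed so the result is reproducible
    universe = sorted(adjacency)
    n_seeds = len(set(seeds) & set(adjacency))
    observed = lcc_size(adjacency, seeds)
    random_sizes = [
        lcc_size(adjacency, rng.sample(universe, n_seeds)) for _ in range(n_random)
    ]
    mu, sigma = mean(random_sizes), pstdev(random_sizes)
    z_score = (observed - mu) / sigma if sigma > 0 else float("nan")
    # The +1 terms keep the empirical p-value strictly above zero.
    p_value = (1 + sum(size >= observed for size in random_sizes)) / (n_random + 1)
    return observed, z_score, p_value


# Toy network: a path of 30 nodes; the seeds are five consecutive nodes.
toy_adjacency = build_adjacency([(i, i + 1) for i in range(29)])
observed, z_score, p_value = lcc_significance(toy_adjacency, {0, 1, 2, 3, 4}, n_random=200)
print(observed, round(z_score, 2), round(p_value, 3))
```

A module would then be called significant when the z-score or the empirical p-value passes a chosen cutoff; that cutoff is a modelling decision and is not prescribed here.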

## Graphics
Jupyter Notebooks used in this study to generate the graphics. This section comprises four notebooks, each covering a different part of the analysis:

- **graphics_initial_dataset.ipynb**  
  Contains an initial analysis of the dataset.

- **graphics_unfiltered_GDA.ipynb**  
  Includes the visualizations used to compare the different disease module algorithms.

- **graphics_filtered_GDA.ipynb**  
  Presents graphics related to the GDA score analysis for the five different methods.

- **graphics_whole_dataset_GDA_analysis.ipynb**  
  Displays the graphics generated for the GDA-filtered analysis using the entire dataset, considering only the LCC method.