This repository documents the work conducted for my master’s thesis titled "Analyzing Gene Expression Datasets for Disease Annotation Modeling."
Author: Laura Masa Martínez
A visual summary of the entire workflow and main steps of the project is shown below.
**Objectives**
The principal objective of this thesis was to design and implement an automated system for the extraction, processing, and analysis of gene-disease association data. This system was developed to generate a detailed gene-disease annotation model, including gene identifiers, gene expression profiles, and comprehensive metadata. The automation of these processes was intended to provide a structured and efficient platform to support biomedical research efforts and the identification of novel therapeutic targets.
In addition to the main objective, the thesis aimed to achieve several secondary goals:
1. Differential Gene Expression Analysis: To create a Personalized Perturbation Profile (PEEP) that captures gene expression variations for individual and group-level comparisons.
2. Advancing Personalized Medicine: To leverage PEEP profiles for discovering tailored therapeutic interventions and exploring drug repositioning opportunities.
3. Modeling Disease-Gene Associations: To enhance the semantic understanding and predictive accuracy of gene-disease relationships through advanced modeling techniques.
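
As a rough illustration of the first goal, the sketch below shows one common way to build a per-individual perturbation profile: each patient's expression value is compared with the control group by z-score, and genes beyond a cutoff are kept. This is an assumption for illustration, not necessarily the method used in the thesis. The function names, the gene labels, the toy values, and the 2.5 cutoff are all hypothetical.

```python
from statistics import mean, stdev


def perturbation_profile(sample, controls, threshold=2.5):
    """Return {gene: z} for genes whose z-score against controls meets the cutoff.

    `threshold` is an assumed cutoff chosen for this example only.
    """
    profile = {}
    for gene, value in sample.items():
        ref = [c[gene] for c in controls if gene in c]
        if len(ref) < 2:
            continue  # at least two controls are needed to estimate spread
        sd = stdev(ref)
        if sd == 0:
            continue  # z-score is undefined when controls do not vary
        z = (value - mean(ref)) / sd
        if abs(z) >= threshold:
            profile[gene] = z
    return profile


def group_frequency(profiles):
    """Fraction of individuals in which each gene appears as perturbed."""
    counts = {}
    for p in profiles:
        for gene in p:
            counts[gene] = counts.get(gene, 0) + 1
    n = len(profiles)
    return {gene: c / n for gene, c in counts.items()}


# Toy data: hypothetical expression values, not taken from the thesis.
controls = [
    {"GENE_A": 5.0, "GENE_B": 8.0},
    {"GENE_A": 5.2, "GENE_B": 8.1},
    {"GENE_A": 4.8, "GENE_B": 7.9},
]
patients = [
    {"GENE_A": 7.0, "GENE_B": 8.0},
    {"GENE_A": 5.1, "GENE_B": 8.05},
]

profiles = [perturbation_profile(p, controls) for p in patients]
print(profiles)
print(group_frequency(profiles))
```

The per-patient dictionaries correspond to the individual-level comparison, and `group_frequency` summarizes how often each gene is perturbed across the group.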
**Folder structure**
| Folder | Content |
| ------ | ------ |
| data_processing | Code and datasets associated with the initial phase of the project. It includes resources for **collecting gene expression data**, **preprocessing the data**, and **selecting relevant subsets** for subsequent analysis. |
| data_analysis | Code, data, and figures used for the **analysis and visualization of gene expression data**. This includes performing **differential gene expression analysis**, generating descriptive statistics, and visualizing the results of both group-wise and individual-level gene regulation studies. |
| analysis_drug_repurposing | Code and data used for analyzing drug-target relationships and visualizing potential treatments based on the findings from the gene-disease association data. |