# **Analyzing Gene Expression Datasets for Disease Annotation Modeling**

Author: Laura Masa Martínez

This repository documents the work conducted for my master's thesis.

## **Visual summary**

The figure below summarizes the overall workflow and the main steps of the project.

![Workflow overview](GEO.png)

## **Objectives**

The main objective of this thesis was to design and implement an automated system for extracting, processing, and analyzing gene-disease association data. The system was developed to generate a detailed gene-disease annotation model that includes gene identifiers, gene expression profiles, and comprehensive metadata. Automating these processes was intended to provide a structured and efficient platform to support biomedical research and the identification of novel therapeutic targets.

Beyond the main objective, the thesis pursued several secondary goals:

1. Differential Gene Expression Analysis: To create a Personalized Perturbation Profile (PEEP) that captures gene expression variations in both individual-level and group-level comparisons.
2. Advancing Personalized Medicine: To use PEEP profiles to discover tailored therapeutic interventions and to explore drug repositioning opportunities.
3. Modeling Disease-Gene Associations: To improve the semantic understanding and predictive accuracy of gene-disease relationships through advanced modeling techniques.

## **Folder structure**

| Step | Folder | Content |
| ------ | ------ | ------ |
| 1. Data Processing | data_processing | Code and datasets for the initial phase of the project, including resources for **collecting gene expression data**, **preprocessing the data**, and **selecting relevant subsets** for subsequent analysis. |
| 2. Data Analysis | data_analysis | Code, data, and figures used for the **analysis and visualization of gene expression data**. This includes **differential gene expression analysis**, **descriptive analysis** with summary statistics, and visualization of the results of both group-wise and individual-level gene regulation studies. |
| 3. Analysis for Drug Repurposing | analysis_drug_repurposing | Code and data used to analyze drug-target relationships and visualize potential treatments based on the gene-disease association findings. |
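The group-wise and individual-level comparisons described in the Data Analysis step can be illustrated with a minimal sketch. This is not the code in `data_analysis`. It makes several assumptions. The input is a genes × samples matrix of non-negative expression values. The group-wise comparison is a log2 fold change of group means. A per-gene z-score against control samples stands in for an individual perturbation profile. The function names, the pseudocount, and the z-score threshold are hypothetical choices made for illustration.

```python
import numpy as np


def group_log2_fold_change(cases: np.ndarray, controls: np.ndarray,
                           pseudocount: float = 1.0) -> np.ndarray:
    """Per-gene log2 fold change of mean expression, cases vs. controls.

    Assumption: both arrays are shaped (genes, samples) and hold
    non-negative expression values; the pseudocount avoids log2(0).
    """
    case_mean = cases.mean(axis=1)
    control_mean = controls.mean(axis=1)
    return np.log2((case_mean + pseudocount) / (control_mean + pseudocount))


def individual_perturbation_profile(sample: np.ndarray, controls: np.ndarray,
                                    z_threshold: float = 2.0) -> np.ndarray:
    """Label each gene of one sample as up (1), down (-1) or unchanged (0).

    Hypothetical stand-in for a per-sample profile: each gene is scored
    by its z-score relative to the control distribution for that gene.
    """
    mu = controls.mean(axis=1)
    sd = controls.std(axis=1, ddof=1)
    # Genes with zero variance in the controls get z = 0 instead of dividing by 0.
    z = np.divide(sample - mu, sd, out=np.zeros_like(mu, dtype=float), where=sd > 0)
    profile = np.zeros(sample.shape[0], dtype=int)
    profile[z >= z_threshold] = 1
    profile[z <= -z_threshold] = -1
    return profile


# Toy data: 3 genes x 3 samples per group (values made up for illustration).
controls = np.array([[10.0, 12.0, 11.0], [5.0, 5.0, 5.0], [100.0, 90.0, 110.0]])
cases = np.array([[40.0, 44.0, 42.0], [5.0, 5.0, 5.0], [20.0, 25.0, 15.0]])

print(group_log2_fold_change(cases, controls))
print(individual_perturbation_profile(cases[:, 0], controls).tolist())
```

With these toy values, gene 1 is up-regulated, gene 2 is unchanged, and gene 3 is down-regulated, both at the group level and for the first case sample. Scoring each sample against the control distribution keeps the individual-level view separate from the group means. Real analyses would typically add normalization and multiple-testing control on top of this.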