**Analyzing Gene Expression Datasets for Disease Annotation Modeling**

Author: Laura Masa Martínez

This repository documents the work conducted for my master’s thesis.

A visual summary of the entire workflow and main steps of the project is shown below.

![Workflow overview](GEO.png)

**Objectives**
The principal objective of this thesis was to design and implement an automated system for the extraction, processing, and analysis of gene-disease association data. This system was developed to generate a detailed gene-disease annotation model, including gene identifiers, gene expression profiles, and comprehensive metadata. The automation of these processes was intended to provide a structured and efficient platform to support biomedical research efforts and the identification of novel therapeutic targets.

In addition to the main objective, the thesis aimed to achieve several secondary goals:

1.  Differential Gene Expression Analysis: To create a Personalized Perturbation Profile (PEEP) that captures gene expression variations for individual and group-level comparisons.
2.  Advancing Personalized Medicine: To use PEEPs to discover tailored therapeutic interventions and to explore opportunities for drug repositioning.
3.  Modeling Disease-Gene Associations: To enhance the semantic understanding and predictive accuracy of gene-disease relationships through advanced modeling techniques.
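
To make the PEEP idea in the first goal concrete, here is a minimal sketch, not the code in this repository. It assumes one common formulation: each case sample is z-scored gene by gene against a control group, and a gene counts as perturbed when the absolute z-score exceeds a cutoff. The function names `peep_zscores` and `perturbed_genes`, the cutoff of 2.5, and the toy matrices are all hypothetical; the thesis may define and threshold its profiles differently.

```python
import numpy as np


def peep_zscores(cases, controls):
    """Z-score each case sample against the control group, gene by gene.

    cases:    array of shape (n_cases, n_genes)
    controls: array of shape (n_controls, n_genes)
    """
    mu = controls.mean(axis=0)
    sigma = controls.std(axis=0, ddof=1)  # sample standard deviation
    # Genes with no variation in controls get NaN instead of a division by zero.
    sigma = np.where(sigma == 0, np.nan, sigma)
    return (cases - mu) / sigma


def perturbed_genes(z, threshold=2.5):
    """Signed profile: +1 up-regulated, -1 down-regulated, 0 otherwise.

    The threshold is an illustrative assumption, not a value from the thesis.
    NaN z-scores compare as False and therefore stay 0.
    """
    profile = np.zeros(z.shape, dtype=int)
    profile[z > threshold] = 1
    profile[z < -threshold] = -1
    return profile


# Toy data (made up): 3 control samples and 2 case samples over 3 genes.
controls = np.array([[1.0, 5.0, 2.0],
                     [2.0, 5.0, 2.1],
                     [3.0, 5.0, 1.9]])
cases = np.array([[6.0, 5.0, 2.0],
                  [2.0, 7.0, 1.5]])

z = peep_zscores(cases, controls)
profiles = perturbed_genes(z)                  # one row per individual
group_fraction = np.abs(profiles).mean(axis=0)  # share of cases perturbed per gene
```

Each row of `profiles` is an individual-level profile, while `group_fraction` summarizes how often a gene is perturbed across the case group, which mirrors the individual and group-level comparisons mentioned above.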


**Folder structure**

| Folder | Content |
| ------ | ------ |
| data_processing | Code and datasets associated with the initial phase of the project. It includes resources for **collecting gene expression data**, **preprocessing the data**, and **selecting relevant subsets** for subsequent analysis. |
| data_analysis | Code, data, and figures used for the **analysis and visualization of gene expression data**. This includes performing **differential gene expression analysis**, generating descriptive statistics, and visualizing the results of both group-wise and individual-level gene regulation studies. | 
| analysis_drug_repurposing | Code and data used for analyzing drug-target relationships and visualizing potential treatments based on the findings from the gene-disease association data. |