# **Analyzing Gene Expression Datasets for Disease Annotation Modeling**

Author: Laura Masa Martínez

This repository documents the work conducted for my master’s thesis.

## **Visual summary**
A visual summary of the entire workflow and main steps of the project is shown below.

![Visual summary of the project workflow](GEO.png)

## **Objectives**
The principal objective of this thesis was to design and implement an automated system for the extraction, processing, and analysis of gene-disease association data. This system was developed to generate a detailed gene-disease annotation model, including gene identifiers, gene expression profiles, and comprehensive metadata. The automation of these processes was intended to provide a structured and efficient platform to support biomedical research efforts and the identification of novel therapeutic targets.

In addition to the main objective, the thesis aimed to achieve several secondary goals:

1.  Differential Gene Expression Analysis: To create a Personalized Perturbation Profile (PEEP) that captures gene expression variations for individual and group-level comparisons.
2.  Advancing Personalized Medicine: To leverage PEEPs for discovering tailored therapeutic interventions and exploring drug repositioning opportunities.
3.  Modeling Disease-Gene Associations: To enhance the semantic understanding and predictive accuracy of gene-disease relationships through advanced modeling techniques.
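
To make the idea behind the first goal concrete, here is a minimal sketch of one common way to build a personalized perturbation profile: each gene in an individual sample is z-scored against a control group, and genes beyond a cutoff are kept. This is an illustration only and is not taken from the code in this repository; the function name `peep`, the gene labels, and the threshold of 2.5 are all assumptions.

```python
from statistics import mean, stdev


def peep(case_expr, control_expr, threshold=2.5):
    """Return {gene: z-score} for genes perturbed in one individual.

    case_expr:    {gene: expression value} for a single case sample
    control_expr: {gene: [expression values across control samples]}
    threshold:    hypothetical |z| cutoff, not a value from the thesis
    """
    profile = {}
    for gene, value in case_expr.items():
        reference = control_expr[gene]
        sd = stdev(reference)  # sample standard deviation of the controls
        if sd == 0:
            continue  # z-score is undefined when controls do not vary
        z = (value - mean(reference)) / sd
        if abs(z) > threshold:
            profile[gene] = z
    return profile


# Hypothetical toy data: two genes, four control samples, one patient.
controls = {"GENE_A": [10, 11, 9, 10], "GENE_B": [5, 5, 6, 4]}
patient = {"GENE_A": 20, "GENE_B": 5}
profile = peep(patient, controls)  # only GENE_A exceeds the cutoff
```

Group-level comparisons could then be derived by counting how often each gene appears across the individual profiles, although the aggregation used in the thesis may differ.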


## **Folder structure**

| Step | Folder | Content |
| ------ | ------ | ------ |
| 1. Data Processing | data_processing | Code and datasets associated with the initial phase of the project. It includes resources for **collecting gene expression data**, **preprocessing the data**, and **selecting relevant subsets** for subsequent analysis. |
| 2. Data Analysis | data_analysis | Code, data, and figures used for the **analysis and visualization of gene expression data**. This includes performing **differential gene expression analysis** and **descriptive analysis**, generating statistics, and visualizing the results of both group-wise and individual-level gene regulation studies. |
| 3. Analysis for Drug Repurposing | analysis_drug_repurposing | Code and data used for analyzing drug-target relationships and visualizing potential treatments based on the findings from the gene-disease association data. |