"Scanpy, when running sc.tl.rank_genes_groups(), automatically prioritises adata.raw if it exists. In this case, we saved the raw array in adata.raw to begin with (which is correct), but Scanpy is using that raw array instead of the normalised, log-transformed .X array.\n",
"\n",
"**sc.tl.rank_genes_groups()** performs a differential expression analysis between groups of cells defined in adata_type.obs['disease']. In this case, compares cells labelled as 'Alzheimer disease' against those labelled as 'normal', which is taken as reference group.\n",
"\n",
"**Wilcoxon rank-sum test** is a non-parametric test which finely adapts to scRNA-seq data. This test compares, for each gene, if its expression is significantly different in one group with respect to the other.\n",
"- Null hyphotesis (H₀): the distribution of gene expression is equal in both groups (Alzheimer and normal).\n",
"- Alternative hypothesis (H₁): the distributions are different (the test is bilateral by default).\n",
"\n",
"The test does not assume normality, only that the values are ordinal, which is ideal for scRNA-seq (where the data are usually noisy and with zeros).\n",
"\n",
"**sc.get.rank_genes_groups_df()** extracts the results in DataFrame format, one per case 'Alzheimer disease' vs 'normal'.\n",
"\n",
"Then, a threshold is applied:\n",
"- **abs(logfoldchange) > 0.25** in order for the change to be biologically relevant.\n",
"- **pvals_adj < 0.05** in order for the change to be statistically significant.\n",
"\n",
"This indicates that there is sufficient evidence to reject the null hypothesis that that gene is equally expressed between groups.The test does not assume normality, only that the values are ordinal, which is ideal for scRNA-seq (where the d\n",
"\n",
"Files **degs_{cell_type}_total.csv** stores all differentially expressed genes found per each cell type, identified by the ENSEMBL ID."
],
"metadata": {
...
...
@@ -438,7 +467,8 @@
"pycharm": {
"name": "#%% md\n"
}
}
},
"id": "2e30f0c5e3294df0"
},
{
"cell_type": "code",
...
...
@@ -593,22 +623,159 @@
" # Extract results of DEGs for \"disease\"\" condition (compared to normal)\n",
"The percentage (or proportion) of cells within a group that express a given gene.\n",
"\n",
"For example, suppose you have a cell type (e.g., astrocytes) with 100 cells. For a specific gene (e.g., GEN1), you look at how many of those 100 cells have an expression greater than 0 for that gene. If GEN1 is expressed (i.e., has a value greater than 0) in 25 of those cells, then: fraction_expressed = 25 / 100 = 0.25."
],
"id": "92db8d7ff52f3e4b"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"fractions = []\n",
"\n",
"for type, adata_type in tqdm(data_per_type.items(), desc= 'Analyzing cell types...'):\n",
"\n",
" gene_names = adata_type.var_names\n",
"\n",
" # Compute fraction of cells expressing each gene in each group\n",