# [Title]

## Introduction

This GitLab repository contains the main material used in the paper [title] by [authors].

The study examines patients undergoing treatment for alcohol disorders and uses machine learning techniques to predict clinical success or withdrawal. The main goal is to apply explainability tools to assess the impact of individual versus social factors on treatment outcomes. The research also explores whether the significance of these factors changed during the pandemic by comparing pre-pandemic and post-pandemic patient groups.

[Impact?]

## About the Dataset

[Origin, Characteristics]

The dataset is not included in this repository because the data providers have not given the authors permission to share it.

### Patient Counts
| **Period**       | **Medical Discharge** | **Withdrawal**  | **Total** |
|--------------|-------------------|------------|--------|
| Pre-pandemic | 2792              | 20069      | 22861  |
| Post-pandemic| 1882              | 8795       | 10677  |
| **Overall**  | 4674              | 28864      | 33538  |
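
The counts above can be turned into class shares per period. The following is a minimal sketch that uses only the numbers from the table; the dictionary layout and the helper name `class_shares` are illustrative assumptions, not part of the repository code.

```python
# Counts copied from the Patient Counts table above.
counts = {
    "Pre-pandemic": {"Medical Discharge": 2792, "Withdrawal": 20069},
    "Post-pandemic": {"Medical Discharge": 1882, "Withdrawal": 8795},
}


def class_shares(period_counts):
    """Return the fraction of patients in each outcome class (hypothetical helper)."""
    total = sum(period_counts.values())
    return {label: n / total for label, n in period_counts.items()}


for period, period_counts in counts.items():
    shares = class_shares(period_counts)
    # Ratio of majority (withdrawal) to minority (medical discharge) patients.
    ratio = period_counts["Withdrawal"] / period_counts["Medical Discharge"]
    rounded = {label: round(share, 3) for label, share in shares.items()}
    print(f"{period}: shares={rounded}, withdrawal/discharge ratio={ratio:.2f}")
```

Withdrawals are the clear majority class in both periods.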

### List of Selected Features
#### Individual Factors
* **Age** -> Age at which the patient started the first treatment.
* **Sex** -> Male or Female.
* **Num_Children**.
* **Smoking** -> Whether the patient is a tobacco smoker (with abuse or dependence) at the time of starting treatment.
* **Bio_Vulner** -> Whether the patient presents biological vulnerability, identified by the presence of 2 or more diagnoses of addictive disorders (substance use or behavioral).
* **X_DXCIE** ->  Evaluates the presence of diagnoses (abuse or dependence) of addictive disorders, assessed according to the ICD (International Classification of Diseases).
    * **Opiods_DXCIE**.
    * **Cannabis_DXCIE**.
    * **BZD_DXCIE**.
    * **Cocaine_DXCIE**.
    * **Hallucin_DXCIE**.
    * **Tobacco_DXCIE**.
* **Frequency** ->  Frequency with which patients have used their primary drug during the 30 days prior to starting treatment. One-hot encoded into:
    * **Freq_1dpw**.
    * **Freq_2-3dpw**.
    * **Freq_4-6dpw**.
    * **Freq_l1dpw**.
    * **Freq_None**.
    * **Freq_Everyday**.
* **Years_Drug_Use** -> Number of years the patient has been using their primary drug.
* **Other_Psychiatric_DX** -> Whether the patient has been diagnosed with other psychiatric disorders.
* **Previous_Treatments** -> Whether patients have undergone treatments prior to the one currently being analyzed.
* **Treatment_Adherence** -> Ratio of the number of attended appointments to the number of scheduled appointments associated with the patient's treatment.
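
As an illustration of two of the derived features listed above, the sketch below one-hot encodes a frequency category into the `Freq_*` columns and computes the `Treatment_Adherence` ratio. It uses plain Python rather than the repository's own preprocessing code. The raw category labels, the function names, and the handling of zero scheduled appointments are assumptions made for this example.

```python
# Category suffixes taken from the Freq_* column names listed above.
# The raw labels in the original data are not shown in this README, so
# reusing the suffixes as raw values is an assumption.
FREQ_LEVELS = ["1dpw", "2-3dpw", "4-6dpw", "l1dpw", "None", "Everyday"]


def one_hot_frequency(value):
    """Map one frequency category to a dict of 0/1 indicator columns."""
    if value not in FREQ_LEVELS:
        raise ValueError(f"unknown frequency category: {value!r}")
    return {f"Freq_{level}": int(value == level) for level in FREQ_LEVELS}


def treatment_adherence(attended, scheduled):
    """Ratio of attended to scheduled appointments.

    Raising an error when nothing was scheduled is an assumption; the
    README does not say how that case is handled.
    """
    if scheduled <= 0:
        raise ValueError("scheduled appointments must be positive")
    return attended / scheduled


encoded = one_hot_frequency("2-3dpw")
adherence = treatment_adherence(attended=3, scheduled=4)
```

Exactly one `Freq_*` indicator is set per patient, and the adherence ratio stays between 0 and 1 as long as attended appointments do not exceed scheduled ones.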

#### Social Factors
* **Education** -> Maximum education level achieved by the patient. One-hot encoded into:
    * **Ed_Not_Complete_Primary**
    * **Ed_Primary**
    * **Ed_Secondary**
    * **Ed_Secondary_Technical**
    * **Ed_Tertiary**
* **Social_Protection** -> Whether patients need or currently have social benefits.
* **Job Stability** -> Employment status. One-hot encoded into:
    * **JobIn_Unstable**
    * **JobIn_Stable**
    * **JobIn_Unemployed**
* **Housing** -> Housing situation. One-hot encoded into:
    * **Hous_Institutional**
    * **Hous_Stable**
    * **Hous_Unstable**
* **Early_Alterations** -> Whether patients have used drugs before the age of 11.
* **Social Inclusion** -> Living Situation. One-hot encoded into:
    * **SocInc_Family_Friends**
    * **SocInc_Alone**
    * **SocInc_Instit**
* **Risk_Stigma** -> Whether patients have potentially suffered social discrimination based on their status of having AIDS, hepatitis C, being injection drug users, or being women.
* **Structural_Conflict** -> Household income in the area where the outpatient treatment center is located.

## Dealing with Class Imbalance

One of the primary challenges we encountered was a significant class imbalance: across both periods, 28,864 patients withdrew from treatment, whereas only 4,674 received a medical discharge (roughly 86% versus 14%).

To address this issue, we implemented four different training approaches or pipelines on both the pre-pandemic and post-pandemic training datasets:

1. **Using the Original Dataset (ORIG)**: The models were trained on the original datasets.
2. **Class Weight Adjustment (ORIG_CW)**: The models were trained on the original datasets but were penalized more heavily for misclassifying the minority class.
3. **Oversampling (OVER)**: Additional samples were generated for the minority class (patients staying) to balance the dataset.
4. **Undersampling (UNDER)**: Samples from the majority class (patients withdrawing) were reduced to achieve balance.
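
The last three approaches can be sketched in a few lines of plain Python. This is not the repository's implementation: the README does not state which resampling method or weighting scheme was used, so the sketch substitutes the simplest variants (random duplication, random removal, and the common "balanced" weighting heuristic `n_samples / (n_classes * class_count)`). All function names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (a common heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [row for row, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [row for row, y in zip(rows, labels) if y == cls]
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data: 1 = patient staying (minority), 0 = patient withdrawing (majority).
toy_rows = list(range(10))
toy_labels = [1] * 2 + [0] * 8
weights = balanced_class_weights(toy_labels)
over_rows, over_labels = random_oversample(toy_rows, toy_labels)
under_rows, under_labels = random_undersample(toy_rows, toy_labels)
```

Class weighting leaves the data untouched and changes only the loss, whereas both resampling variants change the size of the training set.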

These approaches resulted in multiple training datasets. However, to ensure a fair comparison of the models' performance across different pipelines, we utilized a common test dataset for evaluation, irrespective of the training approach followed.
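
One way to obtain such a shared test set is to split the data once, before any resampling, and apply the imbalance handling only to the training portion. The sketch below shows a stratified split in plain Python. The stratification, the 20% test fraction, and the function name are assumptions for illustration; the README does not specify how the split was made.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 10 minority (1) and 40 majority (0) patients.
toy_labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(toy_labels)
# Resampling or class weighting would now be applied to train_idx only,
# so every pipeline is evaluated on the same untouched test_idx.
```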

## Methodology Overview

![repo_summary](/uploads/423d2236327507c5676b3cb497ee5285/repo_summary.png)

## Repository

This repository is organized as follows:

* [01-EDA](./01-EDA):
  * [EDA.ipynb](./01-EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
  * [results](./01-EDA/results): 
    * [feature_names](./01-EDA/results/feature_names): Names of the selected individual and social variables.
    * [plots](./01-EDA/results/plots)
        * [correlations](./01-EDA/results/plots/correlations): Heatmaps to visualize the pairwise correlations between features.
        * [distributions](./01-EDA/results/plots/distributions): Statistical plots to visualize the distribution of features.
        * [feature_importance](./01-EDA/results/plots/feature_importance): Plots to show the importance of each feature in predicting outcomes.
* [02-training_data_generation](./02-training_data_generation):
  * [training_data_generation.ipynb](./02-training_data_generation/training_data_generation.ipynb): Generating training and testing datasets for each of the pipelines.
* [03-model_building](./03-model_building):
  * [hyperparameter_tuning.py](./03-model_building/hyperparameter_tuning.py): Tuning models through a random search of hyperparameters.
  * [cv_metric_generation.py](./03-model_building/cv_metric_generation.py): Generating cross-validation metrics and plots for each of the tuned models.
  * [cv_metric_distribution.py](./03-model_building/cv_metric_distribution.py): Generating boxplots for each cross-validation metric and tuned model.
  * [models_testing.py](./03-model_building/models_testing.py): Testing the tuned models on the test dataset.
  * [models_final_fitting.py](./03-model_building/models_final_fitting.py): Saving the fitted model for each selected final model.
  * [results](./03-model_building/results):
    * [hyperparam](./03-model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
    * [cv_metrics](./03-model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
    * [testing](./03-model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
    * [fitted_models](./03-model_building/output/fitted_models): Final selected trained models.
* [04-explainability](./04-explainability):
  * [shap_vals_computation.py](./04-explainability/shap_vals_computation.py): Computing SHAP values for final models.
  * [shap_inter_vals_computation.py](./04-explainability/shap_inter_vals_computation.py): Computing SHAP interaction values for final models.
  * [shap_plots.ipynb](./04-explainability/shap_plots.ipynb): Generating SHAP summary plots for the SHAP and SHAP interaction values computed. Comparing major differences between pre- and post-pandemic groups.
  * [results](./04-explainability/results):
    * [plots](./04-explainability/plots): SHAP summary and summary interaction plots as well as heatmaps of interaction differences.
        * [shap_summary](./04-explainability/plots/shap_summary): SHAP summary plots.
        * [shap_inter_summary](./04-explainability/plots/shap_inter_summary): SHAP summary interaction plots.
        * [heatmaps_interactions](./04-explainability/plots/heatmaps_interactions): Heatmaps representing the differences in interactions between pre-pandemic and post-pandemic groups.
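
For readers unfamiliar with the random search mentioned for `hyperparameter_tuning.py`, the idea can be sketched as follows. This is a generic illustration, not the script's actual code: the search space, the toy scoring function (a stand-in for a cross-validated metric), and all names are hypothetical.

```python
import random


def random_search(score_fn, search_space, n_iter=20, seed=0):
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one candidate value per hyperparameter, independently.
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical objective; a real run would return a cross-validation metric."""
    return -((params["max_depth"] - 5) ** 2) - abs(params["learning_rate"] - 0.1)


# Hypothetical search space for illustration only.
space = {"max_depth": [2, 3, 5, 8], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = random_search(toy_score, space, n_iter=50)
```

Unlike an exhaustive grid search, the number of evaluated configurations is fixed by `n_iter` rather than by the size of the search space.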
        
## Contact

For any inquiries, please contact:
[Contact Info]