# Title

## Introduction


## Dealing with Class Imbalance
One of the primary challenges we encountered was a significant class imbalance: far more patients withdrew from treatment than stayed.

To address this issue, we implemented four different training approaches or pipelines on both the pre-pandemic and post-pandemic training datasets:
1. **Using the Original Dataset**: The models were trained on the original datasets.
2. **Class Weight Adjustment**: The models were trained on the original datasets but were penalized more heavily for misclassifying the minority class.
3. **Oversampling**: Additional samples were generated for the minority class (patients staying) to balance the dataset.
4. **Undersampling**: Samples from the majority class (patients withdrawing) were reduced to achieve balance.
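
As a rough illustration of pipelines 2 to 4, the sketch below uses only the Python standard library. It is not the code used in this repository: the function names, the toy labels (1 = withdrawing, 0 = staying), the `n_samples / (n_classes * count)` weighting heuristic and the use of plain random duplication for oversampling are all assumptions made for the example. The actual pipelines may rely on a different technique, such as synthetic sample generation.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(pool, k=target - count)  # empty for the majority class
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data (hypothetical): 8 patients withdrawing (1) and 2 staying (0).
rows = [[i] for i in range(10)]
labels = [1] * 8 + [0] * 2

weights = balanced_class_weights(labels)
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

With weights of this kind, errors on the minority class contribute more to the training loss, which is the effect described for pipeline 2.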

These approaches produced multiple training datasets. To ensure a fair comparison of model performance across pipelines, all models were evaluated on a common test dataset, regardless of the training approach.
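
One way to obtain a common test set is to split the data first and apply the imbalance handling to the training portion only, so that the test set keeps the original class distribution. The following is a minimal sketch under that assumption. The 80/20 ratio, the `stratified_split` helper and the toy labels are hypothetical and not taken from this repository.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 80 patients withdrawing (1) and 20 staying (0).
labels = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(labels)

# Each pipeline would resample or reweight only the training indices;
# every model is then scored on the same untouched test indices.
test_labels = [labels[i] for i in test_idx]
```

Splitting before resampling also prevents duplicated or synthetic training samples from leaking into the evaluation data.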

## Repository
This repository is organized as follows:
* [EDA](./EDA): 
    * [output](./EDA/output): Plots about feature distributions, correlations and importance.
    * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
* [gen_train_data](./gen_train_data): 
    * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets for each of the pipelines.
* [model_selection](./model_selection): 
    * [hyperparam_tuning.py](./model_selection/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
    * [cv_metric_gen.py](./model_selection/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
    * [cv_metrics_distr.py](./model_selection/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
    * [test_models.py](./model_selection/test_models.py): Testing the tuned models on the common test dataset.
    * [output](./model_selection/output):
        * [hyperparam](./model_selection/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
        * [cv_metrics](./model_selection/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric. 
        * [testing](./model_selection/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
* [explainability](./explainability):
    * [fit_final_models.py](./explainability/fit_final_models.py): Fitting and saving each selected final model.
    * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for final models.
    * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for final models.
    * [shap_plots.py](./explainability/shap_plots.py): Generating SHAP summary plots for the SHAP and SHAP interaction values computed. Comparing major differences between pre- and post-pandemic groups.
    * [output](./explainability/output): SHAP and SHAP interaction summary plots.
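
For readers unfamiliar with the random search of hyperparameters mentioned for `hyperparam_tuning.py`, the idea can be sketched in a few lines of standard-library Python. This is only an illustration, not the script's actual implementation: the search space, the `random_search` helper and the `toy_score` function (a stand-in for a cross-validated metric) are hypothetical.

```python
import random


def random_search(param_space, score_fn, n_iter=20, seed=0):
    """Sample n_iter random configurations and return the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a tree-based classifier.
param_space = {
    "max_depth": [2, 4, 8, 16],
    "n_estimators": [50, 100, 200],
}


def toy_score(params):
    """Placeholder for a cross-validated metric; highest at depth 8 with 100 trees."""
    return -abs(params["max_depth"] - 8) - abs(params["n_estimators"] - 100) / 50


best_params, best_score = random_search(param_space, toy_score)
```

In practice the scoring function would train a model with the sampled configuration and return a cross-validation metric. Libraries such as scikit-learn offer this pattern ready-made through `RandomizedSearchCV`.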

## Data

The dataset is not provided because the data providers have not given the authors permission to share it.

For any inquiry you can contact: