# Title

## Introduction
...

## Dealing with Class Imbalance
One of the primary challenges we encountered was a significant class imbalance: considerably more patients withdrew from treatment than stayed.

To address this issue, we implemented four training pipelines and applied each of them to both the pre-pandemic and post-pandemic training datasets:
1. **Using the Original Dataset**: The models were trained on the original datasets.
2. **Class Weight Adjustment**: The models were trained on the original datasets but were penalized more heavily for misclassifying the minority class.
3. **Oversampling**: Additional samples were generated for the minority class (patients staying) to balance the dataset.
4. **Undersampling**: Samples from the majority class (patients withdrawing) were reduced to achieve balance.
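
The three pipelines that alter training can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code in this repository: it assumes binary labels, uses plain random duplication for oversampling (the repository may instead generate synthetic samples, e.g. with SMOTE), and uses the common "balanced" heuristic `n_samples / (n_classes * class_count)` for class weights. The helper names and toy data are hypothetical.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count) for use in a weighted loss."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def random_oversample(X, y, rng):
    """Duplicate random rows of each smaller class until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=counts.max() - count, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = rng.permutation(np.concatenate(parts))
    return X[idx], y[idx]


def random_undersample(X, y, rng):
    """Keep a random subset of each larger class so every class matches the smallest."""
    classes, counts = np.unique(y, return_counts=True)
    parts = [
        rng.choice(np.flatnonzero(y == cls), size=counts.min(), replace=False)
        for cls in classes
    ]
    idx = rng.permutation(np.concatenate(parts))
    return X[idx], y[idx]


rng = np.random.default_rng(0)
# Hypothetical toy data: label 1 = withdrew (majority), label 0 = stayed (minority).
X = rng.normal(size=(100, 3))
y = np.array([1] * 80 + [0] * 20)

# One (X_train, y_train, class_weights) triple per pipeline.
pipelines = {
    "original": (X, y, None),
    "class_weight": (X, y, balanced_class_weights(y)),
    "oversampling": (*random_oversample(X, y, rng), None),
    "undersampling": (*random_undersample(X, y, rng), None),
}
for name, (X_train, y_train, weights) in pipelines.items():
    print(name, np.bincount(y_train).tolist(), weights)
```

In the class-weight pipeline the data are left unchanged and the weight mapping is handed to the learner (many scikit-learn estimators accept a `class_weight` dictionary of this shape); the two resampling pipelines change the training rows themselves.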

These approaches produced several different training datasets. To compare the models fairly across pipelines, we evaluated every model on the same common test dataset, regardless of the training approach used.
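
A minimal sketch of how a test set can stay common to every pipeline, assuming a single stratified hold-out split made before any resampling. The function name, the 20% test fraction and the toy data are hypothetical. The key point is ordering: the test rows are set aside first, so no pipeline ever resamples or reweights them.

```python
import numpy as np


def stratified_holdout(y, test_frac, rng):
    """Return (train_idx, test_idx), preserving each class's share in the test split."""
    test_parts = []
    for cls in np.unique(y):
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        test_parts.append(cls_idx[: int(round(test_frac * len(cls_idx)))])
    test_idx = np.sort(np.concatenate(test_parts))
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx


rng = np.random.default_rng(42)
# Hypothetical toy data: label 1 = withdrew (majority), label 0 = stayed (minority).
X = rng.normal(size=(200, 4))
y = np.array([1] * 160 + [0] * 40)

train_idx, test_idx = stratified_holdout(y, test_frac=0.2, rng=rng)
X_test, y_test = X[test_idx], y[test_idx]  # fixed once and reused by every pipeline
X_train, y_train = X[train_idx], y[train_idx]

# Resampling only ever touches the training split; here, undersampling the majority class.
minority = np.flatnonzero(y_train == 0)
majority = rng.choice(np.flatnonzero(y_train == 1), size=len(minority), replace=False)
keep = np.concatenate([minority, majority])
X_train_under, y_train_under = X_train[keep], y_train[keep]
```

Splitting first also keeps oversampled copies of a training row out of the test set, which would otherwise inflate test scores.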

## Repository
This repository is organized as follows:
* [EDA](./EDA): 
    * [output](./EDA/output): Plots of feature distributions, correlations and importance.
    * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
* [gen_train_data](./gen_train_data): 
    * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets.
* [model_selection](./model_selection): 
    * [hyperparam_tuning.py](./model_selection/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
    * [cv_metric_gen.py](./model_selection/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
    * [cv_metrics_distr.py](./model_selection/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
    * [test_models.py](./model_selection/test_models.py): Testing tuned models with test dataset.
* [explainability](./explainability):
    * [fit_final_models.py](./explainability/fit_final_models.py): Saving a fitted model for each selected final model.
    * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for final models.
    * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for final models.
    * [shap_plots.py](./explainability/shap_plots.py): Generating SHAP summary plots for the computed SHAP and SHAP interaction values, and comparing major differences between the pre- and post-pandemic groups.
    * [output](./explainability/output): SHAP and SHAP interaction summary plots.
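
The tuning and cross-validation steps in `model_selection` can be illustrated with a NumPy-only sketch of random search scored by stratified k-fold cross-validation. Everything here is an assumption for illustration: the k-nearest-neighbours model stands in for whichever models the scripts actually tune, balanced accuracy stands in for the metrics they report, and the search space, helper names and data are hypothetical.

```python
import numpy as np


def stratified_kfold(y, n_splits, rng):
    """Return (train_idx, val_idx) pairs with each class spread evenly over the folds."""
    folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        for i, chunk in enumerate(np.array_split(cls_idx, n_splits)):
            folds[i].append(chunk)
    folds = [np.concatenate(parts) for parts in folds]
    return [
        (np.concatenate([folds[j] for j in range(n_splits) if j != i]), folds[i])
        for i in range(n_splits)
    ]


def knn_predict(X_train, y_train, X_test, n_neighbors, p):
    """Hypothetical stand-in model: binary k-NN with a Minkowski distance of order p."""
    diffs = np.abs(X_test[:, None, :] - X_train[None, :, :])
    dists = (diffs ** p).sum(axis=-1) ** (1.0 / p)
    nearest = np.argsort(dists, axis=1)[:, :n_neighbors]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; predicting only the majority class scores 0.5, not 0.8."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))


def random_search(X, y, space, n_iter, n_splits, seed):
    """Sample n_iter configurations from `space`; keep the best by mean CV score."""
    rng = np.random.default_rng(seed)
    folds = stratified_kfold(y, n_splits, rng)  # drawn once, shared by all configs
    best_params, best_score = None, -np.inf
    for _ in range(n_iter):
        # Draw each hyperparameter independently and uniformly from its candidate list.
        params = {name: values[rng.integers(len(values))] for name, values in space.items()}
        score = float(np.mean([
            balanced_accuracy(y[va], knn_predict(X[tr], y[tr], X[va], **params))
            for tr, va in folds
        ]))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


rng = np.random.default_rng(0)
# Hypothetical toy data: 80 majority rows (label 1) and 20 shifted minority rows (label 0).
X = np.vstack([rng.normal(0.0, 1.0, size=(80, 2)), rng.normal(1.5, 1.0, size=(20, 2))])
y = np.array([1] * 80 + [0] * 20)
space = {"n_neighbors": [1, 3, 5, 7, 9], "p": [1, 2]}
best_params, best_score = random_search(X, y, space, n_iter=8, n_splits=5, seed=0)
print(best_params, round(best_score, 3))
```

The folds are drawn once and reused for every sampled configuration, so differences in mean score reflect the hyperparameters rather than a different split. Stratification keeps the minority class present in every validation fold, which matters with an imbalanced target.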

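As a reference for what the `explainability` scripts compute, the sketch below calculates exact Shapley values by brute force, enumerating feature coalitions and filling in absent features from a baseline vector (for example, training-set means). It is a hypothetical illustration and not the `shap` library's implementation: with a single background sample it corresponds to what SHAP's model-agnostic (Kernel) explainer estimates, but the number of model evaluations grows exponentially with the number of features, so it only suits a handful of features. The linear model and inputs are toy values.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are set to their baseline value. Every coalition
    of the other features is enumerated for each feature, so the number of model
    evaluations grows exponentially with the number of features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                z = baseline.astype(float).copy()
                z[idx] = x[idx]
                without_i = f(z)
                z[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi


# Hypothetical linear model and inputs, chosen so the answer is easy to verify.
w, b = np.array([2.0, -1.0, 0.5]), 0.3


def f(v):
    return float(w @ v + b)


x = np.array([1.0, 2.0, -2.0])
baseline = np.zeros(3)  # e.g. training-set feature means
phi = exact_shapley(f, x, baseline)
print(phi)
```

For a linear model each Shapley value reduces to `w_i * (x_i - baseline_i)`, and the values always sum to `f(x) - f(baseline)` (the efficiency property), which makes a convenient sanity check.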
## Outro