@@ -35,28 +35,37 @@ These approaches resulted in multiple training datasets. However, to ensure a fa
This repository is organized as follows:
* [EDA](./EDA):
  * [results](./EDA/results): Plots of feature distributions, correlations and importance.
  * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
* [gen_train_data](./gen_train_data):
  * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets for each of the pipelines.
* [model_building](./model_building):
  * [hyperparam_tuning.py](./model_building/hyperparam_tuning.py): Tuning models through a random search over hyperparameters.
  * [cv_metric_gen.py](./model_building/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
  * [cv_metrics_distr.py](./model_building/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
  * [test_models.py](./model_building/test_models.py): Testing the tuned models on the test dataset.
  * [fit_final_models.py](./model_building/fit_final_models.py): Saving the fitted model for each selected final model.
  * [results](./model_building/results):
    * [hyperparam](./model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
    * [cv_metrics](./model_building/output/cv_metrics): Material related to the cross-validation results: scores, ROC and Precision-Recall curves, and boxplots for each metric.
    * [testing](./model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves, and confusion matrices.
    * [fitted_models](./model_building/output/fitted_models): Final selected trained models.
* [explainability](./explainability):
  * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for the final models.
  * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for the final models.
  * [shap_plots.py](./explainability/shap_plots.py): Generating SHAP summary plots for the computed SHAP and SHAP interaction values, and comparing the major differences between the pre- and post-pandemic groups.
  * [results](./explainability/results): SHAP and SHAP interaction summary plots.
* [01-EDA](./01-EDA):
  * [EDA.ipynb](./01-EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
  * [results](./01-EDA/results):
    * [feature_names](./01-EDA/results/feature_names): Names of the selected individual and social variables.
    * [plots](./01-EDA/results/plots):
      * [correlations](./01-EDA/results/plots/correlations): Heatmaps visualizing the pairwise correlations between features.
      * [distributions](./01-EDA/results/plots/distributions): Statistical plots visualizing the distribution of each feature.
      * [feature_importance](./01-EDA/results/plots/feature_importance): Plots showing the importance of each feature in predicting outcomes.
* [02-training_data_generation](./02-training_data_generation):
  * [training_data_generation.ipynb](./02-training_data_generation/training_data_generation.ipynb): Generating training and testing datasets for each of the pipelines.
* [03-model_building](./03-model_building):
  * [hyperparameter_tuning.py](./03-model_building/hyperparameter_tuning.py): Tuning models through a random search over hyperparameters.
  * [cv_metric_generation.py](./03-model_building/cv_metric_generation.py): Generating cross-validation metrics and plots for each of the tuned models.
  * [cv_metric_distribution.py](./03-model_building/cv_metric_distribution.py): Generating boxplots for each cross-validation metric and tuned model.
  * [models_testing.py](./03-model_building/models_testing.py): Testing the tuned models on the test dataset.
  * [models_final_fitting.py](./03-model_building/models_final_fitting.py): Saving the fitted model for each selected final model.
  * [results](./03-model_building/results):
    * [hyperparam](./03-model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
    * [cv_metrics](./03-model_building/output/cv_metrics): Material related to the cross-validation results: scores, ROC and Precision-Recall curves, and boxplots for each metric.
    * [testing](./03-model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves, and confusion matrices.
    * [fitted_models](./03-model_building/output/fitted_models): Final selected trained models.
* [04-explainability](./04-explainability):
  * [shap_vals_computation.py](./04-explainability/shap_vals_computation.py): Computing SHAP values for the final models.
  * [shap_inter_vals_computation.py](./04-explainability/shap_inter_vals_computation.py): Computing SHAP interaction values for the final models.
  * [shap_plots.ipynb](./04-explainability/shap_plots.ipynb): Generating SHAP summary plots for the computed SHAP and SHAP interaction values, and comparing the major differences between the pre- and post-pandemic groups.
  * [results](./04-explainability/results):
    * [plots](./04-explainability/plots): SHAP summary and interaction summary plots, as well as the following subfolder:
      * [heatmaps_interactions](./04-explainability/plots/heatmaps_interactions): Heatmaps representing the differences in interactions between the pre-pandemic and post-pandemic groups.
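The random search performed by `hyperparameter_tuning.py` can be illustrated with the following plain-Python sketch. It samples hyperparameter configurations at random and keeps the best-scoring one. This is not the repository's implementation. The search space, the `random_search` helper and the `toy_score` objective are all hypothetical. A scikit-learn project may instead delegate this step to `RandomizedSearchCV`, which scores each sampled configuration by cross-validation.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical search space: each hyperparameter maps to its candidate values.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "max_depth": [2, 3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400],
}


def random_search(
    space: Dict[str, List[Any]],
    score_fn: Callable[[Dict[str, Any]], float],
    n_iter: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Evaluate n_iter randomly sampled configurations; return the best (higher is better)."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs sample the same configurations
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    """Hypothetical stand-in for a cross-validated metric, maximal at depth 4 and rate 0.1."""
    return -abs(params["max_depth"] - 4) - 10 * abs(params["learning_rate"] - 0.1)


if __name__ == "__main__":
    best, score = random_search(SEARCH_SPACE, toy_score, n_iter=30, seed=42)
    print("best hyperparameters:", best, "score:", score)
```

Sampling a fixed number of configurations keeps the cost independent of the size of the full grid. This is the usual argument for random search over an exhaustive grid search.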
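The cross-validation metrics in `cv_metric_generation.py` and the boxplots in `cv_metric_distribution.py` both rest on collecting one score per fold. The sketch below shows that idea with the standard library only. It is not the repository's code, and it makes two simplifying assumptions:

* The round-robin fold assignment stands in for the shuffled or stratified K-fold a real pipeline would often use.
* `fit_threshold` is a hypothetical one-feature model.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Round-robin K-fold: sample i goes to test fold i % k. Returns (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    splits = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, test))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def fit_threshold(x: Sequence[float], y: Sequence[int]) -> float:
    """Hypothetical toy model: a cut-off halfway between the two class means."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    return (mean(pos) + mean(neg)) / 2


def cross_validate(x: Sequence[float], y: Sequence[int], k: int = 5) -> Dict[str, List[float]]:
    """Fit on k-1 folds, score on the held-out fold; keep one value per fold per metric."""
    scores: Dict[str, List[float]] = {}
    for train, test in kfold_indices(len(x), k):
        cut = fit_threshold([x[i] for i in train], [y[i] for i in train])
        y_pred = [1 if x[i] >= cut else 0 for i in test]
        for name, value in binary_metrics([y[i] for i in test], y_pred).items():
            scores.setdefault(name, []).append(value)
    return scores


if __name__ == "__main__":
    x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
    y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    for metric, per_fold in cross_validate(x, y, k=5).items():
        # Each per_fold list is what a boxplot of that metric would summarize.
        print(metric, per_fold, "mean:", mean(per_fold))
```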
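The ROC curves saved under `cv_metrics` and `testing` come from a short computation: sort samples by predicted score, lower the decision threshold step by step, and record the false-positive and true-positive rates. The stdlib sketch below illustrates that idea using two hypothetical helpers, `roc_curve_points` and `auc_trapezoid`. It is not the repository's code, which is not shown here and may use library routines such as scikit-learn's `roc_curve` and `roc_auc_score` instead.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_curve_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """(FPR, TPR) points as the threshold drops through each distinct score (label 1 = positive)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together (one diagonal step).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Point]) -> float:
    """Area under a curve given as (x, y) points sorted by x, via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(pts)
    print("AUC:", auc_trapezoid(pts))
```

For the toy inputs in the usage block, the curve passes through (0.5, 1.0) and the area under it is 0.75.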
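The interaction heatmaps compare the pre- and post-pandemic groups. One plausible way to obtain the matrix behind such a heatmap is to average the absolute SHAP interaction values per feature pair within each group, then subtract. This is sketched here under stated assumptions rather than taken from `shap_plots.ipynb`.

The sketch assumes each sample's interaction values arrive as an n_features by n_features matrix, as SHAP's interaction values provide for a single model output:

* Main effects sit on the diagonal.
* Each pairwise interaction is split symmetrically between the (i, j) and (j, i) entries.

The helper names and the feature names in the example are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_abs_interactions(samples: Sequence[Matrix]) -> Matrix:
    """Mean |interaction value| over samples for every (i, j) feature pair."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples[0])
    return [
        [sum(abs(s[i][j]) for s in samples) / len(samples) for j in range(n)]
        for i in range(n)
    ]


def interaction_difference(pre: Sequence[Matrix], post: Sequence[Matrix]) -> Matrix:
    """Post-pandemic minus pre-pandemic mean |interaction|; positive means stronger afterwards."""
    a, b = mean_abs_interactions(pre), mean_abs_interactions(post)
    return [[b[i][j] - a[i][j] for j in range(len(a))] for i in range(len(a))]


def largest_changes(diff: Matrix, names: Sequence[str], k: int = 3) -> List[Tuple[str, str, float]]:
    """Top-k off-diagonal pairs by absolute change; the diagonal (main effects) is skipped."""
    pairs = [
        (names[i], names[j], diff[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))  # upper triangle: each unordered pair once
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]


if __name__ == "__main__":
    names = ["age", "sleep_hours", "social_support"]  # hypothetical feature names
    # Hypothetical per-sample interaction matrices for each group.
    pre = [
        [[0.30, 0.05, 0.02], [0.05, 0.10, 0.01], [0.02, 0.01, 0.20]],
        [[-0.20, -0.03, 0.04], [-0.03, 0.05, 0.00], [0.04, 0.00, -0.10]],
    ]
    post = [
        [[0.25, 0.06, 0.12], [0.06, 0.08, 0.02], [0.12, 0.02, 0.30]],
    ]
    diff = interaction_difference(pre, post)
    print(largest_changes(diff, names, k=2))
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the difference reflects a change in the strength of an interaction rather than in its sign.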