diff --git a/README.md b/README.md
index 59df8727c6de9c518c393f40156b2dd2b7aed66c..f9349ff6c5d1fe356d0c34e233ffba8a7dc3f8a1 100644
--- a/README.md
+++ b/README.md
@@ -35,28 +35,37 @@ These approaches resulted in multiple training datasets. However, to ensure a fa
 
 This repository is organized as follows:
 
-* [EDA](./EDA):
-  * [results](./EDA/results): Plots about feature distributions, correlations and importance.
-  * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post- pandemic datasets, and generating plots for feature distributions, correlations and importance.
-* [gen_train_data](./gen_train_data):
-  * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets for each of the pipelines.
-* [model_building](./model_building):
-  * [hyperparam_tuning.py](./model_building/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
-  * [cv_metric_gen.py](./model_building/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
-  * [cv_metrics_distr.py](./model_building/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
-  * [test_models.py](./model_building/test_models.py): Testing tuned models with test dataset.
-  * [fit_final_models.py](./model_building/fit_final_models.py): Saving fitted model for each selected final model.
-  * [results](./model_building/results):
-    * [hyperparam](./model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
-    * [cv_metrics](./model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
-    * [testing](./model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
-    * [fitted_models](./model_building/output/fitted_models): Final selected trained models.
-* [explainability](./explainability):
-  * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for final models.
-  * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for final models.
-  * [shap_plots.py](./explainability/shap_plots.py): Generating SHAP summary plots for the SHAP and SHAP interaction values computed. Comparing major differences between pre- and post-pandemic groups.
-  * [results](./explainability/results): SHAP and SHAP interaction summary plots.
-
+* [01-EDA](./01-EDA):
+  * [EDA.ipynb](./01-EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
+  * [results](./01-EDA/results):
+    * [feature_names](./01-EDA/results/feature_names): Names of the selected individual and social variables.
+    * [plots](./01-EDA/results/plots):
+      * [correlations](./01-EDA/results/plots/correlations): Heatmaps to visualize the pairwise correlations between features.
+      * [distributions](./01-EDA/results/plots/distributions): Statistical plots to visualize the distribution of features.
+      * [feature_importance](./01-EDA/results/plots/feature_importance): Plots to show the importance of each feature in predicting outcomes.
+* [02-training_data_generation](./02-training_data_generation):
+  * [training_data_generation.ipynb](./02-training_data_generation/training_data_generation.ipynb): Generating training and testing datasets for each of the pipelines.
+* [03-model_building](./03-model_building):
+  * [hyperparameter_tuning.py](./03-model_building/hyperparameter_tuning.py): Tuning models through a random search over hyperparameters.
+  * [cv_metric_generation.py](./03-model_building/cv_metric_generation.py): Generating cross-validation metrics and plots for each of the tuned models.
+  * [cv_metric_distribution.py](./03-model_building/cv_metric_distribution.py): Generating boxplots for each cross-validation metric and tuned model.
+  * [models_testing.py](./03-model_building/models_testing.py): Testing the tuned models on the test dataset.
+  * [models_final_fitting.py](./03-model_building/models_final_fitting.py): Saving the fitted model for each selected final model.
+  * [results](./03-model_building/results):
+    * [hyperparam](./03-model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
+    * [cv_metrics](./03-model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
+    * [testing](./03-model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
+    * [fitted_models](./03-model_building/output/fitted_models): Final selected trained models.
+* [04-explainability](./04-explainability):
+  * [shap_vals_computation.py](./04-explainability/shap_vals_computation.py): Computing SHAP values for final models.
+  * [shap_inter_vals_computation.py](./04-explainability/shap_inter_vals_computation.py): Computing SHAP interaction values for final models.
+  * [shap_plots.ipynb](./04-explainability/shap_plots.ipynb): Generating SHAP summary plots for the computed SHAP and SHAP interaction values, and comparing major differences between the pre- and post-pandemic groups.
+  * [results](./04-explainability/results):
+    * [plots](./04-explainability/plots): SHAP summary and summary interaction plots, as well as interaction heatmaps.
+      * [shap_summary](./04-explainability/plots/shap_summary): SHAP summary plots.
+      * [shap_inter_summary](./04-explainability/plots/shap_inter_summary): SHAP summary interaction plots.
+      * [heatmaps_interactions](./04-explainability/plots/heatmaps_interactions): Heatmaps representing the differences in interactions between pre-pandemic and post-pandemic groups.
+
 ## Contact
 
 For any inquiry you can contact: