Commit 666fe6e1 authored by Joaquin Torres

Update README.md

parent 7ff8fa5c
......@@ -35,28 +35,37 @@ These approaches resulted in multiple training datasets. However, to ensure a fa
This repository is organized as follows:
* [EDA](./EDA):
  * [results](./EDA/results): Plots about feature distributions, correlations and importance.
  * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post- pandemic datasets, and generating plots for feature distributions, correlations and importance.
* [gen_train_data](./gen_train_data):
  * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets for each of the pipelines.
* [model_building](./model_building):
  * [hyperparam_tuning.py](./model_building/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
  * [cv_metric_gen.py](./model_building/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
  * [cv_metrics_distr.py](./model_building/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
  * [test_models.py](./model_building/test_models.py): Testing tuned models with test dataset.
  * [fit_final_models.py](./model_building/fit_final_models.py): Saving fitted model for each selected final model.
  * [results](./model_building/results):
    * [hyperparam](./model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
    * [cv_metrics](./model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
    * [testing](./model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
    * [fitted_models](./model_building/output/fitted_models): Final selected trained models.
* [explainability](./explainability):
  * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for final models.
  * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for final models.
  * [shap_plots.py](./explainability/shap_plots.py): Generating SHAP summary plots for the SHAP and SHAP interaction values computed. Comparing major differences between pre- and post-pandemic groups.
  * [results](./explainability/results): SHAP and SHAP interaction summary plots.
* [01-EDA](./01-EDA):
  * [EDA.ipynb](./01-EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post-pandemic datasets, and generating plots for feature distributions, correlations and importance.
  * [results](./01-EDA/results):
    * [feature_names](./01-EDA/results/feature_names): Names of the selected individual and social variables.
    * [plots](./01-EDA/results/plots):
      * [correlations](./01-EDA/results/plots/correlations): Heatmaps to visualize the pairwise correlations between features.
      * [distributions](./01-EDA/results/plots/distributions): Statistical plots to visualize the distribution of features.
      * [feature_importance](./01-EDA/results/plots/feature_importance): Plots to show the importance of each feature in predicting outcomes.
* [02-training_data_generation](./02-training_data_generation):
  * [training_data_generation.ipynb](./02-training_data_generation/training_data_generation.ipynb): Generating training and testing datasets for each of the pipelines.
* [03-model_building](./03-model_building):
  * [hyperparameter_tuning.py](./03-model_building/hyperparameter_tuning.py): Tuning models through a random search of hyperparameters.
  * [cv_metric_generation.py](./03-model_building/cv_metric_generation.py): Generating cross-validation metrics and plots for each of the tuned models.
  * [cv_metric_distribution.py](./03-model_building/cv_metric_distribution.py): Generating boxplots for each cross-validation metric and tuned model.
  * [models_testing.py](./03-model_building/models_testing.py): Testing tuned models with the test dataset.
  * [models_final_fitting.py](./03-model_building/models_final_fitting.py): Saving the fitted model for each selected final model.
  * [results](./03-model_building/results):
    * [hyperparam](./03-model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
    * [cv_metrics](./03-model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
    * [testing](./03-model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
    * [fitted_models](./03-model_building/output/fitted_models): Final selected trained models.
* [04-explainability](./04-explainability):
  * [shap_vals_computation.py](./04-explainability/shap_vals_computation.py): Computing SHAP values for final models.
  * [shap_inter_vals_computation.py](./04-explainability/shap_inter_vals_computation.py): Computing SHAP interaction values for final models.
  * [shap_plots.ipynb](./04-explainability/shap_plots.ipynb): Generating SHAP summary plots for the SHAP and SHAP interaction values computed. Comparing major differences between pre- and post-pandemic groups.
  * [results](./04-explainability/results):
    * [plots](./04-explainability/plots): SHAP summary plots, SHAP interaction summary plots, and heatmaps of interaction differences between the pre- and post-pandemic groups.
      * [shap_summary](./04-explainability/plots/shap_summary): SHAP summary plots.
      * [shap_inter_summary](./04-explainability/plots/shap_inter_summary): SHAP summary interaction plots.
      * [heatmaps_interactions](./04-explainability/plots/heatmaps_interactions): Heatmaps representing the differences in interactions between pre-pandemic and post-pandemic groups.
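
The tuning step listed above (a random search of hyperparameters, scored by cross-validation) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in the repository's tuning script: the 1-D k-NN model, the synthetic data, and every function name (`knn_predict`, `cv_accuracy`, `random_search`) are assumptions made for the example.

```python
import random
from statistics import mean


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def cv_accuracy(data, k, n_folds=5):
    """Mean accuracy of k-NN across n_folds cross-validation folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample (offset by `fold`) is held out for testing.
        test = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


def random_search(data, candidates, n_iter, seed=0):
    """Draw n_iter hyperparameter values at random; keep the best by CV score."""
    rng = random.Random(seed)
    best_k, best_score = None, -1.0
    for _ in range(n_iter):
        k = rng.choice(candidates)
        score = cv_accuracy(data, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic, purely illustrative data: class 1 tends to have larger x.
data_rng = random.Random(42)
data = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(data_rng.gauss(2.0, 1.0), 1) for _ in range(50)]
data_rng.shuffle(data)

best_k, best_score = random_search(data, candidates=list(range(1, 16, 2)), n_iter=5)
print(f"best k: {best_k}, mean CV accuracy: {best_score:.3f}")
```

Unlike a grid search, only `n_iter` randomly drawn configurations are evaluated, which keeps the cost fixed regardless of how large the search space is.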
## Contact
For any inquiry you can contact:
......