diff --git a/README.md b/README.md
index c426e6f126b0b020e82f184e5ffc6e7630741ea9..59df8727c6de9c518c393f40156b2dd2b7aed66c 100644
--- a/README.md
+++ b/README.md
@@ -40,17 +40,17 @@ This repository is organized as follows:
     * [EDA.ipynb](./EDA/EDA.ipynb): Exploring and filtering data, handling missing values, encoding variables, building the final pre- and post- pandemic datasets, and generating plots for feature distributions, correlations and importance.
 * [gen_train_data](./gen_train_data):
     * [gen_train_data.ipynb](./gen_train_data/gen_train_data.ipynb): Generating training and testing datasets for each of the pipelines.
-* [model_selection](./model_selection):
-    * [hyperparam_tuning.py](./model_selection/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
-    * [cv_metric_gen.py](./model_selection/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
-    * [cv_metrics_distr.py](./model_selection/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
-    * [test_models.py](./model_selection/test_models.py): Testing tuned models with test dataset.
-    * [fit_final_models.py](./model_selection/fit_final_models.py): Saving fitted model for each selected final model.
-    * [results](./model_selection/results):
-        * [hyperparam](./model_selection/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
-        * [cv_metrics](./model_selection/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
-        * [testing](./model_selection/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
-        * [fitted_models](./model_selection/output/fitted_models): Final selected trained models.
+* [model_building](./model_building):
+    * [hyperparam_tuning.py](./model_building/hyperparam_tuning.py): Tuning models through a random search of hyperparameters.
+    * [cv_metric_gen.py](./model_building/cv_metric_gen.py): Generating cross-validation metrics and plots for each of the tuned models.
+    * [cv_metrics_distr.py](./model_building/cv_metrics_distr.py): Generating boxplots for each cross-validation metric and tuned model.
+    * [test_models.py](./model_building/test_models.py): Testing tuned models with test dataset.
+    * [fit_final_models.py](./model_building/fit_final_models.py): Saving fitted model for each selected final model.
+    * [results](./model_building/results):
+        * [hyperparam](./model_building/output/hyperparam): Excel file containing the optimal hyperparameters for each model in each pipeline.
+        * [cv_metrics](./model_building/output/cv_metrics): Material related to the results of cross-validation: scores, ROC and Precision-Recall curves and boxplots for each metric.
+        * [testing](./model_building/output/testing): Material related to the results of testing the tuned models: scores, ROC and Precision-Recall curves and confusion matrices.
+        * [fitted_models](./model_building/output/fitted_models): Final selected trained models.
 * [explainability](./explainability):
     * [compute_shap_vals.py](./explainability/compute_shap_vals.py): Computing SHAP values for final models.
     * [compute_shap_inter_vals.py](./explainability/compute_shap_inter_vals.py): Computing SHAP interaction values for final models.
diff --git a/explainability/compute_shap_inter_vals.py b/explainability/compute_shap_inter_vals.py
index 6c1399a5ccea5fc7b03c5118ce0a7ed15350f6c2..dec8696dab775f315142af4ce2444c2f029a5a27 100644
--- a/explainability/compute_shap_inter_vals.py
+++ b/explainability/compute_shap_inter_vals.py
@@ -70,7 +70,7 @@ if __name__ == "__main__":
             print(f"{group}-{method_names[j]}")
             method_name = method_names[j]
             model_name = model_choices[method_name]
-            model_path = f"../model_selection/results/fitted_models/{group}_{method_names[j]}_{model_name}.pkl"
+            model_path = f"../model_building/results/fitted_models/{group}_{method_names[j]}_{model_name}.pkl"
             # Load the fitted model from disk
             with open(model_path, 'rb') as file:
                 fitted_model = pickle.load(file)
diff --git a/explainability/compute_shap_vals.py b/explainability/compute_shap_vals.py
index 4bdd9c6d037b132c7f3559d33264cb0a274ac03f..42da00ee8b7b3a1e36ab98083ce6612a037fdd0c 100644
--- a/explainability/compute_shap_vals.py
+++ b/explainability/compute_shap_vals.py
@@ -70,7 +70,7 @@ if __name__ == "__main__":
             print(f"{group}-{method_names[j]}")
             method_name = method_names[j]
             model_name = model_choices[method_name]
-            model_path = f"../model_selection/results/fitted_models/{group}_{method_names[j]}_{model_name}.pkl"
+            model_path = f"../model_building/results/fitted_models/{group}_{method_names[j]}_{model_name}.pkl"
             # Load the fitted model from disk
             with open(model_path, 'rb') as file:
                 fitted_model = pickle.load(file)
diff --git a/model_selection/cv_metric_distr.py b/model_building/cv_metric_distr.py
similarity index 100%
rename from model_selection/cv_metric_distr.py
rename to model_building/cv_metric_distr.py
diff --git a/model_selection/cv_metric_gen.py b/model_building/cv_metric_gen.py
similarity index 100%
rename from model_selection/cv_metric_gen.py
rename to model_building/cv_metric_gen.py
diff --git a/model_selection/fit_final_models.py b/model_building/fit_final_models.py
similarity index 100%
rename from model_selection/fit_final_models.py
rename to model_building/fit_final_models.py
diff --git a/model_selection/hyperparam_tuning.py b/model_building/hyperparam_tuning.py
similarity index 100%
rename from model_selection/hyperparam_tuning.py
rename to model_building/hyperparam_tuning.py
diff --git a/model_selection/results/cv_metrics/curves/post_ORIG.svg b/model_building/results/cv_metrics/curves/post_ORIG.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/post_ORIG.svg
rename to model_building/results/cv_metrics/curves/post_ORIG.svg
diff --git a/model_selection/results/cv_metrics/curves/post_ORIG_CW.svg b/model_building/results/cv_metrics/curves/post_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/post_ORIG_CW.svg
rename to model_building/results/cv_metrics/curves/post_ORIG_CW.svg
diff --git a/model_selection/results/cv_metrics/curves/post_OVER.svg b/model_building/results/cv_metrics/curves/post_OVER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/post_OVER.svg
rename to model_building/results/cv_metrics/curves/post_OVER.svg
diff --git a/model_selection/results/cv_metrics/curves/post_UNDER.svg b/model_building/results/cv_metrics/curves/post_UNDER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/post_UNDER.svg
rename to model_building/results/cv_metrics/curves/post_UNDER.svg
diff --git a/model_selection/results/cv_metrics/curves/pre_ORIG.svg b/model_building/results/cv_metrics/curves/pre_ORIG.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/pre_ORIG.svg
rename to model_building/results/cv_metrics/curves/pre_ORIG.svg
diff --git a/model_selection/results/cv_metrics/curves/pre_ORIG_CW.svg b/model_building/results/cv_metrics/curves/pre_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/pre_ORIG_CW.svg
rename to model_building/results/cv_metrics/curves/pre_ORIG_CW.svg
diff --git a/model_selection/results/cv_metrics/curves/pre_OVER.svg b/model_building/results/cv_metrics/curves/pre_OVER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/pre_OVER.svg
rename to model_building/results/cv_metrics/curves/pre_OVER.svg
diff --git a/model_selection/results/cv_metrics/curves/pre_UNDER.svg b/model_building/results/cv_metrics/curves/pre_UNDER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/curves/pre_UNDER.svg
rename to model_building/results/cv_metrics/curves/pre_UNDER.svg
diff --git a/model_selection/results/cv_metrics/distributions/post_ORIG.svg b/model_building/results/cv_metrics/distributions/post_ORIG.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/post_ORIG.svg
rename to model_building/results/cv_metrics/distributions/post_ORIG.svg
diff --git a/model_selection/results/cv_metrics/distributions/post_ORIG_CW.svg b/model_building/results/cv_metrics/distributions/post_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/post_ORIG_CW.svg
rename to model_building/results/cv_metrics/distributions/post_ORIG_CW.svg
diff --git a/model_selection/results/cv_metrics/distributions/post_OVER.svg b/model_building/results/cv_metrics/distributions/post_OVER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/post_OVER.svg
rename to model_building/results/cv_metrics/distributions/post_OVER.svg
diff --git a/model_selection/results/cv_metrics/distributions/post_UNDER.svg b/model_building/results/cv_metrics/distributions/post_UNDER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/post_UNDER.svg
rename to model_building/results/cv_metrics/distributions/post_UNDER.svg
diff --git a/model_selection/results/cv_metrics/distributions/pre_ORIG.svg b/model_building/results/cv_metrics/distributions/pre_ORIG.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/pre_ORIG.svg
rename to model_building/results/cv_metrics/distributions/pre_ORIG.svg
diff --git a/model_selection/results/cv_metrics/distributions/pre_ORIG_CW.svg b/model_building/results/cv_metrics/distributions/pre_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/pre_ORIG_CW.svg
rename to model_building/results/cv_metrics/distributions/pre_ORIG_CW.svg
diff --git a/model_selection/results/cv_metrics/distributions/pre_OVER.svg b/model_building/results/cv_metrics/distributions/pre_OVER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/pre_OVER.svg
rename to model_building/results/cv_metrics/distributions/pre_OVER.svg
diff --git a/model_selection/results/cv_metrics/distributions/pre_UNDER.svg b/model_building/results/cv_metrics/distributions/pre_UNDER.svg
similarity index 100%
rename from model_selection/results/cv_metrics/distributions/pre_UNDER.svg
rename to model_building/results/cv_metrics/distributions/pre_UNDER.svg
diff --git a/model_selection/results/cv_metrics/metrics.xlsx b/model_building/results/cv_metrics/metrics.xlsx
similarity index 100%
rename from model_selection/results/cv_metrics/metrics.xlsx
rename to model_building/results/cv_metrics/metrics.xlsx
diff --git a/model_selection/results/fitted_models/post_ORIG_CW_RF.pkl b/model_building/results/fitted_models/post_ORIG_CW_RF.pkl
similarity index 100%
rename from model_selection/results/fitted_models/post_ORIG_CW_RF.pkl
rename to model_building/results/fitted_models/post_ORIG_CW_RF.pkl
diff --git a/model_selection/results/fitted_models/post_ORIG_XGB.pkl b/model_building/results/fitted_models/post_ORIG_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/post_ORIG_XGB.pkl
rename to model_building/results/fitted_models/post_ORIG_XGB.pkl
diff --git a/model_selection/results/fitted_models/post_OVER_XGB.pkl b/model_building/results/fitted_models/post_OVER_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/post_OVER_XGB.pkl
rename to model_building/results/fitted_models/post_OVER_XGB.pkl
diff --git a/model_selection/results/fitted_models/post_UNDER_XGB.pkl b/model_building/results/fitted_models/post_UNDER_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/post_UNDER_XGB.pkl
rename to model_building/results/fitted_models/post_UNDER_XGB.pkl
diff --git a/model_selection/results/fitted_models/pre_ORIG_CW_RF.pkl b/model_building/results/fitted_models/pre_ORIG_CW_RF.pkl
similarity index 100%
rename from model_selection/results/fitted_models/pre_ORIG_CW_RF.pkl
rename to model_building/results/fitted_models/pre_ORIG_CW_RF.pkl
diff --git a/model_selection/results/fitted_models/pre_ORIG_XGB.pkl b/model_building/results/fitted_models/pre_ORIG_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/pre_ORIG_XGB.pkl
rename to model_building/results/fitted_models/pre_ORIG_XGB.pkl
diff --git a/model_selection/results/fitted_models/pre_OVER_XGB.pkl b/model_building/results/fitted_models/pre_OVER_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/pre_OVER_XGB.pkl
rename to model_building/results/fitted_models/pre_OVER_XGB.pkl
diff --git a/model_selection/results/fitted_models/pre_UNDER_XGB.pkl b/model_building/results/fitted_models/pre_UNDER_XGB.pkl
similarity index 100%
rename from model_selection/results/fitted_models/pre_UNDER_XGB.pkl
rename to model_building/results/fitted_models/pre_UNDER_XGB.pkl
diff --git a/model_selection/results/hyperparam/hyperparamers.xlsx b/model_building/results/hyperparam/hyperparamers.xlsx
similarity index 100%
rename from model_selection/results/hyperparam/hyperparamers.xlsx
rename to model_building/results/hyperparam/hyperparamers.xlsx
diff --git a/model_selection/results/testing/plots/post_ORIG.svg b/model_building/results/testing/plots/post_ORIG.svg
similarity index 100%
rename from model_selection/results/testing/plots/post_ORIG.svg
rename to model_building/results/testing/plots/post_ORIG.svg
diff --git a/model_selection/results/testing/plots/post_ORIG_CW.svg b/model_building/results/testing/plots/post_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/testing/plots/post_ORIG_CW.svg
rename to model_building/results/testing/plots/post_ORIG_CW.svg
diff --git a/model_selection/results/testing/plots/post_OVER.svg b/model_building/results/testing/plots/post_OVER.svg
similarity index 100%
rename from model_selection/results/testing/plots/post_OVER.svg
rename to model_building/results/testing/plots/post_OVER.svg
diff --git a/model_selection/results/testing/plots/post_UNDER.svg b/model_building/results/testing/plots/post_UNDER.svg
similarity index 100%
rename from model_selection/results/testing/plots/post_UNDER.svg
rename to model_building/results/testing/plots/post_UNDER.svg
diff --git a/model_selection/results/testing/plots/pre_ORIG.svg b/model_building/results/testing/plots/pre_ORIG.svg
similarity index 100%
rename from model_selection/results/testing/plots/pre_ORIG.svg
rename to model_building/results/testing/plots/pre_ORIG.svg
diff --git a/model_selection/results/testing/plots/pre_ORIG_CW.svg b/model_building/results/testing/plots/pre_ORIG_CW.svg
similarity index 100%
rename from model_selection/results/testing/plots/pre_ORIG_CW.svg
rename to model_building/results/testing/plots/pre_ORIG_CW.svg
diff --git a/model_selection/results/testing/plots/pre_OVER.svg b/model_building/results/testing/plots/pre_OVER.svg
similarity index 100%
rename from model_selection/results/testing/plots/pre_OVER.svg
rename to model_building/results/testing/plots/pre_OVER.svg
diff --git a/model_selection/results/testing/plots/pre_UNDER.svg b/model_building/results/testing/plots/pre_UNDER.svg
similarity index 100%
rename from model_selection/results/testing/plots/pre_UNDER.svg
rename to model_building/results/testing/plots/pre_UNDER.svg
diff --git a/model_selection/results/testing/testing_tuned_models.xlsx b/model_building/results/testing/testing_tuned_models.xlsx
similarity index 100%
rename from model_selection/results/testing/testing_tuned_models.xlsx
rename to model_building/results/testing/testing_tuned_models.xlsx
diff --git a/model_selection/test_models.py b/model_building/test_models.py
similarity index 100%
rename from model_selection/test_models.py
rename to model_building/test_models.py
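
Review note: both explainability scripts hard-code the same relative path to the fitted models, so this rename had to be applied in two places. One possible follow-up, sketched below under assumptions, is to build the path in a single helper. The names `MODEL_DIR` and `fitted_model_path` are hypothetical and do not appear in the repository; only the directory layout and the `{group}_{method}_{model}.pkl` naming are taken from the diff.

```python
from pathlib import Path

# Hypothetical constant (not in the repository): the fitted-models directory
# as referenced by the explainability scripts after this rename.
MODEL_DIR = Path("..") / "model_building" / "results" / "fitted_models"


def fitted_model_path(group: str, method_name: str, model_name: str,
                      model_dir: Path = MODEL_DIR) -> Path:
    """Return the pickle path for one fitted model.

    Follows the naming visible in the diff: {group}_{method}_{model}.pkl
    """
    return model_dir / f"{group}_{method_name}_{model_name}.pkl"


if __name__ == "__main__":
    # Example arguments matching one of the renamed files (pre_ORIG_CW_RF.pkl).
    print(fitted_model_path("pre", "ORIG_CW", "RF"))
```

With a helper like this, a future directory rename would touch one constant instead of every script that loads a model.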