diff --git a/README.md b/README.md
index 13abc9501a150ae351e31b5c350a647c88fc46e6..94742f4b3c714d40f4e12f6a971faab487946332 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,15 @@ ...
 ## Dealing with Class Imbalance
+One of the primary challenges we encountered was a significant class imbalance: more patients withdrew from treatment than stayed.
+
+To address this, we implemented four training pipelines on both the pre-pandemic and post-pandemic training datasets:
+1. **Using the Original Dataset**: The models were trained on the original, imbalanced datasets.
+2. **Class Weight Adjustment**: The models were trained on the original datasets but were penalized more heavily for misclassifying the minority class.
+3. **Oversampling**: Additional samples were generated for the minority class (patients staying) to balance the dataset.
+4. **Undersampling**: Samples from the majority class (patients withdrawing) were reduced to achieve balance.
+
+These approaches produced multiple training datasets. To ensure a fair comparison of model performance across pipelines, we evaluated every model on a common test dataset, regardless of the training approach.
 ## Repository
 This repository is organized as follows:
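
The class-weight, oversampling, and undersampling pipelines described in the added "Dealing with Class Imbalance" section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's implementation: the helper names, the toy labels, and the "balanced" weighting heuristic are hypothetical, and plain random duplication stands in for whatever oversampling method the project actually uses.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: a common "balanced" heuristic; the project may weight differently.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(pool, k=target - cnt)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        kept = rng.sample(pool, target)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([cls] * len(kept))
    return out_rows, out_labels


# Toy training split (hypothetical): 8 patients withdrawing, 2 staying.
rows = list(range(10))
labels = ["withdraw"] * 8 + ["stay"] * 2

weights = balanced_class_weights(labels)
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(weights)
print(Counter(over_labels))
print(Counter(under_labels))
```

In this sketch, resampling is applied to the training split only; the common test dataset is left untouched so that all four pipelines are scored on the same data.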