One of the primary challenges we encountered was a significant class imbalance, with far more patients withdrawing from treatment than staying.
To address this issue, we implemented four training pipelines on both the pre-pandemic and post-pandemic training datasets (a code sketch of the four follows the list):
1. **Using the Original Dataset (ORIG)**: The models were trained on the original datasets.
2. **Class Weight Adjustment (ORIG_CW)**: The models were trained on the original datasets but were penalized more heavily for misclassifying the minority class.
3. **Oversampling (OVER)**: Additional samples were generated for the minority class (patients staying) to balance the dataset.
4. **Undersampling (UNDER)**: Samples from the majority class (patients withdrawing) were reduced to achieve balance.
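
The following is a minimal sketch of how the four pipelines could be set up, assuming a feature matrix `X_train` and a binary label vector `y_train` (1 = withdrew, 0 = stayed). The estimator (logistic regression) and the `fit_pipeline` helper are illustrative assumptions, not the authors' exact implementation; the resamplers come from the `imbalanced-learn` package.

```python
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

def fit_pipeline(X_train, y_train, pipeline):
    """Train a model under one of the four imbalance-handling pipelines.

    NOTE: hypothetical helper for illustration; the paper does not name
    its estimator, so logistic regression stands in here.
    """
    if pipeline == "ORIG":        # 1. original, imbalanced data as-is
        model = LogisticRegression(max_iter=1000)
    elif pipeline == "ORIG_CW":   # 2. heavier penalty on minority-class errors
        model = LogisticRegression(max_iter=1000, class_weight="balanced")
    elif pipeline == "OVER":      # 3. replicate minority-class (staying) samples
        X_train, y_train = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)
        model = LogisticRegression(max_iter=1000)
    elif pipeline == "UNDER":     # 4. drop majority-class (withdrawing) samples
        X_train, y_train = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)
        model = LogisticRegression(max_iter=1000)
    else:
        raise ValueError(f"unknown pipeline: {pipeline}")
    return model.fit(X_train, y_train)
```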
These approaches resulted in multiple training datasets. However, to ensure a fair comparison of model performance across pipelines, we evaluated every model on a common test dataset, irrespective of the training approach followed, as illustrated below.
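
A sketch of that evaluation protocol, reusing the hypothetical `fit_pipeline` helper from above: the train/test split happens once before any resampling, so oversampling and undersampling touch only the training fold, and all four pipelines are scored on the identical held-out set. The F1 score and the stratified 80/20 split are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# One split, made before any resampling; stratify preserves the class ratio
# in the shared test set (X, y assumed loaded from the pre- or post-pandemic data).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Every pipeline trains on its own (possibly resampled) data
# but is scored on the same untouched test fold.
for name in ("ORIG", "ORIG_CW", "OVER", "UNDER"):
    model = fit_pipeline(X_train, y_train, name)
    print(name, f1_score(y_test, model.predict(X_test)))
```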