| Model | Baseline | 1 | 2 | 3 |
|---|---|---|---|---|
| Model type | LASSO | Random Forest | Random Forest | Random Forest |
| Metrics | | | | |
| Unlabeled data | Ignored unlabeled data | Assumed negative, ignored actual negative cases | Assumed negative, ignored actual negative cases | Assumed negative, ignored actual negative cases |
| Imbalanced data | N/A | Downsample, class weight | Downsample, subsample balanced weight | Repeated random subsampling |
| Validation method | 80% / 20% split validation | Nested cross validation | Nested cross validation | Nested cross validation |
| Optimized metric in hyperparameter selection | None | F(beta=10) using 100 random iterations | F(beta=10) × 100 + PU score using 100 random iterations | F(beta=10) × 100 + PU score using 60 random iterations |
| Scores | | | | |
| F(beta=10) score (all data) | 0.32 | 0.71 | 0.71 | 0.72 |
| PU score (all data) | 0.93 | 9.22 | 12.45 | 10.69 |
| Recall (labeled data) | 0.92 | 0.81 | 0.77 | 0.80 |
| Brier score loss (labeled data) | 0.10 | 0.14 | 0.16 | 0.15 |
| Brier score loss (unlabeled data assumed negative) | 0.60 | 0.06 | 0.03 | 0.04 |
| F1 score (labeled data) | 0.90 | 0.86 | 0.84 | 0.86 |
| Precision (labeled data) | 0.88 | 0.91 | 0.93 | 0.91 |
| Probability of unlabeled cases to be labeled as positive | 0.93 | 0.07 | 0.04 | 0.06 |
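
The "Optimized metric in hyperparameter selection" row combines an F-beta score with a PU score. The table does not define the PU score, so the sketch below assumes the common positive-unlabeled criterion recall² / Pr(ŷ = 1); that choice, the function names, and the toy data are illustrative assumptions, not the authors' implementation. Only the weighting F(beta=10) × 100 + PU score is taken from the table.

```python
def fbeta(y_true, y_pred, beta=10.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pu_score(y_true, y_pred):
    """ASSUMED definition: recall^2 / Pr(y_hat = 1).

    Recall is measured on the known positives; Pr(y_hat = 1) is the
    fraction of all cases (labeled and unlabeled) predicted positive.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_pred_pos = sum(1 for p in y_pred if p == 1)
    if n_pos == 0 or n_pred_pos == 0:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = tp / n_pos
    return recall ** 2 / (n_pred_pos / len(y_pred))


def combined_objective(y_true, y_pred, beta=10.0):
    """Weighting from the table: F(beta) x 100 + PU score."""
    return 100.0 * fbeta(y_true, y_pred, beta) + pu_score(y_true, y_pred)


# Toy data (hypothetical): 1 = known positive, 0 = unlabeled, assumed negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
score = combined_objective(y_true, y_pred)
```

Under this assumed definition, a model that flags few unlabeled cases while keeping recall high gets a large PU score. That is consistent with the table, where the random forest models have a much lower probability of labeling unlabeled cases positive and a much higher PU score than the baseline.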