Performance Metrics
Recall is 3.2 percentage points below target. In a fraud use case, that gap means missed fraud cases and avoidable exposure.
The fraud detection model performs adequately on average, but it shows measurable drift, weak recall on high-value fraud cases, and subgroup recall gaps that should be fixed before scaling.
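The recall shortfall and the subgroup gaps described above can be checked with a calculation along these lines. This is a minimal sketch, not the report's own method: the function names, the segment labels, and the toy data are illustrative assumptions, and the 0.80 target is a placeholder because the report does not state the target value.

```python
from collections import defaultdict


def recall(y_true, y_pred):
    """Recall = true positives / all actual positives (None if there are no positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else None


def recall_by_segment(y_true, y_pred, segments):
    """Compute recall separately for each segment label."""
    grouped = defaultdict(lambda: ([], []))
    for t, p, s in zip(y_true, y_pred, segments):
        grouped[s][0].append(t)
        grouped[s][1].append(p)
    return {s: recall(ts, ps) for s, (ts, ps) in grouped.items()}


def gap_in_points(value, target):
    """Shortfall against a target in percentage points (positive means below target)."""
    return round((target - value) * 100, 1)


# Toy data for illustration only; these are not figures from the report.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]
segments = ["digital_goods", "digital_goods", "travel", "travel",
            "travel", "gaming", "gaming", "gaming"]

TARGET_RECALL = 0.80  # hypothetical target, assumed for the example

overall = recall(y_true, y_pred)
per_segment = recall_by_segment(y_true, y_pred, segments)
print("overall recall:", overall, "gap (points):", gap_in_points(overall, TARGET_RECALL))
for name, value in per_segment.items():
    print(name, "recall:", value, "gap (points):", gap_in_points(value, TARGET_RECALL))
```

Reporting the gap per segment rather than only in aggregate is what surfaces cases where the average looks acceptable while specific segments fall short.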
ML Health Check: 71/100 (Needs Attention)
Performance: 78/100. AUC is strong, but recall is below target.
Drift: 69/100. Digital goods traffic shifted sharply.
Bias & fairness: 63/100. Recall gap on digital goods and gaming.
Production readiness: 70/100. Monitoring and retention gaps remain.
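A traffic shift like the one flagged under Drift is often quantified by comparing a feature's distribution in the current window against a reference window. The report does not say which drift statistic it uses; the sketch below uses the Population Stability Index (PSI) as one common choice. The function name, bin edges, and sample values are assumptions for illustration.

```python
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index of `current` against `reference` over fixed bin edges.

    Assumes both samples are non-empty and fall within the edges;
    values outside [edges[0], edges[-1]] are ignored.
    """
    n_bins = len(edges) - 1

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            for i in range(n_bins):
                is_last = i == n_bins - 1
                # Bins are half-open [lo, hi); the last bin also includes its upper edge.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / total, eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy values for illustration only; these are not figures from the report.
reference_window = [0.1, 0.2, 0.3, 0.6, 0.7]
current_window = [0.6, 0.7, 0.8, 0.9, 0.2]
print("PSI:", psi(reference_window, current_window, edges=[0.0, 0.5, 1.0]))
```

PSI is zero when the two binned distributions match and grows as they diverge. Any alerting threshold would need to be set for the specific feature and traffic volume.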
Current traffic differs from the reference window. The model is also overconfident at predicted scores above 0.60, which can create noisy escalations for review teams.
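Overconfidence in a score range means the average predicted score there is higher than the observed fraud rate. A minimal sketch of that check follows, assuming scores are predicted fraud probabilities and labels are confirmed outcomes (1 for fraud, 0 otherwise). The function name and toy data are hypothetical; only the 0.60 cutoff comes from the text above.

```python
def calibration_gap_above(scores, labels, threshold=0.60):
    """Mean predicted score minus observed positive rate for cases scored at or above threshold.

    A positive result indicates overconfidence in that score range.
    Returns None when no case reaches the threshold.
    """
    selected = [(s, y) for s, y in zip(scores, labels) if s >= threshold]
    if not selected:
        return None
    mean_score = sum(s for s, _ in selected) / len(selected)
    observed_rate = sum(y for _, y in selected) / len(selected)
    return mean_score - observed_rate


# Toy data for illustration only; these are not figures from the report.
scores = [0.9, 0.8, 0.7, 0.65, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0]
print("calibration gap above 0.60:", calibration_gap_above(scores, labels))
```

A persistently positive gap suggests that cases escalated at high scores are confirmed as fraud less often than the scores imply, which matches the noisy-escalation concern.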
Logging and labels are usable, but monitoring and feature snapshot retention are too weak for reliable incident investigation.