The metrics that confused you for years — understood in one scroll.
A confusion matrix has exactly four boxes. No fifth option. No grey area. Using a stroke detection model as our example, every single patient visit ends up in one of them.
The trick to decode any term instantly: "True/False" tells you whether the model was right or wrong. "Positive/Negative" tells you what the model predicted. Read it backwards — False Negative → model said Negative → but it lied → patient was actually sick.
- **True Positive (TP):** Stroke patient flagged correctly. The outcome every model is built to maximise.
- **False Negative (FN):** Stroke patient sent home as "healthy." In medical AI, this is the worst possible failure.
- **False Positive (FP):** Healthy patient flagged as sick. Costly and stressful, but not fatal.
- **True Negative (TN):** Healthy patient correctly cleared. No unnecessary panic or wasted resources.
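The four boxes fall straight out of pairing each true label with the model's prediction. A minimal sketch, assuming the usual encoding of 1 = stroke (positive) and 0 = healthy (negative); the function name is my own:

```python
# Count the four confusion-matrix boxes from paired labels and predictions.
# Encoding assumption: 1 = stroke (positive), 0 = healthy (negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

# Four visits, one landing in each box:
print(confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))  # (1, 1, 1, 1)
```

Note the "read it backwards" trick in the code itself: the second letter (`p == 1` or `p == 0`) is what the model said; the first letter is whether that matched reality.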
Adjust the sliders and watch precision, recall, F1, and accuracy react in real time. Hit the scenario buttons to load real-world examples. Try "The Lazy Model" — 95% accuracy, zero sick people caught.
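"The Lazy Model" scenario can be reproduced in a few lines. The patient numbers below are illustrative, chosen to match the 95% figure: a model that never flags anyone, scored on data where only 5% of visits are actual strokes.

```python
# "The Lazy Model": always predicts healthy (0) on imbalanced data.
# Illustrative numbers: 1000 visits, 50 of them actual strokes (5%).
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000  # the model never flags anyone

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(f"accuracy = {accuracy:.0%}")  # 95% -- looks great on a slide
print(f"recall   = {recall:.0%}")    # 0%  -- every stroke patient missed
```

Accuracy only counts agreements, so predicting the majority class on a 95/5 split scores 95% while catching zero sick people.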
"Which mistake is more costly — a false alarm, or a miss?"
False alarm is costly → optimize Precision. You'd rather miss a few real cases than cause unnecessary harm to healthy people. Spam filters, loan approvals, criminal sentencing — a false alarm here ruins lives or wastes huge resources.
A miss is costly → optimize Recall. You'd rather send a hundred healthy people for a second check than let one sick person walk out. Cancer screening, stroke detection, fraud systems — a miss here is catastrophic.
F1 is your metric when both mistakes matter and your classes are imbalanced. It's the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall): if either is zero, F1 is zero. No lazy model can hide behind it.
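All three metrics reduce to arithmetic on the box counts. A sketch (the function name is my own) showing why the harmonic mean is unforgiving:

```python
# Precision, recall, and their harmonic mean (F1) from raw box counts.
def metrics(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0  # of those flagged, how many were sick
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the sick, how many were flagged
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A lazy model: zero true positives, so recall is 0 and F1 collapses to 0.
print(metrics(tp=0, fp=0, fn=50))    # (0.0, 0.0, 0.0)

# A genuinely balanced model: precision 0.8, recall 0.8, F1 also ~0.8.
print(metrics(tp=40, fp=10, fn=10))
```

An arithmetic mean would have let a model with precision 1.0 and recall 0.0 score 0.5; the harmonic mean gives it the zero it deserves.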
Accuracy is for balanced datasets and textbook problems. On real-world imbalanced data it is a comfortable lie. Always pair it with recall before trusting it.
Next time someone shares model metrics in a paper or a meeting, run them through this table before nodding along.
| What you see | What it actually means |
|---|---|
| High accuracy, low recall | Lazy model hiding behind imbalanced data. Do not trust it in production. |
| High precision, low recall | Very cautious. Misses many real cases. Fine for spam, dangerous for medicine. |
| High recall, low precision | Catches everything, too many false alarms. Good starting point for safety-critical systems. |
| High F1 | Actually balanced and genuinely working. The number to trust on imbalanced datasets. |
| F1 = 0 | Either precision or recall is zero. The model is broken, full stop. |