An Interactive Explainer · Machine Learning Metrics

Precision.
Recall.
F1.

The metrics that confused you for years — understood in one scroll.

01 — The Foundation

Every prediction falls into exactly one of four buckets.

No fifth option. No grey area. We'll use a stroke detection model as our running example — every single patient visit ends up in one of these boxes.

The trick to decode any term instantly: "True/False" tells you whether the model was right or wrong. "Positive/Negative" tells you what the model predicted. Read it backwards — False Negative → the model said Negative → "False" means it was wrong → the patient was actually sick.

Actually Sick · Predicted Sick
TP · True Positive
The Good Catch

Stroke patient flagged correctly. The outcome every model is built to maximize.

Actually Sick · Predicted Healthy
💀 FN · False Negative
The Silent Killer

Stroke patient sent home as "healthy." In medical AI, this is the worst possible failure.

Actually Healthy · Predicted Sick
⚠️ FP · False Positive
The False Alarm

Healthy patient flagged as sick. Costly and stressful — but not fatal.

Actually Healthy · Predicted Healthy
TN · True Negative
Correct Dismissal

Healthy patient correctly cleared. No unnecessary panic or wasted resources.
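
If code reads more naturally to you than grids, the same idea fits in a few lines. A minimal sketch: the bucket helper and the toy visits list are invented for illustration, not taken from any real system.

```python
# Tally the four buckets. Each pair in `visits` is
# (model's prediction, ground truth) for one patient visit.
from collections import Counter

def bucket(predicted_sick: bool, actually_sick: bool) -> str:
    """Name the confusion-matrix cell for a single prediction."""
    if predicted_sick and actually_sick:
        return "TP"  # the good catch
    if predicted_sick and not actually_sick:
        return "FP"  # the false alarm
    if not predicted_sick and actually_sick:
        return "FN"  # the silent killer
    return "TN"      # correct dismissal

visits = [(True, True), (False, True), (True, False), (False, False)]
print(Counter(bucket(p, a) for p, a in visits))
# Counter({'TP': 1, 'FN': 1, 'FP': 1, 'TN': 1})
```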

02 — The Playground

Drag. Break it.
Watch it click.

Adjust the sliders and watch precision, recall, F1, and accuracy react in real time. Hit the scenario buttons to load real-world examples. Try "The Lazy Model" — 95% accuracy, zero sick people caught.

Live Confusion Matrix — drag anything

TP — True Positive: 40 (sick patients correctly caught)
TN — True Negative: 50 (healthy patients correctly cleared)
FP — False Positive: 10 (healthy patients wrongly flagged)
FN — False Negative: 8 (sick patients sent home — the danger)

                    Predicted Sick     Predicted Healthy
Actually Sick       40 (True Pos)      8 (False Neg)
Actually Healthy    10 (False Pos)     50 (True Neg)
Precision 80% · Recall 83% · F1 Score 82% · Accuracy 83%
Click any cell or metric above, or try the scenario buttons — load "The Lazy Model" and watch recall hit zero while accuracy stays at 95%.
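
If you want the playground's arithmetic on your own machine, it's four one-liners. A minimal sketch, assuming the standard definitions and guarding the zero-denominator cases; the counts match the default sliders above.

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the four headline metrics from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of those flagged sick, how many were?
    recall = tp / (tp + fn) if tp + fn else 0.0     # of those actually sick, how many caught?
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

print(metrics(tp=40, tn=50, fp=10, fn=8))
# precision 0.80, recall ≈ 0.83, f1 ≈ 0.82, accuracy ≈ 0.83
```
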
03 — The Mental Model

One question cuts through every situation.

"Which mistake is more costly — a false alarm, or a miss?"

False alarm is costly → optimize Precision. You'd rather miss a few real cases than cause unnecessary harm to healthy people. Spam filters, loan approvals, criminal sentencing — a false alarm here ruins lives or wastes huge resources.

A miss is costly → optimize Recall. You'd rather send a hundred healthy people for a second check than let one sick person walk out. Cancer screening, stroke detection, fraud systems — a miss here is catastrophic.
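
Both rules usually come down to one knob: the model's decision threshold. A small sketch with made-up risk scores shows the trade; raise the threshold and precision climbs while recall falls, lower it and the trade flips.

```python
# Hypothetical stroke-risk scores and ground-truth labels, invented
# for illustration. 1 = patient actually had a stroke.
scores = [0.95, 0.90, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0]

def precision_recall(threshold: float) -> tuple[float, float]:
    """Treat every score >= threshold as a 'sick' prediction."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

for t in (0.8, 0.5, 0.1):
    p, r = precision_recall(t)
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
# threshold 0.8: precision 1.00, recall 0.67  (cautious: precise, misses cases)
# threshold 0.5: precision 0.67, recall 0.67
# threshold 0.1: precision 0.50, recall 1.00  (aggressive: catches everything)
```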

F1 is your metric when both matter and your classes are imbalanced. It's the harmonic mean of precision and recall — F1 = 2 · (precision · recall) / (precision + recall) — so if either is zero, F1 is zero. No lazy model can hide behind it.
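
A quick numeric sketch of why the harmonic mean matters; the precision and recall values here are made up to show the collapse.

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

p, r = 0.95, 0.05        # very precise, catches almost nothing
print((p + r) / 2)       # arithmetic mean: 0.5     (looks respectable)
print(f1(p, r))          # harmonic mean:   ≈ 0.095 (reveals the failure)
print(f1(0.95, 0.0))     # either side at zero -> F1 is exactly 0.0
```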

Accuracy is for balanced datasets and textbook problems. On real-world imbalanced data it is a comfortable lie. Always pair it with recall before trusting it.
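
Here is "The Lazy Model" in a few lines. The 1,000-visit, 5%-stroke population is an assumption chosen to match the 95% figure quoted above.

```python
# A model that predicts "healthy" for every one of 1,000 visits,
# 50 of which are real strokes (an assumed 5% prevalence).
tp, fn = 0, 50    # every stroke patient is missed
fp, tn = 0, 950   # every healthy patient is cleared

print((tp + tn) / (tp + tn + fp + fn))  # accuracy: 0.95 (looks great)
print(tp / (tp + fn))                   # recall:   0.0  (catches nobody)
```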

04 — The Cheat Sheet

What the numbers actually say.

Next time someone shares model metrics in a paper or a meeting, run them through this table before nodding along.

What you see → What it actually means

High accuracy, low recall → Lazy model hiding behind imbalanced data. Do not trust it in production.
High precision, low recall → Very cautious. Misses many real cases. Fine for spam, dangerous for medicine.
High recall, low precision → Catches everything, too many false alarms. Good starting point for safety-critical systems.
High F1 → Actually balanced and genuinely working. The number to trust on imbalanced datasets.
F1 = 0 → Either precision or recall is zero. The model is broken, full stop.