The metrics that confused you for years — understood in one scroll.
A confusion matrix has exactly four boxes. No fifth option. No grey area. Using a stroke detection model as our example, every single patient visit ends up in one of them.
The trick to decode any term instantly: "True/False" tells you whether the model was right or wrong. "Positive/Negative" tells you what the model predicted. Read it backwards — False Negative → model said Negative → but it lied → patient was actually sick.
- **True Positive (TP):** Stroke patient flagged correctly. The outcome every model is built to maximise.
- **False Negative (FN):** Stroke patient sent home as "healthy." In medical AI, this is the worst possible failure.
- **False Positive (FP):** Healthy patient flagged as sick. Costly and stressful, but not fatal.
- **True Negative (TN):** Healthy patient correctly cleared. No unnecessary panic or wasted resources.
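The four boxes fall straight out of pairing each true label with the model's prediction. A minimal sketch, assuming the usual encoding of 1 = stroke (positive) and 0 = healthy (negative); the function name is my own:

```python
# Count the four confusion-matrix boxes from paired labels and predictions.
# Encoding assumption: 1 = stroke (positive), 0 = healthy (negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

# Four visits, one landing in each box:
print(confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))  # (1, 1, 1, 1)
```

Note the "read it backwards" trick in the code itself: the second letter (`p == 1` or `p == 0`) is what the model said; the first letter is whether that matched reality.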
Adjust the sliders and watch precision, recall, F1, and accuracy react in real time. Hit the scenario buttons to load real-world examples. Try "The Lazy Model" — 95% accuracy, zero sick people caught.
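"The Lazy Model" scenario can be reproduced in a few lines. The patient numbers below are illustrative, chosen to match the 95% figure: a model that never flags anyone, scored on data where only 5% of visits are actual strokes.

```python
# "The Lazy Model": always predicts healthy (0) on imbalanced data.
# Illustrative numbers: 1000 visits, 50 of them actual strokes (5%).
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000  # the model never flags anyone

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(f"accuracy = {accuracy:.0%}")  # 95% -- looks great on a slide
print(f"recall   = {recall:.0%}")    # 0%  -- every stroke patient missed
```

Accuracy only counts agreements, so predicting the majority class on a 95/5 split scores 95% while catching zero sick people.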
"Which mistake is more costly — a false alarm, or a miss?"
False alarm is costly → optimize Precision. You'd rather miss a few real cases than cause unnecessary harm to healthy people. Spam filters, loan approvals, criminal sentencing — a false alarm here ruins lives or wastes huge resources.
A miss is costly → optimize Recall. You'd rather send a hundred healthy people for a second check than let one sick person walk out. Cancer screening, stroke detection, fraud systems — a miss here is catastrophic.
F1 is your metric when both mistakes matter and your classes are imbalanced. It's the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall): if either is zero, F1 is zero. No lazy model can hide behind it.
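All three metrics reduce to arithmetic on the box counts. A sketch (the function name is my own) showing why the harmonic mean is unforgiving:

```python
# Precision, recall, and their harmonic mean (F1) from raw box counts.
def metrics(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0  # of those flagged, how many were sick
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the sick, how many were flagged
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A lazy model: zero true positives, so recall is 0 and F1 collapses to 0.
print(metrics(tp=0, fp=0, fn=50))    # (0.0, 0.0, 0.0)

# A genuinely balanced model: precision 0.8, recall 0.8, F1 also ~0.8.
print(metrics(tp=40, fp=10, fn=10))
```

An arithmetic mean would have let a model with precision 1.0 and recall 0.0 score 0.5; the harmonic mean gives it the zero it deserves.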
Accuracy is for balanced datasets and textbook problems. On real-world imbalanced data it is a comfortable lie. Always pair it with recall before trusting it.
Next time someone shares model metrics in a paper or a meeting, run them through this table before nodding along.
| What you see | What it actually means |
|---|---|
| High accuracy, low recall | Lazy model hiding behind imbalanced data. Do not trust it in production. |
| High precision, low recall | Very cautious. Misses many real cases. Fine for spam, dangerous for medicine. |
| High recall, low precision | Catches everything, too many false alarms. Good starting point for safety-critical systems. |
| High F1 | Actually balanced and genuinely working. The number to trust on imbalanced datasets. |
| F1 = 0 | Either precision or recall is zero. The model is broken, full stop. |