How to Tell a Good AI Answer from a Confidently Wrong One

Q: What are the warning signs of an AI hallucination in brewing advice?

Suspicious precision without a source ('exactly 11.4% alpha'), perfect confidence on lot-dependent facts, invented citations or product names, and answers that never say 'it depends' on questions that genuinely depend — on your water, your yeast lot, your system.

Short answer: a language model states a wrong hop alpha-acid figure in exactly the same confident tone as a right one — confidence carries zero information. The skill that protects your beer is claim-sorting: split any AI answer into reasoning (auditable by you, usually trustworthy), general process facts (checkable in a text, usually fine) and specific numbers from memory (alpha acids, attenuation, salt grams — the highest-risk category, always verified against a COA, a calculator or the lab before anything touches the kettle). This is the final post in the series, and the one that makes the other four safe to use.

Confidence is not evidence: audit the reasoning, spot-check the textbook facts, and always take the specific numbers to the COA or the lab.

Why the wrongness is invisible

A model doesn’t know when it’s wrong, so it can’t warn you. It generates the most plausible continuation of your conversation — and “Citra typically runs 11–13% alpha” is exactly as plausible-sounding as a fabricated “Citra typically runs 8–9%”. The duff figure arrives wrapped in the same fluent, helpful prose. I’ve written before about where AI burned me, and the pattern was never a wild claim — it was a small, specific, checkable number stated smoothly enough that I didn’t check it.

For a new brewer this is doubly dangerous, because the beginner lacks the background “that doesn’t sound right” alarm that twenty years in a brewhouse installs. The fix is not better instincts — it’s a sorting procedure that doesn’t need them.

Sort every answer into three bins

Bin 1 — Reasoning. “If your dry-hop charge is heavy and the beer hasn’t reached terminal gravity, enzymes from the hops can restart attenuation.” You can audit this yourself: each step either follows or it doesn’t. Reasoning is the safest thing a model produces, and conveniently it’s also the most useful — it’s the why that textbooks compress away.

Bin 2 — General process facts. “Diacetyl is reabsorbed by yeast during a warm rest at the end of fermentation.” Stable, textbook-level, checkable in any IBD-grade reference in two minutes. Spot-check these early in your use of a tool; once a model proves reliable at this level for your topics, provisional trust is reasonable.

Bin 3 — Specific numbers from memory. Alpha acids for a named hop, attenuation for a named strain, grams of gypsum for your water, an IBU contribution. This bin is always verified, no exceptions — not because the model is usually wrong, but because these facts are lot-dependent and system-dependent: alpha varies by crop year, attenuation by pitch and wort, utilisation by your kettle. The right source is never memory — yours or the model’s — it’s the certificate of analysis, the supplier sheet, a brewing calculator, or your own measurements. This is the same discipline as pointing the model at the COA instead of trusting its recall.

The tells worth learning

Some fabrications carry signatures. Suspicious precision: “11.4% alpha” with no source is more suspect than “typically 11–13%”. Missing ‘it depends’: real brewing answers hedge on water, yeast condition and system losses; an answer that never hedges on a question that genuinely depends is performing certainty. Invented specifics: named studies, products or suppliers you can’t find in ten seconds of searching. Drift under challenge: ask “are you sure — what’s that based on?” — a grounded answer holds its shape; a fabricated one often mutates or capitulates instantly. (Note the trap: models also capitulate when they were right, just to be agreeable — so a retraction is a flag to verify, not proof of error.)

And one structural defence beats all the tells: give the model the document. A model reading your actual COA, your actual water report, your actual gravity log has far less room to invent than a model answering from recall. Grounding beats vigilance.

Where this breaks

Verification has costs, and pretending otherwise would break this blog’s house rule. You can’t check everything — the procedure above is triage, and triage means accepting small risk on Bin 2 to afford rigour on Bin 3. Trusted sources disagree — texts differ on diacetyl rest temperatures; verification sometimes upgrades “the model said X” to “sources say X or Y”, which is honest but not tidy. Your own measurements can be the wrong ones — an uncalibrated pH meter out-lies any chatbot. And the failure mode nobody warns beginners about: verification fatigue. Three months in, the checks feel needless because the model has been right so often — and that’s precisely when the wrong alpha figure sails into a 500L batch. The discipline must outlive the novelty.

The bottom line

Don’t ask “can I trust AI?” — ask “which claim in this answer is load-bearing, and which bin is it in?” Audit the reasoning yourself, spot-check the textbook facts, and always take the specific numbers to the COA, the calculator or the lab. Do that, and everything else in this series compounds safely: the question literacy, the question-first loop, the data map and the prompt patterns all rest on this floor. AI is a fast, well-read, occasionally wrong colleague. Brewers have always known how to work with one of those: listen carefully, then measure.

Frequently asked questions

How do I know if an AI brewing answer is wrong? Sort the answer into claims. Specific numbers stated from memory (alpha acids, attenuation ranges, salt additions) are the highest-risk category — verify them against a COA, a calculator or a trusted text. Reasoning and checklists are lower risk because you can follow the logic yourself.

What are the warning signs of an AI hallucination in brewing advice? Suspicious precision without a source (“exactly 11.4% alpha”), perfect confidence on lot-dependent facts, invented citations or product names, and answers that never say “it depends” on questions that genuinely depend — on your water, your yeast lot, your system.

Should brewers stop using AI because it hallucinates? No — they should change what they use it for. Treat it as a fast colleague whose reasoning you can audit, not a reference whose facts you can quote. Plans, explanations and checklists are its strength; lot-specific numbers belong to the COA and the lab.

Part of the Brewing Science track.