
When TenderB makes things up — what hallucinations are, why they happen, and how to work with AI without getting burned

Why every LLM (including TenderB, Claude, ChatGPT, Gemini) sometimes confidently states wrong facts, what you should and shouldn't rely on it for, and the working rule we recommend: 'use AI the way you'd use an intern'.

Written by Eden Noelle

Sometimes TenderB will state a fact with full confidence — an address, a name, a capacity figure, a date — and that fact turns out to be wrong. Sometimes it stays wrong even after you correct it. This is called a hallucination, and it happens to every large language model: ours, ChatGPT, Gemini, all of them. It is not a TenderB-specific bug, and it cannot be fully engineered away. What you can do is understand why it happens and use AI in a way that catches these errors before they end up in your tender.

The honest story in three layers

Three properties of every LLM — including ours — explain why hallucinations occur:

1. An LLM is a text predictor, not a database. It fills in plausible words based on patterns in its training data. When the training data strongly associates "Aldi" with "Groningen", asking for the address of the Aldi distribution centre (DC) pulls a plausible-sounding address out of those associations, even when that address appears nowhere in your stakeholder list. The model does not know what it does not know.

2. A confident tone says nothing about correctness. A human who is unsure sounds different from a human who is certain. With an LLM, you cannot hear that difference: a hallucination has exactly the same tone as a correct answer. This is the most dangerous property, because it disables your natural radar for unreliability.

3. LLMs are sycophantic. If you say with conviction "no, Rembrandtlaan 2", the model tends to agree. If you say "no, Rembrandtlaan 5", it tends to agree with that too. That is not verification; it is the model deferring to your tone. When the model "confirms" your correction, it is not checking anything, it is going along.
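
The first property can be made concrete with a toy example. The Python sketch below is a bigram word predictor, which is vastly simpler than a real LLM and is not how TenderB works. The corpus, the function names `train_bigrams` and `complete`, and the prompt are all invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def complete(follows, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Invented toy corpus: "in" is followed by "Groningen" far more often than anything else.
corpus = [
    "Aldi opens store in Groningen",
    "Aldi expands in Groningen",
    "Aldi hires staff in Groningen",
    "Lidl opens store in Zwolle",
]
model = train_bigrams(corpus)
completion = complete(model, "The depot is in")
print(completion)
```

The toy model has never seen the word "depot", yet it completes the sentence with "Groningen" because that is the most frequent word after "in" in its corpus. Nothing in the procedure checks whether the resulting statement is true.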

What this means for the way you use TenderB

AI is an excellent junior colleague who never says "I don't know". That changes how you should deploy it.

Strong at:

  • Structure and outline

  • Language, tone, rewriting, summarising

  • Synthesising what is in a document you provided

  • Generating ideas and first drafts

  • Compliance-mapping a draft against a known evaluation criterion

Weak at:

  • Factual claims that are not explicitly in the source — addresses, names, amounts, dates, capacity figures, contact persons, specifications

  • Anything that has to be verifiable in a courtroom

The practical rule we give every customer: if it is a fact that has to be verifiable in a courtroom, verify it yourself against the source. AI may look it up, summarise it, put it in your house style — but the final check is yours.

What TenderB does about this (and what we don't pretend to do)

We do not promise that we have "solved" hallucinations. Nobody has. What we do is design the workspace so that the human is in the loop at the right moments:

  • Retrieval over your own documents. The model is primarily pointed at your project files and knowledge base, not at its training data. This drastically reduces hallucinations on project-specific facts — but does not eliminate them, especially when a project file is silent on a question and the model has to "fill in".

  • Source attribution per claim (in development). For each sentence, you should be able to see which source it came from, and therefore whether it came from any source at all.

  • Specialised models for different roles (analysis vs. writing) so the division of labour is clearer.

  • Incident escalation. Conversations like the one that triggered this article are routed to our team. We look at why the model did not say "Aldi is not in your stakeholder list" and use that to improve future safeguards.

We will keep narrowing the gap. We will not pretend the gap is zero.
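
To give a feel for what per-sentence source attribution could look like, here is a minimal Python sketch. It is not TenderB's implementation: it assumes a crude word-overlap score, a hypothetical `attribute` function, an arbitrary threshold of 0.6, and invented sample data. A production system would more likely rely on proper retrieval, and a high overlap still does not prove that the source supports the claim.

```python
import re

def tokens(text):
    """Lower-case the text and return its set of word and number tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def attribute(answer, sources, threshold=0.6):
    """Pair each sentence with its best-matching source, or None if no source fits."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokens(sentence)
        best_name, best_score = None, 0.0
        for name, text in sources.items():
            # Share of the sentence's tokens that also occur in this source.
            score = len(words & tokens(text)) / len(words) if words else 0.0
            if score > best_score:
                best_name, best_score = name, score
        results.append((sentence, best_name if best_score >= threshold else None))
    return results

# Invented sample data for illustration only.
sources = {
    "stakeholders.txt": "The municipality of Groningen is the contracting authority. "
                        "The site visit is planned for March.",
}
answer = ("The contracting authority is the municipality of Groningen. "
          "The Aldi distribution centre is at Rembrandtlaan 2.")

attributed = attribute(answer, sources)
for sentence, source in attributed:
    print(source or "NO SOURCE FOUND", "|", sentence)
```

In this sample, the first sentence is matched to `stakeholders.txt`, while the Aldi sentence is flagged as having no source. That flag is the cue for a human to check the claim before it goes any further.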

The mindset: use AI the way you use an intern

The framing we use in workshops: you use AI the way you use an intern — with enthusiasm for the work it delivers, and with judgment for what you check before it goes out the door.

The Aldi-address mistake is one an intern could make. The difference is that an intern, after being corrected three times, would blush and stop guessing. An LLM stays polite and keeps guessing. You need to know that — and once you do, you can work with it productively.

Practical checklist

Before any AI-written passage leaves your hands and goes into a submission, run this short check:

  1. Highlight every concrete fact in the passage: addresses, names, amounts, dates, capacities, contact persons, certification numbers, regulatory references.

  2. For each one, ask: "Is this fact in one of the source documents I gave TenderB?" If yes, open the source and verify it character by character. If no, verify against a primary source (the company's own website, the KvK register of the Dutch Chamber of Commerce, the official publication).

  3. Treat AI agreement as zero evidence. "Yes, you're right" is not verification. It is the model deferring to your tone.

  4. Do not stack corrections. If TenderB has been wrong about the same fact twice in a conversation, do not try a third correction. Open the source yourself, paste the correct fact in, and tell TenderB to use only that.

  5. Save a copy of the source quote next to the claim. When you build a habit of "claim + source paste-in", hallucinations have nowhere to hide.
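
Steps 1 and 2 can be partly supported by a small script. The Python sketch below is a rough first pass under stated assumptions: the regular expression only recognises a few fact shapes (English-style dates, euro amounts, Dutch-style street addresses, bare numbers), the `check_facts` helper is hypothetical, and the sample texts are invented. It will miss names and many other facts, so it supplements the manual check and does not replace it.

```python
import re

# Alternatives are ordered from most to least specific, so a date or an
# address is matched as a whole before the bare-number fallback applies.
FACT_PATTERN = re.compile(
    r"\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}"   # dates such as 12 March 2025
    r"|€ ?\d+(?:[.,]\d+)*"                          # euro amounts
    r"|[A-Z][a-z]+(?:laan|straat|weg|plein) \d+"    # Dutch-style street addresses
    r"|\d+(?:[.,]\d+)*"                             # any other number
)

def check_facts(passage, source_text):
    """Return (fact, found_verbatim_in_source) for every match in the passage."""
    return [(m.group(), m.group() in source_text)
            for m in FACT_PATTERN.finditer(passage)]

# Invented sample texts for illustration only.
source = ("Delivery address: Rembrandtlaan 5, Groningen. "
          "Contract value: € 1.250.000. Deadline: 12 March 2025.")
passage = ("Deliveries go to Rembrandtlaan 2 before 12 March 2025, "
           "for a contract value of € 1.250.000.")

report = check_facts(passage, source)
for fact, found in report:
    print("OK     " if found else "VERIFY ", fact)
```

Here the address is flagged because the source says "Rembrandtlaan 5", not "Rembrandtlaan 2". A plain substring match is deliberately strict but still imperfect: "Rembrandtlaan 2" would also be found inside "Rembrandtlaan 25", so anything marked OK still deserves a look at the source itself.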
