In its scariest “feature,” AI can sometimes hallucinate, presenting false claims as truth. Sometimes those claims are absurd, but other times they can sound quite convincing. We’ve all heard the stories of lawyers submitting court briefs with fabricated citations, and we want to avoid egg on our faces, too. What are some helpful practices to avoid that?
We have to accept AI’s limitations, three of which stand out. First, AI is inherently probabilistic. Because it doesn’t know what’s 100% accurate, it plays the probabilities and will always be wrong some of the time. Much like crowdsourced research, it is not perfect, so it needs our fact-checking.
Second, we have to accept how AI “knows” things. It has no ability to check claims against reality. Instead, it takes in training data and, based on those data, tries to predict how to respond. It’s not a scientist interested in foundational truths. As with any other authority, we scientists have a responsibility to question it and see whether its claims align with the facts.
Finally, for reasons not yet fully understood, hallucinations occur more often the deeper you get into a session. The more you communicate with AI in a given instance, the more likely it is to spit out a counterfactual remark.
But all hope is not lost! AI is actually pretty good at checking its own output. Researchers call this self-evaluation pattern “AI as judge.” If you copy and paste an entire session into a fresh AI instance and ask it to check the transcript for accuracy, it’s quite good at finding the errors. In that sense it works like combining sensitive and specific tests in biomedicine. No single test is perfectly sensitive and specific, but stacking imperfect tests sharply reduces the chance that an error slips through; for example, if each of two independent checks catches nine out of ten errors, only about one in a hundred escapes both. Even then, the result isn’t perfect (highly sensitive and highly specific tests still produce false positives), but together the checks carry real weight. Similar percentages are at play with AI.
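To make this concrete, here is a minimal sketch of the “AI as judge” pattern in Python. It assumes the OpenAI Python SDK and an API key in your environment; the model name, prompt wording, and function name are illustrative assumptions, not a prescription for any particular tool or workflow.

```python
# A minimal sketch of the "AI as judge" pattern, assuming the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY environment variable.
# Model name and prompts are illustrative, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_transcript(transcript: str, model: str = "gpt-4o") -> str:
    """Paste an earlier session into a fresh instance and ask it to flag errors."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a fact-checking reviewer. List any claims in the "
                    "transcript that are unsupported, unverifiable, or likely "
                    "false, and explain why for each."
                ),
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# Example usage: feed in the full text of a previous session.
# print(judge_transcript(open("session_transcript.txt").read()))
```

The key design choice is that the judging happens in a fresh instance with no stake in the original answer, which is what makes it analogous to running a second, independent test.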
A less expensive technique is to ask AI to explain its reasoning (i.e., tell it to “show your work”). This helps us fact-check its references and makes us better scientists and thinkers ourselves. The bot can lay out the evidence behind a given conclusion; if that evidence seems dubious, the conclusion should be rejected.
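In the same spirit, a “show your work” request can be as simple as appending one more question to the existing conversation. The sketch below assumes the same OpenAI SDK as above; the follow-up wording is an illustrative assumption.

```python
# A minimal sketch of a "show your work" follow-up, assuming the OpenAI Python SDK
# and an OPENAI_API_KEY environment variable. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

def ask_to_show_work(messages: list[dict], model: str = "gpt-4o") -> str:
    """Append a follow-up asking the model to expose its reasoning and sources."""
    follow_up = {
        "role": "user",
        "content": (
            "Show your work: walk through the reasoning behind your last answer "
            "step by step, and list the specific sources or evidence you relied "
            "on so I can verify them."
        ),
    }
    response = client.chat.completions.create(
        model=model,
        messages=messages + [follow_up],  # keep the prior conversation as context
    )
    return response.choices[0].message.content
```

The point of the follow-up is not that the model’s stated reasoning is trustworthy on its own, but that it surfaces references and claims we can check for ourselves.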
Even so, the danger remains of accepting evidence that “sounds right” even when it isn’t. That’s why we’re responsible for our own work. It’s a reminder of why AI will never replace good scientific work grounded in understanding the universe’s truths. AI cannot reality-test claims; only we can. At present, it merely accepts whatever it has read in its training data, and we all know how many falsehoods exist in this large, diverse world.
Like human mistakes, hallucinations appear to be here to stay. Our minds cannot go to sleep when an AI bot points us toward some conclusion. Nonetheless, we can take steps to minimize those mistakes, often using AI itself. The responsibility for fact-checking remains ours, though, especially as scientists whom the public relies on to grasp the truth. No AI, under current models, can earn that trust.
