8 AI hallucination examples
Although AI can write, paraphrase, translate, recommend, and respond extremely quickly, fast does not necessarily mean accurate. A central challenge in building generative AI models is the risk of hallucinations, in which the system produces content that is untrue, misleading, or made up entirely.
The consequences of AI hallucinations can be serious. Reported cases include false legal references, incorrect medical transcriptions, fictitious books, flawed customer support advice, and embarrassing incidents for well-known corporations. Organizations relying on AI systems in customer service, medicine, search, law, or reporting face substantial risks when hallucinations occur.
In this article, we review examples of AI hallucinations across various domains, identify their causes and potential consequences, and look at how to minimize the probability of such problems.
What are AI hallucinations, and why do they happen?
AI hallucinations are instances in which a language model, such as GPT, Claude, Google Gemini, or another generative AI system, generates information that is erroneous, unsupported, or entirely fictional. The answer can appear well formulated and credible, yet the information within it may be misleading.
Hallucinations take many forms. A generative AI system may invent citations, misstate corporate policies, fabricate product features, generate fictitious medical information, or produce an answer that contradicts the input it was given. These responses can be highly credible and believable, which makes them difficult for users to spot.
A significant factor behind hallucinations is how language models work. Unlike humans, who possess real-world knowledge, a generative AI model predicts the most probable next word based on patterns in its training data. When it lacks the facts, it fills in the blanks with whatever sounds plausible.
Another factor is the absence of grounding: most AI systems generate outputs from their training data rather than verifying them against a document, database, or other authoritative source. Without a connection to a trustworthy source, an answer may rest on outdated or unfounded information.
Training data itself can also cause hallucinations. If the system was trained on biased, incorrect, fabricated, or outdated web content, those problems resurface in its outputs. Prompt design matters as well: if a user writes a vague prompt or demands a fixed number of answers, the system will try to satisfy the request even when it has to invent material to do so.
AI hallucination examples
AI-based systems are now used for search, customer service, speech-to-text conversion, legal work, business reporting, and article publishing. As their use becomes more common, the danger of AI-generated misinformation grows.
While some hallucinations become viral jokes, others have severe consequences. Businesses come under scrutiny and may face legal ramifications, customer dissatisfaction, or financial losses when their AI systems produce false information. Some real-world examples of how AI hallucinations occur are listed below.
AI fabrications in a government contract report
One such case involved a report prepared by Deloitte for the Australian government, which was later found to contain fabricated citations and non-existent footnotes. This is not a small problem: it calls into question both the credibility of the report and the use of generative AI in government work.
After a Sydney University academic flagged the issues and called for closer scrutiny, Deloitte admitted that it had used AI technology to fill in parts of the document relating to traceability and citations.
Although the report's key recommendations remained unchanged, the episode damaged confidence in the process. A paid expert report containing non-existent, AI-generated references raises serious questions about how such work is reviewed.
Struggling with math
Language models may sound confident when solving mathematical tasks, but they do not perform calculations the way a calculator does. They predict likely numbers based on patterns seen during training, so the final answer can be wrong even when the accompanying explanation sounds plausible.
Newer models handle simple, single-step calculations better, but multi-step problems remain difficult. The model may explain parts of the process correctly and still arrive at the wrong number.
The takeaway from these examples is that AI can help with mathematical reasoning but should not be trusted with critical calculations.
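One practical way to act on this, shown as a rough sketch below, is to recompute any critical figure with ordinary code instead of accepting the model's number. The prices and the model's claimed total here are hypothetical placeholders.

```python
# Sketch: recompute critical figures instead of trusting a model's arithmetic.
def total_matches_model(prices: list[float], model_total: float, tolerance: float = 0.01) -> bool:
    """Return True only if the model's claimed total matches an exact recomputation."""
    return abs(sum(prices) - model_total) <= tolerance

prices = [19.99, 4.50, 7.25]   # hypothetical line items
model_total = 31.74            # hypothetical figure extracted from a model's reply
print(total_matches_model(prices, model_total))  # True here, but always verify
```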
Transcription tool fabricates text
OpenAI's Whisper transcription tool has been reported to insert fabricated text into transcripts. In a medical setting, the implications for patients can be grave: errors introduced while transcribing a consultation affect patient records, treatment plans, and communication within the care team. Reported fabrications have included false statements about a patient's race, invented threats or violence, and non-existent medical procedures.
Although OpenAI advises against using Whisper for sensitive applications, many healthcare workers still use the tool in their practice.
Chatbot’s wrong answer drops company shares
Google’s Bard chatbot created a major public issue when it gave an incorrect answer in a promotional video. The chatbot claimed that the James Webb Space Telescope had taken the first images of a planet outside our solar system. The statement was inaccurate, and the mistake quickly gained attention.
The error caused investor concern and contributed to a major decline in Alphabet’s market value. Beyond the financial impact, the incident showed how even a single incorrect AI-generated claim can damage confidence when presented publicly by a major technology company.
In response, Google introduced additional processes to check Bard’s answers for accuracy, safety, and grounding. This example shows why public-facing AI tools must be carefully tested before launch, especially when the brand’s credibility is at stake.
Support chatbot cites made-up company policy
A well-known example of the legal risks of customer-facing AI comes from Air Canada, whose website chatbot gave a passenger incorrect information about the airline's bereavement fare policy.
Air Canada initially argued that the chatbot was a separate entity and that the company could not be held responsible for its answers. The tribunal rejected that argument, and Air Canada was ordered to compensate the passenger for the incorrect information its chatbot had provided.
The case offers a clear lesson for companies introducing AI into customer service: the chatbot speaks for the company, and any inaccurate statement about its policies can lead to real difficulties.
Bogus summer reading list
A newspaper example involved the Chicago Sun-Times, where readers spotted problems with a published “Summer Reading List.” Several of the books were fictitious but attributed to real authors; of the 15 titles on the list, only five actually existed.
The fake titles came with realistic descriptions, which initially made the list look legitimate. Management reported that the content had been supplied by another publisher, which confirmed it had used AI to create the material.
Although the online version was taken down, the print edition had already reached readers, some of whom were unhappy that they had paid for content filled with false recommendations.
ChatGPT references nonexistent legal cases
Another widely discussed example of AI hallucinations involved a U.S. attorney who used ChatGPT to help prepare court filings. The documents cited non-existent legal precedents, and when opposing counsel questioned the citations, it emerged that the AI had fabricated the cases.
The attorney admitted he had not known that ChatGPT could fabricate information and had assumed it worked like an established legal research tool. In response, a federal judge issued an order requiring attorneys to certify whether AI was used in their filings and to verify the accuracy of any AI-produced material.
The case shows how dangerous unverified AI output can be for professionals in law, finance, medicine, regulatory work, and other high-stakes fields.
Smart search recommends putting glue on pizza
Google’s AI Overviews feature has also been criticized for producing bizarre and inaccurate responses. In one widely shared case, it recommended adding non-toxic glue to pizza sauce so that the cheese would stick better.
The response appeared to stem from a misreading of online content, likely a sarcastic post or a forum discussion. While the incident was widely treated as a joke, it raised a serious question: can the AI be relied on to distinguish credible sources from unreliable ones?
Google later said it was making changes to the feature to address the problem.
How to reduce AI hallucinations
It may not be possible to avoid AI hallucinations entirely, but they can be reduced through careful design, testing, and monitoring. The techniques below aim to produce an AI system that is reliable and grounded in credible information.
One approach is Retrieval-Augmented Generation (RAG), in which the system retrieves information from reliable sources before generating its output. In customer support, for example, the chatbot can look up the company’s policies before answering a question.
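As a rough illustration, the sketch below shows the RAG pattern in Python. The `search_policies` and `call_llm` functions are hypothetical stand-ins for a real retriever and LLM client; the point is the flow of retrieving context first and only then generating an answer constrained to it.

```python
from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call via your provider's SDK."""
    raise NotImplementedError("Wire this to the LLM provider of your choice.")

def search_policies(question: str, top_k: int = 3) -> List[str]:
    """Toy retriever: in practice, embed the question and query a vector store."""
    knowledge_base = {
        "refund": "Refunds are issued within 14 days of cancellation.",
        "baggage": "Each passenger may check one bag of up to 23 kg.",
    }
    return [text for key, text in knowledge_base.items() if key in question.lower()][:top_k]

def answer_with_rag(question: str) -> str:
    passages = search_policies(question)
    if not passages:
        return "not found"  # refuse rather than guess
    context = "\n".join(passages)
    prompt = (
        "Answer using only the context below. If the context does not "
        "contain the answer, reply with 'not found'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```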
Curated datasets also help. In industries such as healthcare, finance, insurance, and legal services, the model can be trained or fine-tuned on credible, industry-specific data.
Prompt design also plays an important part. Instructions should explicitly limit the model to the context it is given. Phrases such as “Use only the context you have been given” and “If you cannot find the answer, reply with ‘not found’” help minimize unverified statements.
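To make this concrete, here is a minimal prompt template sketch in Python that applies these constraints. The exact wording is an illustrative example, not a prescription.

```python
# Illustrative grounded-prompt template; adapt the wording to your own use case.
GROUNDED_PROMPT = """You are a customer support assistant.
Use only the context you have been given.
If you cannot find the answer in the context, reply with "not found".
Do not invent policies, prices, or dates.

Context:
{context}

Question: {question}
"""

def build_prompt(context: str, question: str) -> str:
    return GROUNDED_PROMPT.format(context=context, question=question)

# Example usage with a hypothetical policy snippet:
print(build_prompt("Refunds are issued within 14 days of cancellation.",
                   "Can I get a refund after a month?"))
```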
Verification steps also matter. Companies can combine automated checks, rule-based validation, an LLM-as-a-judge setup, and human review to catch incorrect results before they reach users. Human validation remains essential in high-stakes industries.
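As a sketch of what such checks might look like, the snippet below combines a simple rule-based check (every quoted fragment in the answer must appear in the source context) with a hypothetical LLM-as-a-judge call; `judge_llm` is a stand-in for a second model call, not a real API.

```python
import re

def judge_llm(prompt: str) -> str:
    """Hypothetical second-model call used as an LLM judge."""
    raise NotImplementedError("Wire this to a separate evaluation model.")

def quotes_are_grounded(answer: str, context: str) -> bool:
    """Rule-based check: every quoted fragment in the answer must appear in the context."""
    quoted = re.findall(r'"([^"]+)"', answer)
    return all(fragment in context for fragment in quoted)

def needs_human_review(answer: str, context: str) -> bool:
    """Escalate to a human when either the rule-based check or the judge fails."""
    if not quotes_are_grounded(answer, context):
        return True
    verdict = judge_llm(
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Is every claim in the answer supported by the context? Reply YES or NO."
    )
    return verdict.strip().upper() != "YES"
```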
Finally, businesses must keep evaluating and monitoring the AI solutions they deploy. Performance can shift with the inputs the system receives from users and other sources, and ongoing monitoring helps the team spot and fix issues quickly.
Test your AI app
These examples of AI hallucinations clearly illustrate the need for proper evaluation and monitoring of AI systems. Generative AI can indeed be extremely valuable, but this is possible only when it is accompanied by a robust process of evaluation and monitoring.
Blockchain Studioz assists companies in evaluating, monitoring, and testing various AI applications powered by LLMs, including chatbots, retrieval-augmented generation models, and other assistants. Its open-source library is widely used and includes features like built-in tests, flexible evaluation workflows, and LLM judges.
In addition, Blockchain Studioz Cloud offers a no-code platform where teams can work on AI quality, generate datasets, run evaluations, create LLM judges, and measure performance. This solution is well suited to businesses developing their own AI-based products.
Drawing lessons from previous mistakes made in developing AI solutions, companies will be able to create AI systems that are not only effective but also reliable and responsible.