Iris.ai, an Oslo-based startup, has introduced a breakthrough solution to reduce AI hallucinations

TL;DR:

  • Iris.ai, a leading AI startup, has unveiled a solution to combat AI hallucinations.
  • AI hallucinations involve AI systems generating false information as if it were true, causing various problems.
  • Iris.ai’s innovative approach involves knowledge graphs and validation techniques to ensure factual correctness.
  • Integration of this technology has reduced AI hallucinations significantly.
  • Challenges remain when applying this approach to popular large language models (LLMs).
  • Microsoft’s Phi-1.5 model and training models on code are proposed ways to mitigate AI hallucinations.
  • Collaboration with LLM-makers is expected to further reduce AI hallucinations.

Main AI News:

In the world of artificial intelligence, a concerning issue has long plagued the landscape—AI hallucinations. These instances involve AI systems generating false information while confidently presenting it as accurate, resulting in a myriad of adverse effects. From limiting the true potential of AI to causing tangible harm in real-world scenarios, the consequences are far-reaching.

As generative AI becomes increasingly mainstream, calls to address this predicament have grown louder. European researchers have taken up the challenge, experimenting tirelessly with potential remedies. Just last week, a promising solution emerged from the labs: a breakthrough that could reduce AI hallucinations to single-digit percentages.

Enter Iris.ai, a forward-thinking startup based in Oslo. Founded in 2015, this innovative company has developed an AI engine tailored for comprehending scientific text. It delves into vast repositories of research data, analyzing, categorizing, and summarizing it effectively. Among its satisfied clientele is the Finnish Food Authority, which utilized Iris.ai’s platform to expedite research related to a potential avian flu crisis, ultimately saving 75% of researchers’ valuable time.

However, what doesn’t save time is AI’s penchant for hallucinations. Large language models (LLMs) often generate nonsensical or erroneous information, leading to reputational damage and even real-world harm. For instance, during the launch demonstration of Microsoft Bing AI, the system produced an error-laden analysis of Gap’s earnings report. In more dire scenarios, chatbots like ChatGPT can dispense dangerous medical recommendations, putting users at genuine risk.

The challenge lies in distinguishing these AI hallucinations from factually valid generated text. Victor Botev, CTO of Iris.ai, highlights the complexity, stating, “Unfortunately, LLMs are so adept at phrasing that it is hard to distinguish hallucinations from factually valid generated text. If this issue is not overcome, users of models will have to dedicate more resources to validating outputs rather than generating them.”

These hallucinations also erode trust in AI within the research community. According to an Iris.ai survey of 500 corporate R&D professionals, only 22% of respondents trust systems like ChatGPT, yet a staggering 84% still rely on it as their primary AI tool for research support—a concerning paradox.

To combat these challenges, Iris.ai employs several techniques to measure the accuracy of AI outputs. The most crucial involves validating factual correctness. By mapping key knowledge concepts expected in a correct answer and checking if they come from reliable sources, they establish a benchmark for factual accuracy. Additionally, a proprietary metric called WISDM evaluates semantic similarity to a verified “ground truth,” covering topics, structure, and key information. Another method ensures coherence by incorporating relevant subjects, data, and sources.
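To make the idea concrete, the sketch below shows a toy version of this kind of output validation in Python: one function checks whether the key concepts expected in a correct answer actually appear in the generated text, and another scores overlap against a verified “ground truth” passage. The function names and the simple word-overlap scoring are illustrative assumptions only; they are not Iris.ai’s proprietary WISDM metric, which the company has not published.

    # Toy validation sketch, not Iris.ai's implementation. The helper names
    # (concept_coverage, similarity_to_ground_truth) and the word-overlap
    # scoring are illustrative assumptions; WISDM itself is proprietary.
    import re
    from typing import Iterable

    def tokenize(text: str) -> set:
        """Lowercase word tokens, used for simple overlap checks."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def concept_coverage(answer: str, required_concepts: Iterable) -> float:
        """Fraction of the expected key concepts that appear in the answer."""
        answer_tokens = tokenize(answer)
        concepts = list(required_concepts)
        hits = sum(1 for c in concepts if tokenize(c) <= answer_tokens)
        return hits / len(concepts) if concepts else 0.0

    def similarity_to_ground_truth(answer: str, ground_truth: str) -> float:
        """Jaccard word overlap as a crude stand-in for semantic similarity."""
        a, g = tokenize(answer), tokenize(ground_truth)
        return len(a & g) / len(a | g) if (a | g) else 0.0

    answer = "Avian influenza is caused by influenza A viruses and spreads among wild birds."
    truth = "Avian influenza, caused by influenza A viruses, circulates in wild bird populations."
    concepts = ["avian influenza", "influenza A", "wild birds"]
    print(f"concept coverage: {concept_coverage(answer, concepts):.2f}")
    print(f"ground-truth overlap: {similarity_to_ground_truth(answer, truth):.2f}")

In a production system a semantic-similarity model would replace the crude word-overlap score, but the control flow mirrors the description above: map the expected concepts, then score the output against a verified reference.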

Iris.ai’s approach simplifies the verification process by employing knowledge graphs, which illustrate data relationships and the steps a language model takes to arrive at its outputs. This structure allows for the identification and resolution of problems, potentially enabling models to identify and correct their own mistakes, producing coherent and factually accurate answers.
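To give a rough sense of how a knowledge graph can gate a model’s output, the sketch below represents verified facts as (subject, relation, object) triples and flags any generated claim that has no support in the graph. The triples, relation names, and the claim-extraction step are assumptions made for illustration; Iris.ai has not published its graph schema.

    # Minimal knowledge-graph check, assumed for illustration only.
    from typing import NamedTuple

    class Triple(NamedTuple):
        subject: str
        relation: str
        obj: str

    # A tiny graph of verified facts drawn from trusted sources (hypothetical).
    verified = {
        Triple("avian influenza", "caused_by", "influenza A virus"),
        Triple("influenza A virus", "circulates_in", "wild birds"),
    }

    # Claims extracted from a model's answer (the extraction step is out of scope here).
    generated_claims = [
        Triple("avian influenza", "caused_by", "influenza A virus"),
        Triple("avian influenza", "caused_by", "bacteria"),  # unsupported: likely hallucinated
    ]

    for claim in generated_claims:
        status = "supported" if claim in verified else "unsupported, flag for review"
        print(f"{claim.subject} --{claim.relation}--> {claim.obj}: {status}")

Because every flagged claim points back to a specific missing edge, the same structure is what lets a model, or a human reviewer, trace how an answer was assembled and where it went wrong.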

The integration of this technology into a new Chat feature within Iris.ai’s Researcher Workspace platform has shown promising results, significantly reducing AI hallucinations.

However, challenges persist, particularly when applying this approach to popular LLMs. Botev acknowledges that user knowledge plays a pivotal role: when users lack expertise in a subject, they may misinterpret AI outputs, with consequences ranging from self-misdiagnosis to the spread of misinformation online.

Addressing the root cause of AI hallucinations, Microsoft has introduced the Phi-1.5 model, pre-trained on “textbook quality” data to reduce instances of hallucination. Another approach involves training models on code, which emphasizes reasoning over interpretation and could help guide LLMs toward factually accurate answers.

Despite its limitations, Iris.ai’s method represents a significant step forward. The knowledge graph structure injects transparency and explainability into AI systems, making hallucinations easier to identify and resolve.

Botev expresses optimism about ongoing collaborations with LLM-makers to build larger datasets, infer knowledge graphs from texts, and develop self-assessment metrics. With these advancements, the future holds the promise of further reductions in AI hallucinations.

For Botev, the work carries profound significance, rooted in trust. “It is, to a large extent, a matter of trust,” he asserts. “How can users capitalize on the benefits of AI if they don’t trust the model they’re using to give accurate responses?” The journey to conquering AI hallucinations may well be the key to unlocking the full potential of artificial intelligence.

Conclusion:

Iris.ai’s groundbreaking approach to combating AI hallucinations holds great promise for enhancing the trust and reliability of AI systems across industries. As AI becomes more integral to businesses, addressing this issue is crucial for ensuring accurate and dependable AI-generated information, which, in turn, will contribute to the continued growth and adoption of AI technologies in the market.

Source