Research Reveals Limitations of GPT and Other AI Models in SEC Filing Analysis

TL;DR:

  • Large AI language models, including GPT-4-Turbo, struggle with analyzing SEC filings, per Patronus AI.
  • Even with access to complete filings, GPT-4-Turbo answered only 79% of questions correctly.
  • Challenges include refusals to answer and generating inaccurate data, raising concerns for automation in finance.
  • Incorporating non-deterministic AI models into regulated industries requires rigorous testing and human oversight.
  • Patronus AI’s FinanceBench dataset sets a performance standard for language AI in finance.
  • Despite these challenges, AI shows potential to aid finance professionals, though ongoing human involvement remains essential.

Main AI News:

A recent study by Patronus AI reveals that large language models, like the one that powers ChatGPT, face significant challenges when analyzing Securities and Exchange Commission (SEC) filings. Even the most advanced model tested, OpenAI’s GPT-4-Turbo, answered only 79% of the questions correctly in Patronus AI’s new test, even when given the entire relevant filing along with each question.

The failures ranged from outright refusals to answer questions to the fabrication of information not present in the SEC filings, a phenomenon known as “hallucination.” Anand Kannappan, Co-founder of Patronus AI, voiced his concern: “That type of performance rate is just absolutely unacceptable. It has to be much, much higher for it to really work in an automated and production-ready way.”

These findings underscore the challenges faced by AI models, especially in industries subject to rigorous regulation, such as finance, where integrating cutting-edge technology for tasks like customer service and research is a priority. One of the most promising applications for AI in finance is the ability to swiftly extract critical financial data and analyze narratives, and SEC filings are a treasure trove of such information.

Major players like Bloomberg LP, business school professors, and JPMorgan have all ventured into AI-driven financial solutions. However, GPT’s entry into the industry hasn’t been without hiccups. When Microsoft launched Bing Chat using OpenAI’s GPT, it demonstrated the chatbot summarizing an earnings press release, and observers quickly spotted inaccurate and fabricated numbers in the summary.

One of the main challenges in deploying Large Language Models (LLMs) like GPT in real-world applications is their non-deterministic nature: they do not guarantee consistent output for the same input. Rigorous testing is therefore needed to confirm that they operate correctly, stay relevant, and remain reliable, as in the minimal sketch below.
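
Since the claim is that the same input can yield different outputs, here is a minimal sketch of a consistency check, assuming the OpenAI Python client; the model name, prompt, and sample count are illustrative, not drawn from the Patronus AI study.

```python
# A minimal consistency probe: send the same prompt several times and
# compare the answers. The model name, prompt, and sample count are
# illustrative; this is not the harness Patronus AI used.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "What was the company's FY2021 cost of goods sold? Answer with a single number."

def sample_answers(n: int = 5) -> Counter:
    answers = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,  # outputs can still vary across runs, even at temperature 0
        )
        answers.append(response.choices[0].message.content.strip())
    return Counter(answers)

counts = sample_answers()
if len(counts) > 1:
    print("Inconsistent outputs across identical requests:", counts)
```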

The co-founders of Patronus AI, who previously worked on AI problems at Meta (Facebook’s parent company), founded the startup to automate LLM testing. They aim to give companies confidence that their AI bots won’t provide off-topic or incorrect responses, supporting more responsible AI deployment.

To create a robust testing dataset, Patronus AI compiled FinanceBench, a set of more than 10,000 questions and answers drawn from the SEC filings of major publicly traded companies. The dataset contains not only the correct answers but also pointers to exactly where in each filing they can be found. Some questions require light math or reasoning, which makes FinanceBench a “minimum performance standard” for language AI in the financial sector. An illustrative sketch of what one entry might look like follows.
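
For illustration only, a FinanceBench-style entry might bundle the question, the verified answer, and the supporting evidence; the field names and values below are assumptions for this sketch, not the dataset’s published schema.

```python
# Illustrative sketch of a FinanceBench-style entry. Field names and values
# are assumptions for this example, not the dataset's published schema.
from dataclasses import dataclass

@dataclass
class QAEntry:
    question: str   # the financial question posed to the model
    answer: str     # the verified correct answer
    doc_name: str   # which SEC filing the answer comes from
    evidence: str   # the passage in the filing that supports the answer

example = QAEntry(
    question="Did AMD disclose customer concentration in FY22?",
    answer="Yes",                                   # illustrative value, not a verified answer
    doc_name="AMD_2022_10K",                        # hypothetical document identifier
    evidence="<relevant passage from the filing>",  # placeholder, not a real quote
)
```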

Here are a few sample questions from the dataset provided by Patronus AI (a worked sketch of the margin calculation in the third question follows the list):

  • Has CVS Health distributed dividends to common shareholders in Q2 of FY2022?
  • Did AMD disclose customer concentration in FY22?
  • What is Coca-Cola’s FY2021 COGS % margin? Calculate it using the line items clearly shown in the income statement.
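
To make the margin question concrete, here is a worked sketch of the calculation; the revenue and COGS figures are hypothetical placeholders, not Coca-Cola’s actual FY2021 line items.

```python
# COGS % margin = cost of goods sold / net revenue * 100.
# Both figures below are hypothetical placeholders, not Coca-Cola's
# actual FY2021 income-statement line items.
net_revenue = 38_000  # hypothetical net revenue, millions of USD
cogs = 15_000         # hypothetical cost of goods sold, millions of USD

cogs_margin_pct = cogs / net_revenue * 100
print(f"COGS % margin: {cogs_margin_pct:.1f}%")  # prints 39.5%
```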

Patronus AI evaluated four language models, OpenAI’s GPT-4 and GPT-4-Turbo, Anthropic’s Claude 2, and Meta’s Llama 2, on a subset of 150 questions. It tested different configurations and prompts, including an “Oracle” mode in which the models were given the exact source text containing the answer. Even then, GPT-4-Turbo answered correctly only 85% of the time and gave incorrect responses the remaining 15%, showing that the models can err even when handed the precise passage, let alone when they must first locate the information in a filing. A simplified sketch of such an evaluation loop follows.
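
As a rough illustration of what an “Oracle”-mode run could look like, the sketch below pairs each question with its exact source passage and grades by string match, reusing the QAEntry sketch above; the ask_model helper and the grading rule are assumptions, not Patronus AI’s actual methodology.

```python
# Simplified sketch of an "Oracle"-mode evaluation: each question is paired
# with the exact source passage that contains the answer. ask_model() and
# the exact-match grading are illustrative assumptions, not Patronus AI's
# actual methodology. QAEntry is the sketch defined earlier.
def ask_model(prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g., GPT-4-Turbo)."""
    raise NotImplementedError

def evaluate_oracle(entries: list[QAEntry]) -> float:
    correct = 0
    for entry in entries:
        prompt = (
            f"Context from the SEC filing:\n{entry.evidence}\n\n"
            f"Question: {entry.question}\nAnswer concisely."
        )
        prediction = ask_model(prompt)
        # Real harnesses grade more carefully than exact string matching.
        if prediction.strip().lower() == entry.answer.strip().lower():
            correct += 1
    return correct / len(entries)
```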

Llama 2, developed by Meta, struggled with “hallucinations,” generating incorrect answers 70% of the time and correct answers only 19% of the time when provided access to underlying documents. Anthropic’s Claude 2 performed well when given “long context,” answering 75% of the questions correctly.

Even where the models performed relatively well, Patronus AI concluded that their accuracy fell short, especially for regulated industries in which even a 5% error rate is unacceptable. Despite these challenges, the co-founders remain optimistic that language models like GPT will keep improving and eventually enable automated solutions in finance. For now, though, they acknowledge that human oversight remains essential to ensure the accuracy and reliability of AI-driven workflows.

In response to these findings, OpenAI has emphasized the importance of adhering to its usage guidelines, which require qualified human review of AI output in financial applications along with clear disclaimers about AI usage and its limitations.

Conclusion:

The finding that even advanced AI models struggle to analyze SEC filings highlights the hurdles to integrating AI into the finance industry. The models’ inability to consistently provide accurate answers, coupled with their non-deterministic nature, underscores the need for extensive testing and human supervision in regulated sectors like finance. While AI holds promise for aiding finance professionals, its current limitations call for a cautious approach to automation in this sector.

Source