- NTT Corporation introduces an upgraded version of tsuzumi, a lightweight LLM capable of understanding visual elements within documents.
- Collaboration with Professor Jun Suzuki of Tohoku University led to a breakthrough technique for integrating visual data into tsuzumi’s processing capabilities.
- Tsuzumi incorporates handwriting recognition technology, enhancing its ability to assimilate diverse forms of information.
- Outperforms existing multimodal LLMs in 12 visual document understanding tasks.
- Identified primary use cases include customer and employee experience solutions, value chain transformation, and software engineering applications.
- Tsuzumi is positioned as energy-efficient and cost-effective, with two model sizes: 600 million and 7 billion parameters.
- NTT envisions organizations deploying multiple instances of tsuzumi, creating a “constellation” of specialized LLMs.
- Shift towards subject matter expertise reflects strategic focus on agility, efficiency, and knowledge preservation.
Main AI News:
At the 2024 Upgrade event, NTT Corporation unveiled the latest iteration of tsuzumi, its proprietary lightweight LLM. With this update, tsuzumi can better interpret visual elements within documents, such as graphs, charts, and page layouts. The technology is currently in testing and is slated for public release later this year.
Kyosuke Nishida, a senior distinguished researcher at NTT, emphasized the advancements in LLMs, highlighting their increased proficiency in high-level natural language processing tasks, particularly with the emergence of multimodal models integrating vision and language. Despite these strides, challenges persist in understanding documents or computer screens containing both textual and visual information.
This development matters because a growing volume of data is presented graphically, which poses a challenge for existing LLM solutions. From organizational charts to instructional visuals, graphical representations often convey information more intuitively than text, yet many LLMs struggle to extract that information effectively. To address this challenge, NTT collaborated with Professor Jun Suzuki from Tohoku University’s Center for Data-driven Science and Artificial Intelligence to develop a groundbreaking technique. It enables tsuzumi to perceive a page much as the human eye does, so that visual data can be integrated into its processing.
Beyond images, tsuzumi incorporates handwriting recognition technology, catering to domains where handwritten documents remain prevalent, such as healthcare, law enforcement, and education. This comprehensive approach enhances the LLM’s capacity to assimilate diverse forms of information, augmenting its utility across various sectors.
In performance evaluations encompassing 12 visual document understanding tasks, NTT’s model surpassed existing multimodal LLMs, including LLaVA, GPT-3.5, and GPT-4, showcasing superior capabilities in tasks like information extraction and document classification.
In strategizing the deployment of tsuzumi, NTT identified four primary use cases:
- Customer experience solutions, including call center automation
- Employee experience solutions, streamlining manual tasks like data searching and reporting
- Value chain transformation for industries like life sciences and manufacturing
- Software engineering applications in systems and IT departments, facilitating development and automation processes
Positioned as an energy-efficient and cost-effective LLM, tsuzumi comes in two versions: an ultra-lightweight variant with 600 million parameters and a lightweight version with 7 billion parameters. The compact size of these models translates to reduced resource requirements, making them economically viable for businesses. NTT’s focus on specialized, domain-specific LLMs underscores the importance of agility and efficiency in addressing targeted use cases.
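To give a sense of what those parameter counts mean in practice, the sketch below estimates the memory needed just to hold each model's weights. The bytes-per-parameter figures are common numeric formats assumed here for illustration, not values published by NTT, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate for the two tsuzumi sizes named in the article.
# The precisions below are assumed storage formats, not NTT-published settings.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts as stated in the article.
MODEL_PARAMS = {
    "tsuzumi ultra-lightweight": 600_000_000,
    "tsuzumi lightweight": 7_000_000_000,
}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{name:26s} {precision}: {gb:5.1f} GB")
```

Under the 16-bit assumption, the smaller model's weights come to roughly 1.2 GB and the larger model's to roughly 14 GB, which illustrates why models of this size can be hosted on far more modest hardware than models with hundreds of billions of parameters.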
By envisioning organizations deploying multiple instances of tsuzumi, each tailored to a specific subject matter expertise, NTT introduces the concept of a “constellation” of LLMs. This approach facilitates collaboration among specialized instances, fostering an AI-driven knowledge network within organizations.
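The article does not describe how queries would be distributed across such a constellation. As a purely hypothetical illustration, one simple option is a dispatcher that sends each query to the specialist whose domain vocabulary it matches best. Every name below (`Specialist`, `route`, the keyword sets, the stub answer functions) is invented for this sketch and is not part of any NTT product or API.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Specialist:
    """Hypothetical stand-in for one domain-tuned model instance."""
    name: str
    keywords: FrozenSet[str]
    answer: Callable[[str], str]  # placeholder for a real model call


def route(query: str, specialists: List[Specialist], fallback: Specialist) -> Specialist:
    """Pick the specialist sharing the most keywords with the query.

    Falls back to a general instance when no specialist matches at all.
    """
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    best = max(specialists, key=lambda s: len(words & s.keywords), default=fallback)
    return best if words & best.keywords else fallback


# Illustrative instances; the lambdas only echo which specialist was chosen.
hr = Specialist("hr", frozenset({"vacation", "payroll", "employees"}),
                lambda q: f"[hr] {q}")
legal = Specialist("legal", frozenset({"contract", "liability", "clause"}),
                   lambda q: f"[legal] {q}")
general = Specialist("general", frozenset(), lambda q: f"[general] {q}")

if __name__ == "__main__":
    question = "How many vacation days do new employees get?"
    chosen = route(question, [hr, legal], general)
    print(chosen.answer(question))
```

Keyword overlap is only a placeholder for the selection step. A real deployment could just as well use a classifier or let the instances exchange intermediate results, and the article does not say which approach NTT has in mind.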
NTT’s strategic shift towards subject matter expertise reflects a departure from conventional, broad-spectrum LLM approaches. By harnessing practical knowledge and experience, tsuzumi enables organizations to unlock valuable insights from their data repositories, safeguarding institutional knowledge and fostering innovation.
NTT’s unveiling of tsuzumi marks a significant milestone in the evolution of LLM technology, signaling a strategic pivot towards specialized, domain-centric solutions. The concept of a constellation of LLMs introduces new possibilities for collaborative knowledge sharing within organizations, paving the way for enhanced productivity and innovation.
As customers evaluate the potential of this paradigm shift, the adoption of highly focused LLMs promises to empower organizations to leverage their internal expertise and data assets more effectively. Moreover, the emergence of specialized LLMs challenges traditional vendor-centric models, emphasizing the importance of practical knowledge and contextual understanding in AI-driven solutions.
Conclusion:
NTT’s unveiling of the upgraded tsuzumi LLM marks a significant advancement in specialized AI solutions. By addressing the challenges of understanding visual data and prioritizing subject matter expertise, NTT sets a new standard for AI-driven knowledge networks within organizations. This strategic shift has profound implications for the market, emphasizing the importance of agility, efficiency, and practical knowledge in AI solutions, while challenging traditional vendor-centric models.