Alibaba Cloud Introduces Tongyi Tingwu: Empowering Audio and Video with AI Capabilities


  • Alibaba Cloud’s Tongyi Tingwu, an AI model focused on audio and video, is now in beta testing.
  • Tongyi Tingwu offers real-time transcription, retrieval, summarization, and sorting of multimedia content.
  • It can automatically take notes, organize interviews, and extract slides.
  • The model is an evolution of Alibaba Cloud’s internal tool called Tingwu, which provides audio transcription for project meetings.
  • Tongyi Tingwu supports language translation, speech summarization, and question review, along with enhanced video features.
  • Tongyi Qianwen, the large language model Alibaba Cloud launched shortly before, is trained more on basic knowledge than on user data.
  • Alibaba Cloud plans to integrate Tongyi Qianwen into its apps and make it accessible to corporate users.

Main AI News:

Alibaba Cloud, the cloud computing division of Alibaba Group Holding, has begun beta testing Tongyi Tingwu, an artificial intelligence model focused on audio and video applications. The move comes shortly after the company launched Tongyi Qianwen, its large language model (LLM).

In its preview, Tongyi Tingwu demonstrated real-time transcription, retrieval, summarization, and sorting of audio and video content. As an LLM, it can automatically take notes, organize interviews, and extract slides and other pertinent information from multimedia sources.

Tongyi Tingwu’s predecessor, Tingwu, began as an internal Alibaba Cloud tool used mainly for audio transcription during project meetings in departments such as investment and human resources. Its potential has been explored further in recent times, according to Yan Zhijie, head of technology for Tingwu, as reported by Yicai Global.

Tongyi Tingwu offers real-time recording, language translation, speech summarization, and question review, said Zhou Jingren, Chief Technology Officer of Alibaba Cloud. For video, it adds chapter overviews, imports and uploads from cloud and local disks, and text summarization.

Addressing data security and privacy concerns associated with LLMs, Zhou said that the training of Tongyi Qianwen relies on fundamental knowledge rather than heavily on user data. He stated, “Today’s Tongyi Tingwu combines foundational knowledge with meeting and video scenarios, employing technical models to assist in the generation of summaries, translations, and content extraction.”

On April 11, Alibaba Cloud announced that it would integrate Tongyi Qianwen into all its apps and give corporate users access to the LLM. Zhou said the company is committed to making Tongyi Qianwen more accessible and cost-effective through a series of optimizations to its underlying cloud infrastructure, aimed at serving the larger community.


Tongyi Tingwu extends Alibaba Cloud’s AI capabilities to audio and video. Its real-time transcription, retrieval, summarization, and organization features, together with language translation and enhanced video functions, could be useful across various industries. By relying on fundamental knowledge rather than heavily on user data, Alibaba Cloud aims to address concerns about data security and privacy.

Integrating Tongyi Qianwen into the company’s apps and opening it to corporate users extends the model’s reach. The move signals Alibaba Cloud’s commitment to driving innovation and productivity in audio and video processing.
