Researchers from Yandex and NeuralMagic unveil advanced compression techniques for large AI models on consumer devices

  • Yandex LLC and NeuralMagic Inc. have developed new compression methods for large language models (LLMs).
  • Two techniques, Additive Quantization for Language Models (AQLM) and PV-Tuning, can shrink LLMs to as little as one-eighth of their original size while retaining roughly 95% of response quality.
  • AQLM compresses model parameters to two or three bits, and PV-Tuning enhances fine-tuning and error correction.
  • The combined techniques enable “ultra-compact” LLMs with almost equivalent capabilities to full-sized models.
  • Compressed models run up to four times faster and cut hardware costs by a factor of two to six.
  • The advancements support deployment on consumer devices, enabling applications like text generation, voice assistance, and real-time translation without internet access.
  • The research is featured at the 41st International Conference on Machine Learning in Vienna.
  • The techniques are available on GitHub, with pre-compressed models on HuggingFace.

Main AI News:

Artificial intelligence researchers from Yandex LLC and NeuralMagic Inc. have announced a breakthrough in compressing large language models (LLMs) like Meta Platforms Inc.’s Llama 2 for deployment on everyday devices, such as smartphones and smart speakers. In collaboration with the Institute of Science and Technology Austria and King Abdullah University of Science and Technology, the team has developed two novel compression techniques—Additive Quantization for Language Models (AQLM) and PV-Tuning.

These techniques enable LLMs to be reduced in size by up to eightfold while maintaining an average response quality of 95%. AQLM uses “additive quantization” to compress groups of model parameters down to just two or three bits per weight while preserving accuracy. PV-Tuning, by contrast, is a representation-agnostic fine-tuning framework that improves on existing fine-tuning strategies and corrects the errors that quantization introduces.
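To make the core idea concrete, here is a minimal, self-contained sketch of additive quantization using greedy residual coding, a simplified special case. AQLM itself learns the codebooks and code assignments jointly per weight matrix (and PV-Tuning then fine-tunes the result), so the group size, codebook count, and random codebooks below are purely illustrative assumptions, not the authors' actual configuration.

```python
# Sketch: additive quantization via greedy residual coding (simplified).
# AQLM learns codebooks and codes jointly; everything here is illustrative.
import numpy as np

GROUP = 8   # weights quantized together as one vector
M = 2       # number of additive codebooks
K = 256     # entries per codebook -> 8 bits per code
# Storage cost: M * log2(K) = 16 bits per 8 weights = 2 bits per weight.

rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, GROUP)).astype(np.float32)

# Toy random codebooks; in AQLM these are learned, which is what
# makes the 2-3 bit representation accurate in practice.
codebooks = rng.standard_normal((M, K, GROUP)).astype(np.float32) * 0.5

def encode(w):
    """Greedily pick one entry per codebook so their sum approximates w."""
    residual, codes = w.copy(), []
    for m in range(M):
        # Squared distance from each residual vector to every codebook entry.
        dists = ((residual[:, None, :] - codebooks[m][None]) ** 2).sum(-1)
        idx = dists.argmin(1)                # best entry per weight group
        codes.append(idx.astype(np.uint8))   # 8 bits per code
        residual -= codebooks[m][idx]
    return np.stack(codes, 1)                # shape (n_groups, M)

def decode(codes):
    """Reconstruct weights as the sum of the selected codebook entries."""
    return sum(codebooks[m][codes[:, m]] for m in range(M))

codes = encode(weights)
approx = decode(codes)
err = np.linalg.norm(weights - approx) / np.linalg.norm(weights)
print(f"stored at 2 bits/weight, relative error {err:.3f}")
```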

Designed to work in conjunction, these methods allow for the creation of “ultra-compact” LLMs that offer nearly the same capabilities as their full-sized versions. The researchers emphasize that these techniques address the challenge of balancing model size and computational efficiency, a problem that has previously limited LLM deployment on consumer hardware.

The new methods, which are open source and detailed in an academic paper on arxiv.org, show promising results. Compressed versions of popular LLMs, including Llama 2, Mistral, and Mixtral, retained about 95% of answer quality on benchmarks such as WikiText2 and C4 despite an eightfold reduction in size. These compressed models also run up to four times faster thanks to their reduced computational demands.
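For context on where such quality numbers come from, below is a minimal sketch of the standard sliding-window perplexity evaluation on WikiText2, using the Hugging Face transformers and datasets libraries. The model id is a placeholder, and the non-overlapping 2048-token chunking is a common convention rather than the paper's exact protocol.

```python
# Sketch: WikiText2 perplexity evaluation (a common convention, not
# necessarily the paper's exact setup). The model id is a placeholder.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-compressed-model"  # placeholder, not a real repo
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

# Concatenate the WikiText2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1",
                                split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

seq_len, nlls = 2048, []
for start in range(0, ids.size(1) - seq_len, seq_len):
    chunk = ids[:, start:start + seq_len].to(model.device)
    with torch.no_grad():
        # With labels=input_ids, transformers shifts targets internally
        # and returns the mean negative log-likelihood of the chunk.
        nlls.append(model(chunk, labels=chunk).loss)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"WikiText2 perplexity: {ppl.item():.2f}")
```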

This advancement offers significant cost savings for companies developing proprietary and open-source LLMs. For instance, compressing the 13 billion-parameter Llama 2 model to run on a single GPU, rather than four, could reduce hardware costs by two to six times. More importantly, it enables the deployment of advanced LLMs on personal computers and smartphones, unlocking new applications such as text and image generation, voice assistance, and real-time translation without internet connectivity.

The research paper is being presented at the 41st International Conference on Machine Learning (ICML) in Vienna, Austria, held July 21-27. AQLM and PV-Tuning are available on GitHub, with pre-compressed versions of popular models accessible on HuggingFace.
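As a starting point, one of the pre-compressed models can be loaded through the standard transformers API once the aqlm package is installed (the project documents `pip install aqlm[gpu]`; a CUDA GPU is assumed). The repository id below is an assumption based on the project's published naming; check the AQLM GitHub page for the current model list.

```python
# Sketch: loading a pre-compressed AQLM model from HuggingFace.
# Assumes `pip install aqlm[gpu]` and a CUDA GPU are available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; consult the AQLM GitHub page for current names.
model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # codebooks stay fp16; codes are ~2-bit
    device_map="auto",
)

prompt = "Compressing language models matters because"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```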

Conclusion:

The development of AQLM and PV-Tuning represents a significant step forward in deploying large language models on consumer devices. By drastically reducing model size while preserving performance and improving speed, these techniques cut hardware costs substantially and open new opportunities for integrating advanced AI capabilities into everyday devices. This progress makes powerful AI more accessible, drives innovation in consumer applications, and could reshape the market by making sophisticated AI practical for a wide range of uses.

Source