TL;DR:
- FriendliAI introduces PeriFlow Cloud, a powerful platform for generative AI serving in a managed cloud environment.
- The PeriFlow engine improves throughput for large language models (LLMs) while keeping latency low.
- Supports a wide range of LLMs, diverse decoding options, and multiple data types to optimize precision and speed.
- PeriFlow Container gains popularity for efficient LLM serving, reducing infrastructure costs.
- PeriFlow Cloud offers high speed at reduced cost (potential GPU savings of 70-90%) with centralized management and monitoring.
- Founder & CEO, Byung-Gon Chun, emphasizes the importance of efficient LLM serving and innovation in generative AI.
Main AI News:
FriendliAI, a leading company in the field of generative AI engines, has announced the public beta release of PeriFlow Cloud, a platform that lets users run the PeriFlow engine within a managed cloud environment. Built specifically for large language models (LLMs), the PeriFlow engine delivers substantial gains in throughput while keeping latency low. It relies on FriendliAI's batching and scheduling techniques, which are protected by patents in the United States and Korea, including U.S. Patent No. 11,514,370, U.S. Patent No. 11,442,775, Korean Patent No. 10-2498595, and Korean Patent No. 10-2479264.
PeriFlow: The Fast and Versatile Solution
PeriFlow is a fast and versatile engine that is attracting a growing number of companies that develop their own LLMs, either by pretraining or by fine-tuning open-source models. It supports a broad range of LLMs, including GPT, GPT-J, GPT-NeoX, MPT, LLaMA, Dolly, OPT, BLOOM, T5, FLAN, UL2, and many more, and offers decoding options such as greedy, top-k, top-p, beam search, and stochastic beam search. It also supports multiple data types, including fp32, fp16, bf16, and int8, so users can choose their own trade-off between precision and speed.
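To make the decoding options named above more concrete, here is a minimal sketch of greedy, top-k, and top-p selection over the logits of a single generation step. It is plain Python written for illustration only and is not PeriFlow's API or implementation; the function names and the example logits are assumptions.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
example_logits = [2.0, 1.0, 0.5, -1.0]
example_probs = softmax(example_logits)
rng = random.Random(0)
print("greedy:", greedy(example_logits))
print("top-k (k=2):", sample(top_k_filter(example_probs, 2), rng))
print("top-p (p=0.9):", sample(top_p_filter(example_probs, 0.9), rng))
```

Greedy decoding is deterministic, while top-k and top-p restrict sampling to the most likely tokens before drawing one at random. Beam search and stochastic beam search track several candidate sequences at once and are left out of this sketch.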
PeriFlow Container: The Choice for LLM Serving
FriendliAI also offers PeriFlow as a container solution, PeriFlow Container. It has gained considerable traction among companies for LLM serving because it lets them efficiently serve multiple LLMs, including popular ones like Luda 2.0. For instance, Scatter Lab, a prominent social chatbot company in Korea, uses PeriFlow Container to handle its high user traffic and has cut its serving-related infrastructure costs by 50%.
The Unbeatable Advantages of PeriFlow Cloud
PeriFlow Cloud makes it straightforward for organizations of any scale to adopt PeriFlow. Users get high serving speed at significantly reduced cost, with potential GPU savings of 70-90% in LLM serving. The platform also removes the work of setting up and managing cloud resources.
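As a rough illustration of what that savings range means in practice, the sketch below converts a savings percentage into a GPU count. The 20-GPU baseline is a hypothetical figure chosen for the example, not a number from FriendliAI, and actual savings would depend on the model and the traffic.

```python
def gpus_after_savings(baseline_gpus, savings_percent):
    """Return the GPUs still needed after a given percentage saving.

    Uses integer arithmetic and rounds up, since a fraction of a GPU
    cannot be provisioned.
    """
    if not 0 <= savings_percent <= 100:
        raise ValueError("savings_percent must be between 0 and 100")
    remaining = baseline_gpus * (100 - savings_percent)
    return -(-remaining // 100)  # ceiling division


# Hypothetical baseline: a deployment that currently needs 20 GPUs.
baseline = 20
for pct in (70, 90):
    print(f"{pct}% savings: {gpus_after_savings(baseline, pct)} of {baseline} GPUs")
```

Under these assumptions, the claimed range corresponds to serving the same load on between one tenth and three tenths of the original GPU count.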
Centralized Management and Monitoring
Through PeriFlow Cloud, users can centrally manage all deployed LLMs from anywhere. They can upload model checkpoints, deploy models, and start sending inference requests right away. Monitoring tools track events, errors, and performance metrics, and a playground lets users test their deployed LLMs interactively. The platform automatically handles performance and fault issues and auto-scales based on traffic patterns, which keeps the user experience smooth and frees organizations to focus on LLM development.
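The article does not describe how PeriFlow Cloud's auto-scaling works. Purely as an illustration of what traffic-based scaling generally involves, the sketch below picks a replica count from the current request rate. The function name, the per-replica capacity, and the replica limits are all assumptions made for this example.

```python
import math


def desired_replicas(requests_per_sec, capacity_per_replica,
                     min_replicas=1, max_replicas=8):
    """Pick a replica count that covers the current request rate.

    All parameters are hypothetical; this is a generic target-based
    rule, not PeriFlow Cloud's actual policy.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    # Clamp to the configured bounds so the service never scales to zero
    # or beyond the allowed budget.
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical traffic levels, assuming each replica handles 10 requests/s.
for rate in (0, 25, 1000):
    print(f"{rate} req/s -> {desired_replicas(rate, 10)} replicas")
```

A production autoscaler would typically also smooth the traffic signal and add cooldown periods to avoid rapid scale-up and scale-down cycles.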
A Visionary CEO’s Insight
Byung-Gon Chun, Founder & CEO of FriendliAI, emphasizes the importance of efficient LLM serving as generative AI takes off. According to Chun, many organizations are now training their own LLMs without fully understanding how costly and painful it is to serve these models at scale to a large user base.
PeriFlow Cloud: A Cost-Effective Solution
Chun adds that a significant transformation in LLM serving is long overdue and positions PeriFlow Cloud as an immediate, cost-effective answer to these challenges. FriendliAI says it looks forward to the services and products that businesses will build with their generative AI models on PeriFlow Cloud.
Seize the Opportunity: PeriFlow Cloud’s Public Beta is Now Live!
The public beta of PeriFlow Cloud is now available. Users can deploy their large language models (LLMs) and try PeriFlow, the company's generative AI inference serving engine, in a matter of minutes.
Conclusion:
The introduction of PeriFlow Cloud by FriendliAI marks a notable advance in generative AI serving. Its high throughput and low latency, combined with support for many LLMs and decoding options, make it a versatile and cost-effective option for businesses looking to optimize AI deployment. Centralized management and monitoring help organizations run their models efficiently, which can improve productivity and reduce infrastructure costs. As AI-driven products continue to reshape the market, the public beta gives businesses an opportunity to try these inference serving capabilities for themselves.