TL;DR:
- Crater Labs collaborates with organizations to scale AI/ML projects for long-term impact.
- The lab’s shift from cloud to on-premise infrastructure, driven by data demands, reduced costs and increased efficiency.
- Pure Storage’s FlashBlade technology accelerated data processing, cutting one preprocessing workload from 72–96 hours to under 10 hours, a roughly tenfold speedup.
- Large Language Models (LLMs) present new opportunities and challenges for Crater Labs.
- Pruning LLMs for industry-specific applications generates more data, necessitating high parallelism in storage.
- Crater Labs’ partnership with Pure Storage positions them to lead the AI market with unmatched velocity and adaptability.
Main AI News:
In the rapidly evolving landscape of artificial intelligence and machine learning, Crater Labs has emerged as a game-changer. This Canadian research lab, comprising a dynamic team of academic researchers, application developers, and seasoned business executives, specializes in collaborating with organizations that are already familiar with AI/ML models but seek assistance in scaling these solutions to tackle complex, future-defining challenges.
Crater Labs proudly champions the motto ‘moonshot with impact!’ In practical terms, this translates into real-world AI/ML projects geared toward solving monumental problems. Yet the path to solving these grand challenges often places tremendous demands on technology infrastructure, particularly storage, given the constant data generation associated with model retraining and pruning.
Khalid Eidoo, CTO of Crater Labs, describes AI/ML models as “living organisms”: their biases evolve, their accuracy fluctuates, and their sensitivity to different variables shifts over time as they ingest more data. This is where Crater Labs steps in to support its clients.
Take, for example, tier-one automotive OEMs. They’ve mastered the art of producing high-quality products over decades, understanding precisely where failures might occur, and their QA teams, augmented by AI, have deep knowledge of their processes. However, when these manufacturers embark on designing radically innovative components, the landscape shifts: the existing AI models struggle to identify defects in the new designs. Crater Labs bridges this gap by developing advanced models capable of synthesizing defects within the evolving design landscape.
Initially, Crater Labs operated in the cloud, leveraging standard cloud services from AWS and Google Cloud. While this worked well in many respects, clients raised concerns about where and how their data was hosted, and about the escalating costs of managing ever-expanding models. Eidoo underscores that while cloud storage appears economical, this holds true primarily for cold or semi-warm storage configurations. In the AI/ML realm, data must be instantly accessible: a ‘hot’ commodity, if you will.
This realization prompted a pivotal shift for Crater Labs. Recognizing the paramount importance of storage, especially for clients working with terabytes of data, coupled with stringent geographic and security requisites, Crater Labs decided to establish on-premise infrastructure. This move allowed the lab to shield its clients from mounting storage costs.
Eidoo elucidates the challenges further: a one-terabyte dataset can swiftly balloon to several terabytes during preprocessing, and each terabyte-scale training run generates another 50% to 75% of that volume in new data. Multiply that across the numerous training runs a large client requires, and the importance of robust storage infrastructure becomes unmistakable. Even setting aside security and privacy constraints, running these workloads in the cloud is economically impractical.
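To make the arithmetic concrete, here is a back-of-the-envelope sketch using the figures above; the threefold preprocessing blow-up and the number of training runs are illustrative assumptions, not reported numbers.

```python
# Back-of-the-envelope storage growth, using the figures quoted above.
# The 3x preprocessing factor and the run count are assumptions.

raw_tb = 1.0                  # original dataset size, in terabytes
preprocessed_tb = raw_tb * 3  # "balloons to several terabytes" (assumed 3x)
growth_per_run = 0.75         # upper bound: 75% new data per training run
num_runs = 20                 # hypothetical number of training runs

total_tb = preprocessed_tb + num_runs * preprocessed_tb * growth_per_run
print(f"Estimated footprint after {num_runs} runs: {total_tb:.1f} TB")
# Estimated footprint after 20 runs: 48.0 TB
```

Even under these modest assumptions, a single 1 TB project can occupy tens of terabytes of hot storage, which is the cost profile that pushed the lab on-premise.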
Elevating Efficiency and Reducing Costs: The Pure Storage Solution
In the realm of research, failure is an intrinsic part of the process. Crater Labs endeavors to equip its clients for success, but there are always costs associated with identifying what doesn’t work. To mitigate these expenses, Crater Labs explored various storage solutions, including servers with extensive storage volumes. However, these solutions were ill-suited for the distributed, GPU-intensive environment Crater Labs required.
The breakthrough came when Crater Labs adopted Pure Storage’s FlashBlade technology. Eidoo highlights FlashBlade’s capacity to handle highly unstructured data with lightning-fast access, as well as its ability to sustain rapid data transfer to Crater Labs’ servers over extended periods.
Implementing Pure Storage was remarkably seamless, thanks to the vendor’s high-performance transfer tools. Crater Labs swiftly migrated its data and was up and running within hours. Astoundingly, within just fifteen minutes of operation, the organization had a new model running.
Crater Labs conducted comparative assessments to quantify the benefits of their choice. One compelling case involved database preprocessing: a database initially sized at 500 gigabytes expanded to three terabytes after a series of queries and transformations. In the cloud, this process took 72 to 96 hours, even with ample computational resources. With FlashBlade, Crater Labs completed the same task in under 10 hours, using fewer compute nodes. This roughly tenfold acceleration underscores the transformative impact of Pure Storage.
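As a quick sanity check on that claim, the reported figures work out as follows; the derived effective-throughput numbers are purely illustrative.

```python
# Sanity-check the reported speedup. Input figures are from the article;
# the derived effective throughputs are illustrative only.

data_tb = 3.0               # dataset size after preprocessing
cloud_hours = (72, 96)      # reported cloud turnaround
flashblade_hours = 10       # reported on-prem turnaround (upper bound)

for h in cloud_hours:
    print(f"Cloud @ {h} h: {data_tb * 1024 / h:,.0f} GB/h effective")
print(f"FlashBlade @ {flashblade_hours} h: "
      f"{data_tb * 1024 / flashblade_hours:,.0f} GB/h effective")
print(f"Speedup: {cloud_hours[0] / flashblade_hours:.1f}x to "
      f"{cloud_hours[1] / flashblade_hours:.1f}x")
# Speedup: 7.2x to 9.6x, consistent with the roughly tenfold figure
```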
However, Crater Labs recognizes that it’s not just about speed. The cloud’s appeal lies in its ease of use, and Crater Labs sought to replicate that convenience on premises. FlashBlade’s largely self-managing design aligned perfectly with this goal: whether spinning up containers or orchestrating projects, its reliability simplified operations across the board.
The Era of Large Language Models (LLMs)
As the AI landscape continues to evolve, Crater Labs is embracing the advent of Large Language Models (LLMs) head-on. With the introduction of models like ChatGPT, organizations are eager to harness the power of LLMs to enhance their operations and gain critical business insights.
LLMs, with their extensive training on vast datasets, present unique challenges and opportunities. Crater Labs’ clients are increasingly interested in deploying LLMs to analyze industry-specific information and sentiments. To achieve this, Crater Labs focuses on pruning and customizing models to align them with clients’ industries.
Pruning involves deriving highly specialized models from larger ones, a process that, counterintuitively, requires even more data. As nodes are removed from the original large model and new ones are introduced, each change must be validated extensively; this iterative approach generates a wealth of additional data.
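The article does not detail Crater Labs’ pruning technique, but a minimal sketch of one common approach, magnitude pruning with PyTorch’s built-in utilities, illustrates the iterative prune, retrain, validate loop that produces those extra artifacts. The model, sparsity level, and schedule below are all hypothetical.

```python
# Minimal sketch of iterative magnitude pruning with PyTorch.
# Each cycle of prune -> retrain -> validate writes new checkpoints and
# validation artifacts, which is where the extra data comes from.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model: nn.Module, amount: float = 0.2) -> None:
    """Zero out the smallest-magnitude weights in every Linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weights

# Hypothetical stand-in for a much larger language model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

for step in range(3):                    # several gentle passes, not one cut
    prune_linear_layers(model, amount=0.2)
    # retrain(model); validate(model)    # omitted; each cycle adds new data
    torch.save(model.state_dict(), f"pruned_step{step}.pt")
```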
Crater Labs’ storage requirements are evolving accordingly. They now require high parallelism to manage the unstructured data that characterizes language models. With documents of varying sizes, a flexible storage system is essential to feed data to training models quickly, maintaining both access speed and throughput.
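As a rough illustration of that kind of parallel ingest, here is a minimal PyTorch data-loading sketch; the mount point, worker count, and on-disk layout are assumptions, not Crater Labs’ actual configuration.

```python
# Minimal sketch: many worker processes stream variable-size documents
# off a shared flash filesystem so the GPUs never wait on a single reader.

from pathlib import Path
from torch.utils.data import Dataset, DataLoader

class DocumentDataset(Dataset):
    """Reads raw text documents of arbitrary size from a mounted share."""
    def __init__(self, root: str):
        self.files = sorted(Path(root).glob("**/*.txt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> str:
        return self.files[idx].read_text(encoding="utf-8")

# With many workers, aggregate read throughput scales with the storage
# system's parallelism rather than the speed of any single stream.
loader = DataLoader(
    DocumentDataset("/mnt/flashblade/corpus"),  # hypothetical mount point
    batch_size=32,
    num_workers=16,     # parallel reader processes
    collate_fn=list,    # keep variable-length documents as a plain list
)
```

The design point is that a flash system exposing many parallel channels lets read throughput grow with the number of workers, which is exactly the property variable-size document corpora demand.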
The demand for specialized language models across diverse industries is growing rapidly, necessitating the ability to train multiple models in parallel with high efficiency. Crater Labs has found the perfect ally in FlashBlade, delivering the parallelism, speed, and ease of use required to propel their projects forward at an unprecedented velocity.
Conclusion:
Crater Labs’ strategic transition to on-premise infrastructure, powered by Pure Storage, has not only optimized their AI projects but also positioned them at the forefront of the evolving AI market. Their agility in harnessing Large Language Models (LLMs) and the ability to efficiently customize them for industry-specific applications showcase their readiness to drive innovation and shape the future of AI-powered business solutions.