TL;DR:
- Giskard, a French startup, pioneers an open-source testing framework for large language models (LLMs).
- The framework assesses LLMs for biases, security vulnerabilities, and harmful content generation, which is crucial in the evolving AI regulatory landscape.
- As AI regulations like the EU’s AI Act come into effect, companies must demonstrate compliance, making ML testing systems a hot topic.
- Giskard’s Python library integrates seamlessly with LLM projects and offers compatibility with various ML ecosystem tools.
- It assists in creating comprehensive test suites covering performance, ethics, and regulatory aspects.
- Tests can be integrated into CI/CD pipelines, with automated reports for issue detection.
- Giskard tailors test to specific use cases, ensuring relevance and accuracy.
- The startup’s AI Quality Hub aids debugging and model comparisons with future plans for regulatory compliance documentation.
- LLMon, Giskard’s real-time monitoring tool, evaluates LLM responses for common issues.
- Giskard’s potential to alert developers on the misuse of LLMs enriched with external data positions it favorably in the regulatory landscape.
- With a team of 20, Giskard aims to become the leading LLM antivirus solution in the market.
Main AI News:
Giskard, the innovative French startup, is at the forefront of developing an open-source testing framework designed to evaluate large language models (LLMs) rigorously. In an era marked by the rising prominence of artificial intelligence (AI) models, Giskard’s framework stands as a critical tool, alerting developers to potential risks associated with biases, security vulnerabilities, and the generation of harmful or toxic content.
As the European Union’s AI Act looms on the horizon, along with similar regulatory initiatives in other nations, the spotlight on machine learning (ML) testing systems is intensifying. Companies venturing into AI model development must now demonstrate their adherence to a set of stringent rules and take proactive steps to mitigate risks, lest they incur substantial fines.
Giskard is a pioneer in this realm, proudly embracing the evolving landscape of AI regulation. It represents one of the first developer tools with a laser focus on efficient testing methodologies.
Alex Combessie, Giskard’s co-founder and CEO, draws on his experience at Dataiku, particularly in NLP model integration, to explain the driving force behind Giskard. “When I was in charge of testing,” he states, “I encountered numerous challenges in applying testing methodologies to practical use cases. Additionally, comparing the performance of various suppliers proved to be a daunting task.”
Giskard’s testing framework consists of three key components. Initially, the company offered an open-source Python library that seamlessly integrates with LLM projects, particularly retrieval-augmented generation (RAG) projects. This library has gained considerable popularity on GitHub and boasts compatibility with various ML ecosystem tools, including Hugging Face, MLFlow, Weights & Biases, PyTorch, Tensorflow, and Langchain.
Once integrated, Giskard assists in the creation of a comprehensive test suite to be regularly applied to your model. These tests encompass a wide array of critical areas, including performance assessment, detection of hallucinations, identification of misinformation, evaluation of non-factual output, detection of biases, prevention of data leakage, and assessment of harmful content generation and prompt injections.
As Combessie emphasizes, “There are several aspects to consider, from performance to ethics, which are now a matter of both brand image and regulatory compliance.“
Developers can seamlessly integrate these tests into their continuous integration and continuous delivery (CI/CD) pipeline, ensuring that tests run automatically with every new code iteration. In the event of any discrepancies or issues, developers receive a detailed scan report within their GitHub repository, streamlining the debugging process.
Giskard tailors its tests to match the specific use case of the model in question. For instance, companies engaged in RAG development can grant Giskard access to vector databases and knowledge repositories to ensure the relevance and accuracy of the test suite. This ensures that models like chatbots, providing information on climate change using LLMs, can be rigorously tested for misinformation, contradictions, and other critical factors.
Giskard’s second offering is an AI quality hub, a premium tool that facilitates the debugging of large language models and facilitates comparisons with other models. In the future, the startup aims to generate documentation that serves as evidence of regulatory compliance.
Combessie shares, “We’ve begun selling the AI Quality Hub to organizations such as Banque de France and L’Oréal, helping them identify and address errors. In the future, we plan to integrate all regulatory features into this hub.“
The third product in Giskard’s arsenal is “LLMon,” a real-time monitoring tool capable of evaluating LLM responses for common issues like toxicity, hallucination and fact-checking before delivering the response to the user. While currently designed for OpenAI’s APIs and LLMs, Giskard is actively working on integrations with platforms like Hugging Face and Anthropic.
The emerging regulatory landscape for AI models presents several challenges and uncertainties. It remains unclear whether the AI Act and similar regulations will primarily apply to foundational models from organizations like OpenAI, Anthropic, Mistral or exclusively to specific applied use cases.
In this context, Giskard emerges as a vital ally, poised to alert developers about the potential misuse of LLMs enriched with external data, a concept known as retrieval-augmented generation (RAG).
With a team of 20 dedicated professionals, Giskard is ready to expand its presence in the market. Combessie affirms, “We see a clear market fit with LLM customers, and we are committed to doubling the size of our team to become the leading LLM antivirus solution in the market.”
Conclusion:
Giskard’s innovative AI model testing framework, tailored to meet emerging regulatory requirements, positions it as a pivotal player in the market. As AI regulations gain traction, Giskard’s comprehensive tools and commitment to growth enable it to serve as a trusted ally for companies seeking to ensure compliance and ethical AI model development.