New AI Evaluation Tools Launched by UK Agency

  • UK’s AI Safety Institute introduces Inspect, an open-source toolkit for AI evaluation.
  • Inspect assesses the core knowledge and reasoning abilities of AI models and is intended for use by industry, research organizations, and academia.
  • Components include datasets, solvers, and scorers, with extensibility via Python packages.
  • Praised by experts for its potential in enhancing AI accountability and model testing.
  • US counterpart, NIST GenAI, also focuses on assessing generative AI technologies.
  • UK and US collaborate to advance AI model testing, aiming for comprehensive risk evaluation.

Main AI News:

In a bid to fortify AI safety measures, the UK’s AI Safety Institute has unveiled a suite of tools dubbed Inspect. This toolset, released under the MIT License, aims to streamline AI evaluations for industry players, research institutions, and academia. Inspect evaluates AI models on their core knowledge and reasoning abilities and generates a score from the results. Notably, this is the first time a state-backed body has released an AI safety testing platform for wider use.

Ian Hogarth, Chair of the AI Safety Institute, emphasized the importance of collaborative efforts in AI safety testing. He stated, “Successful collaboration on AI safety testing means having a shared, accessible approach to evaluations, and we hope Inspect can be a building block.” The institute envisions global adoption of Inspect to enhance model safety testing and foster iterative improvements through its open-source nature.

Inspect’s framework comprises three primary components: datasets, solvers, and scorers. Datasets provide evaluation samples, solvers execute tests, and scorers assess solver performance while aggregating test scores into meaningful metrics. The extensibility of Inspect allows integration with new testing techniques through third-party Python packages.
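
To illustrate how the three components fit together, here is a minimal sketch in plain Python. It does not use Inspect's actual API: the names Sample, toy_solver, exact_match, and run_eval are hypothetical, and the "model" is a hard-coded lookup table standing in for a real AI system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One evaluation sample: a prompt and the expected answer."""
    input: str
    target: str


# Hypothetical type aliases for this sketch (not Inspect's own types).
Solver = Callable[[str], str]          # prompt -> model output
Scorer = Callable[[str, str], float]   # (output, target) -> score


def toy_solver(prompt: str) -> str:
    """Stand-in for a model call: answers from a fixed lookup table."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


def exact_match(output: str, target: str) -> float:
    """Score 1.0 when output equals target, ignoring case and outer spaces."""
    return 1.0 if output.strip().lower() == target.strip().lower() else 0.0


def run_eval(dataset: List[Sample], solver: Solver, scorer: Scorer) -> float:
    """Run the solver on every sample, score each output, return the mean."""
    scores = [scorer(solver(s.input), s.target) for s in dataset]
    return sum(scores) / len(scores) if scores else 0.0


dataset = [
    Sample("What is 2 + 2?", "4"),
    Sample("Capital of France?", "Paris"),
    Sample("Largest planet?", "Jupiter"),  # the toy solver cannot answer this
]

accuracy = run_eval(dataset, toy_solver, exact_match)
print(f"accuracy: {accuracy:.2f}")
```

In this sketch the dataset supplies the samples, the solver produces an answer for each one, and the scorer grades each answer before the grades are averaged into a single accuracy metric. Swapping in a different solver or scorer would not require changing run_eval, which is the kind of extensibility the article describes.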

Deborah Raji, a research fellow at Mozilla and renowned AI ethicist, praised Inspect as a testament to the potential of public investment in open-source AI accountability tools. Meanwhile, Clément Delangue, CEO of AI startup Hugging Face, proposed integrating Inspect with Hugging Face’s model library or establishing a public leaderboard showcasing evaluation results.

This initiative follows the recent launch of NIST GenAI by the US National Institute of Standards and Technology (NIST), focusing on assessing various generative AI technologies. NIST GenAI aims to introduce benchmarks, develop content authenticity detection systems, and combat fake or misleading AI-generated information.

The UK and US have joined forces to advance AI model testing, building on commitments made during the UK’s AI Safety Summit at Bletchley Park in November 2023. This collaboration includes the forthcoming launch of a US AI safety institute tasked with evaluating AI and generative AI risks on a broad scale.


The release of Inspect by the UK’s AI Safety Institute is a step toward standardizing AI evaluation processes. Because the toolkit is open source, industry, academia, and research institutions can all build on it. The UK-US collaboration on model testing, meanwhile, underscores a shared commitment to AI safety and accountability. Together, these developments suggest a growing demand for reliable AI evaluation tools, presenting opportunities for businesses to innovate and contribute to AI safety measures.