Singapore’s Ethical Quest to Train Sea-Lion AI Model

  • AI Singapore, with Leslie Teo leading AI products, prioritizes ethical data sourcing for its Sea-Lion AI model.
  • Sea-Lion, part of the National Multimodal Large Language Model Programme (NMLP), integrates 11 regional languages.
  • Sea-Lion outperforms competitors in regional language understanding and context sensitivity.
  • Ethical data procurement involves meticulous evaluation and rejecting dubious sources.
  • Budget allocation for Sea-Lion development balances GPU costs and other project needs.
  • Future Sea-Lion iterations focus on safety enhancements and increased parameter count.

Main AI News:

AI Singapore is committed to sourcing the data for its AI model, Sea-Lion, ethically, says Leslie Teo, Senior Director of AI Products at AI Singapore.

In a recent media briefing, Teo and two team members discussed their work on Sea-Lion, short for Southeast Asian Languages in One Network, an open-source model designed to reflect the linguistic diversity of Southeast Asia.

Last year, the Singaporean government allocated USD 52 million to establish the National Multimodal Large Language Model Programme (NMLP), which aims to build the region’s first large language model. Sea-Lion serves as the foundation for NMLP. It has been trained on 11 regional languages, and there are plans to expand it into a multimodal speech-to-text model.

Sea-Lion’s Distinct Advantages

Teo and his team demonstrated Sea-Lion by comparing it against leading language models such as Meta’s Llama 2, Alibaba’s SeaLLM, and OpenAI’s GPT-4 Turbo. Sea-Lion performed particularly well in regional languages such as Bahasa Indonesia, Thai, and Tamil, giving contextually relevant answers that reflected local sensitivities and realities.

In contrast, competitors such as Llama 2 and SeaLLM often faltered: they hesitated to address sensitive topics, gave generic responses, or offered ill-advised counsel. The difference can be attributed to Sea-Lion’s training data, a significant portion of which originates from Southeast Asia, unlike the data used to train its counterparts.

Ethical Data Procurement

Amid recent accusations of unscrupulous data practices against tech giants, AI Singapore remains committed to ethical data sourcing. Teo emphasized the team’s meticulous approach to evaluating and cleaning data, which includes rejecting offers from data brokers selling data of dubious origin.

Teo stressed the importance of adhering to ethical standards, even if it means compromising the model’s performance. Despite that trade-off, regional partners inspired by AI Singapore’s ethical framework have expressed interest in contributing data.

Budget-Conscious Training

Teo clarified that while a portion of the NMLP funds will cover GPU expenses, the majority will go to other project components. Even so, the budget is considered sufficient to develop the next two iterations of Sea-Lion.

To keep costs down, Teo expects to benefit from declining GPU prices and from a motivated team compensated at “academic rates.” The next two iterations of Sea-Lion, with 13 billion and 30 billion parameters respectively, will prioritize safety enhancements for public deployment.

Looking Ahead

Teo views the evolution of Sea-Lion as a pivotal effort to bridge representation gaps and to establish Singapore as a prominent player in AI innovation on the global stage. He and his team remain committed to pursuing both ethical standards and technological advancement.

Conclusion:

AI Singapore’s approach to Sea-Lion points to a different way of handling data sourcing and model development: ethical considerations come first, even at some cost to performance. By vetting its data and turning away sources of dubious origin, the project aims to build trust and credibility alongside capability. This commitment positions Singapore as a leader in responsible AI development.
