Spain’s Venture into Open-Source Language Models in Regional Languages 

  • Spain embarks on creating an open-source large language model (LLM) in Spanish and regional languages.
  • The project, announced at the Mobile World Congress, aims for inclusivity across Spanish-speaking nations.
  • Collaboration involves public and private entities to foster linguistic diversity and empower AI startups.
  • Development is led by the Barcelona Supercomputing Center (BSC), with support from academia and industry.
  • The initiative builds on existing projects, gathering linguistic data from diverse regions across Spain.
  • BSC data, devoid of licensed content, is accessible to companies for model enhancement.
  • BSC LLM promises to elevate accuracy and fluency, benefiting AI startups in Spanish-speaking regions.

Main AI News:

Spain has set its sights on creating an open-source large language model (LLM) proficient in Spanish (Castellano), Basque, Catalan, Galician, and Valencian. This ambitious endeavor was announced by Spanish Prime Minister Pedro Sánchez at the ongoing Mobile World Congress in Barcelona. The collaborative effort will engage various public and private entities.

Sánchez emphasized the inclusivity of the project, inviting Latin American countries to participate in the LLM’s training. The aim is to ensure its applicability across Spanish-speaking nations. This move is not just about linguistic diversity; it’s also a strategic step to empower Spanish AI startups to tap into the expansive markets of Latin America and the Spanish-speaking communities in the US.

According to Carlos KiK, CTO and co-founder of the Barcelona-based startup AiMA Beyond AI, this initiative is crucial for Spanish tech firms to remain competitive amid the dominance of American tech giants. KiK stresses the urgency, warning that foreign language models could be imposed on the market if Spain fails to act swiftly.

The LLM will be developed jointly by public and private entities, including the Barcelona Supercomputing Center (BSC), the Spanish Supercomputing Network, the Royal Spanish Academy, and the Association of Spanish Language Academies. These organizations aim to uphold the integrity of the Spanish language worldwide.

Albert Cañigueral, the tech transfer director for AI and language technology projects at BSC, envisions the LLM rivaling OpenAI’s GPT-3 model in capabilities. If all goes as planned, the LLM could be launched as early as summer, drawing on the computational power of MareNostrum 5, one of the world’s top supercomputers.

Catalyzing Industry Advancement 

The initiative builds upon existing projects at BSC, namely Aina and Ilenia, focusing on Catalan, Spanish, and other regional languages. These projects have amassed substantial linguistic data from diverse regions across Spain. The subsequent phase will prioritize industry and institutional adoption of the LLM.

The data collected by BSC, which contains no licensed content, is already accessible to companies of all sizes. Cañigueral notes its utility, citing Google’s use of the data to enhance its language models. Existing startups and projects focusing on LLMs in Spanish and Basque also stand to benefit from BSC’s initiative, which should foster a collaborative ecosystem.

Cañigueral emphasizes that multiple language models can coexist, dispelling the notion of a single dominant model. This approach allows for collaboration with specialized models tailored to specific tasks, all of which can draw on the vast dataset provided by BSC for enhanced accuracy and fluency.

Enhancing Language Precision 

KiK underscores the potential of the BSC LLM project to elevate the accuracy of AI interactions in Spanish-speaking regions. Currently, many AI startups in Spain rely heavily on English-language data, resulting in suboptimal performance in local languages. For instance, KiK describes the challenges faced by AiMA, which struggles to deliver proficient responses in Catalan.

The BSC LLM promises to streamline development efforts by eliminating the need for extensive modifications to achieve linguistic authenticity. Moreover, it enables AI companions to discern regional dialects, allowing for more personalized interactions aligned with users’ linguistic nuances and expressions.

Conclusion:

Spain’s initiative to develop an open-source LLM trained in Spanish and regional languages marks a significant advance in language technology. This collaborative effort not only promotes linguistic diversity but also empowers Spanish AI startups to compete in global markets. With the potential to enhance accuracy and fluency in AI interactions, the BSC LLM initiative opens doors for innovation and growth in the language technology sector.
