TL;DR:
- Big Tech’s attempt to privatize web data poses a threat to the AI revolution.
- AI relies on quality data for its effectiveness and potential.
- Publicly available web data fuels the growth of AI models.
- Meta’s release of LLaMA aims to democratize access to AI research.
- However, Meta is also pursuing litigation to restrict access to public web data it doesn’t own.
- Privatizing publicly available data hinders AI from reaching its full potential.
- The volume of data worldwide is projected to reach 120 zettabytes this year, triple the 2019 level, highlighting its importance.
- Restricting access to public web data keeps AI development from aligning with society’s best interests.
- Publicly available data is vital for current business operations and non-profit activities.
- AI has the potential for social good, such as addressing societal issues through pro bono programs.
- Access to diverse and up-to-date public web data is crucial for ethical AI training.
- Balancing the interests of Big Tech and the advancement of AI for society is a significant challenge.
Main AI News:
In the realm of technological advancements, transformative innovations emerge only once every few years. The Internet revolutionized the world, and now Artificial Intelligence (AI) is poised to do the same. AI possesses immense potential to enhance lives and revolutionize industries spanning from healthcare to finance. However, the effectiveness of AI is intrinsically tied to the quality of the data it is trained on.
The proliferation of textual content, images, videos, and audio accessible on the public web has propelled the growth of AI models, serving as an ever-expanding wellspring of knowledge. This is why experts forecast that the AI industry, already valued at $137 billion, will grow by more than 37% annually throughout this decade.
For example, Meta, the parent company of Facebook, recently unveiled LLaMA, a collection of foundational language models designed to democratize access to AI research. Meta proudly asserts, “We train our models on trillions of tokens, demonstrating the feasibility of developing state-of-the-art models using exclusively publicly available datasets.”
Paradoxically, even as Meta extols the significance of publicly accessible data for AI, it is concurrently engaged in legal battles to restrict access to public web data that it acknowledges does not belong to it. This approach, if permitted, could create a walled garden around data existing in the public domain (excluding data behind a login), stifling AI’s potential.
Looking ahead, the volume of data generated, captured, copied, and consumed globally is projected to reach a staggering 120 zettabytes this year—a threefold increase compared to 2019. If publicly available web data is wrested away from the public and monopolized by tech giants, it would severely impede the progress of AI, preventing it from realizing its full potential.
By allowing only a select few companies to spearhead cutting-edge AI advancements, the development of this transformative technology would veer away from the best interests of humanity. It is crucial to recognize that publicly available data serves not only as the lifeblood of emerging AI tools but also as an indispensable resource for current business operations.
Companies and non-profits alike heavily rely on publicly accessible web data to efficiently and effectively fulfill their missions. A survey conducted among 150 experts in IT, technology, and data analytics from US retail, technology, and non-profit organizations reveals that 94% of them utilize public web data on a daily basis. Alarmingly, almost 80% of respondents claim that their operations would be severely hampered without access to such data.
Moreover, the potential for AI to be leveraged for the greater social good is incredibly promising. Consider, for instance, our pro bono program, The Bright Initiative, which supports non-profit, academic, and charitable organizations in combatting pressing social issues such as antisemitism, hate speech, and human trafficking.
In a broader context, it is imperative to ensure that developers have unfettered access to the datasets they require to train AI ethically. By providing an extensive array of diverse and current information, public web data can be harnessed to train machine learning models, enhance accuracy, and align AI with the collective goals of humanity.
Conclusion:
The attempt by Big Tech to privatize web data and restrict access to publicly available information poses a significant threat to the AI revolution. For the market, the risk is a walled garden around data that limits competition and innovation in the AI sector. Monopolization of web data by a few powerful companies could stifle the development of cutting-edge AI technologies aligned with society’s best interests. Market stakeholders need to address this challenge and strike a balance between commercial interests and the advancement of AI for the benefit of the market and society as a whole.