- Spawning AI, led by Jordan Meyer and Mathew Dryhurst, introduces Source.Plus for curating ethical AI training data.
- Source.Plus’ dataset includes nearly 40 million public domain images and CC0 licensed images, prioritizing creators’ rights.
- The platform offers tools for rights management and data filtering to ensure ethical sourcing and integrity.
- Compensation model allows artists to set prices per download, with Spawning charging a flat rate fee, fostering transparency and fair revenue sharing.
- Source.Plus aims to expand beyond images to include audio and video datasets, potentially revolutionizing the generative AI landscape.
Main AI News:
Spawning AI, led by Jordan Meyer and Mathew Dryhurst, endeavors to empower artists in controlling the utilization of their creations online. Their newest venture, Source.Plus, aspires to curate AI model training data deemed “non-infringing.”
The first endeavor of the Source.Plus project involves a dataset comprising nearly 40 million public domain images and those under the Creative Commons’ CC0 license. This license allows creators to relinquish almost all legal interests in their works. Despite its relatively smaller size compared to other AI training datasets, Meyer asserts that Source.Plus’ dataset is of high enough quality to train state-of-the-art image-generating models.
Meyer articulates, “With Source.Plus, we’re establishing a universal ‘opt-in’ platform. Our objective is to facilitate rights holders in offering their media for generative AI training on their terms, while also making it seamless for developers to integrate such media into their workflows.”
Innovative Rights Management
The ethical considerations surrounding the training of generative AI models, especially art-generating ones like Stable Diffusion and OpenAI’s DALL-E 3, remain contentious, carrying significant implications for artists.
Meyer, the CEO of Spawning, contends that there is no universally agreed-upon best approach to AI training yet. He highlights the prevalent practice of defaulting to the easiest available data, which may not always be sourced responsibly.
Source.Plus, currently in limited beta, expands on Spawning’s existing tools for managing art provenance and usage rights. It represents Spawning’s inaugural effort to construct a media library internally, starting with the PD/CC0 image dataset, suitable for commercial or research applications.
Redefining Fair Data Sourcing
While organizations like Getty Images, Adobe, Shutterstock, and AI startup Bria claim to use fairly sourced data for model training, Spawning aims to set a “higher bar” for data fairness. Source.Plus meticulously filters images for “opt-outs” and adheres to artist training preferences, providing provenance information on image sourcing.
To address historical issues with training datasets containing problematic content, Spawning employs classifier models to detect undesirable elements like nudity and violence. Additionally, Source.Plus allows users to flexibly filter datasets according to their preferences, enhancing data integrity.
Compensation Reinvented
Unlike existing platforms, Source.Plus will allow artists and rights holders to set their own prices per download, with Spawning charging a flat rate fee. Additionally, a subscription plan, Source.Plus Curation, offers enhanced features for managing image collections.
Meyer emphasizes that this pricing model is designed to prioritize artists’ revenue share and autonomy. He anticipates that this approach will lead to higher payouts and greater transparency, offering a more favorable alternative to percentage revenue splits.
Expanding Horizons
Should Source.Plus gain traction, Spawning plans to extend its offerings to include audio and video datasets. Discussions are underway with undisclosed firms to onboard their data onto Source.Plus. Spawning also contemplates developing its generative AI models using data from Source.Plus datasets.
Source.Plus emerges as a promising avenue for artists to participate in the generative AI economy while ensuring fair compensation for their contributions. It addresses the creative community’s growing demand for alternatives to platforms perceived as exploiting their work.
Yet, the scalability and efficacy of Source.Plus remain subjects of scrutiny, especially considering the moderation challenges inherent in managing vast amounts of user-generated content.
Only time will reveal the extent of Source.Plus’ impact and its ability to navigate the complexities of the evolving AI landscape.
Conclusion:
The Source.Plus initiative by Spawning AI signifies a significant step towards elevating ethical standards in the AI training data landscape. By prioritizing artists’ rights and autonomy while offering a transparent and flexible compensation model, Source.Plus sets a precedent for fair data sourcing in the industry. This move not only addresses the concerns of artists but also reflects a growing demand for responsible AI development practices. As Source.Plus expands its offerings and potentially influences market norms, it could reshape the dynamics of the AI training data market, fostering greater collaboration and accountability among stakeholders.