Automattic, the parent company of WordPress and Tumblr, is considering selling platform content to AI firms for training

  • Automattic, the parent company of WordPress and Tumblr, is considering selling platform content to AI firms like Midjourney and OpenAI.
  • Internal conflict has arisen over the scraped content, which reportedly includes private data and advertisements Automattic does not own, potentially compromising user privacy.
  • Generative AI, led by products such as OpenAI’s ChatGPT, faces scrutiny over the integrity of its training data and alleged copyright violations.
  • Automattic plans to introduce an opt-out feature for users, following similar moves by competitors like Squarespace.
  • The company emphasizes user control and alignment with industry norms in response to mounting concerns.

Main AI News:

Automattic, the umbrella company for WordPress and Tumblr, is contemplating selling platform content to AI companies like Midjourney and OpenAI, according to a detailed report published Tuesday by 404 Media. The revelation has ignited discussion about privacy and user consent, and the proposed deal raises significant questions about how user data is used and protected.

Internal strife within Automattic has also surfaced. According to the report, some of the scraped content destined for AI training includes private data the company was never supposed to store. Advertisements that Automattic does not own, including material from an old Apple Music campaign, have also allegedly found their way into the training dataset, further complicating the matter.

The controversy has escalated to the point that a product manager at the company has taken personal action, removing their own photos from Tumblr to keep them out of potential AI training sets, according to sources cited by 404 Media.

Generative AI, popularized by OpenAI’s ChatGPT after its launch in late 2022, has transformed a range of industries. Text-to-image generators rose to prominence alongside it, enticing companies with the promise of original content on demand. However, the data used to train these systems has been contested, prompting lawsuits from major publishers who allege that training on much of it infringes copyright and falls outside fair use.

In response to mounting concerns, Automattic is poised to unveil a new feature by Wednesday that lets users opt out of having their content included in AI training datasets. It remains unclear, however, whether the setting will be on or off by default for most users. The move mirrors similar steps by competitors such as Squarespace, which introduced an opt-out mechanism last year to keep user data out of AI training.

When approached for comment, Automattic referred inquiries to a recent blog post that acknowledges the reported discussions while framing the decision as giving users greater control over their content. The tone of the post, however, reads as defensive: it stresses that no legal mandate requires companies to honor user preferences and positions Automattic’s actions as in line with industry norms.

Automattic’s statement underscores its commitment to respecting opt-out preferences and pledges to notify partners of any new opt-outs so that the associated content can be removed from past and future training. With this stance, the company aims to navigate the evolving landscape of AI integration while upholding user privacy and choice.

Conclusion:

Automattic’s proposed sale of user content to AI companies highlights how closely privacy, data usage, and AI training have become intertwined. As concerns mount over the integrity of training datasets and user consent, companies like Automattic face pressure to balance innovation with user rights. Transparent opt-out mechanisms and adherence to evolving industry standards will be critical for maintaining consumer trust and navigating the complex regulatory landscape surrounding AI.
