Risks Arising from Employing Customer Data to Train AI: A Growing Concern for Enterprises

TL;DR:

  • Zoom’s plan to use customer data for AI training sparks concerns.
  • More companies are adopting similar strategies, necessitating proactive measures.
  • Distinction between data for service use and AI training is critical.
  • Lawsuits against Google and Microsoft highlight data exploitation concerns.
  • Organizations must prioritize ethical data usage and transparency.
  • Encryption and control over data access are essential safeguards.

Main AI News:

Zoom recently drew criticism for its plan to use customer data to train its machine learning models. The episode is not unique to Zoom: many other companies integrating AI tools into their internal operations are heading down the same path. Enterprises therefore need to pay attention to these strategies and respond proactively with new procedures, oversight mechanisms, and technical controls wherever feasible.

Abandoned Ventures into AI

This year, Zoom changed its terms of service to give itself the right to use certain customer content to train its AI and machine learning models. The company reversed the change in early August after backlash from customers, who objected to their audio, video, chat records, and other communications being used for that purpose.

The incident is a reminder that companies must stay vigilant about how their data is used in the rapidly advancing era of AI. It is a mistake to assume that data collected by technology firms for AI training is equivalent to data collected to deliver a service, according to Claude Mandy, Chief Evangelist of Data Security at Symmetry Systems. While both involve customer data, Mandy notes there is a substantial distinction between data about the customer and data belonging to the customer.

A Definitive Contrast

This distinction is already at the center of several lawsuits pitting consumers against prominent technology companies. Google, for instance, faces a class action on behalf of millions of consumers alleging that it scraped publicly accessible online data, including personal and professional information, creative works, copyrighted content, photos, and even emails, and used it to train its Bard generative AI technology.

Another lawsuit accuses Microsoft of employing a similar strategy to train AI tools such as ChatGPT, DALL-E, and VALL-E. Separately, comedian Sarah Silverman and two authors filed a class action asserting that Meta and OpenAI used their copyrighted works without consent to train AI models.

While these lawsuits involve consumers, the implication for organizations is clear: they must ensure that technology vendors do not exploit their data in similar ways.

Inequivalence of Purpose

Denis Mandich, Co-founder of Qrypt and a former member of the US intelligence community, points out that there is a fundamental difference between using customer data to improve the user experience and using it to train AI. The risks from AI are far-reaching, he argues, because models can make predictive decisions about individuals, potentially putting both people and companies at risk.

For instance, Mandich cites a startup that uses third-party communication platforms for video calls and file transfers. If a generative AI tool such as ChatGPT were trained on that data, it could inadvertently expose valuable information to competitors. The crux of the issue is the content itself, not aspects such as user experience or audio-visual quality.

Supervision and Thorough Scrutiny

The pivotal question is what organizations can do to keep their sensitive data from being folded into AI models. Omri Weinberg, Co-founder and Chief Risk Officer at DoControl, suggests a pragmatic starting point: opt out of AI training and generative AI features that are not privately deployed. That precaution is essential whenever it is unclear exactly how the data will be used and what the exposure risks are.

Furthermore, meticulous scrutiny of technology vendors’ terms of service is crucial. Heather Shoemaker, CEO and Founder of Language I/O, emphasizes the significance of ethical data usage, underpinned by policy transparency and informed consent.
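As a rough illustration of how such vendor scrutiny could be operationalized, the sketch below encodes the points a terms-of-service review might extract and flags services that permit training on customer content without a private deployment or opt-out. The field names and vendor entries are hypothetical, not drawn from any real agreement.

```python
from dataclasses import dataclass

@dataclass
class VendorTerms:
    """Summary of points extracted from a vendor's terms of service (hypothetical schema)."""
    name: str
    trains_on_customer_content: bool   # ToS permits using customer content for AI training
    offers_private_deployment: bool    # self-hosted or single-tenant option available
    training_opt_out_available: bool   # explicit opt-out from AI/ML training

def needs_review(v: VendorTerms) -> bool:
    """Flag vendors whose terms create AI-training exposure for customer data."""
    return v.trains_on_customer_content and not (
        v.offers_private_deployment or v.training_opt_out_available
    )

vendors = [
    VendorTerms("ExampleMeet", trains_on_customer_content=True,
                offers_private_deployment=False, training_opt_out_available=False),
    VendorTerms("ExampleDocs", trains_on_customer_content=True,
                offers_private_deployment=True, training_opt_out_available=True),
]

for v in vendors:
    if needs_review(v):
        print(f"Review required: {v.name} may use customer content for AI training")
```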

AI tools can also store customer information beyond what is needed for training, leaving that data vulnerable to cyberattacks or breaches. To counter this, Mandich advocates adopting end-to-end encryption wherever feasible. He emphasizes that companies should demand explicit provisions in end-user license agreements (EULAs) covering encryption and third-party access to data. The ultimate goal is for companies to retain control over the encryption keys, so that sensitive data stays under their purview rather than the technology provider’s.
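A minimal sketch of the key-control idea, using client-side symmetric encryption as a simplified stand-in for full end-to-end encryption: data is encrypted with a company-held key before it reaches the vendor, so any AI training pipeline on the vendor side only ever sees ciphertext. It assumes Python's cryptography package; upload_to_vendor is a placeholder for whatever API the third-party platform actually exposes.

```python
from cryptography.fernet import Fernet

def upload_to_vendor(blob: bytes) -> None:
    """Stand-in for the third-party platform's upload API (hypothetical)."""
    print(f"uploaded {len(blob)} encrypted bytes")

# The company generates and retains the key; it is never shared with the vendor.
company_key = Fernet.generate_key()
cipher = Fernet(company_key)

meeting_transcript = b"Q3 roadmap discussion: unreleased product details"

# Encrypt before the data leaves company infrastructure, so the vendor
# (and any model training it runs) only handles ciphertext.
ciphertext = cipher.encrypt(meeting_transcript)
upload_to_vendor(ciphertext)

# Only holders of the company key can recover the plaintext later.
plaintext = cipher.decrypt(ciphertext)
```

In practice the key would live in the company's own key management system rather than being generated inline, which is the point Mandich makes about keeping keys out of the provider's hands.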

Conclusion:

In a rapidly evolving AI landscape, using customer data for training presents both opportunities and risks. Zoom's backlash is unlikely to be the last as AI integration spreads across the market. Enterprises must distinguish between data collected to deliver a service and data used to train models, and prioritize transparency and ethical data practices. That requires robust oversight, stringent vendor agreements, and the strategic use of encryption. Companies that navigate these complexities well can protect their data and gain a competitive advantage in a data-centric business environment.
