Kiteworks Proactively Protects Confidential IP and Private Data from Exposure in Data-hungry Generative AI LLMs

Kiteworks platform uses content-defined zero trust and digital rights management to address rising concerns over sensitive content ingestion in large language model (LLM) tools.

August 17, 2023

San Mateo, CA – Kiteworks, which delivers data privacy and compliance for sensitive content communications through its Private Content Network, announced that the Kiteworks platform uses next-generation digital rights management (DRM) to keep critical corporate intellectual property (IP) and personally identifiable information (PII) from being ingested into the growing number of large language model (LLM) tools built on generative artificial intelligence (AI).

A recent survey by Gartner found that enterprises list generative AI as their second-highest risk, pinpointing three primary risk aspects: 1) IP used as part of the training set and leveraged in outputs for other users, 2) PII and other sensitive personal data being used in AI tools in ways that violate data privacy laws, and 3) bad actors using generative AI tools to accelerate the development of attacks and exploits.

Sensitive content at risk includes 1) training data used to train the AI language model; 2) knowledge base data, meaning confidential, proprietary information used to generate responses from the generative AI LLM tool; and 3) confidential chatbot interactions in customer support, sales, and marketing scenarios where PII and other personal data are entered, intentionally or inadvertently, into the chat interface. Many organizations consider the risk serious enough that three-quarters are currently implementing or considering bans on ChatGPT and other generative AI applications in the workplace; 61% of those say the bans are intended to be long term or permanent (source).

Recent studies show there is a significant risk of sensitive content leakage into generative AI LLM tools. Findings include:

15% of employees regularly post company data into generative AI LLMs, a share that is growing fast, and one-quarter of that data is considered sensitive. (source)
Workers who use generative AI LLM tools do so an average of 36 times a day, and 25% of those uses include a data paste. (source)
The top categories of confidential information entered into generative AI LLMs include internal business data (43%), source code (31%), PII (12%), and customer data (9%). (source)
There are an astounding 30,000 GPT-related projects on GitHub. (source)
Only 28% of organizations have instituted processes to mitigate regulatory compliance risks. (source)
Only 20% of organizations have instituted processes to mitigate PII being used in generative AI LLMs. (source)
Earlier this year, in just six weeks, the amount of sensitive data being ingested into generative AI LLMs shot up 60%. (source)

Without content-based risk policies and controls in place, sensitive content leakage into generative AI LLMs can be a serious threat for organizations. Beyond the risk of cybercriminals manipulating generative LLMs for malicious activity, security researchers believe training data extraction attacks could successfully recover verbatim text, PII, and IP from generative AI models, which cybercriminals could then hold for ransom. Loss of PII and protected health information (PHI), even if inadvertent, may violate data privacy regulations like GDPR, HIPAA, PIPEDA, PCI DSS, and numerous others that require public disclosure and notification. This can result in regulatory fines and penalties, brand damage, diminished productivity, and decreased revenue.

“Generative AI LLMs present an urgent data protection challenge to organizations,” said Tim Freestone, CMO at Kiteworks. “The number of employees and contractors using generative AI LLMs is skyrocketing—and will continue to do so because of the immense competitive advantages it offers. And one must remember those within your network are just the cusp of the problem; most organizations send, share, receive, and store sensitive content with thousands of third parties.”

The good news is organizations with Kiteworks-enabled Private Content Networks can track and control confidential information, such as trade secrets, customer data, PII, PHI, and financials, preventing it from being exposed to generative AI LLMs. Kiteworks delivers capabilities based on content policy risk:

Low Risk: Content-defined Zero Trust. Use least-privilege access policies and controls for employees and third parties defined based on the sensitivity of content assets. Watermarking can be applied to alert users that specific content should not be used in generative AI LLMs.
Moderate Risk: View-only DRM. Employ Kiteworks SafeView™ to disallow local copies of content so that employees and third parties cannot extract a copy and upload it into a generative AI LLM. An additional policy can be set to expire access after a specified time period or a specified number of views.
High Risk: Next-generation DRM for Collaboration. Leverage next-generation DRM using Kiteworks SafeEdit™ to prevent data from leaving an organization’s network data center and repository while still enabling efficient collaboration with internal users and third parties. Kiteworks SafeEdit streams an editable video image to users.
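
The three tiers above can be read as a simple policy table. The following is a minimal illustrative sketch, not Kiteworks code or its API: the names `Risk`, `CONTROLS`, and `ViewGrant` are hypothetical, and the expiry logic only assumes the behavior described for the moderate tier, where access lapses after a set time or a set number of views.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


# Hypothetical mapping from content risk tier to the control described above.
CONTROLS = {
    Risk.LOW: "least-privilege access plus watermark",
    Risk.MODERATE: "view-only, no local copy",
    Risk.HIGH: "streamed editing, content stays in the repository",
}


@dataclass
class ViewGrant:
    """Illustrative view-only grant that expires by time or by view count."""
    expires_at: Optional[datetime] = None
    max_views: Optional[int] = None
    views_used: int = 0

    def can_view(self, now: datetime) -> bool:
        # Access lapses once the deadline has passed...
        if self.expires_at is not None and now >= self.expires_at:
            return False
        # ...or once the allowed number of views has been used up.
        if self.max_views is not None and self.views_used >= self.max_views:
            return False
        return True

    def record_view(self, now: datetime) -> bool:
        # Count a view only if the grant is still valid at this moment.
        if not self.can_view(now):
            return False
        self.views_used += 1
        return True
```

In this sketch, a grant created with `max_views=2` allows two calls to `record_view` and refuses the third, and a grant whose `expires_at` has passed refuses every view regardless of the count.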

“Kiteworks next-generation DRM capabilities enable businesses to track and control the sensitive content employees, contractors, and third parties ingest into generative AI LLMs,” said Kiteworks’ Freestone. “Using our SafeView and SafeEdit capabilities, businesses track and control how end-users access and use confidential data, including what can and cannot be ingested in generative AI LLMs. With Kiteworks, businesses that have embraced generative AI LLMs can do so with confidence, while those that have banned their use can reevaluate their decision, knowing their sensitive data is protected.”

About Kiteworks

Kiteworks’ mission is to empower organizations to effectively manage risk in every send, share, receive, and save of sensitive data. The Kiteworks platform provides customers with a Private Data Network that delivers data governance, compliance, and protection. The platform unifies, tracks, controls, and secures sensitive data moving within, into, and out of their organization, significantly improving risk management and ensuring regulatory compliance on all private data communications.

Media Contacts
