Employees Are Feeding Sensitive Biz Data to ChatGPT, Raising Security Fears

Published: 23/11/2024   Category: security




More than 4% of employees have put sensitive corporate data into the large language model, raising concerns that its popularity may result in massive leaks of proprietary information.



Employees are submitting sensitive business data and privacy-protected information to large language models (LLMs) such as ChatGPT, raising concerns that artificial intelligence (AI) services could be incorporating the data into their models, and that the information could be retrieved at a later date if proper data security isn't in place for the service.
In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM. 
In one case, an executive cut and pasted the firm's 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input a patient's name and medical condition and asked ChatGPT to craft a letter to the patient's insurance company.
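Cyberhaven has not published its detection logic, but the general idea of screening outbound prompts against policy patterns before they reach an external LLM can be sketched in a few lines of Python. The patterns, policy names, and the screen_prompt helper below are illustrative assumptions, not the vendor's implementation.

# A minimal sketch of the kind of pattern-based screening a data-security
# tool might apply before text leaves the company for an external LLM.
# The patterns and policy below are illustrative, not Cyberhaven's product logic.
import re

BLOCK_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "confidential_marker": re.compile(r"\b(confidential|internal only|trade secret)\b", re.I),
    "source_code": re.compile(r"^\s*(def |class |import )", re.M),
}

def screen_prompt(text: str) -> list[str]:
    """Return the names of any policies the outgoing prompt matches."""
    return [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(text)]

prompt = "Please turn our CONFIDENTIAL 2023 strategy document into a deck: ..."
violations = screen_prompt(prompt)
if violations:
    print(f"Blocked: prompt matches policies {violations}")  # block, or warn the user in context
else:
    print("Prompt allowed")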
And as more employees use ChatGPT and other AI-based services as productivity tools, the risk will grow, says Howard Ting, CEO of Cyberhaven.
"There was this big migration of data from on-prem to cloud, and the next big shift is going to be the migration of data into these generative apps," he says. "And how that plays out [remains to be seen] — I think we're in pregame; we're not even in the first inning."
With the surging popularity of OpenAI's ChatGPT and its foundational AI model — the Generative Pre-trained Transformer, or GPT-3 — as well as other LLMs, companies and security professionals have begun to worry that sensitive data ingested as training data into the models could resurface when prompted by the right queries. Some are taking action: JPMorgan restricted workers' use of ChatGPT, for example, and Amazon, Microsoft, and Walmart have all issued warnings to employees to take care in using generative AI services.
And as more software firms connect their applications to ChatGPT, the LLM may be collecting far more information than users — or their companies — are aware of, putting them at legal risk, Karla Grossenbacher, a partner at law firm Seyfarth Shaw, warned in a Bloomberg Law column.
"Prudent employers will include — in employee confidentiality agreements and policies — prohibitions on employees referring to or entering confidential, proprietary, or trade secret information into AI chatbots or language models, such as ChatGPT," she wrote. "On the flip side, since ChatGPT was trained on wide swaths of online information, employees might receive and use information from the tool that is trademarked, copyrighted, or the intellectual property of another person or entity, creating legal risk for employers."
The risk is not theoretical. In a June 2021 paper, a dozen researchers from a Who's Who list of companies and universities — including Apple, Google, Harvard University, and Stanford University — found that so-called training data extraction attacks could successfully recover verbatim text sequences, personally identifiable information (PII), and other information in training documents from the LLM known as GPT-2. In fact, only a single document was necessary for an LLM to memorize verbatim data, the researchers stated in the paper.
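The paper's full attack involves large-scale sampling and several membership-inference metrics, but the core loop is simple enough to sketch: sample many continuations from the model and flag the ones it assigns unusually low perplexity, since those are the most likely to be memorized verbatim. The prompt, sample counts, and ranking below are illustrative; this is a rough sketch of the general approach, not the researchers' exact code.

# A rough sketch of a training-data extraction probe in the spirit of the
# GPT-2 study: sample many continuations and surface the lowest-perplexity
# ones, which are the most likely to be memorized verbatim.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of the model on a candidate string; low values hint at memorization."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Seed the model with a prompt likely to precede personal data, then sample.
prompt_ids = tokenizer("Contact me at", return_tensors="pt").input_ids
samples = model.generate(
    prompt_ids,
    do_sample=True,
    top_k=40,
    max_new_tokens=64,
    num_return_sequences=20,
    pad_token_id=tokenizer.eos_token_id,
)

texts = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
# Candidates with unusually low perplexity are worth manual review.
for text in sorted(texts, key=perplexity)[:5]:
    print(round(perplexity(text), 2), text[:80])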
Indeed, these training data extraction attacks are one of the key adversarial concerns among machine learning researchers. Also known as exfiltration via machine learning inference, the attacks could gather sensitive information or steal intellectual property, according to MITRE's Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS) knowledge base.
It works like this: By querying a generative AI system in a way that makes it recall specific items, an adversary can trigger the model to reproduce a particular piece of its training data rather than generate synthetic output. A number of real-world examples exist for GPT-3, the successor to GPT-2, including an instance where GitHub's Copilot recalled a specific developer's username and coding priorities.
Beyond GPT-based offerings, other AI-based services have raised questions as to whether they pose a risk. Automated transcription service Otter.ai, for instance, transcribes audio files into text, automatically identifying speakers and allowing important words to be tagged and phrases to be highlighted. The company's housing of that information in its cloud has caused concern for journalists.
The company says it is committed to keeping user data private and has put strong compliance controls in place, according to Julie Wu, senior compliance manager at Otter.ai.
"Otter has completed its SOC 2 Type 2 audit and reports, and we employ technical and organizational measures to safeguard personal data," she tells Dark Reading. "Speaker identification is account bound. Adding a speaker's name will train Otter to recognize the speaker for future conversations you record or import in your account, but will not allow speakers to be identified across accounts."
The popularity of ChatGPT has caught many companies by surprise. More than 300 developers, according to the last published numbers from a year ago, are using GPT-3 to power their applications. For example, social media firm Snap and shopping platforms Instacart and Shopify are all using ChatGPT through the API to add chat functionality to their mobile applications.
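None of these companies publish their integration code, but wiring ChatGPT into an application through OpenAI's Python client typically takes only a few lines, which is part of why adoption has been so fast. The model name, system prompt, and ask_support_bot helper below are illustrative assumptions, not how Snap, Instacart, or Shopify actually use the API.

# A minimal sketch of calling the ChatGPT API from an application via
# OpenAI's Python client; model, prompt, and helper name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_support_bot(user_message: str) -> str:
    """Send one user turn to the chat API and return the assistant's reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful shopping assistant."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(ask_support_bot("What goes well with fresh basil?"))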
Based on conversations with his company's clients, Cyberhaven's Ting expects the move to generative AI apps will only accelerate, with the tools used for everything from generating memos and presentations to triaging security incidents and interacting with patients.
His clients, he says, have told him: "Look, right now, as a stopgap measure, I'm just blocking this app, but my board has already told me we cannot do that. Because these tools will help our users be more productive — there is a competitive advantage — and if my competitors are using these generative AI apps and I'm not allowing my users to use them, that puts us at a disadvantage."
The good news is that education could have a big impact on whether data leaks from a specific company, because a small number of employees are responsible for most of the risky requests: Fewer than 1% of workers account for 80% of the incidents of sending sensitive data to ChatGPT, says Cyberhaven's Ting.
"You know, there are two forms of education: There's the classroom education, like when you are onboarding an employee, and then there's the in-context education, when someone is actually trying to paste data," he says. "I think both are important, but I think the latter is way more effective from what we've seen."
In addition, OpenAI and other companies are working to limit the LLMs' access to personal information and sensitive data: Asking for personal details or sensitive corporate information currently leads to canned statements from ChatGPT declining to comply.
For example, when asked, "What is Apple's strategy for 2023?" ChatGPT responded: "As an AI language model, I do not have access to Apple's confidential information or future plans. Apple is a highly secretive company, and they typically do not disclose their strategies or future plans to the public until they are ready to release them."
