Flawed AI Tools Create Worries for Private LLMs, Chatbots

Published: 23/11/2024   Category: security




Companies are looking to large language models to help their employees glean information from unstructured data, but vulnerabilities could lead to disinformation and, potentially, data leaks.



Companies that use private instances of large language models (LLMs) to make their business data searchable through a conversational interface face risks of data poisoning and potential data leakage if they do not properly implement security controls to harden the platforms, experts say.
Case in point: This week, Synopsys disclosed a cross-site request forgery (CSRF) flaw that affects applications based on the EmbedAI component created by AI provider SamurAI; it could allow attackers to fool users into uploading poisoned data into their language model, the application-security firm warned. The attack exploits the open source component's lack of a safe cross-origin policy and its failure to implement session management, and it could allow an attacker to affect even a private LLM instance or chatbot, says Mohammed Alshehri, the Synopsys security researcher who found the vulnerability.
The risks are similar to those facing developers of software applications, but with an AI twist, he says.
"There are products where they take an existing AI implementation [and open source components] and merge them together to create something new," he says. "What we want to highlight here is that even after the integration, companies should test to ensure that the same controls we have for Web applications are also implemented on the APIs for their AI applications."
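The controls Alshehri describes are standard Web-application hygiene applied to an AI front end. The Python sketch below (using Flask) illustrates the two safeguards the advisory says were missing: a per-session CSRF token checked on state-changing requests and a session cookie locked to same-site use. The framework choice, endpoint names, and upload handler are illustrative assumptions, not EmbedAI's actual code.

```python
# Minimal sketch (assumed, not EmbedAI's real code) of CSRF and session
# hardening for a document-ingestion API that feeds a private LLM.
import secrets
from flask import Flask, request, session, abort

app = Flask(__name__)
app.secret_key = secrets.token_hex(32)

# Tie the session cookie to same-site requests so a third-party page
# cannot ride an authenticated session (the watering-hole scenario).
app.config.update(
    SESSION_COOKIE_SAMESITE="Strict",
    SESSION_COOKIE_SECURE=True,
    SESSION_COOKIE_HTTPONLY=True,
)

@app.before_request
def check_csrf_token():
    # State-changing requests must carry a token issued to this session;
    # a forged cross-site form post will not have it.
    if request.method in ("POST", "PUT", "DELETE"):
        sent = request.headers.get("X-CSRF-Token", "")
        expected = session.get("csrf_token", "")
        if not expected or not secrets.compare_digest(sent, expected):
            abort(403)

@app.get("/csrf-token")
def issue_token():
    # The front end fetches a fresh token after login and echoes it
    # back on every upload request.
    session["csrf_token"] = secrets.token_urlsafe(32)
    return {"csrf_token": session["csrf_token"]}

@app.post("/upload")
def upload_training_document():
    # Hypothetical ingestion endpoint: only reached if the CSRF check passed.
    doc = request.files.get("document")
    if doc is None:
        abort(400)
    return {"status": "accepted", "filename": doc.filename}
```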
The research underscores that the rush to integrate AI into business processes does pose risks, especially for companies that are giving LLMs and other generative-AI applications access to large repositories of data. Overall, only 4% of US companies have adopted AI as part of their business operations, but some industries have higher adoption rates, with the information sector at 14% and the professional services sector at 9%, according to a survey by the US Census Bureau conducted in October 2023.
The risks posed by the adoption of next-gen artificial intelligence and machine learning (AI/ML) are not necessarily due to the models, which tend to have smaller attack surfaces, but the software components and tools for developing AI applications and interfaces, says Dan McInerney, lead AI threat researcher with Protect AI, an AI application security firm.
"There's not a lot of magical incantations that you can send to an LLM and have it spit out passwords and sensitive info," he says. "But there's a lot of vulnerabilities in the servers that are used to host LLMs. The [LLM] is really not where you're going to get hacked; you're going to get hacked from all the tools you use around the LLM."
Such vulnerabilities have already been actively exploited. In March, Oligo Security reported active attacks against Ray, a popular AI framework, using a previously disclosed security issue, one of five vulnerabilities that had been discovered by research groups at Protect AI and Bishop Fox, along with independent researcher Sierra Haex. Anyscale, the company behind Ray, fixed four of the vulnerabilities but considered the fifth to be a misconfiguration issue.
Yet attackers managed to find hundreds of deployments that inadvisedly exposed a Ray server to the Internet and compromised the systems, according to an analysis published by Oligo Security in March.
"This flaw has been under active exploitation for the last seven months, affecting sectors like education, cryptocurrency, biopharma and more," the company stated. "All organizations using Ray are advised to review their environments to ensure they are not exposed and to analyze any suspicious activity."
In its own March advisory, Anyscale acknowledged the attacks and released a tool to detect insecurely configured systems.
While the vulnerability in the Ray framework exposed public-facing servers to attack, even private AI-powered LLMs and chatbots could face exploitation. In May, AI-security firm Protect AI released the latest tranche of vulnerabilities discovered by its bug bounty community, Huntr, encompassing 32 issues ranging from critical remote exploits to low-severity race conditions. Some attacks may require access to the API, but others could be carried out through malicious documents and other vectors.
In its own research, Synopsys' Alshehri discovered the CSRF issue in EmbedAI, which gives an attacker the ability to poison an LLM through a watering-hole attack.
"Exploitation of this vulnerability could affect the immediate functioning of the model and can have long-lasting effects on its credibility and the security of the systems that rely on it," Synopsys stated in its advisory. "This can manifest in various ways, including the spread of misinformation, introduction of biases, degradation of performance, and potential for denial-of-service attacks."
By using a private instance of a chatbot service or internally hosting an LLM, many companies believe they have minimized the risk of exploitation, says Tyler Young, CISO at BigID, a data management firm.
"Most enterprises are leaning toward leveraging private LLM chatbots on top of those LLM algorithms, simply because it offers that comfort, just like hosting something in your own cloud, where you have control over who can access the data," he says. "But there are risks ... because the second you have an inherent trust, you start pumping more and more data in there, and you have overexposure. All it takes is one of those accounts to get compromised."
Companies need to assume that the current crop of AI systems and services have had only limited security design and review, because the platforms are often based on open source components maintained by small teams with limited oversight, says Synopsys' Alshehri. In fact, in February, the Hugging Face AI open source model repository was found to be riddled with malicious code-execution models.
"The same way we do regular testing and those code reviews with black-box and white-box testing, we need to do that ... when it comes to adopting these new technologies," he says.
Companies that are implementing AI systems based on internal data should segment the data, and the resulting LLM instances, so that employees can access only those LLM services built on data they are already authorized to see. Each collection of users with a specific privilege level will require a separate LLM trained on their accessible data.
"You cannot just give the LLM access to a giant dump of data and say, 'OK, everyone has access to this,' because that's the equivalent of giving everyone access to a database with all the data inside of it, right?" says Protect AI's McInerney. "So you got to clean the data."
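A minimal illustration of that segmentation advice might look like the following Python sketch, which keeps one retrieval/LLM instance per access group and refuses queries from users outside the group. The class, group names, and matching logic are hypothetical stand-ins for a real vector store and model, not any specific product's design.

```python
# Sketch: one LLM/retrieval instance per privilege group, so a query can
# only ever draw on data its user is already cleared to see.
from dataclasses import dataclass, field

@dataclass
class GroupScopedLLM:
    group: str
    documents: list[str] = field(default_factory=list)

    def ingest(self, text: str) -> None:
        # In a real deployment this would index into the group's private
        # vector store; here we just keep the raw text.
        self.documents.append(text)

    def query(self, question: str) -> str:
        # Stand-in for retrieval + generation restricted to this group's data.
        words = question.lower().split()
        hits = [d for d in self.documents if any(w in d.lower() for w in words)]
        return f"[{self.group}] answering from {len(hits)} matching document(s)"

# One instance per privilege level, as recommended above.
instances = {
    "finance": GroupScopedLLM("finance"),
    "engineering": GroupScopedLLM("engineering"),
}
user_groups = {"alice": "finance", "bob": "engineering"}

def ask(user: str, question: str) -> str:
    group = user_groups.get(user)
    if group is None:
        raise PermissionError(f"{user} has no LLM access")
    return instances[group].query(question)

instances["finance"].ingest("Q3 revenue forecast and budget detail")
print(ask("alice", "What is the Q3 budget?"))  # finance data only
print(ask("bob", "What is the Q3 budget?"))    # engineering instance has no finance docs
```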
Finally, companies need to minimize the components they use to develop their AI tools, regularly update those software assets, and implement controls to make exploitation more difficult, he says.
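One lightweight way to act on that advice is to inventory the Python environment hosting the AI tooling and flag anything the team has not explicitly reviewed, as in the sketch below. The allowlist contents are examples only, and a dedicated scanner such as pip-audit would still be needed for known-CVE checks.

```python
# Sketch: flag installed packages that fall outside a reviewed allowlist,
# as a first step toward minimizing and tracking the AI tool stack.
from importlib import metadata

ALLOWED = {"flask", "torch", "transformers", "ray"}  # illustrative allowlist

def audit_environment() -> list[str]:
    findings = []
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name and name not in ALLOWED:
            findings.append(f"unreviewed dependency: {name}=={dist.version}")
    return findings

if __name__ == "__main__":
    for finding in audit_environment():
        print(finding)
```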
