Pen Testers Need to Hack AI, but Also Question Its Existence

Published: 23/11/2024   Category: security


Learning how to break the latest AI models is important, but security researchers should also question whether there are enough guardrails to prevent the technology's misuse.



Samsung has banned some uses of ChatGPT, Ford Motor and Volkswagen shuttered their self-driving car firm, and a letter calling for a pause in training more powerful AI systems has garnered more than 25,000 signatures.
Overreactions? No, says Davi Ottenheimer, the vice president of trust and digital ethics at Inrupt, a startup creating digital identity and security solutions. A pause is needed, he argues, to develop better approaches to testing not just the security but also the safety of machine-learning and artificial-intelligence models, including ChatGPT, self-driving vehicles, and autonomous drones.
A steady stream of security researchers and technologists has already found ways to circumvent protections placed on AI systems, but society needs to have broader discussions about how to test and improve safety, says Ottenheimer, who will give a presentation on the topic at the RSA Conference in San Francisco next week.
"Especially from the context of a pentest, I'm supposed to go in and basically assess [an AI system] for safety, but what's missing is that we're not making a decision about whether it is safe, whether the application is acceptable," he says. "A server's security, for example, does not speak to whether the system is safe if you are running the server in a way that's unacceptable ... and we need to get to that level with AI."
With the introduction of ChatGPT in November, interest in artificial intelligence and machine learning — already surging due to applications in the data science field — took off. The eerie capability of the large language model (LLM) to seemingly understand human language and to synthesize coherent responses has led to a surge in proposed applications based on the technology and other forms of AI. ChatGPT has already been used to triage security incidents, and a more advanced LLM forms the core of Microsoft's Security Copilot.
Yet the generative pre-trained transformer (GPT) is just one form of AI model, and all of them can have significant problems with bias, false positives, and other issues.
These shortcomings, and a general lack of explainability in AI models, mean that any model can be attacked in ways its creators may not have imagined, Inrupt's Ottenheimer will say in his RSA Conference presentation, "Pentesting AI: How to Hunt a Robot." If AI models are quickly adopted without adequate study, they may make their way into critical applications, where they could be attacked or fail spectacularly, he says.
"It's actually super easy to make them fail," Ottenheimer says. "Most people are looking at it as, 'Can I fool it in this one area?' but that's not the discussion you should be having, because — oh my god — you're using this technology in a totally inappropriate way."
Recent research demonstrates how simple attacking AI can be. Asking ChatGPT to mimic specific people, also known as assigning it a persona, can cause the AI model to break its guardrails, according to a team of researchers from the Allen Institute for AI, the Georgia Institute of Technology, and Princeton University. The researchers had ChatGPT assume a variety of personas and found that even a generic persona — such as "a bad person" — can lead the large language model to use toxic language, the team stated in a paper published on April 11.
With a plethora of products already shipping with ChatGPT inside, the researchers warn that persona assignment can unexpectedly result in harmful behavior.
"We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems," the researchers stated in their paper.
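The mechanism the researchers describe is straightforward: the persona is handed to the model as an instruction before the user's prompt. The following is a minimal sketch of that setup, assuming the current OpenAI Python client; the model name, persona string, and question are illustrative placeholders, and the toxicity scoring the researchers applied to the model's outputs is omitted.

# Minimal sketch of persona assignment via a system prompt, the mechanism
# studied in the persona-jailbreak research described above. The model name
# and persona text are assumptions for illustration, not values from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_with_persona(persona: str, question: str) -> str:
    """Ask a question while the model is instructed to answer as `persona`."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the paper tested ChatGPT
        messages=[
            # The persona rides in on the system role; the researchers found
            # that even generic personas can shift the model's behavior.
            {"role": "system", "content": f"Speak exactly as {persona} would."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # A benign persona for illustration; the study compared many personas
    # and measured the toxicity of the resulting outputs.
    print(ask_with_persona(
        "a grumpy retired system administrator",
        "What do you think of mandatory password rotation?",
    ))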
Ottenheimer breaks down AI tests into six categories based on the traditional CIA triad: confidentiality, integrity, and availability. False positives, for example, can impose significant costs on society, such as emergency responders who are overtaxed because skiers' Apple Watches are dialing 9-1-1 during jarring runs down the slopes. The academic research on using personas to jailbreak ChatGPT's content protections is similar to other research, which created a persona, DAN (Do Anything Now), that allowed users to bypass safeguards.
Companies and researchers need to find ways to do a hard reset of such systems to purge any toxic inputs, while at the same time teaching the AI to avoid such behavior in the future.
"You actually have to reset it, such that the harms don't happen again, or you have to reset it in a way that the harms can be undone," Ottenheimer says.
Finally, privacy is a significant threat as well, because large language models are trained on vast data sets, typically copied from the Internet without the permission of the data's publishers. Italy has given OpenAI until the end of April to find ways to protect people's data and to allow correction or deletion. And the efforts may grow, as the European Data Protection Board (EDPB) has launched a task force dedicated to studying the issue and fostering cooperation.
