Dangerous AI Workaround: Skeleton Key Unlocks Malicious Content

  /     /     /  
Publicated : 23/11/2024   Category : security


Dangerous AI Workaround: Skeleton Key Unlocks Malicious Content


Microsoft, OpenAI, Google, and Meta GenAI models could be convinced to ditch their guardrails, opening the door to chatbots giving unfettered answers on building bombs, creating malware, and much more.



A new type of direct
prompt injection attack
dubbed Skeleton Key could allow users to bypass the ethical and safety guardrails built into generative AI models like ChatGPT, Microsoft is warning. It works by providing context around normally forbidden chatbot requests, allowing users to access offensive, harmful, or illegal content.
For instance, if a user asked for instructions on how to make a dangerous wiper malware that could bring down power plants, most commercial chatbots would first refuse. But, after revising the prompt to note that the request is for a safe education context with advanced researchers trained on ethics and safety and to provide the requested information with a warning disclaimer, then its very likely that the AI would then provide the uncensored content.
In other words, Microsoft found it was possible to convince most top AIs that a malicious request is for perfectly legal, if not noble, causes — just by telling them that the information is for research purposes.
Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other, explained Mark Russinovich, CTO for Microsoft Azure, in a
post today
on the tactic. Because of its full bypass abilities, we have named this jailbreak technique Skeleton Key.
He added, Further, the models output appears to be completely unfiltered and reveals the extent of a models knowledge or ability to produce the requested content.
The technique affects multiple GenAI models that Microsoft researchers tested, including Microsoft Azure AI-managed models, and those from Meta, Google Gemini, Open AI, Mistral, Anthropic, and Cohere.
All the affected models complied fully and without censorship for [multiple forbidden] tasks, Russinovich noted.
The computing giant fixed the problem in Azure by introducing new prompt shields to detect and block the tactic, and making a few software updates to the large language model (LLM) that powers Azure AI. It also disclosed the issue to the other vendors affected.
Admins still need to update their models to implement any fixes that those vendors may have rolled out. And those who are building their own AI models can also use the following mitigations, according to Microsoft:
Input filtering to identify any requests that contain harmful or malicious intent, regardless of any disclaimers that accompany them.
An additional guardrail that specifies that any attempts to undermine safety guardrail instructions should be prevented.
And output filtering that identifies and prevents responses that breach safety criteria.

Last News

▸ Senate wants changes to cybercrime law. ◂
Discovered: 23/12/2024
Category: security

▸ Car Sector Speeds Up In Security. ◂
Discovered: 23/12/2024
Category: security

▸ Making use of a homemade Android army ◂
Discovered: 23/12/2024
Category: security


Cyber Security Categories
Google Dorks Database
Exploits Vulnerability
Exploit Shellcodes

CVE List
Tools/Apps
News/Aarticles

Phishing Database
Deepfake Detection
Trends/Statistics & Live Infos



Tags:
Dangerous AI Workaround: Skeleton Key Unlocks Malicious Content