Dangerous AI Workaround: Skeleton Key Unlocks Malicious Content

Microsoft, OpenAI, Google, and Meta GenAI models could be convinced to ditch their guardrails, opening the door to chatbots giving unfettered answers on building bombs, creating malware, and much more.

A new type of direct
prompt injection attack
dubbed Skeleton Key could allow users to bypass the ethical and safety guardrails built into generative AI models like ChatGPT, Microsoft is warning. It works by providing context around normally forbidden chatbot requests, allowing users to access offensive, harmful, or illegal content.
For instance, if a user asked for instructions on how to make a dangerous wiper malware that could bring down power plants, most commercial chatbots would first refuse. But, after revising the prompt to note that the request is for a safe education context with advanced researchers trained on ethics and safety and to provide the requested information with a warning disclaimer, then its very likely that the AI would then provide the uncensored content.
In other words, Microsoft found it was possible to convince most top AIs that a malicious request is for perfectly legal, if not noble, causes — just by telling them that the information is for research purposes.
Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other, explained Mark Russinovich, CTO for Microsoft Azure, in a
post today
on the tactic. Because of its full bypass abilities, we have named this jailbreak technique Skeleton Key.
He added, Further, the models output appears to be completely unfiltered and reveals the extent of a models knowledge or ability to produce the requested content.
The technique affects multiple GenAI models that Microsoft researchers tested, including Microsoft Azure AI-managed models, and those from Meta, Google Gemini, Open AI, Mistral, Anthropic, and Cohere.
All the affected models complied fully and without censorship for [multiple forbidden] tasks, Russinovich noted.
The computing giant fixed the problem in Azure by introducing new prompt shields to detect and block the tactic, and making a few software updates to the large language model (LLM) that powers Azure AI. It also disclosed the issue to the other vendors affected.
Admins still need to update their models to implement any fixes that those vendors may have rolled out. And those who are building their own AI models can also use the following mitigations, according to Microsoft:
Input filtering to identify any requests that contain harmful or malicious intent, regardless of any disclaimers that accompany them.
An additional guardrail that specifies that any attempts to undermine safety guardrail instructions should be prevented.
And output filtering that identifies and prevents responses that breach safety criteria.

Last News
▸ ArcSight prepares for future at user conference post HP acquisition. ◂ Discovered: 07/01/2025 Category: security	▸ Samsung Epic 4G: First To Use Media Hub ◂ Discovered: 07/01/2025 Category: security	▸ Many third-party software fails security tests ◂ Discovered: 07/01/2025 Category: security

**Cyber Security Categories**
Google Dorks Database	Exploits Vulnerability	Exploit Shellcodes

CVE List

Tools/Apps

News/Aarticles

Phishing Database

Deepfake Detection

Trends/Statistics & Live Infos

Tags:
Dangerous AI Workaround: Skeleton Key Unlocks Malicious Content