Google Categorizes 6 Real-World AI Attacks to Prepare for Now

  /     /     /  
Publicated : 23/11/2024   Category : security


Google Categorizes 6 Real-World AI Attacks to Prepare for Now


The models powering generative AI like ChatGPT are open to several common attack vectors that organizations need to understand and get ready for, according to Googles dedicated AI Red Team.



Google researchers have identified six specific attacks that can occur against real-world AI systems, finding that these common attack vectors demonstrate a unique complexity, That will require a combination of adversarial simulations and the help of AI subject-matter expertise to construct a solid defense, they noted.
The company revealed in
a report published
this week that its dedicated AI red team has already uncovered various threats to the fast-growing technology, mainly based on how attackers can manipulate the large language models (LLMs) that drive
generative AI products like ChatGPT
, Google Bard, and more.
The attacks largely result in the technology producing unexpected or even malice-driven results, which can lead to outcomes as benign as the average persons photos showing up on a celebrity photo website, to more serious consequences such as security-evasive phishing attacks or data theft.
Googles findings come on the heels of its release of
the Secure AI Framework (SAIF)
, which the company said is aimed at getting out in front of the
AI security issue
before its too late, as the technology already is experiencing rapid adoption, creating new security threats in its wake.
The first group of common attacks that Google IDd are
prompt attacks
, which involve prompt engineering. Thats a term that refers to crafting effective prompts that instruct LLMs to perform desired tasks. This influence on the model,
when malicious
, can in turn maliciously influence the output of an LLM-based app in ways that are not intended, the researchers said.
An example of this would be if someone added a paragraph to an AI-based phishing attack that is invisible to the end user but could direct the AI to classify a phishing email as legitimate. This might allow it to get past email anti-phishing protections and increase the chances that a phishing attack is successful.
Another type of attack that the team uncovered is one called
training-data extraction
, which aims to reconstruct verbatim training examples that an LLM uses — for example, the contents of the Internet.
In this way, attackers can extract secrets such as verbatim
personally identifiable information (PII)
or passwords from the data. Attackers are incentivized to target personalized models, or models that were trained on data containing PII, to gather sensitive information, the researchers wrote.
A third potential AI attack is
backdooring the model
, whereby an attacker may attempt to covertly change the behavior of a model to produce incorrect outputs with a specific trigger word or feature, also known as a backdoor, the researchers wrote. In this type of an attack, a threat actor can hide code either in the model or in its output to conduct malicious activity.
A fourth attack type, called
adversarial examples
, are inputs that an attacker provides to a model to result in a deterministic, but highly unexpected output, the researchers wrote. An example would be that the model could show an image that clearly shows one thing to the human eye but which the model recognizes as something else entirely. This type of attack could be fairly benign — in a case where someone could train the model to recognize his or her own photo as one deemed worthy of inclusion on a celebrity website — or critical, depending on the technique and intent.
An attacker also could use a
data-poisoning attack
to manipulate the training data of the model to influence the models output according to the attacker’s preference — something that also could threaten the security of the
software supply chain
if developers are using AI to help them develop software. The impact of this attack could be similar to backdooring the model, the researchers noted.
The final type of attack identified by Googles dedicated AI red team is an
exfiltration attack
, in which attackers can copy the file representation of a model to
steal sensitive intellectual property
stored in it. They can then that information to generate their own models that can be used to give attackers unique capabilities in custom-crafted attacks.
Googles initial AI red-team exercise taught the researchers some valuable lessons that other enterprises also can employ to defend against attacks on AI systems, according to the Internet giant. The first one is that while red-team activity is a good start, organizations also should team up with AI experts to conduct realistic end-to-end adversarial simulations for maximum defense.
Indeed,
red-team exercises
, in which an organizations enlists a team of ethical hackers to try to infiltrate its own systems to identify potential vulnerabilities, are becoming a popular trend to help enterprises
bolster their overall security postures
.
We believe that red teaming will play a decisive role in preparing every organization for attacks on AI systems and look forward to working together to help everyone utilize AI in a secure way, the researchers wrote in the report.
However, there was some good news for organizations in another lesson the team learned: Traditional security controls can effectively and significantly mitigate
risk to AI systems
.
This is true in particular for protecting the integrity of AI models throughout their lifecycle to prevent data poisoning and backdoor attacks, the researchers wrote.
As with all the other assets in a traditional enterprise system, organizations also should ensure the systems and models are properly locked down to defend against AI attacks. Further, organizations can use a similar
approach to detection
for attacks on AI systems as they do to sniff out traditional attacks, the researchers noted.
They wrote: Traditional security philosophies, such as validating and sanitizing both input and output to the models still apply in the AI space.

Last News

▸ New threat discovered: Mobile phone ownership compromised. ◂
Discovered: 23/12/2024
Category: security

▸ Some DLP Products Vulnerable to Security Holes ◂
Discovered: 23/12/2024
Category: security

▸ Scan suggests Heartbleed patches may not have been successful. ◂
Discovered: 23/12/2024
Category: security


Cyber Security Categories
Google Dorks Database
Exploits Vulnerability
Exploit Shellcodes

CVE List
Tools/Apps
News/Aarticles

Phishing Database
Deepfake Detection
Trends/Statistics & Live Infos



Tags:
Google Categorizes 6 Real-World AI Attacks to Prepare for Now