ML Model Repositories: The Next Big Supply Chain Attack Target

Published: 23/11/2024   Category: security




Machine-learning model platforms like Hugging Face are susceptible to the same kinds of attacks that threat actors have executed successfully for years via npm, PyPI, and other open source repositories.



Repositories for machine learning models like Hugging Face give threat actors the same opportunities to sneak malicious code into development environments as public open source repositories like npm and PyPI.

At an upcoming Black Hat Asia presentation this April, entitled "Confused Learning: Supply Chain Attacks through Machine Learning Models," two researchers from Dropbox will demonstrate multiple techniques that threat actors can use to distribute malware via ML models on Hugging Face. The techniques are similar to ones that attackers have successfully used for years to upload malware to open source code repositories, and they highlight the need for organizations to implement controls for thoroughly inspecting ML models before use.
"Machine learning pipelines are a brand-new supply chain attack vector, and companies need to look at what analysis and sandboxing they are doing to protect themselves," says Adrian Wood, security engineer at Dropbox. "ML models are not pure functions. They are full-blown malware vectors ripe for exploit."
Repositories such as Hugging Face are an attractive target because ML models give threat actors access to sensitive information and environments. They are also relatively new, says Mary Walker, a security engineer at Dropbox and co-author of the Black Hat Asia paper. "Hugging Face is quite new in a way," Walker says. "If you look at their trending models, often you'll see a model has suddenly become popular that some random person put there." It's not always the trusted models that people use, she says.
Hugging Face is a repository for ML tools, data sets, and models that developers can download and integrate into their own projects. Like many public code repositories, it allows developers to create and upload their own ML models, or to look for models that match their requirements. Hugging Face's security controls include scanning for malware, vulnerabilities, secrets, and sensitive information across the repository. It also offers a format called Safetensors, which allows developers to more securely store and upload large tensors, the core data structures in machine learning models.
Even so, the repository, like other ML model repositories, gives attackers openings to upload malicious models in the hope of getting developers to download and use them in their projects.
Wood, for instance, found that it was trivial for an attacker to register a namespace within the service that appeared to belong to a brand-name organization. Little then prevents an attacker from using that namespace to trick actual users from that organization into uploading ML models to it, models the attacker could poison at will.
Wood says that, in fact, when he registered a namespace that appeared to belong to a well-known brand, he did not even have to try to get users from the organization to upload models. Instead, software engineers and ML engineers from the organization contacted him directly with requests to join the namespace so they could upload ML models to it, models Wood could then have backdoored at will.
In addition to such namesquatting attacks, threat actors have other avenues for sneaking malware into ML models on repositories such as Hugging Face, Wood says, such as using models with typosquatted names. Another example is a model confusion attack, in which a threat actor discovers the names of private dependencies within a project and then creates public malicious packages with exactly the same names. In the past, such dependency confusion attacks on open source repositories such as npm and PyPI have resulted in internal projects defaulting to the malicious dependencies of the same name.
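On the PyPI side, the confusion arises when an installer is allowed to resolve a package name against both a private index and public PyPI. A common mitigation, shown here as an illustrative sketch (the internal index URL is a placeholder, not from the article), is to pin pip to a single internal index so public lookups never happen:

```ini
; /etc/pip.conf (or ~/.config/pip/pip.conf) -- illustrative values
[global]
; Resolve everything through one internal index. Avoid pairing index-url
; with extra-index-url pointing at public PyPI: pip treats both indexes
; as equal and may pick a higher-versioned malicious public package.
index-url = https://pypi.internal.example.com/simple/
```

The internal index can then proxy vetted public packages, so the name of a private dependency never falls through to the public repository.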
Threat actors have already begun eyeing ML repositories as a potential supply chain attack vector. Earlier this year, for instance, researchers at JFrog discovered a malicious ML model on Hugging Face that, upon loading, executed code that gave attackers full control of the victim's machine. In that instance, the model used the pickle file format, which JFrog described as a common format for serializing Python objects.

"Code execution can happen when loading certain types of ML models from an untrusted source," JFrog noted. "For example, some models use the pickle format, which is a common format for serializing Python objects. However, pickle files can also contain arbitrary code that is executed when the file is loaded."
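JFrog's point can be seen in a few lines of standard-library Python: pickle's `__reduce__` hook lets a serialized object name an arbitrary callable to invoke at load time. This is a minimal, benign sketch (the `Payload` class is illustrative); a real attacker would call `os.system` or similar instead of a harmless `print`.

```python
import pickle

class Payload:
    """Illustrative malicious object: runs code when unpickled."""
    def __reduce__(self):
        # On load, pickle will call eval(...) with this argument --
        # i.e., deserialization becomes code execution.
        return (eval, ("print('code ran during pickle.load')",))

blob = pickle.dumps(Payload())   # this is what gets shipped in a model file
pickle.loads(blob)               # "loading the model" executes the payload
```

This is why loading pickle-based model files from untrusted sources is dangerous, and why formats like Safetensors, which store only tensor data, exist.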
Wood's demonstration involves injecting malware into models built with the Keras library, using TensorFlow as the backend engine. Wood found that Keras models offer attackers a way to execute arbitrary code in the background while the model performs exactly as intended. Others have used different methods. In 2020, researchers from HiddenLayer, for instance, used something similar to steganography to embed a ransomware executable in a model, and then loaded it using pickle.
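The article does not detail HiddenLayer's exact method, but the general steganographic idea can be sketched with NumPy: stash payload bytes in the least-significant byte of each float32 weight, which leaves the values the model actually computes with essentially unchanged. The `embed`/`extract` helpers below are illustrative, and the sketch assumes a little-endian machine.

```python
import numpy as np

def embed(weights: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload in the low byte of each float32 weight (little-endian)."""
    raw = weights.astype(np.float32).view(np.uint8).copy()
    # Byte 0 of every 4-byte float is the mantissa's least-significant byte.
    raw[0 : len(payload) * 4 : 4] = np.frombuffer(payload, dtype=np.uint8)
    return raw.view(np.float32)

def extract(weights: np.ndarray, size: int) -> bytes:
    """Recover `size` hidden bytes from the weights."""
    raw = np.ascontiguousarray(weights, dtype=np.float32).view(np.uint8)
    return raw[0 : size * 4 : 4].tobytes()

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
secret = b"attacker payload"
stego = embed(weights, secret)

assert extract(stego, len(secret)) == secret
assert np.allclose(weights, stego, atol=1e-3)  # weights barely perturbed
```

Because the perturbation sits below the noise floor of typical weights, the model behaves normally, which is what makes such payloads hard to spot by inspecting model accuracy alone.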
