ML Model Repositories: The Next Big Supply Chain Attack Target

Published: 23/11/2024   Category: security




Machine-learning model platforms like Hugging Face are susceptible to the same kinds of attacks that threat actors have executed successfully for years via npm, PyPI, and other open source repositories.



Repositories for machine learning models like Hugging Face give threat actors the same opportunities to sneak malicious code into development environments as public open source repositories like npm and PyPI.

At an upcoming Black Hat Asia presentation this April, entitled "Confused Learning: Supply Chain Attacks through Machine Learning Models," two researchers from Dropbox will demonstrate multiple techniques that threat actors can use to distribute malware via ML models on Hugging Face. The techniques are similar to ones that attackers have successfully used for years to upload malware to open source code repositories, and they highlight the need for organizations to implement controls for thoroughly inspecting ML models before use.
"Machine learning pipelines are a brand-new supply chain attack vector, and companies need to look at what analysis and sandboxing they are doing to protect themselves," says Adrian Wood, security engineer at Dropbox. "ML models are not pure functions. They are full-blown malware vectors ripe for exploit."
Repositories such as Hugging Face are an attractive target because ML models give threat actors access to sensitive information and environments. They are also relatively new, says Mary Walker, a security engineer at Dropbox and co-author of the Black Hat Asia paper. "Hugging Face is quite new in a way," Walker says. "If you look at their trending models, often you'll see a model has suddenly become popular that some random person put there." It's not always the trusted models that people use, she says.
Hugging Face is a repository for ML tools, data sets, and models that developers can download and integrate into their own projects. Like many public code repositories, it allows developers to create and upload their own ML models, or to look for models that match their requirements. Hugging Face's security controls include scanning for malware, vulnerabilities, secrets, and sensitive information across the repository. It also offers a format called Safetensors, which allows developers to more securely store and upload large tensors, the core data structures in machine learning models.
Even so, the repository, like other ML model repositories, gives attackers openings to upload malicious models in the hope of getting developers to download and use them in their projects.
Wood, for instance, found that it was trivial for an attacker to register a namespace within the service that appeared to belong to a brand-name organization. Little then prevents an attacker from using that namespace to trick actual users from that organization into uploading ML models to it, models the attacker could poison at will.
Wood says that, in fact, when he registered a namespace that appeared to belong to a well-known brand, he did not even have to try to get users from the organization to upload models. Instead, software engineers and ML engineers from the organization contacted him directly with requests to join the namespace so they could upload ML models to it, models Wood could then have backdoored at will.
In addition to such namesquatting attacks, threat actors have other avenues for sneaking malware into ML models on repositories such as Hugging Face, Wood says, such as using models with typosquatted names. Another example is a model confusion attack, in which a threat actor discovers the names of private dependencies within a project and then creates public malicious packages with exactly the same names. In the past, such dependency confusion attacks on open source repositories such as npm and PyPI have resulted in internal projects defaulting to the malicious dependencies of the same name.
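On the PyPI side, the confusion arises when an installer is allowed to resolve a package name against both a private index and public PyPI. A common mitigation, shown here as an illustrative sketch (the internal index URL is a placeholder, not from the article), is to pin pip to a single internal index so public lookups never happen:

```ini
; /etc/pip.conf (or ~/.config/pip/pip.conf) -- illustrative values
[global]
; Resolve everything through one internal index. Avoid pairing index-url
; with extra-index-url pointing at public PyPI: pip treats both indexes
; as equal and may pick a higher-versioned malicious public package.
index-url = https://pypi.internal.example.com/simple/
```

The internal index can then proxy vetted public packages, so the name of a private dependency never falls through to the public repository.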
Threat actors have already begun eyeing ML repositories as a potential supply chain attack vector. Earlier this year, for instance, researchers at JFrog discovered a malicious ML model on Hugging Face that, upon loading, executed code that gave attackers full control of the victim's machine. In that instance, the model used the pickle file format, which JFrog described as a common format for serializing Python objects.

"Code execution can happen when loading certain types of ML models from an untrusted source," JFrog noted. "For example, some models use the pickle format, which is a common format for serializing Python objects. However, pickle files can also contain arbitrary code that is executed when the file is loaded."
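JFrog's point can be seen in a few lines of standard-library Python: pickle's `__reduce__` hook lets a serialized object name an arbitrary callable to invoke at load time. This is a minimal, benign sketch (the `Payload` class is illustrative); a real attacker would call `os.system` or similar instead of a harmless `print`.

```python
import pickle

class Payload:
    """Illustrative malicious object: runs code when unpickled."""
    def __reduce__(self):
        # On load, pickle will call eval(...) with this argument --
        # i.e., deserialization becomes code execution.
        return (eval, ("print('code ran during pickle.load')",))

blob = pickle.dumps(Payload())   # this is what gets shipped in a model file
pickle.loads(blob)               # "loading the model" executes the payload
```

This is why loading pickle-based model files from untrusted sources is dangerous, and why formats like Safetensors, which store only tensor data, exist.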
Wood's demonstration involves injecting malware into models built with the Keras library, using TensorFlow as the backend engine. Wood found that Keras models offer attackers a way to execute arbitrary code in the background while the model performs exactly as intended. Others have used different methods. In 2020, researchers from HiddenLayer, for instance, used something similar to steganography to embed a ransomware executable in a model, and then loaded it using pickle.
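The article does not detail HiddenLayer's exact method, but the general steganographic idea can be sketched with NumPy: stash payload bytes in the least-significant byte of each float32 weight, which leaves the values the model actually computes with essentially unchanged. The `embed`/`extract` helpers below are illustrative, and the sketch assumes a little-endian machine.

```python
import numpy as np

def embed(weights: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload in the low byte of each float32 weight (little-endian)."""
    raw = weights.astype(np.float32).view(np.uint8).copy()
    # Byte 0 of every 4-byte float is the mantissa's least-significant byte.
    raw[0 : len(payload) * 4 : 4] = np.frombuffer(payload, dtype=np.uint8)
    return raw.view(np.float32)

def extract(weights: np.ndarray, size: int) -> bytes:
    """Recover `size` hidden bytes from the weights."""
    raw = np.ascontiguousarray(weights, dtype=np.float32).view(np.uint8)
    return raw[0 : size * 4 : 4].tobytes()

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
secret = b"attacker payload"
stego = embed(weights, secret)

assert extract(stego, len(secret)) == secret
assert np.allclose(weights, stego, atol=1e-3)  # weights barely perturbed
```

Because the perturbation sits below the noise floor of typical weights, the model behaves normally, which is what makes such payloads hard to spot by inspecting model accuracy alone.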
