One-Third of Popular PyPI Packages Mistakenly Flagged as Malicious

Published: 23/11/2024   Category: security




The scans used by the Python Package Index (PyPI) to find malware fail to catch 41% of bad packages, while creating plentiful false positives.



The scanners tasked with weeding out malicious contributions to packages distributed via the popular open source code repository Python Package Index (PyPI) create a significant number of false alerts, researchers have found. 
According to a Chainguard analysis of PyPI — the main repository for software components used in applications written in Python — the approach catches 59% of malicious packages but also flags a third of popular legitimate Python packages and 15% of a random selection of packages. 
The research aims to create a data set that Python maintainers and the PyPI repository can use to determine the efficacy of their system for scanning projects for malicious changes and supply chain attacks, the Chainguard researchers stated in a Tuesday analysis.
While the existing approach detects the majority of malware, it clearly needs significant improvements to prevent wasting project managers' time with false alarms, says Zack Newman, a senior software engineer at Chainguard, who collaborated on the research.
"These are volunteers with countless other responsibilities, not security researchers who are willing to spend all day trawling through suspicious code," he says. "They care a great deal about the security of PyPI and work very hard to improve the situation, but the return on effort just isn't there at the moment for these scanners."
False positives are the bane of many software analysis tools, and therefore security teams. Even with a system that is 100% accurate at finding malicious packages, if it has a 1% false positive rate, developers and application-security professionals would still have to dig through 200 alerts each week to determine if any of the 20,000 weekly PyPI releases are actually malicious.
"Hundreds of packages triggered alerts," Newman says. "While we did some spot checks, just a quick look isn't enough to tell for sure whether a package is malicious; that's why malware-detection tools are so important. This gave us a lot of empathy for the repository administrators, who would face this volume of alerts tenfold each week."
He adds, "To be useful, a scanner would need to reduce that false positive rate to around 0.01%, even at the expense of missing some malicious packages."
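The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above (20,000 weekly releases, and false positive rates of 1% and 0.01%) and makes the simplifying assumption that essentially all releases are benign. The function name is illustrative, not taken from the Chainguard analysis.

```python
# Back-of-the-envelope alert volume, using the figures quoted in the article.
WEEKLY_RELEASES = 20_000  # weekly PyPI releases, as cited above


def weekly_false_alerts(releases: int, false_positive_rate: float) -> float:
    """Expected number of benign releases flagged per week.

    Simplifying assumption: nearly every release is benign, so the
    false positive rate applies to the whole weekly volume.
    """
    return releases * false_positive_rate


for rate in (0.01, 0.0001):  # 1% and the 0.01% target Newman mentions
    alerts = weekly_false_alerts(WEEKLY_RELEASES, rate)
    print(f"{rate:.2%} false positive rate -> about {alerts:.0f} alerts per week")
```

Under that assumption, a 1% rate yields about 200 alerts a week, while the 0.01% target would cut that to about two.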
PyPI aims to foil software supply chain attacks by checking packages and projects in two ways. First, PyPI scans each package's setup.py file using signatures to detect known suspicious patterns that could indicate the inclusion of malicious functionality. The signatures are expressed as YARA rules, an industry standard for creating malware signatures. (YARA is a joke name, variously expanded as "YARA: Another Recursive Acronym" or "Yet Another Ridiculous Acronym," rather than a descriptive one.) In addition, the repository's scanning tools analyze a project's commits and contributors for suspicious changes that could suggest malicious contributions.
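To make the idea of signature-based scanning concrete, here is a minimal sketch that uses plain regular expressions rather than YARA itself. The pattern names and expressions are hypothetical examples chosen for illustration; they are not PyPI's actual rules, and real YARA rules are written in their own rule language.

```python
import re

# Hypothetical signatures for illustration only; these are NOT PyPI's rules.
SUSPICIOUS_PATTERNS = {
    "dynamic_exec": re.compile(r"\b(?:exec|eval)\s*\("),
    "base64_decode": re.compile(r"\bb64decode\s*\("),
    "network_fetch": re.compile(r"\burlopen\s*\(|\brequests\.get\s*\("),
}


def scan_setup_py(source: str) -> list[str]:
    """Return the sorted names of all signatures that match the given text."""
    return sorted(
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(source)
    )


# The sample inputs are only scanned as text; nothing here is executed.
benign = "from setuptools import setup\nsetup(name='demo', version='1.0')\n"
suspicious = "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))\n"

print(scan_setup_py(benign))      # []
print(scan_setup_py(suspicious))  # ['base64_decode', 'dynamic_exec']
```

A sketch like this also shows why false positives arise: a setup.py that reads its version with the common idiom `exec(open('pkg/version.py').read())` would trip the hypothetical `dynamic_exec` signature even though it is harmless.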
The researchers built their data set using 168 known examples of malicious attacks on the PyPI repository. They then created a second data set with the 1,000 most-downloaded packages and the 1,000 most-imported packages, and when they eliminated duplicates, they ended up with 1,430 popular packages. Finally, they also created a data set of a random selection of 1,000 packages, which resulted in 986 random Python packages, since 14 did not have any Python code.
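The deduplication step for the popular-package set can be sketched as a set union. The package names below are made up, and the overlap figure assumes that each list held exactly 1,000 distinct names and that removing duplicates was the only reduction; the article does not spell this out.

```python
def merge_popular(most_downloaded: list[str], most_imported: list[str]) -> set[str]:
    """Combine two rankings so a package appearing on both lists counts once."""
    return set(most_downloaded) | set(most_imported)


# Tiny made-up rankings standing in for the two 1,000-package lists.
downloaded = ["alpha", "beta", "gamma"]
imported = ["beta", "gamma", "delta"]

popular = merge_popular(downloaded, imported)
print(len(popular))  # 4

# Overlap implied by the article's figures, under the assumption stated above.
implied_overlap = 1_000 + 1_000 - 1_430
print(implied_overlap)  # 570
```

Read this way, roughly 570 packages would have appeared on both the most-downloaded and most-imported lists.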
The popular and randomly selected packages were all assumed to be legitimate, the researchers said. In addition, the popular projects likely had better security hygiene and abided by programming best practices.
"While there is a chance that some of these packages are malicious, the chance that more than a handful of these packages is malicious is vanishingly small," they wrote in the analysis, issued Tuesday. "Importantly, these packages are more likely to represent a package selected from PyPI at random."
The research comes as application-security professionals and software developers look for ways to ensure the security of the open source software components that make up 78% of the code in an average program.
The Open Source Security Foundation (OpenSSF) has launched a number of initiatives to improve the security of the open source software supply chain, including identifying the most critical packages that need more security scrutiny and supporting the adoption of Sigstore, a way of cryptographically linking source code to compiled packages.
Attacks on the software supply chain have increased over the past few years. In the past month alone, security firm Kaspersky found malware in the Node Package Manager (npm) repository, while security firms Check Point and Snyk found nearly a score of malicious packages hosted on the PyPI repository service.
And it came to light that a school-aged kid in Italy uploaded multiple malicious Python packages containing ransomware scripts to PyPI, supposedly as an experiment.
It's unlikely that PyPI is alone in having problematic scanning results. Going forward, the Chainguard researchers plan to extend their analysis to evaluate at least four open source software malware analyzers, such as OSSGadget Detect Backdoor, bandit4ma, and OSSF Package Analysis, and to translate the PyPI Malware Checks rules to Semgrep, a multilanguage open source static code analyzer.
