Simplifying SQL Injection Detection

Published: 22/11/2024   Category: security



Black Hat researcher releases new lexical analysis tool that doesn't rely on regular expressions



Many years after gaining prominence as one of the most popular and convenient ways for criminals to break into corporate databases through vulnerable web applications, SQL injection remains the apple of many a black hat hacker's eye. While plenty of factors conspire against enterprises doing a better job of preventing these attacks, one of the most fundamental is that SQL injection attacks are very difficult to detect. This week at Black Hat, a researcher released a new tool that can be embedded in applications to make that detection process easier.
Part of the problem with many existing detection mechanisms, including those in many web application firewalls, is their dependence on regular expressions to do that detection, Nick Galbreath, director of engineering at Etsy, told his audience yesterday. Analysis using regular expressions quickly gets bogged down because SQL is such a rich, complicated language. He cited a Black Hat talk from 2005 by Hanson and Patterson that showed how regular expressions can be prone to breaking down and producing false positives.
"So what happens is, a lot of the web application firewalls have sort of ended up using what I call regular expression soup," he says. "It's impossible to debug and test against. Regular expressions, no matter what you do, are gonna miss something, and something that you don't want is going to be flagged as a false positive."
One of the big difficulties in analyzing user input for potential SQL injection is that it is very tough to automatically tell benign values such as phone numbers or Twitter handles apart from the snippets of SQL that attackers use to inject code.
"It turns out to be a difficult problem. How do you detect if user input is SQL, good input, or what? Is that my phone number or an arithmetic expression? Is it a Twitter handle or is it a SQL variable?" he says. "So trying to disambiguate these things turns out to be a hard problem."
As Galbreath examined that problem, he considered using existing SQL parsers to do the heavy lifting. But as he dove into them, he found that not only do they parse only their own particular flavor of SQL, they are also not really designed to handle partial bits of code. They're hard to extend as well, and they are very concerned with correctness, because they're usually meant to ensure code runs properly. Someone hunting for SQL injection isn't so worried about correctness.
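The ambiguity Galbreath describes can be illustrated with a minimal sketch. The pattern below is a hypothetical, deliberately naive rule written for this example (it is not taken from any real web application firewall); it flags SQL keywords, comment markers, `@` variables, and arithmetic between numbers.

```python
import re

# Hypothetical, deliberately naive detection rule (an assumption for
# illustration, not a rule from any real product).
NAIVE_SQLI = re.compile(
    r"\b(union|select|or|and)\b"   # common SQL keywords
    r"|--"                         # SQL line-comment marker
    r"|@\w+"                       # SQL-style variable
    r"|\d+\s*[-+*/]\s*\d+",        # arithmetic between two numbers
    re.IGNORECASE,
)

def naive_flag(user_input: str) -> bool:
    """Return True if the naive pattern matches anywhere in the input."""
    return NAIVE_SQLI.search(user_input) is not None

samples = [
    "1 OR 1=1 --",      # an injection attempt
    "555-867-5309",     # a phone number that looks like arithmetic
    "@darkreading",     # a Twitter handle that looks like a SQL variable
    "Coffee or tea?",   # ordinary English containing a keyword
]

for sample in samples:
    print(f"{sample!r}: flagged={naive_flag(sample)}")
```

All four samples are flagged, although only the first is an attack. Tightening the pattern to spare the benign inputs tends to open gaps elsewhere, which is the trade-off the "regular expression soup" remark points at.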
So instead of depending on tools not specifically meant for SQL injection analysis, Galbreath wrote his own.
"It sounds crazy, but it turns out it's pretty straightforward and not so bad [because] we don't need it to actually run SQL," he says. "What it does is it converts input into a stream of tokens. There's a master list of keywords and functions which is sort of combined against all the major databases. It's not completely intractable, and it also handles the comments, strings, literals, and all the weird cases and things like that."
Called libinjection, it's an open source C library that takes a lexical analysis approach and was trained with real user input data from his company's site, a top 50 Internet site with a rich base of user input. With the tokenization approach, the tool is more lightweight and streamlines the process of analyzing user data.
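A toy version of "convert input into a stream of tokens" might look like the sketch below. It is not libinjection's code: the keyword set, the token type names, and the `tokenize` function are all assumptions made for illustration, and the scanner covers only a handful of cases (line comments, single-quoted strings, integers, `@` variables, words, and single-character operators).

```python
# Hypothetical, tiny keyword list; the tool described in the article
# uses a master list combined across the major databases.
KEYWORDS = {"select", "union", "from", "where", "or", "and", "drop", "table"}

def tokenize(text):
    """Yield (type, value) pairs from a hand-written character scanner."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("--", i):
            # Line comment: runs to the end of the line (or of the input).
            j = text.find("\n", i)
            j = n if j == -1 else j
            yield ("comment", text[i:j])
            i = j
        elif ch == "'":
            # Single-quoted string; an unterminated string is tolerated,
            # since injected fragments are often partial.
            j = text.find("'", i + 1)
            j = n if j == -1 else j + 1
            yield ("string", text[i:j])
            i = j
        elif ch.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            yield ("number", text[i:j])
            i = j
        elif ch.isalpha() or ch in "_@":
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            if ch == "@":
                kind = "variable"
            elif word.lower() in KEYWORDS:
                kind = "keyword"
            else:
                kind = "bareword"
            yield (kind, word)
            i = j
        else:
            yield ("operator" if ch in "+-*/=<>" else "other", ch)
            i += 1

print([kind for kind, _ in tokenize("1 OR 1=1 --")])
print([kind for kind, _ in tokenize("555-867-5309")])
```

Because the scanner never has to produce a valid parse tree, it can accept fragments that a full SQL parser would reject, which matches the point that the tool does not need to actually run SQL.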
"So it goes through, disambiguates, merges tokens, specializes, merges strings together, does all the stuff it needs to do, and then it does one last step, which is really designed to reduce false positives," he says. "If it sees a bunch of arithmetic operations together, it just merges them all together. My phone number just turns into 1. We don't actually care what the value is, because SQL injection doesn't care what the value is, just that there's a number there. Same thing with multiple nested parentheses, it just gets rid of them."
By parsing and analyzing tokens in this way, Galbreath found that his tool doesn't have to sift through bytes and bytes of user data to decide whether input is SQL injection or benign. In fact, after testing millions of user inputs and SQL injection samples, he found that the magic number of tokens needed to distinguish SQL injection from benign input was just five.
"That's pretty interesting compared to regular expressions, because then you're parsing the entire input. If you have 10 megs of input, it's going to be parsing 10 megs of data," he says. "This, as soon as it hits 5 tokens, done."
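The final folding step can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: it works on a list of token type names invented for this example (`"number"`, `"arith"` for arithmetic operators, `"compare"` for comparison operators, plus literal `"("` and `")"`), collapses arithmetic between numbers into a single number, and drops parentheses.

```python
def fold(types):
    """Collapse number-arith-number runs into one number; drop parentheses."""
    out = []
    for t in types:
        if t in ("(", ")"):
            # Nested parentheses carry no signal for this purpose.
            continue
        if (
            t == "number"
            and len(out) >= 2
            and out[-1] == "arith"
            and out[-2] == "number"
        ):
            # "number arith number" becomes a single number: drop the
            # operator and let the earlier number stand for the result.
            out.pop()
            continue
        out.append(t)
    return out

# A phone number such as 555-867-5309 tokenizes as number/arith/number/...
phone = ["number", "arith", "number", "arith", "number"]
print(fold(phone))

# ((1)) OR 1 = 1 keeps its keyword and comparison after folding.
attack = ["(", "(", "number", ")", ")", "keyword", "number", "compare", "number"]
print(fold(attack))
```

After folding, the phone number is reduced to a lone number token, while the injection attempt still shows a keyword followed by a comparison, so the two inputs no longer look alike.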
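The early-exit behavior can be sketched with a lazy tokenizer that is asked for at most five tokens. The whitespace-splitting scanner, the `stats` dictionary, and the function names are assumptions made for this example; the point is only that scanning stops once the fifth token has been produced, however long the input is.

```python
from itertools import islice

def lazy_tokens(text, stats):
    """Yield whitespace-separated tokens one at a time, recording progress."""
    i, n = 0, len(text)
    while i < n:
        while i < n and text[i].isspace():
            i += 1
        j = i
        while j < n and not text[j].isspace():
            j += 1
        if j > i:
            stats["scanned"] = j   # how many characters have been examined
            yield text[i:j]
        i = j

def first_tokens(text, limit=5):
    """Return the first `limit` tokens and the number of characters scanned."""
    stats = {"scanned": 0}
    tokens = list(islice(lazy_tokens(text, stats), limit))
    return tokens, stats["scanned"]

# A short prefix followed by roughly two million characters of padding.
big_input = "1 OR 1 = 1 " + "x " * 1_000_000
tokens, scanned = first_tokens(big_input)
print(tokens, scanned, len(big_input))
```

Here only the first ten characters are examined before the five-token limit is reached, whereas a pattern applied to the whole input may have to walk all of it.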
