Teach Your AI Well: A Potential New Bottleneck for Cybersecurity

Published: 23/11/2024 | Category: security




Artificial intelligence (AI) holds the promise of easing the skills shortage in cybersecurity, but implementing AI may result in a talent gap of its own for the industry.



Ann Johnson is leaning forward from her seat in the lobby of a tourist-district Hilton as she shares her excitement about the promise of AI. "Microsoft sees 6.5 trillion security signals a day," she says. "AI helps rationalize them down to a quantity that humans can deal with."
As corporate vice president in Microsoft's Cybersecurity Solutions Group, Johnson spends a lot of time thinking about tools to make human security analysts more effective. "The goal is to reduce the number of humans required - since there aren't nearly enough humans to do the work - and automate simple remediation, leaving humans to do more complex work," she explains.
The shortage of qualified security analysts is an issue the IT security industry has been dealing with for years. There is little question that technology tools - from better analytics engines, to increased automation, to artificial intelligence - are seen as methods for dealing with the shortage. But will the fact that artificial intelligence, like its human analog, must be carefully trained limit its ability to help the industry out of its expertise deficit?
A Blank Slate
Whether the technology is labeled artificial intelligence or machine learning, it is almost never a one-size-fits-all proposition. "Every client's environment is different," says Heather Lawrence, a data scientist at the Nebraska Applied Research Institute. "The machine learning algorithm needs to learn on the client's data."
And every AI engine deployed in a real-world situation must be trained on environment-specific data, whether the AI is looking at a problem as narrowly defined as stopping phishing messages or as broad as a generalized SIEM.
WatchGuard Technologies, for example, uses AI as part of its anti-malware protection product. "There's definitely a big data aspect behind the training, as we're training machine learning algorithms," says Corey Nachreiner, WatchGuard Technologies' CTO. "We gather millions and millions of files that, over time, have been known to be bad, and millions and millions of files that have been known to be good, and we throw them through these various types of machine learning algorithms."
These millions of files are where AI can slow down, because the files don't appear by magic. Someone (or a team of someones) must choose the files and make sure that they're in a format usable by the AI engine. And someone must make sure that the files chosen train the AI to do the right thing.
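To make the process concrete, here is a minimal sketch of that kind of supervised training, assuming scikit-learn and a stand-in feature matrix; the features, labels, and decision rule are invented for illustration and are not WatchGuard's actual pipeline:

```python
# Hypothetical sketch: train a classifier on labeled file features.
# A real pipeline would extract these features (entropy, imported-API
# counts, section sizes, etc.) from millions of known-good and
# known-bad files; here they are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# One row per file, one column per extracted feature.
X = rng.random((10_000, 20))
# Stand-in "known-bad" labeling rule, purely for illustration.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # 1 = bad, 0 = good

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

The hard part Nachreiner describes sits outside this snippet entirely: curating and labeling the millions of files that become X and y.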
Intelligence v. Learning
Training is the critical ingredient in AI, but is it also crucial for machine learning (ML) success? For that matter, is there a meaningful difference between AI and ML when it comes to their application in security?
"When people use AI in cybersecurity, more often than not they are referring to the application of machine learning, either unsupervised or supervised, for various tasks," says David Atkinson, founder of Senseon and former commercial director of Darktrace. He explains that, for him, machine learning is where the engine is looking at a known set of data and, through analysis, produces a predictable range of outcomes. Artificial intelligence, on the other hand, can produce outcomes that lie completely outside the range of predicted results because its techniques can go beyond those reachable through linear algorithms.
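Atkinson's supervised/unsupervised distinction maps roughly onto the two patterns below - a minimal sketch on synthetic data, with scikit-learn's LogisticRegression and IsolationForest used purely as representative examples:

```python
# Supervised: learn from labeled examples and predict within a known
# range of outcomes. Unsupervised: flag outliers with no labels at all.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = rng.normal(size=(1_000, 5))

# Supervised: the labels define the possible outcomes in advance.
y = (X[:, 0] > 0).astype(int)
supervised = LogisticRegression().fit(X, y)

# Unsupervised: the model learns what "normal" looks like and
# scores deviations from it.
unsupervised = IsolationForest(random_state=1).fit(X)

print(supervised.predict(X[:3]))     # 0 or 1, from the label set
print(unsupervised.predict(X[:3]))   # +1 = inlier, -1 = outlier
```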
It's not just that commercial security companies haven't developed the technology to do true AI - in most cases, customers wouldn't be comfortable with the possibilities AI presents, says Ariel Herbert-Voss, a Ph.D. candidate at Harvard University. "We like to have things that are interpretable; you want to know why your algorithm is making a particular decision, because if you don't know, then you might be making some terrible, horrible decision," she explains.
Whether your preferred term is AI or ML, training is as vital to success as the specifics of the engine, both to increase the chances of application success and to keep the AI from being turned to malicious purposes.
At the DEF CON hacker conference back in August, the Nebraska Applied Research Institute's Lawrence spoke in the AI Village about techniques for mis-training an AI engine - and how those engines can be protected. Because both AI and ML engines learn from the data they receive, she said, flooding one with bad data results in bad lessons learned. Perhaps the most famous example of this was the infamous Tay chatbot Microsoft created in 2016. Designed as a machine learning bot that would learn to converse with humans on Twitter, Tay was taught to be an abusive racist in less than 24 hours by a concerted effort to feed it bad information.
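A toy version of the label-flipping attack Lawrence warned about makes the danger easy to see. Everything here is synthetic, and the 40% poisoning rate is an arbitrary choice for illustration:

```python
# Toy data-poisoning demo: flip a fraction of the training labels
# ("bad data in") and compare held-out accuracy against a model
# trained on clean labels. Purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(2_000, 10))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # clean ground truth

X_train, y_train = X[:1_500], y[:1_500].copy()
X_test, y_test = X[1_500:], y[1_500:]

clean = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# The attacker flips 40% of the training labels.
n_poison = int(0.4 * len(y_train))
idx = rng.choice(len(y_train), size=n_poison, replace=False)
y_train[idx] = 1 - y_train[idx]

poisoned = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
print(f"clean: {clean:.2f}, poisoned: {poisoned:.2f}")
```

The model never sees the attack as an attack; it simply learns the lessons the data teaches, which is exactly Lawrence's point.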
Lawrence pointed to the stickers recently developed by researchers at Google that tricked image recognition systems into seeing toasters where none existed. The key, she said, is in understanding how the AI system learns and the factors it uses for recognition.
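Those stickers are adversarial examples: inputs nudged in the direction that most changes the model's output until its prediction flips. Here is a minimal sketch of that idea against a simple linear classifier - the same sensitivity, not Google's actual patch technique:

```python
# Minimal adversarial-example sketch against a linear classifier.
# For a linear model, stepping along (or against) the weight vector
# is the fastest way to move the decision score; adversarial image
# patches exploit the same sensitivity in much larger models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(1_000, 50))
y = (X @ rng.normal(size=50) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

x = X[0].copy()
orig_label = clf.predict([x])[0]
print("before:", orig_label)

# Step the input against the class's weight direction until it flips.
w = clf.coef_[0]
step = -np.sign(w) if orig_label == 1 else np.sign(w)
x_adv = x.copy()
for _ in range(200):
    if clf.predict([x_adv])[0] != orig_label:
        break
    x_adv += 0.05 * step
print("after: ", clf.predict([x_adv])[0])
```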
Sven Cattell, a researcher at Endgame, explains why that understanding can be so difficult to develop. Consider the dimensions of objects in the theoretical space where they exist - a space that may have many more dimensions than the four-dimensional world humans inhabit. Most people are quite comfortable thinking about the three dimensions in which we move, plus one more for time. Four dimensions is what we're taught in geometry and trigonometry classes.
But these four dimensions, which human brains use to figure out virtually everything about our environments, can be insufficient for machine-based AI. AI engines may need to represent objects in hundreds of dimensions. The humans who build and train the AI must therefore work with advanced mathematical models to train it on tasks ranging from visual recognition to analyzing the many dimensions of potentially malicious human behavior. That's why training the AI - building the models the system will use to understand and act - is a rigorous discipline.
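One common way those hundreds of dimensions arise in practice is hashing-based feature extraction, sketched below on invented token strings of the kind that might be pulled from a file (the 256-dimension count is an arbitrary example):

```python
# How files become points in a hundreds-dimensional space: hash
# extracted tokens (strings, API names, n-grams) into a fixed number
# of feature columns. The token lists below are invented.
from sklearn.feature_extraction.text import HashingVectorizer

samples = [
    "CreateRemoteThread VirtualAllocEx WriteProcessMemory",
    "MessageBoxA GetWindowTextA user32.dll",
    "RegSetValueExA InternetOpenUrlA wininet.dll",
]

# 256 dimensions: far beyond the three or four humans reason in.
vectorizer = HashingVectorizer(n_features=256)
X = vectorizer.transform(samples)
print(X.shape)   # (3, 256)
```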
Moving the Bottleneck
The paradox this creates: AI is seen as a potential fix for the shortage of trained security professionals, yet there's a shortage of skilled AI trainers.
"At the end of the day, it's people building these systems, and it's people maintaining these systems, and it's people using these systems," says Harvard's Herbert-Voss. "You have very few machine learning professionals that can handle and clarify and gain meaning from the data, right? So in [my] presentation [at DEF CON] there was a number of 22,000 professionals worldwide, as estimated by Element AI, that can perform research in this area," she says.
And as the discussion progresses, those professionals may be as much a limiting factor as anything else on how quickly AI can rescue security from its talent deficit.
It's not just a question of throwing bodies at the problem - they need to be the right bodies, notes Microsoft's Johnson. "We have learned that volume isn't the key in training," she says. "Diversity is the key - in type, geography, and other aspects. That's important because you have to have non-bias in training."
Bias can include making assumptions about gender, social network profiles, or other behavioral markers, as well as looking for a specific nation-state actor and missing actors from other areas, she explains.
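In practice, checking for the non-bias Johnson describes starts with auditing the makeup of the training set. A minimal sketch, with invented field names and records, of flagging a source region that dominates the malicious-sample labels:

```python
# Toy training-set audit: check whether the "malicious" samples are
# skewed toward one source region. Field names and records are
# invented for illustration.
from collections import Counter

training_samples = [
    {"label": "malicious", "source_region": "region-a"},
    {"label": "malicious", "source_region": "region-a"},
    {"label": "malicious", "source_region": "region-b"},
    {"label": "benign",    "source_region": "region-c"},
]

by_region = Counter(s["source_region"] for s in training_samples
                    if s["label"] == "malicious")
total = sum(by_region.values())
for region, count in by_region.most_common():
    share = count / total
    flag = "  <- over-represented?" if share > 0.5 else ""
    print(f"{region}: {share:.0%}{flag}")
```

A model trained on such a skewed set would learn that region's attackers well and everyone else's poorly - exactly the nation-state blind spot Johnson warns about.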
Even with the difficulty of training AI and ML engines, though, machine intelligence is increasingly becoming a feature in security products. "We are incrementally building the presence of AI in security," Johnson says. "It's not flipping a switch."
One of the issues around human resources is that the people with expertise in security and the people with expertise in training AI engines are rarely the same people.
 
"Data scientists don't have the subject matter expertise about which devices are vulnerable or why they're acting in a particular way," the Nebraska Applied Research Institute's Lawrence says. "These are questions that most data science professionals don't have answers to. And then, on the flip side, cybersecurity experts have all of this data, and they don't understand how to train the machine learning algorithms to get alerts or get additional automation to reduce their overhead or their labor."
Ultimately, she says, "I think it's kind of an amplification effect, where you have one group of subject matter experts and another group of data science experts - and for both, the talent pool is lacking."
Hybrid Win
For Chris Morales, head of security analytics at Vectra, the answer to both shortages is an approach in which AI augments human effort rather than seeking to replace it.
"Machine learning allows us as defenders to adapt much more quickly, in real time, to threats that are constantly changing," he says. "What machine learning is good at doing is learning over time and adapting. As environments change, the machine can start to change."
Morales explains his thinking: "The threat constantly changes and adapts; and if we have a changing landscape, and we have a changing threat, we don't know what's going to happen next."
But machine learning is well suited to those dynamic environments, and that advantage for defenders, he believes, is here to stay. "I think that's going to continue to be true," he says.
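The kind of adaptation Morales describes is often implemented as incremental (online) learning, in which the model is updated on each new batch of data rather than retrained from scratch. A sketch on synthetic data with a deliberately drifting signal, using scikit-learn's partial_fit:

```python
# Sketch of online learning: update the model batch by batch as the
# environment changes, instead of retraining from scratch. The data
# and the drifting "threat" signal are synthetic.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(4)
clf = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

for day in range(30):                        # e.g., one batch per day
    X_batch = rng.normal(size=(200, 10))
    key = day // 10                          # the decisive feature drifts
    y_batch = (X_batch[:, key] > 0).astype(int)
    if day > 0 and day % 5 == 0:
        # Score yesterday's model on today's (possibly drifted) data...
        print(f"day {day}: {clf.score(X_batch, y_batch):.2f}")
    # ...then fold the new batch into the model.
    clf.partial_fit(X_batch, y_batch, classes=classes)
```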
Related Content:
The 7 Habits of Highly Effective Security Teams
The Enigma of AI & Cybersecurity
The Double-Edged Sword of Artificial Intelligence in Security
Machine Learning, Artificial Intelligence & the Future of Cybersecurity
 
 
Black Hat Europe returns to London Dec. 3-6, 2018, with hands-on technical Trainings, cutting-edge Briefings, Arsenal open-source tool demonstrations, and top-tier security solutions and service providers in the Business Hall. Click for information on the conference and to register.
