Know Thyself Through Data-Driven Security Q&A

Two financial services security pros discuss how to correlate and contextualize data an organization already has to create actionable metrics that can bolster risk management practices

It’s almost an inevitability at IT security conferences that some speaker will break out the Sun Tzu quote about knowing your enemy and yourself to avoid disaster in battle. But in this day of threat intelligence feeds and cyberawareness, all too often the emphasis is put on intelligence-gathering about the adversary. Meanwhile, the more obvious and often more available data about oneself remains unharvested.
At the recent UNITED Security Summit, two banking executives from a top 25 U.S. financial institution (who shared best practices on the condition of not naming their employer) challenged that lack of self-awareness, advising fellow practitioners to take a deeper dive into readily available data about their systems, users, and patterns in their environments to improve their risk management strategies with meaningful action. That process starts and ends with what Kelly White, vice president and information security manager, called a security Q&A for an organization.
“In order to set yourself up to be able to answer those security questions, the primary steps in doing that is setting it up as a data-centric problem,” he said, explaining that the process involves collecting the right data, correlating that data, and then contextualizing the data so it’s meaningful to the business.
He explained that as his organization -- which manages 30,000 to 40,000 systems on its network -- embarked on the years-long process of finding the right questions to ask and going through the collection, correlation and contextualization to answer them, it experienced plenty of hiccups early on.
“We weren’t asking sophisticated security questions, nor were we doing a good job contextualizing it,” he said, explaining that the Q&A was limited to simply asking, for example, what vulnerabilities its scanners were producing, and the answer was limited to the reports those scanners produced.
But questions like that don’t offer much value to the risk management process on their own. Instead, said White’s colleague, Adam Collins, organizations should be trying to raise the bar of sophistication. For example, network instrumentation data could offer a good view of the organization’s network footprint, so a question to ask there would be, “How did the footprint change between yesterday and today?” Or you may have knowledge about how many rogue systems are running on the network, but the real question to ask would be, “How many rogue systems are there, and who has accessed them in the past week?”
“The good news is that the data that you need to answer your security questions, you already have. The data is there in your environment,” said Collins, senior information security engineer for the bank. “It may not be simple to get at. It may be spread out over these different data points -- your platform configuration, your NetFlow data, your server and network vulns, your malware events, your network ingress -- but it’s there.”
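The two questions Collins poses can be sketched in a few lines of Python. This is a minimal illustration, not the bank’s implementation: the function names, the snapshot format (a set of IP addresses per day), and the access-log layout of `(user, host, timestamp)` tuples are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def footprint_change(yesterday, today):
    """Return (appeared, disappeared) hosts between two daily snapshots."""
    yesterday, today = set(yesterday), set(today)
    return today - yesterday, yesterday - today

def rogue_access(rogue_hosts, access_log, now, days=7):
    """Map each rogue host to the users who touched it in the last `days` days."""
    cutoff = now - timedelta(days=days)
    result = {host: set() for host in rogue_hosts}
    for user, host, when in access_log:
        if host in result and when >= cutoff:
            result[host].add(user)
    return result

# Hypothetical sample data, for illustration only.
now = datetime(2024, 1, 8)
appeared, disappeared = footprint_change(
    {"10.0.0.1", "10.0.0.2"},   # yesterday's footprint
    {"10.0.0.2", "10.0.0.9"},   # today's footprint
)
access_log = [
    ("alice", "10.0.0.9", datetime(2024, 1, 7)),   # inside the window
    ("bob", "10.0.0.9", datetime(2023, 12, 1)),    # too old
    ("carol", "10.0.0.2", datetime(2024, 1, 7)),   # not a rogue host
]
accessed = rogue_access({"10.0.0.9"}, access_log, now)
```

The point of the second function is the join: the rogue-system count alone is a scanner output, while pairing it with access records turns it into something a team can act on.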
Correlate And Contextualize
According to Collins, with the advent of big data stores, the sky’s the limit on how much data an organization can store and analyze to get value from the information it is collecting.
He and White reported that their organization takes data from about 100 different types of data feeds. These include feeds like SQL server logs, firewall logs, system logs, PCAP files, and Active Directory information through LDAP queries.
“But when you look at scattered sets of data like this, it can seem unapproachable,” White said. “The trick is extracting that and putting it into a central location where it can be analyzed.”
He said that NoSQL systems have made it much easier for his organization to build out a centralized common system to do correlation and analysis. Collins also advised organizations to remember that 80 percent of the benefit will come from about 20 percent of the data points. “So it’s good to start at the high-value data point targets, like user Web activity, Active Directory to bind everything together, your vulns -- whether on workstations, laptops, or servers -- your malware agent logs to know infection rates, your IP ports and addresses, and DNS,” he said. “That’s a good starting point and less overwhelming than trying to take all the data at once.”
Collection is one thing, but it’s the correlation that makes it possible to answer sophisticated security questions about oneself, White said. “What happens when you correlate is that it exponentially magnifies the value of that data as opposed to when it stands alone,” he said, explaining that when data is tied together, “we can answer some cool, more useful actionable security questions.”
One of the most important correlations, White said, is DNS-based mapping of IP addresses to host names.
“We’re just grabbing DNS activity logs for resolution, and, really, 95 percent of the time it boils down to taking an IP address, mapping it to a host name, and then, in some cases, we also map to a MAC address by pulling CAM tables off of switches,” he said. “For users we extract from Active Directory, so how do we tie them to the system they’ve been interacting with? Again, from the domain authentication logs we can get their IP address and, from there, based on DNS, we can get host name. Nine times out of 10 it’s that simple.”
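The chain White describes, from user to IP address to host name, amounts to a two-step lookup. The sketch below assumes the DNS and authentication logs have already been reduced to simple tuples; the function names, record layouts, and sample values are hypothetical. It also ignores timing, so it would mislabel a host whose address was reassigned between the two log entries.

```python
def build_ip_to_host(dns_log):
    """Build an IP -> host name lookup from (ip, hostname) resolution records."""
    lookup = {}
    for ip, hostname in dns_log:
        lookup[ip] = hostname  # later records overwrite earlier ones
    return lookup

def tie_users_to_hosts(auth_log, ip_to_host):
    """Join (user, ip) authentication events to host names via the DNS lookup."""
    return [(user, ip, ip_to_host.get(ip, "unresolved")) for user, ip in auth_log]

# Hypothetical sample data, for illustration only.
dns_log = [("10.1.4.20", "teller-ws-017"), ("10.1.9.5", "callctr-ws-102")]
auth_log = [("jsmith", "10.1.4.20"), ("mlee", "10.1.9.5"), ("tkhan", "10.1.7.77")]
pairs = tie_users_to_hosts(auth_log, build_ip_to_host(dns_log))
```

Entries that come back as `"unresolved"` are worth keeping rather than dropping, since an authenticated user on an address with no DNS record is itself a question to ask.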
But correlation isn’t the only important part of the metrics equation. Adding in business context is also critical.
“Our work is only as useful as it is to the business and into the action that it influences,” he said, explaining that often the big question is what business unit or process some particular metric relates to. In the banking world, it could be a matter of tying specific metrics to a customer sales system or call center systems or Internet banking system. But with so many back-end systems interrelated, those waters can get muddied very quickly.
“One of the more interesting, more challenging, issues we had is when you’ve got a large network and a lot of systems on that network, you start to say over time, ‘What’s the relationship of one system to another system?’”
To answer that contextual question for their organization, White and Collins said their team has had success leveraging NetFlow data coming off of its Cisco network infrastructure.
“That tells you in summary form who is talking to who and over what port,” White said. “We know, for example, one system that belongs to our e-commerce site, then based on that NetFlow data we can say, ‘OK, well, who does that system talk to?’ Well, it talks to these two app servers, and these two app servers talk to these systems, and it looks like they’re talking this database language.”
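Walking outward from one known system, as White describes, is a graph traversal over flow summaries. The following sketch assumes NetFlow records have been reduced to `(source, destination, port)` tuples; the host names, the function names, and the choice of a breadth-first walk are illustrative assumptions rather than details from the talk.

```python
from collections import defaultdict, deque

def build_graph(flows):
    """Turn (src, dst, port) flow summaries into an adjacency map."""
    graph = defaultdict(set)
    for src, dst, port in flows:
        graph[src].add((dst, port))
    return graph

def downstream(graph, start):
    """Breadth-first walk from `start`; returns {host: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for dst, _port in graph.get(host, ()):
            if dst not in seen:
                seen[dst] = seen[host] + 1
                queue.append(dst)
    return seen

# Hypothetical flows: a web front end, two app servers, and a database
# reached on TCP 1433 (the default Microsoft SQL Server port).
flows = [
    ("web1", "app1", 8080),
    ("web1", "app2", 8080),
    ("app1", "db1", 1433),
    ("app2", "db1", 1433),
    ("hr1", "db2", 1433),   # unrelated system, should not appear
]
related = downstream(build_graph(flows), "web1")
```

Every host in `related` can then inherit the business context of the starting system, which is how a single known e-commerce server can label the back-end systems behind it.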
Putting It Together For Meaningful Answers Through Metrics
So what does all that correlation and contextualization look like in the real world? According to Collins, it can mean the difference between handing a business unit a report that says it has X number of vulnerabilities on a laundry list of assets and handing it an enterprise threat readiness report.
“Since we’ve taken in more data, we’ve asked more complicated security questions, we’ve correlated that data, and we’ve added this rich context, we’re able to say, ‘Here’s the different vulnerabilities broken down by insider threat, outsider threat, by regulation, by each individual threat and also going across the columns by the business unit,’” he said.
As for security Q&A, the probing questions are based on what the organization needs to know, not on what data is offered ready-made by a security tool.
For example, they said their organization has asked which users have the worst security behavior. And by correlating system configuration information, Web proxy events, and malware events, they learned that 90 percent of the problems come from 1 percent of the users.
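A concentration figure like that one can be computed by counting events per user across feeds and asking what share comes from the top slice of the population. The sketch below is one way to do it; the feed format (a list of user names, one per event), the function names, and the sample numbers are assumptions for illustration and are not the bank’s data.

```python
from collections import Counter

def events_per_user(*event_feeds):
    """Count events per user across any number of feeds of user names."""
    counts = Counter()
    for feed in event_feeds:
        counts.update(feed)
    return counts

def share_from_top(counts, total_users, top_percent=1):
    """Fraction of all events caused by the top `top_percent` percent of users."""
    total_events = sum(counts.values())
    if total_events == 0:
        return 0.0
    n_top = max(1, total_users * top_percent // 100)
    return sum(n for _user, n in counts.most_common(n_top)) / total_events

# Hypothetical feeds: one entry per event, keyed by user name.
proxy_events = ["u1"] * 6 + ["u2"]
malware_events = ["u1"] * 3
counts = events_per_user(proxy_events, malware_events)
share = share_from_top(counts, total_users=100, top_percent=1)
```

`total_users` is passed separately because users with no events never appear in the feeds, and leaving them out would overstate how large the offending slice is.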
“Which really sets us up to do targeted, follow-up security awareness training,” White said.
What’s more, they took that a step further and asked, “Which users are the riskiest users?” They tied the answers from the previous question to the organization’s application risk catalog and user permissions to see how bad behavior looked across populations of users with access to the highest priority applications.
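One way to picture that second question is to weight each user’s behavior count by the riskiest application the user can reach. The scoring formula here (event count multiplied by the highest application risk rating) is an assumption chosen for simplicity; the article does not say how the organization combined the three data sets, and all names and values are hypothetical.

```python
def rank_users_by_risk(behavior_counts, permissions, app_risk):
    """Rank users by events x highest risk rating among their applications."""
    scores = {}
    for user, events in behavior_counts.items():
        ratings = [app_risk.get(app, 0) for app in permissions.get(user, ())]
        scores[user] = events * max(ratings, default=0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical inputs, for illustration only.
behavior_counts = {"u1": 9, "u2": 4}                  # bad-behavior events
permissions = {"u1": ["intranet"], "u2": ["wire_transfer", "intranet"]}
app_risk = {"intranet": 1, "wire_transfer": 5}        # risk catalog ratings
ranking = rank_users_by_risk(behavior_counts, permissions, app_risk)
```

In this sample the user with fewer events ranks first, because access to a high-priority application outweighs raw event volume.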
Like building up muscle through regular exercise, regularly asking and answering difficult security questions hones thought processes about data collection and correlation that can yield creative answers to some of the toughest metrics problems. For example, one of the most intractable problems faced by White and plenty of others in the industry is understanding where sensitive data resides in unstructured data stores, and who has access to those repositories.
In his organization’s case, answering that question took the use of a Google appliance, pointing it at its systems, and configuring it to crawl and index unstructured data so that his team could execute regular expressions against the indexed content.
“You get the uniform resource locator and the filename and type of content found and the number of those records,” he said, explaining that combining that with Active Directory information for user permissions to fileshares or SharePoint can pinpoint who has access to the sensitive information.
As other organizations seek to engage in data-driven Q&A like White and Collins’ organization did, Collins said a real key to the correlation and contextualization process is ensuring that there’s a common language for the data sets. It’s also important to understand who the owners are for every asset and every system.
“It’s great you collect this stuff,” he said, “but if you don’t have anyone you can communicate back to and have them act on it, it’s not really that valuable.”
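The approach can be approximated in Python: run a pattern over indexed text, record the location, content type, and match count, then join the hits against share permissions. Everything specific here is an assumption for illustration, including the Social Security number pattern, the document and ACL layouts, and the function names; the article does not say which patterns or data structures the team used.

```python
import re

# Hypothetical pattern: U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_sensitive(indexed_docs, pattern=SSN_PATTERN):
    """Return URL, content type, and match count for documents with hits."""
    hits = []
    for doc in indexed_docs:
        count = len(pattern.findall(doc["text"]))
        if count:
            hits.append({"url": doc["url"], "type": doc["type"], "records": count})
    return hits

def who_can_read(hits, share_acl, group_members):
    """Map each hit's URL to the users whose groups can read its share."""
    exposure = {}
    for hit in hits:
        users = set()
        for share, groups in share_acl.items():
            if hit["url"].startswith(share):
                for group in groups:
                    users |= group_members.get(group, set())
        exposure[hit["url"]] = users
    return exposure

# Hypothetical index contents and directory data.
docs = [
    {"url": "file://fs1/hr/payroll.txt", "type": "text/plain",
     "text": "123-45-6789 and 987-65-4321"},
    {"url": "file://fs1/pub/menu.txt", "type": "text/plain",
     "text": "lunch at 12"},
]
share_acl = {"file://fs1/hr/": ["HR-Staff"]}
group_members = {"HR-Staff": {"dana", "eli"}}
hits = find_sensitive(docs)
exposure = who_can_read(hits, share_acl, group_members)
```

A pattern this loose will produce false positives on any nine-digit number in that shape, so the match counts are better read as leads for review than as confirmed records.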