Netflix Wants You To Adopt Chaos Monkey

Published: 22/11/2024   Category: security




Netflix has made its own automated disaster testing service, Chaos Monkey, available as a free public download. Should you turn it loose on your own systems?



Netflix is a high-profile consumer service. When things go wrong, people tend to notice. So it might seem strange that the company tries to make things go wrong with its service on a regular basis.
That's indeed the goal of Chaos Monkey, the automated software Netflix developed to test its infrastructure's mettle. In layman's terms, Chaos Monkey tries to break stuff. The theory is that deliberate, controlled breakage builds a stronger platform and avoids the types of major, unexpected problems that tend to make IT's phones ring at 2 a.m. (Netflix configures the tool to run only during normal business hours; that way IT staff handle any related issues during the day instead of on nights and weekends.)
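The business-hours restriction described above can be sketched as a simple time gate. This is only an illustration, not Netflix's code: the function name `chaos_allowed` and the 9-to-5, Monday-to-Friday window are assumptions, since the article says only "normal business hours."

```python
from datetime import datetime, time

# Hypothetical window: the article does not give exact hours.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def chaos_allowed(now: datetime) -> bool:
    """Return True only on weekdays inside the assumed business-hours window."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = BUSINESS_START <= now.time() < BUSINESS_END
    return is_weekday and in_window
```

A scheduler would check this gate before each run, so any failure the tool causes lands while staff are at their desks.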
Now everyone can embrace the chaos: Netflix just made the source code publicly available as a free download. You'll first need to ask yourself if you've got the guts for it. Or, as Netflix put it in a blog post: "Do you think your applications can handle a troop of mischievous monkeys loose in your infrastructure?"
If you're picturing the band of winged monkeys from The Wizard of Oz running amok, you're not far off; they've just been reengineered for the cloud. Chaos Monkey deliberately shuts down virtual machines (VMs) within Amazon's Auto Scaling Groups (ASGs). (Though the software was written with Amazon Web Services in mind, Netflix said Chaos Monkey is flexible enough to work with other cloud platforms.)
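The core idea, picking an instance at random from each group and shutting it down, can be sketched in a few lines. This is a simplified model under stated assumptions, not Netflix's implementation: the function `unleash_monkey`, the `probability` parameter, and the in-memory `groups` mapping are hypothetical, and the `terminate` callback stands in for whatever cloud API call would actually stop a VM.

```python
import random
from typing import Callable, Dict, List, Optional


def unleash_monkey(
    groups: Dict[str, List[str]],
    terminate: Callable[[str], None],
    probability: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Terminate at most one randomly chosen instance per group.

    groups maps a group name to its instance IDs (hypothetical model).
    terminate is a caller-supplied function that stops one instance.
    probability is the chance that a given group is hit on this run.
    """
    rng = rng or random.Random()
    victims: List[str] = []
    for instances in groups.values():
        if not instances:
            continue  # nothing to terminate in an empty group
        if rng.random() >= probability:
            continue  # this group is spared on this run
        victim = rng.choice(instances)
        terminate(victim)
        victims.append(victim)
    return victims
```

Passing the termination step in as a callback keeps the sketch runnable without cloud credentials; in a dry run it can simply record which instances would have been stopped.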
By causing intentional failures on individual instances--Netflix generated more than 65,000 failures in the last year, according to the company--you can learn from those errors and their resolutions. Basic example: Is your application hardy enough to weather a failed VM, or could that single instance bring the curtains down on the whole show?
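The "basic example" above, whether one failed VM can take down the whole show, amounts to a check for single points of failure. A minimal sketch, assuming a hypothetical inventory that maps each service name to its running instance IDs:

```python
from typing import Dict, List


def single_points_of_failure(services: Dict[str, List[str]]) -> List[str]:
    """Return the services that would have no instances left after losing one."""
    return sorted(
        name for name, instances in services.items() if len(instances) <= 1
    )
```

Any service this returns is one that a single random termination could knock offline. Redundancy alone does not prove an application fails over cleanly, which is the part only a live test can show.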
That type of no-holds-barred testing can help unearth and resolve unknown issues before they become major outages. Better yet, it can lead to stronger applications as they're being built, rather than trying to retrofit them after the fact. "By having that constant idea that something's going to break, [Netflix has] within their dev ops and engineering departments the mindset that they have to make sure that no single point can take down the entire site," said Jim MacLeod, product manager at the networking firm WildPackets, in an interview.
In Netflix's case, it makes sense to try to rise to Chaos Monkey's challenge: their bottom line depends upon their site running smoothly. "If you look at Netflix's business model, what really differentiates them isn't just streaming media; it's the fact there's something immediate and easy for users to get to, but that means they have to be fairly reliable," MacLeod said. "Reliability and uptime are things that are difficult to put in afterwards if you don't design it in, just like security." It's worth noting that Chaos Monkey is not a security tool, per se. It's not intended to unearth the types of flaws that might lead to a targeted hack or other security breach. MacLeod said it's better characterized as an automated QA tool, though it could help inspire a more serious approach to security in the process.
Chaos Monkey can conceivably help any organization that deploys applications via the cloud. So should you turn the monkey loose? Small and midsize businesses (SMBs) that beg off testing for budget reasons, for example, can't argue with Chaos Monkey's price tag. But MacLeod's not too sure many SMBs are prepared for what the software will do. If you lack the resources to quickly respond to downtime, Chaos Monkey is probably not for you. "There's not a reverse of Chaos Monkey to bring things back up [automatically]," MacLeod said.
No matter your company's size, there are two prerequisites before running Chaos Monkey, according to MacLeod. First: Know what you're getting into. "This is something that is designed to cause problems," MacLeod warned. Chaos Monkey could very well knock you offline, something that's likely to inspire fear in some corners of the business. MacLeod also pointed out that it might be just as scary, and probably for good reason, if you turn on Chaos Monkey and nothing breaks.
The second prerequisite is to have the proper organizational philosophy. Given that the tool's purpose is to break stuff, you'd better have buy-in from your bosses before turning it on. Imagine trying to explain yourself to non-technical management after the fact if Chaos Monkey runs rampant within your infrastructure. (A chaos what?!) "It requires an adventurous spirit and confidence that you've got an architecture that's going to survive this, plus enough humility to know that you're willing to test your site to get better," MacLeod said. "One of the big problems out there is arrogance: 'Oh, I know I don't have any problems.'"
Even if you're unwilling or unable to invite Chaos Monkey inside your infrastructure, it can still offer a valuable lesson from a safe distance. "The best thing about Chaos Monkey is that it's making people think about security and uptime and helping them realize that these aren't optional features in a service. They're something that needs to be designed in from the beginning," MacLeod said. Cutting corners on reliability will lead to unexpected failures and a bunch of unhappy customers.
"It's like realizing, two-thirds of the way through building a car, that you kind of need an engine and a gas tank, and then trying to figure out where to shove those in," MacLeod added. "It's a lot easier if you start out with the reliability and the security in mind."
