availability: September 2013
I've been tracking infrastructure as code for a few years now. Over the years it has gotten closer to real code.
Close, but no cigar yet... We've come a long way, but when you compare it to real languages it still feels in its infancy. In this updated overview, which I gave at the ABUG, I went through:
This talk is probably the most comprehensive tool list that I've seen/made about the subject. But feel free to post and add your findings in the comments!
Note that at the end of the presentation there are many extra links still to be sorted, plus some slightly outdated tools.
I've given previous versions of this talk at Devoxx 2012 and Jax2012. Enjoy the Jax2012 video here:
Let's face it: if you write software, it's often hard to distribute. You have the runtime, the modules you depend on, and your software itself. Sure, you can package all of that, but packages often require root privileges to install.
Therefore it's sometimes convenient to have a single-file/binary distribution: download the executable and run it. For Ruby projects you can convert things into a single jar using JRuby. A good example is the logstash project: download 1 file, run it, and you're in business. But you'd still require the Java runtime to be installed. (thanks Apple, NOT)
This comes for free with the Go language, but I was looking for something similar for nodejs. The following documentation is the closest I could get. (and it works!)
Enter nexe, a tool to compile nodejs projects to an executable binary.
The way it works is:
- it downloads the nodejs source of your choice
- it creates a single-file nodejs source (using sardines)
- it monkey-patches the nodejs code to include this single file in the binary (adding it to the lib/nexe.js directory)
Creating a binary is as simple as:
$ nexe -i myproject.js -o myproject.bin -r 0.10.3
Many of these single-binary packaging tools suffer from the problem of handling native modules.
nexe doesn't handle native modules (yet).
But with a little persistence and creativity, this is what I did to add the pty.js native module directly to the nodejs binary:
$ tar -xzvf node-v0.8.21.tar.gz
$ cd node-v0.8.21
# Copy the native code into the src directory
# If there is a header file, copy/adapt it too
$ cp ~/dev/terminal.js/node_modules/pty.js/src/unix/pty.cc src/node_pty.cc
# Correct the export name of the module:
# add the node_ prefix to the node_module name
# (the last line should read: NODE_MODULE(node_pty, init))
# Add node_pty to src/node_extensions.h (f.e. right after node_zlib):
# NODE_EXT_LIST_ITEM(node_pty)
# Copy the pty.js file
$ cp ~/dev/pty.js/lib/pty.js lib/pty.js
# Add pty.js to node.gyp:
# somewhere in the library list add pty.js
# somewhere in the source list add node_pty.cc
# Adapt the namings/bindings in lib/pty.js:
# 1) replace: var pty = require('../build/Release/pty.node');
#    with:    var binding = process.binding('pty');
# 2) replace all references to pty. with binding.
$ make clean
$ ./configure
$ make
Now you have a custom-built node in out/Release/node. The file size was about 10034856 bytes; after stripping it you end up with 6971192 bytes (6.6M).
Now you need to remove the native dependency from your package.json before you do the nexe build.
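A crude sketch of that step, assuming the dependency (pty.js here, as in the build above) sits on its own line in package.json; a JSON-aware edit would be safer, since this won't fix up a dangling trailing comma:

```shell
# drop the native pty.js dependency line before running nexe
grep -v '"pty.js"' package.json > package.json.nonative \
  && mv package.json.nonative package.json
```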
A single binary now makes it easy to build a curl installer from it, as it only requires you to download one file. Remember the caveats of this approach.
And you can still package it up:
More info on the process.binding:
Convert nodejs projects to single file/beautifier:
I had a blast at Devopsdays Austin 2013 . Here's my keynote on the 'future of devops'.
My main point is that besides repeating the devops stories, we also need to seek diversity and make sure we keep adapting to situations.
The slides are available on slideshare - http://www.slideshare.net/jedi4ever/future-ofdevopsv2
While working on the Devops Cookbook with my fellow authors Gene Kim, John Willis and Mike Orzen, we are gathering a lot of "devops" practices. For some time we struggled with structuring them in the book. I figured we were missing a mental model to relate the practices/stories to.
This blogpost is a first stab at providing a structure to codify devops practices. The wording and descriptions are pretty much a work in progress, but I found them important enough to share to get your feedback.
As you probably know by now, there are many definitions of devops. One thing that occasionally pops up is that people want to change the name to extend it to other groups within the IT area: star-ops, dev-qa-ops, sec-ops, ... From the beginning, I think, the people involved in the first devops thinking had the idea to expand the thought process beyond just dev and ops. (but a name like bus-qa-sec-net-ops wouldn't be that catchy :)
I've started referring to:
As rightly pointed out by Damon Edwards, devops is not about a technology; devops is about a business problem. The Theory of Constraints tells us to optimize the whole and not the individual 'silos'. For me that whole is the business-to-customer problem, or in lean speak, the whole value chain. Bottlenecks and improvements could happen anywhere and have a local impact on the dev and ops parts of the company.
So even if your problem exists in dev or ops, or somewhere in between, the optimization might need to be done in another part of the company. As a result, describing prescriptive steps to solve the 'devops' problem (if there is such a problem) is impossible. The problems you're facing within your company could be vastly different, and the solutions to your problem might have different effects/needs.
Even if we can't be prescriptive, we can gather practices people have used to overcome similar situations. I've always encouraged people to share their stories so other people could learn from them (one of the core reasons devopsdays exists). This helps in capturing practices; I'll leave it open whether they are good or best practices.
Currently a lot of the stories/practices are zooming in on areas like deployment, dev and ops collaboration, metrics, etc. (Devops Lite). This is a natural evolution of having dev and ops in the term's name, and given the background of the people currently discussing the approaches. I hope that in the future this discussion expands to other company silos too: f.i. synergize HR and devops (Spike Morelli), or relate our metrics to financial reporting.
Another thing to be aware of is that a system/company is continuously in flux: whenever something in the system changes, it can have an impact. So you can't take for granted that problems and bottlenecks will not re-emerge after some time. It needs continuous attention. That will be easier if you get closer to a steady state, but still, devops, like security, is a journey, not an end state.
Let's zoom in on some of the practices that are commonly discussed: the direct field between 'dev' and 'ops'.
In most cases, 'dev' actually means 'project' and 'ops' represents 'production'. Within projects we have methodologies like Scrum and Kanban, and within operations ITIL, Visible Ops, etc. Both parts have been extending their methodology over the years: from the dev perspective this has led to 'Continuous Delivery', and on the ops side ITIL was extended with Application Lifecycle Management (ALM). Both worked hard on optimizing their individual part of the company and less on integration with the other parts. Those methodologies had a hard time solving a bottleneck that lies outside their 'authority'. I think this is where devops kicks in: it seeks active collaboration between the different silos so we can start seeing the complete system and optimize where needed, not just in individual silos.
In my mental model of devops there are four 'key' areas:
In each of these areas there will be a bi-directional interaction between dev and ops, resulting in knowledge exchange and feedback.
Depending on where your most pressing 'current' bottleneck manifests itself, you may want to address things in different areas. There is no need to first address things in area 1 before area 2. Think of them as pressure points that you can stress, but that require a balanced pressure.
Area 1 and area 2 tend to be heavier on the tools side, but are not strictly tools-focused. Area 3 and area 4 are more related to people and cultural changes, as their 'reach' is further down the chain.
When visualized in a table this gives you:
As you can see:
Note 1: these areas definitely need 'catchier' names to make them easier to remember. Note 2: Ben Rockwood's post on "The Three Aspects of Devops" already lists 3 aspects, but I think the areas make it more specific.
In each of these areas, we can interact at the traditional 'layers' tools, process, people:
So whenever I hear a story, I try to relate its practice to one of the areas described above and the layer it's addressing. Practices can have an impact at different layers, so I see them as 'tags' to quickly label stories. Another benefit is that whenever you look at an area, you can ask yourself what practices could improve each of its layers. It's clear that to have maximum impact, the approach needs to address all three layers.
The ultimate devops tool would support the whole people and process side in all of these areas, not just in area 1 (deployment) or area 2 (monitoring/metrics). Therefore a devops toolchain, with different tools interacting in each of the areas, makes more sense. Also, a tool by itself isn't a devops tool: configuration management systems like chef and puppet are great, but when applied in ops only they don't help our problem much. Of course ops gets infrastructure agility, but it isn't until they are applied to the delivery process (f.i. to create test and development environments) that it becomes 'devops'. This shows that the mindset of the person applying the tool makes it a devops tool, not the tool by itself.
Now that we have the areas and layers identified, we want to track progress as we start solving our problems and are improving things.
CMMI levels allow you to quantify the 'maturity' of your process. That addresses only one layer (although an equally important one). In a nutshell CMMI describes the different levels as:
All these levels can be applied to dev, ops, or devops combined. It gives you an idea of what level a process is at while you are optimizing in an area.
An alternative way of expressing maturity levels is used by the Continuous Integration Maturity Model. It puts a set of practices in levels of maturity: (industry consensus)
Instead of focusing on the process only, it could be applied to a set of tools, process, or people practices. What people consider the most advanced would get the highest maturity level.
A practice could be anything from an anecdotal item to a systemic approach. Similar practices can be grouped into patterns to elevate them to another level. Similar to the Software Design Patterns we can start grouping devops practices in devops patterns.
Practices and patterns rely on principles, and it's these underlying principles that will guide you on when and how to apply a pattern or practice. These principles can be 'borrowed' from other fields like Lean, Systems Theory, Human Psychology. The principles are what the agile manifesto is about, for example.
Slowly we will turn the practices -> patterns -> principles .
Note: I'm wondering whether new principles will emerge from devops itself, or whether it will be applying existing principles from a new perspective.
Below are a few example 'practices' codified in a standard template. The practices/patterns/principles are not yet very well described. The point is more that this can serve as a template to codify practices.
The idea is to list metrics/indicators that can be tracked. The numbers as such might not be too relevant, but the rate of change would be. This is similar to tracking the velocity of story points or tracking the mean time to recovery.
Note: I'm scared of presenting these as metrics to track, therefore I call them indicators to soften that.
Examples would be :
This is not yet fleshed out enough; I'm guessing it will be based on the research I did for my Velocity 2011 presentation (Devops Metrics).
To present progress during your 'devops' journey you can put all these things in a nice matrix, to get an overview on where you are at optimizing at the different layers and areas.
Obviously this only makes sense if you don't lie to yourself, your boss, your customers.
Jez Humble often talks about project teams evolving into product teams: larger silos will split off, not by skill, but by the product functionality they are delivering. Splitting teams like that has the potential danger of creating new silos. It's obvious that these product teams need to collaborate again. You should treat other product teams as external dependencies, just like other silos. The areas of interaction will be very similar.
You can also see the term NOOPS as working with product teams outside your company, for example when you rely on SAAS for certain functions. It's important to integrate in each of the areas not only on the tools layer, but also on the people and process layers. Something that is often forgotten. Automation and abstraction allow you to go faster, but when things fail, or even when changes occur, synchronisation needs to happen.
The CAMS acronym (Culture, Automation, Measurement, Sharing) could be loosely mapped onto the areas structure:
Of course automation, measurement, culture and sharing can happen in any of the areas, but some of the areas seem to have a stronger focus on each of these parts.
Devops areas, layers and maturity levels give us a framework to capture new practice stories, and it can be used to identify areas of improvement related to the devops field. I'd love feedback on this. If anyone wants to help, I'd like to set up a website where people can enter their stories in this structure and make it easily available for anyone to learn from. I don't have too many CPU cycles left currently, but I'm happy to get this going :)
P.S. @littleidea: I do want to avoid the FSOP Cycle
It's the time of year that all conferences are gearing up. Here's a list of conferences I'm speaking or wish I was attending.
ChefConf 12 - May 15-17 : the place to be if you're anything with chef these days
Devopsdays Tokyo - May 26: Tokyo was always on my list, I can't go , bummers. Botchagalupe is winning :)
Kanban for Devops, Belgium June 18-19: I initially announced that I would be there, and I was very keen on doing so. Work got in the way, so I can't make it. But if you can, you should! I'm sure @dominica will get your WIP (that is Work In Progress :)
Velocity - June 25-27 : the uber conference on anything on web and performance
Devopsdays MountainView - June 28-29 : this year at Google, looking forward to so much fun!
Webperfdays - June 28 : interesting unconference happening on performance. Happening at the same time as Devopsdays at Google.
Puppetconf - September 27-28 : and if you're into puppet, or config mgmt in general. A cool place to be , hope I can make it this year
Velocity Europe - October 2-4 : since the success last year, Velocity Europe strikes again: Web Performance isn't a US only concern!
Devopsdays Italy - October 6-7 : Rome, sweet rome - sun and devops - the perfect mix
AppSec USA 2012 - October 23-24 : not 100% sure on this one, but rumors go on a devops track in a security conference - sounds like fun to me.
Busy times .... but .... Fun times!
For our Atlassian Hosted Platform, we have about 10K websites we need to monitor. Those sites are monitored from a remote location to measure response time and availability. Each server has about 5 sub-URLs on average to check, resulting in 50K URL checks.
Currently we employ Nagios with check_http and require roughly 14 Amazon Large instances. While the Nagios servers are not fully overloaded, we make sure that all checks complete within a 5-minute check cycle.
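A quick back-of-the-envelope calculation of that load, based on the numbers above (integer shell arithmetic, so results are rounded down):

```shell
# 50K URL checks must finish within a 300 second (5 minute) cycle
echo $(( 50000 / 300 ))       # → 166 checks/sec across the whole fleet
echo $(( 50000 / 300 / 14 ))  # → 11 checks/sec per instance, for 14 instances
```

So every instance has to sustain roughly a dozen HTTP checks per second, which is why forking a check_http process per check gets expensive.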
In a recent spike we investigated if we could do any optimizations to:
While looking at this, we wanted the technology to be reusable with our future idea of a fully scalable and distributed monitoring in mind (think Flapjack or the new kid on the block Sensu). But for now, we wanted to focus on the checks only.
In the first blogpost of the series we look at the integration and options within Nagios. In a second blogpost we will provide proof-of-concept code for running an external process (ruby based) to execute the checks and report back to Nagios. Even though Nagios isn't the most fun to work with, a lot of solutions that try to replace it focus on replacing the checks section only. But Nagios gives you more: the reporting, escalation, dependency management. I'm not saying there aren't solutions out there, but we consider that to be for another phase.
The canonical way in Nagios to run a check is to execute check_http.
F.i. to have it check if confluence is working on https://somehost.atlassian.net/wiki , we would provide the options:
$ /usr/lib64/nagios/plugins/check_http -H somehost.atlassian.net -p 443 -u /wiki -f follow -S -v -t 2
HTTP OK: HTTP/1.1 200 OK - 546 bytes in 0.734 second response time |time=0.734058s;;;0.000000 size=546B;;;0
We can reduce part of the forks by using the use_large_installation_tweaks=1 setting. The benefits and caveats are explained in the docs
Nagios itself tries to be smart to schedule the checks. It tries to spread the number of service checks within the check interval you configure. More information can be found in older Nagios documentation .
Configuration options that influence the scheduling are:
The default for inter_check_delay_method is 's' (smart); set it to 'n' (no delay) if we want to execute the checks as fast as possible.
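As an illustration, a hypothetical nagios.cfg fragment combining these scheduling options (the values are examples, not recommendations):

```
# nagios.cfg scheduling tweaks (illustrative values)
# n = no delay: fire checks as fast as possible (default is s = 'smart')
inter_check_delay_method=n
# interleave service checks across hosts 'smartly'
service_interleave_factor=s
# 0 = no limit on the number of parallel check processes
max_concurrent_checks=0
# reduce forking overhead, as mentioned above
use_large_installation_tweaks=1
```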
When one host can't cut it anymore, we have to scale eventually. Here are some solutions that live completely in the Nagios world:
Our future solution would take a similar approach to dispatching the check commands and gathering the results back over a queue, but we'd like it to be less dependent on the Nagios solution and possible to integrate with other monitoring solutions (think Unix toolchain philosophy). A great example idea can be seen in the Velocityconf presentation Asynchronous Real-time Monitoring with Mcollective.
So with distribution we just split our problem into smaller problems. Let's focus again on the problem of a single host running checks; after all, the more checks we can run on 1 host, the fewer hosts we have to distribute over.
NSCA does have a few limitations:
This led them to using NRD (Nagios Result Distributor).
"What no one tells you when you are deploy NCSA is that it send service checks in series while nagios performs service checks in parallel"
This led him to writing a high-performance NSCA replacement, feeding the results directly into the livestatus pipe instead of over the NSCA protocol baked into Nagios. On a similar note, Jelle Smet has created NSCAWeb, to easily submit passive host and service checks to Nagios via external commands.
We would leverage the Send NSCA Ruby Gem
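The wire format all of these NSCA-style tools speak is simple: one tab-separated line per passive service result (host, service, return code, plugin output). A sketch, with placeholder hostnames and paths:

```shell
# build a passive service check result in send_nsca's tab-separated format
payload="$(printf 'webhost\thttp_check\t0\tHTTP OK - 0.734s response time')"
echo "$payload"
# submitting it would then look like:
#   echo "$payload" | send_nsca -H nagios-server -c /etc/nagios/send_nsca.cfg
```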
Why is this relevant to our solution? Without employing some of these optimizations, our bottleneck would shift from running the checks to accepting the check results.
Another solution could be to run an NRPE server, and we could probably leverage some ruby logic from Metis, a ruby NRPE server.
Even after the following optimizations:
we can still optimize with:
In the next blogpost we will show the results of proof of concept code involving ruby/eventmachine/jruby and various httpclient libraries.
One of the strong pillars of devops (if not the strongest) is collaboration/communication. For the talk about Devops Metrics at Velocity 2011, I researched how to prove that collaboration is a good thing: while discussing devops, people sometimes question whether it makes sense to collaborate more, or whether all this collaboration is overkill. I think around that time I came across Design Thinking and read how it evolved from 1 person doing the design, to listening to user requirements, to participatory design. In the book Design Thinking - Understanding How Designers Think, Nigel Cross writes that design used to be a collaborative thing (like guilds trying to push their craft forward).
One of the concepts introduced was the symmetry of ignorance PDF
Complex design problems require more knowledge than any one single person can possess, and the knowledge relevant to a problem is often distributed and controversial. Rather than being a limiting factor, “symmetry of ignorance” can provide the foundation for social creativity. Bringing different points of view together and trying to create a shared understanding among all stakeholders can lead to new insights, new ideas, and new artifacts. Social creativity can be supported by new media that allow owners of problems to contribute to framing and solving these problems. These new media need to be designed from a meta-design perspective by creating environments in which stakeholders can act as designers and be more than consumers.
Sounds like systems thinking, and it reminded me of the knowledge divide within the devops problem space. When you spend time with each group/silo individually, they often think themselves superior to the other group: "ha, those devs don't know anything about the systems; ha, those ops don't know anything about coding". So it seems more like a symmetry of arrogance. That arrogance symmetry reminded me of "We judge others by their behavior, we judge ourselves by our intentions". We might think we know more/can do better, but that's often not visible in our actions.
This kind of got me intrigued and I wanted to explore the subject more for the next Cutter Summit 2012.
Part of the designing thinking and this symmetry of ignorance is related to the concept of wicked problems
Rittel and Webber's (1973) formulation of wicked problems specifies ten characteristics:
I'll let you judge whether you think devops (or even 'monitoring sucks' :) is a wicked problem.
More readings to explore:
The whole discussion on what is a wicked problem or not reminded me of a talk by Dave Snowden. He helped create the Cynefin model.
The Cynefin framework has five domains. The first four domains are:
Note this is a sense-making framework, not an ordering framework: it's not always exact to put your problems in each of the spaces, but it gets you thinking about which solutions to apply to which problems. And it fits in nicely with other frameworks, as explained in A Tour of Adoption and Transformation Models.
So devops in my opinion, falls into the complex problem space.
A great video explaining it was recorded at the ALE 2011:
He explains many things, but here a few things that resonated with me:
That last point reminded me of the Debt Metaphor by Ward Cunningham. @littleidea explained that Ward was using a different concept of Technical Debt than most people use: he explains technical debt as the difference between the implementation and the ideal implementation in hindsight. Not because of bad implementation, or deliberate shortcuts, but because of new insights gathered during the discovery/problem-solving process.
More research can be found at:
The fact that problems don't always stay in, or match, one of the locations on the diagram is greatly visualized by adding dimensions to the diagram (a thing that got lost in the initial publication).
To tackle complex problems he suggests using three principles of complexity based management:
This could result in the Resilient Organisation
Because in complex systems it's hard to predict the exact behavior, Dave Snowden also talks about going from Robustness to Resilience. It almost sounded like the difference between MTBF and MTTR, as John Allspaw explains in Outages, Post-Mortems and Human Error 101.
I had come across those articles before, but never put them in the light of the Snowden perspective. So, more to explore.
The final document I'd like to highlight is about Reducing the impact of Organisational Silos on Resilience.
Stone quotes five questions suggested by Angela Drummond (a practitioner in the area of silo breaking and organisational change) to help executives identify and overcome silos.
Quoting from the article:
Resilience cannot be achieved in isolation of other units and organisations. In summary, there is a need to recognise:
Leadership is the key to bringing these elements together. Leadership is needed to reduce and mitigate risks before crises occur.
It was fascinating to read that collaboration and resilience go hand in hand. Breaking the silos is really a must there, and it requires collaboration. The inter-company silos also fit in nicely with The Agile Executive - A New Context for Agile, a presentation on how we have come to rely on external services in a SAAS model, and how this will be another silo to tackle.
This is all research in progress, but it's exciting to see a lot of different concepts fit in nicely. I apologize that this isn't yet a complete polished train of thought, but it might be useful to explore more on the subject.