Just Enough Developed Infrastructure

Monitoring Wonderland Survey - Visualization

(2012-01-04) - Comments

A picture tells more than a ...

Now that you've collected all the metrics you wanted, or even more, it's time to make them useful by visualizing them. Every self-respecting metrics tool provides a visualization of the data it collects. Older tools tended to revolve around creating RRD graphics from the data. Newer applications leverage javascript or flash frameworks to have the data updated in realtime and rendered by the browser. People are exploring new ways of visualizing large amounts of data efficiently. Good examples are Visualizing Device Utilization by Brendan Gregg or Multi User - Realtime heatmap using Nodejs.

Several interesting books have been written about visualization:

Dashboard written for specific metric tools

Graphite

Graphs are Graphite's killer feature, but there's always room for improvement:

Grockets - Realtime streaming graphite data via socket.io and node.js

Opentsdb

Graphs in Opentsdb are based on Gnuplot

Ganglia

Collectd

Nagios

Nagios also has a way to visualize metrics in its UI

Overall integration

With all these different systems creating graphs, the nice folks from Etsy have provided a way to navigate the different systems easily via their dashboard - https://github.com/etsy/dashboard

I also like the idea of Embeddable Graphs as implemented by http://explainum.com

Development frameworks for visualization

Generic data visualization

There are many javascript graphing libraries. Depending on how you need to visualize things, they provide you with different options. This first list is a more generic graphing library list

Time related libraries

To plot things many people now use:

For timeseries/timelines these libraries are useful:

And why not have Javascript generate/read some RRD graphs :

Annotations of events in timeseries:

On your graphs you often want events annotated. This could range from plotting new puppet runs and tracking your releases to everything else you do in the process of managing your servers. This is what John Allspaw calls Ops-Metametrics

These events are usually marked as vertical lines.

Dependencies graphs

One thing I was wondering about: with all the metrics we store in these tools, we still keep the relationships between them in our heads. I researched tools that would link metrics or describe a dependency graph between them for navigation.

We could use Depgraph - a Ruby library to create dependency graphs based on graphviz - to draw a dependency tree, but we obviously first have to define it. Something similar to the Nagios dependency model (without the strict host/service relationship of course)

Conclusion

With all the libraries to get data in and out and the power of javascript graphing libraries we should be able to create awesome visualizations of our metrics. This inspired me and @lusis to start thinking about creating a book on Metrics/Monitoring graphing patterns. Who knows ...


Monitoring Wonderland Survey - Moving up the stack Application and User metrics

(2012-01-04) - Comments

While all the previously described metric systems have easy protocols, they tend to stay in Sysadmin/Operations land. But you should not stop there. There is a lot more to track than CPU, Memory and Disk metrics. This blogpost is about metrics further up the stack: the Application Middleware, the Application and User Usage.

To the cloud

Application Metrics

Maybe grumpy sysadmins have scared the developers and business to the cloud. It seems that the space of Application metrics, whether it's Ruby, Java or PHP, is being ruled today by New Relic. In a blogpost New Relic describes serving about 20 Billion Metrics A day.

It allows for easy instrumentation of ruby apps, but they also have support for PHP, Java, .NET, and Python

Part of their secret of success is the ease with which developers can get metrics from their application by adding a few files and a token.
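
For the Ruby agent that typically boils down to something like the following minimal sketch; the gem name and config keys follow the standard New Relic setup, the license key is obviously yours to fill in:

# Gemfile: pull in the New Relic agent
gem 'newrelic_rpm'

# config/newrelic.yml holds the token they give you, roughly:
#   common: &default_settings
#     license_key: '<YOUR_LICENSE_KEY>'
#     app_name: My Application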

Several other cloud monitoring vendors are stepping into the arena, and I really hope to see them grow the space and provide some competition:

Some other complementary services, popular amongst developers are:

Check this blogpost on Monitoring Reporting Signal, Pingdom, Proby, Graphite, Monit, Pagerduty, Airbrake to see how they make a powerful team.

User tracking Metrics - Cloud

Clicks, Page view etc ...

Besides the application metrics, there is one other major player in web metrics: Google Analytics

I found several tools to get data out of it using the Google Analytics API

With Google Analytics there is always a delay in getting your data.

If you want realtime statistics/metrics, check out Gaug.es - http://get.gaug.es:

A/B Testing

Haven't really gotten into this, but well worth exploring getting metrics out of A/B testing

Page render time

Another important thing to track is the page render time. This is well explained in Real User Monitoring - Chapter 10 of Complete Web Monitoring - O'Reilly Media

Again New Relic provides RUM: Real User Monitoring. See How we provide real user monitoring: A quick technical review for more technical info

Who needs a cloud anyway

Putting your metrics into the cloud can be very convenient, but it has downsides:

  • most tools don't have a way to redirect/replicate the metrics they collect internally
  • that makes it hard to correlate with your internal metrics
  • it's easy to get metrics in, but hard to get the full/raw data out again
  • it depends on the internet, duh, and sometimes this fails :)
  • privacy concerns or the sheer volume of metrics can make it impossible to put them in the cloud

Application Metrics - Non - Cloud

In his epic Metrics Anywhere, Coda Hale explains the importance of instrumenting your code with metrics. This looks very promising as it is really driven from the developers' world:

Java

Or you can always use JMX to expose monitoring/metrics from your application

And with JMX-trans http://code.google.com/p/jmxtrans you can feed JMX information into Graphite, Ganglia or Cacti/Rrdtool.

Other

Etsy style: StatsD

To collect various metrics, Etsy has created StatsD https://github.com/etsy/statsd - a network daemon for aggregating statistics (counters and timers), rolling them up, then sending them to graphite.

Clients have been written in many languages: php, java, ruby, etc.
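
The wire format itself is trivial, which explains the proliferation of clients. A minimal ruby sketch - the hostname is an assumption, 8125 is the default StatsD UDP port:

require 'socket'

# StatsD line format: <bucket>:<value>|<type>, optionally |@<samplerate>
#   c = counter, ms = timer
sock = UDPSocket.new
sock.send("deploys.web:1|c", 0, "statsd.example.com", 8125)        # count a deploy
sock.send("render.homepage:320|ms", 0, "statsd.example.com", 8125) # time a page render
sock.send("logins:1|c|@0.1", 0, "statsd.example.com", 8125)        # counter sampled at 10%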

Other companies have been raving about the benefits of StatsD; Shopify, for example, has completely integrated it into their environment

It's incredible to see the power and simplicity of this; I've created a simple Proof of Concept to expose the statsd metrics over ZeroMQ in this experimental fork

MetricsD https://github.com/tritonrc/metricsd tries to marry Etsy's statsd and Coda Hale / Yammer's Metrics Library for the JVM and puts the data into Graphite. It should be drop-in compatible with Etsy's statsd, although it adds explicit support for meters (with the m type) and gauges (with the g type) and introduces the h (histogram) type as an alias for timers (ms).

User tracking - Non Cloud

Clicks, Page view etc ...

Here are some Open Source Web Analytics libraries. These are merely links; I haven't investigated them enough yet, this is a work in progress

Another tool worth mentioning for tracking endusers is HummingBird - http://hummingbirdstats.com/ . It is NodeJS based and allows for realtime web traffic visualization. To send metrics it has a very simple UDP protocol.

A/B Testing

At Arrrrcamp I saw a great presentation on A/B Testing by Andrew Nesbitt (@teabass). Do watch the video to get inspired!

He pointed out several A/B testing frameworks:

And he presented his own A/B Testing framework: Split - http://github.com/andrew/split
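
Judging from the Split documentation, wiring an experiment into a Rails app looks roughly like the sketch below; the ab_test and finished helpers come from the gem, while the experiment name and alternatives are made up:

class SignupController < ApplicationController
  def new
    # ab_test picks (and remembers) an alternative for this visitor
    @button_text = ab_test('signup_button', 'Sign up now', 'Join for free')
  end

  def create
    # finished records a conversion for the experiment
    finished('signup_button')
    # ... create the user ...
  end
end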

It would be interesting to integrate this further into traditional Monitoring/Metrics tools: view metrics per new version, per enabled flags, etc... In a nutshell: food for thought.

Page render time

For checking the page render time, I could not really find Open Source alternatives.

There is a page by Steve Souders about Episodes http://stevesouders.com/episodes/paper.php. Or you can track your Apache logs with Mod Log I/O

Conclusion

It's exciting to see the crossover between development, operations and business. Up until now only New Relic has a very well integrated suite for all metrics. I hope the internal solutions catch up.

Now that we have all that data, it's time to talk about dashboards and visualization. On to the next blogpost.

If you are using other tools, have ideas, feel free to add them in the comments.


Monitoring Wonderland Survey - Nagios the Mighty Beast

(2012-01-03) - Comments

Controlling the tool everybody hates, but still uses

This blog post mainly contains my findings on getting data in and out of Nagios. That data can be status information, performance information and notifications. At the end there are some pointers on ruby integration with Pingdom and Jira

The idea is similar to my previous blogposting Monitoring Wonderland Survey - Metrics - API - Gateways: I want to share/open up this data for others to consume, preferably on a bus like system and using events instead of polling.

Nagios - IN

Writing Checks in Ruby

If you want to get data into Nagios, you have to write a check. These are some options for doing this in ruby:

Projects that link testing and monitoring:

Transporting check results

Nagios has many ways to collect the results of these checks:

You can test NRPE with the standalone NRPE runner

And maybe schedule the Nagios NRPE checks with Rundeck

If you don't like the spawning of separate ruby processes for each check, you can leverage Metis: https://github.com/krobertson/metis

Transport over a bus system

Instead of using the traditional provided interfaces, people are starting to send the check information over a bus for further handling:

Look ma, no Nagios Server needed

Some people have taken an alternative approach, re-using the check libraries but running them in their own framework.

Nagios - OUT

Reading Status

As there is no official API to extract status information from Nagios, people have been implementing various ways of getting to the data:

Scraping the UI

Well if we really have to ...

Parsing status.dat file

All status information from Nagios is stored in the status.dat file, so several people have started writing parsers for it and exposing it as an API

Nagios-Dashboard parses the nagios status.dat file & sends the current status to clients via an HTML5 WebSocket. The dashboard monitors the status.dat file for changes, any modifications trigger client updates (push). Nagios-Dashboard queries a Chef server or Opscode platform organization for additional host information.
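
The format itself is simple enough: blocks like hoststatus { ... } and servicestatus { ... } made of key=value lines. A naive ruby sketch of such a parser (the status.dat path varies per installation):

def parse_status(path = '/var/cache/nagios3/status.dat')
  blocks, current = [], nil
  File.foreach(path) do |line|
    case line.strip
    when /^(\w+)\s*\{$/       # start of a hoststatus/servicestatus/... block
      current = { 'type' => $1 }
    when /^\}$/               # end of the block
      blocks << current if current
      current = nil
    when /^([^=]+)=(.*)$/     # key=value pair inside a block
      current[$1] = $2 if current
    end
  end
  blocks
end

# e.g. all services currently CRITICAL (current_state == 2)
critical = parse_status.select { |b| b['type'] == 'servicestatus' && b['current_state'] == '2' }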

Parsing the log files

Using Checkmklivestatus

A better option to get ad-hoc status is to query Nagios via CheckMK_Livestatus http://mathias-kettner.de/checkmk_livestatus.html. It is a Nagios Event Broker (NEB) module that hooks directly into the Nagios Core, giving it direct access to all structures and commands. NEBs are very powerful; for more information look at the Nagios book - event broker section.
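
Queries use a small text protocol (LQL) over a unix socket. A minimal ruby sketch - the socket path depends on how the broker module was configured in nagios.cfg:

require 'socket'

query = "GET services\nColumns: host_name description state\nFilter: state = 2\n"

UNIXSocket.open('/var/lib/nagios/rw/live') do |sock|
  sock.write(query + "\n")   # a blank line ends the query
  sock.close_write           # signal we're done sending
  puts sock.read             # default output: semicolon-separated lines
end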

Tools that use this API :

Querying the database/NDO

An alternative NEB handler is NDO Utils (NDO2DB), which stores all the information in a database, or NDO2FS, which stores the NDO data as JSON files on a filesystem.

Hooking into performancehandler

RI Pienaar shows us how to hook into a process-service-perfdata handler and log that information to a file:

The advantage is that we get the information evented instead of having to poll for status information. In other words, it is ready to be put on a message bus for others to read.

Listening in to events with NEB/Message queue

In order to get the events as fast as possible, I looked into using a NEB to put information on a message queue directly.

I found the following sample code:

Marius Sturm had Nagios-ZMQ https://github.com/mariussturm/nagios-zmq which allowed getting the events directly on the queue. I extended it to read not only the check results and performance data, but also the notifications.

It seems Icinga is taking a similar approach with Icinga - ZMQ - icingamq, to enable High performance Large Scale Monitoring.

An interesting difference is that it will also expose the CheckMK Livestatus API directly over ZeroMQ

Adding Hosts dynamically

A bit of a side track, but one of the things a lot of people struggle with is dynamically adding hosts/servers to Nagios without restarting it. The following links kind of try to solve this problem, but none solves it completely. It seems most people solve this by some interaction with a Configuration Management system and a system inventory.

To read and write the configs, people have written various parsers:

The reload problem doesn't look like an easy one to solve: one could create a NEB module that manipulates the in-memory host/service structures, but it would also need to persist that on disk. If anyone has a good solution, please let us know!

Notification handling

There are a lot more problems with Nagios, but people still use its notification and acknowledgement system. Some interesting things I found:

Pingdom

If Pingdom is your game, here are some APIs to send information to Pingdom and read the status back

I could not find a way to make this evented; we'll have to create that ourselves.

Jira Notification

I found 4 libraries to interact with Jira - from ruby:

Conclusion:

  • We can get a long way to automate getting data in and out of Nagios
  • Exposing the API through the Livestatus works really well
  • Using the NEB Nagios-ZMQ will allow us to get the information in an evented way
  • Adding hosts dynamically still seems to be an issue

By listening in on the events over a queue, we could create a self-servicing system for Nagios events, similar to Tattle, which does the same for Graphite:

Next blogpost we'll move up the stack a bit and start investigating options for application and enduser usage metrics.


Monitoring Wonderland Survey - Metrics - API - Gateways

(2012-01-03) - Comments

Update 4/01/2012: added ways to add metrics via logs, java pickle graphite feeder

One tool to rule them all? Not.

If you are working within an enterprise, chances are that you have different metric systems in place: you might have some Cacti, Ganglia, Collectd, etc... due to historical reasons or different departments.

This reminded me of the situation while I was working in Identity Management: you might have an LDAP, Active Directory, local HR database etc. There would be plans and discussions of using one over the other, and gateways would need to be written. I learned a few lessons there:

  1. have as few sources/stores of information as possible
  2. don't try to chase the one tool to rule them all, aka don't use a tool for something it's not made for
  3. make it self-servicing to users and automate the processes

1 to 1 gateways

Take the new metrics hotness Graphite as an example: it has some nice graphing advantages over other tools. So people wonder, should I migrate my Ganglia or Collectd to Graphite? Graphite doesn't come with elaborate collection scripts for memory/disk/etc..., so we have to rely on other tools like Cacti, Munin, Collectd or Ganglia to first collect the data.

So we start writing gateways to get data into Graphite:

But what happens if we also use Opentsdb for storing long term data? We have to re-implement those gateways:

Issue 1 : Effort duplication

This just seems like a waste of energy, implementing the protocol in every tool. This sure isn't the first time this has happened in history: the same thing happened for the Collectd -> Ganglia Plugin

If you look at the data that is transmitted it is actually pretty much the same:

a metric name, value, timestamp, optionally hostname, some metadata tags

So we could easily envision a 'universal' format that would be used to translate from and to.

Ganglia  <-> Intermediate format <-> Graphite
Collectd <-> Intermediate format <-> Opentsdb

With this intermediate format, we would only have to write each end of the equation once.

I started thinking of this like an ffmpeg for monitoring
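
To make that concrete, here is a rough ruby sketch of what such an intermediate event and two adapters could look like; the field names and adapter interfaces are purely illustrative:

Metric = Struct.new(:name, :value, :timestamp, :host, :tags)

module Adapters
  # Intermediate format -> Graphite plaintext line
  def self.to_graphite(m)
    "#{m.host}.#{m.name} #{m.value} #{m.timestamp}"
  end

  # Intermediate format -> OpenTSDB 'put' line, keeping the tags
  def self.to_opentsdb(m)
    tags = m.tags.merge('host' => m.host).map { |k, v| "#{k}=#{v}" }.join(' ')
    "put #{m.name} #{m.timestamp} #{m.value} #{tags}"
  end
end

m = Metric.new('cpu.idle', 97.5, Time.now.to_i, 'web01', 'dc' => 'eu')
puts Adapters.to_graphite(m)   # => web01.cpu.idle 97.5 <timestamp>
puts Adapters.to_opentsdb(m)   # => put cpu.idle <timestamp> 97.5 dc=eu host=web01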

Issue 2: Difficult to hook in additional listeners

Let's add another system that wants to listen in on the metrics: something like Esper, Nagios alerting, some data warehouse tools etc... We could reuse the libraries from one end to the other, but we'd have to add more gateways and put these in place every time.

A better approach would be to use a message bus: every tool puts data on the bus and listens for the data it needs. RI Pienaar has written about this approach extensively in his Series on Common Messaging Patterns. Also John Bergmans has a great post on using AMQP and Websockets to get realtime graphics.

Some of the tools already have Message queue integrations, but there seems to be a common intermediate format missing

As a proof of concept I've created :

Building blocks

In this section I'll look for API's (ruby oriented) to get data in and out of the different metrics systems:

Graphite - IN

Sending metrics from ruby to Graphite:

These both implement the Simple Protocol, but for high performance we'd like to use the batching facility through the Pickle Format. I could not find a Pickle gem for ruby, but this could work through the Ruby-Python gateway http://rubypython.rubyforge.org/.
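
For reference, the simple (plaintext) protocol is just one "metric value timestamp" line per datapoint, sent to carbon, which listens on TCP port 2003 by default (hostname assumed):

require 'socket'

TCPSocket.open('graphite.example.com', 2003) do |sock|
  sock.puts "servers.web01.load.shortterm 0.42 #{Time.now.to_i}"
end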

Faster - a Java Netty based graphite relay takes the same approach https://github.com/markchadwick/graphite-relay

Another way to get your data into graphite is using Etsy's Logster https://github.com/etsy/logster

Mike Brittain explains its use nicely in Take my logs... Please! - a Velocity Online Conference Session (Video, PDF)

Graphite - OUT

Getting all the data out of Graphite is impossible through the standard API. You can get a graph's raw data out, but that hardly counts.
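
To be fair, you can pull individual targets back out via the render API with format=raw or format=json, but you have to know and request every metric yourself - there is no bulk export. A small sketch, with host and target as assumptions:

require 'open-uri'
require 'json'

url  = 'http://graphite.example.com/render' \
       '?target=servers.web01.load.shortterm&from=-1h&format=json'
data = JSON.parse(URI.parse(url).read)
# each target comes back as { "target" => ..., "datapoints" => [[value, timestamp], ...] }
data.first['datapoints'].each { |value, ts| puts "#{ts} #{value}" }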

The best option seems to be to listen in on the graphite UDP receiver and duplicate the information onto a message bus.

An alternative might be to directly read from the Whisper storage, inspiration for that can be found in:

Opentsdb - IN

I could not find any ruby gem that implements the Opentsdb protocol for sending data, but creating one should be trivial: Opentsdb just uses a plain TCP socket to get the data in.
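
For completeness, the wire format is a plain-text put command, one line per datapoint, on OpenTSDB's TCP port (4242 by default); a quick sketch with an assumed hostname:

require 'socket'

# put <metric> <timestamp> <value> <tag>=<value> [...]  -- at least one tag is required
TCPSocket.open('opentsdb.example.com', 4242) do |sock|
  sock.puts "put sys.cpu.user #{Time.now.to_i} 42.5 host=web01 dc=eu"
end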

Opentsdb - OUT

Getting data out of Opentsdb suffers the same problem as Graphite: you can do queries on specific graph data

But you can't get it all out; maybe you could if you directly interface with the Hbase/Java API. So again the best bet is to create a listener/proxy for the simple TCP protocol.

Ganglia - IN

Sending metrics to Ganglia is easy using the gmetric shell command. Early days code describing this can still be found at http://code.google.com/p/embeddedgmetric/

Igrigorik has a nice write-up on how to use the Gmetric Ruby gem to send metrics
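
From what I remember of that write-up, sending a metric with the gem looks roughly like this; double-check the option names against the gem's README, and note that the host, port and values here are illustrative:

require 'gmetric'

Ganglia::GMetric.send("gmond.example.com", 8649,
  :name  => 'pageviews',
  :units => 'req/min',
  :type  => 'uint32',
  :value => 7000,
  :tmax  => 60,     # max seconds between reports
  :dmax  => 300)    # seconds before the metric is considered stale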

If you want to feed log files into ganglia, Logtailer might be your thing: https://bitbucket.org/maplebed/ganglia-logtailer

Ganglia - OUT

Vladimir describes the options while he explains how to get Ganglia data into graphite

Option 1 is to poll the Gmond over TCP and get the XML dump of its current data:

Option 2 is to listen in on the UDP protocol as an additional receiver.

I implemented both approaches in the https://github.com/jedi4ever/gmond-zmq
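
For option 1, the poll really is as simple as connecting to gmond's xml port (8649 by default, hostname assumed) and reading until EOF; a minimal sketch:

require 'socket'

# gmond dumps its current state as one big XML document on connect
xml = TCPSocket.open('gmond.example.com', 8649) { |sock| sock.read }
puts xml[0, 200]   # <GANGLIA_XML ...> with CLUSTER/HOST/METRIC elements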

Note: As a side effect I found that the metrics sent over UDP are actually more accurate than the values you get when you query the XML.

Collectd - IN

To send metrics to Collectd, you can use the ruby gem from Astro that implements most of the UDP protocol

Collectd - OUT

I give Collectd the prize for best output.

It currently implements different writers:

  • Network plugin
  • UnixSock plugin
  • Carbon plugin
  • CSV
  • RRDCacheD
  • RRDtool
  • Write HTTP plugin

And there is a ZeroMQ write plugin from deactivated - https://github.com/deactivated/collectd-write-zmq

The Binary Protocol http://collectd.org/wiki/index.php/Binary_protocol is pretty simple to listen into.

Munin

If you happen to use Munin, here's some inspiration, but I haven't researched it much

Circonus

If you happen to use Circonus, here's some inspiration, but I haven't researched it much

RRD interaction from ruby

For those who want to read and write directly from RRD's in ruby, please have fun:

Alert on metrics:

With all the tools in and out, and a unified intermediate format, it would be trivial to rewrite the traditional alert check tools to listen in on the bus for values. This means you can feed your Nagios, your ticket system, your pager system etc. from the same source.
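
As a sketch of what such a check could look like - assuming metric events arrive as one JSON object per line, piped in from whatever bus consumer you use; the thresholds and the alert action are made up:

require 'json'

THRESHOLDS = {
  'cpu.load.shortterm'     => 5.0,
  'disk.root.percent_used' => 90.0
}

STDIN.each_line do |line|
  event = JSON.parse(line) rescue next
  limit = THRESHOLDS[event['name']] or next
  if event['value'].to_f > limit
    warn "ALERT #{event['host']} #{event['name']}=#{event['value']} (limit #{limit})"
    # feed this into Nagios (passive check), your ticket system, your pager, ...
  end
end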

Graphite

Opentsdb

Ganglia

New Relic

https://github.com/kogent/check_newrelic

Conclusion

It should be feasible to create an intermediate format and reuse some of these libraries to implement both the IN and OUT functionality. Why not create a Fog for monitoring information, something that implements metric receive and send across the different backends?

Next stop: Nagios, because it deserves a blogpost of its own ...


Monitoring Wonderland Survey - Introduction

(2012-01-03) - Comments

Introduction

While Automation is great to get you going and doing things faster and reproducibly, Monitoring/Metrics are probably more valuable for learning and getting feedback on what's really going on. Matthias Meyer describes it as the virtues of monitoring. Nothing new if you have been listening to John Allspaw on Metrics Driven Engineering (pdf), essentially putting the science back in IT, as Adam Fletcher noted at the Boston devopsdays openspace session on What does a sysadmin look like in 10 years.

Eager to help

Over the years I've done my fair share of monitoring setups, but the last few years I was more focused on Automation. I would automate the hell out of any monitoring system the customer had. But after a while, this felt like standing on the sideline too much for me. This feeling got amplified by the Monitoring Sucks initiative of John Vincent: an initiative to improve the field where we can. The initiative has already spawned some very good blogposts, and one of the first ones, monitoring sucks watch your language, where they try to create a common vocabulary, reminded me a lot of the early 'what is devops' postings. So after Jason Dixon said Monitoring Sucks, Do something about It, I decided to widen my focus again from automation to monitoring. And I found a great partner in Atlassian.

I'm certainly not the first person to do this, but I'm eager to help in the space. People like RI Pienaar have done some amazing ground work thinking about Monitoring Frameworks and making them Composable Architectures. One of the exciting areas I'd like to focus on is trying to make monitoring/metrics as easy as 'monitoring up' for developers, and to bring the traditionally operational tools into development land to better understand their application. We learned from configuration management that having common tools and a common language greatly helps overcome the devops divide.

Before jumping into the space, we decided to research the existing landscape extensively, with its problems and solutions. This blogpost series is a summary of these findings and will therefore contain a lot of links.

Non technical reading

This series of blogposts is tools focused, not monitoring-approach oriented; more on that in later posts, but for now I'll refer you to:

Note:

  • You will find that some tools were researched more thoroughly; that's because the research was done from the perspective of Atlassian's current and future metrics/monitoring environment.
  • Also you will notice a slant towards ruby libraries; that's mainly because I feel most productive in it and I'm thinking of integration with chef/puppet/fog/vagrant etc.
  • The main focus will be on Open Source solutions where available, and commercial ones wherever there is a gap.

Meet the players

For people new to the field, I'd like to give a quick overview of the current players, together with their official links and, where possible, links to available books:

A good up-to-date overview can be found in Jason Dixon's presentation Trending with Purpose and Joshua Barratt's Getting more signal from your noise (PDF). I especially liked his approach of looking at these tools from the Collect - Transport - Process - Store - Present perspective.

Metrics

In the 'old' days, people first focused on the collect and transport problem. The standard for timeseries storage was RRD (Round Robin Database), and people would choose their metrics tools based on the collection scripts that were available. (Similar to how people choose cloud or config management tools, it seems)

As the number of servers started to grow, people wanted a scalable way of collecting, aggregating and transporting the data.

Even with the help of RRD cache, the storage of all these metrics was becoming the new bottleneck, so alternatives had to be found: Graphite introduced Whisper and Opentsdb decided to build on top of Hadoop. And as the volume of data kept increasing, it was begging for a self-servicing way to visualize the data.

Alerting, notification, availability

All these metric tools kind of ignore alerting, notification and acknowledgement and rely on the real monitoring systems. So you need to complement them with a warning system like the following:

Note that most of them suffer when it comes to scaling, flexibility and graphical overview.

Beyond servers , to applications , to business

Now that we have gotten better at monitoring and metrics of servers, we are seeing better integration with application and business metrics:

The next blogposts will contain more meat on the tools surrounding, enhancing and bypassing these 'traditional players'. Stay tuned...


Markdown to Confluence Convertor

(2011-12-15) - Comments

Recently in Confluence 4.0 the Wiki Markup Editor was removed for various engineering reasons. I like to type my text in wiki style, and most of all using Markdown.

This code is a quick hack for converting markdown to the Atlassian confluence markup language, which you can still insert via the menu.

It's not a 100% full conversion, but I find it rather usable already. I will continue to improve it where possible.

The gem is based on Kramdown

Installation:

Via gem

$ gem install markdown2confluence

From github:

$ gem install bundler
$ git clone git://github.com/jedi4ever/markdown2confluence.git
$ bundle install vendor

Usage:

If using Gem:

$ markdown2confluence <inputfile>

If using bundler:

$ bundle exec bin/markdown2confluence <inputfile>

Extending/Improving it:

There is really only one class to edit:

  • see lib/markdown2confluence/convertor/confluence.rb - feel free to enhance or improve tag handling.

Behavioral testing with Vagrant - Take 2

(2011-12-15) - Comments

A big thanks to Atlassian for allowing me to post this series!!

Running tests from within the VM

After I covered Puppet Unit Testing, the logical step is writing about Behavioral testing.

While writing this, I came up with a good example of why BDD needs to complement your Unit tests: I had installed the Apache Puppet Module, and the provisioning ran ok. It wasn't until I tested the webpage with lynx http://localhost that I understood I needed to create a default website. This is of course a trivial example, but it shows you that BDD can help you in catching logical errors.

When this topic arises, most people are familiar with Cucumber Nagios. It contains a series of Cucumber steps that allow you to test http requests, amqp, dns, ssh and commands.

From what I found, most people would execute these tests on the VMs directly. This requires you to install cucumber and all of its dependent gems in the VM. Gareth Rushgrove wrote a great blogpost on packaging cucumber-nagios with fpm

Running tests from outside the VM - Take 1

In some situations, the required gems and libraries might lead to conflicts or introduce dependencies you would rather not have on your production machine. And they would become another thing to maintain on your production machines.

So in a previous blogpost, Vagrant Testing, Testing One Two, I already described using modified Cucumber-Nagios steps that interact with Vagrant over ssh.

Running tests from outside the VM - Take 2

But I had a problem with the previous approach. Depending on the situation I would need to run the same tests via different connection methods: vagrant uses ssh, ec2 via fog, openvz via vzctl etc...

So I came up with a new flexible approach: use a configurable command to connect to a vm and have it execute the same steps.

With a little Aruba help

While Cucumber-Nagios slowly moves into Cuken, the SSH steps are getting converted to Aruba steps for local execution, in combination with the ssh-forever steps for ssh interaction.

The Aruba gem is a set of CLI steps for Cucumber. You can use it to interact with a process interactively or just do a single run. Example steps could look like:

Given I run "ssh localhost -p 2222" interactively
And I type "apache2ctl configtest"
And the exit status should be 0

Making it connection neutral

As you can see in the previous step, the connection is still in the Feature. Not great if we want to run it locally. I rephrased it to:

Feature: apache check

  Scenario: see if the apache header is served
    Given I execute `lynx http://localhost --dump` on a running system
    Then the output should match /It works/
    Then the exit status should be 0

  Scenario: check if the apache config is valid
    Given I execute `apache2ctl configtest` on a running system
    Then the exit status should be 0

Writing the logic

Here is the logic to make this work (put it in features/support/step_definitions/remote_system_connect_steps.rb). It uses two environment variables:

SYSTEM_EXECUTE: the command used to execute a single command on the system
SYSTEM_CONNECT: the command used to connect to the system interactively

Example for vagrant would be:

SYSTEM_EXECUTE: "vagrant ssh_config | ssh -q -F /dev/stdin default"
SYSTEM_CONNECT: "vagrant ssh"

This can also be your favorite knife ssh, vzctl 33 enter, mc-ssh somehost


When /^I execute `([^`]*)` on a running system$/ do |cmd|
  @execute_command=ENV['SYSTEM_EXECUTE']
  @execute_failed=false
  unless @execute_command.nil?
    steps %Q{ When I run `#{@execute_command} "#{cmd}"` }
  else
    @execute_failed=true
    raise "No SYSTEM_EXECUTE environment variable specified"
  end
end

When /^I connect to a running system interactively$/ do
  @connect_command=ENV['SYSTEM_CONNECT']
  @connect_failed=false
  unless @connect_command.nil?
    steps %Q{
        When I run `#{@connect_command}` interactively
    }
  else
    @connect_failed=true
    raise "No SYSTEM_CONNECT environment variable specified"
  end
end

When /^I disconnect$/ do
  steps %Q{ When I type "exit $?" }
end

Monkey Patching Aruba

By default, Aruba uses shellwords to parse the command lines you pass, and it seems to have an issue with "|" symbols. This is the patch I came up with (in features/support/env.rb):

require 'aruba/cucumber'
require 'shellwords'

# Here we monkey patch Aruba to work with pipe commands
module Aruba
  class Process
    include Shellwords

    def initialize(cmd, exit_timeout, io_wait)
      @exit_timeout = exit_timeout
      @io_wait = io_wait

      @out = Tempfile.new("aruba-out")
      @err = Tempfile.new("aruba-err")
      @process = ChildProcess.build(cmd)
      @process.io.stdout = @out
      @process.io.stderr = @err
      @process.duplex = true
    end
  end
end

After this, a regular cucumber run should work (Note: use a recent cucumber version, 1.1.x)

Automating it with Rake

The last part is automating this for Vagrant. For this we create a little rake task:

require "cucumber/rake/task"
task :default => ["validate"]

# Usage rake validate
# - single vm: rake validate
# - multi vm: rake validate vm=logger
Cucumber::Rake::Task.new(:validate) do |task|
    # VM needs to be running already
    vm_name=ENV['vm'] || ""
    ssh_name=ENV['vm'] || "default"
    ENV['SYSTEM_CONNECT']="vagrant ssh #{vm_name}"
    ENV['SYSTEM_EXECUTE']="vagrant ssh_config #{vm_name}| ssh -q -F /dev/stdin #{ssh_name}"
    task.cucumber_opts = ["-s","-c", "features" ]
end

Final words

The solution allows you to reuse the command execution steps, for running them locally, over ssh, or some other connection command.

  • This only works for commands that run over ssh, but I think it is already powerful to do this. If you would require amqp testing, you could probably find a command-line check as well.
  • Shell escaping is not 100% correct; this needs more work to handle special characters or quotes inside quotes.
  • When testing, I sometimes miss the context of how a server was created (f.i. the params passed to the puppet manifest or the facts); maybe I could add this in a puppet manifest. Not sure about this.
  • If there is interest, I could turn this into a vagrant plugin, to make it really easy.

All code can be found at the demo project: https://github.com/jedi4ever/vagrant-guard-demo


Test Driven Infrastructure with Vagrant, Puppet and Guard

(2011-12-13) - Comments

This is a repost of my SysAdvent blogpost. It's merely here for archival purposes, or for people who read my blog but didn't see the sysadvent blogpost.


Why

Lots has been written about Vagrant. It simply is a great tool: people use it as a sandbox environment to develop their Chef recipes or Puppet manifests in a safe environment.

The workflow usually looks like this:

  • you create a vagrant vm
  • share some puppet/chef files via a shared directory
  • edit some files locally
  • run a vagrant provision to see if this works
  • and if you are happy with it, commit it to your favorite version control repository

Specifically for puppet, thanks to the great work by Nikolay Sturm and Tim Sharpe, we can now also complement this with tests written in rspec-puppet and cucumber-puppet. You can find more info at Puppet unit testing like a pro.

So we've got code, and we've got tests; what else are we missing? Automation of this process: it's funny if you think about it that we automate the hell out of server installations, but haven't automated the previously described process.

The need to run vagrant provision or rake rspec actually breaks my development flow: I have to leave my editor to run a shell command and then come back to it depending on the output.

Would it not be great if we could automate this whole cycle? And have it run tests and provision whenever files change?

How

The first tool I came across is autotest: it allows one to automatically re-execute tests depending on filesystem changes. The downside is that it can run either cucumber tests or rspec tests, but not both.

So enter Guard; it describes itself as a command line tool to easily handle events on file system modifications (FSEvent / Inotify / Polling support). Just what we wanted!

Installing Guard is pretty easy, you require the following gems in your Gemfile

gem 'guard'
gem 'rb-inotify', :require => false
gem 'rb-fsevent', :require => false
gem 'rb-fchange', :require => false
gem 'growl', :require => false
gem 'libnotify', :require => false

As you can tell by the names, it uses different strategies to detect changes in your directories. It uses growl (if correctly setup) on Mac OS X and libnotify on Linux to notify you if your tests pass or fail. Once installed you get a command guard.

Guard uses a configuration file, the Guardfile, which can be created with guard init. In this file you define different guards based on different helpers: for example there is guard-rspec, guard-cucumber and many more. There is even a guard-puppet (which we will not use because it works only for local provisioning).

To install one of these helpers you just include it in your Gemfile. We are using only two here:

gem 'guard-rspec'
gem 'guard-cucumber'

Each of these helpers has a similar way of configuring themselves inside a Guardfile. A vanilla guard for a ruby gem with rspec testing would look like this:

guard 'rspec' do
  watch(%r{^spec/.+_spec\.rb$})
  watch(%r{^lib/(.+)\.rb$})     { |m| "spec/lib/#{m[1]}_spec.rb" }
  watch('spec/spec_helper.rb')  { "spec" }
end

Whenever a file that matches a watch expression changes, it would run an rspec test. By default if no block is supplied, the file itself is run. You can alter the path in a block as in the example.

Once you have a Guardfile you simply run guard (or bundle exec guard) to have it watch for changes. Simple, huh?

What

Vagrant setup

Enter our sample puppet/vagrant project. You can find the full source at http://github.com/jedi4ever/vagrant-guard-demo. It's a typical vagrant project with the following tree structure (only 3 levels shown):

├── Gemfile
├── Gemfile.lock
├── Guardfile
├── README.markdown
├── Vagrantfile
├── definitions # Veewee definitions
│   └── lucid64
│       ├── definition.rb
│       ├── postinstall.sh
│       └── preseed.cfg
├── iso # Veewee iso
│   └── ubuntu-10.04.3-server-amd64.iso
└── vendor
    └── ruby
        └── 1.8

Puppet setup

The project follows Jordan Sissel's idea of puppet nodeless configuration. To specify the classes to apply to a host, we use a fact called: server_role. We read this from a file data/etc/server_tags via a custom fact (inspired by self-classifying puppet node).

This allows us to require only one file, site.pp, and we don't have to fiddle with our hostname to get the correct role. Also, if we want to test multiple roles on this one test machine, we just add another role to the data/etc/server_tags file.
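
For the curious, such a custom fact is only a few lines of Facter code. A sketch of what it could look like - the real implementation lives in the truth module's lib/facter directory, and the fact name and file path here are assumptions based on this setup:

# modules/truth/lib/facter/server_tags.rb
Facter.add(:server_tags) do
  setcode do
    file = '/data/etc/server_tags'   # shared into the VM via the v-data folder
    File.read(file).split("\n").join(',') if File.exist?(file)
  end
end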

├── data
│   └── etc
│       └── server_tags

$ cat data/etc/server_tags
role:webserver=true

The puppet modules and manifests can be found in puppet-repo. It has class role::webserver which includes class apache.

puppet-repo
├── features # This is where the cucumber-puppet catalog policy feature lives
│   ├── catalog_policy.feature
│   ├── steps
│   │   ├── catalog_policy.rb
│   └── support
│       ├── hooks.rb
│       └── world.rb
├── manifests
│   └── site.pp #No nodes required
└── modules
    ├── apache
    |    <module content>
    ├── role
    │   ├── manifests
    │   │   └── webserver.pp # Corresponds with the role specified
    │   └── rspec
    │       ├── classes
    │       └── spec_helper.rb
    └── truth # Logic of puppet nodeless configuration
        ├── lib
        │   ├── facter
        │   └── puppet
        └── manifests
            └── enforcer.pp

Puppet - Vagrant setup

These are the settings we use in our Vagrant file to make puppet work:

config.vm.share_folder "v-data", "/data", File.join(File.dirname(__FILE__), "data")
# Enable provisioning with Puppet stand alone.  Puppet manifests
# are contained in a directory path relative to this Vagrantfile.
config.vm.provision :puppet, :options => "--verbose"  do |puppet|
  puppet.module_path = ["puppet-repo/modules"]
  puppet.manifests_path = "puppet-repo/manifests"
  puppet.manifest_file  = "site.pp"
end

Puppet tests setup

The cucumber-puppet tests will check if the catalog compiles for role role::webserver

Feature: Catalog policy
  In order to ensure basic correctness
  I want all catalogs to obey my policy

  Scenario Outline: Generic policy for all server roles
    Given a node with role "<server_role>"
    When I compile its catalog
    Then compilation should succeed
    And all resource dependencies should resolve

    Examples:
      | server_role |
      | role::webserver |

The rspec-puppet tests will check if the package httpd gets installed

require "#{File.join(File.dirname(__FILE__),'..','spec_helper')}"
describe 'role::webserver', :type => :class do
  let(:facts) {{:server_tags => 'role:webserver=true',
      :operatingsystem => 'Ubuntu'}}
  it { should include_class('apache') }
  it { should contain_package('httpd').with_ensure('present') }
end

Guard setup

To make Guard work with a setup like our puppet-repo directory we need to change some things. This has mostly to do with conventions used in development projects where Guard is normally used.

Fixing Guard-Cucumber to read from puppetrepo/features

The first problem is that the Guard-Cucumber gem by default reads its features from the features directory. This is actually hardcoded in the gem. But nothing a little monkey patching can't solve:

require 'guard/cucumber'

# Inline extending the ::Guard::Cucumber
# Because by default it only looks in the ['features'] directory
# We have it in ['puppet-repo/features']
module ::Guard
  class ExtendedCucumber < ::Guard::Cucumber
    def run_all
      passed = Runner.run(['puppet-repo/features'], options.merge(options[:run_all] || { }).merge(:message => 'Running all features'))

      if passed
        @failed_paths = []
      else
        @failed_paths = read_failed_features if @options[:keep_failed]
      end

      @last_failed = !passed

      throw :task_has_failed unless passed
    end
  end
end

# Monkey patching the Inspector class
# By default it checks if it starts with /feature/
# We tell it that whatever we pass is valid
module ::Guard
  class Cucumber
    module Inspector
      class << self
        def cucumber_folder?(path)
          return true
        end
      end
    end
  end
end

Orchestration of guard runs

The second problem was to have Guard only execute the Vagrant provision when BOTH the cucumber and rspec tests pass. Inspired by the comments of Netzpirat, I got it working so that vagrant provision would only execute when both test suites are green.

# This block simply calls vagrant provision via a shell
# And shows the output
def vagrant_provision
  IO.popen("vagrant provision") do |output|
    while line = output.gets do
      puts line
    end
  end
end

# Determine if all tests (both rspec and cucumber) have passed
# This is used to only invoke vagrant_provision if all tests show green
def all_tests_pass
  cucumber_guard = ::Guard.guards({ :name => 'extendedcucumber', :group => 'tests'}).first
  cucumber_passed = cucumber_guard.instance_variable_get("@failed_paths").empty?
  rspec_guard = ::Guard.guards({ :name => 'rspec', :group => 'tests'}).first
  rspec_passed = rspec_guard.instance_variable_get("@failed_paths").empty?
  return rspec_passed && cucumber_passed
end

Guard matchers

With all the correct guards and logic setup, it's time to specify the correct options to our Guards.

group :tests do

  # Run rspec-puppet tests
  # --format documentation : for better output
  # :spec_paths to pass the correct path to look for specs
  guard :rspec, :version => 2, :cli => "--color --format documentation", :spec_paths => ["puppet-repo"]  do
    # Match any .pp file (but be careful not to include any dot-temporary files)
    watch(%r{^puppet-repo/.*/[^.]*\.pp$}) { "puppet-repo" }
    # Match any .rb file (but be careful not to include any dot-temporary files)
    watch(%r{^puppet-repo/.*/[^.]*\.rb$}) { "puppet-repo" }
    # Match any _rspec.rb file (but be careful not to include any dot-temporary files)
    watch(%r{^puppet-repo/.*/[^.]*_rspec.rb})
  end

  # Run cucumber puppet tests
  # This uses our extended cucumber guard, as by default it only looks in the features directory
  # --strict        : because otherwise cucumber would exit with 0 when there are pending steps
  # --format pretty : to get readable output, default is null output
  guard :extendedcucumber, :cli => "--require puppet-repo/features --strict --format pretty" do

    # Match any .pp file (but be careful not to include any dot-temporary files)
    watch(%r{^puppet-repo/[^.]*\.pp$}) { "puppet-repo/features" }

    # Match any .rb file (but be careful not to include any dot-temporary files)
    watch(%r{^puppet-repo/[^.]*\.rb$}) { "puppet-repo/features" }

    # Feature files are monitored as well
    watch(%r{^puppet-repo/features/[^.]*.feature})

    # This is only invoked on changes, not at initial startup
    callback(:start_end) do
      vagrant_provision if all_tests_pass
    end
    callback(:run_on_change_end) do
      vagrant_provision if all_tests_pass
    end
  end

end

The full Guardfile is on github

Run it

From within the top directory of the project type

$ guard

Now open a second terminal and change some of the files and watch the magic happen.

Final remarks

The setup described is an idea I only recently started exploring. I'll probably enhance this in the future or may experience other problems.

For the demo project, I only call vagrant provision, but this can of course be extended easily. Some ideas:

  1. Inspired by Oliver Hookins - How we use Vagrant as a throwaway testing environment:
      • use sahara to create a snapshot just before the provisioning
      • have it start from a clean machine when all tests pass
  2. Turn this into a guard-vagrant gem, to monitor files and tests

Devops from a sysadmin perspective

(2011-12-07) - Comments

This year's LISA (Large Installation System Administration) 2011 conference has a theme on "devops".

The LISA crowd has been practicing automation for a long time, and many of them just look at devops as something they have always been doing.

So they asked me to write an article for the Usenix ;login: magazine to explain devops from a sysadmin perspective. As the article requires a subscription, I'm re-posting it here for others to enjoy :)


Introduction

While there is no single true definition of devops (similar to cloud), four of its key points revolve around Culture, Automation, Measurement and Sharing (CAMS). In this article we will show how this affects the traditional thinking of the sysadmin.

As a sysadmin you are probably familiar with the Automation and Measurement part: it has been good and professional practice to script/automate work to make things faster and repeatable. Gathering metrics and doing monitoring is an integral part of the job to make sure things are running smoothly.

The pain

For many years, operations (of which the sysadmin is usually part) has been seen as an endpoint in the software delivery process: developers code new functionality during a project in isolation from operations, and once the software is considered finished, it is presented to the operations department to run.

During deployment a lot of issues tend to surface: some typical examples are the development and test environments not being representative of the production environment, or not enough thought having been given to backup and restore strategies. Often it is too late in the project to change much of the architecture and structure of the code, and this gives way to many fixes and ad-hoc solutions. This friction has created a disrespect between the two groups: developers feel that operations knows nothing about software, and operations feels that developers know nothing about running servers. Management tends to keep those two groups in isolation from each other, keeping the interaction to the minimum required. The result is a 'wall of confusion'.

Culture of collaboration

Historically two drivers have fuelled devops: the first one was Agile Development, which in many companies led to many more deployments than operations was used to. The second one was Cloud and large scale web operations, where the scale required a much closer collaboration between development and operations.

When things really go wrong, organizations often create a multi-disciplined task force to tackle production problems. The truth is that in today's IT, environments have become so complex that they can't be understood by one person or even one group. Therefore, instead of separating developers and operations as we used to do, we need to bring them together more closely: we need more practice, and the motto should be "if it's hard, do it more often".

Devops recognizes that software only provides value if it's running in production, and that running a server without software does not provide value either. Development and operations are both working to serve the customer, not to run their own department.

Although many sysadmins have been collaborating with other departments, it has never been seen as a strategic advantage. The cultural part of devops seeks to promote this constant collaboration across silos, in order to better meet the business demands. It goes for 'friction-less' IT and promotes the cross-departmental/cross-disciplinary approach.

A good place to get started with collaboration are the places where the discussion often escalates: deployment, packaging, testing, monitoring, building environments. These places can be seen as boundary objects: places that every silo has its own understanding of. These are exactly the places where technical debt accumulates, so they contain the real pain points.

Culture of sharing

Silos exist in many forms in the organization, not only between developers and operations. In some organizations there are even silos inside of operations: network, security, storage and server teams avoid collaboration and each work in their own world. This has been referred to as the Ops-Ops problem. So in geek-speak devops is actually a wildcard for devops* collaboration.

Devops doesn't mean all sysadmins need to know how to code software now, or all developers need to know how to install a server. By collaborating constantly, both groups can learn from each other, but they can also rely on each other to do the work. A similar approach has been promoted by Agile between developers and testers. Devops can be seen as extending this to bring system administrators into the Agile equation.

Starting the conversation sometimes takes courage, but think about the benefits: you get to learn the application as it grows, and you can actively shape it by providing your input during the process. A sysadmin has a lot to offer to the developers: for instance, you have the knowledge of what production looks like, so you can build representative environments in test/dev. You can be involved in load testing and failover testing. Or you can set up a monitoring system that developers can use to see what's wrong, and give access to production logs so developers can understand real world usage.

A great way to share information and knowledge is by pairing with a developer or colleague: while you are deploying code, they comment on what the impact is on the code and you can directly ask questions. This interaction is of great value for understanding both worlds better.

Revisiting Automation

As specified in the Agile Manifesto, devops values "Individuals and interactions over processes and tools". The great thing about tools is that they are concrete and can have a direct benefit, as opposed to culture. It was hard to grasp the impact of Virtualization and Cloud unless you started doing it. Tools can shape the way we work and consequently change our behavior.

A good example is Configuration Management and Infrastructure as Code. A lot of people rave about its flexibility and power for automation. If you look beyond the effect of saving time, you will find that it also has a great sharing aspect: it has created a 'shared' language that allows you to exchange the way you manage systems with colleagues, and even outside your company, by publishing recipes/cookbooks on github. Because we now use concepts such as version control and testing, we have a common problem space with developers. And most importantly, the automation frees us from the trivial stuff and allows us to discuss and focus on the things that really matter.

Revisiting Metrics

Measuring the effects of collaboration can't be done by measuring the number of interactions; after all, more interaction doesn't mean a better party. It's similar to a black hole: you have to look at the objects nearby. So how do you see that things are improving? As an engineer you collect metrics on the number of incidents, failed deploys, successful deploys and tickets. Instead of keeping this information in its own silo, you radiate it to the other parts of the company so they can learn from it. Celebrate successes and failures and learn from them. Do post-mortems with all parties involved and improve on them. Again, this changes the focus of metrics and monitoring from just fast fixing to feedback to the whole organization. Aim to optimize the whole instead of only your own part.

The secret sauce

Several of the 'new' companies have been front-runners in these practices. Amazon with their two-pizza team approach and Flickr with their 10 deploys a day were front-runners in the field, but also more traditional companies like National Instruments are seeing the value of this culture of collaboration. They see collaboration as the 'secret sauce' that will set them apart from their competition. Why? Because it recognizes the individual not as a resource but as resourceful, able to tackle the challenges that exist in this complex world called IT.

Links index:

  1. Patrick Debois's Devopsdays Melbourne Keynote
  2. John Willis, What devops means to me
  3. Damon Edwards, what is devops
  4. Israel Gat, boundary objects in devops
  5. Agile Manifesto
  6. Ernest Mueller, Originality and Operations
  7. Cliff Stoll, The Cuckoo's Egg
  8. Andrew Shafer, Israel Gat, Patrick Debois - Velocity Conference 2011 "Devops Metrics"
  9. Amazon Architecture
  10. John Allspaw, 10 deploys per day - dev and ops cooperation at flickr
  11. Jesse Robbins, Operations is a competitive advantage

Puppet unit testing like a pro

(2011-12-05) - Comments

A big thanks to Atlassian for allowing me to post this series!!

In our previous blogpost on Puppet Versioning, we described the most basic check to see if a puppet manifest was valid: we used the parseonly option to see if it would compile.

Until now this only tells us that the compiler is happy, not that the manifest performs the function it needs to. In 2009, after the first devopsdays, I wrote a collection of Test Driven Infrastructure Links. This was obviously inspired by Lindsay Holmwood's talk on cucumber-nagios.

On the Opscode chef front, Stephen Nelson-Smith wrote a great book Test-driven Infrastructure with Chef on how to do this. Also see the cuken project where re-usable cucumber steps are grouped.

Because we are using Puppet here at Atlassian, I was out to understand the current state of puppet testing. A lot can already be found at http://puppetlabs.com/blog/testing-modules-in-the-puppet-forge/

Note that I've purposely named this blogpost 'Puppet unit testing', as the tests I'm describing now don't run against an actual system. Therefore it's hard to test the actual behavior.


Tip 1: cucumber-puppet

Inspired by Lindsay Holmwood's talk on cucumber-nagios and Ohad Levy's manitest Nikolay Sturm created cucumber-puppet

In his post Thoughts on testing puppet manifests he explains that the idea of writing tests is NOT about duplicating the code, and he identifies the most common problems he was facing:

  • catalog does not compile: syntax errors, missing template files, ..
  • catalog does compile, but cannot be applied: unreachable or non-existent resources, missing file resources in the repo
  • catalog does apply, but is faulty: faulty files due to empty manifest variables or wrong values, missing dependencies (wrong order ...), files installed without ensuring a directory ...

An important advice is:

Resource specifications can be useful for documentation purposes or refactorings. However, there is a risk of reimplementing your Puppet manifest, so be wary.

$ cd puppet-mymodule
$ gem install cucumber-puppet

Write features per module, this is the structure we are aiming at:

module
  +-- manifests
  +-- lib
  +-- features
       +-- support
       |     +-- hooks.rb
       |     +-- world.rb
       +-- catalog
       +-- feature..

Generate a cucumber-puppet world:

$ cucumber-puppet-gen world
Generating with world generator:
     [ADDED]  features/support/hooks.rb
     [ADDED]  features/support/world.rb
     [ADDED]  features/steps

# Adjust the paths to your modules and manifests
$ cat features/support/hooks.rb
Before do
  # adjust local configuration like this
  # @puppetcfg['confdir']  = File.join(File.dirname(__FILE__), '..', '..')
  # @puppetcfg['manifest'] = File.join(@puppetcfg['confdir'], 'manifests', 'site.pp')
  # @puppetcfg['modulepath']  = "/srv/puppet/modules:/srv/puppet/site-modules"

  # adjust facts like this
  @facts['architecture'] = "i386"
end

# Nothing exciting here
$ cat features/support/world.rb

require 'cucumber-puppet/puppet'
require 'cucumber-puppet/steps'

World do
  CucumberPuppet.new
end

Generating a policy feature:

$ cucumber-puppet-gen policy
Generating with policy generator:
     [ADDED]  features/catalog

# Notice the <hostname>.example.com.yaml
# These files contain the facts to test your catalog against
# 
$ cat features/catalog/policy.feature 
Feature: General policy for all catalogs
  In order to ensure applicability of a host's catalog
  As a manifest developer
  I want all catalogs to obey some general rules

  Scenario Outline: Compile and verify catalog
    Given a node specified by "features/yaml/<hostname>.example.com.yaml"
    When I compile its catalog
    Then compilation should succeed
    And all resource dependencies should resolve

    Examples:
      | hostname  |
      | localhost |

To do an actual run:

$ cucumber-puppet features/catalog/policy.feature 
Feature: General policy for all catalogs
  In order to ensure applicability of a host's catalog
  As a manifest developer
  I want all catalogs to obey some general rules

  Scenario Outline: Compile and verify catalog                            # features/catalog/policy.feature:6
    Given a node specified by "features/yaml/<hostname>.example.com.yaml" # cucumber-puppet-0.3.6/lib/cucumber-puppet/steps.rb:1
    When I compile its catalog                                            # cucumber-puppet-0.3.6/lib/cucumber-puppet/steps.rb:14
    Then compilation should succeed                                       # cucumber-puppet-0.3.6/lib/cucumber-puppet/steps.rb:48
    And all resource dependencies should resolve                          # cucumber-puppet-0.3.6/lib/cucumber-puppet/steps.rb:28

    Examples: 
      | hostname  |
      | localhost |
      Cannot find node facts features/yaml/localhost.example.com.yaml. (RuntimeError)
      features/catalog/policy.feature:7:in `Given a node specified by "features/yaml/<hostname>.example.com.yaml"'

Failing Scenarios:
cucumber features/catalog/policy.feature:6 # Scenario: Compile and verify catalog

1 scenario (1 failed)
4 steps (1 failed, 3 skipped)
0m0.006s
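The run fails because the node facts file doesn't exist yet. The exact format cucumber-puppet expects is described in its documentation; one common approach (the paths here are assumptions for a typical puppet master setup) is to reuse the facts YAML the master already caches:

$ mkdir -p features/yaml
$ cp /var/lib/puppet/yaml/facts/localhost.example.com.yaml features/yaml/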

List of commands:

Generators for cucumber-puppet

Available generators
    feature                          Generate a cucumber feature
    policy                           Generate a catalog policy
    testcase                         Generate a test case for the test suite
    testsuite                        Generate a test suite for puppet features
    world                            Generate cucumber step and support files

General options:
    -p, --pretend                    Run, but do not make any changes.
    -f, --force                      Overwrite files that already exist.
    -s, --skip                       Skip files that already exist.
    -d, --delete                     Delete files that have previously been generated with this generator.
        --no-color                   Don't colorize the output
    -h, --help                       Show this message
        --debug                      Do not catch errors

He has also added support for testing exported resources.

And for a more practical explanation, see how Oliver Hookins describes the way Nokia uses cucumber-puppet:

Scenario: Proxy host and port have sensible defaults
  Given a node of class "mymodule::myapp"
  And we have loaded "test" settings
  And we have unset the fact "proxy_host"
  And we have unset the fact "proxy_port"
  When I compile the catalog
  Then there should be a file "/etc/myapp/config.properties"
  And the file should contain "proxy.port=-1"
  And the file should contain /proxy\.host=$/

together with the matching custom step definitions:

Then /^the file should contain "(.*)"$/ do |text|
  fail "File parameter 'content' was not specified" if @resource["content"].nil?
  fail "Text content [#{text}] was not found" unless @resource["content"].include?(text)
end

Then /^the file should contain \/([^\"].*)\/$/ do |regex|
  fail "File parameter 'content' was not specified" if @resource["content"].nil?
  fail "Text regex [/#{regex}/] did not match" unless @resource["content"] =~ /#{regex}/
end

Tip 2: rspec-puppet

While the idea of using specs with puppet is not new (https://github.com/jes5199/puppet_spec), the new tool on the block is rspec-puppet, brought to us by Tim Sharpe, the same person who gave us vim-puppet and puppet-lint.
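Installation is again just a gem install:

$ gem install rspec-puppet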

Like the cucumber-puppet structure, the idea is to have a spec directory close to your module:

module
  +-- manifests
  +-- lib
  +-- spec
       +-- spec_helper.rb
       +-- classes
       |     +-- <class_name>_spec.rb
       +-- defines
       |     +-- <define_name>_spec.rb
       +-- functions
             +-- <function_name>_spec.rb

I found it useful to change the default spec_helper.rb, as the default paths didn't match my directory layout:

require 'rspec-puppet'

RSpec.configure do |c|
   c.module_path = File.expand_path(File.join(File.dirname(__FILE__), '..', '..'))
   c.manifest_dir = File.expand_path(File.join(File.dirname(__FILE__), '..', '..','..','manifests'))
end


desc "Run specs check on puppet manifests"
RSpec::Core::RakeTask.new(:spec) do |t|
   t.pattern = './demo-puppet/modules/**/*_spec.rb' # don't need this, it's default
   t.verbose = true
   t.rspec_opts = "--format documentation --color"
    # Put spec opts in a file named .rspec in root
  end

Here is a quick example that checks whether the apache class installs the httpd package when on a Debian system:

require "#{File.join(File.dirname(__FILE__),'..','spec_helper')}"

describe 'apache', :type => :class do
  let(:title) { 'basic' }
  let(:params) { { } }
  let(:facts) { {:operatingsystem => 'Debian', :kernel => 'Linux'} }

  it { should contain_package('httpd').with_ensure('installed') }
end
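With the Rakefile task above you can run the whole suite, or point rspec at a single spec file (the path is just an example of where this spec would live):

$ rake spec
$ rspec spec/classes/apache_spec.rb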

A more detailed description can be found at

For more generic information on rspec:

Conclusion cucumber-puppet vs rspec-puppet

I think you can write the same tests with either tool. Currently they both support Puppet 2.6 and 2.7.

I found rspec-puppet a bit simpler to juggle with when providing params like :name or :facts; the YAML file approach didn't feel too flexible to me. Also, cucumber seems to install more dependent gems, which might conflict with other projects.

But as Nikolay already said:

"don't duplicate your manifests in your tests" Focus on the catalog problems he described earlier and test your logic. Don't test if puppet is doing it's job, test that your logic it's doing it's job.

This is why I called them unit tests: they don't test the real functionality. (That's for the next blogpost.)


Tip 3: puppet-lint

To check your files against programming style you can use https://github.com/rodjek/puppet-lint. It will check for rules on spacing, indentation & whitespace, quoting, resources, conditionals and classes.
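You can run it straight from the command line against a single manifest (the path is just an example):

$ puppet-lint manifests/init.pp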

An easy way to integrate it in your Rakefile is:

require 'puppet-lint'

desc "Run lint check on puppet manifests"
task :lint do
  linter = PuppetLint.new
  Dir.glob('./demo-puppet/modules/**/*.pp').each do |puppet_file|
    puts "Evaluating #{puppet_file}"
    linter.file = puppet_file
    linter.run
  end
  fail if linter.errors?
end

Now you can simply run:

$ rake lint

Tip 4: go wild and build your own test/catalog logic

After having a look at the rspec-puppet logic, I looked deeper into ways to walk through the catalog object. This is pretty much work in progress, but the idea is to find a way to look at changes in the catalog.

The following is a list of useful examples on understanding on how to work with puppet in ruby code:

The first list of links covers some fun tools written by Dean Wilson of www.puppetcookbook.com fame:

R.I. Pienaar of MCollective fame shows a way to create a diff of a catalog. This can be useful to understand what tests to run in between changes:

This final gist shows how to walk through the catalog and check the classes and resources available:

https://gist.github.com/1430062#file_puppet_demo.rb
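To give an idea, here is a minimal sketch of compiling a catalog and walking its resources, borrowing the approach rspec-puppet uses internally. It assumes a Puppet 2.7-era API; the modulepath, manifest, node name and facts are made-up values you would adapt to your own setup:

require 'puppet'

# Point puppet at your code (example paths)
Puppet[:modulepath] = './demo-puppet/modules'
Puppet[:manifest]   = './demo-puppet/manifests/site.pp'

# Build a fake node with the facts we want to test against
node = Puppet::Node.new('localhost.example.com')
node.merge('architecture' => 'i386', 'operatingsystem' => 'Debian')

# Compile the catalog for that node and print every resource in it
catalog = Puppet::Resource::Catalog.indirection.find(node.name, :use_node => node)
catalog.resources.each do |resource|
  puts resource.to_s
end

From here you could diff the resource lists of two compilations, or assert on specific classes and resources, which is essentially what the gists above demonstrate.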

