availability: September 2013
Update 4/01/2012: added ways to add metrics via logs, java pickle graphite feeder
If you are working within an enterprise, chances are that you have different metric systems in place. You might have some Cacti, Ganglia, Collectd, etc., due to historical reasons, different departments, and so on.
This reminded me of the situation while I was working in Identity Management: you might have an LDAP, Active Directory, local HR database etc. There would be plans and discussions of using one over the other, and gateways would need to be written. I learned a few lessons there:
Take the new metrics hotness, Graphite, as an example: it has some nice graphing advantages over other tools. So people wonder: should I migrate my Ganglia or Collectd setup to Graphite? Graphite doesn't come with elaborate collection scripts for memory, disk, etc., so we have to rely on other tools like Cacti, Munin, Collectd or Ganglia to collect the data first.
So we start writing gateways to get data into Graphite:
But what happens if we also use Opentsdb for storing long-term data? We have to re-implement those gateways:
Implementing the protocol again in every tool just seems like a waste of energy. This sure isn't the first time this has happened: the same thing happened for the Collectd -> Ganglia plugin.
If you look at the data that is transmitted it is actually pretty much the same:
a metric name, value, timestamp, optionally hostname, some metadata tags
So we could easily envision a 'universal' format that would be used to translate from and to.
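To make the idea concrete, here is a minimal Python sketch of such an intermediate record, built from the fields listed above. The class name `Metric`, its field names and the choice of JSON as the wire form are my own assumptions for illustration, not an existing standard.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class Metric:
    """Hypothetical intermediate record; all names here are illustrative."""
    name: str
    value: float
    timestamp: int                      # seconds since the epoch
    hostname: Optional[str] = None      # optional, as noted above
    tags: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable between runs
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "Metric":
        return cls(**json.loads(payload))


m = Metric("cpu.load.1min", 0.42, 1325376000, hostname="web01", tags={"dc": "ams"})
wire = m.to_json()
restored = Metric.from_json(wire)
```

Any tool that can read and write this one record can then talk to every other tool that does the same.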
Ganglia <-> Intermediate format <-> Graphite
Collectd <-> Intermediate format <-> Opentsdb
With this intermediate format, we would only have to write each end of the equation once. For N systems that is N readers and N writers, instead of a separate gateway for every pair of tools.
I started thinking of this as an ffmpeg for monitoring.
Let's add another system that wants to listen in on the metrics, something like Esper, Nagios alerting, some data warehouse tools, etc. We could reuse the libraries from one end to the other, but we would still have to add more gateways and put them in place every time.
A better approach would be to use a message bus: every tool publishes to and listens on a bus and gets the data it needs. RI Pienaar has written about this approach extensively in his series on Common Messaging Patterns. Also, John Bergmans has a great post on using AMQP and Websockets to get realtime graphics.
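The publish/subscribe idea can be sketched in a few lines of Python. This is an in-process stand-in I made up for illustration (the `Bus` class and the topic name are hypothetical); a real setup would put an AMQP or similar broker in its place.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class Bus:
    """Toy in-process bus: every subscriber of a topic sees every message."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, metric: Dict) -> None:
        # Copy the list so handlers may subscribe others while we iterate
        for handler in list(self._subscribers[topic]):
            handler(metric)


bus = Bus()
stored, alerts = [], []


def alert_on_full_disk(metric: Dict) -> None:
    # Hypothetical threshold check, standing in for an alerting tool
    if metric["value"] > 0.9:
        alerts.append(metric)


bus.subscribe("metrics", stored.append)        # e.g. a graphing backend
bus.subscribe("metrics", alert_on_full_disk)   # e.g. an alerting tool
bus.publish("metrics", {"name": "disk.used_ratio", "value": 0.95, "timestamp": 1325376000})
bus.publish("metrics", {"name": "disk.used_ratio", "value": 0.40, "timestamp": 1325376060})
```

The publisher does not know who is listening, which is exactly what lets us add a new consumer without writing another gateway.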
Some of the tools already have message queue integrations, but a common intermediate format still seems to be missing:
Graphlia - Graphite AMQP: https://github.com/fetep/graphlia/blob/master/graphlia.py
Collectd - Plugin:AMQP - Transmit or receive value by collectd: http://collectd.org/wiki/index.php/Plugin:AMQP
As a proof of concept I've created:
In this section I'll look for APIs (Ruby oriented) to get data in and out of the different metrics systems:
Sending metrics from Ruby to Graphite:
These both implement the Simple Protocol, but for high performance we'd like to use the batching facility of the Pickle Format. I could not find a Pickle gem for Ruby, but this could work through the Ruby-Python gateway http://rubypython.rubyforge.org/.
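For reference, both wire formats are small enough to sketch in Python, where pickle is in the standard library. The simple protocol is one `path value timestamp` line per datapoint; the pickle format is a pickled list of `(path, (timestamp, value))` tuples behind a 4-byte big-endian length header. The function names and the `send` helper are my own, and the ports in the comments are Carbon's usual defaults (2003 and 2004), which your install may override.

```python
import pickle
import socket
import struct


def plaintext_line(path, value, timestamp):
    """One datapoint in the simple line protocol (default port 2003)."""
    return f"{path} {value} {timestamp}\n".encode("ascii")


def pickle_batch(datapoints):
    """Many datapoints in one message (default port 2004).

    datapoints is a list of (path, (timestamp, value)) tuples.
    """
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))   # 4-byte big-endian length
    return header + payload


def send(message, host="localhost", port=2004):
    # Hypothetical helper: open a TCP connection, send one message, close
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message)


line = plaintext_line("web01.cpu.load", 0.42, 1325376000)
batch = pickle_batch([
    ("web01.cpu.load", (1325376000, 0.42)),
    ("web01.mem.free", (1325376000, 1024.0)),
])
```

Calling `send(batch)` against a running Carbon pickle receiver would deliver both datapoints in a single message.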
Faster - a Java Netty based graphite relay takes the same approach https://github.com/markchadwick/graphite-relay
Another way to get your data into graphite is using Etsy's Logster https://github.com/etsy/logster
Getting all the data out of Graphite is not possible through the standard API. You can get a single graph out as raw data, but that hardly counts.
The best option seems to be to listen in on the Graphite UDP receiver and duplicate the information onto a message bus.
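A rough Python sketch of that "tee" idea: take each datagram, pass the original bytes on to Graphite untouched, and publish the parsed datapoints to the bus. It assumes the senders use the plain `path value timestamp` line format over UDP; the function names and the `publish` callback are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterator


def parse_plaintext(datagram: bytes) -> Iterator[Dict]:
    """Yield one dict per well-formed 'path value timestamp' line."""
    for raw in datagram.decode("ascii", errors="replace").splitlines():
        parts = raw.split()
        if len(parts) != 3:
            continue                      # skip malformed lines
        path, value, timestamp = parts
        try:
            metric = {"name": path, "value": float(value),
                      "timestamp": int(float(timestamp))}
        except ValueError:
            continue                      # skip non-numeric fields
        yield metric


def tee(datagram: bytes, forward: Callable[[bytes], None],
        publish: Callable[[Dict], None]) -> None:
    forward(datagram)                     # Graphite still gets the original
    for metric in parse_plaintext(datagram):
        publish(metric)                   # the bus gets a parsed copy


def serve(listen_addr, graphite_addr, publish):
    """Blocking UDP loop; addresses are (host, port) tuples."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)
    while True:
        datagram, _sender = sock.recvfrom(65535)
        tee(datagram, lambda d: sock.sendto(d, graphite_addr), publish)


forwarded, seen = [], []
tee(b"web01.cpu.load 1.5 100\nbad line\n", forwarded.append, seen.append)
```

The senders would then point at the listener's port instead of Graphite's, and nothing else in the chain needs to change.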
An alternative might be to directly read from the Whisper storage, inspiration for that can be found in:
I could not find any Ruby gem that implements the Opentsdb protocol for sending data, but creating one should be trivial. Opentsdb just uses a plain TCP socket to get the data in.
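To show how little there is to it, here is a Python sketch of the line-based `put` command: `put <metric> <timestamp> <value> <tag=value ...>`, one per line, with at least one tag. The helper names are mine, and port 4242 is only the usual default, so treat both as assumptions.

```python
import socket


def put_line(metric, timestamp, value, tags):
    """Format one datapoint as an Opentsdb 'put' command."""
    if not tags:
        raise ValueError("Opentsdb expects at least one tag per datapoint")
    # Sort the tags so the output is deterministic
    tag_str = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def send(lines, host="localhost", port=4242):
    # Hypothetical helper: one TCP connection, all lines in one write
    with socket.create_connection((host, port)) as sock:
        sock.sendall("".join(lines).encode("ascii"))


line = put_line("sys.cpu.user", 1325376000, 42.5, {"host": "web01", "cpu": "0"})
```

A Ruby version would be the same string formatting wrapped around a `TCPSocket`.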
Getting data out of Opentsdb suffers from the same problem as Graphite: you can run queries for specific graph data
But you can't get everything out, except maybe by interfacing directly with the HBase/Java API. So again the best bet is to create a listener/proxy for the simple TCP protocol.
Sending metrics to Ganglia is easy using the gmetric shell command. Early days code describing this can still be found at http://code.google.com/p/embeddedgmetric/
Igrigorik has a nice write-up on how to use the Gmetric Ruby gem to send metrics.
If you want to feed log files into Ganglia, Logtailer might be your thing: https://bitbucket.org/maplebed/ganglia-logtailer
Vladimir describes the options while explaining how to get Ganglia data into Graphite.
Option 1 is to poll Gmond over TCP and get the XML of its current data:
Option 2 is to listen in on the UDP protocol as an additional receiver.
I implemented both approaches in https://github.com/jedi4ever/gmond-zmq
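Option 1 can be sketched in Python as follows. Gmond dumps its whole state as XML to anyone who connects to its TCP accept channel (8649 by default) and then closes the connection. This is not the code from gmond-zmq; the function names and the output dict layout are my own, and I keep `VAL` as a string because its meaning depends on `TYPE`.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649):
    """Read until gmond closes the connection; returns the raw XML bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def parse_gmond_xml(xml_bytes):
    """Flatten CLUSTER/HOST/METRIC elements into a list of dicts."""
    root = ET.fromstring(xml_bytes)
    metrics = []
    for host in root.iter("HOST"):
        reported = int(host.get("REPORTED", 0))   # last time the host reported
        for metric in host.iter("METRIC"):
            metrics.append({
                "host": host.get("NAME"),
                "name": metric.get("NAME"),
                "value": metric.get("VAL"),        # raw string, see TYPE
                "type": metric.get("TYPE"),
                "units": metric.get("UNITS"),
                "timestamp": reported,
            })
    return metrics


SAMPLE = b"""<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="web" LOCALTIME="1325376000">
<HOST NAME="web01" IP="10.0.0.1" REPORTED="1325375990">
<METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

parsed = parse_gmond_xml(SAMPLE)
```

Against a live daemon you would call `parse_gmond_xml(fetch_gmond_xml())` on a timer and publish each dict.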
Note: As a side effect I found that the metrics sent over UDP are actually more accurate than the values you get when you query the XML.
To send metrics to Collectd, you can use the Ruby gem from Astro that implements most of the UDP protocol.
I give Collectd the prize for best output.
It currently implements different writers:
And the ZeroMQ writer by deactivated - https://github.com/deactivated/collectd-write-zmq
The Binary Protocol http://collectd.org/wiki/index.php/Binary_protocol is pretty simple to listen in on.
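As an illustration of how simple, here is a Python sketch of a decoder for unencrypted packets. Each packet is a sequence of parts: a 16-bit type, a 16-bit length (header included), then the payload. The sketch decodes only the string, time, interval and values parts and silently skips the rest (high-resolution time, notifications, signatures, encryption), and it does not try to survive truncated payloads. The function names are mine.

```python
import struct

STRING_PARTS = {0x0000: "host", 0x0002: "plugin", 0x0003: "plugin_instance",
                0x0004: "type", 0x0005: "type_instance"}
NUMERIC_PARTS = {0x0001: "time", 0x0007: "interval"}
VALUES_PART = 0x0006


def parse_values(payload):
    """Values part: count, then one type byte per value, then 8 bytes each."""
    (count,) = struct.unpack_from("!H", payload, 0)
    kinds = payload[2:2 + count]
    offset = 2 + count
    values = []
    for kind in kinds:
        if kind == 1:
            fmt = "<d"      # GAUGE: little-endian double
        elif kind == 2:
            fmt = "!q"      # DERIVE: signed big-endian integer
        else:
            fmt = "!Q"      # COUNTER (0) and ABSOLUTE (3): unsigned big-endian
        values.append(struct.unpack_from(fmt, payload, offset)[0])
        offset += 8
    return values


def parse_packet(packet):
    """Return the decoded parts of one UDP packet as (name, value) pairs."""
    parts = []
    offset = 0
    while offset + 4 <= len(packet):
        part_type, length = struct.unpack_from("!HH", packet, offset)
        if length < 4 or offset + length > len(packet):
            break                                    # malformed, stop here
        payload = packet[offset + 4:offset + length]
        if part_type in STRING_PARTS:
            text = payload.rstrip(b"\0").decode("ascii", "replace")
            parts.append((STRING_PARTS[part_type], text))
        elif part_type in NUMERIC_PARTS:
            parts.append((NUMERIC_PARTS[part_type],
                          struct.unpack_from("!Q", payload, 0)[0]))
        elif part_type == VALUES_PART:
            parts.append(("values", parse_values(payload)))
        offset += length                             # unknown parts are skipped
    return parts
```

A listener would bind a UDP socket (collectd's network plugin uses port 25826 by default) and feed every datagram to `parse_packet`, carrying host, plugin and type forward until a values part completes a datapoint.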
If you happen to use Munin, here's some inspiration, but I haven't researched it much
If you happen to use Circonus, here's some inspiration, but I haven't researched it much
For those who want to read and write directly from RRDs in Ruby, please have fun:
With all the tools in and out, and a unified intermediate format, it will be trivial to rewrite the traditional alert check tools to listen on the bus for values. This means your Nagios, your ticket system, your pager system, etc. can all listen to the same source.
It should be feasible to create an intermediate format and reuse some of these libraries to implement both IN and OUT functionality. Why not create a Fog for monitoring information, something that implements metric receive, send, and so on?
Next stop: Nagios, because it deserves a blog post of its own ...