Just Enough Documented Information

What Is This Devops Thing, Anyway?

(2010-02-12) - Comments
In the last few months, a movement has begun to take shape. It's a movement of people who think it's time for change in the IT industry - time to stop wasting money, time to start delivering great software, and building systems that scale and last. This movement is being called Devops. But what is Devops? Where did it come from? And what can it achieve?
Guest post by Stephen Nelson-Smith (@lordcope), a Technical Manager and Devop based in Hampshire, UK, and author of Agile Sysadmin.

What problems are we trying to solve?

Let's face it - where we are right now sucks. In the IT industry, or perhaps to be more specific, in the software industry, particularly in the web-enabled sphere, there's a tacit assumption that projects will run late, and when they're delivered (if they’re ever delivered), they will underperform, and not deliver well against investment. It's a wonder any of us have a job at all!

Let’s have a look at some of the common problems we experience in the software world today.

Fear of change

Once an application is delivered, the business tends to be tremendously afraid of change. The suspicion is that the software itself, and the platform upon which it sits, is somewhat brittle and vulnerable. Bureaucratic change-management systems are put in place, and it takes a painfully long time to introduce new features, or fix problems with the application.

Risky deployments

Another symptom of the malaise is the concept of the ‘risky deployment’. This is the situation in which no-one is really terribly confident that the software will work in the live environment. Will the code behave as expected? Will it cope with the load? Often we don’t have the answers to these questions - we just push it out at a quiet time, and watch to see if it falls over.

It works on my machine!

A common situation we experience is one in which a problem manifests itself once the site is live. These problems are typically picked up by systems administrators, or helpdesk / client services people. After investigation (although to be fair frequently without) the problem is reported to the developers. The developers take a cursory look, and retort: “It works on my machine”.

However, this is frequently meaningless. I’ve worked in places where the code is developed on Windows 2000 workstations, and tested with Tomcat on Windows, and then deployed on Redhat Linux, with software load balancers, different Java versions and different Tomcat versions. This is not to mention the entirely different properties files on developer workstations compared to live machines.

Siloisation

On most projects I’ve worked on, the project team is split into developers, testers, release managers and sysadmins working in separate silos. From a process perspective this is dreadfully wasteful. It can also lead to a 'lob it over the wall' philosophy - problems are passed between business analysts, developers, QA specialists and sysadmins. Furthermore, we see a replication of this silo structure within the teams - it’s not uncommon to see dedicated database and network people in the same system team, alongside sysadmins. Often the larger silos aren't in the same office, the same city, or sometimes not even in the same country. The result is an ‘us and them’ mentality - groups of people who are simultaneously suspicious of and afraid of each other.

The Devops movement is bold enough to believe that there’s a better way - a way of building teams, and building software that can solve these problems.

How does Devops help?

The Devops movement is built around a group of people who believe that the application of a combination of appropriate technology and attitude can revolutionize the world of software development and delivery.

The demographic seems to be experienced, talented 30-something sysadmin coders with a clear understanding that writing software is about making money and shipping product. More importantly, these people understand the key point - we’re all on the same side! All of us - developers, testers, managers, DBAs, network technicians, and sysadmins - are all trying to achieve the same thing: the delivery of great quality, reliable software that delivers business benefit to those who commissioned it.

Breaking this down a bit - one of the key statements there is ‘sysadmin coders’. There’s been a certain amount of popularity recently around the concept of ‘polyglot programmers’. These are developers who know multiple languages, and multiple approaches to programming, such that when appropriate they can move swiftly between object-oriented Ruby and a functional language such as Erlang or OCaml. This is simply an example of a larger observation - there is no one IT skill that is more useful or more powerful than another. To solve problems well you need all the skills. When you build teams around people who can be developers, testers, and sysadmins, you build remarkable teams.

Beyond this multi-disciplinary approach, the Devops movement is attempting to encourage the development of communication skills, understanding of the domain in which the software is being written, and, crucially, a sensitivity and passion for the underlying business, and for ensuring it succeeds.

It might seem obvious, but communication really is one of the biggest keys. Devops are bridge-builders. The fact of the matter is that building quality software is really hard - it's error prone, it's risky, it's unpredictable. In the last ten years or so, software developers have started to realise this. The development of the Agile Manifesto, and the consequent improvements in process and peopleware, together with some technology advances around testing have had a massive impact.

However, there's still a gaping hole in what Chris Read calls 'the last mile'. The thing is, 'dev complete' is a long long way from 'live, in production, stable, making money'.

The problem is that it's typically the sysadmins who are responsible for getting the software out live - but often they don't know, trust, like, or even work in the same city as the developers. How on earth are we expected to deliver good software that way?

So, the Devops movement is characterized by people with a multidisciplinary skill set - people who are comfortable with infrastructure and configuration, but also happy to roll up their sleeves, write tests, debug, and ship features. These are people who make connections, because they can - because they have feet in multiple camps, they can be ambassadors, peace makers, facilitators and communicators. And the point of the movement is to identify these, currently rare, people and encourage them, compare ideas, and start to identify, train, recruit and popularize this way of doing IT.

This has a tremendous impact on the business. Suddenly the technical team starts trying to pull together as one. An 'all hands on deck' mentality emerges, with all technical people feeling empowered, and capable of helping in all areas. The traditionally problematic areas of deployment and maintenance once live become tractable - and the key battlegrounds of developers ('the sysadmin built an unreliable platform') versus sysadmins ('the developers wrote unreliable code') begin to transform into a cross-disciplinary approach to maximizing reliability in all areas. This, of course, has a positive effect on the bottom line - better reliability and availability, happier clients, faster time to market, and more time to focus the team's energy on core business rather than wasteful administration and firefighting.

Criticisms of the movement

Some people have been a little suspicious - the movement really looks a lot like just a bunch of European sysadmins, many of whom know each other. Is this just an elitist club? Some kind of rebranding exercise?

Well, it's true - Devops as a movement is characterized by a number of sysadmins, many of whom know each other - but then this isn't a surprise. We've already explored the problems that we're trying to solve - these are problems centred around the deployment and management of applications that are, from a simplistic position, completed work. Developers have already realized that change was long overdue, and they’ve started taking steps to alleviate the pain. The result of this is that the problem has moved, and it’s in the systems area that the pain is now being felt. Consequently it's only natural that the majority of the people who have realized it's time to change things operate in the systems area.

Yes, there are some senior, experienced, and talented people in the movement - but again, that's no surprise - the kind of sysadmin who is going to recognize the problem, care about fixing it, and have the combination of skills in terms of programming, systems and infrastructure and personal and management skills will by the very definition be a highly skilled and capable person. That’s not to be confused with elitist - the movement is friendly, welcoming, and open.

To draw a comparison, when the Agile Manifesto was drawn up, that was a gathering of well known, experienced, figurehead developers. It could easily have been perceived as exclusive, and elitist, but that didn't invalidate the ideas or their value, or prevent the movement from growing and thriving.

The fact that the movement has grown up in Northern Europe is simply the coincidence of a concentration of good Agile groups, and capable sysadmin / developers gathering together in a Belgian city for the inaugural DevopsDays unconference.

So if you’re feeling wary of the movement, or worried that you might not be accepted or fit in, don’t worry at all - if you’ve got a desire to change the world, hop on board.

How to get involved

In my view becoming a Devop requires a certain state of mind. It's an attitude which says: I'm going to make a difference, I'm going to cooperate and communicate, and I'm going to understand that in the business of delivering great software, we're all in it together. If you're a sysadmin, this means spending time with the developers. Get to know your way around the code base. Contribute to it. If you don't know how to program, get started - there are good tutorials available for Python and Ruby, which are both excellent, powerful general purpose languages that will serve you well. Start thinking about systems management as a programming task - use Puppet or Chef to manage your machines, and start thinking about how to test your infrastructure.

If you're a developer, go and make friends with your sysadmins. Don't view them as lower life forms, or as people to lob problems to. Go and join in - pair with them. If they're using Puppet or Chef, get involved - start contributing to their codebase. If you're an experienced programmer, especially if you have an understanding of test driven development, consider mentoring your sysadmins - get them confident in programming, and encourage them to start to take ownership of the code base. Start to understand the roles within your (soon to be removed) silos. Try to understand what skills the QA team brings, and how they work - see if there's a way for you to help. Try to cross-pollinate - sysadmins, DBAs, network people, business analysts - you're all on the same team.

If you're a manager, advertise positions as 'Devop' not just ‘Developer’ or 'SysAdmin'. Make it clear you're interested in hiring people with cross-functional skills, and that you'll provide training to help this. Start implementing configuration management systems, get sysadmin code into version control, get the systems team involved in continuous integration and deployment activities. If you need to hire in consultancy to help get you on the right track, there are people available who do this kind of thing every day.

Summary

The Devops movement is still in its infancy, but it's gathering pace - there are conferences, mailing lists, irc channels, blogs and people to get to know. I'm convinced this isn't just a fad, and we're on the brink of a revolution in the software industry - a paradigm shift in which developers and sysadmins start to work together, to train each other, and ultimately to blur the boundary - welcome to the world of the Devop.

About the author

Stephen Nelson-Smith is a Technical Manager and Devop based in Hampshire, UK. His company, Atalanta Systems, specializes in applying a blend of automation technologies and agile and lean work practices to enable clients to concentrate on their core business, be more effective, and make more money. You can follow him on Twitter, and read his blog at Agile Sysadmin.

Resources

If this sounds like the kind of thing you already do, or like the kind of thing you want to do, there are plenty of resources available to help.

Blogs

The following blogs are written from a Devops perspective, and offer high quality, technically rich content: The Build Doctor, Patrick Debois (This One), Agile Sysadmin, RI, Bitfield, Planet Infra, Unix Daemon

Mailing lists

There is a low volume Google group called Agile System Administration. Most of the founders of the movement can be found there.

IRC

IRC is a superb resource - the following channels on freenode cover the kinds of tools and approaches that are central to the movement, and many of the current members of the movement can be found in the channels.

##infra-talk, #puppet, #chef

Twitter

Most of the inaugural Devops people can be found on Twitter. You can also search for the hash tag #Devops. Here is a selection of interesting people to follow:

Stephen Nelson-Smith, Julian Simpson, Matthias Marschall, Patrick Debois, Mike Poutney, Paul Nasrat, Dean Wilson, R.I.Pienaar, Lindsay Holmwood, Chris Read, Branden Faulls, James Turnbull, John Willis, Kris Buytaert, Gildas Lenadan, John Arundel, Andrew Shafer, John Allspaw, Michael Cote, Damon Edwards, Israel Gat

Tools

And finally, here are some of the tools that you're likely to find in the box of a Devop.

Hudson, Cucumber, Cucumber Nagios, Puppet, mcollective, cobbler, Chef, Flapjack, Visage, Collectd


Charting out devops ideas

(2009-12-22) - Comments

Thanks to the devopsdays conference, the idea of devops seems to live on. While talking with other people about it, I realize that it is difficult to frame it within the current IT landscape. A lot of the ideas are coming from different kinds of emerging technologies (T) and process management (P) approaches.

For me the two most important observations are:

  • there is an increase in feedback loops between business, all parts of the delivery process and operations
  • thanks to these feedback loops we increase the quality and speed up the flow

So where can you look for devops ideas? As you can see on the map, these interactions are all over the place.

  • (A) Business focusing on both functional and non-functional requirements: businesses are becoming aware that f.i. downtime and data loss can really drive their customers to the competition
  • (B) New software architectures driven by the *-ities: new topics such as NoSQL databases and queueing systems are increasingly used in software architecture to handle scalability, as are caching systems like memcached in combination with programming languages
  • (C) Testing and monitoring growing towards each other: reuse of test logic in the monitoring system (f.i. cucumber-nagios), using monitoring probes in test environments to validate applications under scenarios
  • (D) Operations teams organizing themselves to cope with changes from business: ideas like agile operations and lean operations
  • (E) Closer interaction between development and system engineering during the projects: agile project methods or others that form a multifunctional team instead of different silos
  • (F) Projects learning from operations: architects are now active inside projects and get feedback from operations on what works or not, for better redesign
  • (G) Operations as a first listener to customer problems: Operations can also make the difference, similar to sales, in treating the customer right, listening in on problems and providing feedback to the business
  • (H) Business using operational metrics as feedback: seeing what customers like, or how they react to decreases in performance or to outages, is becoming an important feedback loop for making better business decisions
  • (I) Sysadmins using development techniques: using code repositories, continuous integration, testing tools, design patterns to handle automation and provisioning of systems
  • (J) Deployment growing towards configuration mgt: provisioning a system uses configuration management for the definition, and these same tools (chef, puppet) are used for operational/live changes as well
  • (K) Operations teams developing new tools for managing systems: as there is still a large tool gap, a lot of sysadmins are working on better tools for mass deployment, large configuration changes, monitoring
  • (L) New system architectures: this is where cloud computing and agile infrastructure come in, with better and innovative ways to automate the provisioning and deployment
  • (M) Operations teams going upstream in the process instead of taking a more passive role: experiments with kanban in operations to interact during the project phase and even before the project phase (sales, Service Level Management)

devops ideas overview

You can probably find a lot more, or think that some have nothing to do with devops as such. I would really like to hear your thoughts on this list. The focus of this post is not on 100% exactness or completeness. Think of it as a work in progress waiting for your feedback ;-)


Building a Visual Resume

(2009-11-28) - Comments

As a freelancer I spend quite some time updating and sending my CV to different parties. Most recruiting companies ask me for an MS Word version of my CV without any markup. I understand why: they put on their own logo and copy and paste the relevant parts for the company they propose me to. To me this feels rather awkward. In these web oriented times, I would expect enterprises to check candidate backgrounds online. And a flat, unformatted list of job experience is not what they expect online. Sites like LinkedIn and VisualCV try to make a difference here, but they are still text based, very similar to the dull paper documents.

You can debate the usefulness of a visual resume for jobhunters, but I think that potential resume evaluators go through a resume in different passes. The right typography already helps a lot in this process. Also, the visual has to be easily understood by the recruiter, not only by the candidate himself.

The most brilliant examples I've seen, come from graphical or art designers. They are showcasing fonts, styles or other visuals. The problem is, I'm not into design jobs, and creating a resume that visual, will not send the right message.

Another trend is using visual slides to convey who you are and what you stand for. These examples work well when presenting in person, but without audio they don't work that well. Also printing these will result in a condensed overview.

Below is a list of the most beautiful examples I found. I'd love to make one myself, who knows.

Visual CV - Examples:


CV Industries Infographic Resume

Visual ThinkMap Life Map

Visual Thinkmap Arnaud Velten

Infographics Michael Anderson

Stephen Gates Visual Resume

Kevin Shrugged

Mahadevan Gomathisankaran

Kristi CV

David J. Downs

Interaction Designer Skillset

Shapeshifter - Bob Van Vliet

Future?

Well, the next thing is probably to create my own... Another idea might be to create an application to visualize f.i. LinkedIn profiles with it using the hresume format. If any visual designers are interested in this project, just let me know.


Translating Code Smells in Server Smells

(2009-11-27) - Comments

At xpdays Benelux 2009, I attended an interesting session called 'Developing a Sense of Smells' by Kevin Rutherford and Lindsay McEwan.

The exercise we did went as follows: suppose you are asked to do some work on code you never saw before. How would you assess this, go about estimating the effort and explaining that effort to justify the price/number of days? The first round resulted in terms like 'look for design patterns', 'readability of the code', ...

Then they explained code smell patterns, f.i. a greedy function (one function that does way too much, making it difficult to change) or a hidden secret (internal knowledge, such as the real interpretation of a CSV data format, that cannot be deduced by looking at the code). The code smells made it easier to express the problems.

The devops in me thinks that this can be translated to the sysadmin world. Here are some of the 'smells' I've come up with.

Private Playground

The sysadmin uses the system as his toy playground, doesn't clean up.

  • /tmp & /var/tmp full of old install files
  • / full of files

Greedy Server

One server that does every function

  • combined mail and web and dns and fileshares
  • all users on the same system

Root is the cause of all evil

  • last shows that all logins are root
  • no sudo is activated
  • no sshd keys for logins
  • nfs share/root?
  • Chmod 777
  • most processes run as root

Cranky Crutches

Things that are needed to keep the system alive when failed

  • /etc/ start but no stop scripts
  • kill /stop/start in cron jobs

Nobody lives here anymore

Mainly indicates that not much maintenance is done any more

  • Last update is a long time ago
  • Older kernel versions
  • last login was more than x days ago
  • last reboot was a long time ago

Complexity Conspiracy

Loadbalancers, Cluster software, Dependencies

More is Less

  • All packages installed
  • All services running
  • All ports are open
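
Some of the smells above can be checked mechanically. What follows is only a rough Python sketch of that idea: it walks a directory and reports world-writable files (the 'Chmod 777' smell) and files that have not been touched for a long time (the 'Private Playground' smell). The function names, the 90 day threshold and the choice of directory are assumptions made for illustration; they do not come from the session or from an existing tool.

```python
import os
import stat
import time


def world_writable(mode):
    """Return True if the permission bits let 'other' users write."""
    return bool(mode & stat.S_IWOTH)


def find_smells(root, max_age_days=90, now=None):
    """Walk `root` and return (path, smell) pairs for suspicious regular files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    smells = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets and devices
            if world_writable(info.st_mode):
                smells.append((path, "world-writable"))
            if info.st_mtime < cutoff:
                smells.append((path, "stale"))
    return smells
```

Calling find_smells on /tmp or /var/tmp would give a first, crude list to discuss; deciding what counts as a real problem still needs a human.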

This is just the beginning, so if you have your own ideas/names, just leave a comment.


Xpdays Benelux 2009 - Continuous Integration for the World

(2009-11-19) - Comments

Here are the slides of my presentation at Xpdays Benelux 2009. The presentation is on how we could bring sysadmins and developers closer together by using continuous integration in both worlds.

I would very much like to thank Gildas Le Nadan for helping with the first version of it at Xpdays France 2009. Also I want to thank the organizers for this great conference! And for giving me the chance to speak about this rather niche subject.


Coding an Infrastructure Test First

(2009-11-18) - Comments

Now that we have outlined the programming languages for automating shell scripting, virtual machine creation, network provisioning and OS installation and beyond, I bet you as a devops are eager to start writing your infrastructure code.

After some time chances are that you will end up with lots and lots of scripts executing in sequence. And then when you change something in a script the whole sequence will fail and you'll have a hard time looking for what caused the problem. A better approach for writing your code is to practice Test Driven Development.

Test Driven Development Automation

In short, before writing any code, you first write a test for the code you are about to write. Then you run your tests and see that the new test fails (RED). It is only then that you start writing your code or change your existing code. When you think the code is done, you run your tests again and see them succeed (GREEN). With the tests passing, you can clean up the code while keeping them green (REFACTOR). And then you continue to use this cycle to grow your code. It is important that you change your code in small increments. For more info see Test first guidelines.
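
To make the cycle concrete, here is a small sketch using Python's standard unittest module. The helper hosts_line, which renders one /etc/hosts entry, is a made-up example chosen for illustration. The tests are written first and fail while the function does not exist yet; the function below them is the smallest code that makes them pass.

```python
import unittest


# Written first: these tests fail until hosts_line exists and behaves.
class HostsLineTest(unittest.TestCase):
    def test_renders_ip_name_and_aliases(self):
        self.assertEqual(
            hosts_line("10.0.0.5", "web01.example.com", ["web01"]),
            "10.0.0.5\tweb01.example.com web01",
        )

    def test_rejects_empty_hostname(self):
        with self.assertRaises(ValueError):
            hosts_line("10.0.0.5", "", [])


# Written second: the smallest implementation that makes the tests pass.
def hosts_line(ip, hostname, aliases=()):
    """Render one /etc/hosts entry: IP, canonical name, then aliases."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    names = " ".join([hostname, *aliases])
    return f"{ip}\t{names}"
```

Saved in a file, this can be run with python -m unittest followed by the file name. Each new requirement then starts with another failing test.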

Benefits for the sysadmin

So how can this help you as a sysadmin? Isn't this more of a developer thing? The answer is a big NO:

  • can you remember the last time you had to apply patches or config file changes to a system? And did you have that fingers-crossed feeling? Wouldn't it be great if you could install a patch and run a series of tests to see if everything behaved the way it should?
  • when you get audited: how can you show that the machines you're running comply with your installation guides?
  • writing these tests also helps in sharing the knowledge and repeating the validation process every time, even without your rockstar sysadmin being around
  • the incremental approach also helps against systems overdesign. As project complexity grows, you may notice that writing automated tests gets harder to do. This is your early warning system of overcomplicated design. Simplify the design until tests become easy to write again, and maintain this simplicity over the course of the project.

Yes, but won't this slow me down? Well, this is exactly the reaction most developers have to this process. But to me the benefits should be clear: would you rather go for the fingers-crossed go-live or the verified state? You will have to find the right balance between writing all tests and writing no tests.

If you are working on a new project that tries to get something running as quickly as possible as a one shot, you'll probably be under pressure to deliver. It will take some time to build the skills to get the automation and the tests in your fingers. Still, a good way of convincing management is by explaining to them that these tests are useful not only for the project phase but can be used during the whole maintenance period. This definitely increases the ROI of writing these tests.

Setup / Teardown

A test usually has a setup and a tear-down part. The idea is to create a state under which you start performing your tests. For applications f.i. this would be to put the right stuff in the database and set the right variables. So how does this translate to machines? Most virtualization solutions allow you to easily take snapshots of both your memory state (savestate) and your disks. So you can easily recreate a certain point in time to start your tests by cloning your systems and running some scripts to change the state (f.i. filling a disk, killing a process). Another approach could be to take snapshots of your disks using a filesystem like ZFS or a volume manager like LVM, which easily let you take snapshots.

This also helps during the coding of your infrastructure:

  • you take a snapshot of your current state
  • you run your code
  • see if the test succeeds
  • if not OK rollback , if OK save the new state
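
The four steps above can be sketched as a small Python function. This is only a model of the control flow: an in-memory dictionary stands in for the machine, and the deep copy stands in for a real snapshot. On a real system the snapshot and rollback would be done by your hypervisor or by LVM/ZFS tooling, which is not shown here. The name apply_with_rollback and its parameters are assumptions made for this sketch.

```python
import copy


def apply_with_rollback(state, change, check):
    """Apply `change` to `state`; keep the result only if `check` passes.

    Returns (state_to_use, ok). On failure the snapshot taken before the
    change is returned, which plays the role of the rollback.
    """
    snapshot = copy.deepcopy(state)   # 1. take a snapshot of the current state
    try:
        change(state)                 # 2. run your code
        ok = bool(check(state))       # 3. see if the test succeeds
    except Exception:
        ok = False                    # a crash counts as a failed test
    if ok:
        return state, True            # 4a. OK: save the new state
    return snapshot, False            # 4b. not OK: roll back
```

A change that installs a package and a check that looks for it would return the new state; a change that breaks the check hands back the snapshot instead.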

Example: a Webserver Test First

In the following example I will describe the setup of a webserver which serves static pages. Note that there is no standard development involved here. It's all about pre-packaged software.

Step 1: Defining the virtual machine

In this step you would define the hardware of your virtual machine: number of CPUs, memory, network interfaces, MAC addresses, disks, ... So how do we test this? In the days of physical hardware, to verify that a system had all its hardware, we booted from a CD and used commands to check that the system contained, for example, the correct number of disks.

To avoid writing your own boot CD, you could use something similar to sysrescueCD. It has a feature called autorun that allows you to boot up the disk and execute scripts that are on a floppy, disk, NFS or Samba share, or HTTP server. If you mount this as a virtual CD in your virtual machine, it can boot up the virtual machine and execute tests to check that the definition matches what was specified.
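
A script run from such a boot CD could compare what the machine reports against what was specified. The following Python sketch shows one possible shape, assuming a Linux environment with /proc and /sys mounted and a Python interpreter on the rescue image. The function names, the disk name prefixes and the memory slack are illustrative assumptions, not features of sysrescueCD.

```python
import os


def read_actual_hardware():
    """Collect a few hardware facts on Linux (assumes /proc and /sys exist)."""
    mem_kb = 0
    with open("/proc/meminfo") as fh:
        for line in fh:
            if line.startswith("MemTotal:"):
                mem_kb = int(line.split()[1])
                break
    # Rough guess at "real" disks based on common device name prefixes.
    prefixes = ("sd", "vd", "hd", "xvd", "nvme")
    disks = [name for name in os.listdir("/sys/block") if name.startswith(prefixes)]
    return {"cpus": os.cpu_count(), "memory_mb": mem_kb // 1024, "disks": len(disks)}


def spec_mismatches(expected, actual, memory_slack_mb=128):
    """Return a list of differences between the expected and actual hardware."""
    problems = []
    for key in ("cpus", "disks"):
        if expected[key] != actual[key]:
            problems.append(f"{key}: expected {expected[key]}, found {actual[key]}")
    # The kernel reserves some memory, so allow a little slack below the spec.
    if actual["memory_mb"] < expected["memory_mb"] - memory_slack_mb:
        problems.append(
            f"memory_mb: expected about {expected['memory_mb']}, found {actual['memory_mb']}"
        )
    return problems
```

On the booted machine, spec_mismatches(expected, read_actual_hardware()) returning an empty list would mean the definition matches; anything else is a failed test.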

Step 2: Prepare IP, DNS, DHCP, TFTP

The next step is provisioning the network information for your virtual machine. To test this, we can use the same boot CD approach and verify it via scripts using dhclient, dig, tftpclient. Some might argue that this test is better done from within the OS. Of course, testing once the OS is installed is a more complete test. But doing this test separately from the installed OS lets you better distinguish where the error occurs: is it a problem of the OS driver that there is no IP address, or is it a problem with the definition of the DHCP/DNS?
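
As a rough illustration of the DNS part, the Python sketch below checks that a hostname resolves to the address you provisioned. It uses the system resolver through the standard socket module rather than dig, so it tests the resolver configuration as well as the DNS record. The function name lookup_problem and the example values are assumptions for this sketch.

```python
import socket


def lookup_problem(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return None if `hostname` resolves to `expected_ip`, else a description.

    `resolver` can be replaced, which makes the check itself easy to test.
    """
    try:
        actual = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"{hostname}: lookup failed ({exc})"
    if actual != expected_ip:
        return f"{hostname}: expected {expected_ip}, got {actual}"
    return None
```

A provisioning test could loop over the hostnames it just registered and fail on the first non-None result.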

Step 3: Minimal Install of OS

This is usually done by defining a kickstart/jumpstart template. It would contain the disk partitioning, network configuration, the minimal packages and patches, a set of minimal services enabled (ssh, puppet), selinux enabled, and so on. As you see there is a lot more that can be tested here:

  • Is the swap activated correctly
  • Is all memory seen from the OS
  • check if disks partitioning is ok
  • check if disks are mounted correctly
  • Is it 64/32 bit
  • Are permissions set right
  • Is SElinux activated
  • is the NFS share exported: showmount -e
  • IP: are the interfaces up and did they get the correct settings
  • do some DNS lookups to see if that works
  • Ping the router to see if network is alive
  • Verify the syntax of your sendmail.cf
  • See if the processes are running (sshd, puppetd) : ps -ef |
  • Check the listeners : netstat -an|grep LISTEN
  • Do a test login with SSH
  • Running NMAP to see if no other services are activated
  • run nessus to check vulnerabilities
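
A couple of these checks are easy to script. The Python sketch below covers the listener checks only: whether an expected port accepts connections, and whether ports outside an allowed set are open. It is a very small stand-in for what netstat or NMAP tell you, not a replacement for them, and the function names are assumptions for this sketch.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unexpected_listeners(host, allowed_ports, ports_to_scan):
    """Return the scanned ports that accept connections but are not allowed."""
    return [
        port
        for port in ports_to_scan
        if port not in allowed_ports and port_is_listening(host, port)
    ]
```

For a minimal install you might expect port_is_listening to be True for port 22, and unexpected_listeners over the low port range with only 22 allowed to come back empty.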

Up until now these tests are simple checks, and probably most people will do similar things in their monitoring. I've talked about testing being more than monitoring. Aside from these simple checks you can complement your monitoring by running scenarios. Also, destructive tests like failing a disk or bringing down an interface are probably not the best thing to do in production monitoring ;-)

  • test IP bonding by executing a failover
  • Verify that syslog works by sending a log request
  • test if your raid system works by killing a disk
  • test if your self healing works by killing a process
  • test a reboot scenario
  • test if your DNS failover works by using iptables to block access to the first DNS server
  • test your backup/restore scenario ;-)

So aren't we re-testing the packages here? The problem here is that packages are often tested in isolation from each other, and their tests can only cover a limited number of setups. That means it still makes sense to repeat some tests to see if YOUR combination of things actually works.

Step 4: Apply recipes for the webserver

We now have a tested minimal OS running with a configuration mgt system active. Next in line is applying recipes. While discussing these with a number of people, I often heard that you don't need tests because these tools work in a declarative mode: if something doesn't work then it's either an error in your recipe or an error in the configuration management software.

I personally disagree. The argument is similar to why we test even if individual packages are tested: you can write a beautiful recipe to install a webserver, but maybe the firewall is preventing access, or maybe SELinux is blocking things. Or you install a bad apache config file so that it serves from the wrong directory.

Also you can add scenario testing here:

  • by running load and seeing if it actually spawns the number of processes you specified
  • check that loadbalancer pages are available
  • kill the http daemon and see if it recovers
  • check if caching works ok by downloading the file one
  • check if HTTP/compression works
  • check if lastmodified/ headers are set correctly
  • check if log rotation works ok
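
Two of these checks, compression and the Last-Modified header, can be sketched with Python's standard urllib. The split into a fetching function and a pure checking function is a design choice for this sketch, so that the checking logic can be tested without a running webserver. The function names are assumptions; the URL you pass in would be your own server.

```python
import urllib.request


def fetch_headers(url, timeout=5.0):
    """GET `url` while asking for gzip; return response headers with lower-cased names."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def header_problems(headers):
    """Return a list of problems found in a dict of lower-cased headers."""
    problems = []
    if headers.get("content-encoding", "").lower() != "gzip":
        problems.append("response is not gzip-compressed")
    if "last-modified" not in headers:
        problems.append("Last-Modified header is missing")
    return problems
```

A test for the webserver recipe could then assert that header_problems(fetch_headers(url)) is an empty list for a known static page.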

Testing Frameworks

Programming languages have developed a lot of frameworks over time to help in writing tests. When coding your infrastructure in these languages check what's available. Still there is no test library that is specific to systems testing. The closest are test frameworks for HTTP testing.

Currently this is an emerging field. There are already a lot of examples in the wild. As the idea of testable infrastructure is adopted more widely, these frameworks will emerge. The most notable is cucumber-nagios written by Lindsay Holmwood. It brings http testing closer to monitoring and into the sysadmin world. Now you can reuse your tests written in cucumber in your monitoring environment.

After a discussion at devopsdays 09, Lindsay is starting on a similar thing that allows ssh scripting to be integrated in this framework, or what he calls Behavior driven infrastructure through cucumber. And recently he announced on the agile system administration mailing list that he's joining forces with Adam Jacob of Opscode. So definitely more to come!


Recipes for Automated installation of OS and beyond

(2009-11-18) - Comments

Up until now, I've described the options to automate shell scripting, virtual machine creation and network provisioning. So now we can actually get started with automating the installation of the Operating System itself. This, not surprisingly, is where most sysadmins spend most of their time.

The good news is that we are slowly moving away from custom scripting toward a reusable and shareable language, very similar to design patterns in programming. Over the years the methods have evolved towards configuration management, which is obviously a good thing.

This also helps sysadmins leave behind the hero culture, in which they are the only ones who know how to manage their systems.

Jump/Kick start

The first way of automating the OS installation came from using Kickstart or Jumpstart. In essence it is a config file that defines the information necessary to do an installation. It typically contains:

  • the machine's network configuration (interfaces, IP, Routes, DNS)
  • the disk layout setup
  • the packages to be installed
  • patches to be applied

When booting the kernel of an OS, there usually is a way to specify this file, allowing automated installation. Over time this file was put on a floppy, then on a CD-ROM, then on a USB drive. To start the installation automatically, the install media needed to be modified to change the default kernel boot options. The syntax of these files typically differs per OS.

To create your first file, you would do a manual installation of the OS, and at the end it would create a file corresponding to the manual choices you made. After that, the file could be altered to include additional post-installation scripts or extra packages.

Automating the creation of this file has never really been important to me: I was using either an editor or a GUI to generate it. Interestingly, I found Snake as an example of scripting the creation of the template.

So in order to automate the installation of software, sysadmins would encourage developers to create packages for their software, so that the installation could be easily automated.

An example of a Solaris Jumpstart:

# install_type MUST be first
install_type      initial_install
# start from the full SUNWCXall cluster and delete what is not needed
cluster           SUNWCXall
cluster           SUNWCapache delete
cluster           SUNWCpcmc   delete
cluster           SUNWCpcmcx  delete
cluster           SUNWCthai   delete
cluster           SUNWClp     delete
cluster           SUNWCnis    delete
cluster           SUNWCppp    delete
# format the entire disk for Solaris
fdisk   all   solaris all
# define how the disk is partitioned
partitioning      explicit
filesys           rootdisk.s0 6144  /
filesys           rootdisk.s1 1024  swap
filesys           rootdisk.s7 free  /state/partition1
# install systems as standalone
system_type standalone
# specify patches to install
patch 119281-06 nfs 172.16.64.194:/export/patches
# specify packages to install
package SPROcc add nfs 172.16.64.194:/export/packages
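
Generating such a profile from data instead of editing it by hand could look roughly like the sketch below. It only emits keywords that appear in the example above; the function name jumpstart_profile and the layout of the hash are assumptions made for this illustration.

```ruby
# Render a Jumpstart profile from a plain Ruby hash.
# Only keywords that appear in the example profile are emitted.
def jumpstart_profile(spec)
  lines = []
  lines << "install_type #{spec[:install_type]}" # install_type must come first
  lines << "cluster #{spec[:cluster]}"
  spec.fetch(:delete_clusters, []).each do |cluster|
    lines << "cluster #{cluster} delete"
  end
  lines << "partitioning explicit"
  spec.fetch(:filesystems, []).each do |fs|
    lines << "filesys #{fs[:slice]} #{fs[:size]} #{fs[:mount]}"
  end
  lines << "system_type #{spec[:system_type]}"
  lines.join("\n") + "\n"
end

# Example data, taken from the profile shown above
EXAMPLE_SPEC = {
  :install_type    => "initial_install",
  :cluster         => "SUNWCXall",
  :delete_clusters => ["SUNWCapache", "SUNWCnis"],
  :filesystems     => [
    { :slice => "rootdisk.s0", :size => 6144, :mount => "/" },
    { :slice => "rootdisk.s1", :size => 1024, :mount => "swap" }
  ],
  :system_type     => "standalone"
}

puts jumpstart_profile(EXAMPLE_SPEC)
```

The same hash could be kept per machine type, so that a new profile is a matter of changing data rather than copying files.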

System installation frameworks

Over time, frameworks emerged to handle installing and managing all these packages.

Configuration Management

As System recipes and configuration management beautifully describes, managing systems is more than just managing and installing packages: sysadmins would have tons of custom scripts and HOWTOs describing how to do additional things, for instance installing a complete webserver according to company standards. All of this knowledge was usually put in specific custom scripts, without a good way of sharing it. What we actually need is a formalized description of all the things that need to be present and configured on the machines in your setup.

Enter the world of configuration management tools. These tools provide a way of describing the recipes in a language that abstracts away the actual system it's running on. For instance, adding a user or adding a package is abstracted in the language.
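
To make the idea of abstraction concrete, here is a toy sketch in Ruby. It is not how Puppet or Chef are implemented; the COMMANDS table, the platform names and the command_for function are all assumptions for illustration. The point is only that the recipe says what should be present, and a lookup decides which command does it on a given platform.

```ruby
require "shellwords"

# Illustrative lookup table: resource type -> platform -> command template
COMMANDS = {
  :package => {
    :debian => "apt-get install -y %s",
    :redhat => "yum install -y %s"
  },
  :user => {
    :debian => "useradd %s",
    :redhat => "useradd %s"
  }
}

# Translate an abstract resource into the command line for one platform.
# Raises KeyError when the type or platform is unknown.
def command_for(resource, platform)
  template = COMMANDS.fetch(resource[:type]).fetch(platform)
  format(template, Shellwords.escape(resource[:name]))
end

webserver = { :type => :package, :name => "httpd" }
puts command_for(webserver, :redhat)
```

Real tools go further: they first check the current state and only act when it differs from the description.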

Examples are:

Puppet seems to have most of the market for now, but the Chef guys are catching up. Also a lot of the system management tools have puppet integration:

Example puppet recipe for ensuring the permission of a sudoers file

# /etc/puppet/manifests/classes/sudo.pp
class sudo {
    file { "/etc/sudoers":
        owner => "root",
        group => "root",
        mode  => "0440",
    }
}

Example for mysql node installation

class mysql-server {
     $password = "insert_password_here"
     package { "MySQL-client": ensure => installed }
     package { "MySQL-server": ensure => installed }
     package { "MySQL-shared": ensure => installed }

     exec { "Set MySQL server root password":
       subscribe => [ Package["MySQL-server"], Package["MySQL-client"], Package["MySQL-shared"] ],
       refreshonly => true,
       unless => "mysqladmin -uroot -p$password status",
       path => "/bin:/usr/bin",
       command => "mysqladmin -uroot password $password",
     }
}

What's fascinating is the new set of tools that is starting to emerge here, and how they really start to integrate with real programming languages such as Ruby.

  • Moonshine an opensource configuration management and deployment system that follows the Rails way.
  • Gepetto A helper suite for Puppet projects to create, manage and help daily development
  • ShadowPuppet is a Ruby DSL for Puppet, extracted from Moonshine. ShadowPuppet provides a DSL for creating collections (manifests) of Puppet Resources in Ruby.
  • Carpet of the Agile Web Operations guys is an example on how you can mix capistrano with puppet.
  • Cft - Configuration File Tracking: lets you kinda record puppet recipes while entering commands.

For managing configuration files Augeas is doing a really nice job: it parses configuration files in their native formats and transforms them into a tree. Configuration changes are made by manipulating this tree and saving it back into native config files.
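
The parse, change, write-back cycle can be sketched in a few lines of Ruby for a simple "key = value" file. This is only an illustration of the idea, not the Augeas API: the function names are made up, the "tree" is a flat list of nodes, and unlike Augeas the sketch normalizes the spacing around "=" when writing back.

```ruby
# Parse "key = value" lines into a list of nodes, keeping comments
# and blank lines as raw nodes so they survive a round trip.
def parse_config(text)
  text.lines.map do |line|
    if line =~ /\A\s*([^#=\s]+)\s*=\s*(.*?)\s*\z/
      { :key => $1, :value => $2 }
    else
      { :raw => line.chomp }
    end
  end
end

# Change an existing key or append a new one.
def set_value(nodes, key, value)
  node = nodes.find { |n| n[:key] == key }
  if node
    node[:value] = value
  else
    nodes << { :key => key, :value => value }
  end
  nodes
end

# Turn the nodes back into file content.
def render_config(nodes)
  nodes.map { |n| n[:key] ? "#{n[:key]} = #{n[:value]}" : n[:raw] }.join("\n") + "\n"
end
```

A change then becomes render_config(set_value(parse_config(text), "port", "8080")), without a sed expression that depends on the exact layout of the file.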

Just Enough Operating System

Now that sysadmins are spending more time on their configuration management, they slim down their initial kickstart installations to a minimal base install of the OS plus a configuration management tool. When this minimal install is done, they continue by applying their recipes to the installation.

This minimal install is sometimes called a base install: an installation with the minimal set of packages installed. Some OS vendors are actively working on providing what is called a JeOS, which stands for Just enough Operating System. This is an image that contains an already installed minimal install. Such images are also useful for creating appliances. Some examples are:

Middleware automation

If you're installing a complete stack, for instance Oracle Application Server or a database server, chances are that the initial package install does nothing. What you need is to create your own instances, pools and so on. What's important is that some of these include their own automation language. It is essential that they have a way of doing every GUI action from the commandline. Otherwise you end up writing scripts for managing their config files. This is a bad thing, because the file format might change over time, or even worse, it may be impossible because they store things in a binary format.

Examples of good API's are:

Example of asadmin scripting

   # create a cluster
   asadmin create-cluster --user admin --passwordfile adminpassword.txt --host hostname --port 4848 cluster1
   # create instance 1
   asadmin create-instance --user admin --passwordfile adminpassword.txt --host hostname --port 4848 --cluster cluster1 --nodeagent nodeagent1 --systemproperties "JMX_SYSTEM_CONNECTOR_PORT=8687:IIOP_LISTENER_PORT=3330:IIOP_SSL_LISTENER_PORT=4440:IIOP_SSL_MUTUALAUTH_PORT=5550:HTTP_LISTENER_PORT=1110:HTTP_SSL_LISTENER_PORT=2220" instance1
   # create instance 2
   asadmin create-instance --user admin --passwordfile adminpassword.txt --host hostname --port 4848 --cluster cluster1 --nodeagent nodeagent1 --systemproperties "JMX_SYSTEM_CONNECTOR_PORT=8688:IIOP_LISTENER_PORT=3331:IIOP_SSL_LISTENER_PORT=4441:IIOP_SSL_MUTUALAUTH_PORT=5551:HTTP_LISTENER_PORT=1111:HTTP_SSL_LISTENER_PORT=2221" instance2
   # start the cluster
   asadmin start-cluster --user admin --passwordfile adminpassword.txt --host hostname --port 4848 cluster1

Some tools also provide a 'silent install' feature. It will record the installation into a text file that you can later replay to have the same result. Using some sed/awk scripts you can probably automate some of this.


Automation of Network Provisioning of Machines (DNS, DHCP, PXE, TFTP)

(2009-11-17) - Comments

Up until now, I've described the options to automate shell scripting and virtual machine creation. The next step is to prepare the network for the virtual machine to make it boot with the right network settings. In a lot of companies this is often a manual process but we want to automate it.

In order to boot a linux machine, one relies on:

  • a DHCP Server for IP addresses, domainname, router
  • a DNS Server for the necessary name resolution
  • PXE file creation
  • a TFTP Boot server to provide a PXE File

DHCP Automation

On Linux the de-facto standard DHCP server is the ISC DHCPD. It relies on configuration files in a specific format. Most of the automation around it consists of homebrew scripts that change these config files. It's a pity there is no scriptable API for managing these files.
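
Such a homebrew script typically boils down to generating host declarations and keeping them free of duplicates. A minimal sketch in Ruby, where the function names are made up for this example and the stanza uses the standard dhcpd.conf host declaration with hardware ethernet and fixed-address:

```ruby
# Render one dhcpd.conf host declaration.
def dhcpd_host_entry(name, mac, ip)
  <<~STANZA
    host #{name} {
      hardware ethernet #{mac};
      fixed-address #{ip};
    }
  STANZA
end

# Replace an existing declaration for `name`, or append a new one.
# Assumes declarations were written by dhcpd_host_entry (same layout).
def upsert_host(config_text, name, mac, ip)
  pattern = /^host #{Regexp.escape(name)} \{.*?^\}\n/m
  config_text.gsub(pattern, "") + dhcpd_host_entry(name, mac, ip)
end

puts upsert_host("", "puppetclient", "aa:aa:bb:bb:cc:01", "192.168.3.150")
```

After rewriting the file, the DHCP daemon still has to be restarted to pick up the change, which is exactly the kind of glue a real API would make unnecessary.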

The runner-up is DNSMASQ, which despite its name can also act as a DHCP server. It too relies on configuration files, but I found dnsmasq mysql, which extends dnsmasq with a mysql backend. So we could issue SQL statements to control it?

Other things I found were:

  • the project http://dhcpd-j.org/ (outdated), a Java effort to create a DHCP server that stores its information in a database. But still no real API to manage the files.
  • the mydhcpgen (outdated) http://freshmeat.net/projects/mydhcpgen/ project: it holds all information in a database and then generates the files. We could use SQL to program the configuration.

Overall I found no good out-of-the-box way of automating subnet creation, client registration and templates. So if you know any, let me know. Maybe this is the reason why the big virtualization vendors provide their own DHCP server in their products, configurable with their own API.

Example of Virtualbox API:

VBoxManage dhcpserver       add|modify --netname <network_name>
                                       --ifname <hostonly_if_name>
                                       [--ip <ip_address>
                                       --netmask <network_mask>
                                       --lowerip <lower_ip>
                                       --upperip <upper_ip>]
                                       [--enable | --disable]
VBoxManage dhcpserver       remove --netname <network_name>
                                   --ifname <hostonly_if_name>

DNS Automation

The next step is automating the management of DNS files. For DNS servers under Linux we have more choice than with DHCP servers: ISC Bind, DJBDNS and dnsMasq. All of them rely on configuration files, and for Bind and dnsMasq there have been efforts to integrate them with a database backend, for instance mysql-bind. Another effort I found is dnsadmin (outdated) http://freshmeat.net/projects/dnsadmin/ for managing djbdns/tinydns files.

DNS by itself allows the use of Dynamic Updates (see RFC 2136). This already allows us to change the records of a single host using commands such as nsupdate, but does not provide a way to create, update or delete zones or other settings.
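
Scripting nsupdate from Ruby could look like the sketch below. It builds the command script that nsupdate reads on stdin (server, zone, update delete, update add, send). The function names are made up for this example, and apply_update assumes that the nsupdate binary is installed and that the server accepts dynamic updates from this host, so it is defined but not called here.

```ruby
# Build an nsupdate script that replaces the A record of one host.
def nsupdate_script(server, zone, fqdn, ip, ttl = 3600)
  [
    "server #{server}",
    "zone #{zone}",
    "update delete #{fqdn} A",
    "update add #{fqdn} #{ttl} A #{ip}",
    "send"
  ].join("\n") + "\n"
end

# Feed the script to nsupdate on stdin and report success.
# Assumption: nsupdate is on the PATH and the update is authorized.
def apply_update(script)
  IO.popen("nsupdate", "w") { |io| io.write(script) }
  $?.success?
end

puts nsupdate_script("ns1.cobblertest.be", "cobblertest.be",
                     "puppet1.cobblertest.be", "192.168.3.150")
```

The server name ns1.cobblertest.be is a placeholder; the other values reuse the ones from the Cobbler example further down.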

Another project called dnspython (outdated) http://www.dnspython.org/ provides both high and low level access to DNS. The high level classes perform queries for data of a given name, type, and class, and return an answer set. The low level classes allow direct manipulation of DNS zones, messages, names, and records.

The most promising is PowerDNS http://www.powerdns.com, which seems to have full scripting capability. I haven't tried it personally, but it's the best alternative I found.

Combining DHCP, DNS, TFTP and PXE boot

For managing a single machine it is important that changes to your DHCP, DNS, TFTP and PXE boot settings stay in sync. It doesn't come as a surprise that a lot of global machine management solutions use these base blocks and add their own sauce. It's a pity that the management of DHCP and DNS is not extended in the tools themselves but stays in the global management tool. Another thing is that things like the creation of new DNS zones or DHCP templates are not possible because the underlying tools don't support it (or maybe nobody normally automates that ;-).

Examples of global solutions :

And because most of these have a commandline or other APIs, one could easily script the updates to DNS or DHCP settings for individual machines.

Amazon started on an API for DNS/DHCP Options for automating their Private Cloud:

Example of scripting network provisioning with Cobbler

domainname="cobblertest.be"
mac_address="aa:aa:bb:bb:cc:01"
name="puppetclient"
ip="192.168.3.150"
kickstart="puppet.ks"
profile="centos53-latest"
distro="centos53-i386"

# Set the distribution of the machine
# The distribution was previously imported by importing an installation DVD
# Cobbler will detect the possible kernels to boot
# And this will also link the TFTP and PXE file necessary to boot
cobbler profile add --name="${profile}" --distro="${distro}"

# Add the new machine with an IP and MAC address
# From the IP address it knows which reverse DNS zone the machine belongs to
cobbler system add --name="${name}" --ip="${ip}" --mac="${mac_address}"
# Set the DNS domain of the machine (this determines in which zone file it is created)
cobbler system edit --name="${name}" --profile="${profile}" --dns-name="puppet1.${domainname}"

# For linux machines, the kernels are provided with an option
# ks=kickstart so that it will start the kickstart installation
cobbler system edit --name="${name}" --kickstart="/var/lib/cobbler/kickstarts/${kickstart}"
cobbler system edit --name="${name}" --name-servers-search="${domainname}"

# This finally commits all the changes
cobbler sync

Commandline creation of Msdos floppy on MacOSX

(2009-11-17) - Comments

I wanted to automate the creation of an MS-DOS floppy under MacOSX. The usual way is very similar to Linux. In this script I avoided using sudo to mount the filesystem and took advantage of hdiutil to run everything under normal user credentials.

# creating an empty 1.44 MB file (2880 sectors of 512 bytes)
dd if=/dev/zero bs=512 count=2880 of=msdos-floppy.img

# associate the file with a device without mounting it
device=`hdid -nomount msdos-floppy.img`

# formatting disk with msdos format
newfs_msdos $device

# detach the file from the associated device
hdiutil detach $device -force

# mounting the image file
device=`hdid msdos-floppy.img|cut -d ' ' -f 1`

# calculate the mountpoint by checking the mount table
# (double quotes so that $device is expanded; sed strips the trailing space)
mountpoint=`mount |grep -w "$device" | cut -d ' ' -f 3- | cut -d '(' -f 1 | sed 's/ *$//'`

# copying file to the mountpoint (quoted, the volume name may contain spaces)
cp file "$mountpoint/"

# unmount the image
hdiutil detach $device -force


Controlling virtual Machines with an API

(2009-11-17) - Comments

In the old days, getting a new machine could take days. It required ordering hardware and putting everything together. Now, in these virtual/cloud days, creating new machines is a breeze. While a lot of effort is spent on automating the installation of the machine OS and its applications, I see that the provisioning of a virtual machine is often still done through the GUI. So why not automate that step too?

Depending on the virtualization platform you choose, different options exist, ranging from GUI (HTTP Posts) and command lines to SOAP, XML-RPC or language bindings. What follows is a list of the ways I found. Again, my experience is that most programming-oriented XML-RPC, SOAP or language bindings are a subset of the commandline interface.

In all cases, the commandline API is updated first and then the rest follows. Again, this strengthens my belief that when automating the creation from within Ruby, you should actually write a wrapper around the commandline API. Because the target audience is sysadmins, this makes a lot of sense. At the end I provide an example using the VirtualBox commandline API wrapped in Ruby.
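
The core of such a wrapper is small. The sketch below is a hypothetical stand-in, not the helper used in the listing at the end of this post: the module name CommandLine and its Result struct are assumptions, built on Open3 from the Ruby standard library.

```ruby
require "open3"

# Minimal wrapper around command line tools (hypothetical helper).
module CommandLine
  Result = Struct.new(:stdout, :stderr, :exitstatus)

  # Run a command line and capture its output and exit status.
  def self.execute(command)
    stdout, stderr, status = Open3.capture3(command)
    Result.new(stdout, stderr, status.exitstatus)
  end

  # True when the command exits with status 0.
  def self.test(command)
    execute(command).exitstatus == 0
  end
end

result = CommandLine.execute("echo hello")
puts result.stdout
```

Every VBoxManage call then becomes a one-liner in Ruby, and the wrapper is the single place to add logging, dry runs or error handling.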

Vmware

Vmware is one of the most used virtualization tools. These are some of the ways it can be scripted:

LibVirt

LibVirt tries to be virtual machine neutral: it has abstractions for the Xen hypervisor on Linux and Solaris hosts, the QEMU emulator, the KVM Linux hypervisor, the LXC Linux container system, the OpenVZ Linux container system and the User Mode Linux paravirtualized kernel. It also has experimental support for Virtualbox and the Vmware ESX and GSX hypervisors, but I found these unstable.

A lot of enterprise management tools build on libvirt:

Solaris Zones

Sun takes another approach with their zones. They provide an excellent command API to manage their zones: http://www.sun.com/bigadmin/content/zones/. Beautiful for scripting.

Controlling Physical Machines

Even non-virtual machines can be controlled: you can use wakeonlan, ipmitool or some management interface to power the machines up or down. Cobbler has this kind of power management interface https://fedorahosted.org/cobbler/wiki/PowerManagement to manage bullpap, wti, apc_snmp, ether-wake, ipmilan, drac, ipmitool, ilo, rsa, lpar, bladecenter.
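
Wake-on-LAN in particular needs no special tool: the "magic packet" is 6 bytes of 0xFF followed by the MAC address of the target repeated 16 times, usually sent as a UDP broadcast. A sketch in Ruby, where the function names are made up and the broadcast address and port 9 are assumptions about the local network:

```ruby
require "socket"

# Build a Wake-on-LAN magic packet for a MAC address such as
# "aa:aa:bb:bb:cc:01" (colons or dashes as separators).
def magic_packet(mac)
  bytes = mac.split(/[:-]/).map { |octet| octet.hex }
  raise ArgumentError, "expected 6 octets in #{mac}" unless bytes.length == 6
  ([0xFF] * 6 + bytes * 16).pack("C*")
end

# Broadcast the packet over UDP. Not called here, since it
# sends traffic on the local network.
def wake(mac, broadcast = "255.255.255.255", port = 9)
  socket = UDPSocket.new
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)
  socket.send(magic_packet(mac), 0, broadcast, port)
ensure
  socket.close if socket
end
```

The machine has to have Wake-on-LAN enabled in its BIOS and network card for this to have any effect.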

Virtualbox

Virtualbox has one of the most excellent commandline interfaces available. And they generate their SOAP API from the same source!

Example of the Soap Interface

require 'soap/wsdlDriver'
require 'pp'

# path to the VirtualBox webservice WSDL file
WSDL_URL="vboxwebService.wsdl"

soap = SOAP::WSDLDriverFactory.new(WSDL_URL).create_rpc_driver
# dump the raw SOAP traffic to stderr for debugging
soap.wiredump_dev=STDERR
#soap = SOAP::WSDLDriverFactory.new(WSDL_URL).create_rpc_driver("vboxService", "vboxServicePort")
#pp soap.methods

# log on; the return value is the reference used as :_this in later calls
vbox=soap.IWebsessionManager_logon({:username => '', :password => ''})
puts "Session: #{vbox.returnval}"
version=soap.IVirtualBox_getVersion({:_this => vbox.returnval})
puts version.returnval

# list all registered hard disks with their type, size and location
disks=soap.IVirtualBox_getHardDisks({:_this => vbox.returnval})
diskids=disks.returnval
diskids.each do |diskid|
  type=soap.IHardDisk_getType({:_this => diskid })
  size=soap.IHardDisk_getLogicalSize({:_this => diskid })
  location=soap.IMedium_getLocation({:_this => diskid })
  puts "#{diskid}-#{type.returnval}-#{size.returnval}-#{location.returnval}"
end

My commandline based abstraction in Ruby

require "rubygems"
require "open4"
require "pp"
require "timeout"
require "systr/commands"

# If a disk has not been specified with its full name, it is stored in the default VBOX location

# Poll until the machine reaches the given state; raise when the timeout expires
def wait_for_state_vmachine(vmname, state, options={ })
  defaults={ :timeout => 1000 , :pollrate => 5 }
  options=defaults.merge(options)

  begin
    Timeout::timeout(options[:timeout]) do
      loop do
        puts "polling state"
        actualstate=state_vmachine(vmname)
        return true if actualstate==state
        puts "Currentstate: "+actualstate
        sleep options[:pollrate]
      end
    end
  rescue Timeout::Error
    # double quotes, so that #{state} is interpolated
    raise "timeout waiting for machine to reach state #{state}"
  end
end

# Return the VMState value (f.i. "poweroff") reported by VBoxManage
def state_vmachine(vmname)
  result=Command.execute("VBoxManage showvminfo #{vmname} --machinereadable|grep VMState=|cut -d '=' -f 2|cut -d '\"' -f 2")
  # strip the trailing newline so the value can be compared directly
  return result.stdout.to_s.strip
end

def remove_vmachine(vmname,options={})
  defaults={ :disk => vmname }
  options=defaults.merge(options)

  if (state_vmachine(vmname)!="poweroff")
    Command.comment("Can't remove a running machine")
    raise "machine is still running"
  end

  # detach the disk, close the medium, then unregister and delete the machine
  Command.execute("VBoxManage modifyvm #{vmname} -sataport1 none")
  #Command.execute("VBoxManage snapshot #{vmname} discardcurrent --all")
  Command.execute("VBoxManage closemedium disk #{options[:disk]}.vdi")
  Command.execute("VBoxManage unregistervm #{vmname} --delete")
end

def exists_vmachine(vmname)
  return Command.test("VBoxManage showvminfo #{vmname}")
end

# Type the boot command into the VM by sending keyboard scancodes
def floppy_kickstart_vmachine (vmname)
  #http://www.win.tue.nl/~aeb/linux/kbd/scancodes-1.html
  Command.execute("VBoxManage controlvm #{vmname} keyboardputscancode 26 17 31 16 2d 39 25 1f 0d 21 26 18 19 19 15 1c")
end

def add_floppy_vmachine(vmname,floppyfile)
  Command.execute("VBoxManage modifyvm #{vmname} -floppy #{floppyfile}")
end

def create_vmachine(vmname,options={})
  defaults={:ostype => 'RedHat', :memory => '384', :disk => vmname, :net => 'pxenet'}
  options=defaults.merge(options)

  Command.execute("VBoxManage createvm -name #{vmname} -ostype #{options[:ostype]} -register")
  # Trying hostonly
  #  Command.execute("VBoxManage modifyvm #{vmname} -nic1 nat -nic2 intnet -intnet2 #{options[:net]}")
  # http://www.virtualbox.org/manual/UserManual.html#networkingdetails

  # :network holds extra modifyvm networking flags; skip the call when none are given
  Command.execute("VBoxManage modifyvm #{vmname} #{options[:network]}") unless options[:network].nil?

  #TODO: VBoxManage modifyvm puppet1 -macaddress1 aaaabbbbcc01

  Command.execute("VBoxManage modifyvm #{vmname} -memory #{options[:memory]}")
  Command.execute("VBoxManage modifyvm #{vmname} -sata on -sataport1 #{options[:disk]}.vdi -sataportcount 1")

  if options[:dvd].nil?
    Command.execute("VBoxManage modifyvm #{vmname} -dvd none")
  else
    Command.execute("VBoxManage modifyvm #{vmname} -dvd #{options[:dvd]}")
  end

  if options[:floppy].nil?
    Command.execute("VBoxManage modifyvm #{vmname} -floppy empty")
  else
    Command.execute("VBoxManage modifyvm #{vmname} -floppy #{options[:floppy]}")
  end

  Command.execute("VBoxManage modifyvm #{vmname} --vram 32")
  Command.execute("VBoxManage modifyvm #{vmname} --bioslogodisplaytime 0")
  #  Command.execute("VBoxManage modifyvm #{vmname} --bioslogoimagepath path-to-256-bmp")

  Command.execute("VBoxManage modifyvm #{vmname} --acpi on")
  Command.execute("VBoxManage modifyvm #{vmname} --ioapic on")

  Command.execute("VBoxManage modifyvm #{vmname} -boot1 disk")
  Command.execute("VBoxManage modifyvm #{vmname} -boot2 dvd")
  Command.execute("VBoxManage modifyvm #{vmname} -boot3 net")

  #suppress interactive messages
  Command.execute("VBoxManage setextradata global 'GUI/RegistrationData' 'triesLeft=0'")
  Command.execute("VBoxManage setextradata global 'GUI/UpdateDate' '1 d, 2009-09-20'")
  Command.execute("VBoxManage setextradata global 'GUI/SuppressMessages' ',confirmInputCapture,remindAboutAutoCapture'")
end

def remove_dhcp
  result=Command.execute("VBoxManage list dhcpservers| grep NetworkName:|cut -d '-' -f 2").stdout
  # each_line instead of each (String#each is gone since Ruby 1.9); strip the newline
  result.to_s.each_line do |interface|
    Command.execute("VBoxManage  dhcpserver remove --ifname #{interface.strip}")
  end

  #  ERROR: Assertion failed at '/Users/vbox/tinderbox/3.0-mac-rel/src/VBox/Main/VirtualBoxImpl.cpp' (1706) in virtual nsresult VirtualBox::SetExtraData(const PRUnichar*, const PRUnichar*).
  #  Unexpected exception 'N3xml12EIPRTFailureE' (Runtime error: -250 (Unresolved (unknown) device i/o error.)).
  #  Please contact the product vendor!
  #  Details: code NS_ERROR_FAILURE (0x80004005), component VirtualBox, interface IVirtualBox, callee nsISupports
  #  Context: "EnableStaticIpConfig(Bstr(pIp), Bstr(pNetmask))" at line 267 of file VBoxManageHostonly.cpp

  Command.execute("VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.10.1 --netmask 255.255.255.0")
end

def start_vmachine(vmname)
  Command.comment("starting virtual machine #{vmname}")
  if ENV['SYSTR_HEADLESS'].nil?
    Command.execute("VBoxManage startvm  #{vmname}")
  else
    # run headless in the background and give it a few seconds to come up
    system("VBoxHeadless -s #{vmname} --vrdp off &")
    sleep 4
  end
end

def stop_vmachine(vmname)
  Command.comment("stopping virtual machine #{vmname}")
  Command.execute("VBoxManage controlvm  #{vmname} poweroff")
end

def remove_snapshot_vmachine(vmname,snapname)
  Command.execute("VBoxManage snapshot #{vmname} discard #{snapname}")
end

def create_snapshot_vmachine(vmname,snapname)
  Command.execute("VBoxManage snapshot #{vmname} take #{snapname}")
end

# Forward a local port to the ssh port of the guest; returns the local port
def ssh_enable_vmachine(vmname, options={})
  defaults={:localport => 2222 , :remoteport => 22}
  options=defaults.merge(options)

  Command.execute("VBoxManage setextradata #{vmname} 'VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/HostPort' #{options[:localport]}")
  Command.execute("VBoxManage setextradata #{vmname} 'VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/GuestPort' #{options[:remoteport]}")
  Command.execute("VBoxManage setextradata #{vmname} 'VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/Protocol' TCP")
  return options[:localport]
end