Coherent Ramblings

2010/05/24

clusto do-my-dishes

Filed under: Clusto, Digg, Systems Engineering | plathrop @ 10:50 am

While discussing the proper class hierarchy for adding our Netscalers to Clusto, we decided that Appliance was the proper name for the base class.

This of course led to the idea of coding up a driver for a dishwasher. clusto do-my-dishes FTW!

2010/05/11

Git Hooks: Branch ACLs and more.

Filed under: Git, Tools | plathrop @ 1:47 pm

Recently at Digg, we wanted to open up the git repository containing our Puppet manifests so that our developers could work with it. However, we wanted to maintain some control over which branches they could push to, in order to prevent accidental commits to the production manifests. In addition to this fine-grained authorization, we wanted the ability to force their development branch to track ours; we always want them working from the latest code we’ve committed.

There are some heavier-weight options for this sort of thing, but we aren’t yet ready to move to a more complicated tool chain. I was pretty sure we could accomplish it with git hooks. I was right:

#!/usr/bin/env python
 
import os
import re
 
from sys import argv, exit
from subprocess import Popen, PIPE
 
 
def blank(line):
    regex = re.compile('^\s*$')
    if regex.search(line):
        return True
    else:
        return False
 
 
def uncomment(line):
    return line.partition('#')[0]
 
 
def cleanup(lines):
    # Strip comments first (full-line and trailing), so that indented
    # comment lines become blank instead of reaching the parser.
    lines = [uncomment(line) for line in lines]
    # Remove blank lines, including those left behind by comments.
    lines = [line for line in lines if not blank(line)]

    return lines
 
def parse_acl():
    try:
        fh = open('acl')
    except IOError:
        print "Could not open ACL file. Exiting."
        exit(1)
 
    # Read and close ACL file.
    lines = fh.readlines()
    fh.close()
    lines = cleanup(lines)
 
    # The ACL lines are whitespace separated. The first field is the
    # username, the second field is a regex describing the refs the
    # user is allowed to update.
    access = {}
    for line in lines:
        record = line.split()
        r_user = record[0]
        r_ref = record[1]
        if r_user in access:
            access[r_user] = access[r_user] + [r_ref]
        else:
            access[r_user] = [r_ref]
 
    return access
 
 
def authorize(refname='', user='', access={}):
    if not user in access:
        return False
 
    allowed_refs = access[user]
    for a_ref in allowed_refs:
        regex = re.compile("^%s$" % a_ref)
        if regex.search(refname):
            return True
 
    return False
 
 
def parse_git_config(config_entry):
    output = Popen(["git", "config", "--bool", config_entry], stdout=PIPE).communicate()[0]
    if output.strip() == 'true':
        return True
    else:
        return False
 
 
def parse_branch_tracking():
    try:
        fh = open('branch_tracking')
    except IOError:
        print "Could not open branch tracking file. Exiting."
        exit(1)
 
    # Read and close branch tracking file.
    lines = fh.readlines()
    fh.close()
    lines = cleanup(lines)
 
    # Branch tracking lines are whitespace separated. The first field
    # is the branch that must track the branch specified in the second
    # field.
    tracking = {}
    for line in lines:
        record = line.split()
        r_branch = record[0]
        r_track = record[1]
        tracking[r_branch] = r_track
 
    return tracking
 
 
def get_tracked_branch(refname):
    track = parse_branch_tracking()
    try:
        return track[refname]
    except KeyError:
        return None
 
 
def get_missing_refs(ref, base_ref):
    # git rev-list gives us the commits reachable from base_ref that
    # are NOT reachable from ref.
    output = Popen(["git", "rev-list", '%s..%s' % (ref, base_ref)], stdout=PIPE).communicate()[0]
    return output.splitlines()
 
 
def main():
    # The args are passed in by git.
    refname = argv[1]
    old_rev = argv[2]
    new_rev = argv[3]
    user = os.environ['USER']
 
    # Check if 'push acl' is enabled.
    push_acl = parse_git_config('hooks.pushacl')
 
    if push_acl:
        # Check the user's authorization to update these git refs.
        access = parse_acl()
        if not authorize(refname=refname, user=user, access=access):
            print "Could not update %s, permission denied by ACL." % refname
            exit(1)
 
    # Check if 'forced branch tracking' is enabled.
    force_tracking = parse_git_config('hooks.forcebranchtracking')
 
    if force_tracking:
        tbranch = get_tracked_branch(refname)
        if tbranch:
            # Get the refs that are reachable from our tracked branch
            # but NOT reachable from our new revision.
            missed_refs = get_missing_refs(new_rev, tbranch)
 
            if len(missed_refs) > 0:
                print "Could not update %s, you need to merge %s." % (refname, tbranch)
                exit(1)
 
 
if __name__ == '__main__':
    main()

This implements two git config options:

git config --bool hooks.pushacl
git config --bool hooks.forcebranchtracking

They are both meant to be set on a bare git repository (git init --bare). The first option causes the update hook to look for a file called ‘acl’ in the root of the bare repo. Here is the ACL file I’m using for our puppet configs right now:

plathrop refs/.*/.*
ron refs/.*/.*
synack refs/.*/.*
wfrancis refs/.*/.*
kad refs/.*/.*
mike refs/.*/.*
rcoli refs/heads/(development|(rcoli/.*)){1}
goffinet refs/heads/experimental
rich refs/heads/experimental
kelvin refs/heads/experimental

The entries are regular expressions matching git “refs”. So, I have full access, rcoli can push to the development branch or any branch starting with “rcoli/” (but cannot create tags), and goffinet and his fellow developers can push to the experimental branch (but cannot create tags).
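
To make the matching rule concrete, here is a minimal sketch (written for Python 3, unlike the hook itself) that applies two of the ACL entries above the same way authorize() does: each pattern is wrapped in ^ and $ so it must match the entire ref name. The is_allowed helper and the sample ref names are made up for illustration.

```python
import re

# Two entries copied from the ACL file above.
ACL = {
    "rcoli": ["refs/heads/(development|(rcoli/.*)){1}"],
    "goffinet": ["refs/heads/experimental"],
}


def is_allowed(user, refname, acl=ACL):
    # Same rule the hook applies: every pattern is anchored with ^ and $,
    # so it has to match the whole ref name, not just a prefix.
    patterns = acl.get(user, [])
    return any(re.search("^%s$" % pattern, refname) for pattern in patterns)


# Hypothetical pushes to try against the ACL.
checks = [
    ("rcoli", "refs/heads/development"),
    ("rcoli", "refs/heads/rcoli/new-feature"),
    ("rcoli", "refs/tags/v1.0"),
    ("goffinet", "refs/heads/experimental"),
    ("goffinet", "refs/heads/development"),
    ("nobody", "refs/heads/experimental"),
]
for user, ref in checks:
    print(user, ref, is_allowed(user, ref))
```

Because of the anchoring, a tag ref never matches a refs/heads/ pattern, and a user with no ACL entry is always denied.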

The second option is probably badly named. It looks for a file called branch_tracking in the root of the bare repo. That file looks like this:

refs/heads/experimental refs/heads/development

When hooks.forcebranchtracking is set, the hook will enforce that the branch on the left contains all the commits from the branch on the right before it will accept any updates. Essentially this forces the experimental branch to track the development branch, and requires regular runs of git pull to stay in sync.
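
The check itself rests on what git rev-list ref..base_ref returns: the commits reachable from base_ref that are not reachable from ref. The sketch below models that with plain Python 3 sets instead of calling git; the commit names and the parents mapping are a hypothetical history, not anything from our repository.

```python
def ancestors(commit, parents):
    # Every commit reachable from `commit`, including the commit itself.
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def missing_commits(ref, base_ref, parents):
    # Models `git rev-list ref..base_ref`: commits reachable from
    # base_ref but not from ref.
    return ancestors(base_ref, parents) - ancestors(ref, parents)


# Hypothetical history: A <- B <- C is the tracked (development) branch.
# E was branched from B; M merges C into E.
parents = {"A": [], "B": ["A"], "C": ["B"], "E": ["B"], "M": ["E", "C"]}

print(missing_commits("E", "C", parents))  # {'C'}: the hook would reject the push
print(missing_commits("M", "C", parents))  # set(): the hook would accept the push
```

An empty result means the pushed revision already contains everything on the tracked branch, which is exactly the condition the hook tests with len(missed_refs) > 0.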

2010/05/03

Debian Packaging Puppet Manifests

Filed under: Puppet | plathrop @ 5:57 pm

With the (new?) ability for Puppet to have multiple module_dirs, the idea occurred to me: why not package a module as a Debian package?

The packaging process was relatively easy. Check out the results on GitHub.

Enjoy!

2008/07/07

Getting python-ldap installed on OS X

Filed under: Python, Quick Tips | plathrop @ 4:13 pm

If you want to use Fink, and build a bunch of crap you don’t need, follow these instructions.

If you are like me and prefer to use the libraries that are already installed, just do this:

  1. If you haven’t already, install XCode.
  2. Download python-ldap and extract it.
  3. Edit setup.cfg. The relevant lines are below:
    library_dirs = /Developer/SDKs/MacOSX10.5.sdk/usr/lib/
    include_dirs = /Developer/SDKs/MacOSX10.5.sdk/usr/include/ /Developer/SDKs/MacOSX10.5.sdk/usr/include/sasl
  4. sudo python setup.py install
  5. That’s all!

2008/04/18

Creating Puppet Modules

Filed under: Philosophy, Puppet | plathrop @ 3:56 pm

In Puppet parlance, a module is a collection of manifests (including classes and definitions), templates, and files which, taken together, describe a recipe for configuring something via Puppet. Puppet has a number of facilities which make modules incredibly useful and labor-saving.

Modules have a standard internal organization which is described in detail on the ModuleOrganisation wiki page. A trivial module can be as simple as a manifests directory containing only a single manifest: init.pp. However, as modules grow more complex, you’ll want to break up your manifests and add templates in a templates directory, files in a files directory, and a README file which explains how to make use of your module.

I’m in the process of polishing up several modules, including LDAP, Kerberos, memcached, and ntp modules. In some ways, I’m duplicating work; several implementations of these modules are available. However, Puppet modules are still evolving, and I wanted to try my hand at module writing. Also, the existing modules did not work quite right for us. Some of them fail to properly isolate site-specific information from the recipe, for example. Others had complex interdependencies that I didn’t like.

There are several techniques that I think will evolve as “best practices” for Puppet module design. These are:

  1. Keep site-specific customization out of your modules.
  2. Your module should implement a minimal working configuration “out of the box”.
  3. Use variables to make it easier to patch your module for use on other platforms.
  4. Break a module up into sensible classes and defined types to make it easy to customize via inheritance.
  5. Use init.pp for module-wide variables and/or common functionality, but break all other classes/defines into separate files.

Let’s take a look at each of these in detail:

Keep site-specific customization out of your modules.

Modules are, at heart, meant to be shared with others. Try to write your modules so they are as site-neutral as they can be. Use variables that can be set in a higher-level scope like site.pp to control how the module works, instead of baking your site’s settings into the module. For example, in my LDAP module I do this:

class ldap::common {
  case $ldap_base_dn {
    "": { $ldap_base_dn = "dc=example,dc=com"
      warning("ldap_base_dn not set, using default $ldap_base_dn")
    }
  }
 
  case $ldap_admin_dn {
    "": { $ldap_admin_dn = "cn=admin,$ldap_base_dn"
      warning("ldap_admin_dn not set, using default $ldap_admin_dn")
    }
  }
 
  case $ldap_admin_password {
    "": { fail("ldap_admin_password not set!")
    }
  }
  # ... (remainder of the class omitted)
}

Then, in site.pp, I set the variables appropriately:

# Site Variables
$ldap_server           = "ash001.example.com"
$ldap_admin_password   = "testing"
$ldap_base_dn          = "dc=example,dc=com"
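
For readers who don’t read Puppet’s DSL yet, the pattern is the familiar one of “fall back to a default and warn, but fail hard on anything that has no sane default”. Here is a rough Python 3 analogue, purely as an illustration; resolve_ldap_settings and the dictionary-based interface are inventions for this sketch and not part of the module.

```python
import logging

log = logging.getLogger("ldap")


def resolve_ldap_settings(site):
    # `site` plays the role of the variables set in site.pp.
    settings = dict(site)

    if not settings.get("ldap_base_dn"):
        settings["ldap_base_dn"] = "dc=example,dc=com"
        log.warning("ldap_base_dn not set, using default %s",
                    settings["ldap_base_dn"])

    if not settings.get("ldap_admin_dn"):
        # The default admin DN is derived from the (possibly defaulted) base DN.
        settings["ldap_admin_dn"] = "cn=admin,%s" % settings["ldap_base_dn"]
        log.warning("ldap_admin_dn not set, using default %s",
                    settings["ldap_admin_dn"])

    if not settings.get("ldap_admin_password"):
        # No safe default exists for a password, so refuse to continue.
        raise ValueError("ldap_admin_password not set!")

    return settings


print(resolve_ldap_settings({"ldap_admin_password": "testing"}))
```

The point is the asymmetry: anything with a harmless default gets one (loudly), while a missing password stops the run.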

Your module should implement a minimal working configuration “out of the box”.

This principle also follows from the fact that modules are meant to be shared. When someone is looking for a module, they are going to want something that works with little-to-no configuration, because this will give them confidence that the module will work once it is fully configured. As with full-blown pieces of software, if a module doesn’t have an easy initial setup, people will wonder whether it even works. My LDAP module provides an ldap::master class which implements a very basic configuration: the slapd package is installed, a working configuration file is set up (without SSL or any of the other goodies), and the service is started. If the user sets just one variable, $ldap_admin_password, in their site.pp and includes ldap::master on a node, they will be able to verify that LDAP is up and running with an example configuration. Even better, if they set the other variables, this configuration will be customized to their site with little effort on their part. It might be better to add in all the bells and whistles (SSL at a minimum), but I’m not sure yet where I stand on that.

Use variables to make it easier to patch your module for use on other platforms.

Currently, in the init.pp of my LDAP module, I set variables like this:

  $ldappackage       = "slapd"
  $ldapservice       = "slapd"
  $ldapdir           = "/etc/ldap"
  $ldaputilpackage   = "ldap-utils"
  $ldapclientpackage = "libnss-ldap"

and use them like this:

  package {
    $ldappackage:     ensure => installed;
  }
 
  file { "$ldapdir/slapd.conf":
    content => template("ldap/slapd.conf.erb"),
    require => Package[$ldappackage],
    notify  => Service["$ldapservice"],
  }
 
  service { $ldapservice:
    require   => [ Package[$ldappackage], File["$ldapdir/slapd.conf"] ],
    ensure    => running,
    enable    => true,
  }

These variables are set up for Debian now, because that is the distribution I’ve standardized on. If I (or someone I’ve shared the module with) later want to add support for another distribution, we can set up case statements without having to modify the rest of the module!

  case $operatingsystem {
    debian: {
      $ldappackage       = "slapd"
      $ldapservice       = "slapd"
      $ldapdir           = "/etc/ldap"
      $ldaputilpackage   = "ldap-utils"
      $ldapclientpackage = "libnss-ldap"
    }
    centos: {
      $ldappackage       = "openldap"
      $ldapservice       = "slapd"
      $ldapdir           = "/etc/openldap"
      $ldaputilpackage   = "openldap-utils"
      $ldapclientpackage = "libnss-ldap"
    }
  }

I have not tested this, as I don’t use CentOS, so don’t use these!

Break a module up into sensible classes and defined types to make it easy to customize via inheritance.

Since you’ve pulled all the site-specific customization out of your module, and you are providing a minimal working configuration, people are going to want to do other things with your module that are appropriate to their site. If you have everything lumped into one big “ldap” class, this is going to be difficult for them. I break things up into ldap::common for things common to all the other classes, ldap::client for things needed to query the LDAP servers, ldap::master for the primary LDAP server, and later I’ll provide ldap::slave for replication slaves.

Use init.pp for module-wide variables and/or common functionality, but break all other classes/defines into separate files.

This is for your sanity as well as the sanity of those who might want to modify your module. I started with everything in one big init.pp file and it rapidly went out of control. Puppet does some awesome automagical lookups that you can take advantage of. For example, my ldap::master class is defined in a file called master.pp; when someone tries to load ldap::master, Puppet automatically searches for a .pp file named “master” in the “ldap” module!
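
As a rough mental model of that lookup (a simplification, not Puppet’s actual implementation), the class name maps onto a path inside the module’s manifests directory. The manifest_path helper below is hypothetical, and /etc/puppet/modules is simply an assumed module directory.

```python
import posixpath


def manifest_path(class_name, module_dir="/etc/puppet/modules"):
    # "ldap::master" -> module "ldap", file "master.pp".
    # A bare module name such as "ldap" maps to init.pp.
    parts = class_name.split("::")
    module, rest = parts[0], parts[1:]
    if not rest:
        rest = ["init"]
    return posixpath.join(module_dir, module, "manifests", *rest) + ".pp"


print(manifest_path("ldap"))
print(manifest_path("ldap::master"))
print(manifest_path("ldap::client"))
```

So splitting a module into one class per file is not just tidier; it is what lets the class names double as file names.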

So far, this is all the wisdom I have to impart on the subject. I’ll be sure to post links here when these modules are ready for prime-time.

2008/03/24

SSH tab completion

Filed under: Quick Tips, Tools | plathrop @ 9:57 am

This is awesome. Put the following into your ~/.bash_profile file:

complete -W "$(echo `cat ~/.ssh/known_hosts | cut -f 1 -d ' ' \
| sed -e s/,.*//g | uniq | grep -v "\["`;)" ssh

Open a new shell and type ssh tab tab. I love tab completion, don’t you?
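
If the pipeline looks opaque, here is a loose Python 3 model of what each stage does to the lines of known_hosts. The completion_words function and the sample lines (with truncated, made-up keys) are for illustration only; the one-liner above is what actually goes in your profile.

```python
def completion_words(known_hosts_lines):
    # cut -f 1 -d ' ' keeps the first space-separated field;
    # sed -e s/,.*//g then drops everything from the first comma on.
    names = [line.split(" ")[0].split(",")[0].strip()
             for line in known_hosts_lines]
    # uniq collapses adjacent duplicates only.
    deduped = [name for i, name in enumerate(names)
               if i == 0 or name != names[i - 1]]
    # grep -v "\[" drops entries such as [host]:2222; empty entries
    # vanish too, as they would under shell word splitting.
    return [name for name in deduped if name and "[" not in name]


# Hypothetical known_hosts content.
sample = [
    "server1.example.com,192.0.2.10 ssh-rsa AAAA...",
    "server1.example.com,192.0.2.10 ssh-dss AAAA...",
    "[bastion.example.com]:2222 ssh-rsa AAAA...",
    "server2.example.com ssh-rsa AAAA...",
]
print(completion_words(sample))
```

Note that hashed known_hosts entries (HashKnownHosts) carry no readable host names, so neither the shell pipeline nor this model can complete them.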

2008/03/12

Virtualization Article

Filed under: Tools | plathrop @ 10:41 am

This article might be interesting to some of you.

2008/02/03

Capistrano

Filed under: Tools | plathrop @ 2:51 pm

One of the things you quickly learn as you become a more sophisticated systems administrator is that performing the same task on several servers is both tedious and error-prone. Of course, if your infrastructure has been deployed in an ad-hoc manner, sometimes you don’t have a choice; the task is unique on each machine simply because each machine has a unique configuration. However, once you have made the transition to stock configurations on multiple machines, you gain the ability to use new tools to perform tasks in parallel on a number of systems at once. The tendency for many admins is to set up public-key authentication and hand-roll a script. This isn’t necessarily a bad approach; sometimes a quick-and-dirty script is all you really need to get the job done. On the other hand, there are tools which can make the process more elegant, consistent, and professional. The two I know of are ClusterSSH and Capistrano.

I’m fairly new to Capistrano, but I’ve been using it to perform some simple tasks for a couple of months, and I’m really pleased with the results. I first encountered Capistrano when I was looking for a simple way to remove the SNMP agent from a group of about 20 machines. I knew I could whip up a script, but I had remembered reading a blog post somewhere (sorry, I don’t remember where) about ClusterSSH; several Google searches later I had found both ClusterSSH and Capistrano. At the time, I was learning Ruby (in order to get involved in Puppet development), so Capistrano seemed the natural choice.

According to the website, Capistrano was originally written to help deploy Ruby on Rails applications. Like many tools, it has grown beyond its initial mission, and is now a powerful tool for systems administration. Capistrano makes some assumptions about your infrastructure. First, you must be using SSH to access your systems, because Capistrano does not support older methods such as telnet (and if you are still using telnet, shame on you!) Second, your systems must have a POSIX shell in the default system path. For most *nix systems these days, this is a given. Finally, to *really* utilize Capistrano correctly, you should be using public-key authentication to access your servers. You don’t necessarily need to use password-less keys (in fact, I suggest you resist that temptation in general). Just use ssh-agent to keep your passphrase in memory. All of these assumptions are true for my environment, so I got started.

Installation was trivial using RubyGems. Next I needed to create a “capfile” – the Capistrano equivalent of a “Makefile”. The “capfile” gives Capistrano information about your environment and defines “roles” and “tasks”. “Roles” describe groups of systems, “webservers” or “firewalls” for example. “Tasks” are the actions you wish to perform. After reading through the Basics on the Capistrano website, it was fairly simple to define a task to do what I wanted:

task :remove_snmp do
  run "sudo aptitude -y purge snmp"
end

One complication of my environment is that most of the systems are not directly accessible from outside. We have an SSH “bastion host” which acts as a gateway to the internal network. Luckily, Capistrano is ready for this. I added the following to the beginning of my capfile:

set :gateway, "ssh-gateway.example.com"

This tells Capistrano to first establish an SSH connection to ssh-gateway.example.com, and connect from that machine to the systems we want to run commands on. “But how does Capistrano know what systems to run commands on?” you ask. Good question! That is what we need a “role” for:

role :de_snmp, "server1.example.com", "server2.example.com", "server3.example.com", "server4.example.com", "server5.example.com", "server6.example.com"

Before you run a task on all of your systems at once, you probably want to test it first. I usually try the task by hand on one or two systems. Once I am happy with the procedure and the results, I pick several less-critical (or easy to roll back) hosts to do a second pass on. Finally, after all is well with those hosts, I run the task on the full group. Here is our completed “capfile” for the second pass:

set :gateway, "ssh-gateway.example.com"
 
role :de_snmp, "server1.example.com", "server2.example.com", "server3.example.com", "server4.example.com", "server5.example.com", "server6.example.com"
 
task :remove_snmp, :roles => :de_snmp do
  run "sudo aptitude -y purge snmp"
end

As you can see, we’ve modified our “task” definition to apply to the “role” we created. All we have to do now is run cap remove_snmp and Capistrano will do its job! There is plenty of output, so it is fairly easy to review what is happening (though it can be challenging to follow the events on a single server unless you know your way around grep; you do, right?)

This is just the tip of the iceberg. Capistrano is very sophisticated: tasks can be self-documenting, divided into namespaces, parameterized with variables, and more! Perhaps the most powerful feature is the ability to define “transactions” and “rollback” functions. Although you still have to manually define what a “rollback” means, Capistrano allows you to do that once, and use the capability as often as necessary. In addition, you have the full power of Ruby at your command in Capistrano scripts. I haven’t had the need to explore these features yet, but as I do, I’ll be sure to share my experiences here!

2008/01/02

Puppet Modules and Apt

Filed under: Puppet | admin @ 4:15 pm

One of the most exciting features of Puppet is the way it enables the sharing of systems administration tools – “recipes” in the Puppet parlance. Puppet allows us to collect manifests, definitions, and even plugins into modules which can then be shared with others.

David Schmitt is the author of the Complete Configuration found on the Puppet wiki. This is an amazing example of what can be done with Puppet, but can be a bit overwhelming at first. Happily, it is reasonably easy to integrate David’s modules into your Puppet configuration one at a time. Today I’d like to show you how to use David’s apt module to manage the dpkg & apt databases, as well as the keyrings associated with them.

In order to get started with David’s modules, we’ll need to take a few steps. The first step is upgrading to the latest release of Puppet. In an earlier post I advocated using the Debian packages; on further reflection, it might be best to install from source (or roll your own package). Puppet is a bit of a moving target right now, and you’ll definitely want to keep up. Installing from source is easy enough. On Debian Etch, I used the following steps to install from source:

aptitude remove puppet puppetmaster
aptitude install libopenssl-ruby libshadow-ruby1.8 libxmlrpc-ruby
curl -kO https://reductivelabs.com/downloads/facter/facter-1.3.7.tgz
tar xzf facter-1.3.7.tgz
curl -O http://reductivelabs.com/downloads/puppet/puppet-0.24.1.tgz
tar xzf puppet-0.24.1.tgz
cd facter-1.3.7
ruby install.rb
cd ../puppet-0.24.1
ruby install.rb

In short: remove the old version, install the prerequisite ruby libraries, download the latest source of Facter and Puppet, and install them both. The init scripts didn’t work after this, so I rolled my own.

The new version of Puppet allows module authors to distribute plugins within their modules. However, to make use of this, we need to add a couple of lines to the [main] section of our puppet.conf file:

  pluginsync = true
  pluginsource = puppet://$server/plugins

You’ll want to create a “modules” subdirectory to store your modules in. According to the Puppet Best Practices page, this directory should be in the same place as your “manifests” directory. For me, that means /etc/puppet:

total 36K
drwxr-xr-x  6 root root 4.0K 2008-01-02 13:47 .
drwxr-xr-x 71 root root 4.0K 2008-01-02 13:44 ..
drwxr-xr-x  7 root root 4.0K 2008-01-02 13:47 files
-rw-r--r--  1 root root  664 2008-01-02 13:47 fileserver.conf
drwxr-xr-x  4 root root 4.0K 2008-01-02 13:57 manifests
drwxr-xr-x  4 root root 4.0K 2008-01-02 13:47 modules
-rw-r--r--  1 root root  250 2008-01-02 13:47 puppet.conf

Next, we’ll want to download the modules we wish to use. All of David’s modules depend on the “common” module, so we will want “common” and “apt”. David makes it easy to pull down each module independently. Here’s what I did:

cd /etc/puppet/modules
mkdir common
cd common
git init
git remote add -t common -m common -f origin git://git.black.co.at/manifests/
git pull
cd /etc/puppet/modules
mkdir apt
cd apt
git init
git remote add -t apt -m apt -f origin git://git.black.co.at/manifests/
git pull

You should now have local copies of both the “common” module and the “apt” module. If you read the documentation, you will notice that these modules require the installation of the “lsb-release” package. I’m not sure why David doesn’t have the module install the package, but he doesn’t. So, let’s have puppet install it. A new “package” resource in our default node should do the trick:

  package { sudo: ensure => installed }
  package { lsb-release: ensure => installed }
}

Next, we need to integrate the new modules into our puppet configuration. Now that we’re getting into more complex configurations, we’ll want to start breaking our puppet configuration into separate files. In the same directory as site.pp (/etc/puppet/manifests), create a file called modules.pp containing:

import "common"
import "apt"

As you add more modules, you’ll import them here.

We won’t be explicitly using the common module at this time, so all we need to do is add two lines to site.pp. At the top of the file, we’ll want:

import "modules.pp"

to pull in the file we just created. We also want to update the default node to make use of the apt module:

node default {
  include apt

  file { '/root/.ssh/authorized_keys':

It turns out that David’s modules also depend on a filebucket resource named “server”. This is a site-wide resource, so site.pp is the ideal place to put it:

filebucket { server:
  server => "server1.example.com" }

node default {

After all these modifications, our site.pp looks like this:

import "modules.pp"

filebucket { server:
  server => "server1.example.com" }

node default {
  include apt

  file { '/root/.ssh/authorized_keys':
    owner => root,
    group => root,
    mode => 644,
    source => 'puppet:///root/.ssh/authorized_keys'
  }

  file { '/etc/sudoers':
    owner => root,
    group => root,
    mode => 440,
    source => 'puppet:///etc/sudoers',
    require => [ Package["sudo"] ]
  }

  package { sudo: ensure => installed }
  package { lsb-release: ensure => installed }
}

I had to do one more thing to get this to really work. Reading through David’s module, you would expect /var/lib/puppet/modules/apt to be created by Puppet. Unfortunately, this doesn’t seem to happen. I had to create this directory by hand. I’ll let you know if/when I figure out why.

For basic usage of the module, this is all you need! David’s module defaults to a very sane basic apt configuration. The only tweak I felt was necessary was a modification to modules/apt/templates/sources.list.erb to use a US Debian mirror. You can modify this file to your liking for your default sources.list. You can also use the $custom_sources_list variable in a node definition to provide a customized file for a specific node:

node "oddball.example.com" {
  $custom_sources_list = "#This box is located in Canada, so use a local mirror
deb http://ftp.ca.debian.org/debian etch main contrib non-free
deb http://security.debian.org/ etch/updates main contrib non-free"
  include "apt"
}

David has a number of other modules I’d like to try; as I get them working I’ll be sure to share my experiences here.

2007/12/15

Clarification

Filed under: Puppet | admin @ 9:55 pm

As I write these entries about Puppet, I want to make it clear that the configurations I post may not necessarily be “best practices”. I am attempting to follow a logical evolution of the steps involved in becoming familiar with Puppet. As the configuration grows more complex, I will introduce the various best practices which make complex Puppet configurations manageable. If you are looking to dive straight into a fully-realized Puppet configuration, take a look at the Complete Configuration page of the Puppet wiki.
