While discussing the proper class hierarchy for adding our Netscalers to Clusto, we decided that Appliance was the proper name for the base class.
This of course led to the idea of coding up a driver for a dishwasher. clusto do-my-dishes FTW!
Recently at Digg, we wanted to open up the git repository containing our Puppet manifests so that our developers could work with it. However, we wanted to maintain some control over which branches they could push to, in order to prevent accidental commits to the production manifests. In addition to this fine-grained authorization, we wanted the ability to force their development branch to track ours; we always want them working from the latest code we’ve committed.
There are some heavier-weight tools we could use to accomplish this, but we aren’t yet ready to move to a more complicated tool chain. I was pretty sure we could get the job done with a git update hook. I was right:
#!/usr/bin/env python
# Git update hook: per-user push ACLs and forced branch tracking.
# Git runs it once per pushed ref with three arguments: the ref name,
# the old revision, and the new revision.
import os
import re
from sys import argv, exit
from subprocess import Popen, PIPE

def blank(line):
    return not line.strip()

def uncomment(line):
    return line.partition('#')[0]

def cleanup(lines):
    # Strip comments first (full-line comments become empty), then
    # drop every line that is left blank.
    lines = [uncomment(line) for line in lines]
    return [line for line in lines if not blank(line)]

def parse_acl():
    try:
        with open('acl') as fh:
            lines = cleanup(fh.readlines())
    except IOError:
        print("Could not open ACL file. Exiting.")
        exit(1)
    # The ACL lines are whitespace separated. The first field is the
    # username, the second field is a regex describing the refs the
    # user is allowed to update.
    access = {}
    for line in lines:
        record = line.split()
        if len(record) < 2:
            continue  # Ignore malformed lines instead of crashing.
        access.setdefault(record[0], []).append(record[1])
    return access

def authorize(refname, user, access):
    # The regex must match the entire ref name.
    for a_ref in access.get(user, []):
        if re.search('^%s$' % a_ref, refname):
            return True
    return False

def parse_git_config(config_entry):
    # git prints "true" or "false", or nothing if the option is unset.
    output = Popen(['git', 'config', '--bool', config_entry],
                   stdout=PIPE, universal_newlines=True).communicate()[0]
    return output.strip() == 'true'

def parse_branch_tracking():
    try:
        with open('branch_tracking') as fh:
            lines = cleanup(fh.readlines())
    except IOError:
        print("Could not open branch tracking file. Exiting.")
        exit(1)
    # Branch tracking lines are whitespace separated. The first field
    # is the branch that must track the branch specified in the second
    # field.
    tracking = {}
    for line in lines:
        record = line.split()
        if len(record) < 2:
            continue
        tracking[record[0]] = record[1]
    return tracking

def get_tracked_branch(refname):
    return parse_branch_tracking().get(refname)

def get_missing_refs(ref, base_ref):
    # git rev-list ref..base_ref gives us the commits reachable from
    # base_ref that are NOT reachable from ref.
    output = Popen(['git', 'rev-list', '%s..%s' % (ref, base_ref)],
                   stdout=PIPE, universal_newlines=True).communicate()[0]
    return output.splitlines()

def main():
    # The args are passed in by git.
    refname, old_rev, new_rev = argv[1:4]
    user = os.environ['USER']

    # Check if 'push acl' is enabled.
    if parse_git_config('hooks.pushacl'):
        # Check the user's authorization to update this git ref.
        access = parse_acl()
        if not authorize(refname, user, access):
            print("Could not update %s, permission denied by ACL." % refname)
            exit(1)

    # Check if 'forced branch tracking' is enabled. An all-zero new_rev
    # means the ref is being deleted, so there are no commits to check.
    deleting = not new_rev.strip('0')
    if parse_git_config('hooks.forcebranchtracking') and not deleting:
        tbranch = get_tracked_branch(refname)
        if tbranch:
            # Get the commits that are reachable from our tracked branch
            # but NOT reachable from our new revision.
            if get_missing_refs(new_rev, tbranch):
                print("Could not update %s, you need to merge %s."
                      % (refname, tbranch))
                exit(1)

if __name__ == '__main__':
    main()
This implements two boolean git config options, each of which you enable by setting it to true:
git config --bool hooks.pushacl true
git config --bool hooks.forcebranchtracking true
Both are meant to be set on a bare git repository (git init --bare), with the script above installed in that repository as an executable hooks/update. The first option causes the update hook to look for a file called ‘acl’ in the root of the bare repo. Here is the ACL file I’m using for our Puppet configs right now:
plathrop refs/.*/.*
ron refs/.*/.*
synack refs/.*/.*
wfrancis refs/.*/.*
kad refs/.*/.*
mike refs/.*/.*
rcoli refs/heads/(development|(rcoli/.*)){1}
goffinet refs/heads/experimental
rich refs/heads/experimental
kelvin refs/heads/experimental
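Each entry is a username followed by a regular expression matching the git “refs” that user may update. The hook anchors each expression at both ends, so it must match the whole ref name. So I have full access, rcoli can push to the development branch or to any branch whose name starts with “rcoli/”, and goffinet and his fellow developers can push only to the experimental branch. Neither of the latter groups can create tags, because tags live under refs/tags/.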
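You can check which refs an ACL actually grants before installing it by reproducing the matching rule outside the hook. Below is a minimal sketch that assumes the hook's semantics described above: each pattern is anchored with ^ and $, so it must match the whole ref name. ACL_TEXT, parse_acl_text and may_push are hypothetical names used only for this illustration. They mirror the hook's own parse_acl and authorize but are not the same functions.

```python
import re

# Hypothetical in-memory copy of a few entries from the ACL file above.
ACL_TEXT = """
plathrop refs/.*/.*
rcoli    refs/heads/(development|(rcoli/.*)){1}
goffinet refs/heads/experimental
"""

def parse_acl_text(text):
    """Map each username to the list of ref regexes it may update."""
    access = {}
    for line in text.splitlines():
        fields = line.partition('#')[0].split()
        if len(fields) >= 2:
            access.setdefault(fields[0], []).append(fields[1])
    return access

def may_push(user, refname, access):
    """True if any of the user's patterns matches the entire ref name."""
    return any(re.match('^%s$' % pattern, refname)
               for pattern in access.get(user, []))

access = parse_acl_text(ACL_TEXT)
for user, ref in [('rcoli', 'refs/heads/development'),
                  ('rcoli', 'refs/heads/rcoli/dns-cleanup'),
                  ('rcoli', 'refs/tags/v1.0'),
                  ('goffinet', 'refs/heads/experimental'),
                  ('goffinet', 'refs/heads/experimental2'),
                  ('plathrop', 'refs/tags/v1.0')]:
    print(user, ref, may_push(user, ref, access))
```

The anchoring is what stops goffinet from pushing refs/heads/experimental2. Without the ^ and $, the pattern refs/heads/experimental would also match that longer ref name.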
The second option is probably badly named. It looks for a file called branch_tracking in the root of the bare repo. That file looks like this:
refs/heads/experimental refs/heads/development
When hooks.forcebranchtracking is set, the hook rejects any update to the branch on the left unless the new revision already contains every commit on the branch on the right. Essentially, this forces the experimental branch to track the development branch, and it requires regular runs of git pull to stay in sync.
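Under the hood, the check is a set difference on commit history. git rev-list new_rev..tracked_branch lists the commits reachable from the tracked branch but not from the pushed revision, and any output means the push is refused. The sketch below reproduces that logic on a small, made-up commit graph so the rule is easy to see without a repository. PARENTS and the one-letter commit names are hypothetical.

```python
# Hypothetical toy commit graph: each commit maps to its parent commits.
PARENTS = {
    'a': [],
    'b': ['a'],        # development: a -> b -> c
    'c': ['b'],
    'x': ['b'],        # experimental branched off development at b
    'm': ['x', 'c'],   # experimental after merging development
}

def reachable(commit, parents):
    """All commits reachable from `commit` by following parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def missing_commits(new_rev, tracked_tip, parents):
    """Commits in tracked_tip's history but not in new_rev's history,
    i.e. what `git rev-list new_rev..tracked_tip` would list."""
    return reachable(tracked_tip, parents) - reachable(new_rev, parents)

print(sorted(missing_commits('x', 'c', PARENTS)))  # ['c']  -> rejected
print(sorted(missing_commits('m', 'c', PARENTS)))  # []     -> accepted
```

Pushing x would be refused because development's commit c is missing. After development is merged (commit m), nothing is missing and the push goes through.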
With the (new?) ability for Puppet to search multiple module directories (its modulepath setting), the idea occurred to me: why not package a module as a Debian package?
The packaging process was relatively easy. Check out the results on GitHub.
Enjoy!
What should you name your hosts?
I used to believe this was an easy problem. Pick a theme, like Star Wars characters or constellations, and then name machines based on that theme: “chewbacca.tertiusfamily.net”, for example. Use DNS CNAMEs to point your “functional” names (webapp1, db1, or whatever) at the themed names and you’re off. Then I started working for Digg.
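Here is a minimal sketch of the CNAME half of that scheme: zone-file records generated from a mapping of functional names to themed names. The mapping, the second hostname, and the cname_records helper are hypothetical. The output is plain BIND-style zone text and is not tied to any particular DNS server's API.

```python
# Hypothetical mapping of functional names to themed host names.
FUNCTIONAL_NAMES = {
    'webapp1': 'chewbacca',
    'db1': 'leia',
}

def cname_records(mapping, domain):
    """Render one zone-file CNAME line per functional name, sorted by alias.
    Targets are written as fully qualified names with a trailing dot."""
    return ['%s IN CNAME %s.%s.' % (alias, target, domain)
            for alias, target in sorted(mapping.items())]

for record in cname_records(FUNCTIONAL_NAMES, 'tertiusfamily.net'):
    print(record)
```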
Digg’s infrastructure is massively horizontally scaled. That means clusters, which means *lots* of individual nodes. No matter what “theme” you choose, you run out of names awfully fast. Not only that, but when you start building out a service-oriented architecture, you stop wanting to deal with individual nodes and start thinking in terms of clusters that provide services.
At the Puppet training I attended in Portland, we discussed the issue, and I found out that there are two camps (well, as many camps as there are sysadmins, but they can be broadly split into two). On one side are those who want to put metadata into their hostnames; on the other are those who think hostnames should be as arbitrary as possible. There are strong arguments either way.
I think I come down firmly on the side of arbitrary hostnames. I’m coming to believe that, since IP addresses are necessarily unique, they should be the only unique identifier for an individual node. But I can’t come up with a solid argument as to why that is the best way.
What do you do about naming?
If you want to use Fink, and build a bunch of crap you don’t need, follow these instructions.
If you are like me and prefer to use the libraries that are already installed, just do this:
and extract it.
library_dirs = /Developer/SDKs/MacOSX10.5.sdk/usr/lib/
include_dirs = /Developer/SDKs/MacOSX10.5.sdk/usr/include/ /Developer/SDKs/MacOSX10.5.sdk/usr/include/sasl
That’s all!
I’m at Velocity 2008 right now. So far I’ve learned several things:
At least I managed to score some Puppet schwag (whoever made these shirts has a different definition of XXL than me…) and managed to track down Luke Kanies and Andrew Shafer of Reductive Labs to say “Hi.” Haven’t socialized much with them; they look like they’re working.
About to watch a talk on measuring performance, a topic I’ve always had big questions about. Hopefully I’ll learn something good. I’m kinda regretting my decision to sit towards the front, though.
In Puppet parlance, a module is a collection of manifests (including classes and definitions), templates, and files which, taken together, describe a recipe for configuring something via Puppet. Puppet has a number of facilities which make modules incredibly useful and labor-saving.
Modules have a standard internal organization which is described in detail on the ModuleOrganisation wiki page. A trivial module can be as simple as a manifests directory containing only a single manifest: init.pp. However, as modules grow more complex, you’ll want to break up your manifests and add templates in a templates directory, files in a files directory, and a README file which explains how to make use of your module.
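As a rough sketch, the skeleton described above can be created with a few lines of Python. The make_module_skeleton helper and its placeholder file contents are hypothetical. The directory names (manifests, templates, files), init.pp and the README come from the layout described here, and the ModuleOrganisation page remains the authority on the full structure.

```python
import os
import tempfile

def make_module_skeleton(modulepath, name):
    """Create a minimal module layout under modulepath/name; return its root."""
    root = os.path.join(modulepath, name)
    for subdir in ('manifests', 'templates', 'files'):
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
    # Placeholder contents; files that already exist are never overwritten.
    placeholders = {
        os.path.join('manifests', 'init.pp'): 'class %s {\n}\n' % name,
        'README': 'Explain here how to use the %s module.\n' % name,
    }
    for relpath, content in placeholders.items():
        path = os.path.join(root, relpath)
        if not os.path.exists(path):
            with open(path, 'w') as fh:
                fh.write(content)
    return root

# Usage: build a skeleton for an "ldap" module in a scratch directory
# and list the files it created.
root = make_module_skeleton(tempfile.mkdtemp(), 'ldap')
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()
    for filename in sorted(filenames):
        print(os.path.relpath(os.path.join(dirpath, filename), root))
```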
I’m in the process of polishing up several modules, including LDAP, Kerberos, memcached, and ntp modules. In some ways, I’m duplicating work; several implementations of these modules are available. However, Puppet modules are still evolving, and I wanted to try my hand at module writing. Also, the existing modules did not work quite right for us. Some of them fail to properly isolate site-specific information from the recipe, for example. Others had complex interdependencies that I didn’t like.
There are several techniques that I think will evolve as “best practices” for Puppet module design. These are:
Let’s take a look at each of these in detail:
Modules are, at heart, meant to be shared with others. Try to write your modules so they are as site-neutral as they can be. Use variables that can be set in a higher-level scope like site.pp to control how the module works, instead of baking your site’s settings into the module. For example, in my LDAP module I do this:
class ldap::common {
  case $ldap_base_dn {
    "": {
      $ldap_base_dn = "dc=example,dc=com"
      warning("ldap_base_dn not set, using default $ldap_base_dn")
    }
  }
  case $ldap_admin_dn {
    "": {
      $ldap_admin_dn = "cn=admin,$ldap_base_dn"
      warning("ldap_admin_dn not set, using default $ldap_admin_dn")
    }
  }
  case $ldap_admin_password {
    "": {
      fail("ldap_admin_password not set!")
    }
  }
}
Then, in site.pp, I set the variables appropriately:
# Site Variables
$ldap_server = "ash001.example.com"
$ldap_admin_password = "testing"
$ldap_base_dn = "dc=example,dc=com"
This principle is also tied to the fact that modules are meant to be shared. Someone looking for a module will want one that works with little to no configuration, because that gives them confidence it will work once it is fully configured. Just as with full-blown software, if a module doesn’t have an easy initial setup, people will be unsure whether it even works. My LDAP module provides an ldap::master class which implements a very basic configuration: the slapd package is installed, a working configuration file is set up (without SSL or any of the other goodies), and the service is started. If the user sets just one variable, $ldap_admin_password, in their site.pp and includes ldap::master on a node, they will be able to verify that LDAP is up and running with an example configuration. Even better, if they set the other variables, the configuration will be customized to their site with little effort on their part. It might be better to include all the bells and whistles (SSL at a minimum), but I’m not sure yet where I stand on that.
Currently, in the init.pp of my LDAP module, I set variables like this:
$ldappackage = "slapd"
$ldapservice = "slapd"
$ldapdir = "/etc/ldap"
$ldaputilpackage = "ldap-utils"
$ldapclientpackage = "libnss-ldap"
and use them like this:
package { $ldappackage:
  ensure => installed;
}

file { "$ldapdir/slapd.conf":
  content => template("ldap/slapd.conf.erb"),
  require => Package[$ldappackage],
  notify  => Service[$ldapservice],
}

service { $ldapservice:
  require => [ Package[$ldappackage], File["$ldapdir/slapd.conf"] ],
  ensure  => running,
  enable  => true,
}
These variables are set up for Debian, because that is the distribution I’ve standardized on. If I (or someone I’ve shared the module with) later want to add support for another distribution, the assignments can be wrapped in a case statement without having to modify the rest of the module!
case $operatingsystem {
  debian: {
    $ldappackage = "slapd"
    $ldapservice = "slapd"
    $ldapdir = "/etc/ldap"
    $ldaputilpackage = "ldap-utils"
    $ldapclientpackage = "libnss-ldap"
  }
  centos: {
    $ldappackage = "openldap-servers"
    $ldapservice = "slapd"
    $ldapdir = "/etc/openldap"
    $ldaputilpackage = "openldap-clients"
    $ldapclientpackage = "nss_ldap"
  }
}
I have not tested this, as I don’t use CentOS, so don’t use these!
Since you’ve pulled all the site-specific customization out of your module, and you are providing a minimal working configuration, people are going to want to do other things with your module that are appropriate to their site. If you have everything lumped into one big “ldap” class, this is going to be difficult for them. I break things up into ldap::common for things common to all the other classes, ldap::client for things needed to query the LDAP servers, and ldap::master for the primary LDAP server; later I’ll provide ldap::slave for replication slaves.
This is for your sanity as well as the sanity of those who might want to modify your module. I started with everything in one big init.pp file and it rapidly got out of control. Puppet does some awesome automagical lookups that you can take advantage of. For example, my ldap::master class is defined in a file called master.pp; when someone tries to load ldap::master, Puppet automatically looks for master.pp in the manifests directory of the “ldap” module!
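The lookup can be sketched as a mapping from class name to file path. manifest_path is a hypothetical helper. It follows the autoloader behaviour described above and extends it to nested names in the way current Puppet releases handle them, as I understand it: a class named after the module lives in manifests/init.pp, and each further :: segment becomes a subdirectory under manifests, with the last segment as the .pp file name. It only computes paths and does not call Puppet.

```python
import posixpath

def manifest_path(class_name):
    """Where the autoloader looks for class_name, relative to a directory
    on the modulepath (hypothetical helper, for illustration only)."""
    module, *rest = class_name.split('::')
    if not rest:
        # The class named after the module itself lives in init.pp.
        return posixpath.join(module, 'manifests', 'init.pp')
    # Further namespace segments become subdirectories; the last one
    # becomes the file name.
    return posixpath.join(module, 'manifests', *rest) + '.pp'

for name in ('ldap', 'ldap::master', 'ldap::client'):
    print(name, '->', manifest_path(name))
```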
So far, this is all the wisdom I have to impart on the subject. I’ll be sure to post links here when these modules are ready for prime-time.
This is awesome. Put the following into your ~/.bash_profile file:
complete -W "$(cut -f 1 -d ' ' ~/.ssh/known_hosts | sed -e 's/,.*//' | grep -v -e '\[' -e '^|' | sort -u)" ssh
Open a new shell and type ssh tab tab. I love tab completion, don’t you?
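If you want the same host list elsewhere (for a completion script in another tool, say), the pipeline's logic translates directly. This is a minimal sketch; known_host_names and the SAMPLE entries are hypothetical. Like the shell version, it keeps the first name of each entry and skips [host]:port entries. It also skips two kinds of lines ssh writes that have no usable hostname: hashed entries (written when HashKnownHosts is enabled) and comment lines.

```python
def known_host_names(lines):
    """Return sorted, de-duplicated host names from known_hosts lines."""
    names = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith('#'):
            continue  # Blank line or comment.
        host = fields[0].split(',')[0]  # First of "host,ip,..." names.
        if '[' in host or host.startswith('|'):
            continue  # "[host]:port" entry or hashed "|1|..." entry.
        names.add(host)
    return sorted(names)

# Hypothetical known_hosts contents (key material shortened).
SAMPLE = [
    'web1.example.com,10.0.0.5 ssh-rsa AAAAB3Nza...',
    '[git.example.com]:2222 ssh-ed25519 AAAAC3Nza...',
    '|1|c2FsdA==|aGFzaA== ssh-rsa AAAAB3Nza...',
    'db1.example.com ssh-rsa AAAAB3Nza...',
    'web1.example.com ssh-ed25519 AAAAC3Nza...',
]
print(known_host_names(SAMPLE))  # ['db1.example.com', 'web1.example.com']
```

To run it against your real file, pass it an open file handle for os.path.expanduser('~/.ssh/known_hosts').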
I just followed these instructions to update my server from FreeBSD 6.2-RELEASE to FreeBSD 7.0-RELEASE.
Aside from taking almost a day, it went flawlessly. I’m extremely impressed!