PerlFastLane

Perl In the Fast -lane

Ok, ok, sorry for the terrible title. But hey, I have exciting news: I think I've finally come to understand the awesomeness of the perl -lane method of running one-line perl scripts.

Unfortunately, this does mean I'm saying goodbye to awk. I'm sorry, old buddy, it's been a long run, but it's time for me to move on.

Anyway, in my real job I often have to run scripts on thousands of remote systems via a parallel ssh execution tool (the details of which are unimportant here). For example, I might need to confirm that the version of a particular configuration file is consistent across all the hosts. I can use my parallel execution tool to easily run a command on all 10,000 or so hosts and dump the results in a text file. The problem is I end up with a very long result file that looks like this:

>>> argle.example.com
command: /usr/sbin/db_update -check
db_update: Version available from dist:  1.15.130 (built: Mon Jan 31 02:32:12 2011)
db_update: Installed database version:   1.15.130 (built: Mon Jan 31 02:32:12 2011)
db_update: Installed database status:    OK (Matches dist version)
db_update: Installed database age:       30 days since db was built
>>> bargle.example.com
command: /usr/sbin/db_update -check
db_update: Version available from dist:  1.15.130 (built: Mon Jan 31 02:32:12 2011)
db_update: Installed database version:   1.15.130 (built: Mon Jan 31 02:32:12 2011)
db_update: Installed database status:    OK (Matches dist version)
db_update: Installed database age:       30 days since db was built

and so on for many pages. The annoyance with this is that I'm really only interested in the 'Installed database version' line. However, if I grep out just those lines, I lose the context of which host each result came from. Something more clever is needed.

In the past I've done this with the usual command line tools like grep and awk. For example, I might run something like this as the command on each host:

echo -n "$(hostname): " && /usr/sbin/db_update -check | grep "Installed database version" | awk '{ print $5 }'

or, as a slight improvement, I might make awk do the work of grep too and save a process:

echo -n "$(hostname): " && /usr/sbin/db_update -check | awk '/Installed database version/ { print $5 }'

which works just fine for sure, but both versions seem a little ugly, what with that awkward echo at the beginning to get the hostname. I finally decided to see if I could do something more clever with a perl one liner instead.

I've always liked the idea of one line perl scripts but the different command-line arguments always tripped me up. How do you remember to use perl -ne vs. perl -pe for example?

Also, here's a big sticking point for me - how do you replace the smart line splitting functionality in awk, which allows me to do things like awk '{ print $5}' in the above example?

I have to credit the Ksplice blog for finally making me understand perl autosplit mode. I learned that all I needed to do was run my command with perl -a to make perl split every line into the array @F, which I could then use in much the same way I was used to dealing with awk positional parameters $1, $2, etc. The one difference is that @F is indexed from zero, so awk's $1 is $F[0], $2 is $F[1], and so on.

Then, it's a matter of selecting the other perl command line arguments to get the results I desired. First of all, of course, -e is mandatory in all cases because that's what tells perl to read the script from the command line. Then I knew I wanted to run this thing in a loop over the input lines, so I should use -n or -p to save myself having to implement the loop in the script. Both wrap the script in the same loop, but -p also prints the current line ($_) at the end of every iteration. I don't want to do that, so let's use -n to just do the loop.

I knew I wanted to use the perl autosplit mode to make this script work similarly to awk, so that's -a. Rounding it all out is the hard-to-understand -l option. The explanation of -l is complicated, but you can ignore that and just read the 'record separators' section of this page. All you need to remember about -l is that, when used with -n or -p, it automatically chomps the newline character off every input line and puts it back on the output. This allows you to use print in your one liner and not worry about missing newlines in the output.

Putting those command line options together, I wrote this script:

/usr/sbin/db_update -check | perl -MSys::Hostname -lane 'print hostname.": ".$F[4] if /Installed database version/'

A final note about how this script is constructed: I know I could have used the backtick operator to obtain the system hostname in my one liner. However, since I was going all perl here, it seemed best to figure out how to obtain the hostname directly in perl. Since Sys::Hostname automatically exports the hostname function, all I had to do was load the module via the -M command line option and call hostname directly in my script.

When I run that script against my input file the result is this:

argle.example.com: 1.15.130
bargle.example.com: 1.15.130
boo.example.com: 1.15.49

which is exactly what I wanted. The key idea here is to use -a to autosplit the input line into the array @F. Because @F is indexed from zero, $F[4] gives you the same field as $5 in awk '{ print $5 }'. This whole thing is generally a replacement for the standard awk 'find a line and output part of it' idiom: awk '/Installed database version/ { print $5 }'. The advantage here is that integrating additional data (like the hostname) is quite simple. I realize that awk programmers out there can probably do this in an equally simple way. If so, please give me your solution in the comments!

So there you go - it can pay to take some time to figure out how to use tools like perl one liners. It's easy to get stuck in a rut of grep and awk or whatever. I think it's always important to be trying out new tools and expanding your skillset. For me this meant finally committing to perl one liners.




