ServerLoadSpreading
Note: You are viewing an old revision of this page. View the current version.
Like Peanut Butter on Toast
Say you have a process that runs on lots and lots of servers and you want to make sure it doesn't execute on every machine at the same time. What's the best way to design that setup? How can you measure the effectiveness of your solution?
I recently had to implement a system like this at my work. In this case we had about 10,000 servers and an update process that had to run once per day per machine. It was desirable that the My first guess at how to implement load spreading was by using something simple like the bash $RANDOM
pseudorandom number generator. But, is that sufficient to adequately spread the load? I started thinking about other alternatives and hit on the idea of using a hashing function. I then decided to take the next step and try to compare the two methods empirically. The rest of this post explains that process and my conclusions.
First Stab
Like I said, I'm working with about 10,000 servers here. I also have an account on every machine and a massively parallel ssh execution engine called Pogo. Thus, running commands on all the servers is fairly trivial.
When I originally started developing the spreading mechanism, my first thought was to use a random delay in a bash script, something like this:
sleep `expr $RANDOM % 86400'`
on every host. Yes, I know that could be done in modern bash as <code>sleep $($(($RANDOM%86400+1)))<code> but for various reasons I need to stick with stock bourne shell functionality. Also I realize that not all days have 86400 seconds in them but lets sweep that under the rug for now.
Turns out this doesn't work, because bash $RANDOM
is a signed 16-bit integer so it only goes to 32767. So I embiggened it:
sleep `expr $RANDOM * 3 % 86400'`
That produced numbers in the range I was looking for so I used it for the rest of my testing.
Perl?
Hey wait a second - can't I do this with perl? The answer is of course yes. In fact, perl offers a nice command line shortcut for generating random numbers:
perl -le 'print int rand(86400)'
that offers several advantages over using the builtin bash $RANDOM
- supports arbitrarily large integers
- uses much better pseudorandom algorithm
I ended up testing this mechanism and you will find my results below.
Perl Hostname Hashing
However, I still wasn't done. Could I find a better mechanism than random numbers? Based on some conversations with coworkers, I started thinking about hostname hashing. Disclaimer: I did not come up with this idea myself, although I did develop this particular implementation.
A hash function is an algorithm that takes input values and puts them into buckets. The goal is to spread the input into the buckets evenly. A phone book is an example of a (terrible) hash function - it maps names to alphabetical sections. It's a terrible hash function because there are many more names in the US that start with 'M' than with 'Z', so it doesn't spread the input between the buckets very well.
Every sever has a unique identifier - the fully qualified hostname. Thus I just needed a hash function that took the hostname and converted it to a number between 1 and 86400. Perl has an internal hashing function which is exposed through the perl compiler backend module 'B'. I wrote the following perl code:
use B; sub compute_hash { my ($string,$range) = @_; my $hc = hex(B::hash $string); return $hc % $range; }
which I then distilled down to a perl oneliner:
perl -MB -MSys::Hostname -le "print hex(B::hash hostname) % 86400;"
Testing the Alternatives
As I mentioned, I have a massively parallel ssh execution engine at my disposal. That makes testing easy - just launch a job on every host and collect the results. I realize that I could just calculate the perl hash values for hostnames without logging in to every machine, but I had the setup all ready to go for the random values so it was easier to log on to the host for each test.
I ran a job on every host and collected the output in the
why you should use a hash instead of a random delay? 1. Repeatable delay for each host. 2. maybe better guarantee of spread?
to test: on each remote host run this:
perl -le "print int rand(86400)+1"
would be interesting to compare these delays to the spread generated by calling bash RANDOM.
for bash , run this on remote host:
expr $RANDOM \* 3 % 86400 + 1'
#!/usr/bin/perl # randomer.pl # process input in form of # <hostname> <delay> # #and transform into # # <delay> <count> # # suitable for frequency plotting. use warnings; use strict; my %counter = (); while(<>) { (my $delay) = (/^\S+\s+(\S+)/); $counter{$delay}++; } foreach (sort {$a<=>$b} keys %counter) { print $_ . " " . $counter{$_} . "\n"; }
results in output like this:
1 1 2 1 3 1 7 1 30 1 42 1 47 1 49 1 66 1 71 1 96 1 112 1 126 1 134 1 136 1 140 1
etc.
gnuplot control file:
set terminal png size 1000,300 font "/Library/Fonts/Andale Mono.ttf" 12 set output "spread.png" set xrange [0:86400] set yrange [0:4] set xlabel "delay (s)" set ylabel "# of hosts" set title "collisions using perl \"int(rand())\" for delay value - 9313 hosts, 86400s spread" set ytics 1 set xtics nomirror set ytics nomirror set key off plot "random-freq.txt" using 1:2 index 0 title "number of collisions" pt 6 lt 2
output of bash random approach:
collision counts:
6989 hosts no collisions
1314 hosts one collision
183 hosts two collisions
21 hosts three collisions
3 hosts four collisions
no hosts more than four collisions
output of perl random approach:
output of perl hashed approach:
collision counts:
9047 hosts no collisions
540 hosts one collision
13 hosts two collisions
no hosts more than two collisions
Further Reading
A perlmonks discussion on perl hash internals.
A description of how hashes really work.
How to call the perl hash function directly.
More than you could ever want to know about hashing - - warning, eldrich magicks.