ServerLoadSpreading

Note: You are viewing an old revision of this page. View the current version.

large numbers of servers contacting a central host

some thoughts on using a perl hash function instead of a random delay

use B;

sub compute_hash
{
  my ($string,$range) = @_;
  my $hc = hex(B::hash $string);
  return $hc % $range + 1;
}

reference: http://www.perlmonks.org/index.pl?node_id=315876

how hashes really work: http://www.perl.com/pub/2002/10/01/hashes.html

calling the perl has function: http://www.perlmonks.org/?node_id=229982

more than you could ever want to know about hashing: http://burtleburtle.net/bob/hash/doobs.html

another hash function that probably isn't very good:

sub compute_hash
{
  my ($string,$low,$high) = @_;
  my $hc = 0;
  my $i = 0;
  my $range = ($high - $low) + 1;

  while ($i < length($string))
  {
    $hc = ((($hc) >> 27) | (($hc) << 5)) ^ ord(substr($string,$i++,1));
  }
  $hc = ((($hc) >> 1) | (($hc) << 31)) ^ ($hc);
  $hc = ((($hc) >> 3) | (($hc) << 29)) ^ ($hc);
  $hc = ((($hc) >> 5) | (($hc) << 27)) ^ ($hc);
  $hc = ((($hc) >> 7) | (($hc) << 25)) ^ ($hc);
  $hc = ((($hc) >> 9) | (($hc) << 23)) ^ ($hc);
  $hc = ((($hc) >> 11) | (($hc) << 21)) ^ ($hc);
  $hc = ((($hc) >> 13) | (($hc) << 19)) ^ ($hc);
  $hc = ((($hc) >> 15) | (($hc) << 17)) ^ ($hc);

  return $low + ($hc % $range);
}

need to do some distribution tests against a corpus of common hostnames or something like that.

why you should use a hash instead of a random delay? 1. Repeatable delay for each host. 2. maybe better guarantee of spread?

would be interesting to compare these delays to the spread generated by calling bash RANDOM.

#!/usr/bin/perl
# randomer.pl

# process input in form of
# <hostname> <delay>
#
#and transform into
#
# <delay> <count>
#
# suitable for frequency plotting.

use warnings;
use strict;

my %counter = ();

while(<>)
  {
    (my $delay) = (/^\S+\s+(\S+)/);
    $counter{$delay}++;
  }

foreach (sort {$a<=>$b} keys %counter)
  {
    print $_ . " " . $counter{$_} . "\n";
  }

results in output like this:

1 1
2 1
3 1
7 1
30 1
42 1
47 1
49 1
66 1
71 1
96 1
112 1
126 1
134 1
136 1
140 1

etc.

gnuplot control file:

set terminal png size 1200,300 font "/Library/Fonts/Andale Mono.ttf" 12
set output "spread.png"
set xrange [0:86400]
set yrange [0:4]
set xlabel "delay (s)"
set ylabel "# of hosts"
set title "collisions using perl \"int(rand())\" for delay value - 9313 hosts, 86400s spread"
set ytics 1
set xtics nomirror
set ytics nomirror
set key off
plot "random-freq.txt" using 1:2 index 0 title "number of collisions" pt 6 lt 2

output:

collision counts:

8805 hosts no collisions
483 hosts one collision
25 hosts two collisions
no hosts more than two collisions



Our Founder
ToolboxClick to hide/show
RecentChanges Click to hide/show