Exploring Riak
I've been playing with Riak recently, one of the modern Dynamo-derived NoSQL databases (the other main ones being Cassandra and Voldemort). We're evaluating it for use as a really large brackup datastore, the primary attractions being the near-linear scalability available by adding (relatively cheap) new nodes to the cluster, and decent availability options in the face of node failures.
I've built riak packages for RHEL/CentOS 5, available at my repository, and added support for a riak 'target' to the latest version (1.10) of brackup (packages also available at my repo).
The first thing to figure out is the maximum number of nodes you expect
your riak cluster to get to. This you use to size the ring_creation_size
setting, which is the number of partitions the hash space is divided into.
It must be a power of 2 (64, 128, 256, etc.), and the reason it's important
is that it cannot be easily changed after the cluster has been created.
The rule of thumb is that for performance you want at least 10 partitions
per node/machine, so the default ring_creation_size of 64 is really only
useful up to about 6 nodes. 128 scales to 10-12, 256 to 20-25, etc. For more
info see the Riak Wiki.
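As a rough sketch of that rule of thumb, the following shell function picks the smallest power of 2 that gives at least 10 partitions per node. The function name ring_size_for is made up for illustration, and using 64 (the default) as the floor is my assumption, not anything Riak enforces.

```shell
#!/bin/sh
# Hypothetical helper (not part of riak): given the maximum number of nodes
# you expect, print the smallest power of 2 >= 10 partitions per node.
ring_size_for() {
    nodes=$1
    target=$((nodes * 10))   # rule of thumb: at least 10 partitions per node
    size=64                  # assumption: never go below the default of 64
    while [ "$size" -lt "$target" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

ring_size_for 6    # prints 64
ring_size_for 12   # prints 128
ring_size_for 25   # prints 256
ring_size_for 50   # prints 512
```

This only encodes the 10-partitions-per-node guideline; if you expect to grow well beyond your estimate, rounding up to the next size is the safer choice, since the setting is hard to change later.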
Here's the script I use for configuring a new node on CentOS. The main
things to tweak here are the ring_creation_size you want (here I'm using
512, for a biggish cluster), and the interface to use to get the default ip
address (here eth0, or you could just hardcode 0.0.0.0 instead of $ip).
    #!/bin/sh
    # Riak configuration script for CentOS/RHEL

    # Install riak (and IO::Interface, for next)
    yum -y install riak perl-IO-Interface

    # To set app.config:web_ip to use primary ip, do:
    perl -MIO::Interface::Simple -i \
      -pe "BEGIN { \$ip = IO::Interface::Simple->new(q/eth0/)->address; } s/127\.0\.0\.1/\$ip/" /etc/riak/app.config

    # To add a ring_creation_size clause to app.config, do:
    perl -i \
      -pe 's/^((\s*)%% riak_web_ip)/$2%% ring_creation_size is the no. of partitions to divide the hash\n$2%% space into (default: 64).\n$2\{ring_creation_size, 512\},\n$1/' /etc/riak/app.config

    # To set riak vm_args:name to hostname do:
    perl -MSys::Hostname -i -pe 's/127\.0\.0\.1/hostname/e' /etc/riak/vm.args

    # display (bits of) config files for checking
    echo
    echo '********************'
    echo /etc/riak/app.config
    echo '********************'
    head -n30 /etc/riak/app.config
    echo
    echo '********************'
    echo /etc/riak/vm.args
    echo '********************'
    cat /etc/riak/vm.args
Save this to a file called e.g. riak_configure, and then to configure a couple
of nodes you do the following (note that NODE is any old internal hostname you
use to ssh to the host in question, but FIRST_NODE needs to use the actual
-name parameter defined in /etc/riak/vm.args on your first node):
    # First node
    NODE=node1
    cat riak_configure | ssh $NODE sh
    ssh $NODE 'chkconfig riak on; service riak start'
    # Run the following until ringready reports TRUE
    ssh $NODE riak-admin ringready

    # All nodes after the first
    FIRST_NODE=riak@node1.example.com
    NODE=node2
    cat riak_configure | ssh $NODE sh
    ssh $NODE "chkconfig riak on; service riak start && riak-admin join $FIRST_NODE"
    # Run the following until ringready reports TRUE
    ssh $NODE riak-admin ringready
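If you'd rather not rerun ringready by hand, the "run until it reports TRUE" step could be wrapped in a small polling loop along these lines. This is only a sketch: wait_for_true, MAX_TRIES and DELAY are names I've made up here, and it simply greps the command's output for the string TRUE.

```shell
#!/bin/sh
# Hypothetical helper: rerun the given command until its output contains TRUE.
# MAX_TRIES and DELAY are illustrative knobs, not riak settings.
wait_for_true() {
    tries=0
    max_tries=${MAX_TRIES:-30}   # give up after this many attempts
    delay=${DELAY:-2}            # seconds to sleep between attempts
    while [ "$tries" -lt "$max_tries" ]; do
        if "$@" 2>&1 | grep -q TRUE; then
            return 0             # the command reported TRUE
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1                     # never saw TRUE within max_tries attempts
}
```

With NODE set as above, you would call it as `wait_for_true ssh $NODE riak-admin ringready`, and check the exit status (0 once TRUE is seen, 1 if it gives up).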
That's it. You should now have a working riak cluster accessible on port 8098 on your cluster nodes.