This post walks through the configuration and deployment of a Cassandra v1.2 cluster across two Amazon EC2 regions, with one EC2 instance in Oregon (us-west-2) and the other in Virginia (us-east). Note that the instances form a single cluster that spans multiple Amazon regions. The "cassandra-stress" utility (bundled with Cassandra) will be used to insert 1M records of 2KB each in one region and subsequently read them back in the other region.
Configuration of a 2-node cluster – one node in each region
One can extend the cluster to as many nodes as required based on the steps outlined here for creating a 2-node cluster. Please note that these steps are an enabler for creating a multi-region cluster of 'X' nodes, where 'X' is, of course, greater than 2. 🙂 You would not want a 2-node cluster in practice, much less 2 nodes spread across 2 regions.
- Download and unzip/untar the Cassandra 1.2 binary.
- cd into conf and open cassandra.yaml for editing:
cluster_name: 'GK Cluster' [Update the cluster name]
num_tokens: [Keep this commented]
initial_token: 0 [set this to 0 for the first node]
partitioner: RandomPartitioner [Replace the default with this]
data_file_directories: /fs/fs1/cassandra/data [See below for details]
commitlog_directory: /fs/fs2/cassandra/commitlog [See below for details]
saved_caches_directory: /fs/fs2/cassandra/saved_caches [See below for details]
seeds: "X.X.X.X,Y.Y.Y.Y" [comma separate public IPs of EC2 instances - one for each region]
listen_address: pri.pri.pri.pri [Private IP of this instance]
broadcast_address: pub.pub.pub.pub [Public IP of this instance]
rpc_address: 0.0.0.0 [Replace with this]
endpoint_snitch: Ec2MultiRegionSnitch [Replace with this snitch]
The data_file_directories and commitlog_directory paths should be on two different disks. If you are using the new hi1.4xlarge instance type to host Cassandra nodes, it comes with 2 TB of local SSD storage across two volumes. These two volumes need to be formatted (to ext4) and mounted; thereafter one can be used for the commit log and the other for the data files. The initial_token is to be calculated using the "tools/bin/token-generator" tool. In our case, we have one node in each region. Please note that if we had multiple nodes in each region, then each region should be partitioned as if it were its own distinct ring.
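A minimal sketch of the disk preparation and token generation follows. The device names (/dev/xvdb, /dev/xvdc), the ec2-user account, and the assumption that token-generator accepts per-datacenter node counts as arguments are mine; verify them against your instance (e.g. with lsblk) and the tool's usage text.

# Format and mount the two local SSD volumes (device names are assumptions; check with lsblk)
sudo mkfs.ext4 /dev/xvdb
sudo mkfs.ext4 /dev/xvdc
sudo mkdir -p /fs/fs1 /fs/fs2
sudo mount /dev/xvdb /fs/fs1
sudo mount /dev/xvdc /fs/fs2

# Create the directories referenced in cassandra.yaml and hand them to the user running Cassandra
# (user name is an assumption)
sudo mkdir -p /fs/fs1/cassandra/data /fs/fs2/cassandra/commitlog /fs/fs2/cassandra/saved_caches
sudo chown -R ec2-user:ec2-user /fs/fs1/cassandra /fs/fs2/cassandra

# Generate the initial tokens - one node in each of the two regions
tools/bin/token-generator 1 1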
- Repeat the preceding configuration step on the other node in the other region.
- Start up both nodes.
- Check the status of the ring with nodetool:
$ bin/nodetool ring

Datacenter: us-east
==========
Replicas: 0

Address    Rack    Status    State     Load        Owns      Token
X.X.X.X    1a      Up        Normal    71.18 KB    50.00%    0

Datacenter: us-west-2
==========
Replicas: 0

Address    Rack    Status    State     Load        Owns      Token
Y.Y.Y.Y    2a      Up        Normal    43.18 KB    50.00%    169417178424467235000914166253263322299
This concludes the setup of the cluster spread across two Amazon regions.
Replication across regions
To demonstrate the replication of data from one region to the other, we need to define a KeySpace with an RF (Replication Factor) of 2. Thus, there would be 2 replicas for each column family in it – one in each region.
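For illustration only (the stress tool in the next step creates its own KeySpace for us), such a KeySpace could be defined by hand through cqlsh along these lines. The name "demo" is a placeholder, and the datacenter names must match what the Ec2MultiRegionSnitch reports (us-east, us-west-2):

# A sketch: create a keyspace with one replica per region (total RF of 2) via cqlsh
bin/cqlsh X.X.X.X <<'CQL'
CREATE KEYSPACE demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 1, 'us-west-2': 1};
CQL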
We can utilize the "cassandra-stress" utility, which is configurable to create KeySpaces, specify snitches, create a test payload of a given size, and so on. The following two steps delineate its usage and demonstrate replication across regions.
Step 1:
On one node, we run "cassandra-stress" to write to the cluster.
$ tools/bin/cassandra-stress -S 2048 -c 1 -e ONE -n 1000000 -r -R NetworkTopologyStrategy --strategy-properties='us-east:1,us-west-2:1' -i 3
Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
20673,6891,6891,1.2,6.1,114.1,3
70052,16459,16459,0.5,2.7,214.4,6
128452,19466,19466,0.4,2.2,115.8,9
178211,16586,16586,0.4,1.9,115.8,12
226423,16070,16070,0.4,1.8,99.0,15
282094,18557,18557,0.4,1.8,99.0,18
325845,14583,14583,0.4,1.7,99.0,21
387316,20490,20490,0.4,1.5,99.0,24
445825,19503,19503,0.4,1.3,99.0,27
497455,17210,17210,0.4,1.3,43.1,31
551568,18037,18037,0.4,1.2,43.1,34
606531,18321,18321,0.4,1.2,53.5,37
662429,18632,18632,0.4,1.2,96.5,40
716008,17859,17859,0.4,1.1,96.5,43
775517,19836,19836,0.4,1.1,96.5,46
831456,18646,18646,0.4,1.1,96.5,49
875923,14822,14822,0.4,1.1,96.5,52
925837,16638,16638,0.4,1.1,96.5,55
986341,20168,20168,0.4,1.1,96.5,59
1000000,4553,4553,0.4,1.0,96.5,59
END
Here we write a million rows of 2048 bytes each with a consistency level of "ONE". We also specify that the KeySpace to be created (cassandra-stress creates its KeySpace if it does not already exist) should replicate across regions – "us-east:1" and "us-west-2:1". The one million rows are written in 59 seconds.
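To double-check that the KeySpace created by cassandra-stress carries this per-region replication, its definition can be pulled up from either node. The KeySpace name "Keyspace1" below is the stress tool's default and is an assumption here:

# Inspect the auto-created keyspace (name assumed to be the stress default, "Keyspace1")
echo 'DESCRIBE KEYSPACE "Keyspace1";' | bin/cqlsh X.X.X.X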
To check the number of keys inserted, run the following:
$ bin/nodetool cfstats | more

Column Family: Standard1
SSTable count: 2
Space used (live): 2017340163
Space used (total): 2017452629
Number of Keys (estimate): 951424
The number of keys is close to one million – please note that this is an estimate.
Step 2:
On the other node, we read from the cluster.
$ tools/bin/cassandra-stress -o read -n 1000000
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
140813,14081,14081,0.7,3.7,33.6,10
351872,21105,21105,0.5,2.4,36.7,20
567648,21577,21577,0.5,1.8,37.3,30
786035,21838,21838,0.5,1.7,37.3,40
1000000,21396,21396,0.5,1.3,37.3,50
END
We see that one million keys are read in 50 seconds.
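Because the KeySpace keeps one replica in each region, these reads are served by the local replica. As an optional further check, the read can be pinned to this node and run at LOCAL_QUORUM, which with one replica per datacenter is still satisfied entirely within the region. The sketch below uses the stress tool's -d (node list) and -e (consistency level) options; the host is this node's address:

# Re-run the read against this node only, requiring a quorum of the local datacenter's
# replicas (here, just the single local copy)
tools/bin/cassandra-stress -o read -n 1000000 -d Y.Y.Y.Y -e LOCAL_QUORUM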
To check the number of keys on this node:
$ bin/nodetool cfstats | more

Column Family: Standard1
SSTable count: 2
Space used (live): 2041994209
Space used (total): 2042106846
Number of Keys (estimate): 963072
Status of the ring after the write and read operations:
$ bin/nodetool ring
Note: Ownership information does not include topology; for complete information, specify a keyspace

Datacenter: us-east
==========
Address    Rack    Status    State     Load       Owns      Token
X.X.X.X    1a      Up        Normal    1.88 GB    50.00%    0

Datacenter: us-west-2
==========
Address    Rack    Status    State     Load       Owns      Token
Y.Y.Y.Y    2a      Up        Normal    1.9 GB     50.00%    169417178424467235000914166253263322299
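As the note at the top of that output indicates, effective ownership per KeySpace can be seen by passing a KeySpace name to nodetool ring (again assuming the stress tool's default KeySpace, "Keyspace1"):

# Ring view with per-keyspace effective ownership (keyspace name is the assumed stress default)
bin/nodetool ring Keyspace1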
This is a very simple, clear tutorial. I wish I had found it before I spent a day trying to figure it all out myself.
One question: is there any reason why you have not used virtual nodes, electing to set the initial node instead?
I meant to say “initial token instead”.
Thanks Andrews. The reason for utilizing the initial token is that I wanted to perform the configuration manually and view the nodes. It is easier to keep track of a small number of nodes visually than the huge set of virtual nodes you would see when using vnodes. Also, if I remember correctly, the concept of vnodes (virtual nodes) and similar ideas have been in use in other solutions, such as Coher***, at least since 2008.