Cassandra is a NoSQL database designed to run as a cluster of machines, providing high availability and fault tolerance.
Before starting, make sure you scan through Node and Cluster Initialization Properties.
Important Node attributes:
cluster_name
All nodes in a cluster must have the same cluster name.
commitlog_directory
DataStax recommends putting this on a separate disk partition (perhaps EBS).
data_file_directories
Stores the column family data.
partitioner
Defaults to RandomPartitioner (Murmur3Partitioner as of 1.2.x).
rpc_address
Set to 0.0.0.0 to listen on all configured interfaces.
rpc_port
Port for Thrift server. Default is 9160.
saved_caches_directory
Location where column family key and row caches will be stored.
seeds
Nodes that hold information about the ring topology and serve as gossip contact points for newly joining nodes.
storage_port
Port for inter-node communication. Default is 7000.
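Pulling these together, here is a sketch of the relevant cassandra.yaml fragment (the directories and seed IPs are placeholders; we will set the real values later in this post):
cluster_name: 'my_cluster'
commitlog_directory: /vol/cassandra/commitlog
data_file_directories:
    - /vol/cassandra/data
saved_caches_directory: /vol/cassandra/saved_caches
rpc_address: 0.0.0.0
rpc_port: 9160
storage_port: 7000
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.31.2.31,10.216.218.73"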
Create a Large Instance
You will need to launch at least a large instance. See Cassandra Hardware for more details. If you have a smaller instance, you may not be able to use the /etc/init.d/cassandra start command, and you will see a JVM heap memory error.
Ideal Cassandra Instance Specs:
- 32 GB RAM (Minimum 4GB)
- 8-core cpu
- 2 disks (one for CommitLogDirectory, one for DataFileDirectories)
- RAID 0 for DataFileDirectories disk when disk capacity is 50% full
- XFS file system
- Minimum 3 replications (instances)
We will NOT be using EBS due to its poor I/O throughput and reliability; we will use the ephemeral volume instead.
Security Group
Port | Description |
---|---|
Public Facing Ports | |
22 | SSH port. |
8888 | OpsCenter website port. |
Cassandra Inter-node Ports | |
1024+ | JMX reconnection/loopback ports. See description for port 7199. |
7000 | Cassandra inter-node cluster communication. |
7199 | Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that the client reconnects on a randomly chosen port (1024+). |
9160 | Cassandra client port (Thrift). |
OpsCenter ports | |
61620 | OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic coming from the agent. |
61621 | OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter. |
Create a Security Group with the settings above.
Ports 22 and 8888 will be open to 0.0.0.0/0.
Ports 1024-65535 will be restricted to your security group id. (Click on the Details tab of the Security Group to find your group id.)
All other ports will also be restricted to your group id.
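If you prefer the command line to the AWS console, the classic EC2 API tools can express the same rules. A sketch, assuming your group is named cassandra and $ACCOUNT holds your AWS account id (both are assumptions):
ec2-authorize cassandra -P tcp -p 22 -s 0.0.0.0/0
ec2-authorize cassandra -P tcp -p 8888 -s 0.0.0.0/0
ec2-authorize cassandra -P tcp -p 7000 -o cassandra -u $ACCOUNT
ec2-authorize cassandra -P tcp -p 9160 -o cassandra -u $ACCOUNT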
Mounting the ephemeral drive
We will begin by formatting the ephemeral drive with XFS.
Use fdisk -l to check which device is your ephemeral drive. It may already come formatted as ext3.
umount /dev/xvdb
mkfs.xfs -f /dev/xvdb
vi /etc/fstab
Remove the original entry and put:
/dev/xvdb /vol xfs noatime 0 0
Create the mount point and mount it:
sudo mkdir /vol
sudo mount /vol
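To verify the mount, check that /vol shows up as XFS:
df -h /vol
mount | grep /vol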
You may also want to use RAID-0 to stripe across a set of ephemeral volumes.
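A sketch of a two-disk stripe with mdadm (device names are assumptions; instance types with two ephemeral disks typically expose them as /dev/xvdb and /dev/xvdc):
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
sudo mkfs.xfs -f /dev/md0
Then point the /etc/fstab entry at /dev/md0 instead of /dev/xvdb.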
Install Oracle Sun Java
Do not use OpenJDK. DataStax recommends running Cassandra on the Oracle (Sun) JVM.
Download jdk-6u38-linux-x64.bin.
mkdir /usr/java/latest
Upload or wget the JDK into this folder.
chmod +x jdk-6u38-linux-x64.bin
sudo ./jdk-6u38-linux-x64.bin
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/java/latest/jdk1.6.0_38/bin/java" 1
sudo update-alternatives --set java /usr/java/latest/jdk1.6.0_38/bin/java
java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)
Make sure JNA is installed. With JNA, Cassandra can lock the JVM heap in memory so that Linux does not swap it out, which can improve performance.
sudo apt-get install libjna-java
vi /etc/security/limits.conf
Add:
cassandra soft memlock unlimited
cassandra hard memlock unlimited
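The Ubuntu package drops the JNA jar at /usr/share/java/jna.jar. If you are running Cassandra from a tarball instead of the Debian package, you may need to link it into Cassandra's lib directory yourself (a sketch; $CASSANDRA_HOME is an assumption standing in for your install path):
ln -s /usr/share/java/jna.jar $CASSANDRA_HOME/lib/jna.jar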
Install Cassandra
Begin by installing a single node Cassandra. Read Cassandra - installing on Ubuntu 12.04 Amazon EC2.
Make sure Cassandra is at version 1.2.x and cqlsh is 2.3.x
cassandra -version
cqlsh --version
We want to save the data in the ephemeral drive. The mount point we created earlier is /vol.
mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches
chown cassandra:cassandra -R /vol/cassandra
vi /etc/cassandra/cassandra.yaml
Point these directories to the ones we created above.
- commitlog_directory
- data_file_directories
- saved_caches_directory
Kill Cassandra if you started it with the cassandra -f command. We will want to start it from init.d:
sudo /etc/init.d/cassandra start
sudo /etc/init.d/cassandra stop
sudo /etc/init.d/cassandra status
Use nodetool to check the status:
nodetool -h localhost -p 7199 ring
Reboot and check that it's running with "netstat -tupln".
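You should see Java listening on the Cassandra ports from the table above (7000 inter-node, 7199 JMX, 9160 Thrift). For example:
sudo netstat -tupln | grep -E '7000|7199|9160'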
If it's not starting, check the log /var/log/cassandra/output.log
If it's complaining about oldClusterName != newClusterName, just remove everything in the data_file_directories.
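A minimal sketch of that cleanup, using the directories we created above:
sudo /etc/init.d/cassandra stop
sudo rm -rf /vol/cassandra/data/*
sudo /etc/init.d/cassandra start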
Create a Cassandra AMI
We will be setting up a ring (multi-node Cassandra). Before you create an AMI, umount /dev/xvdb and comment out the XFS entry in /etc/fstab; otherwise you won't be able to ssh into instances launched from this image.
Launch a second instance in another availability zone
Setting up a Cassandra Ring
Before you begin, make sure you have the following:
- Cassandra on each node
- a cluster name
- IP of each node
- seed nodes
- snitches (Ec2Snitch, Ec2MultiRegionSnitch)
- open required firewalls
A snitch is used to determine which data centers and racks are written to and read from, and to distribute replicas by grouping machines into data centers and racks.
For Ec2Snitch, a region is treated as a data center, and an availability zone is treated as a rack within a data center.
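If you went with Ec2Snitch, the setting in cassandra.yaml would simply be:
endpoint_snitch: Ec2Snitch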
Setting up Multi Data Center Cassandra Ring
We will begin by tweaking the first node.
vi /etc/cassandra/cassandra.yaml
Set the following:
cluster_name: my_cluster
initial_token: 0
Start Cassandra. If you face any problems starting it, delete all the files in commitlog_directory and data_file_directories.
We will now add a second node in a different availability zone (e.g. if the first node is in us-east-1a, put the second in us-east-1d).
ssh into your second instance. Remember to mount the partition back. Use "df" to make sure /dev/xvdb is mounted.
umount /mnt
vi /etc/fstab
Uncomment "/dev/xvdb /vol xfs noatime 0 0" and remove any other entries that use /dev/xvdb, if appropriate.
mkfs.xfs -f /dev/xvdb
mount /vol
mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches
chown cassandra:cassandra -R /vol/cassandra
Now edit /etc/cassandra/cassandra.yaml.
The following needs to be changed on all nodes:
- seeds
- rpc_address
- listen_address
- initial_token
- auto_bootstrap
Seeds
Add the private IPs of all nodes
- seeds: "10.31.2.31,10.216.218.73"
RPC Address
The address that clients connect to.
Listen Address
The address nodes use to communicate with each other.
For the first node,
listen_address: 10.31.2.31
rpc_address: 10.31.2.31
For the second node,
listen_address: 10.216.218.73
rpc_address: 10.216.218.73
Initial token (skip to Virtual nodes if you are using Cassandra 1.2.x and above)
This is used for load balancing. The first node should have a value of zero. All other nodes will need to recalculate this value every time a new node joins the cluster.
Calculate this based on the number of nodes, using the Python program from the Cassandra documentation.
Create a file called token_generator.py. Paste the following in the file.
#! /usr/bin/python
import sys
if (len(sys.argv) > 1):
    num = int(sys.argv[1])
else:
    num = int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    print 'node %d: %d' % (i, (i * (2 ** 127) / num))
Change it to an executable:
chmod +x token_generator.py
Execute the program with the number of nodes as the first argument. In our case, it's 2.
./token_generator.py 2
The output should be similar to the following:
node 0: 0
node 1: 85070591730234615865843651857942052864
Put 85070591730234615865843651857942052864 as the initial token for the second node.
initial_token: '85070591730234615865843651857942052864'
If you get a DatabaseDescriptor.java (line 509) Fatal configuration error, you are probably using Cassandra 1.2.x.
Virtual nodes (Cassandra 1.2.x or above)
vnodes were introduced in 1.2.x.
Set num_tokens to 256 and leave initial_token empty.
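In cassandra.yaml this looks like:
num_tokens: 256
initial_token: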
Auto bootstrapping
When a new node is added, the cluster will automatically migrate the correct range of data from existing nodes.
Do not set auto_bootstrap: true on a node that is also in the seed list; seed nodes do not bootstrap.
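auto_bootstrap is absent from the default 1.2 cassandra.yaml and defaults to true; if you add it explicitly for a new non-seed node, it is just:
auto_bootstrap: true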
After all the above setup, start both nodes. Then check if they are up.
nodetool status
PropertyFileSnitch
Set endpoint_snitch: PropertyFileSnitch
We will be using PropertyFileSnitch to define our data centers and racks ourselves.
We will use dc1 to represent data center 1 and rac1 to represent rack 1.
Create /etc/cassandra/cassandra-topology.properties on all nodes and place the following:
10.216.218.73=dc1:rac1
10.31.2.31=dc2:rac1
default=dc1:rac1
default=dc1:rac1 is used when a node joins and is not yet specified in the file.
Keep in mind that when creating our schema we will be using NetworkTopologyStrategy with the dc and rac names defined above.
You may want to create an image again.
Testing the cluster replication
Start Cassandra for both nodes:
service cassandra start
Check the status:
nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.32.6.31 28.94 KB 256 48.2% eab0379f-2ac6-408a-b6dc-0ad475337a28 rac1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.108.23.52 47.66 KB 256 51.8% 35f1f17c-84c1-4e10-83b5-857feba03f4d rac1
On both nodes, try executing the following:
cqlsh 10.216.218.73
cqlsh 10.31.2.31
You should not have a problem connecting to both of these machines. Make sure you have the latest cqlsh (2.3.0 at the time of this post).
We will be executing a script. For a production machine, I would recommend setting up Git and pulling your code from GitHub.
Create a script called test.cql.
Paste the following:
create keyspace helloworld with replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};
use helloworld;
create table activity (
    activity_key int,
    activity_time timeuuid,
    activity_type varint,
    primary key (activity_key, activity_time)
)
with clustering order by (activity_time desc);
Execute your script by running:
cqlsh 10.216.218.73 -f test.cql
Check to see if the keyspace "helloworld" exists:
cqlsh 10.216.218.73
describe keyspaces;
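To confirm that data actually replicates, insert a row on one node and read it back from the other. A sketch, assuming the test.cql schema above (now() generates a timeuuid):
cqlsh 10.31.2.31
use helloworld;
insert into activity (activity_key, activity_time, activity_type) values (1, now(), 0);
exit
cqlsh 10.216.218.73
use helloworld;
select * from activity;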
Connecting your Application to Cassandra
In the Security Group for Cassandra, open up port 9160 to your application's security group id.
Check that the connection is okay with telnet:
telnet 10.31.2.31 9160
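If the port is open, you should see something like:
Trying 10.31.2.31...
Connected to 10.31.2.31.
Escape character is '^]'.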