Getting started with cassandra

Remarks#

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

PROVEN

Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.

FAULT TOLERANT

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

PERFORMANT

Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.

DECENTRALIZED

There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.

SCALABLE

Some of the largest production deployments include Apple’s, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million reqests per day), and eBay (over 100 nodes, 250 TB).

DURABLE

Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.

YOU’RE IN CONTROL

Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.

ELASTIC

Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

PROFESSIONALLY SUPPORTED

Cassandra support contracts and services are available from third parties.

Versions#

Version	Release Date
1.1.12	2013-11-19
1.1.9	2013-02-11
1.2.12	2013-11-28
1.2.13	2013-12-19
1.2.15	2014-02-19
1.2.16	2014-04-22
1.2.17	2014-06-25
1.2.18	2014-07-04
1.2.19	2014-11-14
1.2.6	2013-07-02
1.2.8	2013-07-27
2.0.10	2014-08-12
2.0.11	2014-10-17
2.0.12	2015-01-14
2.0.13	2015-03-20
2.0.14	2015-04-01
2.0.15	2015-06-01
2.0.16	2015-07-08
2.0.17	2015-09-18
2.0.5	2014-02-13
2.0.6	2014-04-02
2.0.7	2014-04-24
2.0.8	2014-06-13
2.0.9	2014-07-22
2.1.11	2015-10-12
2.1.12	2015-10-22
2.1.2	2014-11-20
2.1.3	2015-03-03
2.1.4	2015-04-01
2.1.5	2015-03-31
2.1.6	2015-06-09
2.1.7	2015-06-18
2.1.8	2015-07-03
2.1.9	2015-09-03
2.2.0	2015-05-14
2.2.0-beta1	2015-05-19
2.2.0-rc1	2015-06-04
2.2.0-rc2	2015-06-30
2.2.1	2015-08-25
2.2.2	2015-09-25
2.2.3	2015-10-12
2.2.4	2015-12-02
3.0.0	2015-01-26
3.0.0-alpha	2015-07-29
3.0.0-alpha1	2015-07-18
3.0.0-beta1	2015-07-10
3.0.0-beta2	2015-09-04
3.0.0-rc1	2015-07-16
3.0.0-rc2	2015-10-16
3.0.1	2015-12-04
3.0.2	2016-01-21
3.0.3	2015-11-24
3.0.4	2016-02-05
3.0.5	2016-04-02
3.0.6	2016-03-31
3.0.7	2016-05-24
3.0.8	2016-05-25
3.2.819	2016-01-05
3.4.950	2016-03-08
3.6.1076	2016-05-02
3.8.1199	2016-06-27
3.10.3004	2016-08-10

(Got this using a bit of awk: git log --tags --simplify-by-decoration --pretty="format:%ai %d" |egrep "\(tag: [0-9]"| awk -F" " '{ print $1 " " $5}'|awk -F"." '{print $1 "." $2 "." $3}'| awk -F" " '{print $2 " |" $1}'| sed 's/)//'|sed 's/,//'| sort -n|sort -u -t" " -k1,1 | awk '{print "|" $0 "|"}')

Installation or Setup

Single node Installation

Pre-install NodeJS, Python and Java
Select your installation document based on your platform https://docs.datastax.com/en/cassandra/3.x/cassandra/install/installTOC.html
Download Cassandra binaries from https://cassandra.apache.org/download/
Untar the downloaded file to <installation location>
Start the cassandra using <installation location>/bin/cassandra OR start Cassandra as a service - [sudo] service cassandra start
Check whether cassandra is up and running using <installation location>/bin/nodetool status.

Ex:

On Windows environment run cassandra.bat file to start Cassandra server and cqlsh.bat to open CQL client terminal to execute CQL commands.

There are two ways that installation for a Single Node can be carried out.

You should have Oracle Java 8 or OpenJDk 8 (preferred for Cassandra versions > 3.0)

1. Installing a Debian package (installs Cassandra as a service)

Add the Cassandra version to the repository (replace the 22x with your own version for example for 2.7 use 27x)

echo "deb-src https://www.apache.org/dist/cassandra/debian 22x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
# Update the repository  
sudo apt-get update
# Then install it
sudo apt-get install cassandra cassandra-tools

Now Cassandra can be started and stopped using:

sudo service cassandra start
sudo service cassandra stop

Check the status using:

nodetool status

Logs and Data directories are /var/log/cassandra and /var/lib/cassandra respectively.

2. Installing any version of Cassandra in form of binary tarball (installs Cassandra as a standalone process)

Download the Datastax version:

curl -L  https://downloads.datastax.com/community/dsc-cassandra-version_number-bin.tar.gz | tar xz

Or Apache Cassandra binary tarball manually (from the site https://www.apache.org/dist/cassandra/)

Now untar this:

tar -xvzf dsc-cassandra-version_number-bin.tar.gz

Change the directory to install location:

cd install_location

Start Cassandra using:

sudo sh ./bin/cassandra

Stop using:

sudo kill -9 pid

Check:

./bin/nodetool status

And viola, you have a single-node test cluster for Cassandra. So just use cqlsh in the terminal for Cassandra shell.

Configuration of Cassandra can be done in cassandra.yaml in conf folder in install_location.