Getting started with cassandra
Remarks#
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
PROVEN
Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.
FAULT TOLERANT
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
PERFORMANT
Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
DECENTRALIZED
There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
SCALABLE
Some of the largest production deployments include Apple’s, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million reqests per day), and eBay (over 100 nodes, 250 TB).
DURABLE
Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.
YOU’RE IN CONTROL
Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
ELASTIC
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
PROFESSIONALLY SUPPORTED
Cassandra support contracts and services are available from third parties.
Versions#
Version | Release Date |
---|---|
1.1.12 | 2013-11-19 |
1.1.9 | 2013-02-11 |
1.2.12 | 2013-11-28 |
1.2.13 | 2013-12-19 |
1.2.15 | 2014-02-19 |
1.2.16 | 2014-04-22 |
1.2.17 | 2014-06-25 |
1.2.18 | 2014-07-04 |
1.2.19 | 2014-11-14 |
1.2.6 | 2013-07-02 |
1.2.8 | 2013-07-27 |
2.0.10 | 2014-08-12 |
2.0.11 | 2014-10-17 |
2.0.12 | 2015-01-14 |
2.0.13 | 2015-03-20 |
2.0.14 | 2015-04-01 |
2.0.15 | 2015-06-01 |
2.0.16 | 2015-07-08 |
2.0.17 | 2015-09-18 |
2.0.5 | 2014-02-13 |
2.0.6 | 2014-04-02 |
2.0.7 | 2014-04-24 |
2.0.8 | 2014-06-13 |
2.0.9 | 2014-07-22 |
2.1.11 | 2015-10-12 |
2.1.12 | 2015-10-22 |
2.1.2 | 2014-11-20 |
2.1.3 | 2015-03-03 |
2.1.4 | 2015-04-01 |
2.1.5 | 2015-03-31 |
2.1.6 | 2015-06-09 |
2.1.7 | 2015-06-18 |
2.1.8 | 2015-07-03 |
2.1.9 | 2015-09-03 |
2.2.0 | 2015-05-14 |
2.2.0-beta1 | 2015-05-19 |
2.2.0-rc1 | 2015-06-04 |
2.2.0-rc2 | 2015-06-30 |
2.2.1 | 2015-08-25 |
2.2.2 | 2015-09-25 |
2.2.3 | 2015-10-12 |
2.2.4 | 2015-12-02 |
3.0.0 | 2015-01-26 |
3.0.0-alpha | 2015-07-29 |
3.0.0-alpha1 | 2015-07-18 |
3.0.0-beta1 | 2015-07-10 |
3.0.0-beta2 | 2015-09-04 |
3.0.0-rc1 | 2015-07-16 |
3.0.0-rc2 | 2015-10-16 |
3.0.1 | 2015-12-04 |
3.0.2 | 2016-01-21 |
3.0.3 | 2015-11-24 |
3.0.4 | 2016-02-05 |
3.0.5 | 2016-04-02 |
3.0.6 | 2016-03-31 |
3.0.7 | 2016-05-24 |
3.0.8 | 2016-05-25 |
3.2.819 | 2016-01-05 |
3.4.950 | 2016-03-08 |
3.6.1076 | 2016-05-02 |
3.8.1199 | 2016-06-27 |
3.10.3004 | 2016-08-10 |
(Got this using a bit of awk: git log --tags --simplify-by-decoration --pretty="format:%ai %d" |egrep "\(tag: [0-9]"| awk -F" " '{ print $1 " " $5}'|awk -F"." '{print $1 "." $2 "." $3}'| awk -F" " '{print $2 " |" $1}'| sed 's/)//'|sed 's/,//'| sort -n|sort -u -t" " -k1,1 | awk '{print "|" $0 "|"}'
)
Installation or Setup
Single node Installation
- Pre-install NodeJS, Python and Java
- Select your installation document based on your platform https://docs.datastax.com/en/cassandra/3.x/cassandra/install/installTOC.html
- Download Cassandra binaries from https://cassandra.apache.org/download/
- Untar the downloaded file to
<installation location>
- Start the cassandra using
<installation location>/bin/cassandra
OR start Cassandra as a service -[sudo] service cassandra start
- Check whether cassandra is up and running using
<installation location>/bin/nodetool status
.
Ex:
- On Windows environment run
cassandra.bat
file to start Cassandra server andcqlsh.bat
to open CQL client terminal to execute CQL commands.
There are two ways that installation for a Single Node can be carried out.
You should have Oracle Java 8 or OpenJDk 8 (preferred for Cassandra versions > 3.0)
1. Installing a Debian package (installs Cassandra as a service)
Add the Cassandra version to the repository (replace the 22x with your own version for example for 2.7 use 27x)
echo "deb-src https://www.apache.org/dist/cassandra/debian 22x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
# Update the repository
sudo apt-get update
# Then install it
sudo apt-get install cassandra cassandra-tools
Now Cassandra can be started and stopped using:
sudo service cassandra start
sudo service cassandra stop
Check the status using:
nodetool status
Logs and Data directories are /var/log/cassandra
and /var/lib/cassandra
respectively.
2. Installing any version of Cassandra in form of binary tarball (installs Cassandra as a standalone process)
Download the Datastax version:
curl -L https://downloads.datastax.com/community/dsc-cassandra-version_number-bin.tar.gz | tar xz
Or Apache Cassandra binary tarball manually (from the site https://www.apache.org/dist/cassandra/)
Now untar this:
tar -xvzf dsc-cassandra-version_number-bin.tar.gz
Change the directory to install location:
cd install_location
Start Cassandra using:
sudo sh ./bin/cassandra
Stop using:
sudo kill -9 pid
Check:
./bin/nodetool status
And viola, you have a single-node test cluster for Cassandra. So just use cqlsh
in the terminal for Cassandra shell.
Configuration of Cassandra can be done in cassandra.yaml
in conf
folder in install_location
.