Monday, 20 January 2014

Cassandra Data Modelling - Primary Keys

In the previous post we discussed how data is stored in Cassandra by creating a keyspace and a table, and we inserted rows into that table. Rows in Cassandra are stored as a variable number of key/value pairs, or columns, and each row in a table is referenced by a primary key, also called a row key. In this post we are going to look at primary keys in more detail. The primary key concept in Cassandra is different from that in relational databases, so it is worth spending some time to understand it.


In our last post we created a person table with person_id as its primary key column:
create table person (person_id int primary key, fname text, lname text, dateofbirth timestamp, email text, phone text );
We saw that person_id was used as the row key to refer to a person's data.


Compound primary key:

As the name suggests, a compound primary key is made up of one or more columns referenced in the primary key definition. One component of a compound primary key is called the partition key, while the other component is called the clustering key. The following are the different variations of primary keys; note that K1, K2, K3, and so on represent columns in the table.

  1. K1: the primary key has only one partition key and no clustering key.
  2. (K1, K2): column K1 is the partition key and column K2 is the clustering key.
  3. (K1, K2, K3, ...): column K1 is the partition key and columns K2, K3, and so on make up the clustering key.
  4. (K1, (K2, K3, ...)): same as 3, i.e. column K1 is the partition key and columns K2, K3, ... make up the clustering key.
  5. ((K1, K2, ...), (K3, K4, ...)): columns K1, K2 make up the partition key and columns K3, K4, ... make up the clustering key.
It is important to note that when the compound key is (K1, K2, K3), the first key K1 becomes the partition key and the remaining keys become part of the clustering key. To make a composite partition key, we have to wrap the partition key columns in parentheses, for example ((K1, K2), K3, K4): here K1 and K2 form the partition key and K3 and K4 form the clustering key, as sketched in the example below.
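To make the mapping concrete, here is a minimal sketch of the two forms as CQL create statements. The table and column names (example_compound, example_composite, k1..k4, val) are made up purely for illustration.

create table example_compound (
    k1 int,
    k2 int,
    k3 int,
    val text,
    primary key (k1, k2, k3)       -- k1 is the partition key; k2, k3 are clustering keys
);

create table example_composite (
    k1 int,
    k2 int,
    k3 int,
    k4 int,
    val text,
    primary key ((k1, k2), k3, k4) -- (k1, k2) is the composite partition key; k3, k4 are clustering keys
);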

Partition key

The purpose of the partition key is to identify the partition, or node, in the cluster that stores a given row. When data is read from or written to the cluster, a function called the partitioner computes a hash of the partition key, and this hash value determines which node (partition) holds the row. For example, rows whose partition key hashes fall in the range 1000 to 1234 may reside on node A, while rows whose hashes fall in the range 1235 to 2000 may reside on node B, as shown in Figure 1. If a row's partition key hashes to 1233, that row is stored on node A.
Figure 1: A cluster where node A stores partition keys with hash values from 1000 to 1234 and node B stores those with hash values from 1235 to 2000.
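To see the hash value that the partitioner assigns to a partition key, CQL provides the token() function. As a quick illustrative check, assuming the person table from the previous post is still in place, the query below returns the token of each row; the actual values depend on which partitioner the cluster is configured with.

-- Show the partitioner hash computed from each row's partition key.
select person_id, token(person_id) from person;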

Clustering key

The purpose of the clustering key is to store the data within a row in sorted order. The sorting is based on the columns included in the clustering key, and this arrangement makes it efficient to retrieve data using the clustering key.
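The sort direction can also be chosen when the table is created. The following is a small illustrative sketch, using a made-up events table, where the CLUSTERING ORDER BY clause keeps the newest rows first within each userid partition:

-- Hypothetical table: rows are sorted newest-first within each userid partition.
create table events (
    userid int,
    event_time timestamp,
    payload text,
    primary key (userid, event_time)
) with clustering order by (event_time desc);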

To make these concepts clearer, we will consider the example of a weather forecast system whose purpose is to store weather-related data. First we will create a weather keyspace using cqlsh. Type the following command in the cqlsh terminal window:

Examples

create keyspace weather with replication = {'class' : 'SimpleStrategy', 'replication_factor':1};
This creates the weather keyspace with the replication strategy 'SimpleStrategy' and a replication factor of 1.


Now switch to the weather keyspace:
use weather;

We will create a table called city that contains general weather information about each city. Type the following create statement into cqlsh:
create table city (cityid int, avg_tmp float, description text, primary key (cityid));
The above statement creates the city table with cityid as its primary key. As there is only one column in the primary key, the partition key is cityid and there is no clustering key.

Type the following insert statements to enter some data into this table.
insert into city (cityid, avg_tmp, description) values (1,25.5,'Mild weather');
insert into city (cityid, avg_tmp, description) values (2,3,'Cold weather');

Check the data we have just inserted into the table:

select * from city;

The output would be as follows:

 cityid | avg_tmp | description
--------+---------+--------------
      1 |    25.5 | Mild weather
      2 |       3 | Cold weather


We can see how Cassandra has stored this data under the hood by using the cassandra-cli tool. Run cassandra-cli in a separate terminal window and type the following commands:

use weather;

list city;

RowKey: 1
=> (name=, value=, timestamp=1387128357537000)
=> (name=avg_tmp, value=41cc0000, timestamp=1387128357537000)
=> (name=description, value=4d696c642077656174686572, timestamp=1387128357537000)
-------------------
RowKey: 2
=> (name=, value=, timestamp=1387128377816000)
=> (name=avg_tmp, value=40400000, timestamp=1387128377816000)
=> (name=description, value=436f6c642077656174686572, timestamp=1387128377816000)

2 Rows Returned.
Elapsed time: 163 msec(s).

We can see from the above output that cityid has become the row key and identifies individual rows. We can use the primary key columns to filter data in a select statement. Type the following command in the cqlsh window:

select * from city where cityid = 1;


We get the following output:

 cityid | avg_tmp | description
--------+---------+--------------
      1 |    25.5 | Mild weather



Now we will create another table called forecast, which records the temperature of each city for every day. Type the following command in cqlsh:
create table forecast(cityid int,forecast_date timestamp,humidity float,chanceofrain float,wind float,feelslike int, centigrade int, primary key (cityid,forecast_date));

This statement creates the forecast table with primary key (cityid, forecast_date). As the primary key has two components, the first component (cityid) is the partition key and the second (forecast_date) becomes the clustering key. Add some data into the table:

insert into forecast(cityid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,'2013-12-10',0.76,0.1,10,8,8);
insert into forecast(cityid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,'2013-12-11',0.90,0.3,12,4,4);
insert into forecast(cityid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,'2013-12-12',0.68,0.2,6,3,3);

Notice that in this case the partition key value (cityid) is the same for all three rows, whereas the clustering key value (forecast_date) is different. Now retrieve this data from the table:

select * from forecast;

We get the following output:

 cityid | forecast_date            | centigrade | chanceofrain | feelslike | humidity | wind
--------+--------------------------+------------+--------------+-----------+----------+------
      1 | 2013-12-10 00:00:00+0000 |          8 |          0.1 |         8 |     0.76 |   10
      1 | 2013-12-11 00:00:00+0000 |          4 |          0.3 |         4 |      0.9 |   12
      1 | 2013-12-12 00:00:00+0000 |          3 |          0.2 |         3 |     0.68 |    6

As expected, we get three rows, all with the same partition key value but different clustering key values. Let's check how this looks in cassandra-cli. Go to the cassandra-cli terminal and type the following command:
list forecast;

It displays the following output:

RowKey: 1
=> (name=2013-12-10 00\:00\:00+0000:, value=, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:centigrade, value=00000008, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:chanceofrain, value=3dcccccd, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:feelslike, value=00000008, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:humidity, value=3f428f5c, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:wind, value=41200000, timestamp=1386700252831000)
=> (name=2013-12-11 00\:00\:00+0000:, value=, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:centigrade, value=00000004, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:chanceofrain, value=3e99999a, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:feelslike, value=00000004, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:humidity, value=3f666666, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:wind, value=41400000, timestamp=1386700252835000)
=> (name=2013-12-12 00\:00\:00+0000:, value=, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:centigrade, value=00000003, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:chanceofrain, value=3e4ccccd, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:feelslike, value=00000003, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:humidity, value=3f2e147b, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:wind, value=40c00000, timestamp=1386700255855000)

1 Row Returned.
Elapsed time: 200 msec(s).

We can see from the output that cassandra-cli has returned only one row instead of the three rows returned by cqlsh. The reason is that Cassandra stores only one row per partition key; all the data associated with that partition key is stored as columns within it. The data we stored through three different insert statements shares the same cityid value (1), so all of it is saved in that single row as columns.
Recall that the second component of a primary key is called the clustering key. Its role is to group related items together: all data inserted with the same clustering key value is stored together. Let's see how this is done. The first insert statement we issued was as follows:
insert into forecast(cityid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,'2013-12-10',0.76,0.1,10,8,8);
In this case all the columns (humidity, chanceofrain, wind, feelslike and centigrade) are grouped by the forecast_date value, i.e. 2013-12-10 00\:00\:00+0000. If we look at the cassandra-cli output, we see that the first group of column names starts with 2013-12-10 00\:00\:00+0000, as shown below:

=> (name=2013-12-10 00\:00\:00+0000:, value=, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:centigrade, value=00000008, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:chanceofrain, value=3dcccccd, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:feelslike, value=00000008, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:humidity, value=3f428f5c, timestamp=1386700252831000)
=> (name=2013-12-10 00\:00\:00+0000:wind, value=41200000, timestamp=1386700252831000)

In the above output, the first column name is 2013-12-10 00\:00\:00+0000, which represents the clustering key. Note that this column has no value of its own; its value is already encoded in the column name. It is important to remember that in Cassandra column names can be of any binary type, unlike relational databases where column names can only be text. The second column is 2013-12-10 00\:00\:00+0000:centigrade, whose value is 00000008. Note that the forecast_date value has been prefixed to the column name, and the same is true for all the other columns corresponding to 2013-12-10 00\:00\:00+0000. Similarly, the other forecast_date values are prefixed to their columns, as shown below:

=> (name=2013-12-11 00\:00\:00+0000:, value=, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:centigrade, value=00000004, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:chanceofrain, value=3e99999a, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:feelslike, value=00000004, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:humidity, value=3f666666, timestamp=1386700252835000)
=> (name=2013-12-11 00\:00\:00+0000:wind, value=41400000, timestamp=1386700252835000)
=> (name=2013-12-12 00\:00\:00+0000:, value=, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:centigrade, value=00000003, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:chanceofrain, value=3e4ccccd, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:feelslike, value=00000003, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:humidity, value=3f2e147b, timestamp=1386700255855000)
=> (name=2013-12-12 00\:00\:00+0000:wind, value=40c00000, timestamp=1386700255855000)

In the above output, columns associated with the clustering key 2013-12-11 00\:00\:00+0000 have column names starting with 2013-12-11 00\:00\:00+0000, and likewise columns associated with the clustering key 2013-12-12 00\:00\:00+0000 have column names starting with 2013-12-12 00\:00\:00+0000. Because there are three distinct clustering key values, we get three rows when we run the select statement in cqlsh.
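Because rows within a partition are stored sorted by the clustering key, CQL can slice and order them efficiently. The queries below are an illustrative sketch against the forecast table created above; the exact rows returned depend on the data you inserted.

-- Range slice on the clustering key within one partition:
select * from forecast where cityid = 1 and forecast_date >= '2013-12-11';

-- Return the partition's rows in reverse clustering order:
select * from forecast where cityid = 1 order by forecast_date desc;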

Suppose that a city is very big and we want to store the weather forecast separately for each region or town within it. One option is to define a composite partition key composed of cityid and regionid; in that case the row key is the combination of cityid and regionid. Let's create this table in the weather keyspace using cqlsh:

create table forecast_for_region(cityid int,regionid int, forecast_date timestamp,humidity float,chanceofrain float,wind float,feelslike int, centigrade int, primary key ((cityid,regionid),forecast_date)); 

Now insert some data into the table:
insert into forecast_for_region(cityid,regionid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,24, '2013-12-10',0.76,0.1,10,8,8);
insert into forecast_for_region(cityid,regionid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,24,'2013-12-11',0.90,0.3,12,4,4);
insert into forecast_for_region(cityid,regionid,forecast_date,humidity,chanceofrain,wind,feelslike,centigrade) values (1,24,'2013-12-12',0.68,0.2,6,3,3);

Check the data in cqlsh using a select statement:
select * from forecast_for_region; 

The output would be as follows:
 cityid | regionid | forecast_date            | centigrade | chanceofrain | feelslike | humidity | wind
--------+----------+--------------------------+------------+--------------+-----------+----------+------
      1 |       24 | 2013-12-10 00:00:00+0000 |          8 |          0.1 |         8 |     0.76 |   10
      1 |       24 | 2013-12-11 00:00:00+0000 |          4 |          0.3 |         4 |      0.9 |   12
      1 |       24 | 2013-12-12 00:00:00+0000 |          3 |          0.2 |         3 |     0.68 |    6 

Check the output in cassandra-cli:

list forecast_for_region;

RowKey: 1:24
=> (name=2013-12-10 00\:00\:00+0000:, value=, timestamp=1386777901856000)
=> (name=2013-12-10 00\:00\:00+0000:centigrade, value=00000008, timestamp=1386777901856000)
=> (name=2013-12-10 00\:00\:00+0000:chanceofrain, value=3dcccccd, timestamp=1386777901856000)
=> (name=2013-12-10 00\:00\:00+0000:feelslike, value=00000008, timestamp=1386777901856000)
=> (name=2013-12-10 00\:00\:00+0000:humidity, value=3f428f5c, timestamp=1386777901856000)
=> (name=2013-12-10 00\:00\:00+0000:wind, value=41200000, timestamp=1386777901856000)
=> (name=2013-12-11 00\:00\:00+0000:, value=, timestamp=1386777901907000)
=> (name=2013-12-11 00\:00\:00+0000:centigrade, value=00000004, timestamp=1386777901907000)
=> (name=2013-12-11 00\:00\:00+0000:chanceofrain, value=3e99999a, timestamp=1386777901907000)
=> (name=2013-12-11 00\:00\:00+0000:feelslike, value=00000004, timestamp=1386777901907000)
=> (name=2013-12-11 00\:00\:00+0000:humidity, value=3f666666, timestamp=1386777901907000)
=> (name=2013-12-11 00\:00\:00+0000:wind, value=41400000, timestamp=1386777901907000)
=> (name=2013-12-12 00\:00\:00+0000:, value=, timestamp=1386777901911000)
=> (name=2013-12-12 00\:00\:00+0000:centigrade, value=00000003, timestamp=1386777901911000)
=> (name=2013-12-12 00\:00\:00+0000:chanceofrain, value=3e4ccccd, timestamp=1386777901911000)
=> (name=2013-12-12 00\:00\:00+0000:feelslike, value=00000003, timestamp=1386777901911000)
=> (name=2013-12-12 00\:00\:00+0000:humidity, value=3f2e147b, timestamp=1386777901911000)
=> (name=2013-12-12 00\:00\:00+0000:wind, value=40c00000, timestamp=1386777901911000)

1 Row Returned.
Elapsed time: 225 msec(s).
In the above output we can see that the row key (1:24) is the combination of cityid and regionid. This means data for different regions within the same city can reside on different partitions or nodes in the cluster.
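One practical consequence is that a query must restrict every column of the composite partition key in order to address a single partition. As a quick illustrative check against the table above:

-- Both partition key columns are required to locate the partition:
select * from forecast_for_region where cityid = 1 and regionid = 24;

-- Restricting only cityid does not identify a partition, so Cassandra rejects
-- a query such as: select * from forecast_for_region where cityid = 1;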

I hope these examples have helped to clarify a few concepts of data modelling in Cassandra. Please feel free to leave any comments related to this post.
