How to effectively use Range Queries in Cassandra, Hypertable, or HBase

Here is a quick and dirty tutorial on how to do range queries in your favorite BigTable clone datastore (although Cassandra is a BigTable/Dynamo hybrid). Depending on how you design your keys, you can do some fun stuff (like building your own secondary indexes). Let's say you have the following keys in the same keyspace:

my_app/logs/date=some_date1
my_app/logs/date=some_date2
my_app/logs/date=some_date3
my_app/records/employee/name=alice
my_app/records/employee/name=bob
my_app/records/employee/name=claris
my_app/records/employee/name=zed
your_app/logs/date=some_date1
your_app/logs/date=some_date2
your_app/logs/date=some_date3
your_app/records/employee/name=adam
your_app/records/employee/name=alice
your_app/records/employee/name=bob
your_app/records/employee/name=claris
your_app/records/employee/name=zed

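Now let's say we want the entire keyspace for just your app. We would set the start key to "your_app/" and the end key to "your_app/~", where '~' is the last printable character in the ASCII table (http://www.asciitable.com/). Because keys sort lexicographically, every key that begins with "your_app/" falls between those two bounds. Note that if your keys contain non-ASCII characters, your end character would have to be different, since those characters sort after '~'.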
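To make the start key / end key idea concrete, here is a minimal sketch in Python that simulates an ordered key store with a sorted list. It does not talk to Cassandra, Hypertable, or HBase; the names range_scan and prefix_scan are made up for this illustration, and the scan is assumed to be inclusive on both ends.

```python
from bisect import bisect_left, bisect_right

# The sample keys from the listing above, kept in sorted order the way an
# order-preserving datastore would keep them.
KEYS = sorted([
    "my_app/logs/date=some_date1",
    "my_app/logs/date=some_date2",
    "my_app/logs/date=some_date3",
    "my_app/records/employee/name=alice",
    "my_app/records/employee/name=bob",
    "my_app/records/employee/name=claris",
    "my_app/records/employee/name=zed",
    "your_app/logs/date=some_date1",
    "your_app/logs/date=some_date2",
    "your_app/logs/date=some_date3",
    "your_app/records/employee/name=adam",
    "your_app/records/employee/name=alice",
    "your_app/records/employee/name=bob",
    "your_app/records/employee/name=claris",
    "your_app/records/employee/name=zed",
])


def range_scan(sorted_keys, start_key, end_key):
    """Return every key k with start_key <= k <= end_key (hypothetical helper)."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_right(sorted_keys, end_key)
    return sorted_keys[lo:hi]


def prefix_scan(sorted_keys, prefix, last_char="~"):
    """Scan all keys under a prefix by appending the last printable ASCII char."""
    return range_scan(sorted_keys, prefix, prefix + last_char)


# Everything that belongs to your app.
your_app_keys = prefix_scan(KEYS, "your_app/")
for key in your_app_keys:
    print(key)
```

The end key "your_app/~" never has to exist in the store. It only has to sort after every real key that shares the prefix, which holds as long as the character following the prefix is ASCII.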

If you want all the records from your app, you would set the start key to "your_app/records/" and the end key to "your_app/records/~".

If you want just the records from your app that have a name starting with "a", you would set the start key to "your_app/records/employee/name=a" and the end key to "your_app/records/employee/name=a~".
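The '~' trick breaks down once keys contain bytes above 0x7E. One alternative, sketched below, is to compute an exclusive stop key by incrementing the last byte of the prefix, so a scan for "name=a" stops just before "name=b". This is a swapped-in technique, not something the datastores above require; prefix_stop_key and prefix_scan are hypothetical helper names, and the "aña" key is invented to show a non-ASCII name.

```python
from bisect import bisect_left


def prefix_stop_key(prefix: bytes) -> bytes:
    """Smallest byte string that sorts after every key starting with prefix.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    Returns b"" when no upper bound exists (empty or all-0xFF prefix).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys, prefix: bytes):
    """Return keys k with prefix <= k < prefix_stop_key(prefix)."""
    stop = prefix_stop_key(prefix)
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]


# Employee keys stored as raw bytes; "aña" is a made-up non-ASCII name.
KEYS = sorted(
    name.encode("utf-8")
    for name in [
        "your_app/records/employee/name=adam",
        "your_app/records/employee/name=alice",
        "your_app/records/employee/name=aña",
        "your_app/records/employee/name=bob",
        "your_app/records/employee/name=claris",
        "your_app/records/employee/name=zed",
    ]
)

PREFIX = b"your_app/records/employee/name=a"
a_names = prefix_scan(KEYS, PREFIX)
for key in a_names:
    print(key.decode("utf-8"))
```

With the '~' end key, "name=aña" would be skipped, because the UTF-8 encoding of "ñ" starts with byte 0xC3, which sorts after '~' (0x7E). The incremented stop key picks it up.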

In Cassandra, range queries perform better if you are using lexicographical (order-preserving) key partitioning, i.e. the OrderPreservingPartitioner, as opposed to random partitioning with the RandomPartitioner. With lexicographical partitioning, keys that sort next to each other are stored together, which makes scans far more efficient. You can set this in your configuration file, or at runtime through the system manager.
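To see why random partitioning hurts range scans, here is a rough sketch that orders the sample employee keys by an MD5-derived token instead of by the key itself. It assumes only that the random partitioner places rows by a hash of the key; md5_token is a simplified stand-in for illustration, not Cassandra's exact token computation.

```python
import hashlib


def md5_token(key: str) -> int:
    """Simplified stand-in for a hash-based token (an assumption, see above)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


keys = [
    "my_app/records/employee/name=alice",
    "my_app/records/employee/name=bob",
    "my_app/records/employee/name=claris",
    "my_app/records/employee/name=zed",
    "your_app/records/employee/name=adam",
    "your_app/records/employee/name=alice",
    "your_app/records/employee/name=bob",
    "your_app/records/employee/name=claris",
    "your_app/records/employee/name=zed",
]

# Order-preserving placement: neighbors in key order stay neighbors on disk.
lexicographic = sorted(keys)

# Hash-based placement: the same keys ordered by token instead.
by_token = sorted(keys, key=md5_token)

print("Ordered by key:")
for key in lexicographic:
    print("  " + key)

print("Ordered by token:")
for key in by_token:
    print("  " + key)
```

In the token ordering, keys that share a prefix are generally no longer adjacent, so a prefix scan can no longer be served by reading one contiguous run of rows.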

Now go do some range queries. 