How to search in cassandra case-insensitive?

1.4k Views Asked by At

I am learning cassandra with Python and specially with Django 2 using cqlengine. I am trying to search in database where I search for string that starts with the search parameter but I want to make it case insensitive. So if I have following data

-------------------------------
|    PKID    |     String     |
-------------------------------
|    1234    |     FOObar     |
|    4321    |     FoOBar     |
|    5665    |     IreALLy    |
|    5995    |     DontknoW   |
|    8765    |     WHatTOdo   |
|    4327    |     foobaR     |
-------------------------------

So if I want to search for string that starts with foo, I should get all three records. I searched for the solution and I found one comment on stackoverflow that everything is byte in cassandra and so it is not possible but I also found something that says I need to write custom function to do it.

For Django I am using django-cassandra-engine to create model. It is an implementation of cqlengine for django. So when I create my model, is there anything that I need to add in it? My test model is

class TestModel(DjangoCassandraModel):
    key_id = columns.UUID(primary_key=True, default=uuid.uuid4)
    string = columns.Text()

I looked for it in cqlengine docs but couldn't find anything helpful. So I am seeking for help here on stackoverflow.

Thanks.

1

There are 1 best solutions below

2
Alex Ott On

You can use so-called SASI-index (SSTable Attached Secondary Index) to do it in Cassandra itself (although it's marked as experimental feature). You can define indexes for doing prefix, range, or substring search, and when configuring index, you can specify that you want case-insensitive strings:

CREATE CUSTOM INDEX index_name ON table (field) 
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = { 
    'mode': 'PREFIX', // if you want to search only for 'starting with'
    'case_sensitive': 'false'
};

Although I'm not sure, will cqlengine pickup this functionality out of the box, or not.

Here is very detailed blog post about SASI-indexes.