Cassandra Data Modeling - Designing Columns

30 Views Asked by At

I have a data model in my Java application as below for which I am looking to design columns in an existing Cassandra table (this is my first time working on Cassandra)

public class Domain {
    private StringType strData;
    private LongType longData;
    private DoubleType doubleData;

    // constructor, setters, getters
}


public class StringType {
   private String key;
   private String value;

   // constructor, getter, setters
}

public class LongType {
   private String key;
   private Long value;

   // constructor, getter, setters
}

public class DoubleType {
   private String key;
   private Double value;

   // constructor, getter, setters
}

Existing Cassandra table

CREATE TABLE port(
   user_id UUID,
   account_id UUID,
   trade_date TIMESTAMP,
   shares INT,
   price_per_share DECIMAL,
   PRIMARY KEY ((user_id, account_id), trade_id)
);

I have an existing table in Cassandra and I am looking to add columns to it so as to be able to store Domain object in the table - 1:1 mapping between a table row and a Domain object

Few facts:

  • Only one of the 'strData', 'longData', 'doubleData' would be populated for an instance of Domain object
  • Need to have the ability to query data in the table based on 'key'
  • Anticipate adding more properties to Domain class like BooleanData, Custom Nested Objects which am planning to store as Blob in Cassandra

How should I go about designing the columns? I am able to do it by adding 6 columns to represent key/value for each possible Types i.e. StringType, DoubleType, LongType but is there another way to do this?

1

There are 1 best solutions below

4
Madhavan On

Could you provide how your existing schema look like? There isn't enough information here to properly propose a solution. Here is my take based on couple assumptions.

Also, you can leverage this free browser-based data modeling course to learn how to perform data modeling in Cassandra.

CREATE TABLE IF NOE EXISTS domain (
  key text,
  key_type text,
  other_clustering_cols <>,
  ...
  PRIMARY KEY ((key_type, key), other_clustering_cols)
);

If you have a table like as above, you could simply persist the values as,

INSERT INTO domain(key, key_type, other_clustering_cols, ...) VALUES ('key1', 'string', <clustering key cols values>, ...);

On the application side, you could leverage the data based on the key_type column value to parse it to the needed type and then simply query the domain table as:

SELECT ... FROM domain WHERE key = ? AND key_type = ? AND other_clustering_cols = ?;

(OR)

SELECT ... FROM domain WHERE key = ? AND key_type = ?;

Again, I just assumed your existing table schema and proposed the above, but if you were to update your original question with additional details - existing schema, query / access patterns, etc., we may be able to design this much more effectively.