How feasible is data partitioning in ClickHouse, and how is it implemented?
How do traditional RDBMS handle partitioning?
In ClickHouse, how feasible is data partitioning, and what implementation methods exist? Additionally, how do traditional Relational Database Management Systems (RDBMS) typically handle data partitioning?
You partition a table in ClickHouse just like you do in your favorite old-school RDBMS - using a PARTITION BY clause.
The difference is in how ClickHouse stores the data on disk. Every time you do an INSERT into a MergeTree table, the rows being inserted go into their own folder called a part. You can get a lot of parts in ClickHouse, so insert your data wisely (either lots of rows at once or using async inserts). You don't want too many parts. (Parts merge in the background, but that's a story for another day.)
When a table is partitioned, only rows with the same partition key value can go into the same part. So let's say you partition by a column that has 100,000 unique values. Then you are guaranteed, even on your best day, to have at least 100,000 parts in your cluster. That's too many... which means your choice of partitioning key was not good.
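
One way to see whether a partitioning key is producing too many parts is to count the active parts per partition in the system.parts table. The query below is a minimal sketch: the table name events is hypothetical, so substitute your own.

```sql
-- Count active parts and rows per partition for one table.
-- 'events' is a hypothetical table name used only for illustration.
SELECT
    partition,
    count()   AS active_parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'events'
  AND active              -- skip parts already replaced by background merges
GROUP BY partition
ORDER BY active_parts DESC;
```

If the result has many thousands of rows (one per partition), the partition key probably has too many distinct values.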
In general, we have one recommendation for partitioning - especially when you are new to ClickHouse - and that is to partition only by month. All rows from the same month will be stored together, which means that on your best day you might have only 12 parts per year. (That's an extreme simplification... but it makes my point.)
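
As a rough illustration of monthly partitioning, a table definition might look like the sketch below. The table name events and its columns are hypothetical; toYYYYMM is the ClickHouse function that maps a date or timestamp to a year-month number such as 202401.

```sql
-- Hypothetical table, partitioned by month of event_time.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)   -- one partition per calendar month
ORDER BY (user_id, event_time);
```

Note that PARTITION BY only controls which rows may share a part, while ORDER BY defines the sort order within each part, so the two keys are chosen independently.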