Databricks: Z-order vs partitionBy

9.8k Views Asked by patryk_ostrowski At 27 January 2022 at 15:45

I am learning Databricks and I have some questions about Z-order and partitionBy. When I read about them, they sound pretty similar: both group data in a way that speeds up reads. It also looks like partitionBy is good for join operations, but I don't really understand which one I should use when I only want to read data. How should I think about the two so that I use them correctly?

There are 1 best solutions below

Alex Ott On 27 January 2022 at 18:11 BEST ANSWER

Partitioning physically splits the data into separate directories, and all files in a given directory share a single value of the partition column. ZOrder instead clusters related data inside the files, and each file may still contain multiple values of the given column.
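To make the directory layout concrete, here is a minimal pure-Python sketch (no Spark involved) of how rows map to Hive-style partition directories. The helper names `partition_path` and `layout` and the sample rows are made up for illustration; they are not part of any Spark or Delta API.

```python
from collections import defaultdict

def partition_path(row, partition_cols):
    """Build a Hive-style directory path such as 'year=2022/month=1'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)

def layout(rows, partition_cols):
    """Group rows by the directory they would be written to."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[partition_path(row, partition_cols)].append(row)
    return dict(dirs)

# Hypothetical sample rows, for illustration only.
rows = [
    {"year": 2022, "month": 1, "day": 27, "value": "a"},
    {"year": 2022, "month": 1, "day": 3, "value": "b"},
    {"year": 2022, "month": 2, "day": 14, "value": "c"},
]

dirs = layout(rows, ["year", "month"])
for path, members in sorted(dirs.items()):
    # Each directory holds exactly one (year, month) pair,
    # but may hold many different values of the non-partition column `day`.
    print(path, "-> days:", sorted(r["day"] for r in members))
```

Every directory contains a single value per partition column, while `day` is free to vary inside it. That leftover variation is what ZOrder is meant to cluster.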

Partitioning is useful when you have a low-cardinality column, that is, one with few distinct values. For example, you can easily partition by year and month (maybe by day), but if you additionally partition by hour, you'll end up with too many partitions and too many files, which leads to big performance problems.
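A rough back-of-the-envelope calculation shows how quickly the number of partitions grows. This is only a sketch: the three-year data set is an assumption, and the result is an upper bound because it treats every month as having 31 days.

```python
def partition_count(cardinalities):
    """Upper bound on partition directories: product of distinct values per column."""
    total = 1
    for n in cardinalities.values():
        total *= n
    return total

# Assumed example: three years of data.
by_month = partition_count({"year": 3, "month": 12})
by_day = partition_count({"year": 3, "month": 12, "day": 31})
by_hour = partition_count({"year": 3, "month": 12, "day": 31, "hour": 24})

print(by_month, by_day, by_hour)  # 36 1116 26784
```

Each partition holds at least one file, so partitioning by hour here means tens of thousands of (probably small) files, compared with a few dozen directories when partitioning by year and month.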

ZOrder lets you create bigger files, which are more efficient to read than many small files.

But you can combine partitioning with ZOrder. For example, partition by year/month and ZOrder by day: that colocates data for the same day, so you can access it faster (because you read fewer files).
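As a sketch of that combination on Databricks with Delta Lake: the table name `events` and the column names are assumptions, and `df` and `spark` stand for an existing DataFrame and SparkSession on a cluster. Only the SQL string is built and printed here, so the snippet runs without Spark.

```python
def write_partitioned(df, table_name, partition_cols):
    """Write a Spark DataFrame as a Delta table partitioned by the given columns.

    `df` is assumed to be a pyspark.sql.DataFrame; call this only on a
    cluster where Delta Lake is available.
    """
    (df.write
       .format("delta")
       .partitionBy(*partition_cols)
       .mode("overwrite")
       .saveAsTable(table_name))

def optimize_zorder_sql(table_name, zorder_cols):
    """Build the OPTIMIZE ... ZORDER BY statement for a Delta table."""
    return f"OPTIMIZE {table_name} ZORDER BY ({', '.join(zorder_cols)})"

# Hypothetical table and columns, matching the year/month + day example.
stmt = optimize_zorder_sql("events", ["day"])
print(stmt)  # OPTIMIZE events ZORDER BY (day)

# On a cluster you would then run something like:
#   write_partitioned(df, "events", ["year", "month"])
#   spark.sql(stmt)
```

Note that ZORDER BY is used on columns that are not partition columns: here `day` is clustered inside each year/month partition.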

Besides ZOrder, you can also use data skipping to efficiently filter out files that don't contain the data you need for your query.
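The idea behind data skipping can be simulated in plain Python. This is a simplified model, not Delta's actual implementation: it assumes each file keeps the minimum and maximum of the `day` column, and that a file is read only if the queried value falls inside that range. The helper names are made up for illustration.

```python
def file_stats(files):
    """Per-file (min, max) of the `day` values stored in each file."""
    return [(min(f), max(f)) for f in files]

def files_to_read(stats, day):
    """Indices of files whose [min, max] range could contain `day`."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= day <= hi]

days = list(range(1, 31))  # one row per day, days 1..30

# Unclustered: rows are spread round-robin, so every file spans almost the full range.
unclustered = [days[i::3] for i in range(3)]
# Clustered (what ordering by `day` aims for): each file covers a narrow range.
clustered = [days[i:i + 10] for i in range(0, len(days), 10)]

print(files_to_read(file_stats(unclustered), 17))  # [0, 1, 2]
print(files_to_read(file_stats(clustered), 17))    # [1]
```

With the same rows and the same number of files, the clustered layout lets a query for day 17 skip two of the three files. That is why clustering and data skipping work well together.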

You can read about data skipping & ZOrder in the following blog post.
