How does Dremel or its implementation (say Drill) handle large columnar data layout in memory?


I am going through the Google Dremel white paper. I learned that it converts complex (nested) data into a columnar data layout.

At what location is this data stored?

As Drill has no central metadata repository, I assume it must be in-memory.

How, then, does Drill handle this data when I have billions of rows?


Accepted answer (catpaws):

To get complete, consistent query results from billions of rows, you can use a distributed file system connected to multiple Drillbits, simulate a distributed file system by copying files to each node, or use an NFS volume such as Amazon Elastic File System. Drill queries big data performantly using a number of techniques, including these:

  • Relies on the cluster nodes to handle failures (doesn't spend time on failure-related tasks).
  • Uses an in-memory data model that's hierarchical and columnar (doesn't access the disk for columns that are not involved in an analytic query, processing the columnar data without row materialization).
  • Uses columnar storage optimizations and execution (keeps memory footprint low).
  • Uses vectorization to work on arrays of values from different records rather than single values from one record at a time.
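To make the columnar and vectorization points concrete: Drill's actual in-memory format is built from Java "value vectors", but the idea can be sketched in a few lines of Python (this is an illustrative toy, not Drill's implementation). A query that aggregates one column only reads that column's array and never materializes full rows:

```python
# Hypothetical sketch of columnar layout (not Drill's Java value vectors).
rows = [
    {"id": 1, "name": "a", "amount": 10.0},
    {"id": 2, "name": "b", "amount": 20.5},
    {"id": 3, "name": "c", "amount": 30.0},
]

# Columnar layout: one contiguous array per column.
columns = {
    "id":     [r["id"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# "Vectorized" aggregate: operates on the whole amount column at once,
# without touching the unused name column or rebuilding any row.
total = sum(columns["amount"])
print(total)  # 60.5
```

In a real engine the per-column arrays are fixed-width memory buffers, so the aggregate loop runs over contiguous memory and can use CPU SIMD instructions; that is what keeps the memory footprint low and avoids row materialization.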

For more information, see http://drill.apache.org/docs/performance/.
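For the distributed-file-system setup mentioned above, Drill reads the data through a file storage plugin rather than a central metadata repository. A plugin configuration pointing Drillbits at HDFS might look like the following sketch (the namenode host, port, and paths are hypothetical placeholders):

```json
{
  "type": "file",
  "connection": "hdfs://namenode:8020/",
  "workspaces": {
    "root": {
      "location": "/data",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": { "type": "parquet" }
  }
}
```

Each Drillbit uses this configuration to locate the files directly, which is why no central metadata store is needed: the schema is discovered from the data (e.g. Parquet footers) at query time.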