Google's Dremel algorithm supports top-k queries. Could somebody tell me what algorithm that top-k query makes use of?
In google's dremel, what algorithm that top-k query makes use of?
266 Views Asked by user1562158 At
2
There are 2 best solutions below
0
On
I guess you know about the Dremel Paper?
Here is a link: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf
It says:
Some Dremel queries, such as top-k and count-distinct, return approximate results using known one-pass algorithms (e.g., [4]).
The reference is the following:
[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting Distinct Elements in a Data Stream. In RANDOM, pages 1–10, 2002.
Does this help?
like a Heap ?
A heap can be used to answer query asking for top k elements in a sorted list, in a O(nlogk) time.
see http://stevehanov.ca/blog/index.php?id=122