Term dictionary and posting list compression

I'm studying and developing a simple information retrieval system based on the book "Introduction to Information Retrieval", and there are two topics about dictionary and posting list compression that I don't fully understand.

Chapter 5 of the book explains some techniques for compressing the dictionary and the posting lists:

  • Blocked storage: the terms are concatenated into a single string, each preceded by its length (e.g. "This is an example" -> "4this2is2an7example", without removing stop words).
  • For each term the table then stores its entry, but in this case the term pointer is the position of the block of k terms, and a linear scan is performed inside the block to find the term.
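To make the blocked-storage idea concrete, here is a minimal Python sketch, assuming the terms are sorted, each term is prefixed by a one-byte length (the digits in "4this2is2an7example" written as a byte), and k terms share one pointer. The names `build_blocked_dictionary`, `read_term` and `lookup` are hypothetical. A lookup binary-searches the first term of each block and then scans at most k terms, so only a handful of entries are decoded per query rather than the whole dictionary.

```python
def build_blocked_dictionary(terms, k=4):
    """Store sorted terms in one byte string, each preceded by a one-byte
    length, keeping a pointer only to the first term of every block of k."""
    buf = bytearray()
    block_ptrs = []
    for i, term in enumerate(sorted(terms)):
        data = term.encode("utf-8")
        if len(data) > 255:
            raise ValueError("term too long for a one-byte length prefix")
        if i % k == 0:
            block_ptrs.append(len(buf))  # one pointer per block, not per term
        buf.append(len(data))
        buf.extend(data)
    return bytes(buf), block_ptrs


def read_term(buf, offset):
    """Decode the length-prefixed term at offset; return it and the next offset."""
    n = buf[offset]
    return buf[offset + 1:offset + 1 + n].decode("utf-8"), offset + 1 + n


def lookup(buf, block_ptrs, k, term):
    """Return the rank of term in the sorted dictionary, or None if absent."""
    # Binary search over blocks: find the last block whose first term <= term.
    lo, hi, block = 0, len(block_ptrs) - 1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, _ = read_term(buf, block_ptrs[mid])
        if first <= term:
            block, lo = mid, mid + 1
        else:
            hi = mid - 1
    if block < 0:
        return None
    # Linear scan of at most k terms inside that block.
    offset = block_ptrs[block]
    for j in range(k):
        if offset >= len(buf):
            break
        current, offset = read_term(buf, offset)
        if current == term:
            return block * k + j
    return None


buf, ptrs = build_blocked_dictionary(["this", "is", "an", "example"], k=2)
print(lookup(buf, ptrs, 2, "is"))
```

With k=2 the sorted terms form the blocks [an, example] and [is, this], so only two pointers are kept. The returned rank would index (hypothetical) parallel arrays holding each term's document frequency and postings pointer.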


Regarding this, I have two questions:

  • How does the dictionary compression work in practice? When I compress my dictionary of terms so that it fits entirely in memory, do I then need to decompress it to perform a query? And what if the decompressed dictionary does not fit in memory?
  • How does posting list compression work? In my case each posting list has the form [document_frequency doc_id:document_frequency_in_doc_id ...]. The book explains techniques (such as gamma codes) for compressing the doc_ids, but what about the other information useful for ranked retrieval, for example the frequency of the term in the document or the positions of the term?
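A minimal Python sketch of gamma-coding a whole postings list, assuming doc IDs and positions are 1-based and strictly increasing. Doc IDs and positions are stored as gaps, and the term frequency is gamma-coded as well, since it is also a small positive integer. The layout (document frequency, then per document: doc-ID gap, term frequency, position gaps) and the function names are hypothetical illustrations, not the book's exact format. Bits are kept as a '0'/'1' string for readability; a real index would pack them into bytes.

```python
def gamma_encode(n):
    """Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    offset = bin(n)[3:]                       # binary of n without its leading 1
    return "1" * len(offset) + "0" + offset   # unary length, then offset


def gamma_decode_stream(bits):
    """Decode a concatenation of gamma codes back into integers."""
    out, i = [], 0
    while i < len(bits):
        length = 0
        while bits[i] == "1":
            length += 1
            i += 1
        i += 1  # skip the terminating 0 of the unary part
        out.append(int("1" + bits[i:i + length], 2))
        i += length
    return out


def encode_postings(postings):
    """Encode [(doc_id, [positions...]), ...] as one gamma-coded bit string."""
    bits = [gamma_encode(len(postings))]              # document frequency
    prev_doc = 0
    for doc_id, positions in postings:
        bits.append(gamma_encode(doc_id - prev_doc))  # doc-ID gap
        prev_doc = doc_id
        bits.append(gamma_encode(len(positions)))     # term frequency in doc
        prev_pos = 0
        for pos in positions:
            bits.append(gamma_encode(pos - prev_pos))  # position gap
            prev_pos = pos
    return "".join(bits)


def decode_postings(bits):
    """Inverse of encode_postings: rebuild doc IDs and positions from gaps."""
    nums = gamma_decode_stream(bits)
    df, i = nums[0], 1
    postings, doc_id = [], 0
    for _ in range(df):
        doc_id += nums[i]
        tf = nums[i + 1]
        i += 2
        positions, pos = [], 0
        for gap in nums[i:i + tf]:
            pos += gap
            positions.append(pos)
        i += tf
        postings.append((doc_id, positions))
    return postings


print(gamma_encode(13))
```

Here `gamma_encode(13)` yields "1110101" (length "1110", offset "101"). Dropping the inner position loop gives the [document_frequency doc_id:tf ...] layout from the question; variable byte codes, also covered in the book's chapter 5, could replace `gamma_encode` in the same layout.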