AEM Oak Index Wonky Cost Estimation

286 Views Asked by At

i have a query like this:

SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE("/content/some/specific/path") 
AND LOWER([jcr:content/depth1/depth2/depth3/depth4/prop1]) = "someValue" 
AND LOWER([jcr:content/depth1/depth2/depth3/depth4/prop2]) = "someOtherValue"

And i've implemented an oak lucene-property index as follow:

 <myIndex
    jcr:primaryType="oak:QueryIndexDefinition"
    async="async"
    compatVersion="{Long}2"
    fulltextEnabled="{Boolean}false"
    includedPaths="[/content/some/specific/path]"
    type="lucene">
    <indexRules jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured">
            <properties jcr:primaryType="nt:unstructured">
                <prop1
                    jcr:primaryType="nt:unstructured"
                    name="jcr:content/depth1/depth2/depth3/depth4/prop1"
                    propertyIndex="{Boolean}true"/>
                <prop2
                    jcr:primaryType="nt:unstructured"
                    name="jcr:content/depth1/depth2/depth3/depth4/prop2"
                    propertyIndex="{Boolean}true"/>
            </properties>
        </cq:Page>
    </indexRules>
</myIndex>

Although "myIndex" is defined correctly it is more expensive (somehow) than the ootb index "ntBaseLucene", i've checked this ootb index and it is bigger in size and does not have this prop1 and prop2 which i am looking for as part of my query.

Here are some description of both as far as i could check:

MyIndex: (not being pick up)

  • Small in size
  • nodeType specific (cq:Page)
  • type Lucene
  • Cost is More expensive (while checking with queryDescription tool)

OOTB ntBaseLucene index (being pick up)

  • Big in size (at least 4 times bigger than mine)
  • No nodeType specific (nt:Base)
  • type Lucene
  • Cost is cheaper than mine (while checking with queryDescription tool)

Is there any reasonable reason why my index is more expensive than the ootb one? Any input is really appreciated.

2

There are 2 best solutions below

1
Paul Chibulcuteanu On

Have you tried generating the index via OAK index tool generator? It's free: http://oakutils.appspot.com/generate/index

Generated with the tool, the index looks something like:

  - compatVersion = 2
  - async = "async"
  - jcr:primaryType = oak:QueryIndexDefinition
  - evaluatePathRestrictions = true
  - type = "lucene"
  + indexRules 
   + cq:Page 
    + properties 
     + prop2 
      - name = "function*lower*@jcr:content/depth1/depth2/depth3/depth4/prop2"
      - propertyIndex = true
     + prop2_2 
      - name = "jcr:content/depth1/depth2/depth3/depth4/prop2"
      - propertyIndex = true
      - notNullCheckEnabled = true
     + prop1_3 
      - name = "jcr:content/depth1/depth2/depth3/depth4/prop1"
      - propertyIndex = true
      - notNullCheckEnabled = true
     + prop1 
      - name = "function*lower*@jcr:content/depth1/depth2/depth3/depth4/prop1"
      - propertyIndex = true

Besides the definition above make sure to add includedPaths but also excludedPaths to point to at least generic excluded paths that other lucene indexes have:

"/var","/etc/replication","/etc/workflow/instances","/jcr:system"
0
guvlnce On

it's a bit old but for the other people I just want to mention that if we want to target our custom index then we can use option index syntax. Please follow this URL By default, queries will use the index with the lowest expected cost (as in relational databases). But some cases, it is needed to specify which index(es) should be considered for a query: https://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag "If you want to ensure a specific type of index is used for specific type of queries, for example custom indexes are used for custom queries. If there are multiple Lucene fulltext indexes with different aggregation rules, and the index data overlaps. In the query, you want to specify which aggregation rule to use. To temporarily work around limitations of the index implementation, for example incorrect cost estimation. To use index tags, add tags (a multi-valued String property) to the index(es) of choice, for example: /oak:index/lucene/tags = [x, y] Note each index can have multiple tags, and the same tag can be used in multiple indexes. The syntax to limit a query to a certain tag is: option(index tag ):

/jcr:root/content//element(*, nt:file)[jcr:contains(., 'test')] option(index tag x)

select * from [nt:file] where ischildnode('/content') and contains(*, 'test') option(index tag [x])