Elasticsearch queries are slow, and the first query always takes too long


I'm new to Elasticsearch. My queries are slow when I use should clauses with multiple search terms, and also when I match on nested documents. The first query takes 7-10 seconds and later ones take 5-6 seconds thanks to the Elasticsearch cache, whereas plain match queries on non-nested fields are fast, i.e. within 100 ms.

I'm running Elasticsearch on an AWS instance with 250 GB of RAM and 500 GB of disk space. I have one index template and 204 indices holding around 107 million documents in total, with 2 shards per index on a single node, and the heap size is set to 30 GB.

The following is my memory usage: [image: memory]

A document can have more than 50k nested objects, so I have raised the nested objects limit to 500k. Searching on these nested objects takes too long, and any OR (should match) operation on non-nested fields is also slow. Is there any way to improve query performance for nested objects? Is anything wrong in my configuration? And is there any way to make the first query faster as well?

{
  "index_patterns": [
    "product_*"
  ],
  "template": {
    "settings": {
      "index.store.type": "mmapfs",
      "number_of_shards":2,
      "number_of_replicas": 0,
      "index": {
        "store.preload": [
          "*"
        ],
        "mapping.nested_objects.limit": 500000,
        "analysis": {
          "analyzer": {
            "cust_product_name": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "english_stop",
                "name_wordforms",
                "business_wordforms",
                "english_stemmer",
                "min_value"
              ],
              "char_filter": [
                "html_strip"
              ]
            },
            "entity_name": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "english_stop",
                "business_wordforms",
                "name_wordforms",
                "english_stemmer"
              ],
              "char_filter": [
                "html_strip"
              ]
            },
            "cust_text": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "english_stop",
                "name_wordforms",
                "english_stemmer",
                "min_value"
              ],
              "char_filter": [
                "html_strip"
              ]
            }
          },
          "filter": {
            "min_value": {
              "type": "length",
              "min": 2
            },
            "english_stop": {
              "type": "stop",
              "stopwords": "_english_"
            },
            "business_wordforms": {
              "type": "synonym",
              "synonyms_path": "<some path>/business_wordforms.txt"
            },
            "name_wordforms": {
              "type": "synonym",
              "synonyms_path": "<some path>/name_wordforms.txt"
            },
            "english_stemmer": {
              "type": "stemmer",
              "language": "english"
            }
          }
        }
      }
    },
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "product_number": {
          "type": "text",
          "analyzer": "keyword"
        },
        "product_name": {
          "type": "text",
          "analyzer": "cust_product_name"
        },
        "first_fetch_date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
        },
        "last_fetch_date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
        },
        "review": {
          "type": "nested",
          "properties": {
            "text": {
              "type": "text",
              "analyzer": "cust_text"
            },
            "review_date": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
            }
          }
        }
      }
    },
    "aliases": {
      "all_products": {}
    }
  },
  "priority": 200,
  "version": 1
}

If I search for a specific term in the review text, the response takes too long.

{
    "_source":{
        "excludes":["review"]
    },
    "size":1,
    "track_total_hits":true,
    "query":{
        "nested":{
            "path":"review",
            "query":{
                "match":{
                    "review.text":{
                        "query":"good",
                        "zero_terms_query":"none"
                    }
                }
            }
        }
    },
    "highlight":{
        "pre_tags":[
            "<b>"
        ],
        "post_tags":[
            "</b>"
        ],
        "fields":{
            "product_name":{}
        }
    }
}

I'm sure I'm missing something obvious.


There is 1 best solution below

Jaycreation On

Easy things first: track_total_hits should be set to false. Maintenance with a force merge could also help.
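
As a rough sketch, this is the search body from the question with exact hit counting turned off. Everything except the track_total_hits value is copied from the question, and it is assumed to be sent to the same search endpoint as before.

```json
{
  "_source": {
    "excludes": ["review"]
  },
  "size": 1,
  "track_total_hits": false,
  "query": {
    "nested": {
      "path": "review",
      "query": {
        "match": {
          "review.text": {
            "query": "good",
            "zero_terms_query": "none"
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
      "product_name": {}
    }
  }
}
```

With this setting the response no longer reports an exact total, which lets Elasticsearch stop counting once it has enough hits. The force merge is a separate request to the force merge API, for example `POST /product_*/_forcemerge?max_num_segments=1`, and it is generally only advisable on indices that are no longer being written to.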

The difference between the first and subsequent request times is due to the Elasticsearch cache.

But if I understand correctly, you can have more than 50k reviews on a single document? If so, that's too many. Could you consider inverting your mapping, i.e. having a review index in which each review embeds the related product as an object? It should be much faster.

PUT reviews 
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "review_date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
      },
      "product": {
        "properties": {
          "product_number": {
            "type": "text",
            "analyzer": "keyword"
          },
          "product_name": {
            "type": "text"
          },
          "first_fetch_date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
          },
          "last_fetch_date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
          }
        }
      }
    }
  }
}
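
With a mapping like the one above, the review search becomes a plain match query with no nested clause. A minimal sketch of the body, assuming it is sent to `GET reviews/_search` and that the field names are exactly those in the mapping above; the search term and the zero_terms_query setting are carried over from the question.

```json
{
  "size": 1,
  "track_total_hits": false,
  "_source": [
    "product.product_number",
    "product.product_name",
    "review_date"
  ],
  "query": {
    "match": {
      "text": {
        "query": "good",
        "zero_terms_query": "none"
      }
    }
  }
}
```

Each hit is now a single review rather than a product, so a product with many matching reviews can appear several times. Grouping hits per product (for instance with field collapsing) would likely need product_number to be mapped as a keyword field, which this mapping does not do.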