How can we improve our AWS OpenSearch performance?


We are using AWS OpenSearch to ingest and merge data, and the process is quite slow. We are wondering if someone here can help.

Essentially, we have 850 GB of data (400 MM rows) already loaded into OpenSearch, with a subset of the attributes indexed.

When we try to merge it with data from another provider (220 MM rows), matching on 2 of those indexed attributes and inserting when no match is found, the process is quite slow.

Our OpenSearch configuration:

3 Master nodes.

20 Data nodes. We have 1 replica. We didn't see much difference with replicas set to 0, so we kept 1 to be on the safe side.

r6g.large.search machines.

60 shards. We think this is too many, but right now we are not sure whether it is worth restarting the process, or whether changing it on active instances is worth the risk.
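To put the shard count in context, here is a rough back-of-the-envelope sketch using the numbers above. It assumes the 60 shards are primaries, that the 850 GB figure approximates the on-disk index size, and that shards are spread evenly across data nodes; none of these has been verified against the cluster.

```python
# Figures taken from the question. Assumptions: 60 = primary shards,
# 850 GB ~ index size on disk, shards evenly spread over the data nodes.
TOTAL_DATA_GB = 850
PRIMARY_SHARDS = 60
REPLICAS = 1
DATA_NODES = 20


def shard_layout(total_gb, primaries, replicas, nodes):
    """Return (GB per primary shard, total shard copies, shard copies per node)."""
    gb_per_primary = total_gb / primaries
    total_copies = primaries * (1 + replicas)
    copies_per_node = total_copies / nodes
    return gb_per_primary, total_copies, copies_per_node


gb_per_primary, total_copies, copies_per_node = shard_layout(
    TOTAL_DATA_GB, PRIMARY_SHARDS, REPLICAS, DATA_NODES
)
print(f"{gb_per_primary:.1f} GB per primary shard")
print(f"{total_copies} shard copies in total, {copies_per_node:.0f} per data node")
```

Under those assumptions this works out to roughly 14 GB per primary shard and 6 shard copies per data node.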

Refresh interval is 900 sec. We are OK with newly indexed documents becoming searchable at a slower pace.
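For reference, the refresh interval and replica count described above correspond to an index settings body along these lines. This is only a sketch that builds the JSON for `PUT /<index>/_settings`; the helper name `index_settings_body` is made up for illustration, and how the settings were actually applied on our cluster is not shown here.

```python
import json


def index_settings_body(refresh_interval_s, replicas):
    """Build the JSON body for PUT /<index>/_settings (hypothetical helper)."""
    return {
        "index": {
            # Both are dynamic index settings and can be changed on a live index.
            "refresh_interval": f"{refresh_interval_s}s",
            "number_of_replicas": replicas,
        }
    }


body = index_settings_body(900, 1)
print(json.dumps(body, indent=2))
```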

CPU utilization looks OK. Memory utilization also seems fine.

Indexing rate is at 4-8K Ops/min. Search rate at 170K-200K Ops/min (we think this is quite slow and don't know how to increase it linearly)

The process runs in bulk, with 100 threads on each of the three instances and 500 rows per bulk operation.
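To make the "merge, or insert if no match is found" step concrete, here is a minimal sketch of how one batch could be expressed as a `_bulk` request body using `update` actions with `doc_as_upsert`. This is not our actual code: the index name, the field names `email` and `phone`, and the `doc_id` helper are all hypothetical, and the approach only applies if the existing documents were indexed with an `_id` derived from the two match attributes in the same way. Otherwise a search by those attributes is still needed before each write.

```python
import hashlib
import json


def doc_id(key_a, key_b):
    """Hypothetical deterministic _id derived from the two match attributes."""
    return hashlib.sha1(f"{key_a}\x1f{key_b}".encode("utf-8")).hexdigest()


def bulk_upsert_body(index, rows, key_fields):
    """Build an NDJSON _bulk body: update each document, or insert it if absent."""
    lines = []
    for row in rows:
        _id = doc_id(row[key_fields[0]], row[key_fields[1]])
        # Action line, followed by the partial document for that action.
        lines.append(json.dumps({"update": {"_index": index, "_id": _id}}))
        lines.append(json.dumps({"doc": row, "doc_as_upsert": True}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample rows; a real batch would hold 500 of them.
rows = [
    {"email": "a@example.com", "phone": "555-0100", "source": "provider_b"},
    {"email": "b@example.com", "phone": "555-0101", "source": "provider_b"},
]
body = bulk_upsert_body("customers", rows, ("email", "phone"))
print(body, end="")
```

The resulting string would be sent as the body of `POST /_bulk` with content type `application/x-ndjson`.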

Right now, the estimate is ~4 days for the operation to finish. About 1.5 MM documents are deleted per hour (as a result of the merge), and a couple of hundred thousand net new documents are inserted each hour.

Let me know if there are any pointers on how we can improve the performance.
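As a sanity check on these figures, the sketch below works out the throughput implied by the ~4 day estimate. It assumes the estimate covers all 220 MM incoming rows and that all 300 threads (100 on each of three instances) are busy the whole time; both are assumptions, not measurements.

```python
# Figures from the question. Assumption: the ~4 day estimate covers
# all 220 MM incoming rows, with every thread active throughout.
ROWS_TO_MERGE = 220_000_000
ESTIMATED_DAYS = 4
BULK_SIZE = 500
WORKER_THREADS = 100 * 3


def implied_rates(rows, days, bulk_size, threads):
    """Return rows/hour, rows/min, bulks/min and bulks/min per thread."""
    rows_per_hour = rows / (days * 24)
    rows_per_min = rows_per_hour / 60
    bulks_per_min = rows_per_min / bulk_size
    bulks_per_thread_per_min = bulks_per_min / threads
    return rows_per_hour, rows_per_min, bulks_per_min, bulks_per_thread_per_min


rows_per_hour, rows_per_min, bulks_per_min, per_thread = implied_rates(
    ROWS_TO_MERGE, ESTIMATED_DAYS, BULK_SIZE, WORKER_THREADS
)
print(f"{rows_per_hour:,.0f} rows/hour, {rows_per_min:,.0f} rows/min")
print(f"{bulks_per_min:.1f} bulk requests/min across all threads")
print(f"{1 / per_thread:.1f} minutes per bulk request per thread")
```

Under those assumptions the implied rate is about 2.3 MM rows/hour, the same order of magnitude as the deletions plus insertions we observe, and each thread completes a 500-row bulk only once every few minutes.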

Thanks, Kushal.
