I'm running some tests with PouchDB and CouchDB in order to optimize my Ionic app as much as possible. With databases of a few hundred documents I have no problems, but with larger databases (from about 20,000 documents) replication takes a long time.

Currently, the way we replicate is using the selector parameter:

this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})

The thing is, I started running tests on the assumption that if I used filters or views created on the server, replication would be faster, since those filters and views generate indexes that should speed up retrieving the documents. However, here is what I found when replicating only those documents whose type (tipo) is "Parte" and whose status (estado) is 0:

Database with 31,090 documents.

Using selector:

this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    }
})

Returns 4 docs. Takes 4910 ms.

Using a filter defined on the server:

Server filter:

function (doc, req) {
    return doc && doc.tipo === req.query.tipo && doc.estado === parseInt(req.query.estado);
}

Client code:

this.remoteDb.replicate.to(this.localDb, {
    filter: 'datos/tipo',
    query_params:{
        "tipo": "Parte",
        "estado": 0
    }
})
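
For reference, the name 'datos/tipo' implies the filter is stored in a design document on the server, roughly like this sketch (the actual document may differ):

{
    "_id": "_design/datos",
    "filters": {
        "tipo": "function (doc, req) { return doc && doc.tipo === req.query.tipo && doc.estado === parseInt(req.query.estado); }"
    }
}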

Using a view defined on the server:

Server view:

function (doc) {
    if (doc && doc.tipo === 'Parte' && doc.estado === 0) {
        emit(doc.tipo, 1);
    }
}

Client code:

this.remoteDb.replicate.to(this.localDb, {
    filter: '_view',
    view: 'pruebas/parte'
})
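
Likewise, 'pruebas/parte' implies a design document on the server roughly like this sketch:

{
    "_id": "_design/pruebas",
    "views": {
        "parte": {
            "map": "function (doc) { if (doc && doc.tipo === 'Parte' && doc.estado === 0) { emit(doc.tipo, 1); } }"
        }
    }
}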

I'm bringing this up because I don't know whether this is normal or whether I'm doing something wrong. I hope you can help me.

Glynn Bird answered:

First of all, on "selector" syntax vs. JavaScript functions: selectors are much faster at filtering a CouchDB changes feed. Put simply, all of the work can be done inside Erlang, without spinning up any JavaScript processes to decide whether a change should make it through.
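
Under the hood, a selector-based replication asks CouchDB to filter the changes feed server-side with the built-in _selector filter. A minimal sketch of the equivalent raw request (the URL and database name are placeholders):

const resp = await fetch('https://example.com/mydb/_changes?filter=_selector', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // The selector is evaluated inside Erlang; no JavaScript runs on the server.
    body: JSON.stringify({ selector: { tipo: 'Parte', estado: 0 } })
});
const changes = await resp.json();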

It should be clear that this use-case is going to get slower the bigger the database. If you are "syncing" a very small subset of a large database, then CouchDB has to spool through the entire changes feed (the history of the database) to find the handful of documents you need. This is fine for very small databases, but slowish for 20k docs and progressively slower after that. If you intend to keep growing the database, then this solution isn't going to scale well. Imagine a database with 500M docs?!

I solved this for a customer by choosing a different technique for seeding the data into a new, empty PouchDB database: populating it from a query first, then replicating to catch up on any recent changes. This is much faster with larger databases. It's written up here: https://blog.cloudant.com/2019/06/21/Replicating-from-a-Query.html

In short:

  • query the remote database for the subset of data you need. This query should be backed by a secondary index to be quick and scalable.
  • fetch the document bodies of the documents you need using CouchDB's _bulk_get endpoint, which returns the revision history for each document.
  • write these documents to PouchDB

This gives you the same data you would have had with replication, but quicker. A sketch of the three steps follows.
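
A minimal sketch of the seeding step, assuming the pouchdb-find plugin is available (it is bundled in recent PouchDB builds) and a Mango index on tipo/estado already exists on the remote database (e.g. created with createIndex):

async function seedLocalDb(remoteDb, localDb) {
    // 1. Query the subset of data we need; backed by a secondary
    //    index, this avoids spooling through the whole changes feed.
    const found = await remoteDb.find({
        selector: { tipo: 'Parte', estado: 0 },
        fields: ['_id', '_rev']
    });

    // 2. Fetch the full document bodies plus revision history
    //    (PouchDB's bulkGet maps to CouchDB's _bulk_get endpoint).
    const bulk = await remoteDb.bulkGet({
        docs: found.docs.map(d => ({ id: d._id, rev: d._rev })),
        revs: true
    });

    // 3. Write the documents to the local database with new_edits: false,
    //    preserving the revision trees so a later replication sees
    //    these documents as already up to date.
    const docs = bulk.results
        .map(r => r.docs[0])
        .filter(d => d && d.ok)
        .map(d => d.ok);
    await localDb.bulkDocs(docs, { new_edits: false });
}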

You can then replicate from your remote database as before, providing a "since=now" parameter to pick up only the latest changes, or a "since" value equal to the remote database's update sequence captured before the initial query, so that no intervening changes are missed.
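
A sketch of that catch-up step, using the same selector as above (since is a standard PouchDB replication option):

this.remoteDb.replicate.to(this.localDb, {
    selector: {
        "tipo": "Parte",
        "estado": 0
    },
    // Start from the current sequence rather than the beginning of the
    // changes feed; an update_seq captured before the initial query
    // avoids missing changes made during the bulk load.
    since: 'now'
})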