Which NoSQL Database for Mostly Writing

I'm working on a system that will generate and store large amounts of data to disk. A previously developed system at the company used ordinary files to store its data, but for several reasons that became very hard to manage.

I believe NoSQL databases are a good solution for us. What we are going to store is generally documents (usually around 100 KB, but occasionally much larger or smaller) annotated with some metadata. Query performance is not the top priority; the priority is writing, so that I/O is as small a bottleneck as possible. The rate of data generation is about 1 Gbps, but we may move to 10 Gbps (or even more) in the future.

My other requirement is the availability of a (preferably well-documented) C API. I'm currently testing MongoDB. Is this a good choice? If not, what other database system can I use?
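
For context, the write path I'm prototyping looks roughly like the sketch below, using the official mongo-c-driver (libmongoc); the URI, database/collection names, and metadata fields are placeholders, not our real schema.

```c
#include <stdio.h>
#include <mongoc/mongoc.h>

int main(void) {
    mongoc_init();

    /* Placeholder URI -- point this at your own deployment. */
    mongoc_client_t *client = mongoc_client_new("mongodb://localhost:27017");
    mongoc_collection_t *coll =
        mongoc_client_get_collection(client, "mydb", "docs");

    /* Stand-in for one ~100 KB generated document. */
    uint8_t payload[100 * 1024] = {0};

    /* Document: a couple of metadata fields plus the binary blob. */
    bson_t *doc = BCON_NEW("source", BCON_UTF8("generator-01"),
                           "seq", BCON_INT64(1));
    bson_append_binary(doc, "payload", -1, BSON_SUBTYPE_BINARY,
                       payload, sizeof payload);

    bson_error_t error;
    if (!mongoc_collection_insert_one(coll, doc, NULL, NULL, &error)) {
        fprintf(stderr, "insert failed: %s\n", error.message);
    }

    bson_destroy(doc);
    mongoc_collection_destroy(coll);
    mongoc_client_destroy(client);
    mongoc_cleanup();
    return 0;
}
```

At our rates we would presumably batch inserts (e.g. with the driver's bulk-write API) and relax the write concern rather than insert one document at a time.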

2 Answers

Answer from Gates VP (score 9)

The rate of data generation is about 1 Gbps... I'm currently testing MongoDB. Is this a good choice?

OK, so just to clarify: 1 Gbps is about 125 megabytes per second, i.e. roughly 1.25 GB every 10 seconds. So you are filling a 1 TB hard drive every two hours or so?

MongoDB has pretty solid write rates, but it is ideally used in situations where the data-to-RAM ratio is reasonably low. You want to keep at least the primary indexes in memory, along with some of the data.

In my experience, you want about 1 GB of RAM for every 5-10 GB of data. Beyond that ratio, read performance drops off dramatically. Once you get to 1 GB of RAM per 100 GB of data, even adding new data can be slow, as the index stops fitting in RAM.

The big key here is:

What queries are you planning to run, and how does MongoDB make running those queries easier?

Your data is very quickly going to occupy enough space that essentially every query goes to disk. Unless you have a very specific indexing and sharding strategy, you will end up just doing disk scans.
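
To make that concrete: if you do know the queries, declaring a secondary index on a metadata field from C is a one-off command. A sketch using libmongoc's command interface (the collection and field names are invented):

```c
#include <stdio.h>
#include <mongoc/mongoc.h>

/* Sketch: declare a secondary index on a hypothetical "created"
 * metadata field by running the createIndexes command. */
static void create_created_index(mongoc_client_t *client) {
    bson_t *cmd = BCON_NEW(
        "createIndexes", BCON_UTF8("docs"),
        "indexes", "[",
            "{", "key", "{", "created", BCON_INT32(1), "}",
                 "name", BCON_UTF8("created_1"), "}",
        "]");

    bson_t reply;
    bson_error_t error;
    if (!mongoc_client_command_simple(client, "mydb", cmd,
                                      NULL, &reply, &error)) {
        fprintf(stderr, "createIndexes failed: %s\n", error.message);
    }

    bson_destroy(&reply);
    bson_destroy(cmd);
}
```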

Additionally, MongoDB does not support compression, so you will be using a lot of disk space.

If not, what other database system can I use?

Have you considered compressed flat files? Or possibly a big-data Map/Reduce system like Hadoop? (I know Hadoop is written in Java.)
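
For what it's worth, a compressed flat-file writer is only a few lines with zlib; this sketch appends one record to a gzip stream (the file name and record buffer are placeholders):

```c
#include <stdio.h>
#include <zlib.h>

int main(void) {
    /* Append gzip-compressed records to a flat file. */
    gzFile out = gzopen("records-0001.gz", "ab");
    if (out == NULL) {
        perror("gzopen");
        return 1;
    }

    char record[100 * 1024] = {0};  /* stand-in for one ~100 KB document */
    if (gzwrite(out, record, sizeof record) != (int) sizeof record) {
        fprintf(stderr, "gzwrite failed\n");
    }

    gzclose(out);
    return 0;
}
```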

If C is a key requirement, maybe you want to look at Tokyo Cabinet or Kyoto Cabinet?
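
Kyoto Cabinet ships a plain C wrapper (kclangc.h). A minimal sketch of the write path, with an invented file name and key scheme:

```c
#include <stdio.h>
#include <kclangc.h>

int main(void) {
    /* Open (or create) a hash database file for writing. */
    KCDB *db = kcdbnew();
    if (!kcdbopen(db, "docs.kch", KCOWRITER | KCOCREATE)) {
        fprintf(stderr, "open error: %s\n", kcecodename(kcdbecode(db)));
        return 1;
    }

    /* Store one document under a synthetic key. */
    const char key[] = "doc:0000000001";
    char value[100 * 1024] = {0};  /* stand-in payload */
    if (!kcdbset(db, key, sizeof key - 1, value, sizeof value)) {
        fprintf(stderr, "set error: %s\n", kcecodename(kcdbecode(db)));
    }

    kcdbclose(db);
    kcdbdel(db);
    return 0;
}
```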


EDIT: more details

MongoDB does not support full-text search. You will have to look to other tools (Sphinx/Solr) for such things.

Large indices defeat the purpose of using an index.

According to your numbers, 1 Gbps of ~100 KB documents is about 1,250 documents per second, or roughly 4.5M documents per hour. Each document needs about 16+ bytes for an index entry: 12 bytes for the ObjectID + 4 bytes for the offset into the 2 GB data file + 1 byte for the file pointer + some amount of padding.

Let's say that every index entry needs about 20 bytes. Then your index is growing at roughly 90 MB / hour, or about 2.2 GB / day. And that's just the default _id index.

Within a week or so (2.2 GB/day against however much RAM you have), your main index will no longer fit into RAM and your performance will start to drop off dramatically. (This behavior is well documented for MongoDB.)
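
The back-of-envelope math is easy to check; this little program just reproduces the arithmetic above (the figures are your stated rates, not measurements):

```c
#include <stdio.h>

int main(void) {
    /* Back-of-envelope index growth, using the question's figures. */
    const double bytes_per_sec = 125e6;  /* 1 Gbps ~ 125 MB/s       */
    const double doc_size      = 100e3;  /* ~100 KB per document    */
    const double entry_size    = 20.0;   /* ~20 bytes per _id entry */

    double docs_per_sec   = bytes_per_sec / doc_size;          /* ~1250 */
    double index_per_hour = docs_per_sec * 3600 * entry_size;  /* bytes */
    double index_per_day  = index_per_hour * 24;

    printf("docs/sec:      %.0f\n", docs_per_sec);
    printf("index MB/hour: %.0f\n", index_per_hour / 1e6);
    printf("index GB/day:  %.2f\n", index_per_day / 1e9);
    return 0;
}
```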

So it's going to be really important to figure out which queries you want to run.

Answer from Maksym Polshcha (score 2)

Have a look at Cassandra. It executes writes much faster than reads, so it is probably what you're looking for.
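
It also meets your C requirement via the DataStax C/C++ driver. A rough sketch of a single write (the keyspace, table, and columns are invented for illustration):

```c
#include <stdio.h>
#include <cassandra.h>

int main(void) {
    CassCluster *cluster = cass_cluster_new();
    CassSession *session = cass_session_new();
    cass_cluster_set_contact_points(cluster, "127.0.0.1");

    CassFuture *connect = cass_session_connect(session, cluster);
    if (cass_future_error_code(connect) != CASS_OK) {
        fprintf(stderr, "connect failed\n");
    } else {
        /* Hypothetical table:
         * docs (id uuid PRIMARY KEY, source text, payload blob) */
        CassStatement *stmt = cass_statement_new(
            "INSERT INTO mykeyspace.docs (id, source, payload) "
            "VALUES (uuid(), ?, ?)", 2);

        cass_byte_t payload[100 * 1024] = {0};  /* stand-in document body */
        cass_statement_bind_string(stmt, 0, "generator-01");
        cass_statement_bind_bytes(stmt, 1, payload, sizeof payload);

        CassFuture *result = cass_session_execute(session, stmt);
        if (cass_future_error_code(result) != CASS_OK) {
            fprintf(stderr, "insert failed\n");
        }

        cass_future_free(result);
        cass_statement_free(stmt);
    }

    cass_future_free(connect);
    cass_session_free(session);
    cass_cluster_free(cluster);
    return 0;
}
```

At your rates you would likely use prepared statements and the driver's asynchronous execution instead of one blocking insert per document.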