Memory usage in a C++ program


I wrote a program that needs to handle a very large amount of data using the following containers:

  • vector
  • boost::unordered_map
  • boost::unordered_multimap

I'm having memory problems (the program uses a lot of it), and I was thinking I could replace these containers, either with something that already exists or with my own implementations.

So, three questions:

  • How much memory would I save if I replaced vector with a plain C array? Is it worth it?
  • Can someone explain how memory is used in boost::unordered_map and boost::unordered_multimap in the current implementation? In other words, what do they store in order to achieve their performance?
  • Can you recommend libraries that outperform boost::unordered_map and boost::unordered_multimap in memory usage (but are not too slow)?

There are 5 answers below

Answered by Jcao02 (Best Answer)

I ended up replacing boost::unordered_multimap with a std::unordered_map of vectors.

boost::unordered_multimap consumes more than twice the memory consumed by a std::unordered_map of vectors, due to the extra pointers it keeps (at least one extra pointer per element) and the fact that it stores the key alongside the value for every element, while an unordered_map of vectors stores the key only once for the vector that holds all the colliding elements.
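To make the idea concrete, here is a minimal sketch of the kind of replacement described above, assuming hypothetical integer key and value types (the actual types depend on your data):

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Hypothetical key/value types; the question deals with large numbers of integers.
    using Key   = std::uint32_t;
    using Value = std::uint32_t;

    // The key is stored once per group of colliding values, instead of once per
    // (key, value) node as in boost::unordered_multimap.
    std::unordered_map<Key, std::vector<Value>> groups;

    void insert(Key k, Value v)
    {
        groups[k].push_back(v); // creates the vector on the first insertion for k
    }

Iterating over all values for a key then becomes a plain loop over the vector returned by groups.find(k), instead of a call to equal_range.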

In my particular case, I was trying to store about 4 billion integers, which is roughly 15 GB in the ideal case (4 × 10^9 integers × 4 bytes ≈ 15 GB). With the multimap I ended up consuming more than 40 GB, while with the map of vectors I use about 15 GB (a little more because of the pointers and other bookkeeping, but their size is negligible).

Answered by zmbq

std::vector is memory efficient. I don't know the internals of the Boost maps, but the Boost people usually know what they're doing, so I doubt you'll save a lot of memory by writing your own variants.

You can do a few other things to help with memory issues:

  1. Compile in 64 bit. Running out of memory in a 64-bit process is very hard.
  2. You won't run out of memory, but memory might get swapped out. Instead, check whether you really need to load everything into memory at once; perhaps you can work on one chunk of the data at a time (see the sketch after this list).
  3. As a side benefit, working on a chunk of the data at a time also lets you run your code in parallel.
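As a rough illustration of point 2, here is a minimal sketch of chunked processing, assuming the data lives in a flat binary file of 32-bit integers; the file path, chunk size, and per-chunk work are hypothetical:

    #include <cstdint>
    #include <fstream>
    #include <vector>

    void process_in_chunks(const char* path, std::size_t chunk_elements)
    {
        std::ifstream in(path, std::ios::binary);
        std::vector<std::uint32_t> buffer(chunk_elements);

        while (in)
        {
            // Read up to one chunk of raw integers from the file.
            in.read(reinterpret_cast<char*>(buffer.data()),
                    static_cast<std::streamsize>(buffer.size() * sizeof(buffer[0])));
            std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(buffer[0]);
            if (got == 0)
                break;
            // process_chunk(buffer.data(), got); // hypothetical per-chunk work
        }
    }

Each chunk could also be handed to a worker thread, which is the parallelism mentioned in point 3.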

With memory being so cheap nowadays that allocating 10 GB of RAM is trivial, I'd guess your bottleneck will be the processing you do on the data, not allocating it.

Answered by mrjoltcola

std::vector is basically a contiguous array, plus a few bytes of overhead. About the only way you'll improve with vector is by using a smaller element type. Can you store a short int instead of a regular int? If so, you can cut the vector memory down by half.
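For example, a minimal sketch of what shrinking the element type buys (the types and the reserve call are illustrative assumptions; check that your values actually fit in the narrower range):

    #include <cstdint>
    #include <vector>

    std::vector<std::int64_t> wide;    // 8 bytes per element
    std::vector<std::int32_t> regular; // 4 bytes per element
    std::vector<std::int16_t> narrow;  // 2 bytes per element, if values fit in [-32768, 32767]

    // Reserving up front also avoids the transient over-allocation that occurs
    // while a vector grows geometrically:
    // regular.reserve(expected_count); // expected_count is hypothetical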

Are you perhaps using these containers to hold pointers to many objects on the heap? If so, you may have a lot of wasted space in the heap that could be saved by writing custom allocators, or by doing away with a pointer to a heap element altogether, and by storing a value type within the container.

Look within your class types. Consider all pointer types, and whether they need to be dynamic storage or not. The typical class often has pointer members hanging off a base object, which means a single object is a graph of memory chunks in itself. The more you can inline your class members, the more efficient your use of the heap will be.
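A small sketch of the difference, using a hypothetical record type:

    #include <array>
    #include <memory>
    #include <string>

    // Each pointer member is a separate heap block, plus allocator overhead,
    // plus the 8-byte pointer stored in the object itself.
    struct RecordWithIndirection {
        std::unique_ptr<std::string>        name;
        std::unique_ptr<std::array<int, 4>> scores;
    };

    // Storing the members by value keeps the object in one chunk (the string may
    // still allocate, unless it fits the small-string optimization).
    struct RecordInlined {
        std::string        name;
        std::array<int, 4> scores;
    };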

RAM is cheap in 2014. Easy enough to build x86-64 Intel boxes with 64-256GB of RAM and Solid State Disk as fast swap if your current box isn't cutting it for the project. Hopefully this isn't a commercial desktop app we are discussing. :)

Answered by olgierdh

If you have only one set of data and multiple ways of accessing it, you can try boost::multi_index (see its documentation).
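A minimal sketch of what that looks like, with a hypothetical element type that needs to be looked up by two different keys:

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/hashed_index.hpp>
    #include <boost/multi_index/member.hpp>
    #include <string>

    namespace bmi = boost::multi_index;

    struct Entry {
        int         id;
        std::string name;
    };

    // One copy of the data, two hashed indexes over it.
    using EntrySet = bmi::multi_index_container<
        Entry,
        bmi::indexed_by<
            bmi::hashed_unique<     bmi::member<Entry, int,         &Entry::id>   >,
            bmi::hashed_non_unique< bmi::member<Entry, std::string, &Entry::name> >
        >
    >;

    // EntrySet s;
    // s.insert({1, "a"});
    // const auto& by_name = s.get<1>(); // lookups through the second index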

Answered by Joaquín M López Muñoz

These two articles explain the data structures underlying some common implementations of unordered associative containers:

Even though there are some differences between implementations, they are modest: one word per element at most. If you go with a minimum-overhead solution such as a sorted vector, you would gain 2-3 words per element, which is not even a 2x improvement if your objects are large. So you'd probably be better off moving to an environment with more memory, or radically changing your approach by using a database or something similar.
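For reference, a minimal sketch of the sorted-vector approach mentioned above, assuming hypothetical integer keys and values:

    #include <algorithm>
    #include <utility>
    #include <vector>

    using Pair = std::pair<int, int>; // (key, value); hypothetical types

    // Kept sorted by key: no per-element pointers or buckets, at the cost of
    // O(n) insertion (or one sort after bulk loading).
    std::vector<Pair> table;

    int* find_value(int key)
    {
        auto it = std::lower_bound(table.begin(), table.end(), key,
                                   [](const Pair& p, int k) { return p.first < k; });
        return (it != table.end() && it->first == key) ? &it->second : nullptr;
    }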