When immutable collections are preferable than concurrent

3.6k Views Asked by At

Recently read about immutable collections. They are recommended to be used as a thread safe for reading, when the read operations are performed more often than write.

Then I want to test read performance ImmutableDictionary vs ConcurrentDictionary. Here is this very simple test (in .NET Core 2.1):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace ImmutableSpeedTests
{
    class Program
    {
        public class ConcurrentVsImmutable
        {
            public int ValuesCount;
            public int ThreadsCount;

            private ImmutableDictionary<int, int> immutable = ImmutableDictionary<int, int>.Empty;
            private ConcurrentDictionary<int, int> concurrent = new ConcurrentDictionary<int, int>();

            public ConcurrentVsImmutable(int valuesCount, int threadsCount)
            {
                ValuesCount = valuesCount;
                ThreadsCount = threadsCount;
            }

            public void Setup()
            {
                // fill both collections. I don't measure time cause immutable is filling much slower obviously.
                for (var i = 0; i < ValuesCount; i++)
                {
                    concurrent[i] = i;
                    immutable = immutable.Add(i, i);
                }
            }

            public async Task<long> ImmutableSum() => await Sum(immutable);

            public async Task<long> ConcurrentSum() => await Sum(concurrent);

            private async Task<long> Sum(IReadOnlyDictionary<int, int> dic)
            {
                var tasks = new List<Task<long>>();

                // main job. Run multiple tasks to sum all values.
                for (var i = 0; i < ThreadsCount; i++)
                    tasks.Add(Task.Run(() =>
                    {
                        long x = 0;
                        foreach (var key in dic.Keys)
                        {
                            x += dic[key];
                        }
                        return x;
                    }));

                var result = await Task.WhenAll(tasks.ToArray());
                return result.Sum();
            }
        }

        static void Main(string[] args)
        {
            var test = new ConcurrentVsImmutable(1000000, 4);

            test.Setup();

            var sw = new Stopwatch();

            sw.Start();
            var result = test.ConcurrentSum().Result;
            sw.Stop();

            // Convince that the result of the work is the same
            Console.WriteLine($"Concurrent. Result: {result}. Elapsed: {sw.ElapsedTicks}.");

            sw.Reset();
            sw.Start();
            result = test.ImmutableSum().Result;
            sw.Stop();

            Console.WriteLine($" Immutable. Result: {result}. Elapsed: {sw.ElapsedTicks}.");

            Console.ReadLine();
        }
    }
}

You can run this code. Elapsed time in ticks will differ from time to time but the time spent by ConcurrentDictionary is several times less than by ImmutableDictionary.

This experiment makes me embarrassed. Did I do it wrong? What the reason to use immutable collections if we have concurrent? When they are preferable?

4

There are 4 best solutions below

4
On BEST ANSWER

Immutable collections are not alternative to concurrent collections. And the way they are designed to reduce memory consumption, they are bound to be slower, the trade off here is to use less memory and thus by using less n operations to do anything.

We usually copy collections to other collections to achieve immutability to persist state. Lets see what it means,

 var s1 = ImmutableStack<int>.Empty;
 var s2 = s1.Push(1);
 // s2 = [1]

 var s3 = s2.Push(2);
 // s2 = [1]
 // s3 = [1,2]

 // notice that s2 has only one item, it is not modified..

 var s4 = s3.Pop(ref var i);

 // s2 = [1];
 // still s2 has one item...

Notice that, s2 always has only one item. Even if all items are removed.

The way all data is stored internally is a huge tree and your collection is pointing to a branch which has descendants that represents initial state of the tree.

I don't think the performance can be matched with concurrent collection where goals are totally different.

In concurrent collection, you have a single copy of collection accessed by all threads.

In immutable collection you have virtually isolated copy of a tree, navigating that tree is always costly.

It is useful in transactional system, where if a transaction has to be rolled back, state of collection can be retained in commit points.

3
On

This is a criticism that's been made before.

As Akash already said, ImmutableDictionary works with an internal tree, instead of a hashset.

One aspect of this is that you can improve the performance slightly if you build the dictionary in one step instead of iteratively adding all the keys:

  immutable = concurrent.ToImmutableDictionary();

Enumerating a hashset and a balanced tree are both O(n) operations. I took the average of a few runs on a single thread for varying container size and get results consistent with that:

Graph of tick count vs collection size

I don't know why the immutable slope is 6x steeper. For now I'll just assume its doing tricky nonblocking tree stuff. I assume this class would be optimized for random stores and reads rather than enumeration.

To identify what exact scenarios ImmutableDictionary wins at, we'd need to wrap a concurrent dictionary to provide some level of immutability, and test both classes in the face of levels of read/write contention.

Not a serious suggestion, but a counterpoint to your test is to use immutability to "cheat" over multiple iterations by comparing:

        private ConcurrentDictionary<object, long> cache = new ConcurrentDictionary<object, long>();
        public long ImmutableSum() 
        {
            return cache.GetOrAdd(immutable, (obj) => (obj as ImmutableDictionary<int, int>).Sum(kvp => (long)kvp.Value));                
        }

        public long ConcurrentSum() => concurrent.Sum(kvp => (long)kvp.Value);

This makes a quite a difference on subsequent calls to sum an unchanged collection!

0
On

In .NET 8 there is a new class named FrozenDictionary<TKey,TValue> that has great improvements on read. You can read more about it here:

0
On

The two are not mutually exclusive. I use both.

If your dictionary is small the read performance of ImmutableDictionary will be superior to ConcurrentDictionary as K1*Log(N) < K2 where Log(N) < K2/K1 (when the hash table overhead is worse than tree traversal).

I personally find the write semantics of the Immutable collections easier to understand than those of the concurrent collections as they tend to be more consistent, especially when dealing with AddOrUpdate() and GetOrAdd().

In practice, I find that there have many cases in which I have a good number of small (or empty) dictionaries that are more appropriate as ImmutableDictionary and some larger ones that warrant the use of ConcurrentDictionary.

Having said that, if they are small then it doesn't make much of a difference what you use. Regarding the answer of Peter Wishart, the enumeration performance of ImmutableDictionary is higher than ConcurrentDictionary (for reasonable N) because tree traversal is brutal in terms of memory latency on modern cache architectures.