ComputeIfAbsent wrong map size

143 Views Asked by At

I have tried below code in order to have a unique id for each jj.

As far as I know, computeIfAbsent is thread-safe but:

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService =
                Executors.newFixedThreadPool(4);

        final Map<String, ConcurrentHashMap<String, Short>> threadSafeMap = new ConcurrentHashMap<>();
        threadSafeMap.put("0", new ConcurrentHashMap<>());
        threadSafeMap.put("1", new ConcurrentHashMap<>());

        for (int i = 1; i <= 10; i++) {
            final int jj = i;
            executorService.submit(() -> {
                int                                key              = jj % 2;
                final ConcurrentMap<String, Short> idByName = threadSafeMap.get(String.valueOf(key));
                return idByName.computeIfAbsent(String.valueOf(jj), x -> (short) idByName.size());
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(threadSafeMap);
    }

Actual value: {0={2=0, 4=0, 6=2, 8=3, 10=4}, 1={1=0, 3=0, 5=2, 7=3, 9=4}}

Expected value for example (due to concurrency): {0={2=0, 4=1, 6=2, 8=3, 10=4}, 1={1=1, 3=0, 5=2, 7=3, 9=4}}

The problem is that I have with 2=0 and 4=0, which is wrong values should be unique!

btw using interger instead of short solve the issue! Could you please help?

2

There are 2 best solutions below

6
DuncG On

Your assumption is wrong, as many different threads could be executing the same mapping function concurrently for different keys.

ConcurrentHashMap is thread-safe but permits concurrent updates to the map. Some callers which are accessing similar parts of the underlying mapping tables may block while waiting for another thread to finish.

You won't find that computeIfAbsent(someKey,mapFunc) runs the mapping function for the same key twice, because it is an atomic operation for that key. So second or concurrent caller would see the value of the first call. However another computeIfAbsent(anotherKey,mapFunc) could be running at exactly the same time which is why your mapping function may evaluate size() to be same value for multiple keys.

The javadoc states:

The supplied function is invoked exactly once per invocation of this method if the key is absent, else not at all.

Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple.

If you wish to have unique values you could use AtomicInteger counter.

3
Rogue On

Something to note with ConcurrentHashMap is how the size of the map is calculated:

public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

final long sumCount() {
    CounterCell[] cs = counterCells;
    long sum = baseCount;
    if (cs != null) {
        for (CounterCell c : cs)
            if (c != null)
                sum += c.value;
    }
    return sum;
}

These "counter cells" are updated upon a call to addCount internally. At the end of the #computeIfAbsent method we have:

if (val != null)
    addCount(1L, binCount);
return val;

However, before this code is reached, the lambda expression you are passing will have already been evaluated earlier on in the method. For instance, when we are first adding values, they will synchronize on a ReservationNode:

Node<K,V> r = new ReservationNode<K,V>();
synchronized (r) {
    if (casTabAt(tab, i, null, r)) { //compare-and-swap
        binCount = 1;
        Node<K,V> node = null;
        try {
            if ((val = mappingFunction.apply(key)) != null) //runs the mapping function
                node = new Node<K,V>(h, key, val);
            } finally {
                setTabAt(tab, i, node);
            }
        }
    }
}

src: ConcurrentHashMap#computeIfAbsent

The above code is quite suspect to me, and leaves me thinking that nothing would be truly blocked/synchronized on the creation phase here. Later on in the method (once the table entry exists), you'll see it doing synchronized(f) (the hash bucket) in order to compute/put the new values, which then causes the sequential insertion you see above.

So, those concurrent updates at the start (when we are first initializing the map) are not only capable of running in parallel, but even with proper synchronization in place could potentially retrieve a 0 for the Map's size before the end of the #computeIfAbsent call.