NDepend searching for faster collection opportunities

62 Views Asked by At

I have a huge legacy code base and I would like to optimize it, make it faster. For this reason I thought about looking up opportunities where I can replace list and arrays with HashSets and Dictionaries.

There is the following NDepend query under .NET Framework Usage / System.collection

// <Name>Caution with List.Contains()</Name>
let containsMethods = ThirdParty.Methods.WithFullNameIn(
   "System.Collections.Generic.List<T>.Contains(T)",
   "System.Collections.Generic.IList<T>.Contains(T)",
   "System.Collections.ArrayList.Contains(Object)")

from m in Application.Methods.UsingAny(containsMethods) 
select m

This query is not enough. It will list one function with the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ListOptimisation
{
    class Program
    {
        static void Main(string[] args)
        {
            int aLength = 10000;
            List<int> aNumbers2Search = Enumerable.Range(0, aLength).ToList();

            List<int> aTestList = Enumerable.Range(0, aLength).ToList();
            int[] aTestArray = Enumerable.Range(0, aLength).ToArray();

            HashSet<int> aTestHash = new HashSet<int>(Enumerable.Range(0, aLength));
            Dictionary<int, int> aTestDictionary = new Dictionary<int, int>();
            for(int i = 0; i < aLength; ++i)
            {
                aTestDictionary.Add(i, i);
            }

            Search(aTestList, aNumbers2Search);
            SearchIList(aTestList, aNumbers2Search);
            SearchIEnumerable(aTestList, aNumbers2Search);
            Search(aTestArray, aNumbers2Search);
            SearchIList(aTestArray, aNumbers2Search);
            SearchIEnumerable(aTestArray, aNumbers2Search);
            Search(aTestHash, aNumbers2Search);
            SearchIEnumerable(aTestHash, aNumbers2Search);
            Search(aTestDictionary, aNumbers2Search);
        }

        private static void Search(List<int> testList_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testList_in.Contains(x));
        }

        private static void Search(HashSet<int> testHash_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testHash_in.Contains(x));
        }

        private static void Search(Dictionary<int, int> testDictionary_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testDictionary_in.ContainsKey(x));
        }

        private static void Search(int[] testArray_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testArray_in.Contains(x));
        }

        private static void SearchIList(IList<int> testIList_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testIList_in.Contains(x));
        }

        private static void SearchIEnumerable(IEnumerable<int> testIEnumerable_in, List<int> numbers2Search_in)
        {
            numbers2Search_in.ForEach(x => testIEnumerable_in.Contains(x));
        }
    }
}

A better query would be this one:

// <Name>Caution with List style contains</Name>
let containsMethods = ThirdParty.Methods.WithSimpleName("Contains").Except(ThirdParty.Methods.WithFullNameIn("System.Collections.Generic.HashSet<T>.Contains(T)"))

from m in Application.Methods.UsingAny(containsMethods) 
select m

//<Description>
// Alternative to Caution with List.Contains()
//</Description>

This will list 4 functions (List, IList, int[], IEnumerable). I am newbie regarding CQLinq. My questions are:

  • Does any one can write better query to detect possible bad .NET container usages (not just for contains, but for other possible operations)?
  • How do or would you detect bad container usage?

A last comment, some our business logic handles a lot of data so having correct containers, data structures and algorithms counts.

2

There are 2 best solutions below

3
lstern On

This is not a good approach to optimize performance problems. This optimization will have minor impact on your system unless you are dealing with huge lists.

You'll have better results using performance profiling software. If you want to performance by searching for some code pattern, try searching for nested loops and expensive code such as file and database related methods.

4
Patrick from NDepend team On

Indeed trying to replace List<T>.Contains() calls with Hashset<T>.Contains() calls is not a micro-optimization, and can dramatically improve performance. Actually refactoring an algorithm to rely on O(1) hashset search is, from my experience, one of the best thing to do to improve performance.

The CQLinq query you wrote is a first step to identify some potential slow points. However to start refactoring well, you must 1) review the code to assess the collection size at run-time and 2) use a performance profiling tool on real situation to assess if these potential slow points have an impact on performance and also, find others slow points not matched by the query.