Enumerate through multiple IEnumerable<T> simultaneously

163 Views Asked by At

There are several instances of IEnumerable<T> and I want to first get the first of each, then the second of each, and so on.

If they were lists, I'd do it like this:

List<List<T>> myLists = ...;
bool iLessThanMaxSize = true;
for (int i=0; iLessThanMaxSize; i++)
{
    iLessThanMaxSize = false;
    for (int j=0; j<myLists.Count; j++)
    {
        if (i < myLists[j].Count)
        {
            iLessThanMaxSize = true;
            yield return myLists[j][i];
        }
    }
}

However, they are not lists, but IEnumerables, which is essential, because the items are computed on demand in a heavy calculation, so computing all of the items at once would take longer than getting only a limited number on demand.

How can I get the same behavior, but working with IEnumerable<T> instead of List<T>?


Edit: By simultaneously, I did not mean multi-threading but just the pattern of enumeration, which is getting the first of each collection, and then the second of each, as opposed to first enumerating through the first collection and then through the second collection and so on (which would be concatenating the IEnumerables). Sorry for the confusion.


Edit: In my specific use case, the enumeration of the collections shall continue until all the elements of all collections are enumerated. In case that the element counts differ, collections that already have been entirely enumerated shall be skipped.

4

There are 4 best solutions below

1
JonasH On BEST ANSWER

If you want to do more complex things with IEnumerables you often need to get the enumerator and use .MoveNext() and .Current to do things.

For example:

public static IEnumerable<T> MyFunction<T>(IEnumerable<IEnumerable<T>> enumerables)
{
    try{
       var enumerators = enumerables.Select(e => e.GetEnumerator()).ToList();
       bool anyResult ;
      do
       {
           anyResult = false;
           foreach (var enumerator in enumerators)
           {
               if (enumerator.MoveNext())
               {
                   yield return enumerator.Current;
                   anyResult = true;
               }
           }
       } while (anyResult);
    }
    finally{
       foreach (var enumerator in enumerators)
       {
           enumerator.Dispose();
       }
    }
}

[Test]
public void Test()
{
    var l1 = new[] { 1, 4 };
    var l2 = new[] { 2, 5, 6 };
    var l3 = new[] { 3, };
    var result = MyFunction(new[] { l1, l2, l3 });
    CollectionAssert.AreEqual(new []{1, 2, 3, 4, 5, 6}, result);
}

I would recommend avoiding code like this for large amounts of data. When the amount of data grows you often want to use lower level, less abstract code to help ensure adequate performance.

2
Guru Stron On

You can use enumerators (see the IEnumerator<T> interface) to iterate through the collection of collections:

IEnumerable<T> Iterate<T>(IEnumerable<IEnumerable<T>> cols)
{
    bool hasNext;
    List<IEnumerator<T>> enumerators = null;
    try
    {
        enumerators = cols.Select(c => c.GetEnumerator()).ToList();
        do
        {
            hasNext = false;
            foreach (var enumerator in enumerators)
            {
                if (enumerator.MoveNext())
                {
                    yield return enumerator.Current;
                    hasNext = true;
                }
            }
        } while (hasNext);
    }
    finally
    {
        if (enumerators is not null)
        {
            foreach (var e in enumerators)
            {
                e.Dispose();
            }
        }
    }
}
1
Dmitry Bychenko On

If you are looking for performance and thus want to enumerate manually, you can implement it like this:

private static IEnumerable<T> MyFunc<T>(IEnumerable<IEnumerable<T>> source) {
  Queue<IEnumerator<T>> agenda = new(source.Select(inner => inner.GetEnumerator()));

  try {
    while (agenda.Count > 0) {
      var en = agenda.Peek();

      if (en.MoveNext()) {
        agenda.Enqueue(agenda.Dequeue());

        yield return en.Current;
      }
      else
        agenda.Dequeue().Dispose();
    }
  }
  finally {
    foreach (var en in agenda)
      en.Dispose();
  }
}

Demo:

List<List<int>> list = new () {
  new() { 1, 2, 3 },
  new() { 4 },
  new() { 5, 6 },
};

var report = string.Join(", ", MyFunc(list));

Console.Write(report);

Output:

1, 4, 5, 2, 6, 3
0
Theodor Zoulias On

Here is another implementation, which is functionally identical with the implementations posted by JonasH, Dmitry Bychenko and Guru Stron. My contribution mainly is that Merge is a better name for the method, because that's how are named similar APIs for other types of sequences. Also params IEnumerable<T>[] is a better signature, because accepting a IEnumerable<IEnumerable<T>> creates false aspirations about deferred execution of the outer sequence.

/// <summary>
/// Merges all elements from all source sequences, into a single interleaved sequence.
/// </summary>
public static IEnumerable<TSource> Merge<TSource>(
    params IEnumerable<TSource>[] sources)
{
    ArgumentNullException.ThrowIfNull(sources);
    List<IEnumerator<TSource>> enumerators = new(sources.Length);
    try
    {
        foreach (IEnumerable<TSource> source in sources)
            enumerators.Add(source.GetEnumerator());
        while (enumerators.Count > 0)
        {
            for (int i = 0; i < enumerators.Count; i++)
            {
                IEnumerator<TSource> enumerator = enumerators[i];
                if (enumerator.MoveNext())
                {
                    yield return enumerator.Current;
                }
                else
                {
                    enumerators.RemoveAt(i);
                    enumerator.Dispose();
                    i--;
                }
            }
        }
    }
    finally
    {
        foreach (IEnumerator<TSource> e in enumerators)
            e.Dispose();
    }
}