EF Where after AsEnumerable - do all the entries from the query are loaded in memory?

63 Views Asked by At

If I have such code in Entity Framework var data = query.AsEnumerable().Where(x => IsObjectValid(x)).ToList() where query is IQueryable that can be translated to SQL.

Will Entity Framework load all the entries from the database in-memory (just as ToList() would do) and then apply the IsObjectValid() function and filter the data. Or Entity Framework will load entries from the database one by one and apply the filtration function?

If I have 1000 entries in query, where each weighs 1 MB, will the memory usage be the same for ToList().Where() and for AsEnumerable().Where()?

In my tests .AsEnumerable() worked faster and consumed way less memory.

Thanks!

1

There are 1 best solutions below

3
Olivier Jacot-Descombes On BEST ANSWER

The part before AsEnumerable() will be translated to SQL and performed by the DB. The part after AsEnumerable() will be performed as LINQ to Objects after the data has been loaded from the DB.

var data = dbContext.Data
    .Where(d => d.Author = "me")
    .AsEnumerable()
    .Where(d => MyCustomValidationFunction(d))
    .ToList();

The first Where will be performed on the DB and only the records with Author = "me" will be returned to your application. The second Where will then be performed on this returned data and the ToList() on the data filtered twice.

Custom functions cannot be translated to SQL. Therefore, AsEnumerable() is required to be able to call this function locally. But it is advisable to do as much of a filtering and grouping, etc. before AsEnumerable() to minimize the returned data.

The first Where expects an Expression<Func<T, bool>> parameter, where as the second one expects a Func<T, bool> parameter. I.e., AsEnumerable() transitions from IQueryable<T> to IEnumerable<T>.

An Expression<Func<T, bool>> consists of a data structure representing the syntax of the lambda function. This syntax is translated to an equivalent SQL syntax. This also means that the lambda expressions used in IQueryables are actually never executed. Therefore, it does not matter whether you write e.g. Where(x => hasSet.Contains(x)) or Where(x => array.Contains(x)) performance-wise. Both will be translated to something like WHERE x IN (a, b, c). But it matters after AsEnumerable(). HashSet<T>.Contains is faster than List<T>.Contains or T[].Contains.

As @DavidBrowne-Microsoft explains, AsEnumerable() does a lazy evaluation over a DataReader. The following LINQ to Objects operations perform a lazy evaluation whenever possible. I.e., one data object is processed at a time and passed over to the next operation.

Operations like Where, Select, Skip, SkipWhile etc, perform a lazy evaluation. Other operations like OrderBy, GroupBy or Reverse must keep all the data in memory.

And of course ToList, ToArray and ToDictionary pull all the data and store it in memory. I often see people append a ToList() to a query just to do a for or foreach on the result. Perform the foreach directly on the IEnumerable<T> result whenever possible! Often there is no need to store the whole data in memory.