I am looping through a collection (DataTable) where each row has a "group number" and building a dictionary keyed by that group number. There are about a million records in the DataTable, and I am running into memory issues when this runs. I was wondering if I could get some feedback on how to make this perform better. I think one issue may be that I create a new tempList on each iteration.
I really appreciate any feedback here.
Dictionary<int, IEnumerable<Item>> itemGroups = new Dictionary<int, IEnumerable<Item>>();
foreach (DataRow row in dtItems.Rows)
{
    Item item = new Item(row["ID"].ToString(),
                         row["Name"].ToString(),
                         row["Description"].ToString());
    Int32.TryParse(row["GroupNum"].ToString(), out int groupNum);
    if (!itemGroups.ContainsKey(groupNum))
    {
        List<Item> itemList = new List<Item>();
        itemList.Add(item);
        itemGroups.Add(groupNum, itemList);
    }
    else
    {
        var tempList = itemGroups[groupNum].ToList();
        tempList.Add(item);
        itemGroups[groupNum] = tempList;
    }
}
You can simplify this by using LINQ. The GroupBy extension method is already optimized for aggregating items.

As others have already pointed out, don't convert values if you can cast them. If the
GroupNum column contains an int, then simply cast the value to int. However, if a column is nullable, then it can contain a DBNull.Value. Let's assume that ID and Name are NOT NULL columns but that Description is optional and therefore a NULL column. Then you can optimize the creation of an item accordingly: cast ID and Name directly, and read Description with row["Description"] as string, which yields null when the column holds DBNull.Value.

Note also that an IEnumerable<T> queries the source every time it is enumerated. An IEnumerable<T> is not a collection, but a way to iterate over a data source. We could have dropped the .ToList() and typed the dictionary values as IEnumerable<Item> instead, but with the consequence that enumerating the dictionary values would have executed the whole Select, including the new Item(...), every time.

See also: Understanding C# LINQ's deferred execution
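Putting the suggestions together, the GroupBy version might look like the sketch below. This is only an illustration, not the original answer's exact code: the Item class, the column types, and the sample rows are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-in for the question's Item class.
class Item
{
    public string Id { get; }
    public string Name { get; }
    public string Description { get; }

    public Item(string id, string name, string description)
    {
        Id = id;
        Name = name;
        Description = description;
    }
}

class Program
{
    static void Main()
    {
        // Small in-memory table standing in for dtItems; the column types are assumptions.
        var dtItems = new DataTable();
        dtItems.Columns.Add("ID", typeof(string));
        dtItems.Columns.Add("Name", typeof(string));
        dtItems.Columns.Add("Description", typeof(string)); // nullable column
        dtItems.Columns.Add("GroupNum", typeof(int));
        dtItems.Rows.Add("1", "Alpha", "first item", 10);
        dtItems.Rows.Add("2", "Beta", DBNull.Value, 10);
        dtItems.Rows.Add("3", "Gamma", "third item", 20);

        // One pass over the rows: each group gets a single List<Item>,
        // instead of copying the list with ToList() on every iteration.
        Dictionary<int, List<Item>> itemGroups = dtItems.Rows
            .Cast<DataRow>()
            .GroupBy(
                row => (int)row["GroupNum"],           // cast, don't parse a string
                row => new Item(
                    (string)row["ID"],                 // NOT NULL column: direct cast
                    (string)row["Name"],               // NOT NULL column: direct cast
                    row["Description"] as string))     // NULL column: DBNull becomes null
            .ToDictionary(g => g.Key, g => g.ToList());

        Console.WriteLine(itemGroups[10].Count);                   // 2
        Console.WriteLine(itemGroups[20].Count);                   // 1
        Console.WriteLine(itemGroups[10][1].Description == null);  // True
    }
}
```

The .ToList() inside ToDictionary materializes each group once, so the values are real collections; typing them as IEnumerable<Item> without the ToList() would re-run the projection (including new Item(...)) on every enumeration, as noted above.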