When you started using LINQ, did you think it looked like SQL? I did.
The more I learned LINQ, the more I realized it wasn’t anything like SQL. Take grouping, for example. Because LINQ has a group by statement, and it looks like SQL, I assumed that the syntax for grouping in LINQ would be just like SQL. Ha ha! Wrong! As soon as I tried to use it, I discovered that the LINQ syntax is not only nothing like the SQL equivalent, but the whole grouping concept in LINQ is completely different too.
At first glance, the two syntaxes look slightly similar.
SQL:
select ReportsTo, count(LastName) as NameCount
from Employees
group by ReportsTo
LINQ (C#):
from employee in Employees
group employee by employee.ReportsTo
Ignoring the omission of the select statement from LINQ, and the requirement of a range variable, they do look similar. But looks can be deceiving.
Let me explain why.
Grouping in SQL
In SQL, you typically group a set of entities using the following syntax:
select ReportsTo, count(LastName) as NameCount
from Employees
group by ReportsTo
This kind of query results in a simple table with two columns: ReportsTo and NameCount. The count of the names was grouped into a single column using an aggregate function.
Of course, this is a simple example; you could do much more complex grouping by using more expressions in the aggregation or the group by statement. Whatever the grouping though, the result is a flat table.
LINQ is different. Completely different.
Grouping in LINQ
Before you can understand the LINQ syntax, you have to understand how grouping works in LINQ.
When you use the group by statement, the C# compiler translates it to the LINQ Enumerable’s GroupBy extension method. The GroupBy method returns an object of type
IEnumerable<IGrouping<TKey, TSource>>
The GroupBy method takes a parameter of a delegate which specifies the key of each grouping. The key is defined to be of type TKey, and is usually inferred by the compiler depending on the expression.
The key to understanding the C# query syntax is that you don’t get back a flat structure from group by, as you would with SQL. You actually get an “IEnumerable” of “IGrouping” objects.
The IGrouping interface has only one property, “Key”, and implements IEnumerable<TElement>. Again, TElement is usually inferred. “Key” is typed as the generic type you define for TKey. For the minute, that’s not so important, so we’ll come back to the Key later.
One of the first indications that LINQ is different to SQL is that you discover it is possible to finish a LINQ statement with a group by statement without needing a Select. As IGrouping implements IEnumerable, group by actually returns an IEnumerable of IEnumerable. The result is a set of sets. It’s not a flat table. To use it, you have to iterate over it again or perform another operation on the set.
Let’s see if an example can make it clearer.
When grouping a set of employees, you might use the following code:
var emp1 = new { ReportsTo = 1, LastName = “Richard” };
var emp2 = new { ReportsTo = 1, LastName = “Christopher” };
var emp3 = new { ReportsTo = 2, LastName = “John” };
var emp4 = new { ReportsTo = 2, LastName = “Greg” };
var employees = new[] { emp1, emp2, emp3, emp4 };
var groups =
from employee in employees
group employee by employee.ReportsTo;
foreach (var group in groups)
Console.WriteLine(“{0} employees report to {1}.”,
group.Key, group.Count());
(Don’t worry about the anonymous class definitions for now. You just have to know that my employee objects are all treated as the same type internally, so this is valid C# code.)
The employees variable is of type IEnumerable, so it can be used in a LINQ statement. The groups variable is also an IEnumerable, as it is a set of results, as usually returned by a LINQ statement. Remember though, it’s not like the set you would get back in a SQL statement. It is actually an IEnumerable of IGrouping, or an IEnumerable of IEnumerable.
To do something with the grouping, you have to either treat it as another IEnumerable, and iterate over it, or use methods like Count() to do something with it.
foreach (var group in groups)
foreach (var employee in group)
Console.WriteLine(“{0} reports to {1}”,
employee.LastName, employee.ReportsTo);
This is how LINQPad shows what is contained in the groups variable:
|
IEnumerable<IGrouping<Int32,>>
|
|
|
IGrouping<Int32,> (2 items)
|
|
ReportsTo
|
LastName
|
|
1
|
Richard
|
|
1
|
Christopher
|
|
|
|
IGrouping<Int32,> (2 items)
|
|
ReportsTo
|
LastName
|
|
2
|
John
|
|
2
|
Greg
|
|
Notice that each IGrouping has an associated Key property of type Int32.
Grouping Syntax in LINQ
So now it should be a little easier to understand how to use the LINQ group by statement.
In LINQ, you always need to declare the range variable between group and by.
from employee in Employees
group employee by employee.ReportsTo
After by, you give an expression which returns the value for the Key. The enumeration is then split into groups of IGroupings where the key for each item in that IGrouping is the same.
So here, we specify the employee.ReportsTo variable as the key. Each employee with a different value for ReportsTo creates a new IGrouping with a Key of that value. The employee is then added to that new IGrouping as an item. The IGrouping is then added to the returned IEnumerable<IGrouping<>>.
The next time an item in the employees variable has the same value for ReportsTo as the Key property of an already-existing IGrouping object, the item is simply added to that IGrouping.
Of course, that’s probably not the way it works under the covers. But I’m not trying to specify the algorithm, just what happens in the end result.
In the end, an object implementing IEnumerable<IGrouping<>> is returned by the LINQ statement. As each item in the IEnumerable is an IGrouping, you can either use it’s Key or treat it as an IEnumerable.
So there you have it. Just remember, group by is nothing like SQL, and it returns an IEnumerable of IGrouping, and you should stay happy.
At least until you meet SelectMany… :=)
If you want to learn more about grouping in LINQ, take a look at chapter 9 of the excellent C# 3.0 in a Nutshell: A Desktop Quick Reference (In a Nutshell (O’Reilly))
, by Ben and Joseph Albahari. The chapter has some great explanations and examples. In particular, they have an example of grouping with multiple keys, which I found very interesting.