Getting started with linq
Remarks#
LINQ is a set of features introduced in the .NET Framework version 3.5 that bridges the gap between the world of objects and the world of data.
Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on. LINQ makes a query a first-class language construct in C# and Visual Basic. You write queries against strongly typed collections of objects by using language keywords and familiar operators.
Setup
LINQ requires .NET 3.5 or higher (or .NET 2.0 using LINQBridge).
Add a reference to System.Core, if it hasn’t been added yet.
At the top of the file, import the namespace:
- C#
using System;
using System.Linq;
- VB.NET
Imports System.Linq
The different joins in LINQ
In the following examples, we’ll be using the following samples:
List<Product> Products = new List<Product>()
{
new Product()
{
ProductId = 1,
Name = "Book nr 1",
Price = 25
},
new Product()
{
ProductId = 2,
Name = "Book nr 2",
Price = 15
},
new Product()
{
ProductId = 3,
Name = "Book nr 3",
Price = 20
},
};
List<Order> Orders = new List<Order>()
{
new Order()
{
OrderId = 1,
ProductId = 1,
},
new Order()
{
OrderId = 2,
ProductId = 1,
},
new Order()
{
OrderId = 3,
ProductId = 2,
},
new Order()
{
OrderId = 4,
ProductId = NULL,
},
};
INNER JOIN
Query Syntax
var joined = (from p in Products
join o in Orders on p.ProductId equals o.ProductId
select new
{
o.OrderId,
p.ProductId,
p.Name
}).ToList();
Method Syntax
var joined = Products.Join(Orders, p => p.ProductId,
o => o.OrderId,
=> new
{
OrderId = o.OrderId,
ProductId = p.ProductId,
Name = p.Name
})
.ToList();
Result:
{ 1, 1, "Book nr 1" },
{ 2, 1, "Book nr 1" },
{ 3, 2, "Book nr 2" }
LEFT OUTER JOIN
var joined = (from p in Products
join o in Orders on p.ProductId equals o.ProductId into g
from lj in g.DefaultIfEmpty()
select new
{
//For the empty records in lj, OrderId would be NULL
OrderId = (int?)lj.OrderId,
p.ProductId,
p.Name
}).ToList();
Result:
{ 1, 1, "Book nr 1" },
{ 2, 1, "Book nr 1" },
{ 3, 2, "Book nr 2" },
{ NULL, 3, "Book nr 3" }
CROSS JOIN
var joined = (from p in Products
from o in Orders
select new
{
o.OrderId,
p.ProductId,
p.Name
}).ToList();
Result:
{ 1, 1, "Book nr 1" },
{ 2, 1, "Book nr 1" },
{ 3, 2, "Book nr 2" },
{ NULL, 3, "Book nr 3" },
{ 4, NULL, NULL }
GROUP JOIN
var joined = (from p in Products
join o in Orders on p.ProductId equals o.ProductId
into t
select new
{
p.ProductId,
p.Name,
Orders = t
}).ToList();
The Propertie Orders
now contains an IEnumerable<Order>
with all linked Orders.
Result:
{ 1, "Book nr 1", Orders = { 1, 2 } },
{ 2, "Book nr 2", Orders = { 3 } },
{ 3, "Book nr 3", Orders = { } },
How to join on multiple conditions
When joining on a single condition, you can use:
join o in Orders
on p.ProductId equals o.ProductId
When joining on multiple, use:
join o in Orders
on new { p.ProductId, p.CategoryId } equals new { o.ProductId, o.CategoryId }
Make sure that both anonymous objects have the same properties, and in VB.NET, they must be marked Key
, although VB.NET allows multiple Equals
clauses separated by And
:
Join o In Orders
On p.ProductId Equals o.ProductId And p.CategoryId Equals o.CategoryId
Query Syntax and Method Syntax
Query syntax and method syntax are semantically identical, but many people find query syntax simpler and easier to read. Let’s say we need to retrieve all even items ordered in ascending order from a collection of numbers.
C#:
int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };
// Query syntax:
IEnumerable<int> numQuery1 =
from num in numbers
where num % 2 == 0
orderby num
select num;
// Method syntax:
IEnumerable<int> numQuery2 = numbers.Where(num => num % 2 == 0).OrderBy(n => n);
VB.NET:
Dim numbers() As Integer = { 0, 1, 2, 3, 4, 5, 6 }
' Query syntax: '
Dim numQuery1 = From num In numbers
Where num Mod 2 = 0
Select num
Order By num
' Method syntax: '
Dim numQuery2 = numbers.where(Function(num) num Mod 2 = 0).OrderBy(Function(num) num)
Remember that some queries must be expressed as method calls. For example, you must use a method call to express a query that retrieves the number of elements that match a specified condition. You also must use a method call for a query that retrieves the element that has the maximum value in a source sequence. So that might be an advantage of using method syntax to make the code more consistent. However, of course you can always apply the method after a query syntax call:
C#:
int maxNum =
(from num in numbers
where num % 2 == 0
select num).Max();
VB.NET:
Dim maxNum =
(From num In numbers
Where num Mod 2 = 0
Select num).Max();
LINQ methods, and IEnumerable vs IQueryable
LINQ extension methods on IEnumerable<T>
take actual methods1, whether anonymous methods:
//C#
Func<int,bool> fn = x => x > 3;
var list = new List<int>() {1,2,3,4,5,6};
var query = list.Where(fn);
'VB.NET
Dim fn = Function(x As Integer) x > 3
Dim list = New List From {1,2,3,4,5,6};
Dim query = list.Where(fn);
or named methods (methods explicitly defined as part of a class):
//C#
class Program {
bool LessThan4(int x) {
return x < 4;
}
void Main() {
var list = new List<int>() {1,2,3,4,5,6};
var query = list.Where(LessThan4);
}
}
'VB.NET
Class Program
Function LessThan4(x As Integer) As Boolean
Return x < 4
End Function
Sub Main
Dim list = New List From {1,2,3,4,5,6};
Dim query = list.Where(AddressOf LessThan4)
End Sub
End Class
In theory, it is possible to parse the method’s IL, figure out what the method is trying to do, and apply that method’s logic to any underlying data source, not just objects in memory. But parsing IL is not for the faint of heart.
Fortunately, .NET provides the IQueryable<T>
interface, and the extension methods at System.Linq.Queryable
, for this scenario. These extension methods take an expression tree — a data structure representing code — instead of an actual method, which the LINQ provider can then parse2 and convert to a more appropriate form for querying the underlying data source. For example:
//C#
IQueryable<Person> qry = PersonsSet();
// Since we're using a variable of type Expression<Func<Person,bool>>, the compiler
// generates an expression tree representing this code
Expression<Func<Person,bool>> expr = x => x.LastName.StartsWith("A");
// The same thing happens when we write the lambda expression directly in the call to
// Queryable.Where
qry = qry.Where(expr);
'VB.NET
Dim qry As IQueryable(Of Person) = PersonSet()
' Since we're using a variable of type Expression(Of Func(Of Person,Boolean)), the compiler
' generates an expression tree representing this code
Dim expr As Expression(Of Func(Of Person, Boolean)) = Function(x) x.LastName.StartsWith("A")
' The same thing happens when we write the lambda expression directly in the call to
' Queryable.Where
qry = qry.Where(expr)
If (for example) this query is against a SQL database, the provider could convert this expression to the following SQL statement:
SELECT *
FROM Persons
WHERE LastName LIKE N'A%'
and execute it against the data source.
On the other hand, if the query is against a REST API, the provider could convert the same expression to an API call:
https://www.example.com/person?filtervalue=A&filtertype=startswith&fieldname=lastname
There are two primary benefits in tailoring a data request based on an expression (as opposed to loading the entire collection into memory and querying locally):
- The underlying data source can often query more efficiently. For example, there may very well be an index on
LastName
. Loading the objects into local memory and querying in-memory loses that efficiency. - The data can be shaped and reduced before it is transferred. In this case, the database / web service only needs to return the matching data, as opposed to the entire set of Persons available from the data source.
Notes
1. Technically, they don’t actually take methods, but rather delegate instances which point to methods. However, this distinction is irrelevant here.
2. This is the reason for errors like ”LINQ to Entities does not recognize the method ‘System.String ToString()’ method, and this method cannot be translated into a store expression.”. The LINQ provider (in this case the Entity Framework provider) doesn’t know how to parse and translate a call to ToString
to equivalent SQL.