Skip to content

Creating Left Joins and Aggregates using Entity Framwork

July 21, 2015

One of the deficiencies that I run into from time to time in Entity Framework (EF) is accommodating parent – child relationships where the list of child elements is empty, unless I define that relationship in the Entity class itself.

For example, consider these two classes:

public class Order {
 [key]
 public int OrderId { get; set; }
 public int CustomerId { get;set; }
 public string Name { get; set; }
}
public class OrderDetail {
 [key]
 public int OrderDetailId {get;set;}
 public int OrderId {get;set;}
 public string ProductCode { get; set;}
}

In EF, I can create a relationship between these by modifying the Order class like this:

public class Order {
 [key]
 public int OrderId { get; set; }
 public int CustomerId { get; set; }
 public string Name { get; set; }
[ForeignKey(“OrderId”)]
 public List<OrderDetail> OrderDetails { get; set; }
}

Then, when I’m using EF to select from these tables, EF will automatically include my join and get the OrderDetails for me, as long as I do something like this:

var list = _dc.Orders
 .Include(“OrderDetails”)
 .ToList();

This setup will give me all of the orders, even if the OrderDetails collection is empty, but if it’s not empty I’ll get the details as well, which is great.

The trouble I find sometimes is that I may want to create a join that works just like this, but that doesn’t require me to define the relationship in the Entity class definition. For example, continue considering the classes above, but now add this one into the mix:

public class Customer {
 [key]
 public int CustomerId {get;set;}
 public string Name { get; set; }
 public bool IsActive { get; set; }
}

Suppose that I want to get all of the active customers, and a count of all of the orders each customer has. Sure, I could modify the Customer Entity class to include a reference to the Order class, but if I do that it means that EF will need to consider that relationship (and make a decision about whether it wants to include a query to get the Order results) every time I access my Customer class. Let’s suppose that that produces more overhead than I want in this case, because there are lots of times when I will want to access the Customer entity without respect for whether there are any orders. My first reaction is that I can just write a join EF statement, like this:

var list = _dc.Customers
 .Join(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Order = o }
 .Where(x => x.Customer.IsActive)
 .GroupBy(x => new { x.Customer.CustomerId, x.Customer.Name }
 .Select(x => new { x.Key.CustomerId, x.Key.Name, OrderCount = x.Count() })
 .ToList();

This will give me what I’m looking for (the customer fields for active Customers, plus a new field called “OrderCount” that counts the number of Order records associated with my Customer), but it has a major drawback – I’ll only get results when a record already exists in the Order table. Any customers that haven’t created an Order yet won’t be returned, because they’re failing the Inner join condition. This can be a major problem, but I’m glad to say that with EF 5 an answer to this scenario was introduced. Here’s what the answer looks like:

var list = _dc.Customers
 .GroupJoin(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Orders = o }
 .Where(x => x.Customer.IsActive)
 .DefaultIfEmpty()
 .Select(x => new { x.Customer.CustomerId, x.Customer.Name, OrderCount = x.Orders.Count() })
.ToList();

The important points to notice here are that the .Join keyword was replaced with .GroupJoin, and that we used a .DefaultIfEmpty statement after the .Where clause.

.GroupJoin is a keyword that works just like a regular join, but instead of creating results that exist at the same hierarchical level as the first table, it creates records that are gathered into in an IEnumerable<T> as children of the parent. In other words, the same sort of relationship that is created when we modify our Entity classes like we did with the Order and OrderDetail classes back at the beginning. This means that we don’t need to use a .GroupBy statement because the results are already organized into a tiered anonymous type. That’s very cool!

.DefaultIfEmpty is a keyword that instructs EF to create that relationship and return an IEnumerable even if the child table has no results. In other words, it changes your SQL from INNER JOIN to LEFT JOIN. In this case, since I’m using an aggregate of the child table (.Count), I will get 0 in my result set if no rows were found, or if there were records in the Order table, I’ll get the count. That’s exactly what I was looking for. Note also that .DefaultIfEmpty comes after the .Where keyword, so it will be evaluated after any restrictions are placed on the query.

Here is the same query, but this time it outputs into a List of a pre-defined classe called “CustomerOrder”, which includes both an aggregate of the orders (OrderCount) and a List<Order> of the orders themselves (Orders):

var list = _dc.Customers
 .GroupJoin(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Orders = o }
 .Where(x => x.Customer.IsActive)
 .DefaultIfEmpty()
 .Select(x => new CustomerOrder() {
    CustomerId = x.Customer.Id,
    Name = x.Customer.Name,
    OrderCount = x.Orders.Count(),
    Orders = x.Orders.ToList()
 })
 .ToList();

Now my variable contains a List<CustomerOrder>, which I can return from my data layer, and I have a strongly-typed set of objects to use for my business logic.  Perfect!

Advertisements
No comments yet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: