Option to use Inner Join SQL when Eager Loading (AKA IncludeRequired method)
When trying to load related entities using the "Include" eager loading feature of EF, it always generates SQL that uses an LEFT OUTER JOIN e.g.
from m in Customer.Includes("Orders")
will always generate the SQL:
LEFT OUTER JOIN Orders....
This is not always desirable, especially when you only want to retrieve Customers that have at least one or more related Orders. By forcing the generated SQL to always use LEFT OUTER JOIN, it's going to load all Customers regardless of whether they have related Orders or not. This can become extremely inefficient, especially when there are lots of Customers and you only want to load the few that have related Orders.
Would be good to have the option of using either an INNER or OUTER JOIN when doing eager loading to improve the generated SQL's efficiency.
Christopher Grace commented
I cannot believe this only has 11 votes! The use of LEFT OUTER JOINs instead of intended INNER JOINs seems to expand the generated SQL exponentially, and slow performance to a crawl.
In the simple example above, it's not that bad. But more complex queries bring in many tables with nullable foreign keys. Join/Bridge tables also introduce LEFT OUTER JOINs. Oddly enough, sometimes the do not.
When performance is an issue, and developers see the SQL generated, they invariably abandon LINQ for Stored Procedures and TSQL. At lease that has been my experience 100% of the time.
I LOVE EF, and stubborly refused to give up. I burned days of my own time trying to understand how to control the JOINs. But I am plumb out of time and have to give up.
Please, please add a JoinRequired, IncludeRequired, or any other means to let the developer choose the JOIN.
By design the use of Include in a query only conveys the need to traverse a path of navigation properties and to load the associated objects found in the path. It does not apply filtering to the root query on which it is called.
To implement Include with these semantics, EF needs to use LEFT OUTER JOINs in the general case.
In EF5 and the updates to the EF core libraries in .NET 4.5 we made the query logic smarter to take advantage of additional information to decide when it is ok to use INNER JOIN instead of LEFT OUTER JOIN. E.g. if Include traverses a navigation property based on a non-nullable foreign key (i.e. a one-to-many association, as opposed to a cero-or-one-to-many association), it will use INNER JOIN because it knows that the starting entity will always have an associated entity on the other end.
I would expect this improvement to help improve performance in many cases.
On the other hand, what this idea is proposing seems to be an alternative to Include that filters the root query to only return objects that have associated objects in the path. I can imagine such alternative looking something like this:
var q = db.Customers.IncludeRequired(c => c.Orders);