Improved SQL Generation
I have seen simple select statements with 4 or 5 includes result in nearly 5000 line SQL statements when an equivalent hand-written SQL statement is ~15 lines. The performance of these queries along with the readability when debugging makes it an area that I would like to see the EF team focus on improving.
There is already an item for general SQL generation improvements. I am adding this item with the hope to measure interest in improvements specifically for eager loding and Include.
The update for the core EF libraries that will be included in .NET 4.5 will include some improvements in SQL generation. However not all the scenarios mentioned in the comments will be addressed.
I am going to leave this idea as open for now, until we figure out a better way to track individual scenarios, which we need to do to get more actionable data.
David Taylor-Fuller commented
Wouldn't this problem be better solved by not making Include the only way to eager load data. How about being able to issue separate queries to load in the different parts of the object graph?
Kashyap Shah commented
I agree with a lot of posts that it generates a poor SQL and I found where you use navigation property in Select part of linq query. I use LinqPad to show me the query generated from the EF LINQ queries and optimize by using explicit joins then using a navigation property etc.
But when all said and done with the newer technology it cannot and does not replace 100% need of use of T-SQL. No matter who says what there is still need for using Stored procedures and complex joins should resides in Stored procedures as there are several years worth of technology behind T-SQL and it will take years for EF to mimic it.
AdminDiego Vega (Admin, DataFx) commented
@Salar, Joseph: This is great feedback. We have indeed looked at taking the whole of EF out of .NET Framework for this same reason. The June 2011 CTP of EF was a first attempt that showed us it is going to be harder than we thought because of the impact on existing applications, partner teams and ADO.NET provider writers.
We don't have much to announce right now but I can say we keep working on it.
I would like to remind you that if you have ideas or suggestions that are orthogonal to one you are looking at, the best you can do is add them as separate idea or to lookup an existing idea that you can support with your votes. At the time we try to understand the relative importance of the suggestions we get in this site we will pay attention to the votes rather than trying to extract actionable data from the comments.
@Diego Vega, I guess because EF core library is included in .NET framework we should wait until then!
That is awful, please consider put EF libraries out of .net framework. Good ORM needs frequent release cycle.
Joseph Feser commented
Diego, No offense but moving to v4.5 of the framework is really not an option for the most corporate software. I have been using Lightspeed for over 4 years and it does not force me to upgrade the core runtime when improvements are made. http://www.mindscapehq.com/products/lightspeed
@Michael Carr - I've found this to be the case too. Our current difficulty is with the time it takes to generate the SQL from the LINQ. If it's complex SQL it can take over 1 second to generate (which we can live with but isn't great), and then there is the time on top of that to execute the SQL.
Felipe Fujiy Pessoto commented
@Piers, instead Any() I always use .Select(x=> 1).FirstOrDefault() == null. What is wrong, but faster
Some of the performance issues that come from using innocent looking methods really need fixing. Using Any() for example results in a check for EXISTS and a second check for NOT EXISTS, doubling the work the datbase has to do:
The same goes for some uses of boolean operators within a query (rather than back onth client). For example having something like a.bs.Count() > 0 in a projection also has the database checking for both Count() > 0 and NOT Count() > 0. Since neither side of the operator can be NULL, this is wasteful.
Unfortunately the workaround given for the first issue is to use the second issue ;-)
Michael Carr commented
I just documented a case where the Contains() operator reduces EF query performance by a factor of 300:
Further diagnosis revealed that the slowdown occurs in the query-generation portion of the request, so I agree that this needs some serious attention.
Unfortunately I think much of the bloat of the SQL code generator is the fault of the parser in SQL Server itself. I've come from the MySQL world, and so many things that are so simple and straightforward in MySQL are like pulling teeth in SQL Server. The result is that if you have tight constraints on the range of input parameters, you can design small queries - but in a code generation scenario, you have no such constraints, and the only way to accomplish what you want to do is with lots of checks and long queries. I've seen the same thing in SSMS's query generator.
David Nelson commented
I had to rip out an Entity Framework model of about 30 objects and replace it with Linq To Sql because the SQL generated by the Entity Framework was so horrendous. If EF is ever going to be taken seriously there needs to be some major work done in this area.
While Dan Meierotto's JoinOn and Join Optimization would be absolutely great!!! The SQL generated for GroupBy statements is quite ridiculous.
I am not sure if many have taken the time to look at the SQL that is generated for a simple group by statement.
The SQL is messy and does way more than it needs to to. It is too complicated (not to understand, but for the purpose) and slow.
Now I understand that they try to Group on the specified field and still return you the entire record which is cool, when you want to use it.
But sometimes you just want to do a simple group by and only get the fields back that you grouped by without the cumbersome API.
Especially if you want to to a Group BY on multiple columns from different tables (the tables that you joined on).
This is quite easy to do in SQL but not when you are trying to wright is in a object or linq query.
Dan Meierotto commented
It would be great if you could tell EF which foriegn keys to use or which navigation properties to use for joins. Perhaps a "JoinOn" extension method or something. I can understand that EF may not know which join path to use, so a hint would be great. In LLBLGen you can setup relations beforehand that would construct inner joins to solve this problem.
Cleve Littlefield commented
It would be nice if we could have the query return multiple recordsets and have the EF mapper map that to object instances, instead of trying to do the entire thing in one recordset.
If somebody wants to see a VERY VERY BAD query generated by EF, try running profiler during updating code model from database in EF designer. The query kills the DB server for 60s for large databases. I tried to rewrite it manually and it took about 1s on the same server with EXACTLYTHE SAME functionality. The first step should be optimizing JOINs which are such a mess (in most cases not always)
Vlad G. commented
I think part of the complexity comes from the EF's tendency to always use polymorphic queries even when the application does not require it. For example when I write this to retrieve a single Employee object:
objectSet<Person>.OfType<Employee>().Where(o => o.Id == id).Single();
EF still builds the query so that it can return an Employee or anything derived from it. I would be great to have a way to tell EF that I'm only expecting an Employee object so don;t bother trying to query for anything else. I think this would reduce the complexity of the generated SQL considerably.
For my opinion all the queries should be seriously minified.
- The word [Extent1] should become [E1]
- No line breaks
- No extra whitespaces
View and vote my connection @ https://connect.microsoft.com/VisualStudio/feedback/details/522369/minimize-entity-framework-query-weight
Adrian Hesketh commented
I'd like to see a way that the Entity Framework could analyse the application in advance and generate DBA-friendly stored procedures based on the queries it anticipates or a way for queries to be replaced (after deployment) with stored procedure equivalents by changes to configuration files etc.
If EF generates a query that my DBA doesn't like, then the DBA could tune it.
If it can't generate a stored procedure because of the nature of the query, then flagging that would be useful so that developers could revisit the query.
Perhaps this could be done using static analysis of application DLLs.
John Jones commented
I think SQL generation needs a major overhaul. I can't agree at all with the idea that it's good already. At this point it works (most of the time), but the length of the generated query can be epic. 100x longer than hand-written T-SQL. (You may not think hand-written is a fair comparison, but I may need to fall back on that to deal with our messiest queries.)
The trouble points (that I know of) are Includes, Subqueries (e.g., joins, wheres, etc. tacked on the end), sorting, skip, and take. Put them all together and our queries don't work if the database is in SQL2000 compatibility mode (on SQL2008R2). The inconvenience is compounded by inability to call stored procedures/user defined functions within a query if using Code First.
@Jeff - See my blog post on the subject: http://blog.mssoftwareconsulting.com/msswc/blog/post/Examining-Entity-Frameworke28099s-SQL-Generation.aspx
@pharcyde - My testing with the above query showed a 30% performance difference when run inside of SQL Management Studio. Plus, each time you execute a query like that you have 4KB of wasted data being transmitted to the data engine.