Entity Framework Feature Suggestions

let me save expensive intermediate LINQ results in temp table

LINQ-to-Entities should offer an anaogous operator to AsEnumerable() but instead of materializing results on the client, it would materiailze results on the server (e.g. a temp table for SQL Server), allowing subsequent operators to re-use intermediate results without risking additional server recomputation or requiring a round-trip from client to server.

The underlying requirement is to make it easier to build and maintain performant LINQ queries that use millions of rows, many complex joins, I/O intensive calculations like running totals or self-joins, or all of the above.

For these kinds of queries in T-SQL, a good way to get predictable performance is to materialize expensive intermediate results into a temp table, and then join the temp table to later parts of the query.

For example, imagine a query on a table of running totals, where first you need to self-join to turn running totals into single-period values, and then you want to do other aggregation on those single-period values, like grouping by store or salesperson. In that case, you get a more efficient query by ensuring that the server materializes results from the self-join rather than re-running the self-join multiple times. Of course, it would be great if the DBMS materialized automatically, but most modern DBMS's (including SQL 2008 R2) aren't usually that smart.

But today in LINQ it's impossible to materialize those intermediate results without a round-trip to the client. Such a round-trip is OK when the intermediate results are only a few rows, but for million-row intermediate results it's not workable.

Another case where server materialization is desirable is when building libraries of queries that can be re-used. You don't want "downstream" query writers to end up crafting hugely expensive queries against underlying tables, so you can pre-process those tables into cheaper temp tables, and then hand those temp tables out from your library. This ensures a predictable cost for each query.

Note that a temp table, not a table variable, should be used on SQL Server for materialization. This is because temp tables support parallel execution plans while table variables do not (at least on SQL 2008 R2 and lower). Without a parallel plan, most expensive queries run 4x or more slower on modern DB hardware.

8 votes
Vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    I agree to the terms of service
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    Justin GrantJustin Grant shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →

    0 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      I agree to the terms of service
      Signed in as (Sign out)
      Submitting...

      Feedback and Knowledge Base