Entity Framework Feature Suggestions

TPT (Table-Per-Type) Inheritance Performance

Problem Description:
For TPT inheritance, the more subclasses you add, the time it takes to generate SQL, and the complexity of the SQL query itself, become unmanageable. With a simple base class (5 or 6 fields), and around 30 simple subclasses (2 or 3 fields a piece), it takes almost 2 minutes to generate the SQL (ObjectQuery.ToTraceString()) and execute it on the server. EF generates almost 8000 lines of SQL for a simple select query on the base class. This is because the SQL generated is a mess of crazy subselects, joins, and unions. Even with empty tables (so that a query would return no data), it takes that long to generate and execute the SQL.

Doing a simple "flat" query (no subselects), left joining each subclass table, could produce the same results, but with infinitely better performance. You wouldn't need any unions either. As it stands, even with just 3 or 4 subclasses, EF TPT inheritance can not be used in any kind of production environment.

I have written an article that describes this problem in more detail, where I have modeled and graphed the performance degradation as the number of subclasses go up.

Note>http://samscode.com/index.php/2010/01/the-entity-framework-v1-and-v4-deal-breaker-tpt-inheritance/

Note that I have selected EF4 as the version, but this is equally applicable to EF 3.5 SP1. I have tested in both and the behavior is exactly the same.

585 votes
Vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    I agree to the terms of service
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    Mike FlaskoMike Flasko shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →
    CecilCecil shared a merged idea: Allow Entity Framework to query only the base table from in TPT inheritance  ·   · 

    7 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      I agree to the terms of service
      Signed in as (Sign out)
      Submitting...
      • AnonymousAnonymous commented  ·   ·  Flag as inappropriate

        It appears that this issue has not been addressed in EF6.0. Is that the case? As you seem to know what needs tobe done can it be added into the roadmap somewhere.

        Thanks

        Andy

      • Jeson MJeson M commented  ·   ·  Flag as inappropriate

        I come from Oracle where dealing with 100 millions of records with hundreds columns (aka big cardinality) is a performance hell unleashed. Index stops working since every query is a full-table-scan query due to large data retrieval. Table-Per-Type (TPT) is a way to divide this huge dataset into smaller records with smaller cardinality.

        I can see a "performance" reason of why TPT is slower in EF over Table-Per-Hierarchy (TPH). It is because EF is programmatically generating the SQL statements, which would include unnecessary joins. (Please correct me if I am wrong)

        I am still learning so far, and Code First is a very convenient approach. However, this TPH recommendation over TPT makes me cautious to apply MVC in data intensive production environment. Ideally speaking, MVC should be neutral on whether it is TPT or TPH. Its should only be dependent upon the actual size of dataset.

        Maybe as I am learning this MVC code first approach, I found a way to custom override the generated SQL, thus giving programmers more control over the SQL statement.

      • Brian G. SweeneyBrian G. Sweeney commented  ·   ·  Flag as inappropriate

        Huge bummer. We thought we just didnt know what we were doing with EF until we realized that everyone else has this problem too... Wish I had prototyped my project using 30 subclasses rather than just 3. We would have discovered this right off the bat rather than 6 months into the project.

      • Vlad G.Vlad G. commented  ·   ·  Flag as inappropriate

        I think part of the problem comes from the EF's tendency to always use polymorphic queries even when the application does not require it. For example when I write this to retrieve a single Employee object:

        objectSet<Person>.OfType<Employee>().Where(o => o.Id == id).Single();

        EF still builds the query so that it can return an Employee or anything derived from it. I would be great to have a way to tell EF that I'm only expecting an Employee object so don't bother trying to query for its subclasses. I think this would reduce the complexity of the generated SQL considerably.

      • RossRoss commented  ·   ·  Flag as inappropriate

        This is a huge issue and until it is fixed the EF is unusable in a production environment.

      • DerekDerek commented  ·   ·  Flag as inappropriate

        This is also a very high concern. No matter what route I take (Designer, CodeOnly, etc) this is a huge problem.

      Feedback and Knowledge Base