Entity Framework Feature Suggestions

Incorrect handling of null variables in 'where' clause

Problem Description:
The following LINQ-to-Entites query will return the expected results:
var result = from o in table
where o.column == null
select entry;

However, the following query will never return any results:
int? myNullInt = null;
var result = from o in table
where o.column == myNullInt
select entry;

This is because the first query is correctly generating a 'column IS INT' query, while the second one generates a 'column = @p_linq_p; @p_linq_p = NULL'.

This problem affects both LINQ-to-Entities and LINQ-to-SQL. However, while LINQ-to-SQL has the not-so-bad workaround
where o.column.Equals(myNullInt)
The workaround for LINQ-to-Entities is much harder to follow and more painful to write:
where (myNullInt == null ? o.column == null : o.column == myNullInt)

395 votes
Vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    I agree to the terms of service
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    Mike FlaskoMike Flasko shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →
    completed  ·  Diego VegaAdminDiego Vega (Admin, DataFx) responded  · 

    UPDATE: The SQL CE issue has been fixed in the EF6 provider and we changed the default behavior of EF6 to compensate for three-valued logic.

    UPDATE: changing the status from completed to started. Turns out that SQL Server Compact provider for EF is incompatible with this feature, therefore we are not making this option the default in DbContext API. The option is still available going through the ObjectContext API.

    The option to compensate for three-valued logic used in null equality comparisons performed by SQL databases will be included in the update of the EF core libraries that will ship as part of .NET 4.5. When enabled, this option will cause a LINQ expression such as ‘a == b’ to be translated to an equivalent SQL expression that will evaluate to true when both ‘a’ and ‘b’ are simultaneously null.

    Since the option will cause a change in query behavior that could potentially break existing applications and could also have performance impact in specific scenarios, it will be disabled by default for the ObjectContext API. However, the plan is to enable it by default for the DbContext API in the new version of Entity Framework (i.e. the EntityFramework package in NuGet) that we will release alongside .NET 4.5.

    We encourage you to add or vote for any other suggestions mentioned in the comments in separate items.

    26 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      I agree to the terms of service
      Signed in as (Sign out)
      Submitting...
      • Felipe Fujiy PessotoFelipe Fujiy Pessoto commented  ·   ·  Flag as inappropriate

        @Diego Vega, I did more tests, it looks like more a SQL Server bug. I changed

        ArquivoCotadoTodosGrupos = arquivo.WorkSheet.All(y => context.PURCHASE_CONFIRMATION.Any(x => x.CGR_ID == y.CGR_ID))

        To use the navigation property:

        ArquivoCotadoTodosGrupos = arquivo.WorkSheet.All(y => y.RelationCGR.PURCHASE_CONFIRMATION.Any()),

        It changes the sub-select from

        WHERE (NOT EXISTS (SELECT 1 AS [C1]
        FROM [dbo].[SCC_CONFIRMACAO_COMPRA] AS [Extent20]
        WHERE (([Extent20].[CGR_ID] = [Project11].[CGR_ID])
        AND (NOT ([Extent20].[CGR_ID] IS NULL
        OR [Project11].[CGR_ID] IS NULL)))
        OR (([Extent20].[CGR_ID] IS NULL)
        AND ([Project11].[CGR_ID] IS NULL))))

        To

        WHERE (NOT EXISTS (SELECT 1 AS [C1]
        FROM [dbo].[SCC_CONFIRMACAO_COMPRA] AS [Extent20]
        WHERE ([Extent20].[CGR_ID] IS NOT NULL)
        AND ([Project11].[CGR_ID] = [Extent20].[CGR_ID])))

        Resulting in same execution time as before UseCSharpNullComparisonBehavior

      • Felipe Fujiy PessotoFelipe Fujiy Pessoto commented  ·   ·  Flag as inappropriate

        @Diego Vega: I found the main problem. At my select statement I have a sub-select like this:

        from arquivo in query
        select new {
        ...
        ArquivoCotadoTodosGrupos = arquivo.WorkSheet.All(y => context.PURCHASE_CONFIRMATION.Any(x => x.CGR_ID == y.CGR_ID)),
        ...}

        At least with EF5, Any() generates a not optimal SQL, maybe EF6 fix that:

        CASE WHEN ( NOT EXISTS (SELECT
        1 AS [C1]
        FROM ( SELECT
        [Extent15].[CGR_ID] AS [CGR_ID]
        FROM (SELECT
        [WorkSheet].[PGA_ID] AS [PGA_ID],
        [WorkSheet].[CGR_ID] AS [CGR_ID],
        [WorkSheet].[ARQ_ID] AS [ARQ_ID],
        [WorkSheet].[FUN_ID] AS [FUN_ID],
        [WorkSheet].[PGA_DT_CADASTRO] AS [PGA_DT_CADASTRO],
        [WorkSheet].[PGA_DT_ATUALIZACAO] AS [PGA_DT_ATUALIZACAO],
        [WorkSheet].[ARQ_ID_CADREVIT] AS [ARQ_ID_CADREVIT]
        FROM [dbo].[WorkSheet] AS [WorkSheet]) AS [Extent15]
        WHERE [Project3].[ARQ_ID] = [Extent15].[ARQ_ID]
        ) AS [Project6]
        WHERE ( NOT EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent16]
        WHERE (([Extent16].[CGR_ID] = [Project6].[CGR_ID]) AND ( NOT ([Extent16].[CGR_ID] IS NULL OR [Project6].[CGR_ID] IS NULL))) OR (([Extent16].[CGR_ID] IS NULL) AND ([Project6].[CGR_ID] IS NULL))
        )) OR (CASE WHEN ( EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent17]
        WHERE (([Extent17].[CGR_ID] = [Project6].[CGR_ID]) AND ( NOT ([Extent17].[CGR_ID] IS NULL OR [Project6].[CGR_ID] IS NULL))) OR (([Extent17].[CGR_ID] IS NULL) AND ([Project6].[CGR_ID] IS NULL))
        )) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent18]
        WHERE (([Extent18].[CGR_ID] = [Project6].[CGR_ID]) AND ( NOT ([Extent18].[CGR_ID] IS NULL OR [Project6].[CGR_ID] IS NULL))) OR (([Extent18].[CGR_ID] IS NULL) AND ([Project6].[CGR_ID] IS NULL))
        )) THEN cast(0 as bit) END IS NULL)
        )) THEN cast(1 as bit) WHEN ( EXISTS (SELECT
        1 AS [C1]
        FROM ( SELECT
        [Extent19].[CGR_ID] AS [CGR_ID]
        FROM (SELECT
        [WorkSheet].[PGA_ID] AS [PGA_ID],
        [WorkSheet].[CGR_ID] AS [CGR_ID],
        [WorkSheet].[ARQ_ID] AS [ARQ_ID],
        [WorkSheet].[FUN_ID] AS [FUN_ID],
        [WorkSheet].[PGA_DT_CADASTRO] AS [PGA_DT_CADASTRO],
        [WorkSheet].[PGA_DT_ATUALIZACAO] AS [PGA_DT_ATUALIZACAO],
        [WorkSheet].[ARQ_ID_CADREVIT] AS [ARQ_ID_CADREVIT]
        FROM [dbo].[WorkSheet] AS [WorkSheet]) AS [Extent19]
        WHERE [Project3].[ARQ_ID] = [Extent19].[ARQ_ID]
        ) AS [Project11]
        WHERE ( NOT EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent20]
        WHERE (([Extent20].[CGR_ID] = [Project11].[CGR_ID]) AND ( NOT ([Extent20].[CGR_ID] IS NULL OR [Project11].[CGR_ID] IS NULL))) OR (([Extent20].[CGR_ID] IS NULL) AND ([Project11].[CGR_ID] IS NULL))
        )) OR (CASE WHEN ( EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent21]
        WHERE (([Extent21].[CGR_ID] = [Project11].[CGR_ID]) AND ( NOT ([Extent21].[CGR_ID] IS NULL OR [Project11].[CGR_ID] IS NULL))) OR (([Extent21].[CGR_ID] IS NULL) AND ([Project11].[CGR_ID] IS NULL))
        )) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
        1 AS [C1]
        FROM [dbo].[PURCHASE_CONFIRMATION] AS [Extent22]
        WHERE (([Extent22].[CGR_ID] = [Project11].[CGR_ID]) AND ( NOT ([Extent22].[CGR_ID] IS NULL OR [Project11].[CGR_ID] IS NULL))) OR (([Extent22].[CGR_ID] IS NULL) AND ([Project11].[CGR_ID] IS NULL))
        )) THEN cast(0 as bit) END IS NULL)
        )) THEN cast(0 as bit) END AS [C2],

      • Diego VegaAdminDiego Vega (Admin, DataFx) commented  ·   ·  Flag as inappropriate

        @Felipe: generating different SQL based on parameter values doesn't really work with how we do query caching. We did perf testing of the generated SQL and we did indeed see that in some cases it made queries somewhat slower (and in some other cases it made queries faster as well) but we definitively didn't see a 50x perf regressions in any case, so I am *extremely* interested in getting a repro from you. I encourage you to file a bug at https://entityframework.codeplex.com/WorkItem/Create and attach a repro there, otherwise feel free to contact me via email.

      • Felipe Fujiy PessotoFelipe Fujiy Pessoto commented  ·   ·  Flag as inappropriate

        @Diego Vega, I did a test with UseCSharpNullComparisonBehavior, it get performance 50x down. With not null column it changed to
        "([Extent1].[PRJ_ID] = 123 /* @p__linq__0 */)
        AND (18922 /* @p__linq__0 */ IS NOT NULL)"

        With a NULL column to:

        AND ((([Extent1].[PRJ_ID] = NULL /* @p__linq__1 */)
        AND (NOT ([Extent1].[PRJ_ID] IS NULL
        OR NULL /* @p__linq__1 */ IS NULL)))
        OR (([Extent1].[PRJ_ID] IS NULL)
        AND (NULL /* @p__linq__1 */ IS NULL)))

        My query went from 50ms to 10000ms.

        My suggestion is to generate different queries when using a value or null

      • Paul AnthoneyPaul Anthoney commented  ·   ·  Flag as inappropriate

        Now that EF5 and .NET 4.5 are released, is this option available. It doesn't appear to be the default with DbContext?

      • Robby MaddoxRobby Maddox commented  ·   ·  Flag as inappropriate

        @Diego - Thanks for the correction on the result of null==null. I guess it just goes to show that I'm stuck thinking in boolean logic rather than tri-valued logic. ;)

        Could you explain the UseCSharpNullComparisonBehavior flag and how it works? When I searched for it online, I couldn't find any useful results. The MSDN documentation says "Gets or sets {insert text here}". Really helpful. lol ;)

        Thanks again.

      • Diego VegaAdminDiego Vega (Admin, DataFx) commented  ·   ·  Flag as inappropriate

        @Robby Maddox: good catch on the link. I pasted the admin link, which won't work for most users of the site :)

        Regarding tri-valued logic, null==null is actually not false in the database, but null! The question is how to make the database behave in a similar way to in-memory queries. We chose to compensate for the differences, but there are several implications: some queries become more expensive because predicates contain extra terms. We belive we have hit a good compromise with the UseCSharpNullComparisonBehavior flag, but as I said before we are not enabling this by default because it could cause unexpected results. And with "unexpected results" I mean not only unexpected data (I agree with you that the majority of applications won't intentionally depend of the current behavior) but because we have to produce queries with different characteristics when the option is enabled.

      • Robby MaddoxRobby Maddox commented  ·   ·  Flag as inappropriate

        I can't imagine what DESIRABLE existing code would be broken if EF and L2S simply rejected the idea of ANSI nulls in all situations. If there's any EF or L2S code in production that depends on null == null evaluating to false, then it could only be 0.0001% of the EF and L2S code out there. So I say it should be changed with ample warning to the programming community and then that 0.0001% can be changed to something like (a == null ? false : b == null ? false : a == b).

        PS: Your link is broken Diego, I think it should be https://data.uservoice.com/forums/72025-ado-net-entity-framework-ef-feature-suggestions/suggestions/2410716-sum-max-min-count-and-average-of-an-e

        Thanks

      • Eamon NerbonneEamon Nerbonne commented  ·   ·  Flag as inappropriate

        Will this future .NET 4.5 EF version also fix the fact that Sum() of an empty set should be 0 rather than what it currently is, NULL?

      • Mikey CeeMikey Cee commented  ·   ·  Flag as inappropriate

        Agree, it is totally retarded and leaky to override C#'s null comparison semantics with SQL's.

        Retarded. Fix it.

      • TommyTommy commented  ·   ·  Flag as inappropriate

        This is a real pain in the backside. We're migrating an L2S project over to EF... it was a real nuisance before doing object.equals() but now its on a whole new level.

        Find all "Function(n) Object.Equals(", Subfolders, Keep modified files open, Find Results 2, "Entire Solution", "*.vb"
        ...
        Matching lines: 267 Matching files: 89 Total files searched: 1389

        =/

      ← Previous 1

      Feedback and Knowledge Base