Improve the performance of the Contains() operator
Currently, using the Contains() LINQ operator in EF queries causes a significant performance hit. See http://stackoverflow.com/questions/7897630/why-does-the-contains-operator-degrade-entity-frameworks-performance-so-drama for documentation and discussion of this problem. Please improve the performance of this operator.
Native support for DbInExpression was added in EF6 alpha 2 (announcement is here: http://blogs.msdn.com/b/adonet/archive/2012/12/10/ef6-alpha-2-available-on-nuget.aspx). My own home-made micro-benchmarks show that translating a query with list.Contains(x) for a list with 100,000 int elements is now well under a second.
@Matthew From what I remember there are a few corner cases in which the improvement won't kick in. EF providers also need to enable support for the new direct translation explicitly as it means the query expressions EF sends to them could contain the new tree node type representing the IN operator. If you find some other case please file a bug at https://entityframework.codeplex.com/WorkItem/Create.
I'm on EF 6.1.3 but the .Contains is still having performance issues. Are there any cases where the Contains won't run efficiently?
Michael Carr commented
Awesome Diego, thank you! I'm eager to test it out.
Roy McDonough commented
Looking forward to it, thank you for addressing this
@Allon: you are probably right about using TVPs as a solution, but given all the things we are committed to do in the EF6 release it is hard to think we will introduce a major change like that in the short term. If you are interested in contributing you can start a discussion in our CodePlex site.
Allon Guralnek commented
Instead of translating .Contains() to SQL's IN syntax, it should send the list as a table-valued parameter and perform a join. This would mean you could have an unlimited number of items in the .Contains() while not affecting query generation time, reducing the parsing time of SQL server (no huge lists to parse), and even possibly improving the query performance.
You don't need full-blown TVP support with metadata just to implement this improvement, since this is only for native SQL types.
I've just encountered this. My users have access to depots. In views screens they can filter by any selected subset of these. On a clients site where a user has access to 100 it takes 4 seconds to build the query and 0.1 to execute it. e.g
var query = from fb in FITTERBOOKINGS
where fb.BOOKINGDATE == theDate && fb.DRIVER != string.Empty
Roy McDonough commented
Performance hit is crippling. The more "contains" clauses, the slower the performance. L2E takes 6-8 seconds to translate into a SQL query which runs in 70ms. Solutions such as intermediate tables just to workaround this shortcoming are unpalatable.