Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupJoin in EF Core 9 Returns Null for Joined Entities #35393

Open
TanerSaydam opened this issue Dec 30, 2024 · 4 comments · May be fixed by #35395
Open

GroupJoin in EF Core 9 Returns Null for Joined Entities #35393

TanerSaydam opened this issue Dec 30, 2024 · 4 comments · May be fixed by #35395
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression type-bug

Comments

@TanerSaydam
Copy link

Issue Description:
Hello EF Core team,

I've encountered what seems to be a bug in EF Core 9. When I perform a GroupJoin between two tables using a condition that always evaluates to true (for example, c => true and p => true), the resulting joined collection is always null, even though there are valid entries in the related table. Here is minimal reproducible example:

var query = context.Categories
    .GroupJoin(context.Products,
               c => true,
               p => true,
               (c, p) => new 
               { 
                   Category = c, 
                   Products = p 
               })
    .Select(s => new
    {
        CategoryName = s.Category.CategoryName,
        Products = s.Products
    })
    .ToList();

In this scenario, s.Products always comes back as empty (null), However, if I rewrite this using a from clause (or a manual join with on 1 equals 1), I get the expected results:

var query2 = 
    (from c in context.Categories
     join p in context.Products 
         on 1 equals 1
         into productGroup
     select new
     {
         CategoryName = c.CategoryName,
         Products = productGroup
     })
    .ToList();

Both of these queries should logically produce the same result, but the GroupJoin version returns null for the Products while the LINQ from version successfully includes the product collection. If this is not intended behavior, it seems like a bug.

Thank you for looking into this! If you need any further information I'm happy to provide more details.

@TanerSaydam
Copy link
Author

When I updated the version to 8.0.11, the issue was resolved. There is a bug in version 9.0.0

@maumar
Copy link
Contributor

maumar commented Dec 31, 2024

I'm able to reproduce this:

    [ConditionalFact]
    public async Task ReproGJ()
    {
        using (var ctx = new MyContext())
        {
            await ctx.Database.EnsureDeletedAsync();
            await ctx.Database.EnsureCreatedAsync();

            var p1 = new Product();
            var p2 = new Product();
            var c1 = new Category { CategoryName = "one" };
            var c2 = new Category { CategoryName = "two" };

            ctx.Products.AddRange(p1, p2);
            ctx.Categories.AddRange(c1, c2);
            await ctx.SaveChangesAsync();
        }

        using (var ctx = new MyContext())
        {
            var query = await ctx.Categories
                .GroupJoin(ctx.Products,
                           c => true,
                           p => true,
                           (c, p) => new
                           {
                               Category = c,
                               Products = p
                           })
                .Select(s => new
                {
                    CategoryName = s.Category.CategoryName,
                    Products = s.Products
                })
                .ToListAsync();
        }
    }

    public class MyContext : DbContext
    {
        public DbSet<Category> Categories { get; set; }
        public DbSet<Product> Products { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
        }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder
                    .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=ReproGJ;Trusted_Connection=True;MultipleActiveResultSets=true")
                    .LogTo(Console.WriteLine, LogLevel.Information)
                    .EnableSensitiveDataLogging();
        }
    }

    public class MyEntity
    {
        public int Id { get; set; }
    }


    public class Category
    {
        public int Id { get; set; }
        public string CategoryName { get; set; }
    }

    public class Product
    {
        public int Id { get; set; }
    }

We produce this sql:

SELECT [c].[CategoryName], [c].[Id], [p0].[Id]
FROM [Categories] AS [c]
OUTER APPLY (
    SELECT [p].[Id]
    FROM [Products] AS [p]
    WHERE 0 = 1
) AS [p0]
ORDER BY [c].[Id]

while EF 8 produces:

SELECT [c].[CategoryName], [c].[Id], [t].[Id]
FROM [Categories] AS [c]
OUTER APPLY (
    SELECT [p].[Id]
    FROM [Products] AS [p]
) AS [t]
ORDER BY [c].[Id]

@maumar
Copy link
Contributor

maumar commented Jan 1, 2025

When we process GroupJoin in GroupJoinConvertingExpressionVisitor we convert it to filtered subquery:

DbSet<Product>()
    .Where(p => !(object.Equals(
        objA: (object)True, 
        objB: null)) && object.Equals(
        objA: (object)True, 
        objB: (object)True))

then, during translation:

 !(object.Equals(
        objA: (object)True, 
        objB: null))

is naively translated to NOT(CAST(1 AS BIT) == NULL) but we apply incorrect optimization when we process the NOT wrapping CAST(1 AS BIT) == NULL:

// !(true == a) -> false == a
// !(false == a) -> true == a
SqlBinaryExpression { OperatorType: ExpressionType.Equal, Left: SqlConstantExpression { Value: bool } } binary
    => Equal(Not(binary.Left), binary.Right),

This optimization is correct in 3-value logic:

a	(true = a) !(true = a) (false = a)
0	    0		1	  1
1	    1		0	  0
N	    N		N	  N

but it breaks down when we know that a is actually null (i.e. null constant or null value parameter later on), which converts to ISNULL in SqlNullabilityProcessor.OptimizeComparison operation and that takes us to the realm of 2-value logic

!(true = null) => !(true IS NULL) => !false => true
false = null => (false IS NULL) => false

/cc @ranma42

@maumar maumar self-assigned this Jan 1, 2025
maumar added a commit that referenced this issue Jan 1, 2025
Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is a constant or parameter null value, because it evaluates to true IS NULL (two value logic). Fix is to remove this optimization from SqlExpressionFactory and instead put it back into OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes #35393
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 1, 2025
@ranma42
Copy link
Contributor

ranma42 commented Jan 1, 2025

This optimization is correct in 3-value logic:

a	(true = a) !(true = a) (false = a)
0	    0		1	  1
1	    1		0	  0
N	    N		N	  N

but it breaks down when we know that a is actually null (i.e. null constant or null value parameter later on), which converts to ISNULL in SqlNullabilityProcessor.OptimizeComparison operation and that takes us to the realm of 2-value logic

Exactly, if the == you are considering is the C# one, then the optimization is incorrect (regardless of having a constant / parameter a argument), because in C# null == false and null == true are both false, hence you cannot negate x == false as x == true.

Sorry, I introduced this regression in #34142

maumar added a commit that referenced this issue Jan 1, 2025
Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is a constant or parameter null value, because it evaluates to true IS NULL (two value logic). Fix is to remove this optimization from SqlExpressionFactory and instead put it back into OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes #35393
maumar added a commit that referenced this issue Jan 1, 2025
Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is a constant or parameter null value, because it evaluates to true IS NULL (two value logic). Fix is to remove this optimization from SqlExpressionFactory and instead put it back into OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes #35393
maumar added a commit that referenced this issue Jan 3, 2025
Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is nullable. Fix is to constrain this optimization in SqlExpressionFactory to only work on argument which we know is not null (constant, column, non-nullable parameter) and do the comprehensive one back in OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes #35393
maumar added a commit that referenced this issue Jan 3, 2025
Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is nullable. Fix is to constrain this optimization in SqlExpressionFactory to only work on argument which we know is not null (constant, column, non-nullable parameter) and do the comprehensive one back in OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes #35393
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants