Saturday, 25 February 2012

Entity Framework Code First, Inheritance and ComplexType

In one of the applications we are working on, we have a pretty complex inheritance hierarchy which is mapped to SQL Server using Entity Framework 4.3 using Code First. We noticed that when selecting from the base type just on ID we had terrible performance. Some investigation revealed that the simple select-by-ID generated a SQL Statement that was 360,000 characters long. Generating the SQL statement took 1.5s, causing serious performance issues. Quite a lot of investigation revealed that Code First and Complex Types really don’t play well together. It is entirely possible to place a single property on a single class and massively decrease performance of your solution without it being at all obvious what is happening.
I don’t think this is necessarily a fault with Entity Framework nor do I pretend to fully understand what is happening, but here is a walk through that will show the issue. Note, for background reference on inheritance handling in EF Code First, I strongly recommend reading Morteza Manavi’s series on the subject.

When it works well

If we run this code;
class Program
{
    static void Main(string[] args)
    {
        Database.SetInitializer<MyDb>(new DropCreateDatabaseAlways<MyDb>());

        var db = new MyDb();
        var t = db.Fruits.FirstOrDefault(f => f.ID == 1);
    }
}

class MyDb : DbContext
{
    public DbSet<Fruit> Fruits { get; set; }
}

abstract class Fruit
{
    public int ID { get; set; }
}

class Apple : Fruit
{
    public int PipCount { get; set; }
}

class Orange : Fruit
{
    public int Oranginess { get; set; }
}
This SQL is generated:
SELECT TOP (1) 
[Extent1].[Discriminator] AS [Discriminator], 
[Extent1].[ID] AS [ID], 
[Extent1].[PipCount] AS [PipCount], 
[Extent1].[Oranginess] AS [Oranginess]
FROM [dbo].[Fruits] AS [Extent1]
WHERE ([Extent1].[Discriminator] IN ('Apple','Orange')) AND (1 = [Extent1].[ID])
That’s pretty neat; We get all the properties for all the subclasses in a neat way.

When it gets a little weird

When we add a property to a sub class which is a Complex Type then we start having slightly strange results. If we change the class structure to this, for example;
abstract class Fruit
{
    public int ID { get; set; }
}

class Apple : Fruit
{
    public int PipCount { get; set; }
}

class Orange : Fruit
{
    public int Oranginess { get; set; }
    public Nutrition Nutrition { get; set; }
}

[ComplexType]
class Nutrition
{
    public int Calories { get; set; }
}
We then get this SQL generated;
SELECT 
[Limit1].[Discriminator] AS [Discriminator], 
[Limit1].[ID] AS [ID], 
[Limit1].[PipCount] AS [PipCount], 
[Limit1].[Oranginess] AS [Oranginess], 
[Limit1].[C1] AS [C1]
FROM ( SELECT TOP (1) 
    [Extent1].[ID] AS [ID], 
    [Extent1].[PipCount] AS [PipCount], 
    [Extent1].[Oranginess] AS [Oranginess], 
    [Extent1].[Discriminator] AS [Discriminator], 
    CASE WHEN ([Extent1].[Discriminator] = 'Orange') THEN [Extent1].[Nutrition_Calories] END AS [C1]
    FROM [dbo].[Fruits] AS [Extent1]
    WHERE ([Extent1].[Discriminator] IN ('Apple','Orange')) AND (1 = [Extent1].[ID])
)  AS [Limit1]
The structure of the statement is a bit weird, but the key thing to note is the line
CASE WHEN ([Extent1].[Discriminator] = 'Orange') THEN [Extent1].[Nutrition_Calories] END AS [C1]
EF is starting to use Case statements to choose what to select from SQL, depending on the sub class. If you had more sub-classes then this would start getting bigger. Still, it’s something you can live with.

When it gets bad

The real problems start when we use a complex type on an intermediate class as in this example:
abstract class Fruit
{
    public int ID { get; set; }
}

class Apple : Fruit
{
    public int PipCount { get; set; }
}

abstract class Citrus : Fruit
{
    public Nutrition Nutrition { get; set; }
}

class Orange : Citrus
{
    public int Oranginess { get; set; }
}

class Clementine : Citrus
{
    public int Sweetness { get; set; }
}

[ComplexType]
class Nutrition
{
    public int Calories { get; set; }
}
What we have is an intermediate “Citrus” class, which inherits from Fruit and which has children. That generates this SQL:
SELECT 
[Limit1].[C1] AS [C1], 
[Limit1].[ID] AS [ID], 
[Limit1].[C2] AS [C2], 
[Limit1].[C3] AS [C3], 
[Limit1].[C4] AS [C4], 
[Limit1].[C5] AS [C5]
FROM ( SELECT TOP (1) 
    [Extent1].[ID] AS [ID], 
    CASE WHEN ([Extent1].[Discriminator] = 'Apple') THEN '0X0X' 
        WHEN ([Extent1].[Discriminator] = 'Orange') THEN '0X1X0X' ELSE '0X1X1X' END AS [C1], 
    CASE WHEN ([Extent1].[Discriminator] = 'Apple') THEN [Extent1].[PipCount] 
        WHEN ([Extent1].[Discriminator] = 'Orange') THEN CAST(NULL AS int) END AS [C2], 
    CASE WHEN ([Extent1].[Discriminator] = 'Apple') THEN CAST(NULL AS int) 
            WHEN ([Extent1].[Discriminator] = 'Orange') THEN 
                CASE WHEN (((CASE WHEN ([Extent1].[Discriminator] = 'Orange') THEN cast(1 as bit) 
                ELSE cast(0 as bit) END) = 1) 
                OR ((CASE WHEN ([Extent1].[Discriminator] = 'Clementine') THEN cast(1 as bit) 
                ELSE cast(0 as bit) END) = 1)) THEN [Extent1].[Nutrition_Calories] END 
                WHEN (((CASE WHEN ([Extent1].[Discriminator] = 'Orange') 
                THEN cast(1 as bit) ELSE cast(0 as bit) END) = 1) OR 
                ((CASE WHEN ([Extent1].[Discriminator] = 'Clementine') THEN cast(1 as bit) 
                ELSE cast(0 as bit) END) = 1)) THEN [Extent1].[Nutrition_Calories] END AS [C3], 
    CASE WHEN ([Extent1].[Discriminator] = 'Apple') THEN CAST(NULL AS int) 
        WHEN ([Extent1].[Discriminator] = 'Orange') THEN [Extent1].[Oranginess] END AS [C4], 
    CASE WHEN ([Extent1].[Discriminator] = 'Apple') THEN CAST(NULL AS int) 
        WHEN ([Extent1].[Discriminator] = 'Orange') THEN CAST(NULL AS int) ELSE [Extent1].[Sweetness] END AS [C5]
    FROM [dbo].[Fruits] AS [Extent1]
    WHERE ([Extent1].[Discriminator] IN ('Apple','Orange','Clementine')) AND (1 = [Extent1].[ID])
)  AS [Limit1]
As far as I can tell, adding that ComplexType property to the intermediate class causes EF to pursue a strategy in which every single property from every single class other than the root class is selected using CASE clauses. From what I have read, this code looks like it is related to some strategies for performance optimisation with table-per-concrete-type.
Now, the above hierarchy is very simple- you can probably imagine what happens when you have a complex class structure.
Fixing this problem is a matter of either moving the complex type to the base class or change it to be a navigation property.
Microsoft Connect issue here