This morning I was supposed to speak at DevUp in St. Louis, sadly due to a dead airplane battery, I couldn't get myself down there in time. Although not as good as being able to be there in person, I wanted to make at least an effort to share the presentation materials, and code samples with everyone as this topic made for a very popular session at other events. So let us dive in!
EF Is Slow.....NOT!!
The most common thing I hear from developers that haven't worked with Entity Framework is that it performs slowly on large sets of data. Although I have heard this numerous times, in reality, it has never turned out to be an Entity Framework issue, but more of an education/adoption issue and simple coding style changes resolve the issues.
EF is just like any other ORM, it is a toolset, and the tools allow a level of flexibility that can be amazing, or horrendous depending on how it is used and the information obtained by the user before implementation. This is no different than raw SQL, Stored Procedures, or any other data access strategy.
If you understand the high-level concepts of how EF works, if you build a good model, and understand Deferred Execution, you will be able to move ahead quickly. This post focuses on the common areas that are often overlooked, or misunderstood when working with Entity Framework.
EF Bulk Operations
Before I dive into the specifics on the education front, I think it is important to note the one area that I do see performance difficulties in SQL and that is bulk data loads. In these cases, I'm specifically discussing loading thousands of records in single transactions or similar. There are advanced mechanisms for dealing with it, but often Bulk Copy is still better. I find that this is a rare limitation for many people, and is not a reason to discredit EF.
Common Problems
This post is all about confronting your fears about working with Entity Framework, so we need to dive into things. Let us discuss a few key concepts that typically trackback to the root problem when it comes to EF Performance.
Deferred Execution
One area of LINQ usage that I find many people don't fully understand is the concept of deferred execution. With IQueryable you can progressively build up an Entity Framework query, and only once you enumerate the IQueryable do you reach out to the database. Consider the following example:
In this code sample, we use a reference to our DbContext called ctx. Does this code snippet hit the database? If so, what records are returned?
No, this snippet alone will never execute against the database, at the last line we simply have a query that would get us all active courses, ordered by name. However, we would need to enumerate the list to trigger a database call. We could do this with a ForEach loop, with a call to ToList() or otherwise.
Now, on the flip side, I will often see developers that write code such as the following.
Comparing this snippet to the prior one doesn't look that much different; however, it could be a major performance issue waiting to happen. In this snippet, we call ToList() on the first line, executing the equivalent of SELECT * FROM Courses returning the entire table and storing it in memory. At this point we then filter the list, in memory, then we sort the list after that. We have efficient filtering and sorting in .NET, but it is important to leverage our DB for these operations when we can.
The effect of this process can be hard to track for smaller sets, and will typically result in higher CPU & Memory usage. In the provided sample code you can compare the performance to the GoodDataService and BadDataService for the various Search operations to see the timing, query, and communication differences.
Lazy Loading
Another commonly misunderstood concept is that of lazy loading. Lazy loading can be a great help when we want to get to information at a later time. However, it can also introduce big overhead. If a query executes, and then a lazy-loaded object is requested another database query will execute to obtain that row information. In loops, this can be a large problem. Consider the following example.
This query doesn't look horrible. However, the devil is in the details. The first line will get all active schools, executing a SQL similar to SELECT * FROM Schools WHERE IsActive = 1. This is what we expect; it only includes school information and limits. The problem comes in the next line. We have a navigation property on the School object linking to State; we then want to find items in a particular state.
That where clause will result in 1 query being executed for each row. If you have 1000 schools that are active, these two lines of code will result in 1,001 queries being executed. SearchSchools really bad, in the BadDataService of the provided sample, showcases this issue. When running the application, you will be able to see the query.
Fixing this issue is quite easy. If you make all where clauses before enumeration you will get around the issue. Alternatively, you can tell EF to eagerly load a Lazy Loaded item for a specific query. Changing to use below would result in a single query being used.
The Include(s => s.State) tells the query process to prepare for later usage.
Over Analysis of Queries
For brevity, the final item that I'll discuss today is the traditional Red Herring, SQL Query Format. The following snippet is a sample query, generate by EF, of the Schools table in our sample database, filtered for schools in Iowa with a name including the word "testing."
Developers new to EF, or possibly your DBA, might see things such as Extent1, Extent2 and Project1 and think something is crazy or inefficient. Looking at the query more detailed we can see that Extent1 is simply an alias for Schools, Extent2 an alias for AddressStates, and Project1 an alias for the sub-query. If we were to write this query ourselves, we might skip the sub-query part and have a query such as the following:
Sure, this is easier to read, but with EF our concern is DB impact, execution plans, and reliability. So to put things into perspective you can test the queries. Place both queries into a SQL Server Management Studio (SSMS) and from the "Query" menu select "Include Actual Execution Plan." This will render the query plan for the two queries and will assign a "query cost (relative to the batch" percentage. Doing this with these queries we will see the SAME execution plan and the same costs, no difference. This awareness can help in decision making.
Code & Database Samples and Slides
All referenced have been published to my EntityFramework Performance Demo Github repository. In this repository, you will find a few key pieces of information.
Sample Database
Within the repository you will find a folder named "SampleDB," containing both a .bak backup and scripts to recreate the database used for demonstration. If you are running on SQL Server 2017, the backup is the fastest route. If you are running on an earlier version, you will need to use the scripts to create the database and load the information. I'll try to update a readme file with the order in which to deploy the data into the tables.
Sample Project
The same project is a Console Application running on the Full .NET Framework and includes a separate project with the EntityFramework model. You will need to correct the connection string values in the configuration file to point to your Database Server and database instance. In the sections below I'll detail more on the components of this project.
Presentation
I've included the slides that are typically used in the talk discussing common concepts these are primarily to ensure we establish a base of knowledge before diving into the sample code.
Next Steps
Have a look at the sample code, look at the executed SQL and see the differences between the Good and Bad data services. If you have questions please share below. I hope this has been beneficial and for all of the DevUp attendees I'm sorry I couldn't attend.