Jun 16, 2023
ORMs can cause significant performance bottlenecks due to incorrect mapping, which happens when the ORM fails to accurately translate between the object model and the database schema. This issue can be challenging to identify and resolve. While there are various mapping-related problems, this post will focus on one specific issue: the implicit conversion problem. We'll discuss the impact of implicit conversion and explore considerations related to data types in SQL Server.
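To see the mechanics in isolation, here is a minimal sketch (the table and column names are hypothetical, not from the StackOverflow schema used later in this post):

```sql
-- Hypothetical example: Email is varchar, but the parameter is nvarchar.
-- nvarchar has higher data type precedence, so SQL Server must convert
-- every row's Email value to nvarchar before comparing - an index on
-- Email cannot be seeked, and the plan degrades to a scan.
DECLARE @email nvarchar(4000) = N'user@example.com';

SELECT Id
FROM dbo.Customers          -- hypothetical table with a varchar(320) Email column
WHERE Email = @email;       -- CONVERT_IMPLICIT(nvarchar(4000), Email) under the hood
```

This is exactly the shape of query an ORM produces when the mapping does not tell it the column is non-Unicode.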
To start off, we will create a new table and populate it with relevant data extracted from the StackOverflow database dump of 2018:
CREATE TABLE dbo.UsersDescription
(
    Id int not null primary key identity(1,1),
    UserId int not null foreign key references Users(Id),
    Description nvarchar(max) null,
    CreationDate datetime not null,
    Location varchar(100) null
)

insert into UsersDescription (UserId, Description, CreationDate, Location)
select Id, AboutMe, CreationDate, CAST(Location as varchar(100)) from Users
Up next, we’ll generate an index that will be utilized to search for entries in UsersDescription based on Location.
CREATE INDEX Location ON UsersDescription (Location)
WITH(DATA_COMPRESSION = ROW);
Additionally, we will establish a straightforward mapping in NHibernate that enables us to retrieve data from our table efficiently:
public class UsersDescriptionMapping : ClassMap<UserDescription>
{
    public UsersDescriptionMapping()
    {
        Table("UsersDescription");
        Schema("dbo");
        Id(x => x.Id).GeneratedBy.Identity().Column("Id");
        Map(x => x.Description).Nullable();
        Map(x => x.CreationDate).Not.Nullable();
        Map(x => x.Location).Nullable();
        References(x => x.User).Column("UserId").Not.Nullable();
    }
}
Lastly, we can proceed to write the necessary code to retrieve the data from the table:
using (var session = sessionFactory.OpenSession())
{
    await session
        .Query<UserDescription>()
        .Where(x => x.Location == "Warsaw")
        .ToListAsync();
}
By examining the SQL Profiler, we can observe the precise query generated by NHibernate:
exec sp_executesql
N'select userdescri0_.Id as id1_1_, userdescri0_.Description as description2_1_, userdescri0_.CreationDate as creationdate3_1_, userdescri0_.Location as location4_1_, userdescri0_.UserId as userid5_1_ from dbo.UsersDescription userdescri0_ where userdescri0_.Location=@p0',
N'@p0 nvarchar(4000)',
@p0=N'Warsaw'
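As a side note, the page-read numbers discussed below can also be reproduced without SQL Profiler, by enabling I/O statistics in SSMS - a sketch:

```sql
-- Run the captured query in SSMS with I/O statistics enabled.
SET STATISTICS IO ON;

exec sp_executesql
    N'select userdescri0_.Id, userdescri0_.Description, userdescri0_.CreationDate,
             userdescri0_.Location, userdescri0_.UserId
      from dbo.UsersDescription userdescri0_
      where userdescri0_.Location=@p0',
    N'@p0 nvarchar(4000)',
    @p0 = N'Warsaw';
-- The Messages tab then reports logical reads per table,
-- matching the page counts visible in SQL Profiler.
```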
At first glance, the query may appear correct and fast, but upon closer examination of the SQL Profiler results, we can see that this query reads 15123 pages while returning only 330 rows. Given the relatively low number of returned rows, even with additional key lookups, we would expect a much lower read count. Based on this, we can conclude that SQL Server was reading the entire index instead of seeking directly to the required data. To investigate further, let's jump into the query plan and identify the cause of this behavior.
As we can see, SQL Server is performing a full index scan. Furthermore, a small yellow triangle warning symbol is displayed above the SELECT operator, suggesting a possible problem with our query. Hovering the mouse cursor over the SELECT operator reveals the following information:
The problem is caused by the parameter @p0 of type nvarchar(4000), generated by NHibernate.
It is very common for developers not to specify data types in ORM mappings. In most cases the framework defaults suffice and no issues occur. In our scenario, however, the framework defaults are not correct.
To resolve this problem, we need to explicitly define the column type in the NHibernate mapping. Since the Location column is of type varchar(100), the mapping should be modified as follows:
public class UsersDescriptionMapping : ClassMap<UserDescription>
{
    public UsersDescriptionMapping()
    {
        ...
        Map(x => x.Location)
            .Length(100)
            .CustomType("AnsiString")
            .Nullable();
        ...
    }
}
Then, when we run the application again, SQL Profiler reports that the query reads only 42 pages, and the revised query is as follows:
exec sp_executesql
N'select userdescri0_.Id as id1_1_, userdescri0_.Description as description2_1_, userdescri0_.CreationDate as creationdate3_1_, userdescri0_.Location as location4_1_, userdescri0_.UserId as userid5_1_ from dbo.UsersDescription userdescri0_ where userdescri0_.Location=@p0',
N'@p0 varchar(100)',
@p0='Warsaw'
As we can see, the type now matches the defined table. Upon reviewing the execution plan:
We can observe the anticipated index seek operation, and no warnings are present.
The situation is the same in EF Core: we must explicitly define the column type:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    var entity = modelBuilder.Entity<UserDescription>();
    entity.ToTable("UsersDescription", "dbo");
    ...
    entity
        .Property(p => p.Location)
        .HasMaxLength(100)
        .IsUnicode(false)
        .IsRequired(true);
    ...
}
With this configuration, EF Core generates the correct SQL query.
In addition to the implicit conversion problem, which behaves exactly the same in EF Core and NHibernate, I found one more interesting framework behavior while preparing this post.
In Entity Framework 6, when dealing with non-nullable columns, it is crucial to explicitly mark them as Required to ensure their proper handling.
To show the problem, let’s examine the following mapping:
entity
    .Property(p => p.Location)
    .HasMaxLength(100)
    .IsUnicode(false);
and a bit more complex query, generated by the following code:
dbContext
    .UserDescriptions
    .Where(x => x.Location == location && x.CreationDate > date)
    .ToList();
Additionally, let’s add an index to handle this query:
CREATE INDEX CreationDate_Location
ON UsersDescription (CreationDate, Location)
WITH(DATA_COMPRESSION = ROW);
The query generated by Entity Framework 6 is:
exec sp_executesql N'
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CreationDate] AS [CreationDate],
[Extent1].[Description] AS [Description],
[Extent1].[Location] AS [Location]
FROM
[dbo].[UsersDescription] AS [Extent1]
WHERE
(([Extent1].[Location] = @p__linq__0) OR (([Extent1].[Location] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[CreationDate] > @p__linq__1)',N'@p__linq__0 varchar(8000),@p__linq__1 datetime2(7)
',@p__linq__0='Warsaw',@p__linq__1='2017-01-01 00:00:00'
If we run this query, we can see that the generated plan scans the entire index we just created:
However, when we eliminate the OR condition by explicitly marking the column as required in the mapping, SQL will be able to seek our index:
In this case, we reduce the number of reads from 23126 to 7614.
Both EF Core and NHibernate generate the query without the OR condition even when we don't mark properties as Required in the mapping. However, if we attempt to pass a null value, the generated query will be incorrect, resembling something like:
exec sp_executesql N'
SELECT
[u].[Id], [u].[CreationDate], [u].[Description], [u].[Location]
FROM
[dbo].[UsersDescription] AS [u]
WHERE [u].[Location] = @__location_0 AND [u].[CreationDate] > NULL
',N'@__location_0 varchar(100)',@__location_0='Warsaw'
The conclusion: nullable parameters need to be handled with special care.
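One defensive pattern (a sketch against the EF Core context used above, assuming `date` is a nullable `DateTime?`) is to compose the query conditionally, so a null value never reaches the SQL translation:

```csharp
// Build the predicate in steps; the CreationDate filter is only added
// when date actually has a value, so the ORM never has to translate "> NULL".
IQueryable<UserDescription> query = dbContext
    .UserDescriptions
    .Where(x => x.Location == location);

if (date.HasValue)
{
    query = query.Where(x => x.CreationDate > date.Value);
}

var results = query.ToList();
```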
There are several other data type conversions that can also cause the mentioned warning. Here are some examples:
SELECT * FROM Table WHERE StringColumn = 123
SELECT * FROM Table WHERE TimestampColumn = 123
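These conversions follow SQL Server's data type precedence rules: the lower-precedence side of a comparison is implicitly converted to the higher-precedence type. A quick way to check which type wins in a mixed expression is SQL_VARIANT_PROPERTY:

```sql
-- The lower-precedence operand is implicitly converted to the
-- higher-precedence type; SQL_VARIANT_PROPERTY reveals the result type.
SELECT SQL_VARIANT_PROPERTY('abc' + N'abc', 'BaseType'); -- nvarchar (beats varchar)
SELECT SQL_VARIANT_PROPERTY(1 + 1.5, 'BaseType');        -- numeric (beats int)
```

When the column side loses this precedence contest, the conversion is applied per row and the index on that column cannot be used for a seek.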
To get a list of queries with an implicit conversion warning in SQL Server, you can use the following DMV query:
SELECT
    DB_NAME(st.[dbid]) AS DatabaseName,
    st.text AS [Query Text],
    qp.query_plan AS [Query Plan]
FROM
    sys.dm_exec_query_stats AS qs WITH (NOLOCK)
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
WHERE
    CAST(qp.query_plan AS NVARCHAR(MAX))
    LIKE ('%PlanAffectingConvert ConvertIssue="Seek Plan" Expression="CONVERT_IMPLICIT%')
To summarize, when using an ORM, it is important to explicitly specify the column type, length, and nullability. Although the example discussed in this blog post may not have a significant performance impact, it can become a crucial factor in more complex scenarios. Neglecting to define column types and lengths can lead to excessive I/O and CPU usage, which sooner or later may cause bigger performance issues.
Regarding the choice between varchar and nvarchar used in this blog post, one may question why varchar is used at all, considering that Unicode is required in most cases. While this topic deserves a dedicated blog post, for now, treat it as good practice to employ the most appropriate data type whenever possible.