May 8, 2023
7 mins read
LINQ is a powerful feature in C# that allows us to perform data querying operations. With LINQ, we can write very expressive queries using a syntax that closely resembles SQL queries.
One important aspect of LINQ is its lazy evaluation feature, which means that LINQ queries are only executed when their results are required. This can improve performance by avoiding unnecessary operations, but it can also create a potential problem: the stack overflow exception. This can occur when a long chain of LINQ methods is built up and then evaluated, causing the program to run out of stack memory and crash. This issue is similar to stack overflow caused by recursion but can be harder to identify since the cause may not be immediately obvious.
In this blog post, we will explore stack overflow exception caused by C# LINQ lazy evaluation show how to track this problem and fix it.
The problem was initially surfaced in the code repository of IdentityServer4. I think IdentityServer4 is a mature and high-quality product. Our investigation led us to believe that the issue may have been caused by incorrect refresh token configuration, resulting in an excessive number of PersistedGrants in the database for the same SubjectId and ClientId. While reproducing the issue may not be easy, it serves as an example of the type of problems that can arise from the extensive use of LINQ.
For reference, the original code can be found at this link: IdentityServer4 repository. The code presented here is a simplified version of that code.
Suppose we have a collection of arrays that contain integers, and we want to extract a sequence of unique integer values from those arrays.
static IEnumerable<int> GetDistinctValuesLazy(int[][] arrays)
{
var distincted = new List<int>().AsEnumerable();
foreach (var a in arrays)
{
distincted = distincted.Union(a).Distinct();
}
return distincted;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static IList<int> Lazy(int[][] arrays)
{
var distincts = GetDistinctValuesLazy(arrays);
var result = distincts.ToList();
return result;
}
When we run it, with a high number of arrays, result may lead to StackOverflowException. In .NET 6.0, we can even see a nice exception summary:
Stack overflow.
Repeat 6885 times:
--------------------------------
at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetNext()
at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
at System.Linq.Enumerable+DistinctIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
--------------------------------
at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetNext()
at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
at System.Collections.Generic.HashSet`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].UnionWith(System.Collections.Generic.IEnumerable`1<Int32>)
at System.Collections.Generic.HashSet`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<Int32>, System.Collections.Generic.IEqualityComparer`1<Int32>)
at System.Linq.Enumerable+DistinctIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ToList()
at System.Linq.Enumerable.ToList[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Collections.Generic.IEnumerable`1<Int32>)
at StackOverflow.Program.Lazy(Int32[][])
at StackOverflow.Program.Main(System.String[])
You could already see, that Distinct method is not necessary and we could achieve the same without it. What’s more - due to .NET optimizations, version without Distinct wouldn’t cause an exception. In .NET Framework however, exception would still occur.
distincted = distincted.Union(a);
In general, there is no way to catch StackOverflowException in code (as per MSDN: Starting with the .NET Framework 2.0, you can’t catch a StackOverflowException object with a try/catch block, and the corresponding process is terminated by default.).
If you have this problem, you will probably get information about system unstability from users, or you will see this in logs.
The first place where you can see that the problem is related to StackOverflowException is Windows EventViewer - you will see errors with code 0xc00000fd, which stands for stack overflow error.
Next, you will need memory dump for analysis. You can get it, by executing following Procdump command:
procdump -accepteula -e 1 -f C00000FD.STACK_OVERFLOW
-g -ma -w PID or ProcessName
Using process name is a nice trick, but unfortunately - doesn’t allow for observing multiple instances.
Next step is to open memory dump in VisualStudio or Windbg. In past, VisualStudio wasn’t very usefull as it wasn’t displaing all stack frames:
So we couldn’t see where the execution has started. Now the situation is better, VS2022, collapses repeating stack frames:
Before VS2022, one option was to open dump with Windbg and display entire stack trace there. But as in VS2022 call stack is very readable, I will skip this option.
Difficult part in case of deferred execution is that place where the exception occurs might be not the same as where the problematic code is defined. In our example, we can see that the exception occured in method Lazy - but the problematic code is in method GetDistinctValuesLazy. In our case, the analysis is easy but when there is an entire chain of execution, before LINQ query gets evaluated, it might be more difficult.
There are several options to fix StackOverflowException caused by LINQ. The easiest - do not defer LINQ evaluation, the most efficient - if possible, get rid of linq. Here is a code and a benchmark:
//In benchmark: Eager
[MethodImpl(MethodImplOptions.NoInlining)]
static IEnumerable<int> GetDistinctValuesEager(int[][] arrays)
{
var distincted = new List<int>().AsEnumerable();
foreach (var a in arrays)
{
distincted = distincted.Union(a).Distinct().ToList();
}
return distincted;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Eager(int[][] arrays)
{
var distincts = GetDistinctValuesEager(arrays);
var result = distincts.ToList();
}
//In benchmark: Split
[MethodImpl(MethodImplOptions.NoInlining)]
static IEnumerable<int> GetDistinctValues_Split(int[][] arrays)
{
var distincted = new List<int>().AsEnumerable();
int i = 0;
foreach (var a in arrays)
{
++i;
if(i % 1000 == 0)
{
distincted = distincted.Union(a).Distinct().ToList();
}
else
{
distincted = distincted.Union(a).Distinct();
}
}
return distincted;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void FindDistinctValuesSplit(int[][] arrays)
{
var distincts = GetDistinctValues_Split(arrays);
var result = distincts.ToList();
}
//In benchmark: SelectMany
[MethodImpl(MethodImplOptions.NoInlining)]
private static IEnumerable<int> GetDistinctValues_SelectMany(int[][] arrays)
{
return arrays.SelectMany(x => x).Distinct();
}
[*//MethodImpl(MethodImplOptions.NoInlining)]
public static IList<int> FindDistinctValues_SelectMany(int[][] arrays)
{
var distincts = GetDistinctValues_SelectMany(arrays);
var result = distincts.ToList();
return result;
}
//In benchmark: OnlyUnion
[MethodImpl(MethodImplOptions.NoInlining)]
public static IEnumerable<int> GetDistinctValuesUnionOnly(int[][] arrays)
{
var distincted = new List<int>().AsEnumerable();
foreach (var a in arrays)
{
distincted = distincted.Union(a);
}
return distincted;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static IList<int> FindDistincValuesUnionOnly(int[][] arrays)
{
var distincts = GetDistinctValuesUnionOnly(arrays);
var result = distincts.ToList();
return result;
}
//In benchmark: WithoutLinq
[MethodImpl(MethodImplOptions.NoInlining)]
public static IEnumerable<int> _WithoutLinq(int[][] arrays)
{
HashSet<int> hashSet = new HashSet<int>();
for (int i = 0; i < arrays.Length; i++)
{
for (int j = 0; j < arrays[i].Length; j++)
{
hashSet.Add(arrays[i][j]);
}
}
var result = hashSet.ToList();
return result;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void WithoutLinq(int[][] arrays)
{
var distincts = _WithoutLinq(arrays);
var result = distincts.ToList();
}
Benchmark:
Method | Mean | Error | StdDev | Median | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
Eager | 17,909.3 us | 356.34 us | 463.34 us | 17,927.0 us | 0.030 | 9125.0000 | 31.2500 | 55931.87 KB | 21.040 |
Split | 35,965.4 us | 717.71 us | 1,590.40 us | 35,180.1 us | 0.060 | 8230.7692 | 4076.9231 | 50471.9 KB | 18.986 |
WithoutLinq | 771.4 us | 14.65 us | 14.39 us | 769.1 us | 0.001 | - | - | 1.55 KB | 0.001 |
OnlyUnion | 601,144.5 us | 11,904.02 us | 22,935.01 us | 592,824.1 us | 1.000 | - | - | 2658.39 KB | 1.000 |
SelectMany | 2,063.9 us | 37.20 us | 32.98 us | 2,060.0 us | 0.003 | 101.5625 | - | 626.57 KB | 0.236 |
I used the Union method (instead of Union.Distinct) as the baseline. .NET 6 allows for the execution of the same LINQ method without causing a StackOverflowException. Benchmark was performed for arrays 20 000 of arrays, each with 10 elements. While removing the Distinct is a valid fix for the issue described in this post, it’s worth noting that sometimes it may not be possible to make such fix (and it isn’t possible at all in old .NET Framework).
Based on the observation, the approach that does not utilize LINQ outperforms other methods significantly in terms of both time and memory allocation. However LINQ SelectMany.Distinct is good enough, if there is no need to exectue the method hundreds of times. The readability of this method is significantly better.
Personally, I am disappointed with the underwhelming results of the Split method. Although I anticipated that the execution time would be slower than that of the Eager method, I was expecting significantly lower memory allocation.
You can find code for this post in GitHub repository.
If you like this article, please share it via your social media: