Stack Overflow Exception From Simple LINQ Operations

May 8, 2023

7 mins read

Introduction

LINQ is a powerful feature in C# that allows us to perform data querying operations. With LINQ, we can write very expressive queries using a syntax that closely resembles SQL queries.

One important aspect of LINQ is its lazy evaluation feature, which means that LINQ queries are only executed when their results are required. This can improve performance by avoiding unnecessary operations, but it can also create a potential problem: the stack overflow exception. This can occur when a long chain of LINQ methods is built up and then evaluated, causing the program to run out of stack memory and crash. This issue is similar to stack overflow caused by recursion but can be harder to identify since the cause may not be immediately obvious.

In this blog post, we will explore stack overflow exception caused by C# LINQ lazy evaluation show how to track this problem and fix it.

Problem

The problem was initially surfaced in the code repository of IdentityServer4. I think IdentityServer4 is a mature and high-quality product. Our investigation led us to believe that the issue may have been caused by incorrect refresh token configuration, resulting in an excessive number of PersistedGrants in the database for the same SubjectId and ClientId. While reproducing the issue may not be easy, it serves as an example of the type of problems that can arise from the extensive use of LINQ.

For reference, the original code can be found at this link: IdentityServer4 repository. The code presented here is a simplified version of that code.


Suppose we have a collection of arrays that contain integers, and we want to extract a sequence of unique integer values from those arrays.


static IEnumerable<int> GetDistinctValuesLazy(int[][] arrays)
{
	var distincted = new List<int>().AsEnumerable();

	foreach (var a in arrays)
	{
		distincted = distincted.Union(a).Distinct();
	}

	return distincted;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static IList<int> Lazy(int[][] arrays)
{
	var distincts = GetDistinctValuesLazy(arrays);

	var result = distincts.ToList();

	return result;
}

When we run it, with a high number of arrays, result may lead to StackOverflowException. In .NET 6.0, we can even see a nice exception summary:

Stack overflow.
Repeat 6885 times:
--------------------------------
   at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetNext()
   at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Linq.Enumerable+DistinctIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
--------------------------------
   at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetNext()
   at System.Linq.Enumerable+UnionIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Collections.Generic.HashSet`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].UnionWith(System.Collections.Generic.IEnumerable`1<Int32>)
   at System.Collections.Generic.HashSet`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<Int32>, System.Collections.Generic.IEqualityComparer`1<Int32>)
   at System.Linq.Enumerable+DistinctIterator`1[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ToList()
   at System.Linq.Enumerable.ToList[[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Collections.Generic.IEnumerable`1<Int32>)
   at StackOverflow.Program.Lazy(Int32[][])
   at StackOverflow.Program.Main(System.String[])

You could already see, that Distinct method is not necessary and we could achieve the same without it. What’s more - due to .NET optimizations, version without Distinct wouldn’t cause an exception. In .NET Framework however, exception would still occur.

    distincted = distincted.Union(a);

How to diagnose what causes StackOverflowException

In general, there is no way to catch StackOverflowException in code (as per MSDN: Starting with the .NET Framework 2.0, you can’t catch a StackOverflowException object with a try/catch block, and the corresponding process is terminated by default.).

If you have this problem, you will probably get information about system unstability from users, or you will see this in logs.

The first place where you can see that the problem is related to StackOverflowException is Windows EventViewer - you will see errors with code 0xc00000fd, which stands for stack overflow error.

Next, you will need memory dump for analysis. You can get it, by executing following Procdump command:

procdump -accepteula -e 1 -f C00000FD.STACK_OVERFLOW 
	-g -ma -w PID or ProcessName 

Using process name is a nice trick, but unfortunately - doesn’t allow for observing multiple instances.

Next step is to open memory dump in VisualStudio or Windbg. In past, VisualStudio wasn’t very usefull as it wasn’t displaing all stack frames:

Call stack - VisualStudio 2019

So we couldn’t see where the execution has started. Now the situation is better, VS2022, collapses repeating stack frames:

Call stack - VisualStudio 2022

Before VS2022, one option was to open dump with Windbg and display entire stack trace there. But as in VS2022 call stack is very readable, I will skip this option.

Difficult part in case of deferred execution is that place where the exception occurs might be not the same as where the problematic code is defined. In our example, we can see that the exception occured in method Lazy - but the problematic code is in method GetDistinctValuesLazy. In our case, the analysis is easy but when there is an entire chain of execution, before LINQ query gets evaluated, it might be more difficult.

Fix of problem

There are several options to fix StackOverflowException caused by LINQ. The easiest - do not defer LINQ evaluation, the most efficient - if possible, get rid of linq. Here is a code and a benchmark:

  1. Eager LINQ evaluation:

	//In benchmark: Eager
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	static IEnumerable<int> GetDistinctValuesEager(int[][] arrays)
	{
        var distincted = new List<int>().AsEnumerable();

		foreach (var a in arrays)
		{
			distincted = distincted.Union(a).Distinct().ToList();
		}

		return distincted;
	}
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void Eager(int[][] arrays)
	{
		var distincts = GetDistinctValuesEager(arrays);

		var result = distincts.ToList();
	}
	
  1. Partial defferd LINQ evaluation (the idea behind this is to evaluate LINQ queries more often to limit stack size):

	//In benchmark: Split
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	static IEnumerable<int> GetDistinctValues_Split(int[][] arrays)
	{
		var distincted = new List<int>().AsEnumerable();

		int i = 0;

		foreach (var a in arrays)
		{
			++i;

			if(i % 1000 == 0)
			{
				distincted = distincted.Union(a).Distinct().ToList();
			}
			else
			{
				distincted = distincted.Union(a).Distinct();
			}
		}

		return distincted;
	}
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void FindDistinctValuesSplit(int[][] arrays)
	{
		var distincts = GetDistinctValues_Split(arrays);

		var result = distincts.ToList();
	}
	
  1. LINQ SelectMany

	//In benchmark: SelectMany
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	private static IEnumerable<int> GetDistinctValues_SelectMany(int[][] arrays)
	{
		return arrays.SelectMany(x => x).Distinct();
	}
	
	[*//MethodImpl(MethodImplOptions.NoInlining)]
	public static IList<int> FindDistinctValues_SelectMany(int[][] arrays)
	{
		var distincts = GetDistinctValues_SelectMany(arrays);

		var result = distincts.ToList();

		return result;
	}
  1. Union only

	//In benchmark: OnlyUnion

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static IEnumerable<int> GetDistinctValuesUnionOnly(int[][] arrays)
	{
		var distincted = new List<int>().AsEnumerable();

		foreach (var a in arrays)
		{
			distincted = distincted.Union(a);
		}

		return distincted;
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static IList<int> FindDistincValuesUnionOnly(int[][] arrays)
	{
		var distincts = GetDistinctValuesUnionOnly(arrays);

		var result = distincts.ToList();

		return result;

	}
		
  1. Without LINQ

	//In benchmark: WithoutLinq

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static IEnumerable<int> _WithoutLinq(int[][] arrays)
	{
		HashSet<int> hashSet = new HashSet<int>();

		for (int i = 0; i < arrays.Length; i++)
		{
			for (int j = 0; j < arrays[i].Length; j++)
			{
				hashSet.Add(arrays[i][j]);
			}
		}
		
		var result = hashSet.ToList();
		return result;
	}
	
	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void WithoutLinq(int[][] arrays)
	{
		var distincts = _WithoutLinq(arrays);

		var result = distincts.ToList();
	}	
	

Benchmark:

Method Mean Error StdDev Median Ratio Gen0 Gen1 Allocated Alloc Ratio
Eager 17,909.3 us 356.34 us 463.34 us 17,927.0 us 0.030 9125.0000 31.2500 55931.87 KB 21.040
Split 35,965.4 us 717.71 us 1,590.40 us 35,180.1 us 0.060 8230.7692 4076.9231 50471.9 KB 18.986
WithoutLinq 771.4 us 14.65 us 14.39 us 769.1 us 0.001 - - 1.55 KB 0.001
OnlyUnion 601,144.5 us 11,904.02 us 22,935.01 us 592,824.1 us 1.000 - - 2658.39 KB 1.000
SelectMany 2,063.9 us 37.20 us 32.98 us 2,060.0 us 0.003 101.5625 - 626.57 KB 0.236

I used the Union method (instead of Union.Distinct) as the baseline. .NET 6 allows for the execution of the same LINQ method without causing a StackOverflowException. Benchmark was performed for arrays 20 000 of arrays, each with 10 elements. While removing the Distinct is a valid fix for the issue described in this post, it’s worth noting that sometimes it may not be possible to make such fix (and it isn’t possible at all in old .NET Framework).

Based on the observation, the approach that does not utilize LINQ outperforms other methods significantly in terms of both time and memory allocation. However LINQ SelectMany.Distinct is good enough, if there is no need to exectue the method hundreds of times. The readability of this method is significantly better.

Personally, I am disappointed with the underwhelming results of the Split method. Although I anticipated that the execution time would be slower than that of the Eager method, I was expecting significantly lower memory allocation.


You can find code for this post in GitHub repository.

If you like this article, please share it via your social media: