Thread Pool Starvation

Today I want to talk about a problem many of you probably think will never happen to you, which is exactly what I thought for most of my career. But I'm now working with a client whose legacy applications (on .NET Framework) cause us a lot of problems, the vast majority due to thread pool starvation.

 

 

Here, we will see how to solve, or at least mitigate, these issues, since it's not always possible to fix them in the short term.

 

1 - What is the Thread Pool?

 

The first thing we need to understand is what the thread pool is. Basically, it's a set of worker threads your application has available to perform operations. When you run a program, its actions are executed on the processor, and for that you need a thread. With the thread pool, your application has a limited number of threads ready to go and hands them out as needed. The pool can grow when demand rises, but it adds threads slowly and only up to a configured maximum, so in practice you should treat the number as limited.

 

For example, if you have an app that calculates a multiplication, the calculation uses a thread. The same applies to web applications—when you receive a call to an endpoint, it's assigned to a thread, and when it finishes, the thread goes back to the thread pool.

The great thing about the thread pool is that it lets you reuse threads instead of creating new ones each time, which is more efficient. In C#, this is handled for you by the .NET runtime.
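To make the reuse idea concrete, here is a minimal sketch that queues 100 tiny jobs on the pool and records which thread ran each one. The class and method names are mine, just for illustration. On most machines the number of distinct threads comes out far lower than 100, but the exact figure depends on your hardware and runtime, so I'm not going to promise a number.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class ThreadReuseDemo
{
    // Queues `workItems` short jobs on the thread pool and returns
    // the managed ID of the thread that ran each job.
    public static int[] RunOnPool(int workItems)
    {
        var threadIds = new ConcurrentBag<int>();
        var done = new CountdownEvent(workItems);

        for (int i = 0; i < workItems; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                threadIds.Add(Thread.CurrentThread.ManagedThreadId);
                done.Signal();
            });
        }

        done.Wait(); // block until every job has run
        return threadIds.ToArray();
    }

    public static void Main()
    {
        int[] ids = RunOnPool(100);
        Console.WriteLine($"Jobs run: {ids.Length}");
        Console.WriteLine($"Distinct threads used: {ids.Distinct().Count()}");
    }
}
```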

 

2 - What is Thread Pool Starvation?

 

When we talk about thread pool starvation, what we're really saying is that all the threads we have are busy. This means any new action we need to carry out will have to wait.

This leads to terrible performance problems, and requests will eventually start timing out.

 

This is easy to notice in services that receive HTTP calls. To keep it simple, suppose we have 20 threads and each call holds one for 1 minute. If we receive 50 calls at once, 30 of them have to wait.

 

Obviously, in professional environments, these numbers are much higher, usually because we have more than one instance. For example, if we have 3 instances with 20 threads each, we can handle the 50 calls mentioned above.
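The arithmetic behind those two examples fits in a few lines. This is only a back-of-the-envelope sketch (the names are made up for the example); a real pool also grows over time, which this ignores.

```csharp
using System;

public static class CapacityMath
{
    // Calls that cannot start right away because every thread is taken.
    public static int WaitingCalls(int incomingCalls, int threadsPerInstance, int instances)
    {
        int capacity = threadsPerInstance * instances;
        return Math.Max(0, incomingCalls - capacity);
    }

    public static void Main()
    {
        Console.WriteLine(WaitingCalls(50, 20, 1)); // prints 30
        Console.WriteLine(WaitingCalls(50, 20, 3)); // prints 0
    }
}
```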


At first, this type of problem is very hard to spot. Companies often over-provision with much bigger machines than strictly necessary, but sometimes, our growth is much greater than expected, and that's where complications arise.

 

 

3 - How to Detect Thread Pool Starvation Issues

 

For me, one of the biggest challenges is detecting thread pool starvation. You notice these problems in how requests behave on the network (latency, timeouts), not in a log. No matter what logging system you have, even the best one, you won't see an error like the one you get when you dereference null or divide by zero.

 

What you'll see instead is calls taking a very long time to finish, and you won't know why. Some complete, some crash, but what you do see is that the number of slow calls just keeps growing and growing.

 

We also notice that these problems usually occur during usage spikes or when the app itself receives a higher number of calls.

If you can check CPU and RAM usage and see that they're low but response times keep increasing, it's very likely you have a thread pool starvation problem.
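One way to back up that suspicion is to ask the pool itself. The sketch below uses the ThreadPool.GetMinThreads, GetMaxThreads and GetAvailableThreads methods to estimate how many worker threads are in use. The class name is mine, and how you log it (a timer, a health endpoint) is up to you. If the busy count keeps climbing while CPU stays low, starvation is a reasonable guess, though not proof.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Worker threads currently in use: the configured maximum minus
    // the number the pool reports as still available.
    public static int BusyWorkerThreads()
    {
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);
        return maxWorkers - availableWorkers;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        Console.WriteLine(
            $"Worker threads: min={minWorkers}, max={maxWorkers}, busy={BusyWorkerThreads()}");
    }
}
```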

I'm not going to give numbers because no number is really definitive, but you might notice that a call that normally completes in 1 second then suddenly takes 5 minutes right after.

 

As I said, if that's your case, it's most likely a thread pool starvation problem. The hardest part is that you won't get any error—other than some calls timing out if you have that configured.

 

 

4 - Causes of Thread Pool Starvation

 

At first glance, the causes might seem obvious: the clearest one is what I already mentioned—we can handle 100 calls but we're getting 500. The math just doesn't work, so the server/container or whatever you're using needs to be scaled up.

Of course, this multiplies if the tasks you're performing take several minutes instead of milliseconds.

 

There are other causes such as misconfiguration or not releasing threads properly, but in C# this shouldn't happen since the .NET Framework manages all that for you.

 

 

But another, much more common cause is simply not knowing how to program properly. It might sound silly, but it's true.


First, there's the issue of making more external calls than necessary.

 

For example, in a typical application, at the beginning of your use case you call the user microservice to get user data and, say, check if they have permission to modify the selected item.

 

So far so good, but your process continues, and later you make another call to the user microservice to get the same information again. So, in each process, you're making two calls when you only need one.
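A simple way out is to remember what you already fetched for the duration of the request. Here's a minimal sketch: FakeUserServiceClient and RequestUserCache are hypothetical names I made up, and the fake client only counts calls instead of doing real HTTP.

```csharp
using System;
using System.Collections.Generic;

public sealed class UserInfo
{
    public int Id;
    public bool CanEdit;
}

// Hypothetical stand-in for the user microservice client; it only counts calls.
public sealed class FakeUserServiceClient
{
    public int RemoteCalls { get; private set; }

    public UserInfo GetUserInfo(int userId)
    {
        RemoteCalls++;
        return new UserInfo { Id = userId, CanEdit = true };
    }
}

// Created once per request; remembers the users already fetched during it.
public sealed class RequestUserCache
{
    private readonly FakeUserServiceClient _client;
    private readonly Dictionary<int, UserInfo> _seen = new Dictionary<int, UserInfo>();

    public RequestUserCache(FakeUserServiceClient client) { _client = client; }

    public UserInfo Get(int userId)
    {
        if (!_seen.TryGetValue(userId, out UserInfo user))
        {
            user = _client.GetUserInfo(userId); // only the first lookup goes remote
            _seen[userId] = user;
        }
        return user;
    }
}

public static class SingleFetchDemo
{
    public static int RemoteCallsForOneRequest()
    {
        var client = new FakeUserServiceClient();
        var cache = new RequestUserCache(client);

        UserInfo atStart = cache.Get(42); // permission check at the start of the use case
        UserInfo later = cache.Get(42);   // later step gets the same object back

        return client.RemoteCalls;
    }

    public static void Main()
    {
        Console.WriteLine(RemoteCallsForOneRequest()); // prints 1
    }
}
```

Passing the UserInfo object down as a parameter works just as well; the point is only that the second remote call disappears.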

If you have an app that receives 10 calls per minute, it's not an issue, but if you get 60,000, you have a problem.

 

 

Another example is querying for users one by one inside a loop. Instead, it's better to send all user IDs in one call, because if you have 100 users to look up, making 100 connections to another service and 100 to the database is much worse than making a single call, even if that call is a bit slower.
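To put numbers on it, this sketch compares the two approaches against a fake directory that just counts round trips. FakeUserDirectory and its GetUserNames batch method are assumptions for the example; your real service needs to expose a batch endpoint for this to work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user service; every method call stands for one network round trip.
public sealed class FakeUserDirectory
{
    public int RoundTrips { get; private set; }

    public string GetUserName(int id)
    {
        RoundTrips++;
        return "user-" + id;
    }

    public Dictionary<int, string> GetUserNames(IEnumerable<int> ids)
    {
        RoundTrips++; // one trip, no matter how many IDs are in it
        return ids.Distinct().ToDictionary(id => id, id => "user-" + id);
    }
}

public static class BatchLookupDemo
{
    public static int RoundTripsOneByOne(int userCount)
    {
        var directory = new FakeUserDirectory();
        foreach (int id in Enumerable.Range(1, userCount))
        {
            directory.GetUserName(id); // one call per user inside the loop
        }
        return directory.RoundTrips;
    }

    public static int RoundTripsBatched(int userCount)
    {
        var directory = new FakeUserDirectory();
        directory.GetUserNames(Enumerable.Range(1, userCount)); // all IDs at once
        return directory.RoundTrips;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripsOneByOne(100)); // prints 100
        Console.WriteLine(RoundTripsBatched(100));  // prints 1
    }
}
```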

 

 

 

4.1 - The Misuse of Task.Run

Without a doubt, the biggest cause of problems is the misuse of Task.Run (in C#). Unlike in desktop apps, Task.Run in web apps doesn't help—it can actually cause everything to collapse.

 

Task.Run is used to run a task "in the background" so you don't block the main thread. This makes sense for desktop applications—otherwise, the UI freezes.

But what happens in web applications? Especially when your API is receiving a call and there are Task.Runs floating around?

 

The first thing you might ask is why would anyone use Task.Run in an API?

Most of the time Task.Run is used due to a misunderstanding. We assume it'll work the same as in desktop apps, and that's where problems start.

 

Imagine you have an API that's fully async/await and well-configured, but some clients, for whatever reason, don't have an async implementation. For example, the user microservice has a C# client that performs HTTP calls and exposes a "GetUserInfo(userId)" method that returns a UserInfo object. Since your code is all async, you wrap it in a Task.Run just so you can await the task.

UserInfo userInfo = await Task.Run(() => _userServiceClient.GetUserInfo(userId));

 

At first it might make sense, but this is NOT the same as async/await, and here's where the trouble starts. What most people assume is that, by using await, they're magically making the code asynchronous. But that's not true—you're just moving a blocking call to another thread in the thread pool.

 

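And since you want the result and use async/await logic, you put the `await` keyword in front. The await itself doesn't block: the request thread goes back to the pool while it waits. But the thread that Task.Run borrowed stays blocked for the whole HTTP call, so you've paid for an extra hop through an already busy pool without freeing anything. And if somewhere up the chain the task is consumed with .Result or .Wait() instead of await, which is common in legacy code, you really do have two threads blocked for a single request.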

 

Returning to our earlier example, combine the unnecessary double call with this misuse of Task.Run and a single request can take a thread from the pool up to four times when one call on one thread would have been enough. It's wild if you think about it.

As I said, it's possible many of you have these issues and have never noticed because, in low-usage systems, they're hard to detect.

 

If the API client you're consuming is synchronous (not async), using Task.Run won't magically make it asynchronous.
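You can see the difference with a small experiment. The sketch below fakes a synchronous client with Thread.Sleep and an asynchronous one with Task.Delay (both are stand-ins I made up, not real HTTP clients) and times a burst of calls through each. I'm assuming the pool starts with roughly one worker per core, which is the usual default, so I fire eight calls per core to exceed it. I'm not printing expected timings because they depend on your machine, but the Task.Run version should come out clearly slower.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskRunVersusAsync
{
    private const int CallMilliseconds = 200;

    // Hypothetical synchronous client: holds its thread for the whole call.
    private static string GetUserInfo(int userId)
    {
        Thread.Sleep(CallMilliseconds); // stands in for a blocking HTTP call
        return "user-" + userId;
    }

    // Hypothetical asynchronous client: no thread is held while waiting.
    private static async Task<string> GetUserInfoAsync(int userId)
    {
        await Task.Delay(CallMilliseconds); // stands in for real async I/O
        return "user-" + userId;
    }

    // Wraps the blocking client in Task.Run, one pool thread per call.
    public static async Task<TimeSpan> TimeWithTaskRun(int requests)
    {
        var watch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests)
            .Select(id => Task.Run(() => GetUserInfo(id))));
        return watch.Elapsed;
    }

    // Calls the async client directly; the waits overlap without holding threads.
    public static async Task<TimeSpan> TimeWithAsync(int requests)
    {
        var watch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests)
            .Select(id => GetUserInfoAsync(id)));
        return watch.Elapsed;
    }

    public static async Task Main()
    {
        // Assumption: more simultaneous calls than the pool's starting thread count.
        int requests = Environment.ProcessorCount * 8;

        Console.WriteLine($"Task.Run over sync client: {await TimeWithTaskRun(requests)}");
        Console.WriteLine($"Truly async client:        {await TimeWithAsync(requests)}");
    }
}
```

The first number grows because the calls have to queue for pool threads that are all asleep; the second stays close to the duration of a single call.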

Before wrapping up this section, if you need to call 4 or 5 services at once, THEN you can use Task.Run, because you'll make the calls in parallel and then await all of them. The response time is much faster, and the benefits outweigh the costs, although there are better ways to handle this.

// Run all the calls using Task.Run
var userInfoTask = Task.Run(() => _userServiceClient.GetUserInfo(userId));
var vehiclesTask = Task.Run(() => _vehicleServiceClient.GetVehicles(userId));
var storesTask = Task.Run(() => _storeServiceClient.GetStores(userId));

// Wait for all the tasks
await Task.WhenAll(userInfoTask, vehiclesTask, storesTask);

// Get the results individually
var userInfo = await userInfoTask;
var vehicles = await vehiclesTask;
var stores = await storesTask;

Note: It's better to use await over .Result on tasks, even when you know the result is ready.

 

Now let's look at how to fix this and similar scenarios.

 

5 - Mitigating Thread Pool Starvation Issues

 

Like everything in life, there are ways to avoid these problems, although most of the time the solution is to mitigate rather than totally fix them, since a fix can take a lot of time.

 

A - Throw money at the problem. Basically, if you're running out of threads, you throw a bigger server at it, faster CPUs, more servers behind a load balancer—MORE, MORE, MORE! The problem will go away... for a while. But it will come back stronger, trust me.

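B - Remove Task.Run. If the troubled application is full of Task.Run calls wrapped around synchronous code, simply remove them and call the method directly. One blocked thread with no extra scheduling is better than bouncing the same work onto a second pool thread. In many cases, this will be a semi-permanent (but not ideal) solution.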

C - Rate Limiting. We discussed rate limiting in another post; it basically means limiting your app's usage. Suppose your app starts having thread pool problems after 30,000 calls per minute (remember, once you get one stuck thread, others start to pile up "in the queue"). You set a limit at 27,000 calls, and any calls over that get dropped. Not ideal, but better than everything crashing.

D - Migrate everything to async. This is the ideal solution. We saw in the post about async/await that everything "leaving your code" should be async, including calls to external services, the filesystem, and the database. But this takes time: you'll probably need to migrate a bunch of HTTP clients, then their code, then your code, test everything, etc. This is the best solution in the long run. Here's what the earlier code looks like with async/await implemented correctly:

var userInfoTask = _userServiceClient.GetUserInfoAsync(userId);
var vehiclesTask = _vehicleServiceClient.GetVehiclesAsync(userId);
var storesTask = _storeServiceClient.GetStoresAsync(userId);

await Task.WhenAll(userInfoTask, vehiclesTask, storesTask);

var userInfo = await userInfoTask;
var vehicles = await vehiclesTask;
var stores = await storesTask;

 

As you can see, it's very similar except that all calls are now executed asynchronously, without blocking anything. This is the most efficient and recommended way to implement asynchrony in C#, since resources are used in the most optimal way.
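Going back to option C for a moment, the rate limiting idea can be sketched with a fixed one-minute window and a counter. FixedWindowLimiter is a name I invented for this example, and in a real app you'd more likely rely on your gateway or an existing library than on hand-written code like this. The clock is injectable only so the demo can freeze time.

```csharp
using System;

// Fixed-window limiter: accepts at most `limit` calls per window and drops the rest.
public sealed class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _clock;
    private readonly object _gate = new object();
    private DateTime _windowStart;
    private int _count;

    public FixedWindowLimiter(int limit, TimeSpan window, Func<DateTime> clock = null)
    {
        _limit = limit;
        _window = window;
        _clock = clock ?? (() => DateTime.UtcNow);
        _windowStart = _clock();
    }

    // Returns true if the call may proceed, false if it should be rejected.
    public bool TryAcquire()
    {
        lock (_gate)
        {
            DateTime now = _clock();
            if (now - _windowStart >= _window)
            {
                _windowStart = now; // a new window begins, so the counter resets
                _count = 0;
            }

            if (_count >= _limit)
            {
                return false;
            }

            _count++;
            return true;
        }
    }
}

public static class RateLimitDemo
{
    // Sends `calls` requests within a single frozen minute and counts the accepted ones.
    public static int AcceptedOutOf(int calls, int limit)
    {
        DateTime frozen = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var limiter = new FixedWindowLimiter(limit, TimeSpan.FromMinutes(1), () => frozen);

        int accepted = 0;
        for (int i = 0; i < calls; i++)
        {
            if (limiter.TryAcquire()) accepted++;
        }
        return accepted;
    }

    public static void Main()
    {
        Console.WriteLine(AcceptedOutOf(30000, 27000)); // prints 27000
    }
}
```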

 

 

Conclusion

Now you know: if you want to avoid performance issues in applications that require high availability and high performance, implement things correctly. A bad loop or poor implementation can bring down these kinds of systems.

 

This post was translated from Spanish. You can see the original one here.
If there is any problem, you can add a comment below or contact me through the website's contact form.


© Copyright 2025 NetMentor | All rights reserved
