Add Rate Limiting to your ASP.NET APIs – Natively in ASP.NET

Last week, we looked at how to rate-limit our ASP.NET APIs using a third-party package called AspNetCoreRateLimit, as well as what rate limiting is and why you should consider it when building an API. If you haven’t checked out that post yet, you can read it here. With that background, this week let’s consider another way to solve this problem in your ASP.NET applications. Microsoft has brought a rate-limiting capability natively into ASP.NET, without the need for any third-party packages. Let’s take a look at how we can utilize it.


To get started, install the System.Threading.RateLimiting NuGet package:

dotnet add package System.Threading.RateLimiting

Next, let’s wire it up during startup in our Program.cs (or Startup.cs).

using Microsoft.AspNetCore.RateLimiting;
using System.Net;
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Add services to the container.



var app = builder.Build();

app.UseRateLimiter(
    new RateLimiterOptions
    {
        RejectionStatusCode = 429,
        GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, IPAddress>(context =>
        {
            IPAddress? remoteIpAddress = context.Connection.RemoteIpAddress;

            return RateLimitPartition.GetTokenBucketLimiter(
                remoteIpAddress!,
                _ => new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 3,
                    QueueProcessingOrder = QueueProcessingOrder.NewestFirst,
                    QueueLimit = 0,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(5),
                    TokensPerPeriod = 3,
                    AutoReplenishment = true
                });
        })
    });

app.Run();

In the setup above, I’m using one of the handful of different types of rate limiters that we get with this library — the PartitionedRateLimiter. This type of rate limiting allows you to partition or segment your traffic by one or more rules. Since you have access to the HttpContext, you have virtually unlimited options on how you can carry out this segmenting.

  • By IP Address of the requestor, as I’ve done in my example above.
  • By a User Id or some other authentication information.
  • By geolocation.
  • Etc.
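As an illustration of the second option, here is a sketch of a user-based partition. The partition key logic and limiter values are my own assumptions, and it assumes your authentication middleware has already populated HttpContext.User:

```csharp
// Sketch: partition traffic by authenticated user name instead of IP address.
// Unauthenticated requests all share a single "anonymous" partition.
GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    string partitionKey = context.User.Identity?.IsAuthenticated == true
        ? context.User.Identity.Name!
        : "anonymous";

    return RateLimitPartition.GetFixedWindowLimiter(partitionKey, _ =>
        new FixedWindowRateLimiterOptions
        {
            PermitLimit = 10,                     // illustrative values only
            Window = TimeSpan.FromSeconds(10)
        });
})
```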

In my example above, I’m stating that every unique IP address must be treated as a different traffic segment. Next, for each segment, I’ve implemented a token system by utilizing the GetTokenBucketLimiter method. In this approach, each segment gets a set of tokens that they can use for consuming your API. Once those tokens are expended within a set time period, they will be denied further use until that time period elapses and they are replenished with more tokens. In such cases, a 429 (too many requests) status code is returned.

In my example, I’ve set a maximum of three tokens per five-second replenishment period, with all three tokens restored automatically at the end of each period.

I’m effectively bypassing the queuing system to keep my example simple. However, the queuing system allows you to queue up any excess requests that fall outside the thresholds you have set so that they can be processed later, rather than immediately returning an error status code.
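If you did want queuing, the options might look like this sketch (the queue values here are illustrative assumptions):

```csharp
// Sketch: allow up to 2 excess requests to wait for tokens instead of
// being rejected with a 429 immediately.
new TokenBucketRateLimiterOptions
{
    TokenLimit = 3,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst, // serve waiting requests in arrival order
    QueueLimit = 2,                                          // hold up to 2 requests beyond the limit
    ReplenishmentPeriod = TimeSpan.FromSeconds(5),
    TokensPerPeriod = 3,
    AutoReplenishment = true
}
```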

With this in place, run your API, fire off some requests, and see how the application reacts.
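One quick way to exercise the limiter from the command line is a loop of curl requests. This sketch assumes your API is listening on localhost:5000 and exposes a /weatherforecast endpoint; adjust both for your application:

```shell
# Fire six requests in quick succession and print only the status codes.
# With 3 tokens per 5-second period, expect roughly three 200s followed by 429s.
for i in 1 2 3 4 5 6; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000/weatherforecast
done
```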

[Screenshot: Postman executing the API endpoints; successful responses are returned until the tokens are expended, after which 429 status codes are returned.]

Above you’ll see that my requests are being throttled resulting in 429 errors as I try to access the API when my tokens are already used up. But you’ll also see the auto-replenishment in action where results are then successfully returned as more tokens are replenished into the bucket.

This feature is quite powerful, offering more than just partitioning and token-bucket rate limiting. Several other limiter types are available as well.

Concurrency Rate Limiting: This type allows you to specify the number of requests that your API can process simultaneously.
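As a sketch, a named concurrency policy could be registered like this, using the AddRateLimiter/RequireRateLimiting style from Microsoft.AspNetCore.RateLimiting (the policy name, route, and limits are illustrative assumptions):

```csharp
// Sketch: at most 5 requests handled at once; 2 more may wait in a queue.
builder.Services.AddRateLimiter(options =>
    options.AddConcurrencyLimiter(policyName: "concurrent", opts =>
    {
        opts.PermitLimit = 5;
        opts.QueueLimit = 2;
        opts.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    }));

// ...after building the app:
app.UseRateLimiter();
app.MapGet("/busy", () => "done").RequireRateLimiting("concurrent");
```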

Fixed Window Rate Limiting: This algorithm limits the number of requests within a fixed time window. When the time window expires, a new time window starts and the request limit is reset.
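A fixed window policy might be configured like the following sketch (policy name and values are illustrative):

```csharp
// Sketch: at most 10 requests per 30-second window; the counter resets
// when each window expires.
builder.Services.AddRateLimiter(options =>
    options.AddFixedWindowLimiter(policyName: "fixed", opts =>
    {
        opts.PermitLimit = 10;
        opts.Window = TimeSpan.FromSeconds(30);
    }));
```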

Sliding Window Rate Limiting: This is similar to the fixed window but adds segments per window. The window slides one segment each segment interval. Requests taken from the expired time segment one window back are added to the current segment.
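And a sliding window policy adds the segment count on top of the window, as in this sketch (values again illustrative):

```csharp
// Sketch: 10 requests per 30-second window, evaluated across 3 sliding
// 10-second segments, smoothing out bursts at window boundaries.
builder.Services.AddRateLimiter(options =>
    options.AddSlidingWindowLimiter(policyName: "sliding", opts =>
    {
        opts.PermitLimit = 10;
        opts.Window = TimeSpan.FromSeconds(30);
        opts.SegmentsPerWindow = 3;
    }));
```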


The Microsoft.AspNetCore.RateLimiting middleware in .NET offers a flexible and robust solution for implementing rate limiting in your ASP.NET applications. Whether you need a fixed window, sliding window, token bucket, or concurrency rate limiter, this middleware has you covered. Remember, it’s crucial to test your rate-limiting configuration under load before deploying, to ensure it behaves as expected and provides the protection you need. With these tools at your disposal, you can maintain the integrity of your server and provide a reliable service to your users.
