A beginner’s guide to Microsoft Graph API rate limiting in Intune

Rate limiting is how an API tells you to slow down. In Microsoft Graph, it shows up as HTTP 429.
Graph has both global limits and service-specific limits. Hit too many requests too fast, and you’ll trigger either one.
Throttled responses should include Retry-After (seconds to wait), but Intune endpoints frequently omit it.
Intune report exports have their own throttle: 100 requests per tenant per minute, with per-user and per-app sub-limits.

Introduction

You’ve just written your first PowerShell script to pull Intune reporting data. You perform a few tests in your lab and it works splendidly, like mama’s spaghetti. Then you run it against your production environment where the rest of your team are already scripting to the Intune reporting endpoint like it’s going out of fashion tomorrow…

{"error":{"code":"TooManyRequests","message":"An error has occurred.","details":[],"innerError":{"date":"2025-10-26T17:07:09","request-id":"44be9ec1-489e-4282-8c72-a77f5b14b0cb","client-request-id":"44be9ec1-489e-4282-8c72-a77f5b14b0cb"}}}

Your script crashes. You re-run it. It crashes again. You google the error and find something called “HTTP 429” and “rate limiting.”

What just happened?

Your script didn’t break because your code was bad. It broke because you were going too fast and Microsoft Graph slammed on the brakes to protect itself (and all the other tenants sharing the same service).

What Is Rate Limiting and Why Does It Exist?

Rate limiting is how APIs protect themselves from being overwhelmed. Microsoft Graph enforces rate limits for several reasons. It prevents service degradation so that one runaway script doesn’t slow things down for everyone else. It ensures fair access so that no single tenant can hog all the resources. And it protects the backend systems, because some operations are far more expensive than others and need guardrails to avoid accidental overload.

You’re most likely, but not always, to encounter rate limiting when your scripts are inefficient, for example when pulling far too many reports at once or looping endlessly through large datasets without any pause. Sometimes it isn’t even your script that makes Graph tell you to back off. The service might already be under pressure from other jobs you can’t see. Maybe someone else in your tenant is exporting all the user photos with separate Graph calls so they can put together the office Secret Santa PowerPoint. Whatever the reason, whether it’s your own inefficient scripts or Geoff wanting to improve Secret Santa this year, Graph steps in to slow things down before the service buckles.

What are the Intune Rate Limits?

One of the confusing parts of working with Microsoft Graph is that not every endpoint has its own published limit. Some services, like the Intune reporting endpoint, spell things out clearly.

For example, the exportJobs endpoint allows up to 100 requests per tenant per minute, 8 per user, or 48 per app. Other services don’t publish exact numbers, and in those cases you’re governed by the generic Graph throttling rules instead.

https://learn.microsoft.com/en-us/intune/intune-service/fundamentals/reports-export-graph-apis#api-throttling-conditions

Global limits apply regardless of any individual service limit. There is a global limit of 130,000 requests per 10 seconds per app across all tenants. This is your absolute ceiling across everything. The following Microsoft doc also lists some individual service limits, which is good to know if you are working with Intune data and Microsoft Graph.

https://learn.microsoft.com/en-us/graph/throttling-limits

Intune enforces two sets of limits depending on what you’re doing. For write operations like POST, PUT, DELETE, and PATCH, you’re limited to 200 requests per 20 seconds across your entire tenant, but your specific app only gets 100 requests per 20 seconds of that quota.

For all operations including reads, the limits are higher. 2,000 requests per 20 seconds tenant-wide, with 1,000 requests per 20 seconds allocated to your app. In practice, this means your script can fire off up to 1,000 requests every 20 seconds (about 50 per second), but only 100 of those can be writes. If other admins or apps are hammering Intune at the same time, you’ll be sharing that tenant-wide quota and might hit throttling sooner than expected.
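To make that arithmetic concrete, here is a minimal sketch that turns a documented limit into the smallest gap you could leave between requests. The function name Get-MinRequestIntervalMs is hypothetical (it is not a Graph or Intune cmdlet), and it only accounts for your own traffic, not for whatever else is eating into the tenant-wide quota.

```powershell
# Hypothetical helper: convert "N requests per W seconds" into the smallest
# gap (in milliseconds) between requests that stays under that limit.
function Get-MinRequestIntervalMs {
    param(
        [int]$Requests,
        [int]$WindowSeconds
    )
    [int][Math]::Ceiling([double]($WindowSeconds * 1000) / $Requests)
}

Get-MinRequestIntervalMs -Requests 1000 -WindowSeconds 20   # per-app limit, all operations
Get-MinRequestIntervalMs -Requests 100  -WindowSeconds 20   # per-app limit, writes only
Get-MinRequestIntervalMs -Requests 8    -WindowSeconds 60   # exportJobs per-user limit
```

With the numbers quoted above, that works out to 20 ms between requests in general, 200 ms between writes, and 7.5 seconds between export requests for a single user.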

Microsoft Graph & Intune Throttling Limits

Scope	Request Type	Limit	Notes
Global (All Services)	Any	130,000 per 10 seconds per app	Absolute ceiling across all Graph services
Intune exportJobs	Any	100 per tenant per minute	Documented in export API docs
Intune exportJobs	Any	8 per user per minute	Sub-limit within tenant quota
Intune exportJobs	Any	48 per app per minute	Sub-limit within tenant quota
Intune (General)	POST, PUT, DELETE, PATCH	200 per tenant per 20 seconds	Tenant-wide write limit
Intune (General)	POST, PUT, DELETE, PATCH	100 per app per tenant per 20 seconds	Your app’s write limit
Intune (General)	Any (including GET)	2,000 per tenant per 20 seconds	Tenant-wide limit for all operations
Intune (General)	Any (including GET)	1,000 per app per tenant per 20 seconds	Your app’s limit for all operations

Your requests are checked against multiple limits simultaneously. The first one you hit triggers throttling.
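As a rough illustration of "checked against multiple limits", the sketch below keeps a local tally per limit and reports which ones the next request would exceed. Test-GraphLimit and the AppAny / AppWrite labels are hypothetical names made up for this example. The real counters live on the service and include traffic you can’t see, so treat this as a client-side estimate only.

```powershell
# Hypothetical helper: given how many requests were already sent in the current
# window, list the limits that one more request would exceed.
function Test-GraphLimit {
    param(
        [hashtable]$Sent,     # requests already sent, keyed by limit name
        [hashtable]$Limits    # allowed requests per window, keyed by limit name
    )
    foreach ($name in ($Limits.Keys | Sort-Object)) {
        # A missing tally counts as zero requests sent
        if (([int]$Sent[$name] + 1) -gt $Limits[$name]) { $name }
    }
}

# Per-app limits from the table above (both use a 20 second window)
$limits = @{ AppAny = 1000; AppWrite = 100 }
$sent   = @{ AppAny = 150;  AppWrite = 100 }

$blocked = Test-GraphLimit -Sent $sent -Limits $limits
$blocked
```

Here the app is nowhere near its overall quota, but one more write would still be throttled because the write limit is reached first.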

How Do I Know If The Rate Limit Police Found Me?

Many of us won’t read the limits and will only be interested in rate limiting when the service tells us to back off. Let’s take a simple script and try to force Graph to say “back off son” (in a southern American accent).

In this example, let’s try to blitz the “8 per user per minute” sub-limit within the tenant quota by making requests to the Intune Reporting Endpoint. This script fires off 20 rapid exportJobs requests to Intune with the aim of crossing the documented limit of 8 per user per minute. By doing so it should trigger a 429 Too Many Requests response, letting us capture and display the status, headers, and body to see how Graph signals throttling.

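# Assumes $token already holds a Graph access token response.
# -SkipHttpErrorCheck requires PowerShell 7 or later.
$accessToken = $token.access_token
[System.Net.ServicePointManager]::DefaultConnectionLimit = 512
$headers = @{ Authorization = "Bearer $accessToken"; "Content-Type" = "application/json" }
$uri = "https://graph.microsoft.com/beta/deviceManagement/reports/exportJobs"

$payload = @{ reportName = "Devices"; format = "json" } | ConvertTo-Json -Depth 5

# Fire 20 requests back to back to cross the 8 per user per minute limit
1..20 | ForEach-Object {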
    $r = Invoke-WebRequest -Uri $uri -Headers $headers -Method POST -Body $payload -SkipHttpErrorCheck
    $status = [int]$r.StatusCode
    Write-Host ("Req {0}: HTTP {1}" -f $_, $status)

    if ($status -eq 429) {
        Write-Host "Headers:" -ForegroundColor Yellow
        $r.Headers.GetEnumerator() | ForEach-Object { Write-Host ("  {0}: {1}" -f $_.Key, ($_.Value -join ",")) }
        Write-Host "Body:" -ForegroundColor Yellow
        Write-Host $r.Content
    }
    elseif ($status -ge 400) {
        Write-Host "Error body:" -ForegroundColor Red
        Write-Host $r.Content
    }
    else {
        Write-Host "Success body:" -ForegroundColor Green
        Write-Host $r.Content
    }
}

Even though there is a documented rate limit of 8 per user per minute for the reporting endpoint, it seems the endpoint is quite robust and can cope with a little stress. Here we can see that we were able to make 13 successful requests in quick succession before we were shown the ~~middle finger~~ HTTP code 429 “Too Many Requests”.

Req 14: HTTP 429
Headers:
  Cache-Control: no-store, no-cache
  Transfer-Encoding: chunked
  Vary: Accept-Encoding
  Strict-Transport-Security: max-age=31536000
  request-id: 68ff7060-7068-4d32-bc39-af2083d222e0
  client-request-id: 68ff7060-7068-4d32-bc39-af2083d222e0
  x-ms-ags-diagnostic: {"ServerInfo":{"DataCenter":"UK South","Slice":"E","Ring":"5","ScaleUnit":"007","RoleInstance":"LO1PEPF0000496F"}}
  Date: Sat, 08 Nov 2025 11:01:08 GMT
  Content-Type: application/json
  Content-Language: en-us
Body:
{"error":{"code":"TooManyRequests","message":"An error has occurred.","details":[],"innerError":{"date":"2025-11-08T11:01:09","request-id":"68ff7060-7068-4d32-bc39-af2083d222e0","client-request-id":"68ff7060-7068-4d32-bc39-af2083d222e0"}}}
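If you want to log throttling events rather than just print them, the error body is easy to pick apart. This is a small sketch using the body from the response above. The $summary object and its property names are my own choice for illustration, not something Graph defines.

```powershell
# Body copied from the 429 response shown above
$body = '{"error":{"code":"TooManyRequests","message":"An error has occurred.","details":[],"innerError":{"date":"2025-11-08T11:01:09","request-id":"68ff7060-7068-4d32-bc39-af2083d222e0","client-request-id":"68ff7060-7068-4d32-bc39-af2083d222e0"}}}'

$err = ($body | ConvertFrom-Json).error

# Keep the two values worth logging: the error code and the request ID
$summary = [pscustomobject]@{
    Code      = $err.code
    RequestId = $err.innerError.'request-id'
}

$summary
```

The request-id is the value worth keeping in your logs, since it identifies the exact call that was throttled.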

Retry-After and Respecting Backoff

When Microsoft Graph tells you “TooManyRequests”, it’s not the end of the world, it’s an invitation to back off. If you look closely at the response headers when you’re throttled, you’ll sometimes (but not always) see a Retry-After value.

That header tells you how many seconds to wait before retrying. If your script respects it, the API will happily let you continue once that window has passed. If you ignore it and keep hammering, the throttle window will extend and you might get stuck waiting even longer.

Unfortunately, not every API endpoint is polite enough to tell you how long to wait.
The Intune Reporting Endpoint and deviceManagement APIs often omit Retry-After, leaving you to guess. In those cases, Microsoft recommends exponential backoff: progressively longer delays each time you hit a 429.

Let’s try to hit a different Graph endpoint, auditLogs, which I know does give us a Retry-After value when we spam it.

$AccessToken = $token.access_token
[System.Net.ServicePointManager]::DefaultConnectionLimit = 512
$headers = @{ Authorization = "Bearer $AccessToken"; "Content-Type" = "application/json"; "ConsistencyLevel" = "eventual" }
$uri = "https://graph.microsoft.com/v1.0/auditLogs/signIns?`$top=50"

1..400 | ForEach-Object {
    $r = Invoke-WebRequest -Uri $uri -Headers $headers -Method GET -SkipHttpErrorCheck
    $status = [int]$r.StatusCode
    Write-Host ("Req {0}: HTTP {1}" -f $_, $status)

    if ($status -eq 429) {
        $rawHeaders = ($r.RawContent -split "(`r`n){2}", 2)[0]
        Write-Host "Raw headers:" -ForegroundColor Yellow
        Write-Host $rawHeaders
        break
    }
    elseif ($status -ge 400) {
        Write-Host "Error body:" -ForegroundColor Red
        Write-Host $r.Content
        break
    }
}

Even though this code is performing a basic GET request, it’s a much heavier operation than something like listing Intune apps. The auditLogs/signIns endpoint queries Entra ID’s sign-in telemetry service, which isn’t just returning static directory data, it’s reading from large, constantly-updated log stores that track every authentication event in your tenant. Each request must be evaluated against live audit data, filtered, serialized, and sanitized before being returned.

In contrast, endpoints such as deviceAppManagement/mobileApps usually pull from a smaller, well-indexed configuration database, making them far cheaper to serve and easier to cache.

The addition of ConsistencyLevel: eventual may also make the call more expensive. That header enables advanced query features, and it likely routes the request through a different path than a lightweight read cache. Combine that with a medium-ish $top value and a tight request loop, and you’re sending a stream of queries that each consume more CPU, I/O, and memory on the Microsoft service. The result is that Graph’s backend recognises the high cost per request and triggers throttling, returning a 429 along with a Retry-After header to tell you exactly how long to pause before continuing.

Retry-After: 10

When a Graph endpoint or API returns this Retry-After value in the header, we now have some control and can adjust our automation scripts to politely back off for that period of time. When you see this, sleep for 10 seconds (or use x-ms-retry-after-ms if present), then continue. If the header is missing, exponential backoff is your safety net.
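Reading the header in a script could look something like the sketch below. Get-RetryAfterSeconds is a hypothetical helper name, and the code assumes the header carries a whole number of seconds, as in the example above. It does not handle the HTTP-date form that the HTTP specification also allows.

```powershell
# Hypothetical helper: read Retry-After from a response's headers.
# Returns the number of seconds to wait, or $null when the header is
# missing or is not a whole number of seconds.
function Get-RetryAfterSeconds {
    param([System.Collections.IDictionary]$Headers)

    $value = $Headers['Retry-After']
    if ($null -eq $value) { return $null }

    # PowerShell 7 returns header values as string arrays, so take the first entry
    $text = [string](@($value)[0])

    $seconds = 0
    if ([int]::TryParse($text, [ref]$seconds)) {
        return [Math]::Max($seconds, 0)
    }
    return $null
}

# Stand-in for $r.Headers from a throttled response
$wait = Get-RetryAfterSeconds -Headers @{ 'Retry-After' = @('10') }
if ($null -eq $wait) { $wait = 5 }   # assumed fallback when the header is missing
$wait
```

In a real script you would pass $r.Headers from Invoke-WebRequest and hand the result to Start-Sleep -Seconds.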

Exponential Backoff

When the Retry-After header isn’t present, like when the Intune Reporting Endpoint throws you a 429 curve ball, you can fall back to exponential backoff. This means waiting a little longer each time you’re throttled. My approach is to start small, double your delay after each 429, and stop once you succeed or reach a sensible maximum.

$AccessToken = $token.access_token
$headers = @{ Authorization = "Bearer $AccessToken"; "Content-Type" = "application/json" }
$uri = "https://graph.microsoft.com/beta/deviceManagement/reports/exportJobs"
$payload = @{ reportName = "Devices"; format = "json" } | ConvertTo-Json -Depth 5

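$delay = 5          # first wait, in seconds
$maxDelay = 60      # cap for a single wait
$maxAttempts = 8    # stop retrying after this many requests
$attempt = 0

do {
    $attempt++
    $r = Invoke-WebRequest -Uri $uri -Headers $headers -Method POST -Body $payload -SkipHttpErrorCheck
    $status = [int]$r.StatusCode
    Write-Host "HTTP $status"

    if ($status -eq 429 -and $attempt -lt $maxAttempts) {
        Write-Host "Throttled, waiting $delay seconds..." -ForegroundColor Yellow
        Start-Sleep -Seconds $delay
        $delay = [Math]::Min($delay * 2, $maxDelay)
    }

} while ($status -eq 429 -and $attempt -lt $maxAttempts)

# Only claim success if the final response was not an error
if ($status -lt 400) {
    Write-Host "Success after $attempt attempt(s)." -ForegroundColor Green
}
else {
    Write-Host "Gave up with HTTP $status after $attempt attempt(s)." -ForegroundColor Red
}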

In this example, when we get a 429, we back off exponentially, starting with a 5 second delay and then doubling it with each subsequent 429 (5 > 10 > 20 > 40 > 60 seconds) until we’re allowed through.
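The delay sequence itself can be worked out without touching the network. Here is a minimal sketch. Get-BackoffDelay is a hypothetical helper, with defaults that mirror the 5 second start and 60 second cap used above.

```powershell
# Hypothetical helper: delay (in seconds) before retry number $Attempt (1-based),
# doubling from $BaseSeconds and capped at $MaxSeconds.
function Get-BackoffDelay {
    param(
        [int]$Attempt,
        [int]$BaseSeconds = 5,
        [int]$MaxSeconds = 60
    )
    $delay = $BaseSeconds * [Math]::Pow(2, $Attempt - 1)
    [int][Math]::Min($delay, $MaxSeconds)
}

# Delays for the first six consecutive 429 responses
$schedule = 1..6 | ForEach-Object { Get-BackoffDelay -Attempt $_ }
$schedule -join ', '
```

This gives 5, 10, 20, 40, 60, 60: the fifth delay would be 80 seconds, but the cap holds it at 60. A common refinement is to add a little random jitter (Get-Random can do that) so that several scripts throttled at the same moment don’t all retry at the same moment too.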

Summary

Rate limiting isn’t an error, it’s feedback from a well-structured API. When Microsoft Graph returns a 429, it’s asking you to pause so everyone in Microsoft candy land gets fair access to the API. The key is how you respond!

If Retry-After is present, respect it. Wait the number of seconds (or milliseconds) the server suggests before retrying.
If it isn’t, use exponential backoff: start with a small delay and double it until your requests succeed or you reach a safe maximum.
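Those two rules fit together in one small retry wrapper. The sketch below is one possible shape, not an official pattern. Invoke-WithThrottleRetry and its parameters are hypothetical, the -Sleep parameter exists only so the wait can be swapped out for a dry run, and Retry-After is assumed to be a whole number of seconds.

```powershell
# Hypothetical wrapper: run $Request (a script block returning an object with
# StatusCode and Headers, such as Invoke-WebRequest -SkipHttpErrorCheck output)
# and retry on HTTP 429 until it gets through or $MaxAttempts is reached.
function Invoke-WithThrottleRetry {
    param(
        [scriptblock]$Request,
        [int]$MaxAttempts = 6,
        [int]$BaseDelaySeconds = 5,
        [int]$MaxDelaySeconds = 60,
        [scriptblock]$Sleep = { param($Seconds) Start-Sleep -Seconds $Seconds }
    )
    $delay = $BaseDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $response = & $Request
        if ([int]$response.StatusCode -ne 429) { return $response }
        if ($attempt -eq $MaxAttempts) { break }

        # Prefer the server's Retry-After; otherwise use our own doubling delay
        $wait = $delay
        $retryAfter = @($response.Headers['Retry-After'])[0]
        $parsed = 0
        if ($retryAfter -and [int]::TryParse([string]$retryAfter, [ref]$parsed)) {
            $wait = $parsed
        }
        else {
            $delay = [Math]::Min($delay * 2, $MaxDelaySeconds)
        }
        $null = & $Sleep $wait
    }
    throw "Still throttled after $MaxAttempts attempts."
}

# Example call (needs $uri and $headers set up as in the earlier scripts):
# Invoke-WithThrottleRetry -Request {
#     Invoke-WebRequest -Uri $uri -Headers $headers -Method GET -SkipHttpErrorCheck
# }
```

Any non-429 response, including other errors, is handed straight back to the caller, so you still need to check the status code yourself.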

It’s important to remember that different Graph workloads behave differently. Lightweight configuration endpoints like Intune apps can handle high request volumes, while heavier services such as Entra ID audit logs or the Intune Reporting Endpoint hit resource limits much faster. Understanding those limits and building polite retry logic into your scripts turns throttling from an automation car crash into a controlled pause.

Handle throttling gracefully, and your automations will keep running smoothly. If you skip proper retry handling, you’re at the mercy of the API and whether the dev remembered to include a Retry-After value or not.