Rate Limit and Retry: A Basic Scheme for Reliable Integrations
In any integration, sooner or later the same problem arises: you send a request to the API and it does not go through. Sometimes the server responds with an error, sometimes the network drops, sometimes the API simply says "too many requests".
In the logs, it looks something like this:
429 Too Many Requests
or
500 Internal Server Error
If the system fails to respond properly to such situations, the integration becomes unstable:
- requests fail
- data is not synchronized
- events are lost
- the system starts retrying requests endlessly
To prevent this from happening, any serious integration uses two basic mechanisms:
rate limit and retry.
The first controls the speed of requests; the second retries attempts when errors occur.
If implemented correctly, the integration stays stable even under network failures and high load.
What is the rate limit
Rate limit is a limit on the number of requests that can be sent to the API in a given period of time.
Almost all public APIs use such restrictions.
Examples:
- GitHub API: 5,000 requests per hour
- Stripe: approximately 100 requests per second
- many SaaS APIs: 60 requests per minute
When the limit is exceeded, the server returns the response:
429 Too Many Requests
Sometimes with an additional header:
Retry-After: 10
This means that a new request can only be sent after **10 seconds**.
Such restrictions are necessary to protect infrastructure and distribute resources fairly among clients.
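A minimal sketch of honoring this header; the helper name and default delay are assumptions, and note that Retry-After may also carry an HTTP date, which this sketch ignores:

```javascript
// Parse a Retry-After header value (in seconds) into a delay in
// milliseconds, falling back to a default when it is missing or malformed.
// This helper and its default are illustrative, not part of any spec.
function retryAfterMs(headerValue, defaultMs = 1000) {
  if (headerValue == null || headerValue === "") return defaultMs
  const seconds = Number(headerValue)
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : defaultMs
}

// Hypothetical usage on a 429 response:
// if (response.status === 429) {
//   const delay = retryAfterMs(response.headers.get("Retry-After"))
//   await new Promise(resolve => setTimeout(resolve, delay))
// }
```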
Why APIs restrict requests
At first glance, it may seem that the rate limit is just an inconvenience for developers. But there are important reasons for this mechanism.
The first reason is server protection. If one client sends thousands of requests per second, it can overload the system.
The second reason is the equal use of resources. Restrictions ensure that one user does not take up all the power of the service.
The third reason is security. Rate limit helps protect against brute force and other attacks.
What happens without rate limit control
Imagine a simple integration that sends data to an API.
The code might look like this:
for (const event of events) {
await api.send(event)
}
If there are few events, everything works fine. But if there are thousands of them, the system starts sending a huge number of requests.
At some point, the API begins to respond with an error:
429 Too Many Requests
If the code doesn't know how to handle these responses correctly, the integration degrades:
- requests fail
- the system overloads the API
- part of the data is lost
Therefore, it is important to control the speed of sending requests.
What is retry
Retry is a repeated attempt to execute a request after an error.
The point of retry is that many errors are **temporary**.
For example:
- the server is overloaded
- the load balancer returned an error
- the network dropped for a moment
In such cases, repeated requests are often successful.
Typical errors for which retry is used:
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- timeouts
In all these situations, trying again makes sense.
When retry should not be used
Some errors mean that the request will never be successful until the data changes.
For example:
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
Retrying such requests only creates extra load.
Therefore, retry should apply only to **temporary** errors.
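The rule above can be captured in a small helper; the function name and the exact status list are illustrative assumptions:

```javascript
// Classify an HTTP status code as retryable or not.
// Only temporary (server-side) failures and 429 are worth retrying;
// client errors like 400 or 404 will fail again no matter how often we try.
function isRetryable(status) {
  if (status === 429) return true        // rate limited: retry after a delay
  return status >= 500 && status < 600   // server-side errors are often temporary
}
```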
The problem of naive retry
The simplest implementation of retry is as follows:
try {
await api.request()
} catch (e) {
await api.request()
}
But this approach can lead to a serious problem.
If the server is already overloaded, instant repeated requests will only increase the load. As a result, the system can get into a state where thousands of clients start retrying at the same time.
This is called a **retry storm**: a storm of repeated requests.
To avoid this, a more careful algorithm is used.
Exponential backoff
One of the most popular retry algorithms is **exponential backoff**.
The idea is very simple: each subsequent attempt is made with an increasing delay.
For example:
- attempt 1: immediately
- attempt 2: after 1 second
- attempt 3: after 2 seconds
- attempt 4: after 4 seconds
- attempt 5: after 8 seconds
This gives the server time to recover and dramatically reduces the load.
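A common refinement worth noting here is adding random jitter, so that many clients that failed at the same moment do not all retry in lockstep. A sketch of the delay calculation (names and defaults are assumptions):

```javascript
// Delay before attempt n (0-based): 1s, 2s, 4s, 8s, ...
// Optional random jitter spreads out clients that fail simultaneously,
// which helps avoid the retry storm described earlier.
function backoffDelay(attempt, baseMs = 1000, jitter = false) {
  const delay = baseMs * 2 ** attempt
  // with jitter, pick a random value between delay/2 and delay
  return jitter ? delay / 2 + Math.random() * (delay / 2) : delay
}
```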
Example of retry in JavaScript
A simple implementation of retry with exponential backoff:
async function requestWithRetry(fn, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn()
    } catch (error) {
      // last attempt failed: give up and surface the error
      if (attempt === retries - 1) {
        throw error
      }
      // exponential backoff: 1s, 2s, 4s, 8s, ...
      const delay = 2 ** attempt * 1000
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}
Usage:
await requestWithRetry(() => fetch("https://api.example.com"))
If a request fails, the function automatically retries it with increasing delays.
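One caveat when pairing this with fetch: fetch resolves its promise even for HTTP errors like 429 or 500 and only rejects on network failures, so a small wrapper is needed to turn error statuses into exceptions. The wrapper name and URL below are placeholders:

```javascript
// fetch() resolves even on 429 or 500 - it only rejects on network
// failures - so HTTP error statuses must be turned into exceptions
// for a retry wrapper to see them.
async function fetchOrThrow(url, options) {
  const response = await fetch(url, options)
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`)
  }
  return response
}

// Hypothetical combination with the retry helper above:
// await requestWithRetry(() => fetchOrThrow("https://api.example.com"))
```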
Limiting the speed of requests
In addition to retry, you often need to control the speed of sending requests.
The simplest scheme is to use a queue.
First, tasks are queued, then the worker sends them to the API at a controlled rate.
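That queue-plus-worker idea can be sketched as a sequential loop with a fixed pause between sends; the function name and interval are illustrative:

```javascript
// Minimal queue worker: run tasks one at a time, pausing a fixed
// interval between requests so the send rate stays bounded.
async function drainQueue(tasks, intervalMs) {
  const results = []
  for (const task of tasks) {
    results.push(await task())
    // pause before the next request to respect the rate limit
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return results
}
```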
The simplest limiter can look like this:
class RateLimiter {
  constructor(limit, interval) {
    this.limit = limit       // max requests in flight at once
    this.interval = interval // ms before a busy slot is freed
    this.queue = []
    this.active = 0
  }

  schedule(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject })
      this.run()
    })
  }

  run() {
    if (this.active >= this.limit || this.queue.length === 0) return
    const { task, resolve, reject } = this.queue.shift()
    this.active++
    // propagate both success and failure to the caller, then
    // free the slot after the interval and pick up the next task
    task().then(resolve, reject).finally(() => {
      setTimeout(() => {
        this.active--
        this.run()
      }, this.interval)
    })
  }
}
This limiter allows you to send, for example, **no more than 5 requests per second**.
What this looks like in a real architecture
In production systems, the scheme usually looks like this:
- events go into a queue
- a worker takes a task
- the rate limiter controls the speed
- the request is sent to the API
- on error, retry kicks in
This architecture allows the system to:
- not exceed the API limits
- correctly handle temporary errors
- not lose data during network failures
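The steps above can be sketched end to end; `send`, the retry count, the backoff base, and the pacing gap are illustrative parameters, not a production implementation:

```javascript
// End-to-end sketch: each event is sent with exponential-backoff retry,
// and a fixed gap between events caps the overall request rate.
async function processEvents(events, send, { retries = 3, baseMs = 1000, gapMs = 200 } = {}) {
  for (const event of events) {
    for (let attempt = 0; ; attempt++) {
      try {
        await send(event)
        break
      } catch (error) {
        // give up once the attempts are exhausted
        if (attempt === retries - 1) throw error
        // exponential backoff between attempts
        await new Promise(resolve => setTimeout(resolve, baseMs * 2 ** attempt))
      }
    }
    // pacing between events keeps us under the API's rate limit
    await new Promise(resolve => setTimeout(resolve, gapMs))
  }
}
```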
Outcome
Rate limit and retry are the foundation of any reliable integration.
Rate limit controls the speed of requests and protects the API from overload. Retry helps the system automatically recover from temporary errors.
Even the simple implementation of these mechanisms significantly improves the stability of integrations and prevents data loss when working with external services.