
Rate Limit and Retry: A Basic Scheme for Reliable Integrations

◷ 7 min read 3/4/2026


In any integration, sooner or later, the same problem arises: you send a request to the API and it does not go through. Sometimes the server responds with an error, sometimes the network drops, and sometimes the API simply says "too many requests".

In the logs, it looks something like this:

code
429 Too Many Requests

or

code
500 Internal Server Error

If the system fails to respond properly to such situations, the integration becomes unstable:

  • requests fail
  • data is not synchronized
  • events are lost
  • the system starts repeating requests endlessly

To prevent this from happening, any serious integration uses two basic mechanisms: rate limit and retry.

The first is responsible for controlling the speed of requests, the second is responsible for repeating attempts when errors occur.

If implemented correctly, integration becomes sustainable even with network failures and high loads.


What is a rate limit

A rate limit is a cap on the number of requests that can be sent to an API in a given period of time.

Almost all public APIs use such restrictions.

Examples:

  • GitHub API – 5,000 requests per hour
  • Stripe – approximately 100 requests per second
  • many SaaS APIs – 60 requests per minute

When the limit is exceeded, the server returns the response:

code
429 Too Many Requests

Sometimes with an additional header:

code
Retry-After: 10

This means that a new request can be sent only after 10 seconds.

Such restrictions are necessary to protect the infrastructure and distribute resources fairly among clients.
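When the server does provide Retry-After, it is better to honor it than to guess. A minimal sketch of parsing the header (the function name and fallback value are assumptions; the header can hold either a number of seconds or an HTTP date):

```javascript
// Parse a Retry-After header value into a delay in milliseconds.
// The value can be a number of seconds ("10") or an HTTP date;
// fall back to a default when it is missing or malformed.
function retryAfterMs(headerValue, fallbackMs = 1000) {
  if (!headerValue) return fallbackMs
  const seconds = Number(headerValue)
  if (!Number.isNaN(seconds)) return seconds * 1000
  const date = Date.parse(headerValue)
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - Date.now())
}
```

With fetch, this could be used as `await sleep(retryAfterMs(response.headers.get("Retry-After")))` after a 429.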


Why APIs restrict requests

At first glance, it may seem that the rate limit is just an inconvenience for developers. But there are important reasons for this mechanism.

The first reason is server protection. If one client sends thousands of requests per second, it can overload the system.

The second reason is fair use of resources. Restrictions ensure that one user does not consume all of the service's capacity.

The third reason is security. Rate limiting helps protect against brute-force and other attacks.


What happens without rate limit control

Imagine a simple integration that sends data to an API.

The code might look like this:

javascript
for (const event of events) {
  await api.send(event)
}

If there are few events, everything works fine. But if there are thousands of them, the system starts sending a huge number of requests.

At some point, the API begins to respond with an error:

code
429 Too Many Requests

If the code doesn't know how to handle these responses correctly, the integration degrades:

  • requests fail
  • the system overloads the API
  • part of the data is lost

Therefore, it is important to control the speed of sending requests.
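The most direct fix is to pause between sends. A minimal sketch (the function name and the 200 ms delay are example choices; `send` stands in for the `api.send` call from the loop above):

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Send events one at a time with a fixed pause between requests,
// keeping the rate at roughly 1000 / delayMs requests per second.
async function sendThrottled(events, send, delayMs = 200) {
  for (const event of events) {
    await send(event)
    await sleep(delayMs) // ~5 requests per second at 200 ms
  }
}
```

This loses the simplicity of the original loop, but the API no longer sees a burst of thousands of requests at once.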


What is retry

Retry is a repeated attempt to execute a request after an error.

The point of retry is that many errors are temporary.

For example:

  • the server is overloaded
  • the load balancer returned an error
  • the network dropped for a second

In such cases, repeated requests are often successful.

Typical errors in which retry is used:

  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • timeout

In all these situations, trying again makes sense.


When not to retry

Some errors mean that the request will never be successful until the data changes.

For example:

  • 400 Bad Request
  • 401 Unauthorized
  • 403 Forbidden
  • 404 Not Found

If you repeat such requests, the system will simply create an extra load.

Therefore, retry should only be applied to temporary errors.
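In code, this distinction can be captured in one small helper (a sketch; the exact list of retryable statuses depends on the API):

```javascript
// Decide whether an HTTP status code is worth retrying.
// 429 and 5xx are usually temporary; other 4xx codes mean the
// request itself is wrong and will keep failing until it changes.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600)
}
```

A retry loop can then check `isRetryable(response.status)` before scheduling another attempt.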


The problem of naive retry

The simplest implementation of retry is as follows:

javascript
try {
  await api.request()
} catch (e) {
  await api.request()
}

But this approach can lead to a serious problem.

If the server is already overloaded, instant repeated requests will only increase the load. As a result, the system can end up in a state where thousands of clients start repeating requests at the same time.

This is called a retry storm: a storm of repeated requests.

To avoid this, a more accurate algorithm is used.


Exponential backoff

One of the most popular retry algorithms is exponential backoff.

The idea is simple: each successive attempt is made with an increasing delay.

For example:

  • attempt 1 – immediately
  • attempt 2 – after 1 second
  • attempt 3 – after 2 seconds
  • attempt 4 – after 4 seconds
  • attempt 5 – after 8 seconds

This gives the server time to recover and dramatically reduces the load.


Example of retry in JavaScript

Simple implementation of retry with exponential backoff:

javascript
async function requestWithRetry(fn, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn()
    } catch (error) {
      // Out of attempts: give up and surface the error
      if (attempt === retries - 1) {
        throw error
      }

      // Exponential backoff: 1s, 2s, 4s, 8s, ...
      const delay = 2 ** attempt * 1000

      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}

Usage:

javascript
await requestWithRetry(() => fetch("https://api.example.com"))

If the request ends in an error, the function will automatically try again.
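In practice, a random "jitter" is usually added to the delay so that many clients do not all wake up at the same instant and recreate the retry storm. A sketch of the "full jitter" variant of the delay calculation (the function name and the base/cap values are example choices):

```javascript
// Full jitter: pick a random delay between 0 and the exponential cap.
// This spreads simultaneous retries across the whole waiting window.
function backoffWithJitter(attempt, baseMs = 1000, maxMs = 30000) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.random() * cap
}
```

Replacing `const delay = 2 ** attempt * 1000` with `const delay = backoffWithJitter(attempt)` is enough to desynchronize clients.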


Limiting the speed of requests

In addition to retry, you often need to control the speed of sending requests.

The simplest scheme is to use a queue.

First, tasks are queued, then the worker sends them to the API at a controlled rate.

The simplest limiter can look like this:

javascript
class RateLimiter {
  constructor(limit, interval) {
    this.limit = limit        // max requests in flight
    this.interval = interval  // cooldown per slot, in ms
    this.queue = []
    this.active = 0
  }

  schedule(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject })
      this.run()
    })
  }

  run() {
    if (this.active >= this.limit || this.queue.length === 0) return

    const { task, resolve, reject } = this.queue.shift()
    this.active++

    task()
      .then(resolve, reject) // propagate errors instead of hanging the caller
      .finally(() => {
        // free the slot only after the interval, capping the request rate
        setTimeout(() => {
          this.active--
          this.run()
        }, this.interval)
      })
  }
}

This limiter lets you send, for example, no more than 5 requests per second.


What it looks like in real architecture

In production systems, the scheme usually looks like this:

  1. events are put into a queue
  2. a worker takes a task
  3. the rate limiter controls the speed
  4. the API request is sent
  5. on error, retry kicks in

This architecture allows the system to:

  • stay within the API limits
  • handle temporary errors correctly
  • avoid losing data during network failures
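The whole pipeline can be condensed into one function. This is a simplified, self-contained sketch, not production code: `send` stands in for the real API call, and the rate and retry counts are example values.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Drain a queue of events at a fixed rate, retrying temporary
// failures with exponential backoff; undelivered events are
// returned instead of being silently lost.
async function processQueue(queue, send, { ratePerSec = 5, retries = 3 } = {}) {
  const gap = 1000 / ratePerSec // pause between events to respect the rate
  const failed = []

  for (const event of queue) {
    let delivered = false

    for (let attempt = 0; attempt < retries && !delivered; attempt++) {
      try {
        await send(event)
        delivered = true
      } catch {
        await sleep(2 ** attempt * 100) // exponential backoff between attempts
      }
    }

    if (!delivered) failed.push(event) // keep the data for later processing
    await sleep(gap)
  }

  return failed
}
```

In a real system the queue would be durable (a database table or a message broker), but the control flow is the same.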

Summary

Rate limit and retry are the foundation of any reliable integration.

Rate limit controls the speed of requests and protects the API from overload. Retry helps the system automatically recover from temporary errors.

Even the simple implementation of these mechanisms significantly improves the stability of integrations and prevents data loss when working with external services.