feat(rate-limiter): token bucket algorithm (#2270)

* fix(ratelimit): make deployed chat rate limited
* improvement(rate-limiter): use token bucket algo
* update docs
* fix
* fix type
* fix db rate limiter
* address greptile comments
2026-02-14 08:25:03 -05:00 · 2025-12-09 14:57:17 -08:00
parent 22abf98835
commit aea32d423f
20 changed files with 8511 additions and 658 deletions
--- a/apps/docs/content/docs/en/execution/api.mdx
+++ b/apps/docs/content/docs/en/execution/api.mdx
@@ -27,14 +27,16 @@ All API responses include information about your workflow execution limits and u
 "limits": {
  "workflowExecutionRateLimit": {
    "sync": {
-      "limit": 60,        // Max sync workflow executions per minute
-      "remaining": 58,    // Remaining sync workflow executions
-      "resetAt": "..."    // When the window resets
+      "requestsPerMinute": 60,  // Sustained rate limit per minute
+      "maxBurst": 120,          // Maximum burst capacity
+      "remaining": 118,         // Current tokens available (up to maxBurst)
+      "resetAt": "..."          // When tokens next refill
    },
    "async": {
-      "limit": 60,        // Max async workflow executions per minute
-      "remaining": 59,    // Remaining async workflow executions
-      "resetAt": "..."    // When the window resets
+      "requestsPerMinute": 200, // Sustained rate limit per minute
+      "maxBurst": 400,          // Maximum burst capacity
+      "remaining": 398,         // Current tokens available
+      "resetAt": "..."          // When tokens next refill
    }
  },
  "usage": {
@@ -46,7 +48,7 @@ All API responses include information about your workflow execution limits and u
 }
 ```

-**Note:** The rate limits in the response body are for workflow executions. The rate limits for calling this API endpoint are in the response headers (`X-RateLimit-*`).
+**Note:** Rate limits use a token bucket algorithm. `remaining` can exceed `requestsPerMinute` up to `maxBurst` when you haven't used your full allowance recently, allowing for burst traffic. The rate limits in the response body are for workflow executions. The rate limits for calling this API endpoint are in the response headers (`X-RateLimit-*`).

 ### Query Logs

@@ -108,13 +110,15 @@ Query workflow execution logs with extensive filtering options.
      "limits": {
        "workflowExecutionRateLimit": {
          "sync": {
-            "limit": 60,
-            "remaining": 58,
+            "requestsPerMinute": 60,
+            "maxBurst": 120,
+            "remaining": 118,
            "resetAt": "2025-01-01T12:35:56.789Z"
          },
          "async": {
-            "limit": 60,
-            "remaining": 59,
+            "requestsPerMinute": 200,
+            "maxBurst": 400,
+            "remaining": 398,
            "resetAt": "2025-01-01T12:35:56.789Z"
          }
        },
@@ -184,13 +188,15 @@ Retrieve detailed information about a specific log entry.
        "limits": {
          "workflowExecutionRateLimit": {
            "sync": {
-              "limit": 60,
-              "remaining": 58,
+              "requestsPerMinute": 60,
+              "maxBurst": 120,
+              "remaining": 118,
              "resetAt": "2025-01-01T12:35:56.789Z"
            },
            "async": {
-              "limit": 60,
-              "remaining": 59,
+              "requestsPerMinute": 200,
+              "maxBurst": 400,
+              "remaining": 398,
              "resetAt": "2025-01-01T12:35:56.789Z"
            }
          },
@@ -467,17 +473,25 @@ Failed webhook deliveries are retried with exponential backoff and jitter:

 ## Rate Limiting

-The API implements rate limiting to ensure fair usage:
+The API uses a **token bucket algorithm** for rate limiting, providing fair usage while allowing burst traffic:

- **Free plan**: 10 requests per minute
- **Pro plan**: 30 requests per minute
- **Team plan**: 60 requests per minute
- **Enterprise plan**: Custom limits
+| Plan | Requests/Minute | Burst Capacity |
+|------|-----------------|----------------|
+| Free | 10 | 20 |
+| Pro | 30 | 60 |
+| Team | 60 | 120 |
+| Enterprise | 120 | 240 |
+
+**How it works:**
+- Tokens refill at `requestsPerMinute` rate
+- You can accumulate up to `maxBurst` tokens when idle
+- Each request consumes 1 token
+- Burst capacity allows handling traffic spikes

 Rate limit information is included in response headers:
- `X-RateLimit-Limit`: Maximum requests per window
- `X-RateLimit-Remaining`: Requests remaining in current window
- `X-RateLimit-Reset`: ISO timestamp when the window resets
+- `X-RateLimit-Limit`: Requests per minute (refill rate)
+- `X-RateLimit-Remaining`: Current tokens available
+- `X-RateLimit-Reset`: ISO timestamp when tokens next refill

 ## Example: Polling for New Logs

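For readers of this diff, the "How it works" bullets added to `api.mdx` can be sketched as a small in-memory token bucket. This is an illustration only, not the limiter implemented in this commit: the `TokenBucket` class, its method names, the continuous (per-millisecond) refill, and the choice to start with a full bucket are all assumptions. Only the `requestsPerMinute` and `maxBurst` field names come from the docs.

```typescript
// Hypothetical sketch of the documented behavior; not the code from this commit.
interface BucketConfig {
  requestsPerMinute: number; // sustained refill rate
  maxBurst: number; // maximum tokens the bucket can hold
}

class TokenBucket {
  private readonly requestsPerMinute: number;
  private readonly maxBurst: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(config: BucketConfig, nowMs: number) {
    this.requestsPerMinute = config.requestsPerMinute;
    this.maxBurst = config.maxBurst;
    this.tokens = config.maxBurst; // assumption: a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Add tokens for the elapsed time, capped at maxBurst.
  // Assumption: refill is continuous rather than once per window.
  private refill(nowMs: number): void {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    const earned = (elapsedMs * this.requestsPerMinute) / 60000;
    this.tokens = Math.min(this.maxBurst, this.tokens + earned);
    this.lastRefillMs = nowMs;
  }

  // Each request consumes 1 token; returns false when the bucket is empty.
  tryConsume(nowMs: number): boolean {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Whole tokens currently available (what `remaining` would report).
  remaining(nowMs: number): number {
    this.refill(nowMs);
    return Math.floor(this.tokens);
  }
}
```

With the Team plan values from the table (`requestsPerMinute: 60`, `maxBurst: 120`), an idle client in this sketch can send 120 requests at once, after which further requests are admitted at about one per second until the bucket has had time to fill again.
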
--- a/apps/docs/content/docs/en/execution/costs.mdx
+++ b/apps/docs/content/docs/en/execution/costs.mdx
@@ -143,8 +143,20 @@ curl -X GET -H "X-API-Key: YOUR_API_KEY" -H "Content-Type: application/json" htt
 {
  "success": true,
  "rateLimit": {
-    "sync": { "isLimited": false, "limit": 10, "remaining": 10, "resetAt": "2025-09-08T22:51:55.999Z" },
-    "async": { "isLimited": false, "limit": 50, "remaining": 50, "resetAt": "2025-09-08T22:51:56.155Z" },
+    "sync": {
+      "isLimited": false,
+      "requestsPerMinute": 25,
+      "maxBurst": 50,
+      "remaining": 50,
+      "resetAt": "2025-09-08T22:51:55.999Z"
+    },
+    "async": {
+      "isLimited": false,
+      "requestsPerMinute": 200,
+      "maxBurst": 400,
+      "remaining": 400,
+      "resetAt": "2025-09-08T22:51:56.155Z"
+    },
    "authType": "api"
  },
  "usage": {
@@ -155,6 +167,11 @@ curl -X GET -H "X-API-Key: YOUR_API_KEY" -H "Content-Type: application/json" htt
 }
 ```

+**Rate Limit Fields:**
+- `requestsPerMinute`: Sustained rate limit (tokens refill at this rate)
+- `maxBurst`: Maximum tokens you can accumulate (burst capacity)
+- `remaining`: Current tokens available (can be up to `maxBurst`)
+
 **Response Fields:**
 - `currentPeriodCost` reflects usage in the current billing period
 - `limit` is derived from individual limits (Free/Pro) or pooled organization limits (Team/Enterprise)
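
As a hedged illustration of how a client might read the rate limit fields documented in this hunk, the sketch below types one `rateLimit` entry and derives two values from it. The `RateLimitStatus` interface mirrors the sample response; the helper names `suggestedWaitMs` and `burstHeadroom` and the backoff policy are assumptions, since the docs do not prescribe client behavior.

```typescript
// Shape of one entry under `rateLimit`, as shown in the sample response.
interface RateLimitStatus {
  isLimited: boolean;
  requestsPerMinute: number;
  maxBurst: number;
  remaining: number;
  resetAt: string; // ISO timestamp
}

// Hypothetical helper: how long a client might wait before its next call.
function suggestedWaitMs(status: RateLimitStatus, nowMs: number): number {
  if (!status.isLimited && status.remaining > 0) return 0;
  const resetMs = Date.parse(status.resetAt);
  if (Number.isNaN(resetMs)) {
    // Assumed fallback: the time one token takes to refill at the sustained rate.
    return Math.ceil(60000 / status.requestsPerMinute);
  }
  return Math.max(0, resetMs - nowMs);
}

// Hypothetical helper: tokens available beyond the sustained per-minute rate.
function burstHeadroom(status: RateLimitStatus): number {
  return Math.max(0, status.remaining - status.requestsPerMinute);
}
```

For the `sync` entry in the sample response (`requestsPerMinute: 25`, `remaining: 50`), `burstHeadroom` returns 25 and `suggestedWaitMs` returns 0.
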