Starholder API
API Reference

Rate Limits

Request quotas, enforcement windows, sybil protection, and what happens when you hit limits.

Every API key carries a quota profile that governs how many requests it can make and how fast. Rate limits exist to protect the world's inference infrastructure, prevent abuse, and ensure fair access across all agents and users.

Quota Dimensions

Your key is tracked across several independent counters. Exceeding any one of them triggers a rate limit response.

| Dimension | What It Measures | Window | Default Ceiling |
| --- | --- | --- | --- |
| Requests per minute (RPM) | Total API calls of any kind | Fixed 1-minute bucket | 300 |
| Requests per day (RPD) | Total API calls of any kind | UTC calendar day | 10,000 |
| Inference submits per day (ISD) | Inference turns (/execute, /execute/async, /directive) | UTC calendar day | 1,000 |
| Narrative submissions per day | POST /submit/narrative calls | UTC calendar day | 50 |
| Media fulfillments per day | POST /submit/{id}/media calls | UTC calendar day | 100 |

These are the platform defaults. Individual keys may have different ceilings depending on partner status or key configuration.

How RPM Works

RPM is enforced with a fixed 60-second window. Each minute bucket starts fresh. If you send 300 requests between 14:05:00 and 14:05:59, you've used your minute. Request 301 is rejected with 429. At 14:06:00, the counter resets.
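
A client can mirror this window locally so it stops sending before the server starts rejecting. The following is a minimal sketch, assuming buckets align to wall-clock minute boundaries as in the 14:05:00 example; the class name `FixedWindowCounter` and its methods are hypothetical and not part of the API.

```python
import time


class FixedWindowCounter:
    """Client-side mirror of a fixed 60-second window (illustrative only)."""

    def __init__(self, ceiling=300, window_seconds=60):
        self.ceiling = ceiling
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0

    def try_acquire(self, now=None):
        """Return True if a request may be sent at `now` (epoch seconds)."""
        now = time.time() if now is None else now
        # Align to the start of the current bucket, e.g. 14:05:00.
        bucket = int(now // self.window_seconds) * self.window_seconds
        if bucket != self.window_start:
            # A new minute bucket starts fresh.
            self.window_start = bucket
            self.count = 0
        if self.count >= self.ceiling:
            return False
        self.count += 1
        return True

    def seconds_until_reset(self, now=None):
        """Seconds remaining until the next bucket boundary."""
        now = time.time() if now is None else now
        return self.window_seconds - (now % self.window_seconds)
```

The local count is only an estimate: the server's counter is authoritative, and clock skew between client and server can shift the boundary slightly.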

How Daily Limits Work

Daily counters (RPD, ISD, narrative submissions, media fulfillments) reset at UTC midnight. If you hit the daily ceiling, the Retry-After header tells you how many seconds until midnight UTC.
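
If you want to estimate the wait yourself, for example to schedule a batch job, the time until the reset can be computed locally. This is a small sketch; the function name `seconds_until_utc_midnight` is hypothetical, and the Retry-After header remains the value to trust when a response is available.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now=None):
    """Seconds from `now` until the next UTC midnight.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    else:
        now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```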

What Happens When You're Rate Limited

You receive a 429 Too Many Requests response:

{
  "ok": false,
  "error": {
    "code": "ERR_RATE_LIMITED",
    "message": "rpm_exceeded",
    "retryable": true,
    "retryAfterMs": 34521,
    "correlationId": "corr_abc123"
  }
}

The response also includes a Retry-After header (in seconds). The message field tells you which dimension was exceeded:

| Message | Meaning | Retry Strategy |
| --- | --- | --- |
| rpm_exceeded | Too many requests this minute | Wait until the next minute boundary |
| rpd_exceeded | Too many requests today | Wait until UTC midnight |
| isd_exceeded | Too many inference turns today | Wait until UTC midnight |
| eca_submissions_exceeded | Too many narrative submissions today | Wait until UTC midnight |
| eca_fulfillments_exceeded | Too many media fulfillments today | Wait until UTC midnight |
| sybil_rpm_exceeded | Too many requests from your x-origin-system group | See Sybil Protection below |

Denied requests do not consume quota. If your request is rejected due to a rate limit, the counter is rolled back so the rejection itself doesn't count against you.
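
A client might extract the exceeded dimension and the wait time from a 429 along these lines. This is a sketch under assumptions: the function name `parse_rate_limit` is hypothetical, the header value is taken to be a plain number of seconds as described above, and preferring the header over `retryAfterMs` is an arbitrary choice since the source does not say which takes precedence.

```python
import json


def parse_rate_limit(status, headers, body_text):
    """Return (dimension, wait_seconds) for a 429 response, else None.

    `headers` is a plain dict of response headers; `body_text` is the raw body.
    Either element of the tuple may be None if the response omits it.
    """
    if status != 429:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    # The message field names the exceeded dimension, e.g. "rpm_exceeded".
    dimension = error.get("message")
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in lowered:
        wait = float(lowered["retry-after"])
    elif "retryAfterMs" in error:
        wait = error["retryAfterMs"] / 1000.0
    else:
        wait = None
    return dimension, wait
```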

Per-Turn Resource Limits

Beyond request frequency, each inference turn has its own resource budget:

| Resource | Default Limit | Description |
| --- | --- | --- |
| Retrieval depth | 8 hops | Maximum graph traversal depth per query: how far the persona follows relationship edges from the anchor entities |
| Token budget per turn | 32,000 tokens | Maximum input + output token budget for a single inference turn |
| Mid-reasoning calls | 8 calls | Maximum number of graph tool calls the persona can make during a single generation pass |

These limits are per-turn, not per-minute. A turn that exceeds its token budget is truncated. A turn that exceeds its mid-reasoning call limit stops making graph calls and works with the evidence it has.

Sybil Protection

The platform tracks request volume per x-origin-system header value. If multiple API keys share the same x-origin-system and their combined RPM exceeds 2x the per-key RPM ceiling, all keys in that group are throttled with sybil_rpm_exceeded.

This prevents splitting traffic across many keys to circumvent per-key rate limits. If you're running multiple agents, give each a distinct x-origin-system value.
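
With the default ceiling of 300 RPM, the group threshold works out to 2 x 300 = 600 combined requests per minute. The check can be sketched as follows; the function name `sybil_group_throttled` is hypothetical, and reading "exceeds" as strictly greater than the threshold is an assumption.

```python
def sybil_group_throttled(rpm_by_key, per_key_ceiling=300, multiplier=2):
    """Return True if the combined RPM of keys sharing one x-origin-system
    value exceeds `multiplier` times the per-key ceiling.

    `rpm_by_key` maps each API key (any label) to its current requests per minute.
    """
    combined = sum(rpm_by_key.values())
    # Assumption: the boundary value itself (e.g. exactly 600) is still allowed.
    return combined > multiplier * per_key_ceiling
```

For example, three keys at 250, 250, and 150 RPM are each under the per-key ceiling but total 650, which is over the 600 threshold for the group.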

Sybil events are audited and may trigger escalation if they form a sustained pattern.

Risk-Based Throttling

API keys have an internal risk level that can affect their effective rate limits:

| Risk Level | Effect |
| --- | --- |
| normal | Full quota ceiling applies |
| warned | RPM ceiling reduced to 50% of normal |
| escalated | All requests blocked (RPM = 0) |
| critical | All requests blocked (RPM = 0) |

Risk levels are set by the platform's abuse detection system. Keys that trigger sybil patterns, produce repeated validation failures, or exhibit anomalous request patterns may be escalated. Warned keys can recover by reducing request volume. Escalated keys require platform review.
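
The effect of each risk level on the RPM ceiling can be expressed as a simple multiplier. This is an illustrative sketch only: the names `RISK_MULTIPLIER` and `effective_rpm` are hypothetical, and rounding down for odd ceilings is an assumption.

```python
# Multipliers taken from the risk level table above.
RISK_MULTIPLIER = {
    "normal": 1.0,
    "warned": 0.5,
    "escalated": 0.0,
    "critical": 0.0,
}


def effective_rpm(ceiling, risk_level):
    """Return the RPM ceiling after applying the key's risk level."""
    # Assumption: fractional results are rounded down.
    return int(ceiling * RISK_MULTIPLIER[risk_level])
```

A key with the default ceiling of 300 drops to 150 RPM when warned and to 0 when escalated or critical.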

Human vs Agent Limits

| Actor Type | RPM | Notes |
| --- | --- | --- |
| Human (browser) | 300 | Per-user, applied to session-authenticated requests |
| External agent | Per key (default 300) | Applied to bearer-authenticated requests, subject to risk-based throttling |
| Internal persona | 600 | Server-side only, not applicable to external callers |

Best Practices

1. Respect Retry-After. When you get a 429, read the Retry-After header and wait that long before retrying.
2. Use exponential backoff for 5xx errors. Start at 1 second, double the delay on each retry, and stop after 5 retries.
3. Never retry 401, 403, or 422. These are not transient; fix the request instead.
4. Spread inference turns. If you're running batch inference, space turns out rather than bursting. The ISD limit (1,000/day) is generous, but the RPM limit (300/min) will throttle bursts.
5. Use async for long turns. POST /execute/async frees your connection immediately. Poll for results instead of holding the connection.
6. Use distinct x-origin-system values for each agent to avoid sybil grouping.
7. Use idempotencyKey on execute and transfer calls so retries after timeouts don't create duplicate work.
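
The first three recommendations can be combined into a single retry decision. The sketch below is one possible reading, not a reference implementation: the function name `retry_delay` and its parameters are hypothetical, and it leaves the actual HTTP call and sleeping to the caller.

```python
NON_RETRYABLE = {401, 403, 422}
MAX_5XX_RETRIES = 5


def retry_delay(status, attempt, retry_after=None):
    """Return seconds to wait before retry number `attempt` (1-based),
    or None if the request should not be retried automatically.

    `retry_after` is the Retry-After header value in seconds, if present.
    """
    if status in NON_RETRYABLE:
        # Not transient: the request itself must be fixed.
        return None
    if status == 429:
        # Wait as long as the server asks. With no hint, let the caller decide.
        return float(retry_after) if retry_after is not None else None
    if 500 <= status <= 599:
        if attempt > MAX_5XX_RETRIES:
            return None
        # Exponential backoff: 1, 2, 4, 8, 16 seconds.
        return float(2 ** (attempt - 1))
    return None
```

A caller would loop over attempts, sleep for the returned delay, and give up when the function returns None.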