Models

General models

  • Claude-3.7 Sonnet
  • Claude-3.5 Sonnet
  • Qwen-3 Coder 480B
  • Gemini 2.5 Pro
  • OpenAI-o3
  • Grok-4
  • GPT-4.1
  • GPT-5-mini
  • OpenAI-o4-mini
  • OpenAI-o3-mini
  • Deepseek v3
  • Deepseek r1

Frontier models

  • GPT-5
  • Claude-4 Sonnet (1M Context Window)

Research models

  • Claude-4 Opus
  • Claude-4.1 Opus
  • OpenAI-o3-pro

How do rate limits work?

Limits reset at the end of each billing cycle, and you can view your usage here.
  • Free tier limits are restrictive and not designed for everyday use.
  • Business plans have ~2.5x the limit for research and frontier models compared to Developer plans.
  • Max plans have ~8x the limit for research and frontier models compared to Developer plans.
  • Credit overages are adjusted regularly based on pooled usage, so you can always use at least your subscription amount, plus a generous amount of additional usage.
This lets power users still get a ton of usage from Firebender without penalizing lower-throughput users who are thoughtful about context management.
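
As a rough illustration of the multipliers above, the sketch below scales a Developer-plan limit for research and frontier models to the other plans. The base figure of 100 and the names `PLAN_MULTIPLIERS` and `relative_limit` are hypothetical, since the actual limits are not stated here and the multipliers themselves are approximate.

```python
# Approximate multipliers for research and frontier models, relative to
# the Developer plan. Values come from the list above and are approximate.
PLAN_MULTIPLIERS = {
    "developer": 1.0,
    "business": 2.5,
    "max": 8.0,
}


def relative_limit(plan: str, developer_limit: float) -> float:
    """Scale a hypothetical Developer-plan limit to another plan."""
    try:
        return developer_limit * PLAN_MULTIPLIERS[plan.lower()]
    except KeyError:
        raise ValueError(f"unknown plan: {plan!r}") from None


if __name__ == "__main__":
    # 100 is a placeholder, not a real Firebender limit.
    for plan in PLAN_MULTIPLIERS:
        print(plan, relative_limit(plan, 100))
```

Credit overages are deliberately left out of this sketch, because the way they are adjusted from pooled usage is not specified.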

Background Agent

Background agent limits are stricter: they do not allow credit overages, because the background agent was designed to scale horizontally and automate large amounts of code changes.
  • Free tier limits are restrictive and not designed for everyday use.
  • Business plans have ~2x the background agent limit of Developer plans.
  • Max plans have ~5x the background agent limit of Developer plans.
This makes it easy for engineers to reasonably scale out background agents for massively parallel workflows.
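
The "no credit overages" rule can be pictured as a hard cap. The following is a minimal sketch under stated assumptions: `BackgroundQuota`, `try_consume`, and the base limit of 10 are hypothetical names and numbers, and only the ~2x and ~5x multipliers come from the text above.

```python
from dataclasses import dataclass

# Approximate background agent multipliers relative to the Developer plan.
BACKGROUND_MULTIPLIERS = {"developer": 1.0, "business": 2.0, "max": 5.0}


@dataclass
class BackgroundQuota:
    plan: str
    developer_limit: int  # hypothetical base limit, not a real figure
    used: int = 0

    @property
    def limit(self) -> int:
        """Plan limit derived from the hypothetical Developer limit."""
        return int(self.developer_limit * BACKGROUND_MULTIPLIERS[self.plan])

    def try_consume(self, amount: int) -> bool:
        """Consume quota only if it fits; overages are never allowed."""
        if self.used + amount > self.limit:
            return False
        self.used += amount
        return True


if __name__ == "__main__":
    quota = BackgroundQuota(plan="business", developer_limit=10)
    print(quota.limit, quota.try_consume(20), quota.try_consume(1))
```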

What if I hit a limit?

You’ll be notified explicitly when you hit a rate limit, along with when the limit will reset for that model. You can:
  • Use another model
  • Wait for the rate limit to reset
  • Upgrade to a higher tier
To avoid disruption, the next best model is used automatically (e.g., Opus falls back to Sonnet). It is chosen based on the given context, overall acceptance rates for each model, and speed.
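
The automatic fallback can be sketched as walking a preference list until a model that is not rate limited is found. This is an illustration only: `FALLBACK_ORDER` and `pick_model` are hypothetical, and the fixed ordering stands in for the real selection, which the text says depends on context, acceptance rates, and speed.

```python
from typing import Optional

# Hypothetical preference order, best first. The actual ranking is not
# specified; a static list is assumed here for simplicity.
FALLBACK_ORDER = [
    "Claude-4.1 Opus",
    "Claude-4 Opus",
    "Claude-4 Sonnet",
    "Claude-3.7 Sonnet",
]


def pick_model(requested: str, rate_limited: set[str]) -> Optional[str]:
    """Return the requested model, or the next available fallback."""
    if requested not in rate_limited:
        return requested
    # Start after the requested model if it is in the list, else at the top.
    if requested in FALLBACK_ORDER:
        start = FALLBACK_ORDER.index(requested) + 1
    else:
        start = 0
    for candidate in FALLBACK_ORDER[start:]:
        if candidate not in rate_limited:
            return candidate
    return None  # every candidate is rate limited


if __name__ == "__main__":
    print(pick_model("Claude-4 Opus", {"Claude-4 Opus"}))
```

Returning `None` when every candidate is limited leaves the remaining options (waiting for the reset or upgrading) to the caller.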