2026 AI API Cost Benchmark
Quick Answer
Listed API prices are only a starting point. Real monthly budgets also depend on input-to-output token ratios, repeated conversation context, coding-agent tool results, cache behavior, retry rates, failed or moderated media jobs and user regeneration choices. This calculator separates the official per-unit rate from an operational planning budget that accounts for those multipliers.
Estimate LLM, coding-agent, image and video API costs before you ship.
Prices sourced from official provider pricing pages as of June 17, 2026. All figures are pricing snapshots — verify before production use.
Quick Estimate Calculator
These are editable planning scenarios, not universal usage averages. Fill in your own numbers or load one of the presets below.
Failed-job billing varies by provider. Verify the provider policy and reconcile the request ID or job ID with the billing dashboard. Do not assume all failed or retried jobs were billed.
Why Listed API Price Is Not the Final Product Cost
A provider's per-token, per-image or per-second price is the base unit, but real product costs diverge from it through several compounding factors:
- Input-to-output ratio — Prompt-heavy applications with short responses cost more per useful output than balanced request-response flows.
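- Repeated conversation context — Multi-turn conversations re-send prior turns as context, so billed input tokens grow with every turn. With equal-sized turns and the full history re-sent, a 10-turn conversation bills roughly 5.5× the input tokens of ten independent single-turn requests.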
- Coding-agent tool results — Each tool call returns data that is injected into the next model call as context. A single coding task can accumulate thousands of tool-result tokens.
- Cache write and read — Some providers charge for writing tokens to the cache and bill cache reads at a significant discount. Effective caching can reduce input costs by 50–90%, depending on how much of each prompt is served from cache.
- Retries — Each retry is billed as another full request, so a request that is retried once costs about twice as much. Unhandled errors, timeouts and rate-limit backoff can multiply spend unexpectedly.
- Duplicate or overlapped jobs — A client timeout followed by a new submission creates two billed jobs instead of one.
- Failed or moderated media tasks — Some providers bill job creation regardless of outcome. Moderation blocks, invalid inputs and queue timeouts all create billable events.
- User rejection and regeneration — If users reject outputs and regenerate, each attempt is a new API call. Image and video workflows are particularly sensitive to this.
- Gateway and marketplace routing — OpenRouter and similar marketplaces add routing overhead and may charge differently from direct provider APIs.
- Billing reconciliation lag — Dashboard usage may not reflect billed amounts until hours after the request, making real-time cost monitoring difficult.
This calculator separates the official unit rate from a planning budget that multiplies the base cost by retry rate and a buffer. Always cross-reference with your actual organization usage data before committing a budget.
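To make the repeated-context factor concrete, here is a minimal sketch of how billed input tokens accumulate when every turn re-sends the full history. The function name and the assumption that every turn adds the same number of tokens are illustrative simplifications, not provider behavior.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when each turn re-sends all prior turns.

    Assumption: every turn (user message plus model reply) adds the same
    number of tokens to the history. Real conversations vary.
    """
    # Turn k sends k turns' worth of tokens: 1 + 2 + ... + turns.
    return tokens_per_turn * turns * (turns + 1) // 2


with_history = cumulative_input_tokens(turns=10, tokens_per_turn=500)
without_history = 10 * 500  # ten independent single-turn requests

print(with_history)                    # 27500
print(with_history / without_history)  # 5.5
```

Truncating or summarizing the history, or serving it from cache, lowers this figure; measure your own conversations before relying on it.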
LLM API Cost
LLM API cost is driven by tokens. Every request sends input tokens (prompt + conversation context) and receives output tokens (completion). Prices are quoted per million tokens (M tokens).
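The formulas that follow can be sketched in Python roughly as below. The class and function names are hypothetical, and the sample volumes are placeholders; the prices are the gpt-4o snapshot from the benchmark table on this page.

```python
from dataclasses import dataclass


@dataclass
class LlmScenario:
    """Inputs for one planning scenario (all values are illustrative)."""
    monthly_active_users: int
    active_days_per_month: int
    requests_per_user_per_day: float
    average_input_tokens: float
    average_output_tokens: float
    input_price_per_million: float   # USD per 1M input tokens
    output_price_per_million: float  # USD per 1M output tokens
    retry_rate: float = 0.0          # 0.03 means 3% of requests are retried
    buffer: float = 0.0              # 0.20 means a 20% planning buffer


def llm_budget(s: LlmScenario) -> dict:
    """Apply the page's LLM formulas step by step."""
    monthly_requests = (
        s.monthly_active_users
        * s.active_days_per_month
        * s.requests_per_user_per_day
    )
    input_cost = (
        monthly_requests * s.average_input_tokens / 1_000_000
        * s.input_price_per_million
    )
    output_cost = (
        monthly_requests * s.average_output_tokens / 1_000_000
        * s.output_price_per_million
    )
    base_cost = input_cost + output_cost
    retry_adjusted_cost = base_cost * (1 + s.retry_rate)
    planned_budget = retry_adjusted_cost * (1 + s.buffer)
    return {
        "monthly_requests": monthly_requests,
        "base_cost": base_cost,
        "retry_adjusted_cost": retry_adjusted_cost,
        "planned_budget": planned_budget,
    }


scenario = LlmScenario(
    monthly_active_users=1_000,
    active_days_per_month=20,
    requests_per_user_per_day=5,
    average_input_tokens=2_000,
    average_output_tokens=500,
    input_price_per_million=2.50,
    output_price_per_million=10.00,
    retry_rate=0.03,
    buffer=0.20,
)
result = llm_budget(scenario)
print(f"base: ${result['base_cost']:.2f}")          # base: $1000.00
print(f"planned: ${result['planned_budget']:.2f}")  # planned: $1236.00
```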
monthly_requests =
monthly_active_users
× active_days_per_month
× requests_per_user_per_day
input_cost =
monthly_requests
× average_input_tokens
÷ 1,000,000
× input_price_per_million
output_cost =
monthly_requests
× average_output_tokens
÷ 1,000,000
× output_price_per_million
base_cost = input_cost + output_cost
retry_adjusted_cost =
base_cost × (1 + retry_rate_as_decimal)
planned_budget =
retry_adjusted_cost × (1 + buffer_as_decimal)
Coding-Agent Cost
Coding agents cost more than simple chat requests because each completed task involves multiple model calls, large context windows and tool-result token accumulation.
Tool calls themselves are not free operations — they generate output tokens and inject result data into subsequent calls, increasing the input token count of every following request. Specific costs depend on the model, tool definitions, context window size, cache usage and provider billing rules. Use your actual usage records to calibrate these estimates.
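As a rough sketch, the coding-agent formulas that follow might be combined as below. One assumption is mine rather than the page's: the blended cache price is applied directly to the per-call input cost, whereas the formulas list it as a separate quantity. Cache write charges are ignored, the function name is hypothetical, and the volumes are placeholders. The prices are taken from the $3.00 / $15.00 Anthropic row and its $0.30 cache-read row in the benchmark table.

```python
def coding_agent_budget(
    tasks_per_month: int,
    calls_per_task: int,
    prompt_tokens: int,
    tool_result_tokens: int,
    output_tokens: int,
    input_price: float,             # USD per 1M input tokens
    output_price: float,            # USD per 1M output tokens
    cache_read_pct: float = 0.0,    # share of input tokens served from cache
    cache_read_price: float = 0.0,  # USD per 1M cached input tokens
    retry_rate: float = 0.0,
    buffer: float = 0.0,
) -> float:
    """Planned monthly budget in USD. Cache write charges are not modeled."""
    monthly_model_calls = tasks_per_month * calls_per_task
    total_input_tokens_per_call = prompt_tokens + tool_result_tokens
    # Assumption: blend full-price and cache-read tokens into one input price.
    effective_input_price = (
        (1 - cache_read_pct) * input_price + cache_read_pct * cache_read_price
    )
    base_call_cost = (
        total_input_tokens_per_call / 1_000_000 * effective_input_price
        + output_tokens / 1_000_000 * output_price
    )
    monthly_base_cost = monthly_model_calls * base_call_cost
    return monthly_base_cost * (1 + retry_rate) * (1 + buffer)


common = dict(
    tasks_per_month=2_000,
    calls_per_task=10,
    prompt_tokens=8_000,
    tool_result_tokens=12_000,
    output_tokens=1_000,
    input_price=3.00,
    output_price=15.00,
    retry_rate=0.05,
    buffer=0.20,
)
no_cache = coding_agent_budget(**common)
half_cached = coding_agent_budget(**common, cache_read_pct=0.5, cache_read_price=0.30)

print(f"no cache: ${no_cache:.2f}")        # no cache: $1890.00
print(f"half cached: ${half_cached:.2f}")  # half cached: $1209.60
```

With `cache_read_pct=0.0` the function reduces to the uncached base formula, so the two results show how much of the budget depends on the cache hit share you assume.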
monthly_model_calls =
tasks_per_month
× calls_per_task
total_input_tokens_per_call =
prompt_tokens + tool_result_tokens
base_call_cost =
(prompt_tokens + tool_result_tokens) ÷ 1,000,000 × input_price
+ output_tokens ÷ 1,000,000 × output_price
monthly_base_cost =
monthly_model_calls × base_call_cost
effective_input_cost =
(1 - cache_read_pct) × full_price
+ cache_read_pct × cache_discount_price
planned_budget =
monthly_base_cost × (1 + retry_rate) × (1 + buffer)
Image Generation Cost
Image generation is usually billed per accepted output or per submitted job. The effective cost per accepted image is higher than the listed per-generation price because rejected or re-generated attempts also consume budget.
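A minimal sketch of the image formulas that follow, assuming every submitted job is billed (the conservative case; some providers may not bill failures). The function name and volumes are hypothetical; the $0.05 price is the Runway gen4_image (720p) snapshot from the benchmark table.

```python
def image_budget(
    accepted_images: int,
    average_attempts_per_accepted: float,
    listed_price: float,  # USD per submitted generation
    buffer: float = 0.0,
) -> dict:
    """Monthly image spend, assuming all submitted jobs are billed."""
    submitted_jobs = accepted_images * average_attempts_per_accepted
    monthly_generation_cost = submitted_jobs * listed_price
    # Spend divided by the images users actually kept.
    effective_cost_per_accepted = monthly_generation_cost / accepted_images
    planned_budget = monthly_generation_cost * (1 + buffer)
    return {
        "submitted_jobs": submitted_jobs,
        "monthly_generation_cost": monthly_generation_cost,
        "effective_cost_per_accepted": effective_cost_per_accepted,
        "planned_budget": planned_budget,
    }


img = image_budget(
    accepted_images=10_000,
    average_attempts_per_accepted=1.5,
    listed_price=0.05,
    buffer=0.25,
)
print(f"per accepted image: ${img['effective_cost_per_accepted']:.3f}")  # $0.075
print(f"planned: ${img['planned_budget']:.2f}")                           # $937.50
```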
submitted_jobs =
accepted_images × average_attempts_per_accepted
effective_cost_per_accepted =
listed_price × average_attempts_per_accepted
monthly_generation_cost =
submitted_jobs × listed_price
effective_output_cost =
monthly_generation_cost ÷ accepted_images
planned_budget =
submitted_jobs × listed_price × (1 + buffer)
Do not assume that failed or retried image generation jobs are uniformly billed or uniformly free. Inspect the provider's failure policy and check the billing dashboard before assuming a uniform treatment.
Video Generation Cost
Video generation is typically billed by generated duration (seconds) or per job. Async job lifecycle — creation, polling, completion, failure and webhook delivery — means that timeouts, retries and moderation blocks can all create billable events.
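The video formulas that follow could be sketched as below, with the billing mode made explicit. The enum, function name and volumes are hypothetical; the $0.12 per-second price is the Runway gen4.5 snapshot from the benchmark table, and every submitted job is assumed to be billed.

```python
from enum import Enum


class BillingMode(Enum):
    PER_SECOND = "per-second"
    PER_JOB = "per-job"


def video_budget(
    accepted_videos: int,
    attempts_per_accepted: float,
    seconds_per_job: float,
    billing_mode: BillingMode,
    price_per_second: float = 0.0,
    price_per_job: float = 0.0,
    buffer: float = 0.0,
) -> dict:
    """Monthly video spend, assuming all submitted jobs are billed."""
    submitted_jobs = accepted_videos * attempts_per_accepted
    total_generated_seconds = submitted_jobs * seconds_per_job
    if billing_mode is BillingMode.PER_SECOND:
        base_spend = total_generated_seconds * price_per_second
    else:
        base_spend = submitted_jobs * price_per_job
    return {
        "submitted_jobs": submitted_jobs,
        "total_generated_seconds": total_generated_seconds,
        "base_spend": base_spend,
        "planned_budget": base_spend * (1 + buffer),
    }


vid = video_budget(
    accepted_videos=500,
    attempts_per_accepted=2,
    seconds_per_job=5,
    billing_mode=BillingMode.PER_SECOND,
    price_per_second=0.12,
    buffer=0.30,
)
print(f"base: ${vid['base_spend']:.2f}")         # base: $600.00
print(f"planned: ${vid['planned_budget']:.2f}")  # planned: $780.00
```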
submitted_jobs =
accepted_videos × attempts_per_accepted
total_generated_seconds =
submitted_jobs × seconds_per_job
base_spend =
(billing_mode == per-second)
? total_generated_seconds × price_per_second
: submitted_jobs × price_per_job
planned_budget =
base_spend × (1 + buffer)
Benchmark Methodology
Each data point on this page is classified as one of:
- Official fact — Directly quoted from the provider's current official documentation.
- Pricing snapshot — Verified from the official pricing page on the review date. Prices may have changed since then.
- Editable assumption — Default values in the calculator are illustrative planning estimates, not provider guarantees.
- Derived calculation — Results computed from the formulas above. Not audited by any provider.
- Provider-specific policy — Billing behavior that varies by provider, model and account tier.
Benchmark Data
All prices are pricing snapshots reviewed on June 17, 2026 from official provider pages. Prices change, so always verify against the live provider pricing page before production use. Some video generation models are deprecated or scheduled for shutdown (flagged in the Notes column); check current availability.
| Category | Provider | Model / Service | Unit | Input | Output | Notes |
|---|---|---|---|---|---|---|
| LLM | Anthropic | Claude 3.5 Sonnet 4 | per 1M tokens | $3.00 | $15.00 | |
| LLM | Anthropic | Claude 3.5 Sonnet 4 (cached) | per 1M tokens | $0.30 | — | Cache read |
| LLM | Anthropic | Claude Sonnet 4 | per 1M tokens | $1.50 | $6.00 | |
| LLM | Anthropic | Claude Sonnet 4 (cached) | per 1M tokens | $0.15 | — | Cache read |
| LLM | Anthropic | Claude Opus 4 | per 1M tokens | $15.00 | $75.00 | |
| LLM | OpenAI | gpt-4o | per 1M tokens | $2.50 | $10.00 | |
| LLM | OpenAI | gpt-4o-mini | per 1M tokens | $0.15 | $0.60 | |
| LLM | OpenAI | o3 | per 1M tokens | $15.00 | $60.00 | Reasoning model |
| LLM | OpenAI | o4-mini | per 1M tokens | $1.10 | $14.00 | Compact reasoning |
| LLM | OpenAI | gpt-4.1 | per 1M tokens | $2.00 | $8.00 | |
| LLM | Google | gemini-2.5-flash | per 1M tokens | $0.30 | $1.20 | |
| LLM | Google | gemini-2.0-flash | per 1M tokens | $0.10 | $0.40 | |
| Image | OpenAI | gpt-image-2 | per image | — | — | Price varies by quality, size, count |
| Image | Runway | gen4_image (720p) | per image | — | $0.05 | 5 credits at $0.01/credit |
| Image | Runway | gen4_image_turbo | per image | — | $0.02 | 2 credits at $0.01/credit |
| Video | Runway | gen4.5 | per second | — | $0.12 | 12 credits/sec at $0.01/credit |
| Video | Runway | gen4_turbo | per second | — | $0.05 | 5 credits/sec |
| Video | Runway | seedance2_fast (480p/720p) | per second | — | $0.29 | 29 credits/sec |
| Video | Runway | aleph2 | per second | — | $0.28 | 28 credits/sec; 56-credit minimum |
| Video | OpenAI | sora-2 (720p) | per second | — | $0.10 | DEPRECATED — shutdown 2026-09-24 |
| Video | OpenAI | sora-2-pro (720p) | per second | — | $0.30 | DEPRECATED — shutdown 2026-09-24 |
| Video | OpenAI | sora-2-pro (1080p) | per second | — | $0.70 | DEPRECATED — shutdown 2026-09-24 |
| Video | OpenAI | sora-2 (720p batch) | per second | — | $0.05 | DEPRECATED — shutdown 2026-09-24 |
| LLM | OpenRouter | Various | per 1M tokens | varies | varies | Prices vary by model and provider. Check openrouter.ai/docs/models |
Download CSV
Export the benchmark data as a CSV file for use in your own cost model or spreadsheet. The file includes category, provider, model, billing unit, prices and source URLs.
Official Sources
The benchmark data on this page was verified against official provider documentation as of June 17, 2026. Pricing, model availability and API terms may have changed since then. Verify against the live documentation before making integration decisions.
- Anthropic API Docs: Pricing — Claude model rates and prompt caching
- OpenAI API Docs: Pricing — GPT-4o, o3, o4-mini, gpt-image-2, Sora 2
- Google Gemini API: Pricing — Gemini model rates
- OpenRouter Docs: Models — Marketplace routing and per-model rates
- Runway Developer API: Pricing — Video and image generation credits
Related Guides
- Claude Code Token Cost
- OpenRouter Credits
- Image Generation API Cost
- GPT Image API Cost
- Video Generation API Cost
- Failed Generation Cost
- Billing Transparency
- Small Prepaid Test
This benchmark page provides an interactive calculator for LLM, coding-agent, image and video generation workloads. It separates the official per-unit rate from a planning budget that accounts for retries, context growth, tool-result token accumulation, failed jobs and user regeneration. Prices were reviewed on June 17, 2026 against official provider documentation, and deprecated models such as Sora 2 are flagged. The CSV export supports cost modeling in external tools. Estimates must be reconciled with actual provider usage records. AICostPlanner is an independent educational cost-planning site and is not affiliated with any API provider.
Frequently Asked Questions
How do I estimate monthly AI API cost?
Start with the official per-unit price (per token, per image, per second). Then multiply by your monthly volume. Finally, add a buffer for retries, failed jobs and unexpected load — typically 15–30% for LLM workloads and 20–40% for media generation. The calculator above applies this automatically based on the values you enter.
Why does a coding agent cost more than a single chat request?
A coding agent typically makes multiple model calls per task, each with a large input context that includes conversation history, tool definitions and tool-result data. A task that makes 10 model calls with 20,000 input tokens each consumes 200,000 input tokens, roughly 400× the input of a single 500-token chat request. Cache read discounts can partially offset this, but the effect varies by provider and task type.
Should retries be included in an API budget?
Yes. Each retried request is billed again, so budget for retries in proportion to your retry rate. If your retry rate is 3%, multiply your budget by 1.03; if it is 20%, multiply by 1.20. The calculator adds this adjustment automatically. Track your actual retry rate from logs rather than guessing.
Are failed image or video jobs always charged?
Do not assume. Billing for failed jobs depends on the provider, the specific error type and whether the job was created or partially processed. For Runway, safety failures are not refunded; for OpenAI Sora, consult the current error documentation. Always inspect the job ID and compare against the organization usage dashboard before assuming a universal policy.
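One way to approach that comparison is sketched below: match the job IDs in your own logs against the job IDs that appear in a billing export. The ID format, status strings and data shapes here are all hypothetical; real exports differ by provider.

```python
def reconcile(submitted: dict, billed_ids: set) -> dict:
    """Compare locally logged jobs with billed job IDs.

    submitted maps job ID -> final status recorded in your own logs.
    billed_ids is the set of job IDs found in the provider's billing export.
    """
    failed_but_billed = sorted(
        job_id for job_id, status in submitted.items()
        if status != "succeeded" and job_id in billed_ids
    )
    succeeded_not_billed = sorted(
        job_id for job_id, status in submitted.items()
        if status == "succeeded" and job_id not in billed_ids
    )
    # Billed jobs you have no record of, e.g. duplicates after a client timeout.
    unknown_billed = sorted(billed_ids - set(submitted))
    return {
        "failed_but_billed": failed_but_billed,
        "succeeded_not_billed": succeeded_not_billed,
        "unknown_billed": unknown_billed,
    }


# Hypothetical sample data.
logs = {"job_001": "succeeded", "job_002": "moderated", "job_003": "failed"}
billed = {"job_001", "job_002", "job_004"}

report = reconcile(logs, billed)
print(report["failed_but_billed"])     # ['job_002']
print(report["succeeded_not_billed"])  # []
print(report["unknown_billed"])        # ['job_004']
```

Because dashboards can lag behind requests, a succeeded job missing from the export may simply not have been posted yet.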
How often is the benchmark updated?
The benchmark reflects official provider pricing as of the review date shown on the page (June 17, 2026). Prices change frequently — check the official provider pricing pages directly before making production budget commitments. The CSV file is named with the snapshot date so you can track which version you are using.
Can I use the CSV in my own spreadsheet?
Yes. The CSV file includes columns for category, provider, model, billing unit, input price, output price, source URL and review date. You can import it into any spreadsheet tool, filter by category or provider, and combine it with your own volume estimates. Remember that prices may have changed since the snapshot date.
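If you would rather script the import, a minimal sketch using Python's standard `csv` module is shown below. The header names are assumptions based on the column list above, so check the first line of the real file. The sample rows reuse snapshot values from the benchmark table, and the source URL column is left empty here.

```python
import csv
import io

# Hypothetical header names; confirm them against the actual export.
SAMPLE_CSV = """\
category,provider,model,unit,input_price,output_price,source_url,review_date
LLM,OpenAI,gpt-4o,per 1M tokens,2.50,10.00,,2026-06-17
LLM,OpenAI,gpt-4o-mini,per 1M tokens,0.15,0.60,,2026-06-17
Video,Runway,gen4_turbo,per second,,0.05,,2026-06-17
"""


def cost_per_request(row: dict, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request for a row priced per 1M tokens."""
    return (
        input_tokens / 1_000_000 * float(row["input_price"])
        + output_tokens / 1_000_000 * float(row["output_price"])
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
llm_rows = [r for r in rows if r["category"] == "LLM"]

estimates = {
    r["model"]: cost_per_request(r, input_tokens=2_000, output_tokens=500)
    for r in llm_rows
}
for model, cost in estimates.items():
    print(f"{model}: ${cost:.4f} per request")
# gpt-4o: $0.0100 per request
# gpt-4o-mini: $0.0006 per request
```

To read a downloaded file instead, replace `io.StringIO(SAMPLE_CSV)` with an `open(path, newline="")` handle.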
Does this calculator replace provider billing data?
No. The calculator produces planning estimates. Actual billed amounts depend on real token counts, cache behavior, failure handling and provider-specific billing policies. Always reconcile your estimates against the provider's usage dashboard and actual invoice before committing a production budget.
Use a small prepaid test to verify actual cost before scaling a production workflow.