2026 AI API Cost Benchmark

Last reviewed: June 17, 2026

Quick Answer

Listed API prices are only a starting point. Real monthly budgets also depend on input-to-output token ratios, repeated conversation context, coding-agent tool results, cache behavior, retry rates, failed or moderated media jobs and user regeneration choices. This calculator separates the official per-unit rate from an operational planning budget that accounts for those multipliers.

Estimate LLM, coding-agent, image and video API costs before you ship.

Prices sourced from official provider pricing pages as of June 17, 2026. All figures are pricing snapshots — verify before production use.

Quick Estimate Calculator

These are editable planning scenarios, not universal usage averages. Fill in your own numbers for the inputs below.

Inputs:
  • Monthly active users: Number of users who make at least one API request per month.
  • Active days per month: Average number of days on which each active user makes a request.
  • Requests per user per day: Average API requests per active user per active day.
  • Average input tokens: Prompt tokens per request, including conversation context.
  • Average output tokens: Completion tokens per response.
  • Input price per million tokens: Check the provider pricing page for the current rate.
  • Output price per million tokens: Check the provider pricing page for the current rate.
  • Retry rate: Estimated percentage of requests that are retried.
  • Buffer: Extra percentage added to cover retries, context growth and unexpected load.

Enter values above to see your estimated monthly cost.

Why Listed API Price Is Not the Final Product Cost

A provider's per-token, per-image or per-second price is the base unit, but real product costs diverge from it through several compounding factors:

  • Input-to-output ratio — Prompt-heavy applications with short responses cost more per useful output than balanced request-response flows.
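  • Repeated conversation context — Multi-turn conversations re-send prior turns as context, so input tokens grow with every turn. Over a 10-turn conversation, cumulative input is at least about 5.5× that of ten independent single-turn requests with the same user messages, and higher when model replies are long.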
  • Coding-agent tool results — Each tool call returns data that is injected into the next model call as context. A single coding task can accumulate thousands of tool-result tokens.
  • Cache write and read — Some providers charge for tokens written to the cache and bill cache reads at a significant discount. Effective caching can reduce the cost of the cached portion of the input by roughly 50–90%, depending on the provider.
  • Retries — Each retry is billed again in full, so an operation that is retried once costs about twice as much. Unhandled errors, timeouts and rate-limit backoff can multiply spend unexpectedly.
  • Duplicate or overlapped jobs — A client timeout followed by a new submission creates two billed jobs instead of one.
  • Failed or moderated media tasks — Some providers bill job creation regardless of outcome. Moderation blocks, invalid inputs and queue timeouts all create billable events.
  • User rejection and regeneration — If users reject outputs and regenerate, each attempt is a new API call. Image and video workflows are particularly sensitive to this.
  • Gateway and marketplace routing — OpenRouter and similar marketplaces add routing overhead and may charge differently from direct provider APIs.
  • Billing reconciliation lag — Dashboard usage may not reflect billed amounts until hours after the request, making real-time cost monitoring difficult.

This calculator separates the official unit rate from a planning budget that multiplies the base cost by retry rate and a buffer. Always cross-reference with your actual organization usage data before committing a budget.
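
The repeated-context factor in the list above can be made concrete with a small sketch. It assumes that every turn re-sends the full history, that no caching or truncation is applied, and that user messages and replies have fixed sizes. The function name and the token counts are illustrative, not taken from any provider.

```python
def conversation_input_tokens(turns, user_tokens, reply_tokens, system_tokens=0):
    """Total billed input tokens when each turn re-sends all prior turns."""
    context = system_tokens
    total_input = 0
    for _ in range(turns):
        context += user_tokens    # the new user message joins the context
        total_input += context    # the whole context is billed as input
        context += reply_tokens   # the reply becomes context for later turns
    return total_input


# Hypothetical sizes: 200-token user messages, 300-token replies.
with_history = conversation_input_tokens(turns=10, user_tokens=200, reply_tokens=300)
without_history = 10 * 200  # ten independent single-turn requests
print(with_history, without_history, with_history / without_history)
# prints: 24500 2000 12.25
```

Under these assumptions the 10-turn conversation bills 12.25× the input of ten independent requests. Shorter replies pull the ratio down toward 5.5×.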

LLM API Cost

LLM API cost is driven by tokens. Every request sends input tokens (prompt + conversation context) and receives output tokens (completion). Prices are quoted per million tokens (M tokens).

monthly_requests =
  monthly_active_users
  × active_days_per_month
  × requests_per_user_per_day

input_cost =
  monthly_requests
  × average_input_tokens
  ÷ 1,000,000
  × input_price_per_million

output_cost =
  monthly_requests
  × average_output_tokens
  ÷ 1,000,000
  × output_price_per_million

base_cost = input_cost + output_cost

retry_adjusted_cost =
  base_cost × (1 + retry_rate_as_decimal)

planned_budget =
  retry_adjusted_cost × (1 + buffer_as_decimal)
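
As a minimal sketch, the formulas above translate directly into a Python function. The variable names follow the formulas. The function name and the example scenario are hypothetical, and the prices are placeholders rather than a provider quote.

```python
def llm_monthly_budget(
    monthly_active_users,
    active_days_per_month,
    requests_per_user_per_day,
    average_input_tokens,
    average_output_tokens,
    input_price_per_million,
    output_price_per_million,
    retry_rate=0.0,  # decimal, e.g. 0.03 for 3%
    buffer=0.0,      # decimal, e.g. 0.20 for 20%
):
    """Apply the LLM cost formulas and return each intermediate value."""
    monthly_requests = (
        monthly_active_users * active_days_per_month * requests_per_user_per_day
    )
    input_cost = (
        monthly_requests * average_input_tokens / 1_000_000 * input_price_per_million
    )
    output_cost = (
        monthly_requests * average_output_tokens / 1_000_000 * output_price_per_million
    )
    base_cost = input_cost + output_cost
    retry_adjusted_cost = base_cost * (1 + retry_rate)
    planned_budget = retry_adjusted_cost * (1 + buffer)
    return {
        "monthly_requests": monthly_requests,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "base_cost": base_cost,
        "retry_adjusted_cost": retry_adjusted_cost,
        "planned_budget": planned_budget,
    }


# Hypothetical scenario with placeholder prices.
example = llm_monthly_budget(
    monthly_active_users=1_000,
    active_days_per_month=10,
    requests_per_user_per_day=5,
    average_input_tokens=2_000,
    average_output_tokens=500,
    input_price_per_million=2.00,
    output_price_per_million=8.00,
    retry_rate=0.03,
    buffer=0.20,
)
print({name: round(value, 2) for name, value in example.items()})
```

In this scenario 50,000 monthly requests produce a base cost of $400 ($200 input plus $200 output), and the retry rate and buffer raise the planned budget to about $494.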

Coding-Agent Cost

Coding agents cost more than simple chat requests because each completed task involves multiple model calls, large context windows and tool-result token accumulation.

Tool calls themselves are not free operations — they generate output tokens and inject result data into subsequent calls, increasing the input token count of every following request. Specific costs depend on the model, tool definitions, context window size, cache usage and provider billing rules. Use your actual usage records to calibrate these estimates.

monthly_model_calls =
  tasks_per_month
  × calls_per_task

total_input_tokens_per_call =
  prompt_tokens + tool_result_tokens

effective_input_price =
  (1 - cache_read_pct) × input_price
  + cache_read_pct × cached_input_price

base_call_cost =
  total_input_tokens_per_call ÷ 1,000,000 × effective_input_price
  + output_tokens ÷ 1,000,000 × output_price

monthly_base_cost =
  monthly_model_calls × base_call_cost

planned_budget =
  monthly_base_cost × (1 + retry_rate) × (1 + buffer)

Here cache_read_pct is the fraction of input tokens served from cache, expressed as a decimal. With cache_read_pct = 0, effective_input_price equals input_price. Cache write charges are not modeled.
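
One way to combine the coding-agent formulas is sketched below. It assumes the cache-blended input price replaces the plain input price in the per-call cost and that cache write charges are ignored. The function name, the parameter name cached_input_price and the example values are illustrative assumptions.

```python
def coding_agent_budget(
    tasks_per_month,
    calls_per_task,
    prompt_tokens,
    tool_result_tokens,
    output_tokens,
    input_price,              # per 1M input tokens, uncached
    output_price,             # per 1M output tokens
    cached_input_price=None,  # per 1M cache-read tokens (assumed name)
    cache_read_pct=0.0,       # fraction of input tokens read from cache
    retry_rate=0.0,
    buffer=0.0,
):
    """Planned monthly budget for a coding-agent workload."""
    if cached_input_price is None:
        cached_input_price = input_price  # no discount when unspecified
    monthly_model_calls = tasks_per_month * calls_per_task
    total_input_tokens_per_call = prompt_tokens + tool_result_tokens
    # Blend full-price and cache-read tokens into one effective input price.
    effective_input_price = (
        (1 - cache_read_pct) * input_price + cache_read_pct * cached_input_price
    )
    base_call_cost = (
        total_input_tokens_per_call / 1_000_000 * effective_input_price
        + output_tokens / 1_000_000 * output_price
    )
    monthly_base_cost = monthly_model_calls * base_call_cost
    return monthly_base_cost * (1 + retry_rate) * (1 + buffer)


# Hypothetical workload; the prices are placeholders.
workload = dict(
    tasks_per_month=500, calls_per_task=10,
    prompt_tokens=8_000, tool_result_tokens=12_000, output_tokens=1_000,
    input_price=3.00, output_price=15.00, retry_rate=0.05, buffer=0.20,
)
uncached = coding_agent_budget(**workload)
half_cached = coding_agent_budget(
    **workload, cached_input_price=0.30, cache_read_pct=0.5
)
print(round(uncached, 2), round(half_cached, 2))
```

With these placeholder numbers the planned budget is about $472.50 without caching and about $302.40 when half of the input tokens are cache reads. Real cache hit rates should come from usage records.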

Image Generation Cost

Image generation is usually billed per generated image or per submitted job, not per image the user accepts. The effective cost per accepted image is therefore higher than the listed per-generation price, because rejected or regenerated attempts also consume budget.

submitted_jobs =
  accepted_images × average_attempts_per_accepted

effective_cost_per_accepted =
  listed_price × average_attempts_per_accepted

monthly_generation_cost =
  submitted_jobs × listed_price

effective_output_cost =
  monthly_generation_cost ÷ accepted_images

planned_budget =
  submitted_jobs × listed_price × (1 + buffer)

Do not assume that failed or retried image generation jobs are uniformly billed or uniformly free. Inspect the provider's failure policy and check the billing dashboard before assuming a uniform treatment.
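
The image formulas can be sketched as follows. The sketch assumes every submitted job is billed at the listed price, which is a conservative upper bound and not a statement about any provider's failure policy. The function name and the example numbers are hypothetical.

```python
def image_budget(accepted_images, average_attempts_per_accepted, listed_price, buffer=0.0):
    """Monthly image spend, assuming every submitted job is billed."""
    submitted_jobs = accepted_images * average_attempts_per_accepted
    monthly_generation_cost = submitted_jobs * listed_price
    effective_cost_per_accepted = monthly_generation_cost / accepted_images
    planned_budget = monthly_generation_cost * (1 + buffer)
    return {
        "submitted_jobs": submitted_jobs,
        "monthly_generation_cost": monthly_generation_cost,
        "effective_cost_per_accepted": effective_cost_per_accepted,
        "planned_budget": planned_budget,
    }


# Hypothetical: 2,000 accepted images, 2.5 attempts each, $0.05 per job.
result = image_budget(
    accepted_images=2_000,
    average_attempts_per_accepted=2.5,
    listed_price=0.05,
    buffer=0.30,
)
print({name: round(value, 3) for name, value in result.items()})
```

Here the listed price of $0.05 becomes an effective $0.125 per accepted image, and the planned budget is about $325.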

Video Generation Cost

Video generation is typically billed by generated duration (seconds) or per job. Because jobs are asynchronous (creation, polling, completion, failure and webhook delivery), timeouts, retries and moderation blocks can all create billable events.

submitted_jobs =
  accepted_videos × attempts_per_accepted

total_generated_seconds =
  submitted_jobs × seconds_per_job

base_spend =
  (billing_mode == per-second)
    ? total_generated_seconds × price_per_second
    : submitted_jobs × price_per_job

planned_budget =
  base_spend × (1 + buffer)
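
A minimal sketch of the video formulas follows. The billing_mode strings, the function name and the example values are assumptions made for illustration, and every submitted job is treated as billed.

```python
def video_budget(
    accepted_videos,
    attempts_per_accepted,
    seconds_per_job,
    billing_mode,          # "per-second" or "per-job" (assumed labels)
    price_per_second=0.0,
    price_per_job=0.0,
    buffer=0.0,
):
    """Planned monthly video spend under per-second or per-job billing."""
    if billing_mode not in ("per-second", "per-job"):
        raise ValueError(f"unknown billing_mode: {billing_mode!r}")
    submitted_jobs = accepted_videos * attempts_per_accepted
    total_generated_seconds = submitted_jobs * seconds_per_job
    if billing_mode == "per-second":
        base_spend = total_generated_seconds * price_per_second
    else:
        base_spend = submitted_jobs * price_per_job
    return base_spend * (1 + buffer)


# Hypothetical: 300 accepted videos, 2 attempts each, 8 seconds per job.
per_second = video_budget(300, 2, 8, "per-second", price_per_second=0.10, buffer=0.25)
per_job = video_budget(300, 2, 8, "per-job", price_per_job=1.00, buffer=0.25)
print(round(per_second, 2), round(per_job, 2))
```

Under these placeholder prices, 600 submitted jobs and 4,800 generated seconds lead to a planned budget of about $600 with per-second billing and about $750 with per-job billing.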

Benchmark Methodology

Each data point on this page is classified as one of:

  • Official fact — Directly quoted from the provider's current official documentation.
  • Pricing snapshot — Verified from the official pricing page on the review date. Prices may have changed since then.
  • Editable assumption — Default values in the calculator are illustrative planning estimates, not provider guarantees.
  • Derived calculation — Results computed from the formulas above. Not audited by any provider.
  • Provider-specific policy — Billing behavior that varies by provider, model and account tier.

Benchmark Data

All prices are pricing snapshots reviewed on June 17, 2026 from official provider pages. Prices change, so always verify against the live provider pricing page before production use. Some video generation models (the OpenAI Sora 2 entries) are deprecated and scheduled for shutdown; check current availability.

Category | Provider   | Model / Service              | Unit          | Input / unit price | Output price | Notes
---------+------------+------------------------------+---------------+--------------------+--------------+------------------------------------
LLM      | Anthropic  | Claude 3.5 Sonnet 4          | per 1M tokens | $3.00              | $15.00       | -
LLM      | Anthropic  | Claude 3.5 Sonnet 4 (cached) | per 1M tokens | $0.30              | -            | Cache read
LLM      | Anthropic  | Claude Sonnet 4              | per 1M tokens | $1.50              | $6.00        | -
LLM      | Anthropic  | Claude Sonnet 4 (cached)     | per 1M tokens | $0.15              | -            | Cache read
LLM      | Anthropic  | Claude Opus 4                | per 1M tokens | $15.00             | $75.00       | -
LLM      | OpenAI     | gpt-4o                       | per 1M tokens | $2.50              | $10.00       | -
LLM      | OpenAI     | gpt-4o-mini                  | per 1M tokens | $0.15              | $0.60        | -
LLM      | OpenAI     | o3                           | per 1M tokens | $15.00             | $60.00       | Reasoning model
LLM      | OpenAI     | o4-mini                      | per 1M tokens | $1.10              | $14.00       | Compact reasoning
LLM      | OpenAI     | gpt-4.1                      | per 1M tokens | $2.00              | $8.00        | -
LLM      | Google     | gemini-2.5-flash             | per 1M tokens | $0.30              | $1.20        | -
LLM      | Google     | gemini-2.0-flash             | per 1M tokens | $0.10              | $0.40        | -
Image    | OpenAI     | gpt-image-2                  | per image     | -                  | -            | Price varies by quality, size, count
Image    | Runway     | gen4_image (720p)            | per image     | $0.05              | -            | 5 credits at $0.01/credit
Image    | Runway     | gen4_image_turbo             | per image     | $0.02              | -            | 2 credits at $0.01/credit
Video    | Runway     | gen4.5                       | per second    | $0.12              | -            | 12 credits/sec at $0.01/credit
Video    | Runway     | gen4_turbo                   | per second    | $0.05              | -            | 5 credits/sec
Video    | Runway     | seedance2_fast (480p/720p)   | per second    | $0.29              | -            | 29 credits/sec
Video    | Runway     | aleph2                       | per second    | $0.28              | -            | 28 credits/sec; 56-credit minimum
Video    | OpenAI     | sora-2 (720p)                | per second    | $0.10              | -            | DEPRECATED, shutdown 2026-09-24
Video    | OpenAI     | sora-2-pro (720p)            | per second    | $0.30              | -            | DEPRECATED, shutdown 2026-09-24
Video    | OpenAI     | sora-2-pro (1080p)           | per second    | $0.70              | -            | DEPRECATED, shutdown 2026-09-24
Video    | OpenAI     | sora-2 (720p batch)          | per second    | $0.05              | -            | DEPRECATED, shutdown 2026-09-24
LLM      | OpenRouter | Various                      | per 1M tokens | varies             | varies       | Prices vary by model and provider. Check openrouter.ai/docs/models

Download CSV

Export the benchmark data as a CSV file for use in your own cost model or spreadsheet. The file includes category, provider, model, billing unit, prices and source URLs.

ai-api-cost-benchmark-2026-06.csv

Snapshot as of June 17, 2026. Prices may have changed. Columns: category, provider, model_or_service, billing_unit, input_price_per_million, cached_input_price_per_million, output_price_per_million, price_per_image, price_per_second, currency, source_url, source_checked_at, notes.
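
As a hedged sketch, the CSV can be loaded with Python's standard csv module using the column names listed above. The price cell format (plain decimals, possibly with a leading "$" or the word "varies") is an assumption. The sample row is invented for illustration and is not taken from the file.

```python
import csv
import io

PRICE_COLUMNS = [
    "input_price_per_million",
    "cached_input_price_per_million",
    "output_price_per_million",
    "price_per_image",
    "price_per_second",
]


def load_benchmark(stream):
    """Parse benchmark rows, converting price cells to float or None."""
    rows = []
    for row in csv.DictReader(stream):
        for column in PRICE_COLUMNS:
            text = (row.get(column) or "").strip().lstrip("$")
            try:
                row[column] = float(text) if text else None
            except ValueError:
                row[column] = None  # non-numeric cell such as "varies"
        rows.append(row)
    return rows


# Invented sample data with the documented header, used in place of the real file.
sample = io.StringIO(
    "category,provider,model_or_service,billing_unit,input_price_per_million,"
    "cached_input_price_per_million,output_price_per_million,price_per_image,"
    "price_per_second,currency,source_url,source_checked_at,notes\n"
    "LLM,ExampleProvider,example-model,per 1M tokens,2.00,0.50,8.00,,,USD,"
    "https://example.com/pricing,2026-06-17,illustrative row\n"
)
rows = load_benchmark(sample)
llm_rows = [row for row in rows if row["category"] == "LLM"]
print(llm_rows[0]["model_or_service"], llm_rows[0]["input_price_per_million"])
# prints: example-model 2.0
```

To read the downloaded file instead of the sample, pass a file object opened with newline="" and encoding="utf-8" to load_benchmark.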

Official Sources

The benchmark data on this page was verified against official provider documentation as of June 17, 2026. Pricing, model availability and API terms may have changed since then. Verify against the live documentation before making integration decisions.

  • Anthropic API Docs: Pricing — Claude model rates and prompt caching
  • OpenAI API Docs: Pricing — GPT-4o, o3, o4-mini, gpt-image-2, Sora 2
  • Google Gemini API: Pricing — Gemini model rates
  • OpenRouter Docs: Models — Marketplace routing and per-model rates
  • Runway Developer API: Pricing — Video and image generation credits

AI Summary

This benchmark page provides an interactive calculator for LLM, coding-agent, image and video generation workloads. It separates the official per-unit rate from a planning budget that accounts for retries, context growth, tool-result token accumulation, failed jobs and user regeneration. Prices were reviewed on June 17, 2026 against official provider documentation. Deprecated Sora 2 models are flagged. The CSV export enables cost modeling in external tools. Estimates must be reconciled with actual provider usage records. AICostPlanner is an independent educational cost-planning site and is not affiliated with any API provider.

Frequently Asked Questions

How do I estimate monthly AI API cost?

Start with the official per-unit price (per token, per image, per second). Then multiply by your monthly volume. Finally, add a buffer for retries, failed jobs and unexpected load — typically 15–30% for LLM workloads and 20–40% for media generation. The calculator above applies this automatically based on the values you enter.

Why does a coding agent cost more than a single chat request?

A coding agent typically makes multiple model calls per task, each with a large input context that includes conversation history, tool definitions and tool-result data. A task that makes 10 model calls with 20,000 input tokens each consumes 200,000 input tokens, roughly 400× the input of a single 500-token chat request. Cache read discounts can partially offset this, but the effect varies by provider and task type.

Should retries be included in an API budget?

Yes. Each retried request is billed again in full, so a request that is retried once costs about twice as much. Averaged over all traffic, a 3% retry rate multiplies the budget by 1.03 and a 20% retry rate multiplies it by 1.20. The calculator applies this adjustment automatically. Track your actual retry rate from logs rather than guessing.

Are failed image or video jobs always charged?

Do not assume. Billing for failed jobs depends on the provider, the specific error type and whether the job was created or partially processed. For Runway, safety failures are not refunded; for OpenAI Sora, consult the current error documentation. Always inspect the job ID and compare against the organization usage dashboard before assuming a universal policy.

How often is the benchmark updated?

The benchmark reflects official provider pricing as of the review date shown on the page (June 17, 2026). Prices change frequently — check the official provider pricing pages directly before making production budget commitments. The CSV file is named with the snapshot date so you can track which version you are using.

Can I use the CSV in my own spreadsheet?

Yes. The CSV file includes columns for category, provider, model, billing unit, input price, output price, source URL and review date. You can import it into any spreadsheet tool, filter by category or provider, and combine it with your own volume estimates. Remember that prices may have changed since the snapshot date.

Does this calculator replace provider billing data?

No. The calculator produces planning estimates. Actual billed amounts depend on real token counts, cache behavior, failure handling and provider-specific billing policies. Always reconcile your estimates against the provider's usage dashboard and actual invoice before committing a production budget.

Compare available model pricing

Use a small prepaid test to verify actual cost before scaling a production workflow.