Pricing Guide · Updated July 2026

How to Think About LLM Pricing

LLM pricing looks simple on the surface — dollars per million tokens — but the real cost of using AI for coding depends on factors most developers overlook. This guide explains how pricing actually works and how to estimate what you'll really pay.

The Basics: Tokens, Input, and Output

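Every LLM provider charges based on tokens: roughly 0.75 words per token in English, or about 3–4 characters. Code is more variable, because whitespace, symbols, and identifiers split into tokens less predictably than prose. You're charged separately for input tokens (everything you send: your prompt, instructions, and code context) and output tokens (everything the model generates).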

Most providers charge 3–5x more for output than input, because generation is more computationally expensive. A typical coding interaction might have 5,000 input tokens (your code + prompt) and 2,000 output tokens (the model's response).
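To price a single interaction, divide each token count by one million and multiply by the matching rate. A minimal sketch, using the 5,000-input / 2,000-output interaction above and the Claude Sonnet 5 list prices from the table below ($3.00 input / $15.00 output per million tokens); the function name is illustrative.

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request, given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# 5K input + 2K output tokens at $3.00 / $15.00 per million tokens
cost = interaction_cost(5_000, 2_000, 3.00, 15.00)
print(f"${cost:.3f} per interaction")  # $0.045 per interaction
```

Even though the response is less than half the size of the input, it accounts for two-thirds of the cost ($0.030 of $0.045).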

Current Pricing Landscape (July 2026)

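Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost (vs Gemini 2.5 Flash)
Gemini 2.5 Flash | ~$0.15 | ~$0.60 | 1x (baseline)
DeepSeek-V3 | ~$0.27 | ~$0.40 | ~1.1x
Yi-Lightning | ~$0.14 | ~$0.43 | ~0.8x
Claude Haiku 4.5 | $0.80 | $4.00 | ~6x
GPT-5 Mini | ~$1.50 | ~$6.00 | ~10x
Claude Sonnet 5 | $3.00 | $15.00 | ~23x
Qwen3-Coder | ~$0.50 | ~$2.00 | ~3x
GPT-5 | ~$15.00 | ~$60.00 | ~100x
Claude Opus 4.8 | $15.00 | $75.00 | ~115x

Relative cost compares the price of one interaction with 5K input and 2K output tokens against Gemini 2.5 Flash.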

Prices are approximate and subject to change. Check official provider pricing pages for current rates.
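Relative cost depends on your own input/output mix, so it is worth recomputing for your traffic. A minimal sketch, assuming the approximate prices from the table and a 5K-input / 2K-output interaction; `INPUT_TOKENS` and `OUTPUT_TOKENS` are illustrative defaults to replace with your own averages, since the ratios shift as the mix changes.

```python
# Approximate (input, output) prices in dollars per 1M tokens, from the table above.
PRICES = {
    "Gemini 2.5 Flash": (0.15, 0.60),
    "DeepSeek-V3": (0.27, 0.40),
    "Yi-Lightning": (0.14, 0.43),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-5 Mini": (1.50, 6.00),
    "Claude Sonnet 5": (3.00, 15.00),
    "Qwen3-Coder": (0.50, 2.00),
    "GPT-5": (15.00, 60.00),
    "Claude Opus 4.8": (15.00, 75.00),
}

# Assumed average interaction; replace with your own measurements.
INPUT_TOKENS = 5_000
OUTPUT_TOKENS = 2_000


def per_query_cost(model: str) -> float:
    """Dollar cost of one interaction for the given model."""
    input_price, output_price = PRICES[model]
    return (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000


def relative_costs(baseline: str = "Gemini 2.5 Flash") -> dict[str, float]:
    """Each model's per-interaction cost as a multiple of the baseline's."""
    base = per_query_cost(baseline)
    return {model: per_query_cost(model) / base for model in PRICES}


# Print models from cheapest to most expensive for this mix.
for model, ratio in sorted(relative_costs().items(), key=lambda kv: kv[1]):
    print(f"{model:<18} {ratio:6.1f}x")
```

A prompt-heavy workload (large context, short answers) narrows the gap for models with cheap output, while a generation-heavy one widens it.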

What Actually Drives Your Costs

1. Context Window Utilization

The biggest hidden cost driver is how much context you send with each request. A 200K-token context window is powerful, but if you're sending 50K tokens of codebase context with every query, your input costs multiply fast.

At 500 queries per day with full-repo context, that's 25M input tokens a day: about $75/day at Claude Sonnet 5's $3.00 per million, or roughly $2,250 over a 30-day month. The same usage with targeted 5K context drops to ~$225/month.
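The arithmetic is context tokens × queries per day × input price. A minimal sketch, assuming Claude Sonnet 5's $3.00 per million input tokens and a 30-day month; it counts only the context portion of the input, not output.

```python
def monthly_context_cost(context_tokens: int, queries_per_day: int,
                         input_price_per_m: float, days: int = 30) -> float:
    """Dollar cost per month of the context sent with every query (input side only)."""
    tokens_per_day = context_tokens * queries_per_day
    daily_cost = tokens_per_day / 1_000_000 * input_price_per_m
    return daily_cost * days


SONNET_INPUT = 3.00  # $ per 1M input tokens (assumed, from the pricing table)

full_repo = monthly_context_cost(50_000, 500, SONNET_INPUT)  # 25M tokens/day
targeted = monthly_context_cost(5_000, 500, SONNET_INPUT)    # 2.5M tokens/day
print(f"full repo: ${full_repo:,.0f}/month, targeted: ${targeted:,.0f}/month")
# full repo: $2,250/month, targeted: $225/month
```

Because this cost scales linearly with context size, cutting context tenfold cuts this part of the bill tenfold.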

2. Prompt Caching

Both Anthropic and OpenAI support prompt caching, which can reduce the cost of repeated input by up to 90%: tokens read from the cache are billed at a fraction of the normal input rate. If you frequently send the same codebase context or system instructions, caching is essential.

💡 Key insight: Prompt caching is the single biggest cost lever for AI coding. If you send similar context repeatedly without it, you're likely paying 5–10x more than necessary for that repeated input.
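How much caching saves depends on how often the cached context is reused. A minimal sketch of the blended input price, assuming cache hits are billed at 10% of the normal rate (the "up to 90%" case) and ignoring cache-write surcharges, which some providers do charge; set the discount from your provider's pricing page and the hit rate from your own usage logs.

```python
def effective_input_price(base_price_per_m: float, hit_rate: float,
                          cached_fraction_of_price: float = 0.10) -> float:
    """Blended $ per 1M input tokens when `hit_rate` of input tokens are cache hits.

    Assumes cache hits cost `cached_fraction_of_price` x the base rate and that
    cache misses cost the normal rate (no write surcharge is modeled).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_cost = (1.0 - hit_rate) * base_price_per_m
    hit_cost = hit_rate * base_price_per_m * cached_fraction_of_price
    return miss_cost + hit_cost


# Claude Sonnet 5 input at $3.00 per 1M tokens
for hit_rate in (0.0, 0.7, 0.9):
    price = effective_input_price(3.00, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per 1M input tokens")
```

At a 70% hit rate the blended price is $1.11 per million, about 37% of the uncached $3.00.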

3. Output Length Control

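Longer outputs cost more, since output tokens carry the higher rate. To control output costs, cap the maximum response length and ask for only what you need, such as a diff or a single function rather than a whole rewritten file.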
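Because output is the pricier side, trimming the response often saves more than trimming the prompt. A minimal sketch comparing a 2,000-token response with a hypothetical 400-token one (for example, a diff instead of a full file), at the Claude Sonnet 5 prices from the table; the token counts are illustrative.

```python
SONNET_INPUT, SONNET_OUTPUT = 3.00, 15.00  # $ per 1M tokens (from the pricing table)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one Sonnet 5 request."""
    return (input_tokens * SONNET_INPUT + output_tokens * SONNET_OUTPUT) / 1_000_000


full_file = request_cost(5_000, 2_000)  # model rewrites the whole file
diff_only = request_cost(5_000, 400)    # hypothetical: model returns only a diff
saving = 1 - diff_only / full_file
print(f"${full_file:.3f} vs ${diff_only:.3f} per request ({saving:.0%} cheaper)")
```

Removing the same 1,600 tokens from the prompt instead would save only about 11% of the request cost.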

4. Usage Patterns

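Your actual cost depends heavily on how you use AI: how many queries you make per day, how much context each one carries, and how long the responses run.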

Real Monthly Cost Estimates

Based on a developer making ~500 AI-assisted coding interactions per day (typical for heavy AI tool users):

Stack | Monthly Input Cost | Monthly Output Cost | Total (approx.)
DeepSeek-V3 only | ~$14.85 | ~$8.80 | ~$24/mo
Gemini 2.5 Flash only | ~$8.25 | ~$13.20 | ~$21/mo
GPT-5 Mini only | ~$82.50 | ~$132.00 | ~$215/mo
Claude Sonnet 5 only | ~$165.00 | ~$330.00 | ~$495/mo
Sonnet 5 (30%) + DeepSeek-V3 (70%) mix | ~$59.90 | ~$105.16 | ~$165/mo
Sonnet 5 + caching | ~$61.05 | ~$330.00 | ~$391/mo

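Assumptions: 500 queries/day, avg 5K input + 2K output tokens per query, 22 working days/month (11,000 queries, or 55M input and 22M output tokens). Caching assumes a 70% cache hit rate on input, with cache hits billed at 10% of the normal input price and no cache-write surcharge. The mix sends 70% of queries to DeepSeek-V3 and 30% to Sonnet 5.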
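The monthly figures come from scaling per-query token counts up to a month. A minimal sketch under this section's stated assumptions (500 queries/day, 5K input + 2K output tokens, 22 working days, 70% cache hits billed at 10% of the input price, and a 70% DeepSeek-V3 / 30% Sonnet 5 mix); every constant is an illustrative default you can change to match your own usage.

```python
from dataclasses import dataclass

QUERIES_PER_DAY = 500
WORKING_DAYS = 22
INPUT_PER_QUERY = 5_000
OUTPUT_PER_QUERY = 2_000


@dataclass
class Price:
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens


DEEPSEEK = Price(0.27, 0.40)
GEMINI_FLASH = Price(0.15, 0.60)
GPT5_MINI = Price(1.50, 6.00)
SONNET = Price(3.00, 15.00)


def monthly_cost(price: Price, share: float = 1.0,
                 input_multiplier: float = 1.0) -> tuple[float, float]:
    """(input $, output $) per month for `share` of all queries on one model.

    `input_multiplier` scales the input price, e.g. to model prompt caching.
    """
    queries = QUERIES_PER_DAY * WORKING_DAYS * share
    input_cost = queries * INPUT_PER_QUERY / 1e6 * price.input_per_m * input_multiplier
    output_cost = queries * OUTPUT_PER_QUERY / 1e6 * price.output_per_m
    return input_cost, output_cost


# 70% cache hits billed at 10% of the input price: 0.3 + 0.7 * 0.1 = 0.37
CACHE_MULTIPLIER = 0.3 + 0.7 * 0.1

stacks = {
    "DeepSeek-V3 only": [monthly_cost(DEEPSEEK)],
    "Gemini 2.5 Flash only": [monthly_cost(GEMINI_FLASH)],
    "GPT-5 Mini only": [monthly_cost(GPT5_MINI)],
    "Claude Sonnet 5 only": [monthly_cost(SONNET)],
    "Sonnet 5 (30%) + DeepSeek-V3 (70%)": [monthly_cost(SONNET, 0.3),
                                           monthly_cost(DEEPSEEK, 0.7)],
    "Sonnet 5 + caching": [monthly_cost(SONNET, input_multiplier=CACHE_MULTIPLIER)],
}

for name, parts in stacks.items():
    inp = sum(p[0] for p in parts)
    out = sum(p[1] for p in parts)
    print(f"{name:<36} in ${inp:8.2f}  out ${out:8.2f}  total ${inp + out:8.2f}")
```

Note that caching leaves the output bill untouched, which is why routing to a cheaper model saves more here than caching does.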

Practical Cost-Saving Strategies

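  1. Route by task complexity: Send simple completions to DeepSeek-V3 or Haiku, and complex reasoning to Sonnet or Opus. At the prices above, sending 70% of queries to the cheaper model cuts a Sonnet-only bill by roughly half with Haiku and by about two-thirds with DeepSeek-V3.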
  2. Enable prompt caching: If you use Claude or GPT via API, put repeated context (system instructions, codebase context) at the start of the prompt and keep it identical between requests, since caches match on a shared prompt prefix.
  3. Trim context: Don't send your entire codebase when the relevant file is 500 lines. Be selective about what context the model actually needs.
  4. Use free tiers: Gemini 2.5 Flash has a generous free tier. GitHub Copilot includes a certain number of completions in the subscription.
  5. Monitor usage: Set up usage alerts. It's easy to accidentally leave a long-context session running and rack up unexpected costs.
  6. Consider subscriptions: For heavy users, flat-rate subscriptions (Copilot, Claude Code) may be cheaper than per-token API pricing — do the math for your usage level.
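Doing the math for item 6 can be as simple as finding the daily query volume at which pay-per-token spending reaches the flat fee. A minimal sketch; the $100/month subscription price is a hypothetical placeholder, not a quote for Copilot or Claude Code, the per-query cost uses the Sonnet 5 prices and 5K/2K interaction from earlier in this guide, and it assumes the subscription would cover your full usage without caps, which you should verify.

```python
import math


def api_cost_per_query(input_tokens: int, output_tokens: int,
                       input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API request."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def break_even_queries_per_day(subscription_per_month: float, cost_per_query: float,
                               working_days: int = 22) -> int:
    """Smallest daily query count at which pay-per-token costs at least the subscription."""
    return math.ceil(subscription_per_month / (cost_per_query * working_days))


SUBSCRIPTION = 100.00  # hypothetical flat monthly price, for illustration only
cost = api_cost_per_query(5_000, 2_000, 3.00, 15.00)  # Sonnet 5, ~$0.045 per query
threshold = break_even_queries_per_day(SUBSCRIPTION, cost)
print(f"Pay-per-token reaches ${SUBSCRIPTION:.0f}/month at about {threshold} queries/day")
```

Above that volume, a flat fee at that price is cheaper; below it, paying per token wins.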
