Pricing Guide · 6 min read · Updated July 2026

How to Think About LLM Pricing

LLM pricing looks simple on the surface — dollars per million tokens — but the real cost of using AI for coding depends on factors most developers overlook. This guide explains how pricing actually works and how to estimate what you'll really pay.

The Basics: Tokens, Input, and Output

LLM providers charge by the token. In English, one token is roughly 0.75 words, or about 3–4 characters; code tokenizes less predictably, so measure it rather than estimating from word counts. You're charged separately for:

Input tokens: Everything you send to the model — your prompt, the code context, conversation history, system instructions.
Output tokens: Everything the model generates — code completions, explanations, function calls.

Most providers charge 3–5x more for output than input, because generation is more computationally expensive. A typical coding interaction might have 5,000 input tokens (your code + prompt) and 2,000 output tokens (the model's response).
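As a rough sketch of that arithmetic, the function below prices a single interaction from its token counts. The function name is illustrative, and the example prices are the Claude Sonnet 5 figures from the pricing table in this guide, not values read from a provider API.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Typical coding interaction from the text: 5,000 input + 2,000 output tokens,
# priced at the Sonnet 5 rates assumed in this guide ($3 in / $15 out per 1M).
cost = request_cost(5_000, 2_000, input_price_per_m=3.00, output_price_per_m=15.00)
print(f"${cost:.3f} per interaction")
```

At these rates the output accounts for two thirds of the cost even though it is under a third of the tokens.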

Current Pricing Landscape (July 2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Relative Cost (vs. Gemini 2.5 Flash)
Gemini 2.5 Flash	~$0.15	~$0.60	1x (baseline)
Yi-Lightning	~$0.14	~$0.43	~1x
DeepSeek-V3	~$0.27	~$0.40	~1.5x
Qwen3-Coder	~$0.50	~$2.00	~3x
Claude Haiku 4.5	$0.80	$4.00	~6x
GPT-5 Mini	~$1.50	~$6.00	~10x
Claude Sonnet 5	$3.00	$15.00	~20x
GPT-5	~$15.00	~$60.00	~100x
Claude Opus 4.8	$15.00	$75.00	~110x

Prices are approximate and subject to change. Check official provider pricing pages for current rates.

What Actually Drives Your Costs

1. Context Window Utilization

The biggest hidden cost driver is how much context you send with each request. A 200K token context window is powerful, but if you're sending 50K tokens of codebase context with every query, your input costs multiply fast:

Light context (2K input per query): $0.006/query on Sonnet 5
Medium context (10K input per query): $0.03/query on Sonnet 5
Full-repo context (100K input per query): $0.30/query on Sonnet 5

At 500 queries per day with full-repo context, that's $150/day in input costs alone, or $4,500 over a 30-day month. The same usage with targeted 5K context drops to about $225/month.
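These figures come from straightforward multiplication, which a short script can reproduce for other context sizes. This is a sketch assuming the $3.00 per 1M input price for Sonnet 5 used in this guide, 500 queries per day, and a 30-day month; it counts input tokens only, and the constant and function names are illustrative.

```python
INPUT_PRICE_PER_M = 3.00   # assumed Sonnet 5 input price, $ per 1M tokens
QUERIES_PER_DAY = 500      # usage level from the text
DAYS_PER_MONTH = 30        # calendar month, matching the $4,500 figure


def monthly_input_cost(context_tokens: int) -> float:
    """Input-only monthly cost for a fixed context size per query."""
    per_query = context_tokens / 1_000_000 * INPUT_PRICE_PER_M
    return per_query * QUERIES_PER_DAY * DAYS_PER_MONTH


for label, tokens in [("light", 2_000), ("targeted", 5_000),
                      ("medium", 10_000), ("full-repo", 100_000)]:
    print(f"{label:>9}: {tokens:>7,} tokens -> ${monthly_input_cost(tokens):,.0f}/month")
```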

2. Prompt Caching

Both Anthropic and OpenAI support prompt caching, which can reduce input costs by up to 90% for repeated context. If you frequently send the same codebase context or system instructions, caching is essential:

Claude: Cache read costs $0.30/M tokens (vs $3.00 for base input on Sonnet)
GPT: Cached input is 50% cheaper than uncached

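💡 Key insight: Prompt caching is often the biggest cost lever for context-heavy AI coding. If you send similar context repeatedly without it, you can pay up to 10x more for those input tokens than necessary. The discount applies only to input tokens that hit the cache; output tokens are billed at the normal rate.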
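To see how the hit rate affects the bill, the sketch below blends cached and uncached input prices. It assumes the Sonnet prices quoted above ($3.00 base, $0.30 cache read) and ignores any surcharge for writing to the cache, so treat the result as an optimistic bound. `hit_rate` is a hypothetical parameter: the fraction of input tokens served from cache.

```python
def effective_input_price(base_price: float, cached_price: float,
                          hit_rate: float) -> float:
    """Blended $ per 1M input tokens when `hit_rate` of tokens are cache reads."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


for hit_rate in (0.0, 0.5, 0.7, 0.9):
    price = effective_input_price(3.00, 0.30, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per 1M input tokens")
```

At a 70% hit rate the blended price is $1.11 per 1M tokens, about 63% below the base rate. The full 90% reduction is reached only when nearly every input token is a cache read.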

3. Output Length Control

Longer outputs cost more, and because most providers charge 3–5x more per output token than per input token, verbose responses add up quickly. Strategies to control output costs:

Set reasonable max_tokens limits based on your actual needs
Ask for concise responses when detailed explanations aren't needed
Use models with efficient tokenization for code (Claude and GPT are both good here)
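A `max_tokens` limit also puts a ceiling on output spend per request, since the response is cut off once the cap is reached. A minimal sketch, assuming the $15.00 per 1M output price listed for Sonnet 5 in this guide; the function name is illustrative:

```python
def max_output_cost(max_tokens: int, output_price_per_m: float) -> float:
    """Upper bound on the output cost of one request capped at `max_tokens`."""
    return max_tokens / 1_000_000 * output_price_per_m


for cap in (256, 1_024, 4_096):
    print(f"max_tokens={cap:>5}: at most ${max_output_cost(cap, 15.00):.4f} per request")
```

Multiplying the ceiling by your daily request volume gives a worst-case output budget, which is a useful sanity check before raising a limit.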

4. Usage Patterns

Your actual cost depends heavily on how you use AI:

Autocomplete-style: Many small requests (100–500 tokens each), high volume. Use a cheap, fast model.
Chat-style: Conversational debugging with longer context. Mid-tier models work well.
Deep analysis: Few requests but very large context (full files/repos). Premium models justified for accuracy.
Batch processing: Documentation, tests, migrations. Use the cheapest capable model — DeepSeek-V3 or Gemini Flash.
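One way to act on these patterns is a small lookup that maps each usage pattern to a model tier. The sketch below is only an illustration: the specific assignments are assumptions based on the tiers described above, and the model names are labels from this guide, not API identifiers.

```python
from enum import Enum


class Task(Enum):
    AUTOCOMPLETE = "autocomplete"
    CHAT = "chat"
    DEEP_ANALYSIS = "deep_analysis"
    BATCH = "batch"


# Hypothetical tier assignments following the four usage patterns.
MODEL_FOR_TASK = {
    Task.AUTOCOMPLETE: "Gemini 2.5 Flash",   # cheap, fast
    Task.CHAT: "Claude Haiku 4.5",           # mid-tier
    Task.DEEP_ANALYSIS: "Claude Sonnet 5",   # premium
    Task.BATCH: "DeepSeek-V3",               # cheapest capable
}


def pick_model(task: Task) -> str:
    """Return the model label assigned to a usage pattern."""
    return MODEL_FOR_TASK[task]


print(pick_model(Task.BATCH))
```

Keeping the mapping in one table makes it easy to swap a tier when prices change, without touching the calling code.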

Real Monthly Cost Estimates

Based on a developer making ~500 AI-assisted coding interactions per day (typical for heavy AI tool users):

Stack	Monthly Input Cost	Monthly Output Cost	Total (approx.)
DeepSeek-V3 only	~$14.85	~$8.80	~$24/mo
Gemini 2.5 Flash only	~$8.25	~$13.20	~$21/mo
GPT-5 Mini only	~$82.50	~$132.00	~$215/mo
Claude Sonnet 5 only	~$165.00	~$330.00	~$495/mo
Sonnet 5 + DeepSeek mix (30/70)	~$59.90	~$105.16	~$165/mo
Sonnet 5 + caching	~$61.05	~$330.00	~$391/mo

Assumptions: 500 queries/day, avg 5K input + 2K output per query, 22 working days/month (55M input and 22M output tokens per month), priced at the rates in the pricing table above. Caching assumes a 70% cache hit rate on input at the $0.30/M cache-read price and ignores cache-write charges. Mix assumes 70% of queries go to the cheaper model.
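A short script makes it easy to rerun this kind of estimate with your own volumes. The sketch below applies the stated assumptions (500 queries/day, 5K input + 2K output per query, 22 working days) to the approximate prices from the pricing table. The function names are illustrative, the 90% cache discount is the Sonnet figure quoted earlier, and cache-write charges are ignored.

```python
PRICES = {  # $ per 1M tokens as (input, output); approximate figures from this guide
    "DeepSeek-V3": (0.27, 0.40),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-5 Mini": (1.50, 6.00),
    "Claude Sonnet 5": (3.00, 15.00),
}
QUERIES_PER_MONTH = 500 * 22          # 500 queries/day, 22 working days
INPUT_TOKENS, OUTPUT_TOKENS = 5_000, 2_000
CACHE_DISCOUNT = 0.90                 # assumed: cache reads cost 10% of base input


def monthly_cost(model, cache_hit_rate=0.0):
    """Return (input_cost, output_cost) in dollars per month for one model."""
    in_price, out_price = PRICES[model]
    in_price *= 1.0 - cache_hit_rate * CACHE_DISCOUNT
    input_cost = QUERIES_PER_MONTH * INPUT_TOKENS / 1_000_000 * in_price
    output_cost = QUERIES_PER_MONTH * OUTPUT_TOKENS / 1_000_000 * out_price
    return input_cost, output_cost


def mix_cost(shares):
    """Blend per-model costs by the share of queries routed to each model."""
    input_cost = sum(s * monthly_cost(m)[0] for m, s in shares.items())
    output_cost = sum(s * monthly_cost(m)[1] for m, s in shares.items())
    return input_cost, output_cost


stacks = {name: monthly_cost(name) for name in PRICES}
stacks["Sonnet 5 + DeepSeek mix"] = mix_cost({"Claude Sonnet 5": 0.3, "DeepSeek-V3": 0.7})
stacks["Sonnet 5 + caching"] = monthly_cost("Claude Sonnet 5", cache_hit_rate=0.7)

for name, (input_cost, output_cost) in stacks.items():
    print(f"{name:<24} ${input_cost:>7.2f} + ${output_cost:>7.2f} "
          f"= ${input_cost + output_cost:>7.2f}")
```

Because output tokens are unaffected by caching, the caching row only moves the input column; routing changes both.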

Practical Cost-Saving Strategies

Route by task complexity: Send simple completions to DeepSeek-V3 or Haiku, complex reasoning to Sonnet or Opus. Routing 70% of queries to the cheaper model can cut costs by roughly 50–65% at the prices listed above.
Enable prompt caching: If you use Claude or GPT via API, structure your prompts so repeated context (system instructions, codebase context) is cacheable.
Trim context: Don't send your entire codebase when the relevant file is 500 lines. Be selective about what context the model actually needs.
Use free tiers: Gemini 2.5 Flash has a generous free tier. GitHub Copilot includes a certain number of completions in the subscription.
Monitor usage: Set up usage alerts. It's easy to accidentally leave a long-context session running and rack up unexpected costs.
Consider subscriptions: For heavy users, flat-rate subscriptions (Copilot, Claude Code) may be cheaper than per-token API pricing — do the math for your usage level.
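For the subscription comparison, the break-even point is the daily query volume at which per-token spend equals the flat fee. The sketch below uses a hypothetical $20/month plan (not a quoted price for any product) and the $0.045 per-query cost of a 5K input + 2K output request at the Sonnet 5 prices assumed in this guide. It ignores any rate limits or usage caps a subscription may impose.

```python
def breakeven_queries_per_day(subscription_per_month: float,
                              cost_per_query: float,
                              days_per_month: int = 22) -> float:
    """Daily query volume above which a flat subscription beats per-token pricing."""
    return subscription_per_month / (cost_per_query * days_per_month)


# Hypothetical $20/month plan vs. $0.045 per API query.
print(f"{breakeven_queries_per_day(20.0, 0.045):.1f} queries/day")
```

Under these assumptions the flat plan wins above roughly 20 queries per working day; substitute your own plan price and per-query cost to check your case.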
