Rule of thumb For English: 1 token ≈ 4 characters or 0.75 words. For German, more like: 1 token ≈ 3 characters or 0.6 words. Special characters, emojis, and code generate a disproportionately high number of tokens.
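The rule of thumb above can be turned into a rough estimator. This is a heuristic sketch only; the ratios are averages, and real counts require the provider's tokenizer:

```python
# Rough token estimates from the rule of thumb; real counts need the
# provider's tokenizer (e.g. tiktoken for OpenAI models).
CHARS_PER_TOKEN = {"en": 4.0, "de": 3.0}
WORDS_PER_TOKEN = {"en": 0.75, "de": 0.6}

def estimate_tokens_by_chars(text: str, lang: str = "en") -> int:
    return round(len(text) / CHARS_PER_TOKEN[lang])

def estimate_tokens_by_words(text: str, lang: str = "en") -> int:
    return round(len(text.split()) / WORDS_PER_TOKEN[lang])
```

Note that the two estimates will disagree on real text; either is close enough for budgeting, not for billing.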
[Interactive token calculator: type or paste text (up to 10,000 characters) to see the live token count, characters/words, and estimated cost.]

Cost breakdown (per request)

Turn  | Input | Overhead | Output | Total turn
1     | 850   | 9        | 200    | 1,059
2     | 1,100 | 15       | 200    | 1,315
3     | 1,350 | 21       | 200    | 1,571
4     | 1,600 | 27       | 200    | 1,827
5     | 1,850 | 33       | 200    | 2,083
Total | 6,750 | 105      | 1,000  | 7,855
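The per-turn figures follow a simple pattern: every turn re-sends the system prompt plus the growing chat history, and overhead is 3 tokens per message plus 3 per request. A sketch that reproduces the table, assuming an 800-token system prompt and 50-token user messages (both sizes reconstructed from the numbers, not stated in the text):

```python
SYSTEM = 800   # assumed system prompt size (reconstructed from the table)
USER = 50      # assumed user message size per turn (reconstructed)
OUTPUT = 200   # assistant response length per turn (from the table)

def turn_token_rows(turns: int = 5):
    rows = []
    for n in range(1, turns + 1):
        messages = 2 * n                     # system+user on turn 1, +2 messages per turn
        overhead = 3 * messages + 3          # 3 per message, plus 3 per request
        inp = SYSTEM + USER * n + OUTPUT * (n - 1)  # prompt + growing history
        rows.append((n, inp, overhead, OUTPUT, inp + overhead + OUTPUT))
    return rows

for row in turn_token_rows():
    print(row)
```

The key takeaway is visible in the `inp` line: input cost grows linearly with conversation length because the entire history is re-sent on every turn.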
Scenario                                                      | Per conversation | At 100,000 conv./month
Standard (no caching, no batch)                               | $0.0356          | $3,560
With prompt caching (system prompt cached from turn 2 onward) | $0.0269          | $2,692
With caching + Batch API (50% discount)                       | $0.0135          | $1,346
What this means: The default chatbot implementation (no caching, no batch, everything in real time) is the most expensive option. Anyone serious about costs builds prompt caching in from the start; with OpenAI it kicks in automatically for sufficiently long prompts, and with Anthropic it is a small addition to the request. The Batch API is only relevant where response time is not critical.
Batch API
Asynchronous processing mode with a 50% discount on input and output. Responses arrive within 24 hours instead of in real time.
Cached Input
Input tokens that were already processed in an earlier call and temporarily stored by the provider. They are billed at about 10% of the normal price.
Context Window
Maximum number of tokens a model can process in a single request (input + output combined).
Input Token
A token sent to the AI — everything that appears in the prompt: system prompt, user message, chat history.
Max Output
Maximum number of tokens the model can produce in a single response. It is lower than the context window.
Output Token
A token generated by the AI in its response. Typically 4–5x more expensive than input.
Overhead Tokens
Invisible control tokens that every message in a chat request receives — typically 3 per message plus 3 for the overall request.
Prompt Caching
Mechanism by which providers temporarily store recurring prompt parts (typically system prompts) and bill them at the cached price on the next request.
System Prompt
The instruction at the beginning of an AI request that defines the model’s behavior (persona, response style, constraints). It is sent again with every request in a conversation.
Token
Smallest processing unit of a language model. Subword-based, typically 3–5 characters long. Each provider has its own tokenizer.
Tokenizer
Algorithm that breaks text into tokens. OpenAI uses tiktoken, other providers use proprietary methods.

Pricing as of: May 2026. Always check the providers’ official pages for current rates.