Cost Optimization
Practical strategies to reduce your AI costs while maintaining quality and performance.
Understanding AI Costs
AI providers charge per token, a sub-word unit of text (roughly four characters of English). Costs vary dramatically by:
- Model size: GPT-4o costs ~10x more than GPT-3.5 Turbo
- Input vs Output: Output tokens often cost 2-3x more than input tokens
- Context window: Longer conversations resend earlier messages as input on every turn, so costs accumulate
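To make the pricing factors above concrete, here is a minimal sketch that estimates cost from token counts. The `TokenPrice` shape, the `estimateCostUsd` name, and the prices are illustrative assumptions, not any provider's published rates.

```typescript
// Per-million-token prices in USD. The values used below are illustrative only.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost = (input tokens * input price + output tokens * output price) / 1,000,000.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  price: TokenPrice
): number {
  return (
    (inputTokens * price.inputPerMillion +
      outputTokens * price.outputPerMillion) /
    1e6
  );
}

// Hypothetical price: $5 per 1M input tokens, $15 per 1M output tokens.
const examplePrice: TokenPrice = { inputPerMillion: 5, outputPerMillion: 15 };

// 1,000 requests per day, each with 500 input tokens and 200 output tokens.
const dailyCostUsd = 1000 * estimateCostUsd(500, 200, examplePrice);
console.log(dailyCostUsd.toFixed(2)); // 5.50
```

Under these assumed prices, output accounts for $3.00 of the $5.50 even though it is fewer than half as many tokens as the input.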
Strategy 1: Choose the Right Model
Not every task needs GPT-4o. Match the model to the complexity:
| Task Type | Recommended Model | Cost/1M tokens |
|---|---|---|
| Simple classification, tagging | GPT-3.5 Turbo / Gemini 2.0 Flash | $0.50 - $1.50 |
| Summarization, extraction | Claude 3 Haiku / Gemini 2.5 Flash | $1.00 - $5.00 |
| Complex reasoning, analysis | GPT-4o / Claude 3 Sonnet | $5.00 - $15.00 |
| Critical decisions, legal work | GPT-4 Turbo / Claude 3 Opus | $15.00 - $30.00 |
Implementation Example:
model-routing.ts
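A minimal sketch of routing by task type, following the tiers in the table above. The `TaskType` names and the `selectModel` helper are hypothetical, and the model identifier strings are placeholders; substitute the IDs your provider actually expects.

```typescript
// Task categories, one per row of the table above.
type TaskType = "classification" | "summarization" | "reasoning" | "critical";

// Placeholder model identifiers; replace with the IDs your provider expects.
const MODEL_FOR_TASK: Record<TaskType, string> = {
  classification: "gpt-3.5-turbo",
  summarization: "claude-3-haiku",
  reasoning: "gpt-4o",
  critical: "gpt-4-turbo",
};

// Return the model tier recommended for the given task type.
function selectModel(task: TaskType): string {
  return MODEL_FOR_TASK[task];
}

console.log(selectModel("classification")); // gpt-3.5-turbo
```

Keeping the mapping in one table-like constant means a pricing change only requires editing one place.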
Strategy 2: Minimize Token Usage
1. Keep Prompts Concise
Remove unnecessary words and examples:
prompt-optimization.ts
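As a rough illustration, the sketch below compares a verbose prompt with a concise one. The `approxTokens` helper is an assumption that uses the common heuristic of about four characters per token; real tokenizers differ, so treat the numbers as relative only.

```typescript
// Rough heuristic: about 4 characters per token for English text.
// Real tokenizers differ; use this only for relative comparisons.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const verbosePrompt =
  "Hello! I would really appreciate it if you could please take a moment " +
  "to carefully read the following customer review and then tell me whether " +
  "the overall sentiment is positive, negative, or neutral. Thank you!";

const concisePrompt =
  "Classify the sentiment of this review as positive, negative, or neutral:";

// The concise prompt carries the same instruction in far fewer tokens.
console.log(approxTokens(verbosePrompt), approxTokens(concisePrompt));
```

The saving repeats on every request, so trimming a shared prompt template pays off in proportion to traffic.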
2. Limit max_tokens
limit-tokens.ts
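A sketch of capping output length per task. The `ChatRequest` shape is a hypothetical one modeled on common chat-completion APIs, and the cap values in `OUTPUT_CAPS` are illustrative starting points rather than recommendations from any provider.

```typescript
// Hypothetical request shape, modeled on common chat-completion APIs.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  max_tokens: number;
}

// Illustrative output caps per kind of task.
const OUTPUT_CAPS = { label: 10, summary: 150, answer: 500 } as const;

function buildRequest(
  model: string,
  userText: string,
  task: keyof typeof OUTPUT_CAPS
): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: userText }],
    max_tokens: OUTPUT_CAPS[task], // ceiling on generated (billed) output tokens
  };
}

const labelRequest = buildRequest("gpt-3.5-turbo", "Tag this ticket: ...", "label");
console.log(labelRequest.max_tokens); // 10
```

A cap that is too low cuts responses off mid-sentence, so it is worth checking a sample of real outputs before tightening it.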
3. Truncate Long Inputs
truncate-input.ts
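One simple way to truncate is by an approximate token budget. The `truncateToTokens` helper below is a sketch that assumes about four characters per token and cuts at a word boundary; a real tokenizer would give exact counts.

```typescript
// Truncate text to an approximate token budget (about 4 characters per token).
// Cuts at the last space before the limit so a word is not split in half,
// then appends "..." to signal that text was removed.
function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (text.length <= maxChars) return text;
  const slice = text.slice(0, maxChars);
  const lastSpace = slice.lastIndexOf(" ");
  return (lastSpace > 0 ? slice.slice(0, lastSpace) : slice) + "...";
}

console.log(truncateToTokens("one two three four five six", 3)); // one two...
```

For documents where the important content is not at the start, keeping the beginning and the end, or summarizing in chunks, may preserve more signal than a head-only cut.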
Strategy 3: Cache Repeated Queries
If users ask the same questions frequently, cache the responses:
simple-cache.ts
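A minimal in-memory cache sketch with a time-to-live. `ResponseCache`, `cachedAsk`, and the `callModel` parameter are hypothetical names; `callModel` stands in for whatever client call you actually use.

```typescript
// In-memory cache with a time-to-live, keyed by a normalized prompt.
class ResponseCache {
  private readonly ttlMs: number;
  private entries = new Map<string, { value: string; expiresAt: number }>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Normalize so trivially different phrasings share one entry.
  private key(promptText: string): string {
    return promptText.trim().toLowerCase().replace(/\s+/g, " ");
  }

  get(promptText: string, now: number = Date.now()): string | undefined {
    const k = this.key(promptText);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(k); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(promptText: string, value: string, now: number = Date.now()): void {
    this.entries.set(this.key(promptText), { value, expiresAt: now + this.ttlMs });
  }
}

// Wrap a model call; `callModel` is a stand-in for your real client.
async function cachedAsk(
  cache: ResponseCache,
  promptText: string,
  callModel: (p: string) => Promise<string>
): Promise<string> {
  const hit = cache.get(promptText);
  if (hit !== undefined) return hit; // cache hit: no model call is made
  const answer = await callModel(promptText);
  cache.set(promptText, answer);
  return answer;
}
```

This cache lives in one process and grows without bound; a shared store with eviction would be the usual next step for production traffic.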
Strategy 4: Batch Multiple Tasks
Process multiple items in a single request:
batching.ts
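A sketch of batching several items into one request and mapping the reply back to the inputs. The prompt wording, the `N: label` reply format, and both helper names are assumptions; models do not always follow a requested format, so the parser leaves `null` for anything it cannot match.

```typescript
// Build one prompt that asks for a label per numbered item.
function buildBatchPrompt(items: string[]): string {
  const numbered = items.map((item, i) => `${i + 1}. ${item}`).join("\n");
  return (
    "Classify the sentiment of each review as positive, negative, or neutral.\n" +
    "Reply with one line per review in the form N: label.\n\n" +
    numbered
  );
}

// Parse lines like "2: negative" into an array aligned with the input order.
// Missing or malformed lines stay null so the caller can retry those items.
function parseBatchResponse(
  responseText: string,
  count: number
): (string | null)[] {
  const labels: (string | null)[] = new Array(count).fill(null);
  for (const line of responseText.split("\n")) {
    const match = line.match(/^\s*(\d+)\s*[:.]\s*(\w+)/);
    if (!match) continue;
    const index = Number(match[1]) - 1;
    const label = match[2];
    if (label !== undefined && index >= 0 && index < count) {
      labels[index] = label.toLowerCase();
    }
  }
  return labels;
}

console.log(parseBatchResponse("1: Positive\n2: negative", 2)); // [ 'positive', 'negative' ]
```

Batching saves the repeated instruction text on each request; very large batches can reduce per-item accuracy, so the batch size is worth tuning.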
Strategy 5: Manage Conversation History
For chatbots, limit conversation history to recent messages:
context-management.ts
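A sketch of trimming history before each request. The `ChatMessage` shape and `trimHistory` helper are assumptions; the sketch keeps all system messages (assumed to come first in the conversation) plus the most recent few exchanges.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep every system message plus the most recent `maxRecent` other messages.
// System messages are placed first in the result.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Guard against slice(-0), which would return the whole array.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

Because each turn resends the retained history as input, capping it at a fixed number of messages keeps per-request input cost roughly constant instead of growing with conversation length.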
Strategy 6: Monitor Costs in Real-Time
Cencori tracks costs automatically. Use the dashboard to:
- View daily cost trends
- Compare costs by model
- Identify expensive queries
- Set up low balance alerts
Check costs programmatically:
check-costs.ts
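The sketch below covers only what you might do with cost data once you have it; it does not call the Cencori API, whose methods are not reproduced here. The `UsageRecord` shape and both helper names are assumptions to adapt to whatever your cost source returns.

```typescript
// Hypothetical shape of one usage record; adapt it to your actual data source.
interface UsageRecord {
  model: string;
  costUsd: number;
}

// Sum cost per model so the most expensive models stand out.
function costByModel(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.model, (totals.get(r.model) ?? 0) + r.costUsd);
  }
  return totals;
}

// True when total spend has reached the given daily budget.
function isOverBudget(records: UsageRecord[], dailyBudgetUsd: number): boolean {
  const total = records.reduce((sum, r) => sum + r.costUsd, 0);
  return total >= dailyBudgetUsd;
}
```

A check like `isOverBudget` could run on a schedule and trigger an alert or switch traffic to a cheaper model tier.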
Real-World Cost Examples
| Scenario | Without Optimization | With Optimization | Savings |
|---|---|---|---|
| 1000 sentiment analyses/day | GPT-4o: $50/day | GPT-3.5: $5/day | 90% ($1,350/mo) |
| Customer support chatbot | Full history: $200/day | 5 msg history: $50/day | 75% ($4,500/mo) |
| Document summarization | Full docs: $100/day | Truncated: $30/day | 70% ($2,100/mo) |
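The savings column follows from simple arithmetic on the two daily costs, assuming a 30-day month. A small sketch, with `savings` as a hypothetical helper name:

```typescript
// Savings between two daily costs, projected over a 30-day month.
function savings(beforeUsd: number, afterUsd: number) {
  const dailySaved = beforeUsd - afterUsd;
  return {
    percent: Math.round((100 * dailySaved) / beforeUsd),
    monthlyUsd: dailySaved * 30,
  };
}

console.log(savings(50, 5));   // { percent: 90, monthlyUsd: 1350 }
console.log(savings(200, 50)); // { percent: 75, monthlyUsd: 4500 }
console.log(savings(100, 30)); // { percent: 70, monthlyUsd: 2100 }
```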
Quick Wins Checklist
- ✅ Use GPT-3.5 Turbo or Gemini 2.5 Flash for simple tasks
- ✅ Set max_tokens limits
- ✅ Cache frequently asked questions
- ✅ Batch process when possible
- ✅ Truncate long inputs
- ✅ Use streaming for better UX without extra cost
- ✅ Monitor costs daily in the Cencori dashboard

