Cost Optimization

Practical strategies to reduce your AI costs while maintaining quality and performance.

Understanding AI Costs

AI providers charge per token, a unit of text that is typically a word or a fragment of a word. Costs vary dramatically by:

  • Model size: GPT-4o costs ~10x more than GPT-3.5 Turbo
  • Input vs Output: Output tokens often cost 2-3x more
  • Context window: Each request resends the conversation so far, so longer conversations cost more per message
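
As a rough illustration of how these factors combine, the cost of one request can be estimated from its token counts and the per-million-token prices. The names `TokenPrice` and `estimateRequestCost` are placeholders chosen for this sketch, and the prices used below are hypothetical rather than quoted rates.

```typescript
// Prices in dollars per one million tokens (placeholder shape, not a provider API).
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Estimated dollar cost of a single request.
function estimateRequestCost(
  inputTokens: number,
  outputTokens: number,
  price: TokenPrice
): number {
  const inputCost = (inputTokens / 1e6) * price.inputPerMillion;
  const outputCost = (outputTokens / 1e6) * price.outputPerMillion;
  return inputCost + outputCost;
}
```

With a hypothetical price of $0.50 (input) and $1.50 (output) per million tokens, a request with 2,000 input tokens and 500 output tokens comes to about $0.00175.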

Strategy 1: Choose the Right Model

Not every task needs GPT-4o. Match the model to the complexity:

Task Type                      | Recommended Model                 | Cost/1M tokens
Simple classification, tagging | GPT-3.5 Turbo / Gemini 2.0 Flash  | $0.50 - $1.50
Summarization, extraction      | Claude 3 Haiku / Gemini 2.5 Flash | $1.00 - $5.00
Complex reasoning, analysis    | GPT-4o / Claude 3 Sonnet          | $5.00 - $15.00
Critical decisions, legal work | GPT-4 Turbo / Claude 3 Opus       | $15.00 - $30.00

Implementation Example:

model-routing.ts
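
A minimal sketch of what such routing might look like, assuming the four task categories from the table. The `TaskType` names, the `selectModel` helper, and the model identifier strings are illustrative; the exact identifiers depend on your provider.

```typescript
// Task categories mirroring the rows of the table above.
type TaskType = "classification" | "summarization" | "reasoning" | "critical";

// Illustrative model identifiers; check your provider for the exact strings.
const MODEL_BY_TASK: Record<TaskType, string> = {
  classification: "gpt-3.5-turbo",
  summarization: "claude-3-haiku",
  reasoning: "gpt-4o",
  critical: "claude-3-opus",
};

// Return the model recommended for the given task type.
function selectModel(task: TaskType): string {
  return MODEL_BY_TASK[task];
}
```

Calling `selectModel("classification")` returns the cheapest entry in the map, so simple tasks never reach a premium model by default.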

Strategy 2: Minimize Token Usage

1. Keep Prompts Concise

Remove unnecessary words and examples:

prompt-optimization.ts
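
To illustrate the idea, the sketch below compares a wordy prompt with a concise one. The `estimateTokens` helper is an assumption of this example: it uses the common rule of thumb of roughly four characters per token for English text, which is only an approximation. Use your provider's tokenizer for exact counts.

```typescript
// Rough heuristic: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Wordy version: politeness and filler add tokens without adding information.
const verbosePrompt =
  "Hello! I would really appreciate it if you could please take a moment to " +
  "carefully read the following customer review and then let me know whether " +
  "the overall sentiment expressed in it is positive, negative, or neutral.";

// Concise version: the same instruction in far fewer tokens.
const concisePrompt =
  "Classify the sentiment of this review as positive, negative, or neutral.";

// Approximate number of tokens saved on every request that uses the prompt.
const tokensSaved = estimateTokens(verbosePrompt) - estimateTokens(concisePrompt);
```

Because the instruction is sent with every request, the saving in `tokensSaved` is multiplied by your request volume.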

2. Limit max_tokens

limit-tokens.ts
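
One possible way to apply this is to set an output cap per task rather than one global value. The `ChatRequest` shape, the `buildRequest` helper, and the cap values below are assumptions made for this sketch; only the `max_tokens` field name comes from the text above.

```typescript
// Minimal request shape for illustration; real SDK request types may differ.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  max_tokens: number;
}

// Example output caps per task; tune these to your own responses.
const MAX_TOKENS_BY_TASK = {
  classification: 10, // a single label
  summary: 200, // a short paragraph
  analysis: 1000, // a detailed answer
};

// Build a request whose output length is capped according to the task.
function buildRequest(
  task: keyof typeof MAX_TOKENS_BY_TASK,
  model: string,
  prompt: string
): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: MAX_TOKENS_BY_TASK[task],
  };
}
```

A cap that is too low cuts responses off mid-sentence, so it is worth checking real outputs before tightening the values.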

3. Truncate Long Inputs

truncate-input.ts
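
A simple character-based sketch of truncation follows. The `truncateInput` name and the choice to cut at a word boundary are assumptions of this example; a token-based limit using your provider's tokenizer would be more precise.

```typescript
// Cut text to at most maxChars characters, backing up to the last space
// so that the result does not end in the middle of a word.
function truncateInput(text: string, maxChars: number): string {
  if (text.length <= maxChars) {
    return text;
  }
  const cut = text.slice(0, maxChars);
  const lastSpace = cut.lastIndexOf(" ");
  return lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
}
```

This keeps only the beginning of the input, which suits documents whose key content comes first. For other inputs, keeping the start and the end may work better.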

Strategy 3: Cache Repeated Queries

If users ask the same questions frequently, cache the responses:

simple-cache.ts
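
A minimal in-memory sketch of such a cache, assuming responses may be reused for a fixed time-to-live. The names `createPromptCache` and `cachedComplete` are hypothetical, and the `complete` callback stands in for whatever function actually calls your model.

```typescript
interface CacheEntry {
  response: string;
  expiresAt: number; // timestamp in milliseconds
}

// Create a cache keyed by the normalized prompt text.
// The clock is injectable so that expiry can be tested.
function createPromptCache(ttlMs: number, now: () => number = () => Date.now()) {
  const entries = new Map<string, CacheEntry>();
  const keyOf = (prompt: string): string => prompt.trim().toLowerCase();

  return {
    get(prompt: string): string | undefined {
      const key = keyOf(prompt);
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.response;
    },
    set(prompt: string, response: string): void {
      entries.set(keyOf(prompt), { response, expiresAt: now() + ttlMs });
    },
  };
}

// Return the cached response if present; otherwise call the model and store the result.
async function cachedComplete(
  cache: ReturnType<typeof createPromptCache>,
  prompt: string,
  complete: (prompt: string) => Promise<string>
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit;
  const response = await complete(prompt);
  cache.set(prompt, response);
  return response;
}
```

An in-process `Map` is lost on restart and is not shared between servers. For production traffic, a shared store would be the more likely choice.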

Strategy 4: Batch Multiple Tasks

Process multiple items in a single request:

batching.ts
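
One way to batch is to number the items in a single prompt and parse numbered lines from the reply. The helper names and the `<number>. <answer>` reply format are assumptions of this sketch; models do not always follow the format, so the parser leaves unanswered items as `undefined`.

```typescript
// Combine an instruction and several items into one numbered prompt.
function buildBatchPrompt(instruction: string, items: string[]): string {
  const numbered = items.map((item, i) => `${i + 1}. ${item}`).join("\n");
  return (
    `${instruction}\n` +
    `Answer with one line per item, formatted as "<number>. <answer>".\n\n` +
    numbered
  );
}

// Parse lines such as "2. negative" into an array aligned with the input items.
function parseBatchResponse(
  response: string,
  itemCount: number
): (string | undefined)[] {
  const answers: (string | undefined)[] = new Array(itemCount).fill(undefined);
  for (const line of response.split("\n")) {
    const match = /^\s*(\d+)\.\s*(.+?)\s*$/.exec(line);
    if (!match) continue; // ignore lines that are not numbered answers
    const index = Number(match[1]) - 1;
    if (index >= 0 && index < itemCount) {
      answers[index] = match[2];
    }
  }
  return answers;
}
```

Batching saves the repeated instruction tokens, but very large batches can reduce answer quality, so moderate batch sizes are a reasonable starting point.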

Strategy 5: Manage Conversation History

For chatbots, limit conversation history to recent messages:

context-management.ts
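
A minimal sketch of history trimming, assuming messages carry a `role` field and that system messages should always be kept. The `ChatMessage` type and the `trimHistory` helper are names chosen for this example.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep every system message plus the most recent maxRecent other messages.
function trimHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

Calling `trimHistory(history, 5)` before each request keeps the prompt size roughly constant no matter how long the conversation runs.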

Strategy 6: Monitor Costs in Real-Time

Cencori tracks costs automatically. Use the dashboard to:

  • View daily cost trends
  • Compare costs by model
  • Identify expensive queries
  • Set up low-balance alerts

Check costs programmatically:

check-costs.ts
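
The sketch below shows the kind of aggregation a programmatic cost check might perform once usage data is available. It does not call the Cencori API: the `UsageRecord` shape and both helper functions are hypothetical, and how you obtain the records depends on your setup.

```typescript
// Hypothetical usage record; real field names depend on your usage data source.
interface UsageRecord {
  model: string;
  date: string; // ISO date, e.g. "2024-01-15"
  costUsd: number;
}

// Sum the cost of all records per model.
function totalCostByModel(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    totals.set(record.model, (totals.get(record.model) ?? 0) + record.costUsd);
  }
  return totals;
}

// Return the dates (sorted ascending) on which total spend exceeded the budget.
function daysOverBudget(records: UsageRecord[], dailyBudgetUsd: number): string[] {
  const perDay = new Map<string, number>();
  for (const record of records) {
    perDay.set(record.date, (perDay.get(record.date) ?? 0) + record.costUsd);
  }
  return Array.from(perDay.entries())
    .filter(([, total]) => total > dailyBudgetUsd)
    .map(([date]) => date)
    .sort();
}
```

Running a check like `daysOverBudget` on a schedule is one way to catch a cost spike before it reaches the monthly bill.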

Real-World Cost Examples

Scenario                     | Without Optimization   | With Optimization      | Savings
1,000 sentiment analyses/day | GPT-4o: $50/day        | GPT-3.5: $5/day        | 90% ($1,350/mo)
Customer support chatbot     | Full history: $200/day | 5 msg history: $50/day | 75% ($4,500/mo)
Document summarization       | Full docs: $100/day    | Truncated: $30/day     | 70% ($2,100/mo)

Quick Wins Checklist

  • ✅ Use GPT-3.5 Turbo or Gemini 2.5 Flash for simple tasks
  • ✅ Set max_tokens limits
  • ✅ Cache frequently asked questions
  • ✅ Batch process when possible
  • ✅ Truncate long inputs
  • ✅ Use streaming for better UX without extra cost
  • ✅ Monitor costs daily in Cencori dashboard