How to estimate AI inference costs

When using Sila, you pay AI providers directly for inference (the model's compute). Sila adds no fees.

How pricing works

  • You bring your own keys: OpenAI, Google, Anthropic, or your own server (OpenAI‑compatible APIs).
  • Pay as you go: Providers bill for tokens (text), images, and audio minutes.
  • Local models: With Ollama or similar, inference is free to use, but you pay with your hardware resources and electricity.

Text pricing explained

  • Tokens are chunks of text (~3–4 characters per token on average).
  • You pay for both input tokens (your prompt + context) and output tokens (the model's reply).
  • Providers publish prices, usually "$X per 1M tokens" for input and output separately.

Simple estimate

  • Short message: ~100–300 tokens
  • Long prompt with files/context: 1k–8k+ tokens
  • Model reply: ~200–1,000 tokens

To estimate the cost of a message: (input_tokens × input_rate) + (output_tokens × output_rate), where the rates are per token (divide a published per‑1M price by 1,000,000).
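The formula above can be sketched as a small helper. The rates here use GPT‑5's published pricing ($1.25 / $10.00 per 1M tokens) purely as an example; plug in your own provider's numbers:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate the cost of one message.

    Rates are in USD per 1M tokens, as providers usually publish them.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A long prompt (~4k tokens in) with a detailed reply (~800 tokens out),
# at rates of $1.25 / $10.00 per 1M tokens:
cost = estimate_cost(4_000, 800, 1.25, 10.00)
print(f"${cost:.4f}")  # → $0.0130
```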

Images and audio

  • Vision: You pay per image and/or per token processed, depending on the provider.
  • Audio: You pay per minute for transcription or TTS.
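Per-minute billing is even simpler to estimate. The $0.006/minute rate below is a hypothetical placeholder, not a quote; check your provider's pricing page for the real number:

```python
def audio_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of transcription or TTS billed per minute of audio."""
    return minutes * rate_per_minute

# Hypothetical transcription rate of $0.006 per minute:
print(f"${audio_cost(45, 0.006):.2f}")  # 45-minute meeting → $0.27
```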

Example monthly costs with GPT-5

These examples use GPT-5 pricing: $1.25 per 1M input tokens, $10.00 per 1M output tokens.

  • Starter (casual)
    • 10 chats/day × 30 days = 300 chats
    • ~1,200 tokens/chat total
    • ≈ 360k tokens/month
    • Input: ~180k × $1.25 = $0.23
    • Output: ~180k × $10.00 = $1.80
    • Total: ~$2/month
  • Pro (deep work)
    • 30 chats/day × 30 days = 900 chats
    • ~3k tokens/chat total
    • ≈ 2.7M tokens/month
    • Input: ~1.35M × $1.25 = $1.69
    • Output: ~1.35M × $10.00 = $13.50
    • Total: ~$15/month
  • Team (5 people)
    • 5 × 30 chats/day × 30 days = 4,500 chats
    • ~3k tokens/chat total
    • ≈ 13.5M tokens/month
    • Input: ~6.75M × $1.25 = $8.44
    • Output: ~6.75M × $10.00 = $67.50
    • Total: ~$76/month

Assumptions: a roughly even split between input and output tokens, moderate context sizes, and GPT-5 pricing.
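Under the same assumptions (even input/output split, GPT-5 rates), the three scenarios above can be reproduced with a short script:

```python
GPT5_INPUT_RATE = 1.25    # USD per 1M input tokens
GPT5_OUTPUT_RATE = 10.00  # USD per 1M output tokens

def monthly_cost(chats_per_day: int, tokens_per_chat: int,
                 days: int = 30, users: int = 1) -> float:
    """Monthly cost, assuming tokens split evenly between input and output."""
    total_tokens = users * chats_per_day * days * tokens_per_chat
    half = total_tokens / 2
    return (half * GPT5_INPUT_RATE + half * GPT5_OUTPUT_RATE) / 1_000_000

print(f"Starter: ${monthly_cost(10, 1_200):.2f}")
print(f"Pro:     ${monthly_cost(30, 3_000):.2f}")           # → $15.19
print(f"Team:    ${monthly_cost(30, 3_000, users=5):.2f}")  # → $75.94
```

Because output tokens cost 8× more than input tokens at these rates, a chat that leans toward long replies will land above these estimates, and prompt-heavy usage will land below them.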

Cost management

Sila helps manage costs automatically with efficient defaults and smart context management. You can always see which model is selected for each assistant and chat, making it easy to track usage.