How to estimate AI inference costs

When using Sila, you pay AI providers directly for inference (the model's compute). Sila adds no fees.

How pricing works

  • You bring your own keys: OpenAI, Google, Anthropic, or your own server (OpenAI‑compatible APIs).
  • Pay as you go: Providers bill for tokens (text), images, and audio minutes.
  • Local models: With Ollama or similar, inference is free to use, but you pay with your hardware resources and electricity.

Text pricing explained

  • Tokens are chunks of text (~3–4 characters per token on average).
  • You pay for both input tokens (your prompt + context) and output tokens (the model's reply).
  • Providers publish prices, usually "$X per 1M tokens" for input and output separately.

Simple estimate

  • Short message: ~100–300 tokens
  • Long prompt with files/context: 1k–8k+ tokens
  • Model reply: ~200–1,000 tokens

To estimate the cost of a message: (input_tokens × input_rate) + (output_tokens × output_rate), where the rates are per token (divide a published per‑1M price by 1,000,000).
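The formula above can be sketched as a small helper. The rates here use GPT‑5's published pricing ($1.25 / $10.00 per 1M tokens) purely as an example; plug in your own provider's numbers:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate the cost of one message.

    Rates are in USD per 1M tokens, as providers usually publish them.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A long prompt (~4k tokens in) with a detailed reply (~800 tokens out),
# at rates of $1.25 / $10.00 per 1M tokens:
cost = estimate_cost(4_000, 800, 1.25, 10.00)
print(f"${cost:.4f}")  # → $0.0130
```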

Images and audio

  • Vision: You pay per image and/or per token processed, depending on the provider.
  • Audio: You pay per minute for transcription or TTS.
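Per-minute billing is even simpler to estimate. The $0.006/minute rate below is a hypothetical placeholder, not a quote; check your provider's pricing page for the real number:

```python
def audio_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of transcription or TTS billed per minute of audio."""
    return minutes * rate_per_minute

# Hypothetical transcription rate of $0.006 per minute:
print(f"${audio_cost(45, 0.006):.2f}")  # 45-minute meeting → $0.27
```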

Example monthly costs with GPT-5

These examples use GPT-5 pricing: $1.25 per 1M input tokens, $10.00 per 1M output tokens.

  • Starter (casual)
    • 10 chats/day × 30 days = 300 chats
    • ~1,200 tokens/chat total
    • ≈ 360k tokens/month
    • Input: ~180k × $1.25 = $0.23
    • Output: ~180k × $10.00 = $1.80
    • Total: ~$2/month
  • Pro (deep work)
    • 30 chats/day × 30 days = 900 chats
    • ~3k tokens/chat total
    • ≈ 2.7M tokens/month
    • Input: ~1.35M × $1.25 = $1.69
    • Output: ~1.35M × $10.00 = $13.50
    • Total: ~$15/month
  • Team (5 people)
    • 5 × 30 chats/day × 30 days = 4,500 chats
    • ~3k tokens/chat total
    • ≈ 13.5M tokens/month
    • Input: ~6.75M × $1.25 = $8.44
    • Output: ~6.75M × $10.00 = $67.50
    • Total: ~$76/month

Assumptions: a roughly even split between input and output tokens, moderate context sizes, and GPT-5 pricing.
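Under the same assumptions (even input/output split, GPT-5 rates), the three scenarios above can be reproduced with a short script:

```python
GPT5_INPUT_RATE = 1.25    # USD per 1M input tokens
GPT5_OUTPUT_RATE = 10.00  # USD per 1M output tokens

def monthly_cost(chats_per_day: int, tokens_per_chat: int,
                 days: int = 30, users: int = 1) -> float:
    """Monthly cost, assuming tokens split evenly between input and output."""
    total_tokens = users * chats_per_day * days * tokens_per_chat
    half = total_tokens / 2
    return (half * GPT5_INPUT_RATE + half * GPT5_OUTPUT_RATE) / 1_000_000

print(f"Starter: ${monthly_cost(10, 1_200):.2f}")
print(f"Pro:     ${monthly_cost(30, 3_000):.2f}")           # → $15.19
print(f"Team:    ${monthly_cost(30, 3_000, users=5):.2f}")  # → $75.94
```

Because output tokens cost 8× more than input tokens at these rates, a chat that leans toward long replies will land above these estimates, and prompt-heavy usage will land below them.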

Cost management

Sila helps manage costs automatically with efficient defaults and smart context management. You can always see which model is selected for each assistant and chat, making it easy to track usage.