Pick chunk size and overlap, see exactly how many chunks your document splits into, what embedding them costs, and how much vector storage they need — for every major embedding model.
Chunks: 217
Total tokens stored: 111,104
Overlap overhead: 11.1%
Embedding cost: $0.00222
Document tokens: 100,000
Effective overlap: 50 tokens
Vector dimensions: 1,536
Vector storage: 1.3 MB
Retrieval-Augmented Generation (RAG) works by splitting your documents into chunks, converting each chunk into an embedding vector, and storing those vectors in a database. At query time, the system retrieves the most semantically similar chunks and passes them to the LLM as context. The chunk size you pick — how many tokens of text each vector represents — is the single most consequential decision in the pipeline.
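The pipeline above can be sketched in a few lines. This is a toy: the `embed` function here is a normalized bag-of-words vector standing in for a real embedding model, so it only captures word overlap, not semantics — but the shape of the pipeline (embed chunks, embed query, rank by cosine similarity) is the same.

```python
import math

def embed(text, vocab):
    """Stand-in for a real embedding model: a normalized bag-of-words vector.
    A production system would call a model like text-embedding-3-small here."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query, chunks, top_k=2):
    """Embed every chunk (the 'vector store'), embed the query,
    and return the top_k most similar chunks by cosine similarity."""
    vocab = sorted({w for c in chunks for w in c.lower().split()})
    store = [(c, embed(c, vocab)) for c in chunks]
    q = embed(query, vocab)
    ranked = sorted(store, key=lambda cv: -sum(a * b for a, b in zip(q, cv[1])))
    return [c for c, _ in ranked[:top_k]]

chunks = [
    "the invoice total is $14,000",
    "shipping usually takes three days",
    "the total covers Q3 consulting fees",
]
print(retrieve("what is the invoice total", chunks, top_k=1))
```

The retrieved chunk(s) would then be pasted into the LLM prompt as context. Note that the top hit is the chunk sharing the most query terms — which is exactly where real embeddings earn their keep, since they also match paraphrases that share no words at all.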
Small chunks (128-512 tokens) capture precise, atomic facts and retrieve cleanly for narrow questions. But they lose surrounding context: a chunk that says "the total is $14,000" tells you nothing about what the total is for. Pair small chunks with Q&A-style retrieval where the question carries most of the context.
Large chunks (1024-4096 tokens) preserve narrative and multi-step reasoning, which matters for summarization and legal/contractual documents where clauses reference each other. The tradeoff: each retrieved chunk consumes more of your LLM's context window, so you can retrieve fewer chunks per query, and embeddings become less discriminative as they try to represent more text in a single vector.
Overlap is the number of tokens each chunk shares with its neighbors. Overlap protects against information being split at a boundary — the classic case is a pronoun reference ("it failed") where the antecedent is in the previous chunk. 10-20% overlap is the common rule of thumb; legal and technical docs benefit from 20-30%; simple chat/support content can get away with 5-10%. Overlap costs extra — every overlapping token is embedded and stored twice.
Embedding cost scales linearly with total tokens stored. For a 100K-token document split into 512-token chunks with 50-token overlap, you pay for roughly 111K tokens (11% overhead) at embedding time. With 100-token overlap, overhead roughly doubles to about 22%. On one document that's fractions of a cent, but overhead scales with everything you embed: across a corpus of 1,000 such documents (100M tokens), the same choice is the difference between roughly a $2.22 and a $2.44 embedding bill with text-embedding-3-small.
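The cost arithmetic is easy to reproduce. The $0.02-per-1M-token rate below is text-embedding-3-small's price as implied by the calculator's own figures ($0.00222 for 111,104 tokens); this uses the same full-chunk estimate as the calculator, where every chunk is billed at `chunk_size` tokens.

```python
import math

PRICE_PER_M_TOKENS = 0.02   # USD per 1M tokens, text-embedding-3-small (rate implied by the figures above)

def embedding_cost(doc_tokens, chunk_size=512, overlap=50):
    """Full-chunk estimate: number of chunks times chunk_size tokens, priced per 1M."""
    stride = chunk_size - overlap
    n_chunks = math.ceil(doc_tokens / stride)
    stored = n_chunks * chunk_size
    return stored, stored * PRICE_PER_M_TOKENS / 1_000_000

stored, cost = embedding_cost(100_000)
print(stored, round(cost, 5))   # 111104 0.00222 — matches the calculator's readout
```

Doubling the overlap to 100 tokens shrinks the stride to 412, so the same formula yields more chunks and a proportionally larger bill.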
Vector storage is often the hidden cost. Each chunk's embedding is a vector of 1,024-3,072 float32 numbers — 4 KB to 12 KB per chunk. A million-chunk corpus on text-embedding-3-large is 12 GB of raw vectors, before index overhead. This tool shows you the raw storage size; budget 1.5-2x on top for HNSW or IVF index structures.
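Raw vector storage is pure multiplication: chunks times dimensions times 4 bytes per float32. This reproduces both figures above — the headline scenario's 1.3 MB and the million-chunk, 3,072-dimension case — before any index overhead.

```python
FLOAT32_BYTES = 4

def vector_storage_bytes(n_chunks, dims):
    """Raw float32 vector storage, excluding HNSW/IVF index structures."""
    return n_chunks * dims * FLOAT32_BYTES

# Headline scenario: 217 chunks at 1,536 dims (text-embedding-3-small)
print(vector_storage_bytes(217, 1536) / 1e6)         # ≈ 1.33 MB
# Million-chunk corpus at 3,072 dims (text-embedding-3-large)
print(vector_storage_bytes(1_000_000, 3072) / 1e9)   # ≈ 12.3 GB raw
```

Applying the 1.5-2x index multiplier suggested above, that million-chunk corpus lands somewhere around 18-25 GB in practice.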
If you're already using Canary, you don't touch any of this — we handle chunking, embedding, and retrieval internally with strategies tuned per content type. This calculator is useful if you're benchmarking self-hosted RAG, evaluating vendors, or just sizing an embedding spend before you build.
Canary is an AI chatbot platform that trains on your website content, documentation, and FAQs to answer visitor questions 24/7. Unlike scripted chatbots, Canary uses a large language model with retrieval-augmented generation, so answers are grounded in your actual content — not generic AI guesses. Businesses use Canary to reduce support tickets by around 60%, capture leads after hours, and scale customer support without hiring.
Three ways, all in under 5 minutes: (1) paste your website URL and Canary crawls and indexes up to 500 pages, (2) upload PDFs, Word docs, Markdown, or CSVs directly, (3) add Q&A pairs manually for anything your docs don't cover. Canary automatically re-indexes whenever you add new content. You can combine multiple sources into one knowledge base.
About 5 minutes from signup to a live chatbot on your site. Point Canary at your website URL, wait for the crawl to complete (1-3 minutes for most sites), and paste one script tag into your HTML. The chatbot is trained on your content, styled to match your brand, and answering visitor questions with source citations — no code required beyond the single script tag.
Yes. Canary works on any HTML page — native support for WordPress, Shopify, Webflow, Squarespace, Wix, and any custom-built site. The widget is a single script tag (4KB, loads asynchronously, zero impact on page speed). React, Next.js, Vue, and Svelte apps are also supported with the same one-line install.
Starter is free forever with 50 conversations/month. Growth is $49/month with 1,000 conversations, 5 knowledge sources, and team access. Scale is $149/month for unlimited conversations, unlimited knowledge sources, and priority support. Annual plans are discounted 20%. No per-message fees, no hidden costs, no credit card required for the free plan.
By default, GPT-5.4-nano — OpenAI's fastest current-generation model, tuned for customer support quality. You can switch per tenant to GPT-4o, GPT-5, or Claude models if you need more reasoning power, longer context, or multilingual strength. Model choice is a setting, not a plan gate — any paid plan can upgrade the model.
Yes. Canary's AI detects visitor intent — questions about pricing, integration, custom quotes — and asks for contact info in the natural flow of conversation, without forms. Captured leads sync to your email, CRM (HubSpot and Salesforce integrations built-in), or any tool via webhook. Lead capture is included in every plan, not an add-on.
Canary is trained to say "I don't know" rather than hallucinate — it only answers from your actual content. When it hits a knowledge gap, it offers to collect the visitor's email and hand the conversation off to a human via email notification, or passes the conversation directly into your support inbox. You control the escalation flow per tenant.
Yes. Canary auto-detects the visitor's language and responds in it — over 30 languages supported out of the box, including Spanish, French, German, Portuguese, Japanese, Arabic, and Hindi. Your source content can stay in English; Canary translates answers on the fly at response time. No additional configuration required.
Canary is multi-tenant with per-tenant vector store isolation — your content, conversations, and leads are never cross-queried with another tenant. Data is encrypted at rest and in transit. Visitor chat data is retained per your plan's retention policy (30 days on Starter, 90 days on Growth, unlimited on Scale). GDPR-compliant, with data-residency options available on Scale.
Join businesses that have automated support, captured more leads, and cut response times to zero — no code required.
Free forever on Starter. No credit card required.