Paste or upload a JSONL file. Every line is validated against OpenAI's fine-tuning schema — with token counts, training cost estimates, and a download of the cleaned dataset.
One JSON object per line. Validation is live — nothing leaves your browser.
OpenAI's fine-tuning API accepts training data as JSONL: one JSON object per line, no commas between lines, no wrapping array. Each line is a single training example. Get the schema wrong and the upload fails; get something subtler wrong and training succeeds but produces a model that behaves unexpectedly. This validator catches both classes of problem before you pay to fine-tune.
The current chat format uses a single messages array per line with role (system / user / assistant) and content for each message:
{"messages": [
{"role": "system", "content": "You are a cheerful support agent."},
{"role": "user", "content": "How do I reset my password?"},
{"role": "assistant", "content": "Click 'Forgot password' on the login page."}
]}
The legacy completion format, used for older models, has separate prompt and completion fields. If you're fine-tuning a current-generation model, stick with chat format. The validator supports both.
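The chat-format sample above is wrapped across several lines for readability, but in an actual JSONL file each example has to sit on one physical line. A minimal Python sketch of producing such a file follows; the `examples` list and the `to_jsonl` helper are illustrative names, not part of this validator.

```python
import json

# Hypothetical training data; the messages mirror the chat-format sample above.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a cheerful support agent."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Click 'Forgot password' on the login page."},
        ]
    },
]


def to_jsonl(records):
    """Serialize each record onto its own line.

    json.dumps without an indent argument never emits a raw newline
    (newlines inside strings are escaped), so one record is one line.
    """
    return "\n".join(json.dumps(record) for record in records) + "\n"


jsonl_text = to_jsonl(examples)
```

Writing `jsonl_text` to a file gives one example per line, with no commas between lines and no enclosing array.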
OpenAI's training pricing in 2026 ranges from $3/1M tokens (GPT-4o mini) to $25/1M tokens (GPT-4o), with the full dataset re-processed for each epoch (default 3). A 500K-token dataset at 3 epochs on GPT-4o mini costs about $4.50 — small. On GPT-4o that's $37.50 — real. The cost table above assumes your dataset trains end-to-end; change the epoch count to see your actual spend.
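The arithmetic behind those figures is tokens times epochs times the per-million price. A small sketch, using the prices quoted above as inputs (the function name is illustrative, and real billing may differ from this simple estimate):

```python
def training_cost_usd(dataset_tokens, price_per_million_usd, epochs=3):
    """Estimate training cost: every epoch re-processes the whole dataset."""
    billed_tokens = dataset_tokens * epochs
    return billed_tokens * price_per_million_usd / 1_000_000


# 500K-token dataset, 3 epochs, at the per-million prices quoted above.
mini_cost = training_cost_usd(500_000, 3.0)    # 4.5
full_cost = training_cost_usd(500_000, 25.0)   # 37.5
```

Doubling the epoch count doubles the estimate, which is why the epoch setting matters more than it first appears.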
Common mistakes this validator catches: missing messages field, invalid role values (e.g. "bot" instead of "assistant"), assistant messages with empty content (training no-ops), examples with no assistant message at all (won't train anything), and fewer than 10 examples in the dataset (OpenAI's hard minimum).
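The checks listed above can be sketched in a few lines of Python. This is an illustration of the idea, not this validator's actual implementation: it covers the chat format only, treats the three roles named earlier as the full set of valid roles, and uses hypothetical names (`validate_jsonl`, `VALID_ROLES`, `MIN_EXAMPLES`).

```python
import json

VALID_ROLES = {"system", "user", "assistant"}  # assumption: roles named above
MIN_EXAMPLES = 10  # minimum dataset size stated above


def validate_jsonl(text):
    """Return a list of (line_number, message); line_number 0 is dataset-level."""
    problems = []
    example_count = 0
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, including a trailing newline
        example_count += 1
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((line_number, "missing messages field"))
            continue
        has_assistant = False
        for message in messages:
            role = message.get("role") if isinstance(message, dict) else None
            if not isinstance(role, str) or role not in VALID_ROLES:
                problems.append((line_number, f"invalid role: {role!r}"))
            elif role == "assistant":
                has_assistant = True
                if not message.get("content"):
                    problems.append((line_number, "assistant message with empty content"))
        if not has_assistant:
            problems.append((line_number, "no assistant message"))
    if example_count < MIN_EXAMPLES:
        problems.append((0, f"only {example_count} examples; at least {MIN_EXAMPLES} required"))
    return problems
```

Calling `validate_jsonl` on a file's contents returns an empty list when none of these problems are found; each entry otherwise points at the offending line.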
If you're considering fine-tuning a chatbot, try Canary first — it uses Retrieval-Augmented Generation (RAG) to ground responses in your content without training a custom model. RAG is faster (live, no training run), cheaper (no $25/M training bill), and updates instantly when your docs change. Fine-tuning is the right tool when you need a specific behavioral style; RAG is the right tool when you need factual accuracy on proprietary content.
Canary is an AI chatbot platform that trains on your website content, documentation, and FAQs to answer visitor questions 24/7. Unlike scripted chatbots, Canary uses a large language model with retrieval-augmented generation, so answers are grounded in your actual content — not generic AI guesses. Businesses use Canary to reduce support tickets by around 60%, capture leads after hours, and scale customer support without hiring.
Three ways, all in under 5 minutes: (1) paste your website URL and Canary crawls and indexes up to 500 pages, (2) upload PDFs, Word docs, Markdown, or CSVs directly, (3) add Q&A pairs manually for anything your docs don't cover. Canary automatically re-indexes whenever you add new content. You can combine multiple sources into one knowledge base.
About 5 minutes from signup to a live chatbot on your site. Point Canary at your website URL, wait for the crawl to complete (1-3 minutes for most sites), and paste one script tag into your HTML. The chatbot is trained on your content, styled to match your brand, and answering visitor questions with source citations — no code required beyond the single script tag.
Yes. Canary works on any HTML page, with native support for WordPress, Shopify, Webflow, Squarespace, Wix, and any custom-built site. The widget is a single script tag (4KB, loads asynchronously, zero impact on page speed). React, Next.js, Vue, and Svelte apps are also supported with the same one-line install.
Starter is free forever with 50 conversations/month. Growth is $49/month with 1,000 conversations, 5 knowledge sources, and team access. Scale is $149/month for unlimited conversations, unlimited knowledge sources, and priority support. Annual plans are discounted 20%. No per-message fees, no hidden costs, no credit card required for the free plan.
By default, GPT-5.4-nano — OpenAI's fastest current-generation model, tuned for customer support quality. You can switch per tenant to GPT-4o, GPT-5, or Claude models if you need more reasoning power, longer context, or multilingual strength. Model choice is a setting, not a plan gate — any paid plan can upgrade the model.
Yes. Canary's AI detects visitor intent — questions about pricing, integration, custom quotes — and asks for contact info in the natural flow of conversation, without forms. Captured leads sync to your email, CRM (HubSpot and Salesforce integrations built-in), or any tool via webhook. Lead capture is included in every plan, not an add-on.
Canary is trained to say "I don't know" rather than hallucinate — it only answers from your actual content. When it hits a knowledge gap, it offers to collect the visitor's email and hand the conversation off to a human via email notification, or passes the conversation directly into your support inbox. You control the escalation flow per tenant.
Yes. Canary auto-detects the visitor's language and responds in it — over 30 languages supported out of the box, including Spanish, French, German, Portuguese, Japanese, Arabic, and Hindi. Your source content can stay in English; Canary translates answers on the fly at response time. No additional configuration required.
Canary is multi-tenant with per-tenant vector store isolation — your content, conversations, and leads are never cross-queried with another tenant. Data is encrypted at rest and in transit. Visitor chat data is retained per your plan's retention policy (30 days on Starter, 90 days on Growth, unlimited on Scale). GDPR-compliant, with data-residency options available on Scale.
Join businesses that have automated support, captured more leads, and cut response times to zero — no code required.
Free forever on Starter. No credit card required.