How We Built an AI Chatbot SaaS for $127/Month
Learn how to build an AI chatbot SaaS that runs 10 tenants for $127/month. Full cost breakdown, architecture decisions, and real competitor pricing comparison.
When we decided to build an AI chatbot SaaS, the first thing we Googled was how much it would cost to run in production. Every result quoted $60,000–$500,000 for development. Nobody talked about the monthly infrastructure bill.
Development is a one-time cost. Infrastructure is what you pay every month, forever, and it's what determines whether the unit economics ever make sense.
We're Optivus. We built Canary, a multi-tenant AI chatbot platform that lets any business deploy a trained, knowledge-base-powered chat widget on their website in minutes. Today it runs 10 tenants in production. Our infrastructure bill is $127 a month.
This is an honest breakdown of how we got there: the stack, the decisions, the things we chose not to build, and the math that makes it work.
Why We Decided to Build an AI Chatbot SaaS From Scratch
Before writing a single line of code, we priced out the alternatives.
| Platform | Base Price | What You Actually Pay |
|---|---|---|
| Tidio (Growth + Lyro AI) | ~$49/mo | $100–$150/mo depending on Lyro AI conversation volume |
| Chatbase (Standard) | $150/mo | +$39–199/mo branding removal, +$59–199/mo custom domain (add-on pricing varies; verify at chatbase.co) |
| SiteGPT (Scale) | $259/mo | 3 bots max |
| Intercom (Advanced, 3 seats) | $255/mo (billed annually) | +$0.99 per AI-resolved conversation |
| Botpress (Team) | $495/mo | 50K messages cap, 3 bots |
The problem wasn't that these tools are bad. They're designed for one company using one chatbot. Our use case was different: we needed to provision and manage chatbots for multiple clients, each with their own knowledge base, branding, and conversation history, all from a single platform.
At that point, Chatbase at $150/mo becomes $1,500/mo for 10 clients. Tidio doesn't have a reseller model. Intercom's per-resolution pricing ($0.99 × 2,000 resolutions = $1,980/mo in AI fees alone) would've made us unprofitable before the first invoice.
Building was the only sensible option. So we did.
The Architecture (At a High Level)
Before we talk dollars, here's what we actually built and a rough roadmap of how it came together.
Three applications in a pnpm monorepo:
- API — Express + TypeScript, deployed on Render. Handles chat, knowledge base management, lead capture, analytics, tenant config.
- Admin SPA — React + Vite, deployed on Vercel. The dashboard where clients manage their chatbot.
- Chat Widget — Preact + Vite, deployed on Vercel as a CDN-served JS bundle. Embedded on client websites via a single <script> tag.
Key infrastructure dependencies:
- Supabase — PostgreSQL database, row-level security for tenant isolation, Auth, file storage
- OpenAI — GPT-4.1-mini for chat, vector stores for knowledge base retrieval (Responses API)
- Firecrawl — URL scraping for knowledge base training
- Resend — Transactional email (handoff notifications, daily digest)
What we deliberately left out:
- LangChain, LlamaIndex, or any AI framework
- A dedicated vector database (Pinecone, Weaviate, Qdrant)
- Redis for session caching
- A message queue (SQS, BullMQ)
- WebSockets
More on that last list later.
How We Phased the Build
For anyone planning to build an AI chatbot SaaS themselves, here's the rough sequence we followed:
Phase 1 — Core chat (weeks 1–2): Express API, SSE streaming, OpenAI Responses API integration, basic message history in Supabase.
Phase 2 — Multi-tenancy (weeks 3–4): Supabase RLS policies, per-tenant config, tenant isolation on all routes, embed key validation.
Phase 3 — Knowledge base (weeks 5–6): File upload to OpenAI vector stores, file_search tool, URL scraping via Firecrawl, per-tenant vector store scoping.
Phase 4 — Admin dashboard + widget (weeks 7–10): React SPA with tenant settings, analytics, Preact widget, branding config, lead capture, CSAT ratings.
A solo engineer with full-stack experience can get to a working MVP by the end of Phase 3. Phase 4 is where most of the polish time goes.
The $127/Month Bill, Line by Line
Here's exactly what we pay each month to run 10 production tenants:
| Service | Plan | Monthly Cost |
|---|---|---|
| Render (API server) | Starter web service | $7 |
| Supabase | Pro | $25 |
| OpenAI (gpt-4.1-mini) | Pay-as-you-go | ~$40–60 |
| OpenAI (vector stores) | $0.10/GB/day, first 1 GB free | ~$5 |
| Firecrawl | Hobby (billed yearly) | $16 |
| Resend | Pro | $20 |
| Vercel (Admin + Widget) | Pro | $20 |
| Total | | ~$127–147 |
A few things worth unpacking here.
OpenAI costs are genuinely cheap with gpt-4.1-mini
GPT-4.1-mini is priced at $0.40 per million input tokens and $1.60 per million output tokens (as of March 2026). A typical chatbot conversation (a user message plus retrieved context plus a response) runs about 800–1,200 tokens total. That's less than $0.001 per conversation.
Even at 50,000 conversations a month across all 10 tenants, the OpenAI bill stays under $50. For comparison, Intercom charges $0.99 per resolved AI conversation. We pay roughly $0.001. That's nearly a 1,000× difference.
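That math is easy to sanity-check. Here's a rough per-conversation cost estimator; the 900/300 input/output token split is an assumed typical turn near the top of the 800–1,200 range, not a measured figure:

```typescript
// gpt-4.1-mini pricing as of March 2026: $0.40 / $1.60 per million
// input / output tokens.
const INPUT_PER_TOKEN = 0.4 / 1_000_000;
const OUTPUT_PER_TOKEN = 1.6 / 1_000_000;

function conversationCost(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_PER_TOKEN + outputTokens * OUTPUT_PER_TOKEN;
}

// A typical turn: ~900 input tokens (user message + retrieved context)
// and ~300 output tokens.
const perTurn = conversationCost(900, 300); // ≈ $0.00084
const monthly = 50_000 * perTurn;           // ≈ $42 for 50k conversations
console.log(perTurn.toFixed(5), monthly.toFixed(2));
```

Even with a pessimistic token estimate, the per-conversation cost stays an order of magnitude below a cent.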
We deliberately chose GPT-4.1-mini over GPT-4.1 (full). The full model costs $2.00/$8.00 per million tokens (as of March 2026), five times more expensive on both input and output. For knowledge-base Q&A, where the AI is mostly retrieving and reformatting information rather than reasoning from scratch, the mini model performs nearly identically. We tested both. Users couldn't tell the difference. We keep the savings.
Supabase handles more than you'd expect for $25/mo
The Supabase Pro plan includes PostgreSQL with row-level security (our multi-tenant isolation layer), Auth (JWT, magic links, OAuth), 100 GB file storage for knowledge base documents, 100,000 monthly active users, and realtime subscriptions for live handoff status.
We don't run a separate auth server. We don't run a separate file storage bucket. We don't run a separate realtime service. All of that is $25.
Vector stores are essentially free at our scale
OpenAI gives the first 1 GB of vector store storage free. Each tenant gets their own vector store for knowledge base isolation. At 10 tenants with modest knowledge bases, we're at roughly 200–500 MB total. We pay about $5/month when usage creeps above the free tier.
For comparison: Pinecone's Starter plan is free but limited to 2 GB and a single region. Their Standard plan carries a $50/month minimum commitment before usage-based pricing kicks in, and per-tenant isolation requires careful namespace management that adds operational complexity. For our use case, OpenAI's built-in vector stores are both cheaper and simpler.
The Render $7/mo decision (and its trade-offs)
Render's Starter plan is 512 MB RAM, 0.5 CPU, with cold starts after 15 minutes of inactivity. For a production API, that sounds concerning.
In practice: most of our traffic comes from embedded widgets on client websites during business hours. Cold starts happen at night when nobody's chatting. The first message of the day takes about 2–3 seconds longer than subsequent messages. Users experience this as normal first-load latency, not a failure.
When we hit sustained traffic that justifies it, we'll move to Render Standard at $25/mo. For now, $7 works.
What the $127 doesn't include
Honest accounting: the bill above covers infrastructure, not total cost of ownership. A more complete picture includes developer time for ongoing maintenance, feature work, and bug fixes. Domain registration runs about $15/year. Error monitoring tools like Sentry start at ~$26/mo. GitHub Actions is free for public repos but has usage limits on private ones. SSL is handled free by Render and Vercel.
If you're evaluating build vs. buy, factor developer time into the comparison. Infrastructure at $127/mo is real. The hidden cost is the hours you spend maintaining it.
Want to see the final product? Canary is live — free trial, no credit card, setup in 5 minutes.
The Three Technical Decisions That Kept Costs Down
1. OpenAI Responses API, not Assistants API
Most guides written before mid-2025 recommend the OpenAI Assistants API. As of August 2026, it's deprecated.
We built on the Responses API (openai.responses.create()), which is the current GA pattern. It uses stateless requests with optional previous_response_id for conversation threading. There's no per-thread storage fee, no 30-day conversation expiry, and no assistant object to manage.
The practical difference: we store conversation context in Supabase (where we control retention, cost, and queryability) and pass previous_response_id to link turns. OpenAI handles the inference; we handle the history. This is cheaper and gives us full control over conversation data for analytics.
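As a sketch, a single turn's request shape looks roughly like this. The field names match the public Responses API, but the buildTurn helper and the tenant wiring are illustrative, not our exact code:

```typescript
// Shape of one stateless chat turn against the Responses API.
interface ResponsesRequest {
  model: string;
  input: string;
  previous_response_id?: string;
  tools: { type: "file_search"; vector_store_ids: string[] }[];
}

function buildTurn(
  message: string,
  vectorStoreId: string,
  previousResponseId?: string,
): ResponsesRequest {
  return {
    model: "gpt-4.1-mini",
    input: message,
    // Threading: link this turn to the previous response instead of
    // resending the whole history.
    ...(previousResponseId ? { previous_response_id: previousResponseId } : {}),
    // Retrieval stays scoped to this tenant's vector store only.
    tools: [{ type: "file_search", vector_store_ids: [vectorStoreId] }],
  };
}

// With the official SDK this becomes:
//   const res = await openai.responses.create(buildTurn(msg, storeId, lastId));
//   // persist res.id in Supabase; pass it back as previousResponseId next turn
```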
2. Preact instead of React for the widget
The chat widget is embedded on client websites as a <script> tag. It runs on sites we don't control, next to assets we didn't write, in browsers we can't predict.
A React-based widget would add ~40 KB (gzipped) to every page it's embedded on. Preact is API-compatible with React but stripped of non-essential features, coming in at 4 KB gzipped. The API surface is identical: hooks, components, context, refs. We use preact/compat for any React-specific third-party components.
For clients embedding the widget on e-commerce sites where Core Web Vitals directly affect conversion and SEO rankings, this matters. We've had clients ask specifically about bundle size before integrating. "4 KB" closes the conversation.
3. Server-Sent Events instead of WebSockets
Chat streaming is implemented with SSE (text/event-stream). The AI response streams token-by-token to the browser without holding a WebSocket connection.
SSE is unidirectional (server to client), which is exactly the shape of streaming AI responses. It works over standard HTTP/2, survives proxies and load balancers that WebSocket upgrades often don't, and requires zero special server configuration. The client sends new messages via standard POST requests.
WebSockets would have required a separate connection management layer, sticky sessions on the load balancer, and a stateful server process. With SSE, our API stays stateless and horizontally scalable without any infrastructure changes.
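A stripped-down version of the streaming route, using node:http so the sketch stays self-contained; we run Express in production, and the tokens come from the OpenAI streaming response rather than a hardcoded array:

```typescript
import { createServer } from "node:http";

// One SSE frame: a "data:" line terminated by a blank line.
function sseEvent(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

const server = createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  // In the real route, these tokens arrive from the OpenAI stream.
  for (const token of ["Hello", ", ", "world"]) {
    res.write(sseEvent(token));
  }
  res.write("data: [DONE]\n\n");
  res.end();
});

// server.listen(3000);
// Browser side: new EventSource(url), or fetch() + ReadableStream for POST.
```

Because the connection is plain HTTP, this route needs nothing extra from Render's load balancer: no upgrade handshake, no sticky sessions.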
A Note on AI Provider Choice
We evaluated other providers before committing to OpenAI. Anthropic Claude (Haiku tier) is priced comparably to GPT-4.1-mini. Google Gemini Flash is currently cheaper on a per-token basis. Open-source models (Llama 3, Mistral) via providers like Together AI can undercut both on pure inference cost.
We chose OpenAI because of the integrated vector store + file_search tool combination. It eliminates the need for a separate vector database entirely. Gemini and Claude both require external vector DBs (Pinecone, Weaviate), which adds $50+/mo and an extra API to manage. If OpenAI raises prices or that integration advantage disappears, the provider layer in our architecture is swappable with a few hours of work.
Multi-Tenant Isolation: How It Actually Works
The hardest part of building a multi-tenant AI chatbot SaaS isn't the AI. It's making sure Tenant A can never see Tenant B's conversations, knowledge base, or leads. And enforcing that at the database level, not just the application level.
Our isolation model has three layers.
Database layer. Every table that contains tenant data has a tenant_id column. Supabase row-level security policies enforce that any authenticated query can only read/write rows where tenant_id = auth.jwt()->>'tenant_id'. This is enforced at the database level regardless of what the application code does.
OpenAI layer. Each tenant gets their own vector store. File uploads, knowledge base scraping, and retrieval are all scoped to vector_store_ids: [tenant.vector_store_id] in the tool definition. There's no shared vector store that could accidentally return another tenant's content.
API layer. Every protected route validates that the authenticated user belongs to the tenant they're operating on. For public-facing widget routes (where users aren't authenticated), the tenant_id comes from the public_key in the embed code, not a JWT, and is validated against a whitelist of allowed origins.
This three-layer approach means a bug in any single layer doesn't create a data leak. Defense in depth, without the complexity of a separate permissions microservice.
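The API-layer check for public widget routes can be sketched like this; the in-memory map stands in for a Supabase lookup, and the key and tenant names are made up:

```typescript
// Resolve a public embed key to a tenant, but only for allowed origins.
interface Tenant {
  id: string;
  allowedOrigins: string[];
}

// Stub for the real Supabase-backed lookup (illustrative only).
const tenantsByPublicKey = new Map<string, Tenant>([
  ["pk_demo_123", { id: "tenant_a", allowedOrigins: ["https://client-a.example"] }],
]);

// Returns the tenant id if the embed key is valid for the requesting
// origin, otherwise null. Every public widget route runs this first.
function resolveTenant(publicKey: string, origin: string): string | null {
  const tenant = tenantsByPublicKey.get(publicKey);
  if (!tenant) return null;
  return tenant.allowedOrigins.includes(origin) ? tenant.id : null;
}
```

The same shape works as Express middleware: reject with a 403 when resolveTenant returns null, otherwise attach the tenant id to the request for downstream handlers.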
On compliance and data residency: We're currently running on Supabase's US region. For GDPR-sensitive deployments, Supabase Pro supports EU regions. We don't yet have SOC 2 certification or formal audit logging. Those are on the roadmap for enterprise tier. If your clients are in regulated industries (healthcare, finance), factor that scope into your timeline.
What We Chose Not to Build (and Why)
Here's what we explicitly decided not to build.
No LangChain/LlamaIndex. These frameworks add abstraction over OpenAI's already-good API. At our scale, they would've added a dependency with breaking changes every few months. They obscure what's actually being sent to the API, which makes debugging harder. And they provide no cost benefit. We call the OpenAI SDK directly. The code is ~200 lines instead of 20, but we understand every line.
No dedicated vector database. OpenAI's built-in vector stores handle our file search use case natively. Adding Pinecone would've added $50+/mo minimum for functionality we already have, plus a second API to manage, second rate limit to monitor, and second service to go down.
No Redis. Our sessions are JWTs. Our caching layer is Supabase's connection pooler. For the traffic volume of 10 tenants with moderate usage, adding Redis would've cost $15–$30/mo for no measurable performance benefit.
We also skipped a message queue. Certain operations (topic classification, email digests) run in the background, and we handle this with Promise.then() fire-and-forget patterns and a cron job (node-cron) that runs inside the API process. At our scale, this is sufficient and free. If we were at 1,000 tenants, we'd revisit.
Finally, no Slack integration. Several clients asked about Slack notifications. We evaluated it. The webhook setup is simple, but the edge cases (webhook rotations, workspace permissions, error handling for deleted channels) add maintenance overhead disproportionate to the value. We do email notifications well instead.
The Competitor Pricing Reality Check
We looked at this honestly. Here's what the major platforms actually cost for a multi-tenant use case (10 clients, ~5,000 AI conversations/month each):
| Platform | Per-Client Cost | 10-Client Total | Notes |
|---|---|---|---|
| Tidio (Growth + Lyro AI) | $100–150/mo | $1,000–1,500/mo | No reseller model; each is a separate account; Lyro pricing is volume-tiered |
| Chatbase (Standard, with branding removal) | $189–349/mo | $1,890–3,490/mo | $150 base + add-on pricing varies; verify current rates |
| SiteGPT (Scale, max 3 bots) | $259/mo | $259/mo* | *Can't even do 10 clients |
| Intercom (Advanced, Fin AI) | $255 (annual) + $4,950 AI fees | $5,205/mo | $0.99 × 5,000 resolutions; seat cost is billed annually at $85/seat |
| Canary (self-hosted) | $12.70/mo | $127/mo | Fixed infra, ~$0.001 per conversation |
The asterisked SiteGPT figure is not a typo. Their Scale plan supports a maximum of 3 bots, so it literally cannot support 10 clients.
The Intercom number is jarring but accurate. At $0.99 per AI-resolved conversation, a platform handling 5,000 AI resolutions per month pays $4,950 in usage fees alone, before seat costs. (Note: Intercom's per-seat price is $85/month billed annually, or $99/month billed monthly.)
None of this means those platforms are bad. They have features we don't have: CRM integrations, phone support, live agent routing with a polished agent UI. We're not competing with Intercom for enterprise support teams. We're competing for the business owner who wants a trained AI chatbot on their website and doesn't want a $500/month SaaS bill.
Comparing options for your clients? See Canary's pricing — the flat monthly rate includes unlimited conversations.
How to Price and Monetize a Chatbot SaaS
The cost side is one half of the equation. The other half is what you charge.
At $127/mo in infrastructure for 10 tenants, your per-tenant infra cost is $12.70. That means any price above $12.70 contributes to margin. In practice, the market tolerates substantially more.
Common pricing models for chatbot SaaS:
The simplest approach is a flat monthly fee per deployed chatbot or domain. That's our model. It's easy to explain and easy to sell.
You can also go per-seat, charging the client per team member who accesses the admin dashboard. This works well when clients have large support teams. A per-conversation model mirrors how you pay OpenAI, but it's harder to explain and creates anxiety for clients who can't predict their volume. Many platforms use a tiered flat rate (Starter/Growth/Pro by feature set), which is the easiest to sell but requires careful feature gating.
We charge a flat monthly fee per client. At a conservative $99/month per tenant, 10 clients generate $990/month in revenue against $127/month in infrastructure, a 7.8× revenue-to-infrastructure ratio before developer time.
At $199/month per tenant (mid-market pricing for a managed chatbot with knowledge base and analytics), 10 clients = $1,990/month. The infrastructure cost doesn't change. The margin does.
The real point is the fixed cost structure. OpenAI's inference costs you about $0.001 per conversation. You're not charging per conversation. You're charging for the platform, the management layer, and the ongoing training and support. That gap is where the business lives.
Scaling the Bill, Not the Architecture
The most important property of this cost structure: it doesn't scale linearly with tenants.
Our fixed costs (Render, Supabase, Vercel, Firecrawl, Resend) total about $88/month regardless of whether we have 1 tenant or 100. The variable costs (OpenAI inference, vector storage) scale with usage, not with tenant count.
Adding tenant #11 costs us roughly:
- $0 in fixed infrastructure (already paid)
- ~$4–6/month in OpenAI usage at average conversation volumes
- ~$0.50/month in vector storage
The unit economics improve with every new client. That's the economic logic behind building instead of buying: at some number of tenants, the build cost amortizes and every new client is nearly pure margin.
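The fixed-vs-variable split can be captured in a few lines; the $5/tenant variable figure below is a rough midpoint of the estimates above, not a measured number:

```typescript
// Fixed infra (Render + Supabase + Vercel + Firecrawl + Resend) vs.
// per-tenant variable cost (OpenAI inference + vector storage).
const FIXED_INFRA = 88;        // $/month, from the breakdown above
const VARIABLE_PER_TENANT = 5; // $/month, rough midpoint of the $4–6 + storage estimate

function monthlyInfraCost(tenants: number): number {
  return FIXED_INFRA + tenants * VARIABLE_PER_TENANT;
}

function perTenantCost(tenants: number): number {
  return monthlyInfraCost(tenants) / tenants;
}

console.log(monthlyInfraCost(10)); // 138 — in the ballpark of the real bill
console.log(perTenantCost(10));    // 13.8
console.log(perTenantCost(100));   // 5.88 — unit cost falls as tenants grow
```

The curve is the whole argument: fixed costs get divided across more tenants while the variable slice stays small, so per-tenant cost falls monotonically.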
What Comes Next
The $127/month architecture is intentionally conservative. We designed it to cover the first 10–50 tenants without changes. At 100+ tenants, the Render Starter plan would need to step up to Standard or Pro ($25–$85/month). At 500+ tenants, we'd look at a managed Kubernetes layer. But those are good problems to have, and neither changes the fundamental unit economics.
The real insight from this build: the AI is cheap. The infrastructure around the AI is cheap. The cost of third-party SaaS is expensive.
If you're building an AI chatbot SaaS and want to see what the final product looks like, Canary is live. We offer a free trial, no credit card, setup in about 5 minutes.
Frequently Asked Questions
How long does it take to build an AI chatbot SaaS from scratch? A capable solo full-stack engineer can reach a production MVP in 8–12 weeks: roughly 2 weeks for core chat infrastructure, 2 weeks for multi-tenancy and data isolation, 2 weeks for knowledge base integration, and 3–4 weeks for the admin dashboard and widget. A 2-person team (one backend, one frontend) cuts that to 5–7 weeks. Timeline expands significantly if you're adding enterprise features (SSO, audit logging, custom integrations) from the start.
What is the best AI model for building a chatbot? For most knowledge-base chatbot use cases, GPT-4.1-mini offers the best cost-to-quality ratio at $0.40/$1.60 per million tokens (as of March 2026). For pure token cost, Google Gemini Flash is cheaper. For reasoning-heavy or complex multi-step tasks, GPT-4.1 full or Anthropic Claude Sonnet 4 offer stronger performance at higher cost. The practical differentiator if you're using OpenAI: the integrated vector store + file_search API eliminates the need for a separate vector database, which saves $50+/month at scale.
How do AI chatbot SaaS companies make money? The standard model is a flat monthly subscription per client/tenant, typically ranging from $49 to $299/month depending on features and conversation volume included. At our infra cost of ~$12.70 per tenant, the margin on a $99/month plan is roughly 87% gross margin on infrastructure alone (before developer time and support). OpenAI charges per conversation; you charge a flat monthly fee. The gap between $0.001/conversation and a $99 flat fee is where the business lives.
What is multi-tenant architecture for SaaS? Multi-tenancy means a single deployment of your application serves multiple distinct customers ("tenants"), with each tenant's data completely isolated from others. In a chatbot SaaS context, this means one API server, one database, and one codebase simultaneously serving 10 different companies, each with their own conversations, knowledge base, branding, and settings, without any tenant being able to access another's data. The alternative (single-tenant) would mean deploying a separate API and database instance per customer, which multiplies infrastructure costs proportionally. Multi-tenancy is what makes the $127/month figure possible for 10 clients.
When should you build vs. buy a chatbot platform? Build if you need multi-tenant capabilities at scale, full control over data and AI behavior, or you're building a product rather than using a tool. Buy if you're a single business that needs a chatbot deployed in an afternoon and don't want to maintain infrastructure. The inflection point is roughly 3–5 clients: below that, existing platforms are cheaper than the developer hours to build. Above that, the economics of building start to compound in your favor.


