Pull every URL out of an XML sitemap. Paste the XML or upload the file — filter, sort, and export to CSV, JSON, or a plain text list in seconds.
Processed entirely in your browser — nothing is uploaded. Supports regular and gzip-compressed sitemaps.
An XML sitemap lists the URLs a site wants search engines to crawl, in a structured format they can read. Two top-level shapes exist: a <urlset> (a single sitemap file listing actual URLs) and a <sitemapindex> (an index pointing to other sitemap files). Large sites use indexes because a single sitemap is capped at 50,000 URLs or 50 MB uncompressed, whichever comes first.
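To make the 50,000-URL cap concrete, here is a minimal Python sketch of how a long URL list would be split across sitemap files. The helper name `chunk_urls` is hypothetical, and the sketch checks only the URL count, not the file-size limit.

```python
# Per-file URL cap from the sitemap protocol. The size cap is not checked here.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive groups of at most `size` URLs.

    Hypothetical helper: each group would become one <urlset> file, and a
    <sitemapindex> would then point at all of them.
    """
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    urls = [f"https://example.com/page/{n}" for n in range(120_001)]
    groups = chunk_urls(urls)
    print(len(groups), [len(g) for g in groups])
```

A site with 120,001 URLs therefore needs three sitemap files plus one index that references them.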
This extractor handles both. If you paste a urlset, you get every URL with its optional lastmod, changefreq, and priority values. If you paste a sitemapindex, you get a list of child sitemap URLs — fetch each one and paste it back in to drill down.
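If you want to reproduce this extraction outside the browser, the following is a rough sketch using Python's standard library. It is not the tool's own code (the tool uses the browser's DOMParser), and `parse_sitemap` is a hypothetical helper name. It assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return the sitemap kind and its entries (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Tags arrive as "{namespace}name"; keep only the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind == "urlset":
        child, fields = "sm:url", ("loc", "lastmod", "changefreq", "priority")
    elif kind == "sitemapindex":
        child, fields = "sm:sitemap", ("loc", "lastmod")
    else:
        raise ValueError(f"not a sitemap: root element is <{kind}>")

    entries = []
    for node in root.findall(child, NS):
        entry = {}
        for field in fields:
            text = node.findtext(f"sm:{field}", default=None, namespaces=NS)
            # Missing or empty optional fields are reported as None.
            entry[field] = text.strip() if text and text.strip() else None
        if entry["loc"]:  # skip entries without a usable URL
            entries.append(entry)
    return {"kind": kind, "entries": entries}
```

When the result's kind is "sitemapindex", each entry's loc is a child sitemap URL that you would fetch and pass back into the same function to drill down.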
Parsing runs in your browser via the native DOMParser. Nothing is uploaded to our servers. Gzipped sitemaps (.xml.gz) are decompressed in-browser using the DecompressionStream API, so enterprise-sized compressed sitemaps work without any pre-processing.
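For scripted use, the same decompress-then-parse step can be approximated in Python. This is a sketch under stated assumptions, not the tool's DecompressionStream code. `read_sitemap_bytes` is a hypothetical helper, and the input is assumed to be UTF-8, which the sitemap protocol requires.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_sitemap_bytes(data):
    """Return sitemap XML as text, decompressing first if the bytes are gzip.

    Hypothetical helper: it sniffs the content instead of trusting the
    .xml.gz file extension, which may be missing or wrong.
    """
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode("utf-8")


if __name__ == "__main__":
    raw = b"<urlset></urlset>"
    # Plain and compressed inputs yield the same text.
    print(read_sitemap_bytes(raw) == read_sitemap_bytes(gzip.compress(raw)))
```

Checking the leading bytes means a compressed sitemap served under a plain .xml name is still handled.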
Common reasons to extract URLs: migrating a site (compare before/after URL sets), auditing coverage against analytics, feeding URLs into a crawler, and building a custom knowledge base for an AI chatbot like Canary (paste the URL list into the sources field to train on every page).
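For the migration case, comparing two exported URL lists comes down to set arithmetic. The sketch below is illustrative only. `diff_url_sets` and `to_csv` are hypothetical helpers, and the comparison is an exact string match, so differences such as a trailing slash count as different URLs.

```python
import csv
import io


def diff_url_sets(before, after):
    """Classify URLs as removed, added, or kept between two extractions."""
    before_set, after_set = set(before), set(after)
    return {
        "removed": sorted(before_set - after_set),  # only in the old sitemap
        "added": sorted(after_set - before_set),    # only in the new sitemap
        "kept": sorted(before_set & after_set),     # present in both
    }


def to_csv(diff):
    """Render the diff as CSV text with a status column and a url column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["status", "url"])
    for status in ("removed", "added", "kept"):
        for url in diff[status]:
            writer.writerow([status, url])
    return buf.getvalue()


if __name__ == "__main__":
    old = ["https://example.com/", "https://example.com/about"]
    new = ["https://example.com/", "https://example.com/company"]
    print(to_csv(diff_url_sets(old, new)))
```

The "removed" rows are the ones to review for redirects before the old site goes away.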
Canary is an AI chatbot platform that trains on your website content, documentation, and FAQs to answer visitor questions 24/7. Unlike scripted chatbots, Canary uses a large language model with retrieval-augmented generation, so answers are grounded in your actual content — not generic AI guesses. Businesses use Canary to reduce support tickets by around 60%, capture leads after hours, and scale customer support without hiring.
Three ways, all in under 5 minutes: (1) paste your website URL and Canary crawls and indexes up to 500 pages, (2) upload PDFs, Word docs, Markdown, or CSVs directly, (3) add Q&A pairs manually for anything your docs don't cover. Canary automatically re-indexes whenever you add new content. You can combine multiple sources into one knowledge base.
About 5 minutes from signup to a live chatbot on your site. Point Canary at your website URL, wait for the crawl to complete (1-3 minutes for most sites), and paste one script tag into your HTML. The chatbot is trained on your content, styled to match your brand, and answering visitor questions with source citations — no code required beyond the single script tag.
Yes. Canary works on any HTML page, with native support for WordPress, Shopify, Webflow, Squarespace, Wix, and any custom-built site. The widget is a single script tag (4KB, loads asynchronously, zero impact on page speed). React, Next.js, Vue, and Svelte apps are also supported with the same one-line install.
Starter is free forever with 50 conversations/month. Growth is $49/month with 1,000 conversations, 5 knowledge sources, and team access. Scale is $149/month for unlimited conversations, unlimited knowledge sources, and priority support. Annual plans are discounted 20%. No per-message fees, no hidden costs, no credit card required for the free plan.
By default, GPT-5.4-nano — OpenAI's fastest current-generation model, tuned for customer support quality. You can switch per tenant to GPT-4o, GPT-5, or Claude models if you need more reasoning power, longer context, or multilingual strength. Model choice is a setting, not a plan gate — any paid plan can upgrade the model.
Yes. Canary's AI detects visitor intent — questions about pricing, integration, custom quotes — and asks for contact info in the natural flow of conversation, without forms. Captured leads sync to your email, CRM (HubSpot and Salesforce integrations built-in), or any tool via webhook. Lead capture is included in every plan, not an add-on.
Canary is trained to say "I don't know" rather than hallucinate — it only answers from your actual content. When it hits a knowledge gap, it offers to collect the visitor's email and hand the conversation off to a human via email notification, or passes the conversation directly into your support inbox. You control the escalation flow per tenant.
Yes. Canary auto-detects the visitor's language and responds in it — over 30 languages supported out of the box, including Spanish, French, German, Portuguese, Japanese, Arabic, and Hindi. Your source content can stay in English; Canary translates answers on the fly at response time. No additional configuration required.
Canary is multi-tenant with per-tenant vector store isolation — your content, conversations, and leads are never cross-queried with another tenant. Data is encrypted at rest and in transit. Visitor chat data is retained per your plan's retention policy (30 days on Starter, 90 days on Growth, unlimited on Scale). GDPR-compliant, with data-residency options available on Scale.
Join businesses that have automated support, captured more leads, and cut response times to zero — no code required.
Free forever on Starter. No credit card required.