An XML sitemap is supposed to help Google discover your pages. On a small site, that is almost automatic. On an e-commerce catalog with 10,000 or 100,000 SKUs, a poorly structured sitemap is one of the main causes of under-indexing: thousands of products Google never crawls because your sitemap sends the wrong signals.
Here is the method for structuring a product sitemap that works, with the real rules and limits you only see in production.
A single XML sitemap file can contain at most 50,000 URLs and must not exceed 50 MB uncompressed. Both limits come from the sitemaps.org protocol; larger catalogs have to be split across several files listed in a sitemap index.
But 50,000 URLs is the technical limit; the practical limit is lower. Google does not publish a smaller official figure, but in practice keeping files to 10,000-20,000 URLs makes it easier for its crawler to process each URL properly. In very large files, tail-end URLs tend to be sampled or processed less reliably.
Operational rule: 5,000-20,000 URLs per sitemap file is the optimal range. Above that, shard the catalog into several files referenced by a sitemap index.
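A minimal sketch of the sharding step, assuming your product URLs are already available as a Python list. The file names (`sitemap_products_N.xml`), the base URL `https://example.com` and the 10,000-URL default are hypothetical choices; the XML namespace and the `<urlset>`/`<sitemapindex>` structure follow the sitemaps.org protocol.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
PROTOCOL_MAX_URLS = 50_000  # hard per-file limit from the sitemaps.org protocol


def shard(urls, size):
    """Split a list of URLs into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_urlset(urls):
    """Render one child sitemap (<urlset>) for a chunk of URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    )


def render_index(sitemap_urls):
    """Render the sitemap index that points at every child sitemap."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'
    )


def build_sitemaps(product_urls, base_url="https://example.com", size=10_000):
    """Return {filename: xml_text}: one file per shard plus sitemap.xml as the index."""
    if not 0 < size <= PROTOCOL_MAX_URLS:
        raise ValueError("shard size must be between 1 and 50,000")
    files = {}
    child_locs = []
    for n, chunk in enumerate(shard(product_urls, size), start=1):
        name = f"sitemap_products_{n}.xml"  # hypothetical naming scheme
        files[name] = render_urlset(chunk)
        child_locs.append(f"{base_url}/{name}")
    files["sitemap.xml"] = render_index(child_locs)
    return files
```

For example, 25,000 product URLs yield three child files (10,000 + 10,000 + 5,000 URLs) plus a `sitemap.xml` index listing them, and only the index needs to be submitted. With very long URLs, it is also worth checking each file's byte size against the 50 MB limit.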
Many stores set <lastmod> to today’s date on every page to “tell Google the page changed.” Google has learned to detect this trick: it only trusts lastmod values that are consistently accurate, so if you declare 10,000 pages changed every day when they did not actually change, Google starts ignoring all your lastmod values, often within 2-3 weeks.
Best practice:
<lastmod> = actual date of the last modification to the page content (not the crawl date or today’s date)
W3C Datetime format (a profile of ISO 8601): 2026-04-23 or 2026-04-23T14:30:00+02:00
Update it only when the product really changes (price, description, images, stock)
If you cannot determine the real last-modified date, omit the field. That is better than a fake lastmod, which undermines the credibility of all your lastmod values with crawlers.
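One way to keep lastmod honest is to derive it from a fingerprint of the fields that matter, so the date only moves when that content actually changes. This is a sketch under assumptions: the product is a hypothetical dict with `price`, `description`, `images` and `in_stock` keys, and you persist the previous hash and lastmod per product somewhere (e.g. your database). It is not tied to any particular platform.

```python
import hashlib
import json
from datetime import datetime, timezone

# Fields that count as a "real" change (hypothetical product schema).
TRACKED_FIELDS = ("price", "description", "images", "in_stock")


def content_fingerprint(product):
    """Stable SHA-256 hash of the tracked fields only."""
    payload = {k: product.get(k) for k in TRACKED_FIELDS}
    blob = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def next_lastmod(product, stored_hash, stored_lastmod, now=None):
    """Return (hash, lastmod); lastmod only moves when tracked content changes.

    A lastmod of None means "unknown": omit <lastmod> for that URL.
    """
    new_hash = content_fingerprint(product)
    if new_hash == stored_hash:
        return stored_hash, stored_lastmod  # nothing real changed: keep the old date
    now = now or datetime.now(timezone.utc)
    return new_hash, now.isoformat(timespec="seconds")
```

Untracked fields (view counters, "last crawled" timestamps and the like) never touch the hash, which is exactly what keeps the dates credible.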
Temporarily out-of-stock products (back within a few days) should stay in the sitemap: their rankings are an asset worth preserving until stock returns.
Products that are permanently discontinued (never sold again) should be removed from the sitemap AND from the Google index (via 301 redirect or meta robots noindex).
A common bad practice is deleting discontinued product pages and returning a 404. The result: you lose the value of external backlinks and reviews, and Google can take months (6-12 in some cases) to clean its index. Prefer a 301 redirect to a similar product or to the parent category.
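The rules above can be condensed into a small decision function. This is a sketch: the `Status` values and the `Decision` fields are hypothetical names, not any platform's API, and the "200 + noindex" fallback for discontinued products with no sensible redirect target is one reading of the "301 or noindex" rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    TEMP_OUT_OF_STOCK = "temporarily_out_of_stock"
    DISCONTINUED = "discontinued"


@dataclass
class Decision:
    in_sitemap: bool
    http_status: int
    redirect_to: Optional[str] = None
    noindex: bool = False


def decide(status, replacement_url=None, category_url=None):
    """Map a product's lifecycle status to its sitemap and HTTP treatment."""
    if status is Status.DISCONTINUED:
        # Prefer a similar product, then the parent category, over a bare 404.
        target = replacement_url or category_url
        if target:
            return Decision(in_sitemap=False, http_status=301, redirect_to=target)
        # No sensible redirect target: keep the page but ask engines to drop it.
        return Decision(in_sitemap=False, http_status=200, noindex=True)
    # Active and temporarily out-of-stock products keep their page and sitemap entry.
    return Decision(in_sitemap=True, http_status=200)
```

Note that no branch ever returns a 404: a discontinued product either redirects or stays reachable with noindex.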
Google Search Console: Indexing → Sitemaps → enter https://example.com/sitemap.xml → Submit
Bing Webmaster Tools: Sitemaps → Submit sitemap
Yandex Webmaster (only if you target Russia; rarely needed in 2026)
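Submitting to Bing is underestimated. By common market-share estimates, Bing and Yahoo together account for roughly 6-8% of global search traffic and ~15-20% in the US and UK, and DuckDuckGo relies largely on the Bing index. Depending on your market, ignoring Bing can mean ignoring 10-20% of potential traffic.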
Post-submission monitoring:
GSC → Indexing → Sitemaps: check that "Discovered pages" matches the number of URLs you submitted, then open "See page indexing" to check that the number of indexed pages rises over time
If indexed / discovered stays below 50% after 8 weeks, you most likely have a page quality problem, not a sitemap problem
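The monitoring rule boils down to one ratio and one time threshold. A minimal sketch, assuming you copy the discovered and indexed counts from Search Console by hand (the function name and messages are hypothetical):

```python
def diagnose(discovered, indexed, weeks_since_submit):
    """Apply the 50%-after-8-weeks rule to counts read from Search Console."""
    if discovered <= 0:
        return "no URLs discovered: check that the sitemap is reachable and valid"
    ratio = indexed / discovered
    if weeks_since_submit < 8:
        return f"too early to judge ({ratio:.0%} indexed)"
    if ratio < 0.5:
        return f"likely page quality problem ({ratio:.0%} indexed)"
    return f"ok ({ratio:.0%} indexed)"
```

For instance, 3,200 indexed out of 10,000 discovered after 9 weeks is 32%, below the threshold, so the diagnosis points at page quality rather than at the sitemap.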
On Shopify, the sitemap is generated automatically at https://<yourshop>.myshopify.com/sitemap.xml, with child files such as /sitemap_products_1.xml, etc. You cannot customize it directly. Limits:
Automatically excludes unpublished products and out-of-stock products with "continue selling = off"
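Because you cannot edit the generated file, the useful move is to audit it: list the child sitemaps from the index and count the URLs in each, then compare with the number of published products you expect. A sketch using only the Python standard library; the `fetch` helper and the example domain are assumptions, and the parsing works on any sitemaps.org-compliant index, not only Shopify's.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch(url, timeout=10):
    """Download a sitemap file and return its raw bytes (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def child_sitemaps(index_xml):
    """List the <loc> of every child sitemap in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def count_urls(urlset_xml):
    """Count <url> entries in one child sitemap."""
    root = ET.fromstring(urlset_xml)
    return len(root.findall("sm:url", NS))
```

Typical usage: `for loc in child_sitemaps(fetch("https://example.com/sitemap.xml")): print(loc, count_urls(fetch(loc)))`. A large gap between the product count and the number of published products usually points at the exclusions listed above.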
Google crawls the sitemap as often as it considers relevant: typically daily for large active catalogs, weekly for less dynamic sites. You can request a re-crawl in GSC by clicking "Submit" again on an already submitted sitemap; this is a request, not a guarantee.
Yes, always. Add the line Sitemap: https://example.com/sitemap.xml to robots.txt (the directive can appear anywhere in the file; the end is the usual convention). Crawlers that never see your Search Console submissions (Bing, DuckDuckGo, AI bots) find it that way.
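To check that your robots.txt actually declares the sitemap the way a crawler would read it, Python's standard `urllib.robotparser` exposes `Sitemap:` lines through `site_maps()` (Python 3.8+). A small sketch; the robots.txt content is an example, and in production you would feed it your live file.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the Sitemap: URLs a standards-following parser finds in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemap.
    return parser.site_maps() or []
```

An empty list means crawlers that rely on robots.txt have no way to find your sitemap.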
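No. A sitemap is a hint, not an exclusion list. Pages not listed can still be crawled and indexed if they are linked from other pages. To keep them out of the index, use <meta name="robots" content="noindex">. robots.txt only blocks crawling, and Google cannot see a noindex tag on a page it is not allowed to crawl.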
Image sitemaps: yes in some verticals (fashion, home decor), where they help unlock Google Images indexing and can double traffic for some product types. Video sitemaps: less critical, unless you have a lot of product videos.
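Image data can live in the same product sitemap through Google's image extension namespace. A sketch of the entry format: only `<image:loc>` is emitted, because Google deprecated the other image tags (caption, title, geo_location, license) in 2022, and the 1,000-images-per-`<url>` cap follows Google's image sitemap documentation. The function names and example URLs are hypothetical.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1_000  # Google's documented per-<url> limit


def url_entry_with_images(page_url, image_urls):
    """One <url> entry listing the product page and its image files."""
    if len(image_urls) > MAX_IMAGES_PER_URL:
        raise ValueError("too many images for a single <url> entry")
    images = "".join(
        f"<image:image><image:loc>{escape(src)}</image:loc></image:image>"
        for src in image_urls
    )
    return f"<url><loc>{escape(page_url)}</loc>{images}</url>"


def image_urlset(entries):
    """Wrap entries in a <urlset> that also declares the image namespace."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="{SITEMAP_NS}" xmlns:image="{IMAGE_NS}">'
        + "".join(entries)
        + "</urlset>"
    )
```

Keeping images inside the product sitemap, rather than in a separate file, means the same sharding and lastmod logic applies to both.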
Yes. Even if your current sitemap works, sharding to 10,000 URLs per file tends to improve re-crawl frequency and makes adjustments faster (you invalidate one shard instead of the whole file). The migration takes only a few hours and can bring a 5-15% gain in indexing rate.