Implementing Dynamic Sitemap Generation with Content Collections

You’ve deployed a portfolio with 50 projects, 30 tutorials, and a handful of category pages. Google Search Console shows almost no indexed pages. The crawler visited your homepage and stopped. The problem isn’t your content — it’s that search engines didn’t know your pages existed in the first place.

Sitemaps tell crawlers exactly what’s on your site and how to find it. You could write one by hand, but the moment you add a new project you’ve broken the sitemap. The real solution is to generate it at build time from your content directly. That’s what src/pages/sitemap.xml.ts does in this portfolio — an Astro API route that queries all content collections, filters drafts, and serves a complete XML sitemap automatically on every build.

What You Need First

You should be comfortable with TypeScript basics and Astro Content Collections, and have a rough idea of what an API route is. The sitemap protocol itself is straightforward: it’s just XML with a specific structure that search engines understand.

Your astro.config.mjs needs a site property set, since that’s where the sitemap generator gets the base URL:

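import { defineConfig } from 'astro/config';

export default defineConfig({
  // Base URL handed to API routes as `site`; the sitemap route builds every <loc> from it.
  // The @astrojs/sitemap integration is not needed here, since the route below writes the sitemap itself.
  site: 'https://jasontran.pages.dev',
});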

From Static XML to a Build-Time Generator

The naive approach is a static public/sitemap.xml file. You write it once, it gets stale immediately, and every new piece of content requires a manual edit. The problem compounds fast: wrong URLs, missing pages, outdated last-modified dates.

Astro’s file-based routing handles this elegantly. A file named sitemap.xml.ts in src/pages/ is treated as an API route: it exports a GET handler, which Astro executes at build time in the default static output mode. The .xml in the filename determines the output path (/sitemap.xml), while the Response returned by the handler carries the Content-Type header.

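// src/pages/sitemap.xml.ts
import type { APIRoute } from 'astro';

export const GET: APIRoute = async ({ site }) => {
  // `site` is a URL built from astro.config.mjs; its string form ends with "/".
  // The fallback keeps the same trailing slash so path concatenation still works.
  const siteUrl = site?.toString() || 'https://yoursite.com/';
  // Placeholder body; the full document is assembled in the sections below.
  const xmlString = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>${siteUrl}</loc></url>
</urlset>`;
  return new Response(xmlString, {
    headers: { 'Content-Type': 'application/xml; charset=utf-8' },
  });
};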

The filename must be sitemap.xml.ts. Astro strips the .ts extension and uses what remains, sitemap.xml, as the route path, so the output is served at /sitemap.xml rather than as an HTML page.

Querying Collections and Building the URL List

The generator pulls from three content collections — projects, tutorials, and categories — filtering out anything marked as a draft:

import { getCollection } from 'astro:content';

const projects  = await getCollection('projects',  ({ data }) => !data.draft);
const tutorials = await getCollection('tutorials', ({ data }) => !data.draft);
const categories = await getCollection('category');

getCollection is async because it reads from disk during the build. The filter callback gets the full entry, so you can check any frontmatter field. For categories there’s no draft field, so we take everything.
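To make the filter behavior concrete, here is a minimal sketch of the draft check as a standalone predicate. EntryLike and isPublished are hypothetical names introduced for illustration; real entries come from astro:content and carry more fields. The sketch assumes draft is an optional boolean in the schema.

```typescript
// Minimal stand-in for a collection entry (assumption: `draft` is an optional boolean).
interface EntryLike {
  id: string;
  data: { draft?: boolean };
}

// Keeps an entry unless it is explicitly marked as a draft.
// An entry with no `draft` field counts as published, matching `!data.draft`.
const isPublished = (entry: EntryLike): boolean => entry.data.draft !== true;

// Same filtering step that getCollection applies with its callback,
// shown here on plain arrays so it can run outside an Astro build.
function publishedOnly<T extends EntryLike>(entries: readonly T[]): T[] {
  return entries.filter(isPublished);
}
```

If the entry types line up, the same predicate could be passed as the second argument to getCollection in place of the inline arrow function.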

Static pages get defined manually since they don’t live in content collections:

const staticPages = [
  { url: '',          changefreq: 'weekly',  priority: 1.0 },
  { url: 'projects',  changefreq: 'weekly',  priority: 0.9 },
  { url: 'tutorials', changefreq: 'weekly',  priority: 0.9 },
  { url: 'terminal',  changefreq: 'monthly', priority: 0.8 },
  { url: 'about',     changefreq: 'monthly', priority: 0.7 },
  { url: 'contact',   changefreq: 'monthly', priority: 0.7 },
  { url: 'freelance', changefreq: 'monthly', priority: 0.8 },
];

The priority values are relative hints, not directives. A 1.0 on the homepage tells crawlers this is the most important page on the site. changefreq is also advisory: search engines may ignore it if they detect a different update pattern in practice, and Google’s documentation states that it ignores both values.
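Since these values are typed by hand, a typo such as 'weakly' or a priority of 10 would go unnoticed. The sketch below shows one way to catch that at build time. The names ChangeFreq, StaticPage, and assertValidPriorities are illustrative and not part of the original route; the changefreq values and the 0.0 to 1.0 priority range come from the sitemap protocol.

```typescript
// The seven <changefreq> values the sitemap protocol allows.
type ChangeFreq =
  | 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';

// Hypothetical shape for one hand-written static page entry.
interface StaticPage {
  url: string;            // path relative to the site root, '' for the homepage
  changefreq: ChangeFreq; // a misspelled value becomes a compile-time error
  priority: number;       // protocol range: 0.0 to 1.0
}

// Throws if any priority falls outside the protocol range (NaN included).
function assertValidPriorities(pages: readonly StaticPage[]): void {
  for (const page of pages) {
    if (!(page.priority >= 0 && page.priority <= 1)) {
      throw new RangeError(`priority out of range for "${page.url}": ${page.priority}`);
    }
  }
}

const typedStaticPages: StaticPage[] = [
  { url: '',         changefreq: 'weekly',  priority: 1.0 },
  { url: 'projects', changefreq: 'weekly',  priority: 0.9 },
  { url: 'about',    changefreq: 'monthly', priority: 0.7 },
];

assertValidPriorities(typedStaticPages);
```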

Assembling the XML

Template literals are the simplest way to build sitemap XML. The structure is predictable, the data is trusted (it’s your own content), and you avoid adding a dependency for something this straightforward:

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${staticPages.map(page => `
  <url>
    <loc>${siteUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
  </url>`).join('')}
  
  ${projects.map(project => `
  <url>
    <loc>${siteUrl}projects/${project.id}</loc>
    <lastmod>${project.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>`).join('')}
  
  ${tutorials.map(tutorial => `
  <url>
    <loc>${siteUrl}tutorials/${tutorial.id}</loc>
    <lastmod>${tutorial.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}
  
  ${categories.map(category => `
  <url>
    <loc>${siteUrl}category/${category.data.slug || category.id}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}
</urlset>`;

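project.id is the content collection entry’s identifier. With Astro 5’s glob() loader it is derived from the filename without the extension, so src/content/projects/simple-router.mdx becomes "simple-router" (in older, legacy collections the id keeps the extension and slug holds the short form). toISOString() formats dates as 2024-01-15T10:30:00.000Z, which is a valid W3C Datetime value, the format the sitemap protocol specifies for <lastmod> tags. This assumes publishDate is parsed as a Date by the collection schema.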
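The W3C Datetime format used by <lastmod> also accepts a date-only form, YYYY-MM-DD, which is often enough when only the publish day is known. A minimal sketch, with toLastmodDate as a hypothetical helper name:

```typescript
// Formats a Date as a date-only W3C Datetime string (YYYY-MM-DD, in UTC).
function toLastmodDate(date: Date): string {
  if (Number.isNaN(date.getTime())) {
    // Surface a bad frontmatter date with a clear message.
    throw new RangeError('cannot format an invalid Date for <lastmod>');
  }
  // toISOString() always returns UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ,
  // so the first ten characters are the calendar date.
  return date.toISOString().slice(0, 10);
}
```

In the template this would replace the toISOString() call, for example toLastmodDate(project.data.publishDate).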

The category.data.slug || category.id fallback handles cases where a category defines a custom slug in its frontmatter. If it doesn’t, we fall back to the filename-based ID.

One thing to watch for: if project.id ever contains special XML characters like & or <, the sitemap becomes invalid XML. In practice, Astro content collection IDs are alphanumeric plus hyphens (derived from filenames), so this isn’t a real problem for this portfolio. If you extend this pattern to content where IDs could contain arbitrary characters, add an XML escape function.
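A minimal sketch of such an escape function follows. The name escapeXml is an assumption for illustration; the five replacements are the predefined XML entities. The ampersand is replaced first so that the entities produced by later steps are not escaped a second time.

```typescript
// Escapes the five characters that have special meaning in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;') // must run first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
```

In the template, each interpolated value would be wrapped, for example <loc>${escapeXml(siteUrl + 'projects/' + project.id)}</loc>.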

Returning the Response and Making It Discoverable

return new Response(sitemap, {
  headers: {
    'Content-Type': 'application/xml; charset=utf-8',
  },
});

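The Content-Type header is what tells browsers and crawlers this is XML when the route is served by the dev server or rendered on demand. Without it, some tools will treat the response as plain text and fail to parse it. In a static build the route is written out as a file, and the host typically derives the content type from the .xml extension.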

Once your sitemap is live at /sitemap.xml, tell crawlers where to find it by adding a reference in public/robots.txt:

User-agent: *
Allow: /

Sitemap: https://jasontran.pages.dev/sitemap.xml

Then submit the URL manually in Google Search Console (Sitemaps section) and Bing Webmaster Tools. The robots.txt reference enables automatic discovery, but manual submission speeds up initial indexing.
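The hard-coded domain in robots.txt can drift from the site value in astro.config.mjs. One option is to build the file from the same URL. The sketch below assumes a hypothetical helper, buildRobotsTxt, that a route such as src/pages/robots.txt.ts could call with its site argument; that route would replace public/robots.txt rather than sit alongside it.

```typescript
// Builds the robots.txt body with an absolute sitemap URL derived from `site`.
function buildRobotsTxt(site: URL): string {
  // Resolving against `site` yields an absolute URL,
  // e.g. https://example.com/sitemap.xml for https://example.com
  const sitemapUrl = new URL('sitemap.xml', site).href;
  return ['User-agent: *', 'Allow: /', '', `Sitemap: ${sitemapUrl}`, ''].join('\n');
}
```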

A Bug Worth Knowing About

The original code filtered categories by c.data.type === 'project' before building category URLs. That filter always returns an empty array because the category schema doesn’t define a type field. If you’re building on this codebase, drop the filter and just map all categories directly — the version above reflects that fix.

Things That Break Quietly

If site is not configured in astro.config.mjs, the site parameter in GET comes back as undefined. The fallback to 'https://yoursite.com' means your sitemap silently produces wrong URLs. Better to fail loudly at build time:

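import type { APIRoute } from 'astro';

export const GET: APIRoute = async ({ site }) => {
  if (!site) {
    // Abort the build rather than emit URLs for a placeholder domain.
    throw new Error('site URL must be configured in astro.config.mjs');
  }
  const siteUrl = site.toString();
  // ... build and return the sitemap Response as before
};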

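Trailing slashes are another quiet issue. If trailingSlash is set to 'never' in your Astro config, make sure your sitemap URLs don’t have trailing slashes either, and if it is set to 'always', make sure they do. Search engines treat /projects and /projects/ as different URLs, so a sitemap that disagrees with the URLs your site actually serves sends crawlers through redirects or to duplicate pages and wastes crawl budget.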
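One way to keep the sitemap consistent is to route every URL through a single helper that applies the slash policy. This is a sketch under assumptions: pageUrl and TrailingSlash are hypothetical names, and the policy is passed in by hand rather than read from the Astro config. Astro’s trailingSlash option also accepts 'ignore', which this sketch does not model.

```typescript
type TrailingSlash = 'always' | 'never';

// Builds an absolute page URL with a consistent trailing-slash policy.
function pageUrl(site: URL, path: string, trailingSlash: TrailingSlash): string {
  // Strip leading and trailing slashes so callers can pass either form.
  const clean = path.replace(/^\/+|\/+$/g, '');
  const url = new URL(clean, site);
  // The site root always ends with "/" regardless of policy.
  if (clean === '') return url.href;
  if (trailingSlash === 'always') url.pathname += '/';
  return url.href;
}
```

With a policy of 'never', pageUrl(site, 'projects/simple-router', 'never') yields a URL with no trailing slash, and the homepage entry still resolves to the bare site root.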

For performance context: on a site with 100 projects and 50 tutorials, the entire sitemap generation (reading from disk and building the XML string) takes roughly 65ms at build time. After that, the output is a static file served from a CDN with zero runtime cost.