
Implementing Dynamic Sitemap Generation with Content Collections

Learn how to create a dynamic sitemap generator in Astro using content collection queries, enhancing SEO and automating sitemap maintenance.

Published

Mon Jun 16 2025

Technologies Used

Astro SEO
Intermediate 25 minutes

Purpose

The SEO Black Hole

You’ve built a beautiful portfolio with 50 projects, 30 tutorials, and 10 categories. You deploy to production. Then you check Google Search Console and see… nothing. Your pages aren’t being indexed.

The issue: Search engines don’t know your pages exist. They could crawl your site randomly, but that’s inefficient. What they really want is a sitemap—an XML file that lists every URL on your site with metadata about priority and update frequency.

You could manually create sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/projects/project-1</loc>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://yoursite.com/projects/project-2</loc>
    <priority>0.8</priority>
  </url>
  <!-- ... 48 more projects -->
</urlset>

But this creates three problems:

  1. Maintenance Nightmare: Every new project requires manually editing XML
  2. Stale Data: Last modified dates are wrong or missing
  3. Human Error: One typo breaks the entire sitemap

The Core Problem: Content is dynamic (stored in Markdown files), but sitemaps are static (XML files). We need to generate sitemaps programmatically from our content at build time.

API Routes with Content Collection Queries

The code we’re analyzing (src/pages/sitemap.xml.ts) implements a dynamic sitemap generator that:

  1. Queries all content collections (projects, tutorials, categories)
  2. Filters out draft content
  3. Generates XML with proper priorities and change frequencies
  4. Includes last modified dates from frontmatter
  5. Serves the result as an API endpoint

This is the same pattern used by:

  • WordPress (automatic sitemap generation)
  • Next.js (next-sitemap plugin)
  • Gatsby (gatsby-plugin-sitemap)

Understanding Build-Time Data Fetching

This tutorial demonstrates three advanced concepts:

  • API Routes: Creating endpoints that return non-HTML responses
  • XML Generation: Programmatically building valid XML documents
  • SEO Optimization: Understanding search engine crawling behavior

🔵 Deep Dive: Sitemaps follow the Sitemaps Protocol (sitemaps.org), a standard supported by Google, Bing, Yahoo, and other search engines. A sitemap does not guarantee indexing, but it helps crawlers discover new and updated pages sooner than following links alone.

Prerequisites & Tooling

Knowledge Base

Required:

  • TypeScript/JavaScript basics
  • Understanding of XML structure
  • Familiarity with Astro Content Collections
  • Basic SEO concepts (what search engines do)

Helpful:

  • Experience with API routes
  • Understanding of HTTP headers
  • Knowledge of sitemap protocols

Environment

From the project’s configuration:

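// astro.config.mjs
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://jasontran.pages.dev',  // Required: populates `site` in the endpoint
  // Official @astrojs/sitemap integration. It emits sitemap-index.xml,
  // so it does not collide with the custom /sitemap.xml endpoint.
  integrations: [sitemap()],
});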

Key Concepts:

  • API Route: A file in src/pages/ that exports a function instead of a component
  • GET Handler: Function that handles HTTP GET requests
  • Content Collections: Astro’s type-safe content management system
  • XML: Extensible Markup Language for structured data

Testing Your Sitemap

# Build the site
npm run build

# Preview locally
npm run preview

# Visit the sitemap
curl http://localhost:4321/sitemap.xml

# Validate XML
xmllint --noout dist/sitemap.xml  # Linux/Mac; checks the built file
# Or use online validators: https://www.xml-sitemaps.com/validate-xml-sitemap.html
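Beyond well-formedness, it can help to confirm that every URL in the sitemap is absolute and on the expected host. The following is a minimal sketch, assuming the sitemap XML is already available as a string (for example, read from dist/sitemap.xml). The helper names `extractLocs` and `findForeignUrls` are hypothetical, and the regex relies on the flat layout this tutorial generates; it is not a general XML parser.

```typescript
// Hypothetical helpers for sanity-checking a generated sitemap string.
// The regex assumes simple <loc>...</loc> elements with no nested markup.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>([^<]*)<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push((match[1] ?? "").trim());
  }
  return locs;
}

// Returns every URL that does not start with the expected site URL.
function findForeignUrls(xml: string, siteUrl: string): string[] {
  return extractLocs(xml).filter((loc) => !loc.startsWith(siteUrl));
}

const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/projects/simple-router</loc></url>
  <url><loc>/about</loc></url>
</urlset>`;

// The relative "/about" entry is the only one flagged.
const foreign = findForeignUrls(sampleXml, "https://example.com/");
console.log(foreign);
```

A check like this could run as a post-build script so that a misconfigured site URL fails loudly instead of shipping.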

High-Level Architecture

Sitemap Generation Flow

graph TB
    A[Build Process Starts] --> B[Astro Processes sitemap.xml.ts]
    B --> C[GET Handler Executes]
    C --> D[Query Content Collections]
    D --> E[getCollection: projects]
    D --> F[getCollection: tutorials]
    D --> G[getCollection: categories]
    E --> H[Filter Drafts]
    F --> H
    G --> H
    H --> I[Generate XML String]
    I --> J[Static Pages URLs]
    I --> K[Project URLs + Dates]
    I --> L[Tutorial URLs + Dates]
    I --> M[Category URLs]
    J --> N[Combine All URLs]
    K --> N
    L --> N
    M --> N
    N --> O[Return Response with XML Headers]
    O --> P[sitemap.xml Available at Build]
    
    style C fill:#a855f7
    style I fill:#10b981
    style O fill:#f59e0b

The Phone Book

Think of a sitemap as a phone book for search engines:

| Phone Book | Sitemap |
| --- | --- |
| Names & numbers | URLs |
| Alphabetical order | Priority ranking |
| “Updated 2024” | Last modified dates |
| Business vs. Residential | Page types (static vs. dynamic) |
| Yellow pages sections | Categories |

When a search engine visits your site, it first checks the phone book (sitemap) to understand:

  • What pages exist
  • Which are most important
  • When they were last updated
  • How often they change

The Three-Phase Architecture

Phase 1: Data Collection (Build Time)
  ├─ Query all content collections
  ├─ Filter out drafts
  └─ Extract metadata (dates, slugs)

Phase 2: XML Generation (String Building)
  ├─ Create XML header
  ├─ Add static pages
  ├─ Add dynamic content pages
  └─ Close XML structure

Phase 3: Response Delivery (HTTP)
  ├─ Set Content-Type header
  ├─ Return XML string
  └─ Cache at CDN edge

The Implementation

Defining the API Route

Naive Approach: Static XML File

<!-- public/sitemap.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
  </url>
</urlset>

Why This Fails: Every new project requires manually editing this file. No automation.

Refined Solution (From Repo):

// src/pages/sitemap.xml.ts
import type { APIRoute } from 'astro';

export const GET: APIRoute = async ({ site }) => {
  // This function runs at BUILD TIME
  // It generates the sitemap dynamically
  
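  // site.toString() always ends with "/", so the fallback must too;
  // otherwise `${siteUrl}projects` becomes "https://yoursite.comprojects"
  const siteUrl = site?.toString() || 'https://yoursite.com/';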
  
  // ... generate XML
  
  return new Response(xmlString, {
    headers: {
      'Content-Type': 'application/xml; charset=utf-8',
    },
  });
};

🔴 Danger: The filename must be sitemap.xml.ts (or .js). Astro strips the final .ts/.js extension, so sitemap.xml.ts is emitted as /sitemap.xml; the .ts part is what lets you write the generator in TypeScript.

Querying Content Collections

The Challenge: Get all published content from multiple collections.

import { getCollection } from 'astro:content';

// Get all published projects (filter out drafts)
const projects = await getCollection('projects', ({ data }) => !data.draft);

// Get all published tutorials
const tutorials = await getCollection('tutorials', ({ data }) => !data.draft);

// Get all categories (no draft field)
const categories = await getCollection('category');

Key Insights:

  1. Filter Function: The second argument to getCollection is a predicate

    ({ data }) => !data.draft
    // Equivalent when draft is a boolean or undefined:
    (entry) => entry.data.draft !== true
  2. Type Safety: TypeScript knows the shape of data based on your schema

    projects[0].data.publishDate  // Date (type-safe!)
    projects[0].data.title        // string
    projects[0].id                // string (filename without extension)
  3. Async Queries: getCollection is async because it reads from disk
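To see how the draft predicate behaves without a content directory, here is a minimal sketch that applies the same filter to plain objects. The `Entry` interface and the sample data are assumptions standing in for real collection entries; in a real project the shape comes from your collection schema.

```typescript
// Stand-in for a collection entry; real entries are typed from your schema.
interface Entry {
  id: string;
  data: { title: string; draft?: boolean };
}

const entries: Entry[] = [
  { id: "simple-router", data: { title: "Simple Router" } },     // draft omitted
  { id: "wip-idea", data: { title: "WIP", draft: true } },       // draft
  { id: "released", data: { title: "Released", draft: false } }, // explicitly published
];

// Same predicate shape as the one passed to getCollection.
const isPublished = ({ data }: Entry): boolean => !data.draft;

const publishedIds = entries.filter(isPublished).map((entry) => entry.id);
console.log(publishedIds); // ["simple-router", "released"]
```

Entries that omit `draft` are kept, which is usually what you want when the field is optional in the schema.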

Defining Static Pages

Configuration Object Pattern:

const staticPages = [
  { url: '', changefreq: 'weekly', priority: 1.0 },
  { url: 'projects', changefreq: 'weekly', priority: 0.9 },
  { url: 'tutorials', changefreq: 'weekly', priority: 0.9 },
  { url: 'terminal', changefreq: 'monthly', priority: 0.8 },
  { url: 'about', changefreq: 'monthly', priority: 0.7 },
  { url: 'contact', changefreq: 'monthly', priority: 0.7 },
  { url: 'freelance', changefreq: 'monthly', priority: 0.8 },
];

Understanding the Fields:

  • url: Path relative to site root (empty string = homepage)
  • changefreq: How often the page changes
    • always: Changes every time it’s accessed (e.g., live data)
    • hourly: News sites
    • daily: Blogs
    • weekly: Project listings
    • monthly: About pages
    • yearly: Legal pages
    • never: Archived content
  • priority: Relative importance (0.0 to 1.0)
    • 1.0: Homepage
    • 0.8-0.9: Main sections
    • 0.5-0.7: Individual pages
    • 0.0-0.4: Low-priority pages

🔵 Deep Dive: changefreq is a hint, not a directive. Search engines use it to optimize crawl frequency but may ignore it if they detect different patterns.
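Typing the configuration object lets the compiler reject a misspelled changefreq, and a small guard can catch out-of-range priorities. This is a sketch under assumptions: `StaticPage` and `validatePages` are hypothetical names that do not appear in the repo, and the leading-slash rule reflects this tutorial's convention of paths relative to the site root.

```typescript
// The seven changefreq values defined by the sitemaps.org protocol.
const CHANGE_FREQS = ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"] as const;
type ChangeFreq = (typeof CHANGE_FREQS)[number];

interface StaticPage {
  url: string;            // path relative to the site root, "" for the homepage
  changefreq: ChangeFreq; // a typo such as "weekley" fails to compile
  priority: number;       // 0.0 to 1.0 inclusive
}

// Hypothetical build-time guard: collects every problem instead of
// throwing on the first one, so all bad entries are reported together.
function validatePages(pages: StaticPage[]): string[] {
  const problems: string[] = [];
  for (const page of pages) {
    if (page.priority < 0 || page.priority > 1) {
      problems.push(`priority out of range for "${page.url}": ${page.priority}`);
    }
    if (page.url.startsWith("/")) {
      problems.push(`url should not start with a slash: "${page.url}"`);
    }
  }
  return problems;
}

const pages: StaticPage[] = [
  { url: "", changefreq: "weekly", priority: 1.0 },
  { url: "about", changefreq: "monthly", priority: 0.7 },
  { url: "/contact", changefreq: "monthly", priority: 1.5 }, // two problems
];

console.log(validatePages(pages));
```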

Generating XML for Static Pages

Template Literal Pattern:

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${staticPages.map(page => `
  <url>
    <loc>${siteUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
  </url>`).join('')}
</urlset>`;

Why Template Literals?

  • Readability: XML structure is visible
  • Interpolation: Easy to inject variables
  • Multiline: No string concatenation

Alternative: XML Builder Library

// Using a library like 'xmlbuilder2'
import { create } from 'xmlbuilder2';

const root = create({ version: '1.0', encoding: 'UTF-8' })
  .ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });

staticPages.forEach(page => {
  root.ele('url')
    .ele('loc').txt(`${siteUrl}${page.url}`).up()
    .ele('changefreq').txt(page.changefreq).up()
    .ele('priority').txt(page.priority.toString()).up();
});

const sitemap = root.end({ prettyPrint: true });

Comparison:

| Approach | Pros | Cons |
| --- | --- | --- |
| Template Literals | Simple, no dependencies | Manual escaping, harder to validate |
| XML Builder | Type-safe, auto-escaping | Extra dependency, more verbose |

For sitemaps (simple structure, trusted data), template literals are sufficient.

Adding Dynamic Content Pages

Projects with Last Modified Dates:

${projects.map(project => `
  <url>
    <loc>${siteUrl}projects/${project.id}</loc>
    <lastmod>${project.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>`).join('')}

Key Details:

  1. URL Construction: ${siteUrl}projects/${project.id}

    • project.id is the filename without extension
    • For src/content/projects/simple-router.mdx, id = "simple-router"
  2. Date Formatting: toISOString()

    • Converts Date to ISO 8601 format: "2024-01-15T10:30:00.000Z"
    • This is a valid W3C Datetime value, the format <lastmod> expects (a date-only YYYY-MM-DD value is also accepted)
  3. Priority Logic: Projects get 0.8 (high priority, but below main sections)
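The two accepted <lastmod> shapes can be produced from the same Date. A minimal sketch follows; `toLastmod` is a hypothetical helper name, and the sample date is made up for illustration.

```typescript
type LastmodPrecision = "datetime" | "date";

// Hypothetical helper: formats a Date as a W3C Datetime string for <lastmod>.
function toLastmod(date: Date, precision: LastmodPrecision = "datetime"): string {
  const iso = date.toISOString(); // always expressed in UTC
  return precision === "date" ? iso.slice(0, 10) : iso;
}

// Build the sample date in UTC so the output does not depend on the local timezone.
const published = new Date(Date.UTC(2024, 0, 15, 10, 30, 0));

console.log(toLastmod(published));         // 2024-01-15T10:30:00.000Z
console.log(toLastmod(published, "date")); // 2024-01-15
```

The date-only form is often enough when frontmatter dates carry no meaningful time of day.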

Tutorials (Similar Pattern):

${tutorials.map(tutorial => `
  <url>
    <loc>${siteUrl}tutorials/${tutorial.id}</loc>
    <lastmod>${tutorial.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}

Categories (No Last Modified):

${categories.map(category => `
  <url>
    <loc>${siteUrl}category/${category.data.slug || category.id}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}

🔴 Danger: Notice category.data.slug || category.id. This handles cases where the category has a custom slug. Always provide fallbacks for optional fields.

Handling Edge Cases in the Repo

The Repo’s Approach (With Bug):

${categories.filter(c => c.data.type === 'project').map(category => `
  <url>
    <loc>${siteUrl}projects/category/${category.data.slug || category.id}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}

${categories.filter(c => c.data.type === 'project').map(category => `
  <url>
    <loc>${siteUrl}freelance/${category.data.slug || category.id}</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>`).join('')}

Issue: The code filters for c.data.type === 'project', but the category schema doesn’t define a type field. This will return empty arrays.

Fixed Version:

// Remove the filter or add 'type' to category schema
${categories.map(category => `
  <url>
    <loc>${siteUrl}category/${category.data.slug || category.id}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}

Returning the Response

return new Response(sitemap, {
  headers: {
    'Content-Type': 'application/xml; charset=utf-8',
  },
});

Critical Headers:

  • Content-Type: application/xml: Tells browsers/crawlers this is XML
  • charset=utf-8: Ensures proper encoding for international characters

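Optional Headers (Production Enhancement). Response headers only take effect when the endpoint is rendered on demand; in a static build the XML is written to disk and the hosting platform decides the headers: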

return new Response(sitemap, {
  headers: {
    'Content-Type': 'application/xml; charset=utf-8',
    'Cache-Control': 'public, max-age=3600',  // Cache for 1 hour
    'X-Robots-Tag': 'noindex',  // Don't index the sitemap itself
  },
});

Complete Implementation

Here’s the full sitemap generator from the repository (with fixes):

import type { APIRoute } from 'astro';
import { getCollection } from 'astro:content';

export const GET: APIRoute = async ({ site }) => {
  // site.toString() ends with "/", so the fallback keeps the trailing slash too
  const siteUrl = site?.toString() || 'https://yoursite.com/';
  
  // Get all published content
  const projects = await getCollection('projects', ({ data }) => !data.draft);
  const tutorials = await getCollection('tutorials', ({ data }) => !data.draft);
  const categories = await getCollection('category');
  
  // Static pages with priorities
  const staticPages = [
    { url: '', changefreq: 'weekly', priority: 1.0 },
    { url: 'projects', changefreq: 'weekly', priority: 0.9 },
    { url: 'tutorials', changefreq: 'weekly', priority: 0.9 },
    { url: 'terminal', changefreq: 'monthly', priority: 0.8 },
    { url: 'about', changefreq: 'monthly', priority: 0.7 },
    { url: 'contact', changefreq: 'monthly', priority: 0.7 },
    { url: 'freelance', changefreq: 'monthly', priority: 0.8 },
  ];
  
  // Generate sitemap XML
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${staticPages.map(page => `
  <url>
    <loc>${siteUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
  </url>`).join('')}
  
  ${projects.map(project => `
  <url>
    <loc>${siteUrl}projects/${project.id}</loc>
    <lastmod>${project.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>`).join('')}
  
  ${tutorials.map(tutorial => `
  <url>
    <loc>${siteUrl}tutorials/${tutorial.id}</loc>
    <lastmod>${tutorial.data.publishDate.toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}
  
  ${categories.map(category => `
  <url>
    <loc>${siteUrl}category/${category.data.slug || category.id}</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>`).join('')}
</urlset>`;
  
  return new Response(sitemap, {
    headers: {
      'Content-Type': 'application/xml; charset=utf-8',
    },
  });
};

Under the Hood

Build-Time Execution

When Does This Run?

npm run build
  ↓
Astro processes all pages
  ↓
Finds sitemap.xml.ts
  ↓
Executes GET function
  ↓
Queries content collections (reads from disk)
  ↓
Generates XML string
  ↓
Writes to dist/sitemap.xml
  ↓
File is served statically

Performance Characteristics:

Rough, illustrative figures for a site with 100 projects + 50 tutorials (measure your own build for real numbers):

  • Content collection queries: ~50ms
  • XML string generation: ~10ms
  • File write: ~5ms
  • Total: ~65ms (one-time build cost)

Once built, the sitemap is a static file served from CDN with zero runtime cost.

Memory Efficiency

String Concatenation Analysis:

${projects.map(project => `...`).join('')}

What Happens:

  1. map() creates an array of strings: ["<url>...</url>", "<url>...</url>", ...]
  2. join('') concatenates them into one string

Memory Usage:

  • 100 projects × ~200 bytes per URL = ~20KB
  • Temporary array: ~20KB
  • Final string: ~20KB
  • Peak memory: ~40KB

Alternative (Streaming):

For very large sites (10,000+ pages), consider streaming:

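import type { APIRoute } from 'astro';
import { getCollection } from 'astro:content';

export const GET: APIRoute = async ({ site }) => {
  if (!site) throw new Error('site must be configured in astro.config.mjs');
  const encoder = new TextEncoder();  // Response streams carry bytes, not strings

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      const write = (chunk: string) => controller.enqueue(encoder.encode(chunk));

      write('<?xml version="1.0" encoding="UTF-8"?>\n');
      write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n');

      const projects = await getCollection('projects', ({ data }) => !data.draft);
      for (const project of projects) {
        write(`<url><loc>${site}projects/${project.id}</loc></url>\n`);
      }

      write('</urlset>');
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'application/xml; charset=utf-8' },
  });
};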

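This avoids holding the full XML string in memory. Note that getCollection still loads every entry up front, so memory use still grows with the size of the collection.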

XML Escaping

The Hidden Danger:

<loc>${siteUrl}projects/${project.id}</loc>

What if project.id contains special XML characters?

project.id = "my-project-&-tutorial"
Result: <loc>...my-project-&-tutorial</loc>  // INVALID XML!

Proper Escaping:

function escapeXml(unsafe: string): string {
  return unsafe
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

<loc>${escapeXml(siteUrl)}projects/${escapeXml(project.id)}</loc>

🔴 Danger: The repo code doesn’t escape XML. This works because Astro’s content collection IDs are filename-based (alphanumeric + hyphens), but it’s a latent bug.
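One way to make escaping hard to forget is to route every entry through a single rendering helper. The sketch below is an assumption about how that could look, not the repo's code; `UrlEntry` and `renderUrl` are hypothetical names.

```typescript
// Escapes the five characters that are special in XML text and attributes.
// "&" must be replaced first so later replacements are not double-escaped.
function escapeXml(unsafe: string): string {
  return unsafe
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface UrlEntry {
  loc: string;
  lastmod?: Date;
  changefreq?: string;
  priority?: number;
}

// Hypothetical helper: renders one <url> element, escaping all text content
// and skipping optional fields that are not set.
function renderUrl(entry: UrlEntry): string {
  const parts: string[] = [`    <loc>${escapeXml(entry.loc)}</loc>`];
  if (entry.lastmod) {
    parts.push(`    <lastmod>${entry.lastmod.toISOString()}</lastmod>`);
  }
  if (entry.changefreq) {
    parts.push(`    <changefreq>${escapeXml(entry.changefreq)}</changefreq>`);
  }
  if (entry.priority !== undefined) {
    parts.push(`    <priority>${entry.priority.toFixed(1)}</priority>`);
  }
  return `  <url>\n${parts.join("\n")}\n  </url>`;
}

const rendered = renderUrl({
  loc: "https://example.com/projects/my-project-&-tutorial",
  changefreq: "monthly",
  priority: 0.8,
});
console.log(rendered);
```

With a helper like this, the template literal shrinks to `entries.map(renderUrl).join('\n')`, and the ampersand case above yields valid XML.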

Edge Cases & Pitfalls

Missing Site URL

Problem: site is undefined if not configured in astro.config.mjs.

Current Behavior: Falls back to 'https://yoursite.com' (placeholder).

Better Approach: Fail fast during build:

export const GET: APIRoute = async ({ site }) => {
  if (!site) {
    throw new Error('site URL must be configured in astro.config.mjs');
  }
  
  const siteUrl = site.toString();
  // ...
};

Duplicate URLs

Problem: Two entries that resolve to the same path produce duplicate URLs. A project and a tutorial sharing an ID is not a problem, because their paths differ:

projects/my-guide.mdx → /projects/my-guide
tutorials/my-guide.mdx → /tutorials/my-guide

But two files in the same collection that differ only by extension collide:

projects/my-guide.mdx → /projects/my-guide
projects/my-guide.md → /projects/my-guide  // DUPLICATE!

Solution: Validate uniqueness:

const allUrls = new Set<string>();

projects.forEach(project => {
  const url = `${siteUrl}projects/${project.id}`;
  if (allUrls.has(url)) {
    throw new Error(`Duplicate URL: ${url}`);
  }
  allUrls.add(url);
});
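Throwing on the first duplicate means fixing collisions one build at a time. An alternative sketch, assuming you first gather every URL into an array, reports all duplicates at once; `findDuplicateUrls` is a hypothetical helper name.

```typescript
// Hypothetical helper: returns each URL that appears more than once,
// listed a single time, in the order the repeats were first seen.
function findDuplicateUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const url of urls) {
    if (seen.has(url)) {
      duplicates.add(url);
    }
    seen.add(url);
  }
  return Array.from(duplicates);
}

const candidateUrls = [
  "https://example.com/projects/my-guide",
  "https://example.com/tutorials/my-guide", // same ID, different path: fine
  "https://example.com/projects/my-guide",  // collides with the first entry
];

const duplicateUrls = findDuplicateUrls(candidateUrls);
if (duplicateUrls.length > 0) {
  console.log(`Duplicate URLs: ${duplicateUrls.join(", ")}`);
}
```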

Invalid Dates

Problem: publishDate.toISOString() throws if the date is invalid.

---
publishDate: "not a date"
---

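Protection: Zod schema validation catches this at build time. If you add a runtime check as well, note that instanceof Date alone is not enough, because an invalid Date is still a Date instance. Test getTime() for NaN, and omit <lastmod> rather than reporting a made-up date:

${projects.map(project => {
  const date = project.data.publishDate;
  const isValid = date instanceof Date && !Number.isNaN(date.getTime());
  // Omit <lastmod> when the date is unusable instead of claiming "now"
  const lastmod = isValid ? `\n    <lastmod>${date.toISOString()}</lastmod>` : '';

  return `<url>
    <loc>${siteUrl}projects/${project.id}</loc>${lastmod}
  </url>`;
}).join('')}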

Trailing Slashes

Problem: Inconsistent trailing slashes confuse search engines.

https://yoursite.com/projects  // No slash
https://yoursite.com/projects/  // With slash

These are treated as different URLs by search engines.

Solution: Normalize in config:

// astro.config.mjs
export default defineConfig({
  site: 'https://jasontran.pages.dev',
  trailingSlash: 'never',  // or 'always' or 'ignore'
});

Then ensure sitemap matches:

const normalizeUrl = (url: string) => {
  // Remove trailing slash if trailingSlash: 'never'
  return url.replace(/\/$/, '');
};

<loc>${normalizeUrl(`${siteUrl}projects/${project.id}`)}</loc>
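Slash handling is easier to keep consistent when URL joining lives in one place. The sketch below is an assumption about how that could be done: `joinUrl` is a hypothetical helper, and its `trailingSlash` parameter only mirrors the 'always' and 'never' values of the Astro option. It keeps the slash on the bare site root in both modes.

```typescript
type TrailingSlash = "always" | "never";

// Hypothetical helper: joins a site URL and a path with exactly one slash
// between them, then applies the chosen trailing-slash policy.
function joinUrl(base: string, path: string, trailingSlash: TrailingSlash): string {
  const cleanBase = base.replace(/\/+$/, "");       // drop trailing slashes
  const cleanPath = path.replace(/^\/+|\/+$/g, ""); // drop leading and trailing slashes
  if (cleanPath === "") {
    return `${cleanBase}/`; // the site root keeps its slash
  }
  const joined = `${cleanBase}/${cleanPath}`;
  return trailingSlash === "always" ? `${joined}/` : joined;
}

console.log(joinUrl("https://example.com/", "projects/simple-router", "never"));
// https://example.com/projects/simple-router
console.log(joinUrl("https://example.com", "/projects/", "always"));
// https://example.com/projects/
console.log(joinUrl("https://example.com", "", "never"));
// https://example.com/
```

Because the helper tolerates a base with or without a trailing slash, it also sidesteps bugs from concatenating `siteUrl` and a path directly.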

Forgetting robots.txt

Problem: Sitemap exists, but search engines don’t know where to find it.

Solution: Create public/robots.txt:

User-agent: *
Allow: /

Sitemap: https://jasontran.pages.dev/sitemap.xml

This tells crawlers where the sitemap is located.
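If you prefer not to hard-code the domain, the same content can be derived from the configured site URL. The following is a sketch under assumptions: `renderRobotsTxt` is a hypothetical helper, and wiring it into an endpoint such as src/pages/robots.txt.ts is left to you.

```typescript
// Hypothetical helper: builds robots.txt content that points at the sitemap.
// new URL() resolves the path against the site URL, so the result is correct
// whether or not siteUrl ends with a slash.
function renderRobotsTxt(siteUrl: string, sitemapPath: string = "sitemap.xml"): string {
  const sitemapUrl = new URL(sitemapPath, siteUrl).href;
  return ["User-agent: *", "Allow: /", "", `Sitemap: ${sitemapUrl}`, ""].join("\n");
}

const robotsTxt = renderRobotsTxt("https://jasontran.pages.dev");
console.log(robotsTxt);
```

Deriving the URL this way keeps robots.txt and the sitemap in agreement when the site moves to a new domain.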

Not Submitting to Search Engines

Problem: Sitemap exists, but you never told Google about it.

Solution: Submit to search consoles:

  1. Google Search Console: https://search.google.com/search-console

    • Add property → Verify ownership → Sitemaps → Submit sitemap URL
  2. Bing Webmaster Tools: https://www.bing.com/webmasters

    • Similar process
  3. Automatic Discovery: Add to <head>:

    <link rel="sitemap" type="application/xml" href="/sitemap.xml" />

Conclusion

Skills Acquired

You’ve learned:

  1. API Routes: Creating non-HTML endpoints in Astro
  2. XML Generation: Programmatically building valid XML documents
  3. Content Queries: Fetching and filtering content collections
  4. SEO Optimization: Understanding search engine crawling behavior
  5. Build-Time Generation: Computing data once at build time for static serving

The Proficiency Marker: Most developers use sitemap plugins without understanding how they work. You now understand sitemaps as programmatically generated indexes that bridge the gap between dynamic content and search engine expectations. This mental model transfers to:

  • RSS feed generation
  • API documentation generation (OpenAPI/Swagger)
  • Static site generation patterns
  • Build-time optimization strategies

Extending the Sitemap

Adding Image Sitemaps (declare xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" on <urlset>):

${projects.map(project => `
  <url>
    <loc>${siteUrl}projects/${project.id}</loc>
    <lastmod>${project.data.publishDate.toISOString()}</lastmod>
    <image:image>
      <image:loc>${project.data.coverImage}</image:loc>
      <image:title>${escapeXml(project.data.title)}</image:title>
    </image:image>
  </url>`).join('')}

Adding Video Sitemaps (declare xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" on <urlset>):

${projects.filter(p => p.data.videoUrls?.length).map(project => `
  <url>
    <loc>${siteUrl}projects/${project.id}</loc>
    <video:video>
      <video:thumbnail_loc>${project.data.coverImage}</video:thumbnail_loc>
      <video:title>${escapeXml(project.data.title)}</video:title>
      <video:description>${escapeXml(project.data.description)}</video:description>
      <video:content_loc>${project.data.videoUrls[0]}</video:content_loc>
    </video:video>
  </url>`).join('')}

Adding News Sitemaps (declare xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" on <urlset>):

// For time-sensitive content
${recentPosts.map(post => `
  <url>
    <loc>${siteUrl}blog/${post.id}</loc>
    <news:news>
      <news:publication>
        <news:name>Your Site Name</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>${post.data.publishDate.toISOString()}</news:publication_date>
      <news:title>${escapeXml(post.data.title)}</news:title>
    </news:news>
  </url>`).join('')}

Next Challenge: Implement a sitemap index that splits a large sitemap into multiple files, following the Sitemap Protocol specification. This is required once a single sitemap would exceed 50,000 URLs or 50 MB uncompressed.
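As a starting point, the core of a sitemap index is chunking the URL list and emitting one <sitemap> entry per chunk. This is a minimal sketch, not a complete solution: the `sitemap-N.xml` file naming is an assumption, `chunk` and `renderSitemapIndex` are hypothetical helpers, and writing the individual chunk files is left out.

```typescript
// Per-file URL limit from the Sitemap Protocol.
const MAX_URLS_PER_SITEMAP = 50000;

// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Renders the index document. Assumes siteUrl ends with "/" and that the
// child sitemaps are served as sitemap-0.xml, sitemap-1.xml, and so on.
function renderSitemapIndex(siteUrl: string, fileCount: number): string {
  const entries: string[] = [];
  for (let i = 0; i < fileCount; i++) {
    entries.push(`  <sitemap>\n    <loc>${siteUrl}sitemap-${i}.xml</loc>\n  </sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
}

// 120,001 made-up URLs need three files: 50,000 + 50,000 + 20,001.
const manyUrls = Array.from({ length: 120001 }, (_, i) => `https://example.com/page-${i}`);
const urlChunks = chunk(manyUrls, MAX_URLS_PER_SITEMAP);
const indexXml = renderSitemapIndex("https://example.com/", urlChunks.length);
console.log(urlChunks.map((group) => group.length));
```

In Astro, each chunk could then be served by a dynamic endpoint, with the index at the URL you submit to search consoles.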
