Implementing Type-Safe Content Collections with Zod Validation

I deployed a portfolio update at midnight, went to bed, and woke up to a recruiter’s email asking why my featured project had “undefined” where the category should display. I’d mistyped a category name in the frontmatter — web-dev instead of web-development. The TypeScript build succeeded. The site deployed. The bug shipped.

That’s the fundamental problem with treating Markdown frontmatter as unstructured text. You validate your TypeScript functions with types, but frontmatter is just strings. One typo in one file breaks production, and you find out from a recruiter instead of from a build error.

Astro’s Content Collections API treats frontmatter as first-class data. Zod schemas validate it, TypeScript types are generated from those schemas, and references link entries across content types. This tutorial breaks down the src/content.config.ts file behind this portfolio, which defines six collections with cross-references between them.

Before You Begin

Required:

TypeScript basics (types, interfaces, generics)
Understanding of Markdown frontmatter
Familiarity with the concept of schemas and validation

Helpful:

Experience with form validation libraries (Yup, Joi)
Understanding of relational databases (foreign keys, joins)

Setup:

# The Content Layer API used here (loaders, src/content.config.ts) requires Astro 5.0+
npm install astro@latest

Zod ships with Astro’s content collections — no separate install needed.

How Validation Fits Into the Build Pipeline

graph TB
    A[Markdown Files] --> B[Content Loader]
    B --> C[Zod Schema Validation]
    C --> D{Valid?}
    D -->|No| E[Build Error]
    D -->|Yes| F[Type Generation]
    F --> G[TypeScript Types]
    G --> H[Astro Components]
    H --> I[Type-Safe Queries]
    I --> J[Rendered HTML]
    
    K[content.config.ts] --> C
    K --> F
    
    style C fill:#a855f7
    style F fill:#10b981
    style E fill:#ef4444

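Zod validation runs when Astro syncs your content, which happens during astro build and while the dev server runs. It does not run when a visitor requests a page. Schema violations stop the build before deployment. With categories constrained by reference() (covered below), the category I typo’d would have failed the build instead of shipping as “undefined.” Once validated, the data is cached, and statically built pages carry no validation cost at runtime.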
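The diagram's "Valid? / Build Error" branch can be sketched without Astro. The dependency-free TypeScript below is a hypothetical illustration, not Astro's implementation. The `Entry` shape, the `checkEntry` and `validateAll` functions, and the hard-coded category list are all assumptions made for the example.

```typescript
// Hypothetical shape of a parsed content entry: file id plus raw frontmatter.
interface Entry {
  id: string;
  data: Record<string, unknown>;
}

// Illustrative allowed categories (an assumption for this sketch).
const CATEGORIES = new Set(["web-development", "python-automation"]);

// Validate one entry, returning human-readable problems (empty if valid).
function checkEntry(entry: Entry): string[] {
  const problems: string[] = [];
  if (typeof entry.data.title !== "string") {
    problems.push(`${entry.id}: title must be a string`);
  }
  const cats = entry.data.categories;
  if (!Array.isArray(cats)) {
    problems.push(`${entry.id}: categories must be an array`);
  } else {
    for (const c of cats) {
      if (typeof c !== "string" || !CATEGORIES.has(c)) {
        problems.push(`${entry.id}: unknown category ${JSON.stringify(c)}`);
      }
    }
  }
  return problems;
}

// Check every entry before anything renders; throw if any failed,
// so the "build" stops instead of shipping broken data.
function validateAll(entries: Entry[]): Entry[] {
  const problems = entries.flatMap(checkEntry);
  if (problems.length > 0) {
    throw new Error(`Content validation failed:\n${problems.join("\n")}`);
  }
  return entries;
}
```

The sketch collects every problem before throwing. That way, one run lists all broken files instead of stopping at the first.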

Defining a Basic Collection Schema

Here’s what the blog collection definition looks like:

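import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blogCollection = defineCollection({
  // Load every .mdx file under src/content/blog
  loader: glob({ 
    pattern: "**/*.mdx",
    base: "./src/content/blog"
  }),
  
  // Function form: Astro passes in helpers such as image()
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      description: z.string(),
      publishDate: z.date(),
      coverImage: z.string(),       // a path or URL string; image() would validate a local file instead
      categories: z.array(z.string()),
      tags: z.array(z.string())
    })
});

// Collections are registered by exporting them from this file
export const collections = { blog: blogCollection };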

One thing that trips people up: schema can be a plain z.object(), but to use helpers like image(), it has to be a function that receives them, as above. Calling image() without the function wrapper is a common mistake, and the error it produces is confusing.
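One plausible reading of this design is that helpers like image() need information Astro only has while loading entries, so the schema must be something Astro can call. The sketch below shows how a config option can accept either a ready-made validator or a factory that receives helpers. It is hypothetical: `Validator`, `SchemaContext`, and `resolveSchema` are invented names, and the toy `image()` check is far simpler than Astro's real helper.

```typescript
// Hypothetical validator: an object with a parse method, like a Zod schema.
interface Validator<T> {
  parse(input: unknown): T;
}

// Hypothetical helpers handed to a schema factory at load time.
interface SchemaContext {
  image(): Validator<string>;
}

// Either a validator object, or a function that receives helpers and returns one.
type SchemaOption<T> = Validator<T> | ((ctx: SchemaContext) => Validator<T>);

const ctx: SchemaContext = {
  image: () => ({
    parse(input) {
      // Toy check only: accept relative paths ending in a common image extension.
      if (typeof input !== "string" || !/^\.{1,2}\/.+\.(png|jpe?g|webp|avif|gif|svg)$/i.test(input)) {
        throw new Error(`not a local image path: ${String(input)}`);
      }
      return input;
    },
  }),
};

// Call factories with the helpers; use plain validators as-is.
function resolveSchema<T>(option: SchemaOption<T>): Validator<T> {
  return typeof option === "function" ? option(ctx) : option;
}
```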

Optional Fields, Defaults, and Why They’re Different

schema: ({ image }) =>
  z.object({
    title: z.string(),           // Required — build fails if missing
    
    featured: z.boolean().optional(),         // Can be omitted; value is undefined
    draft: z.boolean().default(false),        // Can be omitted; value defaults to false
    tags: z.array(z.string()).optional().default([])  // Redundant — default makes it non-optional
  })

.optional() means the field can be undefined. .default(value) means the field is always present, using the default when omitted. .optional().default(value) is redundant — once you add a default, the field is never undefined. I’ve shipped that redundancy myself.
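To pin down the difference, here is a dependency-free model of the two behaviors. It is not Zod's implementation. `Validator`, `bool`, `optional`, and `withDefault` are hypothetical names chosen for the sketch.

```typescript
interface Validator<T> {
  parse(input: unknown): T;
}

const bool: Validator<boolean> = {
  parse(input) {
    if (typeof input !== "boolean") throw new Error("expected boolean");
    return input;
  },
};

// Like .optional(): a missing value passes through as undefined.
function optional<T>(v: Validator<T>): Validator<T | undefined> {
  return { parse: (input) => (input === undefined ? undefined : v.parse(input)) };
}

// Like .default(value): a missing value is replaced, so the result is never undefined.
function withDefault<T>(v: Validator<T | undefined>, fallback: T): Validator<T> {
  return {
    parse(input) {
      const out = v.parse(input === undefined ? fallback : input);
      return out === undefined ? fallback : out; // narrows away undefined for the type checker
    },
  };
}

const featured = optional(bool);                  // boolean | undefined
const draft = withDefault(bool, false);           // boolean
const both = withDefault(optional(bool), false);  // boolean: the .optional() adds nothing
```

`both` behaves exactly like `draft`, which is the redundancy described above.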

The projects collection uses more of Zod’s validators:

// reference() is imported from 'astro:content' alongside defineCollection and z
const projectsCollection = defineCollection({
  loader: glob({ pattern: "**/*.mdx", base: "./src/content/projects" }),
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      description: z.string(),
      publishDate: z.date(),
      coverImage: z.string(),
      categories: z.array(reference('category')),
      tags: z.array(z.string()),
      githubUrl: z.string().url(),       // Validates URL format including protocol
      demoUrl: z.string().url().optional(),
      demoType: z.enum(['live', 'video', 'none']).default('none'),
      videoUrls: z.array(z.string().url()).optional(),
      featured: z.boolean().optional(),
      draft: z.boolean().default(false)
    })
});

z.string().url() rejects strings that don’t parse as absolute URLs, such as a link pasted without https://. It checks format only, not whether the page exists. Even so, a GitHub link missing its protocol becomes a build error instead of a production bug.
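The sketch below shows what that kind of check does and doesn't catch, using the WHATWG `URL` constructor. As far as I know, Zod 3's `z.string().url()` (the Zod line Astro 5 ships with) also relies on `new URL()`. Treat that as an assumption, since Zod 4 changed its URL validation. `isAbsoluteUrl` and `isWebUrl` are hypothetical helper names.

```typescript
// True if the string parses as an absolute URL with the WHATWG URL parser.
function isAbsoluteUrl(value: string): boolean {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Stricter variant for links meant to open in a browser: require http(s).
function isWebUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "https:" || protocol === "http:";
  } catch {
    return false;
  }
}
```

A bare URL parse accepts schemes like `mailto:`, so `isWebUrl` is stricter than a plain URL check. In a Zod schema, a rule like that could be layered on with `.refine()`.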

Relational References: Foreign Keys in Markdown

The most useful feature is reference(), which creates typed foreign key relationships between content collections.

Without references, you’d store categories as strings:

categories: z.array(z.string())  // ['web-development', 'python']

If you typo a category name, nothing complains. The entry just points at a category that doesn’t exist, and you find out when a page renders it. If you rename a category, you have to hunt through every file that references it.

With reference(), Astro validates that the referenced content actually exists:

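import { defineCollection, z, reference } from 'astro:content';
import { glob } from 'astro/loaders';

const categoryCollection = defineCollection({
  loader: glob({ pattern: "**/*.md", base: "./src/content/category" }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    slug: z.string().optional()
  })
});

// Trimmed to the fields relevant to references
const projectsCollection = defineCollection({
  loader: glob({ pattern: "**/*.mdx", base: "./src/content/projects" }),
  schema: z.object({
    title: z.string(),
    // Each item must be the id of an entry in the 'category' collection
    categories: z.array(reference('category')),
  })
});

// The keys here are the collection names that reference() refers to
export const collections = { category: categoryCollection, projects: projectsCollection };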

In your Markdown:

---
title: "My Web App"
categories: ['web-development', 'python-automation']
---

Astro validates that src/content/category/web-development.md and src/content/category/python-automation.md both exist. Typo 'web-developmnet' and the build fails with a clear error rather than silently showing a broken category.
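Conceptually, the existence check is a set lookup. Collect the ids of every category entry, then flag any reference that isn't in the set. The sketch below is dependency-free and hypothetical (`findBrokenReferences` is not an Astro API). It assumes category ids match the file names, as in the web-development.md example above.

```typescript
// A project entry reduced to what the check needs (hypothetical shape).
interface ProjectRefs {
  id: string;
  categories: string[];
}

// Report every category id that has no matching category entry.
function findBrokenReferences(entries: ProjectRefs[], existingIds: Set<string>): string[] {
  const errors: string[] = [];
  for (const entry of entries) {
    for (const cat of entry.categories) {
      if (!existingIds.has(cat)) {
        errors.push(`${entry.id}: category "${cat}" does not exist`);
      }
    }
  }
  return errors;
}
```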

Projects and tutorials cross-reference each other — projects reference their related tutorials, tutorials reference their related projects:

// Loaders and other fields omitted for brevity
const projectsCollection = defineCollection({
  schema: z.object({
    relatedTutorials: z.array(reference('tutorials')).optional()
  })
});

const tutorialsCollection = defineCollection({
  schema: z.object({
    relatedProjects: z.array(reference('projects')).optional()
  })
});

Astro stores IDs, not full objects, so circular references don’t create infinite loops. References are lazy — you explicitly resolve them when you need the referenced content:
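The following small model shows why ID-based references make cycles harmless. Each entry holds `{ collection, id }` pairs, and resolution is an explicit lookup one level deep. The in-memory `store`, the `Project`/`Tutorial` interfaces, and `resolveTutorials` are stand-ins invented for this sketch, not Astro internals. The "routing-basics" tutorial id is made up.

```typescript
// A reference is just a pointer: collection name plus entry id.
interface Ref<C extends string> {
  collection: C;
  id: string;
}

interface Project { title: string; relatedTutorials: Ref<"tutorials">[] }
interface Tutorial { title: string; relatedProjects: Ref<"projects">[] }

// In-memory stand-in for the content store; the two entries point at each other.
const store = {
  projects: new Map<string, Project>([
    ["simple-router", { title: "Simple Router", relatedTutorials: [{ collection: "tutorials", id: "routing-basics" }] }],
  ]),
  tutorials: new Map<string, Tutorial>([
    ["routing-basics", { title: "Routing Basics", relatedProjects: [{ collection: "projects", id: "simple-router" }] }],
  ]),
};

// Resolve one level on demand. Entries hold ids rather than nested objects,
// so the project -> tutorial -> project cycle never recurses by itself.
function resolveTutorials(project: Project): Tutorial[] {
  return project.relatedTutorials
    .map((ref) => store.tutorials.get(ref.id))
    .filter((t): t is Tutorial => t !== undefined);
}
```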

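import { getEntry } from 'astro:content';

const project = await getEntry('projects', 'simple-router');
// relatedTutorials is optional (and getEntry can return undefined), so default to []
const tutorials = await Promise.all(
  (project?.data.relatedTutorials ?? []).map(ref => getEntry('tutorials', ref.id))
);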

How Astro Generates Types From Schemas

When you run npm run dev or npm run build (or astro sync on its own), Astro generates type definitions in the .astro/ directory. After defining your schema, TypeScript knows the exact shape of every field in every collection:

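import { getEntry, type CollectionEntry } from 'astro:content';

// getEntry returns undefined when no entry has that id, so narrow first
const entry = await getEntry('projects', 'simple-router');
if (!entry) throw new Error('Project "simple-router" not found');
const project: CollectionEntry<'projects'> = entry;

// TypeScript knows all of these:
project.data.title        // string
project.data.githubUrl    // string
project.data.featured     // boolean | undefined
project.data.categories   // { collection: 'category'; id: string }[]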

No any, no manual type definitions to maintain — the types derive directly from the Zod schema. Change the schema and the types update automatically on the next build.
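The mechanism behind "types derive from the schema" is TypeScript inference over the schema value itself, the same idea as Zod's `z.infer`. Here is a toy, dependency-free version of that idea. `Schema`, `Infer`, `str`, `bool`, `optional`, and `object` are hypothetical names, and real Zod handles far more cases.

```typescript
interface Schema<T> {
  parse(input: unknown): T;
}

// Pull the output type back out of a schema value's type.
type Infer<S> = S extends Schema<infer T> ? T : never;

const str: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("expected string");
    return input;
  },
};

const bool: Schema<boolean> = {
  parse(input) {
    if (typeof input !== "boolean") throw new Error("expected boolean");
    return input;
  },
};

function optional<T>(s: Schema<T>): Schema<T | undefined> {
  return { parse: (input) => (input === undefined ? undefined : s.parse(input)) };
}

// The output type is computed from the shape you pass in.
function object<Shape extends Record<string, Schema<unknown>>>(shape: Shape) {
  type Output = { [K in keyof Shape]: Infer<Shape[K]> };
  const schema: Schema<Output> = {
    parse(input) {
      if (typeof input !== "object" || input === null) throw new Error("expected an object");
      const record = input as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const [key, field] of Object.entries(shape)) {
        out[key] = field.parse(record[key]);
      }
      return out as unknown as Output; // the loop built exactly these keys
    },
  };
  return schema;
}

const projectSchema = object({ title: str, githubUrl: str, featured: optional(bool) });
// { title: string; githubUrl: string; featured: boolean | undefined }
type Project = Infer<typeof projectSchema>;
```

One simplification: real Zod marks a key like `featured` as optional (`featured?: boolean | undefined`), while this sketch only widens its value type.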

Things That Will Go Wrong

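Date format strictness. z.date() only accepts a Date object, and the frontmatter parser only produces one for unquoted ISO 8601 dates. Anything else arrives as a string and fails validation:

# Works (parsed as a Date):
publishDate: 2024-01-15
publishDate: 2024-01-15T10:30:00Z

# Fails (left as a string):
publishDate: "January 15, 2024"
publishDate: "2024-01-15"
publishDate: 2024-1-15

Use z.coerce.date() if you need more flexible parsing. It runs the value through JavaScript’s Date constructor before validating, so quoted ISO strings work too.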
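Here is a rough model of why quoting and zero-padding matter. The regex approximates the YAML 1.1 date-only timestamp rule and does not model time-of-day variants. `yamlScalar`, `strictDate`, and `coercedDate` are hypothetical stand-ins that imitate the parser, `z.date()`, and `z.coerce.date()`. They are not Astro or Zod code.

```typescript
// Approximation of YAML's date-only rule: 4-digit year, 2-digit month and day.
const YAML_DATE_ONLY = /^\d{4}-\d{2}-\d{2}$/;

// What the parser would hand to the schema for an *unquoted* scalar.
function yamlScalar(raw: string): Date | string {
  return YAML_DATE_ONLY.test(raw) ? new Date(`${raw}T00:00:00Z`) : raw;
}

// Like z.date(): accepts only valid Date objects.
function strictDate(value: unknown): Date {
  if (!(value instanceof Date) || Number.isNaN(value.getTime())) {
    throw new Error(`expected a date, got ${JSON.stringify(value)}`);
  }
  return value;
}

// Like z.coerce.date(): pass the value through the Date constructor first.
function coercedDate(value: unknown): Date {
  return strictDate(new Date(value instanceof Date ? value.getTime() : (value as string | number)));
}
```

Coercion of non-ISO strings such as "January 15, 2024" depends on the JavaScript engine's Date parsing, so sticking to ISO 8601 is still the safer habit.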

Adding a required field breaks all existing content. If you add author: z.string() to the blog schema, every existing post without an author field fails validation. Solutions: make it optional, provide a default, or run a migration script before changing the schema.
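For the migration route, here is a minimal sketch of the text transform, assuming flat YAML frontmatter delimited by `---` lines. `addMissingField` is a hypothetical helper, and "Jane Doe" is placeholder data. Reading and writing the files (for example with `node:fs`) is left out.

```typescript
// Add `key: "value"` to a file's frontmatter if that top-level key is missing.
// Line-based check: fine for flat frontmatter, but not a real YAML parser.
function addMissingField(source: string, key: string, value: string): string {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!match) return source; // no frontmatter block: leave the file alone
  const frontmatter = match[1];
  const hasKey = frontmatter.split(/\r?\n/).some((line) => line.startsWith(`${key}:`));
  if (hasKey) return source;
  const start = match[0].indexOf("\n") + 1; // first character after the opening ---
  const end = start + frontmatter.length;   // just before the closing delimiter
  // JSON string syntax doubles as a valid YAML double-quoted scalar here.
  return source.slice(0, end) + `\n${key}: ${JSON.stringify(value)}` + source.slice(end);
}
```

Running a transform like this before tightening the schema means every post already has the new required field when validation runs.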

Reference validation is shallow. reference() only checks that the ID exists, not what the referenced entry contains. A category with an empty description still passes, both as a reference and under description: z.string(). If that matters, tighten the referenced collection’s own schema, for example with z.string().min(1).

The image() helper only handles local files. It resolves image paths relative to the content file and imports them through Astro’s asset pipeline. If you’re using Cloudinary URLs or any other external image service, use z.string() (or z.string().url()) instead.

Querying in Components

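// Inside a page's frontmatter, e.g. a dynamic [slug].astro route
import { getCollection, getEntry } from 'astro:content';

// Single entry: getEntry returns undefined if no entry has this id
const project = await getEntry('projects', Astro.params.slug ?? '');
if (!project) return Astro.redirect('/404');
// From here on, project is fully typed: TypeScript knows all fields

// Filtered collection: drop drafts
const projects = await getCollection('projects', ({ data }) => {
  return data.draft !== true;
});

// Newest first
projects.sort((a, b) => +b.data.publishDate - +a.data.publishDate);

// Resolving references (relatedTutorials is optional, so default to [])
const tutorials = await Promise.all(
  (project.data.relatedTutorials ?? []).map(ref => 
    getEntry('tutorials', ref.id)
  )
);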

Because validation happens at build time, you can trust the data by the time your component code runs. You don’t need defensive checks for missing required fields, you won’t get runtime errors from malformed dates, and you won’t ship malformed URLs. The schema caught those at build time.
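The schema guarantees that `publishDate` is a Date and that `draft` is a boolean (thanks to its default). As a result, the filter-and-sort from the query code above needs no guards. Here is the same logic on plain data as a standalone sketch. `ProjectData`, `Entry`, and `publishedNewestFirst` are hypothetical names.

```typescript
// Shape the schema guarantees after validation (trimmed to what's used here).
interface ProjectData {
  title: string;
  publishDate: Date;
  draft: boolean;
}

interface Entry {
  id: string;
  data: ProjectData;
}

// Drop drafts, then sort newest first. No undefined checks are needed,
// because validation already guaranteed both fields.
function publishedNewestFirst(entries: Entry[]): Entry[] {
  return entries
    .filter(({ data }) => !data.draft)
    .sort((a, b) => b.data.publishDate.getTime() - a.data.publishDate.getTime());
}
```

`filter` returns a new array, so unlike the in-place `projects.sort(...)` above, this version leaves its input untouched.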

That’s the payoff of treating content as data rather than documentation.