Implementing Polymorphic Tagging with AI-Powered Content Classification

Learn how to build a scalable polymorphic tagging system using Laravel's Eloquent ORM, integrated with Google Gemini AI for automatic content classification. This tutorial covers database design, AI integration, asynchronous processing, and fuzzy matching techniques.

Published

Sat Sep 27 2025

Technologies Used

LLM Laravel API Integration Redis PHP PostgreSQL SQL
Intermediate 63 minutes

Purpose

The Problem

You’re building a content platform where users post about K-pop. A single post might mention:

  • Multiple groups (BLACKPINK, BTS, NewJeans)
  • Specific idols (Jennie, Jungkook, Hanni)
  • Songs (“Pink Venom”, “Dynamite”, “Attention”)
  • Variety shows (Running Man, Knowing Bros)

The challenges:

  1. Manual tagging is tedious: Users won’t tag everything correctly
  2. Free-text tags are messy: “blackpink” vs “BLACKPINK” vs “Black Pink”
  3. Search becomes impossible: How do you find all posts about a specific group?
  4. Relationships are complex: Tags can reference different entity types
  5. External content needs automation: Reddit posts need automatic tagging

The naive approach? Create a separate pivot table for each tag type (post_group_tags, post_idol_tags, post_song_tags). This leads to:

  • Code duplication: Same logic repeated 4+ times
  • Maintenance nightmare: Adding a new taggable type requires new migrations, models, and relationships
  • Query complexity: Finding “all content related to BLACKPINK” requires UNION queries across multiple tables

The Solution

We’re analyzing MyKpopLists’ polymorphic tagging system combined with Google Gemini AI for automatic content classification. This architecture:

  • Uses a single post_tags table for all tag types
  • Leverages Laravel’s polymorphic relationships
  • Integrates AI to automatically tag Reddit posts
  • Processes content asynchronously using job queues

What You’ll Learn

This tutorial covers:

  • Polymorphic many-to-many relationships in Laravel
  • External API integration (Reddit, Google Gemini)
  • Asynchronous job processing with Laravel Queues
  • Rate limiting and retry strategies
  • Fuzzy matching algorithms for entity resolution

Prerequisites & Tooling

Knowledge Base

  • Intermediate Laravel (models, relationships, migrations)
  • Understanding of database foreign keys and indexes
  • Basic knowledge of HTTP APIs and JSON
  • Familiarity with asynchronous processing concepts

Environment

  • PHP: 8.2+
  • Laravel: 12.x
  • Queue Driver: Database (development) or Redis (production)
  • External APIs: Reddit JSON API, Google Gemini API

Required API Keys

# .env configuration
REDDIT_USERNAME=your-reddit-username
GEMINI_API_KEY=your-gemini-api-key
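
The services in this tutorial read these values through config('services.reddit.username') and config('services.gemini.api_key'), so the .env entries need a matching mapping. A minimal sketch of that mapping, assuming the standard config/services.php layout (the file itself is not shown in the original project excerpt), might look like this:

```php
<?php
// config/services.php (only the entries this tutorial relies on; assumed layout)

return [
    'reddit' => [
        'username' => env('REDDIT_USERNAME'),
    ],
    'gemini' => [
        'api_key' => env('GEMINI_API_KEY'),
    ],
];
```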

High-Level Architecture

The Complete Data Flow

graph TB
    A[Reddit API] -->|Fetch Posts| B[RedditService]
    B -->|Queue Job| C[ProcessSingleRedditPost]
    C -->|Extract Content| D[Post Data]
    D -->|Send to AI| E[GeminiTaggingService]
    E -->|Analyze Text| F[Google Gemini API]
    F -->|Return Tags| G[Tag Suggestions]
    G -->|Fuzzy Match| H[Entity Resolution]
    H -->|Create Records| I[Post Model]
    I -->|Polymorphic Relations| J[PostTag Pivot]
    J -->|Links to| K[Groups/Idols/Songs]
    
    style E fill:#f9f,stroke:#333,stroke-width:2px
    style J fill:#bbf,stroke:#333,stroke-width:2px

The Polymorphic Relationship Structure

erDiagram
    POSTS ||--o{ POST_TAGS : has
    POST_TAGS }o--|| GROUPS : references
    POST_TAGS }o--|| IDOLS : references
    POST_TAGS }o--|| SONGS : references
    POST_TAGS }o--|| VARIETY_SHOWS : references
    
    POSTS {
        int id
        string title
        text content
        string reddit_id
    }
    
    POST_TAGS {
        int post_id
        string taggable_type
        int taggable_id
    }
    
    GROUPS {
        int id
        string name
    }
    
    IDOLS {
        int id
        string stage_name
    }

Understanding Polymorphic Tagging

Think of polymorphic tagging like a universal sticky note system:

  • You have different types of items: books, movies, music albums, games
  • Instead of having separate sticky note pads for each type, you have ONE pad
  • Each sticky note has: “Item Type” + “Item ID” + “Note Content”
  • A single note can reference any item type

Similarly, post_tags is one table that can reference Groups, Idols, Songs, or Variety Shows using taggable_type (the item type) and taggable_id (which specific item).
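
To make the two-column idea concrete before any Laravel code appears, here is a framework-free sketch in plain PHP. The arrays stand in for database tables, and the names $tables, $postTags, and resolveTags() are hypothetical helpers for illustration only, not part of the MyKpopLists codebase.

```php
<?php
// In-memory stand-ins for the groups and idols tables (sample data, not real rows).
$tables = [
    'Group' => [1 => 'BLACKPINK', 2 => 'BTS'],
    'Idol'  => [1 => 'Jennie', 2 => 'Jungkook'],
];

// Rows of a post_tags-style table: one shape for every tag type.
$postTags = [
    ['post_id' => 42, 'taggable_type' => 'Group', 'taggable_id' => 1],
    ['post_id' => 42, 'taggable_type' => 'Idol',  'taggable_id' => 1],
    ['post_id' => 43, 'taggable_type' => 'Group', 'taggable_id' => 2],
];

/**
 * Resolve every tag of one post to "Type: name" strings by using
 * taggable_type to pick the table and taggable_id to pick the row.
 */
function resolveTags(int $postId, array $postTags, array $tables): array
{
    $resolved = [];
    foreach ($postTags as $tag) {
        if ($tag['post_id'] !== $postId) {
            continue;
        }
        $name = $tables[$tag['taggable_type']][$tag['taggable_id']] ?? null;
        if ($name !== null) {
            $resolved[] = $tag['taggable_type'] . ': ' . $name;
        }
    }
    return $resolved;
}

print_r(resolveTags(42, $postTags, $tables));
```

Post 42 carries two tags that share the ID 1, yet they resolve to different entities because the type column selects the table first.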

The Implementation

Understanding the Naive Approach

Most developers start with separate tables for each tag type:

// ❌ NAIVE APPROACH - Separate tables
Schema::create('post_group_tags', function (Blueprint $table) {
    $table->foreignId('post_id')->constrained();
    $table->foreignId('group_id')->constrained();
});

Schema::create('post_idol_tags', function (Blueprint $table) {
    $table->foreignId('post_id')->constrained();
    $table->foreignId('idol_id')->constrained();
});

Schema::create('post_song_tags', function (Blueprint $table) {
    $table->foreignId('post_id')->constrained();
    $table->foreignId('song_id')->constrained();
});

// This approach requires:
// - 4+ separate migrations
// - 4+ separate relationships in Post model
// - 4+ separate queries to find all tags
// - Duplicate code everywhere

Problems:

  • Adding a new taggable type (e.g., Albums) requires new migration, model changes, and query updates
  • Finding “all content tagged with X” requires complex UNION queries
  • Code duplication across similar functionality

The Polymorphic Migration

Laravel’s polymorphic relationships solve this with a single table:

<?php
// database/migrations/xxxx_create_post_tags_table.php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('post_tags', function (Blueprint $table) {
            $table->id();
            
            // The post being tagged
            $table->foreignId('post_id')
                  ->constrained()
                  ->onDelete('cascade');  // Delete tags when post is deleted
            
            // Polymorphic columns - the magic happens here
            $table->string('taggable_type');  // e.g., "App\Models\Group"
            $table->unsignedBigInteger('taggable_id');  // e.g., 42
            
            $table->timestamps();
            
            // Composite index for fast lookups
            $table->index(['taggable_type', 'taggable_id']);
            
            // Prevent duplicate tags
            $table->unique(['post_id', 'taggable_type', 'taggable_id']);
        });
    }
};

Key insights:

  • taggable_type: Stores the full class name (e.g., “App\Models\Group”)
  • taggable_id: Stores the ID of that specific entity
  • Together, they can reference ANY model in your application
  • The unique constraint prevents tagging the same entity twice

Setting Up the Post Model Relationships

Now we define the relationships in the Post model:

<?php
// app/Models/Post.php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    // Method 1: Access all tags (any type)
    public function tags()
    {
        return $this->hasMany(PostTag::class);
    }
    
    // Method 2: Access specific tag types using morphedByMany
    // This is Laravel's magic for polymorphic many-to-many
    
    public function taggedGroups()
    {
        return $this->morphedByMany(
            Group::class,           // The related model
            'taggable',             // The name in the pivot table (taggable_type, taggable_id)
            'post_tags'             // The pivot table name
        );
    }
    
    public function taggedIdols()
    {
        return $this->morphedByMany(Idol::class, 'taggable', 'post_tags');
    }
    
    public function taggedSongs()
    {
        return $this->morphedByMany(Song::class, 'taggable', 'post_tags');
    }
    
    public function taggedVarietyShows()
    {
        return $this->morphedByMany(VarietyShow::class, 'taggable', 'post_tags');
    }
    
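    // Convenience accessor to get all tagged entities.
    // concat() on a base collection is used instead of Eloquent's merge(),
    // which keys models by primary key and would drop a Group and an Idol sharing an ID.
    public function getTaggedEntitiesAttribute()
    {
        return collect()
            ->concat($this->taggedGroups)
            ->concat($this->taggedIdols)
            ->concat($this->taggedSongs)
            ->concat($this->taggedVarietyShows);
    }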
}

Usage examples:

$post = Post::find(1);

// Get all groups tagged in this post
$groups = $post->taggedGroups;  // Collection of Group models

// Get all tags (any type)
$allTags = $post->tagged_entities;  // Merged collection

// Check if a specific group is tagged
// (qualify the column: post_tags also has an id, so a bare 'id' is ambiguous in the join)
$isTagged = $post->taggedGroups()->where('groups.id', 5)->exists();

The Reverse Relationship

The taggable models also need relationships:

<?php
// app/Models/Group.php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Group extends Model
{
    // Get all posts that tagged this group
    public function posts()
    {
        return $this->morphToMany(
            Post::class,      // The related model
            'taggable',       // The morph name
            'post_tags'       // The pivot table
        );
    }
}

// Now you can do:
$group = Group::find(1);
$posts = $group->posts;  // All posts tagged with this group

Fetching Reddit Posts

The RedditService handles API communication:

<?php
// app/Services/RedditService.php (simplified)

namespace App\Services;

use Illuminate\Support\Facades\Http;
use Carbon\Carbon;

class RedditService
{
    private string $baseUrl = 'https://www.reddit.com';
    private string $userAgent;

    public function __construct()
    {
        // Reddit requires a unique User-Agent
        $this->userAgent = config('app.name') . '/1.0 (by /u/' . 
                          config('services.reddit.username') . ')';
    }

    public function fetchDailyPosts(): array
    {
        // Fetch from Reddit's JSON API (listings return at most 100 items per request)
        $response = Http::withHeaders([
            'User-Agent' => $this->userAgent,
        ])->get($this->baseUrl . '/r/kpop/new.json?limit=100');

        if (!$response->successful()) {
            return [];
        }

        $data = $response->json();
        $posts = $data['data']['children'] ?? [];

        // Filter to last 24 hours
        $twentyFourHoursAgo = time() - (24 * 60 * 60);
        
        $recentPosts = array_filter($posts, function ($post) use ($twentyFourHoursAgo) {
            return $post['data']['created_utc'] > $twentyFourHoursAgo;
        });

        // Format for our system
        return array_map(function ($post) {
            return $this->formatRedditPost($post['data']);
        }, $recentPosts);
    }

    private function formatRedditPost(array $postData): array
    {
        return [
            'reddit_id' => $postData['id'],
            'title' => $postData['title'],
            'content' => $postData['selftext'] ?? '',
            'url' => $postData['url'] ?? null,
            'author' => $postData['author'],
            'score' => $postData['score'],
            'created_utc' => Carbon::createFromTimestamp($postData['created_utc']),
            'permalink' => 'https://www.reddit.com' . $postData['permalink'],
            'flair_text' => $postData['link_flair_text'] ?? null,
        ];
    }
}

Key points:

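  • Reddit’s public listings can be read as JSON without OAuth, though unauthenticated clients get stricter rate limits
  • A unique, descriptive User-Agent is expected (Reddit throttles or blocks generic agents)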
  • We filter by timestamp to get recent posts
  • Data is normalized to our application’s format

AI-Powered Tagging with Google Gemini

This is where the magic happens - using AI to analyze content:

<?php
// app/Services/GeminiTaggingService.php (core logic)

namespace App\Services;

use Illuminate\Support\Facades\Http;
use App\Models\Group;
use App\Models\Idol;
use App\Models\Song;

class GeminiTaggingService
{
    private string $apiKey;
    // Model names change over time; confirm this one is still offered before deploying
    private string $apiUrl = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent';

    public function __construct()
    {
        $this->apiKey = config('services.gemini.api_key');
    }

    public function analyzePost(array $postData): array
    {
        // Step 1: Build the prompt with context
        $prompt = $this->buildPrompt($postData);
        
        // Step 2: Send to Gemini API
        $response = Http::withHeaders([
            'Content-Type' => 'application/json',
        ])->post($this->apiUrl . '?key=' . $this->apiKey, [
            'contents' => [
                [
                    'parts' => [
                        ['text' => $prompt]
                    ]
                ]
            ],
            'generationConfig' => [
                'temperature' => 0.2,  // Low temperature for consistent results
                'topK' => 40,
                'topP' => 0.95,
            ]
        ]);

        // Throw on HTTP errors (4xx/5xx) so the queued job's retry logic can kick in;
        // returning empty tags here would store the post untagged and never retry
        $response->throw();

        // Step 3: Parse AI response
        $aiResponse = $response->json();
        $text = $aiResponse['candidates'][0]['content']['parts'][0]['text'] ?? '';
        
        // Step 4: Extract structured data
        return $this->parseAIResponse($text);
    }

    private function buildPrompt(array $postData): string
    {
        // Get all available entities from database
        $groups = Group::pluck('name')->toArray();
        $idols = Idol::pluck('stage_name')->toArray();
        $songs = Song::pluck('title')->toArray();

        return <<<PROMPT
You are a K-pop content analyzer. Analyze the following post and identify relevant entities.

POST TITLE: {$postData['title']}
POST CONTENT: {$postData['content']}
POST FLAIR: {$postData['flair_text']}

AVAILABLE GROUPS: {$this->formatList($groups)}
AVAILABLE IDOLS: {$this->formatList($idols)}
AVAILABLE SONGS: {$this->formatList($songs)}

Return ONLY a JSON object with this structure:
{
  "groups": ["group1", "group2"],
  "idols": ["idol1", "idol2"],
  "songs": ["song1", "song2"]
}

Only include entities you are CONFIDENT are mentioned. Return empty arrays if unsure.
PROMPT;
    }

    private function parseAIResponse(string $text): array
    {
        // Extract JSON from AI response (may have markdown formatting)
        $jsonStart = strpos($text, '{');
        $jsonEnd = strrpos($text, '}');
        
        if ($jsonStart === false || $jsonEnd === false) {
            return ['groups' => [], 'idols' => [], 'songs' => []];
        }
        
        $jsonString = substr($text, $jsonStart, $jsonEnd - $jsonStart + 1);
        $data = json_decode($jsonString, true);
        
        return $data ?? ['groups' => [], 'idols' => [], 'songs' => []];
    }

    private function formatList(array $items): string
    {
        return implode(', ', array_slice($items, 0, 100));  // Limit to avoid token limits
    }
}

Critical design decisions:

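  • Low temperature (0.2): Makes AI responses more consistent from run to run
  • Provide entity lists: Steers the AI toward known entities, which reduces (but does not eliminate) hallucination
  • JSON output: Structured data is easier to parse than natural language
  • Confidence instruction: The prompt asks the AI to tag only what it is confident about; the real safeguard is validating every returned name against the database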
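
Because the model's reply is untrusted input, it helps to normalize the decoded JSON before anything else touches it. The following is a minimal sketch in plain PHP, assuming the three-key structure requested in the prompt; normalizeTags() is a hypothetical helper, not a method of the service shown here.

```php
<?php
/**
 * Reduce a decoded AI response to the three expected keys, each holding
 * a list of unique, trimmed, non-empty strings. Anything else is discarded.
 */
function normalizeTags(mixed $decoded): array
{
    $clean = ['groups' => [], 'idols' => [], 'songs' => []];
    if (!is_array($decoded)) {
        return $clean;
    }
    foreach (array_keys($clean) as $key) {
        $values = $decoded[$key] ?? [];
        if (!is_array($values)) {
            continue; // e.g. the model returned a bare string instead of a list
        }
        foreach ($values as $value) {
            if (is_string($value) && trim($value) !== '') {
                $clean[$key][] = trim($value);
            }
        }
        $clean[$key] = array_values(array_unique($clean[$key]));
    }
    return $clean;
}

$raw = '{"groups": ["BLACKPINK", " BLACKPINK ", 7], "idols": "Jennie", "albums": ["x"]}';
print_r(normalizeTags(json_decode($raw, true)));
```

Here the duplicate and the non-string entry are dropped from groups, the malformed idols value becomes an empty list, and the unexpected albums key is ignored.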

Entity Resolution with Fuzzy Matching

AI returns names, but we need database IDs. This requires fuzzy matching:

<?php
// app/Services/RedditPostProcessingService.php (entity resolution)

namespace App\Services;

use App\Models\Group;
use App\Models\Idol;
use App\Models\Song;

class RedditPostProcessingService
{
    public function resolveEntities(array $aiTags): array
    {
        return [
            'groups' => $this->resolveGroups($aiTags['groups'] ?? []),
            'idols' => $this->resolveIdols($aiTags['idols'] ?? []),
            'songs' => $this->resolveSongs($aiTags['songs'] ?? []),
        ];
    }

    private function resolveGroups(array $groupNames): array
    {
        $resolved = [];
        
        foreach ($groupNames as $name) {
            // Try exact match first
            $group = Group::where('name', $name)->first();
            
            if (!$group) {
                // Try case-insensitive match
                $group = Group::whereRaw('LOWER(name) = ?', [strtolower($name)])->first();
            }
            
            if (!$group) {
                // Try fuzzy match (handles typos and small spelling variations;
                // abbreviations such as "BP" score far below the threshold)
                $group = $this->fuzzyMatchGroup($name);
            }
            
            if ($group) {
                $resolved[] = $group->id;
            }
        }
        
        return array_unique($resolved);
    }

    private function fuzzyMatchGroup(string $name): ?Group
    {
        // Get all groups and calculate similarity scores
        $groups = Group::all();
        $bestMatch = null;
        $bestScore = 0;
        
        foreach ($groups as $group) {
            // Levenshtein distance (edit distance); PHP computes it on bytes,
            // so multibyte names such as Hangul count several edits per character
            $distance = levenshtein(
                strtolower($name),
                strtolower($group->name)
            );
            
            // Convert to similarity score (0-100)
            $maxLength = max(strlen($name), strlen($group->name));
            $similarity = (1 - ($distance / $maxLength)) * 100;
            
            // Require at least 80% similarity
            if ($similarity >= 80 && $similarity > $bestScore) {
                $bestScore = $similarity;
                $bestMatch = $group;
            }
        }
        
        return $bestMatch;
    }

    private function resolveIdols(array $idolNames): array
    {
        $resolved = [];
        
        foreach ($idolNames as $name) {
            // Check both stage_name and birth_name
            $idol = Idol::where('stage_name', $name)
                       ->orWhere('birth_name', $name)
                       ->first();
            
            if (!$idol) {
                $idol = Idol::whereRaw('LOWER(stage_name) = ?', [strtolower($name)])
                           ->orWhereRaw('LOWER(birth_name) = ?', [strtolower($name)])
                           ->first();
            }
            
            if ($idol) {
                $resolved[] = $idol->id;
            }
        }
        
        return array_unique($resolved);
    }

    private function resolveSongs(array $songTitles): array
    {
        $resolved = [];
        
        foreach ($songTitles as $title) {
            $song = Song::whereRaw('LOWER(title) = ?', [strtolower($title)])->first();
            
            if ($song) {
                $resolved[] = $song->id;
            }
        }
        
        return array_unique($resolved);
    }
}

Fuzzy matching strategies:

  1. Exact match: Fastest, handles perfect matches
  2. Case-insensitive: Handles capitalization differences
  3. Levenshtein distance: Handles typos and minor variations
  4. Threshold (80%): Prevents false positives
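
To see how the 80% threshold behaves on real inputs, the similarity formula from fuzzyMatchGroup() can be run on its own. This is a small sketch in plain PHP; similarityPercent() is a hypothetical standalone function that mirrors the arithmetic above, and the sample names are illustrative.

```php
<?php
/**
 * Similarity between two names as a percentage, using the formula
 * (1 - distance / longest length) * 100 on lowercased input.
 * levenshtein() and strlen() work on bytes, so this suits ASCII names best.
 */
function similarityPercent(string $a, string $b): float
{
    $a = strtolower($a);
    $b = strtolower($b);
    $maxLength = max(strlen($a), strlen($b));
    if ($maxLength === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $maxLength) * 100;
}

$pairs = [['Blakpink', 'BLACKPINK'], ['New Jeans', 'NewJeans'], ['BP', 'BLACKPINK']];
foreach ($pairs as [$input, $known]) {
    printf("%-10s vs %-10s %5.1f%%\n", $input, $known, similarityPercent($input, $known));
}
```

The typo and the stray space each cost a single edit and clear the threshold, while the abbreviation does not. Abbreviations therefore need something other than edit distance, for example an alias table.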

Creating the Post with Tags

Finally, we create the post and attach tags using polymorphic relationships:

<?php
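// app/Services/RedditPostProcessingService.php (continued)
// This method lives inside the RedditPostProcessingService class and needs
// these imports at the top of the file:
//   use App\Models\Post;
//   use App\Models\User;
//   use Illuminate\Support\Str;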

public function createPostWithTags(array $postData, array $resolvedEntities): void
{
    // Step 1: Check if post already exists
    if (Post::where('reddit_id', $postData['reddit_id'])->exists()) {
        return;  // Skip duplicates
    }

    // Step 2: Get or create the Reddit bot user
    $botUser = User::firstOrCreate(
        ['username' => 'reddit_bot'],
        [
            'name' => 'Reddit Bot',
            'email' => 'reddit@mykpoplists.com',
            'password' => bcrypt(Str::random(32)),
        ]
    );

    // Step 3: Create the post
    $post = Post::create([
        'user_id' => $botUser->id,
        'title' => $postData['title'],
        'content' => $postData['content'],
        'video_url' => $postData['url'],
        'reddit_id' => $postData['reddit_id'],
        'reddit_permalink' => $postData['permalink'],
        'reddit_author' => $postData['author'],
        'reddit_flair' => $postData['flair_text'],
    ]);

    // Step 4: Attach tags using polymorphic relationships
    
    // Attach groups
    if (!empty($resolvedEntities['groups'])) {
        $post->taggedGroups()->attach($resolvedEntities['groups']);
    }

    // Attach idols
    if (!empty($resolvedEntities['idols'])) {
        $post->taggedIdols()->attach($resolvedEntities['idols']);
    }

    // Attach songs
    if (!empty($resolvedEntities['songs'])) {
        $post->taggedSongs()->attach($resolvedEntities['songs']);
    }
}

What happens under the hood with attach():

// When you call:
$post->taggedGroups()->attach([1, 2, 3]);

// Laravel executes a single multi-row insert along these lines:
INSERT INTO post_tags (post_id, taggable_id, taggable_type)
VALUES 
  (42, 1, 'App\Models\Group'),
  (42, 2, 'App\Models\Group'),
  (42, 3, 'App\Models\Group');

The taggable_type is set automatically from the relationship definition. Note that created_at and updated_at stay NULL on these rows unless the relationship is declared with ->withTimestamps().

Asynchronous Processing with Job Queues

Processing posts synchronously would block the application. We use Laravel’s queue system:

<?php
// app/Jobs/ProcessSingleRedditPost.php

namespace App\Jobs;

use App\Services\GeminiTaggingService;
use App\Services\RedditPostProcessingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class ProcessSingleRedditPost implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Retry configuration
    public int $tries = 3;              // Retry up to 3 times
    public int $backoff = 60;           // Wait 60 seconds between retries
    public int $timeout = 120;          // Timeout after 2 minutes

    public function __construct(
        public array $postData
    ) {}

    public function handle(
        GeminiTaggingService $geminiService,
        RedditPostProcessingService $processingService
    ): void
    {
        Log::info('Processing Reddit post job', [
            'reddit_id' => $this->postData['reddit_id']
        ]);

        try {
            // Step 1: Use AI to analyze the post
            $aiTags = $geminiService->analyzePost($this->postData);

            // Step 2: Resolve entity names to database IDs
            $resolvedEntities = $processingService->resolveEntities($aiTags);

            // Step 3: Create post with tags
            $processingService->createPostWithTags($this->postData, $resolvedEntities);

            Log::info('Successfully processed Reddit post job', [
                'reddit_id' => $this->postData['reddit_id'],
                'tags' => $resolvedEntities
            ]);

        } catch (\Exception $e) {
            Log::error('Reddit post job failed', [
                'reddit_id' => $this->postData['reddit_id'],
                'error' => $e->getMessage()
            ]);

            // Re-throw to trigger retry mechanism
            throw $e;
        }
    }

    // Called when all retries are exhausted
    public function failed(\Throwable $exception): void
    {
        Log::error('Reddit post job permanently failed', [
            'reddit_id' => $this->postData['reddit_id'],
            'error' => $exception->getMessage()
        ]);
    }
}

Job queue benefits:

  • Non-blocking: User requests return immediately
  • Retry logic: Automatic retries, here with a fixed 60-second backoff (return an array from a backoff() method for increasing delays)
  • Rate limiting: Process jobs gradually to respect API limits
  • Failure handling: Failed jobs are logged and can be manually retried
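
The job above waits a fixed 60 seconds between attempts. If increasing delays are preferred, the schedule can be computed with a few lines of plain PHP. This is a sketch only: exponentialBackoff() is a hypothetical helper, and the cap of 3600 seconds is an arbitrary assumption. In Laravel, an array like the one it produces could be returned from the job's backoff() method.

```php
<?php
/**
 * Build a list of retry delays that doubles each time:
 * base, base * 2, base * 4, ... capped at $maxSeconds.
 */
function exponentialBackoff(int $baseSeconds, int $retries, int $maxSeconds = 3600): array
{
    $delays = [];
    for ($attempt = 0; $attempt < $retries; $attempt++) {
        $delays[] = min($baseSeconds * (2 ** $attempt), $maxSeconds);
    }
    return $delays;
}

print_r(exponentialBackoff(60, 3)); // delays before retry 1, 2 and 3
```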

Dispatching Jobs with Rate Limiting

The command that orchestrates everything:

<?php
// app/Console/Commands/QueueRedditPosts.php

namespace App\Console\Commands;

use App\Services\RedditService;
use App\Jobs\ProcessSingleRedditPost;
use Illuminate\Console\Command;
use Carbon\Carbon;

class QueueRedditPosts extends Command
{
    protected $signature = 'reddit:queue-posts 
                           {--spread-hours=12 : Hours to spread processing over}
                           {--max-posts=50 : Maximum posts to queue}';

    protected $description = 'Queue Reddit posts for gradual processing';

    public function handle(RedditService $redditService): int
    {
        $this->info('Fetching posts from r/kpop...');
        
        // Fetch posts from Reddit
        $posts = $redditService->fetchDailyPosts();
        
        $maxPosts = $this->option('max-posts');
        $posts = array_slice($posts, 0, $maxPosts);
        
        $this->info('Found ' . count($posts) . ' posts to process');

        // Calculate delay between jobs
        $spreadHours = $this->option('spread-hours');
        $totalSeconds = $spreadHours * 3600;
        $delayBetweenJobs = count($posts) > 1 
            ? $totalSeconds / count($posts) 
            : 0;

        // Queue each post with increasing delay
        foreach ($posts as $index => $postData) {
            $delaySeconds = (int) ($index * $delayBetweenJobs);
            
            ProcessSingleRedditPost::dispatch($postData)
                ->delay(now()->addSeconds($delaySeconds));
            
            $this->info("Queued post {$postData['reddit_id']} " .
                       "to process in " . gmdate('H:i:s', $delaySeconds));
        }

        $this->info('All posts queued successfully!');
        
        return Command::SUCCESS;
    }
}

Rate limiting strategy:

  • 50 posts spread over 12 hours = 1 post every 14.4 minutes
  • Reddit sees a single listing request per command run, far below its rate limits
  • Gemini receives one request per job, comfortably inside the daily quota assumed here (1500 requests/day)
  • Jobs process gradually throughout the day
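
The spacing arithmetic is easy to check in isolation. The sketch below mirrors the command's calculation in plain PHP; delaySchedule() is a hypothetical helper used only to illustrate the numbers.

```php
<?php
/**
 * Compute the delay (in seconds) for each job when $postCount jobs are
 * spread evenly over $spreadHours, mirroring the command's arithmetic.
 */
function delaySchedule(int $postCount, float $spreadHours): array
{
    if ($postCount < 1) {
        return [];
    }
    $interval = $postCount > 1 ? ($spreadHours * 3600) / $postCount : 0;
    $delays = [];
    for ($index = 0; $index < $postCount; $index++) {
        $delays[] = (int) ($index * $interval);
    }
    return $delays;
}

$delays = delaySchedule(50, 12);
echo 'Interval: ' . ($delays[1] / 60) . " minutes\n";
echo 'Last job starts after ' . gmdate('H:i:s', $delays[49]) . "\n";
```

With the defaults, jobs are 864 seconds (14.4 minutes) apart and the last one starts 11:45:36 after the command runs, so the whole batch finishes inside the 12-hour window.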

Usage:

# Queue 50 posts over 12 hours (default)
php artisan reddit:queue-posts

# Queue 30 posts over 8 hours
php artisan reddit:queue-posts --max-posts=30 --spread-hours=8

# Schedule to run daily at 6 AM
# In routes/console.php:
Schedule::command('reddit:queue-posts')->dailyAt('06:00');

Under the Hood

How Polymorphic Relationships Work in SQL

When you call $post->taggedGroups, Laravel generates roughly this query (the real one also selects the pivot columns):

SELECT groups.* 
FROM groups
INNER JOIN post_tags 
  ON groups.id = post_tags.taggable_id
WHERE post_tags.post_id = 42
  AND post_tags.taggable_type = 'App\Models\Group';

The magic is in the WHERE clause - it filters by both the post ID and the model type.

Memory Considerations: Eager Loading

// ❌ BAD: N+1 queries
$posts = Post::all();  // 1 query
foreach ($posts as $post) {
    echo $post->taggedGroups->count();  // N queries
}
// Total: 1 + N queries

// ✅ GOOD: Eager loading
$posts = Post::with('taggedGroups')->get();  // 2 queries total
foreach ($posts as $post) {
    echo $post->taggedGroups->count();  // No additional queries
}

Laravel’s eager loading executes:

-- Query 1: Get posts
SELECT * FROM posts;

-- Query 2: Get all related groups in one query
SELECT groups.*, post_tags.post_id
FROM groups
INNER JOIN post_tags ON groups.id = post_tags.taggable_id
WHERE post_tags.post_id IN (1, 2, 3, 4, 5)
  AND post_tags.taggable_type = 'App\Models\Group';

API Rate Limiting: The Math

Reddit API:

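  • Limit: 60 requests per minute (the figure assumed in this tutorial; check Reddit’s current API rules)
  • Our usage: 1 listing request per command run, so once a day when scheduled (well under the limit)

Google Gemini API:

  • Free tier: 60 requests per minute, 1500 per day (figures assumed here; quotas vary by model and change over time)
  • Our usage: 50 posts per day = 50 requests per day
  • Spread over 12 hours = 4.17 requests per hour (well under limit)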

Why spread jobs?

  • Prevents burst traffic that triggers rate limits
  • Distributes server load evenly
  • Allows for manual intervention if issues arise
  • Reduces memory pressure (one job at a time)

Job Queue Internals

When you dispatch a job, Laravel:

  1. Serializes the job (converts to JSON)
  2. Stores in database (or Redis/SQS)
  3. Queue worker picks it up when ready
  4. Unserializes and executes the handle() method
  5. Deletes from queue on success, or retries on failure

// Simplified view of a row in the jobs table. The payload column holds the JSON
// from "uuid" through "data"; attempts, reserved_at, available_at and created_at
// are separate columns, shown inline here for brevity.
{
  "uuid": "9d3a5c8e-...",
  "displayName": "App\\Jobs\\ProcessSingleRedditPost",
  "job": "Illuminate\\Queue\\CallQueuedHandler@call",
  "maxTries": 3,
  "timeout": 120,
  "data": {
    "commandName": "App\\Jobs\\ProcessSingleRedditPost",
    "command": "O:32:\"App\\Jobs\\ProcessSingleRedditPost\":1:{...}"
  },
  "attempts": 0,
  "reserved_at": null,
  "available_at": 1704067200,  // Unix timestamp for delayed execution
  "created_at": 1704063600
}

Edge Cases & Pitfalls

AI Hallucination

AI models can “hallucinate” - confidently returning entities that don’t exist:

// AI might return:
{
  "groups": ["BLACKPINK", "SuperNova"],  // SuperNova doesn't exist!
  "idols": ["Jennie", "StarGirl"]        // StarGirl is made up!
}

// Solution: Entity resolution with fuzzy matching
$resolved = $processingService->resolveEntities($aiTags);
// Result: Only BLACKPINK and Jennie are matched
// SuperNova and StarGirl are silently dropped

Best practices:

  • Always validate AI output against your database
  • Use fuzzy matching with strict thresholds (80%+)
  • Log unmatched entities for manual review
  • Periodically audit AI accuracy
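
Logging unmatched entities is simple to bolt on. The sketch below, in plain PHP, splits AI suggestions into matched and unmatched names against a known list; partitionSuggestions() and the sample data are hypothetical, and error_log() stands in for whatever logger the application uses.

```php
<?php
/**
 * Split AI-suggested names into those found in a known list (compared
 * case-insensitively) and those that are not, so the latter can be logged.
 */
function partitionSuggestions(array $suggested, array $known): array
{
    // Map lowercased known names back to their canonical spelling.
    $lookup = [];
    foreach ($known as $name) {
        $lookup[strtolower($name)] = $name;
    }

    $result = ['matched' => [], 'unmatched' => []];
    foreach ($suggested as $name) {
        $key = strtolower(trim($name));
        if (isset($lookup[$key])) {
            $result['matched'][] = $lookup[$key];
        } else {
            $result['unmatched'][] = $name;
        }
    }
    return $result;
}

$result = partitionSuggestions(['blackpink', 'SuperNova'], ['BLACKPINK', 'BTS']);
foreach ($result['unmatched'] as $name) {
    error_log("Unmatched AI entity: {$name}"); // swap in your application's logger
}
```

Reviewing the unmatched list periodically shows whether the model is inventing names or the database is simply missing entities.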

Duplicate Tags

Without proper constraints, you could tag the same entity multiple times:

// Without unique constraint:
$post->taggedGroups()->attach([1, 1, 1]);  // Creates 3 identical tags!

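// Solution: Database unique constraint (a duplicate attach() then fails with a QueryException)
$table->unique(['post_id', 'taggable_type', 'taggable_id']);

// Or use syncWithoutDetaching(), which only adds IDs that are not attached yet
$post->taggedGroups()->syncWithoutDetaching([1, 2, 3]);

// sync() also avoids duplicates, but it detaches every group not in the list
$post->taggedGroups()->sync([1, 2, 3]);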

Memory Exhaustion with Large Batches

// ❌ BAD: Loading all entities into memory
$groups = Group::all();  // Could be 10,000+ groups
$prompt = "AVAILABLE GROUPS: " . implode(', ', $groups->pluck('name')->toArray());

// ✅ GOOD: Limit to most relevant
$groups = Group::orderBy('followers_count', 'desc')
               ->limit(100)
               ->pluck('name')
               ->toArray();

Why Polymorphic Over Separate Tables?

Polymorphic advantages:

  • Single source of truth for tagging logic
  • Easy to add new taggable types (just add relationship)
  • Consistent querying across all tag types
  • Reduced code duplication

When NOT to use polymorphic:

  • Different tag types need different pivot columns
  • Performance is critical (polymorphic adds slight overhead)
  • You need database-level foreign key constraints on taggable_id

Security: Validating External Content

// ❌ DANGEROUS: Trusting Reddit content directly
Post::create([
    'title' => $postData['title'],  // Could contain XSS!
    'content' => $postData['content']  // Could be malicious!
]);

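// ✅ SAFER: Strip markup on the way in, escape on the way out
use Illuminate\Support\Str;

Post::create([
    // Str::limit() appends "..." when it truncates, so cap at 252 to fit a 255-character column
    'title' => Str::limit(strip_tags($postData['title']), 252),
    // Strip all tags: allowing <a> would keep attributes such as onclick or javascript: URLs
    'content' => strip_tags($postData['content']),
]);
// When rendering, Blade's {{ }} syntax escapes the stored text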
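
The same two-stage idea (clean on input, escape on output) can be sketched without the framework. sanitizeTitle() and escapeForHtml() below are hypothetical helpers in plain PHP, and the 255-character default is an assumption about the column size; mb_substr() requires the mbstring extension.

```php
<?php
/**
 * Strip markup from an external title and cut it to a column-safe length.
 * mb_substr() counts characters rather than bytes, so Hangul is not split.
 */
function sanitizeTitle(string $title, int $maxLength = 255): string
{
    $plain = trim(strip_tags($title));
    return mb_substr($plain, 0, $maxLength);
}

/** Escape stored text at output time, which is what prevents XSS in HTML. */
function escapeForHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$title = sanitizeTitle('<b>Comeback</b> news <script>alert(1)</script>');
echo $title . "\n";
echo escapeForHtml('Tom & "Jerry" <3') . "\n";
```

strip_tags() removes the tags but keeps the text between them, which is why escaping at render time remains the step that actually neutralizes markup.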

Testing Polymorphic Relationships

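use Tests\TestCase;
use App\Models\Post;
use App\Models\Group;
use Illuminate\Foundation\Testing\RefreshDatabase;

class PolymorphicTaggingTest extends TestCase
{
    use RefreshDatabase;  // Start each test with an empty database so the counts below hold
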
    public function test_post_can_be_tagged_with_multiple_groups()
    {
        $post = Post::factory()->create();
        $groups = Group::factory()->count(3)->create();
        
        $post->taggedGroups()->attach($groups->pluck('id'));
        
        $this->assertCount(3, $post->taggedGroups);
        $this->assertDatabaseCount('post_tags', 3);
    }
    
    public function test_group_shows_all_tagged_posts()
    {
        $group = Group::factory()->create();
        $posts = Post::factory()->count(5)->create();
        
        foreach ($posts as $post) {
            $post->taggedGroups()->attach($group->id);
        }
        
        $this->assertCount(5, $group->posts);
    }
    
    public function test_deleting_post_removes_tags()
    {
        $post = Post::factory()->create();
        $group = Group::factory()->create();
        
        $post->taggedGroups()->attach($group->id);
        $this->assertDatabaseCount('post_tags', 1);
        
        $post->delete();
        $this->assertDatabaseCount('post_tags', 0);  // Cascade delete
    }
}

Conclusion

What You’ve Learned

You now understand how to build a production-grade content tagging system that:

  1. Uses polymorphic relationships to handle multiple entity types with a single table
  2. Integrates AI for automatic content classification
  3. Implements fuzzy matching to resolve entity names to database IDs
  4. Processes asynchronously using job queues with retry logic
  5. Respects rate limits by spreading jobs over time

The Key Insights

Polymorphic relationships are about flexibility. Instead of creating rigid table structures for each relationship type, you create a flexible system that can adapt as your application grows.

AI is a tool, not a solution. The real engineering is in the validation, entity resolution, and error handling around the AI. The AI provides suggestions; your code makes decisions.

Asynchronous processing is essential at scale. Blocking operations (API calls, AI processing) should never happen in the request/response cycle.

Next Steps

  • Extend: Add more taggable types (Albums, Companies, Events)
  • Optimize: Implement caching for frequently accessed tags
  • Enhance: Add confidence scores to AI tags for manual review
  • Monitor: Build a dashboard to track AI accuracy over time
  • Scale: Move from database queues to Redis or SQS for better performance

Real-World Applications

Similar tagging needs show up on platforms such as:

  • YouTube: Videos tagged with topics, people, locations
  • Medium: Articles tagged with topics, publications, authors
  • Spotify: Songs tagged with genres, moods, artists, playlists
  • Instagram: Posts tagged with people, locations, products

You’ve just learned a pattern for building flexible, scalable tagging systems. 🎉
