Purpose
The Problem
You’re building a content platform where users post about K-pop. A single post might mention:
- Multiple groups (BLACKPINK, BTS, NewJeans)
- Specific idols (Jennie, Jungkook, Hanni)
- Songs (“Pink Venom”, “Dynamite”, “Attention”)
- Variety shows (Running Man, Knowing Bros)
The challenges:
- Manual tagging is tedious: Users won’t tag everything correctly
- Free-text tags are messy: “blackpink” vs “BLACKPINK” vs “Black Pink”
- Search becomes impossible: How do you find all posts about a specific group?
- Relationships are complex: Tags can reference different entity types
- External content needs automation: Reddit posts need automatic tagging
The naive approach? Create separate tables for each tag type (post_group_tags, post_idol_tags, post_song_tags). This leads to:
- Code duplication: Same logic repeated 4+ times
- Maintenance nightmare: Adding a new taggable type requires new migrations, models, and relationships
- Query complexity: Finding “all content related to BLACKPINK” requires UNION queries across multiple tables
The Solution
We’re analyzing MyKpopLists’ polymorphic tagging system combined with Google Gemini AI for automatic content classification. This architecture:
- Uses a single post_tags table for all tag types
- Leverages Laravel’s polymorphic relationships
- Integrates AI to automatically tag Reddit posts
- Processes content asynchronously using job queues
What You’ll Learn
This tutorial covers:
- Polymorphic many-to-many relationships in Laravel
- External API integration (Reddit, Google Gemini)
- Asynchronous job processing with Laravel Queues
- Rate limiting and retry strategies
- Fuzzy matching algorithms for entity resolution
Prerequisites & Tooling
Knowledge Base
- Intermediate Laravel (models, relationships, migrations)
- Understanding of database foreign keys and indexes
- Basic knowledge of HTTP APIs and JSON
- Familiarity with asynchronous processing concepts
Environment
- PHP: 8.2+
- Laravel: 12.x
- Queue Driver: Database (development) or Redis (production)
- External APIs: Reddit JSON API, Google Gemini API
Required API Keys
# .env configuration
REDDIT_USERNAME=your-reddit-username
GEMINI_API_KEY=your-gemini-api-key
High-Level Architecture
The Complete Data Flow
graph TB
A[Reddit API] -->|Fetch Posts| B[RedditService]
B -->|Queue Job| C[ProcessSingleRedditPost]
C -->|Extract Content| D[Post Data]
D -->|Send to AI| E[GeminiTaggingService]
E -->|Analyze Text| F[Google Gemini API]
F -->|Return Tags| G[Tag Suggestions]
G -->|Fuzzy Match| H[Entity Resolution]
H -->|Create Records| I[Post Model]
I -->|Polymorphic Relations| J[PostTag Pivot]
J -->|Links to| K[Groups/Idols/Songs]
style E fill:#f9f,stroke:#333,stroke-width:2px
style J fill:#bbf,stroke:#333,stroke-width:2px
The Polymorphic Relationship Structure
erDiagram
POSTS ||--o{ POST_TAGS : has
POST_TAGS }o--|| GROUPS : references
POST_TAGS }o--|| IDOLS : references
POST_TAGS }o--|| SONGS : references
POST_TAGS }o--|| VARIETY_SHOWS : references
POSTS {
int id
string title
text content
string reddit_id
}
POST_TAGS {
int post_id
string taggable_type
int taggable_id
}
GROUPS {
int id
string name
}
IDOLS {
int id
string stage_name
}
Understanding Polymorphic Tagging
Think of polymorphic tagging like a universal sticky note system:
- You have different types of items: books, movies, music albums, games
- Instead of having separate sticky note pads for each type, you have ONE pad
- Each sticky note has: “Item Type” + “Item ID” + “Note Content”
- A single note can reference any item type
Similarly, post_tags is one table that can reference Groups, Idols, Songs, or Variety Shows using taggable_type (the item type) and taggable_id (which specific item).
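To make the sticky-note analogy concrete, here is a minimal plain-PHP sketch with no Laravel involved. The `$tables` and `$postTags` arrays and the `tagsForPost()` helper are hypothetical stand-ins for the real tables. Note that Group 1 and Idol 1 share an ID, which is exactly why the type column is needed.

```php
<?php
// Hypothetical in-memory stand-ins for the groups and idols tables.
$tables = [
    'App\Models\Group' => [1 => 'BLACKPINK', 2 => 'BTS'],
    'App\Models\Idol'  => [1 => 'Jennie', 2 => 'Jungkook'],
];

// Rows shaped like post_tags: one table, any entity type.
$postTags = [
    ['post_id' => 42, 'taggable_type' => 'App\Models\Group', 'taggable_id' => 1],
    ['post_id' => 42, 'taggable_type' => 'App\Models\Idol',  'taggable_id' => 1],
    ['post_id' => 43, 'taggable_type' => 'App\Models\Group', 'taggable_id' => 2],
];

/**
 * Resolve every tag on a post to a "Type: name" label.
 */
function tagsForPost(int $postId, array $postTags, array $tables): array
{
    $labels = [];
    foreach ($postTags as $row) {
        if ($row['post_id'] !== $postId) {
            continue;
        }
        // taggable_type picks the table, taggable_id picks the row inside it
        $name = $tables[$row['taggable_type']][$row['taggable_id']] ?? null;
        if ($name !== null) {
            $labels[] = $row['taggable_type'] . ': ' . $name;
        }
    }
    return $labels;
}

print_r(tagsForPost(42, $postTags, $tables));
```

Eloquent does the same lookup with a JOIN instead of a loop, but the idea is identical: the pair (taggable_type, taggable_id) identifies one row in one of several tables.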
The Implementation
Understanding the Naive Approach
Most developers start with separate tables for each tag type:
// ❌ NAIVE APPROACH - Separate tables
Schema::create('post_group_tags', function (Blueprint $table) {
$table->foreignId('post_id')->constrained();
$table->foreignId('group_id')->constrained();
});
Schema::create('post_idol_tags', function (Blueprint $table) {
$table->foreignId('post_id')->constrained();
$table->foreignId('idol_id')->constrained();
});
Schema::create('post_song_tags', function (Blueprint $table) {
$table->foreignId('post_id')->constrained();
$table->foreignId('song_id')->constrained();
});
// This approach requires:
// - 4+ separate migrations
// - 4+ separate relationships in Post model
// - 4+ separate queries to find all tags
// - Duplicate code everywhere
Problems:
- Adding a new taggable type (e.g., Albums) requires new migration, model changes, and query updates
- Finding “all content tagged with X” requires complex UNION queries
- Code duplication across similar functionality
The Polymorphic Migration
Laravel’s polymorphic relationships solve this with a single table:
<?php
// database/migrations/xxxx_create_post_tags_table.php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
return new class extends Migration
{
public function up(): void
{
Schema::create('post_tags', function (Blueprint $table) {
$table->id();
// The post being tagged
$table->foreignId('post_id')
->constrained()
->onDelete('cascade'); // Delete tags when post is deleted
// Polymorphic columns - the magic happens here
$table->string('taggable_type'); // e.g., "App\Models\Group"
$table->unsignedBigInteger('taggable_id'); // e.g., 42
$table->timestamps();
// Composite index for fast lookups
$table->index(['taggable_type', 'taggable_id']);
// Prevent duplicate tags
$table->unique(['post_id', 'taggable_type', 'taggable_id']);
});
}
};
Key insights:
- taggable_type: Stores the model’s morph class, which by default is the full class name (e.g., “App\Models\Group”)
- taggable_id: Stores the ID of that specific entity
- Together, they can reference ANY model in your application
- The unique constraint prevents tagging the same entity twice
Setting Up the Post Model Relationships
Now we define the relationships in the Post model:
<?php
// app/Models/Post.php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class Post extends Model
{
// Method 1: Access all tags (any type)
public function tags()
{
return $this->hasMany(PostTag::class);
}
// Method 2: Access specific tag types using morphedByMany
// This is Laravel's magic for polymorphic many-to-many
public function taggedGroups()
{
return $this->morphedByMany(
Group::class, // The related model
'taggable', // The name in the pivot table (taggable_type, taggable_id)
'post_tags' // The pivot table name
);
}
public function taggedIdols()
{
return $this->morphedByMany(Idol::class, 'taggable', 'post_tags');
}
public function taggedSongs()
{
return $this->morphedByMany(Song::class, 'taggable', 'post_tags');
}
public function taggedVarietyShows()
{
return $this->morphedByMany(VarietyShow::class, 'taggable', 'post_tags');
}
// Convenience method to get all tagged entities
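public function getTaggedEntitiesAttribute()
{
// toBase() + concat() keeps every model. Eloquent's merge() keys items by
// primary key, so a Group and an Idol sharing an ID would overwrite each other.
return $this->taggedGroups->toBase()
->concat($this->taggedIdols)
->concat($this->taggedSongs)
->concat($this->taggedVarietyShows);
}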
}
Usage examples:
$post = Post::find(1);
// Get all groups tagged in this post
$groups = $post->taggedGroups; // Collection of Group models
// Get all tags (any type)
$allTags = $post->tagged_entities; // Merged collection
// Check if a specific group is tagged
// Qualify the column: the joined post_tags table also has an id column
$isTagged = $post->taggedGroups()->where('groups.id', 5)->exists();
The Reverse Relationship
The taggable models also need relationships:
<?php
// app/Models/Group.php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class Group extends Model
{
// Get all posts that tagged this group
public function posts()
{
return $this->morphToMany(
Post::class, // The related model
'taggable', // The morph name
'post_tags' // The pivot table
);
}
}
// Now you can do:
$group = Group::find(1);
$posts = $group->posts; // All posts tagged with this group
Fetching Reddit Posts
The RedditService handles API communication:
<?php
// app/Services/RedditService.php (simplified)
namespace App\Services;
use Illuminate\Support\Facades\Http;
use Carbon\Carbon;
class RedditService
{
private string $baseUrl = 'https://www.reddit.com';
private string $userAgent;
public function __construct()
{
// Reddit requires a unique User-Agent
$this->userAgent = config('app.name') . '/1.0 (by /u/' .
config('services.reddit.username') . ')';
}
public function fetchDailyPosts(): array
{
// Fetch from Reddit's JSON API
$response = Http::withHeaders([
'User-Agent' => $this->userAgent,
])->get($this->baseUrl . '/r/kpop/new.json?limit=100'); // 100 is the most a single listing request returns
if (!$response->successful()) {
return [];
}
$data = $response->json();
$posts = $data['data']['children'] ?? [];
// Filter to last 24 hours
$twentyFourHoursAgo = time() - (24 * 60 * 60);
$recentPosts = array_filter($posts, function ($post) use ($twentyFourHoursAgo) {
return $post['data']['created_utc'] > $twentyFourHoursAgo;
});
// Format for our system; array_values() reindexes the keys that array_filter() left sparse
return array_values(array_map(function ($post) {
return $this->formatRedditPost($post['data']);
}, $recentPosts));
}
private function formatRedditPost(array $postData): array
{
return [
'reddit_id' => $postData['id'],
'title' => $postData['title'],
'content' => $postData['selftext'] ?? '',
'url' => $postData['url'] ?? null,
'author' => $postData['author'],
'score' => $postData['score'],
'created_utc' => Carbon::createFromTimestamp($postData['created_utc']),
'permalink' => 'https://www.reddit.com' . $postData['permalink'],
'flair_text' => $postData['link_flair_text'] ?? null,
];
}
}
Key points:
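- Reddit’s JSON API is public for read-only listings (no auth required), although unauthenticated clients are rate-limited more tightly than OAuth clients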
- User-Agent is mandatory (Reddit blocks generic agents)
- We filter by timestamp to get recent posts
- Data is normalized to our application’s format
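The timestamp filter is easy to get subtly wrong, so here is a minimal standalone sketch of it. The `filterRecentPosts()` helper and the sample data are hypothetical; the array shape mirrors the `data.children` entries used above. Passing the current time in as a parameter keeps the function testable.

```php
<?php
/**
 * Keep only children newer than $windowSeconds, relative to $now.
 * $children mirrors the data.children array of a Reddit listing response.
 */
function filterRecentPosts(array $children, int $now, int $windowSeconds = 86400): array
{
    $cutoff = $now - $windowSeconds;
    $recent = array_filter(
        $children,
        fn (array $child): bool => ($child['data']['created_utc'] ?? 0) > $cutoff
    );
    // array_filter() preserves the original keys; reindex to 0, 1, 2, ...
    return array_values($recent);
}

$now = 1704067200; // fixed "current time" so the example is repeatable
$children = [
    ['data' => ['id' => 'old',  'created_utc' => $now - 90000]],
    ['data' => ['id' => 'new1', 'created_utc' => $now - 3600]],
    ['data' => ['id' => 'new2', 'created_utc' => $now - 60]],
];

$recent = filterRecentPosts($children, $now);
echo implode(', ', array_column(array_column($recent, 'data'), 'id')), PHP_EOL;
```

Reindexing matters later in the pipeline: the dispatch command multiplies each post's array index by a delay, so sparse keys would skew the schedule if the list were used without slicing.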
AI-Powered Tagging with Google Gemini
This is where the magic happens - using AI to analyze content:
<?php
// app/Services/GeminiTaggingService.php (core logic)
namespace App\Services;
use Illuminate\Support\Facades\Http;
use App\Models\Group;
use App\Models\Idol;
use App\Models\Song;
class GeminiTaggingService
{
private string $apiKey;
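// The model name is part of the URL; swap gemini-pro for a model your API key can currently access
private string $apiUrl = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent';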
public function __construct()
{
$this->apiKey = config('services.gemini.api_key');
}
public function analyzePost(array $postData): array
{
// Step 1: Build the prompt with context
$prompt = $this->buildPrompt($postData);
// Step 2: Send to Gemini API
$response = Http::withHeaders([
'Content-Type' => 'application/json',
])->post($this->apiUrl . '?key=' . $this->apiKey, [
'contents' => [
[
'parts' => [
['text' => $prompt]
]
]
],
'generationConfig' => [
'temperature' => 0.2, // Low temperature for consistent results
'topK' => 40,
'topP' => 0.95,
]
]);
// Throw on HTTP errors (e.g. 429 or 5xx) so the queued job fails and is retried;
// returning empty tags here would store the post permanently untagged
$response->throw();
// Step 3: Parse AI response
$aiResponse = $response->json();
$text = $aiResponse['candidates'][0]['content']['parts'][0]['text'] ?? '';
// Step 4: Extract structured data
return $this->parseAIResponse($text);
}
private function buildPrompt(array $postData): string
{
// Get all available entities from database
$groups = Group::pluck('name')->toArray();
$idols = Idol::pluck('stage_name')->toArray();
$songs = Song::pluck('title')->toArray();
return <<<PROMPT
You are a K-pop content analyzer. Analyze the following post and identify relevant entities.
POST TITLE: {$postData['title']}
POST CONTENT: {$postData['content']}
POST FLAIR: {$postData['flair_text']}
AVAILABLE GROUPS: {$this->formatList($groups)}
AVAILABLE IDOLS: {$this->formatList($idols)}
AVAILABLE SONGS: {$this->formatList($songs)}
Return ONLY a JSON object with this structure:
{
"groups": ["group1", "group2"],
"idols": ["idol1", "idol2"],
"songs": ["song1", "song2"]
}
Only include entities you are CONFIDENT are mentioned. Return empty arrays if unsure.
PROMPT;
}
private function parseAIResponse(string $text): array
{
// Extract JSON from AI response (may have markdown formatting)
$jsonStart = strpos($text, '{');
$jsonEnd = strrpos($text, '}');
if ($jsonStart === false || $jsonEnd === false) {
return ['groups' => [], 'idols' => [], 'songs' => []];
}
$jsonString = substr($text, $jsonStart, $jsonEnd - $jsonStart + 1);
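$data = json_decode($jsonString, true);
// json_decode() returns null on invalid JSON and can return a scalar for non-object input
return is_array($data) ? $data : ['groups' => [], 'idols' => [], 'songs' => []];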
}
private function formatList(array $items): string
{
return implode(', ', array_slice($items, 0, 100)); // Limit to avoid token limits
}
}
Critical design decisions:
- Low temperature (0.2): Makes AI responses more deterministic and consistent
- Provide entity lists: Steers the AI toward known entities (reduces hallucination, but does not eliminate it)
- JSON output: Structured data is easier to parse than natural language
- Confidence filtering: The prompt asks the model to list only entities it is confident about; the database lookup that follows is the real safeguard
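Because the model's output is untrusted text, it can help to normalize it before anything else touches it. The following is a minimal sketch of a stricter parser, written as a standalone function. The name `parseTagJson()` and its normalization rules are assumptions for illustration, not part of the original service.

```php
<?php
/**
 * Extract the outermost {...} span from model output and normalize it to
 * ['groups' => string[], 'idols' => string[], 'songs' => string[]].
 */
function parseTagJson(string $text): array
{
    $empty = ['groups' => [], 'idols' => [], 'songs' => []];

    $start = strpos($text, '{');
    $end = strrpos($text, '}');
    if ($start === false || $end === false || $end < $start) {
        return $empty;
    }

    $data = json_decode(substr($text, $start, $end - $start + 1), true);
    if (!is_array($data)) {
        return $empty; // invalid JSON or a non-object value
    }

    $result = $empty;
    foreach (array_keys($empty) as $key) {
        $values = is_array($data[$key] ?? null) ? $data[$key] : [];
        // Keep only non-empty strings, trimmed and de-duplicated
        $strings = array_map(fn ($v) => is_string($v) ? trim($v) : '', $values);
        $strings = array_filter($strings, fn (string $v): bool => $v !== '');
        $result[$key] = array_values(array_unique($strings));
    }
    return $result;
}

$raw = "Here is the result:\n{\"groups\": [\"BLACKPINK\", \"BLACKPINK\", 42], \"idols\": \"Jennie\"}";
print_r(parseTagJson($raw));
```

With this shape guaranteed, the resolution step can assume every key exists and every value is a list of strings.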
Entity Resolution with Fuzzy Matching
AI returns names, but we need database IDs. This requires fuzzy matching:
<?php
// app/Services/RedditPostProcessingService.php (entity resolution)
namespace App\Services;
use App\Models\Group;
use App\Models\Idol;
use App\Models\Song;
class RedditPostProcessingService
{
public function resolveEntities(array $aiTags): array
{
return [
'groups' => $this->resolveGroups($aiTags['groups'] ?? []),
'idols' => $this->resolveIdols($aiTags['idols'] ?? []),
'songs' => $this->resolveSongs($aiTags['songs'] ?? []),
];
}
private function resolveGroups(array $groupNames): array
{
$resolved = [];
foreach ($groupNames as $name) {
// Try exact match first
$group = Group::where('name', $name)->first();
if (!$group) {
// Try case-insensitive match
$group = Group::whereRaw('LOWER(name) = ?', [strtolower($name)])->first();
}
if (!$group) {
// Try fuzzy match (handles typos and small spelling variations, not abbreviations)
$group = $this->fuzzyMatchGroup($name);
}
if ($group) {
$resolved[] = $group->id;
}
}
return array_unique($resolved);
}
private function fuzzyMatchGroup(string $name): ?Group
{
// Get all groups and calculate similarity scores
$groups = Group::all();
$bestMatch = null;
$bestScore = 0;
foreach ($groups as $group) {
// Levenshtein distance (edit distance)
$distance = levenshtein(
strtolower($name),
strtolower($group->name)
);
// Convert to similarity score (0-100); skip empty names to avoid dividing by zero
$maxLength = max(strlen($name), strlen($group->name));
if ($maxLength === 0) {
continue;
}
$similarity = (1 - ($distance / $maxLength)) * 100;
// Require 80% similarity
if ($similarity > 80 && $similarity > $bestScore) {
$bestScore = $similarity;
$bestMatch = $group;
}
}
return $bestMatch;
}
private function resolveIdols(array $idolNames): array
{
$resolved = [];
foreach ($idolNames as $name) {
// Check both stage_name and birth_name
$idol = Idol::where('stage_name', $name)
->orWhere('birth_name', $name)
->first();
if (!$idol) {
$idol = Idol::whereRaw('LOWER(stage_name) = ?', [strtolower($name)])
->orWhereRaw('LOWER(birth_name) = ?', [strtolower($name)])
->first();
}
if ($idol) {
$resolved[] = $idol->id;
}
}
return array_unique($resolved);
}
private function resolveSongs(array $songTitles): array
{
$resolved = [];
foreach ($songTitles as $title) {
$song = Song::whereRaw('LOWER(title) = ?', [strtolower($title)])->first();
if ($song) {
$resolved[] = $song->id;
}
}
return array_unique($resolved);
}
}
Fuzzy matching strategies:
- Exact match: Fastest, handles perfect matches
- Case-insensitive: Handles capitalization differences
- Levenshtein distance: Handles typos and minor variations
- Threshold (80%): Prevents false positives
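To see what the 80% threshold accepts and rejects, here is the same similarity formula as a standalone function you can run without a database. The name `nameSimilarity()` is hypothetical; the formula matches the one in fuzzyMatchGroup() above.

```php
<?php
/**
 * Case-insensitive similarity in percent, derived from Levenshtein distance.
 * levenshtein() and strlen() count bytes, so multibyte names (e.g. Hangul)
 * are scored by byte length rather than visible length.
 */
function nameSimilarity(string $a, string $b): float
{
    $a = strtolower($a);
    $b = strtolower($b);
    $maxLength = max(strlen($a), strlen($b));
    if ($maxLength === 0) {
        return 100.0; // two empty strings are identical; avoids division by zero
    }
    return (1 - levenshtein($a, $b) / $maxLength) * 100;
}

foreach (['Black Pink', 'Blakpink', 'BP'] as $candidate) {
    printf("%-10s -> %.1f%%\n", $candidate, nameSimilarity($candidate, 'BLACKPINK'));
}
```

“Black Pink” and “Blakpink” are each one edit away from “blackpink”, so both clear the threshold. “BP” needs seven insertions and scores about 22%, which is why abbreviations call for an alias table rather than edit distance.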
Creating the Post with Tags
Finally, we create the post and attach tags using polymorphic relationships:
<?php
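// app/Services/RedditPostProcessingService.php (continued)
// This method also needs these imports at the top of the file:
// use App\Models\Post; use App\Models\User; use Illuminate\Support\Str;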
public function createPostWithTags(array $postData, array $resolvedEntities): void
{
// Step 1: Check if post already exists
if (Post::where('reddit_id', $postData['reddit_id'])->exists()) {
return; // Skip duplicates
}
// Step 2: Get or create the Reddit bot user
$botUser = User::firstOrCreate(
['username' => 'reddit_bot'],
[
'name' => 'Reddit Bot',
'email' => 'reddit@mykpoplists.com',
'password' => bcrypt(Str::random(32)),
]
);
// Step 3: Create the post
$post = Post::create([
'user_id' => $botUser->id,
'title' => $postData['title'],
'content' => $postData['content'],
'video_url' => $postData['url'],
'reddit_id' => $postData['reddit_id'],
'reddit_permalink' => $postData['permalink'],
'reddit_author' => $postData['author'],
'reddit_flair' => $postData['flair_text'],
]);
// Step 4: Attach tags using polymorphic relationships
// Attach groups
if (!empty($resolvedEntities['groups'])) {
$post->taggedGroups()->attach($resolvedEntities['groups']);
}
// Attach idols
if (!empty($resolvedEntities['idols'])) {
$post->taggedIdols()->attach($resolvedEntities['idols']);
}
// Attach songs
if (!empty($resolvedEntities['songs'])) {
$post->taggedSongs()->attach($resolvedEntities['songs']);
}
}
What happens under the hood with attach():
// When you call:
$post->taggedGroups()->attach([1, 2, 3]);
// Laravel executes (for a post with ID 42):
INSERT INTO post_tags (post_id, taggable_type, taggable_id)
VALUES
(42, 'App\Models\Group', 1),
(42, 'App\Models\Group', 2),
(42, 'App\Models\Group', 3);
// created_at and updated_at stay NULL unless the relationship is defined with ->withTimestamps()
The taggable_type is automatically set based on the relationship definition!
Asynchronous Processing with Job Queues
Processing posts synchronously would block the application. We use Laravel’s queue system:
<?php
// app/Jobs/ProcessSingleRedditPost.php
namespace App\Jobs;
use App\Services\GeminiTaggingService;
use App\Services\RedditPostProcessingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
class ProcessSingleRedditPost implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
// Retry configuration
public int $tries = 3; // Retry up to 3 times
public int $backoff = 60; // Wait 60 seconds between retries
public int $timeout = 120; // Timeout after 2 minutes
public function __construct(
public array $postData
) {}
public function handle(
GeminiTaggingService $geminiService,
RedditPostProcessingService $processingService
): void
{
Log::info('Processing Reddit post job', [
'reddit_id' => $this->postData['reddit_id']
]);
try {
// Step 1: Use AI to analyze the post
$aiTags = $geminiService->analyzePost($this->postData);
// Step 2: Resolve entity names to database IDs
$resolvedEntities = $processingService->resolveEntities($aiTags);
// Step 3: Create post with tags
$processingService->createPostWithTags($this->postData, $resolvedEntities);
Log::info('Successfully processed Reddit post job', [
'reddit_id' => $this->postData['reddit_id'],
'tags' => $resolvedEntities
]);
} catch (\Exception $e) {
Log::error('Reddit post job failed', [
'reddit_id' => $this->postData['reddit_id'],
'error' => $e->getMessage()
]);
// Re-throw to trigger retry mechanism
throw $e;
}
}
// Called when all retries are exhausted
public function failed(\Throwable $exception): void
{
Log::error('Reddit post job permanently failed', [
'reddit_id' => $this->postData['reddit_id'],
'error' => $exception->getMessage()
]);
}
}
Job queue benefits:
- Non-blocking: User requests return immediately
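- Retry logic: Automatic retries, here with a fixed 60-second wait between attempts (declare $backoff as an array such as [60, 300, 900] for progressively longer waits)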
- Rate limiting: Process jobs gradually to respect API limits
- Failure handling: Failed jobs are logged and can be manually retried
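The difference between a fixed and a progressive backoff is easiest to see side by side. This is a small plain-PHP sketch of the semantics, not Laravel's actual worker code; `retryDelay()` is a hypothetical helper. It assumes that an integer means the same wait every time, and that an array supplies one wait per retry with the last value reused once the list runs out.

```php
<?php
/**
 * Seconds to wait before the next attempt, given the attempts made so far.
 * An int is a fixed delay; an array gives one delay per retry,
 * with the last value reused when attempts exceed the list.
 */
function retryDelay(int|array $backoff, int $attempts): int
{
    if (is_int($backoff)) {
        return $backoff;
    }
    return (int) ($backoff[$attempts - 1] ?? end($backoff));
}

foreach ([1, 2, 3, 4] as $attempt) {
    printf(
        "after attempt %d: fixed %ds, progressive %ds\n",
        $attempt,
        retryDelay(60, $attempt),
        retryDelay([60, 300, 900], $attempt)
    );
}
```

For a rate-limited API, progressively longer waits give the quota time to recover instead of retrying into the same limit.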
Dispatching Jobs with Rate Limiting
The command that orchestrates everything:
<?php
// app/Console/Commands/QueueRedditPosts.php
namespace App\Console\Commands;
use App\Services\RedditService;
use App\Jobs\ProcessSingleRedditPost;
use Illuminate\Console\Command;
use Carbon\Carbon;
class QueueRedditPosts extends Command
{
protected $signature = 'reddit:queue-posts
{--spread-hours=12 : Hours to spread processing over}
{--max-posts=50 : Maximum posts to queue}';
protected $description = 'Queue Reddit posts for gradual processing';
public function handle(RedditService $redditService): int
{
$this->info('Fetching posts from r/kpop...');
// Fetch posts from Reddit
$posts = $redditService->fetchDailyPosts();
$maxPosts = (int) $this->option('max-posts'); // command options arrive as strings
$posts = array_slice($posts, 0, $maxPosts);
$this->info('Found ' . count($posts) . ' posts to process');
// Calculate delay between jobs
$spreadHours = (float) $this->option('spread-hours');
$totalSeconds = $spreadHours * 3600;
$delayBetweenJobs = count($posts) > 1
? $totalSeconds / count($posts)
: 0;
// Queue each post with increasing delay
foreach ($posts as $index => $postData) {
$delaySeconds = (int) ($index * $delayBetweenJobs);
ProcessSingleRedditPost::dispatch($postData)
->delay(now()->addSeconds($delaySeconds));
$this->info("Queued post {$postData['reddit_id']} " .
"to process in " . gmdate('H:i:s', $delaySeconds));
}
$this->info('All posts queued successfully!');
return Command::SUCCESS;
}
}
Rate limiting strategy:
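- 50 posts spread over 12 hours = 1 job every 14.4 minutes (43,200 s / 50 = 864 s)
- Reddit is called once per command run, far below the 60 requests/minute limit assumed here
- Gemini receives one request per job, so 50 requests/day against the assumed daily quota of 1,500
- Jobs process gradually throughout the day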
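The delay arithmetic can be checked in isolation. Below is a minimal sketch that reproduces the command's calculation as a pure function; `spreadDelays()` is a hypothetical name, and the formula is the one used in handle() above.

```php
<?php
/**
 * Delay in seconds for each of $count jobs spread evenly over $spreadHours.
 */
function spreadDelays(int $count, float $spreadHours): array
{
    if ($count <= 0) {
        return [];
    }
    // A single job runs immediately; otherwise split the window into equal steps
    $step = $count > 1 ? ($spreadHours * 3600) / $count : 0;

    $delays = [];
    for ($i = 0; $i < $count; $i++) {
        $delays[] = (int) ($i * $step);
    }
    return $delays;
}

$delays = spreadDelays(50, 12);
printf(
    "step: %d s, last job after: %s\n",
    $delays[1],
    gmdate('H:i:s', $delays[count($delays) - 1])
);
```

The first job runs immediately and the last one starts at 11:45:36, one step short of the full 12 hours, because the window is divided by the number of jobs rather than the number of gaps between them.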
Usage:
# Queue 50 posts over 12 hours (default)
php artisan reddit:queue-posts
# Queue 30 posts over 8 hours
php artisan reddit:queue-posts --max-posts=30 --spread-hours=8
# Schedule to run daily at 6 AM
# In routes/console.php:
Schedule::command('reddit:queue-posts')->dailyAt('06:00');
Under the Hood
How Polymorphic Relationships Work in SQL
When you call $post->taggedGroups, Laravel generates a query along these lines (it also selects the pivot columns):
SELECT groups.*
FROM groups
INNER JOIN post_tags
ON groups.id = post_tags.taggable_id
WHERE post_tags.post_id = 42
AND post_tags.taggable_type = 'App\Models\Group';
The magic is in the WHERE clause - it filters by both the post ID and the model type.
Memory Considerations: Eager Loading
// ❌ BAD: N+1 queries
$posts = Post::all(); // 1 query
foreach ($posts as $post) {
echo $post->taggedGroups->count(); // N queries
}
// Total: 1 + N queries
// ✅ GOOD: Eager loading
$posts = Post::with('taggedGroups')->get(); // 2 queries total
foreach ($posts as $post) {
echo $post->taggedGroups->count(); // No additional queries
}
Laravel’s eager loading executes:
-- Query 1: Get posts
SELECT * FROM posts;
-- Query 2: Get all related groups in one query
SELECT groups.*, post_tags.post_id
FROM groups
INNER JOIN post_tags ON groups.id = post_tags.taggable_id
WHERE post_tags.post_id IN (1, 2, 3, 4, 5)
AND post_tags.taggable_type = 'App\Models\Group';
API Rate Limiting: The Math
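Reddit API:
- Limit: 60 requests per minute (the figure this tutorial assumes; check Reddit’s current terms)
- Our usage: 1 listing request per command run, i.e. 1 request per day with the daily schedule (the queued jobs never call Reddit)
Google Gemini API:
- Free tier: 60 requests per minute, 1500 per day
- Our usage: 50 posts per day = 50 requests per day
- Spread over 12 hours = 1 request every 14.4 minutes, or about 4.17 requests per hour (well under limit)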
Why spread jobs?
- Prevents burst traffic that triggers rate limits
- Distributes server load evenly
- Allows for manual intervention if issues arise
- Reduces memory pressure (one job at a time)
Job Queue Internals
When you dispatch a job, Laravel:
- Serializes the job (PHP-serializes the object and wraps it in a JSON payload)
- Stores in database (or Redis/SQS)
- Queue worker picks it up when ready
- Unserializes and executes the handle() method
- Deletes from queue on success, or retries on failure
// What gets stored in the payload column of the jobs table:
{
"uuid": "9d3a5c8e-...",
"displayName": "App\\Jobs\\ProcessSingleRedditPost",
"job": "Illuminate\\Queue\\CallQueuedHandler@call",
"maxTries": 3,
"timeout": 120,
"data": {
"commandName": "App\\Jobs\\ProcessSingleRedditPost",
"command": "O:32:\"App\\Jobs\\ProcessSingleRedditPost\":1:{...}"
}
}
// Stored in separate columns of the same row, not inside the payload:
// attempts = 0
// reserved_at = NULL
// available_at = 1704067200 (Unix timestamp; later than created_at for delayed jobs)
// created_at = 1704063600
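The serialize, store, unserialize, execute cycle can be imitated in a few lines of plain PHP. This is a simplified sketch, not Laravel's implementation: `DemoJob`, `makePayload()` and `runPayload()` are hypothetical names, and the payload carries only a handful of the real fields.

```php
<?php
// Hypothetical stand-in for a queued job; not one of Laravel's classes.
final class DemoJob
{
    public function __construct(public array $postData) {}

    public function handle(): string
    {
        return 'processed ' . $this->postData['reddit_id'];
    }
}

// Dispatch side: PHP-serialize the object and wrap it in a JSON payload.
function makePayload(object $job, int $maxTries, int $timeout): string
{
    return json_encode([
        'displayName' => $job::class,
        'maxTries' => $maxTries,
        'timeout' => $timeout,
        'data' => ['command' => serialize($job)],
    ], JSON_THROW_ON_ERROR);
}

// Worker side: decode the payload, rebuild the object, run handle().
function runPayload(string $payload): string
{
    $decoded = json_decode($payload, true, 512, JSON_THROW_ON_ERROR);
    $job = unserialize(
        $decoded['data']['command'],
        ['allowed_classes' => [DemoJob::class]]
    );
    return $job->handle();
}

$payload = makePayload(new DemoJob(['reddit_id' => 'abc123']), 3, 120);
echo runPayload($payload), PHP_EOL;
```

Because the whole object is serialized, everything passed to a job's constructor ends up in the queue row. Keeping $postData to a small array of scalars, as the tutorial's job does, keeps those rows compact.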
Edge Cases & Pitfalls
AI Hallucination
AI models can “hallucinate” - confidently returning entities that don’t exist:
// AI might return:
{
"groups": ["BLACKPINK", "SuperNova"], // SuperNova doesn't exist!
"idols": ["Jennie", "StarGirl"] // StarGirl is made up!
}
// Solution: Entity resolution with fuzzy matching
$resolved = $processingService->resolveEntities($aiTags);
// Result: Only BLACKPINK and Jennie are matched
// SuperNova and StarGirl are silently dropped
Best practices:
- Always validate AI output against your database
- Use fuzzy matching with strict thresholds (80%+)
- Log unmatched entities for manual review
- Periodically audit AI accuracy
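The “log unmatched entities” practice can be sketched as a small partition step that runs before resolution. This is an illustrative standalone function, not part of the original service: `partitionSuggestions()` and the `$known` name-to-ID map are assumptions, and only exact case-insensitive matches are considered.

```php
<?php
/**
 * Split AI-suggested names into those found in $known (name => id)
 * and those that are not, comparing case-insensitively.
 */
function partitionSuggestions(array $suggested, array $known): array
{
    $lookup = [];
    foreach ($known as $name => $id) {
        $lookup[strtolower((string) $name)] = $id;
    }

    $matched = [];
    $unmatched = [];
    foreach ($suggested as $name) {
        $key = strtolower(trim((string) $name));
        if (isset($lookup[$key])) {
            $matched[$name] = $lookup[$key];
        } else {
            $unmatched[] = $name; // candidate for a log entry and manual review
        }
    }
    return ['matched' => $matched, 'unmatched' => $unmatched];
}

$result = partitionSuggestions(
    ['blackpink', 'SuperNova'],
    ['BLACKPINK' => 1, 'BTS' => 2]
);
print_r($result);
```

In the Laravel service, the unmatched list could be passed to Log::warning() so that hallucinated names and genuinely missing entities both become visible instead of being silently dropped.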
Duplicate Tags
Without proper constraints, you could tag the same entity multiple times:
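// Without unique constraint:
$post->taggedGroups()->attach([1, 1, 1]); // Creates 3 identical tags!
// Solution: Database unique constraint (a duplicate attach() then fails with a QueryException)
$table->unique(['post_id', 'taggable_type', 'taggable_id']);
// Or use sync() instead of attach()
$post->taggedGroups()->sync([1, 2, 3]); // Removes old group tags, adds new ones
// syncWithoutDetaching() adds the missing tags and keeps the existing ones
$post->taggedGroups()->syncWithoutDetaching([1, 2, 3]);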
Memory Exhaustion with Large Batches
// ❌ BAD: Loading all entities into memory
$groups = Group::all(); // Could be 10,000+ groups
$prompt = "AVAILABLE GROUPS: " . implode(', ', $groups->pluck('name')->toArray());
// ✅ GOOD: Limit to most relevant
$groups = Group::orderBy('followers_count', 'desc')
->limit(100)
->pluck('name')
->toArray();
Why Polymorphic Over Separate Tables?
Polymorphic advantages:
- Single source of truth for tagging logic
- Easy to add new taggable types (just add relationship)
- Consistent querying across all tag types
- Reduced code duplication
When NOT to use polymorphic:
- Different tag types need different pivot columns
- Performance is critical (polymorphic adds slight overhead)
- You need database-level foreign key constraints on taggable_id
Security: Validating External Content
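// ❌ DANGEROUS: Trusting Reddit content directly
Post::create([
'title' => $postData['title'], // Could contain XSS!
'content' => $postData['content'] // Could be malicious!
]);
// ✅ SAFER: Strip markup on the way in, and still escape on output (Blade's {{ }} does this)
use Illuminate\Support\Str;
Post::create([
// Passing '' as the third argument stops Str::limit() from appending "..." past 255 characters
'title' => Str::limit(strip_tags($postData['title']), 255, ''),
// Strip all tags: strip_tags() keeps attributes on allowed tags, so allowing <a>
// would let href="javascript:..." or onclick handlers through
'content' => strip_tags($postData['content'])
]);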
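Input cleaning and output escaping are separate steps, and it helps to see them apart. The sketch below uses plain PHP functions only; `sanitizeTitle()` and `escapeForHtml()` are hypothetical helper names, and the 255-character limit is an assumed column size.

```php
<?php
/**
 * Input side: drop markup and cap the length before storing.
 */
function sanitizeTitle(string $raw, int $maxLength = 255): string
{
    $plain = trim(strip_tags($raw));
    // Prefer the multibyte-safe cut when the mbstring extension is loaded
    return function_exists('mb_substr')
        ? mb_substr($plain, 0, $maxLength)
        : substr($plain, 0, $maxLength);
}

/**
 * Output side: escape stored text before placing it in HTML.
 */
function escapeForHtml(string $stored): string
{
    return htmlspecialchars($stored, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$title = sanitizeTitle('<script>alert(1)</script>Hello <b>world</b>');
echo $title, PHP_EOL;
echo escapeForHtml('<a href="x">link</a>'), PHP_EOL;
```

Note that strip_tags() removes the tags but keeps the text between them, so the script body survives as harmless plain text. Escaping at output time is what actually prevents stored content from being interpreted as HTML.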
Testing Polymorphic Relationships
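use Tests\TestCase;
use App\Models\Post;
use App\Models\Group;
use Illuminate\Foundation\Testing\RefreshDatabase;
class PolymorphicTaggingTest extends TestCase
{
// Reset the database between tests so assertDatabaseCount() sees only this test's rows
use RefreshDatabase;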
public function test_post_can_be_tagged_with_multiple_groups()
{
$post = Post::factory()->create();
$groups = Group::factory()->count(3)->create();
$post->taggedGroups()->attach($groups->pluck('id'));
$this->assertCount(3, $post->taggedGroups);
$this->assertDatabaseCount('post_tags', 3);
}
public function test_group_shows_all_tagged_posts()
{
$group = Group::factory()->create();
$posts = Post::factory()->count(5)->create();
foreach ($posts as $post) {
$post->taggedGroups()->attach($group->id);
}
$this->assertCount(5, $group->posts);
}
public function test_deleting_post_removes_tags()
{
$post = Post::factory()->create();
$group = Group::factory()->create();
$post->taggedGroups()->attach($group->id);
$this->assertDatabaseCount('post_tags', 1);
$post->delete();
$this->assertDatabaseCount('post_tags', 0); // Cascade delete
}
}
Conclusion
What You’ve Learned
You now understand how to build a production-grade content tagging system that:
- Uses polymorphic relationships to handle multiple entity types with a single table
- Integrates AI for automatic content classification
- Implements fuzzy matching to resolve entity names to database IDs
- Processes asynchronously using job queues with retry logic
- Respects rate limits by spreading jobs over time
The Key Insights
Polymorphic relationships are about flexibility. Instead of creating rigid table structures for each relationship type, you create a flexible system that can adapt as your application grows.
AI is a tool, not a solution. The real engineering is in the validation, entity resolution, and error handling around the AI. The AI provides suggestions; your code makes decisions.
Asynchronous processing is essential at scale. Blocking operations (API calls, AI processing) should never happen in the request/response cycle.
Next Steps
- Extend: Add more taggable types (Albums, Companies, Events)
- Optimize: Implement caching for frequently accessed tags
- Enhance: Add confidence scores to AI tags for manual review
- Monitor: Build a dashboard to track AI accuracy over time
- Scale: Move from database queues to Redis or SQS for better performance
Real-World Applications
Similar multi-type tagging needs show up on platforms such as:
- YouTube: Videos tagged with topics, people, locations
- Medium: Articles tagged with topics, publications, authors
- Spotify: Songs tagged with genres, moods, artists, playlists
- Instagram: Posts tagged with people, locations, products
You’ve now seen one way to build a flexible, scalable tagging system of that kind. 🎉