Purpose
The Problem
Every web API receives untrusted data from clients. A common beginner mistake is writing repetitive validation code for each route:
```python
# Bad: Repetitive validation
if "name" not in data:
    return "Missing name", 400
if type(data["name"]) is not str:
    return "Name must be string", 400
if "age" not in data:
    return "Missing age", 400
if type(data["age"]) is not int:
    return "Age must be integer", 400
# ... repeat for 10 more fields
```
This approach is error-prone, hard to maintain, and violates the DRY (Don’t Repeat Yourself) principle. Professional codebases use generic validation functions that work with any data structure.
The Solution
We’re studying the validate_input_data_generic() function from server.py, which validates any dictionary against expected keys and types using a single reusable function.
Why This Matters
This validator handles edge cases that junior developers often miss:
- Numeric strings that should be accepted where an integer is expected (“42” passes as an int)
- Empty strings as “optional data” signals
- Float values that arrive as strings (“15.5”) instead of JSON numbers
Prerequisites & Tooling
Knowledge Base:
- Python 3.11+ basic syntax (loops, conditionals)
- Understanding of dictionaries and type checking
- Basic exception handling (try/except)
Environment:
```shell
# No special dependencies - pure Python!
python --version  # Should be 3.11+
```
High-Level Architecture
```mermaid
graph TD
    A[Incoming Dictionary] --> B{Is it a dict?}
    B -->|No| C[Return Error: Not a dictionary]
    B -->|Yes| D[Loop through expected keys]
    D --> E{Key exists?}
    E -->|No| F[Return Error: Missing key]
    E -->|Yes| G{Value is empty string?}
    G -->|Yes| H[Skip validation - allow empty]
    G -->|No| I{Type matches?}
    I -->|Yes| J[Continue to next key]
    I -->|No| K{Expected type is int/float?}
    K -->|Yes| L{Can convert to numeric?}
    L -->|Yes| J
    L -->|No| M[Return Error: Invalid type]
    K -->|No| M
    J --> N{More keys?}
    N -->|Yes| D
    N -->|No| O[Return True - Valid!]
```
Analogy: Think of this like a security checkpoint at an airport. Each piece of luggage (data field) must:
- Exist (not missing)
- Be the right type (liquid in liquid containers, solids in solid containers)
- Have special allowances (empty containers are okay)
Implementation
Step 1: Function Signature and Initial Type Check
Logic: We need to accept three inputs: the data to validate, the list of expected keys, and their corresponding types. First, we verify the data is even a dictionary.
```python
def validate_input_data_generic(in_data, expected_keys, expected_types):
    """
    Validates that input data is a dictionary with correct information

    Parameters
    ----------
    in_data : dict
        Object received by the POST request
    expected_keys : list
        Keys that should be found in the POST request dictionary
    expected_types : list
        The value data types that should be found (must match order of expected_keys)

    Returns
    -------
    str: Error message if there is a problem, or
    bool: True if input data is valid
    """
    # CRITICAL: Check type before attempting dictionary operations
    if type(in_data) is not dict:
        return "Input is not a dictionary"
```
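🔴 Danger: Never assume in_data is a dictionary! If a client sends "hello" instead of {"key": "value"}, the membership test key in in_data silently becomes a substring search, and in_data["key"] raises TypeError: string indices must be integers. A non-iterable payload such as 42 makes even key in in_data raise a TypeError.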
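The short probe below runs both dictionary operations against a few payloads to show what happens when the type check is skipped. The probe() helper is hypothetical, written only for this demonstration; it uses nothing beyond built-ins.

```python
def probe(payload, key="name"):
    """Hypothetical helper: report what `key in payload` and `payload[key]` do."""
    results = {}
    try:
        results["membership"] = key in payload
    except TypeError as exc:
        results["membership"] = "TypeError: {}".format(exc)
    try:
        results["lookup"] = payload[key]
    except TypeError as exc:
        results["lookup"] = "TypeError: {}".format(exc)
    return results


# A string payload: `in` silently does a substring search, indexing fails
print(probe("hello"))
# A string that happens to contain the key: the membership test "passes"
print(probe("my name is"))
# An int payload: even the membership test raises
print(probe(42))
# A real dictionary behaves as expected
print(probe({"name": "Ann"}))
```

Only the last payload is safe to treat as a mapping, which is why the type check runs before any key lookups.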
Step 2: Loop Through Expected Keys
Logic: We use zip() to pair each expected key with its expected type, then iterate over the pairs one at a time.
```python
# Pair each key with its expected type and iterate
for key, value_type in zip(expected_keys, expected_types):
    # Check if the key exists in the input dictionary
    if key not in in_data:
        # Use .format() for dynamic error messages
        return "Key {} is missing from input".format(key)
```
🔵 Deep Dive: Why zip()? This Python built-in pairs items from two lists position by position and yields them lazily as tuples:
```python
list(zip(['name', 'age'], [str, int]))
# [('name', <class 'str'>), ('age', <class 'int'>)]
```
This is more readable than indexing with for i in range(len(expected_keys)).
Step 3: Handle Empty Strings (The Clever Part)
Logic: This system treats empty strings as “no data provided” rather than invalid data. This allows partial updates (e.g., updating only the patient name without CPAP data).
```python
# Allow empty strings to pass validation
if in_data[key] == "":
    continue  # Skip to next key
```
Real-World Example:
```python
# Valid request: update only the name
{"patient_name": "John Doe", "CPAP_pressure": ""}

# Valid request: update only the CPAP data
{"patient_name": "", "CPAP_pressure": "15"}
```
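Downstream, a handler might honour the "empty means not provided" convention by skipping "" values when it applies an update. The apply_partial_update() helper and the record layout below are hypothetical, a minimal sketch of the idea rather than code from server.py:

```python
def apply_partial_update(record, update):
    """Hypothetical helper: copy only non-empty fields from `update` into `record`."""
    for key, value in update.items():
        if value == "":
            continue  # "" means "no data provided": keep the stored value
        record[key] = value
    return record


record = {"patient_name": "Jane Roe", "CPAP_pressure": 12}
apply_partial_update(record, {"patient_name": "John Doe", "CPAP_pressure": ""})
print(record)  # {'patient_name': 'John Doe', 'CPAP_pressure': 12}
```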
Step 4: Type Validation with Numeric String Handling
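Logic: JSON has a native number type, but many clients send numbers as strings anyway (for example, values read from a text-entry field are often forwarded unconverted). We therefore need to accept "42" as valid for int fields and "15.5" for float fields.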
```python
# (inside the for loop from Step 2)
# Check if the actual type matches the expected type
if type(in_data[key]) is not value_type:
    # Special handling for float types
    if value_type == float:
        try:
            # Attempt to convert to float
            float(in_data[key])
            continue  # Conversion succeeded, move to next key
        except (TypeError, ValueError):
            # ValueError: not a numeric string; TypeError: e.g. None or a list
            return "Key {} is not a float or numeric string".format(key)
    # Special handling for int types
    elif value_type == int:
        # .isnumeric() is True only if every character is a Unicode numeric character
        if not str(in_data[key]).isnumeric():
            return "Key {} is not an int or numeric string".format(key)
    else:
        # For all other types (str, list, etc.), reject mismatches
        return "Key {} has the incorrect value type".format(key)

# After the loop: all validations passed!
return True
```
🔴 Danger: The isnumeric() method only accepts unsigned whole numbers (no sign, no decimal point):
```python
"42".isnumeric()   # True
"-42".isnumeric()  # False (negative sign not allowed)
"4.2".isnumeric()  # False (decimal point not allowed)
```
This is acceptable here because CPAP pressure values are always positive integers (4-25 cmH2O).
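isnumeric() is also broader than the digits 0-9: it accepts any Unicode numeric character, including superscripts and vulgar fractions that int() cannot parse. If the goal is "int() will succeed on this string", str.isdecimal() is the stricter built-in check. A small comparison; the int_parse_report() helper and the sample strings are illustrative only:

```python
def int_parse_report(s):
    """Illustrative helper: compare str.isnumeric(), str.isdecimal() and int() on one string."""
    try:
        parsed = int(s)
    except ValueError:
        parsed = None  # int() rejected the string
    return s.isnumeric(), s.isdecimal(), parsed


# "42", superscript two, one half, Arabic-Indic digits for "42"
for sample in ["42", "\u00b2", "\u00bd", "\u0664\u0662"]:
    print(repr(sample), int_parse_report(sample))
```

Swapping isnumeric() for isdecimal() in the int branch would be one way to close this gap, at the cost of diverging from the original server.py code.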
Complete Function
```python
def validate_input_data_generic(in_data, expected_keys, expected_types):
    """Validates dictionary against expected structure"""
    # Step 1: Type check
    if type(in_data) is not dict:
        return "Input is not a dictionary"
    # Steps 2-4: Validate each key-value pair
    for key, value_type in zip(expected_keys, expected_types):
        if key not in in_data:
            return "Key {} is missing from input".format(key)
        if in_data[key] == "":
            continue  # Allow empty strings
        if type(in_data[key]) is not value_type:
            if value_type == float:
                try:
                    float(in_data[key])
                    continue
                except (TypeError, ValueError):
                    return "Key {} is not a float or numeric string".format(key)
            elif value_type == int:
                if not str(in_data[key]).isnumeric():
                    return "Key {} is not an int or numeric string".format(key)
            else:
                return "Key {} has the incorrect value type".format(key)
    return True
```
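Because the function returns either True or an error string, callers must compare against True explicitly: any non-empty string is truthy, so a plain if result: would treat an error message as success. A minimal sketch; to_response() is a hypothetical helper, and a literal error message stands in for a real call:

```python
def to_response(result):
    """Hypothetical helper: map the validator's return value to (body, status)."""
    if result is not True:
        return result, 400  # result is the error message string
    return "OK", 200


result = "Key age is missing from input"  # what a failed validation returns

# Wrong: non-empty strings are truthy, so an error looks like success
print(bool(result))        # True
# Right: only the exact value True means "valid"
print(to_response(result))  # ('Key age is missing from input', 400)
print(to_response(True))    # ('OK', 200)
```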
Under the Hood
Memory Efficiency
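This function runs in O(n) time, where n is the number of expected keys: each key costs an average-case O(1) dictionary lookup, plus a conversion or character scan proportional to the value's length when a numeric string is checked. It makes a single pass through the keys.
Memory footprint:
- zip() returns an iterator (not a list), so pairing keys with types uses O(1) extra space regardless of input size
- No temporary dictionaries or lists are created; the only allocations are short-lived strings from str() and numbers from float()
- Early returns prevent unnecessary work (fail fast)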
Why type(x) is not int Instead of isinstance(x, int)?
```python
# This codebase uses:
type(in_data[key]) is not value_type

# Why not this?
isinstance(in_data[key], value_type)
```
The type() check is strict and rejects subclasses, while isinstance() accepts them:
```python
type(True) is int       # False (bool is a subclass of int)
isinstance(True, int)   # True (bool inherits from int)
```
For API validation, strictness is preferred. We don’t want {"age": True} to pass as a valid integer.
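The float branch reopens the hole this strict check closes: float(True) returns 1.0, so {"CPAP_pressure": True} passes when the expected type is float, and float() also accepts strings such as "nan" and "inf". A hedged sketch of a stricter float check follows; is_strict_float() is a hypothetical helper, not part of server.py:

```python
import math


def is_strict_float(value):
    """Hypothetical helper: accept numbers or numeric strings, but not bools or nan/inf."""
    if type(value) is bool:
        return False  # bool subclasses int, and float(True) == 1.0
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False  # None, lists, non-numeric strings
    return math.isfinite(number)  # rejects "nan", "inf", "-inf"


print(float(True))              # 1.0: why the original float branch accepts booleans
print(is_strict_float(True))    # False
print(is_strict_float("15.5"))  # True
print(is_strict_float("nan"))   # False
```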
Edge Cases & Pitfalls
Edge Case 1: None vs. Empty String
```python
# This passes validation:
{"patient_name": ""}

# This fails with "incorrect value type":
{"patient_name": None}
```
The code explicitly checks for "" but not None. In a production system, you might want:
```python
if in_data[key] == "" or in_data[key] is None:
    continue
```
Edge Case 2: Numeric Strings with Spaces
```python
# This fails validation:
{"room_number": " 42 "}  # Leading/trailing spaces

# Fix: Add .strip() before validation
if not str(in_data[key]).strip().isnumeric():
    return "Key {} is not an int or numeric string".format(key)
```
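Stripping first keeps the check consistent with how the value will eventually be parsed, since int() itself ignores surrounding whitespace. A minimal comparison:

```python
raw = " 42 "

print(raw.isnumeric())          # False: the spaces are not numeric characters
print(raw.strip().isnumeric())  # True
print(int(raw))                 # 42: int() strips surrounding whitespace itself
```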
Security Consideration: Injection Attacks
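🔴 Danger: Format-string injection happens when user-supplied text is used as the template itself, e.g. user_text.format(obj) with user_text = "{0.__class__}", which lets the client read attributes of the objects passed in. This function is not exposed to that: every template is a constant string literal, and key always comes from the server-defined expected_keys, never from the client. A malicious payload such as {"__class__": "exploit"} never reaches an error message at all, because keys that are not in expected_keys are simply ignored.
Optional readability improvement:
```python
return f"Key {key!r} is missing from input"  # repr() quotes the key, making stray whitespace visible
```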
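To see the difference, compare a constant template (as used throughout validate_input_data_generic()) with a template supplied by the client. The Config class and its secret_key attribute are hypothetical, chosen only to show what a malicious template can reach:

```python
class Config:
    """Hypothetical server-side object that should never be exposed."""
    secret_key = "s3cr3t"


config = Config()
user_text = "{0.secret_key}"  # attacker-controlled text

# Safe: the template is a constant; user text is only a substituted value
safe = "Key {} is missing from input".format(user_text)

# Unsafe: the user's text *is* the template, so it can read attributes
unsafe = user_text.format(config)

print(safe)    # Key {0.secret_key} is missing from input
print(unsafe)  # s3cr3t
```

Substituted values are never re-parsed as format fields, which is why the constant-template style used in this function is safe.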
Conclusion
What You Learned:
- Generic Programming Pattern: Write one function that works for multiple data structures by parameterizing the expected schema
- Defensive Programming: Always validate the input's type before accessing its keys
- Graceful Degradation: Handle edge cases (numeric strings, empty values) instead of rejecting them
- Early Returns: Fail fast to avoid unnecessary processing
Skill Transfer: This pattern applies to:
- Form validation in web frameworks (React, Vue)
- Config file parsing
- CSV/Excel data ingestion pipelines
- API middleware authentication checks