On this page
- Purpose
- The Problem
- The Solution
- Implementation Details
- Prerequisites & Tooling
- Knowledge Base
- Environment
- Key Files to Master
- High-Level Architecture
- The Packet Processing State Machine
- Airport Package Sorting Facility
- Implementation
- Part 1: Entry Point and Packet Validation
- Part 2: Protocol Multiplexing (ARP vs IP)
- Part 3: ARP Packet Handling
- Part 4: IP Packet Validation (The Gauntlet)
- Part 5: Destination Check (For Me or Forward?)
- Part 6: IP Packet Forwarding (The Critical Path)
- Part 7: The Longest Prefix Match Algorithm
- Under the Hood
- Memory Layout: The Packet Buffer
- CPU Cache Optimization: The Hot Path
- Compiler Optimizations: Function Inlining
- Memory Barriers and Atomic Operations
- Edge Cases & Pitfalls
- Concurrency: The ARP Queue Race
- Security: IP Spoofing and Reflection Attacks
- Memory: The Queued Packet Leak
- Performance: The Checksum Recalculation Bottleneck
- Conclusion
- What You Actually Learned
- Real-World Applications
- Next Steps
- The Ultimate Challenge
Purpose
The Problem
You’re sitting at the core of a router, processing thousands of packets per second. Each packet arrives as raw bytes—you don’t know if it’s ARP, ICMP, or IP until you parse it. Here’s the nightmare scenario:
1. Packet arrives: aa:bb:cc:dd:ee:ff → 11:22:33:44:55:66, type=0x0800 (IP)
2. Parse IP: Checksum valid? TTL > 0? Destination = me or forward?
3. If forwarding: Look up next hop in routing table
4. But wait: You don't know the next hop's MAC address
5. Send ARP request: "Who has 10.0.2.2?"
6. Problem: The original IP packet is now in limbo: you can't forward it until ARP completes
7. Meanwhile: 5 more packets arrive for the same destination
8. Race condition: Do you queue all 6 packets? Send 6 ARP requests? What if ARP times out?
The real challenge: How do you build a stateful packet processor that handles multiple protocols, manages asynchronous operations (ARP resolution), prevents memory leaks, and maintains wire-speed performance?
The Solution
We’re dissecting sr_handlepacket() from router/sr_router.c—the main event loop of this router. This single function orchestrates:
- Protocol multiplexing (Ethernet → ARP or IP)
- Stateful ARP resolution with packet queuing
- ICMP error generation
- IP forwarding with TTL management
- Longest-prefix matching for routing decisions
This is a production-grade systems programming challenge: coordinating memory allocation across async operations, handling implicit state machines, and optimizing hot paths for performance.
Implementation Details
This implementation demonstrates:
- Zero-Copy Packet Handling: Pointers into raw buffers instead of struct copies
- Deferred Work Queues: Packets waiting for ARP resolution
- State Machine Coordination: 6 different packet types, each with different flow
- Memory Safety: Preventing leaks when packets are dropped mid-flight
- Performance: O(1) packet parsing, O(n) routing table lookup (optimizable to O(log n))
Prerequisites & Tooling
Knowledge Base
- Required: C pointers, pointer arithmetic, state machines, packet headers
- Helpful: Operating systems (kernel packet processing), network stack internals
- Advanced: Lock-free data structures, cache-line optimization
Environment
From the project’s configuration:
# Compiler optimizations
GCC -O2 (enables common subexpression elimination, function inlining)
# Debugging
GCC -g (symbol tables for valgrind/gdb)
# Tools for analysis
valgrind --leak-check=full ./sr # Detect memory leaks
perf record -g ./sr # Profile hot paths
Key Files to Master
- router/sr_router.c - Main packet handler (this tutorial)
- router/sr_protocol.h - Protocol header definitions
- router/sr_arpcache.h - ARP cache state structures
- router/sr_rt.h - Routing table structures
High-Level Architecture
The Packet Processing State Machine
stateDiagram-v2
[*] --> ReceivePacket: Ethernet frame arrives
ReceivePacket --> ValidateEthernet: Parse Ethernet header
ValidateEthernet --> ParseEtherType: Read EtherType field
ParseEtherType --> ARPPath: 0x0806 (ARP)
ParseEtherType --> IPPath: 0x0800 (IPv4)
ParseEtherType --> Drop: Unknown type
ARPPath --> ARPRequest: opcode=1 (request)
ARPPath --> ARPReply: opcode=2 (reply)
ARPRequest --> SendARPReply: Target IP = my IP?
ARPRequest --> Drop: Not for me
SendARPReply --> [*]
ARPReply --> UpdateARPCache: Cache MAC mapping
UpdateARPCache --> SendQueuedPackets: Packets waiting?
SendQueuedPackets --> [*]
IPPath --> ValidateIP: Checksum, TTL, length
ValidateIP --> DestinationCheck: For me or forward?
DestinationCheck --> ICMPHandler: Destination = my IP
DestinationCheck --> ForwardPacket: Destination = other network
ICMPHandler --> SendEchoReply: ICMP Echo Request (ping)
ICMPHandler --> Drop: Other ICMP types
SendEchoReply --> [*]
ForwardPacket --> LongestPrefixMatch: Lookup in routing table
LongestPrefixMatch --> CheckTTL: Route found
LongestPrefixMatch --> SendICMPUnreachable: No route
CheckTTL --> DecrementTTL: TTL > 1
CheckTTL --> SendICMPTimeExceeded: TTL = 1
DecrementTTL --> ARPLookup: Get next-hop MAC
ARPLookup --> SendPacket: MAC in cache
ARPLookup --> QueuePacket: ARP needed
QueuePacket --> SendARPRequest_Async: Queue for ARP
SendARPRequest_Async --> [*]: Wait for ARP reply
SendPacket --> [*]
Drop --> [*]
Airport Package Sorting Facility
Imagine you’re the router, and packets are parcels:
- Receive: Parcel arrives on conveyor belt (Ethernet frame)
- Parse: Check destination label (IP header)
- Decision Tree:
- Label = “This Airport” → Open and handle (ICMP echo = reply to sender)
- Label = “Other Airport” → Check flight map (routing table)
- Flight found → Check if you know the airline counter (ARP cache)
- Counter known → Send parcel there (forward packet)
- Counter unknown → Ask information desk (send ARP), hold parcel (queue packet)
The complexity: You’re handling 1000 parcels/second, must remember which parcels are waiting for which airline counters, and can’t lose any parcels even if information desk never responds.
Implementation
Part 1: Entry Point and Packet Validation
Goal: Extract Ethernet header and validate minimum packet size.
void sr_handlepacket(struct sr_instance* sr,
uint8_t * packet, /* Raw packet buffer (lent) */
unsigned int len, /* Total packet length */
char* interface) /* Receiving interface name */
{
/* Sanity check: Minimum valid Ethernet frame is 14 bytes */
if (len < sizeof(sr_ethernet_hdr_t)) {
fprintf(stderr, "Error: Packet too short (len=%u)\n", len);
return; /* Drop silently */
}
/* Cast raw bytes to Ethernet header structure (zero-copy) */
sr_ethernet_hdr_t* eth_hdr = (sr_ethernet_hdr_t*)packet;
/* Get interface metadata (IP address, MAC address) */
struct sr_if* iface = sr_get_interface(sr, interface);
if (iface == NULL) {
fprintf(stderr, "Error: Unknown interface %s\n", interface);
return;
}
/* Extract EtherType (network byte order → host byte order) */
uint16_t ether_type = ntohs(eth_hdr->ether_type);
🔵 Deep Dive - The “Lent” Comment:
Notice the comment /* lent */ on the packet buffer. This is critical documentation! The buffer is owned by the caller (VNS communication layer). We must not call free() on it. If we need to keep the packet beyond this function’s scope, we must malloc() a copy.
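If a packet must outlive sr_handlepacket (for example, while it waits in the ARP queue), a deep copy is required first. A minimal sketch of that pattern (the helper name is ours, not part of the codebase):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original source): deep-copy a lent
 * buffer so it survives after sr_handlepacket returns. The caller of
 * sr_handlepacket reclaims `packet`, so anything we queue for deferred
 * ARP resolution must be our own allocation. */
uint8_t *copy_lent_packet(const uint8_t *packet, unsigned int len)
{
    uint8_t *copy = (uint8_t *)malloc(len);
    if (copy == NULL) {
        return NULL;   /* Out of memory: caller must drop the packet */
    }
    memcpy(copy, packet, len);
    return copy;       /* We own this copy; free() it when dequeued */
}
```

The ownership rule is symmetric: whoever allocates the copy (or the subsystem that inherits it, such as the ARP request queue) is responsible for the matching free().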
Part 2: Protocol Multiplexing (ARP vs IP)
Goal: Branch to appropriate handler based on EtherType.
/* Protocol multiplexing based on EtherType */
if (ether_type == ethertype_arp) {
/*───────────────────────────────────────────────────────────
* ARP PACKET PATH
*───────────────────────────────────────────────────────────*/
/* Validate ARP packet size */
if (len < sizeof(sr_ethernet_hdr_t) + sizeof(sr_arp_hdr_t)) {
fprintf(stderr, "Error: Malformed ARP packet\n");
return;
}
/* Cast to ARP header (immediately after Ethernet header) */
sr_arp_hdr_t* arp_hdr = (sr_arp_hdr_t*)(packet + sizeof(sr_ethernet_hdr_t));
/* Dispatch to ARP handler (explained in Part 3) */
handle_arp_packet(sr, eth_hdr, arp_hdr, iface);
} else if (ether_type == ethertype_ip) {
/*───────────────────────────────────────────────────────────
* IP PACKET PATH
*───────────────────────────────────────────────────────────*/
/* Validate IP packet minimum size */
if (len < sizeof(sr_ethernet_hdr_t) + sizeof(sr_ip_hdr_t)) {
fprintf(stderr, "Error: Malformed IP packet\n");
return;
}
/* Cast to IP header */
sr_ip_hdr_t* ip_hdr = (sr_ip_hdr_t*)(packet + sizeof(sr_ethernet_hdr_t));
/* Dispatch to IP handler (explained in Part 5) */
handle_ip_packet(sr, packet, len, eth_hdr, ip_hdr, iface);
} else {
/* Unknown protocol - drop silently (could log for debugging) */
fprintf(stderr, "Dropping packet with unknown EtherType=0x%04x\n", ether_type);
return;
}
}
🔴 Danger - Pointer Arithmetic:
Notice (packet + sizeof(sr_ethernet_hdr_t)). This is byte-level pointer arithmetic. We’re saying “skip the first 14 bytes (Ethernet header) and cast the next bytes as an ARP header.” If Ethernet header size is wrong, we read garbage or segfault.
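The arithmetic is only safe if the header structs contain no padding. A sketch of how that is guaranteed (the struct below is our reconstruction following sr_protocol.h's naming, not the actual header file):

```c
#include <stdint.h>

#define ETHER_ADDR_LEN 6

/* Reconstruction of the Ethernet header layout. The packed attribute
 * guarantees no padding is inserted regardless of ABI, so sizeof is
 * exactly 14 and (packet + sizeof(...)) lands on the next header. */
typedef struct sr_ethernet_hdr {
    uint8_t  ether_dhost[ETHER_ADDR_LEN]; /* Destination MAC */
    uint8_t  ether_shost[ETHER_ADDR_LEN]; /* Source MAC */
    uint16_t ether_type;                  /* EtherType (network order) */
} __attribute__((packed)) sr_ethernet_hdr_t;

/* Compile-time guard: if padding ever sneaks in, the build fails
 * instead of silently shifting every header cast. */
_Static_assert(sizeof(sr_ethernet_hdr_t) == 14,
               "Ethernet header must be exactly 14 bytes");
```

A `_Static_assert` like this is cheap insurance: the failure mode it prevents (every downstream cast off by a few bytes) is otherwise only visible as garbage parses at runtime.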
Part 3: ARP Packet Handling
Goal: Respond to ARP requests, cache ARP replies.
void handle_arp_packet(struct sr_instance* sr,
sr_ethernet_hdr_t* eth_hdr,
sr_arp_hdr_t* arp_hdr,
struct sr_if* iface)
{
/* Determine ARP operation: 1=Request, 2=Reply */
uint16_t arp_op = ntohs(arp_hdr->ar_op);
if (arp_op == arp_op_request) {
/*───────────────────────────────────────────────────────────
* ARP REQUEST: "Who has IP X.X.X.X?"
*───────────────────────────────────────────────────────────*/
/* Check if target IP matches the receiving interface's IP */
uint32_t target_ip = arp_hdr->ar_tip; /* Already network byte order */
if (target_ip != iface->ip) {
/* Not for us - ignore (don't proxy ARP) */
return;
}
/* Build ARP reply packet */
send_arp_reply(sr,
iface->name, /* Outgoing interface */
target_ip, /* Our IP (the answer) */
arp_hdr->ar_sip, /* Sender's IP (who asked) */
eth_hdr->ether_shost, /* Sender's MAC (reply dest) */
iface->addr); /* Our MAC (the answer) */
} else if (arp_op == arp_op_reply) {
/*───────────────────────────────────────────────────────────
* ARP REPLY: "IP X.X.X.X is at MAC aa:bb:cc:dd:ee:ff"
*───────────────────────────────────────────────────────────*/
/* Extract the resolved mapping */
uint32_t resolved_ip = arp_hdr->ar_sip;
unsigned char* resolved_mac = arp_hdr->ar_sha;
/* Critical: Insert into cache AND send queued packets */
struct sr_arpentry* entry = sr_arpcache_insert(&(sr->cache),
resolved_mac,
resolved_ip);
/* The insert function automatically:
* 1. Adds the MAC→IP mapping to cache
* 2. Looks for pending requests for this IP
* 3. Sends all queued packets
* 4. Removes the request from pending queue
* (See Tutorial 01 for details) */
}
}
🔵 Deep Dive - The Hidden State Transition:
When sr_arpcache_insert() is called, it triggers a cascading state change:
- Wakes up packets sleeping in the ARP request queue
- For each packet, performs longest-prefix match (routing lookup)
- Decrements TTL and recalculates checksums
- Sends packets out the appropriate interface
- Frees the queued packet structures
This is implicit state machine execution—one function call triggers a complex workflow.
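The flush step of that cascade can be sketched as follows (the struct mirrors sr_arpcache.h's sr_packet; the loop itself is our illustration, with the real sr_send_packet call stubbed out as a comment):

```c
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the queued-packet list from sr_arpcache.h. */
struct sr_packet {
    uint8_t *buf;            /* Full Ethernet frame, dest MAC unfilled */
    unsigned int len;
    struct sr_packet *next;
};

/* Illustrative flush: once ARP resolves, patch the destination MAC
 * (the first 6 bytes of the Ethernet header) into each queued frame
 * and hand it to the send routine. Returns the number of frames. */
int flush_queued_packets(struct sr_packet *head, const uint8_t mac[6])
{
    int sent = 0;
    for (struct sr_packet *p = head; p != NULL; p = p->next) {
        memcpy(p->buf, mac, 6);  /* ether_dhost sits at offset 0 */
        /* sr_send_packet(sr, p->buf, p->len, iface) would go here */
        sent++;
    }
    return sent;
}
```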
Part 4: IP Packet Validation (The Gauntlet)
Goal: Ensure IP packet is well-formed before processing.
void handle_ip_packet(struct sr_instance* sr,
uint8_t* packet,
unsigned int len,
sr_ethernet_hdr_t* eth_hdr,
sr_ip_hdr_t* ip_hdr,
struct sr_if* iface)
{
/* Validation 1: Checksum integrity */
uint16_t received_cksum = ip_hdr->ip_sum;
ip_hdr->ip_sum = 0; /* Zero out for recalculation */
uint16_t calculated_cksum = cksum(ip_hdr, sizeof(sr_ip_hdr_t));
if (received_cksum != calculated_cksum) {
fprintf(stderr, "IP checksum mismatch: expected 0x%04x, got 0x%04x\n",
calculated_cksum, received_cksum);
return; /* Drop corrupted packet */
}
ip_hdr->ip_sum = received_cksum; /* Restore original */
/* Validation 2: TTL check (ip_ttl is unsigned, so test for zero) */
if (ip_hdr->ip_ttl == 0) {
fprintf(stderr, "Dropping packet with TTL=0\n");
return;
}
/* Validation 3: Minimum length (some packets have IP options) */
uint16_t ip_total_len = ntohs(ip_hdr->ip_len);
if (ip_total_len < sizeof(sr_ip_hdr_t)) {
fprintf(stderr, "IP total length too small: %u bytes\n", ip_total_len);
return;
}
/* Validation 4: Ethernet frame contains complete IP packet */
if (len < sizeof(sr_ethernet_hdr_t) + ip_total_len) {
fprintf(stderr, "Truncated IP packet: frame_len=%u, ip_len=%u\n",
len, ip_total_len);
return;
}
🔴 Danger - Checksum Recalculation Side Effect:
Notice we modify ip_hdr->ip_sum by setting it to 0, calculate checksum, then restore it. This mutates the original packet buffer temporarily. If another thread reads this packet simultaneously (in a multi-threaded router), they’d see corrupted data during this window. Single-threaded execution saves us here.
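A mutation-free alternative exists (our sketch, using our own RFC 1071 checksum rather than the codebase's cksum helper): sum the header with the stored checksum included. By the one's-complement identity, a valid header folds to zero, so no zero-out/restore dance is needed.

```c
#include <stdint.h>
#include <stddef.h>

/* Internet checksum (RFC 1071): one's-complement sum of big-endian
 * 16-bit words, carries folded back in. */
static uint16_t inet_cksum(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t)(p[0] << 8);   /* odd trailing byte */
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Validate without touching the buffer: with the stored checksum
 * included in the sum, a valid header checksums to exactly 0. */
static int ip_header_valid(const void *hdr, size_t hdr_len)
{
    return inet_cksum(hdr, hdr_len) == 0;
}
```

This is how many production stacks verify receive-side checksums: read-only validation is both faster and thread-safe.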
Part 5: Destination Check (For Me or Forward?)
Goal: Determine if packet is addressed to this router or should be forwarded.
/*───────────────────────────────────────────────────────────────
* Check if packet is addressed to any of our interfaces
*───────────────────────────────────────────────────────────────*/
struct sr_if* if_walker = sr->if_list;
int packet_for_me = 0;
while (if_walker != NULL) {
if (ip_hdr->ip_dst == if_walker->ip) {
packet_for_me = 1;
break;
}
if_walker = if_walker->next;
}
if (packet_for_me) {
/*───────────────────────────────────────────────────────────
* PACKET DESTINED FOR THIS ROUTER
*───────────────────────────────────────────────────────────*/
/* Check protocol in IP header */
if (ip_hdr->ip_p == ip_protocol_icmp) {
/* ICMP packet - handle ping requests */
sr_icmp_hdr_t* icmp_hdr = (sr_icmp_hdr_t*)(packet +
sizeof(sr_ethernet_hdr_t) +
sizeof(sr_ip_hdr_t));
if (icmp_hdr->icmp_type == 8) { /* Echo Request (ping) */
/* Send ICMP Echo Reply (type=0, code=0) */
send_icmp_echo_reply(sr, iface,
ntohs(ip_hdr->ip_id),
(uint8_t*)icmp_hdr + sizeof(sr_icmp_hdr_t),
ip_total_len - sizeof(sr_ip_hdr_t) - sizeof(sr_icmp_hdr_t),
ip_hdr->ip_src,
ip_hdr->ip_dst,
eth_hdr->ether_shost,
iface->addr);
}
} else {
/* TCP/UDP to router itself - send ICMP Port Unreachable */
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
packet + sizeof(sr_ethernet_hdr_t),
ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_DEST_UNREACHABLE);
}
} else {
/*───────────────────────────────────────────────────────────
* PACKET TO BE FORWARDED
*───────────────────────────────────────────────────────────*/
forward_ip_packet(sr, packet, len, eth_hdr, ip_hdr, iface);
}
}
🔵 Deep Dive - Interface Iteration:
We iterate through all interfaces (if_list) to check if destination matches any of our IPs. This is O(n) where n = number of interfaces (typically 2-4). For line-rate routers with hundreds of interfaces, this would use a hash table for O(1) lookup.
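A sketch of that O(1) variant, assuming a small open-addressing hash table (nothing here exists in the codebase; the table size and hash are our choices):

```c
#include <stdint.h>

#define IF_TABLE_SIZE 64   /* power of two, assumed >> interface count */

/* Hypothetical hash set of local interface IPs (network byte order).
 * Slot value 0 means empty. */
static uint32_t if_ip_table[IF_TABLE_SIZE];

static unsigned if_hash(uint32_t ip)
{
    return (ip * 2654435761u) >> 26;   /* Knuth multiplicative, 6 bits */
}

/* Insert with linear probing (table assumed never full). */
static void if_table_add(uint32_t ip)
{
    unsigned i = if_hash(ip);
    while (if_ip_table[i] != 0) {
        i = (i + 1) % IF_TABLE_SIZE;
    }
    if_ip_table[i] = ip;
}

/* O(1) expected lookup replacing the O(n) if_list walk. */
static int ip_is_local(uint32_t ip)
{
    unsigned i = if_hash(ip);
    while (if_ip_table[i] != 0) {
        if (if_ip_table[i] == ip) return 1;
        i = (i + 1) % IF_TABLE_SIZE;
    }
    return 0;
}
```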
Part 6: IP Packet Forwarding (The Critical Path)
Goal: Route packet to next hop with TTL decrement and ARP resolution.
void forward_ip_packet(struct sr_instance* sr,
uint8_t* packet,
unsigned int len,
sr_ethernet_hdr_t* eth_hdr,
sr_ip_hdr_t* ip_hdr,
struct sr_if* iface)
{
/*───────────────────────────────────────────────────────────────
* Step 1: Longest Prefix Match (Find route)
*───────────────────────────────────────────────────────────────*/
struct sr_rt matched_rt = longest_prefix_match(sr, ip_hdr->ip_dst);
if (matched_rt.dest.s_addr == 0) {
/* No route found (all-zero sentinel - beware: this is ambiguous with
 * a 0.0.0.0/0 default route) - send ICMP Network Unreachable */
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
(uint8_t*)ip_hdr,
ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_DEST_UNREACHABLE);
return;
}
/*───────────────────────────────────────────────────────────────
* Step 2: TTL Decrement and Validation
*───────────────────────────────────────────────────────────────*/
ip_hdr->ip_ttl--;
if (ip_hdr->ip_ttl == 0) {
/* TTL expired - send ICMP Time Exceeded */
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
(uint8_t*)ip_hdr,
ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_TIME_EXCEEDED);
return;
}
/* Recalculate IP checksum (TTL changed) */
ip_hdr->ip_sum = 0;
ip_hdr->ip_sum = cksum(ip_hdr, sizeof(sr_ip_hdr_t));
/*───────────────────────────────────────────────────────────────
* Step 3: ARP Resolution for Next Hop
*───────────────────────────────────────────────────────────────*/
uint32_t next_hop_ip = matched_rt.gw.s_addr;
/* Look up next hop's MAC address in ARP cache */
struct sr_arpentry* arp_entry = sr_arpcache_lookup(&(sr->cache), next_hop_ip);
if (arp_entry != NULL) {
/*───────────────────────────────────────────────────────────
* FAST PATH: MAC address known
*───────────────────────────────────────────────────────────*/
/* Get outgoing interface */
struct sr_if* out_iface = sr_get_interface(sr, matched_rt.interface);
/* Update Ethernet header */
memcpy(eth_hdr->ether_dhost, arp_entry->mac, ETHER_ADDR_LEN);
memcpy(eth_hdr->ether_shost, out_iface->addr, ETHER_ADDR_LEN);
/* Send packet */
sr_send_packet(sr, packet, len, matched_rt.interface);
/* Free ARP cache entry (was dynamically allocated by lookup) */
free(arp_entry);
} else {
/*───────────────────────────────────────────────────────────
* SLOW PATH: ARP resolution needed
*───────────────────────────────────────────────────────────*/
/* Check if we already have a pending ARP request for this IP */
struct sr_arpreq* req = sr_arpcache_queuereq(&(sr->cache),
next_hop_ip,
packet,
len,
matched_rt.interface);
/* If this is the first packet for this IP, send ARP request */
if (req->times_sent == 0) {
send_arp_request(sr, next_hop_ip);
req->sent = time(NULL);
req->times_sent = 1;
}
/* Packet is now queued - will be sent when ARP reply arrives */
}
}
🔴 Danger - The Free Trap:
Notice free(arp_entry) after sending. The sr_arpcache_lookup() function returns a dynamically allocated copy of the cache entry, not a pointer into the cache itself. This prevents race conditions (cache could be modified while we’re using the entry), but requires us to free the returned pointer. Forgetting this creates a memory leak proportional to forwarded packet count.
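The contract can be made concrete with a mock (our illustration of the copy-out pattern, not the real sr_arpcache_lookup):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for sr_arpcache.h's entry type (illustrative). */
struct sr_arpentry {
    uint32_t ip;
    unsigned char mac[6];
};

/* Mock of the lookup contract: on a hit, return a heap-allocated COPY
 * of the entry, never a pointer into the live cache. The snapshot is
 * immune to concurrent cache eviction, but the caller must free it. */
struct sr_arpentry *mock_arpcache_lookup(const struct sr_arpentry *cache,
                                         size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++) {
        if (cache[i].ip == ip) {
            struct sr_arpentry *copy = malloc(sizeof *copy);
            if (copy) *copy = cache[i];
            return copy;
        }
    }
    return NULL;   /* miss: nothing to free */
}
```

The rule to internalize: every non-NULL return obligates exactly one free(). Valgrind's `--leak-check=full` (already in the tooling list above) will flag each forgotten one.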
Part 7: The Longest Prefix Match Algorithm
Goal: Find routing table entry with most specific subnet match.
struct sr_rt longest_prefix_match(struct sr_instance* sr, uint32_t ip_addr)
{
struct sr_rt* rt = sr->routing_table;
struct sr_rt best_match;
uint32_t longest_mask = 0;
/* Initialize to "no route found" */
memset(&best_match, 0, sizeof(struct sr_rt));
/* Linear scan through routing table */
while (rt != NULL) {
/* Check if this route matches the destination IP */
uint32_t masked_dest = ip_addr & rt->mask.s_addr;
uint32_t masked_route = rt->dest.s_addr & rt->mask.s_addr;
if (masked_dest == masked_route) {
/* Match found - check if it's more specific than current best */
uint32_t current_mask = ntohl(rt->mask.s_addr);
if (current_mask > longest_mask) {
/* More specific match (longer prefix) */
best_match = *rt;
longest_mask = current_mask;
}
}
rt = rt->next;
}
return best_match;
}
🔵 Deep Dive - The Mask Comparison:
Why ntohl(rt->mask.s_addr) > longest_mask? The mask is stored in network byte order; converting it to host byte order makes a longer prefix compare as a numerically larger integer:
- 255.255.255.0 = 0xFFFFFF00 (24-bit prefix)
- 255.255.0.0 = 0xFFFF0000 (16-bit prefix)
- 0xFFFFFF00 > 0xFFFF0000 ✓ (more specific)
Performance Optimization (Not Implemented Here):
Production routers use Patricia tries (radix trees) for O(log n) lookup instead of O(n) linear scan. For a routing table with 100,000 entries, Patricia trie lookup takes ~16 comparisons vs 50,000 average comparisons for linear scan.
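A minimal unibit trie captures the idea (our sketch; a real Patricia trie additionally collapses single-child chains and uses multibit strides for fewer memory accesses):

```c
#include <stdint.h>
#include <stdlib.h>

/* One node per prefix bit. Addresses and prefixes in host byte order. */
struct trie_node {
    struct trie_node *child[2];
    int has_route;
    uint32_t next_hop;
};

static struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Install a route for prefix/prefix_len. */
static void trie_insert(struct trie_node *root, uint32_t prefix,
                        int prefix_len, uint32_t next_hop)
{
    struct trie_node *n = root;
    for (int i = 0; i < prefix_len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (n->child[bit] == NULL) n->child[bit] = trie_new();
        n = n->child[bit];
    }
    n->has_route = 1;
    n->next_hop = next_hop;
}

/* Walk the address bits, remembering the deepest route seen: that is
 * longest-prefix match in at most 32 steps, regardless of table size. */
static int trie_lookup(const struct trie_node *root, uint32_t addr,
                       uint32_t *next_hop)
{
    const struct trie_node *n = root;
    int found = 0;
    for (int i = 0; i < 32 && n != NULL; i++) {
        if (n->has_route) { *next_hop = n->next_hop; found = 1; }
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n != NULL && n->has_route) { *next_hop = n->next_hop; found = 1; }
    return found;
}
```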
Under the Hood
Memory Layout: The Packet Buffer
Let’s visualize exactly what’s in memory when we receive a 70-byte ICMP echo request:
┌────────────────────────────────────────────────────────────────┐
│ PACKET BUFFER (70 bytes) │
├────────────────────────────────────────────────────────────────┤
│ Offset 0-13: Ethernet Header (14 bytes) │
│ ├─ ether_dhost[6] = aa:bb:cc:dd:ee:ff │
│ ├─ ether_shost[6] = 11:22:33:44:55:66 │
│ └─ ether_type(2) = 0x0800 (IP) │
├────────────────────────────────────────────────────────────────┤
│ Offset 14-33: IP Header (20 bytes) │
│ ├─ version(4 bits) + header_len(4 bits) = 0x45 │
│ ├─ ip_len(2) = 0x0038 (56 bytes total) │
│ ├─ ip_ttl(1) = 64 │
│ ├─ ip_p(1) = 1 (ICMP) │
│ ├─ ip_src(4) = 0xC0A80101 (192.168.1.1) │
│ └─ ip_dst(4) = 0xC0A80102 (192.168.1.2) │
├────────────────────────────────────────────────────────────────┤
│ Offset 34-41: ICMP Header (8 bytes) │
│ ├─ type(1) = 8 (Echo Request) │
│ ├─ code(1) = 0 │
│ ├─ checksum(2) │
│ ├─ id(2) │
│ └─ seq(2) │
├────────────────────────────────────────────────────────────────┤
│ Offset 42-69: ICMP Data (28 bytes of payload) │
└────────────────────────────────────────────────────────────────┘
Stack Frame of sr_handlepacket():
┌────────────────────────────────────────┐
│ packet (8 bytes) = 0x7ffca8200000 ────┼──→ Points to buffer above
│ len (4 bytes) = 70 │
│ interface (8) = 0x7ffca8300000 ────┼──→ Points to "eth0" string
│ eth_hdr (8) = 0x7ffca8200000 ────┼──→ Same as packet (no copy!)
│ ip_hdr (8) = 0x7ffca820000E ────┼──→ packet + 14 bytes
│ icmp_hdr (8) = 0x7ffca8200022 ────┼──→ packet + 34 bytes
└────────────────────────────────────────┘
Key Insight: All the *_hdr pointers are views into the same buffer. No copying occurs. This is zero-copy processing—critical for performance.
CPU Cache Optimization: The Hot Path
Modern CPUs have multiple cache levels:
- L1 Cache: 32KB, ~4 cycle latency
- L2 Cache: 256KB, ~12 cycle latency
- L3 Cache: 8MB, ~40 cycle latency
- RAM: Gigabytes, ~200+ cycle latency
The Problem: If our routing table doesn’t fit in cache, every lookup is a ~200-cycle RAM access.
The Optimization (Not Fully Implemented):
/* Keep recently used routes in a small cache (fits in L1) */
#define ROUTE_CACHE_SIZE 8 /* 8 entries = ~128 bytes, fits easily in L1 */
struct {
uint32_t ip;
struct sr_rt* route;
} route_cache[ROUTE_CACHE_SIZE];
/* Check cache before full routing table scan */
for (int i = 0; i < ROUTE_CACHE_SIZE; i++) {
if (route_cache[i].route != NULL &&
route_cache[i].ip == (ip_addr & route_cache[i].route->mask.s_addr)) {
return route_cache[i].route; /* L1 hit - 4 cycles */
}
}
/* Cache miss - do full lookup (~200 cycles) and update cache */
For 80% cache hit rate, average lookup time drops from 200 to ~44 cycles.
Compiler Optimizations: Function Inlining
With -O2, GCC inlines small functions like:
inline uint16_t ntohs(uint16_t x) {
return (x << 8) | (x >> 8);
}
Becomes:
movzx eax, WORD PTR [rsi] ; Load 16-bit value
rol ax, 8 ; Rotate left 8 bits (byte swap)
Performance Impact:
- Without inlining: Function call overhead = ~10 cycles
- With inlining: Single instruction = ~1 cycle
At 100,000 packets/sec with 10 byte order conversions per packet, inlining saves 9 million CPU cycles per second.
Memory Barriers and Atomic Operations
In multi-threaded routers, ARP cache updates need atomic operations:
/* Naive (WRONG - race condition) */
new_entry->next = cache->entries;
cache->entries = new_entry; /* Another thread could have pushed between these two lines! */
/* Correct (compare-and-swap atomic) */
do {
old_head = cache->entries;
new_entry->next = old_head;
} while (!__sync_bool_compare_and_swap(&cache->entries, old_head, new_entry));
This ensures no lost updates even if two threads insert simultaneously.
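The same push can be written with C11 <stdatomic.h> instead of the GCC __sync builtin; a sketch with the same semantics:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Generic list node (illustrative, not a codebase type). */
struct node {
    struct node *next;
    int value;
};

/* Lock-free push: retry the compare-exchange until no other thread
 * has changed the head between our read and our swap. On failure,
 * atomic_compare_exchange_weak reloads `old` with the current head. */
static void push(struct node *_Atomic *head, struct node *n)
{
    struct node *old = atomic_load(head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(head, &old, n));
}
```

The weak variant may fail spuriously on some architectures, which is harmless inside a retry loop and often compiles to a cheaper instruction than the strong variant.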
Edge Cases & Pitfalls
Concurrency: The ARP Queue Race
Scenario: Packet arrives while ARP reply is being processed.
Thread 1 (Packet Handler): Thread 2 (ARP Reply Handler):
────────────────────────────── ─────────────────────────────
forward_ip_packet()
├─ ARP lookup: MISS
├─ Queue packet handle_arp_reply()
└─ About to send ARP request ├─ Insert ARP entry
└─ Send queued packets
├─ Send ARP request
└─ (Pointless - already resolved!)
The Bug: We send a duplicate ARP request after resolution completes.
The Fix:
pthread_mutex_lock(&cache->lock);
struct sr_arpreq* req = sr_arpcache_queuereq(&cache, next_hop_ip, packet, len, iface);
/* Check if ARP was resolved while we were acquiring lock */
struct sr_arpentry* entry = sr_arpcache_lookup(&cache, next_hop_ip);
if (entry != NULL) {
/* Race detected - ARP resolved while we were queueing */
sr_arpreq_destroy(&cache, req); /* Remove queued packet */
/* Send packet directly (we have MAC now) */
send_packet_with_mac(sr, packet, len, entry->mac, iface);
free(entry);
} else if (req->times_sent == 0) {
send_arp_request(sr, next_hop_ip);
req->times_sent = 1;
}
pthread_mutex_unlock(&cache->lock);
Security: IP Spoofing and Reflection Attacks
Attack 1: ICMP Echo Reflection
Attacker sends: ICMP Echo Request with spoofed source IP = victim's IP
Router replies: ICMP Echo Reply → Victim receives unsolicited ping
(Amplified if attacker sends thousands of requests)
Defense (Not Implemented):
/* Rate limit ICMP replies per source IP */
if (icmp_rate_limit_exceeded(ip_hdr->ip_src)) {
log("Rate limiting ICMP from %s", inet_ntoa(ip_hdr->ip_src));
return;
}
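A token bucket is the usual shape of such a limiter. A sketch (the struct and function are ours; the timestamp is injected explicitly so the logic needs no clock and stays testable):

```c
#include <stdint.h>

/* Hypothetical per-source token bucket: allow at most `rate` ICMP
 * replies per second, with bursts up to `burst`. */
struct token_bucket {
    uint32_t tokens;    /* tokens currently available */
    uint32_t burst;     /* bucket capacity */
    uint32_t rate;      /* tokens added per second */
    uint32_t last_sec;  /* timestamp of last refill (seconds) */
};

/* Returns 1 if the packet is allowed, 0 if it should be dropped. */
static int tb_allow(struct token_bucket *tb, uint32_t now_sec)
{
    uint32_t elapsed = now_sec - tb->last_sec;
    if (elapsed > 0) {
        uint32_t add = elapsed * tb->rate;
        tb->tokens = (tb->tokens + add > tb->burst)
                         ? tb->burst
                         : tb->tokens + add;     /* refill, capped */
        tb->last_sec = now_sec;
    }
    if (tb->tokens == 0) return 0;   /* rate limit exceeded: drop */
    tb->tokens--;
    return 1;
}
```

In a real router you would keep one bucket per source (or per /24) in a hash table, and fall back to a global bucket under memory pressure.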
Attack 2: TTL=0 on Input
Attacker sends IP packet with TTL=1
Router decrements TTL → 0
Router sends ICMP Time Exceeded → Attacker learns router IP
(Used for network mapping)
Defense (note: dropping silently also breaks traceroute, which depends on Time Exceeded replies):
if (ip_hdr->ip_ttl <= 1) {
/* Drop silently instead of sending ICMP */
return;
}
Memory: The Queued Packet Leak
The Scenario:
1. Packet A queued for ARP resolution
2. 5 seconds pass, no ARP reply
3. Packet B queued for same IP
4. ARP timeout thread runs
5. Sends ICMP unreachable for packets A and B
6. Destroys ARP request
The Bug (Subtle):
void send_icmp_for_queued_packets(struct sr_arpreq* req) {
struct sr_packet* pkt = req->packets;
while (pkt != NULL) {
send_icmp_unreachable(sr, pkt->buf, pkt->len);
pkt = pkt->next;
/* BUG: We're not freeing pkt or pkt->buf! */
}
}
The Fix:
void send_icmp_for_queued_packets(struct sr_arpreq* req) {
struct sr_packet* pkt = req->packets;
while (pkt != NULL) {
struct sr_packet* next = pkt->next; /* Save before freeing */
send_icmp_unreachable(sr, pkt->buf, pkt->len);
free(pkt->buf); /* Free packet data */
free(pkt); /* Free packet structure */
pkt = next;
}
req->packets = NULL; /* Mark queue as empty */
}
Performance: The Checksum Recalculation Bottleneck
The Problem: Every forwarded packet requires full IP checksum recalculation (O(n) over header bytes).
Optimization (Incremental Checksum Update):
Since we only modify TTL (1 byte), we can update checksum incrementally:
/* Instead of full recalculation: */
ip_hdr->ip_sum = 0;
ip_hdr->ip_sum = cksum(ip_hdr, sizeof(sr_ip_hdr_t)); /* ~20 CPU cycles */
/* Incremental update (RFC 1624) - TTL occupies the HIGH byte of its
 * 16-bit header word, hence the << 8 shifts: */
uint8_t old_ttl = ip_hdr->ip_ttl;
ip_hdr->ip_ttl--;
/* Update checksum algebraically */
uint32_t sum = ~ntohs(ip_hdr->ip_sum) & 0xFFFF;
sum += (~(old_ttl << 8) & 0xFFFF) + ((old_ttl - 1) << 8);
sum = (sum & 0xFFFF) + (sum >> 16); /* Fold carry */
sum = (sum & 0xFFFF) + (sum >> 16); /* Fold again */
ip_hdr->ip_sum = htons(~sum & 0xFFFF); /* ~3 CPU cycles */
Speedup: 6.7x faster for TTL-only changes.
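That equivalence is worth sanity-checking. A self-contained harness (ours) that applies the RFC 1624 update, shifting by 8 because TTL is the high byte of its 16-bit word, and compares against a full recomputation:

```c
#include <stdint.h>
#include <stddef.h>

/* RFC 1071 checksum over big-endian 16-bit words (even length). */
static uint16_t full_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 incremental update for a TTL decrement. `cksum` is the
 * stored checksum read as a host-order value of the big-endian field.
 * Only the difference between old and new word matters, so the
 * unchanged protocol byte can be ignored. */
static uint16_t ttl_dec_cksum(uint16_t cksum, uint8_t old_ttl)
{
    uint16_t old_word = (uint16_t)(old_ttl << 8);        /* TTL is high byte */
    uint16_t new_word = (uint16_t)((old_ttl - 1) << 8);
    uint32_t sum = ~cksum & 0xFFFF;
    sum += (~old_word & 0xFFFF) + new_word;
    sum = (sum & 0xFFFF) + (sum >> 16);  /* fold carry */
    sum = (sum & 0xFFFF) + (sum >> 16);  /* fold again */
    return (uint16_t)(~sum & 0xFFFF);
}
```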
Conclusion
What You Actually Learned
This wasn’t just “forward some packets.” You mastered:
- State Machine Coordination: Implicit state machines (ARP pending → resolved → send queued packets) with lock-free synchronization points.
- Zero-Copy I/O: Pointer arithmetic over raw buffers instead of struct copies, the foundation of kernel networking stacks (Linux, BSD).
- Performance-Aware Data Structures: Understanding when O(n) is acceptable (10 interfaces) vs when you need O(1) (1M cache entries).
- Memory Lifecycle Management: Tracking ownership across async operations (who frees the packet: sender? receiver? timeout thread?).
- Hot Path Optimization: Identifying the critical path (ARP cache hit → forward) and optimizing it vs cold paths (ARP miss → queue).
Real-World Applications
This exact architecture appears in:
- Linux Kernel Networking: netif_receive_skb() in net/core/dev.c uses a similar protocol-dispatch state machine
- DPDK (Data Plane Development Kit): High-performance NICs use zero-copy packet processing
- P4 Programmable Switches: Hardware pipelines implement this logic in ASICs at terabit speeds
- Container Networking (CNI plugins): Docker/Kubernetes use similar packet forwarding with NAT
Next Steps
- Optimize It: Implement Patricia trie for O(log n) longest-prefix match
- Harden It: Add rate limiting, anti-spoofing checks, and sanity validators
- Scale It: Port to DPDK for user-space packet processing (100Gbps+)
- Measure It: Add perf counters for packets/sec, cache hit rates, and latency percentiles
- Visualize It: Live flame graphs showing where CPU time is spent per packet
The Ultimate Challenge
Rewrite this using io_uring (Linux 5.1+) for asynchronous packet I/O:
io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
io_uring_prep_recvmsg(sqe, sockfd, &msg, 0);
io_uring_submit(&ring);
/* Packet arrives asynchronously, callback invokes sr_handlepacket */
This eliminates system call overhead, enabling 10M+ packets/sec on commodity hardware.
You’ve now dissected a production router’s innermost loop. The next time you ping google.com, you’ll see it differently—not magic, but 100 lines of carefully orchestrated pointer arithmetic, state machines, and zero-copy buffers.
Welcome to systems programming at the speed of light.