On this page
- The Problem in the Middle of a Router
- What You Need to Know First
- Six Packet Types, One State Machine
- Part 1: Entry Point and Zero-Copy Parsing
- Part 2: Protocol Multiplexing
- Part 3: ARP Request/Reply Handling
- Part 4: IP Validation — Four Checks, Any One Can Drop the Packet
- Part 5: Destination Check — For This Router or Forward?
- Part 6: The Forwarding Critical Path
- Part 7: Longest Prefix Match
- The Danger Zones
The Problem in the Middle of a Router
You’re sitting at the core of a router processing thousands of packets per second. Each packet arrives as raw bytes — you don’t know if it’s ARP, ICMP, or IP until you parse it. The hard part isn’t any individual protocol; it’s the asynchronous coordination when you need information you don’t have yet.
Here’s the concrete scenario: a packet arrives destined for 10.0.2.2. You look up the routing table, find the right interface. But you don’t know the MAC address of the next hop. You need to send an ARP request. Meanwhile, the original IP packet is in limbo — you can’t forward it until ARP completes. While you’re waiting, five more packets arrive for the same destination. Do you queue all six? Send six ARP requests? What if ARP times out? What if another packet arrives for a different destination that also needs ARP?
This tutorial dissects sr_handlepacket() from router/sr_router.c — the main event loop of a software router. This single function orchestrates protocol multiplexing, stateful ARP resolution with queuing, ICMP error generation, IP forwarding with TTL management, and longest-prefix match routing.
What You Need to Know First
- C pointers, pointer arithmetic, struct layout
- What a state machine is
- How network packet headers nest inside each other
- Helpful but not required: how OS kernels do packet processing, what lock-free data structures look like
Key files:
router/sr_router.c— main packet handler (this tutorial)router/sr_protocol.h— protocol header definitionsrouter/sr_arpcache.h— ARP cache state structuresrouter/sr_rt.h— routing table structures
Six Packet Types, One State Machine
Every packet that enters the router follows one of six paths through a state machine. The machine reads the Ethernet type, dispatches to either ARP or IP handling, and from there makes forwarding decisions based on the IP destination and TTL:
stateDiagram-v2
[*] --> ReceivePacket: Ethernet frame arrives
ReceivePacket --> ValidateEthernet: Parse Ethernet header
ValidateEthernet --> ParseEtherType: Read EtherType field
ParseEtherType --> ARPPath: 0x0806 (ARP)
ParseEtherType --> IPPath: 0x0800 (IPv4)
ParseEtherType --> Drop: Unknown type
ARPPath --> ARPRequest: opcode=1 (request)
ARPPath --> ARPReply: opcode=2 (reply)
ARPRequest --> SendARPReply: Target IP = my IP?
ARPRequest --> Drop: Not for me
SendARPReply --> [*]
ARPReply --> UpdateARPCache: Cache MAC mapping
UpdateARPCache --> SendQueuedPackets: Packets waiting?
SendQueuedPackets --> [*]
IPPath --> ValidateIP: Checksum, TTL, length
ValidateIP --> DestinationCheck: For me or forward?
DestinationCheck --> ICMPHandler: Destination = my IP
DestinationCheck --> ForwardPacket: Destination = other network
ICMPHandler --> SendEchoReply: ICMP Echo Request
ICMPHandler --> Drop: Other ICMP types
SendEchoReply --> [*]
ForwardPacket --> LongestPrefixMatch: Lookup in routing table
LongestPrefixMatch --> CheckTTL: Route found
LongestPrefixMatch --> SendICMPUnreachable: No route
CheckTTL --> DecrementTTL: TTL > 1
CheckTTL --> SendICMPTimeExceeded: TTL = 1
DecrementTTL --> ARPLookup: Get next-hop MAC
ARPLookup --> SendPacket: MAC in cache
ARPLookup --> QueuePacket: ARP needed
QueuePacket --> SendARPRequest_Async: Queue for ARP
SendARPRequest_Async --> [*]: Wait for ARP reply
SendPacket --> [*]
Drop --> [*]
Part 1: Entry Point and Zero-Copy Parsing
The function signature gives you everything you need to parse a packet without copying it:
void sr_handlepacket(struct sr_instance* sr,
uint8_t * packet, /* Raw packet buffer (lent) */
unsigned int len,
char* interface)
{
if (len < sizeof(sr_ethernet_hdr_t)) {
fprintf(stderr, "Error: Packet too short (len=%u)\n", len);
return;
}
/* Cast raw bytes to Ethernet header (zero-copy) */
sr_ethernet_hdr_t* eth_hdr = (sr_ethernet_hdr_t*)packet;
struct sr_if* iface = sr_get_interface(sr, interface);
if (iface == NULL) {
fprintf(stderr, "Error: Unknown interface %s\n", interface);
return;
}
uint16_t ether_type = ntohs(eth_hdr->ether_type);
The comment /* lent */ on the packet buffer is critical documentation. The buffer is owned by the VNS communication layer. We must not call free() on it. If we need to retain the packet beyond this function’s scope, we must malloc() a copy.
All the *_hdr pointers we create during processing are views into the same buffer — no copying occurs. This is zero-copy processing. At 100,000 packets per second, the difference matters.
Part 2: Protocol Multiplexing
if (ether_type == ethertype_arp) {
if (len < sizeof(sr_ethernet_hdr_t) + sizeof(sr_arp_hdr_t)) {
fprintf(stderr, "Error: Malformed ARP packet\n");
return;
}
sr_arp_hdr_t* arp_hdr = (sr_arp_hdr_t*)(packet + sizeof(sr_ethernet_hdr_t));
handle_arp_packet(sr, eth_hdr, arp_hdr, iface);
} else if (ether_type == ethertype_ip) {
if (len < sizeof(sr_ethernet_hdr_t) + sizeof(sr_ip_hdr_t)) {
fprintf(stderr, "Error: Malformed IP packet\n");
return;
}
sr_ip_hdr_t* ip_hdr = (sr_ip_hdr_t*)(packet + sizeof(sr_ethernet_hdr_t));
handle_ip_packet(sr, packet, len, eth_hdr, ip_hdr, iface);
} else {
return;
}
}
(packet + sizeof(sr_ethernet_hdr_t)) is byte-level pointer arithmetic. We’re skipping the first 14 bytes (Ethernet header) and casting the remaining bytes as an IP or ARP header. If the Ethernet header size is wrong — say, someone modified the struct packing — we’d read garbage or segfault. This is why the size validations before each cast are not optional.
Part 3: ARP Request/Reply Handling
void handle_arp_packet(struct sr_instance* sr,
sr_ethernet_hdr_t* eth_hdr,
sr_arp_hdr_t* arp_hdr,
struct sr_if* iface)
{
uint16_t arp_op = ntohs(arp_hdr->ar_op);
if (arp_op == arp_op_request) {
uint32_t target_ip = arp_hdr->ar_tip;
if (target_ip != iface->ip) {
return; // Not for us — don't proxy ARP
}
send_arp_reply(sr, iface->name, target_ip,
arp_hdr->ar_sip, eth_hdr->ether_shost, iface->addr);
} else if (arp_op == arp_op_reply) {
uint32_t resolved_ip = arp_hdr->ar_sip;
unsigned char* resolved_mac = arp_hdr->ar_sha;
struct sr_arpentry* entry = sr_arpcache_insert(&(sr->cache),
resolved_mac,
resolved_ip);
}
}
When sr_arpcache_insert() is called, it triggers a cascading state change that I think of as “implicit state machine execution”: it wakes up every packet sleeping in the ARP request queue for this IP, performs longest-prefix match on each one, decrements their TTL, recalculates checksums, sends them out the appropriate interface, and frees the queued packet structures. One function call triggers a complex multi-step workflow — all because we finally got the MAC address we needed.
Part 4: IP Validation — Four Checks, Any One Can Drop the Packet
void handle_ip_packet(struct sr_instance* sr,
uint8_t* packet,
unsigned int len,
sr_ethernet_hdr_t* eth_hdr,
sr_ip_hdr_t* ip_hdr,
struct sr_if* iface)
{
/* Validation 1: Checksum integrity */
uint16_t received_cksum = ip_hdr->ip_sum;
ip_hdr->ip_sum = 0;
uint16_t calculated_cksum = cksum(ip_hdr, sizeof(sr_ip_hdr_t));
if (received_cksum != calculated_cksum) {
return; // Drop corrupted packet
}
ip_hdr->ip_sum = received_cksum; // Restore original
if (ip_hdr->ip_ttl <= 0) { return; }
uint16_t ip_total_len = ntohs(ip_hdr->ip_len);
if (ip_total_len < sizeof(sr_ip_hdr_t)) { return; }
if (len < sizeof(sr_ethernet_hdr_t) + ip_total_len) { return; }
The checksum validation temporarily modifies ip_hdr->ip_sum by setting it to zero, recalculates, then restores the original. This mutates the original packet buffer — which is fine in a single-threaded router, but would be a race condition in a multi-threaded one where another thread might read the same buffer during this window.
Part 5: Destination Check — For This Router or Forward?
struct sr_if* if_walker = sr->if_list;
int packet_for_me = 0;
while (if_walker != NULL) {
if (ip_hdr->ip_dst == if_walker->ip) {
packet_for_me = 1;
break;
}
if_walker = if_walker->next;
}
if (packet_for_me) {
if (ip_hdr->ip_p == ip_protocol_icmp) {
sr_icmp_hdr_t* icmp_hdr = (sr_icmp_hdr_t*)(packet +
sizeof(sr_ethernet_hdr_t) +
sizeof(sr_ip_hdr_t));
if (icmp_hdr->icmp_type == 8) { // Echo Request (ping)
send_icmp_echo_reply(sr, iface, ntohs(ip_hdr->ip_id),
(uint8_t*)icmp_hdr + sizeof(sr_icmp_hdr_t),
ip_total_len - sizeof(sr_ip_hdr_t) - sizeof(sr_icmp_hdr_t),
ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr);
}
} else {
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
packet + sizeof(sr_ethernet_hdr_t),
ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_DEST_UNREACHABLE);
}
} else {
forward_ip_packet(sr, packet, len, eth_hdr, ip_hdr, iface);
}
}
The interface iteration is O(n) in the number of interfaces — typically 2–4 for a software router. Production hardware routers use a hash table for O(1) lookup when they have hundreds of interfaces.
Part 6: The Forwarding Critical Path
void forward_ip_packet(struct sr_instance* sr,
uint8_t* packet, unsigned int len,
sr_ethernet_hdr_t* eth_hdr, sr_ip_hdr_t* ip_hdr,
struct sr_if* iface)
{
struct sr_rt matched_rt = longest_prefix_match(sr, ip_hdr->ip_dst);
if (matched_rt.dest.s_addr == 0) {
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
(uint8_t*)ip_hdr, ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_DEST_UNREACHABLE);
return;
}
ip_hdr->ip_ttl--;
if (ip_hdr->ip_ttl == 0) {
send_icmp_error_message(sr, iface->name, ntohs(ip_hdr->ip_id),
(uint8_t*)ip_hdr, ip_hdr->ip_src, ip_hdr->ip_dst,
eth_hdr->ether_shost, iface->addr,
ICMP_TYPE_TIME_EXCEEDED);
return;
}
ip_hdr->ip_sum = 0;
ip_hdr->ip_sum = cksum(ip_hdr, sizeof(sr_ip_hdr_t));
uint32_t next_hop_ip = matched_rt.gw.s_addr;
struct sr_arpentry* arp_entry = sr_arpcache_lookup(&(sr->cache), next_hop_ip);
if (arp_entry != NULL) {
// Fast path: MAC address known
struct sr_if* out_iface = sr_get_interface(sr, matched_rt.interface);
memcpy(eth_hdr->ether_dhost, arp_entry->mac, ETHER_ADDR_LEN);
memcpy(eth_hdr->ether_shost, out_iface->addr, ETHER_ADDR_LEN);
sr_send_packet(sr, packet, len, matched_rt.interface);
free(arp_entry); // Must free — lookup returns a dynamically allocated copy
} else {
// Slow path: queue packet, send ARP request
struct sr_arpreq* req = sr_arpcache_queuereq(&(sr->cache),
next_hop_ip, packet, len,
matched_rt.interface);
if (req->times_sent == 0) {
send_arp_request(sr, next_hop_ip);
req->sent = time(NULL);
req->times_sent = 1;
}
}
}
The free(arp_entry) after sending is easy to miss. sr_arpcache_lookup() returns a dynamically allocated copy of the cache entry — not a pointer into the cache itself. This prevents races (the cache might be modified while we’re using the entry), but it means we own the allocation and must free it. Forgetting this creates a memory leak proportional to forwarded packet count.
Part 7: Longest Prefix Match
struct sr_rt longest_prefix_match(struct sr_instance* sr, uint32_t ip_addr)
{
struct sr_rt* rt = sr->routing_table;
struct sr_rt best_match;
uint32_t longest_mask = 0;
memset(&best_match, 0, sizeof(struct sr_rt));
while (rt != NULL) {
uint32_t masked_dest = ip_addr & rt->mask.s_addr;
uint32_t masked_route = rt->dest.s_addr & rt->mask.s_addr;
if (masked_dest == masked_route) {
uint32_t current_mask = ntohl(rt->mask.s_addr);
if (current_mask > longest_mask) {
best_match = *rt;
longest_mask = current_mask;
}
}
rt = rt->next;
}
return best_match;
}
The mask comparison works because masks are stored in network byte order (big-endian), and ntohl converts them to host byte order for comparison. 255.255.255.0 (0xFFFFFF00) is larger than 255.255.0.0 (0xFFFF0000), so /24 routes correctly beat /16 routes when both match. This is O(n) linear scan — fine for a routing table with a handful of entries, but production routers use Patricia tries for O(log n) lookup against tables with hundreds of thousands of entries.
The Danger Zones
The ARP queue race. Thread 1 is in forward_ip_packet, just called sr_arpcache_queuereq, about to send an ARP request. Meanwhile, Thread 2 receives an ARP reply for that exact IP, inserts it into the cache, and sends the queued packets. Thread 1 now sends a redundant ARP request for an IP that’s already been resolved. This is benign in terms of correctness, but it’s a real race condition that in more complex scenarios could cause duplicate packets. The fix is holding a lock around the entire queue-and-send sequence.
ICMP echo reflection. An attacker can send ICMP echo requests with a spoofed source IP (the victim’s IP), and the router will send echo replies to the victim — amplifying traffic. This router doesn’t implement rate limiting on ICMP replies, which is intentional for a teaching implementation but would be unacceptable in production.
The queued packet memory leak. When an ARP request times out, the timeout handler sends ICMP Unreachable for each queued packet, then destroys the request. If the handler doesn’t free pkt->buf and pkt for each packet in the queue before destroying the request struct, those allocations leak. The fix: save pkt->next before freeing pkt, free pkt->buf, free pkt, then advance. Always save the next pointer before freeing the current node.
Incremental checksum update for TTL. Right now we recalculate the full IP checksum after every TTL decrement — about 20 CPU cycles. RFC 1624 describes an incremental update formula that recalculates only the changed bytes in 3 cycles. At 100,000 forwarded packets per second, that’s a meaningful difference on embedded hardware.
This is what it feels like to sit inside a router’s core loop. The next time you ping google.com, you’ll see it differently: 100 lines of carefully orchestrated pointer arithmetic, state machines, and memory ownership rules — moving at wire speed.