The FPGA Is Gone — But the Processor Still Needs to Think It Has a Keyboard and a Screen
You have designed a processor. It runs on an FPGA, reads a PS/2 keyboard, and drives a VGA display. Now you need to test it — but you do not have the FPGA board. You cannot plug a PS/2 keyboard into your laptop. Your monitor does not have a VGA input.
The conventional answer is to rewrite the design for simulation. Strip out the keyboard module, replace the display with waveform probes, and verify behavior by reading signal traces. This works for functional verification, but it does not let you play the game. You cannot see letters appear on a grid. You cannot type a guess and watch it turn green.
This tutorial dissects the abstraction layer that lets the same MIPS processor — with identical pipeline logic, register file, and memory — run interactively on a desktop computer. The simulation top module (sim_top.v) replaces exactly two boundaries: where keystrokes enter and where pixels leave. Everything between those boundaries is untouched RTL.
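🔵 Deep Dive: This technique is a form of RTL co-simulation with I/O virtualization. It is easy to confuse with hardware-in-the-loop testing, which is the reverse arrangement: there the real hardware stays in the loop and its environment is simulated. I/O virtualization is used in automotive, aerospace, and game console development wherever physical hardware is expensive or unavailable during the development cycle. The principle is the same at every scale: decouple I/O from logic, then swap the I/O layer.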
What You Need to Understand to Follow Along
Concepts:
- How Verilator compiles Verilog into a C++ class with input/output member variables
- The difference between a module’s port interface and its internal logic
- Clock domain relationships (system clock vs. pixel clock)
- The SDL3 event loop and texture rendering model
Tools:
- Verilator >= 5.0
- CMake >= 3.16
- SDL3 (fetched automatically by CPM)
- C++17 compiler
Files under discussion:
- sim/sim_top.v: simulation top-level (282 lines)
- sim/main.cpp: C++ SDL3 harness (182 lines)
- modules/processor/Wrapper.v: FPGA top-level (131 lines)
- modules/memory/regfile.v: register file with dual-write R15 (122 lines)
The Boundary Diagram: What Changes and What Does Not
The processor core (pipeline, ALU, register file, ROM, RAM) is a black box driven by a clock and a reset; its instruction and data memory interfaces connect to ROM and RAM that sit inside the same shared RTL. It does not know or care whether its inputs come from physical hardware or a C++ program.
The abstraction boundary is drawn at exactly two points: the keyboard input path and the display output path. Everything inside the boundary is shared code. Everything outside is swapped per target.
flowchart TB
subgraph SHARED["Shared RTL (identical in both targets)"]
CPU["5-Stage Processor"]
REG["Register File\n(R0–R31)"]
ROM["Instruction ROM\n(main.mem)"]
RAM["Data RAM\n(words.mem)"]
LFSR["LFSR + Word ROM"]
VGA["VGA Timing\n(simple_480p)"]
SPR["Sprite Engine\n(board + letter)"]
COLOR["Color Logic\n(green/yellow/gray)"]
end
subgraph FPGA["FPGA I/O Layer (Wrapper.v)"]
PS2["PS/2 Interface\n(VHDL)"]
KB["Keyboard_input.v"]
ILA["ILA Debug Probe"]
VGAOUT["VGA Pins\n(4-bit R/G/B)"]
end
subgraph SIM["Simulation I/O Layer (sim_top.v + main.cpp)"]
SDL["SDL3 Event Loop"]
TEX["Texture Renderer"]
VCD["VCD Trace Output"]
end
PS2 -->|"scan code"| KB -->|"ASCII → letter"| REG
SDL -->|"scancode → letter"| REG
REG -->|"word1–word6"| SPR
CPU <--> REG
CPU <--> ROM
CPU <--> RAM
LFSR -->|"actual word"| REG
VGA --> SPR --> COLOR
COLOR -->|"4-bit paint"| VGAOUT
COLOR -->|"8-bit SDL"| TEX
Dissecting the Swap: FPGA Keyboard vs. Simulated Keyboard
The FPGA path (Wrapper.v, lines 95–100)
On the FPGA, a physical PS/2 keyboard generates a serial clock-data signal pair. The Ps2Interface VHDL module decodes this protocol into eight-bit scan codes. A Verilog module (Keyboard_input) buffers the scan code, looks up its ASCII value in a 256-entry ROM, and asserts a letter_ready pulse. The wrapper then encodes the ASCII value into the format the processor expects:
// Wrapper.v — FPGA keyboard path
wire [6:0] ascii0;
wire letter_ready;
Keyboard_input kbin(.clk(clock), .ps2_clk(ps2_clk), .ps2_data(ps2_data),
.ascii_val0(ascii0), .letter_ready(letter_ready));
assign letter = ascii0 - 65 + {1'b1, 31'b0}; // ASCII 'A'=65 → code 0, bit 31 set
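This path depends on a VHDL module, which Verilator cannot compile because it accepts only Verilog and SystemVerilog, and on bit-level PS/2 protocol timing, which a simulation would have to reproduce with a dedicated PS/2 stimulus model. The scan-code ROM lookup is ordinary Verilog, but it is reachable only through that PS/2 front end, so the simulation bypasses the whole chain.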
The simulation path (main.cpp, lines 155–166)
In simulation, SDL3 captures keyboard events from the operating system. The C++ harness converts the scancode directly to the same 32-bit format the processor expects, then drives the Verilator model’s input ports:
// main.cpp — Simulation keyboard path
if (ev.key.scancode >= SDL_SCANCODE_A && ev.key.scancode <= SDL_SCANCODE_Z) {
    int code = ev.key.scancode - SDL_SCANCODE_A; // A=0, B=1, ..., Z=25
    top->letter = static_cast<uint32_t>(code) | 0x80000000u; // bit 31 = enable
    top->letter_en = 1;
    letter_en_countdown = LETTER_EN_CYCLES; // hold for 8 clock toggles
}
The critical insight is that the register file does not care which path produced the value. Both paths deliver a 32-bit value on the letter port and a pulse on letter_en. The processor reads R15 and sees identical data regardless of the source.
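The claim that both paths deliver the same word can be checked mechanically. The following is a minimal sketch, not code from the project: the helper names `encode_from_ascii`, `encode_from_index`, and `paths_agree` are hypothetical, and the only facts taken from the snippets above are the two expressions `ascii0 - 65 + {1'b1, 31'b0}` and `code | 0x80000000u`.

```cpp
#include <cassert>
#include <cstdint>

// Bit 31 marks the word as carrying a keystroke (the "enable" bit above).
constexpr std::uint32_t kLetterValidBit = 0x80000000u;

// FPGA-style encoding, mirroring `ascii0 - 65 + {1'b1, 31'b0}` in Wrapper.v.
// Hypothetical helper; expects an uppercase ASCII letter.
std::uint32_t encode_from_ascii(char ascii) {
    assert(ascii >= 'A' && ascii <= 'Z');
    return static_cast<std::uint32_t>(ascii - 65) + kLetterValidBit;
}

// Simulation-style encoding, mirroring `code | 0x80000000u` in main.cpp,
// where index is 0 for A through 25 for Z.
std::uint32_t encode_from_index(int index) {
    assert(index >= 0 && index <= 25);
    return static_cast<std::uint32_t>(index) | kLetterValidBit;
}

// True when both I/O layers would hand the register file the same word.
bool paths_agree(char ascii) {
    return encode_from_ascii(ascii) == encode_from_index(ascii - 'A');
}
```

Addition and bitwise OR give the same result here because the letter code occupies only the low five bits and never carries into bit 31.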
The dual-write register that makes this possible (regfile.v, lines 93–102)
R15 is unique in this register file. It accepts writes from two sources — the external keyboard and the processor itself — with the keyboard taking priority:
wire en15_proc;
and enable15_proc(en15_proc, write_select[15], ctrl_writeEnable);
wire en15 = en15_proc | letter_en; // either source can write
wire [31:0] d15 = letter_en ? letter : data_writeReg; // keyboard has priority
register32 register15(.d(d15), .clk(clock), .q(read15), .clr(ctrl_reset), .en(en15));
This dual-write design exists because the assembly code needs to clear R15 after reading a letter (addi $letter, $r0, 0). If R15 were writable only by the keyboard, the processor's clear instruction would silently fail, and R15 would retain the old letter value forever, causing the game loop to process the same letter repeatedly.
🔴 Danger: If both sources write on the same clock edge, the priority mux ensures the keyboard value wins. In practice, this collision is extremely unlikely: the harness holds letter_en high for only eight clock toggles (four full clock cycles), and the processor clears R15 many cycles later. The priority mux still guarantees a well-defined result in the degenerate case.
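The write-enable OR and the priority mux can be restated as a small behavioral model. This is a sketch for reasoning about the cases, not part of the project: `R15Inputs` and `r15_next` are hypothetical names, and the model ignores ctrl_reset.

```cpp
#include <cassert>
#include <cstdint>

// Inputs sampled at one rising clock edge (hypothetical grouping).
struct R15Inputs {
    bool letter_en;               // keyboard-side write strobe
    std::uint32_t letter;         // keyboard-side data
    bool proc_write_r15;          // write_select[15] AND ctrl_writeEnable
    std::uint32_t data_writeReg;  // processor-side data
};

// Value held by R15 after the edge, following the regfile.v excerpt:
//   en15 = en15_proc | letter_en
//   d15  = letter_en ? letter : data_writeReg
std::uint32_t r15_next(std::uint32_t current, const R15Inputs& in) {
    const bool en15 = in.proc_write_r15 || in.letter_en;
    const std::uint32_t d15 = in.letter_en ? in.letter : in.data_writeReg;
    return en15 ? d15 : current;  // register holds its value when not enabled
}
```

The four input combinations map to four outcomes: hold, processor write, keyboard write, and keyboard write winning a collision.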
Dissecting the Swap: VGA Pins vs. SDL Texture
The FPGA output (artix_wordle.v)
On the FPGA, the display controller outputs four-bit red, green, and blue channels plus hsync and vsync signals to physical VGA pins. The monitor reconstructs the image from these analog signals.
The simulation output (sim_top.v, lines 269–279)
In simulation, there is no monitor. Instead, the module outputs ten-bit pixel coordinates (sdl_sx, sdl_sy), a data-enable flag (sdl_de), and eight-bit color channels — all registered on the pixel clock edge:
always @(posedge clk_pix) begin
    sdl_sx <= sx;
    sdl_sy <= sy;
    sdl_de <= de;
    sdl_r <= {2{paint_r}}; // 4-bit to 8-bit: 0xA becomes 0xAA
    sdl_g <= {2{paint_g}};
    sdl_b <= {2{paint_b}};
end
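The replication `{2{paint_r}}` copies the four-bit channel into both nibbles of the eight-bit output. A C++ equivalent, written here only to illustrate the arithmetic (the function name `expand4to8` is hypothetical), looks like this:

```cpp
#include <cassert>
#include <cstdint>

// Copy a 4-bit channel value into both nibbles, like Verilog {2{paint_r}}.
// Bits above the low nibble of the input are ignored.
constexpr std::uint8_t expand4to8(std::uint8_t v4) {
    return static_cast<std::uint8_t>(((v4 & 0x0Fu) << 4) | (v4 & 0x0Fu));
}

static_assert(expand4to8(0xA) == 0xAA, "matches the comment in sim_top.v");
```

Replication is preferable to a plain left shift by four because full-scale 0xF maps to 0xFF rather than 0xF0, so white on the FPGA stays fully white in the SDL window.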
The C++ harness reads these outputs on every clock evaluation and writes them into a pixel buffer:
if (top->sdl_de) {
    Pixel *p = &framebuffer[top->sdl_sy * H_RES + top->sdl_sx];
    p->a = 0xFF;
    p->r = top->sdl_r;
    p->g = top->sdl_g;
    p->b = top->sdl_b;
}
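At the end of each frame (detected when sdl_sy moves past the last active row), the buffer is uploaded to an SDL texture and presented to the window. The image matches what the VGA monitor would show, pixel for pixel and color for color. The frame timing is also identical in simulated time; the wall-clock frame rate depends on how fast the host can evaluate the model, so the window refreshes at sixty frames per second only if the simulation keeps up.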
The Clock Domain Problem That Does Not Exist
In a real FPGA, the 25 MHz pixel clock is generated by a phase-locked loop (PLL) from the 100 MHz system clock. PLLs are vendor-specific IP — Xilinx, Intel, and Lattice all have different primitives. This is a portability problem.
The simulation avoids it entirely with a two-bit counter:
reg [1:0] pixCounter = 0;
always @(posedge clk) pixCounter <= pixCounter + 1;
wire clk_pix = pixCounter[1]; // toggles at 1/4 of clk frequency
This divider is deterministic: clk_pix has a fixed phase relationship to clk, with no jitter and no startup delay. On an FPGA a PLL is preferred, because a counter-derived clock routed through general logic fabric typically has worse skew and timing characteristics than a dedicated clock resource. In simulation those electrical concerns do not exist, so the divider is functionally equivalent and portable across any Verilog simulator.
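To see the divide-by-four behavior without a simulator, the counter can be modeled in a few lines of C++. This is an illustrative sketch with hypothetical names (`PixelDivider`, `count_pix_rising_edges`); it assumes the counter starts at zero, as the Verilog initializer does.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the two-bit counter from sim_top.v.
struct PixelDivider {
    std::uint8_t pix_counter = 0;  // wraps 3 -> 0, like reg [1:0]

    // Level of clk_pix, which is bit 1 of the counter.
    bool clk_pix() const { return ((pix_counter >> 1) & 1u) != 0; }

    // Apply one rising edge of clk and return the new clk_pix level.
    bool tick() {
        pix_counter = static_cast<std::uint8_t>((pix_counter + 1u) & 0x3u);
        return clk_pix();
    }
};

// Count rising edges of clk_pix over a number of clk rising edges.
int count_pix_rising_edges(int clk_cycles) {
    assert(clk_cycles >= 0);
    PixelDivider div;
    int edges = 0;
    bool prev = div.clk_pix();
    for (int i = 0; i < clk_cycles; ++i) {
        const bool now = div.tick();
        if (now && !prev) ++edges;
        prev = now;
    }
    return edges;
}
```

Sixteen clk cycles produce four clk_pix rising edges, which is the one-in-four ratio the pixel pipeline relies on.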
The C++ harness simply toggles the main clock on every iteration. It does not need to know about the pixel clock at all — the Verilog model handles the division internally, and the SDL outputs update at the correct rate automatically:
while (running) {
    top->clk ^= 1; // toggle main clock
    top->eval(); // Verilator evaluates all combinational + sequential logic
    // ... sample outputs, render frame when ready ...
}
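It helps to know how many loop iterations one frame costs. The arithmetic below is a back-of-the-envelope sketch under stated assumptions: the total raster is taken as 800×525 (coordinates run up to 799 and 524, as noted in Scenario 2 below), each pixel spans four clk cycles, and each clk cycle takes two loop iterations. The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kTotalCols = 800;    // sx runs 0..799, blanking included
constexpr std::uint64_t kTotalRows = 525;    // sy runs 0..524, blanking included
constexpr std::uint64_t kClkPerPixel = 4;    // two-bit divider
constexpr std::uint64_t kTogglesPerClk = 2;  // one loop iteration per clock toggle

// Loop iterations (eval() calls) needed to simulate one full frame.
constexpr std::uint64_t evals_per_frame() {
    return kTotalCols * kTotalRows * kClkPerPixel * kTogglesPerClk;
}

// Iterations per wall-clock second needed to sustain a given frame rate.
constexpr std::uint64_t evals_per_second(std::uint64_t fps) {
    return evals_per_frame() * fps;
}

static_assert(evals_per_frame() == 3360000, "800 * 525 * 4 * 2");
```

At sixty frames per second this comes to about 201.6 million iterations per second, which is why the on-screen frame rate is bounded by host speed rather than by the VGA timing itself.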
What Breaks When You Get the Abstraction Wrong
Scenario 1: The letter pulse is too short. If letter_en is held high for only one clock toggle (half a cycle) instead of eight, the register file might miss the write entirely: en15 must be high during a rising clock edge, and a single toggle may turn out to be a falling edge. Holding the pulse for eight toggles (four full clock cycles) guarantees four rising edges, providing ample capture margin.
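The edge-counting argument can be made concrete. The sketch below is a simplified model, not the harness: it assumes the strobe is asserted before the first counted toggle and stays high for exactly `toggles` loop iterations, and the function name is hypothetical.

```cpp
#include <cassert>

// Count the rising clock edges that occur while letter_en is held high.
// clk_level_before is the clock level just before the pulse starts.
int rising_edges_during_pulse(bool clk_level_before, int toggles) {
    assert(toggles >= 0);
    int edges = 0;
    bool clk = clk_level_before;
    for (int i = 0; i < toggles; ++i) {
        clk = !clk;        // one loop iteration: toggle the main clock
        if (clk) ++edges;  // a toggle that ends high is a 0 -> 1 transition
    }
    return edges;
}
```

A one-toggle pulse yields either one rising edge or none depending on the starting phase, while an eight-toggle pulse yields four rising edges from either phase.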
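Scenario 2: The pixel buffer overruns. The sdl_de signal is high only during the 640×480 active region, while the coordinate counters keep running through the blanking interval, up to (799, 524). If the harness wrote pixels without the sdl_de gate and without its bounds check (x < H_RES && y < V_RES), those blanking-interval coordinates would index past the end of the framebuffer array, corrupting the stack or heap. When the RTL is correct the two checks overlap, but the bounds check is what keeps a bug in the timing logic from turning into memory corruption in the harness. Treat it as a hard correctness requirement, not as optional defensive programming.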
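A guarded pixel write might look like the following. This is a sketch under assumptions, not the project's harness: the `Pixel` field order, the `put_pixel` helper, and the use of `std::vector` for the framebuffer are all hypothetical, and only the H_RES and V_RES names and the 640×480 size come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned H_RES = 640;
constexpr unsigned V_RES = 480;

// Field order is an assumption; the real layout must match the SDL texture format.
struct Pixel { std::uint8_t b, g, r, a; };

// Store one pixel if it is inside the active region; returns true when stored.
bool put_pixel(std::vector<Pixel>& fb, bool de, unsigned x, unsigned y,
               std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    assert(fb.size() == static_cast<std::size_t>(H_RES) * V_RES);
    // Gate on data-enable AND on the coordinates, so a timing bug in the
    // RTL cannot produce an out-of-range index.
    if (!de || x >= H_RES || y >= V_RES) return false;
    Pixel& p = fb[static_cast<std::size_t>(y) * H_RES + x];
    p.a = 0xFF;
    p.r = r;
    p.g = g;
    p.b = b;
    return true;
}
```

With this guard, a coordinate such as (799, 524) is rejected even if data-enable were wrongly asserted during blanking.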
Scenario 3: The frame boundary detection fires repeatedly. Frame rendering triggers when sdl_sy == 480 && sdl_sx == 0. Because the pixel clock divides the main clock by four, and the C++ loop toggles the main clock once per iteration, the same (sx, sy) pair is visible for approximately eight consecutive iterations. If the harness renders on every iteration where the condition holds, the render call fires about eight times per frame. The picture is unaffected as long as each call uploads and draws the same framebuffer, but SDL_RenderPresent is not a no-op: every call performs a real present, and with vsync enabled each one can block until the next display refresh. Rendering only on the first iteration of the boundary avoids that cost.
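One way to act on the boundary exactly once per frame is to detect the transition into the boundary condition rather than the condition itself. The sketch below assumes the trigger coordinates quoted above (sy == 480, sx == 0); the `FrameEdgeDetector` type is hypothetical and not taken from the project.

```cpp
#include <cassert>

// Fires once when the sampled coordinates first reach the frame boundary,
// then stays quiet until the coordinates have left it again.
struct FrameEdgeDetector {
    bool was_at_boundary = false;

    // Call once per loop iteration with the current sdl_sx / sdl_sy values.
    bool update(unsigned sx, unsigned sy) {
        const bool at_boundary = (sy == 480 && sx == 0);
        const bool fire = at_boundary && !was_at_boundary;
        was_at_boundary = at_boundary;
        return fire;
    }
};

// Helper for experiments: how many times the detector fires when the
// boundary coordinates stay visible for a number of iterations.
int fires_while_held(FrameEdgeDetector& det, int iterations) {
    assert(iterations >= 0);
    int fires = 0;
    for (int i = 0; i < iterations; ++i) {
        if (det.update(0, 480)) ++fires;
    }
    return fires;
}
```

In a render loop, the texture upload and present would run only when `update` returns true, so eight iterations at the boundary produce a single present.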
You Now Know How to Decouple I/O From Logic in Hardware Design
The transferable skill is not Verilator syntax or SDL3 API calls. It is the principle of I/O boundary isolation: if you design your system so that all platform-specific behavior is concentrated at the input and output edges, the entire core becomes portable by construction.
In this project, the processor does not know it is running on an FPGA. It does not know it is running in a simulator. It fetches instructions from ROM, reads registers, computes ALU results, and writes back — identically in both environments. The only difference is who fills R15 with a letter and who reads the color of pixel (302, 147).
This principle applies beyond hardware. In software architecture, the same pattern appears as the Ports and Adapters (Hexagonal) architecture: the domain logic is the core, and I/O adapters — database drivers, HTTP handlers, message queues — are swappable shells. The FPGA Wordle project is a hardware implementation of that exact same idea.