Overview
Virtual memory is the foundation of process isolation, demand paging, and copy-on-write. This article implements Sv32 — the 32-bit virtual memory scheme defined by the RISC-V Privileged Specification — from scratch on a bare-metal RISC-V core. By the end you'll have a working kernel that maps user processes into their own address spaces, handles page faults, and correctly flushes the TLB on context switches.
Target hardware is an RV32IMA core with supervisor mode and an Sv32 MMU (e.g., the VexRiscv softcore on an FPGA, or QEMU with -machine virt). The code is written in C with a small amount of inline assembly for CSR access, TLB fences, and the page-fault trap handler.
- Understand the Sv32 two-level page table format and PTE bit fields
- Write a kernel that maps physical memory into a virtual address space
- Enable the MMU by writing to the satp CSR and taking the first page fault
- Implement a page fault handler that allocates and maps demand pages
- Correctly flush the TLB on address space switches with SFENCE.VMA
Sv32 Address Space
Sv32 maps a 32-bit virtual address space onto physical addresses of up to 34 bits (16 GiB) using a two-level table structure. Every address translation splits the 32-bit VA into three fields:
- Bits 31–22: VPN[1] (10 bits) → PD index
- Bits 21–12: VPN[0] (10 bits) → PT index
- Bits 11–0: Page offset (12 bits)
- Page size: 4KB (1 << 12)
Page directory (level 1, root):
- 1024 entries × 4 bytes = 4KB per table
- Root PD physical page number stored in satp[21:0]
- Each entry is a PTE (page table entry)
- Non-leaf PTE: points to next-level table
Page table (level 0):
- 1024 entries × 4 bytes = 4KB per PT
- One PT per valid non-leaf PD entry
- Leaf PTEs: map 4KB physical pages
- Megapages: leaf at PD level → 4MB pages
PTE layout (32 bits):
- PPN[1:0] (22 bits, PTE bits 31–10): physical page number
- RSW (2 bits, PTE bits 9–8): reserved for OS use
- D,A,G,U,X,W,R,V (8 bits, PTE bits 7–0): flags
- V=valid, R=read, W=write, X=execute, U=user-accessible, G=global, A=accessed, D=dirty
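The field split described above can be sketched in a few lines of host-side C. The names sv32_va_fields, sv32_split_va, and sv32_join_va are illustrative only and do not appear in the article's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative type: the three fields of an Sv32 virtual address. */
typedef struct {
    uint32_t vpn1;    /* bits 31..22: index into the root page directory */
    uint32_t vpn0;    /* bits 21..12: index into the leaf page table */
    uint32_t offset;  /* bits 11..0: byte offset inside the 4KB page */
} sv32_va_fields;

/* Split a 32-bit virtual address into its Sv32 fields. */
static sv32_va_fields sv32_split_va(uint32_t va) {
    sv32_va_fields f;
    f.vpn1 = (va >> 22) & 0x3FFu;
    f.vpn0 = (va >> 12) & 0x3FFu;
    f.offset = va & 0xFFFu;
    return f;
}

/* Inverse of sv32_split_va: rebuild the virtual address from its fields. */
static uint32_t sv32_join_va(sv32_va_fields f) {
    assert(f.vpn1 < 1024 && f.vpn0 < 1024 && f.offset < 4096);
    return (f.vpn1 << 22) | (f.vpn0 << 12) | f.offset;
}
```

For example, 0x80402ABC splits into VPN[1] = 0x201, VPN[0] = 0x002, and offset 0xABC: the address selects entry 513 of the page directory, entry 2 of that page table, and byte 0xABC of the page.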
Page Table Structure
    #include <stdint.h>

    #define PAGE_SIZE   4096UL
    #define PAGE_SHIFT  12
    #define PTE_COUNT   1024          // entries per page table

    // PTE flag bits
    #define PTE_V (1U << 0)   // Valid
    #define PTE_R (1U << 1)   // Readable
    #define PTE_W (1U << 2)   // Writable
    #define PTE_X (1U << 3)   // Executable
    #define PTE_U (1U << 4)   // User-accessible
    #define PTE_G (1U << 5)   // Global mapping
    #define PTE_A (1U << 6)   // Accessed
    #define PTE_D (1U << 7)   // Dirty

    #define PTE_PPN_SHIFT 10
    // These helpers assume physical addresses that fit in 32 bits; the top
    // two bits of a 34-bit Sv32 physical address would be lost.
    #define PA_TO_PPN(pa)  ((pa) >> PAGE_SHIFT)
    #define PPN_TO_PA(ppn) ((ppn) << PAGE_SHIFT)

    // Construct a PTE from a physical address + flags (V is always set).
    // Pass flags = 0 for a non-leaf entry.
    #define MAKE_PTE(pa, flags) \
        ((PA_TO_PPN(pa) << PTE_PPN_SHIFT) | (flags) | PTE_V)

    typedef uint32_t pte_t;
    typedef pte_t pagetable_t[PTE_COUNT];   // exactly one 4KB page

    // satp register value for Sv32 mode
    #define SATP_SV32 (1U << 31)
    #define MAKE_SATP(root_pa) (SATP_SV32 | PA_TO_PPN(root_pa))
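Because a PTE holds a 22-bit PPN, it can name physical pages beyond the 4 GiB that a 32-bit pointer reaches. The following host-side sketch encodes and decodes PTEs using a 64-bit physical address to make that visible. All sv32_pte_* names are hypothetical helpers, not part of the kernel above.

```c
#include <assert.h>
#include <stdint.h>

#define SV32_PAGE_SHIFT 12
#define SV32_PPN_SHIFT  10
#define SV32_FLAG_MASK  0x3FFu   /* PTE bits 9..0: RSW plus D,A,G,U,X,W,R,V */

/* Build a PTE from a page-aligned physical address of up to 34 bits. */
static uint32_t sv32_pte_make(uint64_t pa, uint32_t flags) {
    assert((pa & 0xFFFu) == 0);       /* must be page aligned */
    assert(pa < (1ULL << 34));        /* Sv32 physical address limit */
    uint32_t ppn = (uint32_t)(pa >> SV32_PAGE_SHIFT);   /* 22 bits */
    return (ppn << SV32_PPN_SHIFT) | (flags & SV32_FLAG_MASK);
}

/* Physical base address named by a PTE (needs 64 bits to hold 34). */
static uint64_t sv32_pte_pa(uint32_t pte) {
    return (uint64_t)(pte >> SV32_PPN_SHIFT) << SV32_PAGE_SHIFT;
}

/* The low ten bits: RSW and the eight permission/status flags. */
static uint32_t sv32_pte_flags(uint32_t pte) {
    return pte & SV32_FLAG_MASK;
}

/* A valid PTE is a leaf when any of R, W, X is set; otherwise it
 * points to the next-level table. */
static int sv32_pte_is_leaf(uint32_t pte) {
    return (pte & 0x1u) != 0 && (pte & 0xEu) != 0;
}
```

With these helpers, sv32_pte_make(0x80001000, 0x0F) yields 0x2000040F, and a page at the 12 GiB mark (0x300000000) round-trips through a 32-bit PTE without loss.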
Page Table Walk
The hardware performs the page table walk automatically on a TLB miss. Understanding it helps you debug faults and implement the software-managed parts (allocation, mapping, unmapping) correctly.
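    #include <string.h>   // memset (a freestanding kernel must supply its own)

    // Map a single 4KB virtual page to a physical page
    void vm_map(pagetable_t root, uint32_t va, uint32_t pa, uint32_t flags) {
        uint32_t vpn1 = (va >> 22) & 0x3FF;   // bits 31–22
        uint32_t vpn0 = (va >> 12) & 0x3FF;   // bits 21–12

        // Level-1: get or allocate a page table
        pte_t *pde = &root[vpn1];
        if (!(*pde & PTE_V)) {
            uint32_t pt_pa = (uint32_t)pmem_alloc();
            if (!pt_pa) panic("vm_map: out of memory");
            memset((void *)pt_pa, 0, PAGE_SIZE);
            *pde = MAKE_PTE(pt_pa, 0);        // non-leaf: no R/W/X
        } else if (*pde & (PTE_R | PTE_W | PTE_X)) {
            // A leaf PDE is a megapage, not a pointer to a page table
            panic("vm_map: va already covered by a megapage");
        }

        // Level-0: insert the leaf PTE
        pagetable_t *pt = (pagetable_t *)PPN_TO_PA(*pde >> PTE_PPN_SHIFT);
        (*pt)[vpn0] = MAKE_PTE(pa, flags);
    }

    // Software page table walk: mirrors what the hardware does
    // (permission and A/D checks omitted). Returns 0xFFFFFFFF on a fault.
    uint32_t vm_translate(pagetable_t root, uint32_t va) {
        pte_t pde = root[(va >> 22) & 0x3FF];
        if (!(pde & PTE_V)) return 0xFFFFFFFF;          // fault

        if (pde & (PTE_R | PTE_W | PTE_X)) {            // leaf at level 1: megapage
            uint32_t ppn = pde >> PTE_PPN_SHIFT;
            if (ppn & 0x3FF) return 0xFFFFFFFF;         // misaligned megapage
            return PPN_TO_PA(ppn) | (va & 0x3FFFFF);    // 22-bit offset
        }

        pagetable_t *pt = (pagetable_t *)PPN_TO_PA(pde >> PTE_PPN_SHIFT);
        pte_t pte = (*pt)[(va >> 12) & 0x3FF];
        if (!(pte & PTE_V) || !(pte & (PTE_R | PTE_W | PTE_X)))
            return 0xFFFFFFFF;                          // invalid or non-leaf

        uint32_t ppn = pte >> PTE_PPN_SHIFT;
        return PPN_TO_PA(ppn) | (va & 0xFFF);           // PA = page base | offset
    }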
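If the PDE itself has R, W, or X set, it is a leaf: a megapage (superpage) that maps 4MB directly. The hardware stops the walk at level 1 and takes bits 21–0 of the virtual address (VPN[0] plus the page offset) as the offset into the megapage. The PDE's PPN[0] field must be zero, meaning the physical base is 4MB-aligned; otherwise the access raises a page fault. Megapages are useful for mapping the kernel's identity region with a single PDE per 4MB.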
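To experiment with the walk off-target, the sketch below runs it against a small array standing in for physical memory, including the megapage and misalignment cases. It is a simplified model under stated assumptions: phys_mem, phys_read32, phys_write32, and sv32_walk are hypothetical names, and permission, privilege, and A/D checks are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V 0x01u
#define PTE_R 0x02u
#define PTE_W 0x04u
#define PTE_X 0x08u
#define SV32_FAULT UINT64_MAX

/* Toy physical memory: 16 pages of 4KB starting at physical address 0. */
static uint32_t phys_mem[16 * 1024];

static uint32_t phys_read32(uint64_t pa) {
    assert(pa < sizeof(phys_mem) && (pa & 3) == 0);
    return phys_mem[pa / 4];
}

static void phys_write32(uint64_t pa, uint32_t value) {
    assert(pa < sizeof(phys_mem) && (pa & 3) == 0);
    phys_mem[pa / 4] = value;
}

/* Walk the table rooted at root_pa; return the physical address or
 * SV32_FAULT. Tables must live inside the toy memory. */
static uint64_t sv32_walk(uint64_t root_pa, uint32_t va) {
    uint32_t vpn1 = (va >> 22) & 0x3FFu;
    uint32_t vpn0 = (va >> 12) & 0x3FFu;

    uint32_t pde = phys_read32(root_pa + vpn1 * 4);
    if (!(pde & PTE_V) || ((pde & PTE_W) && !(pde & PTE_R)))
        return SV32_FAULT;                 /* invalid or reserved encoding */
    uint64_t ppn = pde >> 10;
    if (pde & (PTE_R | PTE_X)) {           /* leaf at level 1: megapage */
        if (ppn & 0x3FFu) return SV32_FAULT;   /* base not 4MB aligned */
        return (ppn << 12) | (va & 0x3FFFFFu);
    }

    uint32_t pte = phys_read32((ppn << 12) + vpn0 * 4);
    if (!(pte & PTE_V) || ((pte & PTE_W) && !(pte & PTE_R)))
        return SV32_FAULT;
    if (!(pte & (PTE_R | PTE_X)))
        return SV32_FAULT;                 /* pointer PTE at the last level */
    return ((uint64_t)(pte >> 10) << 12) | (va & 0xFFFu);
}
```

A typical setup places the root at 0x1000 and a leaf table at 0x2000, writes a pointer PDE into the root with phys_write32, writes a leaf PTE into the table, and then calls sv32_walk(0x1000, va).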
TLB & Flushing
The TLB caches virtual-to-physical translations. Modifying a page table entry does not automatically invalidate cached translations — you must explicitly flush with SFENCE.VMA or the processor may continue using the stale mapping.
    // Flush entire TLB: use on context switch
    static inline void tlb_flush_all(void) {
        __asm__ volatile ("sfence.vma zero, zero" ::: "memory");
    }

    // Flush a single virtual address: prefer on map/unmap
    static inline void tlb_flush_va(uint32_t va) {
        __asm__ volatile ("sfence.vma %0, zero" :: "r"(va) : "memory");
    }

    // Enable Sv32: write satp, then fence
    static inline void vm_enable(uint32_t root_pa) {
        __asm__ volatile (
            "csrw satp, %0\n"
            "sfence.vma zero, zero\n"
            :: "r"(MAKE_SATP(root_pa)) : "memory"
        );
    }

    // Context switch: swap address space + flush
    void vm_switch(uint32_t new_root_pa) {
        __asm__ volatile (
            "csrw satp, %0\n"
            "sfence.vma zero, zero\n"
            :: "r"(MAKE_SATP(new_root_pa)) : "memory"
        );
    }
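Why a missing fence produces stale translations can be seen in a toy model. The sketch below is not how any particular core implements its TLB; it is a direct-mapped cache in front of a flat table, with hypothetical toy_* names, that behaves the way the text describes: a hit never consults the table again until the entry is flushed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TLB_SLOTS 8
#define TOY_PAGES     64

typedef struct { uint32_t vpn; uint32_t ppn; int valid; } toy_tlb_entry;

static toy_tlb_entry toy_tlb[TOY_TLB_SLOTS];

/* Stand-in for the in-memory page table: one PPN per virtual page. */
static uint32_t toy_page_table[TOY_PAGES];

/* Translate through the toy TLB, filling it from the table on a miss. */
static uint32_t toy_translate(uint32_t vpn) {
    assert(vpn < TOY_PAGES);
    toy_tlb_entry *e = &toy_tlb[vpn % TOY_TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        return e->ppn;                 /* hit: the table is not consulted */
    e->vpn = vpn;                      /* miss: "walk" the table and cache */
    e->ppn = toy_page_table[vpn];
    e->valid = 1;
    return e->ppn;
}

/* Model of `sfence.vma va, zero`: drop one page's cached translation. */
static void toy_flush_va(uint32_t vpn) {
    toy_tlb_entry *e = &toy_tlb[vpn % TOY_TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = 0;
}

/* Model of `sfence.vma zero, zero`: drop every cached translation. */
static void toy_flush_all(void) {
    for (int i = 0; i < TOY_TLB_SLOTS; i++)
        toy_tlb[i].valid = 0;
}
```

After toy_page_table[5] is changed from 100 to 200, toy_translate(5) keeps returning 100 until toy_flush_va(5) or toy_flush_all() runs. On real hardware the stale entry may or may not still be cached, which is what makes the bug intermittent.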
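PTEs with the G (global) bit set are not flushed by an SFENCE.VMA that names a specific ASID (rs2 ≠ x0). Mark kernel mappings as global so that, once the kernel tags each process with an ASID in satp and fences per ASID, they survive context switches and avoid the TLB cold-start penalty on every process switch. The vm_switch shown above uses sfence.vma zero, zero, which flushes global entries as well, so the G bit only pays off after moving to ASID-based flushing.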
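Using ASIDs means placing one in satp alongside the root page number. In Sv32 the register holds MODE in bit 31, a 9-bit ASID in bits 30–22, and the root PPN in bits 21–0. The sketch below composes and decodes that value. The satp32_* names are hypothetical, and the number of ASID bits a core actually implements is implementation-defined (some hard-wire them to zero), so treat this as a starting point rather than a drop-in replacement for MAKE_SATP.

```c
#include <assert.h>
#include <stdint.h>

#define SATP32_MODE_SV32  (1u << 31)
#define SATP32_ASID_SHIFT 22
#define SATP32_ASID_MASK  0x1FFu      /* 9 bits */
#define SATP32_PPN_MASK   0x3FFFFFu   /* 22 bits */

/* Compose an Sv32 satp value from a page-aligned root table address
 * (32-bit here, as in the article's kernel) and an ASID. */
static uint32_t satp32_make(uint32_t root_pa, uint32_t asid) {
    assert((root_pa & 0xFFFu) == 0);
    assert(asid <= SATP32_ASID_MASK);
    return SATP32_MODE_SV32
         | (asid << SATP32_ASID_SHIFT)
         | ((root_pa >> 12) & SATP32_PPN_MASK);
}

/* Extract the ASID field. */
static uint32_t satp32_asid(uint32_t satp) {
    return (satp >> SATP32_ASID_SHIFT) & SATP32_ASID_MASK;
}

/* Extract the root table address; 64-bit because the PPN is 22 bits. */
static uint64_t satp32_root_pa(uint32_t satp) {
    return (uint64_t)(satp & SATP32_PPN_MASK) << 12;
}
```

For a root table at 0x80010000 and ASID 3, satp32_make returns 0x80C80010. One way to probe the implemented ASID width is to write all ones to the field and read satp back.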
Page Fault Handler
When the hardware walks the page table and finds an invalid PTE, or a valid one whose permission bits forbid the access, it raises a page fault exception. The trap handler saves context, identifies the fault type from scause (12 = instruction fetch, 13 = load, 15 = store/AMO), reads the faulting virtual address from stval, and either maps a new page or kills the process.
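Decoding scause is a matter of separating the interrupt flag (the top bit on RV32) from the exception code. A minimal sketch follows; fault_kind, classify_page_fault, and fault_kind_name are illustrative names, not part of the handler below.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FAULT_NONE, FAULT_FETCH, FAULT_LOAD, FAULT_STORE } fault_kind;

#define SCAUSE_INTERRUPT (1u << 31)   /* top bit set: interrupt, not exception */

static int scause_is_interrupt(uint32_t scause) {
    return (scause & SCAUSE_INTERRUPT) != 0;
}

static uint32_t scause_code(uint32_t scause) {
    return scause & ~SCAUSE_INTERRUPT;
}

/* Map an scause value to the kind of page fault it reports, or
 * FAULT_NONE when the trap is an interrupt or some other exception. */
static fault_kind classify_page_fault(uint32_t scause) {
    if (scause_is_interrupt(scause))
        return FAULT_NONE;
    switch (scause_code(scause)) {
    case 12: return FAULT_FETCH;   /* instruction page fault */
    case 13: return FAULT_LOAD;    /* load page fault */
    case 15: return FAULT_STORE;   /* store/AMO page fault */
    default: return FAULT_NONE;
    }
}

/* Human-readable label for log messages. */
static const char *fault_kind_name(fault_kind k) {
    switch (k) {
    case FAULT_FETCH: return "fetch";
    case FAULT_LOAD:  return "load";
    case FAULT_STORE: return "store";
    default:          return "none";
    }
}
```

Checking the interrupt bit first matters: a supervisor timer interrupt arrives as 0x80000005 and should not be compared against exception codes.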
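    #include <stdbool.h>

    #define CAUSE_FETCH_PAGE_FAULT 12
    #define CAUSE_LOAD_PAGE_FAULT  13
    #define CAUSE_STORE_PAGE_FAULT 15

    // Defined below; declared here so trap_handler can call it
    static bool handle_page_fault(pagetable_t pt, uint32_t va, uint32_t cause);

    void trap_handler(uint32_t scause, uint32_t stval, struct trapframe *tf) {
        switch (scause) {
        case CAUSE_LOAD_PAGE_FAULT:
        case CAUSE_STORE_PAGE_FAULT:
        case CAUSE_FETCH_PAGE_FAULT: {
            // Round the faulting address down to its page base
            uint32_t va = stval & ~(uint32_t)(PAGE_SIZE - 1);
            if (!handle_page_fault(current->pagetable, va, scause)) {
                proc_kill(current, SIGSEGV);
            }
            break;
        }
        default:
            panic("unhandled trap scause=%u stval=%08x",
                  (unsigned)scause, (unsigned)stval);
        }
    }

    static bool handle_page_fault(pagetable_t pt, uint32_t va, uint32_t cause) {
        // Demand page: allocate a physical page and map it.
        // Simplification: every faulting address is treated as a valid
        // demand-page request. There is no check that va belongs to the
        // process or that the page is not already mapped.
        uint32_t pa = pmem_alloc();
        if (!pa) return false;          // OOM
        memset((void *)pa, 0, PAGE_SIZE);

        // A and D are preset: an implementation may raise a page fault
        // rather than update them in hardware, which would loop forever.
        uint32_t flags = PTE_R | PTE_W | PTE_U | PTE_A | PTE_D;
        if (cause == CAUSE_FETCH_PAGE_FAULT) flags |= PTE_X;

        vm_map(pt, va, pa, flags);
        tlb_flush_va(va);
        return true;
    }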
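A handler that maps any faulting address will happily back a null-pointer dereference with a fresh page. One common remedy is to keep a list of valid user regions and consult it before allocating. The sketch below shows such a lookup; user_region, the REGION_* permissions, and find_region are hypothetical, since the article's kernel does not define a region table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REGION_R = 1, REGION_W = 2, REGION_X = 4 };
enum { ACCESS_FETCH = 12, ACCESS_LOAD = 13, ACCESS_STORE = 15 };  /* scause codes */

/* Hypothetical per-process region descriptor. */
typedef struct {
    uint32_t start;   /* inclusive, page aligned */
    uint32_t end;     /* exclusive, page aligned */
    uint32_t perms;   /* REGION_R | REGION_W | REGION_X */
} user_region;

/* Return the region that contains va and permits the faulting access,
 * or NULL when the fault should kill the process instead. */
static const user_region *find_region(const user_region *regions, size_t count,
                                      uint32_t va, uint32_t cause) {
    assert(regions != NULL || count == 0);
    uint32_t need = (cause == ACCESS_FETCH) ? REGION_X
                  : (cause == ACCESS_STORE) ? REGION_W
                  : REGION_R;
    for (size_t i = 0; i < count; i++) {
        const user_region *r = &regions[i];
        if (va >= r->start && va < r->end)
            return (r->perms & need) ? r : NULL;
    }
    return NULL;   /* address is outside every region */
}
```

A handler could call find_region first and derive the PTE flags from the region's perms rather than from the fault cause, so that a text page is never mapped writable and a data page is never mapped executable.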
Bringing It All Together
The kernel startup sequence: set up physical memory allocator → build kernel page table with identity mapping → enable MMU → jump to virtual address. After that, each new process gets its own root page table cloned from the kernel template.
    void kernel_main(void) {
        // Step 1: initialise the physical page allocator
        pmem_init(DRAM_START, DRAM_END);

        // Step 2: build the kernel page table
        static pagetable_t kpgtbl __attribute__((aligned(PAGE_SIZE)));
        memset(kpgtbl, 0, sizeof(kpgtbl));

        // Identity-map all DRAM as kernel megapages (4MB granule).
        // DRAM_START must be 4MB-aligned.
        for (uint32_t pa = DRAM_START; pa < DRAM_END; pa += 0x400000) {
            uint32_t vpn1 = pa >> 22;
            // Leaf at level-1 = 4MB megapage, global, kernel-only R/W/X.
            // A and D are preset so hardware never faults to update them.
            kpgtbl[vpn1] = MAKE_PTE(pa, PTE_R|PTE_W|PTE_X|PTE_G|PTE_A|PTE_D);
        }

        // Identity-map the 4MB region holding the UART so uart_puts still
        // works once translation is on. UART_BASE is assumed to come from
        // the UART driver (0x10000000 on QEMU virt).
        kpgtbl[UART_BASE >> 22] =
            MAKE_PTE(UART_BASE & ~0x3FFFFFU, PTE_R|PTE_W|PTE_G|PTE_A|PTE_D);

        // Step 3: install trap vector before enabling MMU
        __asm__ volatile ("csrw stvec, %0" :: "r"(&trap_entry));

        // Step 4: enable Sv32; from here all accesses go through the MMU
        vm_enable((uint32_t)kpgtbl);

        uart_puts("[kernel] MMU enabled, running in virtual address space\n");
        sched_init();
        proc_create_init(kpgtbl);   // first user process
        sched_run();                // never returns
    }
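Run with qemu-system-riscv32 -machine virt -bios none -kernel kernel.elf -serial mon:stdio. To debug, add -s -S: QEMU then halts at reset and listens for GDB on port 1234. With -bios none the hart starts in M-mode, where satp has no effect, so the boot code must delegate page faults to S-mode through medeleg, open a PMP region, and mret into S-mode before kernel_main enables paging.
Print scause and stval from every unhandled trap. The scause table in the Supervisor-Level ISA chapter of the RISC-V Privileged Specification maps every cause code to a human-readable fault type.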
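The hand-off from M-mode to S-mode that a -bios none boot needs can be sketched as follows. The two helper functions compute the CSR values and run on any host. drop_to_smode is a hypothetical boot stub that is compiled only for RV32 targets, and it assumes a single PMP entry covering all of physical memory is acceptable for a development kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (3u << MSTATUS_MPP_SHIFT)
#define PRIV_S            1u   /* privilege encodings: 0 = U, 1 = S, 3 = M */

/* Return mstatus with MPP = S, so that mret lands in supervisor mode. */
static uint32_t mstatus_enter_smode(uint32_t mstatus) {
    uint32_t out = (mstatus & ~MSTATUS_MPP_MASK) | (PRIV_S << MSTATUS_MPP_SHIFT);
    assert(((out >> MSTATUS_MPP_SHIFT) & 3u) == PRIV_S);
    return out;
}

/* medeleg mask that routes the three page-fault causes to S-mode. */
static uint32_t medeleg_page_faults(void) {
    return (1u << 12) | (1u << 13) | (1u << 15);
}

#if defined(__riscv) && __riscv_xlen == 32
/* Hypothetical boot stub: call in M-mode with the S-mode entry point. */
void drop_to_smode(void (*entry)(void)) {
    uint32_t ms;
    __asm__ volatile ("csrr %0, mstatus" : "=r"(ms));
    __asm__ volatile ("csrw mstatus, %0" :: "r"(mstatus_enter_smode(ms)));
    __asm__ volatile ("csrw medeleg, %0" :: "r"(medeleg_page_faults()));
    /* One top-of-range PMP entry granting S/U-mode R/W/X everywhere. */
    __asm__ volatile ("csrw pmpaddr0, %0" :: "r"(0xFFFFFFFFu));
    __asm__ volatile ("csrw pmpcfg0, %0" :: "r"(0x0Fu));
    __asm__ volatile ("csrw mepc, %0" :: "r"(entry));
    __asm__ volatile ("mret");
}
#endif
```

Without the medeleg write, page faults taken in S-mode or U-mode trap to mtvec instead of stvec, and the handler installed by kernel_main is never reached.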
Forgetting SFENCE.VMA after modifying a PTE causes intermittent faults that are nearly impossible to reproduce. Always fence after any map/unmap operation.
Page tables must be 4KB-aligned. Declare with __attribute__((aligned(4096))) or allocate from the physical page allocator which guarantees alignment.
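Alignment matters because satp and non-leaf PTEs store only a page number: the low 12 bits of a misaligned table address are dropped without any error, and the hardware then walks the wrong memory. A few small helpers make the check explicit. The names page_aligned, page_round_down, and page_round_up are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PG_SIZE 4096u

/* Non-zero when addr sits on a 4KB boundary. */
static int page_aligned(uint32_t addr) {
    return (addr & (PG_SIZE - 1)) == 0;
}

/* Base address of the page containing addr. */
static uint32_t page_round_down(uint32_t addr) {
    return addr & ~(PG_SIZE - 1);
}

/* First page boundary at or above addr. The assert rejects addresses
 * in the last page of the 32-bit space, where the sum would wrap. */
static uint32_t page_round_up(uint32_t addr) {
    assert(addr <= UINT32_MAX - (PG_SIZE - 1));
    return (addr + PG_SIZE - 1) & ~(PG_SIZE - 1);
}
```

An assert(page_aligned(root_pa)) at the top of a function that writes satp turns a silent mis-walk into an immediate, debuggable failure.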
Go Deeper
Once Sv32 is running, the natural next steps are copy-on-write fork, demand paging from an ELF loader, and swapping pages to a backing store.