01

Overview

Virtual memory is the foundation of process isolation, demand paging, and copy-on-write. This article implements Sv32 — the 32-bit virtual memory scheme defined by the RISC-V Privileged Specification — from scratch on a bare-metal RISC-V core. By the end you'll have a working kernel that maps user processes into their own address spaces, handles page faults, and correctly flushes the TLB on context switches.

Target hardware is a RISC-V RV32IMA core with supervisor mode and an Sv32 MMU (e.g., the VexRiscv softcore on an FPGA, or QEMU's virt machine). The code is written in C with a small amount of inline assembly for CSR access and the page-fault trap handler.

  • Understand the Sv32 two-level page table format and PTE bit fields
  • Write a kernel that maps physical memory into a virtual address space
  • Enable the MMU by writing to the satp CSR and taking the first page fault
  • Implement a page fault handler that allocates and maps demand pages
  • Correctly flush the TLB on address space switches with SFENCE.VMA
02

Sv32 Address Space

Sv32 maps a 32-bit virtual address space onto physical addresses of up to 34 bits (16 GiB) using a two-level page table. Every translation splits the 32-bit VA into three fields:

📐 VA Breakdown
  • Bits 31–22: VPN[1] (10 bits) → PD index
  • Bits 21–12: VPN[0] (10 bits) → PT index
  • Bits 11–0: Page offset (12 bits)
  • Page size: 4KB (1 << 12)
🗂️ Page Directory
  • 1024 entries × 4 bytes = 4KB per table
  • Root PD physical page number (PPN) stored in satp[21:0]
  • Each entry is a PTE (page table entry)
  • Non-leaf PTE: points to next-level table
📄 Page Table
  • 1024 entries × 4 bytes = 4KB per PT
  • One PT per valid PD entry
  • Leaf PTEs: map 4KB physical pages
  • Megapages: leaf at PD level → 4MB pages
🔐 PTE Bit Fields
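  • PPN (22 bits, PTE[31:10]): physical page number, split into PPN[1] (12 bits) and PPN[0] (10 bits)
  • RSW (2 bits, PTE[9:8]): reserved for OS use
  • D,A,G,U,X,W,R,V (8 bits, PTE[7:0]): flags
  • V=valid, R=read, W=write, X=execute; R=W=X=0 marks a non-leaf PTE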
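
To make the field layout above concrete, here is a minimal sketch that splits a virtual address and a PTE into their parts. The struct and function names (sv32_va, sv32_pte, sv32_split, sv32_decode) are made up for illustration and are not part of the article's vm.h.

```c
#include <stdint.h>

// Hypothetical helper types for illustration; not part of vm.h.
struct sv32_va  { uint32_t vpn1, vpn0, offset; };
struct sv32_pte { uint32_t ppn, rsw, flags; };

// Split a 32-bit virtual address into its three Sv32 fields.
static struct sv32_va sv32_split(uint32_t va) {
    struct sv32_va f;
    f.vpn1   = (va >> 22) & 0x3FF;  // bits 31-22: page directory index
    f.vpn0   = (va >> 12) & 0x3FF;  // bits 21-12: page table index
    f.offset = va & 0xFFF;          // bits 11-0: offset inside the 4KB page
    return f;
}

// Split a 32-bit PTE into PPN (bits 31-10), RSW (9-8), and flags (7-0).
static struct sv32_pte sv32_decode(uint32_t pte) {
    struct sv32_pte f;
    f.ppn   = pte >> 10;
    f.rsw   = (pte >> 8) & 0x3;
    f.flags = pte & 0xFF;
    return f;
}
```

For example, the address 0x80401234 splits into VPN[1] = 0x201, VPN[0] = 0x001, and offset 0x234. A PTE of 0x20100417 decodes to PPN 0x80401 (physical page 0x80401000) with flags 0x17, which is V, R, W, and U.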
03

Page Table Structure

vm.h
C
#include <stdint.h>

#define PAGE_SIZE     4096UL
#define PAGE_SHIFT    12
#define PTE_COUNT     1024    // entries per page table

// PTE flag bits
#define PTE_V    (1 << 0)    // Valid
#define PTE_R    (1 << 1)    // Readable
#define PTE_W    (1 << 2)    // Writable
#define PTE_X    (1 << 3)    // Executable
#define PTE_U    (1 << 4)    // User-accessible
#define PTE_G    (1 << 5)    // Global mapping
#define PTE_A    (1 << 6)    // Accessed
#define PTE_D    (1 << 7)    // Dirty

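#define PTE_PPN_SHIFT  10
// These helpers work on 32-bit values, so they only cover physical
// addresses below 4GB even though Sv32 PTEs can describe 34-bit ones.
#define PA_TO_PPN(pa)  ((pa) >> PAGE_SHIFT)
#define PPN_TO_PA(ppn) ((ppn) << PAGE_SHIFT)

// Construct a PTE from a physical address + flags.
// With R/W/X in flags it is a leaf; with flags == 0 it points to the next-level table.
#define MAKE_PTE(pa, flags) \
    ((PA_TO_PPN(pa) << PTE_PPN_SHIFT) | (flags) | PTE_V)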

typedef uint32_t pte_t;
typedef pte_t    pagetable_t[PTE_COUNT]; // exactly one 4KB page

// satp register value for Sv32 mode
#define SATP_SV32   (1U << 31)
#define MAKE_SATP(root_pa) (SATP_SV32 | PA_TO_PPN(root_pa))
04

Page Table Walk

The hardware performs the page table walk automatically on a TLB miss. Understanding it helps you debug faults and implement the software-managed parts (allocation, mapping, unmapping) correctly.

vm.c — map and walk
C
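// Map a single 4KB virtual page to a physical page.
// va and pa must be page-aligned. Page tables are reached through their
// physical addresses, which relies on the kernel's identity mapping of DRAM.
void vm_map(pagetable_t root, uint32_t va,
            uint32_t pa, uint32_t flags) {
    uint32_t vpn1 = (va >> 22) & 0x3FF; // bits 31–22
    uint32_t vpn0 = (va >> 12) & 0x3FF; // bits 21–12

    // Level-1: get or allocate a page table
    pte_t *pde = &root[vpn1];
    if (!(*pde & PTE_V)) {
        uint32_t pt_pa = pmem_alloc();   // physical address, 0 when out of memory
        if (!pt_pa) panic("vm_map: out of memory");
        memset((void *)pt_pa, 0, PAGE_SIZE);
        *pde = MAKE_PTE(pt_pa, 0); // non-leaf: no R/W/X
    } else if (*pde & (PTE_R | PTE_W | PTE_X)) {
        // The PDE is a megapage leaf, not a pointer to a level-0 table
        panic("vm_map: va is already covered by a megapage");
    }

    // Level-0: insert the leaf PTE
    pagetable_t *pt = (pagetable_t *)PPN_TO_PA((*pde >> PTE_PPN_SHIFT));
    (*pt)[vpn0] = MAKE_PTE(pa, flags);
}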

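// Software page table walk that mirrors what the hardware does.
// Returns 0xFFFFFFFF when the walk would fault; permission bits are not checked.
uint32_t vm_translate(pagetable_t root, uint32_t va) {
    pte_t pde = root[(va >> 22) & 0x3FF];
    if (!(pde & PTE_V)) return 0xFFFFFFFF; // fault

    // A level-1 entry with R, W, or X set is a 4MB megapage leaf
    if (pde & (PTE_R|PTE_W|PTE_X)) {
        if ((pde >> PTE_PPN_SHIFT) & 0x3FF) return 0xFFFFFFFF; // misaligned megapage
        return PPN_TO_PA(pde >> PTE_PPN_SHIFT) | (va & 0x3FFFFF);
    }

    pagetable_t *pt = (pagetable_t *)PPN_TO_PA((pde >> PTE_PPN_SHIFT));
    pte_t pte = (*pt)[(va >> 12) & 0x3FF];
    if (!(pte & PTE_V) || !(pte & (PTE_R|PTE_W|PTE_X))) return 0xFFFFFFFF;

    uint32_t ppn = pte >> PTE_PPN_SHIFT;
    return PPN_TO_PA(ppn) | (va & 0xFFF); // PA = (PPN << 12) | offset
}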
💡
Megapage shortcut

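If the PDE itself has R, W, or X set, it is a megapage (superpage): a single 4MB mapping. The hardware stops the walk at level 1 and takes bits 21–0 of the VA (VPN[0] plus the page offset) as the offset into the megapage. The physical base must be 4MB-aligned, so PPN[0] in the PDE must be zero; a misaligned megapage raises a page fault. Megapages are useful for mapping the kernel's identity region with one PDE per 4MB.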
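
The megapage arithmetic can be sketched on its own. This is an illustrative helper, not code from the article: megapage_translate and SV32_FAULT are assumed names, and the result is truncated to 32 bits in the same way as the article's PPN_TO_PA.

```c
#include <stdint.h>

#define SV32_FAULT 0xFFFFFFFFu  // sentinel, same convention as vm_translate

// Translate va through a level-1 leaf PTE (a 4MB megapage).
// Returns SV32_FAULT if the PTE is not a valid, aligned megapage.
static uint32_t megapage_translate(uint32_t pde, uint32_t va) {
    const uint32_t V = 1u << 0;
    const uint32_t RWX = 0xEu;                           // R|W|X = bits 1-3
    if (!(pde & V) || !(pde & RWX)) return SV32_FAULT;   // invalid or non-leaf
    uint32_t ppn = pde >> 10;
    if (ppn & 0x3FF) return SV32_FAULT;                  // PPN[0] != 0: misaligned
    // PA = PPN[1] followed by VA[21:0]
    return (ppn << 12) | (va & 0x3FFFFFu);
}
```

With a PDE of 0x2010000F (physical base 0x80400000, flags V|R|W|X), the address 0x00412345 translates to 0x80412345. A PDE of 0x2010040F has a non-zero PPN[0] and is rejected as misaligned.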

05

TLB & Flushing

The TLB caches virtual-to-physical translations. Modifying a page table entry does not automatically invalidate cached translations — you must explicitly flush with SFENCE.VMA or the processor may continue using the stale mapping.

tlb.h
C
#include <stdint.h>
#include "vm.h"   // MAKE_SATP

// Flush the entire TLB, global entries included; use on context switch
static inline void tlb_flush_all(void) {
    __asm__ volatile ("sfence.vma zero, zero" ::: "memory");
}

// Flush a single virtual address — prefer on map/unmap
static inline void tlb_flush_va(uint32_t va) {
    __asm__ volatile ("sfence.vma %0, zero" :: "r"(va) : "memory");
}

// Enable Sv32 — write satp then fence
static inline void vm_enable(uint32_t root_pa) {
    __asm__ volatile (
        "csrw satp, %0\n"
        "sfence.vma zero, zero\n"
        :: "r"(MAKE_SATP(root_pa)) : "memory"
    );
}

// Context switch: swap address space + flush. Declared static inline because
// it lives in a header; the sequence is the same as vm_enable.
static inline void vm_switch(uint32_t new_root_pa) {
    vm_enable(new_root_pa);
}
💡
Global mappings skip the flush

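PTEs with the G (global) bit set are exempt from an SFENCE.VMA that names a specific ASID (rs2 ≠ x0). The vm_switch above passes zero for both operands, which flushes every entry, global ones included. To let kernel mappings survive a context switch, mark them global, give each process its own ASID in satp[30:22], and flush by ASID instead; this avoids the TLB cold-start penalty on every process switch. ASID support is optional, so check how many ASID bits your core implements.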
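
A minimal sketch of an ASID-aware switch, assuming the core implements ASIDs. The names make_satp_asid and vm_switch_asid are hypothetical; the field positions (MODE in bit 31, ASID in bits 30–22, PPN in bits 21–0) are those of the Sv32 satp layout. The inline assembly is guarded so the value computation can also be compiled off-target.

```c
#include <stdint.h>

#define SATP_MODE_SV32  (1u << 31)
#define SATP_ASID_SHIFT 22
#define SATP_ASID_MASK  0x1FFu      // Sv32 ASID field is 9 bits wide

// Compose an Sv32 satp value that carries an ASID (hypothetical helper).
static uint32_t make_satp_asid(uint32_t root_pa, uint32_t asid) {
    return SATP_MODE_SV32
         | ((asid & SATP_ASID_MASK) << SATP_ASID_SHIFT)
         | (root_pa >> 12);
}

#if defined(__riscv)
// Switch address space and flush only the entries tagged with this ASID.
// Global (G=1) entries stay in the TLB.
static inline void vm_switch_asid(uint32_t root_pa, uint32_t asid) {
    __asm__ volatile (
        "csrw satp, %0\n"
        "sfence.vma zero, %1\n"
        :: "r"(make_satp_asid(root_pa, asid)), "r"(asid) : "memory"
    );
}
#endif
```

For a root table at 0x80400000 and ASID 5, make_satp_asid returns 0x81480400. If every live process has a unique ASID, the flush is only strictly needed when an ASID is recycled for a different address space; flushing on every switch, as here, is the conservative choice.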

06

Page Fault Handler

When the hardware walks the page table and finds an invalid PTE, or a PTE whose permission bits forbid the access, it raises a page fault exception. The trap handler saves context, identifies the fault type from scause, reads the faulting address from stval, and either maps a new page or kills the process.

trap.c — page fault dispatch
C
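#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CAUSE_FETCH_PAGE_FAULT  12
#define CAUSE_LOAD_PAGE_FAULT   13
#define CAUSE_STORE_PAGE_FAULT  15

// Forward declaration: the handler is defined after its first use
static bool handle_page_fault(pagetable_t pt, uint32_t va, uint32_t cause);

void trap_handler(uint32_t scause, uint32_t stval,
                   struct trapframe *tf) {
    switch (scause) {
        case CAUSE_LOAD_PAGE_FAULT:
        case CAUSE_STORE_PAGE_FAULT:
        case CAUSE_FETCH_PAGE_FAULT: {
            uint32_t va = stval & ~(uint32_t)(PAGE_SIZE - 1);
            if (!handle_page_fault(current->pagetable, va, scause)) {
                proc_kill(current, SIGSEGV);
            }
            break;
        }
        default:
            panic("unhandled trap scause=%u stval=%08x",
                  (unsigned)scause, (unsigned)stval);
    }
}

static bool handle_page_fault(pagetable_t pt,
                               uint32_t va, uint32_t cause) {
    // A fault on an address that is already mapped is a permission violation
    // (for example, user code touching a kernel megapage), not a demand-paging
    // request. A real kernel would also check va against the ranges the
    // process is allowed to use.
    pte_t pde = pt[(va >> 22) & 0x3FF];
    if (pde & PTE_V) {
        if (pde & (PTE_R | PTE_W | PTE_X)) return false;   // megapage leaf
        pte_t *l0 = (pte_t *)PPN_TO_PA(pde >> PTE_PPN_SHIFT);
        if (l0[(va >> 12) & 0x3FF] & PTE_V) return false;  // 4KB leaf
    }

    // Demand page: allocate a physical page and map it
    uint32_t pa = pmem_alloc();
    if (!pa) return false; // OOM
    memset((void *)pa, 0, PAGE_SIZE);

    // Pre-set A and D: a core may fault instead of updating them in hardware
    uint32_t flags = PTE_R | PTE_W | PTE_U | PTE_A | PTE_D;
    if (cause == CAUSE_FETCH_PAGE_FAULT) flags |= PTE_X;
    vm_map(pt, va, pa, flags);
    tlb_flush_va(va);
    return true;
}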
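
The "map a new page or kill the process" decision usually depends on whether the faulting address belongs to a range the process is allowed to use. A minimal sketch of that check follows; struct vm_region, the VR_* flags, and fault_is_demand_page are hypothetical names, since the article does not define a per-process region list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAUSE_FETCH_PAGE_FAULT 12
#define CAUSE_LOAD_PAGE_FAULT  13
#define CAUSE_STORE_PAGE_FAULT 15

enum { VR_READ = 1, VR_WRITE = 2, VR_EXEC = 4 };

// Hypothetical descriptor for one valid range of a process's address space.
struct vm_region {
    uint32_t start;   // inclusive, page-aligned
    uint32_t end;     // exclusive, page-aligned
    uint32_t prot;    // OR of VR_READ / VR_WRITE / VR_EXEC
};

// Return true if a fault with this cause at va should be demand-paged.
static bool fault_is_demand_page(const struct vm_region *regions, size_t n,
                                 uint32_t va, uint32_t cause) {
    uint32_t need = (cause == CAUSE_STORE_PAGE_FAULT) ? VR_WRITE
                  : (cause == CAUSE_FETCH_PAGE_FAULT) ? VR_EXEC
                  : VR_READ;
    for (size_t i = 0; i < n; i++) {
        if (va >= regions[i].start && va < regions[i].end)
            return (regions[i].prot & need) != 0;
    }
    return false;  // outside every region: a genuine segfault
}
```

A handler could call this before allocating a page and return false straight away when it fails. The same region's prot value is also a better source for the new PTE's R/W/X bits than the fault cause alone.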
07

Bringing It All Together

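The kernel startup sequence is: set up the physical memory allocator → build the kernel page table with an identity mapping → install the trap vector → enable the MMU. Because the mapping is an identity mapping, execution simply continues at the same addresses, which are now translated. After that, each new process gets its own root page table cloned from the kernel template.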

kernel_main.c
C
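void kernel_main(void) {
    // Step 1: initialise the physical page allocator
    pmem_init(DRAM_START, DRAM_END);

    // Step 2: build the kernel page table
    static pagetable_t kpgtbl __attribute__((aligned(PAGE_SIZE)));
    memset(kpgtbl, 0, sizeof(kpgtbl));

    // Identity-map all DRAM with kernel megapages (4MB granule).
    // DRAM_START must be 4MB-aligned.
    for (uint32_t pa = DRAM_START; pa < DRAM_END; pa += 0x400000) {
        uint32_t vpn1 = pa >> 22;
        // Leaf at level-1 = 4MB megapage, global, kernel-only R/W/X.
        // A and D are pre-set in case the core faults instead of updating them.
        kpgtbl[vpn1] = MAKE_PTE(pa, PTE_R|PTE_W|PTE_X|PTE_G|PTE_A|PTE_D);
    }

    // Identity-map the UART registers too, or uart_puts() below faults once
    // translation is on. UART_BASE is an assumed name for the board's UART
    // address (0x10000000 on QEMU virt).
    kpgtbl[UART_BASE >> 22] =
        MAKE_PTE(UART_BASE & ~0x3FFFFFU, PTE_R|PTE_W|PTE_G|PTE_A|PTE_D);

    // Step 3: install trap vector before enabling MMU
    __asm__ volatile ("csrw stvec, %0" :: "r"(&trap_entry));

    // Step 4: enable Sv32. From here all S-mode and U-mode accesses go
    // through the MMU (satp has no effect while running in M-mode).
    vm_enable((uint32_t)kpgtbl);

    uart_puts("[kernel] MMU enabled, running in virtual address space\n");
    sched_init();
    proc_create_init(kpgtbl); // first user process
    sched_run();               // never returns
}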
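
Cloning the kernel template into a new process's root table can be as simple as copying the valid level-1 entries. The sketch below is one way to do it; pagetable_clone_kernel is a hypothetical helper, and the article's proc_create_init may work differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_COUNT 1024
#define PTE_V     (1u << 0)
typedef uint32_t pte_t;

// Copy every valid level-1 entry of the kernel template into a new root
// table and clear the rest. Returns the number of entries copied.
// (Hypothetical helper; the article only names the cloning step.)
static size_t pagetable_clone_kernel(pte_t *new_root, const pte_t *kernel_root) {
    size_t copied = 0;
    for (size_t i = 0; i < PTE_COUNT; i++) {
        if (kernel_root[i] & PTE_V) {
            new_root[i] = kernel_root[i];  // share the kernel mapping
            copied++;
        } else {
            new_root[i] = 0;               // user slots start unmapped
        }
    }
    return copied;
}
```

Because the entries are copied by value, a kernel mapping added to the template later does not appear in root tables that were cloned earlier. Building all kernel mappings before the first process is created avoids that problem.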
🧪 QEMU Testing

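Run with qemu-system-riscv32 -machine virt -bios none -kernel kernel.elf -serial mon:stdio. To debug, add -s -S: QEMU then waits at reset for GDB to attach on port 1234. With -bios none the kernel is entered in M-mode, where satp has no effect, so the boot code must configure PMP, delegate traps, and mret into S-mode before kernel_main runs.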
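
A minimal sketch of that M-mode to S-mode hand-off, modelled on the approach xv6-riscv takes in its startup code. The names mstart and mstatus_with_mpp_s are hypothetical, and the sketch assumes a reset stub has already set up a stack. The CSR writes are guarded so the bit manipulation can be compiled off-target.

```c
#include <stdint.h>

#define MSTATUS_MPP_MASK (3u << 11)  // mstatus.MPP: previous privilege mode
#define MSTATUS_MPP_S    (1u << 11)  // 01 = supervisor

// Return mstatus with MPP set to S-mode, so that mret lands in supervisor mode.
static uint32_t mstatus_with_mpp_s(uint32_t mstatus) {
    return (mstatus & ~MSTATUS_MPP_MASK) | MSTATUS_MPP_S;
}

#if defined(__riscv) && __riscv_xlen == 32
void kernel_main(void);

// Hypothetical M-mode entry point, called from the reset stub.
void mstart(void) {
    uint32_t ms;
    __asm__ volatile ("csrr %0, mstatus" : "=r"(ms));
    __asm__ volatile ("csrw mstatus, %0" :: "r"(mstatus_with_mpp_s(ms)));
    __asm__ volatile ("csrw mepc, %0" :: "r"(&kernel_main));  // mret target
    __asm__ volatile ("csrw satp, zero");                     // translation off for now
    __asm__ volatile ("csrw medeleg, %0" :: "r"(0xFFFFu));    // exceptions to S-mode
    __asm__ volatile ("csrw mideleg, %0" :: "r"(0xFFFFu));    // interrupts to S-mode
    // One PMP entry (TOR, R/W/X) covering the whole address space, so that
    // S-mode and U-mode accesses are not rejected by physical memory protection.
    __asm__ volatile ("csrw pmpaddr0, %0" :: "r"(0xFFFFFFFFu));
    __asm__ volatile ("csrw pmpcfg0, %0" :: "r"(0xFu));
    __asm__ volatile ("mret");                                // enter kernel_main in S-mode
}
#endif
```

Delegating page faults through medeleg is what lets the stvec handler in trap.c see them; without it they would trap to M-mode instead.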

🔍 Debugging Tips

Print scause, stval, and sepc from every unhandled trap. The scause table in the Supervisor-Level ISA chapter of the RISC-V Privileged Specification maps every cause code to a human-readable fault type.

⚠️ Common Pitfalls

Forgetting SFENCE.VMA after modifying a PTE lets the core keep using a stale translation. The result is intermittent faults, or silent accesses to the wrong page, that are very hard to reproduce. Always fence after any map/unmap operation.
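
One way to make the fence hard to forget is to put it inside the unmap helper itself. A minimal sketch, assuming the caller has already located the level-0 table; vm_unmap_leaf is a hypothetical name, and tlb_flush_va is redefined here with a no-op fallback so the block compiles off-target.

```c
#include <stdint.h>

typedef uint32_t pte_t;
#define PTE_V (1u << 0)

// Flush one VA from the TLB. On a non-RISC-V host this is a no-op so the
// sketch can be compiled and tested off-target.
static inline void tlb_flush_va(uint32_t va) {
#if defined(__riscv)
    __asm__ volatile ("sfence.vma %0, zero" :: "r"(va) : "memory");
#else
    (void)va;
#endif
}

// Clear the leaf PTE for va in a level-0 table and fence immediately.
// Returns the old PTE so the caller can free the physical page, or 0 if
// nothing was mapped.
static pte_t vm_unmap_leaf(pte_t *l0, uint32_t va) {
    uint32_t vpn0 = (va >> 12) & 0x3FF;
    pte_t old = l0[vpn0];
    if (!(old & PTE_V)) return 0;
    l0[vpn0] = 0;          // 1. change the page table
    tlb_flush_va(va);      // 2. then invalidate the cached translation
    return old;
}
```

The order matters: the store to the page table comes first and the fence second, so any later access to va walks the updated table.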

📏 Alignment

Page tables must be 4KB-aligned, because satp and non-leaf PTEs store only a page number, not a full address. Declare them with __attribute__((aligned(4096))) or allocate them from the physical page allocator, which guarantees alignment.

08

Resources & References

Go Deeper

Once Sv32 is running, the natural next steps are copy-on-write fork, demand paging from an ELF loader, and swapping pages to a backing store.