Writing Your Own Bootloader from Scratch
Overview
This tutorial walks you through writing a complete, working bootloader for RISC-V from an empty file. No HAL, no SDK — just a RISC-V toolchain, a linker script, and a clear understanding of what happens in the first nanoseconds after power-on.
We target RISC-V on QEMU so you can follow along without hardware, then discuss how the same principles apply directly to real chips like the ESP32-C3, GD32VF103, and SiFive FE310.
- Understand what happens between power-on and main()
- Write RISC-V startup assembly from scratch, with no crt0
- Write a linker script that places code and data in the right memory regions
- Load an application image from flash, validate it, and jump to it
- Understand the difference between ROM bootloader, Stage 1, and Stage 2
What Is a Bootloader?
A bootloader is the first software that runs on a processor after reset. Its job is simple in theory: initialise just enough hardware, find the application image, validate it, and hand control over. In practice, it has to do all of this without any OS, without a heap, and often without even a stack — until it sets one up itself.
On most RISC-V chips, the hardware ROM bootloader (hardcoded by the vendor) runs first, then hands off to your Stage 1 bootloader stored in flash. Your Stage 1 then loads and launches the main application — or a Stage 2 if a full update/recovery system is needed.
Boot Stages in Detail
- Reset: execution begins at the reset vector (0x00001000 on QEMU virt). The CPU jumps there in machine mode with all registers undefined.
- Stage 1: sets up the stack pointer (sp), zeroes the .bss section, and copies .data from flash to SRAM.
Memory Map
Before writing a single line of code you need to know exactly where everything lives. This is the layout we use for the QEMU virt machine — flash starts at 0x20000000, SRAM at 0x80000000.
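The layout can also be written down as constants, so the bootloader and its build-time checks agree on it. This is a minimal sketch: the 64 KB bootloader region and 32 KB of SRAM are taken from the linker script used later in this tutorial, the application slot address from the loader code, and the macro and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical names; values mirror the QEMU virt layout used in this tutorial. */
#define BL_FLASH_BASE   0x20000000UL    /* bootloader code (flash) */
#define BL_FLASH_SIZE   (64UL * 1024)   /* 64 KB, as in bootloader.ld */
#define APP_FLASH_BASE  0x20010000UL    /* application header + image */
#define SRAM_BASE       0x80000000UL
#define SRAM_SIZE       (32UL * 1024)   /* 32 KB, as in bootloader.ld */

/* The application slot must start exactly where the bootloader region ends. */
_Static_assert(BL_FLASH_BASE + BL_FLASH_SIZE == APP_FLASH_BASE,
               "app slot must follow the bootloader region");

/* True if [addr, addr + len) lies entirely inside SRAM. */
static bool in_sram(uint32_t addr, uint32_t len)
{
    return addr >= SRAM_BASE &&
           len <= SRAM_SIZE &&
           addr - SRAM_BASE <= SRAM_SIZE - len;
}
```

A helper like in_sram() is handy later if the bootloader ever copies an image into RAM, because it rejects a destination range before any byte is written.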
Startup Assembly
This is the first code that runs — before any C, before any globals, before any stack. It must set up the machine-mode trap vector, initialise the stack pointer, clear BSS, copy initialised data from flash to SRAM, then call main().
# startup.S - RISC-V Stage 1 bootloader entry
# Assembled with: riscv64-unknown-elf-gcc -march=rv32ima -mabi=ilp32
# (toolchains that treat CSR instructions as the separate Zicsr
#  extension need -march=rv32ima_zicsr)

    .section .text.entry
    .global _start
    .type _start, @function
_start:
    # 1. Disable interrupts at machine level
    csrw    mstatus, zero

    # 2. Set machine-mode trap vector (simple direct mode)
    la      t0, _trap_handler
    csrw    mtvec, t0

    # 3. Initialise stack pointer: top of the bootloader's SRAM region
    la      sp, _stack_top

    # 4. Zero the .bss section
    la      a0, _bss_start
    la      a1, _bss_end
.bss_loop:
    bgeu    a0, a1, .bss_done       # unsigned compare: these are addresses
    sw      zero, 0(a0)
    addi    a0, a0, 4
    j       .bss_loop
.bss_done:

    # 5. Copy .data from flash (LMA) to SRAM (VMA)
    la      a0, _data_lma           # source: load address in flash
    la      a1, _data_start         # dest: run address in SRAM
    la      a2, _data_end
.data_loop:
    bgeu    a1, a2, .data_done
    lw      t0, 0(a0)
    sw      t0, 0(a1)
    addi    a0, a0, 4
    addi    a1, a1, 4
    j       .data_loop
.data_done:

    # 6. Call C bootloader main (must not return)
    call    bootloader_main

    # 7. Halt if main returns (should never happen)
.hang:
    wfi
    j       .hang

    # Minimal trap handler so that _trap_handler resolves: park the hart.
    # In direct mode the mtvec base must be 4-byte aligned.
    .balign 4
_trap_handler:
    j       _trap_handler
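For readers more comfortable in C, steps 4 and 5 of the assembly behave like the two loops below. This is a host-runnable model with hypothetical function names, not a replacement for startup.S: on the target this work has to happen in assembly, because C code cannot safely run before the stack pointer is set up.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of step 4: store zero to every 32-bit word in [start, end). */
static void zero_words(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}

/* Model of step 5: copy words from the load address (flash)
 * into the run-time range [dst, dst_end) in SRAM. */
static void copy_words(const uint32_t *src, uint32_t *dst, uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

Both loops test the bound before touching memory, so an empty .bss or .data section (start equal to end) is handled without a special case, exactly as in the assembly.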
Linker Script
The linker script tells GNU ld exactly where to place every section of the binary. It defines the symbols your startup assembly reads, such as _bss_start, _data_lma, and _stack_top. Getting this wrong is a common source of mysterious crashes.
/* bootloader.ld - Stage 1 bootloader linker script */
/* Target: QEMU virt RISC-V (rv32ima) */

OUTPUT_ARCH(riscv)
ENTRY(_start)

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x20000000, LENGTH = 64K
    SRAM  (rwx) : ORIGIN = 0x80000000, LENGTH = 32K
}

SECTIONS
{
    /* Code and read-only data go into flash */
    .text 0x20000000 : {
        KEEP(*(.text.entry))        /* _start must be first */
        *(.text .text.*)
        *(.rodata .rodata.*)
        *(.srodata .srodata.*)      /* RISC-V small read-only data */
        . = ALIGN(4);
        _data_lma = .;              /* LMA of .data in flash */
    } > FLASH

    /* Initialised data: lives in flash, runs in SRAM */
    .data : AT(_data_lma) {
        _data_start = .;
        *(.data .data.*)
        *(.sdata .sdata.*)          /* RISC-V small data */
        . = ALIGN(4);
        _data_end = .;
    } > SRAM

    /* Zero-initialised data: SRAM only, no flash storage */
    .bss (NOLOAD) : {
        _bss_start = .;
        *(.sbss .sbss.*)            /* RISC-V small bss */
        *(.bss .bss.*)
        *(COMMON)
        . = ALIGN(4);
        _bss_end = .;
    } > SRAM

    /* Stack: placed directly after .bss in SRAM */
    .stack (NOLOAD) : {
        . = ALIGN(16);
        . += 4096;                  /* 4 KB stack */
        _stack_top = .;
    } > SRAM

    /* Discard unneeded sections */
    /DISCARD/ : {
        *(.comment)
        *(.note*)
    }
}
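To see where the SRAM symbols end up, it can help to replay the location-counter arithmetic by hand. The sketch below does that in C for given .data and .bss sizes. It is a simplification that ignores the alignment requirements of individual input sections, and the SramLayout type and function names are hypothetical.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two),
 * which is what `. = ALIGN(n)` does to the location counter. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

typedef struct {
    uint32_t data_start, data_end;
    uint32_t bss_start, bss_end;
    uint32_t stack_top;
} SramLayout;   /* hypothetical type, for illustration only */

/* Walk the SRAM output sections in script order. */
static SramLayout layout_sram(uint32_t sram_base, uint32_t data_size,
                              uint32_t bss_size)
{
    SramLayout l;
    l.data_start = sram_base;
    l.data_end   = align_up(l.data_start + data_size, 4);
    l.bss_start  = l.data_end;
    l.bss_end    = align_up(l.bss_start + bss_size, 4);
    l.stack_top  = align_up(l.bss_end, 16) + 4096;   /* 4 KB stack */
    return l;
}
```

With 10 bytes of .data and 6 bytes of .bss at 0x80000000, this gives _data_end = 0x8000000C, _bss_end = 0x80000014 and _stack_top = 0x80001020. Comparing stack_top against the end of the 32 KB region is a quick way to tell whether the image still fits.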
KEEP(*(.text.entry)) tells the linker never to discard the entry section even if no other symbol references it. Without this, --gc-sections can silently strip your reset vector.
Loading & Jumping to the Application
Once initialisation is done, the bootloader reads the application header from a known flash address, checks it's valid, then jumps. The jump must be a true branch — not a function call — so the return address register is clean when the app starts.
#include <stdint.h>
#include <stddef.h>
#include "bootloader.h"   /* expected to declare uart_init(), uart_puts(), crc32() */

#define APP_FLASH_BASE 0x20010000UL
#define APP_MAGIC      0xC0DEBABEUL

/* Application header, placed at APP_FLASH_BASE by the app linker script */
typedef struct {
    uint32_t magic;         /* Must equal APP_MAGIC */
    uint32_t version;       /* Firmware semantic version */
    uint32_t entry_offset;  /* Offset from APP_FLASH_BASE to _start */
    uint32_t image_size;    /* Image size in bytes (excl. header) */
    uint32_t crc32;         /* CRC32 of image (excl. header) */
} AppHeader_t;

/* Jump to application: this function MUST NOT return */
static void __attribute__((noreturn)) jump_to_app(uint32_t entry_addr)
{
    /* Disable all interrupts before leaving bootloader */
    __asm__ volatile("csrw mstatus, zero");

    /* Clear ra, sp, gp and tp so the app starts without bootloader state */
    __asm__ volatile(
        "mv t0, %0\n"       /* Copy target first: %0 may have been placed in ra */
        "mv ra, zero\n"
        "mv sp, zero\n"     /* App sets its own stack */
        "mv gp, zero\n"
        "mv tp, zero\n"
        "jr t0\n"           /* Bare jump, no return address */
        :
        : "r"(entry_addr)
        : "t0", "memory"
    );
    __builtin_unreachable();
}

void bootloader_main(void)
{
    uart_init();
    uart_puts("[BL] Stage 1 bootloader started\r\n");

    const AppHeader_t *hdr = (const AppHeader_t *)APP_FLASH_BASE;

    /* Validate magic number */
    if (hdr->magic != APP_MAGIC) {
        uart_puts("[BL] ERROR: bad magic - no valid image\r\n");
        while (1) { }
    }

    /* Validate CRC32 over the image that follows the header */
    uint32_t calc_crc = crc32(
        (const uint8_t *)(APP_FLASH_BASE + sizeof(AppHeader_t)),
        hdr->image_size
    );
    if (calc_crc != hdr->crc32) {
        uart_puts("[BL] ERROR: CRC mismatch - image corrupt\r\n");
        while (1) { }
    }

    uint32_t entry = APP_FLASH_BASE + hdr->entry_offset;
    uart_puts("[BL] Image valid - jumping to application\r\n");
    jump_to_app(entry);
}
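The magic check alone does not stop a corrupt image_size from sending the CRC loop far past the application slot, so it may be worth validating the header fields before reading the image body. The sketch below is one way to do that. It assumes that image_size counts only the bytes after the header and that the application slot is 64 KB; the tutorial does not state the slot size, so APP_SLOT_SIZE is a placeholder, and header_is_sane() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdbool.h>

#define APP_MAGIC      0xC0DEBABEUL
#define APP_SLOT_SIZE  (64UL * 1024)   /* ASSUMPTION: slot size is not given in the tutorial */

typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t entry_offset;   /* from the start of the header */
    uint32_t image_size;     /* assumed to exclude the header */
    uint32_t crc32;
} AppHeader_t;

/* Structural checks that are safe to run before touching the image body. */
static bool header_is_sane(const AppHeader_t *hdr)
{
    const uint32_t hdr_size    = (uint32_t)sizeof(AppHeader_t);
    const uint32_t max_payload = (uint32_t)APP_SLOT_SIZE - hdr_size;

    if (hdr->magic != APP_MAGIC)        return false;
    if (hdr->image_size == 0)           return false;
    if (hdr->image_size > max_payload)  return false;  /* CRC would read past the slot */
    if (hdr->entry_offset < hdr_size)   return false;  /* entry inside the header */
    if (hdr->entry_offset >= hdr_size + hdr->image_size)
        return false;                                  /* entry past the image */
    if (hdr->entry_offset % 4 != 0)     return false;  /* rv32ima code is 4-byte aligned */
    return true;
}
```

Calling this right after the magic test means the CRC is only ever computed over a range that is known to lie inside the slot.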
Image Validation with CRC32
Never jump to an image without validating it. A power cut during a flash write can leave a half-written binary — without a CRC check, your bootloader will hand control to garbage and the device bricks. This is a compact CRC32 suitable for bootloader use.
#include <stdint.h>
#include <stddef.h>

/* Compact table-less CRC32: avoids the 1 KB lookup table at the cost of
 * eight shift/XOR steps per input byte */
uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFUL;
    while (len--) {
        crc ^= *data++;
        for (int i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFUL;
}

/* Generate header CRC when building the image (skips the 20-byte AppHeader_t):
   python3 -c "import binascii,sys; d=open(sys.argv[1],'rb').read()[20:]; print(hex(binascii.crc32(d)&0xffffffff))" app.bin
*/
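A CRC routine is easy to get subtly wrong, so it is worth checking it against a known answer: the standard check value for this CRC-32 variant over the ASCII string "123456789" is 0xCBF43926. The sketch below also splits the computation into an update step and a final XOR, which lets a bootloader process flash in chunks. The names crc32_update and crc32_final are hypothetical and not part of the tutorial's code.

```c
#include <stdint.h>
#include <stddef.h>

#define CRC32_INIT 0xFFFFFFFFUL

/* Feed len bytes into a running CRC state (same bitwise algorithm as above). */
static uint32_t crc32_update(uint32_t state, const uint8_t *data, size_t len)
{
    while (len--) {
        state ^= *data++;
        for (int i = 0; i < 8; i++) {
            state = (state & 1u) ? (state >> 1) ^ 0xEDB88320UL : state >> 1;
        }
    }
    return state;
}

/* Apply the final XOR to turn a running state into the CRC value. */
static uint32_t crc32_final(uint32_t state)
{
    return state ^ 0xFFFFFFFFUL;
}
```

Feeding "1234" and then "56789" through crc32_update() and finishing with crc32_final() gives the same 0xCBF43926 as a single pass, which is the property that makes chunked validation safe.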
# Makefile: build the bootloader and run it under QEMU
# (recipe lines must be indented with a tab)
CC      = riscv64-unknown-elf-gcc
# Toolchains that treat CSR instructions as the separate Zicsr extension
# need -march=rv32ima_zicsr here.
CFLAGS  = -march=rv32ima -mabi=ilp32 -O2 -Wall \
          -ffreestanding -nostdlib -nostartfiles
LDFLAGS = -T bootloader.ld -Wl,--gc-sections
SRCS    = startup.S bootloader_main.c crc32.c uart.c
TARGET  = bootloader.elf

all: $(TARGET)

$(TARGET): $(SRCS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
	riscv64-unknown-elf-objcopy -O binary $@ bootloader.bin
	@echo "Size:"; riscv64-unknown-elf-size $@

run: $(TARGET)
	qemu-system-riscv32 -machine virt -nographic \
		-bios none \
		-device loader,file=bootloader.bin,addr=0x20000000 \
		-device loader,file=app.bin,addr=0x20010000

debug: $(TARGET)
	qemu-system-riscv32 -machine virt -nographic -bios none \
		-device loader,file=bootloader.bin,addr=0x20000000 \
		-device loader,file=app.bin,addr=0x20010000 \
		-s -S &
	riscv64-unknown-elf-gdb bootloader.elf \
		-ex "target remote :1234" -ex "load"

clean:
	rm -f *.elf *.bin
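The run target loads app.bin at 0x20010000, so that file has to begin with the 20-byte header the bootloader expects. A host-side packing tool could serialise the header as sketched below. It writes each field least-significant byte first, matching how a little-endian RV32 core reads AppHeader_t, and takes the CRC as a parameter so that any CRC32 implementation can supply it. The function names are hypothetical.

```c
#include <stdint.h>

#define APP_MAGIC        0xC0DEBABEUL
#define APP_HEADER_SIZE  20u   /* five uint32_t fields */

/* Store a 32-bit value least-significant byte first. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill out[0..19] in AppHeader_t field order. */
static void write_app_header(uint8_t out[APP_HEADER_SIZE], uint32_t version,
                             uint32_t entry_offset, uint32_t image_size,
                             uint32_t crc)
{
    put_le32(out + 0,  APP_MAGIC);
    put_le32(out + 4,  version);
    put_le32(out + 8,  entry_offset);
    put_le32(out + 12, image_size);
    put_le32(out + 16, crc);
}
```

Writing the bytes explicitly, rather than dumping a struct with fwrite(), keeps the output identical regardless of the endianness or struct padding of the build machine.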
Exercises
- Add a 3-second UART countdown before jumping — pressing any key drops into a recovery shell
- Store two app slots in flash (A/B) and implement a boot counter that falls back to slot B after 3 failed boots of slot A
- Replace the table-less CRC32 with a hardware CRC peripheral if your target has one — compare code size and speed
- Port the bootloader from QEMU virt to a real RISC-V board (ESP32-C3 or GD32VF103) — document the memory map differences
- Add an ECDSA signature check on the app header to prevent unsigned firmware from running
Go Deeper
The next step is implementing virtual memory on RISC-V — page tables, TLB flushes, and supervisor mode context switches.