RISC-V From Zero · Standalone Tutorial

Writing Your Own Bootloader from Scratch

RISC-V · Bare Metal · Advanced
1:04:12
RISC-V · QEMU · GCC Bare Metal
01

Overview

This tutorial walks you through writing a complete, working bootloader for RISC-V from an empty file. No HAL, no SDK: just a RISC-V toolchain, a linker script, and a clear understanding of what the CPU does in its first instructions after power-on.

We target RISC-V on QEMU so you can follow along without hardware, then discuss how the same principles apply directly to real chips like the ESP32-C3, GD32VF103, and SiFive FE310.

  • Understand what happens between power-on and main()
  • Write RISC-V startup assembly from scratch — no crt0
  • Write a linker script that places code and data in the right memory regions
  • Load an application image from flash, validate it, and jump to it
  • Understand the difference between ROM bootloader, Stage 1, and Stage 2
02

What Is a Bootloader?

A bootloader is the first software that runs on a processor after reset. Its job is simple in theory: initialise just enough hardware, find the application image, validate it, and hand control over. In practice, it has to do all of this without any OS, without a heap, and often without even a stack — until it sets one up itself.

On many RISC-V chips, a ROM bootloader hardcoded by the vendor runs first, then hands off to your Stage 1 bootloader stored in flash. Stage 1 then loads and launches the main application, or a Stage 2 if a full update/recovery system is needed.

🔒 ROM Bootloader (vendor)
Hard-coded in chip ROM · Runs at reset · Cannot be modified
⚙️ Stage 1 — YOUR bootloader
Stored in flash · Initialises SRAM · Validates app image
📦 Stage 2 (optional)
OTA update logic · Recovery mode · Key verification
🚀 Application / OS
Your firmware · FreeRTOS · Linux kernel
03

Boot Stages in Detail

1
Reset Vector
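At reset, the hart begins executing at the reset vector (0x00001000 on QEMU virt) in machine mode. Treat every general-purpose register as undefined until your code writes it. QEMU's reset stub does pass the hart ID in a0 and a device-tree pointer in a1, but this bootloader relies on neither.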
Machine Mode · No stack yet
2
Stack & BSS Initialisation
Your startup code sets up the stack pointer (sp), zeroes the .bss section, and copies .data from flash to SRAM.
Assembly · Before any C runs
3
Hardware Initialisation
Clock tree, UART for debug output, any GPIO needed for boot mode selection. Keep this absolutely minimal — your bootloader must be small and fast.
C code · Minimal HAL or bare register writes
4
Image Validation & Jump
Read the application header from flash, verify the magic number and CRC32, then perform a bare jump to the application's reset vector. Stack and registers are cleaned up before the jump.
C + inline assembly · Non-return
04

Memory Map

Before writing a single line of code you need to know exactly where everything lives. This is the layout we use for the QEMU virt machine — flash starts at 0x20000000, SRAM at 0x80000000.

// QEMU virt: RISC-V Memory Layout
Address      Region               Size
0x00001000   ROM Bootloader       4 KB
0x20000000   Stage 1 BL (.text)   64 KB
0x20010000   Application Image    960 KB
0x80000000   BL Stack + Data      32 KB
0x80008000   App SRAM             224 KB
⚠️
Bootloader and App must not share SRAM regions
Once the bootloader jumps to the application, all bootloader stack frames are invalid. The application's linker script must start its stack and data after the bootloader's reserved region, or the bootloader must fully tear down before jumping.
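
One way to keep the map honest is to encode it once in a shared header and let the compiler reject overlapping regions. The sketch below is an illustration, not part of the original project: the macro names and the regions_overlap helper are hypothetical, and the addresses and sizes are taken from the table above.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Region bounds taken from the memory map in this section.
   The macro names are hypothetical. */
#define BL_FLASH_BASE   0x20000000UL
#define BL_FLASH_SIZE   (64UL * 1024)
#define APP_FLASH_BASE  0x20010000UL
#define APP_FLASH_SIZE  (960UL * 1024)
#define BL_SRAM_BASE    0x80000000UL
#define BL_SRAM_SIZE    (32UL * 1024)
#define APP_SRAM_BASE   0x80008000UL
#define APP_SRAM_SIZE   (224UL * 1024)

/* Build fails if the bootloader region runs into the application region. */
static_assert(BL_FLASH_BASE + BL_FLASH_SIZE <= APP_FLASH_BASE,
              "bootloader flash overlaps application flash");
static_assert(BL_SRAM_BASE + BL_SRAM_SIZE <= APP_SRAM_BASE,
              "bootloader SRAM overlaps application SRAM");

/* Runtime variant: true if [a, a + a_len) and [b, b + b_len) share a byte.
   The sums are done in 64 bits so a region ending at 4 GB cannot wrap. */
static bool regions_overlap(uint32_t a, uint32_t a_len,
                            uint32_t b, uint32_t b_len)
{
    return a < (uint64_t)b + b_len && b < (uint64_t)a + a_len;
}
```

The static checks cost nothing at run time. The runtime helper is only useful if region bounds come from data, for example from an application header.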
05

Startup Assembly

This is the first code that runs — before any C, before any globals, before any stack. It must set up the machine-mode trap vector, initialise the stack pointer, clear BSS, copy initialised data from flash to SRAM, then call main().

01
startup.S — RISC-V reset entry
startup.S
ASM
# startup.S — RISC-V Stage 1 Bootloader Entry
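# Assembled with: riscv64-unknown-elf-gcc -march=rv32ima -mabi=ilp32
# (newer toolchains, roughly GCC 12 / binutils 2.38 onward, need
#  -march=rv32ima_zicsr for the csr* instructions)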

.section .text.entry
.global  _start
.type    _start, @function

_start:
    # 1. Disable interrupts at machine level
    csrw    mstatus, zero

    # 2. Set machine-mode trap vector (simple direct mode)
    la      t0, _trap_handler
    csrw    mtvec, t0

    # 3. Initialise stack pointer — top of BL SRAM region
    la      sp, _stack_top

    # 4. Zero the .bss section
    la      a0, _bss_start
    la      a1, _bss_end
.bss_loop:
    bgeu    a0, a1, .bss_done  # unsigned compare: these are addresses
    sw      zero, 0(a0)
    addi    a0, a0, 4
    j       .bss_loop
.bss_done:

    # 5. Copy .data from flash (LMA) to SRAM (VMA)
    la      a0, _data_lma      # source: load address in flash
    la      a1, _data_start    # dest:   run address in SRAM
    la      a2, _data_end
.data_loop:
    bgeu    a1, a2, .data_done # unsigned compare: these are addresses
    lw      t0, 0(a0)
    sw      t0, 0(a1)
    addi    a0, a0, 4
    addi    a1, a1, 4
    j       .data_loop
.data_done:

    # 6. Call C bootloader main — must not return
    call    bootloader_main

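    # 7. Halt if main returns (should never happen)
.hang:
    wfi
    j       .hang

# Minimal trap handler referenced in step 2: park the hart so a fault
# is easy to spot in a debugger. mtvec needs a 4-byte-aligned base.
.balign 4
_trap_handler:
    wfi
    j       _trap_handler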
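
For readers more comfortable in C, the two loops in steps 4 and 5 can be written as the pair of functions below. This is a host-testable sketch for checking the pointer arithmetic, not a replacement for the assembly: the function names are hypothetical, and on the target the bounds would come from the linker symbols rather than from test arrays.

```c
#include <stdint.h>

/* Zero every 32-bit word in [start, end), like the .bss loop in step 4. */
static void zero_words(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/* Copy words from src into [dst, dst_end), like the .data loop in step 5.
   src plays the role of the load address in flash, dst the run address. */
static void copy_words(const uint32_t *src, uint32_t *dst,
                       const uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;
}
```

Both loops test the bound before touching memory, so an empty section (start equal to end) is a no-op, which matches the branch-first structure of the assembly.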
06

Linker Script

The linker script tells GNU ld exactly where to place every section of the binary. It defines the symbols your startup assembly reads (_bss_start, _data_lma, _stack_top, etc.). Getting this wrong is the most common source of mysterious crashes.

02
bootloader.ld — linker script
bootloader.ld
LD
/* bootloader.ld — Stage 1 Bootloader Linker Script */
/* Target: QEMU virt RISC-V (rv32ima)               */

OUTPUT_ARCH(riscv)
ENTRY(_start)

MEMORY {
    FLASH (rx)  : ORIGIN = 0x20000000, LENGTH = 64K
    SRAM  (rwx) : ORIGIN = 0x80000000, LENGTH = 32K
}

SECTIONS {
    /* ── Code goes into flash ── */
    .text 0x20000000 : {
        KEEP(*(.text.entry))   /* _start must be first */
        *(.text .text.*)
        *(.rodata .rodata.* .srodata .srodata.*)  /* incl. RISC-V small read-only data */
        . = ALIGN(4);
        _data_lma = .;         /* LMA of .data in flash */
    } > FLASH

    /* ── Initialised data: lives in flash, runs in SRAM ── */
    .data : AT(_data_lma) {
        _data_start = .;
        *(.data .data.* .sdata .sdata.*)  /* RISC-V GCC puts small globals in .sdata */
        . = ALIGN(4);
        _data_end = .;
    } > SRAM

    /* ── Zero-initialised: SRAM only, no flash storage ── */
    .bss (NOLOAD) : {
        _bss_start = .;
        *(.bss .bss.* .sbss .sbss.*)      /* small zero-initialised globals go in .sbss */
        *(COMMON)
        . = ALIGN(4);
        _bss_end = .;
    } > SRAM

    /* ── Stack: 4 KB placed directly after .bss ── */
    .stack (NOLOAD) : {
        . = ALIGN(16);
        . += 4096;            /* 4KB stack */
        _stack_top = .;
    } > SRAM

    /* ── Discard unneeded sections ── */
    /DISCARD/ : { *(.comment) *(.note*) }
}
💡
KEEP prevents linker garbage collection
KEEP(*(.text.entry)) tells the linker never to discard the entry section even if no other symbol references it. Without this, --gc-sections can silently strip your reset vector.
07

Loading & Jumping to the Application

Once initialisation is done, the bootloader reads the application header from a known flash address, checks it's valid, then jumps. The jump must be a true branch — not a function call — so the return address register is clean when the app starts.

03
bootloader_main.c — load and jump
bootloader_main.c
C
#include <stdint.h>
#include "bootloader.h"   /* uart_init(), uart_puts(), crc32() */

#define APP_FLASH_BASE  0x20010000UL
#define APP_MAGIC       0xC0DEBABE

/* Application header — placed at APP_FLASH_BASE by app linker script */
typedef struct {
    uint32_t magic;          /* Must equal APP_MAGIC        */
    uint32_t version;        /* Firmware semantic version   */
    uint32_t entry_offset;   /* Offset from base to _start  */
    uint32_t image_size;     /* Payload size in bytes (excl. header) */
    uint32_t crc32;          /* CRC32 of image (excl. header)*/
} AppHeader_t;

/* Jump to application — this function MUST NOT return */
static void __attribute__((noreturn))
jump_to_app(uint32_t entry_addr) {
    /* Disable all interrupts before leaving bootloader */
    __asm__ volatile("csrw mstatus, zero");

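    /* Clear ra, sp, gp and tp so the app starts clean.
       The "ra" clobber stops the compiler from placing entry_addr
       in ra, which the first instruction overwrites. */
    __asm__ volatile(
        "mv  ra,  zero\n"
        "mv  sp,  zero\n"  /* App sets its own stack */
        "mv  gp,  zero\n"
        "mv  tp,  zero\n"
        "jr  %0\n"          /* Bare jump: no return addr */
        :: "r"(entry_addr)
        : "ra", "memory"
    );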
    __builtin_unreachable();
}

void bootloader_main(void) {
    uart_init();
    uart_puts("[BL] Stage 1 bootloader started\r\n");

    const AppHeader_t *hdr =
        (const AppHeader_t *)APP_FLASH_BASE;

    /* Validate magic number */
    if (hdr->magic != APP_MAGIC) {
        uart_puts("[BL] ERROR: bad magic — no valid image\r\n");
        while(1);
    }

    /* Validate CRC32 */
    uint32_t calc_crc = crc32(
        (const uint8_t *)(APP_FLASH_BASE + sizeof(AppHeader_t)),
        hdr->image_size
    );
    if (calc_crc != hdr->crc32) {
        uart_puts("[BL] ERROR: CRC mismatch — image corrupt\r\n");
        while(1);
    }

    uint32_t entry = APP_FLASH_BASE + hdr->entry_offset;
    uart_puts("[BL] Image valid — jumping to application\r\n");
    jump_to_app(entry);
}
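
The listing above trusts image_size and entry_offset as soon as the magic number matches. A corrupt header could then send the CRC loop far past the end of the slot, so it is worth bounding both fields first. The sketch below shows one way to do that. It rests on several assumptions: header_is_sane and APP_SLOT_SIZE are hypothetical names, the slot size is the 960 KB from the memory map, image_size is taken to be the payload length after the header, and the entry point is required to be 4-byte aligned because the build targets rv32ima without compressed instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define APP_MAGIC      0xC0DEBABEUL
#define APP_SLOT_SIZE  (960UL * 1024)   /* assumed: app slot from the memory map */

/* Same layout as the header in bootloader_main.c. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t entry_offset;   /* offset from slot base to the entry point */
    uint32_t image_size;     /* payload bytes following the header       */
    uint32_t crc32;
} AppHeader_t;

/* Hypothetical helper: true only if every field is safe to act on. */
static bool header_is_sane(const AppHeader_t *hdr)
{
    const uint32_t hdr_size = (uint32_t)sizeof(AppHeader_t);

    if (hdr->magic != APP_MAGIC)
        return false;

    /* Payload must be non-empty and fit in the slot after the header. */
    if (hdr->image_size == 0 || hdr->image_size > APP_SLOT_SIZE - hdr_size)
        return false;

    /* Entry point must lie inside the payload. No overflow is possible
       here because image_size was bounded above. */
    if (hdr->entry_offset < hdr_size ||
        hdr->entry_offset >= hdr_size + hdr->image_size)
        return false;

    /* Assumed: 4-byte instruction alignment (no C extension). */
    if (hdr->entry_offset & 3u)
        return false;

    return true;
}
```

In bootloader_main this check would run in place of the bare magic comparison, before crc32() is called, so that the CRC only ever reads bytes inside the application slot.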
08

Image Validation with CRC32

Never jump to an image without validating it. A power cut during a flash write can leave a half-written binary — without a CRC check, your bootloader will hand control to garbage and the device bricks. This is a compact CRC32 suitable for bootloader use.

04
crc32.c — compact implementation
crc32.c
C
#include <stdint.h>
#include <stddef.h>

/* Compact table-less CRC32: no 1 KB lookup table, at the cost of
   8 shift/XOR iterations per input byte */
uint32_t crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFUL;

    while (len--) {
        crc ^= *data++;
        for (int i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }

    return crc ^ 0xFFFFFFFFUL;
}

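/* Generate header CRC when building the image. The header is
   5 x 4 = 20 bytes, so the CRC covers everything from byte 20 on.
   Run as a single line:
   python3 -c "import binascii,sys; d=open(sys.argv[1],'rb').read()[20:]; print(hex(binascii.crc32(d)&0xffffffff))" app.bin
*/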
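
If the image cannot be read as one contiguous buffer (for example when flash is read through a small page buffer), the same CRC can be computed chunk by chunk. The sketch below is a hypothetical incremental variant, crc32_update, which is not part of the original source. It uses the same reflected polynomial as crc32() above, and the standard CRC-32 check value for the ASCII string "123456789", 0xCBF43926, is a convenient way to confirm that an implementation matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed one chunk into a running CRC-32.
   Pass crc = 0 for the first chunk, then the previous return value. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;   /* undo the final inversion from the previous call */

    while (len--) {
        crc ^= *data++;
        for (int i = 0; i < 8; i++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }

    return ~crc;  /* final inversion, same as XOR with 0xFFFFFFFF */
}
```

Calling crc32_update(0, data, len) once gives the same result as crc32(data, len), so the header value produced at build time does not change.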
05
Makefile — build bootloader + run on QEMU
Makefile
MAKE
CC      = riscv64-unknown-elf-gcc
# Newer toolchains (roughly GCC 12 / binutils 2.38 onward) need rv32ima_zicsr here
CFLAGS  = -march=rv32ima -mabi=ilp32 -O2 -Wall \
          -ffreestanding -nostdlib -nostartfiles
LDFLAGS = -T bootloader.ld -Wl,--gc-sections

SRCS    = startup.S bootloader_main.c crc32.c uart.c
TARGET  = bootloader.elf

all: $(TARGET)

$(TARGET): $(SRCS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
	riscv64-unknown-elf-objcopy -O binary $@ bootloader.bin
	@echo "Size:"; riscv64-unknown-elf-size $@

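# Note: with -bios none, QEMU virt's reset stub at 0x1000 may jump to
# 0x80000000 unless a pflash drive is attached. If the bootloader never
# prints, check where the hart actually starts (e.g. with the debug target).
run: $(TARGET)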
	qemu-system-riscv32 -machine virt -nographic \
	  -bios none \
	  -device loader,file=bootloader.bin,addr=0x20000000 \
	  -device loader,file=app.bin,addr=0x20010000

debug: $(TARGET)
	qemu-system-riscv32 -machine virt -nographic -bios none \
	  -device loader,file=bootloader.bin,addr=0x20000000 \
	  -device loader,file=app.bin,addr=0x20010000 \
	  -s -S &
	riscv64-unknown-elf-gdb bootloader.elf \
	  -ex "target remote :1234" -ex "load"

clean:
	rm -f *.elf *.bin
09

Exercises

  • Add a 3-second UART countdown before jumping — pressing any key drops into a recovery shell
  • Store two app slots in flash (A/B) and implement a boot counter that falls back to slot B after 3 failed boots of slot A
  • Replace the table-less CRC32 with a hardware CRC peripheral if your target has one — compare code size and speed
  • Port the bootloader from QEMU virt to a real RISC-V board (ESP32-C3 or GD32VF103) — document the memory map differences
  • Add an ECDSA signature check on the app header to prevent unsigned firmware from running

Go Deeper

The next step is implementing virtual memory on RISC-V — page tables, TLB flushes, and supervisor mode context switches.