0. Introduction
I’ll write a simple boot program with the goal of running “hello world” on real hardware. A friend gave me a Raspberry Pi that has just been sitting around, so it seems like a good toy to play with. This post is mostly based on http://wiki.osdev.org/ARM_RaspberryPi_Tutorial_C, so please read the original if you want the details. The first half covers preparing the emulator and the cross compiler. I’ve tried to make everything work if you just copy and paste.
The target is the Raspberry Pi Model B. (It’s the only one I have) From Wikipedia:
- CPU: 700 MHz / ARM1176JZF-S core (ARM11 family)
- SoC: BCM2835
- Memory: 512MB
- Instead of built-in SSD or HD, it uses an SD card as storage.
- It also boots from the SD card.
1. Emulator
First, use qemu to confirm that Raspbian works.
The qemu I already had installed lists arm1176 under qemu-system-arm -cpu ?,
but no BCM2835 board appears under qemu-system-arm -M ?,
so I’ll use versatilepb instead.
Be careful: this means the hardware addresses (GPIO and UART) differ
between the real hardware and the emulator.
Specify the CPU type with -cpu, memory size (MB) with -m, image file with -hda,
and options with -append.
Specify the kernel file in the host environment with -kernel.
The Raspbian image file is available here:
http://www.raspberrypi.org/downloads
$ wget http://xecdesign.com/downloads/linux-qemu/kernel-qemu
$ wget http://files.velocix.com/c1410/images/raspbian/2012-10-28-wheezy-raspbian/2012-10-28-wheezy-raspbian.zip
$ unzip 2012-10-28-wheezy-raspbian.zip
$ qemu-system-arm \
-kernel kernel-qemu \
-M versatilepb \
-cpu arm1176 -m 256 \
-append "root=/dev/sda2 panic=1" -hda 2012-10-28-wheezy-raspbian.img
2. Cross Compiler
First, install binutils, a collection of tools such as the assembler and linker, and then build gcc. I’ll build newlib together with gcc in case I need to port something in the future. This step is probably the most difficult. I had to redo it many times orz
Set TARGET to arm-none-eabi, and PREFIX to /usr/local/cross-pi.
$ export LDFLAGS="-L/opt/local/lib"
$ export CFLAGS="-I/usr/local/include -O2"
$ cd /usr/local
$ mkdir cross-pi
$ cd /usr/local/src
$ wget http://ftp.gnu.org/gnu/binutils/binutils-2.23.tar.gz
$ tar xvf binutils-2.23.tar.gz
$ cd binutils-2.23
$ ./configure --target=arm-none-eabi --prefix=/usr/local/cross-pi
$ make
$ make install
$ cd /usr/local/src
$ wget http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-4.8.2/gcc-4.8.2.tar.gz
$ wget ftp://sourceware.org/pub/newlib/newlib-1.20.0.tar.gz
$ tar xvf gcc-4.8.2.tar.gz
$ tar xvf newlib-1.20.0.tar.gz
$ cd gcc-4.8.2
$ ln -s ../newlib-1.20.0/newlib .
$ mkdir work
$ cd work
$ ../configure --prefix=/usr/local/cross-pi --target=arm-none-eabi --enable-multilib --with-newlib --enable-languages="c,c++" --enable-interwork
$ make
$ make install
3. Kernel
It consists of the following files:
- include/mmio.h
- include/uart.h
- link-arm-eabi.ld Linker script
- boot.S Boot
- main.c Entry point
- uart.c UART initialization and communication
- syscalls.c System call contents (for newlib)
- Makefile
3.1 include/mmio.h
Peripherals such as the external pins are mapped into memory (memory-mapped I/O), so these are functions to read and write their registers. I’ve added inline so that the calls are expanded in place, and volatile so that the compiler does not optimize the accesses away.
#ifndef MMIO_H
#define MMIO_H
#include <stdint.h>
/* Write to MMIO */
static inline void mmio_write(uint32_t reg, uint32_t data) {
uint32_t *ptr = (uint32_t*)reg;
asm volatile("__mmio_write_%=: str %[data], [%[reg]]"
: : [reg]"r"(ptr), [data]"r"(data));
}
/* Read from MMIO */
static inline uint32_t mmio_read(uint32_t reg) {
uint32_t *ptr = (uint32_t*)reg;
uint32_t data;
asm volatile("ldr %[data], [%[reg]]"
: [data]"=r"(data) : [reg]"r"(ptr));
return data;
}
#endif
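For comparison, the same kind of accessor is often written with a plain volatile pointer instead of inline assembly. The following is only a minimal sketch, not the code used in this article: the names reg_write32 and reg_read32 are hypothetical, and the address is taken as uintptr_t so that the functions can also be tried on a host machine against an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical alternative to mmio_write(): store through a volatile
 * pointer so the compiler cannot drop or merge the access. */
static inline void reg_write32(uintptr_t addr, uint32_t data) {
    *(volatile uint32_t *)addr = data;
}

/* Hypothetical alternative to mmio_read(): load through a volatile pointer. */
static inline uint32_t reg_read32(uintptr_t addr) {
    return *(volatile uint32_t *)addr;
}
```

On the target, addr would be a register address such as UART0_DR; on a host, the address of a local uint32_t works as a stand-in register.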
3.2 include/uart.h
Define the specific MMIO addresses. The IS_EMULATE switch is ugly, sorry. (I’ll move it into the Makefile.) For the real hardware, refer to http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf. Pages 5 and 6 explain that a bus address of the form 0x7Ennnnnn corresponds to the physical address 0x20nnnnnn. Page 89 has the GPIO base address, and from page 175 onward there are the UART base address and the offsets for DR, RSRECR, and so on. (The UART offsets might be common to the real hardware and versatilepb.) For the emulator, see section 4.1 Memory Map of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0224i/Chdbeibh.html and the pages around it.
#ifndef UART_H
#define UART_H
/*
* 1: Emulator
* 0: Real hardware
*/
#define IS_EMULATE 1
#include <stdint.h>
enum {
#if IS_EMULATE == 1
/* Emulator */
GPIO_BASE = 0x101E4000,
UART0_BASE = 0x101F1000,
#else
/* Real hardware */
GPIO_BASE = 0x20200000,
UART0_BASE = 0x20201000,
#endif
/* Controls actuation of pull up/down to ALL GPIO pins. */
GPPUD = (GPIO_BASE + 0x94),
/* Controls actuation of pull up/down for specific GPIO pin. */
GPPUDCLK0 = (GPIO_BASE + 0x98),
/* UART0 */
UART0_DR = (UART0_BASE + 0x00),
UART0_RSRECR = (UART0_BASE + 0x04),
UART0_FR = (UART0_BASE + 0x18),
UART0_ILPR = (UART0_BASE + 0x20),
UART0_IBRD = (UART0_BASE + 0x24),
UART0_FBRD = (UART0_BASE + 0x28),
UART0_LCRH = (UART0_BASE + 0x2C),
UART0_CR = (UART0_BASE + 0x30),
UART0_IFLS = (UART0_BASE + 0x34),
UART0_IMSC = (UART0_BASE + 0x38),
UART0_RIS = (UART0_BASE + 0x3C),
UART0_MIS = (UART0_BASE + 0x40),
UART0_ICR = (UART0_BASE + 0x44),
UART0_DMACR = (UART0_BASE + 0x48),
UART0_ITCR = (UART0_BASE + 0x80),
UART0_ITIP = (UART0_BASE + 0x84),
UART0_ITOP = (UART0_BASE + 0x88),
UART0_TDR = (UART0_BASE + 0x8C),
};
void uart_init();
void uart_putc(uint8_t byte);
void uart_puts(const char *str);
uint8_t uart_getc();
#endif
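The 0x7Ennnnnn to 0x20nnnnnn rule described above can be written as a small helper. This is a sketch under the assumption that the address really lies in the peripheral window; the name bus_to_phys and the macro names are mine, not from the datasheet.

```c
#include <stdint.h>

#define PHYS_PERIPH_BASE 0x20000000u  /* ARM physical base of the peripherals */
#define PERIPH_MASK      0x00FFFFFFu  /* the "nnnnnn" part of the address */

/* Hypothetical helper: convert a 0x7Ennnnnn bus address from the datasheet
 * into the 0x20nnnnnn physical address the ARM core uses.
 * Assumes the input is inside the peripheral window; it is not checked. */
static inline uint32_t bus_to_phys(uint32_t bus_addr) {
    return PHYS_PERIPH_BASE | (bus_addr & PERIPH_MASK);
}
```

For example, a datasheet address of 0x7E201000 comes out as 0x20201000, which matches UART0_BASE for the real hardware in uart.h.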
3.3 link-arm-eabi.ld
This is a linker script.
It determines the memory layout.
ENTRY(Start) represents the entry point of the kernel image.
(The symbol Start is defined in boot.S.)
Since we convert to elf first and then to raw binary, it seems better for the linker this way.
(I don’t really understand)
“.” is the current memory address.
Place the .text.boot section at the beginning of the text segment.
Place the rest of the text segment after it.
_text_start, _text_end, etc. are declared explicitly so that they can be referenced from source code.
In practice, boot.S uses _bss_start and _bss_end to zero the bss. (This is where variables initialized to 0 live.)
. = ALIGN(4096) aligns the current address to the next 4096-byte boundary.
This keeps different segments from sharing a page, which would cause permission problems.
ENTRY(Start)
SECTIONS
{
/*
* Starts at LOADER_ADDR.
* 0x8000 ... Real hardware
* 0x10000 ... qemu
* */
/* . = 0x8000; */
. = 0x10000;
_start = .;
_text_start = .;
.text :
{
KEEP(*(.text.boot)) /* .text.boot section at the beginning of text segment */
*(.text) /* Place other .text sections after it */
}
. = ALIGN(4096);
_text_end = .;
_rodata_start = .;
.rodata :
{
*(.rodata)
}
. = ALIGN(4096);
_rodata_end = .;
_data_start = .;
.data :
{
*(.data)
}
. = ALIGN(4096);
_data_end = .;
_bss_start = .;
.bss :
{
bss = .;
*(.bss)
}
. = ALIGN(4096);
_bss_end = .;
end = .;
_end = .;
}
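The effect of . = ALIGN(4096) on an address can be reproduced in C. This is a minimal sketch of the usual round-up arithmetic, assuming the alignment is a power of two; align_up is a hypothetical name.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumes align is a power of two (4096 in the linker script). */
static inline uint32_t align_up(uint32_t addr, uint32_t align) {
    return (addr + align - 1u) & ~(align - 1u);
}
```

An address that is already aligned stays where it is, so a section that ends exactly on a page boundary does not waste an extra page.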
3.4 boot.S
Set up the .text.boot section to be placed at the beginning of the text segment. Then, set sp appropriately, zero-clear the bss area, and jump to kernel_main (C language).
.section ".text.boot"
.globl Start
/*
* Entry point for the kernel.
* r15 -> Program counter (0x8000 for real hardware, 0x10000 for emulator)
* r0 -> 0x00000000
* r1 -> 0x00000C42
* r2 -> 0x00000100 ATAGS
* Preserve r0-r2; they are passed on to kernel_main as arguments
*/
Start:
/* Initialize stack pointer */
mov sp, #0x8000
/* Zero clear bss */
ldr r4, =_bss_start
ldr r9, =_bss_end
mov r5, #0
mov r6, #0
mov r7, #0
mov r8, #0
b 2f
1:
/* Store r5-r8 to address r4, increment r4
* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0170b/BABEFCIB.html
*/
stmia r4!, {r5-r8}
/* Loop if less than bss_end */
2:
cmp r4, r9
blo 1b
/* Jump to kernel_main */
ldr r3, =kernel_main
blx r3
halt:
wfe
b halt
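For readers who find the stmia loop hard to follow, here is a rough C equivalent of the bss clearing loop. It is only an illustration (the real clearing has to happen in assembly before any C code runs), and clear_words is a hypothetical name. Like the assembly, it writes four words per iteration and so assumes the region size is a multiple of 16 bytes, which the 4096-byte alignment in the linker script guarantees.

```c
#include <stdint.h>

/* Zero the words in [start, end), four at a time, mirroring
 * "stmia r4!, {r5-r8}" followed by "cmp r4, r9; blo 1b".
 * Assumes (end - start) is a multiple of 4 words. */
static void clear_words(uint32_t *start, uint32_t *end) {
    uint32_t *p = start;
    while (p < end) {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;  /* same as the write-back "!" on r4 */
    }
}
```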
3.5 main.c
Since I went to the trouble of including newlib, write() and strlen() can be used without problems, although I have to provide the implementation behind write() myself. (I’ll explain this later.) The GPU-side bootloader(?) passes parameters to the kernel in registers r0-r2, and boot.S hands them on to kernel_main.
- r0: Basically 0
- r1: ARM Linux Machine Type. The Raspberry Pi uses the Broadcom BCM2835 SoC (a chip combining GPU, CPU, DSP, and SDRAM). Since the BCM2835 belongs to the BCM2708 family, r1 is set to 3138 (0xC42), the machine type for BCM2708.
- r2: ATAGS information is stored.
#include <stdint.h>
#include <string.h> /* strlen */
#include <unistd.h> /* write */
#include <uart.h>
#include <stdlib.h>
const char *hello="Hello, cinderella of the tea plantation.\n";
const char *halt="** halt **\n";
void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags) {
(void)r0;
(void)r1;
(void)atags;
uart_init();
uart_puts(hello);
write(0,hello,strlen(hello));
// Wait a bit
for(volatile int i = 0; i < 10000000; ++i) { }
uart_puts(halt);
while(1){
uart_putc(uart_getc());
}
}
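kernel_main ignores the atags argument, but the list it points to can be walked with a few lines of C. The following is a sketch based on my understanding of the ARM Linux boot convention, not something taken from this article's code: each tag starts with its size in 32-bit words (header included) and a tag ID, the list ends with ATAG_NONE, and an ATAG_MEM tag carries a size followed by a start address. The helper name atags_mem_size is hypothetical.

```c
#include <stdint.h>

#define ATAG_NONE 0x00000000u  /* terminates the list */
#define ATAG_CORE 0x54410001u  /* first tag in the list */
#define ATAG_MEM  0x54410002u  /* describes one memory region */

/* Hypothetical helper: add up the sizes of all ATAG_MEM regions.
 * Word 0 of each tag is its length in 32-bit words, word 1 is the tag ID.
 * For ATAG_MEM, word 2 is the region size and word 3 its start address. */
static uint32_t atags_mem_size(const uint32_t *atags) {
    uint32_t total = 0;
    const uint32_t *p = atags;
    for (;;) {
        uint32_t size = p[0];
        uint32_t tag = p[1];
        if (tag == ATAG_NONE || size < 2u) {
            break;  /* end of list, or a malformed tag */
        }
        if (tag == ATAG_MEM && size >= 4u) {
            total += p[2];
        }
        p += size;  /* advance to the next tag */
    }
    return total;
}
```

On the real hardware this would be called as atags_mem_size((const uint32_t *)atags) from kernel_main, assuming the firmware actually placed a list at the address in r2.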
3.6 uart.c
UART initialization and communication functions. The initialization procedure is around page 101 of the BCM2835 ARM Peripherals (PDF document). It says to wait 150 cycles.
#include<stdint.h>
#include<mmio.h>
#include<uart.h>
/*
* Wait for count cycles
*
* Don't optimize the loop
*
* 00000030 <__delay_16>:
* 30: e2533001 subs r3, r3, #1
* 34: 1afffffd bne 30 <__delay_16>
*/
static void delay(int32_t count) {
asm volatile("__delay_%=: subs %[count], %[count], #1; bne __delay_%=\n"
: [count]"+r"(count) : : "cc"); /* "+r": count is both read and modified */
}
/*
* UART0 initialization
*
* 1. Enable GPIO TXD and RXD
* 2. Calculate baud rate
* 3. Enable interrupts
*/
void uart_init() {
/* Disable UART0, set Control Register to 0 */
mmio_write(UART0_CR, 0x00000000);
/* Initialize GPIO pins 14, 15
* Used for TXD and RXD for serial communication (from p102)
* GPIO has 6 types of assignments. ALT0 is used here
* GPIO pin 14: TXD0
* GPIO pin 15: RXD0
*/
/* Disable GP pull-up/down and wait 150 cycles */
mmio_write(GPPUD, 0x00000000);
delay(150);
/* Set bits 14 and 15 of GP pull-up/down and wait 150 cycles */
mmio_write(GPPUDCLK0, (1 << 14) | (1 << 15));
delay(150);
/* Set GPPUDCLK0 to 0 */
mmio_write(GPPUDCLK0, 0x00000000);
/* Interrupt Clear Register: clear pending interrupts */
mmio_write(UART0_ICR, 0x7FF);
/* Calculate baud rate (used for serial communication) integer and fractional parts
*
* Divider = UART_CLOCK/(16 * Baud)
* Fraction part register = (Fractional part * 64) + 0.5
* UART_CLOCK = 3000000; Baud = 115200.
*
* Divider = 3000000/(16 * 115200) = 1.627 = ~1.
* Fractional part register = (.627 * 64) + 0.5 = 40.6 = ~40.
*
*/
mmio_write(UART0_IBRD, 1);
mmio_write(UART0_FBRD, 40);
/* Enable FIFO and 8-bit communication.
* Bit 4: Enable FIFO
* Bits 5,6: 11 means 8 bits/frame
*/
mmio_write(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6));
/* Enable all interrupts
* 1:enable 0:disable
* Bit 1: nUARTCTS modem interrupt
* Bit 4: receive interrupt
* Bit 5: transmit interrupt
* Bit 6: receive timeout interrupt
* Bit 7: frame error interrupt
* Bit 8: parity error interrupt
* Bit 9: Break error interrupt
* Bit 10: overrun error interrupt
*/
mmio_write(UART0_IMSC, (1 << 1) | (1 << 4) | (1 << 5) |
(1 << 6) | (1 << 7) | (1 << 8) |
(1 << 9) | (1 << 10));
/* Bit 0: Enable UART
* Bit 8: Enable transmit
* Bit 9: Enable receive
*/
mmio_write(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));
}
/*
* Serial transmission of 1 byte
*/
void uart_putc(uint8_t byte) {
/* FR bit 5: 1 if transmit FIFO is full */
while (1) {
if (!(mmio_read(UART0_FR) & (1 << 5))) {
break;
}
}
mmio_write(UART0_DR, byte);
}
/*
* Serial transmission of null-terminated string
*/
void uart_puts(const char *str) {
while (*str) {
uart_putc(*str++);
}
}
/*
* Serial reception of 1 byte
*/
uint8_t uart_getc() {
/* FR bit 4: 1 if receive FIFO is empty */
while(1) {
if (!(mmio_read(UART0_FR) & (1 << 4))) {
break;
}
}
return mmio_read(UART0_DR);
}
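The baud rate arithmetic in the uart_init() comment can be checked with integer math on a host machine. This is a minimal sketch, assuming the same 3 MHz UART clock as the comment; the names uart_divisor and uart_calc_divisor are hypothetical.

```c
#include <stdint.h>

/* Integer and fractional parts as written to UART0_IBRD and UART0_FBRD. */
struct uart_divisor {
    uint32_t ibrd;
    uint32_t fbrd;
};

/* Divider = clock / (16 * baud), kept as a fixed-point value with 6
 * fractional bits: 64 * clock / (16 * baud) = 4 * clock / baud.
 * Adding baud / 2 before dividing rounds to nearest, which corresponds
 * to the "+ 0.5" in the comment. Assumes 4 * uart_clock fits in 32 bits. */
static struct uart_divisor uart_calc_divisor(uint32_t uart_clock, uint32_t baud) {
    uint32_t div64 = (4u * uart_clock + baud / 2u) / baud;
    struct uart_divisor d;
    d.ibrd = div64 >> 6;     /* integer part */
    d.fbrd = div64 & 0x3Fu;  /* 6-bit fractional part */
    return d;
}
```

With uart_clock = 3000000 and baud = 115200 this gives ibrd = 1 and fbrd = 40, the same values written in uart_init().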
3.7 syscalls.c
Write the system calls that newlib needs. I added stubs for whichever ones the linker reported as missing. For now, write() is set up to transmit over the serial port.
#include <errno.h>
#include <uart.h>
#undef errno
extern int errno;
/*
* send message to uart0 (serial port)
*/
int _write(int fd, char *ptr, int len){
int i=0;
while(i<len && ptr[i]!='\0')
uart_putc(ptr[i++]);
return i;
}
int _close(int file) {
return -1;
}
int _fstat(){
return -1;
}
int _sbrk(){
return -1;
}
int _kill(){
return -1;
}
int _exit(){
return -1;
}
int _getpid(){
return -1;
}
int _gettimeofday(){
return -1;
}
int _isatty(){
return -1;
}
int _lseek(){
return -1;
}
int _read(){
return -1;
}
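The _sbrk() stub above always returns -1, so anything in newlib that needs heap memory (malloc(), for example) would presumably fail. A common way to fill it in is a simple bump allocator. The following is a sketch only, written so that it can be tried on a host: the name sbrk_sketch is hypothetical, and the static array stands in for a heap that, on the target, would more likely start at the end symbol from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in heap for the sketch; the size is an arbitrary assumption. */
static uint8_t heap_area[4096];
static size_t heap_used = 0;

/* Hypothetical bump allocator in the shape newlib expects from _sbrk():
 * returns the previous break on success, (void *)-1 when out of space.
 * Shrinking the heap (negative incr) is not supported in this sketch. */
static void *sbrk_sketch(ptrdiff_t incr) {
    if (incr < 0 || (size_t)incr > sizeof(heap_area) - heap_used) {
        return (void *)-1;
    }
    void *prev = &heap_area[heap_used];
    heap_used += (size_t)incr;
    return prev;
}
```

Two successive calls return adjacent regions, and a request larger than the remaining space fails without changing the break.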
3.8 Makefile
The conversion flow is: source files (*.c, *.S) -> object files (*.o) -> executable file (*.elf) -> binary file (*.img).
For CFLAGS, specify -fpic, -nostdlib, -nostartfiles, -ffreestanding, and -nodefaultlibs, as is usual for embedded/kernel targets. This page is useful for converting between elf, raw binary, and assembler: http://d.hatena.ne.jp/ken_2501jp/20121107/1352311439
PREFIX = /usr/local/cross-pi
ARMGNU = $(PREFIX)/bin/arm-none-eabi
QEMU = /usr/local/bin/qemu-system-arm
# Source files
SOURCES_ASM := $(wildcard *.S)
SOURCES_C := $(wildcard *.c)
# Object files
# $(patsubst search-string, replacement-string, target-string)
OBJS := $(patsubst %.S,%.o,$(SOURCES_ASM))
OBJS += $(patsubst %.c,%.o,$(SOURCES_C))
# elf format
ELF := kernel.elf
# raw binary
BINARY := kernel.img
# list
LIST := kernel.list
# map
MAP := kernel.map
# Output dependencies
DEPENDFLAGS := -MD -MP
# Specify include path (for gcc)
INCLUDES := -I include -I /usr/local/src/mruby/include
# Specify library path (for ld) Order matters
# /usr/local/src/mruby/build/arm/lib/libmruby_core.a
LIBS := /usr/local/src/mruby/build/arm/lib/libmruby.a \
/usr/local/cross-pi/arm-none-eabi/lib/libc.a \
/usr/local/cross-pi/arm-none-eabi/lib/libm.a \
/usr/local/cross-pi/arm-none-eabi/lib/libg.a \
/usr/local/cross-pi/lib/gcc/arm-none-eabi/4.8.2/libgcc.a
# Base CFLAGS
#
# Don't depend on standard library or main function
# fpic: position independent code
# nostartfiles, ffreestanding, nodefaultlibs: for embedded/kernel
# fno-builtin: don't replace builtin functions
# fomit-frame-pointer: don't use frame pointer on function calls
#
# pedantic: removed because mruby headers use gcc extensions
BASEFLAGS := -O2 -fpic -nostdlib
BASEFLAGS += -nostartfiles -ffreestanding -nodefaultlibs
BASEFLAGS += -fno-builtin -fomit-frame-pointer -mcpu=arm1176jzf-s
ASFLAGS := $(INCLUDES) $(DEPENDFLAGS) -D__ASSEMBLY__
# CFLAGS
# Use c99
CFLAGS := $(INCLUDES) $(DEPENDFLAGS) $(BASEFLAGS) $(WARNFLAGS)
CFLAGS += -std=gnu99
# qemu options
CPU := arm1176
MEM := 256
#MACHINE := realview-eb
MACHINE := versatilepb
SERIAL := stdio
QEMU_OPT := -nographic -m $(MEM) -M $(MACHINE) -cpu $(CPU) -serial $(SERIAL)
# Image file
all: $(BINARY) $(LIST)
# Run with qemu
run: $(ELF)
$(QEMU) $(QEMU_OPT) -kernel $(ELF)
include $(wildcard *.d)
# Link to kernel.elf
$(ELF): $(OBJS) link-arm-eabi.ld
$(ARMGNU)-ld -o $@ $(OBJS) $(LIBS) -Tlink-arm-eabi.ld -Map $(MAP)
# kernel.elf to kernel.list
$(LIST) : $(ELF)
$(ARMGNU)-objdump -d $(ELF) > $(LIST)
# kernel.elf to raw binary kernel.img
$(BINARY): kernel.elf
$(ARMGNU)-objcopy $(ELF) -O binary $(BINARY)
clean:
$(RM) -f $(OBJS) $(BINARY) $(ELF) $(MAP) $(LIST)
# Delete including *.d
dist-clean: clean
$(RM) -f *.d
# *.c to *.o file
%.o: %.c Makefile
$(ARMGNU)-gcc $(CFLAGS) -c $< -o $@
# .S to *.o file
%.o: %.S Makefile
$(ARMGNU)-gcc $(ASFLAGS) -c $< -o $@
4. Conclusion
Running $ make creates an image file, and $ make run
runs it on qemu.
You can confirm that kernel.elf and kernel.img are properly created.
Alignment was also done correctly.
$ arm-none-eabi-objdump -D kernel.elf | head -n 50
kernel.elf: file format elf32-littlearm
Disassembly of section .text:
00010000 <Start>:
10000: e3a0d902 mov sp, #32768 ; 0x8000
10004: e59f4030 ldr r4, [pc, #48] ; 1003c <halt+0x8>
10008: e59f9030 ldr r9, [pc, #48] ; 10040 <halt+0xc>
$ hexdump kernel.img | head -n 10
0000000 02 d9 a0 e3 30 40 9f e5 30 90 9f e5 00 50 a0 e3
0000010 00 60 a0 e3 00 70 a0 e3 00 80 a0 e3 00 00 00 ea
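The two dumps above can be related to each other by hand: the first four bytes of kernel.img, 02 d9 a0 e3, are the little-endian form of the first instruction word e3a0d902, and the low 12 bits of that word encode the 0x8000 loaded into sp. The sketch below shows both steps. The helper names are hypothetical, and the immediate decoding follows the ARM data-processing format as I understand it (an 8-bit value rotated right by twice a 4-bit field).

```c
#include <stdint.h>

/* Reassemble a 32-bit word from four little-endian bytes,
 * in the order they appear in the hexdump. */
static uint32_t le32(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Rotate right by r bits (r in 0..31). */
static uint32_t ror32(uint32_t x, unsigned r) {
    return r ? (x >> r) | (x << (32u - r)) : x;
}

/* Decode the immediate operand of an ARM data-processing instruction:
 * bits 0-7 hold an 8-bit value, bits 8-11 hold half the rotate amount. */
static uint32_t arm_imm_operand(uint32_t insn) {
    uint32_t imm8 = insn & 0xFFu;
    unsigned rot = (insn >> 8) & 0xFu;
    return ror32(imm8, 2u * rot);
}
```

For e3a0d902 the fields are imm8 = 0x02 and rot = 9, so the operand is 0x02 rotated right by 18 bits, which is 0x8000.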
I was also curious about the Raspberry Pi boot process, so I read this page: http://elinux.org/RPi_Software#GPU_bootloaders
When you turn on the Raspberry Pi, the CPU is halted and a small RISC CPU in the GPU executes the program inside the SoC. Therefore, the boot process is executed by the GPU.
- 1st bootloader: Mounts the boot partition of the FAT32 formatted SD card. It’s inside the SoC so it can’t be modified.
- 2nd bootloader (bootcode.bin): Used to retrieve the GPU firmware from the SD card.
- GPU firmware (start.elf): Sets up the SDRAM partition used by the GPU and CPU via fixup.dat. (Here, the CPU is awakened by the GPU)
- User code (kernel.img): The CPU executes the Linux kernel (kernel.img), other bootloaders (U-Boot, etc.), or applications without an OS. (By setting kernel=u-boot.bin in config.txt, any image file can be executed. Also, kernel options can be written in cmdline.txt.)
Until 2012/10/19, there was also a 3rd bootloader, but it’s no longer required.
Writing the 1st and 2nd bootloaders yourself would be difficult, but if you just want to write your own kernel you don’t have to: bootcode.bin and start.elf are available on github at https://github.com/raspberrypi/firmware/tree/master/boot. It seems only binaries are published, not source code. Even if the source were released, I wonder whether I could read it, since it targets the GPU rather than the ARM CPU. As for configuration, if you want to set GPU memory to 16MB, for example, set gpu_mem=16 in config.txt and add fixup_cd.dat and start_cd.elf.
Looking at http://wiki.gentoo.org/wiki/Raspberry_Pi, there’s a sample SD card configuration. Since we don’t use a filesystem this time, having just a boot partition is sufficient.
Further investigation revealed that the GPU code is not unloaded even after the CPU starts running the kernel. The GPU runs a small OS called VCOS (Video Core Operating System), and the kernel side asks it to perform graphics operations through a protocol and interrupts called the mailbox. Surprisingly, the GPU handles not only graphics but also clock control and audio control.
Now that I can use newlib, I’d like to try porting something. It will probably be mruby, which is highly portable, or my own Scheme implementation.
References
- ARM RaspberryPi osdev root. Simple summary of Mailbox, interrupts, exceptions, serial communication, boot, etc.
- ARM RaspberryPi Tutorial C osdev tutorial. Hello World with serial transmission and reception
- ARM System Calls Part of osdev. How to use system calls
- cambridge Cambridge University tutorial. Alex’s OS. Also has HDMI output method.
- iPhone inline assembler How to use inline assembler from C language.
- Raspberry Pi Frame buffer Contains MailBox register addresses for Raspberry Pi
- raspberrypi / firmware MailBox details. Firmware developer’s github
- Category RaspberryPi eLinux root.
- RPi Hub eLinux summary
- RPi Framebuffer eLinux.org Frame Buffer description. FrameBuffer address, transmission/reception procedure explanation.
- Post subject: ARM framebuffer FrameBuffer address for versatilepb
- ARM926EJ-S User Guide RealView HTML manual for versatilepb
- ARM926EJ-S User Guide RealView PDF manual for versatilepb
- Running bare metal code on ARM QEMU Transmitting from UART to serial port on versatilepb (what I did before)
- Raspberry Pi Frame buffer Personal blog summarizing framebuffer
- Step01 – Bare Metal Programming in C Pt1 GPIO tutorial. Only this one so far. Personal blog.
- Put characters on the screen Q&A that it doesn’t work like x86
- Level of Hackability of raspberry pi Q&A on raspi hacking steps