Useful Reference: Broadcom BCM2835 ARM Peripherals Manual (PDF)

UART0 and printf

In this section, we will focus on configuring the GPIO pins to enable UART0 output on the Raspberry Pi 3B.

Setting GPIO14 and GPIO15 to ALT0 for UART0

selector = get32(GPFSEL1);
selector &= ~(7 << 12);                   // Clear bits 14:12 for GPIO14
selector |=  (4 << 12);                   // Set bits 14:12 to ALT0
selector &= ~(7 << 15);                   // Clear bits 17:15 for GPIO15
selector |=  (4 << 15);                   // Set bits 17:15 to ALT0
put32(GPFSEL1, selector);

What is ALT0?

On the Raspberry Pi, most GPIO pins are multiplexed, which means each pin can perform multiple functions depending on how you configure it. In our case, we will be configuring it for UART.

Each GPIO pin supports several alternate functions named ALT0, ALT1, ALT2, and so on. These correspond to different internal hardware blocks. For example:

ALT0 for GPIO14 = UART0 transmit (TXD0)
ALT0 for GPIO15 = UART0 receive (RXD0)

What is GPFSEL1?

GPFSEL stands for GPIO Function Select. These registers determine what function each GPIO pin performs. There are six of them: GPFSEL0 through GPFSEL5. Each register controls up to 10 GPIO pins (GPFSEL5 covers only GPIO50 through GPIO53), and each pin uses a 3-bit field to select its function.

GPFSEL1 covers GPIO pins 10 through 19. We care about:

GPIO14, which maps to bits 14:12 of GPFSEL1
GPIO15, which maps to bits 17:15 of GPFSEL1

The function values are as follows:

000 = Input
001 = Output
100 = ALT0 (which is UART0 for these pins)
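The mapping from pin number to register and bit field follows a simple pattern, which can be sketched as below. This is a host-side illustration, not code from the kernel: the names GpioFunction, GpfselField, and gpfsel_field are hypothetical, and the remaining ALT codes are taken from the GPFSEL tables in the BCM2835 manual.

```cpp
#include <cassert>
#include <cstdint>

// 3-bit function select codes (note that the ALT codes are not in numeric order).
enum class GpioFunction : std::uint32_t {
    Input  = 0,  // 000
    Output = 1,  // 001
    Alt0   = 4,  // 100
    Alt1   = 5,  // 101
    Alt2   = 6,  // 110
    Alt3   = 7,  // 111
    Alt4   = 3,  // 011
    Alt5   = 2   // 010
};

// Which GPFSELn register a pin lives in, and where its 3-bit field starts.
struct GpfselField {
    unsigned reg_index;  // n in GPFSELn
    unsigned shift;      // bit position of the field's lowest bit
};

inline GpfselField gpfsel_field(unsigned pin) {
    assert(pin <= 53);  // the BCM2835 exposes GPIO0 through GPIO53
    GpfselField field = {pin / 10, (pin % 10) * 3};
    return field;
}
```

For GPIO14 this gives register 1 and shift 12, and for GPIO15 register 1 and shift 15, matching the bit ranges listed above.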

Explanation of the Code

get32(GPFSEL1) reads the current value of the GPIO Function Select register for GPIOs 10–19. Since GPIO14 and GPIO15 fall into this range, we start by fetching the current state so we can modify only the bits we care about without affecting other pins.
&= ~(7 << 12) clears bits 14–12, which correspond to the function select field for GPIO14. We must clear them first because we’re about to change this pin’s function, and we want to avoid leaving leftover bits that might point to a different mode.
|= 4 << 12 sets bits 14–12 to 100, which configures GPIO14 to use ALT0. On the Pi 3B, this routes the pin to UART0’s TX (transmit) function.
&= ~(7 << 15) and |= 4 << 15 perform the exact same logic for GPIO15, clearing its function bits and setting them to ALT0 mode. This enables the UART0 RX (receive) line on that pin.
put32(GPFSEL1, selector) writes the modified configuration back into the GPFSEL1 register. Only now do we commit the changes to the hardware, ensuring both pins are connected to the UART0 peripheral.

After this configuration, GPIO14 and GPIO15 are connected to the UART0 hardware block inside the SoC. This allows us to send and receive serial data using the UART0 peripheral.
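The clear-then-set pattern used for both pins can be factored into one small helper. The sketch below is a pure function so it can be checked on a host machine; with_pin_function is a hypothetical name, and in the kernel its input would come from get32(GPFSEL1) and its result would go to put32(GPFSEL1, ...).

```cpp
#include <cassert>
#include <cstdint>

// Returns reg_value with the 3-bit field for one pin replaced by function_code.
// pin_in_reg is the pin's position inside its GPFSEL register (GPIO14 -> 4).
inline std::uint32_t with_pin_function(std::uint32_t reg_value,
                                       unsigned pin_in_reg,
                                       std::uint32_t function_code) {
    assert(pin_in_reg < 10);
    assert(function_code < 8);
    const unsigned shift = pin_in_reg * 3;
    reg_value &= ~(std::uint32_t{7} << shift);  // clear the old function
    reg_value |= function_code << shift;        // set the new function
    return reg_value;
}
```

Applying it twice with function code 4 (ALT0), once for position 4 and once for position 5, produces the same value as the four masking lines above while leaving every other pin's field untouched.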

Disabling Pull-Up/Down Resistors

// Disable pull-up/down on GPIO14 and GPIO15, with delays so the changes take effect
put32(GPPUD, 0);
delay(150);
put32(GPPUDCLK0, (1 << 14) | (1 << 15));
delay(150);
put32(GPPUDCLK0, 0);

If a GPIO pin is set as an input and nothing is connected to it, the voltage level on that pin can float around randomly. This is called a "floating" pin. Because there is no solid electrical signal driving it high or low, it might pick up random electrical noise, which can cause your code to see unpredictable 1s and 0s.

To deal with this, the Raspberry Pi (like many microcontrollers) lets you enable small internal resistors called pull-up or pull-down resistors. These help gently pull the pin toward a default value when nothing else is connected. A pull-up resistor makes the pin read as a 1, and a pull-down makes it read as a 0, unless another device overrides it.

But for UART communication, we do not want any internal resistor interfering. Each line is actively driven by one side: the Pi drives TX, and the other device (like your computer) drives RX. Since every line already has a driver in full control of the signal, we want the line to be left completely untouched. Having an internal resistor pulling the line in a certain direction could cause small distortions in the signal and lead to unreliable communication.

That is why we explicitly disable both pull-up and pull-down resistors on GPIO14 and GPIO15.

The Broadcom manual (page 101) prescribes a specific sequence for changing pull resistor settings, which the code above follows.

Explanation of the Code

put32(GPPUD, 0) writes 0 to the GPIO Pull-Up/Down (PUD) register, which selects the "no pull resistor" setting. On its own this write changes no pin; it only stages the setting that the clock step below applies. We choose "off" because UART lines (TX and RX) are actively driven and shouldn’t be influenced by internal bias resistors.
delay(150) introduces a short delay to give the new setting time to propagate internally before applying it to specific pins. The Broadcom documentation (page 101) specifies a wait of 150 cycles here so that the next step functions correctly.
put32(GPPUDCLK0, (1 << 14) | (1 << 15)) writes to the Pull-Up/Down Clock register. Despite the name, this doesn't configure a clock; it tells the hardware, “apply the pull setting we just configured (from GPPUD) to GPIO14 and GPIO15.” Setting bits 14 and 15 targets those specific pins.
delay(150) ensures the pull-up/down setting has enough time to latch into the target pins before we remove the clock signal.
put32(GPPUDCLK0, 0) clears the clock bits, which finalizes the configuration. Without this final step, the change may not reliably take effect, especially on real hardware.

This is how the Raspberry Pi’s hardware expects pull-up/down settings to be configured. If you skip it or apply it incorrectly, there is a chance UART output will be unstable.
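The same sequence generalizes to pull-up and pull-down settings. The sketch below is a host-side illustration with hypothetical names (PullReg, PullMode, RecordingBus, apply_pull); it records register writes instead of touching hardware. It also includes the manual's extra step of clearing GPPUD afterwards, which the code above can skip because it already wrote 0 there.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class PullReg { GPPUD, GPPUDCLK0 };
enum class PullMode : std::uint32_t { Off = 0, Down = 1, Up = 2 };  // GPPUD codes

struct RegWrite {
    PullReg reg;
    std::uint32_t value;
};

// Stand-in for the hardware: logs writes and accumulates requested delays.
struct RecordingBus {
    std::vector<RegWrite> writes;
    unsigned delay_cycles = 0;
    void write(PullReg reg, std::uint32_t value) {
        RegWrite w = {reg, value};
        writes.push_back(w);
    }
    void delay(unsigned cycles) { delay_cycles += cycles; }
};

// Applies one pull setting to every pin whose bit is set in pin_mask (GPIO0-31).
template <typename Bus>
void apply_pull(Bus& bus, PullMode mode, std::uint32_t pin_mask) {
    assert(pin_mask != 0);
    bus.write(PullReg::GPPUD, static_cast<std::uint32_t>(mode));  // 1. stage setting
    bus.delay(150);                                               // 2. setup time
    bus.write(PullReg::GPPUDCLK0, pin_mask);                      // 3. clock it into pins
    bus.delay(150);                                               // 4. hold time
    bus.write(PullReg::GPPUD, 0);                                 // 5. remove setting
    bus.write(PullReg::GPPUDCLK0, 0);                             // 6. remove clock
}
```

In the kernel, a bus type whose write and delay forward to put32 and delay would take the place of RecordingBus.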

Setting the Baud Rate

put32(UART0_IBRD, 26);  // Integer part of baud rate divisor
put32(UART0_FBRD, 3);   // Fractional part of baud rate divisor

These two lines configure the baud rate for UART0, which determines how fast data is sent and received over the serial line. The UART clock on the Raspberry Pi 3B is typically set to 48 MHz, and we want a standard baud rate of 115200 bits per second for serial communication.

Baud Rate Divisor Formula

The UART uses a clock divider to compute the baud rate from the source clock. The equation is:

$$ \text{BaudDiv} = \frac{\text{UART\_CLK}}{16 \times \text{BaudRate}} $$

For a 48 MHz UART clock and a target baud rate of 115200:

$$ \text{BaudDiv} = \frac{48{,}000{,}000}{16 \times 115200} \approx 26.041666\ldots $$

Breaking That Into Registers

UART0_IBRD gets the integer part of the divisor. In this case: $$ \text{Integer} = 26 $$
UART0_FBRD gets the fractional part, which is calculated with: $$ \text{Fractional} = \text{round}\left((\text{BaudDiv} - \text{Integer}) \times 64\right) $$ $$ \text{Fractional} = \text{round}(0.041666 \times 64) = \text{round}(2.667) = 3 $$

These two values together configure UART0 to produce a baud rate close to 115200. If the values are off, the receiving end may misinterpret the signal which would result in a lot of garbled text.
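The divisor arithmetic above can be done in integer math, which is handy if the clock or baud rate ever changes. This is a minimal sketch with hypothetical names (BaudDivisor, compute_baud_divisor, actual_baud); it scales the divisor by 64 so the integer and fractional parts fall out of one rounded value.

```cpp
#include <cassert>
#include <cstdint>

struct BaudDivisor {
    std::uint32_t ibrd;  // integer part, for UART0_IBRD
    std::uint32_t fbrd;  // fractional part in 1/64ths, for UART0_FBRD
};

inline BaudDivisor compute_baud_divisor(std::uint32_t uart_clk_hz, std::uint32_t baud) {
    assert(baud != 0);
    // BaudDiv * 64 == clk * 4 / baud. Compute twice that, then round to nearest.
    const std::uint64_t doubled = static_cast<std::uint64_t>(uart_clk_hz) * 8 / baud;
    const std::uint64_t scaled = (doubled + 1) / 2;
    BaudDivisor d = {static_cast<std::uint32_t>(scaled >> 6),
                     static_cast<std::uint32_t>(scaled & 0x3F)};
    return d;
}

// Baud rate the hardware actually produces for a given divisor (truncated to Hz).
inline std::uint32_t actual_baud(std::uint32_t uart_clk_hz, BaudDivisor d) {
    const std::uint64_t scaled = (static_cast<std::uint64_t>(d.ibrd) << 6) | d.fbrd;
    assert(scaled != 0);
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(uart_clk_hz) * 4 / scaled);
}
```

For a 48 MHz clock and 115200 baud this yields 26 and 3, and the resulting rate is about 115177 baud, roughly 0.02% below the target.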

UART0_LCRH – Line Control Register

put32(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6)); // FEN (FIFO enable), WLEN = 8 bits

This register sets the format of the data being transmitted and received over UART.

(1 << 4) sets the FEN bit (FIFO Enable). This enables both the transmit and receive FIFO (First In, First Out) buffers inside the UART hardware.

Each FIFO is a 16-byte queue that temporarily holds data as it's sent or received. Without FIFO enabled, the UART can only hold a single byte at a time in each direction, meaning the CPU must read or write each character exactly when it arrives or is ready to send—any delay might cause data loss or missed bytes.

With FIFOs enabled, the CPU doesn’t have to respond immediately to every character. The transmit FIFO can queue up to 16 bytes to be sent, and the receive FIFO can store up to 16 bytes that were received while the CPU was busy. This improves reliability and reduces how often the CPU must service the UART.

If you want to see this visually, there is a really cool interactive tool from Dr. Valvano's Intro to Embedded Systems class (ECE319K): the UART FIFO Demo (scroll to Interactive Tool 9.4).

(1 << 5) | (1 << 6) sets the word length to 8 bits. The combination WLEN[1:0] = 0b11 tells the UART to send and receive data as 8-bit values, which is standard for most text-based communication.

So this line of code configures UART0 for standard 8-bit data transmission and enables internal buffering, making it easier to work with in a bare metal environment.
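Naming the bits makes the register value easier to read and to change later. The sketch below uses hypothetical names (LCRH_FEN, LCRH_WLEN_SHIFT, lcrh_value) and relies on the PL011 encoding of WLEN, where 00 means 5 bits and 11 means 8 bits.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t LCRH_FEN = 1u << 4;  // FIFO enable
constexpr unsigned LCRH_WLEN_SHIFT = 5;      // WLEN occupies bits 6:5

// Builds a UART0_LCRH value for the given word length and FIFO setting.
// Parity and stop-bit fields stay 0, which means no parity and one stop bit.
inline std::uint32_t lcrh_value(unsigned word_bits, bool fifo_enable) {
    assert(word_bits >= 5 && word_bits <= 8);
    std::uint32_t value = static_cast<std::uint32_t>(word_bits - 5) << LCRH_WLEN_SHIFT;
    if (fifo_enable) {
        value |= LCRH_FEN;
    }
    return value;
}
```

Calling lcrh_value(8, true) gives 0x70, the same value as the expression written to UART0_LCRH above.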

Enabling UART0 - Transmit and Receive

put32(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9)); // UARTEN, TXE, RXE

Once we've finished configuring the UART peripheral (pins, baud rate, data format, etc.), the final step is to turn it on. This is done by writing to the UART0_CR register, which controls the high-level behavior of the UART hardware.

In this case, we are setting three specific bits:

Bit 0 – UARTEN: Enables the UART itself. If this bit is 0, the UART stays off regardless of any other settings. This bit must be 1 to activate the UART hardware.
Bit 8 – TXE (Transmit Enable): Enables the transmitter circuitry. Without this, even if UART is enabled, it won’t send any characters.
Bit 9 – RXE (Receive Enable): Enables the receiver circuitry, allowing UART to accept incoming characters.

By writing all three bits at once using bitwise OR, we enable the UART, transmitter, and receiver simultaneously:

put32(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));

This completes our UART initialization and makes it fully operational. To print characters to the serial console, we can now write to the UART’s data register (UART0_DR), and the data is sent out over GPIO14 (TX), which is a hint at how printf works! Similarly, the UART is now ready to receive data on GPIO15 (RX), which we can read from the same data register.
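The three control bits can be given names in the same way. This is a small sketch with hypothetical identifiers (CR_UARTEN, CR_TXE, CR_RXE, cr_enable_value), assuming you sometimes want only one direction enabled.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t CR_UARTEN = 1u << 0;  // master enable
constexpr std::uint32_t CR_TXE    = 1u << 8;  // transmit enable
constexpr std::uint32_t CR_RXE    = 1u << 9;  // receive enable

// Builds a UART0_CR value that turns the UART on in the requested directions.
inline std::uint32_t cr_enable_value(bool transmit, bool receive) {
    assert(transmit || receive);  // enabling neither direction is likely a mistake
    std::uint32_t value = CR_UARTEN;
    if (transmit) {
        value |= CR_TXE;
    }
    if (receive) {
        value |= CR_RXE;
    }
    return value;
}
```

With both directions requested, the result is 0x301, identical to the bitwise OR shown above.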

Reading and Writing with UART

Now that UART0 is configured and enabled, we can start communicating through it by sending and receiving individual characters. The following functions form the core of our low-level serial I/O layer. They allow us to interact with a terminal, print debug messages, or even build a command-line shell.

`char uart_getc(void)`

while (get32(UART0_FR) & (1 << 4)) {
    // wait for data
}
return (char)(get32(UART0_DR) & 0xFF);

The UART0_FR register contains flags describing the current state of the UART. Bit 4 (RXFE) indicates whether the receive FIFO is empty.
By spinning in a loop until the FIFO is not empty, we ensure that a character is available before we read from UART0_DR.
This function lets us receive input from a user or serial device one character at a time. It's the foundation for early command-line interfaces or reading input during kernel execution.
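A blocking read is not always what you want, for example when polling for a keypress inside a larger loop. Below is a sketch of a non-blocking variant built on the same RXFE check. FakeUart and uart_try_getc are hypothetical names; FakeUart is a host-side stand-in for the register reads that get32(UART0_FR) and get32(UART0_DR) would perform in the kernel.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t FR_RXFE = 1u << 4;  // receive FIFO empty

// Stand-in for the UART registers so the logic can run on a host machine.
struct FakeUart {
    std::uint32_t fr;  // value returned for UART0_FR
    std::uint32_t dr;  // value returned for UART0_DR
    std::uint32_t read_fr() const { return fr; }
    std::uint32_t read_dr() const { return dr; }
};

// Returns false right away if no character is waiting; otherwise stores the
// received byte in *out and returns true.
template <typename Uart>
bool uart_try_getc(const Uart& uart, char* out) {
    assert(out != nullptr);
    if (uart.read_fr() & FR_RXFE) {
        return false;
    }
    *out = static_cast<char>(uart.read_dr() & 0xFF);  // low 8 bits hold the data
    return true;
}
```

The mask matters because the upper bits of a UART0_DR read carry receive error flags rather than character data.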

`void uart_putc(char c)`

while (get32(UART0_FR) & (1 << 5));
put32(UART0_DR, c);

Bit 5 (TXFF) of the UART0_FR register tells us if the transmit FIFO is full.
We wait until there’s space before writing a new character to the UART0_DR register.
This ensures we don’t overrun the transmit buffer. Without this check, we might lose data if the UART is still sending previous characters.
With uart_putc, we can output individual characters to the serial terminal which is essential for debugging.
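One drawback of an unbounded wait loop is that a misconfigured UART can hang the kernel forever. A possible alternative, sketched here with the hypothetical name wait_for_tx_room, gives up after a fixed number of polls. The read_fr parameter is any callable returning the current UART0_FR value, so the logic can be exercised without hardware.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t FR_TXFF = 1u << 5;  // transmit FIFO full

// Polls until the transmit FIFO has room or the poll budget is used up.
// Returns true if there is room, false if the FIFO stayed full the whole time.
template <typename ReadFr>
bool wait_for_tx_room(ReadFr read_fr, unsigned max_polls) {
    assert(max_polls > 0);
    for (unsigned i = 0; i < max_polls; ++i) {
        if ((read_fr() & FR_TXFF) == 0) {
            return true;
        }
    }
    return false;
}
```

In the kernel, the callable would wrap get32(UART0_FR), and uart_putc would write to UART0_DR only when this returns true.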

`void uart_puts(const char* str)`

while (*str) {
    uart_putc(*str++);
}

This is a helper function that sends a full null-terminated string over UART, character by character.
It's built on uart_putc, and provides a much more convenient way to output human-readable messages from your kernel.
Having this abstraction allows us to build higher-level output functions like printf, or log structured information as our OS runs.

Together, these three functions form the basic tools you need to do meaningful I/O in a bare-metal environment. They give you visibility into what your kernel is doing — even before a screen or debugger is available.
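As an illustration of building on a character-output primitive, here is a sketch of a hexadecimal printer. The name put_hex is hypothetical, and the putc parameter stands for any callable that accepts one char, such as a lambda that forwards to uart_putc.

```cpp
#include <cassert>
#include <cstdint>

// Emits value as "0x" followed by exactly digit_count lowercase hex digits,
// most significant digit first. Higher digits beyond digit_count are dropped.
template <typename PutC>
void put_hex(std::uint64_t value, unsigned digit_count, PutC putc) {
    assert(digit_count >= 1 && digit_count <= 16);
    static const char digits[] = "0123456789abcdef";
    putc('0');
    putc('x');
    for (unsigned i = digit_count; i-- > 0;) {
        putc(digits[(value >> (i * 4)) & 0xF]);  // extract one 4-bit nibble
    }
}
```

A fixed digit count keeps register dumps aligned, which makes addresses easier to compare by eye.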

Wiring Up `printf`

Now that we have basic UART functionality with uart_putc, we can hook it up to a lightweight printf implementation to make formatted output much easier to work with. I won't go too deep into the internals of how printf works, but at a high level, all it does is take your format string (things like %d, %x, etc.), process the arguments, and output each character one by one using a function you provide — in our case, uart_putc. So effectively, printf is just a wrapper that formats a string and passes the characters to UART. Instead of focusing on the string formatting logic in printf.cpp, let's look at how printf is wired up to actually send data to the UART.

In kernel.cpp, we initialize printf like this:

init_printf(nullptr, uart_putc_wrapper);

This tells the printf system to use our uart_putc_wrapper function to write characters. Here's what that function looks like:

void uart_putc_wrapper(void* p, char c) {
    (void)p; // Unused
    if (c == '\n') {
        uart_putc('\r'); // Carriage return for terminals
    }
    uart_putc(c);
}

The void* p argument exists so that printf can pass context around if needed, but we ignore it here.
The newline check ensures that every \n gets preceded by a \r (carriage return), which is required by many terminals to properly move the cursor to the beginning of the line. On QEMU the output usually looks fine without it, most likely because the host terminal adds the carriage return itself, but on real hardware it is very much needed.
uart_putc(c) sends the character to the UART.

When you call printf("Hello, world!\n"), the internal implementation walks through each character of the formatted string and sends it one by one using your uart_putc_wrapper — which ultimately talks to the UART hardware.

With this setup, you now have formatted text output directly from your bare-metal kernel (no screen or OS required).
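To show what the void* context argument is for, here is a sketch of a second sink with the same callback shape as uart_putc_wrapper, one that writes into a memory buffer instead of the UART. All names here (putc_fn, BufferSink, buffer_putc, emit_text) are hypothetical, and emit_text only stands in for the formatter's output loop; it is not the real printf implementation.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as uart_putc_wrapper: a context pointer plus one character.
typedef void (*putc_fn)(void* ctx, char c);

struct BufferSink {
    char* data;
    std::size_t capacity;
    std::size_t length;
};

// Appends c to the buffer named by ctx, always leaving room for a terminator.
inline void buffer_putc(void* ctx, char c) {
    BufferSink* sink = static_cast<BufferSink*>(ctx);
    assert(sink != nullptr);
    if (sink->length + 1 < sink->capacity) {
        sink->data[sink->length++] = c;
        sink->data[sink->length] = '\0';
    }
}

// Stand-in for the formatter: sends each character to whatever sink it was given.
inline void emit_text(putc_fn putc, void* ctx, const char* text) {
    while (*text) {
        putc(ctx, *text++);
    }
}
```

The formatter never needs to know where the characters go: passing nullptr with a UART callback prints to the serial line, while passing a BufferSink with buffer_putc captures the same output in memory.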

Using `printf` in Exception Handlers

One of the best parts about having printf working in a bare-metal environment is that you can now use it inside your exception handlers. This is incredibly helpful when something goes wrong and you want to know exactly what caused it.

For example, here’s what our exception handler might look like now:

extern "C" void exc_handler(unsigned long type, unsigned long esr, 
                            unsigned long elr, unsigned long spsr, 
                            unsigned long far) {
    printf("\n=== Exception Handler Triggered ===\n");
    printf("Type    : %lu\n", type);
    printf("ESR_EL1 : 0x%lx\n", esr);
    printf("ELR_EL1 : 0x%lx\n", elr);
    printf("SPSR_EL1: 0x%lx\n", spsr);
    printf("FAR_EL1 : 0x%lx\n", far);

    while (1); // halt
}

With this in place, if your code triggers a synchronous exception such as an invalid memory access, the handler will print the exception registers over UART. That means you can immediately see what kind of fault it was (ESR_EL1), where it happened (ELR_EL1), what the CPU state was (SPSR_EL1), and what memory address was involved (FAR_EL1).
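The raw ESR_EL1 value is easier to interpret once its exception class (EC) field is pulled out. The sketch below uses hypothetical helper names (esr_exception_class, exception_class_name) and covers only a handful of common EC codes from the Arm architecture manual; anything else falls through to a generic label.

```cpp
#include <cassert>
#include <cstdint>

// The exception class lives in bits 31:26 of ESR_EL1.
inline std::uint32_t esr_exception_class(std::uint64_t esr) {
    return static_cast<std::uint32_t>((esr >> 26) & 0x3F);
}

// Human-readable label for a few common exception classes.
inline const char* exception_class_name(std::uint32_t ec) {
    assert(ec <= 0x3F);
    switch (ec) {
        case 0x00: return "Unknown reason";
        case 0x15: return "SVC from AArch64";
        case 0x20: return "Instruction abort (lower EL)";
        case 0x21: return "Instruction abort (same EL)";
        case 0x22: return "PC alignment fault";
        case 0x24: return "Data abort (lower EL)";
        case 0x25: return "Data abort (same EL)";
        case 0x26: return "SP alignment fault";
        default:   return "Other (see the ESR_EL1 EC table)";
    }
}
```

Inside exc_handler, printing exception_class_name(esr_exception_class(esr)) with %s next to the raw value would turn an ESR of 0x96000045 into a readable "Data abort (same EL)", assuming the printf implementation in use supports %s.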

Before printf, debugging these issues meant blinking LEDs, setting up semihosting, or just guessing. Now, we can see the information from serial output.
