In this section, we will focus on configuring the GPIO pins to enable UART0 output on the Raspberry Pi 3B.
selector = get32(GPFSEL1);
selector &= ~(7 << 12); // Clear bits 14:12 for GPIO14
selector |= (4 << 12); // Set bits 14:12 to ALT0
selector &= ~(7 << 15); // Clear bits 17:15 for GPIO15
selector |= (4 << 15); // Set bits 17:15 to ALT0
put32(GPFSEL1, selector);
On the Raspberry Pi, most GPIO pins are multiplexed, which means each pin can perform multiple functions depending on how you configure it. In our case, we will be configuring GPIO14 and GPIO15 for UART0.
Each GPIO pin supports several alternate functions named ALT0, ALT1, ALT2, and so on. These correspond to different internal hardware blocks. For example:
- ALT0 for GPIO14 = UART0 transmit (TXD0)
- ALT0 for GPIO15 = UART0 receive (RXD0)
GPFSEL stands for GPIO Function Select. These registers determine what function each GPIO pin performs. There are six of them, GPFSEL0 through GPFSEL5. Each register controls up to 10 GPIO pins, and each pin takes 3 bits to select its function.
GPFSEL1 covers GPIO pins 10 through 19. We care about:
- Bits 14:12 of GPFSEL1 (GPIO14)
- Bits 17:15 of GPFSEL1 (GPIO15)
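The bit positions follow directly from the "10 pins per register, 3 bits per pin" layout. As a minimal sketch (the helper names `gpfsel_index` and `gpfsel_shift` are made up for illustration and are not part of the kernel code), the register and field for any pin can be computed like this:

```cpp
// Which GPFSEL register holds the field for `pin` (10 pins per register).
constexpr unsigned gpfsel_index(unsigned pin) { return pin / 10; }

// Position of the lowest bit of the 3-bit field for `pin` inside that register.
constexpr unsigned gpfsel_shift(unsigned pin) { return (pin % 10) * 3; }

// GPIO14 -> GPFSEL1, bits 14:12
static_assert(gpfsel_index(14) == 1 && gpfsel_shift(14) == 12, "GPIO14 field");
// GPIO15 -> GPFSEL1, bits 17:15
static_assert(gpfsel_index(15) == 1 && gpfsel_shift(15) == 15, "GPIO15 field");
```

The two `static_assert` lines check the result against the shifts used in the code at the top of this section.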
The function values are as follows:
- 000 = Input
- 001 = Output
- 100 = ALT0 (which is UART0 for these pins)

Here is what each line of the code does:
- get32(GPFSEL1) reads the current value of the GPIO Function Select register for GPIOs 10–19. Since GPIO14 and GPIO15 fall into this range, we start by fetching the current state so we can modify only the bits we care about without affecting other pins.
- &= ~(7 << 12) clears bits 14–12, which correspond to the function select field for GPIO14. We must clear them first because we’re about to change this pin’s function, and we want to avoid leaving leftover bits that might point to a different mode.
- |= 4 << 12 sets bits 14–12 to 100, which configures GPIO14 to use ALT0. On the Pi 3B, this routes the pin to UART0’s TX (transmit) function.
- &= ~(7 << 15) and |= 4 << 15 perform the exact same logic for GPIO15, clearing its function bits and setting them to ALT0 mode. This enables the UART0 RX (receive) line on that pin.
- put32(GPFSEL1, selector) writes the modified configuration back into the GPFSEL1 register. Only now do we commit the changes to the hardware, ensuring both pins are connected to the UART0 peripheral.
After this configuration, GPIO14 and GPIO15 are connected to the UART0 hardware block inside the SoC. This allows us to send and receive serial data using the UART0 peripheral.
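The clear-then-set steps can also be wrapped in a small helper that works on a plain register value. This is only a sketch: `GpioFunction` and `with_pin_function` are hypothetical names, and only the three function codes listed in this section are included.

```cpp
#include <cstdint>

// Function-select codes mentioned in this section.
enum class GpioFunction : std::uint32_t {
    Input  = 0,  // 000
    Output = 1,  // 001
    Alt0   = 4   // 100
};

// Returns `reg` with the 3-bit field for `pin` replaced by `fn`.
// The fields of all other pins in the register are left unchanged.
inline std::uint32_t with_pin_function(std::uint32_t reg, unsigned pin, GpioFunction fn) {
    const unsigned shift = (pin % 10) * 3;
    reg &= ~(std::uint32_t{7} << shift);             // clear the old function
    reg |= static_cast<std::uint32_t>(fn) << shift;  // set the new one
    return reg;
}
```

With a helper like this, the GPIO setup would read the register once, call `with_pin_function` for pins 14 and 15, and write the result back with a single `put32`.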
// Disable pull-up/down on GPIO14 and GPIO15, with delays so the change takes effect
put32(GPPUD, 0);
delay(150);
put32(GPPUDCLK0, (1 << 14) | (1 << 15));
delay(150);
put32(GPPUDCLK0, 0);
If a GPIO pin is set as an input and nothing is connected to it, the voltage level on that pin can float around randomly. This is called a "floating" pin. Because there is no solid electrical signal driving it high or low, it might pick up random electrical noise, which can cause your code to see unpredictable 1s and 0s.
To deal with this, the Raspberry Pi (like many microcontrollers) lets you enable small internal resistors called pull-up or pull-down resistors. These help gently pull the pin toward a default value when nothing else is connected. A pull-up resistor makes the pin read as a 1, and a pull-down makes it read as a 0, unless another device overrides it.
But for UART communication, we do not want any internal resistor interfering. Each line is already actively driven: the Pi drives TX, and the other device (like your computer's serial adapter) drives RX. Since each line has a driver in full control of the signal, we want it left untouched. An internal resistor pulling the line in one direction could slightly distort the signal and lead to unreliable communication.
That is why we explicitly disable both pull-up and pull-down resistors on GPIO14 and GPIO15.
To do that, the Broadcom documentation (page 101) specifies a sequence for changing the pull resistors:
- put32(GPPUD, 0) writes 0 to the GPIO Pull-Up/Down (GPPUD) register, which selects "no pull-up or pull-down" as the setting to apply. On its own this changes nothing on any pin; the setting only reaches the pins we clock it into in the next steps. We want the resistors off because UART lines (TX and RX) are actively driven and shouldn’t be influenced by internal bias resistors.
- delay(150) introduces a short delay to give the new setting time to propagate internally before applying it to specific pins. According to the Broadcom documentation (page 101), this delay is required to ensure the next step functions correctly.
- put32(GPPUDCLK0, (1 << 14) | (1 << 15)) writes to the Pull-Up/Down Clock register. Despite the name, this doesn't configure a clock. It is a mechanism that tells the hardware, “apply the pull setting we just configured (from GPPUD) to GPIO14 and GPIO15.” Setting bits 14 and 15 targets those specific pins.
- delay(150) ensures the pull-up/down setting has enough time to latch into the target pins before we remove the clock signal.
- put32(GPPUDCLK0, 0) clears the clock bits, which finalizes the configuration. Without this final step, the change may not reliably take effect, especially on real hardware.

This is how the Raspberry Pi’s hardware expects pull-up/down settings to be configured. If you skip it or apply it incorrectly, there is a chance UART output will be unstable.
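To make the ordering easier to see, here is a sketch of the same sequence written as a reusable function. It runs against a fake register object that only records writes, so it can be tried on a host machine. `FakeGpio`, `Reg`, and `set_pull` are hypothetical names introduced for this example; the real kernel calls `put32` and `delay` directly.

```cpp
#include <cstdint>
#include <vector>

enum class Reg { GPPUD, GPPUDCLK0 };

struct Write {
    Reg reg;
    std::uint32_t value;
};

// Hypothetical stand-in for the hardware: records writes instead of touching registers.
struct FakeGpio {
    std::vector<Write> log;
    unsigned delay_cycles = 0;
    void put32(Reg reg, std::uint32_t value) { log.push_back({reg, value}); }
    void delay(unsigned cycles) { delay_cycles += cycles; }
};

// Applies `pull` (0 = off, 1 = pull-down, 2 = pull-up) to every pin set in `pin_mask`.
template <typename Gpio>
void set_pull(Gpio& gpio, std::uint32_t pull, std::uint32_t pin_mask) {
    gpio.put32(Reg::GPPUD, pull);          // choose the setting
    gpio.delay(150);                       // let the control signal settle
    gpio.put32(Reg::GPPUDCLK0, pin_mask);  // clock it into the selected pins
    gpio.delay(150);                       // hold the clock long enough to latch
    gpio.put32(Reg::GPPUDCLK0, 0);         // remove the clock
}
```

Calling `set_pull(gpio, 0, (1u << 14) | (1u << 15))` produces the same three writes and two delays as the code above.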
put32(UART0_IBRD, 26); // Integer part of baud rate divisor
put32(UART0_FBRD, 3); // Fractional part of baud rate divisor
These two lines configure the baud rate for UART0, which determines how fast data is sent and received over the serial line.
The UART clock on the Raspberry Pi 3B is typically set to 48 MHz, and we want a standard baud rate of 115200 bits per second for serial communication.
The UART uses a clock divider to compute the baud rate from the source clock. The equation is:
$$ \text{BaudDiv} = \frac{\text{UART_CLK}}{16 \times \text{BaudRate}} $$
For a 48 MHz UART clock and a target baud rate of 115200:
$$ \text{BaudDiv} = \frac{48{,}000{,}000}{16 \times 115200} \approx 26.041666\ldots $$
UART0_IBRD gets the integer part of the divisor. In this case:
$$ \text{Integer} = 26 $$
UART0_FBRD gets the fractional part, which is calculated with:
$$ \text{Fractional} = \text{round}\left((\text{BaudDiv} - \text{Integer}) \times 64\right) $$
$$ \text{Fractional} = \text{round}(0.041666\ldots \times 64) = \text{round}(2.666\ldots) = 3 $$
These two values together configure UART0 to produce a baud rate close to 115200. If the values are off, the receiving end may misinterpret the signal, which shows up as garbled text.
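The same arithmetic can be done in integer math, which is handy if you ever need a different clock or baud rate. This is a sketch under the formula above; `BaudDivisor` and `compute_baud_divisor` are names made up for this example.

```cpp
#include <cstdint>

struct BaudDivisor {
    std::uint32_t ibrd;  // integer part, for UART0_IBRD
    std::uint32_t fbrd;  // fractional part in 1/64ths, for UART0_FBRD
};

inline BaudDivisor compute_baud_divisor(std::uint64_t uart_clk_hz, std::uint64_t baud) {
    // BaudDiv * 64 = uart_clk * 64 / (16 * baud) = uart_clk * 4 / baud.
    // Adding baud / 2 before dividing rounds to the nearest 1/64th.
    const std::uint64_t scaled = (uart_clk_hz * 4 + baud / 2) / baud;
    return {static_cast<std::uint32_t>(scaled / 64),
            static_cast<std::uint32_t>(scaled % 64)};
}
```

For a 48 MHz clock and 115200 baud, `scaled` is 1667, which splits into 26 and 3, matching the values written to the registers.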
put32(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6)); // FEN, WLEN = 8 bits
This register sets the format of the data being transmitted and received over UART.

(1 << 4) sets the FEN bit (FIFO Enable). This enables both the transmit and receive FIFO (First In, First Out) buffers inside the UART hardware. Each FIFO is a 16-entry queue that temporarily holds data as it's sent or received. Without FIFOs enabled, the UART can only hold a single byte at a time in each direction, meaning the CPU must read or write each character exactly when it arrives or is ready to send. Any delay might cause lost or missed bytes.

With FIFOs enabled, the CPU doesn’t have to respond immediately to every character. The transmit FIFO can queue up to 16 bytes to be sent, and the receive FIFO can store up to 16 bytes that were received while the CPU was busy. This improves reliability and reduces how often the CPU must service the UART.

If you want to see this visually, Dr. Valvano's Intro to Embedded Systems class (ECE319K) has a really cool interactive tool (scroll to Interactive Tool 9.4): UART FIFO Demo.

(1 << 5) | (1 << 6) sets the word length to 8 bits. The combination WLEN[1:0] = 0b11 tells the UART to send and receive data as 8-bit values, which is standard for most text-based communication.

So this line of code configures UART0 for standard 8-bit data transmission and enables internal buffering, making it easier to work with in a bare metal environment.
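Naming the bits makes the LCRH value easier to read than raw shifts. The sketch below builds the value from a FIFO flag and a word length. The constant and function names are made up for this example, and it assumes no parity and one stop bit, which is what leaving the other LCRH bits at 0 selects.

```cpp
#include <cstdint>

constexpr std::uint32_t LCRH_FEN = 1u << 4;  // FIFO enable
constexpr unsigned LCRH_WLEN_SHIFT = 5;      // WLEN occupies bits 6:5

// Builds a UART0_LCRH value for 5 to 8 data bits, no parity, one stop bit.
inline std::uint32_t lcrh_value(bool enable_fifos, unsigned data_bits) {
    // WLEN encoding: 5 bits -> 0b00, 6 -> 0b01, 7 -> 0b10, 8 -> 0b11
    const std::uint32_t wlen = (data_bits - 5) & 0x3;
    return (wlen << LCRH_WLEN_SHIFT) | (enable_fifos ? LCRH_FEN : 0u);
}
```

`lcrh_value(true, 8)` evaluates to 0x70, the same value as `(1 << 4) | (1 << 5) | (1 << 6)`.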
put32(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9)); // UARTEN, TXE, RXE
Once we've finished configuring the UART peripheral (pins, baud rate, data format, etc.), the final step is to turn it on.
This is done by writing to the UART0_CR register, which controls the high-level behavior of the UART hardware.
In this case, we are setting three specific bits:
- UARTEN (bit 0): Enables the UART itself. If this bit is 0, the UART stays off regardless of any other settings. This bit must be 1 to activate the UART hardware.
- TXE (bit 8, Transmit Enable): Enables the transmitter circuitry. Without this, even if the UART is enabled, it won’t send any characters.
- RXE (bit 9, Receive Enable): Enables the receiver circuitry, allowing the UART to accept incoming characters.
By writing all three bits at once using bitwise OR, we enable the UART, transmitter, and receiver simultaneously:
put32(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));
This completes our UART initialization and makes it fully operational. If we wanted to print characters to the serial console, we could now write to the UART’s transmit register, and the data would be sent out over GPIO14 (TX). Similarly, the UART is now ready to receive data on GPIO15 (RX), which we could read from the receive register (which is a hint at how printf works!).
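As with LCRH, the control register bits can be given names so the intent is visible at the call site. This is a small sketch; the constant names and the `can_transmit` helper are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t CR_UARTEN = 1u << 0;  // UART enable
constexpr std::uint32_t CR_TXE    = 1u << 8;  // transmit enable
constexpr std::uint32_t CR_RXE    = 1u << 9;  // receive enable
constexpr std::uint32_t CR_ENABLE_ALL = CR_UARTEN | CR_TXE | CR_RXE;

static_assert(CR_ENABLE_ALL == 0x301, "same value as (1 << 0) | (1 << 8) | (1 << 9)");

// True only if both the UART itself and its transmitter are enabled.
constexpr bool can_transmit(std::uint32_t cr) {
    return (cr & (CR_UARTEN | CR_TXE)) == (CR_UARTEN | CR_TXE);
}
```

The `can_transmit` check reflects the point made above: TXE alone is not enough, because nothing is sent while UARTEN is 0.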
Now that UART0 is configured and enabled, we can start communicating through it by sending and receiving individual characters. The following functions form the core of our low-level serial I/O layer. They allow us to interact with a terminal, print debug messages, or even build a command-line shell.
char uart_getc(void) {
    // Spin while RXFE (bit 4) reports that the receive FIFO is empty
    while (get32(UART0_FR) & (1 << 4)) {
        // wait for data
    }
    return (char)(get32(UART0_DR) & 0xFF);
}
- The UART0_FR register contains flags describing the current state of the UART. Bit 4 (RXFE) indicates whether the receive FIFO is empty.
- While RXFE is set, we keep polling. Once it clears, we read the received byte from UART0_DR.
void uart_putc(char c) {
    // Spin while TXFF (bit 5) reports that the transmit FIFO is full
    while (get32(UART0_FR) & (1 << 5)) {
    }
    put32(UART0_DR, c);
}
- Bit 5 (TXFF) of the UART0_FR register tells us if the transmit FIFO is full.
- Once there is room, we write the character to the UART0_DR register.
- With uart_putc, we can output individual characters to the serial terminal, which is essential for debugging.
void uart_puts(const char* str) {
    while (*str) {
        uart_putc(*str++);
    }
}
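- This function sends a string one character at a time using uart_putc, and provides a much more convenient way to output human-readable messages from your kernel.
- We can use it to print debug messages, build toward printf, or log structured information as our OS runs.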
Together, these three functions form the basic tools you need to do meaningful I/O in a bare-metal environment. They give you visibility into what your kernel is doing — even before a screen or debugger is available.
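If you want to experiment with the polling logic without hardware, a small host-side model helps. The sketch below is not the real register interface: `FakeUart` is a hypothetical stand-in that computes the RXFE and TXFF flags from two queues, and `shift_out` imitates the hardware draining one byte from the transmit FIFO.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

constexpr std::uint32_t FR_RXFE = 1u << 4;  // receive FIFO empty
constexpr std::uint32_t FR_TXFF = 1u << 5;  // transmit FIFO full
constexpr std::size_t kFifoDepth = 16;

// Hypothetical host-side model of the UART, for experimenting with the flag logic.
struct FakeUart {
    std::deque<char> rx_fifo;  // bytes received but not yet read
    std::deque<char> tx_fifo;  // bytes queued but not yet sent
    std::string wire;          // bytes that have left the transmit FIFO

    std::uint32_t flags() const {
        std::uint32_t fr = 0;
        if (rx_fifo.empty()) fr |= FR_RXFE;
        if (tx_fifo.size() >= kFifoDepth) fr |= FR_TXFF;
        return fr;
    }
    void shift_out() {
        if (!tx_fifo.empty()) {
            wire.push_back(tx_fifo.front());
            tx_fifo.pop_front();
        }
    }
};

inline void uart_putc(FakeUart& u, char c) {
    while (u.flags() & FR_TXFF) {
        u.shift_out();  // real hardware drains on its own; the model needs a nudge
    }
    u.tx_fifo.push_back(c);
}

inline void uart_puts(FakeUart& u, const char* str) {
    while (*str) {
        uart_putc(u, *str++);
    }
}

// Like the real uart_getc, this spins forever if nothing ever arrives.
inline char uart_getc(FakeUart& u) {
    while (u.flags() & FR_RXFE) {
    }
    const char c = u.rx_fifo.front();
    u.rx_fifo.pop_front();
    return c;
}
```

Sending a string longer than 16 characters through this model shows the transmit side stalling on TXFF until a slot frees up, which is the behavior the polling loop in uart_putc relies on.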
printf
Now that we have basic UART functionality with uart_putc, we can hook it up to a lightweight printf implementation to make formatted output much easier to work with. I won't go too deep into the internals of how printf works, but at a high level, all it does is take your format string (things like %d, %x, etc.), process the arguments, and output each character one by one using a function you provide. In our case, that function is uart_putc. So effectively, printf is just a wrapper that formats a string and passes the characters to UART. Instead of focusing on the string formatting logic in printf.cpp, let's look at how printf is wired up to actually send data to the UART.
In kernel.cpp, we initialize printf like this:
init_printf(nullptr, uart_putc_wrapper);
This tells the printf system to use our uart_putc_wrapper function to write characters. Here's what that function looks like:
void uart_putc_wrapper(void* p, char c) {
(void)p; // Unused
if (c == '\n') {
uart_putc('\r'); // Carriage return for terminals
}
uart_putc(c);
}
- The void* p argument exists so that printf can pass context around if needed, but we ignore it here.
- Each \n gets preceded by a \r (carriage return), which is required by many terminals to properly move the cursor to the beginning of the line. In our experience this is not needed on QEMU, but on real hardware it very much is.
- uart_putc(c) sends the character to the UART.
When you call printf("Hello, world!\n"), the internal implementation walks through each character of the formatted string and sends it one by one using your uart_putc_wrapper, which ultimately talks to the UART hardware.
With this setup, you now have formatted text output directly from your bare-metal kernel (no screen or OS required).
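The callback design is easier to appreciate with a second output target. In the sketch below, `emit_string` is a deliberately simplified stand-in for the formatter (it does no % handling at all), and `buffer_putc` is a hypothetical callback that uses the context pointer to append to a `std::string` instead of writing to the UART. Neither name comes from the actual printf implementation.

```cpp
#include <string>

// Same shape as uart_putc_wrapper: a context pointer plus the character to emit.
using PutCharFn = void (*)(void* ctx, char c);

// Simplified stand-in for the formatter: no % handling, only the output path.
inline void emit_string(PutCharFn put_char, void* ctx, const char* s) {
    while (*s) {
        put_char(ctx, *s++);
    }
}

// Same newline handling as uart_putc_wrapper, but the context selects the destination.
inline void buffer_putc(void* ctx, char c) {
    std::string* out = static_cast<std::string*>(ctx);
    if (c == '\n') {
        out->push_back('\r');
    }
    out->push_back(c);
}
```

Because the formatter only ever sees a function pointer and an opaque context, the same formatting code could write to the UART, a memory buffer, or a log without changes. This is the role the otherwise unused void* p argument plays.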
printf in Exception Handlers
One of the best parts about having printf working in a bare-metal environment is that you can now use it inside your exception handlers. This is incredibly helpful when something goes wrong and you want to know exactly what caused it.
For example, here’s what our exception handler might look like now:
extern "C" void exc_handler(unsigned long type, unsigned long esr,
unsigned long elr, unsigned long spsr,
unsigned long far) {
printf("\n=== Exception Handler Triggered ===\n");
printf("Type : %lu\n", type);
printf("ESR_EL1 : 0x%lx\n", esr);
printf("ELR_EL1 : 0x%lx\n", elr);
printf("SPSR_EL1: 0x%lx\n", spsr);
printf("FAR_EL1 : 0x%lx\n", far);
while (1); // halt
}
With this in place, if your code triggers a synchronous exception or an invalid memory access, the handler will print out a full register dump over UART. That means you can immediately see the cause of the fault, what kind it was, where it happened, what the CPU state was, and what memory address was involved.
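The raw ESR_EL1 value packs several fields together, so decoding it makes the dump easier to read. The sketch below extracts the exception class (bits 31:26) and the instruction-specific syndrome (bits 24:0) and names a few common classes. The helper names are made up for this example, and the table is intentionally incomplete.

```cpp
#include <cstdint>

// Exception class: ESR_EL1 bits 31:26.
constexpr std::uint32_t esr_exception_class(std::uint64_t esr) {
    return static_cast<std::uint32_t>((esr >> 26) & 0x3F);
}

// Instruction-specific syndrome: ESR_EL1 bits 24:0.
constexpr std::uint32_t esr_iss(std::uint64_t esr) {
    return static_cast<std::uint32_t>(esr & 0x1FFFFFF);
}

// Names for a handful of exception classes; everything else falls through.
inline const char* exception_class_name(std::uint32_t ec) {
    switch (ec) {
        case 0x15: return "SVC from AArch64";
        case 0x20: return "Instruction abort (lower EL)";
        case 0x21: return "Instruction abort (same EL)";
        case 0x24: return "Data abort (lower EL)";
        case 0x25: return "Data abort (same EL)";
        default:   return "Other";
    }
}
```

Assuming your printf supports %s, the handler could print `exception_class_name(esr_exception_class(esr))` next to the raw value, so a data abort is recognizable without looking up the encoding.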
Before printf, debugging these issues meant blinking LEDs, setting up semihosting, or just guessing. Now, we can see the information from serial output.