ARM64

Summary

The initial arm64 U-Boot port was developed before hardware was available, so the first supported platforms were the Foundation and Fast Model for ARMv8. These days U-Boot runs on a variety of 64-bit capable ARM hardware, from embedded development boards to servers.

Notes

  1. U-Boot can run at any exception level it is entered in. It is recommended to enter it in EL3 if U-Boot takes on some of the responsibilities of classical firmware (such as initial hardware setup, CPU errata workarounds or SMP bringup). U-Boot can be entered in EL2 when its main purpose is that of a boot loader. It can drop to lower exception levels before entering the OS. For ARMv8-R it is recommended to enter at S-EL1, as this architecture has no S-EL3.

  2. U-Boot for arm64 is compiled with AArch64-gcc, which uses the rela relocation format. A tool (tools/relocate-rela) by Scott Wood encodes the initial addend of each rela entry into u-boot.bin. Once running, U-Boot relocates itself again to its destination address.

  3. Earlier Linux kernel versions required the FDT to be placed at a 2 MB boundary and within the same 512 MB section as the kernel image, so fdt_high had to be defined specially. Since kernel version 4.2, Linux is more relaxed about the DT location, so it can be placed anywhere in memory. Please refer to linux/Documentation/arm64/booting.txt for details.

  4. A spin-table is used to wake up secondary processors. One location (or one location per processor) is defined to hold the kernel entry point for secondary processors. It must be ensured that the location is accessible and zero immediately after the secondary processors enter the slave_cpu branch in start.S. The address of the location is encoded in the cpu node of the DTS. The Linux kernel stores the entry point for the secondary processors there and sends an event to wake them up. Please refer to linux/Documentation/arm64/booting.txt for details.

  5. Generic board is supported.

  6. CONFIG_ARM64, not CONFIG_ARMV8, is used to distinguish aarch64-specific code from aarch32-specific code.
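
To illustrate the rela handling mentioned in note 2, the following is a minimal sketch of how a relative rela entry can be applied when an image moves from its link address to a different load address. It is not the code of tools/relocate-rela or of U-Boot's own relocation routine. The struct and function names (sk_rela, sk_apply_rela) are hypothetical; only the entry layout (offset, info, addend) and the R_AARCH64_RELATIVE type value come from the ELF AArch64 ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type value from the ELF AArch64 ABI. */
#define SK_R_AARCH64_RELATIVE	1027u

/* Hypothetical local copy of the ELF64 rela entry layout. */
struct sk_rela {
	uint64_t r_offset;	/* link-time address of the word to patch */
	uint64_t r_info;	/* low 32 bits hold the relocation type */
	int64_t r_addend;	/* link-time value of the patched word */
};

/*
 * Apply all relative relocations to an in-memory copy of the image.
 * Entries of other types or outside the image are skipped.
 * Returns the number of entries that were applied.
 */
static size_t sk_apply_rela(uint8_t *image, size_t image_size,
			    uint64_t link_base, uint64_t load_base,
			    const struct sk_rela *rela, size_t count)
{
	size_t i, applied = 0;

	for (i = 0; i < count; i++) {
		uint64_t off, value;

		if ((rela[i].r_info & 0xffffffffu) != SK_R_AARCH64_RELATIVE)
			continue;
		if (rela[i].r_offset < link_base)
			continue;

		off = rela[i].r_offset - link_base;
		if (off > image_size || image_size - off < sizeof(value))
			continue;

		/* New value = addend shifted by the load/link difference. */
		value = (uint64_t)rela[i].r_addend + (load_base - link_base);
		memcpy(image + off, &value, sizeof(value));
		applied++;
	}

	return applied;
}
```

With load_base equal to link_base, each patched word simply receives its addend, which is the case the build-time encoding of the initial addend covers.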

MMU

U-Boot uses a simple page table for MMU setup. It uses the smallest number of bits possible for the virtual address based on the maximum memory address (see the logic in get_tcr()). If this is less than 39 bits, the MMU will use only 3 levels for address translation.
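
The relationship between the maximum address, the number of VA bits and the number of translation levels can be sketched as follows. This is a simplified illustration, not the actual get_tcr() code: the helper names are hypothetical, and it assumes the VA size is rounded up to one of the architectural address size steps (32, 36, 40, 42, 44 or 48 bits).

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest supported size step, in bits, that covers
 * an address space ending at max_addr (one past the last mapped byte).
 * Returns -1 if the range does not fit in 48 bits.
 */
static int sk_va_bits(uint64_t max_addr)
{
	static const int steps[] = { 32, 36, 40, 42, 44, 48 };
	unsigned int i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		if (max_addr <= (1ULL << steps[i]))
			return steps[i];
	}

	return -1;
}

/* Fewer than 39 VA bits means level 0 is skipped, leaving 3 levels. */
static int sk_levels(int va_bits)
{
	return va_bits < 39 ? 3 : 4;
}
```

For instance, a memory map ending at 0x180000000 (6 GiB) needs more than 32 bits, so this sketch selects 36 bits and 3 levels.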

As with all platforms, U-Boot on ARM64 uses a 1:1 mapping of virtual to physical addresses. In general, the memory map is expected to remain static once the MMU is enabled.

Software pagetable walker

It is possible to debug the pagetable generated by U-Boot with the built-in dump_pagetable() and walk_pagetable() functions (the former being a simple wrapper for the latter). For example, the following can be added to setup_all_pgtables() after the first call to setup_pgtables():

dump_pagetable(gd->arch.tlb_addr, get_tcr(NULL, NULL));

void __pagetable_walk(u64 addr, u64 tcr, int level, pte_walker_cb_t cb, void *priv)

Walk through the pagetable and call cb() for each memory region

Parameters

u64 addr

The address of the table to walk

u64 tcr

The TCR register value

int level

The current level of the table

pte_walker_cb_t cb

The callback function to call for each region

void *priv

Private data to pass to the callback function

Description

This is a software implementation of the ARMv8-A MMU translation table walk, as per section D5.4 of the ARMv8-A Architecture Reference Manual. It recursively walks the 4 or 3 levels of the page table and calls the callback function for each discrete region of memory (that being the discovery of a new table, a collection of blocks with the same attributes, or of pages with the same attributes).

U-Boot picks the smallest number of virtual address (VA) bits that it can based on the memory map configured by the board. If this is less than 39 then the MMU will only use 3 levels of translation instead of 4, skipping level 0.

Each level has 512 entries of 64 bits each. Each entry includes attribute bits and an address. When the attribute bits indicate a table, the address is the physical address of the table, so we can recursively call __pagetable_walk() on it (after calling cb). If instead they indicate a block or page, we record the start address and attributes and continue walking until we find a region with different attributes, or the end of the table. In either case we call cb with the start and end address of the region.

This approach can be used to fully emulate the MMU’s translation table walk, as per Figure D5-25 of the ARMv8-A Architecture Reference Manual.
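
The per-entry decision described above (table, block, page or invalid) can be sketched as follows. This is an illustration under stated assumptions rather than the code in cache_v8.c: it assumes the 4 KB granule descriptor format with 48-bit output addresses, and all names prefixed with sk_ or SK_ are hypothetical.

```c
#include <stdint.h>

enum sk_kind { SK_INVALID, SK_BLOCK, SK_TABLE, SK_PAGE };

#define SK_TYPE_MASK	0x3ULL			/* descriptor bits [1:0] */
#define SK_ADDR_MASK	0x0000fffffffff000ULL	/* output address bits [47:12] */

/* Classify one 64-bit descriptor found at the given table level (0..3). */
static enum sk_kind sk_classify(uint64_t pte, int level)
{
	switch (pte & SK_TYPE_MASK) {
	case 0x1:
		/* Blocks are only assumed valid at levels 1 and 2. */
		return (level == 1 || level == 2) ? SK_BLOCK : SK_INVALID;
	case 0x3:
		/* Same encoding: a page at level 3, a table above it. */
		return level == 3 ? SK_PAGE : SK_TABLE;
	default:
		return SK_INVALID;
	}
}

/* Physical address of the next-level table, block or page. */
static uint64_t sk_out_addr(uint64_t pte)
{
	return pte & SK_ADDR_MASK;
}

/* Memory attribute index, bits [4:2] of a block or page descriptor. */
static unsigned int sk_attr_index(uint64_t pte)
{
	return (unsigned int)((pte >> 2) & 0x7);
}

/* Shareability field, bits [9:8] of a block or page descriptor. */
static unsigned int sk_shareability(uint64_t pte)
{
	return (unsigned int)((pte >> 8) & 0x3);
}
```

A walker built on these helpers would recurse into sk_out_addr() for SK_TABLE entries and compare the attribute fields of neighbouring SK_BLOCK or SK_PAGE entries to decide where one region ends and the next begins.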

bool pagetable_print_entry(u64 start_attrs, u64 end, int va_bits, int level, void *priv)

Callback function to print a single pagetable region

Parameters

u64 start_attrs

The start address and attributes of the region (or table address)

u64 end

The end address of the region (or 0 if it’s a table)

int va_bits

The number of bits used for the virtual address

int level

The level of the region

void *priv

Private data for the callback (unused)

Description

This is the default callback used by dump_pagetable(). It does some basic pretty printing (see example in the U-Boot arm64 documentation). It can be replaced by a custom callback function if more detailed information is needed.

The pagetable walker can be used as follows:

pte_walker_cb_t

Typedef: callback function for walk_pagetable.

Syntax

bool pte_walker_cb_t (u64 addr, u64 end, int va_bits, int level, void *priv)

Parameters

u64 addr

PTE start address (PA), or address of table. Includes attributes.

u64 end

End address of the region (or 0 for a table)

int va_bits

Number of bits in the virtual address

int level

Table level

void *priv

Private data for the callback

Description

This function is called when the walker finds a table entry or after parsing a block or pages. For a table the end address is 0, and addr is the address of the table. Otherwise, they are the start and end physical addresses of the block or page.

Return

true to stop walking, false to continue
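
As an illustration of a custom callback, the sketch below counts tables and regions and sums the size of the mapped regions instead of printing them. The typedefs are declared locally, mirroring the signature documented above, so that the example compiles outside the U-Boot tree; inside U-Boot they would come from the existing headers. The sk_stats structure, the sk_count_cb() name and the address mask (bits 47:12, assuming the 4 KB granule format) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the documented callback signature. */
typedef uint64_t u64;
typedef bool (*pte_walker_cb_t)(u64 addr, u64 end, int va_bits, int level,
				void *priv);

/* Assumed position of the address field within addr (bits [47:12]). */
#define SK_ADDR_MASK	0x0000fffffffff000ULL

/* Hypothetical accumulator passed to the walker through priv. */
struct sk_stats {
	unsigned int tables;
	unsigned int regions;
	u64 mapped_bytes;
};

static bool sk_count_cb(u64 addr, u64 end, int va_bits, int level, void *priv)
{
	struct sk_stats *st = priv;

	(void)va_bits;
	(void)level;

	if (end == 0) {
		/* A table entry: addr is the address of the table. */
		st->tables++;
		return false;
	}

	/* A block or page region: strip the attribute bits from addr. */
	st->regions++;
	st->mapped_bytes += end - (addr & SK_ADDR_MASK);

	return false;	/* keep walking */
}
```

Following the signature of walk_pagetable(), such a callback would presumably be passed as walk_pagetable(gd->arch.tlb_addr, get_tcr(NULL, NULL), sk_count_cb, &stats), with stats zero-initialised beforehand.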

void walk_pagetable(u64 ttbr, u64 tcr, pte_walker_cb_t cb, void *priv)

Walk the pagetable at ttbr and call cb for each region

Parameters

u64 ttbr

Address of the pagetable to dump

u64 tcr

TCR value to use

pte_walker_cb_t cb

Callback function to call for each entry

void *priv

Private data for the callback

void dump_pagetable(u64 ttbr, u64 tcr)

Dump the pagetable at ttbr, printing each region and level.

Parameters

u64 ttbr

Address of the pagetable to dump

u64 tcr

TCR value to use

This will result in a print like the following:

Walking pagetable at 000000017df90000, va_bits: 36. Using 3 levels
[0x17df91000]                   |  Table |               |
  [0x17df92000]                 |  Table |               |
    [0x000001000 - 0x000200000] |  Pages | Device-nGnRnE | Non-shareable
  [0x000200000 - 0x040000000]   |  Block | Device-nGnRnE | Non-shareable
[0x040000000 - 0x080000000]     |  Block | Device-nGnRnE | Non-shareable
[0x080000000 - 0x140000000]     |  Block | Normal        | Inner-shareable
[0x17df93000]                   |  Table |               |
  [0x140000000 - 0x17de00000]   |  Block | Normal        | Inner-shareable
  [0x17df94000]                 |  Table |               |
    [0x17de00000 - 0x17dfa0000] |  Pages | Normal        | Inner-shareable

For more information, please refer to the additional function documentation in arch/arm/include/asm/armv8/mmu.h.

Contributors