Trying to boot Haiku on ARM device

Hello,

For the last week I’ve been hacking on the ARM (32 bit) port a little bit. I’m trying it on real hardware (because QEMU would be too easy). The setup is a Lichee Pi Zero board (single core ARM, 64MB of RAM, quite simple hardware that should be good for early testing; later I can move to larger systems).

I fixed a few things in the bootstrap build so I can get myself a bootloader and kernel and filesystem built. But I didn’t manage to go very far into booting the kernel.

I think the status is the same as @pengphei hit back in 2021: UEFI Haiku boot loader for ARM - #84 by pengphei

My understanding of what happens:

  • Initially the bootloader uses EFI for serial and console logging and this works fine
  • At some point we call ExitBootServices, after that we should switch to using the UART directly
  • UART address is correctly identified from FDT
  • The MMU is enabled by calling GetMemoryMap, converting the memory map into our own page tables in arch_mmu_generate_post_efi_page_tables, and then applying them with arch_mmu_post_efi_setup, which calls EFI SetVirtualAddressMap. After that, if I understand correctly, we should be running with the MMU enabled, and the mappings set to the ones we asked for
  • On x86, the UART is accessed with in/out instructions, which are not affected by the MMU. So we can continue to use the UART normally
  • But on ARM, the UART is memory mapped. And I think we’re not including it in the memory map passed to SetVirtualAddressMap, and so it isn’t mapped anymore once the MMU is on. As far as I can see, U-Boot isn’t including it in its provided memory map either.

So I think the first attempt to access the UART will just fault, and the boot fails there, with no way to complain about it.
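
To make the suspicion concrete, here is a minimal C++ sketch of the check involved: is a device register window covered by any region of a memory map? The `Region` struct and the `is_covered` helper are hypothetical names made up for illustration; they are not Haiku or EFI APIs, and real EFI descriptors carry more fields than this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one firmware memory map entry.
struct Region {
	uint64_t start;	// first physical address in the region
	uint64_t end;	// one past the last physical address
};

// Returns true if [base, base + size) lies entirely inside a single region.
bool is_covered(const std::vector<Region>& map, uint64_t base, uint64_t size)
{
	assert(size > 0);
	for (const Region& region : map) {
		if (base >= region.start && base + size <= region.end)
			return true;
	}
	return false;
}
```

For example, with a map that only describes RAM at 0x40000000-0x44000000, a register window at 0x01c28000 of size 0x400 is reported as not covered, which is the situation suspected here.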

Am I missing something? Is it already handled somewhat differently on other architectures?

Thanks for your help, I’d like to have some UART working to better see what I’m doing :slight_smile:

9 Likes

No, you are not missing anything; the only code for x86_64 is in haiku/src/system/boot/platform/efi/serial.cpp at 0ab44682c99358698ba38b89a1c61913f2ef9325 · haiku/haiku · GitHub, though it should be in arch/.. IMO …

1 Like

FWIW, last I checked (earlier this year IIRC), the ARM32 boot with EFI does get past this phase and into the kernel on QEMU, and the kernel does log things to serial out. I don’t know if that’s coincidence or something else going on though.

1 Like

I have a BeagleBoard, a BeagleBoard-xM and a BeagleBone Black in my collection now. If a corresponding version is being tested, I can help.

2 Likes

That is already tracked in an existing ticket.

However, it’s not the part I’m most annoyed by if we’re talking about moving code around. The 5 nearly identical arch_start.cpp files look like they’re worth looking into and moving to shared code, for example, as there are very few arch-specific things in there, but they diverge for various other reasons (formatting, variable naming, comments, …)

At the moment I’d rather get one platform working. But I can try to unify things a bit so that either I find relevant changes in other architectures, or whatever fixes I end up doing benefit everyone.

That is quite a different setup, I think:

  • Is it using U-Boot or is it using Tianocore or some other EFI implementation? The firmware may itself put some things in the mapping returned by GetMemoryMap. In my case I see only RAM there (about 6MB of it, of which some will be freed because it’s only used for boot services), but I see no mapping for devices. Or maybe some older version of U-Boot returned more things there? I tested one from 2023 and updated to a more current one from this year, and that made no difference.
  • Maybe the kernel knows how to map the devices in virtual memory later on, but since the kernel seems not to be starting, I’m interested in tracing the end of the bootloader code and the jump into the kernel. Did you see logs from the late boot process (after the “So long, EFI” message), or is there a gap with logs resuming only after the kernel is started?
  • Do the tests in QEMU use a UART, or do they use the framebuffer console? On my VGA output I only get the empty splash screen so far, but I may try to keep that mapped as well (I’m not sure if we already do that, or if we also rely on the firmware to keep it reserved somehow).
2 Likes

Switching the serial output and passing it to the kernel has been the cause of many hangs even on x86_64, so the problem might be in the code and logic itself and not even in the serial part. Other than that, I guess dumping the memory maps before and after is the only suggestion I have.

3 Likes

I don’t recall. But getting things running in ARM QEMU wasn’t too hard (a bit annoying to find the right arguments to the QEMU command line for EFI), so you may be able to test that way without too much difficulty.

At least I know how this works on x86: the bootloader records the physical address of the memory-mapped framebuffer, and then re-creates virtual mappings for it after the VM is initialized. (But if the framebuffer memory-map goes away, that will indeed be a problem.)

2 Likes

I did some more digging and there is indeed a problem.

This is how it goes:

  • During early boot, the UART is correctly detected from the FDT (finding the chosen node, identifying it, etc).
  • A DebugUART object is created, mapped at the physical address for the UART. This is done quite early on, in dtb_init
  • The kernel args are also set to this same address
  • In arch_mmu_generate_post_efi_page_tables, the UART is remapped to a different virtual address (same physical address, of course). The kernel args are adjusted accordingly.
  • The DebugUART object is NOT updated/moved to match.
  • After exiting boot services, the new memory map (with the UART moved to a different virtual address) is applied in arch_mmu_post_efi_setup
  • Finally, the DebugUART object starts to be used

Conclusion: anything between that point and the start of the kernel that attempts to write to the UART will either panic because nothing is mapped at the old address, or will overwrite some memory that happens to be at the same address. Maybe it doesn’t happen if the firmware happens to include the UART in its reservations.
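
The sequence above can be modelled with a small toy in C++. This is only a sketch of the stale-address problem, not the code from the Haiku tree: `ToyUart`, `PageTable` and `remap_uart` are hypothetical names, and the page table is reduced to a map from virtual base address to physical base address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy address space: virtual base address -> physical base address.
using PageTable = std::map<uint32_t, uint32_t>;

// Hypothetical stand-in for a UART driver object that caches its base address.
class ToyUart {
public:
	explicit ToyUart(uint32_t base) : fBase(base) {}

	void SetBase(uint32_t base) { fBase = base; }
	uint32_t Base() const { return fBase; }

	// A register access can only succeed if the cached base is mapped.
	bool CanWrite(const PageTable& table) const
	{
		return table.count(fBase) != 0;
	}

private:
	uint32_t fBase;
};

// Maps the UART at a new virtual address in the new page table and returns
// that address. Every holder of the old address has to be told about it.
uint32_t remap_uart(PageTable& newTable, uint32_t physical,
	uint32_t newVirtual)
{
	assert(newTable.count(newVirtual) == 0);
	newTable[newVirtual] = physical;
	return newVirtual;
}
```

In this model, a `ToyUart` created with the physical address stops being usable as soon as the new table is the active one, and becomes usable again once `SetBase` is called with the value returned by `remap_uart`. Updating only a second copy of the address (the kernel args, in the real case) does not help the object itself.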

ARM64 seems to have the same problem. RISC-V uses significantly different code. x86 (32 or 64) is not affected by the problem because it doesn’t use a memory mapped UART.

6 Likes

Here is a proposed fix: https://review.haiku-os.org/c/haiku/+/10068

And corresponding serial trace:

arch_smp_register_cpu()
cpu
  id: 0
Welcome to the Haiku boot loader!
Haiku revision: hrev59162+1+dirty
add_partitions_for(0x4264f160, mountFS = no)
add_partitions_for(fd = 0, mountFS = no)
0x4264f1a0 Partition::Partition
0x4264f1a0 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
  priority: 810
check for partitioning_system: Intel Extended Partition
0x4264f320 Partition::Partition
0x4264f1a0 Partition::AddChild 0x4264f320
0x4264f320 Partition::SetParent 0x4264f1a0
new child partition!
0x4264f3f0 Partition::Partition
0x4264f1a0 Partition::AddChild 0x4264f3f0
0x4264f3f0 Partition::SetParent 0x4264f1a0
new child partition!
0x4264f1a0 Partition::Scan(): scan child 0x4264f320 (start = 2097152, size = 41943040, parent = 0x4264f1a0)!
0x4264f320 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
check for partitioning_system: Intel Extended Partition
0x4264f1a0 Partition::Scan(): scan child 0x4264f3f0 (start = 44040192, size = 8025800704, parent = 0x4264f1a0)!
0x4264f3f0 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
check for partitioning_system: Intel Extended Partition
0x4264f1a0 Partition::~Partition
0x4264f320 Partition::SetParent 0x00000000
0x4264f3f0 Partition::SetParent 0x00000000
0x4264f320 Partition::_Mount check for file_system: BFS Filesystem
0x4264f320 Partition::_Mount check for file_system: FAT32 Filesystem
0x4264f320 Partition::_Mount check for file_system: TAR Filesystem
0x4264f320 Partition::~Partition
0x4264f3f0 Partition::_Mount check for file_system: BFS Filesystem
PackageVolumeInfo::SetTo()
PackageVolumeInfo::_InitState(): failed to parse activated-packages: No such file or directory
load kernel kernel_arm...
smbios: found v3 at 0x43873000
Chosen UART:
  kind: 8250
  regs: 0x1c28000, 0x400
  irq: 32
  clock: -1
Chosen interrupt controller:
  kind: gicv2
  regs: 0x1c81000, 0x1000
        0x1c82000, 0x2000
Chosen timer:
  kind: armv7
  regs: 0x0, 0x0
  irq: 29
kernel:
  text: 0x80000000, 0x1ac000
  data: 0x801ac000, 0x62000
  entry: 0x80074260
Kernel stack at 0x82531000
System provided memory map:
  phys: 0x40000000-0x41eac000, virt: 0x40000000-0x41eac000, type: EfiConventionalMemory (0x7), attr: 0x8
  phys: 0x41eac000-0x42000000, virt: 0x41eac000-0x42000000, type: EfiLoaderData (0x2), attr: 0x8
  phys: 0x42000000-0x4205c000, virt: 0x42000000-0x4205c000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x4205c000-0x4205d000, virt: 0x4205c000-0x4205d000, type: EfiConventionalMemory (0x7), attr: 0x8
  phys: 0x4205d000-0x4264e000, virt: 0x4205d000-0x4264e000, type: EfiLoaderData (0x2), attr: 0x8
  phys: 0x4264e000-0x4264f000, virt: 0x4264e000-0x4264f000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x4264f000-0x427cf000, virt: 0x4264f000-0x427cf000, type: EfiLoaderData (0x2), attr: 0x8
  phys: 0x427cf000-0x42826000, virt: 0x427cf000-0x42826000, type: EfiLoaderCode (0x1), attr: 0x8
  phys: 0x42826000-0x4282a000, virt: 0x42826000-0x4282a000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x4282a000-0x42832000, virt: 0x4282a000-0x42832000, type: EfiACPIReclaimMemory (0x9), attr: 0x8
  phys: 0x42832000-0x42836000, virt: 0x42832000-0x42836000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x42836000-0x42837000, virt: 0x42836000-0x42837000, type: EfiRuntimeServicesData (0x6), attr: 0x8000000000000008
  phys: 0x42837000-0x42838000, virt: 0x42837000-0x42838000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x42838000-0x42859000, virt: 0x42838000-0x42859000, type: EfiRuntimeServicesData (0x6), attr: 0x8000000000000008
  phys: 0x42859000-0x4285b000, virt: 0x42859000-0x4285b000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x4285b000-0x4285c000, virt: 0x4285b000-0x4285c000, type: EfiRuntimeServicesData (0x6), attr: 0x8000000000000008
  phys: 0x4285c000-0x42866000, virt: 0x4285c000-0x42866000, type: EfiBootServicesData (0x4), attr: 0x8
  phys: 0x42866000-0x43873000, virt: 0x42866000-0x43873000, type: EfiBootServicesCode (0x3), attr: 0x8
  phys: 0x43873000-0x43874000, virt: 0x43873000-0x43874000, type: EfiRuntimeServicesData (0x6), attr: 0x8000000000000008
  phys: 0x43874000-0x43a8b000, virt: 0x43874000-0x43a8b000, type: EfiBootServicesCode (0x3), attr: 0x8
  phys: 0x43a8b000-0x43a8d000, virt: 0x43a8b000-0x43a8d000, type: EfiRuntimeServicesCode (0x5), attr: 0x8000000000000008
  phys: 0x43a8d000-0x43ad5000, virt: 0x43a8d000-0x43ad5000, type: EfiBootServicesCode (0x3), attr: 0x8
  phys: 0x43ad5000-0x43ad8000, virt: 0x43ad5000-0x43ad8000, type: EfiReservedMemoryType (0x0), attr: 0x8
  phys: 0x43ad8000-0x43b00000, virt: 0x43ad8000-0x43b00000, type: EfiBootServicesCode (0x3), attr: 0x8
  phys: 0x43b00000-0x43d58000, virt: 0x43b00000-0x43d58000, type: EfiReservedMemoryType (0x0), attr: 0x8
  phys: 0x43d58000-0x44000000, virt: 0x43d58000-0x44000000, type: EfiBootServicesCode (0x3), attr: 0x8
Welcome to kernel debugger output!
Haiku revision: hrev59162+dirty, debug level: 2
PANIC: _mutex_unlock() failure: thread 0 is trying to release mutex 0x801c4f1c (current holder -1)

Welcome to Kernel Debugging Land...
Thread 0 "" running on CPU 0
stack trace for thread 0x0 ""
    kernel stack: 0x00000000 to 0x00000000
frame            caller     <image>:function + offset
 0 82534e14 (+  52) 80176538
 1 82534e4c (+  56) 800c850c
 2 82534e9c (+  80) 800c8a2c
 3 82534eb4 (+  24) 800c8d98
 4 82534edc (+  40) 800aaa98
 5 82534f24 (+  72) 800d1690
 6 82534f44 (+  32) 800d1e80
 7 82534fdc (+ 152) 8014f9cc
 8 82534ffc (+  32) 80074380
 9 438665b4 (+   0) 43a8c3f9
kdebug> 

We’re in the kernel debugger now!

I manually decoded the stacktrace to see where we are (using objdump on kernel.so and matching addresses):

 0 82534e14 (+  52) 80176538	arch_debug_call_with_fault_handler
 1 82534e4c (+  56) 800c850c	kernel_debugger_loop
 2 82534e9c (+  80) 800c8a2c	kernel_debugger_internal
 3 82534eb4 (+  24) 800c8d98	panic
 4 82534edc (+  40) 800aaa98	_mutex_unlock
 5 82534f24 (+  72) 800d1690	guarded_heap_allocate_meta
 6 82534f44 (+  32) 800d1e80	heap_init
 7 82534fdc (+ 152) 8014f9cc	vm_init
 8 82534ffc (+  32) 80074380	_start
 9 438665b4 (+   0) 43a8c3f9
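
For reference, the manual matching boils down to finding, for each return address, the symbol with the greatest start address that is not above it. Here is a minimal C++ sketch of that lookup. The `SymbolTable` type and `symbolize` function are hypothetical names, and in practice the table would be filled from the symbol listing of the kernel image rather than by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Symbol start address -> symbol name, kept sorted by address.
using SymbolTable = std::map<uint32_t, std::string>;

// Returns the name of the symbol with the greatest start address that is
// less than or equal to `address`, or an empty string if there is none.
std::string symbolize(const SymbolTable& symbols, uint32_t address)
{
	// upper_bound() yields the first symbol starting after `address`,
	// so the entry just before it is the candidate.
	auto next = symbols.upper_bound(address);
	if (next == symbols.begin())
		return "";

	auto match = std::prev(next);
	assert(match->first <= address);
	return match->second;
}
```

Note that this sketch has no symbol sizes, so an address beyond the last symbol (such as the firmware address in frame 9) would wrongly resolve to the last entry of the table. A real tool would also check the symbol size or the image bounds.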
13 Likes

Hmm, you are running with the guarded heap then I suppose?

Strange this bug manifests on ARM but not x86. (Well, actually I didn’t test on 32-bit x86; maybe the problem is 32-bit related somehow?) It looks like an initialized but not-locked lock is attempting to be unlocked, so somewhere there’s an imbalanced lock acquisition I suppose.

1 Like

I didn’t enable the guarded heap, and it would be a bad idea on this machine with only 64MB of RAM. So I’ll see why we get into that codepath at all?

1 Like

Ok, it turns out I had. It took a while to find out: I had a kernel_debug_config.h override, that is located in a directory that is part of .gitignore. So, git grep wasn’t looking in it and I couldn’t find the problem…

The regular heap gets me further, to “Did not find any boot partitions!”

So I guess I’m wiring the SDHCI driver to FDT next?

11 Likes

I decided to work on network booting. It will save me a lot of time moving the SD card back and forth between the device and the build machine.

Implementing EFI network support was relatively straightforward (thanks to a patch by kallisti5 from 2021 that was patiently waiting for someone to continue the work): https://review.haiku-os.org/c/haiku/+/3678

The network boot is working (well, the bootloader part at least). But I had to hardcode the IP address for now, as I don’t know how to retrieve it from UEFI services (I will keep looking and update the patch later when I find a way).

13 Likes

I fixed the problem of hardcoding the IP address. It is now forwarded from U-Boot to the bootloader. The bootloader loads the kernel from remote_disk_server and starts it.

Now I get to the “Did not find any boot partitions” error, which is expected since I don’t have many drivers yet. But I’m not entirely sure how this is supposed to work.

In the case of PXE boot, the kernel is provided with a “Memory Disk” (basically an initramfs) that contains some drivers, enough to set up the network stack and network devices and mount the “real” root volume using those.

It looks like OpenFirmware (which I based the EFI code on) does not go through that, and directly passes a RemoteDisk as the boot device to the kernel, without any extra support. I’m not sure how that works: doesn’t the kernel need some drivers to find the root partition even when booting from disk?

I will study the early boot process more closely, I guess!

10 Likes

Well, I found and solved the next problem!

There is a list of “preloaded” kernel modules in the form of symlinks in /boot/system/add-ons/boot.

These are loaded by haiku_loader so the kernel can find and use them before having access to the rootfs. The network stack and network drivers were not part of that list (at least for the bootstrap image). Now the network stack is found and I see a few network drivers trying to initialize as well.

Next step is porting the if_awg driver from FreeBSD, as that’s the one I need on this hardware. We’ll see how porting an FDT based driver from FreeBSD goes!

I would also like to find a way to make the “reboot” KDL command actually reboot the device, as now I have to unplug and replug it between every attempt (I forgot to add a reset or power button when designing the motherboard…)

18 Likes