Troubleshooting ACPI boot hang on real hardware (Dell Inspiron 9400)


#1

Hi, long time watcher, first time actually getting involved. So I’ll jump right in:

I’m trying to run Haiku on an older laptop of mine. I’ll give some specs below, but I’m mainly running into two issues: the boot process hangs unless ACPI is disabled, and Network crashes while watching for available SSIDs, then later doesn’t operate correctly against an open wireless network.

I’ll probably end up filing different bug reports for each, but wanted to try figuring out the issues myself. I’m just looking for a little guidance first.

Here’s the problem: When I boot normally, the boot stops right before the rocket icon lights up. According to https://www.haiku-os.org/docs/userguide/en/bootloader.html#booting this means the final initialization of subsystems was the last stage to light up.

So I try booting again with on-screen debug output. The last lines I see are as follows:

Highpoint-IDE: supports_device()
Highpoint-IDE: supports_device(): unsupported device: vendor ID: 8086, deviceID: 27da
Highpoint-IDE: supports_device()
Highpoint-IDE: supports_device(): unsupported device: vendor ID: 8086, deviceID: 27c4
publish device: node 0xffffffff8d824328, path power/button/sleep, module drivers/power/acpi_button/device_v1
publish device: node 0xffffffff8d824148, path power/button/power, module drivers/power/acpi_button/device_v1
publish device: node 0xffffffff8d824008, path power/acpi_battery/0, module drivers/power/acpi_battery/device_v1

From here, nothing happens. I see the solid black cursor (not blinking) right below that line. I can’t type anything. CTRL+ALT+DEL does nothing (why would it?). My only option is to hard power-off the laptop.

I’ve tried blacklisting each of acpi_battery and acpi_button individually in the boot menu. The only difference is that the system hangs after the Highpoint-IDE lines. Fully disabling ACPI from the safe mode menu allows the system to finish booting, but obviously this isn’t ideal. What’s the next step I can take in troubleshooting ACPI on this laptop?

Any direction is appreciated!

Ok, now for some details about the system:

Dell Inspiron 9400

identical hardware to Dell Inspiron E1705
Core 2 Duo T7200 (dual core, 2.00 GHz, 4096 KB cache)
4 GB DDR2 667 MHz RAM, 3.25 GB addressable (BIOS bug limits to 3.25 GB)
NVidia GeForce Go 7900 GS (256MB DDR video RAM)
17 inch display at 1920x1200
BIOS Revision A04 (09/29/2006)
Video BIOS version 005.071.022.028.001.000
as a side note: the battery is completely dead and holds no charge

Haiku x86_64 Anyboot Nightly Image, booting from a USB stick
hrev51776

Any relevant info I’m missing?


#2

A 64-bit operating system, on a 32-bit CPU?


#3

Whoops. I got mixed up. :slight_smile: I fixed the specs of the laptop, it has a Core 2 Duo T7200, not the Core Duo T2500… But yes, it’s running x86_64 and it’s actually a 64-bit processor. Sorry for the bad info.


#4

Have you created a bug report? Here are some links to help you. You can find something like this in the Haiku guides too.

http://besly.de/menu/search/archiv/dev/haiku-trac_eng.html
http://besly.de/menu/search/archiv/misc/beslysat_eng.html
http://besly.de/menu/search/archiv/misc/haiku-hardware_compatibility_list.html


#5

Can’t say much about your ACPI problem, other than that Frederik (tqh) recently committed some changes that, due to the current buildbot strike, haven’t hit the Haiku package repo (no updates are currently available). Maybe those help; you’ll have to wait and see.

The network prefs crash is probably this one (compare crash reports): ticket #12024


#6

Forgive my ignorance on this: Would those changes need to hit the repo before they’re included in one of the nightly images? According to the hrev number, the nightly image that I’m using should come after tqh’s changes, so I expected them to be included.

Anyway, I’m going to keep poking at it in my free time. I’ll see about building the OS locally with tracing enabled and see if I can dig in a little more.

That may be it. I don’t think I had clicked, but perhaps I moused over or out of one of the menu items while it was updating. I’m not too worried about this for now, but glad to see it’s a known issue.

I haven’t created a bug report yet. I want to dig in a little bit myself before dropping the burden on someone else. Thanks for the links though! I’ll see about installing BeSly SAT from a wired connection once I get ACPI working (or give up on ACPI).


#7

Right, nightlies are indeed up to date.


#8

I would try to blacklist the wlan and graphics drivers.


#9

Thanks for the suggestion. I tried this to no avail. :slight_frown:

So I’ve built and run a slightly modified build. Here’s a basic overview of my changes:

  1. I removed the comment before each preprocessor define of TRACE_BATTERY (or similar) in the acpi_*.cpp driver sources
  2. I changed ACPICAHaiku.cpp's define of DEBUG_OSHAIKU to 2 (for full debug).
  3. I then made a few printf-style formatting changes so that the format strings wouldn’t cause compiler errors when targeting x86_64. (Warnings are treated as errors, and I never loosen code quality settings :laughing: ). Of course, my changes would likely break the regular x86 build, so I won’t submit a patch, at least not in the current state. :innocent:
  • Most were simple: the format strings used %ld for UINT32 values, which on x86_64 triggers compiler complaints like "expected unsigned long parameter but received unsigned int".
  • There were just a couple where ACPI_MUTEX types preprocessed into mutex *, but the format string included %ld. I changed these to %p and cast the argument to (void*), since it’s actually a pointer.
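
For illustration, here is a minimal sketch of the kind of format-string change described above. It assumes the standard C99 <inttypes.h> macros rather than the actual ACPICAHaiku.cpp code (Haiku's own headers have similar B_PRI* macros, if I recall correctly). The demo_mutex type and both function names are hypothetical and exist only for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel mutex type; only its address is printed. */
typedef struct demo_mutex {
	int unused;
} demo_mutex;

/* PRIu32 expands to the correct conversion for a 32-bit unsigned value
   on both 32-bit and 64-bit targets, unlike a hard-coded "%ld". */
int
format_timeout(char *buffer, size_t size, uint32_t timeout)
{
	return snprintf(buffer, size, "timeout: %" PRIu32, timeout);
}

/* Pointers are printed with %p, and the argument is cast to void *
   because that is the type %p is specified to take. */
int
format_mutex(char *buffer, size_t size, demo_mutex *lock)
{
	return snprintf(buffer, size, "mutex: %p", (void *)lock);
}
```

Written this way, the same source should compile without format warnings on both x86 and x86_64, which is what a patch for the debug code would need.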

It builds and runs, and I can see the hang when I enable debug output and disable paging: it seems to be some infinite loop, with this spamming over and over:

acpi[1]: ACPI_STATUS AcpiOsAcquireMutex(mutex*, UINT16)(mutex: 0xffffffff820076c0; timeout: 65535)
acpi[1]: ACPI_STATUS AcpiOsAcquireMutex(mutex*, UINT16)(mutex: 0xffffffff820076c0; timeout: 65535 result: 0)
acpi[1]: void AcpiOsReleaseMutex(mutex*)(mutex: 0xffffffff820076c0)

Occasionally the thread number will switch (the beginning acpi[%d] format is rendered with find_thread(NULL) as the first parameter), and there are a few other messages thrown in between.

So I started again, this time not disabling the paging. The acquire/release spam starts extremely early in the ACPI-related logs. It looks like other things are happening in-between, but never when the mutex is actually held – unless I’m distracted and not reading carefully.

Once the rest of the messages finish, it seems the main spam only has an occasional set of lines like these:

acpi[1]: cpu_status AcpiOsAcquireLock(spinlock*)(spinlock: 0xffffffff82016610)
acpi[1]: void AcpiOsReleaseLock(spinlock*, cpu_status)(spinlock: 0xffffffff82016610)

This is repeated about 3 times before it goes back to acquiring and releasing the mutex.

And much less common, I’m seeing the following single line, obviously with different results:

acpi[1]: void* AcpiOsAllocate(ACPI_SIZE)(result: 0xffffffff8228cc80)

It makes sense that the results differ each time, since I’m not seeing any traces of that memory freed.

Interestingly, the mutex from the previous boot has the exact same pointer value on this boot. I suppose that’s probably to be expected if address space isn’t randomized.

Anyway, I’ve run out of free time for today. I must chase my daughters to sleep and then get some rest myself.

When I get time again, I’ll reduce the tracing a bit and see if I can find where/why it gets stuck in this loop of acquiring/releasing a mutex, then periodically acquiring more memory for ACPI. As always, I appreciate any pointers. :smile:
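
One possible way to reduce the tracing is to log only one call out of every N for the noisiest functions. The sketch below is an assumption about how that could look, not code from the Haiku tree: trace_throttle and trace_throttle_should_log are hypothetical names, and the counter is deliberately not thread-safe, which is acceptable only for rough debugging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for thinning out very chatty trace output. */
typedef struct trace_throttle {
	uint32_t interval;	/* log one call out of every `interval` calls */
	uint32_t calls;		/* number of calls seen so far */
} trace_throttle;

/* Returns true when the current call should be logged.
   An interval of 0 disables throttling, so every call is logged.
   Not thread-safe: concurrent callers may skew the count. */
bool
trace_throttle_should_log(trace_throttle *throttle)
{
	bool shouldLog;

	if (throttle->interval == 0)
		return true;

	shouldLog = (throttle->calls % throttle->interval) == 0;
	throttle->calls++;
	return shouldLog;
}
```

A debug print in the mutex acquire/release wrappers could be guarded with a check like this, so that the less frequent messages are no longer buried in the spam.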


#10

What you see is the correct behavior of ACPICA: threads running, using a mutex to protect or read critical data, and allocating memory.
And yes, the debug code might not be up to date with the current types; please send a patch.
Did you eventually try Linux on this laptop? It might need AcpiTables patching.