Llvmpipe hangs when rendering

On Haiku 64-bit development versions (I haven’t tested on Beta 5), llvmpipe hangs after some time of rendering in certain ported games.

I’ve been experiencing this since the 28th of April while trying to port Warzone 2100, but I’ve been able to investigate more thoroughly since it started becoming more common lately (e.g. this Haikuports issue).

The issue can be hard to notice, as some games take longer to trigger it than others; as far as I can tell, triggering audio, music changes, and some visual effects/pop-ups makes it happen faster.

Software I found that’s affected so far:

- GemRB (SDL2, both 0.9.4 and dev versions)
- Dune Legacy (SDL2, 0.99+; haven’t triggered it in 0.98.2 (the repo version) yet)
- UZDoom (SDL2, rarely; can take 1h+ to trigger)
- Warzone 2100 (SDL3)
- re3 (GTA3) (librw, constant? but might have other issues too)

When it happens, I’ve been “dealing” with it by opening a stuck llvmpipe thread (usually 0 or 1) in the Debugger and then resuming it. This works in the short term, but the hang becomes more frequent the more it triggers.

When opening the Debugger, the stuck llvmpipe thread shows this:

[screenshot of the stuck thread’s backtrace]

(And then the Debugger often crashes, as you can see.)

Should I make a ticket for this? Is this a Haikuports issue or a Haiku issue?

2 Likes

Use: Haikuports → Issues.

1 Like

In case it’s a Haikuports issue, the ticket that’s already there should be fine; I’ve put a lot of this info up there too :)

1 Like

It’s probably a Haiku issue. We’ve had issues with pthread barriers before; previously we used Mesa’s internal barrier implementation, but I fixed a lot of issues in ours, and at least it stopped hanging immediately. If it hangs after some time, though, some race must still remain.

It won’t be easy to debug this inside Mesa. I came up with a test a while back that tried to trigger races/deadlocks in the pthread_barrier implementation. We’ll need to try to expand that to catch this issue, too. (Can you try running it on your system and see what happens?)

2 Likes

(You may want to tweak the numbers in that test, and disable some of the prints and sleeps, while at it.)

1 Like

Actually I think we should start by determining which mutex_lock call we are at in each thread. That will indicate a lot. But if Debugger keeps crashing, that may be difficult to do. Can you start by opening a ticket with a crash report for Debugger? Is that reproducible by just attaching Debugger to this process, even if it’s not hung?

2 Likes

Also, as an important note for this thread (which I know x512 and waddlesplash know): Haiku’s OpenGL implementation is currently in a state of flux. The next major version of Mesa removes the legacy rendering pipelines and falls back to libglvnd.

[Notice] Major Mesa change · Issue #8152 · haikuports/haikuports · GitHub has details.

3 Likes

It’s pretty late here, I’ll try testing some of the things in this thread tomorrow, but in the meantime I tried the Debugger thing some more and opened a ticket for it.

Apparently so; it’s very frequent, but it’s not a 100% thing, and at times the Debugger will work just fine.

I’ve tried the test a bunch of times with some different configurations; the most intensive one was 100 concurrent threads with the snooze disabled and some programs open, and it executed without any issue. (I assume the test would hang otherwise.)

I still have to try running the test while an app is hung, I’ll edit this post in case that brings some new information.

It looks like the move to Mesa 25 + libglvnd could solve this issue, I’m currently testing them with some of the affected apps and I can’t seem to trigger it anymore.


It does seem to use the llvmpipe threads more efficiently as well. Still, there are some quirks to iron out, afaict (the disk$0 thread name, for instance).

3 Likes

Well, there’s probably still a bug in Haiku’s pthread_barrier implementation; it would be good to try and track that down anyway. I’ll see about fixing the Debugger problem so we can get better crash reports.

4 Likes

I fixed the Debugger crashes in hrev59285.

If you build a version of libroot.so’s pthreads with debug information (i.e. edit libroot/posix/pthread/Jamfile to add SubDirC++Flags -g ;, then touch * in that directory and rebuild libroot.so), this should hopefully not affect timing (whereas passing DEBUG=1 to Jam changes the optimization level and thus timing), and then when it hangs, we should be able to get the exact line where each thread is stopped.
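Concretely, those steps might look like this in a configured Haiku source tree. This is a sketch: the directory path and the `jam libroot.so` target follow the post, but the exact invocation may differ depending on how your checkout and generated build directory are set up.

```shell
# Sketch of the steps above, assuming a configured Haiku source tree.
cd src/system/libroot/posix/pthread

# Add this line to the Jamfile so only this directory gets debug info
# (optimization level stays the same, so timing should be unaffected):
#     SubDirC++Flags -g ;

touch *              # mark the sources newer than their existing .o files
cd -                 # back to where you started
jam -q libroot.so    # rebuild just libroot.so
```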

3 Likes

Does it work with GDB?

Alright, I’ve never built anything from Haiku’s source, but once I wrap my head around that I’ll downgrade Mesa back to 22, test it, and post the results in a ticket.

1 Like

Out of curiosity, what exactly is the purpose of this command? Sometimes I face a failed build due to “untouched” files, and the second attempt succeeds, so my guess was that it’s something to do with `chmod`.

It’s to force a rebuild of everything in that directory. The equivalent would be cleaning the generated objects, but it’s not so easy to do that for just one target with Jam, I don’t think?

1 Like

Touch resets the modification date of the files. This makes sure the .cpp files are newer than the existing .o files, and tools like jam or make use that to decide the .o needs to be rebuilt.

This is needed when changing build flags: the flag change should result in a different .o file, but since the .cpp file itself hasn’t changed, jam/make considers the .o already up to date.
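The timestamp behavior described above can be demonstrated with a tiny make project (file names here are illustrative, not from the Haiku tree):

```shell
# Demonstration of why `touch` forces a rebuild after a flag change.
mkdir -p /tmp/touch-demo && cd /tmp/touch-demo

cat > hello.c <<'EOF'
int main(void) { return 0; }
EOF
printf 'hello: hello.c\n\tcc $(CFLAGS) -o hello hello.c\n' > Makefile

make               # compiles hello.c into hello
make               # reports the target is up to date: output is newer than source
touch hello.c      # bump hello.c's modification time past hello's
make CFLAGS=-g     # rebuilds, this time picking up the new flag
```

The second `make` does nothing because it only compares timestamps, not flags; `touch` makes the source look newer, so the third `make` rebuilds with `-g`.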

1 Like

OK, thank you both.