Unlocking from wrong thread. How do I break on this msg?

memecode · September 2, 2023, 2:14am

Unlocking BLocker with sem 131860 from wrong thread 2828, current holder 2916 (see issue #6400).

So I’m seeing this in one of my apps. But I’d like to break in the debugger when that event happens so I can see the stack. Cause at the moment I have no idea what code is triggering it.

It would be nice if there was a function in the BLocker class whose only job was to print that message. It would make breaking on the msg very easy, even without debug symbols, because you can at least see the function names in the debugger. In fact that would be a good pattern to apply across all similar error/warning messages.
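
A rough sketch of the pattern, in plain C++ rather than the real libbe code. The names ReportUnlockFromWrongThread, CheckedUnlock and gWrongThreadReports are made up for illustration, and the noinline attribute assumes GCC or Clang:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counter so the number of reports can be observed.
static int gWrongThreadReports = 0;

// The only job of this function is to report the problem. Keeping it out of
// line gives the debugger a named symbol to break on, even without full
// debug info.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
void ReportUnlockFromWrongThread(long sem, long caller, long holder)
{
    assert(caller != holder);  // only meant to be called on a mismatch
    ++gWrongThreadReports;
    std::fprintf(stderr,
        "Unlocking lock with sem %ld from wrong thread %ld, "
        "current holder %ld\n", sem, caller, holder);
}

// Hypothetical stand-in for the unlock path: check the owner and route every
// failure through the single reporting function above.
bool CheckedUnlock(long sem, long caller, long holder)
{
    if (caller != holder) {
        ReportUnlockFromWrongThread(sem, caller, holder);
        return false;
    }
    return true;
}
```

With something like this in place, a breakpoint on the reporting function would stop exactly when the message is printed, and the stack would show the caller.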

Btw that semid doesn’t exist if I open up the process in something like SystemManager. So I don’t even know what it relates to.

Any ideas on finding the source of this?

nephele · September 2, 2023, 6:50am

I’d find the source that generates that error, rebuild the library with a call to the debugger added at that point, and put it in lib/ next to the executable.

memecode · September 2, 2023, 11:17am

I didn’t know you could do that but it does sound promising.

I’m getting this error from haiku git:

../src/kits/support/ZstdCompressionAlgorithm.cpp:17:12: fatal error: zstd.h: No such file or directory
   17 |   #include <zstd.h>

BiPolar · September 2, 2023, 12:55pm

Try with pkgman install devel:libzstd (I thought that was already fixed in https://review.haiku-os.org/c/haiku/+/6055)

memecode · September 3, 2023, 11:35am

I’ve successfully built an instrumented libbe.so but when I load my application in the debugger, it still uses the /system/boot/lib version rather than the libbe.so in the same folder as my executable. Is there some other step I missed?

Seems like they are the same type of binary:

~/code/lgi/trunk/lvc> file libbe.so
libbe.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
~/code/lgi/trunk/lvc> file lvc
lvc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped

There isn’t an “ldd” or “otool” to check dynamic lib dependencies?

nephele · September 3, 2023, 12:04pm

Yes, I wrote above to put it in a folder called lib/ next to the executable.

: )

edit: you can check loaded images with “listimage”
Every executable on Haiku is a shared library

PulkoMandy · September 3, 2023, 12:25pm

readelf -aW file.so | grep "NEEDED"

But that will not tell you where the lib will be searched for, just the library name.

For library search paths, check the LIBRARY_PATH environment variable, which you can adjust if needed to force loading a specific library (or just use one of the paths that are already there).

Lrrr · September 3, 2023, 2:54pm

You can use the lddtree command from the pax_utils package.

BiPolar · September 3, 2023, 4:45pm

@memecode… To follow up on @Lrrr’s suggestion: pkgman install cmd:lddtree should work without knowing the name of the package that contains it.

Notice the pattern here… pkgman <command> {cmd:,devel:,lib:}<name>.

memecode · September 3, 2023, 9:16pm

Ah, yes my mistake… didn’t read carefully.

Ok, now I have it in the right folder. But the program exits almost immediately without printing much. Which is not the same behavior as the system libbe.so. I suspect I need to backdate the haiku checkout to be compatible with my system. e.g.

git checkout <haiku_beta4_hash>

But how do I find the right hash to go to?

~> uname -a
Haiku haiku4 1 hrev56578+87 Sep  3 2023 06:02: x86_64 x86_64 Haiku

That seems super recent. So there should be no mismatch to the current HEAD of the git repo… weird.

I wonder if I can get debug symbols for the build of Haiku I’m running? Is a debug build of Haiku a thing?

Also: I read through the assembly of BLocker::Unlock (of the system libbe.so), comparing it to the C++, and found a good point to put my breakpoint despite not having the symbols. So I now have an accurate stack trace of where the rogue unlock is called from. GG

My working theory is that the “system” BFont I created in the startup thread, which is getting shared across many threads, is being used incorrectly. In single-threaded GUIs like Linux and Windows it’s fine. Haiku… not so much.

memecode · September 4, 2023, 12:50am

My code is somehow triggering this error message on the sLock in src\kits\app\AppServerLink.cpp.

In the context of calling BFont::GetStringWidths (src\kits\interface\Font.cpp).

BPrivate::AppServerLink is constructed (and therefore locked) and destructed in the same function call, so also on the same thread. How could it possibly have a mismatched lock/unlock thread error in ~AppServerLink?

waddlesplash · September 5, 2023, 11:32pm

R1/beta4 is on a branch (called r1beta4.) You will note the current HEAD of the git repo is hrev57259, quite far ahead of where you are. So, either switch to a nightly build, or check out the r1beta4 branch.

memecode · September 5, 2023, 11:49pm

Yeah it’s not the fonts.

It’s something to do with the sub-processes. So this particular app spawns off some subprocesses to read the versions of the various VCSs installed (hg, git, svn etc) as part of its startup. It’s a front end for version control. Anyway, if I disable the subprocesses it all starts working normally.

The code that runs subprocesses is just your standard fork and execve that works great under linux and macosx. Although it does make pipes to talk to the stdout and stdin of the subprocess. Something about calling that messes up the parent process. Does the child inherit something it shouldn’t?

Is there a good example of running a subprocess under Haiku without trashing the parent process?
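
For reference, a minimal sketch of the generic POSIX pattern being described: fork, redirect the child's stdout into a pipe, exec, then read and reap in the parent. This is not Haiku-specific and not the app's actual code. RunAndCapture is a made-up name, and execvp is swapped in for execve so the command is looked up via PATH:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs argv[0] with the given arguments and returns what it wrote to stdout.
// exitCode receives the child's exit code, or -1 if it could not be obtained.
std::string RunAndCapture(char* const argv[], int& exitCode)
{
    assert(argv != nullptr && argv[0] != nullptr);
    exitCode = -1;

    int fds[2];
    if (pipe(fds) != 0)
        return std::string();

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return std::string();
    }

    if (pid == 0) {
        // Child: do as little as possible before exec.
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        // exec failed. _exit() skips atexit handlers and static destructors,
        // which would otherwise run in this copy of the parent.
        _exit(127);
    }

    // Parent: close the write end so read() sees EOF when the child exits.
    close(fds[1]);
    std::string output;
    char buffer[4096];
    for (;;) {
        ssize_t count = read(fds[0], buffer, sizeof(buffer));
        if (count > 0)
            output.append(buffer, static_cast<size_t>(count));
        else if (count == 0 || errno != EINTR)
            break;
    }
    close(fds[0]);

    // Reap the child so it does not linger as a zombie.
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);
    if (waited == pid && WIFEXITED(status))
        exitCode = WEXITSTATUS(status);
    return output;
}
```

Calling it with an argument vector such as {"git", "--version", nullptr} would return the version string. Whether any of these details matter for the BLocker message is an open question; the sketch only shows the conventional shape of the code.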

waddlesplash · September 5, 2023, 11:55pm

Do you have a stack trace of the problem? It seems very strange that subprocesses are causing this issue.

memecode · September 6, 2023, 12:00am

It’s not always in the same place… but this is one example. It’s always in ~AppServerLink though. And it happens sometime after the subprocesses have all run and exited. Although I do see those subprocesses sitting in the list of processes in the debugger. So maybe they aren’t being cleaned up properly?

I don’t waitpid on the child process in this case… do I HAVE to do that to make it actually finish up what it was doing?

waddlesplash · September 6, 2023, 12:06am

This is in the parent process, not the subprocess, I guess? And some other thread than the one creating subprocesses?

What are the threads the BLocker lists in its message? Do they both exist?

waddlesplash · September 6, 2023, 12:07am

You shouldn’t need to, no. The list of processes in the Debugger sometimes doesn’t update correctly, I think. Check with ProcessController to see if they’re really still around.

memecode · September 6, 2023, 12:12am

Yes.

Unlocking BLocker with sem 240608 from wrong thread 9442, current holder -1 (see issue #6400).

Where ‘9442’ is the thread of my main application window. -1 is IDK… ha. The stack trace is from that same thread.

waddlesplash · September 6, 2023, 12:14am

-1 indicates there’s no thread actually holding the BLocker. So this usually means “unlock without having been locked before.” This is pretty strange, especially if it’s the AppServerLink lock; there shouldn’t be a way we wind up in that state.
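
As a toy illustration of that state (this is not the real BLocker implementation; ToyLocker and its members are invented, and thread IDs are passed in by hand to keep it single-threaded), an owner-tracking lock that reports a holder of -1 when it is unlocked more often than it was locked could look like:

```cpp
#include <cassert>
#include <cstdio>

// Toy model of a recursive, owner-tracking lock. It never blocks; Lock()
// simply fails if another thread already holds it.
class ToyLocker {
public:
    static constexpr long kNoHolder = -1;

    bool Lock(long thread)
    {
        if (fHolder != kNoHolder && fHolder != thread)
            return false;  // held by someone else
        fHolder = thread;
        ++fCount;
        return true;
    }

    bool Unlock(long thread)
    {
        if (fHolder != thread) {
            // Covers both "held by another thread" and "not held at all".
            std::fprintf(stderr,
                "Unlocking from wrong thread %ld, current holder %ld\n",
                thread, fHolder);
            return false;
        }
        if (--fCount == 0)
            fHolder = kNoHolder;
        return true;
    }

    long Holder() const { return fHolder; }

private:
    long fHolder = kNoHolder;
    int fCount = 0;
};
```

In this model, one Lock followed by two Unlocks from the same thread makes the second Unlock fail with holder -1, which matches the "unlock without having been locked before" reading of the message.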

Given that other processes are involved, I guess there’s some chance this is memory corruption? But if the memory was modified, this should have resulted in 0’ing the lock, not -1 in one specific field. Very strange.

If you have steps to reproduce and open a ticket, I may be able to look into it next week (or perhaps someone else will beat me to it.)

memecode · September 6, 2023, 12:21am

Ok I’ll come up with a way of reproducing it.

My working theory is that between the fork and the execve there are 2 copies of everything? And are they both running? And some counting of locking / unlocking gets messed up? It’s only a short period of time… but non zero. I admit I don’t know enough about the fork implementation to really have a stab at understanding the issue.
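
On the "2 copies" part: under POSIX, fork gives the child its own copy of the parent's memory, only the thread that called fork exists in the child, and both processes run until the child execs or exits. A small sketch of the memory side, assuming Haiku follows POSIX here (ChildChangeStaysInChild and lockCount are made up for the demo):

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a change made by the child to an ordinary variable is
// invisible to the parent, i.e. each process has its own copy.
bool ChildChangeStaysInChild()
{
    int lockCount = 1;  // pretend a lock is "held once" at fork time

    pid_t pid = fork();
    if (pid < 0)
        return false;

    if (pid == 0) {
        // Child: modifies only its private copy, then leaves via _exit().
        lockCount = 0;
        _exit(lockCount == 0 ? 0 : 1);
    }

    // Parent: wait for the child, then check our own copy is untouched.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    assert(lockCount == 1);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

This only covers plain process memory. It says nothing about kernel-side objects such as semaphores, or about connections to other processes, which is where the open question in this thread lies.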