Notes:
- I may use “team id” and “pid” interchangeably in this post.
- I’m not sure which category this belongs in. If a moderator could move it and explain (in private) a more suitable place, I’d be grateful.
- I didn’t open a Trac issue, as there might already be one that I didn’t find. If there isn’t, please tell me and I’ll open one (and tell me which component to file it under).
I’m encountering a stability issue with Haiku. I’m trying to track down what’s
happening, so I wrote a monitoring program for that. But I’m not sure the program is
correct, so I’d like someone to take a look at it and tell me if I got something wrong.
This didn’t lead to any data loss.
I’m running Haiku (R1/beta4 hrev56578+95) in a QEMU virtual machine under Linux (Linux mothra 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux).
I’m using it daily to learn the BeOS API by writing programs and sometimes attempting to
compile Haiku’s components. To make my life simpler, I downloaded and
compiled Byobu, which works OK as long as it’s not too demanding on the Terminal
app (but that’s a problem for a different day). Byobu spawns a couple of processes
per second to refresh its status bar.
About every week or so, Haiku can’t spawn processes or threads anymore. The
Deskbar becomes unresponsive, but <ctrl>+<alt>+<del> (sometimes) allows for a
soft reboot, and all running apps are still usable.
Attempting to run a command from the Terminal app crashes it: it displays a message
about fork failing and exits.
I noticed that the team id keeps growing. Over the past 2 months, I wrote
down the uptime and the last team id reported by the ps command. Once I reached the end
of my tether, I decided to monitor this growth:
collect-ps
Once compiled, you can use it to fetch the last team id:
./collect-ps
or watch the system:
./collect-ps -w &
This will collect the last team id and the max team id every second, and attempt to
spawn a process and a thread.
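For readers who want to reproduce the spawn check without collect-ps, here is a minimal sketch of such a probe in Python. It is not the source of collect-ps; the function names `probe_process` and `probe_thread` are made up for illustration, and it only assumes that `os.fork` and `threading` behave on Haiku as they do on other POSIX systems.

```python
import errno
import os
import threading


def probe_process():
    """Try to fork a child that exits at once; return (ok, detail)."""
    try:
        pid = os.fork()
    except OSError as exc:
        # Keep the raw errno: on failure this is the value worth logging.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, f"fork failed: {name}: {exc.strerror}"
    if pid == 0:
        os._exit(0)  # child: leave immediately, skipping cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    return True, f"forked pid {pid}"


def probe_thread():
    """Try to start a thread that does nothing; return (ok, detail)."""
    try:
        thread = threading.Thread(target=lambda: None)
        thread.start()
    except RuntimeError as exc:
        # CPython raises RuntimeError when the OS refuses a new thread.
        return False, f"thread start failed: {exc}"
    thread.join()
    return True, "thread started"


if __name__ == "__main__":
    for name, probe in (("process", probe_process), ("thread", probe_thread)):
        ok, detail = probe()
        print(f"{name}: {'OK' if ok else 'KO'} ({detail})")
```

Logging the errno of the first failing fork, together with the pid returned by the last successful one, would show whether processes and threads stop working at the same moment.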
On my last attempt, the failure occurred at PID 25’465’541. This number doesn’t look
like any meaningful limit to me. I’m waiting for the next crash ^^;
This time, when I ran ps in an already open Terminal, it didn’t crash and printed this message:
~/workshop/belab/todo> ps
-bash: fork: Unknown Device Error (-2147432385)
-bash: cannot make pipe for command substitution: Too many open files
My hypothesis is that the kernel reaches an integer ceiling and overflows. But I
might be wrong; maybe another resource is exhausted.
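A quick arithmetic check of the overflow hypothesis, as a sketch: it compares the failing team id with the signed 32-bit ceiling and with the nearest powers of two, and rewrites the status printed by bash as an offset from the smallest 32-bit integer. The two input numbers come from this post; the assumption that team ids are signed 32-bit values is mine and is not confirmed here.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

last_pid = 25_465_541        # team id at which spawning failed (from this post)
fork_error = -2_147_432_385  # status printed by bash (from this post)

# Share of the signed 32-bit range already consumed by the team id.
pid_share = last_pid / INT32_MAX

# Distance of the error code from INT32_MIN, which is easier to read in hex.
error_offset = fork_error - INT32_MIN

print(f"pid is {pid_share:.2%} of INT32_MAX")
print(f"2**24 < pid < 2**25: {2**24 < last_pid < 2**25}")
print(f"error code = INT32_MIN + {error_offset:#x}")
```

If the assumption holds, the id was nowhere near a 32-bit ceiling and does not sit on a power-of-two boundary, which would point toward some other exhausted resource, consistent with the “Too many open files” line above.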
I’m attaching the logs I collected.
In my monitoring tool, I observe a few strange things:
- The team id seems stuck for a while, even though ps shows higher team ids. It’s as if that value were cached at some point, but I don’t know the system well enough to tell. My monitoring tool might also be at fault.
- Sometimes the team id goes back to a previous value.
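To make these two observations easier to spot in a long log, the sampled ids could be scanned for stalls and regressions. The following is a minimal sketch; the function name `classify_samples`, the `stall_after` threshold, and the sample values are all made up for illustration and are not taken from the attached logs.

```python
def classify_samples(samples, stall_after=3):
    """Return (index, kind, value) events for a list of sampled team ids.

    kind is "regression" when a sample is lower than the previous one,
    and "stall" when the value has stayed unchanged for `stall_after`
    consecutive samples.
    """
    events = []
    unchanged = 0
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if cur < prev:
            events.append((i, "regression", cur))
            unchanged = 0
        elif cur == prev:
            unchanged += 1
            if unchanged == stall_after:
                events.append((i, "stall", cur))
        else:
            unchanged = 0
    return events


# Made-up values, for illustration only.
demo = [100, 104, 104, 104, 104, 109, 107, 112]
for event in classify_samples(demo):
    print(event)
```

With a couple of processes spawned per second and one sample per second, any stall longer than a few samples would suggest that the tool is reading a stale value rather than that no team was created.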
Is this a known behaviour? I didn’t find anything in Trac about such an issue.