[GSoC 2024] Fixing the crashing | Haiku Project

The benefit is that the rendering pipeline is now OpenGL, and this seems unlikely to change.

And when hardware acceleration is deployed, then it’s accelerated, and that catches Haiku up to the rest of the market.

The rendering is not OpenGL in any of the WebKit ports. It’s either Cairo or Skia in the Linux ports, and it is app_server commands in our case. This doesn’t change when you enable OpenGL, for the simple reason that OpenGL is a 3D API, and rendering a webpage is mostly 2D work, with, for example, a lot of text rendering, which OpenGL has no idea how to do.

The OpenGL functions will be used only for compositing, and even then, only in some cases where the website requested it. That’s because setting up a whole 3D rendering stack for rendering a website is a lot of work, and only worth it for websites that do crazy transparency and rotation effects and animations.

Good to know. Though, for now, I’ll continue with it enabled until I’m in a position to be able to test changes to it.

I will point out that when you avoid OpenGL you are traveling the path less traveled… 99% of browsers out there are NOT running the codepath you are trying to enable. So… I’d suggest rethinking that and working on getting the underpinnings to be as much the same as everywhere else as possible.

A TON of websites require compositing, so unless you want to render like it’s 2005… yeah, do what everywhere else does.

Yes, there’s a cost to not enabling it and there’s a cost to enabling it. I still don’t know the exact costs to either path since I’m currently working on fixing another problem. Later I’ll need to measure the cost of both paths and make a decision.

At this point runtime cost is irrelevant… since it isn’t working, so saving yourself any pain points by making it work the same as everywhere else is probably what I’d aim for.

Also, hardware OpenGL needs to happen, adding one more motivation for that isn’t hurting anyone.

Eh, it’s not the runtime cost I’m worried about, but more about how much time it would take to get running, because, according to waddlesplash

Edit: Oops, EGL != OpenGL. EGL is more related to creating and drawing on windows as opposed to 3D graphics.

Yes, but WebKit uses EGL for OpenGL. If you can’t use it, you might have to try and make it use HaikuGl instead. Not sure how much work that is : )

That’s the hard part with OpenGL. Once you have that, the OpenGL APIs are standard and just work. And our implementation is Mesa, like everyone else’s.

EGL is an attempt to standardize that part. If you don’t use it, you will have to write custom Haiku code with BGLView instead. Previously, WebKit could use GLX, which is the X11-specific way of doing the context/window creation. But now, it’s EGL-only for Linux (allowing things to work on Wayland, and maybe even DirectFB if that’s still a thing).

So, yes, if we go with OpenGL, using EGL is likely a good idea. But then we have to make it work on the Mesa side.

This EGL issue in Haiku, which needs to be fixed:

So, make it work, even with the bugs. At the very worst, the trail through the jungle is beaten and others can start pitching in to get it working fully.

Progress! No more crashing!

Well, at least as long as a window (Tracker in this image) doesn’t get moved onto MiniBrowser.

Technical details

This time, it was just a one line fix. WebProcess tried decoding an integer from a message. Normally, it would just decode the integer directly. But Haiku’s port specified that the integers should be treated specially, as a message attachment. Well, we don’t use message attachments, and, as you can guess, this mistake eventually led to a crash.

The fix: Haiku doesn’t use attachments atm, so just make the attachment type the closest thing to nothing: an empty struct. Now no more integers will be confused as attachments!
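As a rough illustration of how an integer can end up on the attachment path, here is a minimal sketch. It is not WebKit’s actual decoder: `ToyDecoder`, `IntAttachment`, and `EmptyAttachment` are hypothetical names, and it assumes the port’s attachment type used to be an alias for an integer type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for a port's attachment type.
using IntAttachment = int32_t;   // assumed "before": the attachment is just an integer
struct EmptyAttachment { };      // "after": an empty struct that no payload type matches

// Toy decoder, parameterized on the port's attachment type.
template<typename Attachment>
struct ToyDecoder {
    // Reports which path a value of type T would take when decoded.
    template<typename T>
    static std::string pathFor()
    {
        if (std::is_same<T, Attachment>::value)
            return "attachment";   // would be looked up among the message's attachments
        return "inline";           // would be read directly from the message bytes
    }
};
```

With the integer alias, every `int32_t` in a message is indistinguishable from an attachment. With the empty struct, ordinary integers can only take the inline path.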

It appears someone has been putting some work into DirectFB… but I’d say it’s solidly in avoid territory. Maybe they are maintaining it because they have legacy devices or something to support? https://directfb2.github.io/

Now I can cover it all I want. The problem was calling something from the window thread which should have been called from the main thread. (PR)
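For readers unfamiliar with this class of bug, the usual remedy is to hand the work over to the main thread instead of running it where the event arrived. The sketch below uses only standard C++ and is not the code from the PR: `MainThreadQueue` and `postedTaskRunsOnMainThread` are hypothetical names, and a plain `std::thread` stands in for the window thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical task queue drained by the main thread.
class MainThreadQueue {
public:
    // Safe to call from any thread: it only enqueues the work.
    void post(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_condition.notify_one();
    }

    // Called on the main thread: blocks until a task arrives, then runs it.
    void runOne()
    {
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_condition.wait(lock, [this] { return !m_tasks.empty(); });
            task = std::move(m_tasks.front());
            m_tasks.pop();
        }
        task();   // run outside the lock so tasks may post further work
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<std::function<void()>> m_tasks;
};

// Returns true if work posted from a stand-in "window thread"
// was executed on the calling (main) thread.
bool postedTaskRunsOnMainThread()
{
    MainThreadQueue queue;
    const std::thread::id mainId = std::this_thread::get_id();
    std::thread::id ranOn;

    std::thread windowThread([&] {
        queue.post([&] { ranOn = std::this_thread::get_id(); });
    });

    queue.runOne();
    windowThread.join();
    return ranOn == mainId;
}
```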

MiniBrowser is now sufficiently stable that I can move on :tada:. I still can’t resize it or exit it without crashing, but good enough for now.

Now I wonder what is needed to get a web page displaying…

Nice work @Zardshard ! :+1:

The three processes need to talk to each other. If they can’t, not a lot will happen.

I recommend that you use DevConsole (git clone https://pulkomandy.tk/gerrit/devconsole) and enable logging (using the appropriate environment variable for WebKit), since a lot of the logging is sent to DevConsole. This lets you get the logs from all three processes in a single place, with highlighting of which logs come from where.

The general idea is that the browser (the “UI process”) creates a “connection”, and forwards each end of that connection to the web process and the network process when starting them (through command line arguments). The two processes should then be able to talk to each other.

The generic UNIX version of WebKit (used for GTK and WPE ports for example) uses UNIX domain sockets for this. In the current Haiku code we try to use BHandler/BLooper, but that doesn’t work great, because you can’t create a BMessenger (that would be the logical thing to use as a “connection” object) which targets a process that isn’t started yet.

So, there are two options. We could figure out a way to make this work using BMessenger (Rajagopalan had implemented something using a temporary hashmap in the UI process, which allowed creating the “real” BMessenger after the processes had started, but it wasn’t clean and was a bit hard to follow). Or we could switch to UNIX domain sockets as other UNIX ports do, but I’m unsure how convenient this will be if the processes also have to run a normal BLooper event loop (which is how it’s done so far).
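To make the socket option concrete, here is a minimal POSIX sketch, not the WebKit implementation. It shows the property that matters here: both ends of the connection exist before the helper process does. The function name `receiveGreetingFromChild` is hypothetical, and a plain `fork()` stands in for launching a helper; a real port would launch a separate executable and tell it which descriptor to use, for example through a command line argument.

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Creates a connected pair of UNIX domain sockets, forks a child that stands
// in for a helper process, and returns whatever the child sent over its end.
// Returns an empty string on failure.
std::string receiveGreetingFromChild()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {
        // Child: keep only its own end of the connection.
        close(fds[0]);
        const char message[] = "hello from child";
        ssize_t written = write(fds[1], message, sizeof(message) - 1);
        close(fds[1]);
        _exit(written == static_cast<ssize_t>(sizeof(message) - 1) ? 0 : 1);
    }

    // Parent: close the child's end so read() reports EOF once the child is done.
    close(fds[1]);
    std::string received;
    char buffer[64];
    ssize_t count;
    while ((count = read(fds[0], buffer, sizeof(buffer))) > 0)
        received.append(buffer, static_cast<size_t>(count));
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return received;
}
```

The sketch does not address the open question above, namely how reading from such a socket would coexist with a BLooper event loop.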

Already have this :slight_smile:

Yes, that seems to be part of the problem. Here’s what I know so far: The InitializeWebProcess and InitializeNetworkProcess messages get through. So MiniBrowser appears to be able to communicate with WebProcess and NetworkProcess. WebProcess then sends a GetNetworkProcessConnection to MiniBrowser, but MiniBrowser doesn’t receive it.

Alright, I have the fix! Turns out the run loop was failing to process some messages.

Here’s the rabbit trail that I had to follow to figure out the problem. No, debugging is fortunately not usually this long and complicated. Feel free to skim it, since it’s probably very difficult to actually understand.

  • Website is not drawn on screen
  • Because BackingStore::incorporateUpdate is unimplemented
  • Because SharedMemory::map is not implemented correctly
  • Because it isn’t called from anywhere
  • Because ShareableBitmap::create(Handle&&, SharedMemory::Protection) isn’t called from anywhere
  • Because BackingStore::incorporateUpdate isn’t called
  • Because DrawingAreaProxyCoordinatedGraphics::update isn’t called
  • Because Messages::DrawingAreaProxy::Update is never sent
  • Because DrawingArea::display isn’t called
  • Because DrawingAreaCoordinatedGraphics::displayTimerFired isn’t called
  • Because DrawingAreaCoordinatedGraphics is never constructed
  • Because DrawingArea::create isn’t called
  • Because WebPage::WebPage isn’t called
  • Because WebProcess::createWebPage isn’t called
  • Because Messages::WebProcess::CreateWebPage is never received
  • Because Connection::dispatchMessage is not receiving the message
  • Because WebProcess is stuck waiting for a network process connection
  • Because Messages::WebProcessProxy::GetNetworkProcessConnection is not being processed
  • Because the ‘loop’ message is never sent to MiniBrowser.
  • Because it was perceived to already be processing incoming messages, so no attempt at waking it up was made
  • Because m_handler->Looper() is null
  • Because the handler needs to be attached during RunLoop’s construction.

Now imagine if a compiler could just give you a stack trace like that at the very beginning? First line is the symptom, middle lines narrow down why the symptom is happening, and the last line states the bug that needs to be fixed.
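The last few steps of the trail can be modelled with a toy example. This is a simplified sketch, not the Haiku RunLoop code: `ToyLooper`, `ToyHandler`, and `ToyRunLoop` are hypothetical stand-ins for the real BLooper/BHandler machinery, and the wake-up logic is reduced to a single null check.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Toy stand-ins; this is not the Haiku BLooper/BHandler API.
struct ToyLooper {
    std::queue<std::string> messages;
};

struct ToyHandler {
    ToyLooper* looper = nullptr;   // stays null until the handler is attached
};

class ToyRunLoop {
public:
    ToyRunLoop(ToyLooper& looper, bool attachInConstructor)
    {
        if (attachInConstructor)
            m_handler.looper = &looper;   // the fix: attach during construction
    }

    // Asks the loop to process pending work.
    // Returns false if the wake-up request was silently dropped.
    bool wakeUp()
    {
        if (!m_handler.looper)
            return false;   // the bug: no looper to post to, so no 'loop' message
        m_handler.looper->messages.push("loop");
        return true;
    }

private:
    ToyHandler m_handler;
};
```

With `attachInConstructor` set to false, `wakeUp()` never queues a ‘loop’ message and nothing reports an error, which is why the symptom showed up so far away from the cause.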

Now, with that fixed, I see what @PulkoMandy meant by saying that IPC needs to be fixed. Now that the messages are flowing, I see that the current implementation of IPC has some problems with creating a connection between the NetworkProcess and WebProcess, among perhaps other things. To address these bugs, I’ll probably need to address how I will fundamentally approach message passing, whether with BLoopers, ports, or UNIX domain sockets. I’ll be writing a blog post with more details, once I figure out exactly what the problem is and what some ways of approaching it are.

On the topic of message passing, which type (UNIX, Be, etc.) will be the easiest to maintain long term?

That would probably be ideal, great work on debugging that, impressive work

That’s one of the things I want to address in my blog post, once I get my head wrapped around the system.
