Vulkan lavapipe software rendering is working on Haiku

In that case you have to reboot anyhow :wink:

There is one thing that you should temporarily block though: the wait-for-engine-idle function is still called by Haiku, which makes no sense at all without using the engine. Since the engine on newer Intel cards is still down, even that would stall app_server big time there…


The GPU engine is usually used more cleverly than waiting for idle before running the next commands. On Radeon there are fences: a fence write is appended at the end of a command sequence and signaled on completion by writing the sequence number and optionally triggering an interrupt.

The accelerant engine API seems to have a sync token for that.

It is just an observation I made. BTW, everything you mention concerns just the acc engine, and none of that is used by Haiku currently, so whatever stalls on that engine should not crash Haiku.

On Nvidia what you mention also applies, though I did not use interrupts there. For 3D I never waited for engine idle of course; that would not be clever. However, if you want to set a 2D mode from Haiku, it's probably best to do that only when the engine is totally idle, as memory and the front buffer are about to be re-arranged.

Update: hmm, come to think of it, waiting might be precisely the reason why acceleration did not speed up Haiku drawing compared to software drawing. I saw app_server call that hook after the use of every 2D primitive (pure observation of behaviour). I think I will re-enable 2D acceleration for some personal tests here at some point.


Isn't that a job for the set-mode hook?

Maybe. This is not how BeOS did it though, AFAIK: app_server stops engine use and then calls the mode-set.

How and when all hooks are called on BeOS is very interesting to monitor closely. It was quite clever in my opinion, with, of course, the limited knowledge I have.

I think that the sync-to-token API can be used, along with a ring buffer for GPU-accelerated commands.


The accelerant API needs some redesign anyway:

  1. Multiple monitor support.
  2. GPU memory management.
  3. DMA operations between buffers (can be considered as an engine).
  4. Specify buffer when setting mode instead of allocating it internally.
  5. API for swapping buffers.
  6. Reconsidering host/clone.

I agree of course :wink:

I see that the intel_extreme accelerant uses a ring buffer for the engine, but does not use fences in sync-to-token; it just waits for idle.

OK, I did not look at the finer details here, but then if the speed of acceleration was measured with this driver, it would be an invalid comparison.

BTW I used an app called BeRoMeter 1.2.6 (I think) to benchmark acc in the old days on BeOS with the Nvidia driver.

Edit: the ring buffer is down in newer (Intel) cards as well, I expect.

I implemented the interrupt ring and made it handle fence updates. Interrupts themselves are not handled yet; the interrupt ring is polled by a timer.

The GPU uses a ring buffer for interrupts too, writing interrupt event packets (VM page fault, ring fence completed, display connected/disconnected (display hot-plugging), etc.). The interrupt ring works in the opposite direction compared to the GFX/DMA rings: it is written by the GPU and read by the driver.

IH: fWptr: 12992
InterruptPacket(181, 0, 0, 0)
handlerItem(handler: 0x115e8760e79, arg: 0x11a9a8591020)
RadeonRingBuffer::UpdateFences()
Fence(426): resolved

Great, one step forward! Does it fix anything in particular or improve speed in some tests?

Not really, but it reduced polling to one place. Speed was improved a bit by removing an unneeded snooze(1000) in the GFX ring reset hack.

I implemented basic RLC (ring list controller, as mentioned above) block handling, but it has not solved the GFX ring reset problem for now.


If the API needs changes, just do it. It's not like there are any other 3D cards depending on it, AFAIK.

The drivers we might have could be adjusted, if we have any at all.


I finally fixed the GFX ring problem by adding additional tracing to the Linux amdgpu driver and comparing results. The problem was different units in the RPTR and WPTR registers: uint32 units, not bytes. The DMA and GFX rings use different ring-position units. Now the GFX ring works with a non-zero RPTR and with wrapping.

rptr: 0x1e00
wptr: 0x1e00
*fRptrAdr: 0x780
*fFenceAdr: 6205
Fence(6206): resolved
rptr: 0x1f00
wptr: 0x1f00
*fRptrAdr: 0x7c0
*fFenceAdr: 6206
Fence(6207): resolved
rptr: 0
wptr: 0
*fRptrAdr: 0
*fFenceAdr: 6207
Fence(6208): resolved
rptr: 0x100
wptr: 0x100
*fRptrAdr: 0x40
*fFenceAdr: 6208
Fence(6209): resolved
rptr: 0x200
wptr: 0x200
*fRptrAdr: 0x80
*fFenceAdr: 6209

Also curious here about the performance improvement, if I may ask :slight_smile:

Changed memory clock settings with ATOMBIOS; engine clock changes don't work for now.

(screenshot: screenshot70)


Oh, you doubled the speed since the first "TestApp - Haiku" post :slight_smile:

Super nice.

76 FPS with the engine clock changed; 26 FPS in the SSAO demo. Now it gets a significant benefit over software rendering.


This sounds amazing!!!