Vulkan lavapipe software rendering is working on Haiku

In that case you have to reboot anyhow :wink:

There is one thing that you should temporarily block though: the wait-for-engine-idle function is still called by Haiku, which makes no sense at all without using the engine. Since the engine on newer Intel cards is still down, even that would stall app_server big time there…


The GPU engine is usually used more cleverly than waiting for idle before running the next commands. On Radeon there are fences: a fence write is appended at the end of a command sequence and signaled on completion by writing the sequence number and optionally triggering an interrupt.

The accelerant engine API seems to have a sync token for that.

It is just an observation I made. BTW, everything you mention concerns just the acc engine, and none of that is used by Haiku currently, so whatever stalls on that engine should not crash Haiku.

On Nvidia what you mention also applies, though I did not use interrupts there. For 3D I never waited for engine idle of course; that would not be clever. However, if you want to set a 2D mode from Haiku, it's probably best to do that only when the engine is totally idle, as memory and the front buffer are about to be re-arranged.

Update: hmm, come to think of it, waiting might be precisely the reason why acceleration did not speed up Haiku drawing compared to software drawing. I saw app_server call that hook after the use of every 2D primitive (pure observation of behaviour). I think I will re-enable 2D acceleration for some personal tests here at some point.


Isn't that a job for the set-mode hook?

Maybe. This is not how BeOS did it though, AFAIK: app_server stops engine use and then calls the mode-set.

How and when all hooks are called on BeOS is very interesting to monitor closely. It was quite clever in my opinion, with, of course, the limited knowledge I have.

I think that the sync-to-token API can be used, along with a ring buffer for GPU-accelerated commands.


The accelerant API needs some redesign anyway:

  1. Multiple monitor support.
  2. GPU memory management.
  3. DMA operations between buffers (can be considered as an engine).
  4. Specify buffer when setting mode instead of allocating it internally.
  5. API for swapping buffers.
  6. Reconsidering host/clone.

I agree of course :wink:

I see that the intel_extreme accelerant uses a ring buffer for the engine, but does not use fences in sync-to-token; it just waits for idle.

OK, I did not look at the finer details here, but then if the speed of acceleration was measured with this driver, it would be an invalid comparison.

BTW I used an app called BeRoMeter 1.2.6 (I think) to benchmark acc in the old days on BeOS with the Nvidia driver.

Edit: the ring buffer is down in newer (Intel) cards as well, I expect.

I implemented the interrupt ring and made it handle fence updates. Interrupts themselves are not handled yet; the interrupt ring is polled by a timer.

The GPU uses a ring buffer for interrupts too, writing interrupt event packets (VM page fault, ring fence completed, display connected/disconnected (display hot-plugging), etc.). The interrupt ring works in the opposite direction compared to the GFX/DMA rings: it is written by the GPU and read by the driver.

IH: fWptr: 12992
InterruptPacket(181, 0, 0, 0)
handlerItem(handler: 0x115e8760e79, arg: 0x11a9a8591020)
RadeonRingBuffer::UpdateFences()
Fence(426): resolved

Great, one step forward! Does it fix anything in particular or improve speed in some tests?

Not really, but it reduced polling to one place. Speed was improved a bit by removing an unneeded snooze(1000) in the GFX ring reset hack.

I implemented basic RLC (ring list controller, as mentioned above) block handling, but it has not solved the GFX ring reset problem for now.


If the API needs changes, just do it. It's not like there are any other 3D cards depending on it, AFAIK.

The drivers we might have could be adjusted, if we have any at all.


I finally fixed the GFX ring problem by adding additional tracing to the Linux amdgpu driver and comparing results. The problem was different units in the RPTR and WPTR registers: uint32 units, not bytes. The DMA and GFX rings use different ring-position units. Now the GFX ring works with a non-zero RPTR and with wrapping.

rptr: 0x1e00
wptr: 0x1e00
*fRptrAdr: 0x780
*fFenceAdr: 6205
Fence(6206): resolved
rptr: 0x1f00
wptr: 0x1f00
*fRptrAdr: 0x7c0
*fFenceAdr: 6206
Fence(6207): resolved
rptr: 0
wptr: 0
*fRptrAdr: 0
*fFenceAdr: 6207
Fence(6208): resolved
rptr: 0x100
wptr: 0x100
*fRptrAdr: 0x40
*fFenceAdr: 6208
Fence(6209): resolved
rptr: 0x200
wptr: 0x200
*fRptrAdr: 0x80
*fFenceAdr: 6209

Also curious here about the performance improvement, if I may ask :slight_smile:

Changed memory clock settings with ATOMBIOS; engine clock changes don't work for now.

(screenshot: screenshot70)


Oh, you doubled the speed since the first "TestApp - Haiku" post :slight_smile:

Super nice.

76 FPS with the engine clock changed; 26 FPS in the SSAO demo. Now it gets a significant benefit over software rendering.


This sounds amazing!!!