X.org Driver Porting

Hi Haiku Community,

With Beta 2 getting closer (plus the 23-year anniversary of BeOS), I thought it would be fun to build a period-appropriate machine (nothing newer than a PIII, SDR RAM, no graphics with the “geforce” or “radeon” moniker) as a side project. I sourced a machine that would be perfect… with a Trident “graphics” (hehe) sized flaw.

I was digging through the Haiku git and noticed the lack of drivers for it. After recovering from the shock, I thought I would try my hand at porting the Xorg drivers over. Obviously there’s no whitepaper from Trident, but I went through the Haiku dev guide and did some poking around the i810 driver, and it seems like I might be able to handle this (famous last words).

That said, the closest thing to a driver I’ve ever written was for a Bluetooth LE device, and that’s quite a bit more abstracted, so I’m looking for help getting my bearings. Is there perhaps any information/reference that maps the Xorg functions to those expected by Haiku? I can’t imagine there’s an IDE with code completion, but perhaps some decent bootstrapping information not in the main guide?

I know that people on the internet saying they’re going to write a driver for hardware they didn’t make is about as common as photos of cats, but I’m trying to keep the project small, focused, free of 3D, and really just for my one card. Basic 2D acceleration would be more than awesome.

Thanks for the help!

Well, it is eminently doable… but you should probably plan on getting a basic framebuffer working as a start. You probably won’t be able to take the Xorg code piecemeal… at all.

There is NetBeans… not sure if code completion works on Haiku. Also there is a port of Emacs if you lean that way.

Also, the VESA driver probably already works on that card, just with no acceleration whatsoever. And I’m not sure the Trident cards really accelerate much anyway… even 2D. It’s probably worth mentioning that Trident cards are older than the PIII; the original Radeon or GeForce would be more period-correct… and would actually accelerate things somewhat, given the effort of writing a driver for them.

1 Like

Welcome!

Unfortunately, our documentation for creating a graphics driver is only partially translated into English; the original is in Dutch (I’m pretty certain…).
What documentation is translated can be found at: https://www.haiku-os.org/legacy-docs/writing-video-card-drivers/

Otherwise, instructions for bootstrapping Haiku are here: https://www.haiku-os.org/guides/building

And a skeleton driver is available at: https://github.com/haiku/haiku/tree/master/src/add-ons/kernel/drivers/graphics/skeleton and https://github.com/haiku/haiku/tree/master/src/add-ons/accelerants/skeleton
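
For orientation, the accelerant half of a driver exports a single entry point that app_server queries for function pointers, one feature at a time. Here is a minimal sketch of that dispatch, loosely following the skeleton accelerant; the tr_* implementations are hypothetical stand-ins, while the constants, types, and signatures come from Accelerant.h:

// Accelerant entry point: app_server asks for one hook at a time.
// The tr_* functions are hypothetical stand-ins; the full feature list
// is in the skeleton accelerant and in Accelerant.h.
#include <Accelerant.h>

static status_t tr_init_accelerant(int fd);
static uint32 tr_accelerant_mode_count(void);
static status_t tr_set_display_mode(display_mode* mode);

extern "C" void*
get_accelerant_hook(uint32 feature, void* data)
{
	switch (feature) {
		case B_INIT_ACCELERANT:
			return (void*)tr_init_accelerant;       // called first, with the device fd
		case B_ACCELERANT_MODE_COUNT:
			return (void*)tr_accelerant_mode_count; // how many display modes we offer
		case B_SET_DISPLAY_MODE:
			return (void*)tr_set_display_mode;      // program the card for a mode
		default:
			return NULL;                            // unsupported feature
	}
}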

Not to mention, as @cb88 said, the VESA driver would be a good reference driver to look at too.

Another recommendation for a code editor is Koder; it’s Haiku-native and at least has syntax highlighting and some other useful tools.

1 Like

I think this is the most up-to-date version of the driver guide by Rudolf: http://www.rudolfs-place.nl/BeOS/NVdriver/index.html (link at bottom of page).

He also has the GeForce driver he wrote there, which is probably the most advanced BeOS graphics driver, period… unfortunately it only supports some of the original GeForce cards, and I think only up until the cards quit having a compatibility mode in them, so no modern cards are supported.

Haiku is much in need of 3D drivers… but that’s a ton of work. As far as I know, the best starting point would be studying how the BSDs provide wrappers and shims for Linux APIs and writing a similar layer for Haiku (quite complex, probably man-months of work to complete). Waddlesplash has plans for this, but he is only one man after all, and spread quite thinly bashing various bits of Haiku into shape. If you can get your Trident card working with some sort of 2D acceleration, this might be a worthy second step to take…

1 Like

That would be for wifi, and it was in German. The docs for video card drivers were in English from the start and, as far as I know, complete enough to get a driver up and going. And maybe when you get there, you can even write the missing chapters.

Also, on a side note, Haiku is not BeOS, and I would not consider such a machine “appropriate”. Haiku will be slow, may not even boot if you don’t have enough RAM, and the web browser certainly won’t run. You may hit many other problems with quirky IDE controllers, ACPI tables, … before you even get to the video card. Have fun, and let us know what happens!

1 Like

Whoops, I forgot that Rudolf did some of his own work to update the guide on the Haiku website more recently (I had a work-in-progress effort myself; however, it was taking a while since I was also trying to convert the HTML of the guide to Markdown).

Wow! Thanks for all the responses/help/support everyone.

I thought the documentation was quite good, and Rudolf’s guide really got me to a place where I think I might pull this off. In his documentation he suggests picking a good starting driver when you’re light on docs, and there are framebuffer as well as Xorg XAA and EXA drivers for these cards, so I’ve got a few sources to tap.

That said the two things that don’t immediately seem obvious are:

(1) A deployment/testing strategy. I figure I could dev/build the driver in a VM, deploy the thing on a thumb drive, and then just test on real hardware in the live configuration. That said, if the screen is just black (or unreadable), that isn’t much to go on. I could SSH into the live machine, but perhaps there are some better approaches?

(2) Since I’m light on docs, it will more than likely be the case that I’m looking at, for example, initialization code that writes to a device on the AGP bus in some non-obvious way, with no real understanding of what is happening on the device, whether there is a required wait period for that operation, or who knows what. Moving that over from one architecture to another is likely going to get a bit hairy. This brings me back to a question I opened with: is there some informal/rough equivalence between some subset of functions in the linuxfb/xorg codebase and the driver/accelerant in Haiku? I realize they aren’t the same and a direct port is impossible, but broad strokes would be appreciated.

Just to say this again, I’m honestly not trying to do anything amazing here, but rather to have a bunch of fun doing something interesting. My goals are to get the hooks in the kernel driver going, get the accelerant core up and running, get blit/fill/invert/etc. running for my one device, and call that 1.0. No multihead, no 3D, probably no overlay unless leaving it out causes serious performance regressions, and I might even ignore all resolutions/bit depths/refresh rates except the native/relevant ones for the device’s panel (it’s a laptop). I’m budgeting 2000-2500 lines of code and about 3 months of work.
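
(For concreteness, each of those engine functions is a single accelerant hook with a fixed signature. Below is a rough sketch of what a fill hook might look like for a hypothetical Trident blitter; the hook signature and fill_rect_params come from Accelerant.h, while the TR_* register map and the wait_for_fifo()/write32() helpers are invented for illustration.)

#include <Accelerant.h>

// Placeholder register map and MMIO helpers, NOT real Trident offsets:
enum {
	TR_FG_COLOR = 0x00, TR_DST_XY = 0x04, TR_DIM_XY = 0x08,
	TR_CMD = 0x0c, TR_CMD_RECT_FILL = 1
};
static void wait_for_fifo(uint32 slots);
static void write32(uint32 offset, uint32 value);

static void
tr_fill_rect(engine_token* et, uint32 color, fill_rect_params* list,
	uint32 count)
{
	for (uint32 i = 0; i < count; i++) {
		wait_for_fifo(4);                  // don’t overrun the command FIFO
		write32(TR_FG_COLOR, color);       // solid fill color
		write32(TR_DST_XY, (list[i].top << 16) | list[i].left);
		write32(TR_DIM_XY, ((list[i].bottom - list[i].top + 1) << 16)
			| (list[i].right - list[i].left + 1));
		write32(TR_CMD, TR_CMD_RECT_FILL); // kick the engine
	}
}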

@cb88 @PulkoMandy
If you discount the improvement from SSE2 (and later AVX), the IPC of those later P-IIIs is actually more comparable to modern machines than you would think. Plus, the chipset on this machine happens to support quite a bit of RAM (3GB!!!). That said, I’ve got no illusions of speed, or that I will be using this thing in any serious way. Similarly, unlike almost everything else about them, the 2D acceleration and power consumption of the integrated Trident cards is actually quite good, and they lived on in the server space long after Trident did.

This is just … not true?

But anyway, what’s wrong with VESA mode? Theoretically Haiku will just boot already if the card supports VESA.

I didn’t want to imply they weren’t large, just not as large as you would think, but we can also agree to disagree! And there is nothing wrong with VESA mode; I’m just building a toy for fun and want it as shiny as I can get it.

There’s no problem having fun with this, I just wanted to be clear that these old machines are not the main target for Haiku (who am I fooling when I’m porting it to a 1998 SPARC machine?)

Of course you’re welcome to have fun with this and experiment with that driver, but for example, be aware that the app_server will not even try to use the 2D acceleration functions (blit, fill, etc) because on modern hardware these are either missing or slower than the CPU in our current design. So overlay and multihead would in fact be a lot more relevant.

On to the testing strategy: usually a serial port is what you would use to access the kernel debugger. The intel_extreme driver registers its own command (ie_reg) to read and write video card registers easily. This allows you to experiment manually with register values, and maybe unroll the initialization process by hand or just try some fine-tuning.
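
If it helps, the ie_reg trick is just the standard kernel API. A minimal sketch of how a Trident driver could register a similar KDL command follows; add_debugger_command(), parse_expression() and kprintf() are from <KernelExport.h>, while tr_read32()/tr_write32() are assumed MMIO helpers you would have in the driver anyway:

#include <KernelExport.h>

static uint32 tr_read32(uint32 offset);            // assumed MMIO helpers
static void tr_write32(uint32 offset, uint32 value);

static int
tr_reg(int argc, char** argv)
{
	if (argc < 2) {
		kprintf("usage: tr_reg <offset> [<value>]\n");
		return 0;
	}
	uint32 offset = (uint32)parse_expression(argv[1]);
	if (argc == 2)
		kprintf("0x%" B_PRIx32 ": 0x%" B_PRIx32 "\n", offset, tr_read32(offset));
	else
		tr_write32(offset, (uint32)parse_expression(argv[2]));
	return 0;
}

// somewhere in init_driver():
// add_debugger_command("tr_reg", &tr_reg, "read/write Trident registers");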

Ideally you would deploy your driver by booting a disk image over the network (PXE + our custom disk image server), but last time I tried that, a few years ago, I could not get it to work, so currently your main option is indeed writing a disk image to some media on the machine and booting that.

As for mapping Xorg functions to ours: having no idea how Xorg drivers are structured, I would not know much about it. Let’s hope they left some comments on what they are doing, and you can infer how the card works from those?

2 Likes

That is quite a showstopper for this project. Is there a switch I can flip, or maybe a (hopefully small) section of the app_server I could recompile with a flag changed, that would allow me to use hardware 2D acceleration? I realize that when you’ve got cores for days, doing that on a CPU is not a big deal. However, when you are processor-starved, even if it is a bit slower, using specialized hardware will obviously be advantageous. Really, the expected power draw of the chip is less than 1 watt, and I think that is with the “3D” running, so I’m more than willing to be a bit slow if I can take advantage.

As for the SPARC, that’s the machine I learned to write software on at uni! You weren’t allowed to use your own machine with an IDE. You could use the SPARC III machines in the lab, or you could SSH into the SPARC server, but your writing implements were vi/emacs/pico.

No, because the perception at the time was that CPU rendering could get the graphics on the screen faster. That is almost certainly no longer true… however, it would require major rework to change this assumption. And it’s an assumption that probably still holds for the GPUs you are targeting…

Also, 2D acceleration was relatively inflexible… so it may not help much with some of the rendering that Haiku does. In theory, complex font rendering and the UI could actually be directly rendered by the GPU these days, and that might improve latency even beyond what the CPU can achieve, but again that would be a lot of work.

I totally understand the design decision, but I guess we are at a no-go point for this. That’s too bad. Thanks to everyone for taking the time and helping. I really appreciate it, honestly I do.

Well, a dedicated driver would buy you a few things which Haiku does need… multi-head, for instance, is missing from most drivers, along with all the other things VESA doesn’t implement…

This seems to be it:

Also see this ticket where it was discussed #2769 (Radeon driver is Slower than VESA ...) – Haiku

2 Likes

It seems like it would be, but as noted in the ticket Diver links above, it wasn’t:

So using the graphics card for blit, etc. operations actually increased CPU usage on such old hardware!

The code is all still there, as Diver also linked, but it has remained disabled ever since for that reason.

1 Like

That only does some extremely limited acceleration… basically only 3 cases are potentially accelerated, which is also probably why the acceleration implemented was slower than doing it on the CPU… it was just doing software rendering with extra hoops to jump through anyway, instead of actual hardware rendering.

// app_server asking the accelerant for its (optional) 2D engine hooks;
// any of these may come back NULL if the driver doesn’t implement them.
fAccFillRect = (fill_rectangle)fAccelerantHook(B_FILL_RECTANGLE,
	(void*)&fDisplayMode);
fAccInvertRect = (invert_rectangle)fAccelerantHook(B_INVERT_RECTANGLE,
	(void*)&fDisplayMode);
fAccScreenBlit = (screen_to_screen_blit)fAccelerantHook(
	B_SCREEN_TO_SCREEN_BLIT, (void*)&fDisplayMode);

What do you think the fixed-function pipeline is? There are “extra hoops” to jump through to use it, yes. Only 3 of its operations were used here, but 3 > 0, right?

The reason it was slower back then is probably the AGP bus running at 66 MHz (iirc?), which is obviously slower than a low-GHz CPU in terms of command sequencing, etc. That is why most graphics cards of that era are non-useful for this stuff.

There is some functionality in older 2D accelerators to also accelerate font rendering and line drawing, rect/poly fills, etc… but yeah… that’s all I meant. If you are going to do complex font rendering on the CPU, the point is moot… there are GPU font renderers now that may change this assumption, though, since modern GPUs are programmable enough to do it on their own.

AGP 1x = 266 MB/s, and like you said, this would be much lower than system RAM bandwidth, so unless the GPU can do most of the work, attempting lots of uploads and blits can be dumb. FPM RAM = 177 MB/s, EDO = 266 MB/s, SDR SDRAM = 1 GB/s… and in those systems, main memory and AGP must talk to the CPU over the FSB… through the chipset, which is a little different also.
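
For reference, those figures fall straight out of clock × bus width: AGP 1x is 66 MHz × 4 bytes = 266 MB/s, while PC133 SDR SDRAM is 133 MHz × 8 bytes ≈ 1 GB/s, so the AGP link has roughly a quarter of main-memory bandwidth before you even account for read latency.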

Well this did raise some eyebrows back then so of course we studied it carefully before making that decision. The combination of factors that leads to this surprising result is as follows:

  • We use antialiasing for all drawing operations. This means the 2D-accelerated line drawing, etc. is quite useless to us, because it is not antialiased
  • We use double buffering to limit flickering when redrawing
  • Since we do the antialiased drawing with the CPU anyway, the CPU makes quite a lot of accesses to the buffer, and since it’s antialiased, it does a lot of writes, but also a lot of reads
  • If we want to use the blitting and rectangle filling of the video card (these handle a lot of pixels, so they would be quite useful even if we can’t use line drawing, etc.), we would have to move the backbuffer into video card RAM (of course the blitter doesn’t work on main RAM)
  • But it turns out the AGP bus is designed for fast writes. The idea was that you would send a bunch of things to the video card (textures, 3D commands, framebuffer) and very rarely need to read from it (maybe a status register when waiting for retrace or setting a video mode). So AGP is designed to allow fast writes, but not-quite-that-fast reads

The combination of the above: CPU doing a lot of reads + buffers in VRAM + AGP being slow at reads leads to the slower rendering when using 2D acceleration.
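
You can see this asymmetry for yourself with a crude benchmark. Here is a sketch; system_time() is from <OS.h>, and frameBuffer/fbSize are assumed to be a pointer and size you obtained elsewhere (e.g. through BWindowScreen):

#include <OS.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Crude write-vs-read bandwidth probe over a mapped framebuffer.
static void
probe(uint8* frameBuffer, size_t fbSize)
{
	uint8* scratch = (uint8*)malloc(fbSize);

	bigtime_t start = system_time();
	memcpy(frameBuffer, scratch, fbSize);   // CPU -> VRAM: the fast direction
	bigtime_t writeTime = system_time() - start;

	start = system_time();
	memcpy(scratch, frameBuffer, fbSize);   // VRAM -> CPU: the slow direction
	bigtime_t readTime = system_time() - start;

	printf("write: %g MB/s, read: %g MB/s\n",
		fbSize / (writeTime / 1e6) / 1e6,
		fbSize / (readTime / 1e6) / 1e6);
	free(scratch);
}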

Now, things would be different if we could offload all the drawing to the video card. This would require something like OpenVG, and the app_server could be quite easily adjusted to that, once we have the “3D” part of video cards up and running and providing an OpenVG backend (or something similar). The software-drawing code in app_server is very self-contained; in fact, it can already be replaced: when you use the remote app_server, it is replaced by a thing that sends the drawing commands over the network, and when you use the HTML5 app_server, it is replaced by a thing that writes the same commands into a websocket. There is no reason we cannot just as easily add an OpenVG backend. If you don’t want to wait for Haiku to get 3D drivers, you could even implement the remote app_server protocol on a machine running another OS and use OpenVG (or whatever other accelerated drawing library you want) there.
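
To make the “self-contained backend” point concrete, the idea is roughly an interface like the following. This is a hypothetical simplification, not app_server’s actual class names; it just illustrates why a software, remote, or (someday) OpenVG implementation can be slotted in behind the same set of drawing commands:

#include <GraphicsDefs.h>
#include <Point.h>
#include <Rect.h>

// Hypothetical simplification of a swappable drawing backend.
class DrawingBackend {
public:
	virtual ~DrawingBackend() {}
	virtual void FillRect(const BRect& rect, const rgb_color& color) = 0;
	virtual void StrokeLine(const BPoint& from, const BPoint& to,
		const rgb_color& color) = 0;
	// ... the rest of the drawing command set
};

class SoftwareBackend : public DrawingBackend { /* CPU rasterizer */ };
class RemoteBackend : public DrawingBackend { /* serializes to a socket */ };
class OpenVGBackend : public DrawingBackend { /* future: GPU-backed */ };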

This is something we will experiment with when we have 3D acceleration up and running. It is sufficiently different from the 2D acceleration we are talking about here, that a new run of benchmarks has to be made before we can decide if it’s worth enabling.

8 Likes