Replacing AGG with Blend2d

BeardPower · August 7, 2018, 10:33am

Hi,

As AGG is pretty old and its creator, unfortunately, passed away years ago, I wanted to ask if there are any ideas or incentives for replacing it with a more modern framework.
I think Blend2d (https://blend2d.com/) would be a great option (it’s implemented in C++ and will get a C wrapper API soon).
Future features will include multi-threading and GPU acceleration.

I compiled the Alpha version some time ago on Haiku and it’s working great already. The performance on Haiku is similar to Windows/*BSD/macOS/Linux, with negligible differences.

What do you think?

Thanks!

cb88 · August 9, 2018, 5:23am

What does the age of the code have to do with anything?

You’ll find just about anyone commenting on having used the AGG code says it’s nice code.

It’s also unlikely that an upstart project like blend2d will surpass its quality, much less while still in beta. AGG is highly regarded for its rendering quality.

AGG is also very fast… Haiku’s UI rendering is faster than many accelerated UIs on other OSes.

Switching to something other than AGG would just be a setback for the rest of the project (It would probably break Web+ for instance).

Note there is a fork of AGG here that is being maintained, AGG is also used in some serious software like pdfium and the .net imaging framework etc… and Haiku of course. https://sourceforge.net/p/agg/svn/HEAD/tree/agg-2.4/

One thing that might be worth slotting into a GSOC spot is updating AGG to support AVX2/AVX512 etc… I would imagine that would be a very doable project as well for a GSOC student.

Haiku doesn’t even have actual accelerated graphics drivers (some modest 2d acceleration, nothing 3d yet). Hopefully someone picks that up and works on it sooner or later.

PulkoMandy · August 9, 2018, 6:53am

I’d rather let the compiler do AVX, etc on its own. The AGG code is written to make it as compiler friendly as possible already.

We do have 2D “acceleration” in some drivers, but we have disabled it, as it actually made things slower in many cases (mainly because we do most of our drawing offscreen). The only thing we support is hardware cursors, and not even in the most frequently used hardware drivers.

BeardPower · August 9, 2018, 7:11am

A lot. For example, the code is pretty frozen and not maintained, it does not use any of the more modern C++ features, and it was designed as a software renderer.

Don’t get me wrong, Maxim did a fantastic job with AGG and I’m not criticizing its code quality.

Did you try it? The rendering quality is on par with AGG.

There is no question about that.

Sure, I agree with that. It would be a post-release task and more of an experiment right now.
But why would it break Web+? Is it using it directly and not through some back-end agnostic API?

Thank you. It seems it’s not very actively developed.

Sure, that would be possible and nice to have (REBOL is using AGG and would also profit from it).
The thing is that Blend2d is designed for and uses SIMD, has a JIT and async rendering, and will get multi-threading, whereas you would need to “hack” all of this into AGG.

Yes, and 2D acceleration is good enough, as the rendering backends/UI rendering just need 2D (vector graphics) and don’t use any fancy 3D stuff anyway.

GPU acceleration is anything but an easy task because of a.) limited access to the GPU technical papers and specs b.) driver fragmentation (which driver model should be used: Linux? *BSD? Fuchsia/Zircon?). Chances are high that Fuchsia will receive GPU drivers once they decide to target the Desktop as well and it shares more with Haiku than the other solutions.

PulkoMandy · August 9, 2018, 1:11pm

This is a great thing for us, because we use gcc2 and software rendering. It is then perfectly fit for our needs.

BeardPower · August 9, 2018, 1:19pm

From a stability PoV, I totally agree.

waddlesplash · August 9, 2018, 10:38pm

“Modern C++ features” are mostly for the developer’s sake, not the compiler’s sake. If the code works and is maintainable without them, they aren’t needed. Indeed, we are not using many of these in Haiku itself, so it doesn’t really matter.

AGG is known for its unmatched subpixel-precise rendering. I see the Blend2d page claims it is “as good as AGG”, but is it?

“async rendering” probably just means it uses an alternate thread rather than making the end-user create an alternate thread. We already do this in the app_server anyway, so it doesn’t matter for us.

Clang (and modern GCC is starting to, also) does pretty good vectorization out of the box. I doubt you could get massive performance wins by just “using more AVX” without some architectural redesign, like a JIT, which indeed it seems Blend2d has done. But that also sounds incredibly complicated for a 2D renderer…

The only real way to get “truly massive” performance wins at this point would be to use hardware-accelerated rendering, not just 2D blitting or other “2D acceleration” commands, and this would require OpenGL, probably.

Intel’s GPU driver programming guides are available publicly, containing everything from command descriptions to register layouts, and then there are the open-source Linux graphics drivers which Intel themselves maintain. The same is mostly true for AMD. The only company without very much GPU documentation at all is NVIDIA, and even then there is Nouveau which seems to be figuring things out anyway.

We can and have been using these for the development of our own modesetting drivers.

The BSDs re-use Linux drivers, to the point where they actually wrap the Linux APIs to use the drivers unmodified. Zircon seems to be developing their own Vulkan-based graphics stack from scratch, using publicly-available documentation, i.e., they are not going for native OpenGL or the “whole nine yards” that Mesa provides. We will probably take the BSD path here.

They already have GPU drivers in the works, and from looking at their source tree, they already run their graphics stack accelerated with them, at least for Intel chipsets.

No it does not. Zircon is a microkernel, Haiku is a monolithic kernel. We share much, much more in common with Linux and the BSDs than we do with Zircon by a long shot. The only thing both Haiku and Zircon have that Linux and the BSDs don’t is the use of C++ in the kernel, but in this regard, that counts for almost nothing.

Further, and more importantly, it seems blend2d is actually not even open source yet: Source? · Issue #1 · blend2d/blend2d · GitHub

Please do some research before making baseless claims that a major component of the project should be completely reengineered.

BeardPower · August 15, 2018, 12:10pm

Thanks for the detailed response! It’s much appreciated!

Sure, isn’t that exactly why they should be used then, easing the work of the coder?
I agree, that existing code, which is not developed further, does not need them.

Yes. Please see the feedback from the developer at the bottom.

Please see the feedback from the developer at the bottom.

Sure. I was referring to using the capabilities of modern GPUs and was just pointing out, that it does not need features like a UI in 3D-space, just orthogonal projection.

Yes, AMD and Intel are releasing their specs, but not Nvidia (and maybe many others), so access is limited and this is an issue.
The more companies open up their technical documentation the better. I think we all agree on that one.

Yes. Of course, it’s nice to use an existing base for drivers, but the question is, if it’s the best solution for the architecture of Haiku? Using these driver models inherits all the quirks and issues they have and that’s maybe one of the reasons Fuchsia is creating drivers from a clean slate.

Yes, there are some drivers, mostly mobile chipsets.

I was referring to the origins of the BeOS, Haiku and Fuchsia Kernel and the “x is not Linux/BSD” approach.
Yes, having C++ in the Kernel can lead to road-blocks when it comes to FFI and integration/portability.
It’s also debatable if Haiku is a monolithic Kernel or a hybrid Kernel (a lot of resources claim the former or the latter), which was discussed adequately here and in other places.

See below.

Sorry, but I cannot follow you on that one.
First, I did enough research on this topic to make a judgment on the pros/cons of AGG/Blend2d. I actually tried it myself. I’m not speculating on the quality of a project without actually trying it myself, contrary to others.
Second, I never claimed that a major component should be re-engineered! The reason for this thread was to start a discussion about the rendering backend of the UI: about its pros/cons and whether it makes sense to replace it with some modern framework. Anyway, it was by no means a “demand” to change the component or a claim that AGG is bad (on the contrary, I love AGG and it’s great).

I discussed some of the concerns mentioned here with the Blend2D author and here are the Blend2D vs AGG answers directly from him:

Rendering Quality

Blend2D uses the same analytic rasterization approach as AGG & FreeType so the quality is visually comparable, but not pixel equal. The reason is that Blend2D always renders from top-to-bottom whereas AGG/FreeType can emit cells also in a bottom-to-top direction depending on the orientation of the line. Blend2D guarantees that a single path rendered twice in both directions would yield identical results. In my own measurement and Blend2D vs AGG comparison tool the alpha coverages/masks produced by Blend2D rasterizer could be max ±4 off compared to AGG (in 0…255 range). Since I have tested more libraries including Qt and Cairo this was the closest that I could get. I think Blend2D is more precise in the mathematical sense as it guarantees top-to-bottom vs bottom-to-top stability, but it’s not distinguishable by human eye.
Bottomline: I don’t understand the quality concern - the quality is on par with AGG.
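To make the “max ±4 off in the 0…255 range” figure concrete, here is a minimal C++ sketch of how such a number could be measured. It assumes both rasterizers have already written the same shape into 8-bit coverage masks of identical size; the function name maxCoverageDelta is made up for this illustration and is not taken from Blend2D, AGG, or the author's comparison tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Largest absolute per-pixel difference between two 8-bit coverage masks.
// Returns -1 if the masks do not have the same number of pixels.
// (Hypothetical helper, not part of any library discussed here.)
int maxCoverageDelta(const std::vector<std::uint8_t>& a,
                     const std::vector<std::uint8_t>& b)
{
    if (a.size() != b.size())
        return -1;

    int worst = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Widen to int first so the subtraction cannot wrap around.
        int delta = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        worst = std::max(worst, delta);
    }
    return worst;
}
```

A result of 4 or less over a set of test paths would match the claim quoted above.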

Rendering Performance

Blend2D is much faster than AGG because it was engineered for performance. I have written around 4 rasterizers before I picked the one that I use now and I know a few tricks that I can use to accelerate rendering of small art that in general would require only a small cell buffer. Here are the most interesting Blend2D advantages:

Blend2D uses a small memory pool that maintains a zeroed memory that is used as a cell buffer.
Cell-buffer is not SPARSE (like in AGG/FreeType/Qt) - it’s a DENSE continuous buffer representing the whole band (16 or 32 scan lines). This means that indexing is super trivial and 2 cells representing the same coordinate would never happen compared to AGG.
In addition to a cell-buffer, Blend2D uses a bit-buffer to mark which cells were modified.
Compositor scans bit-buffer for bits that mark pixels to composite and clears both cell-buffer and bit-buffer during composition. This is super fast as it uses bit-scan instruction to do the scanning.
Compositor always processes the whole BAND and this is what is JIT compiled - it inlines composition operator, source fetcher, and mask fetcher (rasterizer). This means that the rendering usually performs 1 function call per 32 scanlines, which is an amazing job compared to all other rendering libraries that usually work at scanline level and happen to perform many function-calls to fetch pixels / generate spans / etc…
Since rendering happens in bands Blend2D knows exactly how many bytes it needs for cell-buffer and bit-buffer and these are always allocated before the rasterizer/compositor is used. This means that the rendering input could be of any complexity. Complexity is a problem in SPARSE buffers as when it grows the SPARSE buffer advantage quickly turns into a disadvantage which I discovered when comparing Blend2D against Qt.

Bottomline: I cannot efficiently compare this in a few bullet points. I spent months to write the rasterizer and to test it and this is, in a nutshell, the main difference. I’m planning on writing an article about the rasterizer and I also thought about writing a research paper about it as the design of both rasterizer and compositor is very innovative and you will hardly find it elsewhere. The rasterizer can rasterize millions of lines in the same path and it will still deliver superior performance - I have some very interesting results that compare with other libraries.
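The dense cell-buffer plus bit-buffer idea from the list above can be sketched in a few lines of C++. This is only an illustration of the general technique, not Blend2D's code: the class DenseCellRow, its methods, and the choice of one bit per cell and a single row instead of a 16 or 32 scanline band are all assumptions made for the example. The bit scan uses the GCC/Clang builtin __builtin_ctzll where available and a plain loop otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the lowest set bit in a non-zero 64-bit word.
inline unsigned lowestSetBit(std::uint64_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_ctzll(word)); // bit-scan
#else
    unsigned index = 0;
    while ((word & 1u) == 0) {
        word >>= 1;
        ++index;
    }
    return index;
#endif
}

// Hypothetical dense cell row: one cell per pixel, plus one "touched"
// bit per cell. Callers must keep x below the width given at construction.
class DenseCellRow {
public:
    explicit DenseCellRow(std::size_t width)
        : fCells(width, 0), fBits((width + 63) / 64, 0) {}

    // Accumulate into a cell and remember that it was modified.
    // Two writes to the same x land in the same cell.
    void Add(std::size_t x, std::int32_t value)
    {
        fCells[x] += value;
        fBits[x / 64] |= std::uint64_t(1) << (x % 64);
    }

    // Visit only the modified cells in ascending x order, and zero both
    // buffers on the way so the row is ready for reuse afterwards.
    template <typename Visitor>
    void Sweep(Visitor visit)
    {
        for (std::size_t w = 0; w < fBits.size(); ++w) {
            std::uint64_t word = fBits[w];
            while (word != 0) {
                std::size_t x = w * 64 + lowestSetBit(word);
                visit(x, fCells[x]);
                fCells[x] = 0;
                word &= word - 1; // clear the lowest set bit
            }
            fBits[w] = 0;
        }
    }

private:
    std::vector<std::int32_t> fCells;  // dense, always fully allocated
    std::vector<std::uint64_t> fBits;  // one bit per cell
};
```

Since the buffers are sized once from the row width, the amount of memory does not depend on how many edges cross the row, which is the property the author contrasts with sparse cell storage.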

Other thoughts

Autovectorization of AGG code (SSE2+/AVX2+) - Good luck with that. Most of the AGG code is sequential, and especially the whole rasterizer and compositor are written in this sense. I actually think it was not a bad design 15 years ago, but now that even a mobile phone has SIMD, this design is no longer the high-performance one. I also doubt this would be suitable for GSOC, as to properly vectorize AGG you would really need to know how it works and how to fix the things that prevent it. So it’s not about writing some SSE/AVX code, it’s about fixing the code so it can work better with SIMD.
Async rendering and multithreading - Async rendering (I sometimes call it deferred rendering as well) is a rendering that doesn’t block, and multithreaded rendering is rendering that uses 2 or more threads to accelerate it (Multithreaded rendering implies async rendering in Blend2D sense). I plan both and I already have experience with both from a past project. I created a multithreaded rasterizer that was based on AGG in the past, but I think I can do even better with Blend2D.
Blend2D is not open-source - This is a valid point at the moment. I gathered quite extensive feedback from the alpha release and I’m working hard to prepare the beta so I hope this argument gets invalidated soon.
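As a rough illustration of the autovectorization point, compare the two loops below. They are simplified stand-ins written for this thread, not code from AGG or Blend2D, and the function names are invented. The first is a running sum, where every output depends on the previous one (a loop-carried dependency); the second treats every pixel independently, which is the shape of loop compilers can usually turn into SIMD without help.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Running sum of per-cell deltas. cover[i] needs cover[i - 1], so the
// iterations cannot simply be executed side by side.
std::vector<std::int32_t> runningCoverage(const std::vector<std::int32_t>& deltas)
{
    std::vector<std::int32_t> cover(deltas.size());
    std::int32_t running = 0;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        running += deltas[i];
        cover[i] = running;
    }
    return cover;
}

// Scale each 8-bit coverage value by an 8-bit opacity with rounding.
// No iteration reads the result of another one.
std::vector<std::uint8_t> applyOpacity(const std::vector<std::uint8_t>& cover,
                                       std::uint8_t opacity)
{
    std::vector<std::uint8_t> out(cover.size());
    for (std::size_t i = 0; i < cover.size(); ++i)
        out[i] = static_cast<std::uint8_t>((cover[i] * opacity + 127) / 255);
    return out;
}
```

Making the first kind of loop SIMD-friendly generally means restructuring the algorithm rather than swapping in intrinsics, which is the distinction drawn above.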

Zenja · August 15, 2018, 7:11pm

Nice write up Beard Power, and good luck with your project - it is clearly visible that you have done lots of research developing this. Realistically, at this stage Haiku has a very good working renderer, and the devs are focused elsewhere. For Haiku to adopt something else would require someone to actually implement/integrate the changes, and show the final results (both visual quality and benchmarks). If the core agrees, Blend2D would become the new renderer. Seeing that you are the developer of Blend2D, you’re the best person to attempt this. Good luck.

BeardPower · August 15, 2018, 7:35pm

It’s not my project!
As noted above: “I discussed some of the concerns mentioned here with the Blend2D author and here are the Blend2D vs AGG answers directly from him”.

I’m in constant contact with him, though.

Sure, I will look into benchmarking outside of the renderer and how integration can be done.

I’m not the developer of Blend2d, but I’m one of the Alpha testers.

CodeforEvolution · October 20, 2019, 1:37am

Just an update to this discussion: it seems Blend2D is officially in beta now and its code is now open on Github. Even if libagg is fine for what we need in a drawing library at this point, maybe in the future one could investigate Blend2D so as to avoid maintaining (what technically was at one point) an extra code base and avoid (further) code rot (I may do some of my own investigating if I have extra time, no promises though with my busy schedule). You may also want to take a look at this when looking at the performance of Blend2D compared to libagg: https://blend2d.com/performance.html

PulkoMandy · October 20, 2019, 10:25am

“at the moment Blend2D uses only 128-bit SIMD in JIT compiled code. It can still use AVX instructions to eliminate unnecessary moves that are present in SSE2+ code, however, it only uses 128-bit XMM registers”

Wouldn’t this be a problem for the non-SSE2 machines? How does it compare there?

Also, I tend to not trust benchmark pages where the project self-advertises as being better in all cases than everyone else. There must be drawbacks somewhere? What are the cases where it doesn’t turn out so well? Without this information, what we could do is add an alternate rendering path to app_server with a different drawing engine to experiment with things ourselves, and run our own benchmarks under a “real world” situation (which these tests clearly aren’t: look at the sample output, which is full of alphablending and gradients everywhere, quite different from drawing patterns in Haiku where it’s mostly opaque stuff, with alpha used sparsely at the edges for antialiasing).

Also, we don’t just use stock agg code, we have some custom rasterizers for special-casing the identified easily optimizable cases, so comparing with stock agg isn’t accurate; we should compare to the app_server code with these optimizations used.
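To show why an opaque-heavy workload behaves differently from a benchmark full of alpha blending, here is a small C++ sketch of the kind of special case meant. It is a hypothetical example on a single 8-bit channel, not the actual app_server or AGG code, and the name fillSpan is invented: a run with full coverage becomes a plain fill, and only partially covered pixels (typically the antialiased edges) pay for a per-pixel blend.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Draw a horizontal run of `count` pixels of one 8-bit channel using a
// single coverage value for the whole run. Returns true if the cheap
// opaque path was taken. (Hypothetical helper for illustration only.)
bool fillSpan(std::uint8_t* dst, std::size_t count,
              std::uint8_t color, std::uint8_t cover)
{
    if (cover == 255) {
        // Fully covered: no need to read the destination at all.
        std::fill(dst, dst + count, color);
        return true;
    }

    // Partially covered: linear blend between destination and color,
    // rounded to the nearest integer.
    for (std::size_t i = 0; i < count; ++i) {
        int blended = (dst[i] * (255 - cover) + color * cover + 127) / 255;
        dst[i] = static_cast<std::uint8_t>(blended);
    }
    return false;
}
```

A benchmark that mostly exercises the second branch says little about a desktop where most spans take the first one, so a fair comparison would run both engines on Haiku's real drawing patterns.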