World's fastest Haiku box

A point worth repeating. Obviously we’d all love to have native software good enough to rival any ported offerings, but in practice that is simply impossible.

Given the limited number of developers, it’s better to put the effort into software where a native version can really shine. Perhaps Clockwerk is such a piece of software; I don’t know what the open source competition is in this field, or how much it could make use of Haiku’s possibilities.

WebPositive is a nice proof of concept, and certainly overall better than the now 5+(?) year old BezillaBrowser, but even here I can’t help but think that an up-to-date port of Firefox/Chrome would be of much better use to end users. That said, since there is no such port, WebPositive obviously fills a need.

However, even with ports (and let’s not kid ourselves, Haiku will be heavily dependent on ports for the foreseeable future) there are often possibilities to make the port appear/feel quite native.

[quote=NoHaikuForMe][quote=AndrewZ]
I disagree here. I think the massive threading in the OS makes it a good candidate for ‘many-way’ parallelism and I hope to release some code for testing and benchmarking. I do agree that the kernel will need some improvements. Like for instance I think the scheduler could use affinity. Right now I think I see a single load bouncing between two cores. That is probably sub-optimal.[/quote]

Soft-affinity would be an improvement even on the very common dual-core systems. But there are lots of other problems. Locks for example. To scale from one CPU to two or more, the locks ensure that access to critical data structures and code paths is serialised. But this serialisation becomes a bottleneck for the most frequently accessed structures as you scale up further.[/quote]

It’s not just a problem for Haiku; it’s a problem for pretty much all operating systems. That’s why GPU compute is so promising for big, heavy DSP-type loads.

Indeed, on Linux it was the BKL (big kernel lock), which was introduced when SMP support was added; only recently has it been removed and replaced with mutexes, I believe.

From what I gather, a mutex is better than a lock as it doesn’t stall other things; it only protects access to the component that requires serialization?

Locks for example. To scale from one CPU to two or more, the locks ensure that access to critical data structures and code paths is serialised. But this serialisation becomes a bottleneck for the most frequently accessed structures as you scale up further.

Unlike Linux, BeOS was multiprocessor from 1.0. Can you give an example of a lock that is causing a bottleneck in Haiku now?

[quote=AndrewZ]Locks for example. To scale from one CPU to two or more, the locks ensure that access to critical data structures and code paths is serialised. But this serialisation becomes a bottleneck for the most frequently accessed structures as you scale up further.

Unlike Linux, BeOS was multiprocessor from 1.0. Can you give an example of a lock that is causing a bottleneck in Haiku now?[/quote]

Sure. What’s the time to clear the lock? I.e., how much time is spent, in terms of clock cycles, on acquiring and releasing locks? It’s about 20% of a thread’s CPU demand, IIRC.

Look into Amdahl’s law.
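For anyone unfamiliar (this bit is mine, not from the thread): Amdahl’s law says that if a fraction p of a workload can run in parallel on N processors, the best possible speedup is 1 / ((1 - p) + p / N). Even with p = 0.95, sixteen cores top out at roughly a 9x speedup, so anything that shrinks p, like time spent acquiring and releasing locks, hurts quickly.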

Anyway, I think GPGPU is a much better way to go. It doesn’t make sense to parallelize a lot of operations because the overhead exceeds the performance benefit. The things we need to process faster can be done better on a GPU-style processor anyway, and given the changes that AMD and Nvidia plan to bring to the table soon, programming for them will get a lot easier in the future.

I’m not sure we are talking about the same thing here. I am looking for a specific example of a bottleneck in the Haiku kernel that affects SMP today…

For what, cracking passwords? Running magma simulations? GPGPU processing is fairly specialized at this point. You can only offload very specific tasks, mostly math. It’s not as useful as having a second, third, or fourth core.

[quote=cb88]indeed on Linux it was the BKL (big kernel lock) which was introduced when SMP support was added and only recently has it been removed and replaced with mutexes I believe.

From what I gather, a mutex is better than a lock as it doesn’t stall other things; it only protects access to the component that requires serialization?[/quote]

“Lock” and “mutex” are synonyms in this context; they’re both just names for a method to control access to a data structure or subroutine, usually on a voluntary basis (i.e. the programmer is responsible for ensuring they “take the lock” before accessing the protected data).
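To make the “voluntary” part concrete, here is a minimal C++ sketch (my own illustration, not code from Haiku or Linux): nothing stops a thread from touching the shared data without taking the lock first; the programmer has to remember to do it.

[code]
#include <mutex>

static std::mutex gCounterLock;  // the mutex protecting `counter`
static int counter = 0;          // the shared data structure

void Increment()
{
    std::lock_guard<std::mutex> guard(gCounterLock);  // "take the lock"
    ++counter;  // safe: only one thread can be in here at a time
}   // the lock is released automatically when `guard` goes out of scope
[/code]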

The BKL is a strategy used in many once uni-processor Unix systems to bootstrap SMP support; it is sometimes also called a “giant lock”. All access to the kernel is initially wrapped with a single lock (the BKL) so that only one CPU at a time can be running kernel code, but as many CPUs as are available can be running userspace code.

Although this is a good start, it’s a long way from ideal. Large improvements can be made by releasing the BKL in common code paths and taking a mutex that’s more specific. For example by having separate locks for SCSI controllers and TCP/IP, one CPU can be sending a TCP/IP packet while another is reading data from disk. Having a separate mutex for each network card would allow several CPUs to be sending a TCP/IP packet simultaneously, at the cost of making the networking code slightly more complex. The term for using each mutex to protect only a very specific codepath or data structure is “fine-grained locking”. Such work began on Linux almost straight away in the Linux 2.0 kernel series.
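A hypothetical side-by-side sketch of the two styles (the names are invented for illustration; this is not actual kernel code):

[code]
#include <mutex>

// Giant-lock style: every kernel path serialises on one lock, so a CPU
// sending a packet blocks another CPU that only wants to read from disk.
std::mutex gBigKernelLock;
void SendPacketGiant() { std::lock_guard<std::mutex> g(gBigKernelLock); /* ... */ }
void ReadDiskGiant()   { std::lock_guard<std::mutex> g(gBigKernelLock); /* ... */ }

// Fine-grained style: each subsystem gets its own lock, so networking
// and disk I/O can proceed on different CPUs at the same time.
std::mutex gNetLock;
std::mutex gDiskLock;
void SendPacketFine()  { std::lock_guard<std::mutex> g(gNetLock);  /* ... */ }
void ReadDiskFine()    { std::lock_guard<std::mutex> g(gDiskLock); /* ... */ }
[/code]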

In the last five years or so there haven’t really been many more opportunities to improve performance through fine-grained locking in Linux. The BKL retreated to the dustier corners of the kernel, like VT switching (when you press Ctrl-Alt-F2) or ioctl() calls. It was no longer a bottleneck, and for a while it seemed as though it was mostly harmless.

Ultimately, replacing even these more obscure uses with various finer-grained locks was desirable because the BKL makes interdependencies within the kernel trickier to understand. It’s a recursive lock and it’s automatically released when sleeping, so determining whether the lock is always held when calling a particular routine could be very difficult. Removing it altogether completed the job which began when it was introduced; Linux is now probably the most scalable OS kernel that ever existed.
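To illustrate the recursion point with a rough sketch of mine (again not actual kernel code): with a recursive lock, a routine works whether or not its caller already holds the lock, so reading a routine in isolation tells you nothing about which is the case.

[code]
#include <mutex>

std::recursive_mutex gBKL;

void TouchSharedState()
{
    // Re-entry is allowed, so this runs correctly whether or not the
    // caller already holds gBKL -- you cannot tell from here.
    std::lock_guard<std::recursive_mutex> g(gBKL);
    /* ... touch data the BKL protects ... */
}

void Caller()
{
    std::lock_guard<std::recursive_mutex> g(gBKL);
    TouchSharedState();  // works with the lock held...
}
// ...and calling TouchSharedState() bare, without the lock, works too.
[/code]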

To get there though, something quite different had been happening meanwhile.

[quote=AndrewZ]
Unlike Linux, BeOS was multiprocessor since 1.0. Can you give an example of a lock that is causing a bottleneck in Haiku now?[/quote]

My old boss would say “That’s a problem we’d like to have”. Meaning of course “The problems we have now are much worse”. If you ran Haiku on a 16 CPU system with a workload that wasn’t inherently CPU bound, the first few obstacles you’d probably run into don’t have anything to do with locking, or at least not explicitly.

But once you have cleared those obstacles you’d start to hit things like the extensive reliance on a single block cache per volume with a mutex.
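Loosely sketched (this is my hypothetical illustration, not Haiku’s actual code), the problem looks like this: with one mutex guarding the whole per-volume cache, every lookup from every CPU queues on the same lock, even when the CPUs want completely different blocks.

[code]
#include <map>
#include <mutex>

struct BlockCache {
    std::mutex            lock;    // ONE lock for the whole volume
    std::map<long, void*> blocks;  // cached disk blocks by block number

    void* Get(long blockNumber)
    {
        std::lock_guard<std::mutex> guard(lock);  // 16 CPUs serialise here
        auto it = blocks.find(blockNumber);
        return it == blocks.end() ? nullptr : it->second;
    }
};
[/code]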

[quote=NoHaikuForMe]If you ran Haiku on a 16 CPU system with a workload that wasn’t inherently CPU bound, the first few obstacles you’d probably run into don’t have anything to do with locking, or at least not explicitly.

But once you have cleared those obstacles you’d start to hit things like the extensive reliance on a single block cache per volume with a mutex.[/quote]

Yes, but at least for now desktop users do not have to worry, except for the Intel i7 EE, which is 6C/12T (6 physical, 12 logical CPUs). AMD desktop CPUs are 6 cores or fewer, and most Intel chips are 4 cores + HT (i.e., 8 logical) or fewer.

From what I have seen, with 8 CPUs it still seems to run well. Only with benchmarks can we see whether SMP holds up, and how well.

Also, since this is just code, it can always be fixed with the right programmers. So even if it is an issue today, that could change later on once some developer improves the code.

As for Linux scalability, well, if I ever need a system with 256 cores then I will keep that in mind. Or even one with 16 or more cores, but for now I am happy using my 2C/4T systems. In the future we may have 8, 12 & 16 cores on the desktop, but who can say when that will happen. The i7 EE is selling for $1,000, so it is too expensive, and it may take 2 years to drop to a better price.

Use installoptionalpackage clockwerk to install it and try it out, Andrew. Clockwerk was included with A1 & maybe A2 but was not part of A3.

As for video editing and other applications, native software is always best, but Haiku will have to rely on many ports until post-R1, when more developers jump on board and start writing Haiku software.

Also, trying to write software from scratch does not always make sense; it is often better to port some apps over to Haiku.

People may also be familiar with certain software, like Blender, and having it also available on Haiku means someone can have the choice whether to use the native or ported version of the program.

[quote=AndrewZ]

I’m not sure we are talking about the same thing here. I am looking for a specific example of a bottleneck in the Haiku kernel that affects SMP today…

For what, cracking passwords? Running magma simulations? GPGPU processing is fairly specialized at this point. You can only offload very specific tasks, mostly math. It’s not as useful as having a second, third, or fourth core. [/quote]

I don’t really know enough about the innards to know for sure, but from the benchmarking I have done it’s likely comparable to Windows and Linux in most of the SMP-heavy apps that are around, mainly HandBrake.

As to GPGPU, GCN and CUDA address many of the problems you’re hinting at, and when a good programming model is developed, things like audio DSP, video DSP, etc. will show massive improvements. Actually, some already do, BTW.

It’s just a matter of a good programming model, and BTW Haiku, as a fairly clean, unencumbered system, has a very good opportunity to make that happen. So if Haiku wants to really make that jump, that big leap, they likely can: given the overall construction of the OS and the APIs, they could move to GPGPU processing post-R1, or even in a dev version, and likely really do it right. Currently it’s sort of a hacky add-on everywhere else, unless it’s custom-written software for a specific purpose, and supercomputers are showing just how powerful the GPU can be.

However, some apps won’t benefit; highly serialized GUIs won’t, but Haiku does well there already. It’s the heavy lifting for audio, video, etc. types of processing where those workloads make the most sense to be on the GPU in the first place.

Look around: the world is changing.

[quote=thatguy]
As to GPGPU, GCN and CUDA address many of the problems you’re hinting at, and when a good programming model is developed, things like audio DSP, video DSP, etc. will show massive improvements. Actually, some already do, BTW.[/quote]
Well, CUDA would be pointless since it’s Nvidia-only. AMD, however, is going all-in with the cross-GPU framework OpenCL, not surprising perhaps since they also seem to be focusing on CPUs with GPUs in them. OpenCL allows the code to run on both CPU and GPU; originally drafted by Apple, it’s now backed heavily by Apple, AMD and, to some extent, Intel.
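To show what “runs on both CPU and GPU” means in practice, here is a minimal OpenCL host sketch (my own example, with error handling omitted): the only thing that changes between targets is the device-type flag.

[code]
#include <CL/cl.h>
#include <cstdio>

int main()
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);

    // Ask for a GPU; fall back to running the same kernels on the CPU.
    cl_device_id device;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr)
            != CL_SUCCESS)
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, nullptr);

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, nullptr);
    printf("Running on: %s\n", name);
    return 0;
}
[/code]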

[quote=thatguy]
It’s the heavy lifting for audio, video, etc. types of processing where those workloads make the most sense to be on the GPU in the first place.[/quote]
But these areas are handled in user-space, not by the Haiku system, so I’m not sure I follow what you are suggesting; what exactly would the Haiku system use the GPU for?

[quote=tonestone57]
People may also be familiar with certain software, like Blender, and having it also available on Haiku means someone can have the choice whether to use the native or ported version of the program.[/quote]
I’m not sure exactly what you mean by a ‘native’ Blender; are you talking about a pure native Blender equivalent? Haiku-native equivalents of such specialized software as Blender, Inkscape, etc. seem very unlikely to happen, even if Haiku were to attract lots of developers. I’d say this kind of software really works best as a cross-platform project.

If it’s about making Blender look/feel native, I think Blender is a poor candidate for that, given it uses its own OpenGL-driven interface.

Inkscape (which I mentioned) would be a much better candidate for being made to look/feel native. Still, I’m not sure anyone would find it worth doing even there. Maintaining non-official ports is hard work enough I gather, so I’d be glad just to have the ports at all.

Not talking about making Blender more native.

Change out:
“native or ported version of the program” to “native or ported software application”. I.e., someone can choose to use native video editing software (like Clockwerk) or a ported application like Blender. If familiar with Blender, they can use the software on Haiku right away without having to relearn it.

To me, making ported software look & feel native is unimportant; what really matters is that it runs well and is available on Haiku.

“Maintaining non-official ports is hard work enough I gather, so I’d be glad just to have the ports at all.”

I agree. The user should decide if they want to use native software or ported software on Haiku. Having both gives Haiku users more and better choice. Ported software can also run pretty well on Haiku, except for a few larger applications.

[quote=Rox]
Well, CUDA would be pointless since it’s Nvidia-only. AMD, however, is going all-in with the cross-GPU framework OpenCL, not surprising perhaps since they also seem to be focusing on CPUs with GPUs in them. OpenCL allows the code to run on both CPU and GPU; originally drafted by Apple, it’s now backed heavily by Apple, AMD and, to some extent, Intel.

But these areas are handled in user-space, not by the Haiku system, so I’m not sure I follow what you are suggesting; what exactly would the Haiku system use the GPU for?[/quote]

Yes, AMD is going for a more open and versatile approach, whereas CUDA is very vendor-specific. Makes a great case to support it.

I am talking about having userland APIs and kernel structures etc. to make the GPGPU paradigm more effective: a closer tie-in, versus a pure user-space approach, plus a good programming method and design to make it work. OpenCL is a start; IIRC Gallium plans to support OpenCL at some point.