Proposal: Targeted microarchitecture bump

extrowerk · May 18, 2021, 8:38pm

Currently Haiku targets the following CPU microarchitecture / feature set:

on X86: Northwood (2002) MMX, SSE, SSE2
on X86_64: Prescott (2004) MMX, SSE, SSE2, SSE3

Limiting the usable instruction set to a CPU feature level that is 20 and 18 years old means Haiku can run on that old hardware, but it also means we can’t use any of the modern functionality available in current CPUs.

I propose to bump the CPU feature level for x86_64 to:

at least Penryn (2008) MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.1
but ideally to Westmere (2010) MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2, AES-NI, CLMUL

Both are over 10 years old, and 64-bit CPUs are capable of running 32-bit programs, so the worst-case scenario for those users would be to use 32-bit Haiku.

Or did I overlook something?

alpopa · May 19, 2021, 5:43am

There are AMD CPUs (x86-64) from 2010 that have no SSE3 and above.

I personally don’t want to stop technological progress. But a bump of the microarchitecture should really be done for a purpose. Besides modern (optimized) functionality, what problems does it solve? There was a WebKit upgrade that required some. Are there others?

Don’t get me wrong. Imagine someone with such an old PC being able to install and run Windows 10 and modern Linux on it, yet being unable to run Haiku. I don’t think Haiku would benefit from such an update.

nephele · May 19, 2021, 5:46am

SSE2 is not required for Haiku on x86, and there was even a user who complained about SSE2 for WebPositive.

Why bump the requirements for 64bit? One could always make images for more modern cpus, but stopping support for older ones is a bit eh

lordmmx · May 19, 2021, 6:08am

I really don’t agree with this motion.

PulkoMandy · May 19, 2021, 8:32am

That is not true, we can detect the features at runtime and use them only if they are available.
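
As a rough illustration of the runtime detection mentioned above, here is a minimal C sketch. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The `FEAT_*` names, `detect_features` and `print_features` are made up for this example and are not Haiku APIs.

```c
#include <stdio.h>

/* Hypothetical flag names for this example; not part of any Haiku API. */
enum {
    FEAT_SSE2  = 1 << 0,
    FEAT_SSE3  = 1 << 1,
    FEAT_SSSE3 = 1 << 2,
    FEAT_SSE41 = 1 << 3,
    FEAT_SSE42 = 1 << 4
};

/* Ask the CPU at runtime which of the SIMD levels from this thread it has.
 * On non-x86 targets or other compilers this sketch simply reports nothing. */
unsigned detect_features(void)
{
    unsigned mask = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))   mask |= FEAT_SSE2;
    if (__builtin_cpu_supports("sse3"))   mask |= FEAT_SSE3;
    if (__builtin_cpu_supports("ssse3"))  mask |= FEAT_SSSE3;
    if (__builtin_cpu_supports("sse4.1")) mask |= FEAT_SSE41;
    if (__builtin_cpu_supports("sse4.2")) mask |= FEAT_SSE42;
#endif
    return mask;
}

/* Print one line per feature so the result is easy to read. */
void print_features(void)
{
    unsigned mask = detect_features();
    printf("SSE2:   %s\n", (mask & FEAT_SSE2)  ? "yes" : "no");
    printf("SSE3:   %s\n", (mask & FEAT_SSE3)  ? "yes" : "no");
    printf("SSSE3:  %s\n", (mask & FEAT_SSSE3) ? "yes" : "no");
    printf("SSE4.1: %s\n", (mask & FEAT_SSE41) ? "yes" : "no");
    printf("SSE4.2: %s\n", (mask & FEAT_SSE42) ? "yes" : "no");
}
```

Calling `print_features()` from a program's `main` shows what the machine it runs on reports; code can then branch on the mask instead of assuming a fixed baseline at compile time.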

You are also incorrect about the target for the x86 version: we don’t currently require SSE and SSE2. We do use them where available. As a result, WebKit for example checks for SSE2 and does not run if it is not available, but you can still use other parts of the OS.

Personally, my oldest 64bit machine is from 2011 so I would be fine with this. But I bet other people are running older hardware. I’d say we should benchmark the difference to see if there is a large gain or only a small one first?

SCollins · May 20, 2021, 11:25pm

Since the compiler runtime can select, maybe continue on as before; but with 64-bit not being BeOS compatible, allow newer FPU etc.

X512 · May 20, 2021, 11:29pm

Why? I think that the base system should run with only basic CPU features. OpenGL and WebKit may not run on old hardware. The system can use dynamic switching depending on the available CPU features; if I remember correctly, this is already done for software bitmap blitting in app_server. New CPU features may also not be available on virtual machines.

What are you expecting from modern functionality? It will likely not speed up generic logic code. It is useful mostly in specific cases like software rendering, compression, encryption. New CPU extensions are already used in these cases, for example LLVM-pipe OpenGL software rendering.

extrowerk · May 21, 2021, 4:00am

Oh, I was misunderstood then. I am talking about the HaikuPorts stuff here. We consistently disable everything above SSE3. During a discussion with HPC folks, they told us that rather than squeezing out every drop of hp, we leave a whole horse in it. Obviously their use cases are different, but our ports also use fftw/blas and other scientific libs, and those could profit hugely from a uarch change.
But we can’t change this on HaikuPorts without Haiku; the result would be catastrophic.

PulkoMandy · May 21, 2021, 7:44am

Catastrophic in what way? It will just mean that apps using SSSE3 will not run on CPUs that don’t have SSSE3. And that’s only if it can’t be detected at runtime.

Haiku does support applications using the extra instructions (this is no work at all for us) and registers (this required some work but it is already done). ffmpeg is one example where this is detected at runtime and it will decide on its own if it can use the instructions or not. So, people with modern CPUs run the code using these instructions, and people using older CPUs get a different version of the code automatically.
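
The "different version of the code automatically" idea can be sketched as a function pointer chosen once at runtime. This is only an illustration under assumptions, not ffmpeg's actual code: it assumes GCC or Clang on x86, uses the POPCNT instruction because it keeps the example short, and the names `popcount_generic`, `popcount_hw` and `select_popcount` are hypothetical.

```c
#include <stdint.h>

typedef unsigned (*popcount_fn)(uint32_t);

/* Portable fallback: clears the lowest set bit until none are left. */
unsigned popcount_generic(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* Only this one function is compiled with POPCNT enabled, so the rest of
 * the program still runs on CPUs without it. */
__attribute__((target("popcnt")))
unsigned popcount_hw(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
#endif

/* Pick the best implementation the running CPU supports. */
popcount_fn select_popcount(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
#endif
    return popcount_generic;
}
```

A caller would typically store the result of `select_popcount()` once at startup and call through that pointer afterwards, so the detection cost is paid only once.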

extrowerk · May 21, 2021, 9:36am

HaikuPorts hardcodes the uarch in the OpenBLAS recipe. It is similarly hardcoded in the fftw recipe.

Would they break the ecosystem for older hw? They are used by ports like LibreOffice…

X512 · May 21, 2021, 9:52am

Maybe just provide 2 binary versions for different CPUs?

PulkoMandy · May 21, 2021, 9:55am

A quick look at the OpenBLAS readme shows me that it has a section named "Support for multiple targets in a single library" explaining that you should build with the DYNAMIC_ARCH option. Then it will decide by itself at runtime which CPU instructions are available. Source: https://fossies.org/linux/OpenBLAS/README.md

PulkoMandy · May 21, 2021, 10:01am

And for FFTW, here is the recommendation from the library developers:

Special note for distribution maintainers: Although FFTW supports a
zillion SIMD instruction sets, enabling them all at the same time is
a bad idea, because it increases the planning time for minimal gain.
We recommend that general-purpose x86 distributions only enable SSE2
and perhaps AVX. Users who care about the last ounce of performance
should recompile FFTW themselves.

Source: http://mad.web.cern.ch/mad/releases/madng/madng-git/lib/fftw3/NEWS

I think we should listen to them

I had a look at the source code, and it looks like it also knows how to use the CPUID instruction to detect at runtime whether these CPU instructions are available. For example: https://fossies.org/windows/misc/fftw-3.3.9.zip/fftw-3.3.9/simd-support/amd64-cpuid.h
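
For readers who have not used CPUID directly, here is a small C sketch of that kind of check. It is not FFTW's code. It assumes GCC or Clang on x86 (for the `<cpuid.h>` helper `__get_cpuid`), and the function and macro names are invented for this example. It only tests the feature flags named in the proposal at the top of the thread; it does not identify the actual microarchitecture. MMX, SSE and SSE2 are left out because every x86_64 CPU has them.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Feature bits reported in ECX by CPUID leaf 1. */
#define CPUID_ECX_SSE3    (1u << 0)
#define CPUID_ECX_PCLMUL  (1u << 1)
#define CPUID_ECX_SSSE3   (1u << 9)
#define CPUID_ECX_SSE41   (1u << 19)
#define CPUID_ECX_SSE42   (1u << 20)
#define CPUID_ECX_AESNI   (1u << 25)

/* True only if every bit in mask is set in reg. */
int has_all_bits(unsigned reg, unsigned mask)
{
    return (reg & mask) == mask;
}

/* Read ECX of CPUID leaf 1; returns 0 where CPUID is not usable. */
unsigned read_leaf1_ecx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* Flags listed for the "at least Penryn" level of the proposal. */
int meets_penryn_level(void)
{
    return has_all_bits(read_leaf1_ecx(),
        CPUID_ECX_SSE3 | CPUID_ECX_SSSE3 | CPUID_ECX_SSE41);
}

/* Flags listed for the "ideally Westmere" level of the proposal. */
int meets_westmere_level(void)
{
    return has_all_bits(read_leaf1_ecx(),
        CPUID_ECX_SSE3 | CPUID_ECX_SSSE3 | CPUID_ECX_SSE41 |
        CPUID_ECX_SSE42 | CPUID_ECX_AESNI | CPUID_ECX_PCLMUL);
}
```

A library can run a check like this once and then choose a code path, which is the same pattern as building a single binary that works on both old and new CPUs.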

ModeenF · May 21, 2021, 9:12pm

For me, what we have is enough. But if we can turn things on when there is support for them, can’t we have newer things as well?

We have other, more important things to do, but nothing is stopping anyone else from implementing things?

lordmmx · May 21, 2021, 11:12pm

I think we should be able to stick with the current system requirements.