Currently Haiku targets the following CPU microarchitecture / feature set:
on X86: Northwood (2002) MMX, SSE, SSE2
on X86_64: Prescott (2004) MMX, SSE, SSE2, SSE3
Limiting the usable instruction set to 20- and 18-year-old CPU feature levels means Haiku can run on hardware that old, but it also means we can’t use any of the newer functionality available in modern CPUs.
I propose to bump the CPU feature level for x86_64 to:
at least Penryn (2008) MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1
but ideally to Westmere (2010) MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES-NI, CLMUL
Both are over 10 years old, and 64-bit CPUs can run 32-bit programs, so in the worst-case scenario those users would have to use 32-bit Haiku.
There are AMD CPUs (x86-64) from 2010 that have no SSSE3 and above.
I personally don’t want to stop technological progress. But a bump of the microarchitecture should really be done on purpose. Besides modern (optimized) functionality, what problems does it solve? There was a WebKit upgrade that required some. Are there others?
Don’t get me wrong. Imagine someone with such an old PC being able to install and run Windows 10 and modern Linux on it, but being unable to run Haiku. I don’t think Haiku would benefit from such an update.
That is not true, we can detect the features at runtime and use them only if they are available.
You are also incorrect about the target for the x86 version: we don’t currently require SSE and SSE2. We do use them where available. As a result, WebKit, for example, checks for SSE2 and does not run if it is not available, but you can still use the other parts of the OS.
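For illustration, a runtime check of this kind can be sketched in C roughly as follows. This is only a minimal sketch, not Haiku's or WebKit's actual code: it assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins exist, and the names `cpu_has_sse2`, `pick_code_path` and the `code_path` enum are made up for the example.

```c
#include <stdbool.h>

/* Hypothetical labels for the two code paths an application might ship. */
enum code_path { PATH_GENERIC, PATH_SSE2 };

/* Hypothetical helper: ask the running CPU whether it has SSE2.
 * Assumes GCC or Clang on x86; on anything else it reports "no". */
bool cpu_has_sse2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                      /* fill in the CPU model data */
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

/* Decide once, at startup, which implementation to use. */
enum code_path pick_code_path(bool has_sse2)
{
    return has_sse2 ? PATH_SSE2 : PATH_GENERIC;
}
```

A program would call `pick_code_path(cpu_has_sse2())` once at startup and either take the optimized path or fall back (or refuse to start, as WebKit is described as doing above).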
Personally, my oldest 64-bit machine is from 2011, so I would be fine with this. But I bet other people are running older hardware. I’d say we should first benchmark the difference to see whether the gain is large or only small.
Why? I think the base system should run with only basic CPU features. OpenGL and WebKit may not run on old hardware. The system can switch dynamically depending on the available CPU features; if I remember correctly, this is already done for software bitmap blitting in app_server. New CPU features may also be unavailable in virtual machines.
What are you expecting from modern functionality? It will likely not speed up generic logic code. It is useful mostly in specific cases like software rendering, compression and encryption. New CPU extensions are already used in these cases, for example in llvmpipe OpenGL software rendering.
Oh, I was misunderstood then. I am talking about the HaikuPorts stuff here. We consistently disable everything above SSE3. During a discussion with HPC folks, they told me that we don’t just fail to squeeze out every drop of horsepower, we leave a whole horse in it. Obviously their use cases are different, but our ports also use fftw/blas and other scientific libs, and those could profit hugely from a uarch change.
But we can’t change this on HaikuPorts without Haiku; the result would be catastrophic.
Catastrophic in what way? It will just mean that apps using SSSE3 will not run on CPUs that don’t have SSSE3. And that’s only if it can’t be detected at runtime.
Haiku does support applications using the extra instructions (this is no work at all for us) and registers (this required some work but it is already done). ffmpeg is one example where this is detected at runtime and it will decide on its own if it can use the instructions or not. So, people with modern CPUs run the code using these instructions, and people using older CPUs get a different version of the code automatically.
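To make the idea concrete, here is a rough sketch in C of how a library can carry two versions of a routine and pick one at runtime. It is not taken from ffmpeg or any port. It assumes GCC or Clang on x86 (for `__builtin_cpu_supports` and the `target` function attribute), and `abs_generic`, `abs_ssse3` and `select_abs` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <tmmintrin.h>              /* SSSE3 intrinsics */
#define HAVE_X86_DISPATCH 1
#endif

typedef void (*abs_fn)(int32_t *dst, const int32_t *src, size_t n);

/* Portable fallback: works on any CPU. INT32_MIN maps to itself,
 * which matches the behaviour of the SSSE3 instruction below. */
static void abs_generic(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t u = (uint32_t)src[i];
        dst[i] = (int32_t)(src[i] < 0 ? 0u - u : u);
    }
}

#ifdef HAVE_X86_DISPATCH
/* SSSE3 version: only this function is compiled with SSSE3 enabled,
 * so the rest of the binary stays runnable on older CPUs. */
__attribute__((target("ssse3")))
static void abs_ssse3(int32_t *dst, const int32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    /* four 32-bit values per step */
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_abs_epi32(v));
    }
    abs_generic(dst + i, src + i, n - i);   /* leftover elements */
}
#endif

/* Pick the implementation that the running CPU can execute. */
abs_fn select_abs(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))
        return abs_ssse3;
#endif
    return abs_generic;
}
```

The caller fetches the pointer once, e.g. `abs_fn f = select_abs();`, and then calls `f(dst, src, n)` without caring which version it got. Both versions produce the same results; only the speed differs.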
A quick look at the OpenBLAS readme shows me that it has a section named "Support for multiple targets in a single library", explaining that you should build with the DYNAMIC_ARCH option. Then it will decide by itself at runtime which CPU instructions are available. Source: https://fossies.org/linux/OpenBLAS/README.md
And for FFTW, here is the recommendation from the library developers:
Special note for distribution maintainers: Although FFTW supports a
zillion SIMD instruction sets, enabling them all at the same time is
a bad idea, because it increases the planning time for minimal gain.
We recommend that general-purpose x86 distributions only enable SSE2
and perhaps AVX. Users who care about the last ounce of performance
should recompile FFTW themselves.