Currently the Haiku RISC-V and ARM implementations handle the FPU incorrectly, which may corrupt userland FPU registers if the kernel uses the FPU. I am thinking about how to solve this without increasing the context switch/syscall cost.
Most operating systems such as Windows NT and Linux do not allow free FPU access in the kernel and require adding code to mark the beginning and end of FPU use, for example KeSaveExtendedProcessorState.
Current FPU use in the Haiku kernel is small (scores for driver selection) and can probably be removed or wrapped with such extra code.
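To make the begin/end bracketing concrete, here is a minimal user-space sketch. Everything in it is hypothetical: the register file is simulated with a plain struct, and the names `kernel_fpu_section_begin`, `kernel_fpu_section_end` and `driver_score` are invented for illustration. It is not the Windows or Linux API, only the general shape of the pattern.

```c
#include <assert.h>

/* Hypothetical stand-in for the hardware FPU register file. */
typedef struct {
    double regs[32];
} fpu_state;

static fpu_state hw_fpu;      /* simulated live FPU registers */
static fpu_state saved_user;  /* save area for the interrupted user state */
static int kernel_fpu_active = 0;

/* Hypothetical: must be called before kernel code touches the FPU. */
static void kernel_fpu_section_begin(void)
{
    assert(!kernel_fpu_active);  /* no nesting in this sketch */
    saved_user = hw_fpu;         /* preserve userland registers */
    kernel_fpu_active = 1;
}

/* Hypothetical: must be called when kernel code is done with the FPU. */
static void kernel_fpu_section_end(void)
{
    assert(kernel_fpu_active);
    hw_fpu = saved_user;         /* restore userland registers */
    kernel_fpu_active = 0;
}

/* Example kernel-side computation that clobbers a simulated register. */
static double driver_score(double a, double b)
{
    kernel_fpu_section_begin();
    hw_fpu.regs[0] = a * b;      /* would corrupt user state without the bracket */
    double result = hw_fpu.regs[0];
    kernel_fpu_section_end();
    return result;
}
```

With this shape the save/restore cost is paid only by the few kernel paths that actually use the FPU, not on every syscall.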
“Premature optimization is the root of all evil.” Get it working first, then estimate the costs.
What do you mean? Handle FPU registers in the same way as GP registers and save/restore them on each interrupt/syscall/context switch?
The use of floating point maths here is clearly inappropriate. If people love how it feels, use a fixed point scheme and translate to integers at compile time (e.g. 16.16 so the value 1.0 corresponds to 65536 in integer space), if not just use integers directly.
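A minimal sketch of the 16.16 idea, under some assumptions: the type and macro names are made up, the scores 0.6 and 0.8 are only illustrative values, and the macro handles non-negative constants only. The floating point constant appears only in a constant initializer, so a compiler is expected to fold it at build time and emit no FPU instructions.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits integer part, lower 16 bits fraction. */
typedef int32_t fix16_16;

#define FIX_ONE ((fix16_16)1 << 16)  /* 1.0 corresponds to 65536 */

/* Hypothetical helper: convert a non-negative constant, rounding to nearest. */
#define FIX_FROM_DOUBLE(x) ((fix16_16)((x) * 65536.0 + 0.5))

/* Illustrative driver scores, evaluated as constant initializers. */
static const fix16_16 SCORE_LOW = FIX_FROM_DOUBLE(0.6);
static const fix16_16 SCORE_HIGH = FIX_FROM_DOUBLE(0.8);

/* Comparing scores is a plain integer comparison. */
static int score_is_better(fix16_16 a, fix16_16 b)
{
    return a > b;
}

/* A value between two scores, as long as they differ by more than one step. */
static fix16_16 fix_midpoint(fix16_16 low, fix16_16 high)
{
    assert(low <= high);
    return low + (high - low) / 2;
}
```

Note that the scale is finite: between two adjacent 16.16 values there is no room for a third driver, which is the trade-off being argued about in this thread.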
You said “I am thinking how to solve it without increasing context switch/syscall cost”, which means you’re trying to optimize some cost instead of measuring the cost first and then optimizing.
Don’t do it
Usually handled through software implementations, even back during the Amiga era.
kernel_fpu_end (Linux 2.6.11+)
It might have made sense to restrict the FPU back in the day, but with current CPUs it seems like that decision hasn’t aged well. What about SIMD or other more modern technologies? Why should we not make full use of the CPU? Maybe context switching overhead increases, but the code will probably be much faster.
Besides, don’t we want CPU manufacturers to handle fast save/restore of registers…
There are several reasons Linux does it the way it does. One of them is that the FPU on x86 has a lot of registers, and saving them all on every context switch takes a lot of time. So they keep track of which thread last used the FPU, and if the same thread wants to use it again, it can do so immediately. Otherwise, a “delayed” context switch occurs.
But you don’t want FPU context switches to happen in some critical low-level code in the kernel (let’s say while handling an interrupt or so). And there’s the problem of detecting when something uses the FPU at all.
On ARM they also allow running without an FPU since some hardware doesn’t have one, and using software emulated floating point in the kernel would make little sense.
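The “delayed” scheme described above can be sketched as a small user-space simulation. All names here are hypothetical, the register file is a four-element array, and the “FPU unavailable” trap is just a function call; a real kernel would rely on a hardware enable bit and a trap handler instead.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4

typedef struct thread {
    double fpu_save[NUM_REGS];  /* per-thread save area */
} thread;

static double hw_regs[NUM_REGS];  /* simulated FPU registers */
static thread *fpu_owner = NULL;  /* thread whose state is live in hw_regs */
static int fpu_enabled = 0;       /* simulated "FPU enabled" control bit */
static int lazy_switches = 0;     /* how many real FPU switches happened */

/* Context switch: leave the registers alone, only gate access to them. */
static void context_switch_to(thread *next)
{
    fpu_enabled = (next == fpu_owner);
}

/* Simulated "FPU unavailable" trap, taken on first FPU use after a switch. */
static void fpu_unavailable_trap(thread *current)
{
    if (fpu_owner != current) {
        for (int i = 0; i < NUM_REGS; i++) {
            if (fpu_owner != NULL)
                fpu_owner->fpu_save[i] = hw_regs[i];  /* save previous owner */
            hw_regs[i] = current->fpu_save[i];        /* load new owner */
        }
        fpu_owner = current;
        lazy_switches++;
    }
    fpu_enabled = 1;
}

/* Simulated FPU register write performed by a thread. */
static void thread_fpu_write(thread *current, int reg, double value)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (!fpu_enabled)
        fpu_unavailable_trap(current);
    hw_regs[reg] = value;
}

/* Simulated FPU register read performed by a thread. */
static double thread_fpu_read(thread *current, int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (!fpu_enabled)
        fpu_unavailable_trap(current);
    return hw_regs[reg];
}
```

In this sketch a thread that is switched out and back in without anyone else touching the FPU pays nothing; the save/restore only happens when a different thread actually uses the registers.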
So, what are our options here? How can we make sure the kernel doesn’t change fpu state for userspace processes accidentally? Is a manual call to a save/restore function the only way? And if so, do we insert that only in places where we use the fpu, or do we always do it when we enter and exit the kernel?
That misses the point of why it’s done this way. The idea is, whatever other drivers have chosen, you are very likely to find a floating point value that fits in between two other drivers. This is not really a problem in Linux because you can always patch the other drivers. But it was a problem in BeOS where they expected other people to write drivers (and do stupid things in their drivers, like trying to use UINT32_MAX as a priority where 2 would have been large enough).
Fixed point would not be any better than an integer in that regard, and so it would be the same complexity as floating point, but without helping with the problem floating point is designed to fix here.
RISC-V supports turning off the FPU, so that attempting to use FPU instructions or registers causes an illegal instruction exception. The FPU also detects when any FPU register has changed and sets a dirty flag.
So it is possible to turn off FPU when executing trap handler and re-enable FPU when it is actually used.
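As a rough illustration, the FPU state lives in the two-bit FS field of the `sstatus` register (bits 14:13, with the encodings Off, Initial, Clean and Dirty from the RISC-V privileged specification). The sketch below only manipulates a plain integer copy of that register, not the real CSR, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FS field of sstatus: bits 14:13. */
#define SSTATUS_FS_SHIFT 13
#define SSTATUS_FS_MASK ((uint64_t)3 << SSTATUS_FS_SHIFT)

enum fs_state {
    FS_OFF = 0,      /* FPU access traps as illegal instruction */
    FS_INITIAL = 1,
    FS_CLEAN = 2,    /* registers match the saved copy */
    FS_DIRTY = 3     /* registers were modified since the last save */
};

/* Extract the FS field from a copy of sstatus. */
static enum fs_state sstatus_get_fs(uint64_t sstatus)
{
    return (enum fs_state)((sstatus & SSTATUS_FS_MASK) >> SSTATUS_FS_SHIFT);
}

/* Return a copy of sstatus with the FS field replaced. */
static uint64_t sstatus_set_fs(uint64_t sstatus, enum fs_state fs)
{
    assert((unsigned)fs <= 3);
    return (sstatus & ~SSTATUS_FS_MASK) | ((uint64_t)fs << SSTATUS_FS_SHIFT);
}

/* Hypothetical trap entry: remember the user FS state, turn the FPU off. */
static uint64_t trap_enter(uint64_t sstatus, enum fs_state *user_fs)
{
    *user_fs = sstatus_get_fs(sstatus);
    return sstatus_set_fs(sstatus, FS_OFF);
}

/* Registers only need to be written to memory if they were modified. */
static int fpu_needs_save(enum fs_state fs)
{
    return fs == FS_DIRTY;
}
```

With FS set to Off while the trap handler runs, any accidental kernel FPU use faults instead of silently corrupting userland registers, and the dirty state tells the context switch code whether a save is needed at all.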
Maybe we should do a branch with fpu disabled in kernel and see how it affects different archs and generations…
Probably not worth the effort, as it most likely wouldn’t even boot to the desktop. The most you could get without FPU emulation would be a shell…
Still it would be a good idea to remove FPU usages from the kernel since there is little to no advantage these days. And a driver breakage relative to BeOS is probably not a big deal at this point. I know some stuff I used in the past would probably break like the longrun driver (but there is source for that I think).
It probably wouldn’t be that hard, we have compiler flags to disable fpu code generation. Any asm using simd or fpu would need to be fixed though. The mouse driver already had fpu emulation (movement maker), but not sure if it has been removed since, as we moved it to userland.
Moving the mouse caused the fpu emulation and the screen blit to use a lot of CPU btw. Probably still the case for non hid devices.
I expect you’re thinking of the real numbers, for which this is of course true by definition, but the floating point numbers map 1:1 to the integers; they’re just distributed differently over the number space. That’s why computers have floating point but not reals. So e.g. suppose we write dates as numbers, 20230708, 20230710 - so 20230709 fits in between those two, right? Nope, in single-precision floating point we can’t tell the difference between 20230708 and 20230709 [ I’m assuming roundTiesToEven, you are of course free to insist on some other rounding mode but the problem just moves, it doesn’t vanish ]
For the values that are actually being used today, like 0.0, 0.6, 0.8 and so on, fixed point is a really nice choice which avoids the cost of using floating point comparisons and avoids the attendant confusion.
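The date example can be checked with a few lines of C, assuming `float` is IEEE 754 single precision (24-bit significand) and the default round-to-nearest-even mode is in effect. The function names are made up for this sketch; `volatile` is only there to force values to be rounded to `float` on platforms that keep extra precision in registers.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if two integers become the same value once stored as float. */
static int collide_as_float(int32_t a, int32_t b)
{
    volatile float fa = (float)a;
    volatile float fb = (float)b;
    return fa == fb;
}

/* Convert to float and back, showing which integer the float really holds. */
static int32_t float_roundtrip(int32_t v)
{
    /* Larger inputs could round up to 2^31, which does not fit in int32_t. */
    assert(v <= 2147483520);
    volatile float f = (float)v;
    return (int32_t)f;
}
```

Above 2^24 = 16777216 adjacent floats are 2 apart, so 20230709 has no representation of its own and `float_roundtrip(20230709)` yields 20230708, while 20230708 and 20230710 remain distinct.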
That is why I said “very likely” and not “certain”. Because I knew you would argue on whatever detail you can.
Oh… I never stated not to use an FPU (i.e. whether CPU-internal or CPU-external) - if existing.
It wasn’t a reply to your post. So no worries.
Huh. So you knew floating point wasn’t actually better here? @X512 is trying to make an improvement here, it’s not a huge improvement, but these things add up. Assuming (per Chesterton’s Fence) that you need this numerical scale, it can easily be fixed point instead of floating point, as I explained.
What about kernel extensions and improvements? You would have to write versions for different processors every time. I write this in the context of the fact that the GPIO in the RPi is also supported by the kernel.