NPU WebAssembly compiler optimizations?

Former WebAssembly Plans

Anyone who has known me since I started posting to this forum knows that my ideal scenario for WebAssembly is that cross-platform bytecode files could be built to the WASI standard for architecture-neutral apps. Recent events may turn that ideal scenario totally on its head, and my plans for Haiku with it.

Using an NPU to Change That

The vaguely named neural processing unit (NPU) is a recent addition to x86-64 models, and to some Qualcomm Snapdragons on ARM64 as well. While all the hype surrounding it is directed toward A.I. and the hullabaloo associated with it, parts of an NPU may be useful for other, more common tasks.

Of particular note is the fact that NPUs coprocess linear algebra, such as the matrix functionality used by the PBQP register allocator in LLVM. If that particular function were made demonstrably faster, using WebAssembly bytecode could become much more painless, along with any other LLVM-based production compilation. LLVM-based static compilers are not limited to WebAssembly; they exist for JavaScript as well.
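
To make that concrete: PBQP frames allocation as a cost vector per virtual register plus a cost matrix per interfering pair, and then minimizes the total. Here is a toy sketch of that shape (nothing like LLVM's actual implementation; all costs and register names are invented for illustration):

    // Toy illustration of the PBQP idea (not LLVM's implementation):
    // each virtual register gets a cost vector over candidate physical
    // registers, and each interfering pair gets a cost matrix. Register
    // allocation then reduces to minimizing a sum of vector and matrix
    // lookups.
    #include <array>
    #include <cstdio>
    #include <limits>

    int main() {
        constexpr double INF = std::numeric_limits<double>::infinity();

        // Per-node costs: preferences of vreg0/vreg1 for phys regs r0/r1.
        std::array<double, 2> cost0 = {0.0, 1.0}; // vreg0 slightly prefers r0
        std::array<double, 2> cost1 = {1.0, 0.0}; // vreg1 slightly prefers r1

        // Edge cost matrix: vreg0 and vreg1 interfere, so giving both
        // the same physical register carries infinite cost.
        double edge[2][2] = {{INF, 0.0}, {0.0, INF}};

        double best = INF;
        int bestA = -1, bestB = -1;
        for (int a = 0; a < 2; ++a)       // candidate for vreg0
            for (int b = 0; b < 2; ++b) { // candidate for vreg1
                double total = cost0[a] + cost1[b] + edge[a][b];
                if (total < best) { best = total; bestA = a; bestB = b; }
            }
        std::printf("vreg0 -> r%d, vreg1 -> r%d (cost %.1f)\n", bestA, bestB, best);
    }

Summing and minimizing over stacks of small dense cost matrices is the kind of arithmetic an NPU is built for, which is exactly the speculation here.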

The Downsides

If WebAssembly continues to be adopted for its intended purpose as a means of making web apps faster and more efficient, the ideal of using clever tricks at the OS level to improve the usability of the system could be fading fast. So far, most WebAssembly work has been web-based, and the WASI standards are basically an afterthought. If this trend continues, the milestones an OS could reach will be rendered irrelevant by improvements in browser apps.

I’ve long viewed web development as redundant with system development, but always hoped that system programmers’ efficient code would win in the end. That no longer looks like the case: with static installation of production-grade web apps, the roles of system programming and web development could become so utterly interchangeable that they’ll all be competing for the same clients. Programming languages like Dart Native, which used to support all types of mainstream OS apps as well as web apps, will become unnecessarily expensive to maintain.

The Fallout

Even the smallest and most clever operating systems will become as obsolete as any others, as browser-based systems like ChromeOS become the mainstream’s idea of simplicity while traditional operating systems fade out. Their features will be ignored and eliminated. Basic browser infrastructure like RSS feeds will be enough to supplement or even replace package management.

The Light at the End of the Tunnel?

Once browsers and operating system functions are integrated so fully that the browser essentially runs on bare metal, and the bare metal becomes architecture-neutral, mainstream OS development will likely cease as well. My hope of making drivers a macro-expansion of a hardware register map, independent of the headers that wrap it, using generics or similar features, could then be realized.

The end of Native apps could ensue though. The light at the end of the tunnel may be the train.


I think you are making this way overcomplicated. People have been experimenting with this since Java and the JVM, and to some extent before that with Smalltalk and Lisp. None of them needed neural processing units. They achieve this by using not a register-based bytecode but a stack-based one. The virtual machine can easily post-process that and turn it into anything it wants, including a register-based bytecode that matches the CPU it is running on. C# does this, for example, as it runs its JIT recompilation, and then caches the results so the next run of the program is faster (or it could even do a compilation step at package installation time).
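
That post-processing step is mechanical, too. A minimal sketch (invented opcodes, not any real VM’s bytecode) of rewriting stack code into register form by tracking which virtual register holds each stack slot:

    #include <cstdio>
    #include <vector>

    enum Op { PUSH, ADD }; // toy stack-machine opcodes

    struct Insn { Op op; int imm; };

    int main() {
        // Stack form of the expression: 2 + 3
        std::vector<Insn> stackCode = {{PUSH, 2}, {PUSH, 3}, {ADD, 0}};

        std::vector<int> stack; // holds virtual-register ids, not values
        int nextReg = 0;
        for (const Insn& in : stackCode) {
            switch (in.op) {
            case PUSH: {
                int r = nextReg++;
                std::printf("r%d = %d\n", r, in.imm); // emit register form
                stack.push_back(r);
                break;
            }
            case ADD: {
                int rhs = stack.back(); stack.pop_back();
                int lhs = stack.back(); stack.pop_back();
                int r = nextReg++;
                std::printf("r%d = r%d + r%d\n", r, lhs, rhs);
                stack.push_back(r);
                break;
            }
            }
        }
    }

Running it prints the equivalent three-address code (r0 = 2, r1 = 3, r2 = r0 + r1), which a JIT can then map onto the real CPU’s registers.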

In the end, moving that step to package creation time allows you to do it once per CPU architecture (a handful of times) instead of once per machine (a few thousand or million times), or at each program start (multiply by another 1000 or so).

That’s why this is limited to the web and not gaining much success for traditional distribution of apps. You can do less work (compile a handful of times instead of millions of times) to get a faster app (since it is compiled natively). What are the reasons you would not want to do that?

Simple: there is no longer a distinction. Compiling through LLVM doesn’t care what language you use. C++ has been around a long time, C even longer. Varying degrees of portability are available through standard runtimes and POSIX standards. Compiling JavaScript and other decidedly web-related technologies no longer makes any difference in efficiency, because LLVM generates good code from any of them. JavaScript is just one more lamb in the fold of LLVM or GCC.

One of the major points I was making is that the time overhead associated with compilation is suddenly not so significant if register allocation is coprocessed. Who really cares whether compilation happens once per install (downloading bytecode) versus once per architecture (downloading the binary)? If compilation times are minor, the language doesn’t matter.

Background

You mention JVMs originating in the 90s. What brought about the need for them was the domination of Microsoft Windows in the computer market; POSIX standards were being ignored as irrelevant. I was happy with my Commodore Amiga, which also wasn’t POSIX, so what did I need to worry about until Commodore went under in '94? More than I thought! When Commodore went under, after years of bad business while bringing computers to the masses, it didn’t fail in isolation. It was pushed out of business.

When I started looking for alternatives that WERE NOT Windows, one of my favorite and most promising ones was BeOS, but by the time I found out how good its feature set was, BeOS had moved from PowerPC to x86. Once on x86, it wasn’t just not accepted by the PC manufacturers, it was shunned. Did it get tossed from owner to owner out of nowhere? NO! It was pushed out!

There are some really bad characters in this story, but without making this personal: the need for bytecodes and cross-compatibility has been driven by these bad characters’ involvement.

Standards

POSIX standards aren’t good enough while web standards are. That’s why I’m afraid the OS may be integrated into the browser and the need for anything else will vanish. Haiku is a good OS for as long as there is still a need for an OS, but that may turn out to be a missed opportunity.

Apologies for the Wall of Text

The last post I made didn’t get revised or edited for length because I was late for a meeting as I typed the last couple of paragraphs. I’m trying to keep this post from becoming another wall of text, with limited success. I apologize in advance for making you read another monstrosity.

Closing

If the need to get rid of Windows ends up sinking all of the better-designed operating systems in the process, how much of a loss that is may vary. Browser technology was always meant to be a sandbox, insulating information from the hardware. But what if the sandbox becomes the beach? The implications of that are unexpected and undesirable, in my opinion.

It’s kind of funny how the way mainframes have worked for decades is looking so great now, and people are trying to reinvent it again :)
You can also have a look at the ideas of research projects like Singularity (operating system) - Wikipedia: compilation at installation time from bytecode to machine code, in order to make sure the generated code cannot fool around…


It’s all interchangeable now! GitHub - ASDAlexander77/TypeScriptCompiler: TypeScript Compiler (by LLVM) is a TypeScript compiler that can target native binaries or WebAssembly. Who even needs the traditional JavaScript backend any more?

That’s not true. No matter how much effort you make, dynamic languages like JavaScript don’t offer as much optimisation as C or C++, where static typing allows doing a huge part of the work once (at compile time) for gains at runtime. The gains may be somewhat small (a few percent), but the cost is paid only once for a lot of runs of the program, making the compilation wait worth it. But that doesn’t work anymore if you make everyone redo the compilation.

What you save in compilation time, you will pay for with a slower runtime. Since running a program happens much more often than compiling it, and in your scheme users also pay some of the compilation time, this makes things slower for users twice, and I still don’t see who gains anything from it.
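
To illustrate the difference (a toy contrast with invented names, not how any real engine is implemented):

    #include <cstdio>
    #include <variant>

    // Statically typed: the compiler resolves this once, to a single
    // integer add instruction.
    int add_typed(int a, int b) { return a + b; }

    // Dynamically typed (simulated): every call pays a type check,
    // the way a JavaScript engine must before it can pick a fast path.
    using Dyn = std::variant<int, double>;
    Dyn add_dynamic(Dyn a, Dyn b) {
        if (std::holds_alternative<int>(a) && std::holds_alternative<int>(b))
            return std::get<int>(a) + std::get<int>(b);
        // Fall back to floating point for any mixed case.
        double x = std::holds_alternative<int>(a) ? std::get<int>(a) : std::get<double>(a);
        double y = std::holds_alternative<int>(b) ? std::get<int>(b) : std::get<double>(b);
        return x + y;
    }

    int main() {
        std::printf("%d\n", add_typed(2, 3));
        std::printf("%d\n", std::get<int>(add_dynamic(2, 3)));
    }

The typed version does the dispatch work once at compile time; the dynamic version repeats it on every call, for the lifetime of the program.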

So Windows, a technology from the 80s, was replacing UNIX, a technology from the 70s that was never meant to run on home computers. That seems the normal course of history to me. Java was there to help solve the large diversity of systems at the time: many CPU families, many OSes.

BeOS had lots of cool ideas but was never quite ready to use as a serious OS. It would have needed either to be in that state a few years earlier, when other systems were also less polished, or to continue on for a few more years to get better support for printing and many of the other tasks one may want to do on a computer. As a result, no PC manufacturer would bet on it as a sole OS, and even Be quickly switched their marketing to try to sell it as an alternative OS to put side by side with Windows, knowing very well they were not quite ready for more than that. Sadly, it was abandoned just as it may have been becoming good enough (with BONE and accelerated OpenGL, for example, things that Windows had been doing for several years at that point).

I am confused. If your goal is inter-OS portability, you will always focus on the smallest subset of features that works across all OSes. That means no way to innovate with anything like Haiku’s indexed attributes in your framework.

I must be missing something, because no matter how I look at it, I see only downsides and it gets worse at every step.

  • runtime is slower than native apps
  • users must also pay some recompilation time
  • cross-OS compatibility prevents using OS-specific innovative features
  • special hardware (a neural accelerator) is needed to optimize things and try to catch up on performance

and I don’t see what the selling point is. Just that Windows can run the same apps as Haiku?

I’ve been around a while.

1990: Laptops are taking over, soon you won’t be able to get a desktop any more.
1995: PCs are dead. Thin clients are the future.
2000: Phones are so good now, who will want a laptop any more?
2010: Tablets, tablets, tablets!
2015: Phablets?
2020: Folding screens will take over real soon now!
2025: When we said “thin clients” 30 years ago, we actually meant chromebooks.

and …

Every year since 1995: Is this the year of the Linux desktop?

Soooo … here I am in 2025, typing this on a “desktop computer” of sorts - a Mac mini, actually, which is a merger of desktop and laptop parts. Mini-PCs are the hot ticket, but you can still buy a giant desktop PC if you like - serious gamers swear by them. You can run productivity apps in your browser but Office remains Microsoft’s cash cow.

I don’t think the sky is falling.

Another reprieve: LLVM uses the greedy allocator by default on x86, and PBQP isn’t supported on that architecture yet.
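
For anyone who wants to experiment, llc exposes the choice through its -regalloc option on targets where the allocator is available (file names here are placeholders):

    llc -regalloc=pbqp input.ll -o output.s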