Bytecode implementations and why they're important

SamuraiCrow · June 10, 2021, 5:50pm

In several threads I’ve run into, there have been people who are quite resistant to building bytecodes like WebAssembly into the standard implementation of the Haiku package manager. I’ve created this thread to discuss different bytecodes through the years and what was right or wrong with them.

Java

Considered the first, it was actually one of the worst as initially implemented. It still has major flaws to this day.

Flaws

Proprietary licencing for some of its standard library
Mandatory garbage collection
Cannot be non-OOP
Huge runtime library

Advantages

Portability
Availability as GPL OpenJDK

Comments

Once AOT compilation alleviated most of the birth pains of interpreters and JITting in the mobile edition of Java, the speed issues were mostly laid to rest, but by that time the damage was done. The desktop varieties required a huge runtime library and object-oriented inheritance. This interfered with cache performance and caused massive swapping to the hard drive. This, in turn, threw out most of the speed increases that AOT compilation gained.

.NET and CLR

Since the advantages and flaws are much the same as Java, I won’t list them again. It has more language support than the JVM, not to mention more standard development tools.

WebAssembly

An open source standard defined by the W3C. I see it as just another off-the-shelf component.

Likes

AOT most common compilation strategy
BSD-like license
Doesn’t require object-oriented inheritance or garbage collection at the core level

Dislikes

No macros at the core-level for more extensive dynamic bindings
No big-endian support in the core bytecode
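To make the endianness point concrete: WebAssembly loads and stores are specified as little-endian, so a big-endian host has to byte-swap on memory access. A minimal Python sketch of the difference (the value is arbitrary and only there for illustration):

```python
import struct

value = 0x11223344

# WebAssembly linear memory stores multi-byte integers little-endian.
little = struct.pack("<I", value)
# A big-endian host would lay the same integer out in the opposite order.
big = struct.pack(">I", value)

print(little.hex())  # 44332211
print(big.hex())     # 11223344

# Reading little-endian bytes as if they were big-endian gives the wrong
# value, which is why a big-endian host needs a swap on each load and store.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x44332211
```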

Comments

I like this one best of all even though there is room for improvement. Rust translates well into it because the bytecode is low-level enough to exclude the indirect jumps associated with vtable calls used by OOP languages.

SPIR-V, OpenKode and OpenCL

These are maintained by the Khronos Group and, as such, they dovetail well with Vulkan. I’ve never seen OpenKode implemented, but it’s supposed to be an extension of OpenCL and have excellent support for offloading operations to the GPU. SPIR-V is the standard bytecode intermediate representation for shaders. I don’t know much else about these.

If somebody would like to add more please comment! I will be adding commentary as well.

X512 · June 10, 2021, 6:10pm

Compatible with the Haiku C++ ABI, and allows interaction with the Haiku API without wrappers or bindings.

It would be very difficult to provide access to the whole Haiku API from Java bytecode.

BlueSky · June 10, 2021, 8:13pm

Maybe it’s just me but I completely seem to fail to see how this is relevant to Haiku as an operating system. Which is probably OK because it is in the off-topic category, but still…

Munchausen · June 10, 2021, 8:14pm

I think it’s worth mentioning Python and meta-tracing JIT compilers (e.g. PyPy), which can make huge leaps in terms of performance.

Also, an iteration of ARM that supports memory bounds checking is supposedly on the horizon. Such a feature should reduce the need for some run-time memory safety checks and bring impressive performance increases for interpreted languages.

SamuraiCrow · June 10, 2021, 8:19pm

Perhaps you missed the RISC-V porting thread. If we want package support for all little-endian architectures in one package, a bytecode representation would be really handy.

BlueSky · June 10, 2021, 8:24pm

No, I didn’t.

This sounds like a very complicated and bloated solution to me. Why not just build the packages normally for every architecture and let the package manager handle it?

SamuraiCrow · June 10, 2021, 8:40pm

Two reasons: first, building a separate package for each architecture takes many times the hosting storage; second, bytecodes are more future-proof for additional architectures, which removes the need to recompile all packages every time another architecture is added.
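As a rough back-of-the-envelope sketch of the storage argument: the package sizes below are made up, and the sketch assumes a bytecode package is about the same size as one native build, which may not hold in practice.

```python
def hosting_storage(package_sizes_mib, num_architectures, per_arch=True):
    """Total repository size in MiB for native-per-arch or single-bytecode hosting."""
    total = sum(package_sizes_mib)
    # Native packages are stored once per architecture; bytecode only once.
    return total * num_architectures if per_arch else total

# Hypothetical package sizes in MiB, not real HaikuDepot numbers.
sizes = [12, 40, 3]

native_4 = hosting_storage(sizes, 4)                   # e.g. x86, x86_64, ARM, RISC-V
native_5 = hosting_storage(sizes, 5)                   # one more architecture added
bytecode = hosting_storage(sizes, 5, per_arch=False)   # independent of the count

print(native_4, native_5, bytecode)  # 220 275 55
```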

leavengood · June 10, 2021, 9:22pm

I think it would be interesting if someone tried a prototype of this idea of single WASM bytecode packages which are AOT (ahead of time) compiled into native code upon installation. Probably the Rust cranelift project would be worth looking at for this. It might do most of the work?

Having something like this would start to be useful if we had the RISC-V port usable on real hardware, which could be close given X512’s recent progress (or there could be some real show-stoppers on real hardware which make it not happen for years.) This also depends on a RISC-V backend for cranelift, I’m not sure of the status of that. If an ARM port also became a reality then it would start to be very useful to have something like this.

With that said, I don’t think the current Haiku development team has the bandwidth for this, so it would be helpful if someone made a prototype. Even a prototype might be a lot of work, though maybe less so if cranelift can be used. Of course there is no guarantee it would be adopted by Haiku later, though if it works and is mostly transparent to users (which is what I am advocating) I would certainly support it, and maybe X512 too. Like everything in Haiku there would need to be a case made to use it.

If you are passionate about this idea SamuraiCrow I say start messing around with it.

My acceptance criteria would be something like:

Server packages are stored in this WASM format, with some way of indicating what native libraries and symbols they use.
Upon installation the packages are compiled into the local native format using cranelift or something similar, producing applications or libraries. Resources like icons, MIME DB stuff and similar would also need to be handled. The resulting application or library should look like it came from our build system directly.
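The two criteria above could be sketched roughly as follows. This is only an illustration of the flow: `WasmPackage`, `plan_install` and all field names are hypothetical, no such package format exists in Haiku today, and the "steps" are plain strings rather than real compiler invocations.

```python
from dataclasses import dataclass, field

@dataclass
class WasmPackage:
    # Hypothetical manifest for a server-side WASM package.
    name: str
    wasm_modules: list
    native_imports: dict = field(default_factory=dict)  # library -> symbols used
    resources: list = field(default_factory=list)        # icons, MIME DB entries, ...

def plan_install(pkg, host_arch, available_libs):
    """Return the ordered steps an installer might take on this host."""
    # Criterion 1: the package declares which native libraries it needs.
    missing = sorted(set(pkg.native_imports) - set(available_libs))
    if missing:
        raise LookupError(f"missing native libraries: {missing}")
    # Criterion 2: compile every module for the local architecture,
    # carry the resources over, and emit an ordinary native package.
    steps = [f"aot-compile {m} for {host_arch}" for m in pkg.wasm_modules]
    steps += [f"copy resource {r}" for r in pkg.resources]
    steps.append(f"write native package {pkg.name}-{host_arch}.hpkg")
    return steps

demo = WasmPackage("demo", ["demo.wasm"], {"libbe.so": []}, ["icon.hvif"])
for step in plan_install(demo, "riscv64", {"libbe.so"}):
    print(step)
```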

I don’t think making a prototype of this would be too hard. Then some nice to haves long term would be:

Reproducible builds (same source → same WASM)
Differential packages (this would probably require caching the WASM versions which might defeat some of the benefits of differential packages for local storage space)
There is a way to execute and run CI/CD tests against the WASM versions, with maybe occasional tests of the native compiled versions too
There are CI/CD tests of the AOT compiling on the supported platforms, maybe not for every package on every build, but certainly periodically.

Edit: having written all this, I really don’t know how this would work with the package system. I guess the output of the AOT compiling process would need to be hpkg files. Maybe the process would be WASM hpkg → native hpkg?

SamuraiCrow · June 10, 2021, 9:31pm

Thanks for the vote of confidence! The easy package to look at is the WebAssembly Binary Toolkit (or WABT, pronounced “wabbit”). It comes with a utility that exports a .WASM file to C source. I’ll see if I can get that working first.
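Before handing a file to a tool like this, it can help to check that it is a WebAssembly binary at all. A minimal sketch, relying on the fact that the binary format starts with the 4-byte magic `\0asm` followed by a 4-byte little-endian version (currently 1). The function name is my own and not part of WABT:

```python
import struct

WASM_MAGIC = b"\x00asm"

def parse_wasm_header(data: bytes) -> int:
    """Return the version field of a WebAssembly binary, or raise ValueError."""
    if len(data) < 8:
        raise ValueError("too short to be a WebAssembly binary")
    if data[:4] != WASM_MAGIC:
        raise ValueError("missing WebAssembly magic bytes")
    # The version is a 4-byte little-endian unsigned integer.
    (version,) = struct.unpack("<I", data[4:8])
    return version

# The smallest valid module is just the header: magic plus version 1.
empty_module = b"\x00asm\x01\x00\x00\x00"
print(parse_wasm_header(empty_module))  # 1
```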

leavengood · June 10, 2021, 9:34pm

I don’t know how useful WASM → C would be for purposes of proving this out. I think the make or break would be whether cranelift or something similar can take WASM and compile it into native code which supports the C++ ABI so they can directly call into native Haiku libraries and also be called from other applications and libraries. I have no idea of the state of WASM and C++ ABI. I know C++ ABI is kind of a crapshow in general.

SamuraiCrow · June 10, 2021, 10:06pm

The default WASM backend runs on LLVM 12. I just downloaded that from HaikuDepot to Haiku beta2.

memsom · June 10, 2021, 10:26pm

I think you would be better off looking at something like BitCode. I still think a runtime is the wrong solution. I would want the generated code to be as close to the C++ as possible, with debugging symbols. The last thing we need is the mess that you get with JavaScript transpilers and debugging the app from the original source code.

Your main issue will be C++. I really don’t know how you would bridge the C++ API into something like JavaScript. Because the ABI will be different on every potential platform, you will have to deal with that somehow. C++ is the devil’s own language for this type of thing. The alternative is wrapping the world in C… but then you may as well use .NET. It has first-class C native integration and AOT compilation.

SamuraiCrow · June 10, 2021, 10:31pm

As crazy as it may sound, LLVM BitCode is not architecture independent. A 32-bit bitcode won’t work on a 64-bit OS. WASM32 works on either 32-bit or 64-bit OSs as long as the application fits in 32-bit boundaries.
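What “fits in 32-bit boundaries” means in practice: wasm32 addresses its linear memory with 32-bit indices in 64 KiB pages, so the ceiling is 65536 pages, or 4 GiB, regardless of the host’s word size. A small sketch (the helper name is just for illustration):

```python
WASM_PAGE_SIZE = 64 * 1024                   # bytes per linear-memory page
WASM32_MAX_PAGES = 2**32 // WASM_PAGE_SIZE   # 65536 pages, i.e. 4 GiB

def fits_in_wasm32(required_bytes: int) -> bool:
    """True if a memory requirement fits in a wasm32 linear memory."""
    pages = -(-required_bytes // WASM_PAGE_SIZE)  # ceiling division
    return pages <= WASM32_MAX_PAGES

print(WASM32_MAX_PAGES)               # 65536
print(fits_in_wasm32(512 * 1024**2))  # True
print(fits_in_wasm32(6 * 1024**3))    # False
```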

memsom · June 10, 2021, 10:46pm

Are you 100% sure about that? I was under the impression that BitCode supports both 32- and 64-bit ARM from the same output, for example. Otherwise, it seems a bit redundant for Apple to enforce its use.

SamuraiCrow · June 10, 2021, 10:50pm

If AArch64 is backward compatible with 32-bit ARM, it may be a hybrid architecture. Or it may just be a holdover from past versions of iOS.

What I do know for certain is that PNaCl was Google’s attempt at cross-architecture BitCode and it failed because each version of LLVM had a new BitCode format while PNaCl needed a stable version.

SamuraiCrow · June 11, 2021, 2:35am

The test progress will be listed at the WebAssembly progress thread.

nephele · June 11, 2021, 4:16am

So far you seem to have been the only one even talking about that :)

Anyway, if I recall correctly, WASM has been made to compile somewhat okay towards ARM and x86, but isn’t designed for RISC-V etc., so if anything that would have to be checked in practice.

As for this thread, I think a major bytecode you have not mentioned is the Dalvik VM and the ART runtime, both part of Android.

ART has replaced Dalvik and is also AOT/JIT. As an advantage, though, it allows shipping native libraries as well, which is nicer if you need precise ABI boundaries; for example, this would be important for decorators.

BlueSky · June 11, 2021, 7:09am

That might be true, but it’s not like we have that many architectures to support in the foreseeable future. If the porting efforts continue at the phenomenal pace of X512’s RISC-V porting, we would be lucky if we have Haiku running on RISC-V and ARM, in addition to x86 (64 and 32 bit).

I also freely admit I don’t have the technical in-depth knowledge about bytecode implementations to get into a serious discussion about it. I see it from a more practical standpoint. We are already short on developers and people are complaining about the perceived slowness of releases (I don’t agree with that).

But don’t let my words discourage you in any way. If you can implement the solution you proposed in a stable and performant way (and in a timeframe that doesn’t get us even further from reaching R1) it will be a great thing for Haiku.

SystemShock · June 11, 2021, 11:16am

Java bytecode has nothing to do with the Java standard library. Many other languages can be compiled to that bytecode. A very good specification of the bytecodes can be found online.

memsom · June 11, 2021, 11:26am

Same with the CLR. I have a version of the base system libs that implements a bunch of PSP level stuff. The base libs are more a convention.

The CLR is also designed to be OO, but nothing about the code generation forces that. It is absolutely possible to have static methods that behave like regular functions. The P/Invoke interface is also pretty great, and pummels JNI into a pulpy mess in the corner.

Also, this list is not exhaustive. There are plenty of other VMs and bytecode representations.