A new malloc for Haiku userland, based around OpenBSD's malloc

nephele · February 13, 2025, 2:20pm

With something as complex as a memory allocator I can understand waddlesplash wanting something well-tested. And since his patch adds this alongside hoard2 we can compare as needed. If someone else wants to procude a patch where we can also try against mimalloc that would be nice aswell.

I think, ideally, we would need something that isn’t quite off-the-shelf anyway, and rather based on something else. Ideas from mesh could come in handy duo to our heavy use of C++ for example. Perhaps even only used for specific ELF images based on a flag for the runtime loader.

waddlesplash · February 13, 2025, 2:33pm

Not by as much as I initially thought, and this difference is in used pages, not in reserved memory. There it still winds up reserving 1 GB or more where hoard2 and OpenBSD are using hundreds of MB at most, and I got app_server to crash while using the system normally due to mimalloc failing to reserve another 1GB. We might be able to craft an allocation/reservation strategy for it that does better here, but that will require more research.

The author also recommended looking at the in-development “version 3” of mimalloc which may have even less memory usage (it restructures a lot of the internal allocator design, apparently), but this is still experimental and not yet ready for production use. (There are some open bugs in mimalloc about potentially-unbounded memory usage growth which are a problem in v2 but apparently not in v3.)

I think we should keep an eye on mimalloc for sure, and especially if Chimera Linux sees success with it as the system allocator, then it may be a future option here. But for now there are too many shortcomings and corner-cases, I think.

cocobean · February 13, 2025, 3:20pm

mimalloc v3.0.1 is very recent. A bit risky even though it seems like a great product.
Examples:

mimalloc crashes on FreeBSD during static initialization phase · Issue #1007 · microsoft/mimalloc · GitHub
Memory blowup with synthetic test: mimalloc 2 uses all available memory, mimalloc 3 ok · Issue #1001 · microsoft/mimalloc · GitHub

jscipione · February 13, 2025, 6:40pm

So what ever happened to musl-mallocng ?

waddlesplash · February 13, 2025, 7:23pm

It’s now musl’s default malloc. I did try it in Haiku but it had worse performance than hoard2 by a lot in the Debugger benchmark, and it didn’t seem as flexible a design, so I gave up on it.

marcoapc · February 13, 2025, 9:28pm

There are many problems on 32-bit systems, using mimalloc would cause a lot of work and waste of time to run 32-bit systems.

Chimera Linux is very experimental, not something you can use on a daily basis.

PulkoMandy · February 13, 2025, 9:47pm

X512 did the branch, and it’s on Gerrit. So yes, testing for that one is also welcome

waddlesplash · February 13, 2025, 10:46pm

I merged the new allocator in hrev58639, so it’ll be in the next nightlies. Thanks everyone who tested in advance!

marcoapc · February 13, 2025, 11:37pm

Has musl-mallocng security issues:

github.com

struct/isoalloc/blob/master/SECURITY_COMPARISON.MD

# Heap Allocator Security Feature Comparison

Heap allocators hide incredible complexity behind `malloc` and `free`. They must maintain an acceptable level of performance for various environments with wildly different memory and CPU constraints. All the security checks in the world don't matter if they require performance regressions that slow down your program by orders of magnitude. Striking a balance between performance and security is a requirement if you want people to use your library. Most allocators have some security checks, even if they're poorly implemented and easily bypassed. It's impossible to capture the nuances of how those checks work, the corner cases of when they do and don't apply, or the environment changes that affect their efficacy etc. This table is incomplete at the moment so I welcome pull requests that improve its accuracy.

:heavy_check_mark: = Yes by default

:heavy_plus_sign: = Yes but requires configuration

:heavy_minus_sign: = Yes but via non-standard api

:x: = Not available

:grey_question: = Todo


| Security Feature                    | <a href="https://github.com/struct/isoalloc/">isoalloc</a>         | <a href="https://source.android.com/docs/security/test/scudo/">scudo</a>            | <a href="https://github.com/microsoft/mimalloc/">mimalloc</a>         | <a href="https://github.com/google/tcmalloc/">tcmalloc</a>        | <a href="https://github.com/yeewang/ptmalloc3/">ptmalloc3</a>        | <a href="https://jemalloc.net/">jemalloc</a>         | <a href="https://musl.libc.org/">musl malloc-ng</a> | <a href="https://github.com/GrapheneOS/hardened_malloc/">hardened_malloc</a>  | <a href="https://chromium.googlesource.com/chromium/src/+/HEAD/base/allocator/partition_allocator/PartitionAlloc.md">PartitionAlloc</a>   | <a href="https://github.com/microsoft/snmalloc/">snmalloc</a> |
|:-----------------------------------:|:----------------:|:----------------:|:----------------:|:---------------:|:----------------:|:----------------:|:----------------:|:----------------:|:----------------:|:----------------:|
|Memory Isolation                     |:heavy_check_mark:|:heavy_check_mark:|:heavy_check_mark:|:x:              |:x:               |:grey_question:   |:x:               |:heavy_check_mark:|:heavy_check_mark:|:grey_question:
|Canaries                             |:heavy_check_mark:|:heavy_check_mark:|:x:               |:x:              |:heavy_plus_sign: |:x:               |:heavy_check_mark:|:heavy_check_mark:|:grey_question:   |:heavy_check_mark:
|Non-global canary                    |:heavy_check_mark:|:x:               |:x:               |:x:              |:x:               |:x:               |:x:               |:heavy_check_mark:|:grey_question:   |:heavy_check_mark:

This file has been truncated. show original

BiPolar · February 14, 2025, 3:23am

Bit late to the party here with my tests results (mentioned them on IRC about an hour before the new malloc got merged).

Anyway… will paste my results here, as some of you might find them interesting:

I built Python 3.14.0a5 on beta5, hrev58616 and on hrev58637+3+dirty (change-8974 at patchset 3).
Used optimized LTO parallel builds (“./configure --with-optimizations --with-lto”; make -j 4). Work-dirs on RAMFS.

All on bare metal x86_64 (Phenom II X4 @ 2.8 GHz, 8 GB DDR2).

Results for beta5 are not directly comparable, because after it, I’ve added a patch to fix an issue that prevented the ahead-of-time compilation (and inclusion) of LOTS of .pyc files in the final .hpkg.

Even so, beta5 was the slowest of the 3 (despite doing less work overall on the INSTALL() stage).

Timing results:

hrev57937+129:      real 43m48.376s / user 58m44.716s / sys 3m4.900s (.hpkg about half of what it should be, due to missing .pyc files)
hrev58616:          real 42m34.229s / user 59m49.856s / sys 2m0.797s
hrev58637+3+dirty:  real 37m19.972s / user 53m29.505s / sys 2m21.403s
hrev58650_dirty-1:  real 36m57.818s / user 53m30.421s / sys 2m18.671s

Memory results (subjective, I just watched ActivityMonitor graphs with 2s refresh time):

Beta5/nightlies with hoard2: Mem usage max around 1.4/1.5 GiB, stayed most of the time a bit upward of 1 GiB.
nightlies with OpenBSD memalloc: Peak mem usage about 1.59 GiB, but otherwise seemed to be more frequently below 1 GiB compared to hoard2.
hrev58650: peak usage 1.55 GiB.

Edit: added data for hrev58650 (that includes further optimizations that landed on hrev58649)

Zenja · February 14, 2025, 9:49am

Augustin, you are too fast for me. I just tried 58639, and I can consistently within 5 seconds get Medo to crash when loading a bunch of video files. See attached crash report (error in memcpy in AvCodecDecoder - part of ffmpeg). With the older version (58583 from Feb 1st), it works fine (no crashes).

I will attach crash report in haiku bug tracker.

thread 2132: WorkThread_00
state: Exception (Segment violation)

	Frame		IP			Function Name
	-----------------------------------------------
	0x7f4792f57d80	0xc369283af6	memcpy + 0x26 
		Disassembly:
			memcpy:
			0x000000c369283ad0:               55  push %rbp
			0x000000c369283ad1:           4889d1  mov %rdx, %rcx
			0x000000c369283ad4:           4889e5  mov %rsp, %rbp
			0x000000c369283ad7:             4156  push %r14
			0x000000c369283ad9:             4155  push %r13
			0x000000c369283adb:           4989fd  mov %rdi, %r13
			0x000000c369283ade:             4154  push %r12
			0x000000c369283ae0:           4989f4  mov %rsi, %r12
			0x000000c369283ae3:         4883ec18  sub $0x18, %rsp
			0x000000c369283ae7:         4883fa10  cmp $0x10, %rdx
			0x000000c369283aeb:             7623  jbe 0xc369283b10
			0x000000c369283aed:   4881faff070000  cmp $0x7ff, %rdx
			0x000000c369283af4:             765a  jbe 0xc369283b50
			0x000000c369283af6:             f3a4  rep movsb <--

		Frame memory:
			[0x7f4792f57d40]  .}..G.....Rbn...   a0 7d f5 92 47 7f 00 00 00 c3 52 62 6e 00 00 00
			[0x7f4792f57d50]  `...G...`...G...   60 7f f5 92 47 7f 00 00 60 7f f5 92 47 7f 00 00
			[0x7f4792f57d60]  .~..G.....N.....   f8 7e f5 92 47 7f 00 00 00 80 4e e8 ff ff ff ff
			[0x7f4792f57d70]  .}..G.....R.....   a0 7d f5 92 47 7f 00 00 9d bb 52 95 b2 00 00 00
	0x7f4792f57db0	0xb29552bb98	AVCodecDecoder::_DecodeVideo(void*, long*, media_header*, media_decode_info*) + 0xf8 
	0x7f4792f57ec0	0xd6fb312eab	BMediaTrack::ReadFrames(void*, long*, media_header*, media_decode_info*) + 0x8b 
	0x7f4792f58070	0xe9709f973a	VideoManager::GetFrameBitmap(MediaSource*, long, bool) + 0x65a 
	0x7f4792f58120	0xe9709fa305	VideoManager::CreateThumbnailBitmap(MediaSource*, long) + 0x1e5 
	0x7f4792f58140	0xe9709fb8e6	VideoThumbnailActor::AsyncGenerateThumbnail(MediaSource*, long, bool) + 0x16 
	0x7f4792f581b0	0xe970a58ff2	yarra::WorkThread::work_thread(void*) + 0x172 
	0x7f4792f581d0	0xc369203487	thread_entry + 0x17 
	00000000	0x7ff6df73d258	commpage_thread_exit + 0 

	Registers:
		  rip:	0x000000c369283af6
		  rsp:	0x00007f4792f57d40
		  rbp:	0x00007f4792f57d70
		  rax:	0x0000000004380001
		  rbx:	0x0000006e6252c300
		  rcx:	0x00000000007e9000
		  rdx:	0x00000000007e9000
		  rsi:	0x00000089d6a12000
		  rdi:	0xffffffffe84e8000
		   r8:	0x0000006e699e1900
		   r9:	0x0000006e69a61900
		  r10:	0x0000000000000780
		  r11:	0x00000089d71f9200
		  r12:	0x00000089d6a12000
		  r13:	0xffffffffe84e8000
		  r14:	0xffffffffe84e8000
		  r15:	0x0000006e35cfc480
		   cs:	0x002b
		   ds:	0x0000
		   es:	0x0000
		   fs:	0x0000
		   gs:	0x0000
		   ss:	0x0023
		  st0:	nan
		  st1:	nan
		  st2:	nan
		  st3:	nan
		  st4:	0
		  st5:	0
		  st6:	nan
		  st7:	nan
		  mm0:	{0x39, 0, 0, 0}
		  mm1:	{0x7, 0, 0, 0}
		  mm2:	{0x4, 0, 0, 0}
		  mm3:	{0, 0, 0, 0x2000}
		  mm4:	{0, 0, 0, 0}
		  mm5:	{0, 0, 0, 0}
		  mm6:	{0, 0, 0, 0x2000}
		  mm7:	{0, 0, 0, 0}
		 ymm0:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm1:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm2:	{0, 0, 0x780, 0, 0x438, 0, 0x1e00, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm3:	{0x10, 0x9, 0, 0x3f80, 0x14a, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm4:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm5:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm6:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm7:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm8:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		 ymm9:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm10:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm11:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm12:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm13:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm14:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
		ymm15:	{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

Zenja · February 14, 2025, 10:14am

Issue logged to #13554 (Switch system allocator) – Haiku

waddlesplash · February 14, 2025, 12:57pm

Thanks for testing. My initial guess is that this is a Media Kit bug or something like it that’s been exposed by the new allocator and not an allocator bug, but we’ll see.

waddlesplash · February 17, 2025, 5:00pm

The crash is indeed a Media Kit or FFmpeg bug and reproduces under the guarded heap as well with MediaPlayer; there’s now a separate ticket #19420 for it.

I’ve just pushed a change to the new allocator’s glue code in hrev58649 that should reduce memory usage and improve performance by a bit more. Some of the system servers are now significantly closer to their memory usage under the old allocator just after boot.