New syscall request

trungnt2910 · May 11, 2023, 11:35am

Currently, there is no way to atomically remap a memory range to the same physical pages as an existing one. A process wanting to do this must:

1. Iterate through all areas in the range they want to remap using area_for and get_area_info.
2. Map/reserve a memory region large enough to hold all concerned areas, including the unwanted heads/tails.
3. Iterate through all the related areas once again. For each area, calculate the target address in the newly reserved region, unmap it, and then call clone_area to the target address.
4. Unmap the unwanted heads/tails, if any.

This process is vulnerable to race conditions whenever other threads call any memory management syscall (resize_area, delete_area, unmap_memory, …) on the target in between these steps.

I therefore propose a new syscall:

extern area_id _kern_remap_memory(const char *name, team_id targetTeam, void **address,
                                  uint32 addressSpec, size_t size, uint32 protection,
                                  bool unmapAddressRange,
                                  team_id sourceTeam, void* sourceAddress);

To do this, the caller must have a euid of 0, or the same euid as both targetTeam and sourceTeam.

This syscall, if added, would serve a variety of use cases, from remapping arbitrary ranges of memory with different protections, to quickly performing IPC through shared memory (this should be a more powerful version of clone_area and _kern_transfer_area combined).

It would not introduce any security issues, since any malware with the privilege mentioned above can already use install_team_debugger and access the victim’s address space as it pleases.

What do you think?

waddlesplash · May 11, 2023, 3:14pm

There was some discussion on IRC about this already for .NET. But what does .NET actually need this feature for? I thought I also saw that some upstream .NET platforms (like the BSDs) do not even implement this feature at all. So it can’t really be that important…

trungnt2910 · May 11, 2023, 4:05pm

I thought I also saw that some upstream .NET platforms (like the BSDs) do not even implement this feature at all.

The BSDs have better shm support, and creating a multi-gigabyte file there does not actually consume disk space. Haiku, on the other hand, implements shm using a temporary folder and therefore cannot use the shared file method.

Therefore, currently on Haiku I’m doing something similar to Apple by making use of the clone_area syscall.

waddlesplash · May 11, 2023, 4:05pm

The temporary shm folder is a ramfs, it doesn’t consume disk space either.

trungnt2910 · May 11, 2023, 4:26pm

For the shared file approach, each process tries to create a 256GB shared memory file.

And, from your message in IRC:

you can exhaust system RAM this way and it will not happen very nicely

waddlesplash · May 11, 2023, 6:51pm

I meant that ramfs doesn’t handle the system running out of memory very well. That can be fixed, if necessary.

trungnt2910 · May 11, 2023, 10:44pm

Hmm, last year, when I initially ported dotnet, shm_open files were created directly on disk.

It seems that there has been a recent change to enable storing shared memory files in actual memory.

Now 256GB files can be created on the filesystem without any (apparent) problems.
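For anyone unfamiliar with the shared file approach, here is a minimal sketch of the idea in plain POSIX C. It is not what dotnet actually does: the names double_view, double_view_create and double_view_destroy are made up for illustration, and an unlinked mkstemp file stands in for the shm_open object so that the sketch links without extra libraries. The file is sized with ftruncate and mapped twice, so both views are backed by the same pages.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same backing pages (hypothetical helper type). */
struct double_view {
    void *first;   /* read/write view */
    void *second;  /* read-only view of the same pages */
    size_t size;
};

/*
 * Create an unlinked temporary file of `size` bytes and map it twice.
 * Returns 0 on success, -1 on failure.
 */
static int double_view_create(struct double_view *dv, size_t size)
{
    char path[] = "/tmp/remap-sketch-XXXXXX"; /* illustrative name only */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file now lives on only through fd and the mappings */

    /* Growing the file this way does not touch any data pages yet. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    void *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the file alive */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, size);
        if (b != MAP_FAILED)
            munmap(b, size);
        return -1;
    }

    dv->first = a;
    dv->second = b;
    dv->size = size;
    return 0;
}

/* Unmap both views. */
static void double_view_destroy(struct double_view *dv)
{
    munmap(dv->first, dv->size);
    munmap(dv->second, dv->size);
}
```

Anything written through first becomes visible through second, at a different address and with different protection. The catch is that the memory has to be file-backed from the start; this does not help with remapping pages that already exist as an ordinary area.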

Still, the syscall proposed here would have other uses, such as serving as a replacement for /proc/[pid]/mem, which is not available on Haiku.
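To show what would be replaced, this is roughly how /proc/[pid]/mem is used on Linux. It is only a sketch: proc_mem_read is a name I made up, it is Linux-only, and reading a process other than your own additionally requires ptrace-level permission on the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `len` bytes located at virtual address `addr` in process `pid`
 * through /proc/[pid]/mem. The file offset is the virtual address.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t proc_mem_read(pid_t pid, const void *addr, void *out, size_t len)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/mem", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, out, len, (off_t)(uintptr_t)addr);
    close(fd);
    return n;
}
```

Note that this copies data on every access. Remapping the source pages into the caller's address space, as proposed here, would avoid the copy.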

trungnt2910 · May 13, 2023, 4:59am

Another use case for this is to make a more powerful (but costlier) mprotect that can cut areas and change more advanced flags such as B_OVERCOMMITTING_AREA.

jessicah · May 15, 2023, 12:43am

It seems like a new system call would be better than trying to shoehorn this into an ill-fitting set of existing APIs. We have full control over the kernel and libroot. Perhaps the current proposal does a little too much for a single system call, though? I don’t think limiting ourselves to POSIX has particular value.

trungnt2910 · May 15, 2023, 4:31am

I don’t know what is considered “too much” when ioctl and _kern_generic_syscall exist.

For this proposal, the underlying task is the same for all use cases: grab a range of pages and remap them with different characteristics.