How is kernel access to user memory usually implemented?

In many OSes the kernel can freely access the memory of the current user process. How is that implemented? By copying the user page table into the kernel page table on each process switch? That looks inefficient. xv6-riscv doesn't allow direct access to user memory; instead, it has functions that traverse the user page table and copy physical memory to and from the kernel. That doesn't slow down process switching, but it makes memory copies slow.

I currently plan to use the xv6-riscv approach to implement user_memcpy/memset/strlcpy.
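To make the xv6-style idea concrete, here is a minimal host-side sketch of a copy that walks the user page table in software. It is a toy model under loud assumptions: "physical memory" is a byte vector, the page table is a flat map from virtual to physical page number, and every name (`ToyAddressSpace`, `ToyWalk`, `ToyCopyFromUser`) is made up for illustration and is not the xv6 or Haiku API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <unordered_map>
#include <vector>

// Toy model, not real kernel code: simulated RAM plus a flat VPN -> PPN map.
constexpr uint64_t kPageSize = 4096;

struct ToyAddressSpace {
	std::vector<uint8_t>* physMem;                    // simulated RAM
	std::unordered_map<uint64_t, uint64_t> pageTable; // VPN -> PPN
};

// Software page table walk: translate one user virtual address,
// or return nothing if the page is not mapped.
std::optional<uint64_t> ToyWalk(const ToyAddressSpace& as, uint64_t va)
{
	auto it = as.pageTable.find(va / kPageSize);
	if (it == as.pageTable.end())
		return std::nullopt;
	return it->second * kPageSize + va % kPageSize;
}

// Copy len bytes from user virtual address srcVa into a kernel buffer.
// Each page is translated separately, because pages that are adjacent
// virtually need not be adjacent physically.
bool ToyCopyFromUser(const ToyAddressSpace& as, void* dst, uint64_t srcVa,
	size_t len)
{
	assert(as.physMem != nullptr);
	uint8_t* out = static_cast<uint8_t*>(dst);
	while (len > 0) {
		std::optional<uint64_t> pa = ToyWalk(as, srcVa);
		if (!pa)
			return false; // unmapped page: report failure instead of faulting
		size_t chunk = std::min<size_t>(len,
			static_cast<size_t>(kPageSize - srcVa % kPageSize));
		if (*pa + chunk > as.physMem->size())
			return false; // mapping points outside simulated RAM
		std::memcpy(out, as.physMem->data() + *pa, chunk);
		out += chunk;
		srcVa += chunk;
		len -= chunk;
	}
	return true;
}
```

The per-page translation inside the loop is where the extra cost comes from: every copy pays for at least one walk, and a copy that crosses a page boundary pays for one walk per page.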


That seems to be the right thing to do.

On x86, the user address space remains mapped while the kernel is running. On 32-bit, addresses from 0x80000000 up are for the kernel, and addresses below are for userland, so userspace memory remains mapped while executing syscalls. The idea is the same on 64-bit: the address space is split in two.
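With a split like this, the kernel still has to check that a pointer handed in by userland really lies in the user half before touching it. A minimal sketch of such a check, assuming the 32-bit layout described above (`kKernelBase` and `IsUserRange` are illustrative names, not an existing API):

```cpp
#include <cstdint>

// Assumed layout: 32-bit address space, user below 0x80000000,
// kernel at and above it.
constexpr uint32_t kKernelBase = 0x80000000u;

// True if [addr, addr + len) lies entirely in the user half.
// Written as a subtraction so that addr + len can never wrap around.
constexpr bool IsUserRange(uint32_t addr, uint32_t len)
{
	return addr < kKernelBase && len <= kKernelBase - addr;
}
```

The naive form `addr + len <= kKernelBase` would wrongly accept a huge `len` that overflows, which is why the comparison is done on `kKernelBase - addr` instead.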

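However, this creates a lot of security problems. Applications could use this to indirectly write or read parts of kernel memory. So now there are SMAP and SMEP, which make it possible to better differentiate the two address spaces: SMAP blocks kernel data accesses to user pages unless they are explicitly allowed, and SMEP stops the kernel from executing code in user pages.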

On sparc, it isn't implemented yet, but the CPU has specific instructions to access memory in different address spaces, and my plan is to use those to implement user_memcpy.

I finally understand how it usually works (I hope). The same child page tables can be used in multiple address spaces. So kernel memory can be made accessible from a userland page table by preallocating the kernel's child page tables (level 1 in RISC-V Sv39 paging) and assigning them to all user root page tables.
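A toy model may help show why preallocation matters. Because every root points at the same child tables, a kernel mapping added later shows up in every address space without touching any user root. The sketch below is a simplification under stated assumptions: two levels instead of Sv39's three, `std::shared_ptr` standing in for a physical page number, and hypothetical names throughout (`ToyRootTable`, `ToyShareKernelRange`, and so on).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Toy two-level page table; real Sv39 has three levels and packed PTEs.
constexpr int kEntries = 512;
constexpr int kKernelSlots = 256; // assumed kernel range of root entries

struct ToyLeafTable {
	std::array<uint64_t, kEntries> ppn{}; // 0 means "not mapped"
};

struct ToyRootTable {
	// Root entries point at child tables; shared_ptr stands in for a PPN.
	std::array<std::shared_ptr<ToyLeafTable>, kEntries> child;
};

// Kernel root: allocate every kernel-range child table up front, so the
// kernel-range root entries never have to change afterwards.
void ToyPreallocKernelRange(ToyRootTable& kernelRoot)
{
	for (int i = 0; i < kKernelSlots; i++)
		kernelRoot.child[i] = std::make_shared<ToyLeafTable>();
}

// New user root: point the kernel-range slots at the same child tables.
void ToyShareKernelRange(const ToyRootTable& kernelRoot,
	ToyRootTable& userRoot)
{
	for (int i = 0; i < kKernelSlots; i++) {
		assert(kernelRoot.child[i] != nullptr); // must be preallocated
		userRoot.child[i] = kernelRoot.child[i];
	}
}

// Look up a mapping through any root. Returns 0 if unmapped.
uint64_t ToyLookup(const ToyRootTable& root, int rootIndex, int leafIndex)
{
	const std::shared_ptr<ToyLeafTable>& leaf = root.child[rootIndex];
	return leaf ? leaf->ppn[leafIndex] : 0;
}
```

If the child tables were allocated lazily instead, a new root entry created in the kernel root would have to be propagated to every existing user root, which is exactly the bookkeeping that preallocation avoids.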

Why is that detail not mentioned in osdev paging tutorials, so that reinventing the wheel is needed?

Implementation is simple:

static void PreallocKernelRange(kernel_args *args)
{
	Pte *root = PteFromPhysAdr(sPageTable);
	// NOTE: adjust range if changing kernel address space range
	for (int i = 0; i < 256; i++) {
		Pte *pte = &root[i];
		pte->ppn = AllocPhysPage(args);
		if (pte->ppn == 0) panic("can't alloc early physical page");
		memset(PteFromPhysAdr(pageSize*pte->ppn), 0, pageSize);
		pte->flags |= (1 << pteValid);
	}
}

...

		if (!fIsKernel) {
			// Map kernel address space into user address space. Preallocated kernel level-2 PTEs are reused.
			VMAddressSpacePutter kernelSpace(VMAddressSpace::GetKernel());
			Pte *kernelPageTable = (Pte*)VirtFromPhys(((RISCV64VMTranslationMap*)kernelSpace->TranslationMap())->PageTable());
			Pte *userPageTable = (Pte*)VirtFromPhys(fPageTable);
			// NOTE: adjust range if changing kernel address space range
			for (int i = 0; i < 256; i++) {
				Pte *pte = &userPageTable[i];
				pte->ppn = kernelPageTable[i].ppn;
				pte->flags |= (1 << pteValid);
			}
		}

...

	// enable supervisor user page access
	{
		SstatusReg sstatus;
		sstatus.val = Sstatus();
		sstatus.sum = 1;
		SetSstatus(sstatus.val);
		sstatus.val = Sstatus();
		dprintf("sstatus.sum: %d\n", (int)sstatus.sum);
	}
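The last snippet turns SUM on once and leaves it on. A possible alternative, similar in spirit to what Linux does on RISC-V, is to set SUM only around the code that actually touches user memory. Below is a host-runnable sketch of that pattern; the CSR is faked with a plain variable, and `ScopedUserAccess`, `gFakeSstatus` and the helper names are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// SUM is bit 18 of the RISC-V sstatus CSR.
constexpr uint64_t kSstatusSum = uint64_t(1) << 18;

// Stand-in for the real CSR so the sketch runs on a host. In a kernel these
// helpers would be csrs/csrc instructions on sstatus.
static uint64_t gFakeSstatus = 0;
static void FakeSetSstatusBits(uint64_t bits) { gFakeSstatus |= bits; }
static void FakeClearSstatusBits(uint64_t bits) { gFakeSstatus &= ~bits; }

// RAII guard: user pages are reachable from supervisor mode only while an
// instance is alive. This simple version does not support nesting.
class ScopedUserAccess {
public:
	ScopedUserAccess()
	{
		assert((gFakeSstatus & kSstatusSum) == 0); // no nested guards
		FakeSetSstatusBits(kSstatusSum);
	}
	~ScopedUserAccess() { FakeClearSstatusBits(kSstatusSum); }

	ScopedUserAccess(const ScopedUserAccess&) = delete;
	ScopedUserAccess& operator=(const ScopedUserAccess&) = delete;
};
```

A user_memcpy built this way would create a `ScopedUserAccess` at the top of the function, so that a stray kernel pointer dereference elsewhere still faults instead of silently reading user memory.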


osdev is a wiki, you can edit it if you think it should be added there :)
