About "Implement PCID" for GSoC

FullPlus · March 22, 2019, 3:59pm

Hi, I am interested in the idea “Implement support for Process Context Identifiers (PCID)” for GSoC 2019. Is it still available? I’m still trying to figure out more details.

The actual requirement seems to be more than just implementing the PCID feature, because the description mentions the Meltdown attack. Do we need to implement a dual page table solution like Linux, then apply PCID to the “PTI-like” solution to improve performance? Or do we only need to partition current Haiku applications into 4096 or 2048 PCID groups? I guess the latter option sounds a bit easier than page table isolation, but less scalable.

I haven’t played with PCID before, but I have taken a rough look at the context switching code in Linux and the address space code in Haiku. I think I need some help to see if I’m ready for the project.

Please correct me if I misunderstand anything.

Thanks.

cb88 · March 23, 2019, 4:21am

You should contact the developers via the mailing list. Your messages may get missed here… almost all development communication is on the mailing lists.

And of course Gerrit once you have code to submit… a lot of discussion goes on there, so you probably want to go ahead and get registered on review.haiku-os.org

Going by the schedule you still have time and I haven’t seen anyone else discuss this project.
https://developers.google.com/open-source/gsoc/timeline

waddlesplash · March 23, 2019, 2:30pm

Yes, this is the way to go. Other architectures like SPARC use the dual-address-space system by default; so we will need to implement it no matter what, and then PCID will be an optimization on that.

cb88 · March 23, 2019, 5:46pm

So you mean as in https://www.gaisler.com/doc/sparcv8.pdf page 48.

ASIs:
“A normal load/store instruction provides an ASI of either 0x0A or 0x0B for the data access, depending on whether the processor is in user or supervisor mode. However, privileged load from alternate space instructions and privileged store into alternate space instructions supply explicit address space identifiers, from the asi field in the instructions.”

But isn’t that within a single software TLB with just protection bits, not a dual TLB as used for the Meltdown fixes? Perhaps I just missed a concept there.

waddlesplash · March 23, 2019, 6:44pm

The key is in the last part of what you noted: the privileged instructions specify the address space that is being loaded from directly. It seems that most OSes do a full segmentation, i.e. user and kernel are not in the same address space at all, according to @PulkoMandy.

PCID is pretty much the same thing: specifying the address space to operate on when performing operations, so that one does not have to do a context switch in order to read/write to it.

cb88 · March 23, 2019, 7:09pm

Yes, I guess that makes sense.

I was reading some commentary on this on Stack Overflow, and apparently there are only 4096 potential PCID values, and once you run out you have to start recycling them etc… https://stackoverflow.com/questions/20155304/does-linux-use-x86-cpus-pcid-feature-for-tlb-if-not-why

That said, I could see hundreds of PCIDs being in use on Haiku, but it might be a while before encroaching on thousands.

So it seems it allows partial TLB flushing, in the way Linux implements it.

This answer is interesting, as it explains that it isn’t per process but per CPU cache.

Regardless of Intel’s issues, the whole reason behind PCID is apparently a non-issue on AMD hardware, and PCID isn’t supported there.
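For reference, the 4096 limit comes from the PCID being a 12-bit field in the low bits of CR3 (when CR4.PCIDE is set). The following is a minimal sketch of that layout; the macro and function names are made up for illustration and are not taken from Haiku or Linux.

```c
#include <stdint.h>

/* Hypothetical names; the bit positions follow the x86-64 CR3 format
 * that applies when CR4.PCIDE = 1. */
#define CR3_PCID_BITS  12
#define CR3_PCID_MASK  ((UINT64_C(1) << CR3_PCID_BITS) - 1)  /* 0xFFF */
#define CR3_NOFLUSH    (UINT64_C(1) << 63)  /* keep TLB entries on CR3 load */

/* Number of distinct PCID values: 2^12. */
static inline uint32_t pcid_count(void)
{
    return 1u << CR3_PCID_BITS;
}

/* Compose a CR3 value from a 4 KiB-aligned top-level page table address
 * and a PCID. If no_flush is nonzero, bit 63 asks the CPU not to
 * invalidate the TLB entries tagged with this PCID. */
static inline uint64_t make_cr3(uint64_t pml4_phys, uint16_t pcid, int no_flush)
{
    uint64_t cr3 = (pml4_phys & ~CR3_PCID_MASK) | (pcid & CR3_PCID_MASK);
    if (no_flush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}

/* Extract the PCID from a CR3 value. */
static inline uint16_t cr3_pcid(uint64_t cr3)
{
    return (uint16_t)(cr3 & CR3_PCID_MASK);
}
```

Since the tag lives in the same register as the page table base, switching address spaces and switching PCIDs are a single CR3 write.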

FullPlus · March 24, 2019, 4:37pm

Thanks for pointing that out. It seems an implementation similar to KPTI in Linux is probably feasible?

FullPlus · March 24, 2019, 4:43pm

Thanks for the link. I’m currently not quite sure about the per-CPU dual page table. If there is only one kernel page table, how does the kernel handle things like copy_from_user/copy_to_user? Different processes obviously have different user-space mappings. I found a paragraph at

https://www.kernel.org/doc/Documentation/x86/pti.txt

When PTI is enabled, the kernel manages two sets of page tables.
The first set is very similar to the single set which is present in
kernels without PTI. This includes a complete mapping of userspace
that the kernel can use for things like copy_to_user().

But it does not mention the detailed mapping rules for userspace.

FullPlus · March 24, 2019, 4:55pm

I just found the authors’ slides from Black Hat

It actually uses 2 PGDs. But in this way, it still looks like a per-process dual page table? I need to search further about the per-CPU PCID.

waddlesplash · March 24, 2019, 8:14pm

The expectation is that pointers to userspace addresses will still be in the user address space of the process that invoked the syscall.

FullPlus · March 25, 2019, 4:48am

So if I understand it correctly, there is indeed a “dual page table” for each process, but the kernel part in all processes still shares the same full kernel space, and the user part varies from process to process. Upon creation of a process, we prepare 2 PGDs, one with the full kernel mapping and the other with a minimal mapping. On each user->kernel/kernel->user switch, we need to map/unmap the kernel mappings by changing the PGD.

cb88 · March 25, 2019, 5:57am

No, just a dual page table, not per process.

One set of mappings includes the entire kernel and userspace, and the userspace set only includes minimal non-executable mappings in the kernel for syscalls, interrupts and exceptions, plus userspace. https://en.wikipedia.org/wiki/Kernel_page-table_isolation and exactly as described here, as was linked to previously: https://www.kernel.org/doc/Documentation/x86/pti.txt

PCID could be per processor or even per process, though per process means that when a different process runs on the same CPU it can’t help you avoid a TLB flush and it doesn’t scale as well. You only have 4096 PCID encodings anyway on x86 which Linux runs up against in large systems, so PCID is encoded per processor. https://lwn.net/Articles/738975/

Note the PCID is a tag within the TLB at the hardware level that helps to avoid complete TLB flushes. Basically you end up with different contexts denoted by the PCID but within the same page table. And several processes, on Linux at least, share that PCID per processor. SPARC again has some specific support for this in recent processors https://lwn.net/Articles/718204/

See page 1056 for information about the INVPCID instruction. INVLPG is also related.
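As a rough illustration of what INVPCID operates on: the instruction takes an invalidation type in a register and a 128-bit descriptor in memory. The sketch below only builds that descriptor. The type and function names are invented for this example, and the instruction itself is privileged, so it is not executed here.

```c
#include <stdint.h>

/* INVPCID invalidation types as defined by the x86 instruction set.
 * The enumerator names are hypothetical. */
enum invpcid_type {
    INVPCID_ADDR            = 0,  /* one linear address within one PCID */
    INVPCID_SINGLE_CONTEXT  = 1,  /* all non-global entries of one PCID */
    INVPCID_ALL_INCL_GLOBAL = 2,  /* every PCID, including global pages */
    INVPCID_ALL_NON_GLOBAL  = 3   /* every PCID, except global pages */
};

/* 128-bit in-memory descriptor: PCID in bits 11:0 (bits 63:12 must be
 * zero), linear address in bits 127:64. */
struct invpcid_desc {
    uint64_t pcid;
    uint64_t addr;
};

/* Build a descriptor; the PCID is truncated to its 12 valid bits. */
static inline struct invpcid_desc
invpcid_make_desc(uint16_t pcid, uint64_t addr)
{
    struct invpcid_desc d;
    d.pcid = pcid & 0xFFFu;
    d.addr = addr;
    return d;
}
```

INVLPG, by contrast, takes only a linear address and invalidates it for the current PCID, which is why INVPCID is the tool for flushing a context other than the one currently loaded.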

Note we should probably be able to turn this off automatically and go back to single page table mappings for CPUs that don’t have this problem (AMD and future Intel CPUs), as it is a significant performance hit in many cases, especially since we only have a few people looking at this and optimizing it even if we are being optimistic about it. I believe Linux may already be maintaining a list of affected CPUs.
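A very simplified sketch of such an automatic switch is below. It assumes the CPUID and MSR values have already been read elsewhere; the struct and function names are hypothetical, and a real kernel (Linux included) also consults a per-model list that is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID_1_ECX_PCID     (1u << 17)          /* CPUID.01H:ECX bit 17 */
#define CPUID_7_EBX_INVPCID  (1u << 10)          /* CPUID.(EAX=07H,ECX=0):EBX bit 10 */
#define ARCH_CAP_RDCL_NO     (UINT64_C(1) << 0)  /* IA32_ARCH_CAPABILITIES bit 0 */

/* Hypothetical container for values read at boot. */
struct cpu_info {
    char     vendor[13];   /* CPUID vendor string, NUL-terminated */
    uint32_t cpuid_1_ecx;
    uint32_t cpuid_7_ebx;
    uint64_t arch_caps;    /* 0 if the MSR does not exist */
};

/* Assumption: treat AMD CPUs and CPUs advertising RDCL_NO as not needing
 * page table isolation, and everything else as affected. */
static inline bool cpu_needs_page_table_isolation(const struct cpu_info *cpu)
{
    if (strcmp(cpu->vendor, "AuthenticAMD") == 0)
        return false;
    if (cpu->arch_caps & ARCH_CAP_RDCL_NO)
        return false;
    return true;
}

/* PCID can only be used as an optimization when the CPU reports it. */
static inline bool cpu_has_pcid(const struct cpu_info *cpu)
{
    return (cpu->cpuid_1_ecx & CPUID_1_ECX_PCID) != 0;
}

static inline bool cpu_has_invpcid(const struct cpu_info *cpu)
{
    return (cpu->cpuid_7_ebx & CPUID_7_EBX_INVPCID) != 0;
}
```

Keeping the "is isolation needed" and "is PCID available" questions separate matches the point made in this thread that the dual page tables work without PCID, just more slowly.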

Sorry for being a bit daft on this earlier, waddlesplash. It can take me a bit to remember stuff sometimes. Hopefully this is more helpful than confusing; at least it points you toward some interesting reading material.

FullPlus · March 25, 2019, 1:00pm

You are right, thanks for clarifying it. I just looked at some code in the SWITCH_TO_KERNEL_CR3 and SWITCH_TO_USER_CR3_STACK macros, and I think I understand it now. KPTI adds an extra 4k page for each CR3, but PCID is another story. KPTI would work even without PCID.
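The "extra 4k page" can be pictured as follows: the kernel and user top-level tables are allocated as one 8 KiB-aligned pair, kernel half first, so moving between them is a single bit flip in the physical address. This is a sketch modelled on the Linux macros; the two helper functions are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT             12
#define PTI_USER_PGTABLE_BIT   PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK  (UINT64_C(1) << PTI_USER_PGTABLE_BIT)

/* Assumption: kernel_pgd_phys is 8 KiB aligned and the user copy
 * occupies the 4 KiB page immediately after it. */
static inline uint64_t kernel_to_user_pgd(uint64_t kernel_pgd_phys)
{
    return kernel_pgd_phys | PTI_USER_PGTABLE_MASK;
}

/* The reverse direction clears the same bit. */
static inline uint64_t user_to_kernel_pgd(uint64_t user_pgd_phys)
{
    return user_pgd_phys & ~PTI_USER_PGTABLE_MASK;
}
```

Because the relationship is purely arithmetic, the entry code does not need to look anything up in memory to find the other table, which matters when almost no kernel memory is mapped yet.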

As for PCID, it is stored in the tlb_state structure, which is per CPU. Linux allows 6 PCIDs per CPU by default, and a PCID is allocated dynamically in the choose_new_asid function. If all 6 are already in use, one is recycled and a flush is requested by setting the need_flush variable.
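A simplified model of that per-CPU allocation is sketched below. It is not the Linux code: the struct layout, the function name choose_asid and the round-robin eviction are assumptions made to keep the example short, and context ID 0 is assumed to mean "slot unused".

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_DYN_ASIDS 6  /* per-CPU slots, matching the default discussed here */

/* One slot remembers which address space last used it and how fresh
 * its TLB entries are. */
struct tlb_slot {
    uint64_t ctx_id;   /* 0 = never used */
    uint64_t tlb_gen;
};

struct cpu_tlb_state {
    struct tlb_slot slots[NR_DYN_ASIDS];
    uint16_t next_asid;  /* next slot to recycle, round robin */
};

/* Pick a slot for the address space ctx_id. Sets *need_flush when the
 * slot's TLB entries are stale or belonged to another address space. */
static inline uint16_t choose_asid(struct cpu_tlb_state *cpu, uint64_t ctx_id,
                                   uint64_t next_tlb_gen, bool *need_flush)
{
    for (uint16_t asid = 0; asid < NR_DYN_ASIDS; asid++) {
        if (cpu->slots[asid].ctx_id != ctx_id)
            continue;
        /* Reuse: flush only if the address space changed since then. */
        *need_flush = cpu->slots[asid].tlb_gen < next_tlb_gen;
        cpu->slots[asid].tlb_gen = next_tlb_gen;
        return asid;
    }

    /* No slot holds this address space: recycle the next one. */
    uint16_t asid = cpu->next_asid;
    cpu->next_asid = (uint16_t)((asid + 1) % NR_DYN_ASIDS);
    cpu->slots[asid].ctx_id = ctx_id;
    cpu->slots[asid].tlb_gen = next_tlb_gen;
    *need_flush = true;
    return asid;
}
```

With this scheme the 4096-value hardware limit never becomes a concern, because each CPU only ever uses a handful of PCIDs and maps them to whichever address spaces ran there most recently.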

As for the PCIDs for kernel and user, Linux splits the range into 2 parts, [0, 2047] for kernel and [2048, 4095] for user. Entering and exiting kernel space requires a change in bit 11 of CR3 (via the macro PTI_USER_PCID_MASK) to make the PCID for the kernel different from the PCID for user space.

Although there are still details I haven’t covered yet, it seems to me now that PCID serves 2 purposes in Linux:

1. Acceleration when there is locality. For example, if there are <= 6 tasks switching to each other frequently, the PCID solution would greatly reduce TLB flushing.
2. Isolation between kernel tasks and user tasks (but every PCID is allocated dynamically in the kernel with choose_new_asid).
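The kernel/user PCID pairing can be sketched like this. The helpers are modelled on the Linux ones but simplified; the "+1" offset reflects that Linux keeps PCID 0 out of the dynamic range, and asid is assumed to be a small per-CPU slot index.

```c
#include <stdint.h>

#define PTI_USER_PCID_BIT   11
#define PTI_USER_PCID_MASK  (1u << PTI_USER_PCID_BIT)  /* 2048 */

/* Kernel-side PCID for a per-CPU slot index; PCID 0 is left unused. */
static inline uint16_t kern_pcid(uint16_t asid)
{
    return (uint16_t)(asid + 1);
}

/* User-side PCID: the same value with bit 11 set, so it always falls
 * in the upper half of the 12-bit PCID space. */
static inline uint16_t user_pcid(uint16_t asid)
{
    return (uint16_t)(kern_pcid(asid) | PTI_USER_PCID_MASK);
}
```

The two PCIDs of a pair differ only in bit 11, so the entry and exit paths can derive one from the other with a single OR or AND on the CR3 value.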

Please correct me if I misunderstand anything.

FullPlus · April 1, 2019, 2:25am

Hi,

I’m planning to submit a patch to Gerrit soon to get a chance for GSoC. The student information page says I should submit a patch that relates to my project, but how could I submit a related patch if I haven’t started coding yet? There is no related ongoing discussion on PCID yet.

Does it mean I should describe my initial plan and send it to the mailing list without code? Or should I just pick an easy task and solve it, even if it does not relate to my project?

Thanks

PulkoMandy · April 1, 2019, 8:49am

We have added the patch requirement to make sure you have basic understanding of git, Gerrit, C++, and if possible, the general area you intend to work on.

If you apply to work on PCID, but submit patches showing only user interface changes, you will have to make up for that with a very detailed proposal showing your understanding of PCID and kernel-side development, virtual memory, etc. The more your patch is related to what you will work on, the less you have to convince us that you have the required knowledge.

As the application period is quite short now, maybe start with a small patch to at least have the git/gerrit/C++ part covered. We will keep watching for your contributions after the application period is closed, if you start working on something larger. Submitting WIP patches is also fine, if you start working on something that needs a lot of changes, more than you can complete during the application period.

FullPlus · April 1, 2019, 4:17pm

Thanks. Can I submit patches to explain my proposal with just very initial code snippets, e.g., code that might not even compile, or that does not work even if it compiles?

PulkoMandy · April 1, 2019, 5:45pm

Yes, on Gerrit you can easily update your changes. Just mark them as “WIP” in the commit message to make it clear you are not done. You can update them as you progress.