Linux does support mixed page sizes (that's how huge pages work) and the page si...

marcan_42 · on March 19, 2022

That's not the same thing. Huge pages are in addition to standard pages. Being in 16K mode means no 4K pages. It's a global switch. Linux can't handle that, the baseline page size is set at compile time and affects a ton of macros and constants used throughout the kernel.

Userspace breaks on 16K pages when it tries to do things like call mmap() with virtual addresses that aren't aligned to 16K. Usually it's allocators doing this when they think the entire world is 4K.
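
To make the alignment problem concrete, here is a minimal sketch of what goes wrong when an allocator assumes 4K everywhere. The helper names and the example address are hypothetical, chosen only for illustration; real code should query the page size at runtime (for example via `mmap.PAGESIZE` in Python or `sysconf(_SC_PAGESIZE)` in C) instead of hardcoding it.

```python
import mmap

HARDCODED_4K = 4096          # what a "the world is 4K" allocator assumes
PAGE_16K = 16 * 1024         # the baseline page size on a 16K kernel


def is_page_aligned(addr: int, page_size: int) -> bool:
    """True if addr sits on a page boundary for the given page size."""
    return addr % page_size == 0


def align_up(addr: int, page_size: int) -> int:
    """Round addr up to the next page boundary (page_size is a power of two)."""
    return (addr + page_size - 1) & ~(page_size - 1)


# Hypothetical address an allocator might pick: base + 3 "pages" of 4K.
addr = 0x10000 + 3 * HARDCODED_4K   # 0x13000

print(hex(addr), "aligned to 4K: ", is_page_aligned(addr, HARDCODED_4K))
print(hex(addr), "aligned to 16K:", is_page_aligned(addr, PAGE_16K))
print("next 16K boundary:", hex(align_up(addr, PAGE_16K)))
print("page size on this machine:", mmap.PAGESIZE)
```

An address like this is a valid mapping target on a 4K kernel but not on a 16K one, which is the kind of request that fails.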

zozbot234 · on March 19, 2022

If page size can be chosen per process (such as 4K pages for Rosetta apps) that's not unlike the existing hugepages. Are we sure that it cannot be easily adapted to support both 4k and 16k processes?

marcan_42 · on March 19, 2022

It is completely different from hugepages. Hugepages exist alongside normal pages.

zozbot234 · on March 19, 2022

> Hugepages exist alongside normal pages.

Don't 16K pages "exist alongside" 4k pages too, just not within the same virtual address space or (in Linux) VMA? How else are Rosetta apps supposed to work in Mac OS?

marcan_42 · on March 19, 2022

That's the issue: 16K pages never exist alongside 4K pages in the same address space half. The CPU has two mode flags, for kernel and userspace respectively. There is no way to mix modes within the same address space half. And changing the page size completely changes the page table structure and boundaries for different walk levels, the huge page size, etc.

Rosetta runs the userspace half in 4K mode, and XNU had to be reworked a lot to support this. Linux could of course be reworked to do something similar on paper, but it's a hugely intrusive change and it'd actually be easier to just make the kernel support 4K/16K pages in a single build first.

Hugepages aren't like that, they actually coexist with normal pages. In general, hugepages are just a pile of contiguous/aligned small pages that the kernel manages as a unit, and it flags them to tell the MMU "I promise these are all one big contiguous chunk so you can optimize it to one larger TLB entry". Depending on the page table structure they might be coalesced to a higher-level page table entry, skipping a page table walk level.
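
As a rough illustration of the "contiguous/aligned small pages" condition, the sketch below checks whether a run of physical frame numbers could be treated as one huge page. The function name and the figure of 512 small pages per huge page (2 MiB with 4K pages) are assumptions for the example, not kernel code.

```python
def can_form_huge_page(frame_numbers, pages_per_huge=512):
    """Check that the frames are exactly one huge page's worth of small pages,
    start on a huge-page boundary, and are physically consecutive."""
    if len(frame_numbers) != pages_per_huge:
        return False
    first = frame_numbers[0]
    if first % pages_per_huge != 0:      # alignment requirement
        return False
    # contiguity requirement: frame i must be first + i
    return all(f == first + i for i, f in enumerate(frame_numbers))


aligned_run = list(range(512, 1024))       # starts on a 512-frame boundary
misaligned_run = list(range(513, 1025))    # contiguous but not aligned

print(can_form_huge_page(aligned_run))
print(can_form_huge_page(misaligned_run))
```

The small pages still exist as the underlying unit here, which is why a huge page can be split back into them; a 16K baseline has no such smaller unit.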

zozbot234 · on March 19, 2022

> And changing the page size completely changes the page table structure and boundaries for different walk levels, the huge page size, etc.

This looks like the real issue, so adding support for both "4k" and "16K" address spaces would involve support for multiple page table structures within a single kernel? Still seems very much worth doing since it can likely be extended to support e.g. 64K. And maybe other architectures could reuse that support depending on how their hardware support for multiple page sizes works, e.g. https://en.wikipedia.org/wiki/Page_(computer_memory)#Multipl...

marcan_42 · on March 19, 2022

That's implied in how page tables work. If your pages are 16K then all your page levels are going to shift up two bits compared to 4K. Again, these aren't 16K "huge pages", that'd be nice. This is changing the baseline page size.
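
The arithmetic can be sketched as follows, assuming each page table occupies exactly one page and holds 8-byte entries (as on arm64); the function name is made up for the example. Under that assumption the in-page offset grows by two bits, and each level's index grows by two bits as well, so every level boundary moves.

```python
def level_shifts(page_size, entry_size=8, levels=4):
    """Return the bit position where each page table level's index starts,
    lowest level first. Assumes one table == one page of entry_size entries."""
    offset_bits = page_size.bit_length() - 1                  # log2(page_size)
    index_bits = (page_size // entry_size).bit_length() - 1   # log2(entries)
    return [offset_bits + i * index_bits for i in range(levels)]


for size in (4 * 1024, 16 * 1024):
    shifts = level_shifts(size)
    # A block mapping one level above the leaf covers 2**shifts[1] bytes.
    block_mib = (1 << shifts[1]) // (1024 * 1024)
    print(f"{size // 1024}K pages: level shifts {shifts}, "
          f"next-level block size {block_mib} MiB")
```

With these assumptions the block size one level up goes from 2 MiB to 32 MiB, which is the "huge page size changes too" point.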

Lots of things in the kernel count sizes in pages. If your page size can vary, suddenly a lot of kernel constants become boot-time variables. And if it can vary from process to process, suddenly lots of things are per-process. Say you run a 4K process. It wants to map some data from a file. That data is in the page cache in 16K chunks. Now you have one page cache page mapped to anywhere from 1 to 4 4K pages. How do you keep track of that? That wasn't necessary before.
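
A toy sketch of that bookkeeping, with hypothetical names and no relation to the real page cache code: given a file range mapped by a 4K process, work out which 4K pieces of which 16K page cache pages are involved.

```python
CACHE_PAGE = 16 * 1024   # page cache granule (kernel running with 16K pages)
PROC_PAGE = 4 * 1024     # page size seen by the hypothetical 4K process


def subpages_touched(file_offset, length):
    """Map each 16K cache page index to the list of 4K sub-pages (0..3)
    covered by the byte range [file_offset, file_offset + length)."""
    first = file_offset // PROC_PAGE
    last = (file_offset + length - 1) // PROC_PAGE
    per_cache = CACHE_PAGE // PROC_PAGE
    touched = {}
    for small in range(first, last + 1):
        touched.setdefault(small // per_cache, []).append(small % per_cache)
    return touched


# Two 4K pages starting at file offset 0x3000 straddle two cache pages.
print(subpages_touched(0x3000, 0x2000))
```

Every cache page now needs this kind of partial-mapping state per process, which a single-page-size kernel never has to track.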

What happens if a 16K process shares memory with a 4K process? If the 4K process sends the 16K process a 4K page, that page can't be mapped at all.

See how this makes everything much more complicated?

zozbot234 · on March 19, 2022

> See how this makes everything much more complicated?

Has this stuff been discussed elsewhere so far, e.g. on some linux kernel dev list? I think you've made a good case for not trying to support per-process page size right away, but many of these issues are not entirely new; they came up in some form as part of the transparent-huge-pages feature. It turns out that some hardware support already requires the kernel to understand "higher-order" mappings of contiguous physical pages, and "transparent huge pages" could leverage that support.

Sirened · on March 19, 2022

From what I've seen just bumping into some of the devs on twitter, many larger software packages (e.g. Chromium) used a hardcoded page size. AFAIK, Asahi doesn't actually support mixed pages, just 16K, due to some hardware quirks on the M1 platform, and so running in 4K compat mode wouldn't even help. This is obviously problematic if you're trying to enforce memory permissions on 4K boundaries, since you can't simply pierce the huge page like you can in THP: there is no smaller granule to fall back to.

58028641 · on March 19, 2022

> Linux can’t really mix page sizes like that and likely never will be able to https://asahilinux.org/2021/10/progress-report-september-202...