I finally got around to testing the 64KB page size on SH-4 and SH-4A, and ran into some rather annoying behaviour. We currently use THREAD_SIZE in a couple of places, namely where we switch from the kernel to the user stack, and for fetching the current thread info on nommu. This used to be open-coded for 8k stacks, but got a bit of an overhaul when 4k stacks were introduced. Now we effectively have something like:
	mov	#(THREAD_SIZE >> 8), reg
	shll8	reg
as we're limited in how large an immediate we can load, and simply having the preprocessor shift the constant down and then shifting it back with shll8 is still far faster than a memory load. This worked fine for 4k and 8k pages, but with 64k pages we overrun the immediate size by 1 bit.
Unfortunately we're somewhat constrained by the instruction set: while shll8 and friends exist pretty much across the board, the variants that shift by a dynamic amount loaded into a register are restricted to later CPUs, which is a pity, as shifting by PAGE_SHIFT would save a lot of trouble here. We also have a 20-bit capable immediate load, but that's likewise restricted to later CPUs.
The only portable solution that still avoids the memory access is to shift it down another 2 bits and then add an shll2 to get back to the full size, so we end up with:
	mov	#(THREAD_SIZE >> 10), reg
	shll8	reg
	shll2	reg
but this is not a very appealing solution, and it wastes a cycle on what is effectively a corner case. A dynamic shift would still cost us the cycle, but would at least provide some future-proofing. On the other hand, the likelihood of someone adopting a system page size larger than what we can encode as an immediate when shifted down 10 bits is quite low. If it does become a problem, we can extend this model with one more shll2, though the most we can shift THREAD_SIZE down is 12 bits (any further and a 4k THREAD_SIZE shifts down to zero), which happens to equal PAGE_SHIFT for 4k pages. Beyond that we're effectively screwed.
And it looks like I need to revisit the PTRS_PER_PGD math again too. Grumble.