lethal is currently certified at Master level.

Name: Paul Mundt
Member since: 2000-11-19 06:02:53
Last Login: 2008-06-27 00:31:51


Homepage: http://www.linux-sh.org/~lethal


Just another kernel hacker. Currently hacking the kernel for Renesas (mostly VM, DSP, and filesystem stuff), and maintaining sh/sh64 architectures.


Recent blog entries by lethal

11 Mar 2007 (updated 11 Mar 2007 at 12:41 UTC)

Notes on SH interrupt/exception dispatch path

Some people seem to be confused about the semantic changes made to the unified exception dispatch path in the SH-3/4 code, so it's probably worth reiterating what changed, and why you don't want to touch the interrupt exception tables.

Most of the exceptions (especially general exceptions and interrupt exceptions) are immediately bounced through handle_exception once the exception code has been stashed in r2, with the return path sitting in r3. Traditionally this code was the EXPEVT value in the general exception case and the INTEVT value for interrupt exceptions, which could then be used to calculate the offset into a flat jump-call table (exception_handling_table). This worked well for general exceptions, but rather less so for interrupt exceptions. In the IRQ case we ended up with many CPU subtypes with very sparse IRQ maps, each interested in enabling do_IRQ() dispatch for only a handful of vectors. While this was manageable for a very small number of CPU subtypes, it very quickly got out of hand and turned into a giant ifdef fiasco that was highly prone to off-by-one vector enabling and other ugly things.

In order to get rid of all of these accursed tables, the exception code read-in and the handle_exception dispatch needed a bit of rework. In the regular case, general exceptions come first in the vector table, with the interrupt exceptions following afterwards. There are some minor corner cases where there is overlap, but those vectors can be overloaded by the CPUs that need special care.

Enter the interrupt exception marker. With the marker scheme, a simple marker is placed in r2 to signify a do_IRQ() fast-path while deferring the INTEVT read. handle_exception then checks for it when deciding whether to treat the r2 value as a jump-call table offset or to dispatch directly. There are some additional notes regarding this at the tail end of handle_exception for anyone who's concerned.
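A minimal C model of that decision may help. It is a sketch, not the entry.S code: the marker value (all ones here), the table length, and the names do_IRQ_stub() and fake_intevt are hypothetical stand-ins. The only hardware assumption is that EXPEVT codes are spaced 0x20 apart, so code >> 5 gives the table index.

```c
#include <stdint.h>

/* Hypothetical marker: r2 holding all ones (i.e. -1) means "interrupt". */
#define INTERRUPT_MARKER 0xffffffffu

/* Hypothetical padded table length. */
#define NR_EXC_VECTORS 128

typedef int (*exc_handler_t)(uint32_t code);

static exc_handler_t exception_handling_table[NR_EXC_VECTORS];

/* Hypothetical INTEVT value, only "read" once the marker is seen. */
static uint32_t fake_intevt = 0x3e0;

/* Stand-in for do_IRQ(); just reports which INTEVT it was given. */
static int do_IRQ_stub(uint32_t intevt)
{
	return (int)intevt;
}

/* Model of handle_exception's choice between fast path and table. */
static int dispatch(uint32_t r2)
{
	uint32_t vec;

	if (r2 == INTERRUPT_MARKER)
		return do_IRQ_stub(fake_intevt);	/* deferred INTEVT read */

	vec = r2 >> 5;		/* EXPEVT codes step by 0x20 */
	if (vec >= NR_EXC_VECTORS || !exception_handling_table[vec])
		return -1;	/* nothing installed for this code */
	return exception_handling_table[vec](r2);
}
```

The point of the marker is visible in the first branch: interrupts never index the table at all, so no per-subtype IRQ slots need to be populated.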

The only pitfall with this scheme is that the vector tables have to be padded out to a fixed length in order to allow setting specific exception handlers that happen to reside far out in the table (as is the case with some of the FPU exceptions). Two new routines have been added for this purpose, set_exception_table_evt() and set_exception_table_vec(). The use of both of these is fairly obvious to anyone looking at the vector tables, so there's no point in reiterating it here.
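As a rough illustration of why the padding matters, here is a sketch of what the two setters could look like. The signatures, the 128-entry length and the fpu_disable_stub() handler are assumptions for illustration, not the kernel's definitions. The evt variant simply converts an exception code to a vector index with evt >> 5, so a code of 0x800 (the SH-4 FPU disable exception) lands at index 64, well past the general exceptions.

```c
#include <stddef.h>

/* Hypothetical padded length, large enough for the far-out FPU codes. */
#define EXC_TABLE_LEN 128

typedef void (*exc_fn)(void);

static exc_fn exception_handling_table[EXC_TABLE_LEN];

/* Hypothetical handler used only to exercise the setters. */
static void fpu_disable_stub(void) { }

/*
 * Install handler at vector index vec and return the previous one.
 * NULL is returned both for an empty slot and for an out-of-range vec.
 */
static exc_fn set_exception_table_vec(unsigned int vec, exc_fn handler)
{
	exc_fn old;

	if (vec >= EXC_TABLE_LEN)
		return NULL;	/* beyond the padded table */
	old = exception_handling_table[vec];
	exception_handling_table[vec] = handler;
	return old;
}

/* Same, keyed by the exception code: codes step by 0x20, so shift by 5. */
static exc_fn set_exception_table_evt(unsigned int evt, exc_fn handler)
{
	return set_exception_table_vec(evt >> 5, handler);
}
```

Without the padding, a call like set_exception_table_evt(0x800, ...) would have no slot to write into.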

In practice this hasn't worked out too badly:

4 files changed, 40 insertions(+), 721 deletions(-)

For additional reading, consult arch/sh/kernel/traps.c, arch/sh/kernel/cpu/sh3/entry.S and arch/sh/kernel/cpu/sh[34]/ex.S.

I finally got around to testing the 64KB page size on SH-4 and SH-4A, when I ran into some rather annoying behaviour. We currently use THREAD_SIZE in a couple of places, namely where we switch from the kernel to user stack, and for fetching the current thread info on nommu. This used to be open-coded for 8k stacks, but got a bit of an overhaul when 4k stacks were introduced. Now we effectively have something like:

mov #(THREAD_SIZE >> 8), reg
shll8 reg

as we're constrained by the lack of a large immediate load, and having the preprocessor shift the constant down and then shifting it back up via shll8 is still far faster than a memory lookup. This worked fine for 4k and 8k pages, but with 64k pages we overrun the immediate size by 1 bit.
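To make the constraint concrete, this C sketch models the two-instruction load. It assumes only that mov #imm,Rn sign-extends an 8-bit immediate, so the pre-shifted constant must lie in -128..127, and that THREAD_SIZE follows the page size in the 64k case; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SH "mov #imm,Rn" takes an 8-bit immediate and sign-extends it. */
static bool fits_mov_imm8(long v)
{
	return v >= -128 && v <= 127;
}

/*
 * Model of "mov #(THREAD_SIZE >> 8), reg; shll8 reg": returns true and
 * stores the rebuilt value if the pre-shifted constant fits the immediate.
 */
static bool load_thread_size_shll8(uint32_t thread_size, uint32_t *reg)
{
	long imm = (long)(thread_size >> 8);

	if (!fits_mov_imm8(imm))
		return false;	/* the assembler would reject the immediate */
	*reg = (uint32_t)imm << 8;	/* shll8 */
	return true;
}
```

For 4k and 8k the immediate is 16 and 32 respectively, while 64k yields 256, which no longer fits.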

Unfortunately we're somewhat constrained by the instruction set: while shll8 and friends exist pretty much across the board, the variants that shift by an amount loaded into a register are restricted to later CPUs, which is rather unfortunate, as shifting by PAGE_SHIFT would save a lot of trouble here. We also have a 20-bit capable immediate load, but that's likewise restricted to later CPUs.

The only portable solution where we can still save the memory access is to just shift it down another 2 bits and then pack in an shll2 to get back to the full size, so we just end up with:

mov #(THREAD_SIZE >> 10), reg
shll8 reg
shll2 reg

but this is not a very appealing solution, and it wastes a cycle on what is effectively a corner case. A dynamic shift would still cost us the cycle, but would at least provide some future-proofing. On the other hand, the likelihood of someone adopting a system page size larger than what we can address as an immediate when shifted down 10 bits is quite low. We can still extend this model with one more shll2 if it does become a problem, though the most we can shift THREAD_SIZE down is 12 bits, which happens to equal PAGE_SHIFT for 4k pages. Beyond that we're effectively screwed.
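The shll2 extension generalizes to a search over pre-shifts of 8, 10 and 12 bits (shll8 plus zero, one or two shll2). This hypothetical helper, built on the same signed 8-bit immediate assumption, returns the smallest shift whose constant fits and round-trips exactly, or -1 once even 12 bits is not enough.

```c
#include <stdbool.h>
#include <stdint.h>

/* SH "mov #imm,Rn" sign-extends an 8-bit immediate. */
static bool fits_mov_imm8(long v)
{
	return v >= -128 && v <= 127;
}

/*
 * Smallest pre-shift (8, 10 or 12) that fits the immediate and rebuilds
 * thread_size exactly after shll8 and any extra shll2s; -1 if none does.
 * The round-trip check also rejects shifts that would discard set bits.
 */
static int thread_size_preshift(uint32_t thread_size)
{
	for (int shift = 8; shift <= 12; shift += 2) {
		uint32_t imm = thread_size >> shift;

		if (fits_mov_imm8((long)imm) && (imm << shift) == thread_size)
			return shift;
	}
	return -1;
}
```

With this model, 4k and 8k stay at 8, 64k needs 10, 256k would need the full 12, and anything at 1M or above has no encoding under this scheme.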

And it looks like I need to revisit the PTRS_PER_PGD math again too. Grumble.

770 Notes

pycage, while MPU-side decoding is the easiest way to go, DSP-side will still be beneficial (albeit somewhat more complicated). Whether the benefits are worth the effort is another matter. The tools that you need to roll your own codecs are available, and you can do this mostly in C without having to resort to too much tms320c55x assembly. The biggest issue is likely familiarizing yourself with the DSP kernel, the socket node interfaces, and so forth. Most of this is documented pretty well at the dspgateway page.

For the adventurous, there's still an unused mailbox line between MPU and DSP on 1710 in the current implementation that could probably be round-robin'ed pretty easily. We also presently don't make use of hardware page table walking, which makes the exmap interface a bit clunky (essentially wiring TLB entries by hand, but at least they're pre-faulted).

It would also be interesting to see how the FP-driven codecs compare to the integer-based one under EABI with a soft-float toolchain. ogg123 might even be usable out of the box with soft-float (though likely at higher CPU utilization than the numbers that have been quoted). On another note, it's also pretty easy to figure out the DSP load average through the sysfs interface, so it may be worthwhile to profile some of that, especially if the DSP ends up getting more heavily loaded.

Haven't posted here in a while. Work is keeping me busy, as is getting the kernel running on SH-2A on the MS7206SE01 board.

On the sh front, things have been progressing nicely with the new clock and timer frameworks. The timer stuff is still in need of being extended to more transparently deal with multiple timer channels, but this can wait until the timesource driver stuff on l-k sorts itself out. No use redoing the timer stuff twice..

On another note, the cpufreq driver still needs to be reworked for the clock framework as well. This will still take a bit of doing, but in the end it should leave us with a single driver capable of dynamic scaling on every CPU subtype that hooks into the clock framework (this will go on the TODO list for now).

With sh64, things have also been pretty quiet. Ran into some fairly consistent slab corruption that seems to have only popped up in recent kernels; I suppose it's time to dig out the redzoning patch for non-BYTES_PER_WORD minaligned architectures and get slab debugging working again. Unfortunately, both of the UW SCSI drives that managed to trigger this on my Cayman ended up killing themselves. Let's see how far we get with onboard IDE.. judging by the schematics, at least PIO was wired right and should mostly work (DMA on the other hand..). Some of the GPIO configuration in the SuperIO is probably still off (since much of that was borrowed from microdev), so it seems there will be more than one thing to debug..

And just to show how often I actually log in to this thing, I seem to have had the following paragraph started, which was amusingly retained (from some time in 2003):

More uClibc hacking today and over the last couple of days. Started working on the shared loader backend for sh64, which is now at the point where most of the work is done, and there's just a lot of debugging and testing left. At least some good has come out of it so far: it turned out that the R_SH_IMM_MEDLOW16 relocation was broken in multiple ways in glibc, so I ended up fixing that while writing the relocation handling code for uClibc. Regardless, the uClibc stuff is in pretty good shape now, so the next logical step is to start tinkering with buildroot and friends, though that will still have to wait till after some more debugging time.
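For the curious, the core of R_SH_IMM_MEDLOW16 handling is small. In this sketch, which is not the glibc or uClibc code, the relocation takes bits 16..31 of the symbol value and patches them into the instruction's 16-bit immediate field. Placing that field at bits 10..25 (as for SHmedia movi/shori) is an assumption, and the PC-relative variant is not handled.

```c
#include <stdint.h>

/*
 * Assumption: the 16-bit immediate of an SHmedia movi/shori sits in
 * bits 10..25 of the 32-bit instruction word.
 */
#define IMM16_SHIFT 10
#define IMM16_MASK  (0xffffu << IMM16_SHIFT)

/* MEDLOW16 selects bits 16..31 of the (64-bit) relocated value. */
static uint32_t apply_imm_medlow16(uint32_t insn, uint64_t value)
{
	uint32_t field = (uint32_t)((value >> 16) & 0xffff);

	/* Clear the old immediate, keep opcode/register bits, insert new. */
	return (insn & ~IMM16_MASK) | (field << IMM16_SHIFT);
}
```

Getting either the bit selection or the field mask wrong silently corrupts the loaded address, which is the kind of breakage that only shows up once a full 64-bit constant is built with movi plus a chain of shori instructions.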

The ironic thing is that years later, the sh64 ldso stuff needs to be fixed again due to some ABI changes, though I have so far been successfully putting it off. ldso is vindictive ;-)

Disclaimer: As nothing really interesting has been happening lately, be forewarned that this entry will be somewhat dry and generally boring, even if for some reason you _are_ interested in the state of Linux/SH-2 support.

Lots of SH-2 hacking lately, quite exhausting, though still quite fun. The VBR semantics are completely different compared to the SH-3/4, so this buys entry.S a much-needed overhaul. Unfortunately this also required some changes in semantics, at least on the SH-2 side for the general-purpose exception handling code -- though this is all quite hacky already, especially given the number of different registers and register names, etc.
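For readers unfamiliar with the difference, here is a simplified C model of where each family begins executing. It assumes the SH-3/4 fixed entry offsets from VBR (0x100 general exceptions, 0x400 TLB miss, 0x600 interrupts) and the SH-2 scheme where VBR points at a table of 4-byte handler addresses indexed by vector number; the function names are illustrative only.

```c
#include <stdint.h>

/* SH-3/SH-4: fixed entry offsets from VBR; execution starts there. */
enum sh34_entry {
	SH34_GENERAL  = 0x100,	/* general exceptions */
	SH34_TLB_MISS = 0x400,	/* TLB miss */
	SH34_IRQ      = 0x600,	/* interrupts */
};

static uint32_t sh34_entry_point(uint32_t vbr, enum sh34_entry kind)
{
	return vbr + (uint32_t)kind;
}

/* SH-2: address of the table slot the CPU reads for vector vec. */
static uint32_t sh2_vector_slot(uint32_t vbr, unsigned int vec)
{
	return vbr + vec * 4u;	/* 4-byte handler addresses */
}

/* SH-2: the handler address fetched from that slot. */
static uint32_t sh2_entry_point(const uint32_t *vector_table, unsigned int vec)
{
	return vector_table[vec];
}
```

On SH-3/4 the code at VBR+offset must work out which exception occurred (via EXPEVT/INTEVT), while on SH-2 the vector number has already selected the handler, which is why the same entry.S logic cannot simply be shared.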

Another minor nuisance, gcc sanely labels things like saving off ssr as an SH-3 and up instruction, but binutils subsequently defaults to accepting virtually anything as valid. binutils CVS now seems to properly support a processor family flag that clearly defines this, so that should be dealt with relatively well once I get finished hacking that.. this will be an interesting contrast to gcc flags by ABI level, so hopefully that will all work out cleanly. Between that and the latest -fno-zero-initialized-in-bss mess with 3.3, I definitely hope we won't need more stupid gcc/binutils version specific checks for the kernel build, as these are already starting to add up..

Additionally, the fixed references to arch/foo/kernel/vmlinux.lds.S in the top-level kernel Makefile are truly annoying. This now forces anyone who wants to use multiple ld scripts to either make a wrapper script with ifdef abuse, or do gross symlinking hacks at build time. This is certainly a disappointing step back in comparison to the 2.4 behavior..

Back to the SH-2 issue: it should be a lot easier to identify what still needs to be done (other than things like the system call interface, which still needs cleanup for things like TRA referencing; the INTEVT/EXPEVT stuff I just finished) once the aforementioned binutils issues are out of the way. It's quite bothersome to identify problem spots when the assembler will knowingly accept accesses to things like different register banks and ssr/spc, etc. even when these don't actually exist on the SH-2.. though I'm sure there will be quite a few. At least now, with the exception vector, early SCI console, XIP, etc. out of the way, we should be set to actually start debugging on live silicon.. Now the only other trick is getting the page_alloc2 stuff updated and merged, and getting the overly pesky inode and dentry cache hash tables reduced in size -- there's not a whole lot of room when you've only got 512KiB of RAM to work with..

Also got some 7760 IPR patches sitting in my home directory, this is pretty much the last remaining portion of the 7760 backend that needs to go in (I did the exception vector / sh-sci / etc. stuff previously). So this is definitely good news, even though it reminds me that I still need to get the 7040/7044/7045/etc. stuff figured out and written..
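IPR registers themselves are simple: each 16-bit register packs four 4-bit interrupt priority fields. A hedged sketch of updating one field follows; which field belongs to which peripheral on the 7760 is not modeled, and the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Set one 4-bit priority field in a 16-bit IPR value.
 * Field 0 is bits 0..3, field 1 bits 4..7, and so on up to field 3.
 */
static uint16_t ipr_set_priority(uint16_t ipr, unsigned int field, unsigned int prio)
{
	unsigned int shift = (field & 3) * 4;

	/* Clear the old priority, then insert the new one (0..15). */
	return (uint16_t)((ipr & ~(0xfu << shift)) | ((prio & 0xfu) << shift));
}
```

In a driver the value would be read from and written back to the memory-mapped register; doing the read-modify-write on a plain value keeps the bit manipulation testable on its own.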

Lastly, also got some uClibc hacking done. Some relatively uneventful sh64 syscall updates to satisfy current busybox, etc. Just finished off the pthreads work, so now we should be good to go for static pthreads.. that still leaves the ldso work, but that can wait for another day (particularly as it's rather mind numbing). After that, we should be able to start doing sh64 builds under buildroot, should be fun.



lethal certified others as follows:

  • lethal certified dwmw2 as Master
  • lethal certified riel as Master
  • lethal certified alan as Master
  • lethal certified kira as Journeyer
  • lethal certified drs as Apprentice
  • lethal certified lethal as Master
  • lethal certified andersee as Master
  • lethal certified mrbrown as Master
  • lethal certified zx80user as Apprentice
  • lethal certified riscgrl as Apprentice
  • lethal certified banram as Apprentice
  • lethal certified phro as Apprentice
  • lethal certified grey as Apprentice
  • lethal certified mjd as Journeyer
  • lethal certified tgall as Journeyer
  • lethal certified sjhill as Journeyer
  • lethal certified dank as Journeyer
  • lethal certified ottawaDave as Apprentice
  • lethal certified Foxwolfen as Apprentice
  • lethal certified major as Master
  • lethal certified darthgnu as Apprentice
  • lethal certified mjstahl as Apprentice
  • lethal certified sambau as Journeyer
  • lethal certified ncunningham as Journeyer
  • lethal certified linuxata as Master
  • lethal certified komal as Journeyer
  • lethal certified wmat as Apprentice
  • lethal certified tglx as Master
  • lethal certified amatus as Journeyer

Others have certified lethal as follows:

  • drs certified lethal as Master
  • lethal certified lethal as Master
  • andersee certified lethal as Master
  • softkid certified lethal as Journeyer
  • mrbrown certified lethal as Master
  • zx80user certified lethal as Master
  • ottawaDave certified lethal as Master
  • mjstahl certified lethal as Master
  • darthgnu certified lethal as Journeyer
  • uriel certified lethal as Journeyer
  • wmat certified lethal as Master
  • amatus certified lethal as Master
