Userspace Threads for SBCL -- a short discussion
(This is in response to a wish from an IndieGoGo funder.)
First off a disclaimer: I'm not really into green threads / fibers distinction, so I'm just going to be rambling about userspace threads in general. I'm also making the assumption that having the option of userspace threads in addition to native threads would be a good thing, and not spending time on ramifications of that.
I'm also not working, or planning to work on this area in near future. Consider this a vague roadmap for those wanting to look into doing this.
Are Some Threads More Equal Than Others?
How are userspace threads distinct from native threads?
(subtypep 'userspace-thread 'thread)
hold? Is there going to even be a distinct userspace thread type?
Because semantically userspace threads should mostly be indistinguishable from native threads (ie. dynamic binding works the same way, locks work the same way, etc), I think they should indeed be just like threads except for the "who is responsible for scheduling" bit.
So I'm thinking all lisp threads are really going to be userspace threads, and it's just that some of them have a dedicated OS thread from the start.
Let's say MAKE-THREAD grows an argument :RUN, which defaults to true. If it's NIL you get an inert suspended thread object that won't run until someone yields to it.
So from lisp land all threads are going to look identical -- but with the new distinction that some lisp threads may be in suspended state, not being currently run by any OS thread.
Lies, Statistics, and Schedules
How does scheduling work?
Do users of userspace threads need explicit scheduling, or does the system eg. automatically schedule them on blocking IO?
I think both have merits, but automatic scheduling on blocking IO is really explicit scheduling under the hood, so let's consider that only.
We already have #'THREAD-YIELD. Let's just add an optional argument to it. If that argument is a thread that isn't currently running, we yield to it -- otherwise we will just consider it a hint that our processor resources could be better spent elsewhere now. (Or maybe "yield execution context" and "yield processor time" should not be mixed up? Hm.)
It's possible that we may also want SUSPEND-THREAD, which dissociates a lisp thread from the underlying OS thread, and RUN-THREAD which starts a new OS thread in order to run a suspended lisp thread, but I'll ignore them for now.
One thing that this does mean, which the current system may have conflicting assumptions about, is that the OS thread associated with a single lisp thread may change over it's lifetime. This needs to be checked. (Or indeed that a thread has an OS thread associated with it!)
We're also going to need critical sections during which the scheduling status of the thread (ie, if it's currently running == associated with an OS thread) cannot be changed. Not sure if WITHOUT-INTERRUPTS should subsume this, or if it has to be distinct from it.
Enough About Design, Let's Do This!
So, what does a thread need? A stack and a context, pretty much.
I'll make the wild assumption that we're on a platform with fully functional swapcontext(3). I have sometimes heard whispered that those API's aren't all that great, so it's possible that we may need to implement them in asm on our own -- but I haven't ever used them personally, so I don't really claim to know.
If that is how we're going to be switching from one userspace thread to another, how do we make it play nice with the rest of SBCL?
Let's start by taking a look at MAKE-THREAD. At first blush at least it looks to me like the only thing that really needs to be different for userspace threads is the call to %CREATE-THREAD, which currently ends up doing the following:
Creates the C-side thread struct, which contains the stack(s) and thread-local bindings, and a bunch of other stuff. What it doesn't currently have is space for everything swapcontext(3) needs, so we'll need to add that.
Creates the OS thread, including all the signal handling setup, etc.
Definitely prime reading ground for anyone looking to add userspace threads to SBCL: the stuff that needs to happen when we switch to a new thread is going to look a lot like create_os_thread and new_thread_trampoline.
This gets factored into RUN-THREAD and THREAD-YIELD, pretty much (or at least the C-code those will end up calling). Not rocket science, but a lot of details...
(Unless you're just skimming this, go ahead and at least skim the relevant parts of the code.)
The other end of thread lifetime is another obvious place to look in -- but mostly it comes down to undoing whatever was done when the thread was created. This raises a hairy design question, though: do OS threads die when the current thread associated with them dies? I don't know. I suspect this points to problem in my overall design, but possibly it is a simple policy question.
The final place that needs attention is GC: it needs to be able to find all C-side thread structs in order to scavenge their stacks, and it needs to know how to scavenge the contexts of suspended threads as well -- not rocket science, again, but details.
Is this all? Probably not!
I'm pretty sure signal handling needs some very careful consideration -- but if WITHOUT-INTERRUPTS also means "without userspace thread state changes", then possibly current code is a decent match.
I think the easiest way to find out what is missing, however, is to start working towards an implementation.
The biggest issue with this sketch in my mind is the question of thread death mentioned above. The easiest way to solve it (not necessarily the best!) would be to say that each OS thread does indeed die when the currently executing lisp thread dies. The second easiest would be to have something like QUEUE-THREAD, which would mean that when the next lisp thread dies, the queued one should receive the OS thread instead of it going the way of the dodo.
...and now I'm out of time, and this still needs proofreading. Hopefully this inspires someone to do somethign awesome. :)
Addendum: locking really needs thinking about. Suspending a thread that holds locks is not going to end well, or yielding while holding locks. Not sure if the locking API should be clever about this, or if this can all be punted to the users.