Saturday, September 22, 2012

Thread termination/exit handlers

I'd been hunting for a while for a good solution to automatically call cleanup operations on exit from threads I didn't own.  Windows (XP onward) has a pretty straightforward solution if you're working with a DLL, but for pthreads on most other platforms, the solution is not as obvious.

Some folks have been using JNA to allow Java code to be invoked as callbacks from various streaming (video, sound) libraries.  If these callbacks come from threads instantiated from native code, the JVM has to (at least temporarily) map the thread into Java space for the duration of the callback.  If the callbacks are frequent, and always come in on the same native thread, we don't want to incur the mapping overhead on every invocation.  The solution, then, is to avoid detaching the thread when the callback finishes.

Now we have a new problem.  When the native thread actually terminates, the JVM has no idea that it went away, because the JVM only has a placeholder Thread object and hasn't hooked up the plumbing it normally uses to detect thread exit and clean up/GC the mapped Thread object.  Thus the need for a thread exit handler.  In some cases you may not care about the minimal object overhead, for instance if you've just got a few native threads.  However, with a lot of threads coming and going (we can't force folks to thread-pool), the leftover placeholder objects add up.

Windows


On Windows, if you have a DllMain function defined, you'll get notifications when threads and processes attach to or detach from your DLL's code.  This works out nicely: we can make the VM detach the current thread when we get the thread-detach message:

#include <windows.h>

// Store thread-local information required to detach the thread
// (in my case, only a JVM reference was required)
// TlsSetValue is only set when we recognize that we need the 
// extra cleanup
static DWORD dwTlsIndex;
BOOL WINAPI DllMain(HINSTANCE hDLL, 
                    DWORD fdwReason, 
                    LPVOID lpvReserved) {
  switch (fdwReason) {
  case DLL_PROCESS_ATTACH:
    dwTlsIndex = TlsAlloc();
    if (dwTlsIndex == TLS_OUT_OF_INDEXES) {
      return FALSE;
    }
    break;
  case DLL_PROCESS_DETACH:
    TlsFree(dwTlsIndex);
    break;
  case DLL_THREAD_ATTACH:
    break;
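  case DLL_THREAD_DETACH: {
    // TlsGetValue returns NULL for threads that never called
    // TlsSetValue, so detach_thread must tolerate a NULL argument
    extern void detach_thread(void *);
    detach_thread(TlsGetValue(dwTlsIndex));
    break;
  }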
  default:
    break;
  }
  return TRUE;
}
Easy enough.  Note that TlsSetValue is called elsewhere only when the callback decides it doesn't want to detach immediately (rather than every time a native thread attaches).  Callbacks normally detach on exit if they weren't attached to begin with.

POSIX Threads (pthreads)

The pthreads library is a different beast.  Searches for "thread termination handler" don't turn up much, except for the stack-based pthread_cleanup_push/pthread_cleanup_pop, which seem to do the right thing but have to appear in matched pairs within the same function.  It's actually in the pthreads implementation of thread-local storage that we find a way to attach a termination handler to a given thread.
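To make the pairing constraint concrete, here is a minimal sketch; it is not taken from the JNA code, and on_cleanup, cleanup_worker, and run_cleanup_demo are hypothetical names used only for illustration.  The handler covers only the code between the push and the pop, so it is no help when the thread's body was written by someone else.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counter so the handler's effect can be observed. */
static int cleanup_calls = 0;

static void on_cleanup(void *arg) {
  (void)arg;
  cleanup_calls++;
}

static void *cleanup_worker(void *arg) {
  int exit_inside_pair = *(int *)arg;

  /* push and pop must appear in the same lexical scope; on many
     implementations they are macros that open and close a block. */
  pthread_cleanup_push(on_cleanup, NULL);
  if (exit_inside_pair) {
    pthread_exit(NULL);    /* exiting between push and pop runs on_cleanup */
  }
  pthread_cleanup_pop(0);  /* removes the handler without running it */

  return NULL;             /* the handler is already gone at this point */
}

/* Runs one thread and returns how many times the handler fired. */
int run_cleanup_demo(int exit_inside_pair) {
  pthread_t t;
  int rc;

  cleanup_calls = 0;
  rc = pthread_create(&t, NULL, cleanup_worker, &exit_inside_pair);
  assert(rc == 0);
  rc = pthread_join(t, NULL);
  assert(rc == 0);
  (void)rc;
  return cleanup_calls;
}
```

Calling run_cleanup_demo(1) exercises the path where the thread exits inside the pair; run_cleanup_demo(0) exercises the path where the handler has already been popped before the thread returns.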

When you define a given thread-local variable in pthreads (a "key" in pthreads lingo), you can provide a destructor function to be used to clean up the storage...on thread exit.  It was a little hard to find since it's not a thread termination handler per se, but rather a mechanism to clean up thread local storage.  I'd overlooked it several times because I wasn't looking for thread local storage solutions.


#include <pthread.h>

// Basic plumbing to create a unique key identifying
// specific thread-local storage
static pthread_key_t key;
static void make_key(void) {
  // The destructor receives the stored value as its argument
  extern void detach_thread(void *);
  pthread_key_create(&key, detach_thread);
}

    ...
    // This code gets called whenever we identify that
    // a thread needs detach-on-exit behavior
    static pthread_once_t key_once = PTHREAD_ONCE_INIT;
    pthread_once(&key_once, make_key);
    if (!jvm || pthread_getspecific(key) == NULL) {
      pthread_setspecific(key, jvm);
    }
    ...

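Now detach_thread will be called (with the value of the thread-local storage as its single argument) when the thread exits, provided that value is non-NULL; pthreads skips the destructor for threads that never stored anything under the key.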
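For anyone who wants to watch the mechanism work in isolation, the following is a small self-contained sketch under stated assumptions: detach_thread here is a stand-in that only bumps a counter (the real one would detach from the JVM), and request_detach_on_exit, key_worker, and run_key_demo are hypothetical names, not part of JNA.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Hypothetical counter standing in for the real detach work. */
static int detach_calls = 0;

/* Stand-in for detach_thread; called by pthreads at thread exit
   with the value stored under the key. */
static void detach_thread(void *value) {
  (void)value;
  detach_calls++;
}

static void make_key(void) {
  int rc = pthread_key_create(&key, detach_thread);
  assert(rc == 0);
  (void)rc;
}

/* Marks the calling thread as needing detach-on-exit behavior. */
static void request_detach_on_exit(void *token) {
  pthread_once(&key_once, make_key);
  pthread_setspecific(key, token);
}

static void *key_worker(void *arg) {
  if (arg != NULL) {
    request_detach_on_exit(arg);
  }
  return NULL;  /* plain return: the thread body contains no cleanup code */
}

/* Runs one thread and returns how many times detach_thread fired. */
int run_key_demo(int opt_in) {
  static int token;  /* any non-NULL pointer works as the stored value */
  pthread_t t;
  int rc;

  detach_calls = 0;
  rc = pthread_create(&t, NULL, key_worker, opt_in ? (void *)&token : NULL);
  assert(rc == 0);
  rc = pthread_join(t, NULL);
  assert(rc == 0);
  (void)rc;
  return detach_calls;
}
```

The thread body never mentions cleanup; storing a non-NULL value under the key is all it takes to get the handler invoked, and a thread that stores nothing is left alone.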

And voilà, you have a POSIX thread termination handler.

5 comments:

Daniel said...

I think you were looking for pthread_cleanup_push() :-)

Val said...

Thanks for the post, this seems to be exactly the issue I'm dealing with. Thankful to find a clue for both platforms in one place!

technomage said...

@Daniel You might think that, but pthread_cleanup_push() *must* be paired with a corresponding pthread_cleanup_pop(). I have no way of invoking pthread_cleanup_pop(); obtaining process control at the point of thread exit is what I'm looking for, and calling pthread_cleanup_pop() assumes I already have control.

Jeff Solomon said...

Thanks for the post! You must have had the exact same situation as I did. pthread_key_create() really saved us and your post showed me the way! Since we're writing C++ code, I can simplify your code a tiny bit, by using std::call_once() instead of pthread_once since the former accepts a lambda where you can place the call to pthread_key_create():

static pthread_key_t key;
static std::once_flag flag;

std::call_once(flag, []() {
  extern void detach_thread(void *);
  pthread_key_create(&key, detach_thread);
});

if (!jvm || pthread_getspecific(key) == NULL) {
  pthread_setspecific(key, jvm);
}

No need for your make_key function and now all statics can be function local.

Thanks again!

Jeff Solomon said...

Thanks for the post! It really saved us. We have the exact same situation where we have a JVM creating threads and we want to track their exit on the native side and our code runs on Windows and Linux.

I was able to use std::call_once instead of pthread_once.  Since the former accepts a lambda, you can eliminate the need for your make_key function, and I could make all statics function local.

static pthread_key_t key;
static std::once_flag flag;

std::call_once(flag, []() {
  extern void detach_thread(void *);
  pthread_key_create(&key, detach_thread);
});

if (!jvm || pthread_getspecific(key) == NULL) {
  pthread_setspecific(key, jvm);
}