The Rabbit Hole

Tuesday, January 21, 2025

"Cambridge Chronicled" complete

I realized I hadn't posted the completion of this project (about two years ago now), mostly b/c the social media has been managed elsewhere.

The book is available at Million Year Picnic in Cambridge, MA or online at Radiator Comics.

Saturday, June 13, 2020

Other creative things

Been spending a lot of time on a comic (aka graphic novel memoir for those with upturned noses).

We offered a few chapters at the last MICE, but it's not yet available elsewhere.

It's a bit of humor about raising a kid in Cambridge, but it also touches on a lot of serious topics like racism, diversity, culture, and soul. It's solely intended to share our unique perspective(s).

It's heartening to see so many around the world joining in protest after four years of xenophobia and hate in the US. While it's sad that so little has changed since Rodney King (or Emmett Till, for that matter), I hope that more white folk (and by this I mean anyone who has questioned whether people of color are really treated any differently in this country) will take the opportunity to learn more.

Because ultimately, justice in society is everyone's responsibility, not just those with a boot on their neck.

Next time you go to read your favorite blog, pick up some James Baldwin or Ralph Ellison to read instead (you can probably get digital or audio versions of these from your local library for free).

Loading that pickle, what's going on?

tqdm is a handy little utility for showing how well some process is progressing (it works within python code, but you can also use it on any piped shell process).

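Pickle is Python's built-in serialization, which can take an awfully long time on really large objects. Unfortunately, pickle.load just takes an open file (anything with read() and readline() methods) and offers no progress callback of its own.

So here's a useful little snippet for estimating how long it's going to take to deserialize that big Python pickle file. It wraps the file object and reports every read to tqdm.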

class TQDMBytesReader(object):
    """Show progress while reading from a file"""
    def __init__(self, fd, **kwargs):
        self.fd = fd
        from tqdm import tqdm
        self.tqdm = tqdm(**kwargs)
    def read(self, size=-1):
        bytes = self.fd.read(size)
        self.tqdm.update(len(bytes))
        return bytes
    def readline(self):
        bytes = self.fd.readline()
        self.tqdm.update(len(bytes))
        return bytes
    def __enter__(self):
        self.tqdm.__enter__()
        return self
    def __exit__(self, *args, **kwargs):
        return self.tqdm.__exit__(*args, **kwargs)

import os, pickle

with open(filename, "rb") as fd, \
     TQDMBytesReader(fd, desc="Loading pickle", unit="b",
                     total=os.path.getsize(filename)) as reader:
    obj = pickle.load(reader)

Here's the complement, although it's not terribly helpful since pickle serializes everything to memory and only then writes to disk:

class TQDMBytesWriter(object):
    """Show progress while writing to a file"""
    def __init__(self, fd, **kwargs):
        self.fd = fd
        from tqdm import tqdm
        self.tqdm = tqdm(**kwargs)
    def write(self, b):
        bytes_written = self.fd.write(b)
        self.tqdm.update(bytes_written or 0)
        return bytes_written
    def __enter__(self):
        self.tqdm.__enter__()
        return self
    def __exit__(self, *args, **kwargs):
        return self.tqdm.__exit__(*args, **kwargs)
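
If you want to check how pickle actually hands data to the file on your Python version, a rough sketch like the one below logs the size of every write() call. WriteLogger and dump_and_log are hypothetical names made up for this example, and it uses an in-memory buffer and no tqdm so it runs with the standard library alone.

```python
import io
import pickle


class WriteLogger:
    """Hypothetical helper: records the size of every write() call."""

    def __init__(self, fd):
        self.fd = fd
        self.sizes = []

    def write(self, b):
        # Pass the data through unchanged and remember how much arrived
        written = self.fd.write(b)
        self.sizes.append(len(b))
        return written


def dump_and_log(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj into memory; return (pickled bytes, list of write sizes)."""
    buffer = io.BytesIO()
    logger = WriteLogger(buffer)
    # pickle.dump only needs an object with a write() method
    pickle.dump(obj, logger, protocol=protocol)
    return buffer.getvalue(), logger.sizes


if __name__ == "__main__":
    data, sizes = dump_and_log({"numbers": list(range(100000))})
    print(f"{len(data)} bytes in {len(sizes)} write call(s)")
```

If that reports only one or two big writes, a progress bar on the writing side won't tell you much; lots of smaller writes would make it more worthwhile.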

Friday, September 20, 2013

Potamus update

The GAE cost profiling graphs have gotten a facelift, now using flot instead of Google visualizations. I rapidly hit the limit of the GViz capabilities (one notable shortcoming is the lack of support for sparsely-populated data). Most of the controls are now completely client-side, which makes it a lot easier to tweak the graph to get just the information you'd like.

Flot generally provides more CSS-level control over styling, and a nice plugin system to allow for mixing features.

Sunday, June 30, 2013

App Engine real-time cost profiling is available on github. Some assembly required.

Wednesday, April 10, 2013

Cost profiling on Google App Engine

I've recently been measuring costs for various operations that are currently being performed on Google App Engine. Google provides some cost estimates on the app engine dashboard, and you can get historical daily totals, but it's generally not straightforward to answer the question "How much does this operation cost (or is going to cost if I ramp up)?".

The Google-provided Appstats is fine for profiling individual requests, but sometimes you need a much more comprehensive view.

With a Chrome extension to monitor the app engine dashboard numbers, and a small app engine app to collect data, I've managed to collect some interesting intra-day profile data, as well as provide a means for fairly accurate estimates of discrete operations.

Group view (for multiple app IDs). The artifact on the left is due to two days' worth of missing data. The lower graph has an obvious daily cron job, while the upper has much more distributed activity:

Zoomed view (detail for a single app ID). On this graph, you can see some annotations have been added; the data collector provides an API for applications to post events that can be overlaid on the cost data, making it easy to pick start and end points and calculate the cost for the selected span:

This project is now available on github. The Chrome extension is based on the OSE (Offline Statistics Estimator), which scrapes usage data from the GAE dashboard pages and applies customizable usage rates.
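
The span-cost arithmetic behind picking start and end points can be sketched roughly as follows. This is not the project's code: it assumes the collected data boils down to (timestamp, cumulative cost) samples sorted by time, and the names cost_at and span_cost are invented for illustration. Event timestamps rarely land exactly on a sample, so the sketch interpolates linearly between neighbors.

```python
from bisect import bisect_left


def cost_at(samples, t):
    """Estimate cumulative cost at time t by linear interpolation.

    samples: list of (timestamp, cumulative_cost) pairs sorted by timestamp
    (an assumed format, not necessarily what the collector stores).
    """
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]       # at or before the first sample
    if i == len(samples):
        return samples[-1][1]      # after the last sample
    (t0, c0), (t1, c1) = samples[i - 1], samples[i]
    return c0 + (c1 - c0) * (t - t0) / (t1 - t0)


def span_cost(samples, start, end):
    """Cost accrued between two event timestamps."""
    return cost_at(samples, end) - cost_at(samples, start)


if __name__ == "__main__":
    # Made-up numbers: seconds since start, dollars accrued so far
    samples = [(0, 0.00), (60, 0.30), (120, 0.90)]
    print(span_cost(samples, 30, 90))
```

The estimate is only as good as the sampling interval, so short operations need either frequent samples or many repetitions within the span.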

Wednesday, January 30, 2013

Enable cProfile in Google App Engine

If it's not readily apparent to you how to enable CPU profiling on Google App Engine (it certainly wasn't to me, aside from a few hand waves at cProfile), this code snippet should get you up and running so you can focus on finding the data you need rather than the implied interfaces you have to mimic. It uses the standard WSGI middleware hook to wrap an incoming request in a cProfile call, formatting and dumping the resulting stats to the log when the request returns:


def cprofile_wsgi_middleware(app):
    """
    Call this middleware hook to enable cProfile on each request.  Statistics are dumped to
    the log at the end of the request.
    :param app: WSGI app object
    :return: WSGI middleware wrapper
    """
    def _cprofile_wsgi_wrapper(environ, start_response):
        # cStringIO is Python 2 only (the GAE runtime at the time)
        import cProfile, cStringIO, logging, pstats
        profile = cProfile.Profile()
        try:
            return profile.runcall(app, environ, start_response)
        finally:
            stream = cStringIO.StringIO()
            stats = pstats.Stats(profile, stream=stream)
            stats.strip_dirs().sort_stats('cumulative', 'time', 'calls').print_stats(25)
            logging.info('cProfile data:\n%s', stream.getvalue())
    return _cprofile_wsgi_wrapper

def webapp_add_wsgi_middleware(app):
    return cprofile_wsgi_middleware(app)
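
The snippet above targets the Python 2.7 runtime. For anyone on Python 3, a rough port might look like the sketch below, which swaps cStringIO for io.StringIO. The limit parameter and demo_app are additions for illustration only, and none of this has been tried on a current App Engine runtime.

```python
import cProfile
import io
import logging
import pstats


def cprofile_wsgi_middleware(app, limit=25):
    """Wrap a WSGI app so each request is profiled and the stats are logged.

    limit (an addition for this sketch) caps the number of rows printed.
    """
    def _cprofile_wsgi_wrapper(environ, start_response):
        profile = cProfile.Profile()
        try:
            return profile.runcall(app, environ, start_response)
        finally:
            # Format the stats into a string and send them to the log
            stream = io.StringIO()
            stats = pstats.Stats(profile, stream=stream)
            stats.strip_dirs().sort_stats("cumulative", "time", "calls")
            stats.print_stats(limit)
            logging.info("cProfile data:\n%s", stream.getvalue())
    return _cprofile_wsgi_wrapper


def demo_app(environ, start_response):
    """Trivial WSGI app, just to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    wrapped = cprofile_wsgi_middleware(demo_app)
    print(wrapped({}, lambda status, headers: None))
```

One caveat: only the call into the app is profiled. If the app returns a generator, the work done while the server iterates over the response body happens after the stats are dumped.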

Saturday, September 22, 2012

Thread termination/exit handlers

I'd been hunting for a while for a good solution to automatically call cleanup operations on exit from threads I didn't own. Windows (XP onward) has a pretty straightforward solution if you're working with a DLL, but for pthreads on most other platforms, the solution is not as obvious.

Some folks have been using JNA to allow Java code to be invoked as callbacks from various streaming (video, sound) libraries. If these callbacks come from threads instantiated from native code, the JVM has to (at least temporarily) map the thread into Java space for the duration of the callback. If the callbacks are frequent, and always come in on the same native thread, we don't want to incur the mapping overhead on every invocation. The solution, then, is to avoid detaching the thread when the callback finishes.

Now we have a new problem. When the native thread actually terminates, the JVM has no idea that the thread went away because it's only really got a placeholder Thread object, and hasn't hooked up all the plumbing it normally does to detect that the thread has gone away and clean up/GC the mapped Thread object. Thus the need for a thread exit handler. In some cases, you may not care about the minimal object overhead, for instance if you've just got a few native threads. However, with a lot of threads coming and going (we can't force folks to thread-pool), those orphaned Thread objects pile up.

Windows

On Windows, if you have a DllMain function defined, you'll get notices when threads and processes attach/detach your DLL's code. This works out nicely: we can make the VM detach the current thread when we get that message:


// Store thread-local information required to detach the thread
// (in my case, only a JVM reference was required)
// TlsSetValue is only set when we recognize that we need the 
// extra cleanup
static DWORD dwTlsIndex;
BOOL WINAPI DllMain(HINSTANCE hDLL, 
                    DWORD fdwReason, 
                    LPVOID lpvReserved) {
  switch (fdwReason) {
  case DLL_PROCESS_ATTACH:
    dwTlsIndex = TlsAlloc();
    if (dwTlsIndex == TLS_OUT_OF_INDEXES) {
      return FALSE;
    }
    break;
  case DLL_PROCESS_DETACH:
    TlsFree(dwTlsIndex);
    break;
  case DLL_THREAD_ATTACH:
    break;
  case DLL_THREAD_DETACH: {
    extern void detach_thread(void *jvm);
    detach_thread(TlsGetValue(dwTlsIndex));
    break;
  }
  default:
    break;
  }
  return TRUE;
}

Easy enough. Note that TlsSetValue is called elsewhere only when the callback decides it doesn't want to detach immediately (rather than every time a native thread attaches). Callbacks normally detach on exit if they weren't attached to begin with.

POSIX Threads (pthreads)

The pthreads library is a different beast. Searches for "thread termination handler" don't turn up much, except for the stack-based pthread_cleanup_push/pop, which seem to do the right thing but have to be used in matched pairs. It's actually in the pthreads implementation of thread-local storage that we find a way to attach a termination handler to a given thread.

When you define a given thread-local variable in pthreads (a "key" in pthreads lingo), you can provide a destructor function to be used to clean up the storage...on thread exit. It was a little hard to find since it's not a thread termination handler per se, but rather a mechanism to clean up thread local storage. I'd overlooked it several times because I wasn't looking for thread local storage solutions.


// Basic plumbing to create a unique key identifying
// specific thread-local storage
#include <pthread.h>

static pthread_key_t key;
static void make_key(void) {
  // The destructor must take the stored value as a void*
  extern void detach_thread(void *jvm);
  pthread_key_create(&key, detach_thread);
}

    ...
    // This code gets called whenever we identify that
    // a thread needs detach-on-exit behavior
    static pthread_once_t key_once = PTHREAD_ONCE_INIT;
    pthread_once(&key_once, make_key);
    if (!jvm || pthread_getspecific(key) == NULL) {
      pthread_setspecific(key, jvm);
    }
    ...

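Now detach_thread will be called (with the value of the thread-local storage as its single argument) when the thread exits, provided that value is non-NULL. Setting the value back to NULL therefore switches the handler off for that thread.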

And voilà, you have a POSIX thread termination handler.