add a reference-counted SharedMutexImpl so that lock guards can outlive
the SharedMutex itself. this is required because the lock guards are
passed along with async completions, and there is no guarantee that the
executor will process those completions before the SharedMutex is
destroyed. this case is exercised by the async_destruct unit test.
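a minimal sketch of the idea, assuming std::shared_ptr as the reference
count and a toy executor that only runs queued completions when polled.
the names Executor, SharedLockGuard, async_lock_shared and
impl_destructions are hypothetical and chosen for illustration; they are
not the actual API. the point it shows: guards hold a strong reference to
the impl rather than to the SharedMutex, so a completion that is still
queued when the SharedMutex is destroyed can run safely afterwards.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// hypothetical counter so the sketch can observe when the impl is freed
static int impl_destructions = 0;

// shared state of the mutex; reference-counted so it can outlive SharedMutex
struct SharedMutexImpl {
  int readers = 0;
  ~SharedMutexImpl() { ++impl_destructions; }
};

// hypothetical guard: holds a strong reference to the impl, not the mutex
class SharedLockGuard {
  std::shared_ptr<SharedMutexImpl> impl_;
 public:
  explicit SharedLockGuard(std::shared_ptr<SharedMutexImpl> impl)
      : impl_(std::move(impl)) { ++impl_->readers; }
  SharedLockGuard(SharedLockGuard&&) noexcept = default;  // leaves source empty
  SharedLockGuard& operator=(SharedLockGuard&&) = delete;
  ~SharedLockGuard() { if (impl_) --impl_->readers; }  // unlock, then drop ref
  int readers() const { return impl_->readers; }
};

// toy executor: completions are queued and only run when poll() is called
class Executor {
  struct Task { virtual ~Task() = default; virtual void run() = 0; };
  template <typename F>
  struct TaskImpl : Task {
    F f;
    explicit TaskImpl(F fn) : f(std::move(fn)) {}
    void run() override { f(); }
  };
  std::deque<std::unique_ptr<Task>> queue_;
 public:
  template <typename F>
  void post(F f) { queue_.push_back(std::make_unique<TaskImpl<F>>(std::move(f))); }
  void poll() {
    while (!queue_.empty()) {
      auto task = std::move(queue_.front());
      queue_.pop_front();
      task->run();
    }
  }
};

// simplified front-end: grants the shared lock immediately (no waiter queue)
// and delivers the guard through an async completion
class SharedMutex {
  Executor& ex_;
  std::shared_ptr<SharedMutexImpl> impl_ = std::make_shared<SharedMutexImpl>();
 public:
  explicit SharedMutex(Executor& ex) : ex_(ex) {}
  template <typename Handler>
  void async_lock_shared(Handler h) {
    ex_.post([guard = SharedLockGuard(impl_), h = std::move(h)]() mutable {
      h(std::move(guard));
    });
  }
};

// mirrors the async_destruct case: the SharedMutex is destroyed while a
// completion carrying one of its guards is still queued on the executor
int async_destruct_scenario() {
  Executor ex;
  int observed_readers = -1;
  {
    SharedMutex m(ex);
    m.async_lock_shared(
        [&](SharedLockGuard g) { observed_readers = g.readers(); });
  }  // m is destroyed here, before the executor runs the completion
  assert(impl_destructions == 0);  // the queued guard keeps the impl alive
  ex.poll();  // completion runs; the impl is freed when its guard is dropped
  return observed_readers;
}
```

without the reference count, the guard would hold a pointer into the
destroyed SharedMutex and the completion would touch freed memory.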