I'm not sure whether rados_connect is expected to be threadsafe or not,
so this is just a documentation patch rather than a fix; I'd appreciate
your opinion on whether this is expected behaviour or not.
The race condition is in the call to ceph::crypto::init when called by
common_init_finish, the issue being that it calls NSS_NoDB_Init (not
threadsafe) without locking. It can be reproduced (probabilistically) by
calling rados_connect on different rados_t objects simultaneously, due
to NSS_NoDB_Init's use of PR_CallOnce in nspr (which keeps global state,
and while PR_CallOnce is intended as a locking function, the locking
itself isn't thread-safe, and can pass PR_Lock a null pointer).
The observed behaviour is a segfault on calling rados_connect.
Backtrace, for reference:
secmodName=secmodName@entry=0x7ffff4f64817 "", updateDir=updateDir@entry=0x7ffff4f64817 "", updCertPrefix=updCertPrefix@entry=0x7ffff4f64817 "",
updKeyPrefix=updKeyPrefix@entry=0x7ffff4f64817 "", updateID=updateID@entry=0x7ffff4f64817 "", updateName=updateName@entry=0x7ffff4f64817 "",
initContextPtr=initContextPtr@entry=0x0, initParams=initParams@entry=0x0, readOnly=readOnly@entry=1, noCertDB=noCertDB@entry=1, noModDB=noModDB@entry=1,
forceOpen=forceOpen@entry=1, noRootInit=noRootInit@entry=1, optimizeSpace=optimizeSpace@entry=1, noSingleThreadedModules=noSingleThreadedModules@entry=0,
allowAlreadyInitializedModules=allowAlreadyInitializedModules@entry=0, dontFinalizeModules=dontFinalizeModules@entry=0) at nssinit.c:551
Signed-off-by: Sharif Olorin <sio@tesser.org>
*
* @post If this succeeds, any function in librados may be used
*
+ * @warning This function is not thread-safe if using libnss (built with
+ * --with-nss); callers must implement their own locking mechanisms.
+ *
* @param cluster The cluster to connect to.
* @returns 0 on sucess, negative error code on failure
*/