common/mutex_debug: add memory barrier before load/store nlock

on some architectures, the CPU may reorder store-store and load-load
instruction sequences, hence we cannot assume that the update to nlock
always becomes visible after the update to locked_by in _post_lock().

in this change, nlock is changed from a plain int to atomic<int> so we
can use atomic primitives to store to / read from it. the major
consumer of nlock and locked_by is `is_locked_by_me()`. so, accesses to
locked_by are always guarded by the release-acquire ordering on nlock,
and the accesses to the two variables are not reordered.
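
the ordering described above can be illustrated with a minimal sketch.
it is not the actual mutex_debug code: the `lock_tracker` struct and the
`post_lock()` / `pre_unlock()` names are hypothetical stand-ins, and the
exact memory orders are an assumption based on the release-acquire
pairing described in this message.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// hypothetical stand-in for the debug bookkeeping; not the Ceph class
struct lock_tracker {
  std::thread::id locked_by{};   // plain variable, published via nlock
  std::atomic<int> nlock{0};     // was a plain int before the change

  // called by the owner right after it acquired the underlying mutex
  void post_lock() {
    locked_by = std::this_thread::get_id();
    // release: the store to locked_by cannot be reordered after this
    nlock.fetch_add(1, std::memory_order_release);
  }

  // called by the owner right before it releases the underlying mutex
  void pre_unlock() {
    if (nlock.load(std::memory_order_relaxed) == 1) {
      locked_by = std::thread::id();
    }
    nlock.fetch_sub(1, std::memory_order_release);
  }

  bool is_locked_by_me() const {
    // acquire: the read of locked_by cannot be reordered before this
    return nlock.load(std::memory_order_acquire) > 0 &&
           locked_by == std::this_thread::get_id();
  }
};
```

in this sketch the owner sees `is_locked_by_me()` return true between
`post_lock()` and `pre_unlock()`, while any other thread gets false.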

`is_locked_by_me()` is only enabled when the `CEPH_DEBUG_MUTEX` macro is
defined for debugging purposes, and this macro is enabled for Debug
builds. the vanilla std::mutex is used for Release builds. so the
problem addressed by this change only impacts Debug builds on
ARM64/ARM32 architectures.