Sage Weil [Mon, 22 Oct 2018 19:38:48 +0000 (14:38 -0500)]
os/bluestore: fix race between SharedBlobSet::lookup and SharedBlob::put
A B
SharedBlobSet::lookup()
takes lock
nref is not 0
SharedBlob::put()
--nref
returns SharedBlobRef,
++nref
takes cache lock
SharedBlobSet::remove
takes lock
removes
deletes SharedBlob
-> A ends up with a ref to deleted SharedBlob
Fix by verifying that nref is still zero in SharedBlobSet::remove(),
while we are holding the SharedBlobSet::lock. The lock ensures that we
have increased the ref for the lookup before entering remove, so we can
verify that nref is still zero before removing it. If not, we have
raced, and put() bails out and does nothing.
Fixes: http://tracker.ceph.com/issues/36526 Signed-off-by: Sage Weil <sage@redhat.com>
Kefu Chai [Mon, 22 Oct 2018 08:41:45 +0000 (16:41 +0800)]
include/ceph_assert.h: do not pack assert params if WITH_ASAN
we pack the asset() params for smaller code size, but this creates a
inlined `assert_data_ctx` instance for every compilation unit which
call ceph_assert() defined in .h .
__PRETTY_FUNCTION__ is likely to be referenced by `assert_data_ctx`
sections which are included by different compiled object files. if the
ceph_assert() call is used by header file, then there will be multiple
`assert_data_ctx` sections sharing the same identifier. these sections are
defined as "COMDAT" group sections, i.e. common data sections. when linker
see multiple COMDAT sections with the same identifer, it will simply discard
the duplicated ones, and only keep a single copy of them. without enabling
ASan, GCC can always handle this problem just fine. but the dedup feature
does not work well with ASan. if ASan is enabled, and we link the objects
with the wrong order, some references will be pointing to the discarded
sections.
to address this issue, we could audit the link command line and inspect
all .o files to make sure they are properly ordered. but this is
non-trivial. as a workaround, in this change, the assert params are not
packed, and sent to the __ceph_assert_fail() overrides which accepts
unpacked params directly, so the COMDAT section is not created.
Jason Dillaman [Mon, 22 Oct 2018 14:44:40 +0000 (10:44 -0400)]
qa/tasks/qemu: use unique clone directory to avoid race with workunit
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if it's parallel tasks, the two clones will clobber each
other.
Fixes: http://tracker.ceph.com/issues/36542 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Kefu Chai [Mon, 22 Oct 2018 06:06:38 +0000 (14:06 +0800)]
cmake,ceph.in: preload libasan if WITH_ASAN
we need to preload libasan.so as the python exectuable is not likely to
be compiled with ASan enabled.
see:
https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso#asan-and-ld_preload
just to ease the use of ASan, for fine-tuned behaviour, use
`ASAN_OPTIONS`.
Kefu Chai [Mon, 22 Oct 2018 05:13:11 +0000 (13:13 +0800)]
cmake: should compile libzstd with -fPIC
otherwise we will have
/usr/bin/ld: libzstd/lib/libzstd.a(error_private.c.o): relocation
R_X86_64_32S against `.rodata' can not be used when making a shared
object; recompile with -fPIC
Kefu Chai [Mon, 22 Oct 2018 04:28:01 +0000 (12:28 +0800)]
cmake: pass cflags to disutils using CC instead of CFLAGS
in python's distutils.ccompiler, linker_exe is composed using CC instead
of LDFLAGS. the latter only effects how it builds (shared) library.
and put CMAKE_C_FLAGS into the cflags for the compiler for building
python C extensions, it's more consistent this way. more importantly,
if we build with ASan enabled, the canary program, a.k.a. rados_dummy.c,
won't link without proper CFLAGS.
without this change, rados.so fails to build with errors like:
/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_stack_free_10'
/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_report_exp_store8'
...
...
clang: error: linker command failed with exit code 1 (use -v to see
invocation)
Link Error: RADOS library not found
make[3]: ***
[src/pybind/rados/CMakeFiles/cython_rados.dir/build.make:57:
src/pybind/rados/CMakeFiles/cython_rados] Error 1
myoungwon oh [Sun, 21 Oct 2018 06:29:09 +0000 (15:29 +0900)]
src/test: fix unordered manifest-unset op
- manifest unset op to foo-chunk object
- remove manifest flag
- commit
- send an ack to a client
- send decrement mesages ("chunk_put") to old chunks (bar-chunk)
Current unit test(ManifestUnset) send "chunk_read" command (to bar-chunk)
in order to see whether chunk's reference count is decreased.
But, as described above, "chunk_read" event can be triggered after a client
(test application) receives an ack. Therefore, there is a corner case
such as bar-chunk (in chunk pool) receives "chunk_read" first instead of "chunk_put"
Reference count model of dedup/tiering is based on false-positive (#24230).
So decreasing reference count is not guaranteed. If reference mismatch occur,
chunk-scrub (this is WIP) will fix it.
One guaranteed thing is that existing manifest flag is removed.
So, the solution of this commit is just re-send unset op, and then
chenk that return value is -EOPNOTSUPP (this means manifest flags is removed).
Sage Weil [Sat, 20 Oct 2018 21:40:22 +0000 (16:40 -0500)]
Merge PR #24184 into master
* refs/pull/24184/head:
mgr/DaemonServer: remove any upmaps on merging PGs
mgr/DaemonServer: prevent merge if either pg is remapped|upmap
mgr/DaemonServer: move pending merge check for more consistent code
qa/suites/rados/thrash*/thrashers/careful.yaml: thrash with mgr controller
mgr/DaemonServer: add option to bypass careful throttling for thrasher
PendingReleaseNotes: note about mgr/balancer/max_misplaced change
mgr/DaemonServer: remove stale/misleading check
mgr/DaemonServer: throttle pgp_num changes based on misplaced %
mgr/DaemonServer: block pg_num decrease(merge) until pgp_num is reduced
mgr/DaemonServer: adjust_pgs(): cosmetic change to debug output
mon/PGMap: add get_recovery_stats()
mgr/balancer: mgr/balancer/max_misplaced -> pg_max_misplaced
pybind/mgr/mgr_module: add get_option()
mgr/DaemonServer: allow pg_num increases that abort pending merges
mon/OSDMonitor: resent pre-nautilus client ops on aborted merge
mon/OSDMonitor: make pgp_num track pg_num more consistently
Reviewed-by: John Spray <john.spray@redhat.com> Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Sage Weil [Mon, 24 Sep 2018 18:53:46 +0000 (13:53 -0500)]
mgr/DaemonServer: remove any upmaps on merging PGs
Remove any pg_upmap[_items] on pgs that are merging to ensure that they
land on the same OSDs.
This is a bit sloppy: we *could* set the source upmap to match the target
upmap (vs potentially moving both PGs to a third location, and/or then
having the balancer move the resulting PG somewhere else again), but for
now assume upmaps are not a common case and Keep It Simple.
Alan Somers [Wed, 4 Oct 2017 22:31:23 +0000 (16:31 -0600)]
blkdev: overhaul API for better portability
* Turn the API into a class so it will work with GoogleMock
* Take file descriptors for all methods, instead of a mix of file
descriptors, path names, and canonical device names. It's more
consistent and it will work better with FreeBSD.
* Split get_device_by_fd into block_device_partition and
block_device_wholedisk
* Remove dead code
Tim Serong [Fri, 19 Oct 2018 08:25:45 +0000 (19:25 +1100)]
mgr/orchestrator: use result property in Completion classes
The _Completion class defined a get_result() method, but the orchestrator
CLI and Rook module both just expected a result property, so let's run with
the property version.