J. Eric Ivancich [Tue, 16 Oct 2018 20:40:03 +0000 (16:40 -0400)]
rgw: move RGWReshardBucket lock to its own separate class
There are other processes beyond resharding that would need to take a
bucket reshard lock (e.g., correcting bucet resharding flags in event
of crash, tools to remove bucket shard information from earlier
versions of ceph). Pulling this logic outside of RGWReshardBucket
allows this code to be re-used.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Fri, 12 Oct 2018 22:07:24 +0000 (18:07 -0400)]
rgw: failed resharding clears resharding status from shard heads
Previously, when resharding failed, we restored the shard status on
the bucket info object. However the status on each of the shards was
left indicating a reshard was underway. This prevented some write
operations from taking place, as they would wait for resharding to
complete. This adds the missing functionality. It also makes the
functionality available to other classes via static functions in
RGWBucketReshard.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Fri, 12 Oct 2018 14:24:32 +0000 (10:24 -0400)]
rgw: change the bucket reshard lock to exclusive-ephemeral
The bucket reshard lock was simply an exclusive lock that existed on
an object solely for the purpose of representing the lock. This is now
changed to exclusvie-ephemeral lock, so as not to leave these objects
behind.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Fri, 12 Oct 2018 14:23:57 +0000 (10:23 -0400)]
cls: add exclusive ephemeral locks that auto-clean
Add a new type of cls lock -- exclusive ephemeral for which the
object only exists to represent the lock and for which the object
should be deleted at unlock. This is to prevent the accumulation of
unneeded objects in the cluster by automatically cleaning them up.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Thu, 27 Sep 2018 17:31:57 +0000 (13:31 -0400)]
rgw: renew resharding locks to prevent expiration
Fix lock expiration problem with resharding. The resharding process
will renew its bucket lock (and logshard lock if necessary) when half
the remaining time is left on the lock. If the lock is expired and
cannot renew the process fails and errors out appropriately.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
cls: add semantics for cls locks to require renewal without expiring
Add ability to *require* renewal of an existing lock in addition
toexisting ability to *allow* renewal of an existing lock. The key
difference is that a MUST_RENEW will fail if the lock has expired
(where a MAY_RENEW) will succeed. This provides calling code with the
ability to verify that a lock is held continually and that it was
never lost/expired.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Sage Weil [Mon, 22 Oct 2018 19:38:48 +0000 (14:38 -0500)]
os/bluestore: fix race between SharedBlobSet::lookup and SharedBlob::put
A B
SharedBlobSet::lookup()
takes lock
nref is not 0
SharedBlob::put()
--nref
returns SharedBlobRef,
++nref
takes cache lock
SharedBlobSet::remove
takes lock
removes
deletes SharedBlob
-> A ends up with a ref to deleted SharedBlob
Fix by verifying that nref is still zero in SharedBlobSet::remove(),
while we are holding the SharedBlobSet::lock. The lock ensures that we
have increased the ref for the lookup before entering remove, so we can
verify that nref is still zero before removing it. If not, we have
raced, and put() bails out and does nothing.
Fixes: http://tracker.ceph.com/issues/36526 Signed-off-by: Sage Weil <sage@redhat.com>
Kefu Chai [Mon, 22 Oct 2018 08:41:45 +0000 (16:41 +0800)]
include/ceph_assert.h: do not pack assert params if WITH_ASAN
we pack the asset() params for smaller code size, but this creates a
inlined `assert_data_ctx` instance for every compilation unit which
call ceph_assert() defined in .h .
__PRETTY_FUNCTION__ is likely to be referenced by `assert_data_ctx`
sections which are included by different compiled object files. if the
ceph_assert() call is used by header file, then there will be multiple
`assert_data_ctx` sections sharing the same identifier. these sections are
defined as "COMDAT" group sections, i.e. common data sections. when linker
see multiple COMDAT sections with the same identifer, it will simply discard
the duplicated ones, and only keep a single copy of them. without enabling
ASan, GCC can always handle this problem just fine. but the dedup feature
does not work well with ASan. if ASan is enabled, and we link the objects
with the wrong order, some references will be pointing to the discarded
sections.
to address this issue, we could audit the link command line and inspect
all .o files to make sure they are properly ordered. but this is
non-trivial. as a workaround, in this change, the assert params are not
packed, and sent to the __ceph_assert_fail() overrides which accepts
unpacked params directly, so the COMDAT section is not created.
Jason Dillaman [Mon, 22 Oct 2018 14:44:40 +0000 (10:44 -0400)]
qa/tasks/qemu: use unique clone directory to avoid race with workunit
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if it's parallel tasks, the two clones will clobber each
other.
Fixes: http://tracker.ceph.com/issues/36542 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Kefu Chai [Mon, 22 Oct 2018 06:06:38 +0000 (14:06 +0800)]
cmake,ceph.in: preload libasan if WITH_ASAN
we need to preload libasan.so as the python exectuable is not likely to
be compiled with ASan enabled.
see:
https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso#asan-and-ld_preload
just to ease the use of ASan, for fine-tuned behaviour, use
`ASAN_OPTIONS`.
Kefu Chai [Mon, 22 Oct 2018 05:13:11 +0000 (13:13 +0800)]
cmake: should compile libzstd with -fPIC
otherwise we will have
/usr/bin/ld: libzstd/lib/libzstd.a(error_private.c.o): relocation
R_X86_64_32S against `.rodata' can not be used when making a shared
object; recompile with -fPIC
Kefu Chai [Mon, 22 Oct 2018 04:28:01 +0000 (12:28 +0800)]
cmake: pass cflags to disutils using CC instead of CFLAGS
in python's distutils.ccompiler, linker_exe is composed using CC instead
of LDFLAGS. the latter only effects how it builds (shared) library.
and put CMAKE_C_FLAGS into the cflags for the compiler for building
python C extensions, it's more consistent this way. more importantly,
if we build with ASan enabled, the canary program, a.k.a. rados_dummy.c,
won't link without proper CFLAGS.
without this change, rados.so fails to build with errors like:
/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_stack_free_10'
/usr/bin/ld: /var/ssd/ceph/build/lib/librados.so: undefined reference to
`__asan_report_exp_store8'
...
...
clang: error: linker command failed with exit code 1 (use -v to see
invocation)
Link Error: RADOS library not found
make[3]: ***
[src/pybind/rados/CMakeFiles/cython_rados.dir/build.make:57:
src/pybind/rados/CMakeFiles/cython_rados] Error 1
myoungwon oh [Sun, 21 Oct 2018 06:29:09 +0000 (15:29 +0900)]
src/test: fix unordered manifest-unset op
- manifest unset op to foo-chunk object
- remove manifest flag
- commit
- send an ack to a client
- send decrement mesages ("chunk_put") to old chunks (bar-chunk)
Current unit test(ManifestUnset) send "chunk_read" command (to bar-chunk)
in order to see whether chunk's reference count is decreased.
But, as described above, "chunk_read" event can be triggered after a client
(test application) receives an ack. Therefore, there is a corner case
such as bar-chunk (in chunk pool) receives "chunk_read" first instead of "chunk_put"
Reference count model of dedup/tiering is based on false-positive (#24230).
So decreasing reference count is not guaranteed. If reference mismatch occur,
chunk-scrub (this is WIP) will fix it.
One guaranteed thing is that existing manifest flag is removed.
So, the solution of this commit is just re-send unset op, and then
chenk that return value is -EOPNOTSUPP (this means manifest flags is removed).
Sage Weil [Sat, 20 Oct 2018 21:40:22 +0000 (16:40 -0500)]
Merge PR #24184 into master
* refs/pull/24184/head:
mgr/DaemonServer: remove any upmaps on merging PGs
mgr/DaemonServer: prevent merge if either pg is remapped|upmap
mgr/DaemonServer: move pending merge check for more consistent code
qa/suites/rados/thrash*/thrashers/careful.yaml: thrash with mgr controller
mgr/DaemonServer: add option to bypass careful throttling for thrasher
PendingReleaseNotes: note about mgr/balancer/max_misplaced change
mgr/DaemonServer: remove stale/misleading check
mgr/DaemonServer: throttle pgp_num changes based on misplaced %
mgr/DaemonServer: block pg_num decrease(merge) until pgp_num is reduced
mgr/DaemonServer: adjust_pgs(): cosmetic change to debug output
mon/PGMap: add get_recovery_stats()
mgr/balancer: mgr/balancer/max_misplaced -> pg_max_misplaced
pybind/mgr/mgr_module: add get_option()
mgr/DaemonServer: allow pg_num increases that abort pending merges
mon/OSDMonitor: resent pre-nautilus client ops on aborted merge
mon/OSDMonitor: make pgp_num track pg_num more consistently
Reviewed-by: John Spray <john.spray@redhat.com> Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Sage Weil [Mon, 24 Sep 2018 18:53:46 +0000 (13:53 -0500)]
mgr/DaemonServer: remove any upmaps on merging PGs
Remove any pg_upmap[_items] on pgs that are merging to ensure that they
land on the same OSDs.
This is a bit sloppy: we *could* set the source upmap to match the target
upmap (vs potentially moving both PGs to a third location, and/or then
having the balancer move the resulting PG somewhere else again), but for
now assume upmaps are not a common case and Keep It Simple.
Alan Somers [Wed, 4 Oct 2017 22:31:23 +0000 (16:31 -0600)]
blkdev: overhaul API for better portability
* Turn the API into a class so it will work with GoogleMock
* Take file descriptors for all methods, instead of a mix of file
descriptors, path names, and canonical device names. It's more
consistent and it will work better with FreeBSD.
* Split get_device_by_fd into block_device_partition and
block_device_wholedisk
* Remove dead code