Jason Dillaman [Mon, 24 Sep 2018 18:45:09 +0000 (14:45 -0400)]
librbd: do not invalidate object map if update races with copyup
The copyup state machine needs to iterate over all object maps to update
the existence for the object. If a snapshot is being removed concurrently,
it's possible to invalidate the object map for the image.
Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 17 Sep 2018 20:20:57 +0000 (16:20 -0400)]
qa/workunits/rbd: wait max 2 hrs for all stress images to sync
The rbd-mirror fsx stress test would sporadically fail because overloaded
clusters caused very slow sync times. Attempt to wait for all
images to be replicated before proceeding with the comparison.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
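The bounded wait described above can be sketched as a polling loop with a
deadline. This is a minimal illustration, not the workunit's actual shell
code: the names wait_for_all and is_synced, the 10-second poll interval, and
the injectable clock are assumptions made for the example. Only the two-hour
ceiling comes from the commit subject.

```python
import time


def wait_for_all(images, is_synced, timeout_s=2 * 60 * 60, interval_s=10,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until every image is synced or the deadline passes.

    is_synced is a hypothetical callback (image name -> bool).
    Returns the set of images still pending; empty means success.
    """
    deadline = clock() + timeout_s
    pending = set(images)
    while pending:
        # Drop every image that reports as synced on this pass.
        pending = {img for img in pending if not is_synced(img)}
        if not pending or clock() >= deadline:
            break
        sleep(interval_s)
    return pending
```

A caller would treat a non-empty result as a test failure and report the
images that never finished syncing.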
Patrick Donnelly [Mon, 24 Sep 2018 21:46:14 +0000 (14:46 -0700)]
Merge PR #23187 into master
* refs/pull/23187/head:
test: make rank argument mandatory when running journal_tool
cephfs-journal-tool: make "--rank" argument mandatory
cephfs-journal-tool: pass local arg vector for Journal actions
cephfs-journal-tool: dump to per rank output file wherever necessary
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Mon, 24 Sep 2018 21:41:59 +0000 (14:41 -0700)]
Merge PR #23530 into master
* refs/pull/23530/head:
qa/vstart_runner: fix daemons list
PendingReleaseNotes: note multifs support in libcephfs
test/cephfs: add pybind test for mount_root
pybind/cephfs: enable passing filesystem name to mount
libcephfs: add ceph_select_filesystem
common: add doc strings to client_mds_namespace
client: allow passing fs name to mount()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Conflicts:
PendingReleaseNotes
Andrew Schoen [Mon, 24 Sep 2018 21:41:37 +0000 (16:41 -0500)]
ceph-volume: fail fast during tests
This also rsyncs the ceph-volume code to the testing vms before
a ceph.conf is generated, because generating it now needs ceph-volume
to figure out the number of osds when you're using 'lvm batch'.
Jason Dillaman [Wed, 16 May 2018 13:26:32 +0000 (09:26 -0400)]
librbd: create image should return unique error code on id collision
The image id is composed of the librados global instance id and a random
number. For long-lived clients that create multiple images (basically
only the rbd-mirror daemon), it's more likely to hit a collision.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
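Why a distinct error on collision is useful can be shown with a small model.
Everything here is hypothetical: the id layout (hex instance id plus a 32-bit
random suffix), the ImageIdCollision exception standing in for the unique
error code, and the retry loop are assumptions for illustration, not librbd's
actual API or behavior.

```python
class ImageIdCollision(Exception):
    """Hypothetical stand-in for the distinct error code on id collision."""


def make_image_id(instance_id, rng):
    # Assumed layout: hex instance id followed by a 32-bit random hex suffix.
    return f"{instance_id:x}{rng.getrandbits(32):08x}"


def create_image(existing_ids, instance_id, rng):
    """Register a new id, signalling a collision with a dedicated error."""
    image_id = make_image_id(instance_id, rng)
    if image_id in existing_ids:
        raise ImageIdCollision(image_id)
    existing_ids.add(image_id)
    return image_id


def create_image_with_retry(existing_ids, instance_id, rng, attempts=3):
    """Retry only on collisions; any other error propagates unchanged."""
    for _ in range(attempts):
        try:
            return create_image(existing_ids, instance_id, rng)
        except ImageIdCollision:
            continue
    raise ImageIdCollision(f"gave up after {attempts} attempts")
```

Because the collision is reported separately from other failures, a
long-lived caller can tell "pick another id and try again" apart from a real
error. Passing random.Random() as rng gives the usual random behavior.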
Sage Weil [Sat, 22 Sep 2018 16:01:15 +0000 (11:01 -0500)]
osd/PG: fix not_ready_to_merge behavior for merge target
Track the *target* not being ready to merge independently from the source,
so that we do not have two PGs fighting over the state of
not_ready_to_merge_source, and so that the map reflects the *source*
PG's readiness only.
Sage Weil [Mon, 24 Sep 2018 03:43:08 +0000 (22:43 -0500)]
qa/suites/rados/thrash-old-clients/workloads/rbd_cls.yaml: skip parents test
We can't (easily) build updated hammer packages, but all this sh script does
is run this one test binary with --gtest_filter arguments, so just do
it directly and skip the test explicitly here. (Newer versions of the .sh
understand the environment variable but the hammer version does not.)
Fixes: http://tracker.ceph.com/issues/36104
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Sun, 23 Sep 2018 16:17:03 +0000 (11:17 -0500)]
Merge PR #24133 into master
* refs/pull/24133/head:
common/Finisher: convert to ceph::mutex etc
common/ceph_mutex: ceph::{mutex,condition_variable,lock_guard}
common/mutex_debug: take const char * to ctor, and require a name
common/mutex_debug: add lockdep support for recursive_mutex_debug
common/mutex_debug: fix whitespace
common/mutex_debug: refactor to remove intermediate class
common/lockdep: add recursive flag for _will_lock
do_cmake.sh: default to Debug build
.gitignore: ignore build.*/
Sage Weil [Sat, 22 Sep 2018 15:42:20 +0000 (10:42 -0500)]
osd/ECBackend: suppress 'Error -2 reading object' if EC fast reads
When fast reads are enabled, it's possible for the ordering of a shard
read to not be enforced with respect to writes that come after because
the read completes on the primary before all shards reply. This can lead
to an ENOENT on the non-primary, and an ERR message in the cluster log,
even though everything is fine. (The reply will go back to the primary
with the error but it will be ignored since the read has completed.)
Suppress the error message so we don't see these ERR messages in the
cluster log during the normal course of events.
Fixes: http://tracker.ceph.com/issues/26972
Signed-off-by: Sage Weil <sage@redhat.com>
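The suppression rule can be modeled as a single predicate. This is only a
sketch of the decision described in the message: the function name and its
parameters are hypothetical, and the real change lives in the C++ ECBackend
code.

```python
import errno


def should_log_read_error(err, fast_read):
    """Decide whether a failed shard read deserves a cluster-log ERR line.

    err is a negative errno value, as in 'Error -2 reading object'.
    With fast reads, ENOENT on a shard can be a harmless ordering artifact.
    """
    if fast_read and err == -errno.ENOENT:
        return False
    return True
```

Other errors, and ENOENT without fast reads, are still reported in this
model.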
If CEPH_DEBUG_MUTEX is defined, use the [recursive_]mutex_debug classes
that implement lockdep and a bunch of other random debug checks. Also
typedef ceph::condition_variable to ceph::condition_variable_debug, which
adds additional assertions and debug checks.
If CEPH_DEBUG_MUTEX is not defined, then use the bare-bones C++ std::mutex
primitives... or as close as we can get to them.
Since the [recursive_]mutex_debug classes take a string argument for the
lockdep piece, define factory functions ceph::make_[recursive_]mutex that
either pass arguments to the debug implementations or toss them out.
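The factory idea can be illustrated with a Python analogy. The real code is
C++ and makes the choice at compile time through CEPH_DEBUG_MUTEX; this
sketch chooses at run time instead. DebugMutex, make_mutex, and the
DEBUG_MUTEX environment variable are names invented for the example.

```python
import os
import threading


class DebugMutex:
    """Named lock with an ownership check; stand-in for a debug mutex."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        # Catch self-deadlock: the same thread locking twice.
        assert self._owner != threading.get_ident(), \
            f"{self.name}: recursive lock"
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()
        return False


def make_mutex(name, debug=None):
    """Pass the name to the debug lock, or toss it for a plain lock."""
    if debug is None:
        debug = os.environ.get("DEBUG_MUTEX") == "1"
    return DebugMutex(name) if debug else threading.Lock()
```

Call sites always supply a name, so switching between the two
implementations needs no changes where the lock is created or used.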
Patrick Nawracay [Mon, 17 Sep 2018 07:25:34 +0000 (09:25 +0200)]
mgr/dashboard: Fix for some dashboard timing issues
Specifically fixes the recurring `test_osd.py` error on the
`test_scrub` method. But this change should also prevent other issues of
the same kind. Issues of the "same kind" are issues which occur due to
tests which do not immediately result in a clean cluster status and
aren't manually programmed to wait for it.
Fixes: http://tracker.ceph.com/issues/36107
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
Sage Weil [Fri, 21 Sep 2018 13:21:53 +0000 (08:21 -0500)]
Merge PR #23985 into master
* refs/pull/23985/head:
ceph-objectstore-tool: add back pool dne check
qa/suites/rados/singleton/reg11184: remove old test
ceph-objectstore-tool: import pg at original epoch
osd: handle null pg slot on startup
ceph-objectstore-tool: drop support for ancient export files
osd: avoid dropping osd_lock when pg osdmaps are not laggy
qa/standalone/osd/pg-merge.sh: add merge vs pg import test
Sage Weil [Fri, 21 Sep 2018 13:21:33 +0000 (08:21 -0500)]
Merge PR #24064 into master
* refs/pull/24064/head:
osd: simplify init of fabricated pg
osd/PG: inherit pg history from merge source, if necessary
osd/osd_types: increasing pg_num_pending is also an interval change
osd: cancel pg merge if PGs are undersized
mon/OSDMonitor: handle ready_to_merge message that cancels the merge
osd/PG: only signal ready_to_merge if we have all replicas
osd/PG: move all mark_clean-ish activity into try_mark_clean()
osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
osd: send last_epoch_clean when indicating PG is ready to merge
osd/osd_types: rename pg_num_pending_dec_epoch -> pg_num_dec_last_epoch_clean
osd,mon: stop setting pg_num_pending_dec_epoch