Patrick Donnelly [Mon, 24 Sep 2018 21:41:59 +0000 (14:41 -0700)]
Merge PR #23530 into master
* refs/pull/23530/head:
qa/vstart_runner: fix daemons list
PendingReleaseNotes: note multifs support in libcephfs
test/cephfs: add pybind test for mount_root
pybind/cephfs: enable passing filesystem name to mount
libcephfs: add ceph_select_filesystem
common: add doc strings to client_mds_namespace
client: allow passing fs name to mount()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Conflicts:
PendingReleaseNotes
Sage Weil [Mon, 24 Sep 2018 03:43:08 +0000 (22:43 -0500)]
qa/suites/rados/thrash-old-clients/workloads/rbd_cls.yaml: skip parents test
We can't (easily) build updated hammer packages, but all this sh script does
is run this one test binary with --gtest_filter arguments, so just do
it directly and skip the test explicitly here. (Newer versions of the .sh
understand the environment variable but the hammer version does not.)
Fixes: http://tracker.ceph.com/issues/36104
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Sun, 23 Sep 2018 16:17:03 +0000 (11:17 -0500)]
Merge PR #24133 into master
* refs/pull/24133/head:
common/Finisher: convert to ceph::mutex etc
common/ceph_mutex: ceph::{mutex,condition_variable,lock_guard}
common/mutex_debug: take const char * to ctor, and require a name
common/mutex_debug: add lockdep support for recursive_mutex_debug
common/mutex_debug: fix whitespace
common/mutex_debug: refactor to remove intermediate class
common/lockdep: add recursive flag for _will_lock
do_cmake.sh: default to Debug build
.gitignore: ignore build.*/
Sage Weil [Sat, 22 Sep 2018 15:42:20 +0000 (10:42 -0500)]
osd/ECBackend: suppress 'Error -2 reading object' if EC fast reads
When fast reads are enabled, it's possible for the ordering of a shard
read to not be enforced with respect to writes that come after because
the read completes on the primary before all shards reply. This can lead
to an ENOENT on the non-primary, and an ERR message in the cluster log,
even though everything is fine. (The reply will go back to the primary
with the error but it will be ignored since the read has completed.)
Suppress the error message so we don't see these ERR messages in the
cluster log during the normal course of events.
Fixes: http://tracker.ceph.com/issues/26972
Signed-off-by: Sage Weil <sage@redhat.com>
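As a rough standalone sketch of the suppression idea (illustrative only; the names and structure are not the actual ECBackend code): with fast reads enabled, -ENOENT from a lagging shard is expected noise and should not be logged at ERR level.
    #include <cerrno>
    #include <iostream>
    #include <string>

    // Illustrative helper, not ECBackend: decide whether a shard read error
    // deserves a cluster-log ERR.  With fast reads, -ENOENT on a non-primary
    // shard can occur during normal operation and is ignored by the primary.
    void maybe_log_shard_read_error(int err, bool fast_read,
                                    const std::string& oid) {
      if (err == -ENOENT && fast_read) {
        return;  // expected race with a later write; stay quiet
      }
      if (err < 0) {
        std::cerr << "Error " << err << " reading object " << oid << std::endl;
      }
    }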
If CEPH_DEBUG_MUTEX is defined, use the [recursive_]mutex_debug classes
that implement lockdep and a bunch of other random debug checks. Also
typedef ceph::condition_variable to ceph::condition_variable_debug, which
adds additional assertions and debug checks.
If CEPH_DEBUG_MUTEX is not defined, then use the bare-bones C++ std::mutex
primitives... or as close as we can get to them.
Since the [recursive_]mutex_debug classes take a string argument for the
lockdep piece, define factory functions ceph::make_[recursive_]mutex that
either pass arguments to the debug implementations or toss them out.
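A simplified sketch of the factory-function pattern described above (assumption: this is not the real common/ceph_mutex.h, and the debug type name is only a stand-in):
    #include <mutex>
    #include <utility>

    #ifdef CEPH_DEBUG_MUTEX
    // Debug build: forward the lockdep name (and any other args) to the
    // debug implementation.  'mutex_debug' here stands in for the real class.
    namespace ceph {
      using mutex = mutex_debug;
      template <typename... Args>
      mutex make_mutex(Args&&... args) {
        return mutex(std::forward<Args>(args)...);
      }
    }
    #else
    // Release build: plain std::mutex; the name and other args are tossed out.
    // (Returning the prvalue relies on C++17 guaranteed copy elision.)
    namespace ceph {
      using mutex = std::mutex;
      template <typename... Args>
      mutex make_mutex(Args&&...) {
        return {};
      }
    }
    #endif

    // A call site might then look like:
    //   ceph::mutex lock = ceph::make_mutex("Finisher::lock");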
Patrick Nawracay [Mon, 17 Sep 2018 07:25:34 +0000 (09:25 +0200)]
mgr/dashboard: Fix for some dashboard timing issues
Specifically fixes the recurring `test_osd.py` error on the `test_scrub`
method, but this change should also prevent other issues of the same kind.
Issues of the "same kind" are issues which occur due to tests that do not
immediately result in a clean cluster status and aren't manually programmed
to wait for it.
Fixes: http://tracker.ceph.com/issues/36107
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
Sage Weil [Fri, 21 Sep 2018 13:21:53 +0000 (08:21 -0500)]
Merge PR #23985 into master
* refs/pull/23985/head:
ceph-objectstore-tool: add back pool dne check
qa/suites/rados/singleton/reg11184: remove old test
ceph-objectstore-tool: import pg at original epoch
osd: handle null pg slot on startup
ceph-objectstore-tool: drop support for ancient export files
osd: avoid dropping osd_lock when pg osdmaps are not laggy
qa/standalone/osd/pg-merge.sh: add merge vs pg import test
Sage Weil [Fri, 21 Sep 2018 13:21:33 +0000 (08:21 -0500)]
Merge PR #24064 into master
* refs/pull/24064/head:
osd: simplify init of fabricated pg
osd/PG: inherit pg history from merge source, if necessary
osd/osd_types: increasing pg_num_pending is also an interval change
osd: cancel pg merge if PGs are undersized
mon/OSDMonitor: handle ready_to_merge message that cancels the merge
osd/PG: only signal ready_to_merge if we have all replicas
osd/PG: move all mark_clean-ish activity into try_mark_clean()
osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
osd: send last_epoch_clean when indicating PG is ready to merge
osd/osd_types: rename pg_num_pending_dec_epoch -> pg_num_dec_last_epoch_clean
osd,mon: stop setting pg_num_pending_dec_epoch
/ceph/src/osd/PG.cc: In member function 'void PG::choose_async_recovery_ec(const std::map<pg_shard_t, pg_info_t>&, const pg_info_t&, std::vector<int>*, std::set<pg_shard_t>*) const':
/ceph/src/osd/PG.cc:1572:32: warning: comparison of integer expressions of different signedness: 'long int' and 'long unsigned int' [-Wsign-compare]
if (approx_missing_objects > cct->_conf.get_val<uint64_t>(
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"osd_async_recovery_min_cost")) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/ceph/src/osd/PG.cc: In member function 'void PG::choose_async_recovery_replicated(const std::map<pg_shard_t, pg_info_t>&, const pg_info_t&, std::vector<int>*, std::set<pg_shard_t>*) const':
/ceph/src/osd/PG.cc:1625:33: warning: comparison of integer expressions of different signedness: 'long int' and 'long unsigned int' [-Wsign-compare]
if (approx_missing_objects > cct->_conf.get_val<uint64_t>(
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"osd_async_recovery_min_cost")) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
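For context, a standalone illustration of the warning and one way to silence it (assumption: the actual PG.cc fix may instead change the variable's type rather than cast at the comparison):
    #include <cstdint>
    #include <iostream>

    int main() {
      int64_t approx_missing_objects = 42;  // 'long int' in the warning above
      uint64_t min_cost = 100;              // 'long unsigned int', like get_val<uint64_t>()

      // if (approx_missing_objects > min_cost)   // triggers -Wsign-compare
      if (static_cast<uint64_t>(approx_missing_objects) > min_cost) {
        std::cout << "async recovery candidate\n";  // both sides now unsigned
      }
      return 0;
    }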
Since 0bd2546eaca72ed0122a9c2648df4bef05b0d5d2, we check the pool id
of the object when performing fsck to ensure we are looking at the right
collection, but the test is still using the pool id set by the
constructor of hobject_t. So all objects we created in that test belong
to POOL_META, while the collection is created with the pool id of
`555`; hence the test fails.
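A toy illustration of the mismatch (the struct names and the -1 metadata pool id are stand-ins, not the real hobject_t or fsck code):
    #include <cstdint>
    #include <iostream>

    // Stand-in types, just to show the shape of the fsck check added in 0bd2546e.
    struct Obj  { int64_t pool; };
    struct Coll { int64_t pool; };

    // The object must live in the same pool as its collection.
    bool pool_matches(const Coll& c, const Obj& o) { return o.pool == c.pool; }

    int main() {
      Coll coll{555};  // collection created with pool id 555, as in the test
      Obj  obj{-1};    // object left in the metadata pool by the hobject_t default
      std::cout << (pool_matches(coll, obj) ? "ok" : "fsck mismatch") << "\n";
      // The fix is for the test to create its objects with pool id 555 as well.
      return 0;
    }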
Sage Weil [Mon, 10 Sep 2018 18:24:19 +0000 (13:24 -0500)]
ceph-objectstore-tool: import pg at original epoch
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.
- In mimic, as of 2347ecb9614b0cd4cd9eae1d67b03119cc7ad18e, we brought the
PG up to date while updating past_intervals. (At the same time we removed
the OSD's parallel past_intervals regeneration.)
The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges. Splits are
somewhat easier (until now we have allowed partial import of a PG into a split
child), but merges are not so easy.
This patch changes it so we import the PG and leave the pg_epoch matching
the import file. The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.
We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.
Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 20 Sep 2018 16:52:27 +0000 (11:52 -0500)]
mon/MonClient: fix wait for monmap+config in non-cephx case
In the auth_none case, we were exiting the get_monmap_and_config() loop
early, before we got a monmap, because the default constructed monmap
did not have the mimic feature. Make sure we wait for both the monmap
and config.
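The shape of the fix, sketched with standard primitives (assumption: this is not the actual MonClient code, just the wait-for-both idea):
    #include <condition_variable>
    #include <mutex>

    // Sketch only: the wait predicate requires *both* the monmap and the
    // config, so a default-constructed monmap can no longer end the wait early.
    struct MonmapConfigWaiter {
      std::mutex lock;
      std::condition_variable cond;
      bool have_monmap = false;
      bool have_config = false;

      void wait_for_monmap_and_config() {
        std::unique_lock<std::mutex> l(lock);
        cond.wait(l, [this] { return have_monmap && have_config; });
      }

      void got_monmap() {
        { std::lock_guard<std::mutex> g(lock); have_monmap = true; }
        cond.notify_all();
      }

      void got_config() {
        { std::lock_guard<std::mutex> g(lock); have_config = true; }
        cond.notify_all();
      }
    };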
Remove the "ceph_assert" statements and instead bubble any potential
error code up to the caller. The object map state machines should
attempt to return 0 upon failure unless they were unable to flag the
object map as invalid.
Fixes: http://tracker.ceph.com/issues/36074
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
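A hypothetical sketch of the pattern (not the librbd state-machine code): swallow the original error only if the object map was successfully flagged invalid, otherwise propagate it.
    // Sketch only.  'r' is the result of the failed update; 'flagged_invalid'
    // reports whether we managed to mark the object map invalid afterwards.
    int finish_object_map_update(int r, bool flagged_invalid) {
      if (r < 0 && flagged_invalid) {
        return 0;  // failure recorded in the object map; caller can continue
      }
      return r;    // success, or we couldn't even flag the map invalid
    }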
Sage Weil [Wed, 19 Sep 2018 17:35:53 +0000 (12:35 -0500)]
osd: simplify init of fabricated pg
This was similar (but different) to the logic in PG::merge_from(). Do not
do any initialization here, and instead rely on merge_from() to do the
right thing.
Sage Weil [Mon, 17 Sep 2018 17:51:41 +0000 (12:51 -0500)]
osd/PG: inherit pg history from merge source, if necessary
Having an accurate(ish) same_interval_since is important for making sure
any subsequent PastIntervals we add are consistent with the
last_epoch_clean value that the bounds are tested against. Otherwise we
might have lec 100 and merge in 150, an interval change gives us a pi of
[150,something) and we fail the bounds check.
Sage Weil [Sat, 15 Sep 2018 21:36:55 +0000 (16:36 -0500)]
osd: cancel pg merge if PGs are undersized
If the PG is undersized, cancel the PG merge attempt early. Undersized is
a bad thing because it makes merge more dangerous.
It's also bad because the PG won't be fully clean when it finishes
peering, which means last_epoch_clean can be something far in the past,
and past_intervals won't be empty. Since we also take the past_intervals
from the source PG, we want to be confident that it is valid. It *should*
match up with the target PG since they should have mapped to the same
OSDs since they were both clean at the ReadyToMerge point--in fact, they
should both be empty. If a PG mapping change snuck in such that they did
map somewhere else, though, the same set of mapping changes will have
applied to both the source and target, so it should be safe.
(It would be better if the mon rejected the ReadyToMerge when the
mapping with the latest OSDMap has changed since the message was sent.
If we do that the situation is even better, but this change is still
appropriate.)
Sage Weil [Sat, 15 Sep 2018 19:31:41 +0000 (14:31 -0500)]
osd/PG: move all mark_clean-ish activity into try_mark_clean()
Keep it all in one place (try_mark_clean()). The key behavioral change
is that we update last_epoch_clean and last_epoch_started when we are
peered too, not only when we are active.
Sage Weil [Wed, 12 Sep 2018 20:02:13 +0000 (15:02 -0500)]
osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
If we are fabricating the pg history values, we need something that is
reasonably valid, but that won't screw up peering of the PG by indicating
that the PG has peered at some point later than when it really has.
Otherwise we can end up in a situation where everyone thinks there is a
newer pg info out there that doesn't actually exist, and the PG will end
up as incomplete.