Patrick Nawracay [Mon, 17 Sep 2018 07:25:34 +0000 (09:25 +0200)]
mgr/dashboard: Fix for some dashboard timing issues
Specifically fixes the recurring `test_osd.py` error in the
`test_scrub` method. This change should also prevent other issues of
the same kind, i.e. issues that occur because some tests do not
immediately result in a clean cluster status and are not explicitly
programmed to wait for it.
Fixes: http://tracker.ceph.com/issues/36107
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
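This is a minimal sketch of the kind of wait a test needs after an action that leaves the cluster temporarily unhealthy. It polls until the health status is HEALTH_OK or a timeout expires. `wait_for_health_ok` and the `get_health` callable are hypothetical names, and the actual fix in the dashboard QA code may be structured differently.

```python
import time
from typing import Callable


def wait_for_health_ok(get_health: Callable[[], str],
                       timeout: float = 60.0,
                       interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_health() until it reports HEALTH_OK or the timeout expires.

    get_health is a hypothetical callable that returns the cluster health
    status string (e.g. "HEALTH_OK", "HEALTH_WARN").
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_health() == "HEALTH_OK":
            return True
        if time.monotonic() >= deadline:
            # Give up and let the caller fail the test with a clear message.
            return False
        sleep(interval)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.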
Sage Weil [Fri, 21 Sep 2018 13:21:53 +0000 (08:21 -0500)]
Merge PR #23985 into master
* refs/pull/23985/head:
ceph-objectstore-tool: add back pool dne check
qa/suites/rados/singleton/reg11184: remove old test
ceph-objectstore-tool: import pg at original epoch
osd: handle null pg slot on startup
ceph-objectstore-tool: drop support for ancient export files
osd: avoid dropping osd_lock when pg osdmaps are not laggy
qa/standalone/osd/pg-merge.sh: add merge vs pg import test
Sage Weil [Fri, 21 Sep 2018 13:21:33 +0000 (08:21 -0500)]
Merge PR #24064 into master
* refs/pull/24064/head:
osd: simplify init of fabricated pg
osd/PG: inherit pg history from merge source, if necessary
osd/osd_types: increasing pg_num_pending is also an interval change
osd: cancel pg merge if PGs are undersized
mon/OSDMonitor: handle ready_to_merge message that cancels the merge
osd/PG: only signal ready_to_merge if we have all replicas
osd/PG: move all mark_clean-ish activity into try_mark_clean()
osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
osd: send last_epoch_clean when indicating PG is ready to merge
osd/osd_types: rename pg_num_pending_dec_epoch -> pg_num_dec_last_epoch_clean
osd,mon: stop setting pg_num_pending_dec_epoch
Sage Weil [Mon, 10 Sep 2018 18:24:19 +0000 (13:24 -0500)]
ceph-objectstore-tool: import pg at original epoch
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.
- In mimic, as of 2347ecb9614b0cd4cd9eae1d67b03119cc7ad18e, we brought the
PG up to date while updating past_intervals. (At the same time we removed
the OSD's parallel past_intervals regeneration.)
The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges. Splits are
somewhat easier (until now we enable partial import of a PG into a split
child), but merges are not so easy.
This patch changes it so we import the PG and leave the pg_epoch matching
the import file. The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.
We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.
Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
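The collision check described above can be sketched under two simplifying assumptions. First, both pg_num values are powers of two, so a seed maps to `seed % pg_num` (a simplification of Ceph's stable_mod). Second, every PG already on the OSD is at the current pg_num. `related_seeds` and `import_collides` are hypothetical helpers, not the ceph-objectstore-tool code.

```python
from typing import Iterable, Set


def related_seeds(seed: int, old_pg_num: int, new_pg_num: int) -> Set[int]:
    """Seeds at new_pg_num that a PG with `seed` at old_pg_num turns into.

    Assumes power-of-two pg_num values, so placement is seed % pg_num.
    """
    if new_pg_num >= old_pg_num:
        # Split (or unchanged): every seed that folds back onto ours.
        return {s for s in range(new_pg_num) if s % old_pg_num == seed}
    # Merge: we fold into a single parent seed.
    return {seed % new_pg_num}


def import_collides(seed: int, export_pg_num: int,
                    existing_seeds: Iterable[int], current_pg_num: int) -> bool:
    """True if importing would collide with a PG already on the OSD,
    either a child the import would split into or a parent it would
    merge into (hypothetical simplification)."""
    return bool(related_seeds(seed, export_pg_num, current_pg_num)
                & set(existing_seeds))
```

For example, importing seed 1 exported at pg_num 4 into a pool now at pg_num 8 collides with an existing seed 5 (a split child). Importing seed 5 from pg_num 8 into a pool at pg_num 4 collides with seed 1 (the merge parent).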
Remove the "ceph_assert" statements and instead bubble any potential
error code up to the caller. The object map state machines should
attempt to return a 0 upon failure unless it was unable to flag the
object map as invalid.
Fixes: http://tracker.ceph.com/issues/36074
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
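Here is a hypothetical sketch of the error-propagation rule described above, using the negative-errno convention. A failed update first tries to flag the object map invalid. It returns 0 if that succeeds and bubbles the error up only if flagging fails. The function and callback names are assumptions, not librbd's state machine API.

```python
import errno
from typing import Callable


def finish_update(update_result: int,
                  invalidate_object_map: Callable[[], int]) -> int:
    """Return code for a hypothetical object map update step.

    Instead of asserting on failure, try to flag the object map invalid.
    Return values follow the negative-errno convention.
    """
    if update_result == 0:
        return 0
    r = invalidate_object_map()
    if r < 0:
        # Could not even flag the map invalid: the caller must know.
        return r
    # The map is flagged invalid, so the failed update is safely absorbed.
    return 0
```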
Sage Weil [Wed, 19 Sep 2018 17:35:53 +0000 (12:35 -0500)]
osd: simplify init of fabricated pg
This was similar to (but different from) the logic in PG::merge_from(). Do not
do any initialization here, and instead rely on merge_from() to do the
right thing.
Sage Weil [Mon, 17 Sep 2018 17:51:41 +0000 (12:51 -0500)]
osd/PG: inherit pg history from merge source, if necessary
Having an accurate(ish) same_interval_since is important for making sure
any subsequent PastIntervals we add are consistent with the
last_epoch_clean value that the bounds are tested against. Otherwise we
might have lec 100 and merge at epoch 150; a subsequent interval change then gives
us a pi of [150,something), and we fail the bounds check.
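The following is a simplified model of the bounds problem, not Ceph's PastIntervals implementation. Each interval change records [same_interval_since, change_epoch). The check requires the recorded intervals to reach back to last_epoch_clean and to end at the current same_interval_since. Both function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # [first, last) epochs


def add_interval_change(past: List[Interval], same_interval_since: int,
                        change_epoch: int) -> int:
    """Record the interval that just ended; return the new same_interval_since."""
    past.append((same_interval_since, change_epoch))
    return change_epoch


def bounds_ok(past: List[Interval], last_epoch_clean: int,
              same_interval_since: int) -> bool:
    """Simplified check: past intervals must reach back to last_epoch_clean
    and run up to the current same_interval_since."""
    if not past:
        # No intervals is only valid if nothing needs covering.
        return last_epoch_clean >= same_interval_since
    return past[0][0] <= last_epoch_clean and past[-1][1] == same_interval_since
```

Suppose lec is 100 and same_interval_since is taken from the merge epoch 150. An interval change at 160 then records [150,160), which does not reach back to 100, so the check fails. If the history inherited from the source instead has same_interval_since 100 (an illustrative value), the change records [100,160) and passes.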
Sage Weil [Sat, 15 Sep 2018 21:36:55 +0000 (16:36 -0500)]
osd: cancel pg merge if PGs are undersized
If the PG is undersized, cancel the PG merge attempt early. Undersized is
a bad thing because it makes merge more dangerous.
It's also bad because the PG won't be fully clean when it finishes
peering, which means last_epoch_clean can be something far in the past,
and past_intervals won't be empty. Since we also take the past_intervals
from the source PG, we want to be confident that it is valid. It *should*
match up with the target PG since they should have mapped to the same
OSDs since they were both clean at the ReadyToMerge point--in fact, they
should both be empty. If a PG mapping change snuck in such that they did
map somewhere else, though, the same set of mapping changes will have
applied to both the source and target, so it should be safe.
(It would be better if the mon rejected the ReadyToMerge when the
mapping with the latest OSDMap has changed since the message was sent.
If we do that the situation is even better, but this change is still
appropriate.)
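Here is a minimal sketch of the merge gate, assuming a PG is undersized when its acting set is smaller than the pool size. The `PgState` type and the "wait" outcome for a PG that is fully sized but not yet clean are hypothetical. Only the cancel-if-undersized rule comes from the commit.

```python
from dataclasses import dataclass


@dataclass
class PgState:
    acting_size: int  # OSDs currently in the acting set
    pool_size: int    # replicas the pool wants
    clean: bool


def merge_decision(pg: PgState) -> str:
    """Hypothetical ready-to-merge decision for a source/target PG."""
    if pg.acting_size < pg.pool_size:
        # Undersized: cancel early rather than merge with missing replicas.
        return "cancel"
    if not pg.clean:
        # Assumed behavior: keep waiting until the PG is clean.
        return "wait"
    return "ready"
```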
Sage Weil [Sat, 15 Sep 2018 19:31:41 +0000 (14:31 -0500)]
osd/PG: move all mark_clean-ish activity into try_mark_clean()
Keep it all in one place (try_mark_clean()). The key behavioral change
is that we update last_epoch_clean and last_epoch_started when we are
peered too, not only when we are active.
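This hypothetical sketch shows the behavioral change: the clean-ish history updates now also happen when the PG is merely peered. The field updates are simplified, and a real implementation would also need to confirm that the PG is actually clean.

```python
from dataclasses import dataclass


@dataclass
class History:
    last_epoch_started: int = 0
    last_epoch_clean: int = 0


def try_mark_clean(state: str, epoch: int, history: History) -> bool:
    """Update clean-ish history in one place; peered counts as well as active."""
    if state not in ("active", "peered"):
        return False
    history.last_epoch_started = epoch
    history.last_epoch_clean = epoch
    return True
```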
Sage Weil [Wed, 12 Sep 2018 20:02:13 +0000 (15:02 -0500)]
osd/PG: use last_epoch_clean from ReadyToMerge point in time for fabricated history
If we are fabricating the pg history values, we need something that is
reasonably valid, but that won't screw up peering of the PG by indicating
that the PG has peered at some point later than when it really has.
Otherwise we can end up in a situation where everyone thinks there is a
newer pg info out there that doesn't actually exist, and the PG will end
up as incomplete.
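This hypothetical sketch illustrates the rule. Fabricated history takes its peering epochs from the last_epoch_clean reported at the ReadyToMerge point, so it never claims that the PG peered later than it really did. `PgHistory`, `fabricate_history` and `overclaims` are illustrative names, not Ceph's types.

```python
from dataclasses import dataclass


@dataclass
class PgHistory:
    last_epoch_started: int
    last_epoch_clean: int


def fabricate_history(ready_to_merge_lec: int) -> PgHistory:
    """Build history for a fabricated PG from the ReadyToMerge-time lec."""
    return PgHistory(last_epoch_started=ready_to_merge_lec,
                     last_epoch_clean=ready_to_merge_lec)


def overclaims(history: PgHistory, real_last_epoch_started: int) -> bool:
    """True if the history says the PG peered later than it actually did,
    which would make peers look for pg info that does not exist."""
    return history.last_epoch_started > real_last_epoch_started
```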
In https://github.com/ceph/ceph/pull/21580 I set a trap to catch some weird
and random segfaults, and in a recent QA run I was able to observe that it was
successfully triggered by one of the test cases, see:
The root cause is that there might be holes in the log versions, so the
approx_size() method will (almost) always overestimate the actual number of log entries.
As a result, we are at risk of overtrimming log entries.
https://github.com/ceph/ceph/pull/18338 reveals a possibly easier way
to fix the above problem, but unfortunately it can also cause a big performance regression;
hence this PR.
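The following worked illustration shows the overtrimming risk. It assumes approx_size() is computed as head version minus tail version, and the helper names and trimming rule are hypothetical. With versions 1, 2, 5, 9, 10 and tail 0, the estimate is 10 entries although only 5 exist. A limit of 4 therefore trims everything instead of one entry.

```python
from typing import List


def approx_size(versions: List[int], tail: int) -> int:
    """Assumed estimate: head version minus tail version."""
    head = versions[-1] if versions else tail
    return head - tail


def entries_to_trim(versions: List[int], tail: int, max_entries: int,
                    use_approx: bool) -> int:
    """How many of the oldest entries a hypothetical trimmer would drop."""
    size = approx_size(versions, tail) if use_approx else len(versions)
    return max(0, min(len(versions), size - max_entries))
```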
Jianpeng Ma [Thu, 20 Sep 2018 14:10:20 +0000 (22:10 +0800)]
osd/OSD: choose a fixed thread to do oncommits callback function
Currently, BlueStore oncommit callbacks are executed by OSD op threads.
If a shard has multiple threads, the callbacks can complete out of order.
For example, with threads_per_shard=2:
Thread1: swap_oncommits(op1_oncommit)
Thread2: swap_oncommits(op2_oncommit)
Thread1: OpQueueItem.run(Op3)
Thread2: op2_oncommit.complete();
Thread1: op1_oncommit.complete()
This makes the oncommits complete out of order.
To avoid this, we choose a fixed thread, the one with the smallest
thread_index in the shard, to run the oncommit callbacks.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
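Here is a minimal sketch of the fix, assuming one shared oncommit queue per shard. Every worker may queue callbacks, but only the thread with thread_index 0 (the smallest) drains them, so they complete in FIFO order. `Shard`, `queue_oncommit` and `run_oncommits` are hypothetical names, not the OSD's API.

```python
import threading
from collections import deque
from typing import Callable, Deque


class Shard:
    """Hypothetical shard: several worker threads, one oncommit queue."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._oncommits: Deque[Callable[[], None]] = deque()

    def queue_oncommit(self, cb: Callable[[], None]) -> None:
        with self._lock:
            self._oncommits.append(cb)

    def run_oncommits(self, thread_index: int) -> int:
        # Only the thread with the smallest index (0) completes callbacks,
        # so they always finish in the order they were queued.
        if thread_index != 0:
            return 0
        with self._lock:
            batch = list(self._oncommits)
            self._oncommits.clear()
        for cb in batch:
            cb()
        return len(batch)
```

Running the callbacks outside the lock lets other threads queue new callbacks while a batch completes. Ordering is still preserved because only one thread ever drains the queue.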
Jason Dillaman [Thu, 6 Sep 2018 13:44:59 +0000 (09:44 -0400)]
librbd: watcher should internally track blacklisted state
Since it will periodically attempt to re-acquire the watch,
it will know when the RADOS client has been blacklisted and
when the blacklist has been removed.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
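This hypothetical sketch shows a watcher that derives its blacklisted state from the result of each periodic re-watch attempt. It assumes blacklisting surfaces as -ESHUTDOWN, which should be checked against the librados version in use. The class and method names are illustrative, not librbd's.

```python
import errno
from typing import Callable

# Assumption: a blacklisted client sees -ESHUTDOWN from watch operations.
EBLACKLISTED = errno.ESHUTDOWN


class Watcher:
    """Hypothetical watcher that tracks blacklisting from rewatch results."""

    def __init__(self, watch: Callable[[], int]) -> None:
        self._watch = watch  # returns 0 or a negative errno
        self.blacklisted = False

    def rewatch(self) -> int:
        r = self._watch()
        if r == -EBLACKLISTED:
            self.blacklisted = True
        elif r == 0:
            # A successful watch means the blacklist entry is gone.
            self.blacklisted = False
        # Other errors leave the tracked state unchanged.
        return r
```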
Jason Dillaman [Wed, 19 Sep 2018 18:24:31 +0000 (14:24 -0400)]
rbd-mirror: instantiate the status formatter before changing state
This avoids a possible race where pre-queued status updates fire
after the state has been changed but before the formatter has been
instantiated.
Fixes: http://tracker.ceph.com/issues/36084
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
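Here is a minimal sketch of the ordering fix, using hypothetical class names. The formatter is created before the new state is published, so a status update that observes the new state always finds a formatter. The sketch relies on Python executing the two assignments in order. C++ code shared between threads would also need a lock or atomics.

```python
from typing import Optional


class StatusFormatter:
    """Hypothetical formatter for replay status."""

    def __init__(self) -> None:
        self.entries_behind = 0

    def format(self) -> str:
        return f"replaying: {self.entries_behind} entries behind"


class ReplayerStatus:
    def __init__(self) -> None:
        self.state = "stopped"
        self.formatter: Optional[StatusFormatter] = None

    def start(self) -> None:
        # Instantiate the formatter first, then change the state.
        self.formatter = StatusFormatter()
        self.state = "replaying"

    def on_status_update(self) -> str:
        # A pre-queued update may fire at any time.
        if self.state != "replaying":
            return self.state
        assert self.formatter is not None  # guaranteed by start()'s ordering
        return self.formatter.format()
```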