Sage Weil [Fri, 11 Aug 2017 15:58:42 +0000 (11:58 -0400)]
os/bluestore: do not segv on kraken upgrade debug print
When loading an onode from kraken we have a compat path that calls
get_ref before the SharedBlob pointer is initialized. This is fine except
that if debugging is enabled the operator<< on the Blob will segv on
printing *b.shared_blob (which is NULL).
Fix operator<< to print something else if it is NULL. shared_blob does
get set up right after the call to decode() so having it be NULL at this
point is otherwise harmless.
Fixes: http://tracker.ceph.com/issues/20977
Signed-off-by: Sage Weil <sage@redhat.com>
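The null-safe printing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the BlueStore code: the Blob and SharedBlob structs, the sbid field, the describe() helper, and the exact placeholder text are assumptions made for the example.

```cpp
#include <cassert>   // assert(), handy for quick checks when experimenting
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the real BlueStore types; only the shape matters.
struct SharedBlob {
  unsigned long long sbid = 0;
};

struct Blob {
  SharedBlob* shared_blob = nullptr;  // may still be null on the compat path
};

std::ostream& operator<<(std::ostream& out, const SharedBlob& sb) {
  return out << "SharedBlob(sbid 0x" << std::hex << sb.sbid << std::dec << ")";
}

// Print a placeholder instead of dereferencing a null shared_blob.
std::ostream& operator<<(std::ostream& out, const Blob& b) {
  out << "Blob(";
  if (b.shared_blob) {
    out << *b.shared_blob;
  } else {
    out << "(shared_blob=NULL)";
  }
  return out << ")";
}

// Helper that renders a Blob to a string, as a debug log line would.
std::string describe(const Blob& b) {
  std::ostringstream ss;
  ss << b;
  return ss.str();
}
```

With this shape, a debug print issued before shared_blob is assigned produces a readable marker rather than a crash, and the output changes to the normal form once the pointer is set.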
Sage Weil [Fri, 11 Aug 2017 16:46:09 +0000 (12:46 -0400)]
os/bluestore: fix clone dirty_range again
If we are cloning a blob for a 1 byte logical extent then dirty_range_begin
will equal _end and we won't dirty the source onode (with possibly newly
shared blobs).
Fix by using a separate flag to indicate whether we are dirtying instead
of overloading the begin/end markers for this. Note that even if they
are equal dirty_range will still dirty the shard in question.
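The difference between overloading the markers and keeping a separate flag can be sketched as below. This is an illustration under stated assumptions, not the BlueStore implementation: the DirtyRange type and its method names are hypothetical, and the end marker is assumed to be the offset of the last byte, which the commit message does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical tracker; names are illustrative, not BlueStore's.
struct DirtyRange {
  uint64_t begin = 0;
  uint64_t end = 0;      // assumed inclusive: offset of the last dirtied byte
  bool dirtied = false;  // explicit flag, independent of begin/end

  // Record that a logical extent [offset, offset + length) must be dirtied.
  void note(uint64_t offset, uint64_t length) {
    assert(length > 0);
    const uint64_t last = offset + length - 1;
    if (!dirtied) {
      begin = offset;
      end = last;
      dirtied = true;
    } else {
      begin = std::min(begin, offset);
      end = std::max(end, last);
    }
  }

  // Overloaded check: misses the case where begin == end.
  bool needs_dirty_by_markers() const { return end > begin; }

  // Flag-based check: true as soon as anything was recorded.
  bool needs_dirty() const { return dirtied; }
};
```

For a single 1-byte extent, needs_dirty_by_markers() reports false while needs_dirty() reports true, which is the case the commit message describes.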
Sage Weil [Fri, 11 Aug 2017 16:11:47 +0000 (12:11 -0400)]
include/ceph_features: incarnation 3 can begin!
With all upgrades passing through luminous, we can now start
reusing bits retired in luminous. Our sentinel bitmask will be the
combination of SERVER_MIMIC and SERVER_JEWEL (i.e.,
CEPH_FEATUREMASK_SERVER_MIMIC).
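A rough sketch of how a sentinel bitmask lets a retired bit be reused safely. The bit positions, constant names, and helper functions here are placeholders chosen for the example; the real definitions live in include/ceph_features.h and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit positions only, not the values from ceph_features.h.
constexpr uint64_t bit(unsigned n) { return uint64_t{1} << n; }

constexpr uint64_t SENTINEL_A = bit(57);  // stands in for SERVER_JEWEL
constexpr uint64_t SENTINEL_B = bit(28);  // stands in for SERVER_MIMIC

// Incarnation 3 is marked by both sentinel bits being set together.
constexpr uint64_t INCARNATION_3 = SENTINEL_A | SENTINEL_B;

// A feature that reuses a retired bit carries the sentinel in its mask.
constexpr uint64_t feature_mask(unsigned n, uint64_t incarnation) {
  return bit(n) | incarnation;
}

// A peer has the feature only if every bit of the mask is present.
constexpr bool have_feature(uint64_t peer, uint64_t mask) {
  return (peer & mask) == mask;
}

// Hypothetical feature that reuses bit 5 in incarnation 3.
constexpr uint64_t NEW_FEATURE = feature_mask(5, INCARNATION_3);

static_assert(!have_feature(bit(5), NEW_FEATURE),
              "an old peer advertising bit 5 alone does not match");
static_assert(have_feature(bit(5) | INCARNATION_3, NEW_FEATURE),
              "a peer advertising bit 5 plus both sentinels matches");
```

Because the mask includes the sentinel bits, an old peer that still advertises the retired meaning of the bit is never mistaken for one that supports the new feature.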
amitkuma [Tue, 8 Aug 2017 18:41:13 +0000 (00:11 +0530)]
messages: Initialization of member variables
Fixes the coverity issues:
** 717271 Uninitialized scalar field
2. uninit_member: Non-static class member from_mds is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member dir_rep is not initialized
in this constructor nor in any functions that it calls.
CID 717271 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member discover is not initialized
in this constructor nor in any functions that it calls.
** 717272 Uninitialized scalar field
2. uninit_member: Non-static class member want_base_dir is not initialized
in this constructor nor in any functions that it calls.
CID 717272 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member want_xlocked is not initialized
in this constructor nor in any functions that it calls.
** 717274 Uninitialized scalar field
2. uninit_member: Non-static class member wanted_base_dir is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member wanted_xlocked is not initialized
in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member flag_error_dn is not initialized
in this constructor nor in any functions that it calls.
8. uninit_member: Non-static class member flag_error_dir is not initialized
in this constructor nor in any functions that it calls.
10. uninit_member: Non-static class member unsolicited is not initialized
in this constructor nor in any functions that it calls.
12. uninit_member: Non-static class member dir_auth_hint is not initialized
in this constructor nor in any functions that it calls.
CID 717274 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
14. uninit_member: Non-static class member starts_with is not initialized
in this constructor nor in any functions that it calls.
** 717275 Uninitialized scalar field
CID 717275 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member from is not initialized in this
constructor nor in any functions that it calls.
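One common way to silence this class of report is to give every scalar member a default initializer, so that each constructor starts from a defined state. The sketch below assumes a hypothetical message class; the field names echo the report, but the class, its layout, and the chosen default values are invented for illustration and are not the actual change.

```cpp
#include <cassert>   // assert(), handy for quick checks when experimenting
#include <cstdint>

// Hypothetical message type, not one of the real Ceph message classes.
class DiscoverReplySketch {
 public:
  // The default constructor relies entirely on the member initializers.
  DiscoverReplySketch() = default;

  // Other constructors only override what they need to.
  explicit DiscoverReplySketch(bool unsolicited) : unsolicited_(unsolicited) {}

  bool wanted_base_dir() const { return wanted_base_dir_; }
  bool wanted_xlocked() const { return wanted_xlocked_; }
  bool unsolicited() const { return unsolicited_; }
  int32_t dir_auth_hint() const { return dir_auth_hint_; }

 private:
  // Default member initializers: no constructor can leave these indeterminate.
  bool wanted_base_dir_ = false;
  bool wanted_xlocked_ = false;
  bool unsolicited_ = false;
  int32_t dir_auth_hint_ = 0;
};
```

The same effect can be had with constructor initializer lists; default member initializers just keep the defaults next to the declarations.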
kungf [Thu, 10 Aug 2017 12:05:00 +0000 (20:05 +0800)]
mon: return directly after health_events_cleanup
When mon_health_to_clog is set to false, all health events are cleaned up,
so there is no need to check for changes to mon_health_to_clog_interval and
mon_health_to_clog_tick_interval.
Alex Mikheev [Mon, 12 Jun 2017 08:32:38 +0000 (08:32 +0000)]
msg/async/rdma: fixes crash in fio
fio creates multiple CephContext in a single process.
Crashes happen because the rdma stack has global resources that
are still used by one ceph context after they have already been destroyed
by another context.
The commit removes global instances of RDMA dispatcher and infiniband
and makes them context (rdma stack) specific.
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
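The ownership change can be sketched as follows: each context owns its own stack, and each stack owns its own device and dispatcher, so destroying one context cannot invalidate resources used by another. All type names here are hypothetical stand-ins, not the classes in msg/async/rdma.

```cpp
#include <cassert>   // assert(), handy for quick checks when experimenting
#include <memory>

// Hypothetical stand-ins for the infiniband device wrapper and the dispatcher.
struct Device {
  bool open = true;
};

struct Dispatcher {
  explicit Dispatcher(Device& d) : dev(d) {}
  Device& dev;
  bool usable() const { return dev.open; }
};

// Each stack owns its own device and dispatcher; nothing is global.
class Stack {
 public:
  Stack() : device_(new Device), dispatcher_(new Dispatcher(*device_)) {}
  Dispatcher& dispatcher() { return *dispatcher_; }

 private:
  std::unique_ptr<Device> device_;
  // Declared after device_, so it is destroyed before the device it refers to.
  std::unique_ptr<Dispatcher> dispatcher_;
};

// One stack per context, created and destroyed together with it.
class Context {
 public:
  Context() : stack_(new Stack) {}
  Stack& stack() { return *stack_; }

 private:
  std::unique_ptr<Stack> stack_;
};
```

With two Context objects in one process, as fio creates, tearing down the first leaves the second's dispatcher and device untouched.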
Consider the following use case:
(1) randomly choose some OSDs (e.g., from different hosts) and try to make them for private use only,
say, by grouping them into 'pool1'
(2) ceph osd crush set-device-class pool1 'OSDs from (1)'
(3) ceph osd crush rule create-replicated rule_for_pool1 default host pool1
(4) ceph osd pool rename pool1 pool2
(5) ceph osd crush class rename pool1 pool2
As the above use case shows, we need to be able to change a pool name safely,
without worrying about any risk of data migration. That is why the
'osd crush class rename' command is still needed here.
David Zafman [Wed, 9 Aug 2017 15:43:57 +0000 (08:43 -0700)]
qa: Fix races with waiting for scrubs
The trigger_scrub sets the last_scrub_stamp backwards to
force a scheduled scrub. In a small window this stamp could get propagated
to the mgr. A test failure occurred because wait_for_scrub() was confused
by seeing a backward moving date.
The most critical change is having wait_for_scrub() make sure that the
date advances past the previous value.
A test failed because the random backoff kept delaying the triggered scrub, so
set osd_scrub_backoff throughout.
Greg Farnum [Wed, 9 Aug 2017 21:34:44 +0000 (14:34 -0700)]
mdsmon: treat the osdmon correctly when doing plugged updates
Make sure it's writeable before invoking changes, and propose_pending()
on it when we're done.
Make the PaxosService::C_RetryMessage public so we can do this from FSCommands.
David Zafman [Tue, 1 Aug 2017 22:19:01 +0000 (15:19 -0700)]
qa: ceph-helpers.sh fixes
Add missing teardown to cleanup test directory
Fix pgid due to elimination of initial default pool
Testing could never fail because the return value of run_tests was ignored
Sage Weil [Wed, 9 Aug 2017 20:40:43 +0000 (16:40 -0400)]
qa/suites/upgrade/jewel-x/parallel: thrash layout
We can't kill and restart osds because that will interfere with
the upgrade process. We can, however, thrash the layout by
tweaking osd weights and so on. This will exercise osd recovery
paths during the upgrade that aren't normally exercised (outside
of stress-split, which doesn't upgrade individual osds while they
are non-clean).
Sage Weil [Wed, 9 Aug 2017 16:50:57 +0000 (12:50 -0400)]
osd/PG: force rebuild of missing set on jewel upgrade
Previously we were detecting the need to rebuild missing based on
whether the "divergent_priors" omap key was present. Unfortunately,
jewel does not always set this, so it is not a reliable indicator.
(It only gets set if you actually have a divergent prior at some
point in the PG's lifetime on that OSD.)
Fix by using the info_struct_v on the PG to detect whether we need
to do the conversion. We didn't bump the value when we added
the missing set persistence, but the fastinfo was also added during
the same period between jewel and kraken, so it will work just as
well.
Fixes: http://tracker.ceph.com/issues/20958
Signed-off-by: Sage Weil <sage@redhat.com>
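The contrast between the two detection strategies can be sketched as below. The threshold constant, the struct, and the function names are hypothetical; in particular the version number is a placeholder and is not taken from the Ceph source.

```cpp
#include <cassert>   // assert(), handy for quick checks when experimenting
#include <cstdint>

// Placeholder value: the first on-disk info version assumed to be written by
// releases that persist the missing set. The real number is not given here.
constexpr uint8_t kFirstVersionWithFastinfo = 10;

// Hypothetical summary of what was read from the PG's on-disk metadata.
struct PgMetaSketch {
  uint8_t info_struct_v = 0;
  bool has_divergent_priors_key = false;
};

// Old heuristic: unreliable, because the key is only written if the PG ever
// had a divergent prior on this OSD.
bool needs_rebuild_by_key(const PgMetaSketch& m) {
  return m.has_divergent_priors_key;
}

// Version-based check: any PG written before the fastinfo-era version needs
// its missing set rebuilt, whether or not the key is present.
bool needs_rebuild_by_version(const PgMetaSketch& m) {
  return m.info_struct_v < kFirstVersionWithFastinfo;
}
```

A PG written by an older release without the omap key is missed by the first check and caught by the second.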