This simply rebuilds the class roots. Normally this should create no
change in the map since whatever was making changes to the map before
should have rebuilt the shadow roots at that point.
Sage Weil [Fri, 26 Oct 2018 14:32:27 +0000 (09:32 -0500)]
crushtool: make --reweight re-sum choose_args weight-sets too
This ensures that the weights add up for each weight-set (and each
position). Note that since we don't have anything that actually
creates positional weight-sets, the behavior here might not be what we
want in the end, but for the compat weight-sets (no position), we *do*
keep the weights as a properly summing tree.
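A minimal sketch of what "properly summing" means, assuming a toy tree of
nested dicts rather than the real CRUSH structures. The names resum,
"weights" and "children" are hypothetical and exist only for illustration;
this is not the crushtool implementation.

```python
def resum(node, key):
    """Recompute bucket weights bottom-up for one weight-set.

    `node` is a toy dict: {"weights": {key: float}, "children": [...]}.
    Leaves (devices) keep their weight; every bucket gets the sum of
    its children. `key` selects which weight-set to re-sum.
    """
    children = node.get("children", [])
    if children:
        node["weights"][key] = sum(resum(child, key) for child in children)
    return node["weights"][key]
```

Calling resum once per weight-set name keeps each set consistent on its own,
without touching the others.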
Sage Weil [Sun, 14 Oct 2018 19:57:59 +0000 (14:57 -0500)]
crushtool: add --set-subtree-class; do not set class via --reclassify-root
Sometimes we don't want the --reclassify-root to set the class of every
device because a small number of them are (correctly) a different class.
Allow both behaviors by adding a new, separate command to set the class
of all devices beneath a point in the hierarchy and do not implicitly do
that relabeling as part of --reclassify-root.
Sage Weil [Fri, 5 Oct 2018 17:47:37 +0000 (12:47 -0500)]
crushtool: implement --reclassify
Add two modes of reclassification of existing hierarchies:
--classify-root <rootname> <class> will rewrite a hierarchy from an
existing root so that all of its devices are a different class. Rules
that reference that root will be implicitly adjusted to 'take <rootname>
class <class>'. Ids will be preserved.
--classify-bucket <match> <class> <default-parent> will match a pattern
in the bucket name, where % at the beginning or end of the string is used
as a wildcard (e.g., "%-ssd" will match an "-ssd" suffix, "foo-%" will
match a "foo-" prefix). Each such bucket is mapped to a "base" bucket
(the name without the suffix or prefix), items are labeled with the appropriate
class and then moved to that base bucket, and rules adjusted. The
<default-parent> is used as the parent if the base bucket doesn't exist
and has to be created.
Similarly,
--classify-bucket <bucket> <class> <base-bucket> does the same but
a single existing bucket is mapped to an existing base bucket. For
example, there is often an 'ssd' bucket that is the counterpart for
the 'default' root; '--classify-bucket ssd ssd default' will map it over.
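The wildcard rule described above can be sketched as follows. This is an
illustration only, assuming that the base bucket name is whatever remains
after the matched suffix or prefix is removed; match_bucket is a
hypothetical helper, not a crushtool function.

```python
def match_bucket(pattern, name):
    """Return the base bucket name if `name` matches `pattern`, else None.

    A leading '%' matches any non-empty prefix (so the pattern is a suffix
    match); a trailing '%' matches any non-empty suffix (prefix match).
    Without a wildcard the pattern must equal the name exactly.
    """
    if pattern.startswith("%"):
        suffix = pattern[1:]
        if suffix and name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
        return None
    if pattern.endswith("%"):
        prefix = pattern[:-1]
        if prefix and name.startswith(prefix) and len(name) > len(prefix):
            return name[len(prefix):]
        return None
    return name if name == pattern else None
```

In the exact-match case the caller would supply the base bucket explicitly,
as in the 'ssd ssd default' example.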
Sage Weil [Fri, 21 Sep 2018 22:15:16 +0000 (17:15 -0500)]
crush/CrushCompiler: fix id scan to include class ids
The bucket stanza may specify the canonical id for the bucket as well as
ids for the shadow trees. Make note of all ids so we can avoid colliding
and reusing them later.
Sage Weil [Wed, 19 Sep 2018 16:44:32 +0000 (11:44 -0500)]
msg,osd: enable unauthenticated Dispatcher for pre-nautilus OSD compat
Before nautilus, osd heartbeats are sent over an unauthenticated channel.
We need support here to allow these connections when they are necessary
for upgrade compatibility.
Sage Weil [Wed, 19 Sep 2018 16:35:42 +0000 (11:35 -0500)]
osd: authenticate ping sessions
Do not set up a Session object, though--nobody cares (currently!).
This avoids having to special-case the generic authorizer validation
code in msg/* to have to handle non-authenticated sessions. Also, it
seems like a good idea to authenticate these sessions!
Sage Weil [Thu, 13 Sep 2018 19:21:04 +0000 (14:21 -0500)]
msg/Messenger: pull authenticator validation into Messenger
This code is essentially identical across the OSD and MDS. The
monitor is annoyingly different, but in a msgr1 specific way that
we can handle carrying here until msgr1 gets ripped out in a
couple years.
Sage Weil [Mon, 15 Oct 2018 13:36:18 +0000 (08:36 -0500)]
Merge PR #24493 into master
* refs/pull/24493/head:
mgr/DaemonState: clean up device life_expectancy values
mgr/devicehealth: warn based on life_expectancy_max
mgr/devicehealth: warn on failing devices at 6 weeks
Tatjana Dehler [Wed, 10 Oct 2018 14:12:32 +0000 (16:12 +0200)]
mgr/dashboard: config options table cleanup
Remove columns 'tags', 'enum_values', 'long_desc', 'type', 'flags',
'daemon_default', 'desc', 'level', 'can_update_at_runtime', 'services',
'max', 'see_also', 'min' and 'source' from table view and add them to
the details.
The table contains 'name', 'value' and 'default' only.
Fixes: http://tracker.ceph.com/issues/34533
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Sage Weil [Thu, 13 Sep 2018 19:00:44 +0000 (14:00 -0500)]
mon: fix ref cycle breakage in handle_forward
We now rely on the session -> connection ref for printing
remote addr, peer_global_id, and so on. Change this code to
break the ref cycle instead by removing the con->session link,
which is only needed by the MonOpRequest ctor called at the top
of _ms_dispatch.
Sage Weil [Tue, 9 Oct 2018 22:08:46 +0000 (17:08 -0500)]
mon: use MonOpRequest get_session() instead of PaxosServiceMessage's
The PaxosServiceMessage method relies on the msg -> con -> session linkage,
and the con -> session link is not present for forwarded messages. Also,
the message path is redundant and unnecessary.
Sage Weil [Tue, 11 Sep 2018 21:53:15 +0000 (16:53 -0500)]
mon: use ms_handle_authentication to parse caps
The situation is a bit different here than the MDS and OSD because the
authentication happens from MAuth instead of ms_verify_authorizer, but
we are moving toward being more consistent.
This is shown to corrupt otherwise healthy rocksdb databases. Rename to
make it clear that it is generally not safe to run and should only be used
as a last resort.
Sage Weil [Fri, 12 Oct 2018 21:14:37 +0000 (16:14 -0500)]
Merge PR #24270 into master
* refs/pull/24270/head:
osd: make 'cache drop' command require 'executable' permission
osd: rename 'drop cache' and 'get cache stats' to group them by component
doc: add documentation for 'drop cache' and 'get cache stats'
osd: don't print osdmap cache stats in 'get cache stats' command
osd: do not clear osdmap cache on 'drop cache' command
osd: offload dumping cache stats to the object store
osd: pass a stream to flush_cache commands for more verbosity
osd: implement flush_cache() method for Filestore
osd: add clear_cache and get_cache_object_count commands
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Fri, 12 Oct 2018 21:14:19 +0000 (16:14 -0500)]
Merge PR #24312 into master
* refs/pull/24312/head:
osd: kill the std::stringstream in ReplicatedBackend::do_repop_reply.
osd: bump-up the dout level in PGLog::write_log_and_missing.
Neha Ojha [Tue, 9 Oct 2018 22:57:15 +0000 (15:57 -0700)]
osd/PrimaryLogPG.cc: reassign size only when object size > truncate_size
Before setting size equal to op.extent.truncate_size, we need to check
whether the size of the object is greater than truncate_size. There is
no need to set size to op.extent.truncate_size when the object is
smaller than op.extent.truncate_size.
Without this change, we always set size = op.extent.truncate_size when
(seq < op.extent.truncate_seq) and
(op.extent.offset + op.extent.length > op.extent.truncate_size) were both
true. This resulted in:
1. overestimating the size of the object
2. not considering the correct size of the object, for
the later checks, which calculate op.extent.length for the read ops
3. causing crashes when trying to read more data than what was present
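The guarded assignment described above can be sketched as follows. This is a
simplified model in Python, not the PrimaryLogPG.cc code: effective_size
and its plain integer parameters are hypothetical stand-ins for the
op.extent fields and the object size.

```python
def effective_size(obj_size, seq, truncate_seq, truncate_size, offset, length):
    """Return the size to use for later read-length checks.

    The size is clamped to truncate_size only when a newer truncate is
    pending, the read extends past truncate_size, AND the object is
    actually larger than truncate_size. Without the last condition a
    small object would be reported as larger than it really is.
    """
    if (seq < truncate_seq
            and offset + length > truncate_size
            and obj_size > truncate_size):
        return truncate_size
    return obj_size
```

For an object of 30 bytes and a truncate_size of 50, the unguarded version
would report 50, which is the overestimate listed in item 1.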