John Spray [Mon, 16 Nov 2015 10:57:56 +0000 (10:57 +0000)]
mon: don't require OSD W for MRemoveSnaps
Use ability to execute "osd pool rmsnap" command
as a signal that the client should be permitted
to send MRemoveSnaps too.
Note that we don't also require the W ability,
unlike Monitor::_allowed_command -- this is slightly
more permissive handling, but anyone crafting caps
that explicitly permit "osd pool rmsnap" needs to
know what they are doing.
Fixes: #13777
Signed-off-by: John Spray <john.spray@redhat.com>
Sage Weil [Wed, 25 Nov 2015 21:40:13 +0000 (16:40 -0500)]
mon/OSDMonitor: block 'ceph osd pg-temp ...' if update is pending
The OSD expects its pg_temp update requests to succeed. If it
races with an ill-timed admin request, it can get stuck in
WaitActingChange indefinitely.
This is only a real problem now that the OSD/mon interaction has
been updated with wip-bigbang; previously we would retry (although
it would take a while). Backporting is optional.
Sage Weil [Mon, 16 Nov 2015 16:32:34 +0000 (11:32 -0500)]
osdc/Objecter: call notify completion only once
If we race with a reconnect we could get a second notify message
before the notify linger op is torn down. Ensure we only ever
call the notify completion once to prevent a segfault.
Fixes: #13805
Signed-off-by: Sage Weil <sage@redhat.com>
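For illustration only, a minimal Python sketch of the "complete at most once" guard described above. This is not the Objecter code; the class name `NotifyCompletion` and its `complete()` method are hypothetical names chosen for this example.

```python
import threading


class NotifyCompletion:
    """Hypothetical wrapper that runs the wrapped callback at most once."""

    def __init__(self, callback):
        self._callback = callback
        self._lock = threading.Lock()
        self._done = False

    def complete(self, result):
        # Claim the completion under the lock so that a second notify
        # message racing with a reconnect cannot fire the callback again.
        with self._lock:
            if self._done:
                return False
            self._done = True
        self._callback(result)
        return True


calls = []
completion = NotifyCompletion(calls.append)
first = completion.complete("reply-1")   # runs the callback
second = completion.complete("reply-2")  # ignored: already completed
```

Here the duplicate reply is dropped rather than handed to a completion that has already run.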
Sage Weil [Sat, 14 Nov 2015 03:27:14 +0000 (22:27 -0500)]
mon/OSDMonitor: simplify failure reporters vs reports logic
Since each OSD only sends a failure report for a given peer once,
we don't need to count reports vs reporters separately. (This was
probably a bad idea anyway.) Remove this logic and the associated
config option.
Reported-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
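A rough sketch of the simplified accounting, assuming that only the number of distinct reporters matters. The names `FailureTracker` and `min_reporters` are hypothetical and do not correspond to actual monitor code or config options.

```python
from collections import defaultdict


class FailureTracker:
    """Hypothetical tracker that counts distinct reporters per failed peer."""

    def __init__(self, min_reporters):
        self.min_reporters = min_reporters
        # target OSD id -> set of OSD ids that reported it as failed
        self.reporters = defaultdict(set)

    def report(self, reporter, target):
        # A set makes a repeated report from the same OSD a no-op, so a
        # separate count of reports is unnecessary.
        self.reporters[target].add(reporter)
        return len(self.reporters[target]) >= self.min_reporters


tracker = FailureTracker(min_reporters=2)
r1 = tracker.report(reporter=1, target=5)
r2 = tracker.report(reporter=1, target=5)  # duplicate, same reporter
r3 = tracker.report(reporter=2, target=5)  # second distinct reporter
```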
Sage Weil [Sat, 14 Nov 2015 03:11:17 +0000 (22:11 -0500)]
osd: simplify pg creation
We used to have a complicated pg creation process in which we
would query any previous mappings for the pg before we created the
new 'empty' pg locally. The tracking of the prior mappings was
very simple (and broken), but it didn't really matter because the
mon would resend pg create messages periodically. Now it doesn't,
so that broke.
However, none of this is necessary: the PG peering process does
all of the same things. Namely, it
- enumerates past intervals
- determines which ones may have been rw
- queries OSDs from each one to gather any potential changes
This is a more robust version of what the creation code was doing (or
should have been doing). So, let's rip it all out and let
peering handle it. As long as the newly instantiated PG sets
last_epoch_started and _clean to the created epoch we will probe
and consider all of these prior mappings and find any previous
instance of the PG (if one existed).
Sage Weil [Mon, 12 Oct 2015 02:06:33 +0000 (22:06 -0400)]
mon/PGMonitor: avoid useless pg gets when pool is deleted
If the .0 pg no longer exists, we know the entire pool was
deleted, and can avoid querying every other pg. (This is a good
thing because leveldb and rocksdb can be very slow to query
missing keys.)
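The short-circuit can be sketched as follows, assuming a key/value store keyed by pg id. `CountingStore` and `load_pool_pgs` are hypothetical helpers for this example, not PGMonitor functions.

```python
class CountingStore:
    """Hypothetical key/value store that counts lookups."""

    def __init__(self, data):
        self.data = dict(data)
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self.data.get(key)


def load_pool_pgs(store, pool_id, pg_num):
    # If the .0 pg is gone, assume the whole pool was deleted and skip
    # the remaining (slow) lookups of missing keys.
    if store.get(f"{pool_id}.0") is None:
        return {}
    found = {}
    for seed in range(pg_num):
        key = f"{pool_id}.{seed:x}"
        value = store.get(key)
        if value is not None:
            found[key] = value
    return found


deleted = CountingStore({})
deleted_result = load_pool_pgs(deleted, pool_id=3, pg_num=64)

live = CountingStore({"3.0": "a", "3.1": "b"})
live_result = load_pool_pgs(live, pool_id=3, pg_num=2)
```

For the deleted pool, a single lookup replaces 64.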
Sage Weil [Thu, 8 Oct 2015 16:13:40 +0000 (12:13 -0400)]
mon/PGMonitor: revamp how pg creates are tracked
Previously we were calculating and managing in-core state that
wasn't committed as part of the pg_map, leading to all sorts of
ugliness that didn't really work. Instead,
* set mapping in all creating pgs in the committed pg_map
* make all pg create message sending be based on committed state
* update mappings for creating pgs every time we consume a new
osdmap, so that we have a reliable/stable epoch to attach to
it.
In particular, having that stable epoch means we have a reference
we can put in the pg create message that will also be used for
the subscription version. That way OSDs get consistent creates
from any mon.
Sage Weil [Wed, 7 Oct 2015 01:39:33 +0000 (21:39 -0400)]
mon/PGMonitor: send pg creates via persistent subscriptions, not spam
Generate and send pg create messages only for those OSDs who have
subscribed on this monitor. This is N times more efficient (where there
are N monitors) than the previous method.
Sage Weil [Thu, 8 Oct 2015 16:14:49 +0000 (12:14 -0400)]
mon/OSDMonitor: do not prime pg_temp for creating pgs
It will be less work for the old primary to ignore the create message
and the new one to query it and find nothing than for the slightly more
complicated peering and removal process to happen. Also, this reduces
bloat in the OSDMap a bit.
Sage Weil [Fri, 2 Oct 2015 13:15:33 +0000 (09:15 -0400)]
mon: disabled rocksdb compression when used as the backend
This significantly reduced CPU utilization on the bigbang scale
testing cluster at CERN. Note that it is already disabled for
leveldb by default (in ceph_mon.cc).
Sage Weil [Fri, 2 Oct 2015 13:06:29 +0000 (09:06 -0400)]
osd: cap adjusted max mon report interval at 2/3 of timeout
This ensures that we don't throttle back mon reports so much that
the mon times out due to no pg stat reports. Since there is
little value in having a lower max anyway, just set this at an
upper bound (relative to the mon's timeout value).
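As a sketch of the arithmetic only: the function and parameter names below are hypothetical, and the actual change may simply set the value to the bound rather than take a minimum.

```python
def capped_max_report_interval(adjusted_max, mon_timeout):
    """Clamp the adjusted max report interval to 2/3 of the mon's timeout."""
    upper_bound = mon_timeout * 2.0 / 3.0
    return min(adjusted_max, upper_bound)


# With a hypothetical 900 s mon timeout, the bound is 600 s.
throttled = capped_max_report_interval(adjusted_max=1000, mon_timeout=900)
untouched = capped_max_report_interval(adjusted_max=100, mon_timeout=900)
```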
Sage Weil [Wed, 30 Sep 2015 01:03:53 +0000 (21:03 -0400)]
osd: protect mon reporting with mon_report_lock
We need an exclusive lock over paths that update state related to
mon reports, lest they step on fields like up_thru_*, *stats_ack*,
last_mon_report, and so on. Everybody still needs a read lock
on map_lock too to get a stable OSDMap epoch.
Sage Weil [Wed, 23 Sep 2015 21:58:15 +0000 (17:58 -0400)]
osd: introduce explicit preboot stage
We want to separate the stage where we do a bunch of work
prior to booting (but intend to eventually boot), like when we
get maps and wait to be healthy, from the point after we've sent
the boot message while we are just waiting for a response (so that
we can avoid resending that boot message needlessly).
- start at PREBOOT in start_boot()
- transition to BOOTING in _send_boot()
- only call _preboot() while in PREBOOT state
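The three transitions above can be sketched as a small state machine. This is an illustrative Python model, not the OSD code: `OSDSketch`, the `healthy` flag, and the `boot_messages_sent` counter are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    PREBOOT = 1
    BOOTING = 2


class OSDSketch:
    """Hypothetical model of the PREBOOT -> BOOTING transitions."""

    def __init__(self):
        self.state = None
        self.boot_messages_sent = 0

    def start_boot(self):
        self.state = State.PREBOOT

    def _preboot(self, healthy):
        # Only do pre-boot work while in PREBOOT; once the boot message
        # has been sent we just wait for the response.
        if self.state is not State.PREBOOT:
            return
        if healthy:
            self._send_boot()

    def _send_boot(self):
        self.boot_messages_sent += 1
        self.state = State.BOOTING


osd = OSDSketch()
osd.start_boot()
osd._preboot(healthy=False)  # still waiting to be healthy
state_while_waiting = osd.state
osd._preboot(healthy=True)   # sends the boot message once
osd._preboot(healthy=True)   # no-op: already BOOTING
```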
Loic Dachary [Fri, 20 Nov 2015 14:12:20 +0000 (15:12 +0100)]
scripts: ceph-release-notes for development versions
Relax the requirements for titles and issues when --strict is not set.
Collect and display the merge messages when they are not the same as the
PR title.