clever215 [Wed, 25 Nov 2015 16:31:48 +0000 (11:31 -0500)]
rgw: add an inspection to the field of type when assigning user caps
Bug #13096
This change adds a check on the type field of a user's capability; previous versions accepted any value. The type is now limited to nine values: "users|buckets|metadata|usage|zone|bilog|mdlog|datalog|ops". These nine choices are taken from the Ceph documentation and the source code.
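A minimal sketch of the kind of check described above, written in Python for illustration rather than in the C++ of the actual rgw code. The names ALLOWED_CAP_TYPES and is_valid_cap_type are hypothetical; the nine values are copied verbatim from the commit message.

```python
# Illustrative only: the real check lives in the rgw C++ sources.
# The nine values are copied as written in the commit message above.
ALLOWED_CAP_TYPES = (
    "users", "buckets", "metadata", "usage", "zone",
    "bilog", "mdlog", "datalog", "ops",
)


def is_valid_cap_type(cap_type: str) -> bool:
    """Return True only if cap_type is one of the allowed type names."""
    return cap_type in ALLOWED_CAP_TYPES
```

A caller would reject the caps assignment when is_valid_cap_type() returns False instead of storing an arbitrary string.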
John Spray [Thu, 15 Oct 2015 00:31:16 +0000 (01:31 +0100)]
client: close mds sessions in shutdown()
Usually this happens in unmount(), but when we
have instantiated Client without mounting (to
send MDS commands), we need to handle closing
any open sessions in shutdown as well.
This is the correct replacement for the mark_down()
call that was removed from handle_command_reply
in the last commit.
Xinze Chi [Fri, 20 Nov 2015 12:59:16 +0000 (20:59 +0800)]
mon: fix osd failure info in mon
When the network adapter of node A runs into an error, the OSDs on that
node also tell the mon that other OSDs' heartbeats have timed out. So when
rebind fails after 3 retries, the OSD should cancel the in-flight failure
reports it previously sent to the mon.
Sage Weil [Mon, 16 Nov 2015 16:32:34 +0000 (11:32 -0500)]
osdc/Objecter: call notify completion only once
If we race with a reconnect we could get a second notify message
before the notify linger op is torn down. Ensure we only ever
call the notify completion once to prevent a segfault.
Fixes: #13805
Signed-off-by: Sage Weil <sage@redhat.com>
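The "call the completion only once" guard can be sketched as follows. This is a Python illustration of the general pattern, not the Objecter code; the class NotifyLinger and its method names are hypothetical.

```python
class NotifyLinger:
    """Hypothetical stand-in for a notify linger op with a one-shot completion."""

    def __init__(self, on_complete):
        self._on_complete = on_complete

    def handle_notify(self, payload):
        # Take the callback and clear it in one step, so a second notify
        # message (for example after a reconnect race) finds nothing to call.
        callback, self._on_complete = self._on_complete, None
        if callback is None:
            return False  # completion already fired; ignore the duplicate
        callback(payload)
        return True
```

Clearing the stored callback before invoking it means a duplicate message arriving later is simply ignored rather than calling into freed or finished state.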
Sage Weil [Sat, 14 Nov 2015 03:27:14 +0000 (22:27 -0500)]
mon/OSDMonitor: simplify failure reporters vs reports logic
Since each OSD only sends a failure report for a given peer once,
we don't need to count reports vs reporters separately. (This was
probably a bad idea anyway.) Remove this logic and the associated
config option.
Reported-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
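One way to picture the simplified logic: because each OSD reports a given peer at most once, tracking the set of distinct reporters is enough. The sketch below is a Python illustration under that assumption; FailureTracker and min_reporters are hypothetical names, not the OSDMonitor implementation or its config options.

```python
from collections import defaultdict


class FailureTracker:
    """Hypothetical tracker that counts distinct reporters per failed peer."""

    def __init__(self, min_reporters):
        self.min_reporters = min_reporters
        self.reporters = defaultdict(set)  # target osd -> set of reporting osds

    def report(self, target, reporter):
        # A repeated report from the same OSD does not change the set,
        # so there is no separate "reports" counter to maintain.
        self.reporters[target].add(reporter)
        return len(self.reporters[target]) >= self.min_reporters

    def cancel(self, target, reporter):
        # Withdraw an earlier failure report from this reporter.
        self.reporters[target].discard(reporter)
```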
Sage Weil [Sat, 14 Nov 2015 03:11:17 +0000 (22:11 -0500)]
osd: simplify pg creation
We used to have a complicated pg creation process in which we
would query any previous mappings for the pg before we created the
new 'empty' pg locally. The tracking of the prior mappings was
very simple (and broken), but it didn't really matter because the
mon would resend pg create messages periodically. Now it doesn't,
so that broke.
However, none of this is necessary: the PG peering process does
all of the same things. Namely, it
- enumerates past intervals
- determines which ones may have been rw
- queries OSDs from each one to gather any potential changes
This is a more robust version of what the creation code was (or
should have been) doing. So, let's rip it all out and let
peering handle it. As long as the newly instantiated PG sets
last_epoch_started and _clean to the created epoch we will probe
and consider all of these prior mappings and find any previous
instance of the PG (if one existed).
Sage Weil [Mon, 12 Oct 2015 02:06:33 +0000 (22:06 -0400)]
mon/PGMonitor: avoid useless pg gets when pool is deleted
If the .0 pg no longer exists, we know the entire pool was
deleted, and can avoid querying every other pg. (This is a good
thing because leveldb and rocksdb can be very slow to query
missing keys.)
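The shortcut can be sketched like this, with a plain dict standing in for the key/value store. The function name, the dict-based store, and the "pool.seed" key format with a hex seed are assumptions made for illustration, not the PGMonitor code.

```python
def surviving_pg_stats(store, pool_id, pg_num):
    """Return the stored entries for a pool, skipping all lookups if the
    pool's .0 pg is missing (which implies the whole pool was deleted)."""
    if f"{pool_id}.0" not in store:
        return {}  # one lookup instead of pg_num lookups of missing keys
    found = {}
    for seed in range(pg_num):
        pgid = f"{pool_id}.{seed:x}"  # assumed key format: pool.hexseed
        if pgid in store:
            found[pgid] = store[pgid]
    return found
```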
Sage Weil [Thu, 8 Oct 2015 16:13:40 +0000 (12:13 -0400)]
mon/PGMonitor: revamp how pg creates are tracked
Previously we were calculating and managing in-core state that
wasn't committed as part of the pg_map, leading to all sorts of
ugliness that didn't really work. Instead,
* set mapping in all creating pgs in the committed pg_map
* make all pg create message sending be based on committed state
* update mappings for creating pgs every time we consume a new
osdmap, so that we have a reliable/stable epoch to attach to
it.
In particular, having that stable epoch means we have a reference
we can put in the pg create message that will also be used for
the subscription version. That way OSDs get consistent creates
from any mon.
Sage Weil [Wed, 7 Oct 2015 01:39:33 +0000 (21:39 -0400)]
mon/PGMonitor: send pg creates via persistent subscriptions, not spam
Generate and send pg create messages only for those OSDs who have
subscribed on this monitor. This is N times more efficient (where there
are N monitors) than the previous method.
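A rough sketch of the idea, in Python: each monitor builds create messages only for the OSDs subscribed to it, rather than every monitor messaging every OSD. The function and parameter names are hypothetical, and the data shapes (a dict of pg ids per OSD, a set of subscriber ids) are assumptions.

```python
def creates_to_send(creating_by_osd, subscribed_osds):
    """Build per-OSD create lists only for OSDs subscribed to this monitor.

    creating_by_osd: dict mapping osd id -> iterable of pg ids being created
    subscribed_osds: set of osd ids with a subscription on this monitor
    """
    messages = {}
    for osd in subscribed_osds:
        pgs = creating_by_osd.get(osd)
        if pgs:  # skip subscribers that have nothing to create
            messages[osd] = sorted(pgs)
    return messages
```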
Sage Weil [Thu, 8 Oct 2015 16:14:49 +0000 (12:14 -0400)]
mon/OSDMonitor: do not prime pg_temp for creating pgs
It will be less work for the old primary to ignore the create message
and the new one to query it and find nothing than for the slightly more
complicated peering and removal process to happen. Also, this reduces
bloat in the OSDMap a bit.
Sage Weil [Fri, 2 Oct 2015 13:15:33 +0000 (09:15 -0400)]
mon: disabled rocksdb compression when used as the backend
This significantly reduced CPU utilization on the bigbang scale
testing cluster at CERN. Note that it is already disabled for
leveldb by default (in ceph_mon.cc).
Sage Weil [Fri, 2 Oct 2015 13:06:29 +0000 (09:06 -0400)]
osd: cap adjusted max mon report interval at 2/3 of timeout
This ensures that we don't throttle back mon reports so much that
the mon times out due to no pg stat reports. Since there is
little value in having a lower max anyway, just set this at an
upper bound (relative to the mon's timeout value).
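The arithmetic of the cap, as a small Python sketch. The function and parameter names are hypothetical and do not correspond to actual Ceph config option names.

```python
def capped_max_report_interval(adjusted_max, mon_timeout):
    """Limit the adjusted max report interval to 2/3 of the mon's timeout."""
    upper_bound = mon_timeout * 2 / 3
    return min(adjusted_max, upper_bound)
```

For example, with a mon timeout of 900 seconds the bound is 600 seconds, so an adjusted maximum of 1000 would be reduced to 600 while a value of 100 would be left alone.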
Sage Weil [Wed, 30 Sep 2015 01:03:53 +0000 (21:03 -0400)]
osd: protect mon reporting with mon_report_lock
We need an exclusive lock over paths that update state related to
mon reports, lest they step on fields like up_thru_*, *stats_ack*,
last_mon_report, and so on. Everybody still needs a read lock
on map_lock too to get a stable OSDMap epoch.
Sage Weil [Wed, 23 Sep 2015 21:58:15 +0000 (17:58 -0400)]
osd: introduce explicit preboot stage
We want to separate the stage where we do a bunch of work
prior to booting (but intend to eventually boot), like when we
get maps and wait to be healthy, from the point after we've sent
the boot message while we are just waiting for a response (so that
we can avoid resending that boot message needlessly).
- start at PREBOOT in start_boot()
- transition to BOOTING in _send_boot()
- only call _preboot() while in PREBOOT state
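The three transitions listed above can be sketched as a small state machine. This is a Python illustration only: the method names mirror those in the commit message, but the class, the healthy flag, and the boot_messages_sent counter are hypothetical.

```python
from enum import Enum


class State(Enum):
    PREBOOT = "preboot"
    BOOTING = "booting"


class OsdBootSketch:
    """Hypothetical model of the PREBOOT -> BOOTING transition."""

    def __init__(self):
        self.state = None
        self.healthy = False          # stands in for "have maps and are healthy"
        self.boot_messages_sent = 0

    def start_boot(self):
        self.state = State.PREBOOT
        self._preboot()

    def _preboot(self):
        # Only do pre-boot work while in PREBOOT; once the boot message is
        # out we just wait for the response and never resend it from here.
        if self.state is not State.PREBOOT:
            return
        if self.healthy:
            self._send_boot()

    def _send_boot(self):
        self.state = State.BOOTING
        self.boot_messages_sent += 1
```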