John Spray [Mon, 13 Feb 2017 12:01:40 +0000 (12:01 +0000)]
mds: write_head when reading in PurgeQueue
Previously write_head calls were only generated
on the write side, so if you had a big queue
and were just working through consuming it, you
wouldn't record your progress, and on a daemon
restart would end up repeating a load of work.
John Spray [Mon, 13 Feb 2017 12:00:42 +0000 (12:00 +0000)]
osdc: expose Journaler::write_head_needed
So that callers on the read side can optionally
do their own write_head calls according to
the same condition that Journaler uses
internally for its write_head during _flush().
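A minimal sketch of how a read-side caller might use such a check, assuming the condition is purely time-based (an interval since the last head write). HeadWriteTracker, maybe_write_head and the interval parameter are hypothetical names for illustration, not the Journaler API.

```cpp
#include <chrono>

// Hypothetical stand-in for the journal's head-write bookkeeping.
// Assumption: "head needs writing" means "more than `interval` has
// elapsed since the head was last written".
class HeadWriteTracker {
public:
  using Clock = std::chrono::steady_clock;

  constexpr HeadWriteTracker(Clock::duration interval,
                             Clock::time_point last_wrote_head)
    : interval_(interval), last_wrote_head_(last_wrote_head) {}

  // True when the interval since the last head write has been exceeded.
  constexpr bool write_head_needed(Clock::time_point now) const {
    return now - last_wrote_head_ > interval_;
  }

  // Record that the head was just persisted.
  void note_head_written(Clock::time_point now) { last_wrote_head_ = now; }

private:
  Clock::duration interval_;
  Clock::time_point last_wrote_head_;
};

// Read-side hook: after consuming an entry, persist progress if it is due.
// `write_head` is whatever callable the caller uses to write the head.
// Returns true if a head write was issued.
template <typename WriteHead>
bool maybe_write_head(HeadWriteTracker& tracker,
                      HeadWriteTracker::Clock::time_point now,
                      WriteHead&& write_head) {
  if (!tracker.write_head_needed(now)) {
    return false;
  }
  write_head();
  tracker.note_head_written(now);
  return true;
}
```

Sharing one predicate between the write path and the read path keeps the two sides from disagreeing about how often progress is recorded.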
John Spray [Mon, 13 Feb 2017 00:16:29 +0000 (00:16 +0000)]
osdc: less aggressive prefetch in read/write Journaler
Previously, if doing a write/is_readable/write/is_readable sequence,
you'd end up doing a flush after every write, even though there
was already a flush in flight that would advance the readable-ness
of the journal.
Because this flush-during-read path is only active when using
a read/write journal such as in PurgeQueue, tweak the behaviour
to suit this case.
This was an unused code path. If anyone set a nonzero
value here the MDS would crash because the Timer implementation
has changed since this code was written, and now requires
add_event_after callers to hold the right lock.
John Spray [Wed, 8 Feb 2017 16:24:24 +0000 (16:24 +0000)]
mds: expose progress during PurgeQueue drain
We don't track an item count, but we do have
a number of bytes left in the Journaler, so
can use that to give an indication of progress
while the MDS rank shutdown is waiting for
the PurgeQueue to do its thing.
Also lift the ops limit on the PurgeQueue
when it goes into the drain phase.
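The byte-based progress indication described above can be sketched as follows. This is only an illustration: drain_progress and its parameters are hypothetical names, and it assumes the caller remembers how many bytes were outstanding when the drain began.

```cpp
#include <cstdint>

// Hypothetical helper: fraction of the backlog drained so far.
// `initial_bytes` is the unread byte count when the drain started and
// `remaining_bytes` is the unread byte count now. No item count is needed.
constexpr double drain_progress(std::uint64_t initial_bytes,
                                std::uint64_t remaining_bytes) {
  return initial_bytes == 0
           ? 1.0  // nothing to drain: report completion
           : remaining_bytes >= initial_bytes
               ? 0.0  // clamp if the backlog has not shrunk yet
               : static_cast<double>(initial_bytes - remaining_bytes) /
                   static_cast<double>(initial_bytes);
}
```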
John Spray [Mon, 5 Dec 2016 15:40:00 +0000 (15:40 +0000)]
mds: move throttling code out of StrayManager
This will belong in PurgeQueue from now on. We assume
that there is no need to throttle the rate of insertions
into purge queue as it is an efficient sequentially-written
journal.
John Spray [Thu, 1 Dec 2016 20:22:43 +0000 (20:22 +0000)]
mds: use a persistent queue for purging deleted files
To avoid creating stray directories of unbounded size
and all the associated pain, use a more appropriate
data structure to store a FIFO of inodes that need
purging.
Fixes: http://tracker.ceph.com/issues/11950
Signed-off-by: John Spray <john.spray@redhat.com>
John Spray [Thu, 1 Dec 2016 19:10:35 +0000 (19:10 +0000)]
osdc/Journaler: wrap recover() completion in finisher
Otherwise, the callback will deadlock if it in turn
calls into any Journaler functions. Don't care
about performance because we do this once at startup.
John Spray [Thu, 1 Dec 2016 18:59:26 +0000 (18:59 +0000)]
osdc/Journaler: add have_waiter()
Allows users of wait_for_readable to conveniently
see if there is already a waiter. Yes, they could
do this themselves, but I'd rather peek at an existing
variable than add a new one caller-side.
John Spray [Thu, 1 Dec 2016 15:27:39 +0000 (15:27 +0000)]
osdc/Journaler: remove incorrect assertion
This asserted that flush_pos would be ahead of
safe_pos after calling _flush. However, this
is not guaranteed to be the case because
prezeroing might prevent us from flushing
right now.
Sage Weil [Fri, 3 Mar 2017 03:20:08 +0000 (21:20 -0600)]
osdc/Objecter: resend RWORDERED ops on full
Our condition for respecting the FULL flag is complex, and involves
the WRITE | RWORDERED flags vs the FULL_FORCE | FULL_TRY flags. Previously,
we could block a read because of RWORDERED but not resend it later.
Fix by capturing the complex condition in a respects_full() bool and using
it both for the blocking-on-send and resending-on-possibly-notfull-later
checks.
Fixes: http://tracker.ceph.com/issues/19133
Signed-off-by: Sage Weil <sage@redhat.com>
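A sketch of the idea, assuming the condition is "the op is a write or is RWORDERED, and neither FULL_TRY nor FULL_FORCE is set". The flag values, the Op struct and the two helper functions are placeholders for illustration, not the Objecter's real definitions.

```cpp
#include <cstdint>

// Placeholder flag bits; these are NOT Ceph's real constants.
constexpr std::uint32_t FLAG_WRITE      = 1u << 0;
constexpr std::uint32_t FLAG_RWORDERED  = 1u << 1;
constexpr std::uint32_t FLAG_FULL_TRY   = 1u << 2;
constexpr std::uint32_t FLAG_FULL_FORCE = 1u << 3;

// Hypothetical, stripped-down op carrying only its flags.
struct Op {
  std::uint32_t flags;

  // Single source of truth for "does this op honour the FULL flag?"
  constexpr bool respects_full() const {
    return (flags & (FLAG_WRITE | FLAG_RWORDERED)) != 0 &&
           (flags & (FLAG_FULL_TRY | FLAG_FULL_FORCE)) == 0;
  }
};

// Blocking-on-send check: hold the op back while the target is full.
constexpr bool should_block_on_send(const Op& op, bool target_full) {
  return target_full && op.respects_full();
}

// Resend check: an op that was held back must go out once the target
// is no longer full. Uses the same predicate as the blocking check.
constexpr bool should_resend_when_not_full(const Op& op, bool was_full,
                                           bool now_full) {
  return was_full && !now_full && op.respects_full();
}
```

Because both checks call the same predicate, an RWORDERED read that was blocked on send is also picked up by the resend pass.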
Matt Benjamin [Tue, 7 Mar 2017 14:48:57 +0000 (09:48 -0500)]
rgw_file: fix fs_inst progression
Reported by Gui Hecheng <guimark@126.com>. This change is a
variation on the fix proposed by Dan Gryniewicz <dang@redhat.com>
to take root_fh.state.dev as fs_inst for new handles.
Fixes: http://tracker.ceph.com/issues/19214
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
At the end of start_rgw() we wait until establishing HTTP connections
with RadosGW becomes possible. However, if RadosGW uses FastCGI,
the condition can't be fulfilled without spawning an HTTP server first.
Sage Weil [Thu, 23 Feb 2017 21:35:40 +0000 (16:35 -0500)]
mon/OSDMonitor: generate health warnings for luminous
Note that this tells us how many OSDs are full or nearfull; it
does not include detailed warnings telling you exactly what the
utilization is because we don't have the full osd_stat_t
available. We leave it to ceph-mgr to generate those health
messages.
Sage Weil [Thu, 23 Feb 2017 21:31:21 +0000 (16:31 -0500)]
mon/OSDMonitor: set cluster flags based on osd flags (luminous)
For luminous, set cluster flags based on osd flags. Until
require_luminous is set, stick with the old pgmap-based behavior.
Move the new check to encode_pending so that the cluster flag is
set in the same epoch that the osd state(s) change.
Sage Weil [Thu, 23 Feb 2017 20:57:01 +0000 (15:57 -0500)]
osd: require fullness state changes (as needed) before boot
This ensures that we don't have a down osd that is marked full
go up, then realize it's not actually full, and then clear its
full flag. That would result in a cluster full blip that isn't
needed. This can easily happen if the full_ratio in the osdmap is
increased while the OSD is down.
Sage Weil [Thu, 23 Feb 2017 20:55:35 +0000 (15:55 -0500)]
osd: restructure and simplify internal fullness checks
First, eliminate the useless nearfull failsafe--all it did was
generate a log message, which we can do based on the OSDMap
states.
Add some new helpers.
Unify the cluster nearfull/full vs failsafe states so that
failsafe is a "really" full state that is more severe than
full, so we have NONE, NEARFULL, FULL, FAILSAFE.
Pull the full/nearfull ratios out of the OSDMap (remember that
we require luminous mons, so these will be initialized).
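The four-level ordering above could be modelled roughly as follows. This is a sketch under assumptions: the type and function names are hypothetical, the thresholds are supplied by the caller, and the use of `>=` for each boundary is a guess rather than the OSD's actual comparison.

```cpp
// Severity increases with the enumerator value.
enum class FullnessState { NONE, NEARFULL, FULL, FAILSAFE };

// Hypothetical container for the thresholds. In the commit's terms,
// nearfull and full would come from the OSDMap; where failsafe comes
// from is not stated in the text and is left to the caller here.
struct FullnessRatios {
  double nearfull;
  double full;
  double failsafe;
};

// Map a utilization ratio (0.0 to 1.0) onto the most severe state whose
// threshold it reaches.
constexpr FullnessState classify(double used_ratio, const FullnessRatios& r) {
  return used_ratio >= r.failsafe ? FullnessState::FAILSAFE
       : used_ratio >= r.full     ? FullnessState::FULL
       : used_ratio >= r.nearfull ? FullnessState::NEARFULL
                                  : FullnessState::NONE;
}

// True if `a` is a strictly more severe state than `b`.
constexpr bool more_severe(FullnessState a, FullnessState b) {
  return static_cast<int>(a) > static_cast<int>(b);
}
```

Treating failsafe as the top of a single ordered scale means one comparison answers "is this at least full?" without a separate failsafe code path.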