Sam Lang [Wed, 27 Mar 2013 14:35:08 +0000 (09:35 -0500)]
mds: Clear backtrace updates on standby_trim_seg
If the mds is standby, when a segment is trimmed, we need
to clear the backtrace updates list to avoid the following
assertion when the segment is deleted.
Samuel Just [Tue, 26 Mar 2013 22:10:37 +0000 (15:10 -0700)]
ReplicatedPG: send entire stats on OP_BACKFILL_FINISH
Otherwise, we update the stat.stat structure, but not the
stat.invalid_stats part. This will result in a recently
split primary propagating the invalid stats but not the
invalid marker. Sending the whole pg_stat_t structure
also mirrors MOSDSubOp.
Fixes: #4557
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sam Lang [Tue, 26 Mar 2013 13:55:40 +0000 (08:55 -0500)]
mds: CInode::build_backtrace() always incr iter
Always increment the iterator when adding old pools
to the backtrace. This fixes a bug on files where
the layout had been set to a different pool and then
back to the same pool, causing continuous looping in
the build_backtrace() function.
Fixes #4537.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
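The looping described in this commit can be illustrated with a minimal sketch. It assumes the old pools are kept in a std::set and that the file's current pool is skipped when copying them into the backtrace; collect_old_pools and its parameters are hypothetical names, not the actual CInode code. Putting the increment in the for-statement means every path through the body, including the skip, advances the iterator.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical helper (not the real CInode::build_backtrace()): copy the old
// pools into the backtrace, leaving out the pool the file currently lives in.
inline std::set<int64_t> collect_old_pools(const std::set<int64_t>& old_pools,
                                           int64_t current_pool) {
  std::set<int64_t> out;
  // The increment lives in the for-statement, so the "skip" path below
  // cannot leave the iterator stuck on the same element.
  for (auto p = old_pools.begin(); p != old_pools.end(); ++p) {
    if (*p == current_pool)
      continue;  // layout was set away from this pool and then back to it
    out.insert(*p);
  }
  return out;
}
```

A while-loop that only increments at the bottom of the body would spin forever as soon as the skip branch is taken, which matches the symptom reported here.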
Yehuda Sadeh [Mon, 25 Mar 2013 16:50:33 +0000 (09:50 -0700)]
rgw: bucket index ops on system buckets shouldn't do anything
Fixes: #4508
Backport: bobtail
On certain bucket index operations we didn't check whether
the bucket was a system bucket, which caused the operations
to fail. This triggered an error message on bucket removal
operations.
Sage Weil [Tue, 19 Mar 2013 21:26:16 +0000 (14:26 -0700)]
os/FileJournal: fix aio self-throttling deadlock
This block of code tries to limit the number of aios in flight by waiting
for the amount of data to be written to grow relative to a function of the
number of aios. Strictly speaking, the condition we are waiting for is a
function of both aio_num and the write queue, but we are only woken by
changes in aio_num, and were (in rare cases) waiting when aio_num == 0 and
there was no possibility of being woken.
Fix this by verifying that aio_num > 0, and restructuring the loop to
recheck that condition on each wakeup.
Fixes: #4079
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
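The shape of this fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the FileJournal code: the AioThrottle type, the queued_bytes counter, and the min_bytes() threshold function are all hypothetical. The point is that the wait predicate includes aio_num > 0 and is re-evaluated on every wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical throttle; names and the threshold function are illustrative.
struct AioThrottle {
  std::mutex lock;
  std::condition_variable cond;
  int aio_num = 0;            // aios currently in flight
  uint64_t queued_bytes = 0;  // data waiting in the write queue

  // Illustrative threshold: want more queued data as more aios are in flight.
  static uint64_t min_bytes(int n) { return uint64_t(n) * 4096; }

  // Caller must hold `lock` when other threads are active.
  bool should_wait() const {
    // With aio_num == 0 no completion can ever wake us, so never wait then.
    return aio_num > 0 && queued_bytes < min_bytes(aio_num);
  }

  void wait_for_room() {
    std::unique_lock<std::mutex> l(lock);
    while (should_wait())  // rechecked after every wakeup
      cond.wait(l);
  }

  void aio_started() {
    std::lock_guard<std::mutex> l(lock);
    ++aio_num;
  }

  void aio_finished() {
    {
      std::lock_guard<std::mutex> l(lock);
      assert(aio_num > 0);
      --aio_num;
    }
    cond.notify_all();  // the only event that wakes waiters
  }
};
```

Because waiters are only signalled from aio_finished(), a predicate that could be true while aio_num == 0 would block with no possible wakeup; guarding on aio_num > 0 rules that state out.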
mon: AuthMonitor: delete auth_handler while increasing max_global_id
By not deleting and setting NULL the session's auth_handler, we could
hit a scenario in which we'd end up dispatching a previously-wait-listed
auth message and we wouldn't start its auth session.
This only happened when increasing max_global_id via Paxos (in which case
we would wait-list the message) and would only be noticeable when running
with cephx disabled.
Fixes: #4519
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Samuel Just [Tue, 19 Mar 2013 21:45:41 +0000 (14:45 -0700)]
FileJournal: quieter debugging on journal scanning
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 6740d512ac12263f7bee370bc14b1179f83af5be)
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit e1e2d5d2176cc9debd436ba944e6ca65b3253c8a)
Josh Durgin [Sat, 16 Mar 2013 00:28:13 +0000 (17:28 -0700)]
librbd: optionally wait for a flush before enabling writeback
Older guests may not send flushes properly (i.e. never), so if this is
enabled, rbd_cache=true is safe for them transparently.
Disable by default, since it will unnecessarily slow down newer guest
boot, and prevent writeback caching for things that don't need to send
flushes, like the command line tool.
Refs: #3817
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
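The behaviour described in this commit amounts to a small piece of state: writeback stays off until the first flush arrives, unless the wait is disabled. The sketch below assumes exactly that and nothing more; CachePolicy, wait_for_flush, and on_flush() are hypothetical names and do not reflect the librbd option or its implementation.

```cpp
#include <cassert>

// Hypothetical policy object; not the librbd implementation.
struct CachePolicy {
  bool wait_for_flush;      // illustrative stand-in for the new option
  bool flush_seen = false;  // set once the guest has sent a flush

  explicit CachePolicy(bool wait) : wait_for_flush(wait) {}

  // Called when the client issues its first (or any) flush.
  void on_flush() { flush_seen = true; }

  // Writeback is allowed immediately when the wait is disabled, and only
  // after a flush has been observed when it is enabled.
  bool writeback_enabled() const { return !wait_for_flush || flush_seen; }
};
```

A guest that never flushes therefore never leaves the safe mode, while a client that does not need flushes (such as a command line tool) pays for the wait only if the option is turned on.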
Sage Weil [Tue, 19 Mar 2013 17:15:41 +0000 (10:15 -0700)]
mon/Paxos: set state to RECOVERING during restart
This ensures that the paxos state is not active when the PaxosService
restart() methods run right afterwards, and that EAGAIN waiters will get
requeued appropriately.
Sam Lang [Mon, 18 Mar 2013 21:59:04 +0000 (16:59 -0500)]
mds: Handle ENODATA returned from getxattr
The osds might return ENODATA if we request an
xattr that doesn't exist. In this case, we're
requesting the 'parent' xattr so that we can
remove all the forwarding pointers, but the xattr
may not have been written (which only happens on
log segment trim), so we don't assert here.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
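A minimal sketch of the error handling described above, assuming the completion callback receives a negative errno-style return code. ParentXattr and check_getxattr_result() are hypothetical names, and treating every other negative code as a fatal assertion is an assumption of this sketch rather than something the commit states.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical result type for the 'parent' xattr lookup.
enum class ParentXattr { Present, Missing };

// Hypothetical check on a getxattr return code (negative errno on failure).
inline ParentXattr check_getxattr_result(int r) {
  if (r == -ENODATA) {
    // The xattr is only written on log segment trim, so it may legitimately
    // not exist yet; treat this as "nothing to clean up", not as an error.
    return ParentXattr::Missing;
  }
  assert(r >= 0);  // assumption: any other failure is still unexpected
  return ParentXattr::Present;
}
```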
mon: HealthMonitor: Keep track of monitor cluster's health
The HealthMonitor builds upon the QuorumService interface, and should be
used to keep track of any and all relevant information about the monitor
cluster (maybe even about the whole cluster if need be).
This patch also introduces the HealthService interface, used to define
a HealthMonitor service, responsible for dispatching 'MMonHealth' messages
(the QuorumService interface dispatches generic 'Message').
Based on the HealthService interface, we introduce the DataHealthService
class, a service that will track disk space consumption by the monitors,
warn when a given threshold is crossed, and gracefully shut down the monitor
if disk space usage hits critical levels that might affect the correct
monitor behavior.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: QuorumService: Allow for services quorum-bound to be easily created
As the monitor grows in features, we have been dumping them in the Monitor
class as they don't really fit anywhere else.
Most of those latest features have been, and some of the future changes
will also be, quorum-bounded. By that we mean that these features tend
to require a quorum to be present in order to work.
Although we already have the PaxosService interface, it really isn't
adequate for these kinds of features, as they don't really require Paxos,
nor do they access the store. Furthermore, they don't really need to
tick at the same rate as the monitor, and can be fairly independent.
Therefore we now introduce the concept of a QuorumService, a class to be
built upon, managing the tick and dispatch for any kind of service
basically requiring a quorum to function.
Among the already existing monitor features that could take advantage of
this new class we can find the Timecheck infrastructure, as it is by
nature quorum bounded. The monitor store sync could also take advantage
of this service, although it doesn't really require a quorum to work,
and even the PaxosService-related classes could use this.
This patch also introduces the MMonQuorumService base class, to be used
by any message that should want to.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sam Lang [Mon, 18 Mar 2013 19:40:48 +0000 (14:40 -0500)]
client: Remove unnecessary set_inode() in _rmdir()
With the recent changes in fc80c1dc6ee315ae5e039986602ffadba46cb43b,
we only allow setting the inode once on a MetaRequest. This triggered
a bug in _rmdir(), where the parent dir inode passed in was set on the
MetaRequest, and then the dir inode was also set on the MetaRequest.
Removing the set_inode() using the parent dir inode resolves this issue.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Fix iterator handling in ~TestFileStoreState(). After std::map::erase()
the used iterator is invalid. Use a while-loop and erase the object with
post-incremented iterator instead.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
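The while-loop with a post-incremented iterator mentioned in this commit follows a well-known pattern, sketched here under assumptions: the map type, the Obj struct, and clear_all() are hypothetical stand-ins, not the contents of ~TestFileStoreState().

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the objects owned by the test state.
struct Obj {
  int value;
};

// Delete and erase every entry while iterating over the map.
inline void clear_all(std::map<std::string, Obj*>& m) {
  auto it = m.begin();
  while (it != m.end()) {
    delete it->second;
    // it++ advances `it` first and passes the old position to erase(), so
    // `it` never refers to the element that has just been removed.
    m.erase(it++);
  }
}
```

Writing m.erase(it) followed by ++it would instead increment an iterator that erase() has already invalidated. Since C++11, it = m.erase(it) is an equivalent alternative.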
Danny Al-Gaaf [Mon, 18 Mar 2013 11:45:15 +0000 (12:45 +0100)]
rgw/rgw_rados.cc: make sure range_iter != ranges.end()
Make sure range_iter is valid, set range_iter = next_iter instead of
++range_iter, since next_iter is already checked against ranges.end() and
is the same as ++range_iter.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>