Guang Yang [Thu, 26 Feb 2015 08:13:12 +0000 (08:13 +0000)]
osd: fix negative degraded objects during backfilling
When there is deleting requests during backfilling, the reported number of degraded
objects could be negative, as the primary's num_objects is the latest (locally) but
the number for replicas might not reflect the deletings. A simple fix is to ignore
the negative subtracted value.
This can be done better in a separate script, which puts these in
CEPH_EXTRA_CONFIGURE_ARGS. In particular, this lets us enable
lttng for gitbuilder builds, but not release builds.
Jason Dillaman [Mon, 16 Mar 2015 22:40:49 +0000 (18:40 -0400)]
librbd: snap_remove should ignore -ENOENT errors
If the attempt to deregister the snapshot from the parent
image fails with -ENOENT, ignore the error as it is safe
to assume that the child is not associated with the parent.
Samuel Just [Thu, 26 Mar 2015 17:26:48 +0000 (10:26 -0700)]
ReplicatedPG::cancel_pull: requeue waiters as well
If we are in recovery_wait, we might not recover that object as part of
recover_primary for some time. Worse, if we are waiting on a backfill
which is blocked waiting on a copy_from on the missing object in
question, it can become a dead lock.
Fixes: 11244
Backport: firefly Signed-off-by: Samuel Just <sjust@redhat.com>
Sage Weil [Fri, 27 Mar 2015 22:35:21 +0000 (15:35 -0700)]
common: send cluster log messages to 'cluster' channel by default
The CLOG_CHANNEL_DEFAULT constant was being abused for two purposes:
- the default channel to log messages to
- the name of the config option key in the key/value pair string that is
used for the default option, e.g. "default=true foo=false bar=false"
Fix this by making the config option key CLOG_CONFIG_DEFAULT_KEY and
replacing throughout, and changing CLOG_CHANNEL_DEFAULT to "cluster" (as
it should be and has been historically).
Fixes: #11177 Signed-off-by: Sage Weil <sage@redhat.com>
Samuel Just [Tue, 24 Mar 2015 17:48:02 +0000 (10:48 -0700)]
PG: set/clear CREATING in Primary state entry/exit
Previously, we did not actually set it when we got a pg creation message from
the mon. It would actually get set on the first start_peering_interval after
that point. If we don't get that far, but do send a stat update to the mon, we
can end up with 11197. Instead, let's just set it and clear it upon entry into
and exit from the Primary state.
Fixes: 11197 Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Tue, 24 Mar 2015 22:14:34 +0000 (15:14 -0700)]
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug 11199.
1) osd 4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) Interval change happens
3) osd 0 now finds itself backfilling to 4 (lb=foo) and osd.5
(lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both 4 and 5 to scan forward, so 4 has an interval
starting at foo, 5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the
last_backfill_started, but 4's interval starts after that, so foo remains in
osd 4's interval (this is the bug)
7) We serve a copyfrom on foo (sent to 4 as well).
8) We eventually get to foo in the backfilling. Normally, they would have the
same version, but of course we don't update osd.4's interval from the log since
it should not have received writes in that interval. Thus, we end up trying to
recover foo on osd.4 anyway.
9) But, an interval change happens between removing foo from osd.4 and
completing the recovery, leaving osd.4 without foo, but with lb >= foo
Fixes: #11199
Backport: firefly Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Fri, 20 Mar 2015 22:28:15 +0000 (15:28 -0700)]
ReplicatedPG::promote_object: check scrubber and block if necessary
Otherwise, we might attempt to promote into an in-progress scrub
interval causing 11156. I would have added a return value to
promote_object(), but could not find an existing user which
cared to distinguish the cases, even with a null op passed.
All existing users are in maybe_handle_cache. The ones which
pass a null op are for promoting the object in parallel
with a proxy -- a case where not actually performing the promote
does not really matter.
Fixes: #11156 Signed-off-by: Samuel Just <sjust@redhat.com>
Currently, this method also returns true if the object is backfilling.
This commit was reverted earlier in the branch in order to make the
other reverts clean. It's actually a nice rename though, so I'm
re-cherry-picking it.
Signed-off-by: Samuel Just <sjust@redhat.com>
Conflicts:
src/osd/ReplicatedPG.cc
Sage Weil [Thu, 19 Mar 2015 23:27:17 +0000 (16:27 -0700)]
osd: only complain about stored vs actual digest if all peers support it
If we have a mixed cluster of hammer and pre-hammer OSDs, we will fall back
to using 0 as the initial crc32c value. However, if the primary has a
stored digest, it currently compares its value to the reported value (w/
the wrong initial value) and complains.
There are two possible fixes:
- avoid storing a digest if all peers don't support it, or
- avoid complaining on scrub if all peers don't support it.
The latter is easier, and this fix also has the benefit of fixing the bug
even for clusters where this has already happened.
Fixes: #11102 Signed-off-by: Sage Weil <sage@redhat.com>
Loic Dachary [Wed, 18 Mar 2015 13:17:00 +0000 (14:17 +0100)]
osd: erasure-code-profile incremental rm before set
It is possible for an incremental change to have both a rm and a set for
a given erasure code profile. It only happens when a rm is followed by a
set. When a set is followed by a rm, the rm will remove the pending set
in the incremental change.
The logic is the same for pool create and pool delete.
We must apply the incremental erasure-code-profile removal before the
creation otherwise rm and set in the same proposal will ignore the set.
This fix is minimal. A better change would be that erasure-code-profile
set checks if there is a pending removal and wait_for_finished_proposal
before creating.
Loic Dachary [Wed, 18 Mar 2015 10:40:36 +0000 (11:40 +0100)]
mon: informative message when erasure-code-profile set fails
When erasure-code-profile set refuses to override an existing profile,
it may be non trivial to figure out why. For instance:
ceph osd set default ruleset-failure-domain=host
fails with:
Error EPERM: will not override erasure code profile default
although ruleset-failure-domain=host is documented to be the
default. The error message now includes the two profiles that have been
compared to not be equal so that the user can verify the difference.
Error EPERM: will not override erasure code profile default
because the existing profile
{directory=.libs,k=2,m=1,plugin=jerasure,technique=reed_sol_van}
is different from the proposed profile
{directory=.libs,k=2,m=1,plugin=jerasure,ruleset-failure-domain=host,technique=reed_sol_van}
Sage Weil [Tue, 17 Mar 2015 17:26:13 +0000 (10:26 -0700)]
osd: use (robust) helper for setting exists or clearing whiteout
The current blanket check in prepare_transaction() will trigger only when
there is a net obs.exists change from the commited obs to new_obs.
However, this misses the case where the first osd_op is a delete and then a
subsequent osd_op recreates the object. Changing the whiteout check to
look only at new_obs does not work because it fails to understand when
_delete_oid sets the whiteout and will simply clear it again.
In order to support sequences of delete + create in general, we need to do
the whiteout flag clearing when the actual create happens (to match the
fact that we set it when we process the delete osd_op). Use a helper to
do this and consolidate most other obs.exists = true code to use it.
Backport: giant, firefly Signed-off-by: Sage Weil <sage@redhat.com>
David Zafman [Tue, 17 Mar 2015 03:34:10 +0000 (20:34 -0700)]
Improve "ceph_argparse.py: add stderr note if nonrequired param is invalid"
When processing arguments stash an exception which is seen when looking
for another argument type. If we have extra args at the end, output
information about the exception which probably caused the problem.