Samuel Just [Tue, 24 Mar 2015 17:48:02 +0000 (10:48 -0700)]
PG: set/clear CREATING in Primary state entry/exit
Previously, we did not actually set it when we got a pg creation message from
the mon. It would actually get set on the first start_peering_interval after
that point. If we don't get that far, but do send a stat update to the mon, we
can end up with 11197. Instead, let's just set it and clear it upon entry into
and exit from the Primary state.
Fixes: 11197 Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Tue, 24 Mar 2015 22:14:34 +0000 (15:14 -0700)]
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug 11199.
1) osd 4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) Interval change happens
3) osd 0 now finds itself backfilling to 4 (lb=foo) and osd.5
(lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both 4 and 5 to scan forward, so 4 has an interval
starting at foo, 5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the
last_backfill_started, but 4's interval starts after that, so foo remains in
osd 4's interval (this is the bug)
7) We serve a copyfrom on foo (sent to 4 as well).
8) We eventually get to foo in the backfilling. Normally, they would have the
same version, but of course we don't update osd.4's interval from the log since
it should not have received writes in that interval. Thus, we end up trying to
recover foo on osd.4 anyway.
9) But, an interval change happens between removing foo from osd.4 and
completing the recovery, leaving osd.4 without foo, but with lb >= foo
Fixes: #11199
Backport: firefly Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Fri, 20 Mar 2015 22:28:15 +0000 (15:28 -0700)]
ReplicatedPG::promote_object: check scrubber and block if necessary
Otherwise, we might attempt to promote into an in-progress scrub
interval causing 11156. I would have added a return value to
promote_object(), but could not find an existing user which
cared to distinguish the cases, even with a null op passed.
All existing users are in maybe_handle_cache. The ones which
pass a null op are for promoting the object in parallel
with a proxy -- a case where not actually performing the promote
does not really matter.
Fixes: #11156 Signed-off-by: Samuel Just <sjust@redhat.com>
Currently, this method also returns true if the object is backfilling.
This commit was reverted earlier in the branch in order to make the
other reverts clean. It's actually a nice rename though, so I'm
re-cherry-picking it.
Signed-off-by: Samuel Just <sjust@redhat.com>
Conflicts:
src/osd/ReplicatedPG.cc
Sage Weil [Thu, 19 Mar 2015 23:27:17 +0000 (16:27 -0700)]
osd: only complain about stored vs actual digest if all peers support it
If we have a mixed cluster of hammer and pre-hammer OSDs, we will fall back
to using 0 as the initial crc32c value. However, if the primary has a
stored digest, it currently compares its value to the reported value (w/
the wrong initial value) and complains.
There are two possible fixes:
- avoid storing a digest if all peers don't support it, or
- avoid complaining on scrub if all peers don't support it.
The latter is easier, and this fix also has the benefit of fixing the bug
even for clusters where this has already happened.
Fixes: #11102 Signed-off-by: Sage Weil <sage@redhat.com>
Loic Dachary [Wed, 18 Mar 2015 13:17:00 +0000 (14:17 +0100)]
osd: erasure-code-profile incremental rm before set
It is possible for an incremental change to have both a rm and a set for
a given erasure code profile. It only happens when a rm is followed by a
set. When a set is followed by a rm, the rm will remove the pending set
in the incremental change.
The logic is the same for pool create and pool delete.
We must apply the incremental erasure-code-profile removal before the
creation otherwise rm and set in the same proposal will ignore the set.
This fix is minimal. A better change would be that erasure-code-profile
set checks if there is a pending removal and wait_for_finished_proposal
before creating.
Loic Dachary [Wed, 18 Mar 2015 10:40:36 +0000 (11:40 +0100)]
mon: informative message when erasure-code-profile set fails
When erasure-code-profile set refuses to override an existing profile,
it may be non trivial to figure out why. For instance:
ceph osd set default ruleset-failure-domain=host
fails with:
Error EPERM: will not override erasure code profile default
although ruleset-failure-domain=host is documented to be the
default. The error message now includes the two profiles that have been
compared to not be equal so that the user can verify the difference.
Error EPERM: will not override erasure code profile default
because the existing profile
{directory=.libs,k=2,m=1,plugin=jerasure,technique=reed_sol_van}
is different from the proposed profile
{directory=.libs,k=2,m=1,plugin=jerasure,ruleset-failure-domain=host,technique=reed_sol_van}
Sage Weil [Tue, 17 Mar 2015 17:26:13 +0000 (10:26 -0700)]
osd: use (robust) helper for setting exists or clearing whiteout
The current blanket check in prepare_transaction() will trigger only when
there is a net obs.exists change from the commited obs to new_obs.
However, this misses the case where the first osd_op is a delete and then a
subsequent osd_op recreates the object. Changing the whiteout check to
look only at new_obs does not work because it fails to understand when
_delete_oid sets the whiteout and will simply clear it again.
In order to support sequences of delete + create in general, we need to do
the whiteout flag clearing when the actual create happens (to match the
fact that we set it when we process the delete osd_op). Use a helper to
do this and consolidate most other obs.exists = true code to use it.
Backport: giant, firefly Signed-off-by: Sage Weil <sage@redhat.com>
David Zafman [Tue, 17 Mar 2015 03:34:10 +0000 (20:34 -0700)]
Improve "ceph_argparse.py: add stderr note if nonrequired param is invalid"
When processing arguments stash an exception which is seen when looking
for another argument type. If we have extra args at the end, output
information about the exception which probably caused the problem.
Samuel Just [Sat, 7 Mar 2015 02:30:41 +0000 (18:30 -0800)]
PGBackend: do not rewrite ec object oi checksums
Deep scrub does not actually give us the whole-object checksum for an ec
object, only the checksum for the first shard. We ignore it in scrub
for ec pools anyway in be_select_auth_object.
Jason Dillaman [Thu, 12 Mar 2015 16:56:14 +0000 (12:56 -0400)]
cls_rbd: set_flags can now update snapshots
It's possible for an object map to be invalid only for
a snapshot, so allow snapshot flags to be updated. This
will also be required when rebuilding the object map and
clearing the invalid flag.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Raju Kurunkad [Tue, 10 Mar 2015 07:58:28 +0000 (13:28 +0530)]
XIO: Handle requeue case of XIO messages
If we are not able to send the XIO message using xio_send_msg(),
remove the XIO message from the send Q, before queuing it to the resend
Q. Otherwise, boost will generate a assert.