From: Sage Weil
Date: Thu, 8 May 2014 21:19:22 +0000 (-0700)
Subject: osd/ReplicatedPG: carry CopyOpRef in copy_from completion
X-Git-Tag: v0.80.7~1^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=b8d2fc72ea54eb17611d7ac90be4da6c4e4e7485;p=ceph.git

osd/ReplicatedPG: carry CopyOpRef in copy_from completion

There is a race with copy_from cancellation. The internal Objecter
completion decodes a bunch of data and copies it into pointers provided
when the op is queued. When we cancel, we need to ensure that those
pointers stay valid until control passes back to our provided
completion. Once we *do* get into the (ReplicatedPG) callbacks, we will
bail out because the tid in the CopyOp or FlushOp no longer matches.

Fix this by carrying a ref to keep the copy-from targets alive, and
clearing out the tids that we cancel.

Note that previously, the trigger for this was that the tid changed when
we handled a redirect, which made the op_cancel() call fail. With the
coming Objecter changes, this will no longer be the case. However, there
are also locking and threading changes that will make cancellation racy,
so we will not be able to rely on it always preventing the callback.
Either way, this will avoid the problem.

Fixes: #7588
Signed-off-by: Sage Weil
(cherry picked from commit 589b639af7c8834a1e6293d58d77a9c440107bc3)
---

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 5600466fa4355..d23e6fc9292ac 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -5343,9 +5343,11 @@ struct C_Copyfrom : public Context {
   hobject_t oid;
   epoch_t last_peering_reset;
   ceph_tid_t tid;
-  C_Copyfrom(ReplicatedPG *p, hobject_t o, epoch_t lpr)
+  ReplicatedPG::CopyOpRef cop;
+  C_Copyfrom(ReplicatedPG *p, hobject_t o, epoch_t lpr,
+             const ReplicatedPG::CopyOpRef& c)
     : pg(p), oid(o), last_peering_reset(lpr),
-      tid(0)
+      tid(0), cop(c)
   {}
   void finish(int r) {
     if (r == -ECANCELED)
@@ -5592,7 +5594,7 @@ void ReplicatedPG::_copy_some(ObjectContextRef obc, CopyOpRef cop)
                               &cop->rval);
 
   C_Copyfrom *fin = new C_Copyfrom(this, obc->obs.oi.soid,
-                                   get_last_peering_reset());
+                                   get_last_peering_reset(), cop);
   gather.set_finisher(new C_OnFinisher(fin,
                                        &osd->objecter_finisher));
 
@@ -6023,8 +6025,10 @@ void ReplicatedPG::cancel_copy(CopyOpRef cop, bool requeue)
   if (cop->objecter_tid) {
     Mutex::Locker l(osd->objecter_lock);
     osd->objecter->op_cancel(cop->objecter_tid, -ECANCELED);
+    cop->objecter_tid = 0;
     if (cop->objecter_tid2) {
       osd->objecter->op_cancel(cop->objecter_tid2, -ECANCELED);
+      cop->objecter_tid2 = 0;
     }
   }
 
@@ -6440,6 +6444,7 @@ void ReplicatedPG::cancel_flush(FlushOpRef fop, bool requeue)
   if (fop->objecter_tid) {
     Mutex::Locker l(osd->objecter_lock);
     osd->objecter->op_cancel(fop->objecter_tid, -ECANCELED);
+    fop->objecter_tid = 0;
   }
   if (requeue) {
     if (fop->op)
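
The two halves of the fix can be illustrated with a minimal, standalone C++ sketch. The first half is carrying a ref in the completion so the decode targets outlive cancellation. The second half is clearing the tid so that a late callback bails out. Everything here is hypothetical and simplified. `CopyOp` is a cut-down stand-in for `ReplicatedPG::CopyOp`. `CopyOpRef` is assumed to behave like a `std::shared_ptr`. `CopyFromCompletion`, `InFlightRead::deliver()`, and `run_demo()` are invented for illustration. The Objecter, locking, and the real `op_cancel()` are not modeled.

```cpp
#include <cerrno>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for ReplicatedPG::CopyOp.
struct CopyOp {
  uint64_t objecter_tid = 0;  // tid of the in-flight op; 0 means none
  std::string data;           // decode target filled in before our callback
  int rval = 0;               // decode target for the op's return value
};
using CopyOpRef = std::shared_ptr<CopyOp>;  // assumption: shared_ptr semantics

// Hypothetical completion modeled on C_Copyfrom: holding a CopyOpRef keeps
// the decode targets alive even after the PG drops its own reference.
struct CopyFromCompletion {
  CopyOpRef cop;
  uint64_t tid;                   // tid this completion was issued for
  std::vector<std::string>* log;  // records outcomes for the demo

  void finish(int r) {
    if (r == -ECANCELED || cop->objecter_tid != tid) {
      log->push_back("ignored");  // canceled or superseded: bail out
      return;
    }
    log->push_back("applied " + cop->data);
  }
};

// Simulates the internal Objecter completion: it writes through raw
// pointers captured at queue time, then hands control to our completion.
struct InFlightRead {
  std::string* out_data;
  int* out_rval;
  CopyFromCompletion fin;

  void deliver(const std::string& payload, int r) {
    *out_data = payload;  // safe only while the CopyOp is still alive
    *out_rval = r;
    fin.finish(r);
  }
};

// Mirrors the patched cancel_copy(): clear the tid we cancel.
void cancel_copy(CopyOp& cop) { cop.objecter_tid = 0; }

std::vector<std::string> run_demo() {
  std::vector<std::string> log;
  {  // Normal reply: tid still matches, so the data is applied.
    CopyOpRef cop = std::make_shared<CopyOp>();
    cop->objecter_tid = 1;
    InFlightRead op{&cop->data, &cop->rval,
                    CopyFromCompletion{cop, cop->objecter_tid, &log}};
    op.deliver("chunk-a", 0);
  }
  {  // Cancel loses the race: the reply still arrives afterwards.
    CopyOpRef cop = std::make_shared<CopyOp>();
    cop->objecter_tid = 2;
    InFlightRead op{&cop->data, &cop->rval,
                    CopyFromCompletion{cop, cop->objecter_tid, &log}};
    cancel_copy(*cop);
    cop.reset();  // PG drops its ref; the completion's ref keeps CopyOp alive
    log.push_back("refs=" + std::to_string(op.fin.cop.use_count()));
    op.deliver("chunk-b", 0);  // decode is safe; finish() sees tid 0 != 2
  }
  return log;
}
```

Without the `cop` member in the completion, the second scenario would write through dangling pointers once the PG released the last reference. Without clearing the tid, `finish()` would wrongly apply the stale reply.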