git.apps.os.sepia.ceph.com Git - ceph.git/commitdiff
osd: fix failure report handling during ms_handle_connect()
author: xie xingguo <xie.xingguo@zte.com.cn>
Tue, 29 Mar 2016 06:50:59 +0000 (14:50 +0800)
committer: xie xingguo <xie.xingguo@zte.com.cn>
Tue, 29 Mar 2016 07:44:46 +0000 (15:44 +0800)
On connecting to a new monitor, we resend everything, including the
OSD failure reports that were sent previously. To do this, we first
call requeue_failures() to transfer in-flight failure reports from
failure_pending to failure_queue, and then call send_failures() to do
the actual delivery.

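The problem is that send_failures() never sends a failure report again
if it finds that the doomed OSD is already in the failure_pending set.
That check is necessary, because in the normal case we don't want to
report the same OSD failure to the monitor twice. But requeue_failures()
copied the in-flight entries into failure_queue without removing them
from failure_pending, so every requeued report was skipped and nothing
was resent to the new monitor.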

This PR fixes the problem by erasing each record from the failure_pending
set as requeue_failures() moves it into failure_queue, so the subsequent
call to send_failures() resends the failure reports correctly.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
src/osd/OSD.cc

index ccd89831a64436019ed6f36b4a750f1b2d9dc6d1..324bf7fbf68f3cf96dc32335d51b54bddfb2f8d7 100644
@@ -4964,9 +4964,9 @@ void OSD::requeue_failures()
   unsigned old_pending = failure_pending.size();
   for (map<int,pair<utime_t,entity_inst_t> >::iterator p =
         failure_pending.begin();
-       p != failure_pending.end();
-       ++p) {
+       p != failure_pending.end(); ) {
     failure_queue[p->first] = p->second.first;
+    failure_pending.erase(p++);
   }
   dout(10) << __func__ << " " << old_queue << " + " << old_pending << " -> "
           << failure_queue.size() << dendl;