From 62d942294a54208cdc82aebf8b536d164cae5dc6 Mon Sep 17 00:00:00 2001 From: Sage Weil Date: Mon, 17 Mar 2014 16:21:17 -0700 Subject: [PATCH] mon/Paxos: commit only after entire quorum acks If a subset of the quorum accepts the proposal and we commit, we will start sharing the new state. However, the mon that didn't yet reply with the accept may still be sharing the old and stale value. The simplest way to prevent this is not to commit until the entire quorum replies. In the general case, there are no failures and this is just fine. In the failure case, we will call a new election and have a smaller quorum of (live) nodes and will recommit the same value. A more performant solution would be to have a separate message invalidate the old state and commit once we have all invalidations and a majority of accepts. This will lower latency a bit in the non-failure case, but not change the failure case significantly. Later! Fixes: #7736 Signed-off-by: Sage Weil Reviewed-by: Joao Eduardo Luis Reviewed-by: Greg Farnum (cherry picked from commit fa1d957c115a440e162dba1b1002bc41fc1eac43) --- src/mon/Paxos.cc | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/src/mon/Paxos.cc b/src/mon/Paxos.cc index fa3e20898426b..aa18d5f16342e 100644 --- a/src/mon/Paxos.cc +++ b/src/mon/Paxos.cc @@ -691,22 +691,21 @@ void Paxos::handle_accept(MMonPaxos *accept) assert(g_conf->paxos_kill_at != 6); - // new majority? - if (accepted.size() == (unsigned)mon->monmap->size()/2+1) { + // only commit (and expose committed state) when we get *all* quorum + // members to accept. otherwise, they may still be sharing the now + // stale state. + // FIXME: we can improve this with an additional lease revocation message + // that doesn't block for the persist. + if (accepted == mon->get_quorum()) { // yay, commit! - // note: this may happen before the lease is reextended (below) - dout(10) << " got majority, committing" << dendl; + dout(10) << " got majority, committing, done with update" << dendl; commit(); if (!do_refresh()) goto out; if (is_updating()) commit_proposal(); finish_contexts(g_ceph_context, waiting_for_commit); - } - // done? - if (accepted == mon->get_quorum()) { - dout(10) << " got quorum, done with update" << dendl; // cancel timeout event mon->timer.cancel_event(accept_timeout_event); accept_timeout_event = 0; -- 2.39.5