I believe the problem here is that the last_metadata change is lost in an
ECANCELED/EAGAIN transaction, but the pending map change goes through in
the next one. I've been unable to find an exact way to reproduce this.
The problem seems to occur when upgrades are performed, which suggests
monitor churn in which quorum is lost repeatedly.
This seems to be the most likely explanation, so let's go ahead and make
this change (no longer calling paxos.trigger_propose() right after
writing last_metadata) even without a reproducer. In any case, it has the
added benefit of batching the pending map update (to up:standby) with the
last_metadata update.
Fixes: https://tracker.ceph.com/issues/24403
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 6f69fe9739a23974a46a4e13e4c29b431d95acc4)
bufferlist bl;
encode(pending_metadata, bl);
t->put(MDS_METADATA_PREFIX, "last_metadata", bl);
- paxos.trigger_propose();
}
void MDSMonitor::remove_from_metadata(const FSMap &fsmap, MonitorDBStore::TransactionRef t)