From 65ed99be85f285ac501a14224b185364c79073a9 Mon Sep 17 00:00:00 2001
From: Jim Schutt
Date: Thu, 27 Sep 2012 15:56:15 -0600
Subject: [PATCH] PG: Do not discard op data too early

Under a sustained cephfs write load where the offered load is higher
than the storage cluster write throughput, a backlog of replication ops
that arrive via the cluster messenger builds up. The client message
policy throttler, which should be limiting the total write workload
accepted by the storage cluster, is unable to prevent it, for any
value of osd_client_message_size_cap, under such an overload condition.

The root cause is that op data is released too early, in op_applied().

If instead the op data is released at op deletion, then the limit
imposed by the client policy throttler applies over the entire
lifetime of the op, including commits of replication ops. That
makes the policy throttler an effective means for an OSD to
protect itself from a sustained high offered load, because it can
effectively limit the total, cluster-wide resources needed to process
in-progress write ops.

Signed-off-by: Jim Schutt
---
 src/osd/ReplicatedPG.cc | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 3ea7f350bb7ac..0ee19b720ac34 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -3576,10 +3576,6 @@ void ReplicatedPG::op_applied(RepGather *repop)
   dout(10) << "op_applied " << *repop << dendl;
   if (repop->ctx->op)
     repop->ctx->op->mark_event("op_applied");
-
-  // discard my reference to the buffer
-  if (repop->ctx->op)
-    repop->ctx->op->request->clear_data();
 
   repop->applying = false;
   repop->applied = true;
-- 
2.39.5