From 1b075dd44030f17f95dfc1ade6259f049cc7bdc7 Mon Sep 17 00:00:00 2001
From: "J. Eric Ivancich"
Date: Fri, 3 Nov 2017 09:15:13 -0400
Subject: [PATCH] rgw: fix BZ 1500904, Stale bucket index entry remains after
 object deletion

We have a race condition:

 1. RGW client #1: requests an object be deleted.
 2. RGW client #1: sends a prepare op to bucket index OSD #1.
 3. OSD #1: prepares the op, adding pending ops to the bucket dir entry.
 4. RGW client #2: sends a list bucket to OSD #1.
 5. RGW client #2: sees that there are pending operations on the bucket
    dir entry, and calls check_disk_state.
 6. RGW client #2: check_disk_state sees that the object still exists,
    so it sends CEPH_RGW_UPDATE to the bucket index OSD (#1).
 7. RGW client #1: sends a delete object to the object OSD (#2).
 8. OSD #2: deletes the object.
 9. RGW client #1: sends a complete op to the bucket index OSD (#1).
10. OSD #1: completes the op.
11. OSD #1: receives the CEPH_RGW_UPDATE and updates the bucket index
    entry, thereby **RECREATING** it.

Solution implemented: at step #5 the object's dir entry exists. If we
get to the beginning of step #11 and the object's dir entry no longer
exists, we know that the dir entry was just actively being modified,
so we ignore the CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich
(cherry picked from commit b33f529e79b74314a2030231e1308ee225717743)

Conflicts: (backported substantial changes only; omitted cleanups)
	src/cls/rgw/cls_rgw.cc
	src/rgw/rgw_rados.cc
---
 src/cls/rgw/cls_rgw.cc | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/cls/rgw/cls_rgw.cc b/src/cls/rgw/cls_rgw.cc
index 03314a17ac406..6bd01b06af30a 100644
--- a/src/cls/rgw/cls_rgw.cc
+++ b/src/cls/rgw/cls_rgw.cc
@@ -1982,6 +1982,18 @@ int rgw_dir_suggest_changes(cls_method_context_t hctx, bufferlist *in, bufferlis
       }
       break;
     case CEPH_RGW_UPDATE:
+      if (!cur_disk.exists) {
+        // this update would only have been sent by the rgw client
+        // if the rgw_bucket_dir_entry existed; however, between that
+        // check and now the entry has disappeared, so we were likely
+        // in the midst of a delete op, and we will not recreate the
+        // entry
+        CLS_LOG(10,
+                "CEPH_RGW_UPDATE not applied because rgw_bucket_dir_entry"
+                " no longer exists\n");
+        break;
+      }
+
       CLS_LOG(10, "CEPH_RGW_UPDATE name=%s instance=%s total_entries: %" PRId64 " -> %" PRId64 "\n",
               cur_change.key.name.c_str(), cur_change.key.instance.c_str(), stats.num_entries, stats.num_entries + 1);
       stats.num_entries++;
-- 
2.39.5
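
Note for reviewers: the sketch below is illustrative only and is not
code from this patch. It is a minimal standalone C++ model of the guard
added above: an update suggestion that arrives after the entry has been
deleted is dropped instead of recreating the entry. Every name in it
(DirEntry, Suggestion, apply_suggestion) is hypothetical.

#include <cstdio>
#include <map>
#include <string>

struct DirEntry {
  bool exists;   // stands in for the rgw_bucket_dir_entry state on disk
};

enum class SuggestOp { Update, Remove };

struct Suggestion {
  std::string key;
  SuggestOp op;
};

// index: the bucket index as the OSD sees it when the suggestion lands
void apply_suggestion(std::map<std::string, DirEntry>& index,
                      const Suggestion& s) {
  auto it = index.find(s.key);
  switch (s.op) {
  case SuggestOp::Update:
    // The guard from the patch: the client only sent this update
    // because the entry existed when it looked; if the entry is gone
    // now, a delete raced with us, so do NOT recreate it.
    if (it == index.end() || !it->second.exists) {
      std::printf("update for %s skipped: entry no longer exists\n",
                  s.key.c_str());
      return;
    }
    std::printf("update applied to %s\n", s.key.c_str());
    break;
  case SuggestOp::Remove:
    if (it != index.end())
      index.erase(it);
    break;
  }
}

int main() {
  std::map<std::string, DirEntry> index;
  index["obj1"].exists = true;

  // The race from the commit message, compressed: client #2 decides to
  // send an update while the entry still exists (steps 5-6) ...
  Suggestion late_update{"obj1", SuggestOp::Update};

  // ... but client #1's delete completes first (steps 7-10).
  index.erase("obj1");

  // Step 11: the update arrives last; the guard keeps the stale
  // entry from being recreated.
  apply_suggestion(index, late_update);
  return 0;
}

Compressing the eleven-step race into one thread loses the timing
detail, but the ordering that matters survives: the update decision is
made while the entry exists, and the update lands after the delete has
completed. The guard turns step #11 from "recreate" into "skip".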