We have a race condition:
1. RGW client #1: requests an object be deleted.
2. RGW client #1: sends a prepare op to bucket index OSD #1.
3. OSD #1: prepares the op, adding pending ops to the bucket dir entry
4. RGW client #2: sends a list bucket to OSD #1
5. RGW client #2: sees that there are pending operations on bucket
dir entry, and calls check_disk_state
6. RGW client #2: check_disk_state sees that the object still exists, so it
sends CEPH_RGW_UPDATE to bucket index OSD (#1)
7. RGW client #1: sends a delete object to object OSD (#2)
8. OSD #2: deletes the object
9. RGW client #1: sends a complete op to bucket index OSD (#1)
10. OSD #1: completes the op
11. OSD #1: receives the CEPH_RGW_UPDATE and updates the bucket index
entry, thereby **RECREATING** it
Solution implemented:
At step #5 the object's dir entry exists. If we get to the beginning
of step #11 and the object's dir entry no longer exists, we know that
the dir entry was being actively modified in the meantime, so we ignore
the CEPH_RGW_UPDATE operation, thereby NOT recreating the entry.
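The interleaving above can be replayed in a small stand-alone model. This is
only a sketch: DirEntry, BucketIndex, apply_update and replay_race are
hypothetical names invented for illustration and are not the cls_rgw types or
functions. It assumes the complete op for a delete removes the dir entry, and
that the guard is the "entry no longer exists" check described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a bucket dir entry; not the real rgw_bucket_dir_entry.
struct DirEntry {
    bool exists = false;
    int pending_ops = 0;
};

using BucketIndex = std::map<std::string, DirEntry>;

// Models handling of a CEPH_RGW_UPDATE-style request on the index OSD.
// Returns true if the update was applied, false if it was ignored.
bool apply_update(BucketIndex& index, const std::string& key, bool guard) {
    auto it = index.find(key);
    const bool cur_exists = (it != index.end() && it->second.exists);
    if (guard && !cur_exists) {
        // The entry vanished between the client's check and now,
        // so a delete most likely completed; do not recreate it.
        return false;
    }
    index[key].exists = true;  // without the guard this recreates the entry
    return true;
}

// Replays steps 2-11 and reports whether the entry exists afterwards.
bool replay_race(bool guard) {
    BucketIndex index;
    index["obj"].exists = true;      // the entry exists before the delete starts
    index["obj"].pending_ops = 1;    // steps 2-3: prepare op is recorded
    // Steps 4-6: client #2 lists the bucket, sees the pending op, finds the
    // object still on disk, and sends an update that arrives late.
    // Steps 7-8: the object itself is deleted on the object OSD.
    index.erase("obj");              // steps 9-10: complete op removes the entry
    assert(index.count("obj") == 0);
    apply_update(index, "obj", guard);  // step 11: the late update arrives
    auto it = index.find("obj");
    return it != index.end() && it->second.exists;
}
```

Calling replay_race(false) models the old behavior, where the late update
brings the entry back; replay_race(true) models the guarded behavior, where
the update is dropped.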
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit
b33f529e79b74314a2030231e1308ee225717743)
Conflicts: (backported substantial changes only; omitted cleanups)
src/cls/rgw/cls_rgw.cc
src/rgw/rgw_rados.cc
}
break;
case CEPH_RGW_UPDATE:
+ if (!cur_disk.exists) {
+ // this update would only have been sent by the rgw client
+ // if the rgw_bucket_dir_entry existed, however between that
+ // check and now the entry has disappeared, so we were likely
+ // in the midst of a delete op, and we will not recreate the
+ // entry
+ CLS_LOG(10,
+ "CEPH_RGW_UPDATE not applied because rgw_bucket_dir_entry"
+ " no longer exists\n");
+ break;
+ }
+
CLS_LOG(10, "CEPH_RGW_UPDATE name=%s instance=%s total_entries: %" PRId64 " -> %" PRId64 "\n",
cur_change.key.name.c_str(), cur_change.key.instance.c_str(), stats.num_entries, stats.num_entries + 1);
stats.num_entries++;