From cbc59d83701fe9f2da87dbcfec074c036b6ea3c9 Mon Sep 17 00:00:00 2001
From: Casey Bodley <cbodley@redhat.com>
Date: Thu, 12 Mar 2020 15:57:07 -0400
Subject: [PATCH] doc/rgw: update multisite reshard design for
 cross-datalog-shard coordination

Signed-off-by: Casey Bodley <cbodley@redhat.com>
---
 src/doc/rgw/multisite-reshard.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/doc/rgw/multisite-reshard.md b/src/doc/rgw/multisite-reshard.md
index 1bf2aeaab78..b2111deb593 100644
--- a/src/doc/rgw/multisite-reshard.md
+++ b/src/doc/rgw/multisite-reshard.md
@@ -57,8 +57,12 @@ The distinction between *index layout* and *log layout* is important, because in
 
 ### Bucket Sync Status
 
-* Move bucket sync status to a per-bucket object, rather than having separate status per-shard.
-    - Track current log generation and an array of markers for each shard in that generation.
+* Add a per-bucket sync status object that tracks:
+    - full sync progress,
+    - the current generation of incremental sync, and
+    - the set of shards that have completed incremental sync of that generation.
+* Existing per-bucket-shard sync status objects continue to track incremental sync.
+    - Their object names should include the generation number, except for generation 0.
 * For backward compatibility, add special handling when we get ENOENT trying to read this per-bucket sync status:
     - If the remote's oldest log layout has generation=0, read any existing per-shard sync status objects. If any are found, resume incremental sync from there.
     - Otherwise, initialize for full sync.
@@ -66,14 +70,25 @@ The distinction between *index layout* and *log layout* is important, because in
 ### Bucket Sync
 
 * Full sync uses a single bucket-wide listing to fetch all objects.
-* Incremental sync spawns a coroutine for each log shard mentioned in the datalog.
+    - Use a cls_lock to prevent different shards from duplicating this work.
 * When incremental sync gets to the end of a log shard (i.e. listing the log returns truncated=false):
-    - If we've seen a higher generation number in the datalog, flag that shard as 'resharded' in the bucket sync status.
+    - If the remote has a newer log generation, flag that shard as 'resharded' in the bucket sync status.
     - Once all shards in the current generation reach that 'resharded' state, incremental bucket sync can advance to the next generation.
+    - Use cls_version on the bucket sync status object to detect racing writes from other shards.
 
 ### Log Trimming
 
 * Use generation number from sync status to trim the right logs
 * Once all shards of a log generation are trimmed:
     - Remove their rados objects.
+    - Remove the associated incremental sync status objects.
     - Remove the log generation from its bucket instance metadata.
+
+### Admin APIs
+
+* RGWOp_BILog_List response should include the bucket's highest log generation
+    - Allows incremental sync to determine whether truncated=false means that it's caught up, or that it needs to transition to the next generation.
+* RGWOp_BILog_Info response should include the bucket's lowest and highest log generations
+    - Allows bucket sync status initialization to decide whether it needs to scan for existing shard status, and where it should resume incremental sync after full sync completes.
+* RGWOp_BILog_Status response should include per-bucket status information
+    - Allows log trimming to tell which old generations have been fully synced and can be trimmed.
-- 
2.39.5