From cbc59d83701fe9f2da87dbcfec074c036b6ea3c9 Mon Sep 17 00:00:00 2001 From: Casey Bodley Date: Thu, 12 Mar 2020 15:57:07 -0400 Subject: [PATCH] doc/rgw: update multisite reshard design for cross-datalog-shard coordination Signed-off-by: Casey Bodley --- src/doc/rgw/multisite-reshard.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/src/doc/rgw/multisite-reshard.md b/src/doc/rgw/multisite-reshard.md index 1bf2aeaab78..b2111deb593 100644 --- a/src/doc/rgw/multisite-reshard.md +++ b/src/doc/rgw/multisite-reshard.md @@ -57,8 +57,12 @@ The distinction between *index layout* and *log layout* is important, because in ### Bucket Sync Status -* Move bucket sync status to a per-bucket object, rather than having separate status per-shard. - - Track current log generation and an array of markers for each shard in that generation. +* Add a per-bucket sync status object that tracks: + - full sync progress, + - the current generation of incremental sync, and + - the set of shards that have completed incremental sync of that generation +* Existing per-bucket-shard sync status objects continue to track incremental sync. + - their object names should include the generation number, except for generation 0 * For backward compatibility, add special handling when we get ENOENT trying to read this per-bucket sync status: - If the remote's oldest log layout has generation=0, read any existing per-shard sync status objects. If any are found, resume incremental sync from there. - Otherwise, initialize for full sync. @@ -66,14 +70,25 @@ The distinction between *index layout* and *log layout* is important, because in ### Bucket Sync * Full sync uses a single bucket-wide listing to fetch all objects. -* Incremental sync spawns a coroutine for each log shard mentioned in the datalog. + - Use a cls_lock to prevent different shards from duplicating this work. * When incremental sync gets to the end of a log shard (i.e. listing the log returns truncated=false): - - If we've seen a higher generation number in the datalog, flag that shard as 'resharded' in the bucket sync status. + - If the remote has a newer log generation, flag that shard as 'resharded' in the bucket sync status. - Once all shards in the current generation reach that 'resharded' state, incremental bucket sync can advance to the next generation. + - Use cls_version on the bucket sync status object to detect racing writes from other shards. ### Log Trimming * Use generation number from sync status to trim the right logs * Once all shards of a log generation are trimmed: - Remove their rados objects. + - Remove the associated incremental sync status objects. - Remove the log generation from its bucket instance metadata. + +### Admin APIs + +* RGWOp_BILog_List response should include the bucket's highest log generation + - Allows incremental sync to determine whether truncated=false means that it's caught up, or that it needs to transition to the next generation. +* RGWOp_BILog_Info response should include the bucket's lowest and highest log generations + - Allows bucket sync status initialization to decide whether it needs to scan for existing shard status, and where it should resume incremental sync after full sync completes. +* RGWOp_BILog_Status response should include per-bucket status information + - For log trimming of old generations -- 2.39.5