From bf156a08a62fa376a263929baa8a7a14601dea29 Mon Sep 17 00:00:00 2001
From: Samuel Just
Date: Thu, 28 May 2020 16:27:56 -0700
Subject: [PATCH] doc/dev/osd_internals/manifest.rst: add information about clone snap refcounting

Signed-off-by: Samuel Just
---
 doc/dev/osd_internals/manifest.rst | 70 ++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/doc/dev/osd_internals/manifest.rst b/doc/dev/osd_internals/manifest.rst
index f41193b999fe9..0b8d317291140 100644
--- a/doc/dev/osd_internals/manifest.rst
+++ b/doc/dev/osd_internals/manifest.rst
@@ -218,6 +218,76 @@ we may want to exploit.
 The dedup-tool needs to be updated to use LIST_SNAPS to discover
 clones as part of leak detection.
 
+An important question is how we deal with the fact that many clones
+will frequently have references to the same backing chunks at the same
+offset. In particular, make_writeable will generally create a clone
+that shares the same object_manifest_t references with the exception
+of any extents modified in that transaction. The metadata that
+commits as part of that transaction must therefore map onto the same
+refcount as before because otherwise we'd have to first increment
+refcounts on backing objects (or risk a reference to a dead object).
+Thus, we introduce a simple convention: consecutive clones which
+share a reference at the same offset share the same refcount. This
+means that a write that invokes make_writeable may decrease refcounts,
+but not increase them. This has some consequences for removing clones.
+Consider the following sequence ::
+
+  write foo [0, 1024)
+  flush foo ->
+    head: [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=1, refcount(bbb)=1
+  snapshot 10
+  write foo [0, 512) ->
+    head: [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=1, refcount(bbb)=1
+  flush foo ->
+    head: [0, 512) ccc, [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=1, refcount(bbb)=1, refcount(ccc)=1
+  snapshot 20
+  write foo [0, 512) (same contents as the original write)
+    head: [512, 1024) bbb
+    20  : [0, 512) ccc, [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=?, refcount(bbb)=1
+  flush foo
+    head: [0, 512) aaa, [512, 1024) bbb
+    20  : [0, 512) ccc, [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=?, refcount(bbb)=1, refcount(ccc)=1
+
+What should the refcount for aaa be at the end? By our
+above rule, it should be two since the two aaa refs are not
+contiguous. However, consider removing clone 20 ::
+
+  initial:
+    head: [0, 512) aaa, [512, 1024) bbb
+    20  : [0, 512) ccc, [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=2, refcount(bbb)=1, refcount(ccc)=1
+  trim 20
+    head: [0, 512) aaa, [512, 1024) bbb
+    10  : [0, 512) aaa, [512, 1024) bbb
+    refcount(aaa)=?, refcount(bbb)=1, refcount(ccc)=0
+
+At this point, our rule dictates that refcount(aaa) is 1.
+This means that removing 20 needs to check for refs held by
+the clones on either side, which will then match.
+
+See osd_types.h:object_manifest_t::calc_refs_to_drop_on_removal
+for the logic implementing this rule.
+
+This seems complicated, but it gets us two valuable properties:
+
+1) The refcount change from make_writeable will not block on
+   incrementing a ref.
+2) We don't need to load the object_manifest_t for every clone
+   to determine how to handle removing one -- just the ones
+   immediately preceding and succeeding it.
+
+All clone operations will need to consider adjacent chunk_maps
+when adding or removing references.
 
 Cache/Tiering
 -------------
-- 
2.39.5
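
The clone refcount convention described in the patch can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the logic in object_manifest_t::calc_refs_to_drop_on_removal: each manifest is reduced to a dict from offset to chunk name, and chain_refcounts and refs_to_drop_on_removal are hypothetical helper names introduced only for illustration.

```python
from collections import Counter

# Assumption: a manifest is modelled as {offset: chunk_name}. Real
# object_manifest_t chunk maps carry more state (lengths, flags).


def chain_refcounts(chain):
    """Refcounts implied by the convention for clones ordered oldest first
    (head last): each run of consecutive clones holding the same chunk at
    the same offset counts as a single reference."""
    counts = Counter()
    for off in {o for manifest in chain for o in manifest}:
        prev_chunk = None
        for manifest in chain:
            chunk = manifest.get(off)
            # A new run, and so a new reference, starts when the chunk changes.
            if chunk is not None and chunk != prev_chunk:
                counts[chunk] += 1
            prev_chunk = chunk
    return counts


def refs_to_drop_on_removal(older, removed, newer):
    """Refs to drop when `removed` is trimmed, looking only at its two
    neighbours. Either neighbour may be None if it does not exist."""
    older = older or {}
    newer = newer or {}
    drops = Counter()
    for off in set(older) | set(removed) | set(newer):
        mine, lo, hi = removed.get(off), older.get(off), newer.get(off)
        # Neither neighbour shares our chunk, so we hold its ref alone.
        if mine is not None and mine != lo and mine != hi:
            drops[mine] += 1
        # The neighbours hold the same chunk and we separated them: once we
        # are gone they become consecutive and must share one ref.
        if lo is not None and lo == hi and lo != mine:
            drops[lo] += 1
    return drops


# The "trim 20" example from the patch text.
clone10 = {0: "aaa", 512: "bbb"}
clone20 = {0: "ccc", 512: "bbb"}
head = {0: "aaa", 512: "bbb"}

before = chain_refcounts([clone10, clone20, head])       # aaa=2, bbb=1, ccc=1
drops = refs_to_drop_on_removal(clone10, clone20, head)  # aaa=1, ccc=1
after = before - drops                                   # aaa=1, bbb=1
assert after == chain_refcounts([clone10, head])
```

Replaying the trim of clone 20 drops one ref on ccc and one on aaa, which leaves refcount(aaa)=1 as the text states. Only the manifests on either side of the removed clone are consulted, which is the second property the patch lists.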