git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-ci.git/commit
rgw/dedup: split-head mechanism
author Gabriel BenHanokh <gbenhano@redhat.com>
Mon, 15 Sep 2025 19:01:02 +0000 (19:01 +0000)
committer benhanokh <gbenhano@redhat.com>
Tue, 24 Feb 2026 19:17:38 +0000 (21:17 +0200)
commit 48ba9c00caa3e2a705952259ce858cb4cf3b331b
tree 2478b2b4494fb8cc27f1d0c3add1b9cd780ccd92
parent e6eec7683022e6824acc16f4c26421a3804e27e6
Split the head object into two objects: one that holds the attributes and
no data, and a new tail object that holds only the data.
The new tail object can be deduped (unlike head objects, which cannot be
deduped).
The head is split only for objects of size 16MB or less.
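
The split described above can be illustrated with a minimal Python sketch.
It is a toy model, not the code in rgw_dedup.cc: the RadosObject class, the
split_head and should_split_head helpers, and the tail name are all
hypothetical, and skipping empty objects is an assumption. Only the 16MB
limit comes from this commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# 16MB limit stated in the commit message.
MAX_OBJ_SIZE_FOR_SPLIT = 16 * 1024 * 1024


@dataclass
class RadosObject:
    """Toy stand-in for a RADOS object: a name, attributes, and data."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    data: bytes = b""


def should_split_head(obj_size: int,
                      max_size: int = MAX_OBJ_SIZE_FOR_SPLIT) -> bool:
    """Split only objects of max_size or less.

    Assumption: an empty object has no data to move, so it is not split.
    """
    return 0 < obj_size <= max_size


def split_head(head: RadosObject,
               tail_name: str) -> Tuple[RadosObject, RadosObject]:
    """Return (attribute-only head, data-only tail) built from `head`."""
    new_head = RadosObject(name=head.name, attrs=dict(head.attrs), data=b"")
    new_tail = RadosObject(name=tail_name, attrs={}, data=head.data)
    return new_head, new_tail
```

In this model the new head keeps every attribute and carries no data, while
the new tail carries the data and no attributes, which is what makes the
tail a candidate for dedup.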

A few extra improvements:
Skip objects created by server-side copy
Use the reftag for compare-and-swap instead of the manifest
Skip shared-manifest objects after reading attributes
Made max_obj_size_for_split and min_obj_size_for_dedup config values in
rgw.yaml.in
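
The skip rules and the compare-and-swap guard listed above can be sketched
as follows. This is one plausible reading, not the actual implementation:
ObjMeta, its field names, is_dedup_candidate, and
update_if_ref_tag_unchanged are hypothetical, and treating
min_obj_size_for_dedup as an inclusive lower bound is an assumption. No
default is given for that threshold because the commit message does not
state one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjMeta:
    """Hypothetical per-object metadata; field names are illustrative."""
    size: int
    ref_tag: str
    created_by_server_side_copy: bool = False
    has_shared_manifest: bool = False


def is_dedup_candidate(meta: ObjMeta, min_obj_size_for_dedup: int) -> bool:
    """Apply the skip rules from the list above to one object."""
    if meta.created_by_server_side_copy:
        return False  # skip objects created by server-side copy
    if meta.has_shared_manifest:
        return False  # skip objects that already share a manifest
    # Assumption: the threshold is an inclusive lower bound.
    return meta.size >= min_obj_size_for_dedup


def update_if_ref_tag_unchanged(attrs: Dict[str, str],
                                expected_ref_tag: str,
                                new_attrs: Dict[str, str]) -> bool:
    """Compare-and-swap keyed on the ref tag rather than the manifest.

    The update is applied only if the stored ref tag still matches the
    value seen when the object was scanned; otherwise nothing changes.
    """
    if attrs.get("ref_tag") != expected_ref_tag:
        return False
    attrs.update(new_attrs)
    return True
```

Comparing a short tag instead of a whole manifest keeps the guard cheap
while still detecting that the object changed between the scan and the
update.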

Refined test: validate size after dedup
TBD: add rados ls -l to report object sizes in bulk to speed up the process
Improved tests: verify that refcounts are working, validate objects, remove
duplicates, and then verify that the last remaining object was not
deleted
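
The refcount check described above can be modeled with a small in-memory
sketch. It does not reproduce test_dedup.py and makes no RADOS or S3 calls;
ToyDedupStore and its methods are hypothetical names used only to show the
invariant being tested: a shared tail must survive until its last
reference is removed.

```python
class ToyDedupStore:
    """Toy model in which several object names share one refcounted tail."""

    def __init__(self):
        self.tails = {}  # tail id -> [data, refcount]
        self.heads = {}  # object name -> tail id

    def put_deduped(self, name, tail_id, data):
        """Point `name` at the shared tail, creating the tail if needed."""
        entry = self.tails.setdefault(tail_id, [data, 0])
        entry[1] += 1
        self.heads[name] = tail_id

    def delete(self, name):
        """Drop one reference; free the tail only when none remain."""
        tail_id = self.heads.pop(name)
        entry = self.tails[tail_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.tails[tail_id]

    def read(self, name):
        """Return the data reachable through the object's head."""
        return self.tails[self.heads[name]][0]

    def refcount(self, tail_id):
        """Return the number of heads that still point at the tail."""
        return self.tails[tail_id][1]


store = ToyDedupStore()
for obj_name in ("copy-1", "copy-2", "copy-3"):
    store.put_deduped(obj_name, "tail-A", b"same bytes")

# Remove the duplicates, then check that the last object is still readable.
store.delete("copy-1")
store.delete("copy-2")
assert store.read("copy-3") == b"same bytes"
assert store.refcount("tail-A") == 1
```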

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
14 files changed:
doc/radosgw/s3_objects_dedup.rst
src/common/options/rgw.yaml.in
src/rgw/driver/rados/rgw_dedup.cc
src/rgw/driver/rados/rgw_dedup.h
src/rgw/driver/rados/rgw_dedup_cluster.cc
src/rgw/driver/rados/rgw_dedup_store.cc
src/rgw/driver/rados/rgw_dedup_store.h
src/rgw/driver/rados/rgw_dedup_table.cc
src/rgw/driver/rados/rgw_dedup_table.h
src/rgw/driver/rados/rgw_dedup_utils.cc
src/rgw/driver/rados/rgw_dedup_utils.h
src/rgw/driver/rados/rgw_obj_manifest.h
src/rgw/rgw_common.h
src/test/rgw/dedup/test_dedup.py