git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit

author	JinyongHa <jyha200@gmail.com>
	Fri, 25 Feb 2022 06:48:36 +0000 (06:48 +0000)
committer	JinyongHa <jy200.ha@samsung.com>
	Fri, 26 Aug 2022 00:50:15 +0000 (00:50 +0000)
commit	3a5876be6e3121b11f8167aa59da75ee9b0efa7b
tree	54de34e9db08fe9cffb8327792e037f542d98fa9
parent	cc63d6ee7336adc5df10ba4c1ec69d0caf7fcdaa

dedup-tool: add basic crawling

Create crawling threads that crawl objects in the base pool and deduplicate
them based on their deduplication efficiency. The crawler samples objects and
finds duplicated chunks within the samples. An object whose duplicated chunks
exceed the object_dedup_threshold value is regarded as an efficient object to
deduplicate. In addition, any chunk that is duplicated more than
chunk_dedup_threshold times is also deduplicated.
This commit contains only basic crawling, which crawls all objects in the base
pool instead of sampling among them.
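
The selection rule described above can be illustrated with a minimal Python
sketch. It is not the tool's implementation: the function names, the
fixed-size chunking, the use of SHA-1, and the reading of
object_dedup_threshold as a percentage of an object's chunks that are
duplicated are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def fingerprint_chunks(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each one (assumed scheme)."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def select_for_dedup(objects: dict[str, bytes], chunk_size: int,
                     object_dedup_threshold: float,
                     chunk_dedup_threshold: int) -> tuple[set[str], set[str]]:
    """Return (object names to deduplicate, chunk fingerprints to deduplicate).

    Hypothetical helper: an object qualifies when the percentage of its
    chunks that also occur elsewhere exceeds object_dedup_threshold; a chunk
    qualifies when it occurs more than chunk_dedup_threshold times overall.
    """
    per_object = {
        name: fingerprint_chunks(data, chunk_size)
        for name, data in objects.items()
    }
    # Count every fingerprint across all crawled objects.
    counts = Counter(fp for fps in per_object.values() for fp in fps)

    dedup_objects = set()
    for name, fps in per_object.items():
        if not fps:
            continue  # empty object, nothing to deduplicate
        duplicated = sum(1 for fp in fps if counts[fp] > 1)
        if 100 * duplicated / len(fps) > object_dedup_threshold:
            dedup_objects.add(name)

    dedup_chunks = {fp for fp, n in counts.items() if n > chunk_dedup_threshold}
    return dedup_objects, dedup_chunks
```

Under these assumptions, an object below the object threshold can still have
individual chunks deduplicated when those chunks are frequent enough across
the pool.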

[usage]
  ceph_dedup_tool --op sample-dedup --pool POOL --chunk-pool POOL \
    --chunk-algorithm ALGO --fingerprint-algorithm FP \
    --object-dedup-threshold <percentile> --chunk-dedup-threshold <number>

Signed-off-by: JinyongHa <jy200.ha@samsung.com>