git.apps.os.sepia.ceph.com Git - ceph-ci.git/log
3 years agomsg: tidied up formatting for new poll driver
Rafael Lopez [Thu, 4 Aug 2022 07:14:44 +0000 (07:14 +0000)]
msg: tidied up formatting for new poll driver

Signed-off-by: Rafael Lopez <rafael.lopez@softiron.com>
3 years agomsg: add new async event driver based on poll()
Raf Lopez [Fri, 3 Jun 2022 04:28:16 +0000 (04:28 +0000)]
msg: add new async event driver based on poll()

This driver replaces select() where useful. Currently that means
Windows clients, where select() is the only available driver and
is limited by the FD_SETSIZE hard limit of 64 descriptors. The
driver uses poll() or WSAPoll() and maintains pollfd structures
to overcome the select() limitations.

Fixes: https://tracker.ceph.com/issues/55840
Signed-off-by: Rafael Lopez <rafael.lopez@softiron.com>
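The poll()-based approach described above can be sketched in Python. This is only an illustration of registering descriptors with poll() instead of filling a fixed-size fd_set; it is not the Ceph driver. `wait_readable` is a hypothetical helper, and Python's `select.poll` wraps POSIX poll() and is not available on Windows, so the sketch assumes a POSIX host.

```python
import select
import socket


def wait_readable(socks, timeout_ms=1000):
    """Return the sockets that have data to read, using poll()."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        # Each registration corresponds to one pollfd entry; unlike an
        # fd_set there is no FD_SETSIZE cap on how many can be registered.
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    # poll() returns (fd, event_mask) pairs for descriptors with events.
    return [by_fd[fd] for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]


reader, writer = socket.socketpair()
writer.send(b"ping")
ready = wait_readable([reader])
```

Here `ready` contains `reader`, because the peer has already written to the pair.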
3 years agoMerge pull request #46411 from pcuzner/add-serial-numbers
Adam King [Thu, 2 Jun 2022 23:01:01 +0000 (19:01 -0400)]
Merge pull request #46411 from pcuzner/add-serial-numbers

cephadm: Add server serial info to gather-facts

Reviewed-by: Adam King <adking@redhat.com>
3 years agoMerge pull request #46444 from rkachach/fix_issue_55800
Adam King [Thu, 2 Jun 2022 23:00:17 +0000 (19:00 -0400)]
Merge pull request #46444 from rkachach/fix_issue_55800

mgr/cephadm: check if a service exists before trying to restart it

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
3 years agoMerge pull request #46445 from rkachach/fix_issue_55801
Adam King [Thu, 2 Jun 2022 22:59:02 +0000 (18:59 -0400)]
Merge pull request #46445 from rkachach/fix_issue_55801

mgr/cephadm: capture exception when not able to list upgrade tags

Reviewed-by: Adam King <adking@redhat.com>
3 years agoMerge pull request #46481 from guits/cephadm-custom-names-osd-adoption
Adam King [Thu, 2 Jun 2022 22:58:24 +0000 (18:58 -0400)]
Merge pull request #46481 from guits/cephadm-custom-names-osd-adoption

cephadm: fix osd adoption with custom cluster name

Reviewed-by: Adam King <adking@redhat.com>
3 years agoMerge pull request #39002 from ceph/wip-rgw-multisite-reshard
Casey Bodley [Thu, 2 Jun 2022 20:04:30 +0000 (16:04 -0400)]
Merge pull request #39002 from ceph/wip-rgw-multisite-reshard

rgw multisite: bucket reshard work in progress

Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
3 years agoMerge pull request #45470 from ceph/wip-setx
Ernesto Puerta [Thu, 2 Jun 2022 17:22:59 +0000 (19:22 +0200)]
Merge pull request #45470 from ceph/wip-setx

run-backend-api-tests.sh: set -x for Jenkins job debugging

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 years agoMerge pull request #44684 from zenomri/wip-omri-tracing-compiled
Yuval Lifshitz [Thu, 2 Jun 2022 17:22:17 +0000 (20:22 +0300)]
Merge pull request #44684 from zenomri/wip-omri-tracing-compiled

tracer: set tracing compiled in by default

3 years agoMerge pull request #43216 from k0ste/fix_47537
Anthony D'Atri [Thu, 2 Jun 2022 15:41:25 +0000 (08:41 -0700)]
Merge pull request #43216 from k0ste/fix_47537

doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces

3 years agoMerge pull request #46019 from yushu20171007/fix_issue_55422
Casey Bodley [Thu, 2 Jun 2022 14:47:22 +0000 (10:47 -0400)]
Merge pull request #46019 from yushu20171007/fix_issue_55422

common: notify all when max backlog reached in OutputDataSocket

Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
3 years agoMerge pull request #46501 from rhcs-dashboard/fix-55826-master
Ernesto Puerta [Thu, 2 Jun 2022 12:13:26 +0000 (14:13 +0200)]
Merge pull request #46501 from rhcs-dashboard/fix-55826-master

qa: fix teuthology master branch ref

Reviewed-by: amathuria <NOT@FOUND>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
3 years agoqa: fix teuthology master branch ref
Ernesto Puerta [Thu, 2 Jun 2022 10:27:02 +0000 (12:27 +0200)]
qa: fix teuthology master branch ref

Fixes: https://tracker.ceph.com/issues/55826
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
3 years agodoc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces
Konstantin Shalygin [Sat, 18 Sep 2021 10:22:14 +0000 (17:22 +0700)]
doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces

Fixes: https://tracker.ceph.com/issues/47537
Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
3 years agoMerge pull request #46272 from sshambar/bug-55664
Adam King [Wed, 1 Jun 2022 21:44:54 +0000 (17:44 -0400)]
Merge pull request #46272 from sshambar/bug-55664

cephadm: preserve cephadm user during RPM upgrade

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
3 years agoMerge pull request #46488 from jtlayton/teuth-branch-fix
David Galloway [Wed, 1 Jun 2022 19:17:03 +0000 (15:17 -0400)]
Merge pull request #46488 from jtlayton/teuth-branch-fix

qa: remove .teuthology_branch file

3 years agoqa: remove .teuthology_branch file
Jeff Layton [Wed, 1 Jun 2022 18:26:33 +0000 (14:26 -0400)]
qa: remove .teuthology_branch file

This was originally added to help support the py2 -> py3 conversion.
That's long since complete so we should be able to just remove this file
now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
3 years agoMerge pull request #46487 from jtlayton/teuth-branch-fix
David Galloway [Wed, 1 Jun 2022 18:18:00 +0000 (14:18 -0400)]
Merge pull request #46487 from jtlayton/teuth-branch-fix

qa: fix .teuthology_branch file in qa/

3 years agotest/rgw/multisite: enable zonegroup resharding feature
Casey Bodley [Wed, 1 Jun 2022 18:10:24 +0000 (14:10 -0400)]
test/rgw/multisite: enable zonegroup resharding feature

qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their json format. this doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
json representations

Signed-off-by: Casey Bodley <cbodley@redhat.com>
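As a rough sketch of the change described above, the snippet below adds an 'enabled_features' field to a zonegroup's JSON representation before it would be handed to 'zonegroup set'. The helper name, the minimal zonegroup dict, and the feature name "resharding" are assumptions for illustration; the real qa/tasks/rgw_multisite.py code may differ.

```python
import json


def with_enabled_features(zonegroup, features):
    """Return a copy of the zonegroup dict with the given features enabled."""
    updated = dict(zonegroup)
    existing = set(zonegroup.get("enabled_features", []))
    # Merge rather than overwrite, so already-enabled features survive.
    updated["enabled_features"] = sorted(existing | set(features))
    return updated


# Hypothetical minimal zonegroup; real representations carry more fields.
zonegroup = {"name": "a", "is_master": True}
payload = json.dumps(with_enabled_features(zonegroup, ["resharding"]))
```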
3 years agoqa: fix .teuthology_branch file in qa/
Jeff Layton [Wed, 1 Jun 2022 17:57:29 +0000 (13:57 -0400)]
qa: fix .teuthology_branch file in qa/

According to teuthology-suite:

  -t <branch>, --teuthology-branch <branch>
                              The teuthology branch to run against.
                              Default value is determined in the next order.
                              There is TEUTH_BRANCH environment variable set.
                              There is `qa/.teuthology_branch` present in
                              the suite repo and contains non-empty string.
                              There is `teuthology_branch` present in one of
                              the user or system `teuthology.yaml` configuration
                              files respectively, otherwise use `main`.

The .teuthology_branch file in the qa/ dir currently points at "master".
Change it to point to "main".

Signed-off-by: Jeff Layton <jlayton@redhat.com>
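The lookup order quoted from the teuthology-suite help text can be sketched as a small function. This is not teuthology's actual code: `resolve_teuthology_branch` and its parameters are hypothetical, and the file contents and configuration are passed in as plain values to keep the sketch self-contained.

```python
def resolve_teuthology_branch(env, suite_branch_file, config):
    """Pick the teuthology branch using the documented precedence."""
    # 1. TEUTH_BRANCH environment variable.
    branch = env.get("TEUTH_BRANCH")
    if branch:
        return branch
    # 2. qa/.teuthology_branch in the suite repo, if non-empty.
    if suite_branch_file is not None and suite_branch_file.strip():
        return suite_branch_file.strip()
    # 3. teuthology_branch from a user or system teuthology.yaml.
    branch = config.get("teuthology_branch")
    if branch:
        return branch
    # 4. Fall back to main.
    return "main"
```

With no environment variable set, a qa/.teuthology_branch file containing "master" wins over the "main" fallback, which is why the stale file mattered.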
3 years agocephadm: fix osd adoption with custom cluster name
Guillaume Abrioux [Wed, 1 Jun 2022 11:24:50 +0000 (13:24 +0200)]
cephadm: fix osd adoption with custom cluster name

When adopting Ceph OSD containers from a Ceph cluster with a custom name,
adoption fails because the name isn't propagated in unit.run.
The idea here is to change the lvm metadata and enforce 'ceph.cluster_name=ceph'
given that cephadm doesn't support custom names anyway.

Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
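The idea of enforcing 'ceph.cluster_name=ceph' in the LVM metadata can be illustrated on a comma-separated tag string. This is a sketch under assumptions: `enforce_cluster_name` is a hypothetical helper, the tag-string layout is assumed, and the actual fix presumably updates the tags through LVM tooling rather than by string manipulation.

```python
def enforce_cluster_name(lv_tags, cluster_name="ceph"):
    """Replace any ceph.cluster_name tag with the enforced cluster name."""
    prefix = "ceph.cluster_name="
    # Drop empty fragments and any existing cluster-name tag.
    tags = [t for t in lv_tags.split(",") if t and not t.startswith(prefix)]
    tags.append(prefix + cluster_name)
    return ",".join(tags)


fixed = enforce_cluster_name("ceph.cluster_name=mycluster,ceph.osd_id=3")
```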
3 years agoMerge pull request #46474 from idryomov/wip-rbd-codeowners
Ilya Dryomov [Wed, 1 Jun 2022 16:29:38 +0000 (18:29 +0200)]
Merge pull request #46474 from idryomov/wip-rbd-codeowners

CODEOWNERS: add RBD team

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years agoCODEOWNERS: add RBD team
Ilya Dryomov [Wed, 1 Jun 2022 07:22:15 +0000 (09:22 +0200)]
CODEOWNERS: add RBD team

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years agomgr/cephadm: capture exception when not able to list upgrade tags
Redouane Kachach [Tue, 31 May 2022 10:59:26 +0000 (12:59 +0200)]
mgr/cephadm: capture exception when not able to list upgrade tags
Fixes: https://tracker.ceph.com/issues/55801
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
3 years agoMerge pull request #46382 from rzarzynski/wip-crimson-op-tracking-3
Samuel Just [Tue, 31 May 2022 23:48:52 +0000 (16:48 -0700)]
Merge pull request #46382 from rzarzynski/wip-crimson-op-tracking-3

crimson/osd: add support for historic & slow op tracking

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years agoMerge pull request #46437 from cyx1231st/wip-seastore-tune-and-fixes
Samuel Just [Tue, 31 May 2022 23:37:11 +0000 (16:37 -0700)]
Merge pull request #46437 from cyx1231st/wip-seastore-tune-and-fixes

crimson/os/seastore/segment_cleaner: tune and fixes around reclaiming

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years agoMerge pull request #46193 from ljflores/wip-zero-detection-off-by-default
Laura Flores [Tue, 31 May 2022 21:55:51 +0000 (16:55 -0500)]
Merge pull request #46193 from ljflores/wip-zero-detection-off-by-default

os/bluestore: turn bluestore zero block detection off by default

3 years agorgw: restore check for empty olh name on reshard
Casey Bodley [Tue, 31 May 2022 21:29:37 +0000 (17:29 -0400)]
rgw: restore check for empty olh name on reshard

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agotest/rgw: fix test case for empty-OLH-name cleanup
Casey Bodley [Tue, 31 May 2022 21:29:18 +0000 (17:29 -0400)]
test/rgw: fix test case for empty-OLH-name cleanup

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agoMerge pull request #46367 from 0xavi0/dbstore-default-dbdir-rgw-data
Soumya Koduri [Tue, 31 May 2022 15:57:28 +0000 (21:27 +0530)]
Merge pull request #46367 from 0xavi0/dbstore-default-dbdir-rgw-data

rgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
3 years agoMerge pull request #46395 from cbodley/wip-backport-create-issue-assigned-to
Casey Bodley [Tue, 31 May 2022 15:17:12 +0000 (11:17 -0400)]
Merge pull request #46395 from cbodley/wip-backport-create-issue-assigned-to

backport-create-issue: copy 'Assignee' of original issue to backports

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years agoMerge pull request #46415 from neha-ojha/wip-cw-core
Neha Ojha [Tue, 31 May 2022 14:16:18 +0000 (07:16 -0700)]
Merge pull request #46415 from neha-ojha/wip-cw-core

.github/CODEOWNERS: tag core devs on core PRs

Reviewed-by: Laura Flores <lflores@redhat.com>
3 years agomgr/cephadm: check if a service exists before trying to restart it
Redouane Kachach [Tue, 31 May 2022 10:11:03 +0000 (12:11 +0200)]
mgr/cephadm: check if a service exists before trying to restart it
Fixes: https://tracker.ceph.com/issues/55800
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
3 years agoMerge pull request #46430 from zdover23/wip-doc-2022-05-30-hw-recs-memory-section
zdover23 [Tue, 31 May 2022 07:07:40 +0000 (17:07 +1000)]
Merge pull request #46430 from zdover23/wip-doc-2022-05-30-hw-recs-memory-section

doc/start: update "memory" in hardware-recs.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 years agorgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw
0xavi0 [Mon, 23 May 2022 12:01:25 +0000 (14:01 +0200)]
rgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw

Changes a few NULL to nullptr.

Uses std::filesystem for path building so that paths are platform independent.

Fixes a bug for DBStoreManager's second constructor not creating the DB.

Adds unit tests to test DB path and prefix.

Fixes: https://tracker.ceph.com/issues/55731
Signed-off-by: 0xavi0 <xavi.garcia@suse.com>
3 years agodoc/start: update "memory" in hardware-recs.rst
Zac Dover [Mon, 30 May 2022 13:32:06 +0000 (23:32 +1000)]
doc/start: update "memory" in hardware-recs.rst

This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some opened but never closed parentheses.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
3 years agoMerge pull request #46086 from nmshelke/feature-55401
Venky Shankar [Tue, 31 May 2022 05:00:56 +0000 (10:30 +0530)]
Merge pull request #46086 from nmshelke/feature-55401

mgr/volumes: set, get, list and remove metadata of snapshot

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years agocrimson/os/seastore/segment_cleaner: add info logs to reveal trim activities
Yingxin Cheng [Mon, 30 May 2022 12:30:07 +0000 (20:30 +0800)]
crimson/os/seastore/segment_cleaner: add info logs to reveal trim activities

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/transaction_manager: set to test mode under debug build
Yingxin Cheng [Mon, 30 May 2022 10:35:33 +0000 (18:35 +0800)]
crimson/os/seastore/transaction_manager: set to test mode under debug build

* force to test mode under debug build.
* make reclaim happen and be validated as early as possible.
* do not block user transaction when reclaim-ratio (unalive/unavailable)
  is high, especially in the beginning.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/segment_cleaner: cleanup reclaim logic
Yingxin Cheng [Fri, 27 May 2022 09:05:33 +0000 (17:05 +0800)]
crimson/os/seastore/segment_cleaner: cleanup reclaim logic

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/seastore_types: include backref as physical extents
Yingxin Cheng [Fri, 27 May 2022 08:43:26 +0000 (16:43 +0800)]
crimson/os/seastore/seastore_types: include backref as physical extents

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/cache: assert dirty
Yingxin Cheng [Fri, 27 May 2022 08:42:19 +0000 (16:42 +0800)]
crimson/os/seastore/cache: assert dirty

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore: cleanup rewrite_extent()
Yingxin Cheng [Fri, 27 May 2022 08:32:06 +0000 (16:32 +0800)]
crimson/os/seastore: cleanup rewrite_extent()

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/segment_cleaner: delay reclaim until near full
Yingxin Cheng [Mon, 30 May 2022 05:27:30 +0000 (13:27 +0800)]
crimson/os/seastore/segment_cleaner: delay reclaim until near full

It should generally be better to delay reclaim as much as possible, so
that:
* the unalive/unavailable ratio can be higher, reducing reclaim efforts;
* there are fewer conflicts between mutate and reclaim transactions.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agoMerge pull request #46296 from ceph/wip-nitzan-osd-log-to-correct-sufix
Samuel Just [Mon, 30 May 2022 23:37:41 +0000 (16:37 -0700)]
Merge pull request #46296 from ceph/wip-nitzan-osd-log-to-correct-sufix

crimson/osd: logger into log_file

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years agoMerge pull request #46388 from rzarzynski/wip-crimson-reindent-main-trace
Samuel Just [Mon, 30 May 2022 23:33:25 +0000 (16:33 -0700)]
Merge pull request #46388 from rzarzynski/wip-crimson-reindent-main-trace

crimson/osd: reindent the trace-related fragment of main()

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years agoMerge pull request #46320 from ronen-fr/wip-rf-snaps-onerr
Ronen Friedman [Mon, 30 May 2022 17:36:41 +0000 (20:36 +0300)]
Merge pull request #46320 from ronen-fr/wip-rf-snaps-onerr

osd/scrub: restart snap trimming after a failed scrub

Reviewed-by: Laura Flores <lflores@redhat.com>
3 years agocrimson/osd: add support for slowest historic op tracking
Radosław Zarzyński [Fri, 27 May 2022 17:31:18 +0000 (19:31 +0200)]
crimson/osd: add support for slowest historic op tracking

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
3 years agocrimson/osd: make OSDOperationRegistry responsible for historic ops
Radosław Zarzyński [Fri, 27 May 2022 17:24:45 +0000 (19:24 +0200)]
crimson/osd: make OSDOperationRegistry responsible for historic ops

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
3 years agocrimson/osd: add support for historic op tracking.
Radosław Zarzyński [Thu, 14 Apr 2022 11:17:17 +0000 (13:17 +0200)]
crimson/osd: add support for historic op tracking.

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
3 years agoosd/scrub: restart snap trimming after a failed scrub
Ronen Friedman [Tue, 17 May 2022 16:13:59 +0000 (16:13 +0000)]
osd/scrub: restart snap trimming after a failed scrub

A followup to PR#45640.
In PR#45640 snap trimming was restarted (if blocked) after all
successful scrubs, and after most scrub failures. Still, a few
failure scenarios did not handle snaptrim restart correctly.

The current PR cleans up and fixes the interaction between
scrub initiation/termination (for whatever cause) and snap
trimming.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
3 years agoMerge pull request #46426 from idryomov/wip-iscsi-mutual-chap-doc
Ilya Dryomov [Mon, 30 May 2022 12:54:06 +0000 (14:54 +0200)]
Merge pull request #46426 from idryomov/wip-iscsi-mutual-chap-doc

doc/rbd: add mutual CHAP authentication example

Reviewed-by: Xiubo Li <xiubli@redhat.com>
3 years agodoc/rbd: add mutual CHAP authentication example
Ilya Dryomov [Mon, 30 May 2022 09:54:35 +0000 (11:54 +0200)]
doc/rbd: add mutual CHAP authentication example

Based on https://github.com/ceph/ceph-iscsi/pull/260.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years agoMerge pull request #35598 from tchaikov/wip-cephfs-java
Kefu Chai [Sat, 28 May 2022 05:29:25 +0000 (13:29 +0800)]
Merge pull request #35598 from tchaikov/wip-cephfs-java

rpm,install-dep.sh: build cephfs java binding

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago.github/CODEOWNERS: tag core devs on core PRs
Neha Ojha [Fri, 27 May 2022 19:34:57 +0000 (19:34 +0000)]
.github/CODEOWNERS: tag core devs on core PRs

Start with everything that is present under core in .github/labeler.yml.

Signed-off-by: Neha Ojha <nojha@redhat.com>
3 years agoqa/rgw: fix flake8 errors in test_rgw_reshard.py
Casey Bodley [Wed, 25 May 2022 18:13:55 +0000 (14:13 -0400)]
qa/rgw: fix flake8 errors in test_rgw_reshard.py

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw/motr: fix build for MotrStore
Casey Bodley [Tue, 24 May 2022 15:29:52 +0000 (11:29 -0400)]
rgw/motr: fix build for MotrStore

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: `RGWSyncBucketCR` reads remote info on non-`Incremental` state
Adam C. Emerson [Tue, 17 May 2022 03:26:48 +0000 (23:26 -0400)]
rgw: `RGWSyncBucketCR` reads remote info on non-`Incremental` state

This ensures that the remote bucket index log info is available for
all cases where we're calling `InitBucketFullSyncStatusCR`

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agotest/rgw: bucket sync run recovery case
Adam C. Emerson [Fri, 13 May 2022 19:56:28 +0000 (15:56 -0400)]
test/rgw: bucket sync run recovery case

1. Write several generations worth of objects. Ensure that everything
   has synced and that at least some generations have been trimmed.
2. Turn off the secondary `radosgw`.
3. Use `radosgw-admin object rm` to delete all objects in the bucket
   on the secondary.
4. Invoke `radosgw-admin bucket sync init` on the secondary.
5. Invoke `radosgw-admin bucket sync run` on the secondary.
6. Verify that all objects on the primary are also present on the
   secondary.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agotest/rgw: Add incremental test of bucket sync run
Adam C. Emerson [Wed, 11 May 2022 22:39:08 +0000 (18:39 -0400)]
test/rgw: Add incremental test of bucket sync run

This tests for iterating properly over the generations.

1. Create a bucket and write some objects to it. Wait for sync to
   complete. This ensures we are in Incremental.
2. Turn off the secondary `radosgw`.
3. Manually reshard. Then continue writing objects and resharding.
4. Choose objects so that each generation has objects in many but not
   all shards.
5. After building up several generations, run `bucket sync run` on the
   secondary.
6. Verify that all objects on the primary are on the secondary.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: add `bucket object shard` command to radosgw-admin
Adam C. Emerson [Tue, 17 May 2022 03:23:40 +0000 (23:23 -0400)]
rgw: add `bucket object shard` command to radosgw-admin

Given an object, return the bucket shard appropriate to it.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
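Mapping an object to a bucket index shard amounts to hashing the object name and reducing it modulo the shard count. The sketch below uses zlib.crc32 as a stand-in hash; that choice is an assumption for illustration only, so the shard numbers it produces will not match what radosgw-admin reports. `bucket_shard_for` is a hypothetical name.

```python
import zlib


def bucket_shard_for(object_name, num_shards):
    """Return the index shard that an object name would map to."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Stand-in hash; RGW uses its own hash function internally.
    return zlib.crc32(object_name.encode("utf-8")) % num_shards
```

The same name always lands on the same shard for a given shard count, which is what lets a test choose names that hit only a subset of shards.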
3 years agorgw: Add 'bucket shard objects' command to radosgw-admin
Adam C. Emerson [Wed, 11 May 2022 21:08:01 +0000 (17:08 -0400)]
rgw: Add 'bucket shard objects' command to radosgw-admin

To be used in testing, to write to some subset of shards for reshard testing.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: bucket sync run walks over generations
Adam C. Emerson [Sat, 14 May 2022 05:11:57 +0000 (01:11 -0400)]
rgw: bucket sync run walks over generations

This should make the troubleshooting use case of bucket sync init/run
usable with multisite reshard.

This also fixes a few issues with the original bucket sync run code,
by spawning multiple shards at a time and retrying retryable errors.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Remove unused RGWRemoteBucketManager
Adam C. Emerson [Thu, 14 Apr 2022 13:51:00 +0000 (09:51 -0400)]
rgw: Remove unused RGWRemoteBucketManager

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Disentangle RGWBucketPipeSyncStatusManager::run
Adam C. Emerson [Thu, 14 Apr 2022 13:42:54 +0000 (09:42 -0400)]
rgw: Disentangle RGWBucketPipeSyncStatusManager::run

Again, from RGWRemoteBucketManager.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Disentangle read_sync_status from RemoteBucketManager
Adam C. Emerson [Thu, 14 Apr 2022 13:35:40 +0000 (09:35 -0400)]
rgw: Disentangle read_sync_status from RemoteBucketManager

Also fix the problem where we read the status from all peers into the
same map at once.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Disentangle init_sync_status from RemoteBucketManager
Adam C. Emerson [Wed, 13 Apr 2022 02:10:31 +0000 (22:10 -0400)]
rgw: Disentangle init_sync_status from RemoteBucketManager

RGWRemoteBucketManager's current design isn't really compatible with
what we need for bucket sync run to work as the number of shards
changes from run to run.

We can make a smaller 'hold information common to all three
operations' class and simplify things a bit.

We also need to fetch `rgw_bucket_index_marker_info` and supply it to
`InitBucketFullSyncStatusCR` to ensure we have the correct generation
and shard count.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Get rid of RGWBucketPipeSyncStatusManager::init
Adam C. Emerson [Wed, 13 Apr 2022 01:37:05 +0000 (21:37 -0400)]
rgw: Get rid of RGWBucketPipeSyncStatusManager::init

Use the Named Constructor Idiom instead.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
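The Named Constructor Idiom replaces a two-step "construct, then init()" sequence with a single factory that either returns a fully usable object or fails. A minimal Python analogue, using a classmethod, might look like this; the class and its fields are hypothetical and not the RGW types.

```python
class SyncStatusManager:
    def __init__(self, bucket, sources):
        # Plain assignment only; nothing here can fail.
        self.bucket = bucket
        self.sources = sources

    @classmethod
    def construct(cls, bucket, known_sources):
        """Do the fallible lookup first, then build a ready-to-use object."""
        sources = known_sources.get(bucket)
        if not sources:
            raise LookupError(f"no sync sources for bucket {bucket!r}")
        return cls(bucket, list(sources))


mgr = SyncStatusManager.construct("b", {"b": ["zone2"]})
```

Callers can no longer obtain an object on which init() was forgotten or had failed.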
3 years agorgw: RGWBucketPipeSyncStatusManager doesn't need a conn
Adam C. Emerson [Tue, 12 Apr 2022 23:44:48 +0000 (19:44 -0400)]
rgw: RGWBucketPipeSyncStatusManager doesn't need a conn

`conn` is per-source. last_zone just saved a lookup in a small map.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Get rid of RGWBucketPipeSyncStatusManager::get_sync_status
Adam C. Emerson [Tue, 12 Apr 2022 20:46:46 +0000 (16:46 -0400)]
rgw: Get rid of RGWBucketPipeSyncStatusManager::get_sync_status

Instead of one function that sets a variable and another function that
returns it and nobody else touches it, just return the sync status.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Clean up RGWBucketPipeSyncStatusManager construction
Adam C. Emerson [Tue, 12 Apr 2022 20:26:27 +0000 (16:26 -0400)]
rgw: Clean up RGWBucketPipeSyncStatusManager construction

The coupling between this class and RGWRemoteBucketManager makes no
sense. Clean things up a bit.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Remove unused members of RGWBucketPipeSyncStatusManager
Adam C. Emerson [Tue, 12 Apr 2022 23:53:46 +0000 (19:53 -0400)]
rgw: Remove unused members of RGWBucketPipeSyncStatusManager

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agotest/rgw: add DistinctGen to test_rgw_bucket_sync_cache
Casey Bodley [Mon, 16 May 2022 21:11:13 +0000 (17:11 -0400)]
test/rgw: add DistinctGen to test_rgw_bucket_sync_cache

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agotest/rgw: update test_rgw_bucket_sync_cache with nullopt
Casey Bodley [Mon, 16 May 2022 21:08:15 +0000 (17:08 -0400)]
test/rgw: update test_rgw_bucket_sync_cache with nullopt

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: Disable urgent data notifications
Adam C. Emerson [Sat, 14 May 2022 06:39:56 +0000 (02:39 -0400)]
rgw: Disable urgent data notifications

These interfere with multisite resharding and are thus disabled.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw/multisite: ignore sync init state when doing bilog trimming
Yuval Lifshitz [Fri, 15 Apr 2022 11:45:49 +0000 (14:45 +0300)]
rgw/multisite: ignore sync init state when doing bilog trimming

regardless of the sync state, we take the marker from the
incremental sync object

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
3 years agosrc/test: Addition of new bilog trim test
Kalpesh Pandya [Wed, 16 Feb 2022 06:11:51 +0000 (11:41 +0530)]
src/test: Addition of new bilog trim test

This test checks the radosgw-admin bucket layout command
along with bilog autotrim on a resharded bucket.

Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
3 years agorgw/multisite: match sharding logic to generation number
yuval Lifshitz [Tue, 15 Mar 2022 18:42:01 +0000 (20:42 +0200)]
rgw/multisite: match sharding logic to generation number

Signed-off-by: yuval Lifshitz <ylifshit@redhat.com>
3 years agorgw multisite: resharding scales up shard counts 4x faster
Casey Bodley [Fri, 25 Mar 2022 21:14:05 +0000 (17:14 -0400)]
rgw multisite: resharding scales up shard counts 4x faster

in multisite reshard, we need to keep the old index shards around until
other zones finish syncing from them. we don't want to allow a
bunch of reshards in a row, because we have to duplicate that many
sets of index objects. so we impose a limit of 4 bilog generations (or 3
reshards), and refuse to reshard again until bilog trimming catches up
and trims the oldest generation

under a sustained write workload, a bucket can fill quickly and need
successive reshards. if we have a limit of 3, we should make them count!
so instead of doubling the shard count at each step, multiply by 8
instead when we're in a multisite configuration

Signed-off-by: Casey Bodley <cbodley@redhat.com>
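The arithmetic behind this change can be sketched as follows. The constant and function names are hypothetical, and the real code may additionally clamp or round the shard count; only the limit of 4 generations and the multipliers 2 and 8 come from the message above.

```python
MAX_BILOG_GENERATIONS = 4  # limit stated in the commit message


def next_shard_count(current, multisite):
    """Multiply by 8 in a multisite configuration, otherwise double."""
    return current * (8 if multisite else 2)


def can_reshard(live_generations):
    """Refuse to reshard once the generation limit has been reached."""
    return live_generations < MAX_BILOG_GENERATIONS


# Three successive reshards starting from 11 shards.
doubling = multisite = 11
for _ in range(3):
    doubling = next_shard_count(doubling, multisite=False)
    multisite = next_shard_count(multisite, multisite=True)
```

After the three reshards the limit allows, doubling reaches 88 shards while the multisite multiplier reaches 5632.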
3 years agorgw: Redimension bucket sync cache to include optional generation
Adam C. Emerson [Tue, 22 Feb 2022 19:50:33 +0000 (14:50 -0500)]
rgw: Redimension bucket sync cache to include optional generation

The alternative would be to compare generations and throw out older
generation/no generation if we have a (newer) one.

But if we have the potential for older generations and blank
generations coming up on error retry, then we have to keep them
around.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
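Adding an optional generation as a cache dimension means that entries for an older generation, or for no generation at all, coexist with newer ones instead of being replaced. A minimal sketch, with a hypothetical class and key layout:

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int, Optional[int]]  # (bucket, shard, generation or None)


class BucketSyncCache:
    def __init__(self) -> None:
        self._entries: Dict[Key, str] = {}

    def put(self, bucket: str, shard: int, gen: Optional[int], state: str) -> None:
        # The generation is part of the key, so entries never overwrite
        # one another across generations.
        self._entries[(bucket, shard, gen)] = state

    def get(self, bucket: str, shard: int, gen: Optional[int]) -> Optional[str]:
        return self._entries.get((bucket, shard, gen))


cache = BucketSyncCache()
cache.put("b", 0, None, "retry")
cache.put("b", 0, 2, "syncing")
```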
3 years agorgw: prevent spurious/lost notifications in the index completion thread
Yuval Lifshitz [Wed, 23 Feb 2022 15:21:10 +0000 (17:21 +0200)]
rgw: prevent spurious/lost notifications in the index completion thread

this was happening when async completions happened during reshard.
more information about testing:
https://gist.github.com/yuvalif/d526c0a3a4c5b245b9e951a6c5a10517

we also add more logs to the completion manager.
should allow finding unhandled completions due to reshards.

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
3 years agorgw: refresh the generation of the bucket shard when fetching info
Yuval Lifshitz [Thu, 10 Feb 2022 16:12:55 +0000 (18:12 +0200)]
rgw: refresh the generation of the bucket shard when fetching info

when RGWRados::block_while_resharding() fails because reshard is in
progress, we should fetch the bucket shard generation again in the next
iteration, in case the generation changed in the meantime.

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
3 years agotest/rgw: test_rgw_reshard.py injects ECANCELED on set_target_layout/commit_target_layout
Casey Bodley [Thu, 10 Feb 2022 19:32:16 +0000 (14:32 -0500)]
test/rgw: test_rgw_reshard.py injects ECANCELED on set_target_layout/commit_target_layout

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agoradosgw-admin: add --inject-error-code to customize injected error
Casey Bodley [Thu, 10 Feb 2022 19:31:26 +0000 (14:31 -0500)]
radosgw-admin: add --inject-error-code to customize injected error

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw/reshard: revert_target_layout handles ECANCELED races/retries
Casey Bodley [Thu, 10 Feb 2022 23:32:49 +0000 (18:32 -0500)]
rgw/reshard: revert_target_layout handles ECANCELED races/retries

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw/reshard: init_target_layout handles ECANCELED races/retries
Casey Bodley [Thu, 10 Feb 2022 23:32:29 +0000 (18:32 -0500)]
rgw/reshard: init_target_layout handles ECANCELED races/retries

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw/reshard: commit_reshard handles ECANCELED races/retries
Casey Bodley [Thu, 10 Feb 2022 22:39:45 +0000 (17:39 -0500)]
rgw/reshard: commit_reshard handles ECANCELED races/retries

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: pass non-const ReshardFaultInjector
Casey Bodley [Thu, 10 Feb 2022 22:06:23 +0000 (17:06 -0500)]
rgw: pass non-const ReshardFaultInjector

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: add comparison operators for index layout types
Casey Bodley [Thu, 10 Feb 2022 22:04:05 +0000 (17:04 -0500)]
rgw: add comparison operators for index layout types

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw/reshard: set_resharding_status() doesn't need retry
Casey Bodley [Thu, 10 Feb 2022 22:38:16 +0000 (17:38 -0500)]
rgw/reshard: set_resharding_status() doesn't need retry

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: Retry -ECANCELED in reshard commit and cancel
Adam C. Emerson [Mon, 7 Feb 2022 20:23:57 +0000 (15:23 -0500)]
rgw: Retry -ECANCELED in reshard commit and cancel

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: prevent 'radosgw-admin bucket reshard' if zonegroup reshard is disabled
Casey Bodley [Wed, 9 Feb 2022 21:55:38 +0000 (16:55 -0500)]
rgw: prevent 'radosgw-admin bucket reshard' if zonegroup reshard is disabled

dynamic reshard was gated behind the zonegroup resharding flag with
RGWSI_Zone::can_reshard(), but manual reshard was only calling
RGWBucketReshard::can_reshard()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: add back json for zone/zonegroup features
Casey Bodley [Wed, 9 Feb 2022 20:08:02 +0000 (15:08 -0500)]
rgw: add back json for zone/zonegroup features

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: RGWBucket::sync() no longer duplicates datalog/bilog entries
Casey Bodley [Wed, 19 Jan 2022 01:39:37 +0000 (20:39 -0500)]
rgw: RGWBucket::sync() no longer duplicates datalog/bilog entries

RGWSI_BucketIndex_RADOS::handle_overwrite() is already writing the
datalog/bilog entries related to BUCKET_DATASYNC_DISABLED

RGWBucket::sync() calls handle_overwrite() indirectly from
bucket->put_info() when it writes the bucket instance with this new
BUCKET_DATASYNC_DISABLED flag, so RGWBucket::sync() shouldn't
duplicate those writes here

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: add checks for non-empty layout.logs
Casey Bodley [Tue, 18 Jan 2022 21:48:25 +0000 (16:48 -0500)]
rgw: add checks for non-empty layout.logs

always verify that logs is not empty before calling logs.back() or
logs.front()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
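The guard described above can be sketched in Python. In C++, calling back() or front() on an empty container is undefined behavior; the Python analogue would raise IndexError, so the emptiness check plays the same role. `latest_log` is a hypothetical helper, not a function from the RGW code.

```python
def latest_log(layout_logs):
    """Return the newest log entry, or None when there are no logs."""
    if not layout_logs:
        # An empty list stands in for a layout without any log entries.
        return None
    return layout_logs[-1]


newest = latest_log([{"gen": 0}, {"gen": 1}])
```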
3 years agorgw: use get_current_index() instead of log_to_index_layout()
Casey Bodley [Tue, 18 Jan 2022 21:43:42 +0000 (16:43 -0500)]
rgw: use get_current_index() instead of log_to_index_layout()

several places were getting the current index layout indirectly
with layout.logs.back() and rgw::log_to_index_layout(). use
get_current_index() instead so we don't rely on layout.logs, which may
be empty for indexless buckets

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agoradosgw-admin: add command to dump 'bucket layout'
Casey Bodley [Tue, 8 Feb 2022 19:39:12 +0000 (14:39 -0500)]
radosgw-admin: add command to dump 'bucket layout'

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years agorgw: Add generation to ChangeStatus
Adam C. Emerson [Tue, 8 Feb 2022 18:11:44 +0000 (13:11 -0500)]
rgw: Add generation to ChangeStatus

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Compare log.gen to log.gen
Adam C. Emerson [Mon, 7 Feb 2022 22:00:25 +0000 (17:00 -0500)]
rgw: Compare log.gen to log.gen

And refuse to remove the only log.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years agorgw: Don't erase bucket attributes on trim
Adam C. Emerson [Wed, 2 Feb 2022 20:53:41 +0000 (15:53 -0500)]
rgw: Don't erase bucket attributes on trim

Writing bucket instance info is surprising, as if you pass a null
pointer for the attributes, it just erases all the attributes.

To avoid disturbing users and other 'system objects', make a special
case that we can pass in explicitly.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
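One way to make "leave the attributes alone" an explicit request, rather than overloading a null argument, is a dedicated sentinel. The sketch below is a Python analogue under assumptions: `KEEP_ATTRS` and `put_instance_info` are hypothetical names, and the dict stands in for the stored bucket instance.

```python
KEEP_ATTRS = object()  # explicit "do not touch the attributes" marker


def put_instance_info(stored, info, attrs=None):
    """Write bucket instance info; attrs=None keeps the erase behavior."""
    stored["info"] = info
    if attrs is KEEP_ATTRS:
        # Special case requested explicitly by the caller (e.g. trim).
        return stored
    # Existing behavior for other callers: None erases all attributes.
    stored["attrs"] = dict(attrs) if attrs else {}
    return stored


stored = {"info": "old", "attrs": {"acl": "x"}}
put_instance_info(stored, "trimmed", KEEP_ATTRS)
kept = dict(stored["attrs"])
put_instance_info(stored, "rewritten", None)
```

Other callers keep the behavior they already rely on, while trimming opts in to preserving the attributes.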