git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
3 years ago | rgw: fix un/signed comparison warnings in rgw_sync.cc | 46536/head
Casey Bodley [Mon, 6 Jun 2022 16:06:19 +0000 (12:06 -0400)]
rgw: fix un/signed comparison warnings in rgw_sync.cc

Fixes: https://tracker.ceph.com/issues/55898
Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago | Merge PR #46516 into main
Patrick Donnelly [Mon, 6 Jun 2022 12:48:39 +0000 (08:48 -0400)]
Merge PR #46516 into main

* refs/pull/46516/head:
doc/dev/developer_guide/testing_integration_tests: document how to test custom kernels

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago | Merge pull request #46165 from rishabh-d-dave/qa-omit-sudo
Venky Shankar [Mon, 6 Jun 2022 05:53:11 +0000 (11:23 +0530)]
Merge pull request #46165 from rishabh-d-dave/qa-omit-sudo

qa/cephfs: set omit_sudo False when sudo is set to True

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
3 years ago | Merge pull request #46168 from rishabh-d-dave/fix-caps-helper
Venky Shankar [Mon, 6 Jun 2022 05:50:58 +0000 (11:20 +0530)]
Merge pull request #46168 from rishabh-d-dave/fix-caps-helper

qa/cephfs: fix minor bug in caps_helper.py's run_mon_cap_tests()

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
3 years ago | Merge pull request #40434 from rishabh-d-dave/fs-refactor-method-in-mount
Venky Shankar [Mon, 6 Jun 2022 05:49:54 +0000 (11:19 +0530)]
Merge pull request #40434 from rishabh-d-dave/fs-refactor-method-in-mount

qa/cephfs: modify get_key_from_keyfile() in mount.py

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
3 years ago | Merge pull request #46522 from tchaikov/wip-crimson-logging
Kefu Chai [Mon, 6 Jun 2022 00:30:46 +0000 (08:30 +0800)]
Merge pull request #46522 from tchaikov/wip-crimson-logging

crimson/osd: reset logger before exit

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 years ago | Merge pull request #46416 from tchaikov/wip-debian-dh-python3
Kefu Chai [Sun, 5 Jun 2022 13:44:41 +0000 (21:44 +0800)]
Merge pull request #46416 from tchaikov/wip-debian-dh-python3

debian: python3 related cleanups

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago | crimson/osd: reset logger before exit | 46522/head
Kefu Chai [Sun, 5 Jun 2022 10:30:28 +0000 (18:30 +0800)]
crimson/osd: reset logger before exit

* extract the code to set logging fstream into a dedicated function
* do not reset logging until the end of the seastar application.

before this change, `reset_logger` is created in the
`if (auto log_file = local_conf()->log_file; !log_file.empty())` branch,
so its lifetime ends when the `if` block ends. in other words,
logging falls back to `cerr` as soon as the `if` block ends instead of
going to the configured log file. this is not the expected behavior.

after this change, `reset_logger` is created outside of the `if` block,
so we won't reset the logger back to `cerr` until the lambda passed to
`seastar::async()` exits.
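The scope problem described above can be illustrated with a Python analog, using `contextlib.ExitStack` in place of the C++ scope guard. `Logger`, `run_buggy` and `run_fixed` are hypothetical names; this is a sketch of the lifetime pattern under those assumptions, not the crimson code.

```python
import contextlib


class Logger:
    """Hypothetical stand-in for the logging sink (not crimson's API)."""

    def __init__(self) -> None:
        self.target = "cerr"

    def set_target(self, target: str) -> None:
        self.target = target


def run_buggy(logger: Logger, log_file: str, body):
    # The reset guard lives inside the `if` block, so it fires as soon as
    # that block ends and `body` logs to "cerr" again.
    if log_file:
        with contextlib.ExitStack() as guard:
            logger.set_target(log_file)
            guard.callback(logger.set_target, "cerr")
    return body(logger)


def run_fixed(logger: Logger, log_file: str, body):
    # The guard is created outside the `if` block, so the reset only runs
    # after `body` (the rest of the application) has finished.
    with contextlib.ExitStack() as guard:
        if log_file:
            logger.set_target(log_file)
            guard.callback(logger.set_target, "cerr")
        return body(logger)
```

With `body = lambda lg: lg.target`, `run_buggy` observes `"cerr"` while `run_fixed` observes the log file name, and the logger is still reset to `"cerr"` once `run_fixed` returns.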

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
3 years ago | Merge pull request #46483 from yaarith/rook-telemetry
Neha Ojha [Fri, 3 Jun 2022 20:12:48 +0000 (13:12 -0700)]
Merge pull request #46483 from yaarith/rook-telemetry

mgr/telemetry: add Rook data

Reviewed-by: Laura Flores <lflores@redhat.com>
3 years ago | Merge pull request #46417 from xxhdx1985126/wip-gc-parallel-live_extent_retrieval
Samuel Just [Fri, 3 Jun 2022 19:06:35 +0000 (12:06 -0700)]
Merge pull request #46417 from xxhdx1985126/wip-gc-parallel-live_extent_retrieval

crimson/os/seastore/segment_cleaner: parallel live extents retrieval

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years ago | doc/dev/developer_guide/testing_integration_tests: document how to test custom kernels | 46516/head
Patrick Donnelly [Fri, 3 Jun 2022 14:11:31 +0000 (10:11 -0400)]
doc/dev/developer_guide/testing_integration_tests: document how to test custom kernels

Fixes: https://tracker.ceph.com/issues/55530
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago | Merge pull request #46480 from cfsnyder/wip-cfsnyder-device-classes-in-service-spec
Adam King [Fri, 3 Jun 2022 15:46:54 +0000 (11:46 -0400)]
Merge pull request #46480 from cfsnyder/wip-cfsnyder-device-classes-in-service-spec

python-common: allow crush device class to be set from osd service spec

Reviewed-by: Adam King <adking@redhat.com>
3 years ago | test/crimson/seastore: add test case for parallel extent retrieval | 46417/head
Xuehan Xu [Thu, 2 Jun 2022 13:33:57 +0000 (21:33 +0800)]
test/crimson/seastore: add test case for parallel extent retrieval

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
3 years ago | crimson/os/seastore/cache: make access to Transaction::read_set atomic
Xuehan Xu [Wed, 1 Jun 2022 10:44:30 +0000 (18:44 +0800)]
crimson/os/seastore/cache: make access to Transaction::read_set atomic

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
3 years ago | Merge pull request #46505 from rhcs-dashboard/fix-backports_main-main
Ernesto Puerta [Fri, 3 Jun 2022 12:02:55 +0000 (14:02 +0200)]
Merge pull request #46505 from rhcs-dashboard/fix-backports_main-main

script/ceph-backport.sh: deal with main branch

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: nmshelke <NOT@FOUND>
Reviewed-by: Kefu Chai <kchai@redhat.com>
3 years ago | Merge pull request #46186 from rhcs-dashboard/add-daemon-logs
Ernesto Puerta [Fri, 3 Jun 2022 10:09:56 +0000 (12:09 +0200)]
Merge pull request #46186 from rhcs-dashboard/add-daemon-logs

mgr/dashboard: Add daemon logs tab to Logs component

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: sunilangadi2 <NOT@FOUND>
3 years ago | Merge pull request #46283 from MrFreezeex/mixin-config
Ernesto Puerta [Fri, 3 Jun 2022 10:05:58 +0000 (12:05 +0200)]
Merge pull request #46283 from MrFreezeex/mixin-config

ceph-mixin: fix linting issue and add cluster template support

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
3 years ago | Merge pull request #46454 from idryomov/wip-rbd-unlink-newest-snap-at-capacity
Ilya Dryomov [Fri, 3 Jun 2022 09:51:41 +0000 (11:51 +0200)]
Merge pull request #46454 from idryomov/wip-rbd-unlink-newest-snap-at-capacity

librbd: unlink newest mirror snapshot when at capacity, bump capacity

Reviewed-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Reviewed-by: Mykola Golub <mgolub@suse.com>
3 years ago | Merge pull request #46434 from idryomov/wip-rbd-preserve-non-primary-snap
Ilya Dryomov [Fri, 3 Jun 2022 09:50:41 +0000 (11:50 +0200)]
Merge pull request #46434 from idryomov/wip-rbd-preserve-non-primary-snap

rbd-mirror: don't prune non-primary snapshot when restarting delta sync

Reviewed-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Reviewed-by: Mykola Golub <mgolub@suse.com>
3 years ago | qa/cephfs: modify get_key_from_keyfile() in mount.py | 40434/head
Rishabh Dave [Fri, 26 Mar 2021 11:32:38 +0000 (17:02 +0530)]
qa/cephfs: modify get_key_from_keyfile() in mount.py

CephFSMount.get_key_from_keyfile() should raise an exception instead of
returning None if the key is not found in the keyring file.
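The intended behavior can be sketched as a standalone lookup over keyring text in the usual `[entity]` / `key = ...` layout. The function signature, the `entity` parameter and the choice of `RuntimeError` are assumptions for illustration; the real method is `CephFSMount.get_key_from_keyfile()` in qa/tasks/cephfs/mount.py and reads the keyring itself.

```python
def get_key_from_keyfile(keyring_text: str, entity: str) -> str:
    """Return the key for `entity` (e.g. "client.admin") from keyring text.

    Hypothetical standalone sketch: raises instead of returning None
    when no key is found.
    """
    section = f"[{entity}]"
    in_section = False
    for raw_line in keyring_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new section header; only the requested entity matters.
            in_section = line == section
            continue
        if not in_section:
            continue
        # Split on the first "=" so base64 padding in the key is preserved.
        name, sep, value = line.partition("=")
        if sep and name.strip() == "key":
            return value.strip()
    raise RuntimeError(f"key for {entity} not found in keyring")
```

Raising makes a missing key fail loudly at the lookup rather than surfacing later as a `None` value.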

Fixes: https://tracker.ceph.com/issues/50010
Signed-off-by: Rishabh Dave <ridave@redhat.com>
3 years ago | Merge pull request #46411 from pcuzner/add-serial-numbers
Adam King [Thu, 2 Jun 2022 23:01:01 +0000 (19:01 -0400)]
Merge pull request #46411 from pcuzner/add-serial-numbers

cephadm: Add server serial info to gather-facts

Reviewed-by: Adam King <adking@redhat.com>
3 years ago | Merge pull request #46444 from rkachach/fix_issue_55800
Adam King [Thu, 2 Jun 2022 23:00:17 +0000 (19:00 -0400)]
Merge pull request #46444 from rkachach/fix_issue_55800

mgr/cephadm: check if a service exists before trying to restart it

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
3 years ago | Merge pull request #46445 from rkachach/fix_issue_55801
Adam King [Thu, 2 Jun 2022 22:59:02 +0000 (18:59 -0400)]
Merge pull request #46445 from rkachach/fix_issue_55801

mgr/cephadm: capture exception when not able to list upgrade tags

Reviewed-by: Adam King <adking@redhat.com>
3 years ago | Merge pull request #46481 from guits/cephadm-custom-names-osd-adoption
Adam King [Thu, 2 Jun 2022 22:58:24 +0000 (18:58 -0400)]
Merge pull request #46481 from guits/cephadm-custom-names-osd-adoption

cephadm: fix osd adoption with custom cluster name

Reviewed-by: Adam King <adking@redhat.com>
3 years ago | Merge pull request #39002 from ceph/wip-rgw-multisite-reshard
Casey Bodley [Thu, 2 Jun 2022 20:04:30 +0000 (16:04 -0400)]
Merge pull request #39002 from ceph/wip-rgw-multisite-reshard

rgw multisite: bucket reshard work in progress

Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
3 years ago | Merge pull request #45470 from ceph/wip-setx
Ernesto Puerta [Thu, 2 Jun 2022 17:22:59 +0000 (19:22 +0200)]
Merge pull request #45470 from ceph/wip-setx

run-backend-api-tests.sh: set -x for Jenkins job debugging

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago | Merge pull request #44684 from zenomri/wip-omri-tracing-compiled
Yuval Lifshitz [Thu, 2 Jun 2022 17:22:17 +0000 (20:22 +0300)]
Merge pull request #44684 from zenomri/wip-omri-tracing-compiled

tracer: set tracing compiled in by default

3 years ago | script/ceph-backport.sh: deal with main branch | 46505/head
Ernesto Puerta [Thu, 2 Jun 2022 15:11:33 +0000 (17:11 +0200)]
script/ceph-backport.sh: deal with main branch

Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago | Merge pull request #43216 from k0ste/fix_47537
Anthony D'Atri [Thu, 2 Jun 2022 15:41:25 +0000 (08:41 -0700)]
Merge pull request #43216 from k0ste/fix_47537

doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces

3 years ago | python-common: allow crush device class to be set from osd service spec | 46480/head
Cory Snyder [Wed, 1 Jun 2022 09:39:11 +0000 (05:39 -0400)]
python-common: allow crush device class to be set from osd service spec

Adds a crush_device_class parameter to DriveGroupSpec so that the CRUSH
device class can be set via cephadm service specs.

Fixes: https://tracker.ceph.com/issues/55813
Signed-off-by: Cory Snyder <csnyder@iland.com>
3 years ago | Merge pull request #46019 from yushu20171007/fix_issue_55422
Casey Bodley [Thu, 2 Jun 2022 14:47:22 +0000 (10:47 -0400)]
Merge pull request #46019 from yushu20171007/fix_issue_55422

common: notify all when max backlog reached in OutputDataSocket

Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
3 years ago | Merge pull request #46501 from rhcs-dashboard/fix-55826-master
Ernesto Puerta [Thu, 2 Jun 2022 12:13:26 +0000 (14:13 +0200)]
Merge pull request #46501 from rhcs-dashboard/fix-55826-master

qa: fix teuthology master branch ref

Reviewed-by: amathuria <NOT@FOUND>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
3 years ago | qa: fix teuthology master branch ref | 46501/head
Ernesto Puerta [Thu, 2 Jun 2022 10:27:02 +0000 (12:27 +0200)]
qa: fix teuthology master branch ref

Fixes: https://tracker.ceph.com/issues/55826
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago | doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces | 43216/head
Konstantin Shalygin [Sat, 18 Sep 2021 10:22:14 +0000 (17:22 +0700)]
doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces

Fixes: https://tracker.ceph.com/issues/47537
Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
3 years ago | Merge pull request #46272 from sshambar/bug-55664
Adam King [Wed, 1 Jun 2022 21:44:54 +0000 (17:44 -0400)]
Merge pull request #46272 from sshambar/bug-55664

cephadm: preserve cephadm user during RPM upgrade

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
3 years ago | Merge pull request #46488 from jtlayton/teuth-branch-fix
David Galloway [Wed, 1 Jun 2022 19:17:03 +0000 (15:17 -0400)]
Merge pull request #46488 from jtlayton/teuth-branch-fix

qa: remove .teuthology_branch file

3 years ago | qa: remove .teuthology_branch file | 46488/head
Jeff Layton [Wed, 1 Jun 2022 18:26:33 +0000 (14:26 -0400)]
qa: remove .teuthology_branch file

This was originally added to help support the py2 -> py3 conversion.
That's long since complete so we should be able to just remove this file
now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
3 years ago | Merge pull request #46487 from jtlayton/teuth-branch-fix
David Galloway [Wed, 1 Jun 2022 18:18:00 +0000 (14:18 -0400)]
Merge pull request #46487 from jtlayton/teuth-branch-fix

qa: fix .teuthology_branch file in qa/

3 years ago | test/rgw/multisite: enable zonegroup resharding feature | 39002/head
Casey Bodley [Wed, 1 Jun 2022 18:10:24 +0000 (14:10 -0400)]
test/rgw/multisite: enable zonegroup resharding feature

qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their json format. this doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
json representations.
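A minimal sketch of that JSON change, assuming the zonegroup is a JSON object and that the feature is named `"resharding"` (an assumption based on the commit subject). `enable_zonegroup_features` is a hypothetical helper, not code from qa/tasks/rgw_multisite.py.

```python
import json
from typing import List


def enable_zonegroup_features(zonegroup_json: str, features: List[str]) -> str:
    """Return zonegroup JSON with `features` listed under enabled_features.

    Hypothetical helper: existing entries are kept and duplicates skipped.
    """
    zonegroup = json.loads(zonegroup_json)
    enabled = list(zonegroup.get("enabled_features", []))
    for feature in features:
        if feature not in enabled:
            enabled.append(feature)
    zonegroup["enabled_features"] = enabled
    return json.dumps(zonegroup)
```

The resulting document is what a test would then hand to 'zonegroup set', so the feature is enabled from the moment the zonegroup is created.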

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago | qa: fix .teuthology_branch file in qa/ | 46487/head
Jeff Layton [Wed, 1 Jun 2022 17:57:29 +0000 (13:57 -0400)]
qa: fix .teuthology_branch file in qa/

According to teuthology-suite:

  -t <branch>, --teuthology-branch <branch>
                              The teuthology branch to run against.
                              Default value is determined in the next order.
                              There is TEUTH_BRANCH environment variable set.
                              There is `qa/.teuthology_branch` present in
                              the suite repo and contains non-empty string.
                              There is `teuthology_branch` present in one of
                              the user or system `teuthology.yaml` configuration
                              files respectively, otherwise use `main`.
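The lookup order quoted above can be sketched as a small function. `resolve_teuthology_branch` is a hypothetical name, the YAML configuration files are assumed to be already parsed into dicts (user config first, then system), and this is not teuthology's actual implementation.

```python
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_teuthology_branch(
    cli_branch: Optional[str],
    env: Mapping[str, str],
    suite_repo: str,
    configs: Sequence[Mapping[str, object]],
) -> str:
    # An explicit -t/--teuthology-branch always wins.
    if cli_branch:
        return cli_branch
    # 1. TEUTH_BRANCH environment variable.
    if env.get("TEUTH_BRANCH"):
        return env["TEUTH_BRANCH"]
    # 2. A non-empty qa/.teuthology_branch file in the suite repo.
    branch_file = Path(suite_repo) / "qa" / ".teuthology_branch"
    if branch_file.is_file():
        content = branch_file.read_text().strip()
        if content:
            return content
    # 3. teuthology_branch from the user config, then the system config.
    for config in configs:
        value = config.get("teuthology_branch")
        if value:
            return str(value)
    # 4. Fall back to "main".
    return "main"
```

Under this order, a `qa/.teuthology_branch` containing "master" overrides any `teuthology_branch` set in `teuthology.yaml`, which is why the stale file matters.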

The .teuthology_branch file in the qa/ dir currently points at "master".
Change it to point to "main".

Signed-off-by: Jeff Layton <jlayton@redhat.com>
3 years ago | cephadm: fix osd adoption with custom cluster name | 46481/head
Guillaume Abrioux [Wed, 1 Jun 2022 11:24:50 +0000 (13:24 +0200)]
cephadm: fix osd adoption with custom cluster name

Adopting Ceph OSD containers from a Ceph cluster with a custom name fails
because the name isn't propagated to unit.run.
The idea here is to change the LVM metadata to enforce 'ceph.cluster_name=ceph',
given that cephadm doesn't support custom cluster names anyway.

Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years ago | Merge pull request #46474 from idryomov/wip-rbd-codeowners
Ilya Dryomov [Wed, 1 Jun 2022 16:29:38 +0000 (18:29 +0200)]
Merge pull request #46474 from idryomov/wip-rbd-codeowners

CODEOWNERS: add RBD team

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago | CODEOWNERS: add RBD team | 46474/head
Ilya Dryomov [Wed, 1 Jun 2022 07:22:15 +0000 (09:22 +0200)]
CODEOWNERS: add RBD team

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago | crimson/os/seastore/segment_cleaner: retrieve different live extents in parallel
Xuehan Xu [Sat, 28 May 2022 08:38:30 +0000 (16:38 +0800)]
crimson/os/seastore/segment_cleaner: retrieve different live extents in parallel

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
3 years ago | mgr/cephadm: capture exception when not able to list upgrade tags | 46445/head
Redouane Kachach [Tue, 31 May 2022 10:59:26 +0000 (12:59 +0200)]
mgr/cephadm: capture exception when not able to list upgrade tags

Fixes: https://tracker.ceph.com/issues/55801
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
3 years ago | mgr/telemetry: add Rook data | 46483/head
Yaarit Hatuka [Wed, 1 Jun 2022 04:46:17 +0000 (04:46 +0000)]
mgr/telemetry: add Rook data

Add the first Rook data collection to telemetry's basic channel.

We choose to nag with this collection since we wish to know the volume
of Rook deployments in the wild.

The next Rook collections should have consecutive numbers (basic_rook_v02,
basic_rook_v03, ...).

See tracker below for more details.

Fixes: https://tracker.ceph.com/issues/55740
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
3 years ago | Merge pull request #46382 from rzarzynski/wip-crimson-op-tracking-3
Samuel Just [Tue, 31 May 2022 23:48:52 +0000 (16:48 -0700)]
Merge pull request #46382 from rzarzynski/wip-crimson-op-tracking-3

crimson/osd: add support for historic & slow op tracking

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years ago | Merge pull request #46437 from cyx1231st/wip-seastore-tune-and-fixes
Samuel Just [Tue, 31 May 2022 23:37:11 +0000 (16:37 -0700)]
Merge pull request #46437 from cyx1231st/wip-seastore-tune-and-fixes

crimson/os/seastore/segment_cleaner: tune and fixes around reclaiming

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years ago | Merge pull request #46193 from ljflores/wip-zero-detection-off-by-default
Laura Flores [Tue, 31 May 2022 21:55:51 +0000 (16:55 -0500)]
Merge pull request #46193 from ljflores/wip-zero-detection-off-by-default

os/bluestore: turn bluestore zero block detection off by default

3 years ago | rgw: restore check for empty olh name on reshard | 46464/head
Casey Bodley [Tue, 31 May 2022 21:29:37 +0000 (17:29 -0400)]
rgw: restore check for empty olh name on reshard

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago | test/rgw: fix test case for empty-OLH-name cleanup
Casey Bodley [Tue, 31 May 2022 21:29:18 +0000 (17:29 -0400)]
test/rgw: fix test case for empty-OLH-name cleanup

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago | Merge pull request #46367 from 0xavi0/dbstore-default-dbdir-rgw-data
Soumya Koduri [Tue, 31 May 2022 15:57:28 +0000 (21:27 +0530)]
Merge pull request #46367 from 0xavi0/dbstore-default-dbdir-rgw-data

rgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
3 years ago | Merge pull request #46395 from cbodley/wip-backport-create-issue-assigned-to
Casey Bodley [Tue, 31 May 2022 15:17:12 +0000 (11:17 -0400)]
Merge pull request #46395 from cbodley/wip-backport-create-issue-assigned-to

backport-create-issue: copy 'Assignee' of original issue to backports

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago | Merge pull request #46415 from neha-ojha/wip-cw-core | 46674/head
Neha Ojha [Tue, 31 May 2022 14:16:18 +0000 (07:16 -0700)]
Merge pull request #46415 from neha-ojha/wip-cw-core

.github/CODEOWNERS: tag core devs on core PRs

Reviewed-by: Laura Flores <lflores@redhat.com>
3 years ago | librbd: unlink newest mirror snapshot when at capacity, bump capacity | 46454/head
Ilya Dryomov [Sun, 29 May 2022 16:20:34 +0000 (18:20 +0200)]
librbd: unlink newest mirror snapshot when at capacity, bump capacity

CreatePrimaryRequest::unlink_peer() invoked via "rbd mirror image
snapshot" command or via rbd_support mgr module when creating a new
scheduled mirror snapshot at rbd_mirroring_max_mirroring_snapshots
capacity on the primary cluster can race with Replayer::unlink_peer()
invoked by rbd-mirror when finishing syncing an older snapshot on the
secondary cluster.  Consider the following:

   [ primary: primary-snap1, primary-snap2, primary-snap3
     secondary: non-primary-snap1 (complete), non-primary-snap2 (syncing) ]

0. rbd-mirror is syncing snap1..snap2 delta
1. rbd_support creates primary-snap4
2. due to rbd_mirroring_max_mirroring_snapshots == 3, rbd_support picks
   primary-snap3 for unlinking
3. rbd-mirror finishes syncing snap1..snap2 delta and marks
   non-primary-snap2 complete

   [ snap1 (the old base) is no longer needed on either cluster ]

4. rbd-mirror unlinks and removes primary-snap1
5. rbd-mirror removes non-primary-snap1
6. rbd-mirror picks snap2 as the new base
7. rbd-mirror creates non-primary-snap3 and starts syncing snap2..snap3
   delta

   [ primary: primary-snap2, primary-snap3, primary-snap4
     secondary: non-primary-snap2 (complete), non-primary-snap3 (syncing) ]

8. rbd_support unlinks and removes primary-snap3 which is in-use by
   rbd-mirror

If snap trimming on the primary cluster kicks in soon enough, the
secondary image becomes corrupted: rbd-mirror would eventually finish
"syncing" non-primary-snap3 and mark it complete in spite of bogus data
in the HEAD -- the primary cluster OSDs would start returning ENOENT
for snap trimmed objects.  Luckily, rbd-mirror's attempt to pick snap3
as the new base would wedge the replayer with "split-brain detected:
failed to find matching non-primary snapshot in remote image" error.

Before commit a888bff8d00e ("librbd/mirror: tweak which snapshot is
unlinked when at capacity") this could happen pretty much all the time
as it was the second oldest snapshot that was unlinked.  This commit
changed it to be the third oldest snapshot, turning this into a narrower,
but still very much possible to hit, race.

Unfortunately this race condition appears to be inherent to the way
snapshot-based mirroring is currently implemented:

a. when mirror snapshots are created on the producer side of the
   snapshot queue, they are already linked
b. mirror snapshots can be concurrently unlinked/removed on both
   sides of the snapshot queue by non-cooperating clients (local
   rbd_mirror_image_create_snapshot() vs remote rbd-mirror)
c. with mirror peer links off the list due to (a), there is no
   existing way for rbd-mirror to persistently mark a snapshot as
   in-use

As a workaround, bump rbd_mirroring_max_mirroring_snapshots to 5 and
always unlink the newest snapshot (i.e. slot 4) instead of the third
oldest snapshot (i.e. slot 2).  Hopefully this gives enough leeway,
as rbd-mirror would need to sync two snapshots (i.e. transition from
syncing 0-1 to 1-2 and then to 2-3) before potentially colliding with
rbd_mirror_image_create_snapshot() on slot 4.
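A simplified model of the selection rule, with linked mirror snapshots represented as names ordered oldest to newest. `snapshot_to_unlink` and `snapshot_to_unlink_old` are hypothetical helpers, not librbd's CreatePrimaryRequest code; they ignore peer links and the actual unlink/remove steps.

```python
from typing import List, Optional

# rbd_mirroring_max_mirroring_snapshots after this change (previously 3).
MAX_MIRRORING_SNAPSHOTS = 5


def snapshot_to_unlink(linked: List[str],
                       max_snapshots: int = MAX_MIRRORING_SNAPSHOTS) -> Optional[str]:
    """New rule: at capacity, unlink the newest snapshot.

    When exactly at capacity this is slot max_snapshots - 1, leaving the
    older slots that rbd-mirror may still be syncing from untouched.
    """
    if len(linked) < max_snapshots:
        return None
    return linked[-1]


def snapshot_to_unlink_old(linked: List[str], max_snapshots: int = 3) -> Optional[str]:
    """Previous rule: at capacity, unlink the third oldest snapshot (slot 2)."""
    if len(linked) < max_snapshots:
        return None
    return linked[2]
```

With the old capacity of 3, slot 2 is also the newest snapshot, which is why rbd-mirror could reach it after a single sync transition. With a capacity of 5, it has to get through two more transitions first.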

Fixes: https://tracker.ceph.com/issues/55803
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago | test/librbd: fix set_val() call in SuccessUnlink* test cases
Ilya Dryomov [Sun, 29 May 2022 17:55:04 +0000 (19:55 +0200)]
test/librbd: fix set_val() call in SuccessUnlink* test cases

rbd_mirroring_max_mirroring_snapshots isn't actually set to 3 there
due to the stray conf_ prefix.  It didn't matter until now because the
default was also 3.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago | debian: extract python3 packages to a single place | 46416/head
Kefu Chai [Sun, 29 May 2022 00:22:59 +0000 (08:22 +0800)]
debian: extract python3 packages to a single place

for better maintainability.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
3 years ago | debian: add .requires for specifying python3 deps
Kefu Chai [Sat, 28 May 2022 09:03:34 +0000 (17:03 +0800)]
debian: add .requires for specifying python3 deps

we use dh_python3 to define the ${python3:Depends} subvar as part
of the runtime dependencies of python3 packages, such as the
ceph-mgr modules named "ceph-mgr-*" and the python3 bindings named "python3-*".

but unlike python3 bindings of Ceph APIs, the ceph-mgr modules are
not packaged in a typical python way. in other words, they do not
ship a "dist-info" or an "egg-info" directory. instead, we just
install the python scripts into a directory which can be found by
ceph-mgr, by default it is /usr/share/ceph/mgr/dashboard/plugins.

this follows neither the python packaging conventions nor the
debian packaging policies for python packages. but it
still makes sense to put these files in this unconventional place, as
they are not meant to be python packages consumed by the
outside world -- they are just plugins, and should always work
with the same version of ceph-mgr.

the problem is that, although we have ${python3:Depends} in
the "Depends" field of packages like ceph-mgr-dashboard, dh_python3
is not able to figure out the dependencies by looking at the
installed files. for instance, we have the following "Depends" for
ceph-mgr-dashboard:

Depends: ceph-mgr (= 17.0.0-12481-g805d2320-1focal), python3-cherrypy3, python3-jwt, python3-bcrypt, python3-werkzeug, python3-routes

and in the debian/control file we have:

Depends: ceph-mgr (= ${binary:Version}),
         python3-cherrypy3,
         python3-jwt,
         python3-bcrypt,
         python3-werkzeug,
         python3-routes,
         ${misc:Depends},
         ${python:Depends},
         ${shlibs:Depends},

apparently, none of the subvars is expanded to
a non-empty string.

to improve the packaging, in this change:

* drop all subvars from ceph-mgr-*, as they
  are all implemented in pure python.
* add debian/ceph-mgr-*.requires; its content
  mirrors the corresponding requirements.txt
  files.
  * add python3-distutils for distutils, as debian
    and its derivatives package the non-essential parts of
    distutils into a separate package, see
    https://packages.debian.org/stable/python3-distutils
* add ${python3:Depends} so dh_python3
  can extract the deps from debian/ceph-mgr-*.pydist
* update the rule for "override_dh_python3" target,
  so dh_python3 can pick up the dependencies specified
  in .requires file.
* remove the python3 dependencies not used by
  ceph-mgr from ceph-mgr's "Depends"

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
3 years ago | mgr/cephadm: check if a service exists before trying to restart it | 46444/head
Redouane Kachach [Tue, 31 May 2022 10:11:03 +0000 (12:11 +0200)]
mgr/cephadm: check if a service exists before trying to restart it

Fixes: https://tracker.ceph.com/issues/55800
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
3 years ago | Merge pull request #46430 from zdover23/wip-doc-2022-05-30-hw-recs-memory-section
zdover23 [Tue, 31 May 2022 07:07:40 +0000 (17:07 +1000)]
Merge pull request #46430 from zdover23/wip-doc-2022-05-30-hw-recs-memory-section

doc/start: update "memory" in hardware-recs.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 years ago | rgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw | 46367/head
0xavi0 [Mon, 23 May 2022 12:01:25 +0000 (14:01 +0200)]
rgw/dbstore: change default value of dbstore_db_dir to /var/lib/ceph/radosgw

Changes a few NULL to nullptr.

Adds std::filesystem for path building so paths are platform independent.

Fixes a bug where DBStoreManager's second constructor did not create the DB.

Adds unit tests to test DB path and prefix.

Fixes: https://tracker.ceph.com/issues/55731
Signed-off-by: 0xavi0 <xavi.garcia@suse.com>
3 years ago | doc/start: update "memory" in hardware-recs.rst | 46430/head
Zac Dover [Mon, 30 May 2022 13:32:06 +0000 (23:32 +1000)]
doc/start: update "memory" in hardware-recs.rst

This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some parentheses that were opened but never closed.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
3 years ago | Merge pull request #46086 from nmshelke/feature-55401
Venky Shankar [Tue, 31 May 2022 05:00:56 +0000 (10:30 +0530)]
Merge pull request #46086 from nmshelke/feature-55401

mgr/volumes: set, get, list and remove metadata of snapshot

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago | crimson/os/seastore/segment_cleaner: add info logs to reveal trim activities | 46437/head
Yingxin Cheng [Mon, 30 May 2022 12:30:07 +0000 (20:30 +0800)]
crimson/os/seastore/segment_cleaner: add info logs to reveal trim activities

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore/transaction_manager: set to test mode under debug build
Yingxin Cheng [Mon, 30 May 2022 10:35:33 +0000 (18:35 +0800)]
crimson/os/seastore/transaction_manager: set to test mode under debug build

* force test mode under debug builds.
* make reclaim happen, and be validated, as early as possible.
* do not block user transactions when the reclaim-ratio (unalive/unavailable)
  is high, especially in the beginning.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore/segment_cleaner: cleanup reclaim logic
Yingxin Cheng [Fri, 27 May 2022 09:05:33 +0000 (17:05 +0800)]
crimson/os/seastore/segment_cleaner: cleanup reclaim logic

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore/seastore_types: include backref as physical extents
Yingxin Cheng [Fri, 27 May 2022 08:43:26 +0000 (16:43 +0800)]
crimson/os/seastore/seastore_types: include backref as physical extents

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore/cache: assert dirty
Yingxin Cheng [Fri, 27 May 2022 08:42:19 +0000 (16:42 +0800)]
crimson/os/seastore/cache: assert dirty

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore: cleanup rewrite_extent()
Yingxin Cheng [Fri, 27 May 2022 08:32:06 +0000 (16:32 +0800)]
crimson/os/seastore: cleanup rewrite_extent()

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | crimson/os/seastore/segment_cleaner: delay reclaim until near full
Yingxin Cheng [Mon, 30 May 2022 05:27:30 +0000 (13:27 +0800)]
crimson/os/seastore/segment_cleaner: delay reclaim until near full

It should generally be better to delay reclaim as long as possible, so
that:
* the unalive/unavailable ratio can grow higher, reducing reclaim effort;
* there are fewer conflicts between mutate and reclaim transactions.
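To make the first point concrete, assume that reclaiming space means rewriting its still-alive extents elsewhere. Then, for a reclaim ratio r = unalive/unavailable, each byte freed costs roughly (1 - r) / r bytes of rewriting, so waiting until r is higher cuts the work. The sketch below is a simplified model under that assumption. The function names and the 10% near-full threshold are hypothetical and not taken from segment_cleaner.

```python
def reclaim_ratio(unalive_bytes: int, unavailable_bytes: int) -> float:
    """unalive / unavailable: the share of non-free space that is dead."""
    if unavailable_bytes <= 0:
        return 0.0
    return unalive_bytes / unavailable_bytes


def rewrite_cost_per_freed_byte(ratio: float) -> float:
    """Alive bytes rewritten per dead byte freed, under the assumption that
    reclaim copies the alive part and frees the unalive part."""
    if ratio <= 0.0:
        return float("inf")
    return (1.0 - ratio) / ratio


def should_reclaim(available_bytes: int, total_bytes: int,
                   near_full_available_ratio: float = 0.1) -> bool:
    """Delay reclaim until free space drops below a hypothetical threshold."""
    return available_bytes < near_full_available_ratio * total_bytes
```

For instance, at r = 0.5 each freed byte costs one rewritten byte, while at r = 0.75 it costs about a third of a byte.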

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago | Merge pull request #46296 from ceph/wip-nitzan-osd-log-to-correct-sufix
Samuel Just [Mon, 30 May 2022 23:37:41 +0000 (16:37 -0700)]
Merge pull request #46296 from ceph/wip-nitzan-osd-log-to-correct-sufix

crimson/osd: logger into log_file

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years ago | Merge pull request #46388 from rzarzynski/wip-crimson-reindent-main-trace
Samuel Just [Mon, 30 May 2022 23:33:25 +0000 (16:33 -0700)]
Merge pull request #46388 from rzarzynski/wip-crimson-reindent-main-trace

crimson/osd: reindent the trace-related fragment of main()

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years ago | rbd-mirror: don't prune non-primary snapshot when restarting delta sync | 46434/head
Ilya Dryomov [Sat, 28 May 2022 18:06:22 +0000 (20:06 +0200)]
rbd-mirror: don't prune non-primary snapshot when restarting delta sync

When restarting interrupted sync (signified by the "end" non-primary
snapshot with last_copied_object_number > 0), preserve the "start"
non-primary snapshot until the sync is completed, like it would have
been done had the sync not been interrupted.  This ensures that the
same m_local_snap_id_start is passed to scan_remote_mirror_snapshots()
and ultimately ImageCopyRequest state machine on restart as on initial
start.

This ends up being yet another fixup for 281af0de86b1 ("rbd-mirror:
prune unnecessary non-primary mirror snapshots"), following earlier
7ba9214ea5b7 ("rbd-mirror: don't prune older mirror snapshots when
pruning incomplete snapshot") and ecd3778a6f9a ("rbd-mirror: ensure
that the last non-primary snapshot cannot be pruned").

Fixes: https://tracker.ceph.com/issues/55796
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago | cls/rbd: fix operator<< for MirrorSnapshotNamespace
Ilya Dryomov [Sat, 28 May 2022 08:04:11 +0000 (10:04 +0200)]
cls/rbd: fix operator<< for MirrorSnapshotNamespace

Commit 50702eece0b1 ("cls/rbd: added clean_since_snap_id to
MirrorSnapshotNamespace") updated dump() but missed operator<<
overload.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago | Merge pull request #46320 from ronen-fr/wip-rf-snaps-onerr
Ronen Friedman [Mon, 30 May 2022 17:36:41 +0000 (20:36 +0300)]
Merge pull request #46320 from ronen-fr/wip-rf-snaps-onerr

osd/scrub: restart snap trimming after a failed scrub

Reviewed-by: Laura Flores <lflores@redhat.com>
3 years ago crimson/osd: add support for slowest historic op tracking 46382/head
Radosław Zarzyński [Fri, 27 May 2022 17:31:18 +0000 (19:31 +0200)]
crimson/osd: add support for slowest historic op tracking

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
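One common way to retain only the N slowest completed ops is an ordered container that evicts its fastest entry once it exceeds capacity. The sketch below assumes that approach; `CompletedOp` and `SlowestOpTracker` are hypothetical names, and crimson's actual implementation may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a completed op.
struct CompletedOp {
  std::string desc;
  std::chrono::milliseconds duration;
};

struct ByDuration {
  bool operator()(const CompletedOp& a, const CompletedOp& b) const {
    return a.duration < b.duration;
  }
};

// Keeps only the `capacity` slowest ops recorded so far.
class SlowestOpTracker {
 public:
  explicit SlowestOpTracker(std::size_t capacity) : capacity_(capacity) {}

  void record(CompletedOp op) {
    ops_.insert(std::move(op));
    if (ops_.size() > capacity_) {
      ops_.erase(ops_.begin());  // drop the fastest retained op
    }
  }

  // Retained ops, slowest first.
  std::vector<CompletedOp> dump() const {
    return std::vector<CompletedOp>(ops_.rbegin(), ops_.rend());
  }

 private:
  std::size_t capacity_;
  std::multiset<CompletedOp, ByDuration> ops_;
};
```

A multiset keeps insertion and eviction at O(log N); a bounded min-heap would work equally well.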
3 years ago crimson/osd: make OSDOperationRegistry responsible for historic ops
Radosław Zarzyński [Fri, 27 May 2022 17:24:45 +0000 (19:24 +0200)]
crimson/osd: make OSDOperationRegistry responsible for historic ops

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
3 years ago crimson/osd: add support for historic op tracking.
Radosław Zarzyński [Thu, 14 Apr 2022 11:17:17 +0000 (13:17 +0200)]
crimson/osd: add support for historic op tracking.

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
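A hypothetical sketch of a registry that owns both in-flight ops and the history of completed ones, retaining only the most recent completions. `OpRegistrySketch` and its methods are illustrative names, not crimson's OSDOperationRegistry API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: tracks in-flight ops and keeps the most recent
// `history_size` completed ones.
class OpRegistrySketch {
 public:
  explicit OpRegistrySketch(std::size_t history_size)
      : history_size_(history_size) {}

  uint64_t start(std::string desc) {
    const uint64_t id = next_id_++;
    in_flight_.emplace(id, std::move(desc));
    return id;
  }

  // Move a finished op into the history, evicting the oldest entries.
  void complete(uint64_t id) {
    auto it = in_flight_.find(id);
    if (it == in_flight_.end()) return;
    historic_.push_back(std::move(it->second));
    in_flight_.erase(it);
    while (historic_.size() > history_size_) historic_.pop_front();
  }

  std::size_t num_in_flight() const { return in_flight_.size(); }
  const std::deque<std::string>& historic() const { return historic_; }

 private:
  std::size_t history_size_;
  uint64_t next_id_ = 0;
  std::map<uint64_t, std::string> in_flight_;
  std::deque<std::string> historic_;
};
```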
3 years ago osd/scrub: restart snap trimming after a failed scrub 46320/head
Ronen Friedman [Tue, 17 May 2022 16:13:59 +0000 (16:13 +0000)]
osd/scrub: restart snap trimming after a failed scrub

A follow-up to PR #45640.
In PR #45640, snap trimming was restarted (if blocked) after all
successful scrubs and after most scrub failures. Still, a few
failure scenarios did not handle the snaptrim restart correctly.

This PR cleans up and fixes the interaction between scrub
initiation/termination (for whatever cause) and snap trimming.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
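One generic way to guarantee that snap trimming is restarted on every scrub exit path is an RAII guard whose destructor runs the restart. This is an illustrative sketch of that principle, not how the OSD scrubber is structured; `SnaptrimRestartGuard` and `run_scrub()` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: however a scrub ends (success, failure or an
// exception), the snap-trim restart callback runs exactly once.
class SnaptrimRestartGuard {
 public:
  explicit SnaptrimRestartGuard(std::function<void()> restart)
      : restart_(std::move(restart)) {}
  ~SnaptrimRestartGuard() {
    if (restart_) restart_();  // the callback must not throw
  }
  SnaptrimRestartGuard(const SnaptrimRestartGuard&) = delete;
  SnaptrimRestartGuard& operator=(const SnaptrimRestartGuard&) = delete;

 private:
  std::function<void()> restart_;
};

// Hypothetical scrub entry point: returns whether the scrub succeeded and
// throws to model an aborted scrub.
bool run_scrub(bool fail, bool aborted, int& snaptrim_restarts) {
  SnaptrimRestartGuard guard([&snaptrim_restarts] { ++snaptrim_restarts; });
  if (aborted) throw std::runtime_error("scrub aborted");
  return !fail;
}
```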
3 years ago Merge pull request #46426 from idryomov/wip-iscsi-mutual-chap-doc
Ilya Dryomov [Mon, 30 May 2022 12:54:06 +0000 (14:54 +0200)]
Merge pull request #46426 from idryomov/wip-iscsi-mutual-chap-doc

doc/rbd: add mutual CHAP authentication example

Reviewed-by: Xiubo Li <xiubli@redhat.com>
3 years ago doc/rbd: add mutual CHAP authentication example 46426/head
Ilya Dryomov [Mon, 30 May 2022 09:54:35 +0000 (11:54 +0200)]
doc/rbd: add mutual CHAP authentication example

Based on https://github.com/ceph/ceph-iscsi/pull/260.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago debian: s/${python:Depends}/${python3:Depends}/
Kefu Chai [Sat, 28 May 2022 10:37:46 +0000 (18:37 +0800)]
debian: s/${python:Depends}/${python3:Depends}/

${python:Depends} is added by dh_python2, but we've migrated to
python3 and Ceph is no longer compatible with python2. Let's
replace all references to python2 with python3.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
3 years ago Merge pull request #35598 from tchaikov/wip-cephfs-java
Kefu Chai [Sat, 28 May 2022 05:29:25 +0000 (13:29 +0800)]
Merge pull request #35598 from tchaikov/wip-cephfs-java

rpm,install-dep.sh: build cephfs java binding

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago .github/CODEOWNERS: tag core devs on core PRs 46415/head
Neha Ojha [Fri, 27 May 2022 19:34:57 +0000 (19:34 +0000)]
.github/CODEOWNERS: tag core devs on core PRs

Start with everything that is present under core in .github/labeler.yml.

Signed-off-by: Neha Ojha <nojha@redhat.com>
3 years ago qa/rgw: fix flake8 errors in test_rgw_reshard.py
Casey Bodley [Wed, 25 May 2022 18:13:55 +0000 (14:13 -0400)]
qa/rgw: fix flake8 errors in test_rgw_reshard.py

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago rgw/motr: fix build for MotrStore
Casey Bodley [Tue, 24 May 2022 15:29:52 +0000 (11:29 -0400)]
rgw/motr: fix build for MotrStore

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 years ago rgw: `RGWSyncBucketCR` reads remote info on non-`Incremental` state
Adam C. Emerson [Tue, 17 May 2022 03:26:48 +0000 (23:26 -0400)]
rgw: `RGWSyncBucketCR` reads remote info on non-`Incremental` state

This ensures that the remote bucket index log info is available in
all cases where we call `InitBucketFullSyncStatusCR`.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
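A minimal sketch of the rule in this commit: read the remote info for every state other than Incremental, since those paths may initialize full sync. `SyncState` is a simplified enum, and `RemoteIndexLogInfo` and the reader callback are hypothetical stand-ins rather than RGW types.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

enum class SyncState { Init, Full, Incremental, Stopped };  // simplified

// Hypothetical stand-in for the remote bucket index log info.
struct RemoteIndexLogInfo {
  uint64_t generation = 0;
  uint32_t num_shards = 0;
};

// Fetch remote info for any non-Incremental state, so that full-sync
// initialization always has it available.
std::optional<RemoteIndexLogInfo> maybe_read_remote_info(
    SyncState state, const std::function<RemoteIndexLogInfo()>& read_remote) {
  if (state == SyncState::Incremental) {
    return std::nullopt;
  }
  return read_remote();
}
```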
3 years ago test/rgw: bucket sync run recovery case
Adam C. Emerson [Fri, 13 May 2022 19:56:28 +0000 (15:56 -0400)]
test/rgw: bucket sync run recovery case

1. Write several generations' worth of objects. Ensure that everything
   has synced and that at least some generations have been trimmed.
2. Turn off the secondary `radosgw`.
3. Use `radosgw-admin object rm` to delete all objects in the bucket
   on the secondary.
4. Invoke `radosgw-admin bucket sync init` on the secondary.
5. Invoke `radosgw-admin bucket sync run` on the secondary.
6. Verify that all objects on the primary are also present on the
   secondary.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago test/rgw: Add incremental test of bucket sync run
Adam C. Emerson [Wed, 11 May 2022 22:39:08 +0000 (18:39 -0400)]
test/rgw: Add incremental test of bucket sync run

This tests that we iterate properly over the generations.

1. Create a bucket and write some objects to it. Wait for sync to
   complete. This ensures we are in Incremental.
2. Turn off the secondary `radosgw`.
3. Manually reshard. Then continue writing objects and resharding.
4. Choose objects so that each generation has objects in many but not
   all shards.
5. After building up several generations, run `bucket sync run` on the
   secondary.
6. Verify that all objects on the primary are on the secondary.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago rgw: add `bucket object shard` command to radosgw-admin
Adam C. Emerson [Tue, 17 May 2022 03:23:40 +0000 (23:23 -0400)]
rgw: add `bucket object shard` command to radosgw-admin

Given an object, return the bucket shard appropriate to it.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
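Conceptually, mapping an object to its bucket index shard is a hash of the object name reduced modulo the shard count. The sketch below assumes 32-bit FNV-1a purely for illustration; RGW uses its own hash and reduction, which are not reproduced here, so the shard numbers it produces will not match radosgw-admin's.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative hash only (32-bit FNV-1a); NOT the hash RGW uses.
uint32_t fnv1a32(const std::string& s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Map an object name to one of num_shards index shards.
// Assumption: 0 or 1 shards means everything lives in shard 0.
uint32_t object_shard(const std::string& object, uint32_t num_shards) {
  if (num_shards <= 1) return 0;
  return fnv1a32(object) % num_shards;
}
```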
3 years ago rgw: Add 'bucket shard objects' command to radosgw-admin
Adam C. Emerson [Wed, 11 May 2022 21:08:01 +0000 (17:08 -0400)]
rgw: Add 'bucket shard objects' command to radosgw-admin

For use in testing: write to a subset of shards for reshard testing.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
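The inverse problem, finding object names that land on chosen shards, can be solved by generating candidate names and keeping the first hit per shard. This sketch again assumes the illustrative FNV-1a hash rather than RGW's; `names_for_shards()` is a hypothetical helper, not the radosgw-admin implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Illustrative hash only (32-bit FNV-1a); NOT the hash RGW uses.
uint32_t fnv1a32(const std::string& s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// For each wanted shard, find one name of the form prefix + counter that
// hashes to it, trying at most max_tries candidates.
std::map<uint32_t, std::string> names_for_shards(
    const std::set<uint32_t>& wanted, uint32_t num_shards,
    const std::string& prefix, int max_tries = 100000) {
  std::map<uint32_t, std::string> found;
  for (int i = 0; i < max_tries && found.size() < wanted.size(); ++i) {
    std::string name = prefix + std::to_string(i);
    uint32_t shard = fnv1a32(name) % num_shards;
    if (wanted.count(shard) && !found.count(shard)) {
      found.emplace(shard, name);
    }
  }
  return found;
}
```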
3 years ago rgw: bucket sync run walks over generations
Adam C. Emerson [Sat, 14 May 2022 05:11:57 +0000 (01:11 -0400)]
rgw: bucket sync run walks over generations

This should make the troubleshooting use case of bucket sync init/run
usable with multisite reshard.

This also fixes a few issues with the original bucket sync run code
by spawning multiple shards at a time and retrying retryable errors.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
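The control flow described above (walk the generations in order, sync several shards at a time, retry retryable errors) can be sketched with std::async. This is not radosgw's coroutine code: generations are assumed to be indexed from 0, `ShardSyncFn` is a hypothetical per-shard step, and the set of retryable errors is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical per-shard sync step: returns 0 on success or a negative errno.
using ShardSyncFn = std::function<int(uint64_t gen, uint32_t shard)>;

// Assumption for illustration: which errors are worth retrying.
bool is_retryable(int r) { return r == -EAGAIN || r == -EBUSY; }

int sync_shard_with_retry(const ShardSyncFn& fn, uint64_t gen, uint32_t shard,
                          int max_tries) {
  int r = 0;
  for (int attempt = 0; attempt < max_tries; ++attempt) {
    r = fn(gen, shard);
    if (r == 0 || !is_retryable(r)) break;
  }
  return r;
}

// shards_per_gen[i] is the shard count of generation i. Generations are
// processed in order; within one, up to `concurrency` shards run at once.
// Returns the first non-recoverable error, or 0.
int bucket_sync_run(const std::vector<uint32_t>& shards_per_gen,
                    const ShardSyncFn& fn, uint32_t concurrency,
                    int max_tries) {
  if (concurrency == 0) concurrency = 1;
  for (uint64_t gen = 0; gen < shards_per_gen.size(); ++gen) {
    const uint32_t num_shards = shards_per_gen[gen];
    for (uint32_t first = 0; first < num_shards; first += concurrency) {
      std::vector<std::future<int>> batch;
      for (uint32_t s = first; s < num_shards && s - first < concurrency; ++s) {
        batch.push_back(std::async(std::launch::async, sync_shard_with_retry,
                                   std::cref(fn), gen, s, max_tries));
      }
      for (auto& f : batch) {
        const int r = f.get();
        if (r < 0) return r;  // remaining futures are joined on destruction
      }
    }
  }
  return 0;
}
```

Finishing every shard of a generation before starting the next one mirrors the ordering that incremental sync relies on across reshards.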
3 years ago rgw: Remove unused RGWRemoteBucketManager
Adam C. Emerson [Thu, 14 Apr 2022 13:51:00 +0000 (09:51 -0400)]
rgw: Remove unused RGWRemoteBucketManager

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago rgw: Disentangle RGWBucketPipeSyncStatusManager::run
Adam C. Emerson [Thu, 14 Apr 2022 13:42:54 +0000 (09:42 -0400)]
rgw: Disentangle RGWBucketPipeSyncStatusManager::run

Again, from RGWRemoteBucketManager.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago rgw: Disentangle read_sync_status from RemoteBucketManager
Adam C. Emerson [Thu, 14 Apr 2022 13:35:40 +0000 (09:35 -0400)]
rgw: Disentangle read_sync_status from RemoteBucketManager

Also fix the problem where we read the status from all peers into the
same map at once.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago rgw: Disentangle init_sync_status from RemoteBucketManager
Adam C. Emerson [Wed, 13 Apr 2022 02:10:31 +0000 (22:10 -0400)]
rgw: Disentangle init_sync_status from RemoteBucketManager

RGWRemoteBucketManager's current design isn't really compatible with
what we need for bucket sync run to work as the number of shards
changes from run to run.

Instead, we can make a smaller class that holds only the information
common to all three operations, and simplify things a bit.

We also need to fetch `rgw_bucket_index_marker_info` and supply it to
`InitBucketFullSyncStatusCR` to ensure we have the correct generation
and shard count.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
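A minimal sketch of the idea: size the per-shard full-sync status from freshly fetched remote marker info on each run, instead of from state captured once at construction, since the shard count can change between runs. `RemoteMarkerInfo` and `ShardFullSyncStatus` are hypothetical stand-ins that model only the generation and shard count the commit mentions, not the fields of `rgw_bucket_index_marker_info`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the remote marker info.
struct RemoteMarkerInfo {
  uint64_t generation;
  uint32_t num_shards;
};

// Hypothetical per-shard full-sync status.
struct ShardFullSyncStatus {
  uint64_t generation;
  std::string marker;  // empty: nothing synced yet
};

// One status entry per remote shard, tagged with the remote generation.
std::vector<ShardFullSyncStatus> init_full_sync_status(
    const RemoteMarkerInfo& info) {
  return std::vector<ShardFullSyncStatus>(
      info.num_shards, ShardFullSyncStatus{info.generation, ""});
}
```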
3 years ago rgw: Get rid of RGWBucketPipeSyncStatusManager::init
Adam C. Emerson [Wed, 13 Apr 2022 01:37:05 +0000 (21:37 -0400)]
rgw: Get rid of RGWBucketPipeSyncStatusManager::init

Use the Named Constructor Idiom instead.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
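The Named Constructor Idiom replaces a public constructor plus a separate init() with a private constructor and a static factory that performs the fallible setup, so callers never hold a half-initialized object. A generic sketch: `SyncStatusManagerSketch` is hypothetical, and reporting failure via nullptr is an assumption (the RGW code may report errors differently).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class SyncStatusManagerSketch {
 public:
  // Named constructor: validates and builds, or returns nullptr on failure.
  static std::unique_ptr<SyncStatusManagerSketch> construct(
      const std::string& bucket, int num_sources) {
    if (bucket.empty() || num_sources <= 0) return nullptr;
    // std::make_unique cannot reach the private constructor, hence `new`.
    return std::unique_ptr<SyncStatusManagerSketch>(
        new SyncStatusManagerSketch(bucket, num_sources));
  }

  const std::string& bucket() const { return bucket_; }
  int num_sources() const { return num_sources_; }

 private:
  SyncStatusManagerSketch(std::string bucket, int num_sources)
      : bucket_(std::move(bucket)), num_sources_(num_sources) {}

  std::string bucket_;
  int num_sources_;
};
```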
3 years ago rgw: RGWBucketPipeSyncStatusManager doesn't need a conn
Adam C. Emerson [Tue, 12 Apr 2022 23:44:48 +0000 (19:44 -0400)]
rgw: RGWBucketPipeSyncStatusManager doesn't need a conn

`conn` is per-source. last_zone just saved a lookup in a small map.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
3 years ago rgw: Get rid of RGWBucketPipeSyncStatusManager::get_sync_status
Adam C. Emerson [Tue, 12 Apr 2022 20:46:46 +0000 (16:46 -0400)]
rgw: Get rid of RGWBucketPipeSyncStatusManager::get_sync_status

Instead of one function that sets a variable and another that returns
it, with nothing else touching the variable, just return the sync
status directly.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
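The before/after shape described above, sketched with hypothetical types: `SyncStatus`, `ManagerBefore` and the hard-coded markers are illustrative stand-ins, and the real RGW function's signature and error handling are not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using SyncStatus = std::map<uint32_t, std::string>;  // hypothetical: shard -> marker

// Before: one call fills a member, a second call hands it back.
class ManagerBefore {
 public:
  int read_sync_status() {
    status_ = {{0, "m0"}, {1, "m1"}};  // stand-in for reading remote state
    return 0;
  }
  const SyncStatus& get_sync_status() const { return status_; }

 private:
  SyncStatus status_;
};

// After: the read returns the status; there is no member to keep in sync.
SyncStatus read_sync_status() {
  return {{0, "m0"}, {1, "m1"}};  // stand-in for reading remote state
}
```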
3 years ago rgw: Clean up RGWBucketPipeSyncStatusManager construction
Adam C. Emerson [Tue, 12 Apr 2022 20:26:27 +0000 (16:26 -0400)]
rgw: Clean up RGWBucketPipeSyncStatusManager construction

The coupling between this class and RGWRemoteBucketManager makes no
sense. Clean things up a bit.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>