git.apps.os.sepia.ceph.com Git - ceph.git/log
2 years ago doc/rados: edit devices.rst 51473/head
Zac Dover [Mon, 15 May 2023 01:01:19 +0000 (11:01 +1000)]
doc/rados: edit devices.rst

Line-edit doc/rados/operations/devices.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #51463 from zdover23/wip-doc-2023-05-13-fs-volumes-1-of-x
zdover23 [Sat, 13 May 2023 02:41:36 +0000 (12:41 +1000)]
Merge pull request #51463 from zdover23/wip-doc-2023-05-13-fs-volumes-1-of-x

doc/cephfs: edit fs-volumes.rst (1 of x)

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago doc/cephfs: edit fs-volumes.rst (1 of x) 51463/head
Zac Dover [Fri, 12 May 2023 15:49:14 +0000 (01:49 +1000)]
doc/cephfs: edit fs-volumes.rst (1 of x)

Edit the English prose in doc/cephfs/fs-volumes.rst, up to (but not
including) the section called "FS Subvolumes".

Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #51458 from zdover23/wip-doc-2023-05-12-cephfs-fs-volumes-prompt...
zdover23 [Fri, 12 May 2023 12:42:23 +0000 (22:42 +1000)]
Merge pull request #51458 from zdover23/wip-doc-2023-05-12-cephfs-fs-volumes-prompt-rectification

doc/cephfs: rectify prompts in fs-volumes.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago doc/cephfs: rectify prompts in fs-volumes.rst 51458/head
Zac Dover [Fri, 12 May 2023 10:35:25 +0000 (20:35 +1000)]
doc/cephfs: rectify prompts in fs-volumes.rst

Make sure all prompts are unselectable. This PR is meant to be
backported to Reef, Quincy, and Pacific, to get all of the prompts into
a fit state so that a line-edit can be performed on the English language
in this file.

Follows https://github.com/ceph/ceph/pull/51427.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #51448 from Matan-B/wip-matanb-crimson-only-mclock-boot
Samuel Just [Thu, 11 May 2023 18:35:20 +0000 (11:35 -0700)]
Merge pull request #51448 from Matan-B/wip-matanb-crimson-only-mclock-boot

crimson/osd/scheduler/mclock_scheduler: Fix OSD unable to start

Reviewed-by: Samuel Just <sjust@redhat.com>
2 years ago Merge pull request #51388 from Matan-B/wip-matanb-c-enable-rbd-tests
Matan [Thu, 11 May 2023 14:28:55 +0000 (16:28 +0200)]
Merge pull request #51388 from Matan-B/wip-matanb-c-enable-rbd-tests

qa/suites/crimson: Enhance rbd api testing

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
2 years ago crimson/osd/scheduler/mclock_scheduler: Fix OSD unable to start 51448/head
Matan Breizman [Thu, 11 May 2023 14:18:46 +0000 (14:18 +0000)]
crimson/osd/scheduler/mclock_scheduler: Fix OSD unable to start

https://github.com/ceph/ceph/pull/49975 introduced changes to the
mClock config value types that caused the osd to stall while booting.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 years ago Merge PR #51251 into main
Venky Shankar [Thu, 11 May 2023 05:51:14 +0000 (11:21 +0530)]
Merge PR #51251 into main

* refs/pull/51251/head:
PendingReleaseNotes: add a note about deleting files from lost+found directory
qa: add checks that validate removal of entries from lost+found dir
mds: allow unlink operation under lost+found directory

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2 years ago Merge PR #51201 into main
Venky Shankar [Thu, 11 May 2023 05:49:13 +0000 (11:19 +0530)]
Merge PR #51201 into main

* refs/pull/51201/head:
qa: run scrub post file system recovery

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
2 years ago Merge PR #51188 into main
Venky Shankar [Thu, 11 May 2023 03:55:50 +0000 (09:25 +0530)]
Merge PR #51188 into main

* refs/pull/51188/head:
client: use deep-copy when setting permission during make_request

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2 years ago Merge pull request #51423 from bigjust/replace-go-example-mods
Ken Dreyer [Wed, 10 May 2023 17:18:22 +0000 (13:18 -0400)]
Merge pull request #51423 from bigjust/replace-go-example-mods

examples: replace example go modules with instructions to run

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
2 years ago Merge pull request #51427 from zdover23/wip-doc-2023-05-10-cephfs-fs-volumes-prompt-fix
zdover23 [Wed, 10 May 2023 15:30:44 +0000 (01:30 +1000)]
Merge pull request #51427 from zdover23/wip-doc-2023-05-10-cephfs-fs-volumes-prompt-fix

doc/cephfs: fix prompts in fs-volumes.rst

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago doc/cephfs: fix prompts in fs-volumes.rst 51427/head
Zac Dover [Wed, 10 May 2023 14:52:50 +0000 (00:52 +1000)]
doc/cephfs: fix prompts in fs-volumes.rst

Fixed a regression introduced in
e5355e3d66e1438d51de6b57eae79fab47cd0184 that broke the unselectable
prompts in the RST.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #51345 from cbodley/wip-59639
Casey Bodley [Wed, 10 May 2023 12:56:37 +0000 (08:56 -0400)]
Merge pull request #51345 from cbodley/wip-59639

rgw/dbstore: allow NULL RealmIDs in sqlite schema

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2 years ago examples: replace example go modules with instructions to run 51423/head
Justin Caratzas [Tue, 18 Apr 2023 16:35:37 +0000 (12:35 -0400)]
examples: replace example go modules with instructions to run

Signed-off-by: Justin Caratzas <jcaratza@ibm.com>
2 years ago Merge pull request #50627 from AliMasarweh/wip-ali-masa-multipart-populate-etag
Ali Masarwa [Wed, 10 May 2023 12:10:12 +0000 (15:10 +0300)]
Merge pull request #50627 from AliMasarweh/wip-ali-masa-multipart-populate-etag

RGW: Solving the issue of not populating etag in Multipart upload result
Reviewed-by: Daniel Gryniewicz <dang1@ibm.com>
2 years ago Merge pull request #49742 from ajarr/fix-56724
Ilya Dryomov [Wed, 10 May 2023 09:55:42 +0000 (11:55 +0200)]
Merge pull request #49742 from ajarr/fix-56724

mgr/rbd_support: recover from rados client blocklisting

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 years ago Merge pull request #51166 from chrisphoffman/wip-rbd-59393
Ilya Dryomov [Wed, 10 May 2023 09:53:16 +0000 (11:53 +0200)]
Merge pull request #51166 from chrisphoffman/wip-rbd-59393

librbd: localize snap_remove op for mirror snapshots

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 years ago Merge PR #43184 into main
Venky Shankar [Wed, 10 May 2023 08:34:58 +0000 (14:04 +0530)]
Merge PR #43184 into main

* refs/pull/43184/head:
qa: fix journal flush failure issue due to the MDS daemon crashes
qa: add test support for the alloc ino failing
mds: do not take the ino which has been used

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2 years ago qa: run scrub post file system recovery 51201/head
Venky Shankar [Mon, 24 Apr 2023 04:54:55 +0000 (00:54 -0400)]
qa: run scrub post file system recovery

Running a file system scrub is recommended after file system data
and metadata recovery, but the tests did not run a scrub.

Fixes: http://tracker.ceph.com/issues/59527
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2 years ago Merge pull request #51167 from liu-chunmei/teuthology-multicore
Liu-Chunmei [Tue, 9 May 2023 23:04:47 +0000 (16:04 -0700)]
Merge pull request #51167 from liu-chunmei/teuthology-multicore

crimson/qa: make crimson run multicore in teuthology test

Reviewed-by: Samuel Just <sjust@redhat.com>
2 years ago Merge pull request #51301 from ceph/wip-yuriw-release-16.2.13-main
Laura Flores [Tue, 9 May 2023 22:04:38 +0000 (17:04 -0500)]
Merge pull request #51301 from ceph/wip-yuriw-release-16.2.13-main

doc: 16.2.13 Release Notes

2 years ago doc: 16.2.13 Release Notes 51301/head
Yuri Weinstein [Mon, 1 May 2023 20:09:47 +0000 (13:09 -0700)]
doc: 16.2.13 Release Notes

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
Signed-off-by: Laura Flores <lflores@redhat.com>
2 years ago Merge pull request #50411 from xxhdx1985126/wip-58928
Samuel Just [Tue, 9 May 2023 18:55:41 +0000 (11:55 -0700)]
Merge pull request #50411 from xxhdx1985126/wip-58928

crimson/osd: start operations asynchronously

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
2 years ago Merge pull request #51403 from zdover23/wip-doc-2023-05-09-start-get-involved-planet...
zdover23 [Tue, 9 May 2023 14:50:01 +0000 (00:50 +1000)]
Merge pull request #51403 from zdover23/wip-doc-2023-05-09-start-get-involved-planet-ceph

doc/start: fix "Planet Ceph" link

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago crimson/qa: enable multicore for crimson in teuthology test 51167/head
chunmei [Thu, 20 Apr 2023 22:09:34 +0000 (22:09 +0000)]
crimson/qa: enable multicore for crimson in teuthology test

Signed-off-by: chunmei <chunmei.liu@intel.com>
2 years ago Merge pull request #47749 from xxhdx1985126/wip-intra-fixedkvbtree-pointers-2
Yingxin [Tue, 9 May 2023 08:37:41 +0000 (16:37 +0800)]
Merge pull request #47749 from xxhdx1985126/wip-intra-fixedkvbtree-pointers-2

crimson/os/seastore/btree: link fixedkvbtree's nodes and logical extents with forward and backward pointers, and drop the pin_set

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2 years ago Merge pull request #51308 from jzhu116-bloomberg/wip-59592
Yuval Lifshitz [Tue, 9 May 2023 07:34:36 +0000 (10:34 +0300)]
Merge pull request #51308 from jzhu116-bloomberg/wip-59592

rgw/notification: remove non x-amz-meta-* attributes from bucket notifications

2 years ago crimson/tools/store_nbd: read logical extents via 47749/head
Xuehan Xu [Mon, 8 May 2023 08:15:55 +0000 (08:15 +0000)]
crimson/tools/store_nbd: read logical extents via
TransactionManager::read_pin()

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/cache: add comment about backref_extent_entry_t
Xuehan Xu [Thu, 23 Mar 2023 09:59:12 +0000 (09:59 +0000)]
crimson/os/seastore/cache: add comment about backref_extent_entry_t

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago test/crimson/seastore: complement lba test with logical extents
Xuehan Xu [Sat, 11 Mar 2023 03:46:14 +0000 (03:46 +0000)]
test/crimson/seastore: complement lba test with logical extents

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago test/crimson/seastore: check intra-fixedkv-btree parent->child trackers during unittests
Xuehan Xu [Mon, 29 Aug 2022 08:12:00 +0000 (16:12 +0800)]
test/crimson/seastore: check intra-fixedkv-btree parent->child trackers during unittests

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: drop btree_pin_set_t
Xuehan Xu [Mon, 27 Mar 2023 02:20:59 +0000 (02:20 +0000)]
crimson/os/seastore/btree: drop btree_pin_set_t

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/transaction_manager: follow leaf<->logical extent pointers to...
Xuehan Xu [Sat, 6 May 2023 09:26:18 +0000 (17:26 +0800)]
crimson/os/seastore/transaction_manager: follow leaf<->logical extent pointers to read extent

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/lba_manager: link lba leaf nodes with logical extents by pointers
Xuehan Xu [Tue, 25 Oct 2022 06:03:43 +0000 (14:03 +0800)]
crimson/os/seastore/lba_manager: link lba leaf nodes with logical extents by pointers

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: "templatize" btree leaf node to distinguish leaf nodes...
Xuehan Xu [Thu, 27 Oct 2022 07:21:32 +0000 (15:21 +0800)]
crimson/os/seastore/btree: "templatize" btree leaf node to distinguish leaf nodes with(out) children

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: link fixed-kv-btree and root_block with pointers
Xuehan Xu [Thu, 20 Oct 2022 09:41:25 +0000 (17:41 +0800)]
crimson/os/seastore/btree: link fixed-kv-btree and root_block with pointers

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore: more debug logs
Xuehan Xu [Thu, 20 Oct 2022 05:35:08 +0000 (13:35 +0800)]
crimson/os/seastore: more debug logs

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/backref_manager: retrieve live backref extents through the backr...
Xuehan Xu [Wed, 17 Aug 2022 10:07:42 +0000 (18:07 +0800)]
crimson/os/seastore/backref_manager: retrieve live backref extents through the backref tree

After introducing intra-fixed-kv-btree parent-child pointers, we need to
maintain the invariant that the cache can be queried directly, without
inspecting the transaction, only when the extents are not in the
transaction's read_set.

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: avoid searching transactions' read_set when retrieving...
Xuehan Xu [Thu, 13 Oct 2022 06:27:34 +0000 (14:27 +0800)]
crimson/os/seastore/btree: avoid searching transactions' read_set when retrieving btree nodes

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: search fixed-kv-btree by parent<->child pointers
Xuehan Xu [Thu, 13 Oct 2022 03:50:17 +0000 (11:50 +0800)]
crimson/os/seastore/btree: search fixed-kv-btree by parent<->child pointers

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/cache: invalidate out-dated extent when initiating Cache
Xuehan Xu [Thu, 13 Oct 2022 02:57:09 +0000 (10:57 +0800)]
crimson/os/seastore/cache: invalidate out-dated extent when initiating Cache

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/cached_extent: improve the representation of "has_been_invalidated"
Xuehan Xu [Wed, 12 Oct 2022 06:37:39 +0000 (14:37 +0800)]
crimson/os/seastore/cached_extent: improve the representation of "has_been_invalidated"

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: don't go to leaf nodes when updating internal mappings
Xuehan Xu [Tue, 31 Jan 2023 06:36:42 +0000 (14:36 +0800)]
crimson/os/seastore/btree: don't go to leaf nodes when updating internal mappings

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago crimson/os/seastore/btree: introduce parent<->child pointers for fixed-kv-btree nodes
Xuehan Xu [Tue, 11 Oct 2022 02:34:16 +0000 (10:34 +0800)]
crimson/os/seastore/btree: introduce parent<->child pointers for fixed-kv-btree nodes

Maintain correct parent<->child pointers when modifying the btree.
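Keeping parent<->child pointers correct means every structural change must update both directions of the link. The sketch below shows that bookkeeping for a child insert and a node split on a generic tree. `Node`, `add_child()`, `split()` and `pointers_consistent()` are hypothetical names; this is not seastore's fixed-kv-btree node code.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal tree node; NOT seastore's fixed-kv-btree node type.
struct Node {
  Node* parent = nullptr;                       // child -> parent link
  std::vector<std::unique_ptr<Node>> children;  // parent -> child links

  // Attach a child and set both directions of the link.
  Node* add_child(std::unique_ptr<Node> child) {
    child->parent = this;
    children.push_back(std::move(child));
    return children.back().get();
  }

  // Move the upper half of the children into a new sibling and repoint each
  // moved child's parent pointer at that sibling. The caller attaches the
  // sibling to this node's parent (e.g. via add_child), which sets its link.
  std::unique_ptr<Node> split() {
    auto right = std::make_unique<Node>();
    const std::size_t mid = children.size() / 2;
    for (std::size_t i = mid; i < children.size(); ++i) {
      children[i]->parent = right.get();
      right->children.push_back(std::move(children[i]));
    }
    children.erase(children.begin() + static_cast<std::ptrdiff_t>(mid),
                   children.end());
    return right;
  }
};

// Invariant check: every child points back at the node that owns it.
bool pointers_consistent(const Node& node) {
  for (const auto& child : node.children) {
    if (child->parent != &node || !pointers_consistent(*child)) {
      return false;
    }
  }
  return true;
}
```

A check like `pointers_consistent()` is the kind of invariant a unit test can assert after each modification.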

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2 years ago doc/start: fix "Planet Ceph" link 51403/head
Zac Dover [Tue, 9 May 2023 03:39:10 +0000 (13:39 +1000)]
doc/start: fix "Planet Ceph" link

Fix a link to Planet Ceph on the doc/start/get-involved.rst page.

Reported 2023 Apr 21, here:
https://pad.ceph.com/p/Report_Documentation_Bugs

Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #51355 from aravind-wdc/wip-crimson-zbd
Yingxin [Tue, 9 May 2023 03:29:54 +0000 (11:29 +0800)]
Merge pull request #51355 from aravind-wdc/wip-crimson-zbd

crimson/os/seastore: enable SMR HDD

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2 years ago Merge pull request #51392 from parth-gr/rgw-mutisite-ceph-doc
zdover23 [Tue, 9 May 2023 02:37:40 +0000 (12:37 +1000)]
Merge pull request #51392 from parth-gr/rgw-mutisite-ceph-doc

doc: update multisite doc

Reviewed-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Zac Dover <zac.dover@proton.me>
2 years ago Merge pull request #50857 from kamoltat/wip-ksirivad-iswriteable
Kamoltat Sirivadhna [Tue, 9 May 2023 01:04:59 +0000 (21:04 -0400)]
Merge pull request #50857 from kamoltat/wip-ksirivad-iswriteable

mon/Monitor.cc: exit function if !osdmon()->is_writeable()
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
2 years ago Merge pull request #51394 from rzarzynski/wip-doc-encode-stdoptional
zdover23 [Tue, 9 May 2023 00:53:06 +0000 (10:53 +1000)]
Merge pull request #51394 from rzarzynski/wip-doc-encode-stdoptional

doc/dev/encoding.txt: update per std::optional

Reviewed-by: Zac Dover <zac.dover@proton.me>
2 years ago qa/workunits/rbd: Add tests for rbd_support module recovery 49742/head
Ramana Raja [Sun, 5 Feb 2023 03:36:16 +0000 (22:36 -0500)]
qa/workunits/rbd: Add tests for rbd_support module recovery

... after the module's RADOS client is blocklisted.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 years ago mgr/rbd_support: recover from rados client blocklisting
Ramana Raja [Wed, 15 Feb 2023 15:12:54 +0000 (10:12 -0500)]
mgr/rbd_support: recover from rados client blocklisting

In certain scenarios, the OSDs were slow to process RBD requests.
This led to the rbd_support module's RBD client not being able to
gracefully hand over an RBD exclusive lock to another RBD client.
After the condition persisted for some time, the other RBD client
forcefully acquired the lock by blocklisting the rbd_support module's
RBD client, and consequently blocklisted the module's RADOS client. The
rbd_support module stopped working. To recover the module, the entire
mgr service had to be restarted, which also reloaded the other mgr modules.

Instead of recovering the rbd_support module from client blocklisting
in a way that disrupts the other mgr modules, recover the module
automatically without restarting the mgr service. When the client is
blocklisted, shut down the module's handlers and the blocklisted client,
create a new RADOS client for the module, and start new handlers.
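The recovery sequence described above (shut down the handlers and the blocklisted client, create a new client, start new handlers) can be sketched as follows. The real rbd_support module is written in Python; this is a language-neutral C++ sketch in which `Client`, `Handler`, `Module`, `tick()` and the handler count are hypothetical, and blocklisting is simulated with a flag rather than detected from a RADOS error.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the module's RADOS client.
struct Client {
  int generation;            // incarnation number, for illustration only
  bool blocklisted = false;  // simulated; a real client learns this from errors
  explicit Client(int gen) : generation(gen) {}
  void shutdown() { /* release the connection */ }
};

// Hypothetical stand-in for a module handler that uses the client.
struct Handler {
  Client* client;
  bool running = true;
  void shutdown() { running = false; }
};

class Module {
 public:
  static constexpr std::size_t kNumHandlers = 3;  // arbitrary for the sketch

  Module() { setup(); }

  // Called periodically; recovers in place instead of restarting the mgr.
  void tick() {
    if (client_->blocklisted) recover();
  }

  Client& client() { return *client_; }
  std::size_t active_handlers() const { return handlers_.size(); }
  int recoveries() const { return recoveries_; }

 private:
  // Create a fresh client and start handlers bound to it.
  void setup() {
    handlers_.clear();
    client_ = std::make_unique<Client>(++generation_);
    for (std::size_t i = 0; i < kNumHandlers; ++i) {
      handlers_.push_back(Handler{client_.get()});
    }
  }

  void recover() {
    for (auto& h : handlers_) h.shutdown();  // 1. stop the old handlers
    client_->shutdown();                     // 2. shut down the blocklisted client
    setup();                                 // 3-4. new client, new handlers
    ++recoveries_;
  }

  int generation_ = 0;
  int recoveries_ = 0;
  std::unique_ptr<Client> client_;
  std::vector<Handler> handlers_;
};
```

The design point is that recovery stays local to the module: only its own client and handlers are replaced, so the other mgr modules keep running.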

Fixes: https://tracker.ceph.com/issues/56724
Signed-off-by: Ramana Raja <rraja@redhat.com>
2 years ago Merge pull request #51365 from nbalacha/fix-remove-unused-type
Ilya Dryomov [Mon, 8 May 2023 19:24:28 +0000 (21:24 +0200)]
Merge pull request #51365 from nbalacha/fix-remove-unused-type

librbd: remove unused enum WriteOpType

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 years ago Merge pull request #49975 from sseshasa/wip-fix-mclk-rec-backfill-cost
Radoslaw Zarzynski [Mon, 8 May 2023 18:22:11 +0000 (20:22 +0200)]
Merge pull request #49975 from sseshasa/wip-fix-mclk-rec-backfill-cost

osd: mClock recovery/backfill cost fixes

Reviewed-by: Sam Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
2 years ago pybind/rados: add ConnectionShutdown exception class
Ramana Raja [Thu, 12 Jan 2023 02:53:16 +0000 (21:53 -0500)]
pybind/rados: add ConnectionShutdown exception class

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 years ago mgr/rbd_support: notify the thread waiting on pending snapshot
Ramana Raja [Tue, 17 Jan 2023 03:04:08 +0000 (22:04 -0500)]
mgr/rbd_support: notify the thread waiting on pending snapshot

... requests to be completed.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 years ago Merge pull request #51381 from Matan-B/wip-matanb-c-blocklist-fix
Matan [Mon, 8 May 2023 16:48:28 +0000 (19:48 +0300)]
Merge pull request #51381 from Matan-B/wip-matanb-c-blocklist-fix

crimson/osd/osd_operations/client_request: Fix client blocklisting

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2 years ago Merge pull request #43245 from thiagoarrais/docs-java-examples
Daniel Gryniewicz [Mon, 8 May 2023 15:47:15 +0000 (11:47 -0400)]
Merge pull request #43245 from thiagoarrais/docs-java-examples

[rgw]: Update AWS SDK in Java examples

2 years ago doc/dev/encoding.txt: update per std::optional 51394/head
Radoslaw Zarzynski [Mon, 8 May 2023 14:41:22 +0000 (14:41 +0000)]
doc/dev/encoding.txt: update per std::optional

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 years ago librbd: localize snap_remove op for mirror snapshots 51166/head
Christopher Hoffman [Wed, 19 Apr 2023 15:26:27 +0000 (15:26 +0000)]
librbd: localize snap_remove op for mirror snapshots

A client may not request the exclusive lock quickly enough to obtain
it for an operation when a competing client responds faster. This can
happen when a peer site has different performance characteristics or
latency. Instead of relying on this unpredictable behavior, localize
the operation to the primary cluster.

Fixes: https://tracker.ceph.com/issues/59393
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
2 years ago doc: update multisite doc 51392/head
parth-gr [Mon, 8 May 2023 13:53:29 +0000 (19:23 +0530)]
doc: update multisite doc

The command for getting the zone group was spelled incorrectly.
Updated it to radosgw-admin.

Signed-off-by: parth-gr <paarora@redhat.com>
2 years ago librbd: remove unused enum type WriteOpType 51365/head
N Balachandran [Mon, 8 May 2023 13:24:35 +0000 (18:54 +0530)]
librbd: remove unused enum type WriteOpType

This removes the unused enum WriteOpType from
the librbd deep_copy code.

Signed-off-by: N Balachandran <nibalach@redhat.com>
2 years ago Merge pull request #51387 from zdover23/wip-doc-2023-05-08-rados-operations-stretch...
zdover23 [Mon, 8 May 2023 12:48:30 +0000 (22:48 +1000)]
Merge pull request #51387 from zdover23/wip-doc-2023-05-08-rados-operations-stretch-mode-other-commands

doc/rados: stretch-mode.rst (other commands)

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago doc/rados: stretch-mode.rst (other commands) 51387/head
Zac Dover [Mon, 8 May 2023 11:08:49 +0000 (21:08 +1000)]
doc/rados: stretch-mode.rst (other commands)

Edit the "Other Commands" section of
doc/rados/operations/stretch-mode.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years ago qa/suites/crimson: Introduce rbd_python_api_tests.yaml 51388/head
Matan Breizman [Mon, 8 May 2023 10:53:00 +0000 (10:53 +0000)]
qa/suites/crimson: Introduce rbd_python_api_tests.yaml

Test python api with new image format.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 years ago qa/suites/crimson: Skip unsupported tests (Crimson)
Matan Breizman [Mon, 8 May 2023 10:50:19 +0000 (10:50 +0000)]
qa/suites/crimson: Skip unsupported tests (Crimson)

Align with `rbd_api_tests` and skip deep_copy and breaklock tests
in Crimson.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 years ago qa/: Override mClock profile to 'high_recovery_ops' for qa tests 49975/head
Sridhar Seshasayee [Sat, 29 Apr 2023 05:16:58 +0000 (10:46 +0530)]
qa/: Override mClock profile to 'high_recovery_ops' for qa tests

The qa tests are not client I/O centric; they mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite
amount of time. The same holds true for scrub operations.

Therefore, an mClock profile that optimizes background operations is a
better fit for qa-related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the RADOS
suite, as it fits the requirement.

Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.

A subset of standalone tests explicitly used the 'high_recovery_ops'
profile. Since the profile is now set as part of run_osd(), those
earlier overrides are redundant and have been removed from the tests.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago doc/: Modify mClock configuration documentation to reflect profile changes
Sridhar Seshasayee [Tue, 11 Apr 2023 17:57:05 +0000 (23:27 +0530)]
doc/: Modify mClock configuration documentation to reflect profile changes

Modify the relevant documentation to reflect:

- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs
Sridhar Seshasayee [Tue, 11 Apr 2023 16:47:53 +0000 (22:17 +0530)]
common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs

The osd_mclock_max_sequential_bandwidth_ssd is changed to 1200 MiB/s as
a reasonable middle ground, considering the broad range of SSD
capabilities. This allows mClock's cost model to extract the SSD's
capability depending on the cost of the IO being performed.
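As a rough illustration of a byte-based cost model, the sketch below assumes (our reading, not a transcription of the OSD code) that an I/O costs its size in bytes plus a fixed per-I/O overhead equal to the device's maximum sequential bandwidth divided by its IOPS capacity. Only the 1200 MiB/s figure comes from this commit; the 20000 IOPS value used in the usage note and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024 * 1024;

// Bytes of sequential bandwidth that one random I/O is assumed to "use up".
constexpr std::uint64_t bandwidth_cost_per_io(std::uint64_t max_seq_bw_bytes_per_sec,
                                              std::uint64_t max_iops) {
  return max_seq_bw_bytes_per_sec / max_iops;
}

// Assumed scaled cost of one I/O, in bytes: payload (at least 1) plus overhead.
constexpr std::uint64_t io_cost_bytes(std::uint64_t io_size_bytes,
                                      std::uint64_t max_seq_bw_bytes_per_sec,
                                      std::uint64_t max_iops) {
  return std::max<std::uint64_t>(1, io_size_bytes) +
         bandwidth_cost_per_io(max_seq_bw_bytes_per_sec, max_iops);
}
```

With 1200 MiB/s and a hypothetical 20000 IOPS, the per-I/O overhead is 62914 bytes, so a 4 KiB I/O costs 67010 bytes (about 16 times its payload), while a 4 MiB I/O is dominated by its size.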

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/: Retain the default osd_max_backfills limit to 1 for mClock
Sridhar Seshasayee [Tue, 11 Apr 2023 16:30:11 +0000 (22:00 +0530)]
osd/: Retain the default osd_max_backfills limit to 1 for mClock

The earlier limit of 3 was still aggressive enough to have an impact on
the client and other competing operations. Retain the current default
for mClock. This can be modified if necessary after setting the
osd_mclock_override_recovery_settings option.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago common/options/osd.yaml.in: change mclock profile default to balanced
Samuel Just [Tue, 11 Apr 2023 15:15:38 +0000 (08:15 -0700)]
common/options/osd.yaml.in: change mclock profile default to balanced

Let's use the middle profile as the default.
Modify the standalone tests accordingly.

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/scheduler/mClockScheduler: avoid limits for recovery
Samuel Just [Tue, 11 Apr 2023 15:10:04 +0000 (08:10 -0700)]
osd/scheduler/mClockScheduler: avoid limits for recovery

Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add counters for ops delayed due to degraded|unreadable target
Samuel Just [Mon, 10 Apr 2023 21:18:49 +0000 (14:18 -0700)]
osd/: add counters for ops delayed due to degraded|unreadable target

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add counters for queue latency for PGRecovery[Context]
Samuel Just [Thu, 6 Apr 2023 21:15:02 +0000 (14:15 -0700)]
osd/: add counters for queue latency for PGRecovery[Context]

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add per-op latency averages for each recovery related message
Samuel Just [Thu, 6 Apr 2023 20:50:48 +0000 (20:50 +0000)]
osd/: add per-op latency averages for each recovery related message

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: differentiate priority for PGRecovery[Context]
Samuel Just [Thu, 6 Apr 2023 07:04:05 +0000 (00:04 -0700)]
osd/: differentiate priority for PGRecovery[Context]

PGs with degraded objects should be higher priority.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages
Samuel Just [Thu, 6 Apr 2023 05:57:48 +0000 (22:57 -0700)]
osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages

Otherwise, these end up as PGOpItem and therefore as immediate:

class PGOpItem : public PGOpQueueable {
...
  op_scheduler_class get_scheduler_class() const final {
    auto type = op->get_req()->get_type();
    if (type == CEPH_MSG_OSD_OP ||
        type == CEPH_MSG_OSD_BACKOFF) {
      return op_scheduler_class::client;
    } else {
      return op_scheduler_class::immediate;
    }
  }
...
};

This was probably causing a bunch of extra interference with client
ops.
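The fix amounts to recognising these message types as recovery messages before the generic fallback to `immediate`. The sketch below uses a hypothetical `MsgType` enum (the real `MSG_OSD_*` codes are integer constants defined elsewhere) and a simplified classification that sends every recovery message to one class; it is not the OSD's actual dispatch code.

```cpp
#include <cstdint>

// Hypothetical message-type tags standing in for CEPH_MSG_* / MSG_OSD_* codes.
enum class MsgType : std::uint16_t {
  osd_op, osd_backoff,
  pg_push, pg_pull, pg_push_reply,
  pg_scan, pg_backfill, pg_backfill_remove,
  pg_other,  // any other PG message
};

enum class SchedClass { client, background_recovery, immediate };

// Recovery messages, now including SCAN, BACKFILL and BACKFILL_REMOVE.
constexpr bool is_recovery_msg(MsgType t) {
  switch (t) {
    case MsgType::pg_push:
    case MsgType::pg_pull:
    case MsgType::pg_push_reply:
    case MsgType::pg_scan:
    case MsgType::pg_backfill:
    case MsgType::pg_backfill_remove:
      return true;
    default:
      return false;
  }
}

// Without the is_recovery_msg() check, backfill traffic would fall through
// to `immediate`, the case the commit identifies as interfering with clients.
constexpr SchedClass classify(MsgType t) {
  if (t == MsgType::osd_op || t == MsgType::osd_backoff) return SchedClass::client;
  if (is_recovery_msg(t)) return SchedClass::background_recovery;
  return SchedClass::immediate;
}
```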

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: differentiate scheduler class for undersized/degraded vs data movement
Samuel Just [Thu, 6 Apr 2023 05:57:42 +0000 (22:57 -0700)]
osd/: differentiate scheduler class for undersized/degraded vs data movement

Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.
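A minimal sketch of that split, assuming (as the "avoid limits for recovery" commit in this log describes recovery being divided between `background_recovery` and `background_best_effort`) that undersized or degraded work maps to the former and plain data movement to the latter. `PGRecoveryState` and its fields are hypothetical.

```cpp
// Hypothetical scheduler classes for recovery work.
enum class SchedClass { background_recovery, background_best_effort };

// Hypothetical summary of the PG state relevant to scheduling.
struct PGRecoveryState {
  unsigned available_copies;   // copies currently present
  unsigned configured_copies;  // pool size
  bool has_degraded_objects;
};

// Fewer copies than configured (or degraded objects) means more urgency.
constexpr bool is_undersized_or_degraded(const PGRecoveryState& s) {
  return s.available_copies < s.configured_copies || s.has_degraded_objects;
}

// Urgent recovery vs. data that merely needs to move to a new location.
constexpr SchedClass recovery_class(const PGRecoveryState& s) {
  return is_undersized_or_degraded(s) ? SchedClass::background_recovery
                                      : SchedClass::background_best_effort;
}
```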

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg
Samuel Just [Thu, 6 Apr 2023 04:30:18 +0000 (04:30 +0000)]
osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg
Samuel Just [Thu, 6 Apr 2023 04:23:23 +0000 (04:23 +0000)]
osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: move get_recovery_op_priority into PeeringState next to get_*_priority
Samuel Just [Thu, 6 Apr 2023 03:45:19 +0000 (03:45 +0000)]
osd/: move get_recovery_op_priority into PeeringState next to get_*_priority

Consolidate methods governing recovery scheduling in PeeringState.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: simplify qos specific params in OpSchedulerItem
Samuel Just [Tue, 4 Apr 2023 23:34:17 +0000 (23:34 +0000)]
osd/scheduler: simplify qos specific params in OpSchedulerItem

is_qos_item() was only used in operator<< for OpSchedulerItem.  However,
it's actually useful to see priority for mclock items since it affects
whether it goes into the immediate queues and, for some types, the
class.  Unconditionally display both class_id and priority.
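Printing both fields unconditionally can be sketched with a trimmed-down, hypothetical item type. The `SchedItem` struct, its field set and the output format are illustrative only, not the actual `OpSchedulerItem` printer.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down scheduler item with only the printed fields.
struct SchedItem {
  std::string desc;
  int class_id;
  unsigned priority;
  unsigned cost;
};

// Print class_id and priority unconditionally: for mClock items the
// priority affects whether the item goes into the immediate queues.
std::ostream& operator<<(std::ostream& out, const SchedItem& item) {
  return out << "SchedItem(" << item.desc << " class_id " << item.class_id
             << " prio " << item.priority << " cost " << item.cost << ")";
}

// Convenience helper for logging or tests.
std::string describe(const SchedItem& item) {
  std::ostringstream ss;
  ss << item;
  return ss.str();
}
```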

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op
Samuel Just [Tue, 4 Apr 2023 23:22:59 +0000 (23:22 +0000)]
osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery
Samuel Just [Tue, 4 Apr 2023 23:13:41 +0000 (23:13 +0000)]
osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery

Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery
Samuel Just [Tue, 4 Apr 2023 23:05:56 +0000 (23:05 +0000)]
osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery

Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago PeeringState::clamp_recovery_priority: use std::clamp
Samuel Just [Mon, 3 Apr 2023 20:31:46 +0000 (13:31 -0700)]
PeeringState::clamp_recovery_priority: use std::clamp
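`std::clamp` (C++17, `<algorithm>`) returns `lo` if the value is below `lo`, `hi` if it is above `hi`, and the value itself otherwise, replacing a hand-written pair of comparisons. A sketch of the pattern; the two-argument signature and the lower bound are hypothetical, not PeeringState's actual declaration.

```cpp
#include <algorithm>

// Hypothetical lower bound; the real limits are defined by the OSD.
constexpr int kMinRecoveryPriority = 0;

// Hypothetical helper: clamp a computed priority into
// [kMinRecoveryPriority, max_priority] with a single std::clamp call.
constexpr int clamp_recovery_priority(int priority, int max_priority) {
  return std::clamp(priority, kMinRecoveryPriority, max_priority);
}
```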

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago doc: Modify mClock configuration documentation to reflect new cost model
Sridhar Seshasayee [Sat, 25 Mar 2023 07:14:40 +0000 (12:44 +0530)]
doc: Modify mClock configuration documentation to reflect new cost model

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd: Retain overridden mClock recovery settings across osd restarts
Sridhar Seshasayee [Tue, 21 Feb 2023 12:24:36 +0000 (17:54 +0530)]
osd: Retain overridden mClock recovery settings across osd restarts

Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after the osd restarts.

For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence incorrectly reset the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.

Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd: Set default max active recovery and backfill limits for mClock
Sridhar Seshasayee [Mon, 20 Mar 2023 12:29:17 +0000 (17:59 +0530)]
osd: Set default max active recovery and backfill limits for mClock

Client ops are sensitive to the recovery load, so recovery limits must be
set carefully for osds whose underlying device is an HDD. Tests revealed that
recoveries with osd_max_backfills = 10 and osd_recovery_max_active_hdd = 5
were still aggressive and overwhelmed client ops. The built-in defaults
for mClock are now set to:

    1) osd_recovery_max_active_hdd = 3
    2) osd_recovery_max_active_ssd = 10
    3) osd_max_backfills = 3

The above may be modified if necessary by setting the
osd_mclock_override_recovery_settings option.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
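The defaults listed in this commit, and the way the override option gates them, can be sketched in Python as follows. The three default values come from the commit message; recovery_limits() and its user_values argument are hypothetical, standing in for whatever the operator has configured.

```python
# Built-in mClock recovery/backfill defaults quoted in the commit above.
MCLOCK_RECOVERY_DEFAULTS = {
    "osd_recovery_max_active_hdd": 3,
    "osd_recovery_max_active_ssd": 10,
    "osd_max_backfills": 3,
}


def recovery_limits(is_rotational: bool, user_values: dict, override: bool) -> dict:
    """Return the effective recovery limits for one OSD (sketch only).

    user_values is a hypothetical stand-in for operator-set options; they
    take effect only when osd_mclock_override_recovery_settings is set.
    """
    effective = dict(MCLOCK_RECOVERY_DEFAULTS)
    if override:
        effective.update(user_values)
    key = ("osd_recovery_max_active_hdd" if is_rotational
           else "osd_recovery_max_active_ssd")
    return {"max_active": effective[key],
            "max_backfills": effective["osd_max_backfills"]}
```

Keeping the override behind one explicit flag means the conservative HDD defaults stay in force unless an operator deliberately opts out of them.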
2 years ago osd/scheduler/mClockScheduler: make is_rotational const
Samuel Just [Wed, 29 Mar 2023 06:29:58 +0000 (23:29 -0700)]
osd/scheduler/mClockScheduler: make is_rotational const

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler/mClockScheduler: simplify profile handling
Samuel Just [Wed, 29 Mar 2023 07:10:57 +0000 (00:10 -0700)]
osd/scheduler/mClockScheduler: simplify profile handling

Previously, setting default configs from the configured profile was
split across:
- enable_mclock_profile_settings
- set_mclock_profile - sets mclock_profile class member
- set_*_allocations - updates client_allocs class member
- set_profile_config - sets profile based on client_allocs class member

This made tracing the effect of changing the profile pretty challenging
because state was passed through class member variables.

Instead, define a simple profile_t with three constexpr values
corresponding to the three profiles and handle it all in a single
set_config_defaults_from_profile() method.

Signed-off-by: Samuel Just <sjust@redhat.com>
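The single-method design described in this commit can be illustrated with a Python analogue: immutable profile constants take the place of the constexpr profile_t values, and one function derives every default from the chosen profile. The allocation numbers below are PLACEHOLDERS chosen for illustration, not the values Ceph ships; the option names follow Ceph's osd_mclock_scheduler_* naming, but treat the exact mapping as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientAlloc:
    res: float  # reservation, as a proportion of the OSD's IOPS capacity
    wgt: int    # relative weight
    lim: float  # limit, as a proportion of the OSD's IOPS capacity


@dataclass(frozen=True)
class ProfileT:
    client: ClientAlloc
    background_recovery: ClientAlloc
    background_best_effort: ClientAlloc


# PLACEHOLDER allocations for illustration only (NOT Ceph's real values).
PROFILES = {
    "high_client_ops": ProfileT(ClientAlloc(0.6, 2, 1.0),
                                ClientAlloc(0.4, 1, 1.0),
                                ClientAlloc(0.0, 1, 0.7)),
    "high_recovery_ops": ProfileT(ClientAlloc(0.3, 1, 1.0),
                                  ClientAlloc(0.7, 2, 1.0),
                                  ClientAlloc(0.0, 1, 1.0)),
    "balanced": ProfileT(ClientAlloc(0.5, 1, 1.0),
                         ClientAlloc(0.5, 1, 1.0),
                         ClientAlloc(0.0, 1, 0.9)),
}

CLASSES = ("client", "background_recovery", "background_best_effort")


def set_config_defaults_from_profile(name: str) -> dict:
    """Derive every scheduler default from one profile, in one place."""
    profile = PROFILES[name]
    defaults = {}
    for cls in CLASSES:
        alloc = getattr(profile, cls)
        prefix = f"osd_mclock_scheduler_{cls}"
        defaults[f"{prefix}_res"] = alloc.res
        defaults[f"{prefix}_wgt"] = alloc.wgt
        defaults[f"{prefix}_lim"] = alloc.lim
    return defaults
```

Because the function reads only its argument and immutable constants, the effect of switching profiles can be traced without following state through member variables.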
2 years ago osd: Modify mClock scheduler's cost model to represent cost in bytes
Sridhar Seshasayee [Thu, 9 Feb 2023 15:17:44 +0000 (20:47 +0530)]
osd: Modify mClock scheduler's cost model to represent cost in bytes

The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.

The cost parameters osd_mclock_cost_per_io_usec_[hdd|ssd] and
osd_mclock_cost_per_byte_usec_[hdd|ssd], which represented the cost
of an IO in units of time, were inaccurate and have therefore been removed.

The new model considers the following aspects of an osd to calculate
the cost of an IO:

 - osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
   The measured random write IOPS at 4 KiB block size. This is
   measured during OSD boot-up using the OSD bench tool.
 - osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
   The maximum sequential bandwidth of the underlying device.
   The cost calculation uses 150 MiB/s for HDDs and 750 MiB/s
   for SSDs.

The following important changes are made to arrive at the overall
cost of an IO:

1. Represent QoS reservation and limit config parameters as proportions:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.

2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth to
the max random write IOPS of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to represent the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost().

3. Cost calculation in Bytes:
The reservation and limit settings, expressed as a fraction of the OSD's
maximum IOPS capacity, are converted to Bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard shown below:

    (res|lim)  = (IOPS proportion) * osd_bandwidth_capacity_per_shard
    (Bytes/sec)   (unitless)             (bytes/sec)

The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().

The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.

4. Profile Allocations:
Optimize the mClock profile allocations to account for the change in the
cost model and the lower recovery cost.

5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.

Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
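The arithmetic of the new byte-based cost model can be sketched in Python as below. The per-IO base cost and the proportion-to-Bytes/sec conversion follow the formulas in the commit message. Two pieces are assumptions not stated there: that osd_bandwidth_capacity_per_shard is the device bandwidth split evenly across op shards, and the service_time_sec() helper, which is only a simplified view of how dmclock's tag calculation turns a byte cost into time. All function names are hypothetical.

```python
MiB = 1024 * 1024


def bandwidth_cost_per_io(max_seq_bw_bytes: float, max_iops: float) -> float:
    """Base cost of one IO in bytes: sequential bandwidth / random-write IOPS."""
    return max_seq_bw_bytes / max_iops


def scaled_cost(item_size_bytes: int, cost_per_io: float) -> float:
    """Overall cost of an IO in bytes: fixed per-IO cost plus its real size."""
    return cost_per_io + item_size_bytes


def bandwidth_capacity_per_shard(max_seq_bw_bytes: float, num_shards: int) -> float:
    # ASSUMPTION: the OSD's bandwidth is divided evenly across op shards.
    return max_seq_bw_bytes / num_shards


def to_bytes_per_sec(proportion: float, capacity_per_shard: float) -> float:
    """(res|lim) in Bytes/sec = IOPS proportion (unitless) * capacity per shard."""
    return proportion * capacity_per_shard


def service_time_sec(cost_bytes: float, rate_bytes_per_sec: float) -> float:
    # ASSUMPTION: simplified; the real tag math lives in
    # crimson::dmclock::RequestTag::tag_calc().
    return cost_bytes / rate_bytes_per_sec
```

For example, with the HDD bandwidth of 150 MiB/s and a hypothetical 315 measured IOPS, bandwidth_cost_per_io(150 * MiB, 315) is roughly 488 KiB, so a 4 KiB write costs about 488 KiB + 4 KiB. A small IO is therefore still charged mostly for the seek-dominated base cost, which is the behaviour the ratio is meant to capture.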
2 years ago osd: update PGRecovery queue item cost to reflect object size
Sridhar Seshasayee [Thu, 2 Feb 2023 10:00:26 +0000 (15:30 +0530)]
osd: update PGRecovery queue item cost to reflect object size

Previously, we used a static value of osd_recovery_cost (20M
by default) for PGRecovery. For pools with relatively small
objects, this causes mclock to backfill very slowly, as
20M massively overestimates the amount of IO each recovery
queue operation requires. Instead, add a cost_per_object
parameter to OSDService::awaiting_throttle and set it to the
average object size in the PG being queued.

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
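A Python sketch of the idea in this commit: derive a per-object cost from the PG's average object size instead of charging a flat osd_recovery_cost. It assumes "20M" means 20 MiB, and the fallback for an empty PG is a hypothetical choice, not necessarily what OSDService::awaiting_throttle does.

```python
# ASSUMPTION: the "20M" osd_recovery_cost default is taken as 20 MiB.
OSD_RECOVERY_COST = 20 * 1024 * 1024


def average_object_size(pg_num_bytes: int, pg_num_objects: int, fallback: int) -> int:
    """Cost per object for a queued PGRecovery item (sketch).

    ASSUMPTION: an empty PG falls back to a caller-supplied value.
    """
    if pg_num_objects <= 0:
        return fallback
    return pg_num_bytes // pg_num_objects
```

For a pool of 4 KiB objects, the static 20 MiB cost overestimates each recovery item by a factor of 20 MiB / 4 KiB = 5120, which explains why mclock throttled backfill so heavily.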
2 years ago osd: update OSDService::queue_recovery_context to specify cost
Sridhar Seshasayee [Thu, 2 Feb 2023 08:12:39 +0000 (13:42 +0530)]
osd: update OSDService::queue_recovery_context to specify cost

Previously, we always queued this with cost osd_recovery_cost, which
defaults to 20M. With mclock, this caused these items to be heavily
delayed. Instead, base the cost on the operation queued.

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/osd_types: use appropriate cost value for PullOp
Sridhar Seshasayee [Fri, 3 Feb 2023 05:36:06 +0000 (11:06 +0530)]
osd/osd_types: use appropriate cost value for PullOp

See included comments -- previous values did not account for object
size. This causes problems for mclock, which is much stricter
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58607
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/osd_types: use appropriate cost value for PushReplyOp
Sridhar Seshasayee [Wed, 25 Jan 2023 08:19:59 +0000 (13:49 +0530)]
osd/osd_types: use appropriate cost value for PushReplyOp

See included comments -- previous values did not account for object
size. This causes problems for mclock, which is much stricter
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago test/*rbd: Enable supported Crimson test
Matan Breizman [Wed, 3 May 2023 10:38:55 +0000 (10:38 +0000)]
test/*rbd: Enable supported Crimson test

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 years ago crimson/osd/osd_operations/client_request: Fix client blocklisting 51381/head
Matan Breizman [Sun, 7 May 2023 13:29:22 +0000 (13:29 +0000)]
crimson/osd/osd_operations/client_request: Fix client blocklisting

See #50835.
In crimson, conn is maintained independently, outside of Message.
Therefore, calling `get_peer_addr()` through the message's connection
does not yield the peer address.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 years ago Merge pull request #51322 from zdover23/wip-doc-2023-05-03-rados-operations-stretch...
zdover23 [Sun, 7 May 2023 06:19:11 +0000 (16:19 +1000)]
Merge pull request #51322 from zdover23/wip-doc-2023-05-03-rados-operations-stretch-mode-stretch-mode-issues

doc/rados: stretch-mode: stretch cluster issues

Reviewed-by: Greg Farnum <gfarnum@redhat.com>