git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
Dhairya Parmar [Fri, 21 Apr 2023 10:52:02 +0000 (16:22 +0530)]
qa: ignore warnings
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Dhairya Parmar [Wed, 12 Apr 2023 10:52:49 +0000 (16:22 +0530)]
qa: add test cases to check client eviction if an OSD is laggy
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Dhairya Parmar [Thu, 2 Mar 2023 13:07:14 +0000 (18:37 +0530)]
mds,messages: enable beacon to report clients lagginess
using new MDS health metric
Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Dhairya Parmar [Tue, 21 Mar 2023 12:02:37 +0000 (17:32 +0530)]
mds: do not evict client on laggy osds
A client might get unresponsive/laggy due to laggy OSD(s).
This change provides us a way to defer client eviction in
such scenarios
also adds helpers:
- get_laggy_clients()
- clear_laggy_clients()
and call clear_laggy_clients() before calling related
Server methods
Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
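To illustrate the deferral described in "mds: do not evict client on laggy osds", here is a minimal sketch. Only the helper names get_laggy_clients() and clear_laggy_clients() come from the commit message; the class, the client id type, and the boolean standing in for the new config option are assumptions and do not mirror the real MDS code.

```cpp
#include <cstdint>
#include <set>

// Hypothetical client identifier; the real MDS uses its own types.
using client_id = std::uint64_t;

// Hypothetical tracker, not the actual MDS implementation.
class LaggyClientTracker {
public:
  // defer_on_laggy_osd stands in for the new config option (name assumed).
  explicit LaggyClientTracker(bool defer_on_laggy_osd)
    : defer_on_laggy_osd_(defer_on_laggy_osd) {}

  // Returns true if an unresponsive client should be evicted now.
  // When deferral is enabled and at least one OSD is laggy, the client is
  // recorded as laggy and eviction is deferred.
  bool should_evict(client_id c, bool any_osd_laggy) {
    if (defer_on_laggy_osd_ && any_osd_laggy) {
      laggy_clients_.insert(c);
      return false;
    }
    return true;
  }

  // Helper names taken from the commit message.
  const std::set<client_id>& get_laggy_clients() const { return laggy_clients_; }
  void clear_laggy_clients() { laggy_clients_.clear(); }

private:
  bool defer_on_laggy_osd_;
  std::set<client_id> laggy_clients_;
};
```

In this sketch a caller would invoke clear_laggy_clients() before each round of eviction checks, so that the set only reflects the most recent pass.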
Dhairya Parmar [Mon, 27 Mar 2023 09:27:44 +0000 (14:57 +0530)]
common: add new config option to defer client eviction
Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
dparmar18 [Thu, 16 Feb 2023 09:48:42 +0000 (15:18 +0530)]
osd: add method to check for laggy osds
Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
zdover23 [Mon, 15 May 2023 10:18:30 +0000 (20:18 +1000)]
Merge pull request #51473 from zdover23/wip-doc-2023-05-15-rados-operations-devices
doc/rados: edit devices.rst
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Zac Dover [Mon, 15 May 2023 01:01:19 +0000 (11:01 +1000)]
doc/rados: edit devices.rst
Line-edit doc/rados/operations/devices.rst.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Venky Shankar [Mon, 15 May 2023 06:56:46 +0000 (12:26 +0530)]
Merge PR #51386 into main
* refs/pull/51386/head:
qa: ignore cluster warning when fs flag refuse_client_session is set
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Aashish Sharma [Mon, 15 May 2023 06:38:12 +0000 (12:08 +0530)]
Merge pull request #50806 from rhcs-dashboard/dashboard-multisite-migrate
Dashboard multisite migrate
Reviewed-by: Nizamudeen A <nia@redhat.com>
Aashish Sharma [Thu, 20 Apr 2023 05:22:41 +0000 (10:52 +0530)]
mgr/dashboard: Migrate from single site to multi-site
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
zdover23 [Sat, 13 May 2023 02:41:36 +0000 (12:41 +1000)]
Merge pull request #51463 from zdover23/wip-doc-2023-05-13-fs-volumes-1-of-x
doc/cephfs: edit fs-volumes.rst (1 of x)
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Zac Dover [Fri, 12 May 2023 15:49:14 +0000 (01:49 +1000)]
doc/cephfs: edit fs-volumes.rst (1 of x)
Edit the syntax of the English language in the file
doc/cephfs/fs-volumes.rst up to (but not including) the section called
"FS Subvolumes".
Signed-off-by: Zac Dover <zac.dover@proton.me>
zdover23 [Fri, 12 May 2023 12:42:23 +0000 (22:42 +1000)]
Merge pull request #51458 from zdover23/wip-doc-2023-05-12-cephfs-fs-volumes-prompt-rectification
doc/cephfs: rectify prompts in fs-volumes.rst
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Zac Dover [Fri, 12 May 2023 10:35:25 +0000 (20:35 +1000)]
doc/cephfs: rectify prompts in fs-volumes.rst
Make sure all prompts are unselectable. This PR is meant to be
backported to Reef, Quincy, and Pacific, to get all of the prompts into
a fit state so that a line-edit can be performed on the English language
in this file.
Follows https://github.com/ceph/ceph/pull/51427.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Samuel Just [Thu, 11 May 2023 18:35:20 +0000 (11:35 -0700)]
Merge pull request #51448 from Matan-B/wip-matanb-crimson-only-mclock-boot
crimson/osd/scheduler/mclock_scheduler: Fix OSD unable to start
Reviewed-by: Samuel Just <sjust@redhat.com>
Matan [Thu, 11 May 2023 14:28:55 +0000 (16:28 +0200)]
Merge pull request #51388 from Matan-B/wip-matanb-c-enable-rbd-tests
qa/suites/crimson: Enhance rbd api testing
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
Matan Breizman [Thu, 11 May 2023 14:18:46 +0000 (14:18 +0000)]
crimson/osd/scheduler/mclock_scheduler: Fix OSD unable to start
https://github.com/ceph/ceph/pull/49975 Introduced changes to
mclock conf value types which caused the osd to stall while booting.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Venky Shankar [Thu, 11 May 2023 05:51:14 +0000 (11:21 +0530)]
Merge PR #51251 into main
* refs/pull/51251/head:
PendingReleaseNotes: add a note about deleting files from lost+found directory
qa: add checks that validate removal of entries from lost+found dir
mds: allow unlink operation under lost+found directory
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Venky Shankar [Thu, 11 May 2023 05:49:13 +0000 (11:19 +0530)]
Merge PR #51201 into main
* refs/pull/51201/head:
qa: run scrub post file system recovery
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Venky Shankar [Thu, 11 May 2023 03:55:50 +0000 (09:25 +0530)]
Merge PR #51188 into main
* refs/pull/51188/head:
client: use deep-copy when setting permission during make_request
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Ken Dreyer [Wed, 10 May 2023 17:18:22 +0000 (13:18 -0400)]
Merge pull request #51423 from bigjust/replace-go-example-mods
examples: replace example go modules with instructions to run
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
zdover23 [Wed, 10 May 2023 15:30:44 +0000 (01:30 +1000)]
Merge pull request #51427 from zdover23/wip-doc-2023-05-10-cephfs-fs-volumes-prompt-fix
doc/cephfs: fix prompts in fs-volumes.rst
Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Zac Dover [Wed, 10 May 2023 14:52:50 +0000 (00:52 +1000)]
doc/cephfs: fix prompts in fs-volumes.rst
Fixed a regression introduced in
e5355e3d66e1438d51de6b57eae79fab47cd0184 that broke the unselectable
prompts in the RST.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Casey Bodley [Wed, 10 May 2023 12:56:37 +0000 (08:56 -0400)]
Merge pull request #51345 from cbodley/wip-59639
rgw/dbstore: allow NULL RealmIDs in sqlite schema
Reviewed-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Justin Caratzas [Tue, 18 Apr 2023 16:35:37 +0000 (12:35 -0400)]
examples: replace example go modules with instructions to run
Signed-off-by: Justin Caratzas <jcaratza@ibm.com>
Ali Masarwa [Wed, 10 May 2023 12:10:12 +0000 (15:10 +0300)]
Merge pull request #50627 from AliMasarweh/wip-ali-masa-multipart-populate-etag
RGW: Solving the issue of not populating etag in Multipart upload result
Reviewed-by: Daniel Gryniewicz <dang1@ibm.com>
Ilya Dryomov [Wed, 10 May 2023 09:55:42 +0000 (11:55 +0200)]
Merge pull request #49742 from ajarr/fix-56724
mgr/rbd_support: recover from rados client blocklisting
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 10 May 2023 09:53:16 +0000 (11:53 +0200)]
Merge pull request #51166 from chrisphoffman/wip-rbd-59393
librbd: localize snap_remove op for mirror snapshots
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Venky Shankar [Wed, 10 May 2023 08:34:58 +0000 (14:04 +0530)]
Merge PR #43184 into main
* refs/pull/43184/head:
qa: fix journal flush failure issue due to the MDS daemon crashes
qa: add test support for the alloc ino failing
mds: do not take the ino which has been used
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Venky Shankar [Mon, 24 Apr 2023 04:54:55 +0000 (00:54 -0400)]
qa: run scrub post file system recovery
Running file system scrub is recommended post running filesystem
data and metadata recovery. Running scrub isn't covered in tests.
Fixes: http://tracker.ceph.com/issues/59527
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Liu-Chunmei [Tue, 9 May 2023 23:04:47 +0000 (16:04 -0700)]
Merge pull request #51167 from liu-chunmei/teuthology-multicore
crimson/qa: make crimson run multicore in teuthology test
Reviewed-by: Samuel Just <sjust@redhat.com>
Laura Flores [Tue, 9 May 2023 22:04:38 +0000 (17:04 -0500)]
Merge pull request #51301 from ceph/wip-yuriw-release-16.2.13-main
doc: 16.2.13 Release Notes
Yuri Weinstein [Mon, 1 May 2023 20:09:47 +0000 (13:09 -0700)]
doc: 16.2.13 Release Notes
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
Signed-off-by: Laura Flores <lflores@redhat.com>
Samuel Just [Tue, 9 May 2023 18:55:41 +0000 (11:55 -0700)]
Merge pull request #50411 from xxhdx1985126/wip-58928
crimson/osd: start operations asynchronously
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
zdover23 [Tue, 9 May 2023 14:50:01 +0000 (00:50 +1000)]
Merge pull request #51403 from zdover23/wip-doc-2023-05-09-start-get-involved-planet-ceph
doc/start: fix "Planet Ceph" link
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
chunmei [Thu, 20 Apr 2023 22:09:34 +0000 (22:09 +0000)]
crimson/qa: enable multicore for crimson in teuthology test
Signed-off-by: chunmei <chunmei.liu@intel.com>
Yingxin [Tue, 9 May 2023 08:37:41 +0000 (16:37 +0800)]
Merge pull request #47749 from xxhdx1985126/wip-intra-fixedkvbtree-pointers-2
crimson/os/seastore/btree: link fixedkvbtree's nodes and logical extents with forward and backward pointers, and drop the pin_set
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Yuval Lifshitz [Tue, 9 May 2023 07:34:36 +0000 (10:34 +0300)]
Merge pull request #51308 from jzhu116-bloomberg/wip-59592
rgw/notification: remove non x-amz-meta-* attributes from bucket notifications
Xuehan Xu [Mon, 8 May 2023 08:15:55 +0000 (08:15 +0000)]
crimson/tools/store_nbd: read logical extents via
TransactionManager::read_pin()
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 23 Mar 2023 09:59:12 +0000 (09:59 +0000)]
crimson/os/seastore/cache: add comment about backref_extent_entry_t
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Sat, 11 Mar 2023 03:46:14 +0000 (03:46 +0000)]
test/crimson/seastore: complement lba test with logical extents
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Mon, 29 Aug 2022 08:12:00 +0000 (16:12 +0800)]
test/crimson/seastore: check intra-fixedkv-btree parent->child trackers during unittests
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Mon, 27 Mar 2023 02:20:59 +0000 (02:20 +0000)]
crimson/os/seastore/btree: drop btree_pin_set_t
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Sat, 6 May 2023 09:26:18 +0000 (17:26 +0800)]
crimson/os/seastore/transaction_manager: follow leaf<->logical extent pointers to read extent
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Tue, 25 Oct 2022 06:03:43 +0000 (14:03 +0800)]
crimson/os/seastore/lba_manager: link lba leaf nodes with logical extents by pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 27 Oct 2022 07:21:32 +0000 (15:21 +0800)]
crimson/os/seastore/btree: "templatize" btree leaf node to distinguish leaf nodes with(out) children
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 20 Oct 2022 09:41:25 +0000 (17:41 +0800)]
crimson/os/seastore/btree: link fixed-kv-btree and root_block with pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 20 Oct 2022 05:35:08 +0000 (13:35 +0800)]
crimson/os/seastore: more debug logs
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Wed, 17 Aug 2022 10:07:42 +0000 (18:07 +0800)]
crimson/os/seastore/backref_manager: retrieve live backref extents through the backref tree
After involving intra-fixed-kv-btree parent-child pointers, we need to keep the
invariant that it's only when extents are not in transactions' read_set that
we can directly query cache with inspecting the transaction
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 13 Oct 2022 06:27:34 +0000 (14:27 +0800)]
crimson/os/seastore/btree: avoid searching transactions' read_set when retrieving btree nodes
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 13 Oct 2022 03:50:17 +0000 (11:50 +0800)]
crimson/os/seastore/btree: search fixed-kv-btree by parent<->child pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Thu, 13 Oct 2022 02:57:09 +0000 (10:57 +0800)]
crimson/os/seastore/cache: invalidate out-dated extent when initiating Cache
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Wed, 12 Oct 2022 06:37:39 +0000 (14:37 +0800)]
crimson/os/seastore/cached_extent: improve the representation of "has_been_invalidated"
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Tue, 31 Jan 2023 06:36:42 +0000 (14:36 +0800)]
crimson/os/seastore/btree: don't go to leaf nodes when updating internal mappings
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
Xuehan Xu [Tue, 11 Oct 2022 02:34:16 +0000 (10:34 +0800)]
crimson/os/seastore/btree: introduce parent<->child pointers for fixed-kv-btree nodes
maintain correct parent<->child pointers when modifying the btree
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
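The idea of keeping parent<->child pointers consistent while the tree is modified can be sketched as follows. This is a generic illustration, not the seastore fixed-kv-btree code: the Node struct and both helper functions are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tree node carrying both forward (children) and backward
// (parent) pointers.
struct Node {
  Node* parent = nullptr;
  std::vector<Node*> children;
};

// Detach a child from its current parent, clearing both directions.
inline void unlink_child(Node& child) {
  if (!child.parent) {
    return;
  }
  auto& siblings = child.parent->children;
  siblings.erase(std::remove(siblings.begin(), siblings.end(), &child),
                 siblings.end());
  child.parent = nullptr;
}

// Attach a child to a new parent. Any previous link is removed first so that
// the two pointer directions never disagree.
inline void link_child(Node& parent, Node& child) {
  unlink_child(child);
  parent.children.push_back(&child);
  child.parent = &parent;
}
```

Routing every structural change through one pair of helpers like these is one way to keep the invariant in a single place. The real code may organize this quite differently.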
Zac Dover [Tue, 9 May 2023 03:39:10 +0000 (13:39 +1000)]
doc/start: fix "Planet Ceph" link
Fix a link to Planet Ceph on the doc/start/get-involved.rst page.
Reported 2023 Apr 21, here:
https://pad.ceph.com/p/Report_Documentation_Bugs
Signed-off-by: Zac Dover <zac.dover@proton.me>
Yingxin [Tue, 9 May 2023 03:29:54 +0000 (11:29 +0800)]
Merge pull request #51355 from aravind-wdc/wip-crimson-zbd
crimson/os/seastore: enable SMR HDD
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
zdover23 [Tue, 9 May 2023 02:37:40 +0000 (12:37 +1000)]
Merge pull request #51392 from parth-gr/rgw-mutisite-ceph-doc
doc: update multisite doc
Reviewed-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Zac Dover <zac.dover@proton.me>
Kamoltat Sirivadhna [Tue, 9 May 2023 01:04:59 +0000 (21:04 -0400)]
Merge pull request #50857 from kamoltat/wip-ksirivad-iswriteable
mon/Monitor.cc: exit function if !osdmon()->is_writeable()
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
zdover23 [Tue, 9 May 2023 00:53:06 +0000 (10:53 +1000)]
Merge pull request #51394 from rzarzynski/wip-doc-encode-stdoptional
doc/dev/encoding.txt: update per std::optional
Reviewed-by: Zac Dover <zac.dover@proton.me>
Ramana Raja [Sun, 5 Feb 2023 03:36:16 +0000 (22:36 -0500)]
qa/workunits/rbd: Add tests for rbd_support module recovery
... after the module's RADOS client is blocklisted.
Signed-off-by: Ramana Raja <rraja@redhat.com>
Ramana Raja [Wed, 15 Feb 2023 15:12:54 +0000 (10:12 -0500)]
mgr/rbd_support: recover from rados client blocklisting
In certain scenarios the OSDs were slow to process RBD requests.
This led to the rbd_support module's RBD client not being able to
gracefully hand over an RBD exclusive lock to another RBD client.
After the condition persisted for some time, the other RBD client
forcefully acquired the lock by blocklisting the rbd_support module's
RBD client, and consequently blocklisted the module's RADOS client. The
rbd_support module stopped working. To recover the module, the entire
mgr service had to be restarted which reloaded other mgr modules.
Instead of recovering the rbd_support module from client blocklisting
by being disruptive to other mgr modules, recover the module
automatically without restarting the mgr service. On client getting
blocklisted, shut down the module's handlers and blocklisted client,
create a new rados client for the module, and start the new handlers.
Fixes: https://tracker.ceph.com/issues/56724
Signed-off-by: Ramana Raja <rraja@redhat.com>
Ilya Dryomov [Mon, 8 May 2023 19:24:28 +0000 (21:24 +0200)]
Merge pull request #51365 from nbalacha/fix-remove-unused-type
librbd: remove unused enum WriteOpType
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Radoslaw Zarzynski [Mon, 8 May 2023 18:22:11 +0000 (20:22 +0200)]
Merge pull request #49975 from sseshasa/wip-fix-mclk-rec-backfill-cost
osd: mClock recovery/backfill cost fixes
Reviewed-by: Sam Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
Ramana Raja [Thu, 12 Jan 2023 02:53:16 +0000 (21:53 -0500)]
pybind/rados: add ConnectionShutdown exception class
Signed-off-by: Ramana Raja <rraja@redhat.com>
Ramana Raja [Tue, 17 Jan 2023 03:04:08 +0000 (22:04 -0500)]
mgr/rbd_support: notify the thread waiting on pending snapshot
... requests to be completed.
Signed-off-by: Ramana Raja <rraja@redhat.com>
Matan [Mon, 8 May 2023 16:48:28 +0000 (19:48 +0300)]
Merge pull request #51381 from Matan-B/wip-matanb-c-blocklist-fix
crimson/osd/osd_operations/client_request: Fix client blocklisting
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Daniel Gryniewicz [Mon, 8 May 2023 15:47:15 +0000 (11:47 -0400)]
Merge pull request #43245 from thiagoarrais/docs-java-examples
[rgw]: Update AWS SDK in Java examples
Radoslaw Zarzynski [Mon, 8 May 2023 14:41:22 +0000 (14:41 +0000)]
doc/dev/encoding.txt: update per std::optional
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Christopher Hoffman [Wed, 19 Apr 2023 15:26:27 +0000 (15:26 +0000)]
librbd: localize snap_remove op for mirror snapshots
A client may not issue a lock request quickly enough to
obtain the exclusive lock for operations when a competing
client responds more quickly. This can happen when a peer site has
different performance characteristics or latency. Instead of
relying on this unpredictable behavior, localize operation to
primary cluster.
Fixes: https://tracker.ceph.com/issues/59393
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
parth-gr [Mon, 8 May 2023 13:53:29 +0000 (19:23 +0530)]
doc: update multisite doc
The command for getting the zone group was spelled incorrectly.
Updated to radosgw-admin.
Signed-off-by: parth-gr <paarora@redhat.com>
N Balachandran [Mon, 8 May 2023 13:24:35 +0000 (18:54 +0530)]
librbd : remove unused enum type WriteOpType
This removes the unused enum WriteOpType from
the librbd deep_copy code.
Signed-off-by: N Balachandran <nibalach@redhat.com>
Dhairya Parmar [Mon, 8 May 2023 08:50:28 +0000 (14:20 +0530)]
qa: ignore cluster warning when fs flag refuse_client_session is set
Fixes: https://tracker.ceph.com/issues/59667
Introduced-by: https://github.com/ceph/ceph/pull/48720
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
zdover23 [Mon, 8 May 2023 12:48:30 +0000 (22:48 +1000)]
Merge pull request #51387 from zdover23/wip-doc-2023-05-08-rados-operations-stretch-mode-other-commands
doc/rados: stretch-mode.rst (other commands)
Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Zac Dover [Mon, 8 May 2023 11:08:49 +0000 (21:08 +1000)]
doc/rados: stretch-mode.rst (other commands)
Edit the "Other Commands" section of
doc/rados/operations/stretch-mode.rst.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Matan Breizman [Mon, 8 May 2023 10:53:00 +0000 (10:53 +0000)]
qa/suites/crimson: Introduce rbd_python_api_tests.yaml
Test python api with new image format.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Matan Breizman [Mon, 8 May 2023 10:50:19 +0000 (10:50 +0000)]
qa/suites/crimson: Skip unsupported tests (Crimson)
Align with `rbd_api_tests` and skip deep_copy and breaklock tests
in Crimson.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Sridhar Seshasayee [Sat, 29 Apr 2023 05:16:58 +0000 (10:46 +0530)]
qa/: Override mClock profile to 'high_recovery_ops' for qa tests
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitor them for completion within a finite amount
of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overridden to 'high_recovery_ops' profile for the Rados suite as
it fits the requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Sridhar Seshasayee [Tue, 11 Apr 2023 17:57:05 +0000 (23:27 +0530)]
doc/: Modify mClock configuration documentation to reflect profile changes
Modify the relevant documentation to reflect:
- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Sridhar Seshasayee [Tue, 11 Apr 2023 16:47:53 +0000 (22:17 +0530)]
common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs
The osd_mclock_max_sequential_bandwidth_ssd is changed to 1200 MiB/s as
a reasonable middle ground considering the broad range of SSD capabilities.
This allows the mClock's cost model to extract the SSD's capability
depending on the cost of the IO being performed.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Sridhar Seshasayee [Tue, 11 Apr 2023 16:30:11 +0000 (22:00 +0530)]
osd/: Retain the default osd_max_backfills limit to 1 for mClock
The earlier limit of 3 was still aggressive enough to have an impact on
the client and other competing operations. Retain the current default
for mClock. This can be modified if necessary after setting the
osd_mclock_override_recovery_settings option.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Samuel Just [Tue, 11 Apr 2023 15:15:38 +0000 (08:15 -0700)]
common/options/osd.yaml.in: change mclock profile default to balanced
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Samuel Just [Tue, 11 Apr 2023 15:10:04 +0000 (08:10 -0700)]
osd/scheduler/mClockScheduler: avoid limits for recovery
Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Mon, 10 Apr 2023 21:18:49 +0000 (14:18 -0700)]
osd/: add counters for ops delayed due to degraded|unreadable target
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 21:15:02 +0000 (14:15 -0700)]
osd/: add counters for queue latency for PGRecovery[Context]
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 20:50:48 +0000 (20:50 +0000)]
osd/: add per-op latency averages for each recovery related message
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 07:04:05 +0000 (00:04 -0700)]
osd/: differentiate priority for PGRecovery[Context]
PGs with degraded objects should be higher priority.
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 05:57:48 +0000 (22:57 -0700)]
osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages
Otherwise, these end up as PGOpItem and therefore as immediate:
    class PGOpItem : public PGOpQueueable {
    ...
      op_scheduler_class get_scheduler_class() const final {
        auto type = op->get_req()->get_type();
        if (type == CEPH_MSG_OSD_OP ||
            type == CEPH_MSG_OSD_BACKOFF) {
          return op_scheduler_class::client;
        } else {
          return op_scheduler_class::immediate;
        }
      }
    ...
    };
This was probably causing a bunch of extra interference with client
ops.
Signed-off-by: Samuel Just <sjust@redhat.com>
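A rough sketch of the effect of the commit above: once backfill and scan messages are recognized as recovery messages, they no longer fall through to the immediate class. The enums and the classify() function below are hypothetical stand-ins. The enumerator names echo the message constants mentioned in this log, but their values are arbitrary, and the list of recovery types is limited to those named here rather than the full set in the OSD.

```cpp
// Hypothetical stand-ins for message type constants (values are arbitrary).
enum class msg_type {
  CEPH_MSG_OSD_OP,
  CEPH_MSG_OSD_BACKOFF,
  MSG_OSD_PG_PULL,
  MSG_OSD_PG_BACKFILL,
  MSG_OSD_PG_BACKFILL_REMOVE,
  MSG_OSD_PG_SCAN,
  OTHER,
};

// Hypothetical, simplified set of scheduler classes.
enum class sched_class { client, recovery, immediate };

// True for message types that this sketch treats as recovery traffic.
inline bool is_recovery_msg(msg_type t) {
  switch (t) {
  case msg_type::MSG_OSD_PG_PULL:
  case msg_type::MSG_OSD_PG_BACKFILL:
  case msg_type::MSG_OSD_PG_BACKFILL_REMOVE:
  case msg_type::MSG_OSD_PG_SCAN:
    return true;
  default:
    return false;
  }
}

// Recovery messages are checked first, so they never reach the
// client/immediate split shown in the commit message.
inline sched_class classify(msg_type t) {
  if (is_recovery_msg(t)) {
    return sched_class::recovery;
  }
  if (t == msg_type::CEPH_MSG_OSD_OP ||
      t == msg_type::CEPH_MSG_OSD_BACKOFF) {
    return sched_class::client;
  }
  return sched_class::immediate;
}
```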
Samuel Just [Thu, 6 Apr 2023 05:57:42 +0000 (22:57 -0700)]
osd/: differentiate scheduler class for undersized/degraded vs data movement
Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.
Signed-off-by: Samuel Just <sjust@redhat.com>
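The distinction drawn in the commit above might look like the following sketch. The class names background_recovery and background_best_effort appear elsewhere in this log; the RecoveryTarget struct, its fields, and the function are assumptions made for illustration only.

```cpp
// Scheduler classes named in this log; the enum itself is hypothetical.
enum class op_class { background_recovery, background_best_effort };

// Hypothetical description of a PG or object awaiting recovery.
struct RecoveryTarget {
  unsigned current_copies;     // copies currently available
  unsigned configured_copies;  // copies the pool is configured to keep
};

// Targets with fewer copies than configured are treated as more urgent than
// targets that only need to be moved to a new location.
inline op_class recovery_class_for(const RecoveryTarget& t) {
  return t.current_copies < t.configured_copies
           ? op_class::background_recovery
           : op_class::background_best_effort;
}
```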
Samuel Just [Thu, 6 Apr 2023 04:30:18 +0000 (04:30 +0000)]
osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 04:23:23 +0000 (04:23 +0000)]
osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Thu, 6 Apr 2023 03:45:19 +0000 (03:45 +0000)]
osd/: move get_recovery_op_priority into PeeringState next to get_*_priority
Consolidate methods governing recovery scheduling in PeeringState.
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Tue, 4 Apr 2023 23:34:17 +0000 (23:34 +0000)]
osd/scheduler: simplify qos specific params in OpSchedulerItem
is_qos_item() was only used in operator<< for OpSchedulerItem. However,
it's actually useful to see priority for mclock items since it affects
whether it goes into the immediate queues and, for some types, the
class. Unconditionally display both class_id and priority.
Signed-off-by: Samuel Just <sjust@redhat.com>
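As a small illustration of printing both fields unconditionally, the sketch below defines a stream operator for a hypothetical item type. SchedItem, its fields, the output format, and describe() are all assumptions and not the real OpSchedulerItem.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, minimal stand-in for a scheduler item.
struct SchedItem {
  int class_id;
  int priority;
};

// Always print both class_id and priority, with no per-item condition.
inline std::ostream& operator<<(std::ostream& out, const SchedItem& item) {
  return out << "SchedItem(class_id=" << item.class_id
             << ", priority=" << item.priority << ")";
}

// Convenience helper that renders an item to a string via operator<<.
inline std::string describe(const SchedItem& item) {
  std::ostringstream ss;
  ss << item;
  return ss.str();
}
```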
Samuel Just [Tue, 4 Apr 2023 23:22:59 +0000 (23:22 +0000)]
osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Tue, 4 Apr 2023 23:13:41 +0000 (23:13 +0000)]
osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery
Apparently unused.
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Tue, 4 Apr 2023 23:05:56 +0000 (23:05 +0000)]
osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery
Apparently unused.
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Mon, 3 Apr 2023 20:31:46 +0000 (13:31 -0700)]
PeeringState::clamp_recovery_priority: use std::clamp
Signed-off-by: Samuel Just <sjust@redhat.com>
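For readers unfamiliar with std::clamp (C++17, declared in <algorithm>), here is a minimal sketch of clamping a priority into a range. The function name and the caller-supplied bounds are hypothetical, and the actual limits used by PeeringState are not reproduced here.

```cpp
#include <algorithm>

// Restrict a priority to the inclusive range [lowest, highest].
// std::clamp returns lowest if priority < lowest, highest if
// priority > highest, and priority itself otherwise.
inline int clamp_priority(int priority, int lowest, int highest) {
  return std::clamp(priority, lowest, highest);
}
```

This replaces the usual hand-written pair of comparisons, or a nested std::min/std::max call, with a single expression.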
Sridhar Seshasayee [Sat, 25 Mar 2023 07:14:40 +0000 (12:44 +0530)]
doc: Modify mClock configuration documentation to reflect new cost model
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Sridhar Seshasayee [Tue, 21 Feb 2023 12:24:36 +0000 (17:54 +0530)]
osd: Retain overridden mClock recovery settings across osd restarts
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.
For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence was incorrectly resetting the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>