Venky Shankar [Mon, 29 Jan 2024 13:12:36 +0000 (18:42 +0530)]
Merge PR #53734 into main
* refs/pull/53734/head:
qa: refactor client upgrade yamls and other minor touchups
qa/upgrade/nofs: upgrade pacific->reef
qa/upgrade/upgraded_client: upgrade nautilus->pacific and pacific->reef
Laura Flores [Mon, 29 Jan 2024 00:58:25 +0000 (00:58 +0000)]
mgr: pin pytest to version 7.4.4
On 2024-01-27, pytest updated to 8.0.0,
which broke run-tox-mgr.
https://docs.pytest.org/en/stable/changelog.html
==================================== ERRORS ====================================
_____________________ ERROR collecting alerts/__init__.py ______________________
alerts/__init__.py:2: in <module>
from .module import Alerts
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
______________________ ERROR collecting alerts/module.py _______________________
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
____________________ ERROR collecting balancer/__init__.py _____________________
balancer/__init__.py:2: in <module>
from .module import Module
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
_____________________ ERROR collecting balancer/module.py ______________________
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
Fixes: https://tracker.ceph.com/issues/64200 Signed-off-by: Laura Flores <lflores@ibm.com>
Laura Flores [Fri, 26 Jan 2024 17:32:43 +0000 (17:32 +0000)]
osd: clear out unneeded pending pg-upmap-primary mappings
If the score did not improve, we should clear out any
pending pg-upmap-primary mappings so they don't execute
in situations where the same incremental is used to balance
multiple pools (i.e. in the balancer mgr module).
Laura Flores [Tue, 2 Jan 2024 21:28:03 +0000 (21:28 +0000)]
mgr/balancer: add pg_upmap_primaries to `balancer status detail`
Followup to https://github.com/ceph/ceph/pull/54801/commits/8a5553597ca6a428cb8ffc9fc5bebde048fbd068.
Streamlines some of the logic so pg upmap activity is properly
initalized, and updated in offline mode as well as online.
Laura Flores [Thu, 18 Jan 2024 18:57:24 +0000 (18:57 +0000)]
mgr: add read balancer support inside the balancer module
Read balancing may now be managed automatically via the balancer
manager module. Users may choose between two new modes: ``upmap-read``, which
offers upmap and read optimization simultaneously, or ``read``, which may be used
to only optimize reads. Existing balancer commands have also been added to
contain more information about read balancing.
Run the following commands to test the new automatic behavior:
`ceph balancer on` (on by default)
`ceph balancer mode <read|upmap-read>`
`ceph balancer status`
Run the following commands to test the new supervised behavior:
`ceph balancer off`
`ceph balancer mode <read|upmap-read>`
`ceph balancer eval` | `ceph balancer eval <pool-name>`
`ceph balancer eval-verbose` | `ceph balancer eval-verbose <pool-name>`
`ceph balancer optimize <plan-name>`
`ceph balancer show <plan-name>`
`ceph balancer eval <plan-name>`
`ceph balancer execute <plan-name>`
In the balancer module, there is also a new "self_test" function which tests
the module's basic functionality. This test can be triggered with the following
commands:
`ceph mgr module enable selftest`
`ceph mgr self-test module balancer`
Related Trello: https://trello.com/c/sWoKctzL/859-add-read-balancer-support-inside-the-balancer-module Signed-off-by: Laura Flores <lflores@ibm.com>
Ronen Friedman [Fri, 5 Jan 2024 15:07:19 +0000 (09:07 -0600)]
osd/scrub: add "queue my request" flag to replica reservation messages
Up-to-date primaries will set this flag when sending a reservation
request. The replica OSD, if too busy to handle the request immediately, will queue
it until such time that the number of concurrent reservations is below the
configured limit. The queued requests are honored in FIFO order.
Old primaries will not set this flag, and will receive the expected
grant or deny reply immediately.
ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes any `/sys/dev/block/x:y/partition` contains '1' which is wrong.
This file actually contains the corresponding partition number.
ceph-volume: use 'no workqueue' options with dmcrypt
CloudFlare engineers made some testing and realized that using
workqueues with encryption on flash devices has a bad effect.
See [1] for details.
With this patch it will make ceph-volume call crypsetup with
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not a rotational.
Ramana Raja [Wed, 17 Jan 2024 18:24:36 +0000 (13:24 -0500)]
rbd-nbd: map using netlink interface by default
Mapping rbd images to nbd devices using ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using
ioctl method. When using the nbd netlink interface to map RBD images
the issue was not encountered. Switch to using nbd netlink interface
for mapping.
Fixes: https://tracker.ceph.com/issues/64063 Signed-off-by: Ramana Raja <rraja@redhat.com>
Afreen [Thu, 25 Jan 2024 11:21:06 +0000 (16:51 +0530)]
mgr/dashboard: Code refactor rgw migrate component for using correctly the MIGRATE action verb
fixes https://tracker.ceph.com/issues/64152
this.MIGRATE = $localize`Migrate to Multi-Site`;
Just like other action verbs we should set this.Migrate = "MIGRATE" only.
This will require rephrasing in the following places as well:
1. https://github.com/ceph/ceph/blob/d3256c484136a1b32b79a904861f681a9248ba3c/src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.ts#L223-L228
Just because this is what Ceph's config uses and it saves a narrowing
conversion. If we want to set a max value on the thread count, we
should do it in config.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Ramana Raja [Tue, 23 Jan 2024 21:07:04 +0000 (16:07 -0500)]
rbd-nbd: log errors during netlink_resize() using derr
When using rbd CLI to map the images to NBD devices via netlink,
any errors that arose during image resizing in netlink_resize()
were not logged. Switching the error logging from using cerr to
derr helps log the errors from netlink_resize().
Ramana Raja [Mon, 22 Jan 2024 22:06:58 +0000 (17:06 -0500)]
rbd_nbd: fix resize of images mapped using netlink
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
Fixes: https://tracker.ceph.com/issues/64139 Signed-off-by: Ramana Raja <rraja@redhat.com>
While submitting the log line asyncronously is reasonable,
with this implementation the EntryVector &q parameter does
not necessarily outlive the submission continuation.