Benoît Knecht [Thu, 28 Oct 2021 16:49:07 +0000 (18:49 +0200)]
mgr/ActivePyModules: Add metadata id in dump_server()
The `DaemonStateCollection` used to always contain the daemon name in its
`DaemonKey`, but since #40220 (or more specifically afc33758e076761b8d4ec004c8f9c49b80a48770), the RadosGW registers with its
instance ID instead (`rados.get_instance_id()`).
As a result, the `ceph_rgw_*` metrics returned by `ceph-mgr` through the
`prometheus` module have their `ceph_daemon` label include that ID instead of
the daemon name, e.g.
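for illustration (hypothetical values), a metric that used to be exported with `ceph_daemon="rgw.myhost"` now shows up with something like `ceph_daemon="rgw.4305817"`, i.e. the numeric instance ID rather than the daemon name.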
Rishabh Dave [Mon, 7 Feb 2022 18:44:42 +0000 (00:14 +0530)]
monitoring: mention PyYAML only once in requirements
The following error occurs while running "sudo install-deps.sh":
ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML')
PyYAML is listed twice as a requirement, once in each of the following files:
monitoring/ceph-mixin/requirements-lint.txt
monitoring/ceph-mixin/requirements-alerts.txt
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
qa/suites/rbd: add cram-based mon command API test
With mon command definitions (from the rbd_support mgr module in this case)
generated automatically by the @CLI{Read,Write}Command decorators, it's
very easy to accidentally break the external-facing API.
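As a rough, self-contained sketch (a toy stand-in, not Ceph's actual decorator) of why the generated API is fragile: the externally visible mon command specification is derived directly from the Python signature, so a renamed parameter or a changed default silently changes the public API that a test like this pins down.
```
import inspect
from typing import Optional

def cli_command(prefix: str):
    """Toy stand-in for @CLI{Read,Write}Command: derive a command spec from the signature."""
    def wrap(func):
        args = []
        for name, param in inspect.signature(func).parameters.items():
            if name == 'self':
                continue
            args.append({'name': name,
                         'required': param.default is inspect.Parameter.empty})
        func.command_spec = {'prefix': prefix, 'args': args}
        return func
    return wrap

@cli_command('rbd mirror snapshot schedule list')
def schedule_list(self, level_spec: Optional[str] = None):
    pass

print(schedule_list.command_spec)
# {'prefix': 'rbd mirror snapshot schedule list',
#  'args': [{'name': 'level_spec', 'required': False}]}
```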
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
mgr/rbd_support: level_spec is optional for schedule list/status
Commit fea6fdff4c74 ("mgr/rbd_support: level_spec passed to some
commands is not optional") is wrong. While it is true that a valid
level_spec is needed to create a LevelSpec instance, an empty string
is very much a valid level spec -- it signifies "all levels".
This wasn't caught because within Ceph these commands are wrapped by the
rbd CLI, which injects an empty string in get_level_spec_args().
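A minimal sketch of the intended behaviour (illustrative names only, not the actual rbd_support code): the argument can simply default to an empty string, which the handler interprets as "all levels".
```
def schedule_status(level_spec: str = '') -> dict:
    """An empty level_spec is valid and selects every pool, namespace and image."""
    if level_spec == '':
        scope = 'all levels'
    else:
        scope = level_spec
    return {'scope': scope}

print(schedule_status())            # {'scope': 'all levels'}
print(schedule_status('rbd/img1'))  # {'scope': 'rbd/img1'}
```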
Ilya Dryomov [Fri, 28 Jan 2022 22:01:08 +0000 (23:01 +0100)]
mgr/rbd_support: "trash remove" takes image_id_spec, not image_spec
Because of @CLIWriteCommand, the parameter name has to adhere to
the mon command API. Commit dcb51b067a49 ("mgr/rbd_support: define
commands using CLICommand") accidentally changed image_id_spec to
image_spec, breaking external users such as go-ceph.
Casey Bodley [Wed, 2 Feb 2022 19:06:20 +0000 (14:06 -0500)]
qa/rgw: tests run against ceph-quincy branch
target the ceph-quincy branch of s3tests, ragweed, and java_s3tests.
this commit targets the quincy branch specifically, rather than merging
to master and backporting
Before the patch, the test case showed unreliable behaviour dependent on
the underlying memory allocator: the bufferlist rebuild can be skipped,
leaving the number of buffers unchanged, if all of them already begin at
aligned addresses.
The commit fixes that by allocating a 4 KiB-aligned buffer and
offsetting it by a small constant (42) to ensure the memory added
to the bufferlist begins at a non-4-KiB-aligned address.
test/bufferlist: assert the rebuild in rebuild_aligned_size_and_memory() actually happens.
For the investigation of failures like the following one:
```
[ RUN ] BufferList.rebuild_aligned_size_and_memory
../src/test/bufferlist.cc:1865: Failure
Expected equality of these values:
bl.get_num_buffers()
Which is: 2
1
[ FAILED ] BufferList.rebuild_aligned_size_and_memory (0 ms)
```
The test case assumes the rebuild before the failed clause **always**
happens while `bufferlist::rebuild_aligned_size_and_memory()` skips it
if buffers are already aligned.
Neha Ojha [Fri, 21 Jan 2022 23:31:01 +0000 (23:31 +0000)]
qa/suites/rados: reduce the number of cephadm tests
Currently, every rados run of ~400 jobs is running ~150 cephadm tests,
which is unnecessary and redundant. With this change, we will run some
basic cephadm tests within the rados suite. The following seems to be
a good start.
John Mulligan [Thu, 20 Jan 2022 19:48:28 +0000 (14:48 -0500)]
cephadm: validate that the constructed YumDnf baseurl is usable
If the inputs to the `cephadm add-repo` command would result in an
invalid URL for repo metadata, fail the command early with a (somewhat)
helpful error.
Fixes: https://tracker.ceph.com/issues/46773
Signed-off-by: John Mulligan <jmulligan@redhat.com>
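A minimal sketch of the idea, assuming a conventional yum/dnf repo layout (the exact check cephadm performs may differ): build the baseurl, try to fetch its repo metadata, and turn any failure into an early, readable error.
```
import urllib.error
import urllib.request

def check_baseurl(baseurl: str) -> None:
    """Fail early if the constructed baseurl cannot serve repo metadata."""
    url = baseurl.rstrip('/') + '/repodata/repomd.xml'
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except (urllib.error.HTTPError, urllib.error.URLError) as exc:
        raise SystemExit(f'cannot fetch {url}: {exc}; '
                         'check the release/version passed to add-repo')
```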
John Mulligan [Thu, 20 Jan 2022 19:46:03 +0000 (14:46 -0500)]
cephadm: add a validate function to Packager
The validate function is for testing the inputs to the Packager
subclasses independently of writing the configuration to disk.
It only raises an exception upon failed validation.
Use it for the existing YumDnf validation exceptions.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
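A rough sketch of the structure (hypothetical class and error names, not the actual cephadm code): validation only inspects the inputs and raises, it never writes a repo file, so it can be invoked early by add-repo and unit-tested in isolation.
```
class RepoError(Exception):
    pass

class Packager:
    def validate(self) -> None:
        """Raise RepoError if the inputs cannot produce a usable repo; otherwise return."""
        pass  # the base class accepts anything

class YumDnf(Packager):
    def __init__(self, distro: str, major_version: str) -> None:
        self.distro = distro
        self.major_version = major_version

    def validate(self) -> None:
        if self.distro not in ('centos', 'rhel', 'fedora'):
            raise RepoError(f'unsupported distro for yum/dnf repos: {self.distro}')
        if not self.major_version.isdigit():
            raise RepoError(f'cannot build a baseurl from version {self.major_version!r}')
```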
Melissa Li [Tue, 11 Jan 2022 23:03:23 +0000 (18:03 -0500)]
mgr/cephadm/iscsi: use `mon_command` in `post_remove` instead of `check_mon_command`
Use `mon_command` instead of `check_mon_command` in `post_remove` to avoid errors: if the iscsi service is removed before the iscsi gateway list is updated, the cluster enters an error state and the iscsi removal gets stuck.
Fixes: https://tracker.ceph.com/issues/53706
Signed-off-by: Melissa Li <melissali@redhat.com>
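A hedged sketch of the difference (the command name and wiring below are assumptions, not necessarily what cephadm runs): `mon_command()` returns a `(retval, stdout, stderr)` tuple and leaves error handling to the caller, whereas `check_mon_command()` raises on a non-zero return code, which in `post_remove` turned a stale gateway list into a stuck removal.
```
from typing import Callable, Dict, Tuple

MonCommand = Callable[[Dict], Tuple[int, str, str]]

def remove_iscsi_gateway(mon_command: MonCommand, gateway: str) -> None:
    """Tolerate a failed gateway removal instead of raising and aborting."""
    ret, out, err = mon_command({
        'prefix': 'dashboard iscsi-gateway-rm',  # assumed command name
        'name': gateway,
    })
    if ret != 0:
        # check_mon_command() would have raised here; logging and moving on
        # lets the rest of the service removal proceed.
        print(f'warning: could not remove iscsi gateway {gateway}: {err}')
```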
John Mulligan [Tue, 18 Jan 2022 18:31:03 +0000 (13:31 -0500)]
mgr/cephadm: add a test for enabling cephfs mirroring module
Add a test that checks that when the cephfs-mirror service is enabled,
the mirroring mgr module gets enabled.
Actually-written-by: Sebastian Wagner <sewagner@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit bcb4fa70f9f739dce3e1c111db0a322804350f9d)
Adam King [Thu, 6 Jan 2022 22:01:34 +0000 (17:01 -0500)]
mgr/cephadm: still check agent deps if it is marked down
Right now, if an agent is down, _check_agent will return
without ever going on to check the deps or
scheduled actions for that agent. This causes a few issues.
For one, if an agent is marked down and then a mgr failover
happens, even if reconfiguring the agent would put it in a working
state (e.g. changing the target IP if the active mgr has moved),
we never try it because _check_agent just returns as soon as it
sees the agent is down. Additionally, if someone purposely tried
to schedule a redeploy of a down agent for whatever reason, we
would never make good on this action.
This change allows us to still carry out the normal checks and
scheduled actions even on down agents.
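A self-contained sketch of the control-flow change (illustrative names, not the actual cephadm code): note that the agent is down, but fall through to the dependency check and any scheduled redeploy instead of returning early.
```
from dataclasses import dataclass
from typing import List

@dataclass
class AgentState:
    host: str
    down: bool
    deps_changed: bool
    redeploy_scheduled: bool

def check_agent(agent: AgentState) -> List[str]:
    actions: List[str] = []
    if agent.down:
        # old behaviour: `return actions` here, so nothing below ever ran
        actions.append(f'agent on {agent.host} is down')
    # new behaviour: dependency changes and scheduled redeploys are still
    # honoured, so reconfiguring (e.g. a new mgr IP after failover) can
    # bring a down agent back.
    if agent.deps_changed or agent.redeploy_scheduled:
        actions.append(f'redeploy agent on {agent.host}')
    return actions
```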
I isolated all the test suites into their respective files
so that it is easier to add more tests in the future.
I also gave priority to the host actions.
The "create OSD" checks are now written so that OSDs
are created only on the intended hosts. This will make
the host draining process easier and less time-consuming.
Also tried to address the flaky force maintenance checks.
Removed some duplicated code.
The service creation part was improved to reduce the time taken
for its completion.
Ilya Dryomov [Sun, 23 Jan 2022 15:32:57 +0000 (16:32 +0100)]
qa/run_xfstests_qemu.sh: disable 251, 260 and 288
All three are skipped with virtio disks:
251 [not run] FITRIM not supported on /dev/vdc
260 [not run] FITRIM not supported on /dev/vdc
288 [not run] FITRIM not supported on /dev/vdc
But 260 and 288 fail with ide disks, where discard defaults to on. The
ancient kernel in our ubuntu-12.04.qcow2 doesn't support virtio discard
anyway so let's just disable them for consistency.
Ilya Dryomov [Fri, 21 Jan 2022 12:41:46 +0000 (13:41 +0100)]
rbd-mirror: fix races in snapshot-based mirroring deletion propagation
When remote image is deleted, rbd-mirror can encounter three cases:
1) no remote image id
2) no remote mirror metadata
3) MIRROR_IMAGE_STATE_DISABLING in remote mirror metadata
Commit d4c66ac5c615 ("rbd-mirror: fix issue with snapshot-based
mirroring deletion propagation") fixed case 1. Cases 2 and 3 remained
broken because for both of them finalize_snapshot_state_builder() would
populate not only remote_mirror_peer_uuid but also remote_image_id,
thus disabling ENOLINK logic in handle_prepare_remote_image() and
handle_bootstrap(). Commit ff60aec2d9ef ("rbd-mirror: fix bootstrap
sequence while the image is removed") touched on case 3, but it made
a difference only for journal-based mirroring.
Stop calling finalize_snapshot_state_builder() on errors. Instead,
align with journal-based mirroring by filling remote_mirror_peer_uuid
together with remote_mirror_uuid.
Make it clear that the local image non-primariness is asserted
independent of the mode; avoid the default implementation being
overridden but still relied on by both modes.
Ilya Dryomov [Wed, 19 Jan 2022 11:54:23 +0000 (12:54 +0100)]
rbd: add missing switch arguments for recognition by get_command_spec()
Currently this
$ rbd --all children img
doesn't work, while this
$ rbd children --all img
or this
$ rbd children img --all
does. The issue is that -a/--all isn't on the list of known switch
arguments. The "rbd children" example may seem contrived but for more
complicated commands such as "rbd device map" mixing switches and
positional arguments occurs naturally:
ee8887f4c0ff4f91117f31b621b95c8d08019130 was intended to add mpath device
support to ceph-volume but missed the lvm batch scenario.
This also fixes the zapping of mpath devices prepared with `ceph-volume raw`.
The recent changes from PR #43536 introduced a regression preventing
ceph-volume from running in a containerized context on Ubuntu 18.04.
The path of the `lvs` binary differs between CentOS 8 and Ubuntu 18.04
(`/usr/sbin/lvs` and `/sbin/lvs` respectively), which means that ceph-volume
running in a CentOS 8 container sees the `lvs` binary at `/usr/sbin/lvs` and
tries to run it with `nsenter` on the host, which is running Ubuntu 18.04.
Prashant D [Fri, 29 Oct 2021 13:09:24 +0000 (09:09 -0400)]
osd/OSD: Log aggregated slow ops detail to cluster logs
Slow requests can overwhelm the cluster log with every slow op in
detail and also fill up the monitor db. Instead, log slow op
details in an aggregated format.
Fixes: https://tracker.ceph.com/issues/52424
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit 9319dc9273b36dc4f4bef1261b3c57690336a8cc)
Adam C. Emerson [Wed, 19 Jan 2022 21:49:05 +0000 (16:49 -0500)]
rgw: Report empty endpoints as error instead of crashing
Fixes: https://tracker.ceph.com/issues/53941
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3c4a64ca040d3a0e0ddf762c391575498dc2a77f)
Fixes: https://tracker.ceph.com/issues/53973
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Mykola Golub [Fri, 14 Jan 2022 18:21:29 +0000 (18:21 +0000)]
cls/journal: skip disconnected clients when finding min_commit_position
When a new journal client is registered, all already registered
clients are checked, and the minimum commit position among them is
selected as the starting position for the new client. Thus we may
expect that, starting from the registered position, all journal
entries will be available (not trimmed) for the new client.
But when looking for the min commit position, the client_register
function did not take into account that a registered client might
be in the disconnected state, in which case the journal entries
might already be trimmed for that client.
Kamoltat [Wed, 12 Jan 2022 02:41:01 +0000 (02:41 +0000)]
pybind/mgr/progress: enforced try and except on accessing event dictionary
There is a race condition where an event gets deleted while
the progress module iterates through the ``events`` dictionary;
without a ``try``/``except``, this causes an unhandled exception
and crashes the module.
This commit enforces ``try``/``except`` in every part of the code
that accesses the ``events`` dictionary.
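A minimal sketch of the defensive pattern, assuming a dict-like ``events`` that another thread can mutate (names are illustrative, not the module's actual code):
```
from typing import Dict, List

def summarize_events(events: Dict[str, object]) -> List[str]:
    """Iterate over a snapshot of the keys and tolerate concurrent deletions."""
    lines: List[str] = []
    for ev_id in list(events):      # snapshot, so the loop itself is safe
        try:
            ev = events[ev_id]      # the event may already be gone
        except KeyError:
            continue                # completed/removed concurrently; skip it
        lines.append(f'{ev_id}: {ev}')
    return lines
```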