
cephadm: show error message if private registry credentials not provided

Raise UnauthorizedRegistryError in `_pull_image` if the user tries to pull from a private registry without authentication, and handle the error in `command_bootstrap`, `command_adopt`, and `command_pull`.
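
A minimal sketch of the raise-and-handle pattern, not the actual cephadm code. The exception name comes from this commit; `pull_image`, its parameters, the `private_registries` tuple and the `registry.example.com` host are hypothetical stand-ins used only for illustration.

```python
import sys


class UnauthorizedRegistryError(Exception):
    """Raised when a pull is rejected because no credentials were supplied."""


def pull_image(image, registry_login=None,
               private_registries=("registry.example.com",)):
    # Hypothetical stand-in for a real container pull: treat any image hosted
    # on a "private" registry as requiring a login.
    registry = image.split("/", 1)[0]
    if registry in private_registries and registry_login is None:
        raise UnauthorizedRegistryError(
            f"pulling {image} requires registry credentials; none were provided"
        )
    return f"pulled {image}"


def command_pull(image, registry_login=None):
    # Convert the exception into a readable message and a non-zero exit code
    # instead of letting a traceback reach the user.
    try:
        print(pull_image(image, registry_login))
    except UnauthorizedRegistryError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    return 0
```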

Fixes: https://tracker.ceph.com/issues/55015
Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 4de0803ba893abf341ab634d1382208370de7c98)

mgr/cephadm: improving logging to send errors to stderr
Fixes: https://tracker.ceph.com/issues/47905
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 7f09307a614b313908a545a1d26e28e3e704e321)

mgr/cephadm: support non-root ssh-user w permissions

Restructure the code so that, for a non-root ssh-user, the resulting file
is created with permissions set to the ssh-user. This allows the
subsequent scp to write the file. The remaining code is kept the
same, so file permissions are still restored to the expected ones, but
only after the scp has run.

Fixes: https://tracker.ceph.com/issues/54620
Signed-off-by: Christoph Glaubitz <c.glaubitz@syseleven.de>
(cherry picked from commit 452e52a7e39409e3409d59940133333416b830bc)

mgr/cephadm: fixing public network conf parsing
Fixes: https://tracker.ceph.com/issues/55132
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 3ef6341e8ef5fe6a01f15c847f6bc9e2205d4d97)

ceph-volume/tests: reject loop devices in lvm.conf

The current task doesn't work (typo?).
Otherwise api/lvm.py can't work properly: functions such as
`get_single_lv()` and many others don't return the expected results.
Indeed, lvm is confused by the nvme_loop setup.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a5fab15e44517ac63f3fd257989e81b8127b86d9)

ceph-volume: do not leave pv when zapping osds

when zapping a device and no vg/lv are left, the pv should be
removed too.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7f007e7fc75b4d6e7465c684f7e5b2458883dcc5)

orchestrator: support complex osd creation

This adds support for complex OSD creation with the command
`orch daemon add osd`.
Any argument supported by `DriveGroupSpec()` can be passed on the command line.

Usage:
```
ceph orch daemon add osd host:data_devices=device1,device2,db_devices=device3,osds_per_device=2,...
```
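
One way to read the argument format above, as a rough sketch rather than the orchestrator's real parser: the text before the first `:` is the host, and each `key=value` starts a new option whose value list continues until the next `key=`. The function name `parse_osd_spec` and the choice to keep every value as a list of strings are assumptions made for this example.

```python
def parse_osd_spec(arg):
    """Split 'host:key=v1,v2,key2=v3' into (host, {key: [values]})."""
    host, _, rest = arg.partition(":")
    spec = {}
    key = None
    for token in rest.split(","):
        if "=" in token:
            # A token containing '=' opens a new option.
            key, value = token.split("=", 1)
            spec[key] = [value]
        elif key is not None:
            # A bare token is another value for the most recent option.
            spec[key].append(token)
        else:
            raise ValueError(f"expected key=value, got {token!r}")
    return host, spec


host, spec = parse_osd_spec(
    "host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2"
)
print(host, spec)
```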

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8aa2f4745adff0ba3c7a0731cf48ccc1c85b33f3)

DriveSelection: skip unavailable devices

Cephadm shouldn't try to deploy a disk reported as unavailable by ceph-volume.
The idea here is to check the rejection reason so we can still use DB devices
in case of OSD replacement.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a88547559769f4dd438f6557cef22ef9004fa2a)

ceph-volume: various fixes in arg_validators

if a device with an FS is passed, ceph-volume should abort
the OSD creation.

Fixes: https://tracker.ceph.com/issues/54535
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f4b830dcfb45eda81eabf18a8461ac4e1bf642e)

doc/cephadm: fix a typo

s/osd_crush_choose_leaf_type/osd_crush_chooseleaf_type

```
[ceph: root@adm-1 /]# ceph config set global osd_crush_choose_leaf_type 0
Error EINVAL: unrecognized config option 'osd_crush_choose_leaf_type'
[ceph: root@adm-1 /]# ceph config set global osd_crush_chooseleaf_type 0
[ceph: root@adm-1 /]#
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d43189c17b03420674ea5424666388b8272c2580)

ceph-volume/tests: speed up tox tests

Let's use `--numprocesses=auto` in order to speed up the unit tests execution.

See the difference, without `--numprocesses=auto`:
```

... omitted output ...

real    1m22.884s
user    0m23.003s
sys     0m20.504s
```

with `--numprocesses=auto`:

```

... omitted output ...

real    0m18.767s
user    0m33.056s
sys     0m23.244s
```
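
As a quick check of the two `real` timings above, the following sketch converts them to seconds and computes the wall-clock speedup, which comes out at roughly 4.4x. The helper name `to_seconds` is made up for this example.

```python
import re


def to_seconds(t):
    """Convert a `time`-style value such as '1m22.884s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


serial = to_seconds("1m22.884s")    # without --numprocesses=auto
parallel = to_seconds("0m18.767s")  # with --numprocesses=auto
speedup = serial / parallel
print(f"wall-clock speedup: {speedup:.2f}x")
```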

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cd5eb7939ed92b584c45689a3169847811b8518d)

mgr/cephadm: generate one c-v raw prepare cmd per data device in raw mode

Fixes: https://tracker.ceph.com/issues/54522
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit b6556e5dbd21192c9207faf84c96f32bd8877d18)

mgr/cephadm: check spec host when adding osd
Fixes: https://tracker.ceph.com/issues/47872
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit b87c966697d36ef51f1e62425d77200667e651ae)

mgr/cephadm: adding HostSpec validation
Fixes: https://tracker.ceph.com/issues/54342
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 15ba147a2a4cae8ca69437382136d328a1f416f2)

mgr/cephadm: Show warning when user provides --fsid option
Fixes: https://tracker.ceph.com/issues/50804
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 8780aa04651fa2cddeec1d9d2dfcf4e08412d4ce)

doc/cephadm/operations.rst: fix typos

Signed-off-by: wangyunqing <wangyunqing@inspur.com>
(cherry picked from commit 92eb799a952db4f2fe2290aef56d2f66b8f64802)

mgr/cephadm: fixing prometheus port handling
Fixes: https://tracker.ceph.com/issues/51072
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 8eb1397d77dace25f387e88137a1807993a0796d)

mgr/cephadm: checking service name before removal
Fixes: https://tracker.ceph.com/issues/54503
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit b26c114c8456941d6cccf7d4355445f21cb373a7)

cephadm: respect --skip-firewalld flag

Fixes: https://tracker.ceph.com/issues/54137
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit d97057f8d7263cce8efc0857e3fe4a10faee30c8)

cephadm: verify config file exists when inferring it

Fixes: https://tracker.ceph.com/issues/54571
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 1568875a281d56b413e75b244c9c75311cf353a0)

python-common/drive_group: add extra_container_args to supported features

Should have been added when extending extra container args
to all the services but was missed

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit f036bdaf5a1e5f6b18a9591949be878fea8bb70d)

mgr/cephadm: add keep-alive requests to ssh connections

Fixes: https://tracker.ceph.com/issues/51733
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 445425ceccaab0cef9c04b795a8fe0236f56d9eb)

doc: Add note to osds_per_device description about dual-actuator devices

This commit adds information about using dual-actuator devices with the
osds_per_device drive group option, letting users know they can create
an OSD for each actuator by setting this value to 2 in the drive group
they're using to apply OSDs to the device.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
(cherry picked from commit 882fc277189acd0f528dfb271268474a06d373a7)

doc/cephadm/adoption.rst: fix typos

Signed-off-by: wangyunqing <wangyunqing@inspur.com>
(cherry picked from commit e4db28f6b294909e0f177e82dbda8cfcc8129846)

mgr/cephadm: fixing MDSSpec ctr
Fixes issue: https://tracker.ceph.com/issues/54487

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit eb4e3f2494e8706cbf5f7c60a37dcb13bca0d83f)

qa/tasks/cephfs: increase timeout in test_nfs.py

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
(cherry picked from commit 44ad552093b4f0dc21563dd9f804974ade239440)

mgr/cephadm: block draining last _admin host

Fixes: https://tracker.ceph.com/issues/54413
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit a0d21c7108e8f95b541bdb7653d2595f68e42520)

mgr/cephadm: block removing last instance of _admin label

Fixes: https://tracker.ceph.com/issues/54425
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit fbe0c3fd23f9005986959bade149093c340f6238)

cephadm/box: default add hosts

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit dd1b5eb38ce891c9d0786b48c42152c6cade9b62)

mgr/cephadm: extend extra_container_args to other service types

Otherwise, without this change, this can only be used for mgr,
mon and crash (daemons without their own service spec class)

Fixes: https://tracker.ceph.com/issues/54390
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit d3c14a17dc5cafef199f4fc3ce657bab54d89b4a)

cephadm: still set container_image when --no-assimilate-config is provided

Fixes: https://tracker.ceph.com/issues/54141
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 59d004cb901eb6d84fb6907cb88314fd31b87904)

mgr/cephadm: reduce log level for asyncssh error messages

Fixes: https://tracker.ceph.com/issues/54132
Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 95d5db0f4297286c420057ac10f1b63d3116eace)

mgr/cephadm: Show an error when invalid format
Fixes: https://tracker.ceph.com/issues/54198
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit d2fa22fd77d4a20f5db0d9315e5efebb016de481)

mgr/cephadm: using MDSSpec instead of ServiceSpec
Fixes: https://tracker.ceph.com/issues/54184
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit db765bd80608b7c6930a5111eb006b5d12f73de2)

mgr/cephadm: Adding AGE field to device ls cmd
Fixes: https://tracker.ceph.com/issues/53540
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c5b3e86f9b8ae0ca3ae41798dfa18e9ffe9fcb7)

cephadm: chown the prometheus data dir during redeploy

some builds of prometheus run with a uid 65534 (nobody) where other
builds of prometheus run with a uid of 0 (root)

Fixes: https://tracker.ceph.com/issues/54159
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 21fb80aaab0b333d997d8241e17cf9749a37e065)

mgr/cephadm: Delete ceph.target if last cluster
Fixes: https://tracker.ceph.com/issues/46655
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit f2a916b985ce3fef103fb1385159d57f3788c888)

doc/cephadm: Add CentOS Stream install instructions

Signed-off-by: Patrick C. F. Ernzer <pcfe@pcfe.net>
(cherry picked from commit 7f243262f9d64768e3ee12a9328fc36245bb244f)

mgr/cephadm: Adding logic to cleanup several dirs after an rm-cluster
Fixes: https://tracker.ceph.com/issues/53010
https://tracker.ceph.com/issues/53815
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0df6c04d8f99692a542b49c22bddbf12510801a5)

doc/cephadm: fixing cluster purging section
https://tracker.ceph.com/issues/54018
ceph orch is not enough to stop all cephadm operations

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 5a4f5fb29ed88110f64ffed0187902ffa3368880)

Merge pull request #46065 from adk3798/quincy-add-natsort

quincy: mgr/cephadm: Adding python natsort module

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #45901 from ivancich/wip-55043-quincy

quincy: cls/rgw: rgw_dir_suggest_changes detects race with completion

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #45992 from idryomov/wip-make-check-enable-rbd-caches-quincy

quincy: run-make-check.sh: enable RBD persistent caches

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #45966 from tchaikov/quincy-pr-45916

quincy: cmake/modules: always use the python3 specified in command line

Reviewed-by: David Galloway <dgallowa@redhat.com>

Merge pull request #45913 from idryomov/wip-resurrect-mutex-debug-quincy

quincy: cmake: resurrect mutex debugging in all Debug builds

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #45875 from ideepika/wip-ninja-default-quincy

quincy: ceph.spec: make ninja-build package install always

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #45927 from kotreshhr/wip-55348-quincy

quincy: mgr/volumes: Show clone failure reason in clone status command

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #45291 from joscollin/wip-54480-quincy

quincy: mgr/stats: be resilient to offline MDS rank-0

Reviewed-by: Venky Shankar <vshankar@redhat.com>

mgr/cephadm: Adding python natsort module
Needed by: https://tracker.ceph.com/issues/54026

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 7e646d99589205633a21e19cbdc11b7999ae5da1)

Conflicts:
debian/control

Merge pull request #45896 from idryomov/wip-persistent-cache-status-quincy

quincy: rbd persistent cache UX improvements (status report, metrics, flush command)

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #45766 from cbodley/wip-55175

quincy: cmake: WITH_SYSTEM_UTF8PROC defaults to OFF

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #45665 from s0nea/wip-55042-quincy

quincy: mgr/cephadm: try to get FQDN for configuration files

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #45625 from ljflores/wip-quincy-test-cli-timeout

quincy: qa/tasks/cephadm_cases: increase timeouts in test_cli.py

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #45595 from ronen-fr/wip-rf-44050-quincy

quincy: osd/scrub: ignoring unsolicited DigestUpdate events

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #45568 from mgfritch/backport-45420-quincy

quincy: cephadm: infer the default container image during pull

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #45359 from mgfritch/backport-45347-quincy

quincy: cephadm: preserve `authorized_keys` file during upgrade

Reviewed-by: Adam King <adking@redhat.com>

cmake/modules: use exact version of python3 when finding cython

* CMakeLists.txt:
    always pass "EXACT" to find_package(Python3).
    because per cmake document, "EXACT" only takes effect when
    <Package>_FIND_VERSION_COUNT is greater than 1, where <Package>
    is "Python3". see also cmake/modules/FindPython/Support.cmake
* cmake/modules/AddCephTest.cmake:
    drop redundant find_package(Python3) calls. since Python3 is
    a mandatory requirement for building Ceph, we only need a
    single call of find_package(Python3..) in the top of the source
    tree. the only possible case to repeat it is to ensure that we
    have the correct version of Python3 used in following CMake
    script. but there is no need to repeat it if we just want to
    ensure that we have a python3 interpreter in place.
* cmake/modules/Distutils.cmake:
    always pass "EXACT" to find_package(Python3).
    we should always pass EXACT to find_package() when finding python3,
    this is a follow-up of e2babdfae8c99f39f99a7c8a8f966299b2e62b19

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit ea4ae6d2f17ae8dcfb3d6f215d53b3f82a99270d)

Merge pull request #45988 from zdover23/wip-doc-os-recommendations-backport-quincy-3

quincy: doc/start: add testing support information

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #45905 from idryomov/wip-rbd-mirror-test-timer-lock-quincy

quincy: test/rbd_mirror: grab timer lock before calling add_event_after()

Reviewed-by: Christopher Hoffman <choffman@redhat.com>

run-make-check.sh: enable RBD persistent caches

This was attempted in commit 69a7ed4eab36 ("run-make-check: enable
WITH_RBD_RWL when WITH_PMEM is true") but never completed. We soon
bumped the requirement on libpmem, so WITH_SYSTEM_PMDK=ON wouldn't
have worked anyway.

Enable the RWL mode conditionally based on WITH_RBD_RWL variable.
Enable the SSD mode unconditionally as it has no special dependencies
and can be built on any architecture.

Fixes: https://tracker.ceph.com/issues/55285
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0f1634a21f5da2250915d8ac05a6f179d4e76d03)

Conflicts:
run-make-check.sh [ commit 57edb76ea468 ("build: Add some
debugging messages") not in quincy ]

test/encoding/check-generated.sh: show diff if binary reencode check fails

Take bf0b161115aa ("test/encoding/check-generated.sh: show diff if cmp
fails") a bit further. Suggesting "cmp $tmp1 $tmp2" isn't very helpful
since cmp would report just the mismatch offset.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 59d928a06c028bf381307001d2b68fa8545d8fc4)

librbd/cache/pwl: WriteLogCacheEntry constructor must initialize flags

Initializing the individual bit field members leaves the remaining two
bits uninitialized and that garbage state gets persisted.

In general, using bit fields in a structure where the layout actually
matters is not desirable.  Even with a few single bits, such as here,
their order, strictly speaking, is not guaranteed:

    An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains, a bit-field
    that immediately follows another bit-field in a structure shall be
    packed into adjacent bits of the same unit. If insufficient space
    remains, whether a bit-field that does not fit is put into the next
    unit or overlaps adjacent units is implementation-defined. The
    order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined.
    The alignment of the addressable storage unit is unspecified.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 91d270b210a908ea2f3578dd7db3263383da95a8)

librbd/cache/pwl: initialize generate_test_instances() objects

... to prevent check-generated.sh failures such as:

**** librbd::cache::pwl::WriteLogPoolRoot test 1 dump_json check failed ****
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 dump_json > /tmp/typ-cAoWrqlHC
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 encode decode dump_json > /tmp/typ-ES5yHpfGL
5c5
<     "flushed_sync_gen": 0,
---
>     "flushed_sync_gen": 255,

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2c131f57d63454de39210375ce75a282df6fe365)

librbd/cache/pwl: fix -Wunused-lambda-capture warnings

Reported by clang on "make check" and "make check arm64" builds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 753aa038fdbc26a2ee0978f54d3f7dcfa052e833)

doc/start: add testing support information

This PR adds information about support for testing,
and information about which distros the Ceph project
builds packages for.

This is one in a series of PRs including the following:

https://github.com/ceph/ceph/pull/45385
https://github.com/ceph/ceph/pull/45764

This PR specifically includes the information that Ernesto
Puerta collected here:
https://github.com/ceph/ceph/pull/45385#pullrequestreview-911766656

Signed-off-by: Zac Dover <zac.dover@gmail.com>
(cherry picked from commit 0364f3afcccc85d190237b0a74b4deeefa4738f3)

cmake/modules: always use the python3 specified in command line

if another python3 with higher version is found by
find_package(Python3), the cmake's install script would just
install the python modules/extensions into that python3's
dist-package directory, and the packaging script would fail
to find these artifacts when trying to package them.

so we need to ensure that the install directories for python
modules/extensions are always "versioned" with the WITH_PYTHON3
cmake option.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit e2babdfae8c99f39f99a7c8a8f966299b2e62b19)

qa: make test_perf_stats_stale_metrics check only the clients created for the tests

Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.

Fixes: https://tracker.ceph.com/issues/54971
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 1621308214cd18750c8be803fc014bdf73e2218a)

qa: test `ceph fs perf stats` doesn't output stale metrics

Test that `ceph fs perf stats` doesn't output stale metrics
after the rank0 MDS failover.

Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 116e89a2f2849ed7cb711d1ae465c6f510b2810d)

mgr, pybind/mgr, mgr/stats: be resilient to offline rank0 MDS

Reregister the user queries during the rank0 MDS failover event
by calling listener.handle_query_updated(). This enables
`ceph fs perf stats` to receive the updated metrics after the
failover.

Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit c2470f271cce4d512f2cf00552c9b753e4c69f71)

17.2.0

Merge pull request #45932 from adk3798/revert-pids-limit-quincy

quincy: cephadm: Revert pids limit

Revert "cephadm: remove containers pids-limit"

This reverts commit 7c1214f38091dde0ba2c5e0557dcd98f97f91302.

Signed-off-by: Adam King <adking@redhat.com>

Revert "qa/suites/orch/cephadm: restrict test_iscsi_pids_limit to CentOS"

This reverts commit 355a819d3a65ef05ccc078fcb58eca4c84dac573.

Signed-off-by: Adam King <adking@redhat.com>

doc: Document the clone failure status

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 555d3610635f4206a21f7b41deabffdf2136ccbc)

mgr/volumes: Fix clone hang issue

The following sequence of operations leads to a deadlock:

1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot
5. Delete the snapshot from the back end (without using the subvolume
   interface) before the clone completes
6. Delete the clone with force
7. Delete the subvolume
8. Delete the fs and associated pools
9. Create a new fs
10. Create a new subvolume
11. Write some I/O to the subvolume
12. Create a snapshot of the subvolume
13. Create a clone of the snapshot <---------------THIS OPERATION HANGS -----------------

Root Cause:
Since the snapshot is deleted from the back end, the clone fails. But it
also fails to remove the clone index at '/volumes/_index/clone'. The
cloner thread goes into an infinite loop of starting the clone and failing.
Each iteration takes 'self.async_job.lock()', reads the clone index
to get the job, and registers that job.

While the cloner thread is in the above loop, the fs is destroyed. The
cloner thread, which lives as long as mgr/volumes is enabled in the mgr,
takes 'self.async_job.lock()' and hangs while reading the clone index.

Any further clone operation that also requires the above lock hangs.

Fix:
Remove the clone index even when the snapshot is not present.
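
A toy model of the idea, not the mgr/volumes implementation: the clone index is represented as a plain dict from clone name to source snapshot, and `run_cloner` and `max_attempts` are hypothetical names. The point it tries to show is that removing the index entry on failure as well as on success lets the loop drain instead of retrying the same job forever.

```python
def run_cloner(clone_index, snapshots, max_attempts=5):
    """Process queued clone jobs and return {clone_name: final_state}."""
    results = {}
    attempts = 0
    # max_attempts only bounds this toy loop; it is not part of the fix.
    while clone_index and attempts < max_attempts:
        attempts += 1
        name, snapshot = next(iter(clone_index.items()))
        if snapshot in snapshots:
            results[name] = "complete"
        else:
            results[name] = "failed"
        # Drop the index entry in both cases. Leaving it in place when the
        # snapshot is missing is what would make the job get picked up again.
        del clone_index[name]
    return results


index = {"clone_0": "snapshot_0"}
states = run_cloner(index, snapshots=set())
print(states, index)
```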

Fixes: https://tracker.ceph.com/issues/55217
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit cfd8d63e0158b1d336f5e080a1e83dbf6fc60079)

qa: Add test for clone failure status

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 916a5981cf2912e2b5f318295cb1af83e34f9183)

mgr/volumes: Add clone failure reason in clone status

Add the clone failure reason in the clone status.
The sample output is as below:

$ ceph fs clone status cephfs clone_0
{
  "status": {
    "state": "failed",
    "source": {
      "volume": "cephfs",
      "subvolume": "subvolume_0",
      "snapshot": "snapshot_0",
      "size": "52428800"
    },
    "failure": {
      "errno": "2",
      "error_msg": "snapshot 'snapshot_0' does not exist"
    }
  }
}
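
A small sketch of how a caller might pull the failure reason out of output shaped like the sample above. The function name `clone_failure_reason` is made up for this example, and the JSON literal simply repeats the sample.

```python
import json

SAMPLE = """
{
  "status": {
    "state": "failed",
    "source": {
      "volume": "cephfs",
      "subvolume": "subvolume_0",
      "snapshot": "snapshot_0",
      "size": "52428800"
    },
    "failure": {
      "errno": "2",
      "error_msg": "snapshot 'snapshot_0' does not exist"
    }
  }
}
"""


def clone_failure_reason(status_json):
    """Return (errno, message) for a failed clone, or None otherwise."""
    status = json.loads(status_json)["status"]
    failure = status.get("failure")
    if status.get("state") != "failed" or failure is None:
        return None
    # errno is reported as a string in the sample, so convert it here.
    return int(failure["errno"]), failure["error_msg"]


print(clone_failure_reason(SAMPLE))
```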

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit e0ba36f3344a3633b0343565b63472bbf94538b2)

cmake: resurrect mutex debugging in all Debug builds

Commit 403f1ec2888a ("cmake: make "WITH_CEPH_DEBUG_MUTEX" depend on
CMAKE_BUILD_TYPE") made WITH_CEPH_DEBUG_MUTEX depend on build type
being set to Debug, in CMakeLists.txt.  However, if CMAKE_BUILD_TYPE
isn't specified by the user, we may still set it to Debug later, in
src/CMakeLists.txt, and in that case WITH_CEPH_DEBUG_MUTEX doesn't
get enabled.  The result is that

  $ do_cmake.sh -DCMAKE_BUILD_TYPE=Debug ...

debug builds have mutex debugging enabled, while

  $ do_cmake.sh ...

builds, which are supposed to be the same, don't.  Jenkins builders
don't pass -DCMAKE_BUILD_TYPE=Debug so that commit effectively turned
off all ceph_mutex_is_locked* asserts in "make check".

Fixes: https://tracker.ceph.com/issues/55318
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 226e614c95f1a1dcd79fa2011012ced1de19e6d3)

Merge pull request #45885 from markhpc/quincy-bs-avl-cursor-fix

quincy: os/bluestore: Always update the cursor position in AVL near-fit search.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

test/rbd_mirror: grab timer lock before calling add_event_after()

add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").

Fixes: https://tracker.ceph.com/issues/55317
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 60e16106837e0d23366709f70f39c4f1ae7a2a45)

os/bluestore: Always update the cursor position in AVL near-fit search.

Signed-off-by: Mark Nelson <mnelson@redhat.com>
(cherry picked from commit 3bed53debfa2f9ec9d31021ce7eaf8b78f78f9e0)

test/cls/rgw: test dir_suggest after successful completion

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit a350888bf9d812db36b3bdf6d1e4ee7469964fad)

cls/rgw: rgw_dir_suggest_changes detects race with completion

if bucket listing races with a pending index transaction, its suggested
removal may be mistakenly applied if that index transaction completes
before the osd receives this suggestion

in `rgw_dir_suggest_changes()`, the sole condition for applying a
suggested change is that the `cur_disk.pending_map` is empty. this is
true after rgw_bucket_complete_op()

on index completion, `rgw_bucket_dir_entry::index_ver` is updated to match
the new value of `rgw_bucket_dir_header::ver`. because most of `struct
rgw_bucket_dir_entry` makes the round trip through bucket listing ->
dir_suggest, we have access to the index_ver of the suggested entry. by
comparing this against the stored entry, we can ignore any suggestions
that were sent before the most recent completion
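
The comparison described above can be sketched roughly as follows. This is a Python illustration with dicts standing in for `rgw_bucket_dir_entry`; the function name and the exact comparison are assumptions, not the cls/rgw code.

```python
def should_apply_suggestion(stored_entry, suggested_entry):
    """Decide whether a dir_suggest change should be applied to the index."""
    # Original condition: never apply while an index transaction is pending.
    if stored_entry["pending_map"]:
        return False
    # Added condition: a suggestion carrying an older index_ver was generated
    # before the most recent completion, so it is stale and is ignored.
    return suggested_entry["index_ver"] >= stored_entry["index_ver"]


stored = {"pending_map": {}, "index_ver": 7}
print(should_apply_suggestion(stored, {"index_ver": 6}))
print(should_apply_suggestion(stored, {"index_ver": 7}))
```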

Fixes: https://tracker.ceph.com/issues/54528
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit aa381b6765b0fb316976c4af7a45f32a157a4f75)

Merge pull request #45780 from vshankar/wip-55110-quincy

quincy: mount.ceph: remove `ms_mode' mount option when switching to old-syntax

Reviewed-by: Xiubo Li <xiubli@redhat.com>

librbd/cache/pwl: remove RBD_FEATURE_DIRTY_CACHE check in DiscardRequest

"m_image_ctx.features &&RBD_FEATURE_DIRTY_CACHE" is obviously wrong
because it would pretty much always be true.  However, even if bitwise
AND was used, this check would still be dead because DiscardRequest is
only invoked if RBD_FEATURE_DIRTY_CACHE is enabled:

  int invalidate_cache(ImageCtx *ictx) {
  {
    ...
    // Delete writeback cache if it is not initialized
    if ((!ictx->exclusive_lock ||
         !ictx->exclusive_lock->is_lock_owner()) &&
        ictx->test_features(RBD_FEATURE_DIRTY_CACHE)) {
      C_SaferCond ctx3;
      ictx->plugin_registry->discard(&ctx3);
      r = ctx3.wait();
    }
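
To see why the logical form is "pretty much always true", here is a small Python analogue in which `and` plays the role of C++ `&&`. The flag values below are chosen for illustration only.

```python
RBD_FEATURE_LAYERING = 1 << 0      # illustrative value
RBD_FEATURE_DIRTY_CACHE = 1 << 8   # illustrative value

# An image that has some feature enabled, but not the dirty-cache one.
features = RBD_FEATURE_LAYERING

# Logical AND only asks whether both operands are non-zero.
logical = bool(features and RBD_FEATURE_DIRTY_CACHE)
# Bitwise AND asks whether the specific bit is set.
bitwise = bool(features & RBD_FEATURE_DIRTY_CACHE)

print(logical, bitwise)
```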

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit aee78bbb9d7edd606a8a235c57b2b704d7b94e4c)

librbd/cache/pwl: don't crash if cache file removal fails

The non-ec overload will throw fs::filesystem_error on any error
(e.g. EPERM due to unprivileged "rbd persistent-cache invalidate"
being brought up against a privileged workload).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 63197ff7003fa9e595527a7431f9f3f6790f7d57)

rbd: add persistent-cache flush command

Add a flush command so that users can manually flush cache.

[ idryomov: error messages, incorporate doc and help.t hunks, drop
do_persistent_cache_flush() ]

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 644fbc9fcc8f12eb93d5cc20054cd8598ab001b7)

rbd: rename image-cache invalidate command

Rename the image-cache command to persistent-cache and refactor the code
of the invalidate command.

[ idryomov: error message, incorporate doc and help.t hunks, drop
do_persistent_cache_invalidate() ]

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 05bfe10ad9fde533aa728f9aa0cc8a8f155c03c5)

librbd/cache/pwl: rename persistent cache key

librbd "internal" metadata keys were changed to use the ".rbd" prefix. Change
the persistent cache key to use ".rbd" too.
The name of the persistent cache key is IMAGE_CACHE_STATE. Since
this key is planned to be used outside the pwl directory, it seems
more appropriate to change it to a clearer name, PERSISTENT_CACHE_STATE.

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit bd66fdda910f02ffe91bb026f82a85f28a6ff225)

rbd: include persistent cache metrics in "rbd status" report

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e996fd80601ec8c309c1517f33171e88a2f31cad)

rbd: factor out get_percentage() helper

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9324ab94711dbe9a1265643adcc79ae0a3cba812)

librbd/cache/pwl: no need to set clean and empty in remove_pool_file()

It is redundant -- the only caller sets both since commit 6593e31fff18
("librbd/cache/pwl: correct cache state").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d64a3ae265897806809d9fa08ac72c549b4bca4f)

librbd/cache/pwl: avoid inconsistencies in ImageCacheState

When empty and/or clean bools are updated in I/O handling code paths,
ImageCacheState becomes inconsistent for a short while: e.g. with clean
transitioned to true, dirty_bytes counter could still be positive
because the counters are updated only in periodic_stats(). Move to
updating the counters in update_image_cache_state(Context*) to avoid
this.

update_image_cache_state(Context*) now requires m_lock -- most call
sites already hold it anyway. The only problematic call site was
AbstractWriteLog::shut_down() callback chain: perf_stop() needed to
be moved to the very end since perf counters must be alive now for
update_image_cache_state() to work.

Don't override expect_op_work_queue() in unit tests: completing
context in the same thread now results in a deadlock on m_lock in
all test cases that call AbstractWriteLog::init().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 016882925a63f4f03a9c445d008b2325d479bc30)

librbd/cache/pwl: handle invalid ImageCacheState json

get_json_format() and create_image_cache_state() attempt to get
particular keys which could result in an unhandled std::runtime_error
exception. Conversely, ImageCacheState constructor just swallows that
exception which could leave the newly constructed object incorrectly
initialized. Avoid doing parsing in the constructor and introduce
init_from_config() and init_from_metadata() methods instead.

While at it, move everything out from under "persistent_cache" key.
Also fix init_state_json_write test case which stopped working now
that types are enforced by json_spirit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7678ee2490965a8a73c02a47283adaa5036dbcab)

librbd/cache/pwl: add basic metrics to ImageCacheState

Add basic metrics to ImageCacheState and persist them, including
allocated_bytes, cached_bytes, dirty_bytes, free_bytes and hit/miss
info.

Leverage periodic_stats() timer to call update_image_cache_state.
In order to avoid outputting too much debug information, the original
statistics output log level is changed to 5.

Switch to json_spirit for encoding because encode_json encodes bool as
"true"/"false" string.

Remove rbd_persistent_cache_log_periodic_stats option because we need
to always update cache state.
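
On the switch to json_spirit mentioned above: the difference between a JSON boolean and a "true"/"false" string matters to any consumer that type-checks the value. The sketch below uses Python's `json` module purely as a stand-in to show the two encodings; it does not use json_spirit or encode_json, and the `clean` key is just an example field.

```python
import json

as_bool = json.dumps({"clean": False})       # real JSON boolean
as_string = json.dumps({"clean": "false"})   # boolean stored as a string

decoded_bool = json.loads(as_bool)["clean"]
decoded_string = json.loads(as_string)["clean"]

# The string form decodes to a non-empty str, which is truthy even though
# it spells "false".
print(type(decoded_bool).__name__, bool(decoded_bool))
print(type(decoded_string).__name__, bool(decoded_string))
```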

[ idryomov: add cached_bytes and hits_partial; report misses and
miss_bytes instead of respective totals; naming ]

Fixes: https://tracker.ceph.com/issues/50614
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 769f3a06ecf85249c1473cbb6bab7503beb1ba78)

librbd/cache/pwl: correct cache state

Update the cache state after the dirty_entries or log_entries list is updated.

Fixes: https://tracker.ceph.com/issues/50614
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 6593e31fff180ec4123e37107c88eb39f7d10fdf)

Merge pull request #45857 from ljflores/wip-quincy-55269

quincy: mgr/telemetry: anonymize daemons in telemetry `perf_counters`

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

build: make ninja-build package install always

We use ninja as the default build tool now, so having it installed only when
make check is enabled may make builds fail if they are run without make check.

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
(cherry picked from commit fa7821dd00b1577391db9518bbef0c7618b78ade)

Merge pull request #45627 from ljflores/wip-55051-quincy

quincy: admin/doc-requirements: bump sphinx to 4.4.0

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>

mgr/telemetry: fix daemon anonymization in perf_counters

Anonymized daemons now appear with a SHA1 digest instead of their
original identifier, e.g.:

    "perf_counters": {
        "mon.1b1b829ba9298527f4934053a4742a1710937007": {
            "mon": {
                "election_call": {
                    "value": 1
                },
                ...
                "session_trim": {
                    "value": 0
                }
            },
        ...
        }
    ...
    }
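
A rough sketch of how a daemon name could be mapped to the `type.<sha1 digest>` form shown above. The function name `anonymize_daemon`, the use of a salt and the way it is combined with the daemon id are assumptions made for illustration; they are not taken from the telemetry module.

```python
import hashlib


def anonymize_daemon(name, salt="hypothetical-salt"):
    """Replace the id part of 'type.id' with a SHA1 hex digest."""
    daemon_type, _, daemon_id = name.partition(".")
    # Assumed scheme: hash the salt together with the daemon id.
    digest = hashlib.sha1((salt + daemon_id).encode("utf-8")).hexdigest()
    return f"{daemon_type}.{digest}"


print(anonymize_daemon("mon.a"))
```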

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit 2f4cc770e7ac9767d6d3be51c1de03f6014a6f98)