mgr/cephadm: block draining last _admin host

Fixes: https://tracker.ceph.com/issues/54413
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit a0d21c7108e8f95b541bdb7653d2595f68e42520)

mgr/cephadm: block removing last instance of _admin label

Fixes: https://tracker.ceph.com/issues/54425
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit fbe0c3fd23f9005986959bade149093c340f6238)

cephadm/box: default add hosts

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit dd1b5eb38ce891c9d0786b48c42152c6cade9b62)

mgr/cephadm: extend extra_container_args to other service types

Otherwise, without this change, this can only be used for mgr,
mon and crash (daemons without their own service spec class)

Fixes: https://tracker.ceph.com/issues/54390
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit d3c14a17dc5cafef199f4fc3ce657bab54d89b4a)

cephadm: still set container_image when --no-assimilate-config is provided

Fixes: https://tracker.ceph.com/issues/54141
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 59d004cb901eb6d84fb6907cb88314fd31b87904)

mgr/cephadm: reduce log level for asyncssh error messages

Fixes: https://tracker.ceph.com/issues/54132
Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 95d5db0f4297286c420057ac10f1b63d3116eace)

mgr/cephadm: Show an error when invalid format
Fixes: https://tracker.ceph.com/issues/54198
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit d2fa22fd77d4a20f5db0d9315e5efebb016de481)

mgr/cephadm: using MDSSPec instead of ServiceSpec
Fixes: https://tracker.ceph.com/issues/54184
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit db765bd80608b7c6930a5111eb006b5d12f73de2)

mgr/cephadm: Adding AGE field to device ls cmd
Fixes: https://tracker.ceph.com/issues/53540
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c5b3e86f9b8ae0ca3ae41798dfa18e9ffe9fcb7)

cephadm: chown the prometheus data dir during redeploy

some builds of prometheus run with a uid 65534 (nobody) where other
builds of prometheus run with a uid of 0 (root)

Fixes: https://tracker.ceph.com/issues/54159
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 21fb80aaab0b333d997d8241e17cf9749a37e065)

mgr/cephadm: Delete ceph.target if last cluster
Fixes: https://tracker.ceph.com/issues/46655
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit f2a916b985ce3fef103fb1385159d57f3788c888)

doc/cephadm: Add CentOS Stream install instructions

Signed-off-by: Patrick C. F. Ernzer <pcfe@pcfe.net>.
(cherry picked from commit 7f243262f9d64768e3ee12a9328fc36245bb244f)

mgr/cephadm: Adding logic to cleanup several dirs after an rm-cluster
Fixes: https://tracker.ceph.com/issues/53010
https://tracker.ceph.com/issues/53815
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0df6c04d8f99692a542b49c22bddbf12510801a5)

doc/cephadm: fixing cluster purging section
https://tracker.ceph.com/issues/54018
ceph orch is not enough to stop all cephadm operations

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 5a4f5fb29ed88110f64ffed0187902ffa3368880)

Merge pull request #46065 from adk3798/quincy-add-natsort

quincy: mgr/cephadm: Adding python natsort module

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #45901 from ivancich/wip-55043-quincy

quincy: cls/rgw: rgw_dir_suggest_changes detects race with completion

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #45992 from idryomov/wip-make-check-enable-rbd-caches-quincy

quincy: run-make-check.sh: enable RBD persistent caches

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #45966 from tchaikov/quincy-pr-45916

quincy: cmake/modules: always use the python3 specified in command line

Reviewed-by: David Galloway <dgallowa@redhat.com>

Merge pull request #45913 from idryomov/wip-resurrect-mutex-debug-quincy

quincy: cmake: resurrect mutex debugging in all Debug builds

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #45875 from ideepika/wip-ninja-default-quincy

quincy: ceph.spec: make ninja-build package install always

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #45927 from kotreshhr/wip-55348-quincy

quincy: mgr/volumes: Show clone failure reason in clone status command

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #45291 from joscollin/wip-54480-quincy

quincy: mgr/stats: be resilient to offline MDS rank-0

Reviewed-by: Venky Shankar <vshankar@redhat.com>

mgr/cephadm: Adding python natsort module
Needed by: https://tracker.ceph.com/issues/54026

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 7e646d99589205633a21e19cbdc11b7999ae5da1)

Conflicts:
debian/control

Merge pull request #45896 from idryomov/wip-persistent-cache-status-quincy

quincy: rbd persistent cache UX improvements (status report, metrics, flush command)

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #45766 from cbodley/wip-55175

quincy: cmake: WITH_SYSTEM_UTF8PROC defaults to OFF

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #45665 from s0nea/wip-55042-quincy

quincy: mgr/cephadm: try to get FQDN for configuration files

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #45625 from ljflores/wip-quincy-test-cli-timeout

quincy: qa/tasks/cephadm_cases: increase timeouts in test_cli.py

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #45595 from ronen-fr/wip-rf-44050-quincy

quincy: osd/scrub: ignoring unsolicited DigestUpdate events

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #45568 from mgfritch/backport-45420-quincy

quincy: cephadm: infer the default container image during pull

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #45359 from mgfritch/backport-45347-quincy

quincy: cephadm: preserve `authorized_keys` file during upgrade

Reviewed-by: Adam King <adking@redhat.com>

cmake/modules: use exact version of python3 when finding cython

* CMakeLists.txt:
    always pass "EXACT" to find_package(Python3).
    because per cmake document, "EXACT" only takes effect when
    <Package>_FIND_VERSION_COUNT is greater than 1, where <Package>
    is "Python3". see also cmake/modules/FindPython/Support.cmake
* cmake/modules/AddCephTest.cmake:
    drop redundant find_package(Python3) calls. since Python3 is
    a mandatory requirement for building Ceph, we only need a
    single call of find_package(Python3..) in the top of the source
    tree. the only possible case to repeat it is to ensure that we
    have the correct version of Python3 used in following CMake
    script. but there is no need to repeat it if we just want to
    ensure that we have a python3 interpreter in place.
* cmake/modules/Distutils.cmake:
    always pass "EXACT" to find_package(Python3).
    we should always pass EXACT to find_package() when finding python3,
    this is a follow-up of e2babdfae8c99f39f99a7c8a8f966299b2e62b19

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit ea4ae6d2f17ae8dcfb3d6f215d53b3f82a99270d)

Merge pull request #45988 from zdover23/wip-doc-os-recommendations-backport-quincy-3

quincy: doc/start: add testing support information

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #45905 from idryomov/wip-rbd-mirror-test-timer-lock-quincy

quincy: test/rbd_mirror: grab timer lock before calling add_event_after()

Reviewed-by: Christopher Hoffman <choffman@redhat.com>

run-make-check.sh: enable RBD persistent caches

This was attempted in commit 69a7ed4eab36 ("run-make-check: enable
WITH_RBD_RWL when WITH_PMEM is true") but never completed. We soon
bumped the requirement on libpmem, so WITH_SYSTEM_PMDK=ON wouldn't
have worked anyway.

Enable the RWL mode conditionally based on WITH_RBD_RWL variable.
Enable the SSD mode unconditionally as it has no special dependencies
and can be built on any architecture.

Fixes: https://tracker.ceph.com/issues/55285
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0f1634a21f5da2250915d8ac05a6f179d4e76d03)

Conflicts:
run-make-check.sh [ commit 57edb76ea468 ("build: Add some
debugging messages") not in quincy ]

test/encoding/check-generated.sh: show diff if binary reencode check fails

Take bf0b161115aa ("test/encoding/check-generated.sh: show diff if cmp
fails") a bit further. Suggesting "cmp $tmp1 $tmp2" isn't very helpful
since cmp would report just the mismatch offset.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 59d928a06c028bf381307001d2b68fa8545d8fc4)

librbd/cache/pwl: WriteLogCacheEntry constructor must initialize flags

Initializing the individual bit field members leaves the remaining two
bits uninitialized and that garbage state gets persisted.

In general, using bit fields in a structure where the layout actually
matters is not desirable.  Even with a few single bits, such as here,
their order, strictly speaking, is not guaranteed:

    An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains, a bit-field
    that immediately follows another bit-field in a structure shall be
    packed into adjacent bits of the same unit. If insufficient space
    remains, whether a bit-field that does not fit is put into the next
    unit or overlaps adjacent units is implementation-defined. The
    order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined.
    The alignment of the addressable storage unit is unspecified.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 91d270b210a908ea2f3578dd7db3263383da95a8)

librbd/cache/pwl: initialize generate_test_instances() objects

... to prevent check-generated.sh failures such as:

**** librbd::cache::pwl::WriteLogPoolRoot test 1 dump_json check failed ****
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 dump_json > /tmp/typ-cAoWrqlHC
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 encode decode dump_json > /tmp/typ-ES5yHpfGL
5c5
<     "flushed_sync_gen": 0,
---
>     "flushed_sync_gen": 255,

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2c131f57d63454de39210375ce75a282df6fe365)

librbd/cache/pwl: fix -Wunused-lambda-capture warnings

Reported by clang on "make check" and "make check arm64" builds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 753aa038fdbc26a2ee0978f54d3f7dcfa052e833)

doc/start: add testing support information

This PR adds information about support for testing,
and information about which distros the Ceph project
builds packages for.

This is one in a series of PRs including the following:

https://github.com/ceph/ceph/pull/45385
https://github.com/ceph/ceph/pull/45764

This PR specifically includes the information that Ernesto
Puerta collected here:
https://github.com/ceph/ceph/pull/45385#pullrequestreview-911766656

Signed-off-by: Zac Dover <zac.dover@gmail.com>
(cherry picked from commit 0364f3afcccc85d190237b0a74b4deeefa4738f3)

cmake/modules: always use the python3 specified in command line

if another python3 with higher version is found by
find_package(Python3), the cmake's install script would just
install the python modules/extensions into that python3's
dist-package directory, and the packaging script would fail
to find these artifacts when trying to package them.

so we need to ensure that the install directories for python
modules/extensions are always "versioned" with WITH_PYTHON3
cmake option.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit e2babdfae8c99f39f99a7c8a8f966299b2e62b19)

qa: make test_perf_stats_stale_metrics check only the clients created for the tests

Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.

Fixes: https://tracker.ceph.com/issues/54971
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 1621308214cd18750c8be803fc014bdf73e2218a)

qa: test `ceph fs perf stats` doesn't output stale metrics

Test that `ceph fs perf stats` doesn't output stale metrics
after the rank0 MDS failover.

Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 116e89a2f2849ed7cb711d1ae465c6f510b2810d)

mgr, pybind/mgr, mgr/stats: be resilient to offline rank0 MDS

Reregister the user queries during the rank0 MDS failover event
by calling listener.handle_query_updated(). This enables
`ceph fs perf stats` to receive the updated metrics after the
failover.

Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit c2470f271cce4d512f2cf00552c9b753e4c69f71)

17.2.0

Merge pull request #45932 from adk3798/revert-pids-limit-quincy

quincy: cephadm: Revert pids limit

Revert "cephadm: remove containers pids-limit"

This reverts commit 7c1214f38091dde0ba2c5e0557dcd98f97f91302.

Signed-off-by: Adam King <adking@redhat.com>

Revert "qa/suites/orch/cephadm: restrict test_iscsi_pids_limit to CentOS"

This reverts commit 355a819d3a65ef05ccc078fcb58eca4c84dac573.

Signed-off-by: Adam King <adking@redhat.com>

doc: Document the clone failure status

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 555d3610635f4206a21f7b41deabffdf2136ccbc)

mgr/volumes: Fix clone hang issue

The following sequence of operations leads to a deadlock:

1. Create a subvolume
2. Write some I/O to the subvolume
3. Create a snapshot of the subvolume
4. Create a clone of the snapshot
5. Delete the snapshot from the back end (don't use the subvolume interface)
   before the clone completes
6. Delete the clone with force
7. Delete the subvolume
8. Delete the fs and associated pools
9. Create a new fs
10. Create a new subvolume
11. Write some I/O to the subvolume
12. Create a snapshot of the subvolume
13. Create a clone of the snapshot <--------------- THIS OPERATION HANGS ---------------

Root Cause:
Since the snapshot was deleted from the back end, the clone fails. But it
also fails to remove the clone index at '/volumes/_index/clone'. The
cloner thread enters an infinite loop of starting the clone and failing.
Each iteration takes 'self.async_job.lock()', reads the clone index
to get the job, and registers that job.

While the cloner thread is in this loop, the fs is destroyed. A cloner
thread, which lives for as long as mgr/volumes is enabled in the mgr, takes
'self.async_job.lock()' and hangs while reading the clone index.

Any further clone operation that also requires this lock hangs.

Fix:
Remove the clone index even if the snapshot is not present.

Fixes: https://tracker.ceph.com/issues/55217
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit cfd8d63e0158b1d336f5e080a1e83dbf6fc60079)

qa: Add test for clone failure status

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 916a5981cf2912e2b5f318295cb1af83e34f9183)

mgr/volumes: Add clone failure reason in clone status

Add the clone failure reason in the clone status.
The sample output is as below:

$ ceph fs clone status cephfs clone_0
{
  "status": {
    "state": "failed",
    "source": {
      "volume": "cephfs",
      "subvolume": "subvolume_0",
      "snapshot": "snapshot_0",
      "size": "52428800"
    },
    "failure": {
      "errno": "2",
      "error_msg": "snapshot 'snapshot_0' does not exist"
    }
  }
}

Fixes: https://tracker.ceph.com/issues/55190
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit e0ba36f3344a3633b0343565b63472bbf94538b2)
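
A minimal sketch of how a caller might consume the new "failure" section,
using only the sample output quoted in the commit message above. The helper
name clone_failure_reason is hypothetical and is not part of mgr/volumes.

```python
import json

# Sample output copied from the commit message above.
SAMPLE = """
{
  "status": {
    "state": "failed",
    "source": {
      "volume": "cephfs",
      "subvolume": "subvolume_0",
      "snapshot": "snapshot_0",
      "size": "52428800"
    },
    "failure": {
      "errno": "2",
      "error_msg": "snapshot 'snapshot_0' does not exist"
    }
  }
}
"""


def clone_failure_reason(raw):
    """Return (errno, error_msg) for a failed clone, or None otherwise.

    Hypothetical helper for illustration only.
    """
    status = json.loads(raw)["status"]
    failure = status.get("failure")
    if status.get("state") != "failed" or failure is None:
        return None
    # errno is reported as a string in the sample, so convert it here.
    return int(failure["errno"]), failure["error_msg"]


print(clone_failure_reason(SAMPLE))
```

The sketch returns None for any status that is not "failed", so callers
can treat the failure section as optional.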

cmake: resurrect mutex debugging in all Debug builds

Commit 403f1ec2888a ("cmake: make "WITH_CEPH_DEBUG_MUTEX" depend on
CMAKE_BUILD_TYPE") made WITH_CEPH_DEBUG_MUTEX depend on build type
being set to Debug, in CMakeLists.txt.  However, if CMAKE_BUILD_TYPE
isn't specified by the user, we may still set it to Debug later, in
src/CMakeLists.txt, and in that case WITH_CEPH_DEBUG_MUTEX doesn't
get enabled.  The result is that

  $ do_cmake.sh -DCMAKE_BUILD_TYPE=Debug ...

debug builds have mutex debugging enabled, while

  $ do_cmake.sh ...

builds, which are supposed to be the same, don't.  Jenkins builders
don't pass -DCMAKE_BUILD_TYPE=Debug so that commit effectively turned
off all ceph_mutex_is_locked* asserts in "make check".

Fixes: https://tracker.ceph.com/issues/55318
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 226e614c95f1a1dcd79fa2011012ced1de19e6d3)

Merge pull request #45885 from markhpc/quincy-bs-avl-cursor-fix

quincy: os/bluestore: Always update the cursor position in AVL near-fit search.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

test/rbd_mirror: grab timer lock before calling add_event_after()

add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").

Fixes: https://tracker.ceph.com/issues/55317
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 60e16106837e0d23366709f70f39c4f1ae7a2a45)

os/bluestore: Always update the cursor position in AVL near-fit search.

Signed-off-by: Mark Nelson <mnelson@redhat.com>
(cherry picked from commit 3bed53debfa2f9ec9d31021ce7eaf8b78f78f9e0)

test/cls/rgw: test dir_suggest after successful completion

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit a350888bf9d812db36b3bdf6d1e4ee7469964fad)

cls/rgw: rgw_dir_suggest_changes detects race with completion

if bucket listing races with a pending index transaction, its suggested
removal may be mistakenly applied if that index transaction completes
before the osd receives this suggestion

in `rgw_dir_suggest_changes()`, the sole condition for applying a
suggested change is that the `cur_disk.pending_map` is empty. this is
true after rgw_bucket_complete_op()

on index completion, `rgw_bucket_dir_entry::index_ver` is updated to match
the new value of `rgw_bucket_dir_header::ver`. because most of `struct
rgw_bucket_dir_entry` makes the round trip through bucket listing ->
dir_suggest, we have access to the index_ver of the suggested entry. by
comparing this against the stored entry, we can ignore any suggestions
that were sent before the most recent completion

Fixes: https://tracker.ceph.com/issues/54528
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit aa381b6765b0fb316976c4af7a45f32a157a4f75)
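
The decision described above can be sketched as follows. This is an
illustration under assumptions, not the cls/rgw code: DirEntry is a
hypothetical, trimmed-down stand-in for rgw_bucket_dir_entry, and the exact
form of the index_ver comparison is assumed from the description.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical stand-in holding only the fields discussed above."""
    index_ver: int
    pending_map: dict = field(default_factory=dict)


def should_apply_suggestion(stored: DirEntry, suggested: DirEntry) -> bool:
    # Pre-existing condition: never apply a suggestion while an index
    # transaction is still pending on the stored entry.
    if stored.pending_map:
        return False
    # Added condition (assumed form): the stored entry completed a
    # transaction after the listing that produced this suggestion, so the
    # suggestion is stale and is ignored.
    if suggested.index_ver < stored.index_ver:
        return False
    return True
```

With a stored entry at index_ver 5, a suggestion carrying index_ver 4 is
ignored, while one carrying index_ver 5 is still applied.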

Merge pull request #45780 from vshankar/wip-55110-quincy

quincy: mount.ceph: remove `ms_mode' mount option when switching to old-syntax

Reviewed-by: Xiubo Li <xiubli@redhat.com>

librbd/cache/pwl: remove RBD_FEATURE_DIRTY_CACHE check in DiscardRequest

"m_image_ctx.features && RBD_FEATURE_DIRTY_CACHE" is obviously wrong
because it would pretty much always be true.  However, even if bitwise
AND was used, this check would still be dead because DiscardRequest is
only invoked if RBD_FEATURE_DIRTY_CACHE is enabled:

  int invalidate_cache(ImageCtx *ictx)
  {
    ...
    // Delete writeback cache if it is not initialized
    if ((!ictx->exclusive_lock ||
         !ictx->exclusive_lock->is_lock_owner()) &&
        ictx->test_features(RBD_FEATURE_DIRTY_CACHE)) {
      C_SaferCond ctx3;
      ictx->plugin_registry->discard(&ctx3);
      r = ctx3.wait();
    }

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit aee78bbb9d7edd606a8a235c57b2b704d7b94e4c)
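
Why the logical form is "pretty much always" true can be shown with a small
Python analogy, where `and` plays the role of C++ `&&` and `&` is the
bitwise AND. The bit positions below are hypothetical values chosen only
for illustration.

```python
# Hypothetical bit positions, chosen only for illustration.
FEATURE_LAYERING = 1 << 0
FEATURE_DIRTY_CACHE = 1 << 8

# An image with some feature enabled, but NOT the dirty-cache bit.
features = FEATURE_LAYERING

# Logical AND only asks whether both operands are non-zero.
logical = bool(features and FEATURE_DIRTY_CACHE)

# Bitwise AND asks whether the specific bit is set.
bitwise = bool(features & FEATURE_DIRTY_CACHE)

print(logical, bitwise)  # True False
```

The logical form is true for any image with at least one feature bit set,
which is why it could not act as a real feature check.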

librbd/cache/pwl: don't crash if cache file removal fails

The non-ec overload will throw fs::filesystem_error on any error
(e.g. EPERM due to unprivileged "rbd persistent-cache invalidate"
being brought up against a privileged workload).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 63197ff7003fa9e595527a7431f9f3f6790f7d57)

rbd: add persistent-cache flush command

Add a flush command so that users can manually flush cache.

[ idryomov: error messages, incorporate doc and help.t hunks, drop
do_persistent_cache_flush() ]

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 644fbc9fcc8f12eb93d5cc20054cd8598ab001b7)

rbd: rename image-cache invalidate command

Rename the image-cache command to persistent-cache and refactor the code
of the invalidate command.

[ idryomov: error message, incorporate doc and help.t hunks, drop
do_persistent_cache_invalidate() ]

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 05bfe10ad9fde533aa728f9aa0cc8a8f155c03c5)

librbd/cache/pwl: rename persistent cache key

librbd "internal" metadata keys were changed to use the ".rbd" prefix. Change
the persistent cache key to use the ".rbd" prefix too.
The persistent cache key is currently named IMAGE_CACHE_STATE. Since
this key is planned to be used outside the pwl directory, it seems
more appropriate to give it a clearer name, PERSISTENT_CACHE_STATE.

Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit bd66fdda910f02ffe91bb026f82a85f28a6ff225)

rbd: include persistent cache metrics in "rbd status" report

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e996fd80601ec8c309c1517f33171e88a2f31cad)

rbd: factor out get_percentage() helper

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9324ab94711dbe9a1265643adcc79ae0a3cba812)

librbd/cache/pwl: no need to set clean and empty in remove_pool_file()

It is redundant -- the only caller sets both since commit 6593e31fff18
("librbd/cache/pwl: correct cache state").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d64a3ae265897806809d9fa08ac72c549b4bca4f)

librbd/cache/pwl: avoid inconsistencies in ImageCacheState

When empty and/or clean bools are updated in I/O handling code paths,
ImageCacheState becomes inconsistent for a short while: e.g. with clean
transitioned to true, dirty_bytes counter could still be positive
because the counters are updated only in periodic_stats(). Move to
updating the counters in update_image_cache_state(Context*) to avoid
this.

update_image_cache_state(Context*) now requires m_lock -- most call
sites already hold it anyway. The only problematic call site was
AbstractWriteLog::shut_down() callback chain: perf_stop() needed to
be moved to the very end since perf counters must be alive now for
update_image_cache_state() to work.

Don't override expect_op_work_queue() in unit tests: completing
context in the same thread now results in a deadlock on m_lock in
all test cases that call AbstractWriteLog::init().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 016882925a63f4f03a9c445d008b2325d479bc30)

librbd/cache/pwl: handle invalid ImageCacheState json

get_json_format() and create_image_cache_state() attempt to get
particular keys which could result in an unhandled std::runtime_error
exception. Conversely, ImageCacheState constructor just swallows that
exception which could leave the newly constructed object incorrectly
initialized. Avoid doing parsing in the constructor and introduce
init_from_config() and init_from_metadata() methods instead.

While at it, move everything out from under "persistent_cache" key.
Also fix init_state_json_write test case which stopped working now
that types are enforced by json_spirit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7678ee2490965a8a73c02a47283adaa5036dbcab)

librbd/cache/pwl: add basic metrics to ImageCacheState

Add basic metrics to ImageCacheState and persist them, including
allocated_bytes, cached_bytes, dirty_bytes, free_bytes and hit/miss
info.

Leverage periodic_stats() timer to call update_image_cache_state.
In order to avoid outputting too much debug information, the original
statistics output log level is changed to 5.

Switch to json_spirit for encoding because encode_json encodes bool as
"true"/"false" string.

Remove rbd_persistent_cache_log_periodic_stats option because we need
to always update cache state.

[ idryomov: add cached_bytes and hits_partial; report misses and
miss_bytes instead of respective totals; naming ]

Fixes: https://tracker.ceph.com/issues/50614
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 769f3a06ecf85249c1473cbb6bab7503beb1ba78)
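
The problem with booleans encoded as "true"/"false" strings can be
illustrated in Python. This is an analogy rather than the json_spirit code;
get_bool is a hypothetical strict getter, and "clean" is used only because
it is one of the bools mentioned in these commits.

```python
import json

# Encoding a bool as a string keeps the text readable but loses the type.
as_string = json.loads('{"clean": "false"}')["clean"]
as_bool = json.loads('{"clean": false}')["clean"]

# Any non-empty string is truthy, so the string form reads back as True.
print(bool(as_string), bool(as_bool))  # True False


def get_bool(obj, key):
    """Hypothetical strict getter that rejects non-boolean values."""
    value = obj[key]
    if not isinstance(value, bool):
        raise TypeError(
            f"{key!r} must be a JSON boolean, got {type(value).__name__}"
        )
    return value
```

A reader that enforces types, as in get_bool, fails loudly on the string
form instead of silently misreading it.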

librbd/cache/pwl: correct cache state

Update the cache state after the dirty_entries or log_entries list is updated.

Fixes: https://tracker.ceph.com/issues/50614
Signed-off-by: Yin Congmin <congmin.yin@intel.com>
(cherry picked from commit 6593e31fff180ec4123e37107c88eb39f7d10fdf)

Merge pull request #45857 from ljflores/wip-quincy-55269

quincy: mgr/telemetry: anonymize daemons in telemetry `perf_counters`

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

build: make ninja-build package install always

We use ninja as the default build tool now. Having it installed only when
make check is enabled may make builds fail if they are run without make check.

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
(cherry picked from commit fa7821dd00b1577391db9518bbef0c7618b78ade)

Merge pull request #45627 from ljflores/wip-55051-quincy

quincy: admin/doc-requirements: bump sphinx to 4.4.0

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>

mgr/telemetry: fix daemon anonymization in perf_counters

Anonymized daemons now appear with a SHA1 digest instead of their
original identifier, e.g.:

    "perf_counters": {
        "mon.1b1b829ba9298527f4934053a4742a1710937007": {
            "mon": {
                "election_call": {
                    "value": 1
                },
                ...
                "session_trim": {
                    "value": 0
                }
            },
        ...
        }
    ...
    }

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit 2f4cc770e7ac9767d6d3be51c1de03f6014a6f98)

mgr/telemetry: add anonymize_entity_name function

The ability to anonymize entity names should have its own function
to prevent duplicate code.
Will clean up in a separate commit.

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit e89d821ee6256b18f94d520baeb07012de80b731)

mgr/telemetry: anonymize daemons in telemetry perf_counters

In the telemetry perf channel we collect 'perf_counters' of individual daemons.
The monitors appear with their full name, which includes the host name.
The host name part must be anonymized.

To err on the safe side, I have anonymized all daemons except for osds,
since they are not attached to host names.

Fixes: https://tracker.ceph.com/issues/55229
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 0fe47b974ccc591c6108eb7a1b26087e62932bce)
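
A rough sketch of the anonymization described in the telemetry commits
above: the identifier after the daemon type is replaced by a SHA1 digest,
and osds are left untouched. The function body and the salt parameter are
assumptions made for illustration, not the module's actual code.

```python
import hashlib


def anonymize_entity_name(entity_name: str, salt: str = "") -> str:
    """Replace the id part of '<type>.<id>' with a SHA1 hex digest.

    Sketch only: the salt handling is an assumption.
    """
    if "." not in entity_name:
        return entity_name
    etype, eid = entity_name.split(".", 1)
    # osds are not attached to host names, so they are kept as-is.
    if etype == "osd":
        return entity_name
    digest = hashlib.sha1((salt + eid + salt).encode("utf-8")).hexdigest()
    return f"{etype}.{digest}"


print(anonymize_entity_name("mon.host-a", salt="example-salt"))
print(anonymize_entity_name("osd.3"))
```

The digest is deterministic for a given salt, so the same daemon maps to
the same anonymized name across reports.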

mount.ceph: remove `ms_mode' mount option when switching to old-syntax

... and switch to using v1 addresses (if users haven't specified those
explicitly). kernel versions <5.11 do not understand `ms_mode' mount
option which would result in mount failure.

Fixes: http://tracker.ceph.com/issues/55110
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 6e28d3406df06435bea26b465baf97c259942920)

Merge pull request #45799 from rhcs-dashboard/fix-grafana-quincy

quincy: monitoring: several Grafana fixes

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: sunilangadi2 <NOT@FOUND>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #45804 from adk3798/nfs-export-quincy

quincy: mgr/nfs: nfs export management backport

Reviewed-by: John Mulligan <jmulligan@redhat.com>

mgr/nfs: remove redundant check

Remove the extra check of the cluster id from the _apply method, as _apply
is a "private" method that should only be called from other private
methods that have already validated the cluster_id. This also removes
a dependency on the orch-requiring func available_clusters.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit dd5a47f83349c8f2b539ba3881f58f5270024cbb)

mgr/nfs: fix unintentional recursion

The `exports` property of the ExportMgr exists to cache the exports
configuration found in the .nfs namespace. Using that property
within the property method is probably not intentional and is probably
only working due to the lucky construction of the _exports dict
immediately after the None check so that the _exports dict is returned
(and is a mutable type).

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit daa455cd168d62cd8fbcaba4d7aa79b56e68ef0d)
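
The pattern at issue can be sketched as a cached property that fills its
backing attribute directly. ExportCache and its loader argument are
hypothetical names used for illustration; this is not the ExportMgr code.

```python
class ExportCache:
    """Hypothetical stand-in for an object that caches exports."""

    def __init__(self, loader):
        self._loader = loader   # callable returning {cluster_id: [exports]}
        self._exports = None    # backing attribute for the cache
        self.loads = 0          # counts how often the cache was filled

    @property
    def exports(self):
        if self._exports is None:
            self._exports = {}
            # Fill the backing attribute directly. Writing
            # `self.exports[...]` here would re-enter this property
            # while it is still running.
            for cluster_id, items in self._loader().items():
                self._exports[cluster_id] = list(items)
            self.loads += 1
        return self._exports


cache = ExportCache(lambda: {"mynfs": ["/export1"]})
print(cache.exports, cache.loads)
```

Touching only `_exports` inside the property keeps the control flow
independent of the order in which the cache dict is created and filled.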

mgr/nfs: add known_cluster_ids to generalize nfs cluster id fetching

The changes to the nfs module in 8c711afc are working but when I began
writing more test automation I found a few more places in the
export-configuration code path relying on the orchestration module
only. This change generalizes the logic to source nfs clusters from
orchestration when it's enabled but from the .nfs pool when
orchestration is disabled. It then uses that call when loading
the exports cache on the ExportMgr object.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 4d09660dea5696e5085a75694968aafe9253f47a)

doc/mgr/nfs: document that nfs exports related mgr call requirements

A recent change in the mgr/nfs module should enable the functioning
of export management commands/API calls as long as the rados namespaces
and objects have been already established. Document this fact, noting
that now only the `ceph nfs cluster ...` calls *require* an
orchestration module.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit b5b3e0bcb5e2a27375f50f7717786a3928cba711)

mgr/nfs: support managing exports without orchestration enabled

This change allows the `ceph nfs export ...` commands to function
without the entire mgr/nfs subsystem requiring orchestration to be
enabled. When there's no orchestration available, the code falls back
to examining the namespaces in the ".nfs" rados pool to determine what
cluster_id values are valid.

This change does not add support for creating the rados objects and
namespace needed to manage a nfs cluster. As discussed with the
orchestration group on 2022-01-22, rook does not need the mgr module to
establish the namespace. So, for now, we'll defer the work needed to
create the namespace/objects when orchestration is disabled.

Fixes: https://tracker.ceph.com/issues/54043
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 8c711afc4ab898942a2569b619eb8379ee02ffba)

mgr/nfs: fix typo in error message

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 56323a2625133d5a53bf1ee1662346daa1b4f09b)

mgr/nfs: add unit test for normalize_path

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit ffa95fbc796aa5c00eaa32138291c0ef2a48949a)

mgr/nfs: change method format_path to function normalize_path

This function was not using self and thus has no need to be a method.
While we're at it, rename it to normalize_path because that's what
it is doing.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit f91dd1bf7bfab251d671a30d622bb544a4ce37d0)
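
As an illustration of the method-to-function change, a path normalizer
that needs no instance state can live at module level. The body below is an
assumption about what "normalizing" might involve and is not copied from
mgr/nfs.

```python
import posixpath


def normalize_path(path: str) -> str:
    """Return a cleaned-up POSIX path (illustrative sketch only)."""
    if not path:
        return path
    path = posixpath.normpath(path.strip())
    # posixpath.normpath keeps exactly two leading slashes; collapse them.
    if path.startswith("//"):
        path = path[1:]
    return path


print(normalize_path(" /a/b/../c/ "))
```

Because the function takes only a string, it can be unit tested without
constructing any manager object.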

mgr/nfs: clean up rados object naming code

The naming of rados objects used to store the nfs config was spread
all over the code, including inline f-strings, non-static methods,
etc.
This change unifies the naming by putting constant string prefixes
and name generating functions into the utils.py file.
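
A sketch of what such a centralized naming layer could look like. The
prefix values and function names below are illustrative assumptions and
may not match what utils.py actually defines.

```python
# Constant prefixes live in one place (values are assumed for illustration).
EXPORT_PREFIX = "export-"
CONF_PREFIX = "conf-nfs."
USER_CONF_PREFIX = "userconf-nfs."


def export_obj_name(export_id: int) -> str:
    """Name of the rados object holding a single export."""
    return f"{EXPORT_PREFIX}{export_id}"


def conf_obj_name(cluster_id: str) -> str:
    """Name of the rados object holding the cluster's common config."""
    return f"{CONF_PREFIX}{cluster_id}"


def user_conf_obj_name(cluster_id: str) -> str:
    """Name of the rados object holding user-supplied config."""
    return f"{USER_CONF_PREFIX}{cluster_id}"
```

Callers then use `export_obj_name(1)` instead of repeating an inline
f-string, so a change to a prefix touches a single line.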

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 88266144423e6876dc392bc6ea59e32393024323)

mgr/nfs: make _check_rados_notify a function

This was previously a staticmethod that was only used by the
NFSRados object. Staticmethods are nearly always better implemented as
functions, so convert it to one here.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit c51a6755b52954910512f804cde9e5255c1db9e7)

mgr/nfs: limit dependency of NFSRados object

Previously, the NFSRados object accepted the "Module" as the
first argument but only used the rados attribute (type rados.Rados).
It's better to limit the scope of types when reasonably possible
so we can see what the true dependencies are. So we restrict
NFSRados to accepting a rados.Rados as the argument.
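
To illustrate the narrowed dependency, the sketch below types the
constructor against only the one capability it needs. `RadosLike`, the
`namespace` parameter, and the `open` method are hypothetical names
used for the example; the real class takes a `rados.Rados`.

```python
from typing import Any, Protocol


class RadosLike(Protocol):
    """The single capability this sketch relies on (rados.Rados has it)."""

    def open_ioctx(self, ioctx_name: str) -> Any: ...


class NFSRados:
    def __init__(self, rados: RadosLike, namespace: str) -> None:
        # Only a rados handle is accepted, not the whole mgr module,
        # so the true dependency is visible in the signature.
        self.rados = rados
        self.pool = ".nfs"
        self.namespace = namespace

    def open(self) -> Any:
        return self.rados.open_ioctx(self.pool)
```

A test can then pass any small object with an `open_ioctx` method
instead of mocking an entire mgr module.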

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit d94b63830d94f21ba276452844a46d21e084fb3f)

mgr/dashboard: fix api test issue with pip

Fix the following pip dependency error:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-libcloud 3.5.0 requires requests>=2.26.0, but you have requests 2.25.1 which is incompatible.
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.6.0 idna-2.10 jaraco.functools-3.5.0 more-itertools-4.1.0 natsort-8.1.0 portend-3.1.0 pyopenssl-22.0.0 pytz-2022.1 repoze.lru-0.7 requests-2.25.1 tempora-5.0.1
```

Fixes: https://tracker.ceph.com/issues/55060
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 2289ad2bc327b0d86916a1c96f4af2967a80c1b9)

mgr/cephadm: update monitoring stack versions

Fixes: https://tracker.ceph.com/issues/54311
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 6a328ec30cd2c652c27e3bf070d5de7c2d4367b3)

Conflicts:
src/cephadm/cephadm
src/pybind/mgr/cephadm/module.py:
- Accept quincy changes and bring in only the version updates for
Grafana, Prometheus, Alertmanager and Node Exporter

mgr/dashboard: upgrade grafana pie-chart and vonage-status-panel versions

Fixes: https://tracker.ceph.com/issues/55195
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 2877920f58728eab20abe32fed24618449d76c09)

monitoring/grafana: fix version

Fixes: https://tracker.ceph.com/issues/55172
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 8721bd6c5ddd3c09d04a07e5a2564a5772324c82)

grafana/Makefile: don't push to docker

Fixes: https://tracker.ceph.com/issues/55155
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 7e6309fac3c4728b3527ab6c709becfb4dcdb126)

prometheus: spell check the alert descriptions

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
(cherry picked from commit 9cca95b16abd4af3eb3a5630acb3fb7e0cc73a4e)

mgr/dashboard: Pool overall performance shows multiple entries of same pool in pool overview

Ensure each pool appears only once under pool overall performance in
the pool overview.

Fixes: https://tracker.ceph.com/issues/54513
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 9719cc795e1d6a38ab8a7e8f3eeb56c13f11c25d)

mgr/dashboard: fix promtool test for mtu alert

Fixes: https://tracker.ceph.com/issues/55004
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 49d6068463ae9238b6fffcca690dbb5d74b2448a)

mgr/dashboard: Compare values of MTU alert by device

Fixes: https://tracker.ceph.com/issues/55004
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 3821548a37373f87109ab0dac7f3ee2d8f3ead99)

mgr/dashboard: fix transition-through-oci image workaround in grafana build

Fixes: https://tracker.ceph.com/issues/54311
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 64b0e5ce8a204908e769e7da01a5ee7d075c0481)