
Merge PR #61517 into wip-jcollin-testing-20251016.142247-reef

* refs/pull/61517/head:
mds: fix rank root doesn't insert root ino into its subtree map when starting
mds: flush mds log before finishing STATE_STARTING
mds/FSMap: go back to STARTING state when rank doesn't make it past STARTING

Merge PR #65768 into wip-jcollin-testing-20251016.142247-reef

* refs/pull/65768/head:
client: resolve bogus self-assignment
mds: add issue_seq to all cap messages
include/ceph_fs: correct ceph_mds_cap_peer field name
include/ceph_fs: correct ceph_mds_cap_item field name
messages/MClientCaps: use correct ceph_seq_t for cap sequence types
messages/MClientCaps: dump issue_seq for debugging
mds: remove dead code

Merge PR #65818 into wip-jcollin-testing-20251016.142247-reef

* refs/pull/65818/head:
src/common: add helper to prepend "..." to trimmed paths
mds/ScrubStack: avoid generating inode path since it is unused
mds: fix few log entries
client: trim path before logging it
mds: log trimmed path wherever generating full path is necessary
mds: for logging generate only 10 final components of dentry path
mds: for logging generate only 10 final components of inode path
qa, test: run unit tests for cephfs.pyx with non-root user
test/pybind: add unit tests for rmtree() in cephfs python bindings
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive

Merge pull request #65740 from stackhpc/doc-balancer-reef

reef: doc: Fixes a typo in balancer operations

src/common: add helper to prepend "..." to trimmed paths

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c38a9138ba8294ab1243cf03ad0c8b0df4901967)

mds/ScrubStack: avoid generating inode path since it is unused

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 10e4ccb104d84444f0047e166a9dff997c4e2736)

mds: fix few log entries

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e4c301b9f0204b6a82490a68ec4c3a26db7b013f)

Conflicts:
src/mds/MDSAuthCaps.cc
- a slight difference in this file in the reef branch caused the conflict.

client: trim path before logging it

A path can be arbitrarily long, and logging a very long path (imagine
around 2000 path components) is not useful and makes the log harder to
read. Therefore, trim the path before logging it.

Fixes: https://tracker.ceph.com/issues/72993
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdc8aae400fbbdd61df811455d49176deab1f331)

Conflicts:
src/client/Client.cc
src/client/Client.h

src/include/filepath.cc
- patch couldn't be applied because filepath.cc is absent in this
branch. applying it to filepath.h instead.

mds: log trimmed path wherever generating full path is necessary

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 11de1e5772fa88125de10dc7972e0e31e33140d0)

Conflicts:
src/mds/Server.cc
src/mds/SessionMap.cc
- Fewer header files are included in both of these files in reef, which
confused git while applying the patch automatically.

src/mds/MDSAuthCaps.cc
src/test/mds/TestMDSAuthCaps.cc
- is_capable() takes one less argument in reef compared to main.

mds: for logging generate only 10 final components of dentry path

Generating the full absolute path for dentries for printing in MDS logs
slows down the FS to a great extent, especially when the path is very
long (imagine a path with 2000 components). Printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of MDS logs.

Therefore, generate only 10 final components of the dentry paths for logging.
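
As a rough illustration of the output shape only, here is a minimal
Python sketch, assuming a plain "/"-separated string path. The helper
name trim_path_for_log() is hypothetical and is not the MDS code; the
actual change avoids building the full path in the first place, whereas
this sketch trims an already-built string.

```python
def trim_path_for_log(path: str, keep: int = 10) -> str:
    """Return at most the last `keep` components of `path`.

    A leading "..." marks that leading components were dropped.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) <= keep:
        # Short paths are logged unchanged.
        return path
    return ".../" + "/".join(parts[-keep:])


# A path with 2000 components, as in the scenario described above.
long_path = "/" + "/".join(f"dir{i}" for i in range(2000))
print(trim_path_for_log(long_path))
print(trim_path_for_log("/a/b/c"))
```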

Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1430cd67d8f7bd7d98b241a7511fa3ceb7e5ba2e)

Conflicts:
src/include/filepath.cc
- this file is absent in reef, set_trimmed() has been moved to
filepath.h instead.

mds: for logging generate only 10 final components of inode path

Generating full absolute path for inodes for printing in MDS logs slows
down the FS to a great extent especially when the path is very long
(imagine a path with 2000 components). Also printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of the MDS logs.

Therefore, generate only 10 final components of inode paths for logging.

Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1518690210f3a4473978c7a9274e902fccaad862)

Conflicts:
src/mds/CDir.cc
- The code region that was modified in main is completely absent
in Reef.

qa, test: run unit tests for cephfs.pyx with non-root user

Run test_python.sh with non-root user. This makes it necessary to change
the owner user and group of file system root to be same as this non-root
user. This brings testing closer to the real-world scenario and also
allows exercising negative tests where an FS op would fail for a non-root
user but it would pass for root user.

There are a few tests that exercise FS operations where root user is
needed. Group these tests under a separate class and add extra code for
this class that allows these tests to run with root UID and GID.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6021dda7ed137445885979cd4d4b28c770abce13)

Conflicts:
src/test/pybind/test_cephfs.py
- PR #64999 is merged in main but not in Reef (bug is absent in Reef),
causing this file to have fewer tests that need to be moved to class
TestWithRootUser.

doc: Fixes a typo in balancer operations

Signed-off-by: Tyler Brekke <tbrekke@digitalocean.com>
(cherry picked from commit b038b8093d01a5e676ffa419607489a79261ef29)

Merge pull request #65944 from phlogistonjohn/jjm-bwc-variants-r

reef: build-with-container: build image variants

script/build-with-container: add build image variants

Allow the user to control the content of the build image with a
high-level `--image-variant=` switch. Currently the supported values are
`default` (the same maximal image we have been generating) and
`packages`, a slimmer image that avoids installing certain test-only
dependencies.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

Dockerfile.build: make FOR_MAKE_CHECK a build argument

Set it only during install time.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

install-deps.sh: let FOR_MAKE_CHECK variable take precedence

Previously, the FOR_MAKE_CHECK variable could only enable installing
extra (test) dependencies when install-deps.sh was used and it was
ignored if `tty -s` exited true. This change allows FOR_MAKE_CHECK to
take precedence over the tty check and to specify one of true, 1, yes to
enable extra "for make check" deps or false, 0, no to explicitly disable
the extra deps.
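
The precedence described above can be sketched in Python as follows.
This is only an illustration and not the shell code in install-deps.sh.
The function name is hypothetical, and the fallback direction (an
interactive terminal enables the extra deps when the variable is unset
or unrecognized) is an assumption.

```python
TRUTHY = {"true", "1", "yes"}
FALSY = {"false", "0", "no"}


def want_make_check_deps(for_make_check, stdin_is_tty):
    """Decide whether to install the extra "for make check" deps.

    `for_make_check` is the raw FOR_MAKE_CHECK value (or None if unset);
    `stdin_is_tty` stands in for the result of `tty -s`.
    """
    value = for_make_check or ""
    if value in TRUTHY:
        return True   # explicit enable wins over the tty check
    if value in FALSY:
        return False  # explicit disable wins over the tty check
    # Unset or unrecognized: fall back to the tty check (assumed behaviour).
    return stdin_is_tty


print(want_make_check_deps("no", True))
print(want_make_check_deps("yes", False))
```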

Based-on-work-by: Dan Mick <dan.mick@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>

Merge pull request #65463 from pdvian/wip-72852-reef

reef: mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock

Merge pull request #65216 from ifed01/wip-ifed-discard-threads-better-lifecycle-reef

reef: blk/kernel: improve DiscardThread life cycle.

Reviewed-by: Yite Gu <guyite@bytedance.com>

Merge pull request #65334 from abitdrag/wip-72816-reef

reef: auth: msgr2 can return incorrect allowed_modes through AuthBadMethodFrame

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

Merge pull request #65837 from phlogistonjohn/jjm-rmc-backport-reef

reef: run-make-check.sh backports

Merge pull request #65845 from phlogistonjohn/jjm-bwc-backports-r

reef: sync build-with-container patches from main

script/build-with-container: improve error handling for invalid distros

Instead of throwing a long obnoxious traceback at the user if the value
supplied to -d/--distro is invalid, do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie

```
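
One way to get this kind of message from argparse is a `type=` callable
that raises argparse.ArgumentTypeError. The sketch below assumes a small
alias table; DISTRO_ALIASES and distro_kind() are hypothetical names and
the table is only a subset chosen for illustration, not the script's
real DistroKind enum.

```python
import argparse

# Hypothetical subset: alias -> canonical "<name><version>" form.
DISTRO_ALIASES = {
    "centos9": "centos9",
    "centos9stream": "centos9",
    "ubuntu22.04": "ubuntu22.04",
    "ubuntu-jammy": "ubuntu22.04",
    "jammy": "ubuntu22.04",
    "debian13": "debian13",
    "debian-trixie": "debian13",
    "trixie": "debian13",
}


def distro_kind(value):
    """Map a user-supplied distro name or alias to its canonical form."""
    try:
        return DISTRO_ALIASES[value]
    except KeyError:
        known = ", ".join(DISTRO_ALIASES)
        # argparse turns this into a short usage error, not a traceback.
        raise argparse.ArgumentTypeError(
            f"unknown distro: {value!r} not in {known}"
        ) from None


parser = argparse.ArgumentParser(prog="build-with-container.py")
parser.add_argument("--distro", "-d", type=distro_kind, default="centos9")

print(parser.parse_args(["-d", "jammy"]).distro)
```

With an unknown value, parse_args() prints the usage line plus the error
to stderr and exits with status 2.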

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 72f3ad9549e84bdba7bdfd97d2ede3c55e02f103)

script/build-with-container: add debian 13 (trixie)

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a13fa091dd6bad35c44076cb7c46cb7bcc17a7ac)

script/build-with-container: add ubuntu 20.04 (focal)

Add ubuntu 20.04 (focal) to the available list of distro kinds.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 7c40f7bd07ac935d0657b9284118da8590a5cf0d)

script/build-with-container: add a pair of fedora distro versions

Add fedora 42 and the soon-to-be-released fedora 43.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 76fe5ad298ee5626eeb63591a702e8f8cc9be7d0)

script/build-with-container: lightly organize the distro kind aliases

Do a tiny reorg of the distro kind aliases and container images to keep
the EL distros together and comment out each "section".

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 4430a5ad6be6f26309d5f5bea0e448a4bbd432e1)

script/build-with-container: be consistent with naming in distro kinds

Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun indented) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit ac11a80a63ab1909fbdf682d830acde96856f502)

src/script: add bookworm to build-with-container.py

..and its friend buildcontainer-setup.sh

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 34b497c2f3652e7d30c7b7476b711fd9f1f4ecac)

build-with-container: ensure npm dir is set up before configure

When the npm cache path option is passed, the npm cache dir is passed
to all container `run` commands. Ensure the dir has been created
before the first container command (configure) is used.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 79166af192ea0b4b982b56ce521516d5a29e7a0d)

run-make-check.sh: handle sudo and command that may not run in container

Work around a known failure that sudo is not expected to be present in
container images. Prepare to handle a failure to set a sysctl param.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 9f44155dff195015186315968a0a1e8ce925ed5d)

install-deps: extract SUDO variable logic into a reusable function

While the function is pretty simple and could be copy-pasted I
prefer to extract things into functions to indicate that the
logic is used/repeated elsewhere to ward off making changes to
one copy vs the other.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit bbd7933598e11d84758a6f09fd176f47c744aaa2)

run-make-check: Enable ctest resource allocation

Co-authored-by: Kefu Chai <tchaikov@gmail.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
(cherry picked from commit 5aa832c5c60e0469127647570bb102ff64a3fe32)

Merge pull request #65295 from joscollin/wip-71436-reef

reef: mgr/snap_schedule: fix typo in error message during retention add

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #65348 from joscollin/wip-72804-reef

reef: mds: Fix readdir when osd is full.

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #65364 from ifed01/wip-ifed-fix-snapdiff-fragment-reef

reef: mds: fix snapdiff result fragmentation

Reviewed-by: Venky Shankar <vshankar@redhat.com>

test/pybind: add unit tests for rmtree() in cephfs python bindings

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 05082a932984bb6329481c14ee76ae033c019f4e)

pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive

Method purge() in trash.py calls rmtree(), which is a recursive method. To
avoid hitting Python's recursion limit, switch to a non-recursive approach.

The path to a directory and its directory handle are combined into a
tuple, and that tuple is stored on the stack. Storing the directory
handle dramatically reduces the number of calls to opendir().
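
The general shape of such a loop can be sketched with Python's os
module on a local file system. This is a stand-in under stated
assumptions and not the cephfs binding code: rmtree_iterative() is a
hypothetical name, and os.scandir() plays the role of opendir().

```python
import os
import tempfile


def rmtree_iterative(root):
    """Remove `root` and everything below it without recursion."""
    # Each stack entry pairs a directory path with its open handle, so a
    # directory is opened once and resumed after its children are done.
    stack = [(root, os.scandir(root))]
    while stack:
        path, handle = stack[-1]
        entry = next(handle, None)
        if entry is None:
            # Directory exhausted: close the handle, remove the empty dir.
            handle.close()
            stack.pop()
            os.rmdir(path)
        elif entry.is_dir(follow_symlinks=False):
            stack.append((entry.path, os.scandir(entry.path)))
        else:
            os.unlink(entry.path)


# Small demonstration on a throwaway tree.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "a", "b", "c"))
with open(os.path.join(top, "a", "b", "file.txt"), "w") as f:
    f.write("data")
rmtree_iterative(top)
print(os.path.exists(top))
```

The number of handles open at any time equals the current depth of the
walk, which is the trade-off for not reopening directories.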

Fixes: https://tracker.ceph.com/issues/71648
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f9046ca052d10a884a59c1d928cb0c8f0235696b)

Merge pull request #63129 from kshtsk/wip-71210-reef

reef: qa/tasks/cephfs/mount: use 'ip route' instead 'route'

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

client: resolve bogus self-assignment

Credit to Ilya Dryomov for spotting this.

[1] https://github.com/ceph/ceph/pull/60283#discussion_r1846415092

Fixes: 1da6ef237fc70ddd64152d029cd6e0cf8f0c808e
Fixes: https://tracker.ceph.com/issues/68973
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit a07c5ef9259e2cd205a7edef4a1aeefb7afa90f3)

mds: add issue_seq to all cap messages

Right now only the clients tell the MDS what they believe the issue_seq to be.
The clients are expected to figure out issue_seq updates at

Fixes: https://tracker.ceph.com/issues/68515
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit cb4ff28af09f0afd5546d87a59cdfc64e64a2b15)

include/ceph_fs: correct ceph_mds_cap_peer field name

The peer seq is used as the issue_seq. Use that name for consistency.

Fixes: 4fdeb00df20ccd36c1e53c6ea234c63c18a9ff5a
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 1da6ef237fc70ddd64152d029cd6e0cf8f0c808e)

include/ceph_fs: correct ceph_mds_cap_item field name

Originally, the last_sent sequence from the MDS was sent by the client during
bulk cap release but it was shortly after changed to the last_issue which is
the sequence number that the cap was originally "issued" by the MDS rank (which
may be updated after import of caps).

Fixes: 6208f57f487ac170df24a9018f1cc87a5ac8b4b3
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 655cddb7c9f32c9dd9cddf40ac17f385d539c8f9)

messages/MClientCaps: use correct ceph_seq_t for cap sequence types

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 6d8a73439eddb43e45a82b3d3a24d778244524c3)

messages/MClientCaps: dump issue_seq for debugging

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 7766d3c72491030024c2aaecfc98f1f29a1c7e77)

mds: remove dead code

A const getter already exists.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit a72b31e2fb3651152021d0e8e3445f97efe05268)

Merge pull request #61512 from mchangir/wip-68768-reef

reef: mds: add an asok command to dump export states

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65710 from aaSharma14/wip-73294-reef

reef: monitoring: fix MTU Mismatch alert rule and expr

Reviewed-by: Afreen Misbah <afreen@ibm.com>

monitoring: fix MTU Mismatch alert rule and expr

Fixes: https://tracker.ceph.com/issues/73290
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit bee24dec441b9e6b263e4498c2ab333b0a60a52d)

Conflicts:
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/datatable/table-key-value/table-key-value.component.scss

Merge pull request #65621 from aaSharma14/wip-73165-reef

reef: mgr/dashboard: fix zone update API forcing STANDARD storage class

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge pull request #62436 from rishabh-d-dave/mgr-vol-no-del-reef

reef: mgr/volumes: allow disabling async job threads

Reviewed-by: Jos Collin <jcollin@redhat.com>

PendingReleaseNote: add note for pause_purging and pause_cloning

Added release notes for mgr/vol config options "pause_purging" and
"pause_cloning".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 03b90d589ca5759701164ea54b0dbf9b92c4efef)

doc/cephfs: add note for config option pause_purging and pause_cloning

Update documentation to add information about mgr/vol config options
"pause_purging" and "pause_cloning".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1f0bfe1c599769ac67d3b1b41f37bb3482e27839)

qa/cephfs: add tests for mgr/vol config pause_cloning

mgr/vol config option pause_cloning allows pausing of cloner threads.
Add tests for this.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e7eb36e4393c8401c7cf1aa1f714b52c1ced9ca0)

qa/cephfs: extend wait for trash empty

Trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py encounters absence of trash
directory, return true.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6d6be8b41c990acf2d9c08f35eb382996d59d5a7)

qa/cephfs: add tests for config option pause_purging

Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f38fcbc6109494e23e4948d794f59c927a9303ff)

Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred due to the import of safe_while, which is
  missing in the Reef branch compared to the main branch. Along with
  resolving this conflict, safe_while has been imported as it is used by
  the tests.
- Second conflict occurred due to absence of some test methods right
  before where TestPausePurging was to be added.
- Third conflict occurred because entire contextutil was imported instead
  of just safe_while and only CommandFailedError was imported from
  teuthology.exceptions while this commit imports MaxWhileTries too.

qa/cephfs: don't strip any whitespace for get_shell_stdout

Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9582b9b13a42fae4c7f38c22f9202eb893b6f1bc)

Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occurred for 2 reasons -
  - One, method get_shell_stdout() is absent on Reef branch but not in
    main, so this patch, which modifies it, runs into a conflict.
  - Two, run_shell_payload() lies right next to get_shell_stdout() in
    main branch and its definition is quite different, leading to
    conflict again.

mgr/vol: add pause/resume mechanism for async jobs

Add a mechanism that allows pausing/resuming of the entire async job
machinery that queues, launches and picks the next async job, for both
kinds of async jobs: clones as well as purges.

And then add mgr/vol config option pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
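
A minimal sketch of the idea, assuming one queue object per job kind.
AsyncJobQueue and apply_config() are hypothetical names and this is not
the mgr/volumes implementation; for simplicity, queueing stays allowed
while paused and only job pickup is blocked.

```python
from collections import deque


class AsyncJobQueue:
    """Hypothetical stand-in for one kind of async job (clone or purge)."""

    def __init__(self, name):
        self.name = name
        self.paused = False
        self.pending = deque()

    def queue_job(self, job):
        self.pending.append(job)

    def get_next_job(self):
        # While paused, hand out nothing so workers stay idle; queued
        # jobs are kept and picked up again after resume.
        if self.paused or not self.pending:
            return None
        return self.pending.popleft()


def apply_config(queues, option, value):
    # Option names come from the commit message; mapping them to queue
    # names is an assumption of this sketch.
    target = {"pause_purging": "purge", "pause_cloning": "clone"}[option]
    queues[target].paused = value


queues = {"purge": AsyncJobQueue("purge"), "clone": AsyncJobQueue("clone")}
queues["purge"].queue_job("purge-subvol-1")
queues["clone"].queue_job("clone-subvol-2")

apply_config(queues, "pause_purging", True)
print(queues["purge"].get_next_job())  # paused, nothing handed out
print(queues["clone"].get_next_job())  # cloning is unaffected
apply_config(queues, "pause_purging", False)
print(queues["purge"].get_next_job())  # resumed
```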

Fixes: https://tracker.ceph.com/issues/61903
Fixes: https://tracker.ceph.com/issues/68630
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)

Conflicts:
src/pybind/mgr/volumes/module.py
- Code where patch was to be applied was slightly different.

qa: add test for 'dump_export_states'

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 5506ed63c2f14cc5f4c3e72998d8a47fdd97f200)

Conflicts:
qa/tasks/cephfs/test_exports.py
- conflicts due to new test class addition at the bottom of file

mds: add an asok command to dump export states

A task to export a subtree may be blocked. Use this command
to find out what's going on.

Fixes: https://tracker.ceph.com/issues/58835
Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit d34f33055d25ba78f63369f661eb75515b5f465d)

Conflicts:
src/mds/MDSCacheObject.h
src/mds/Migrator.cc
- conflicts due to quiesce additions in main branch

Merge pull request #60630 from kamoltat/wip-68841-reef

reef: mon [stretch mode]: support disable_stretch_mode & qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode

Merge pull request #65637 from adk3798/reef-cephadm-pin-cheroot

reef: pybind/mgr: pin cheroot version in requirements-required.txt

Reviewed-by: John Mulligan <jmulligan@redhat.com>

mgr/dashboard: bump cheroot to > 10.0

Fixes: https://tracker.ceph.com/issues/55837
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 1ec74a8360d1c4abb39754320eba118d080e3499)

Merge pull request #65638 from zdover23/wip-doc-2025-09-23-reef-remove-cloud-restore-rst

reef: doc/radosgw: remove cloud-restore from reef

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #61279 from vshankar/wip-68765-reef

reef: qa: increase the http.maxRequestBuffer to 100MB and enable the git debug logs

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65630 from phlogistonjohn/jjm-r-65514

reef: build-with-container: add argument groups to organize options

pybind/mgr: pin cheroot version in requirements-required.txt

With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user:
https://github.com/cherrypy/cheroot/issues/769

It is worth noting that the workaround described in that
issue does also work for us. If you add

```
import cheroot
cheroot.server.HTTPServer._serve_unservicable = lambda: None
```

after the existing imports in test_node_proxy.py the
test hanging issue also disappears. Also worth noting the
particular pin of

cheroot~=10.0

was chosen as it matches the existing pin being used
in pybind/mgr/dashboard/constraints.txt

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 6231955b5d00ae6b3630ee94e85b2449092ef0fe)

doc/radosgw: remove cloud-restore from reef

Remove doc/radosgw/cloud-restore.rst from the reef branch.

cloud-restore does not appear in index.rst, so its removal from
index.rst is unnecessary.

Signed-off-by: Zac Dover <zac.dover@proton.me>

build-with-container: add argument groups to organize options

Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
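
For readers unfamiliar with the feature, a minimal sketch of
add_argument_group is shown here. The group titles and the `--execute`
long option name are assumptions made for illustration and may differ
from the real script.

```python
import argparse

parser = argparse.ArgumentParser(prog="build-with-container.py")

# Options in the same group are listed under one heading in --help.
image = parser.add_argument_group("Build image options")
image.add_argument("--distro", "-d", default="centos9")
image.add_argument("--image-variant", default="default")

steps = parser.add_argument_group("Build step options")
steps.add_argument("--execute", "-e", action="append")

# Groups only affect help output; parsing works as before.
args = parser.parse_args(["-d", "jammy", "-e", "build"])
print(args.distro, args.execute)
print(parser.format_help())
```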

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 71a1be4dd0aea004da56c2f518ee70a281a3f7d3)

mgr/dashboard: fix zone update API forcing STANDARD storage class

The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.

Changes:
Combine the add-placement-target and add-storage-class methods into a
single add_placement_targets_storage_class_zone method, which takes the
storage class as a param alongside the rest of the placement params.

Fixes: https://tracker.ceph.com/issues/73105
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 135f3adb4973be493925839e946e7a5fc75e7d5c)

Merge pull request #65297 from joscollin/wip-71832-reef

reef: mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Reviewed-by: Kotresh HR <khiremat@redhat.com>

Merge pull request #61297 from batrick/wip-68451-reef

reef: qa: ignore pg availability/degraded warnings

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #62092 from batrick/wip-70155-reef

reef: qa: ignore variant of down fs

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #63017 from batrick/wip-71092-reef

reef: qa/workunits/fs/misc: remove data pool cleanup

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65595 from aaSharma14/wip-73134-reef

reef: Handle failures in metric parsing

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

Handle failures in metric parsing

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2345460
Signed-off-by: Anmol Babu <anmolbabu@Anmols-MacBook-Pro.local>
(cherry picked from commit f29e3f307c46401328e920204cbe893fbd837c65)

Conflicts:
src/exporter/DaemonMetricCollector.cc

Merge pull request #61978 from batrick/wip-70066-reef

reef: mds: dump next_snap when checking dentry corruption

Merge pull request #62278 from dparmar18/wip-70034-reef

reef: mgr/nfs: validate path when modifying cephfs export

Merge pull request #62409 from neesingh-rh/wip-70418-reef

reef: cephfs-shell: add option to remove xattr

Merge pull request #65251 from joscollin/wip-70031-reef

reef: qa: enable debug mds/client for fs/nfs suite

Merge pull request #65253 from joscollin/wip-71379-reef

reef: cephfs: session tracker accounts for killing sessions

qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode

The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, e.g. "classic", is picked, then the test will fail.

We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.

Fixes: https://tracker.ceph.com/issues/69107
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit de2d9186bddbd452d2e7939723418c200e3fec46)

src/mon/MonMap: modify dump function

Problem:
The current dump of "removed_ranks" and "disallowed_leaders"
is not in a format that the python test
script can parse.

Solution:
Dump these values in the correct format.

Conflict: src/osd/osd_types.cc: Added f->dump_bool("is_stretch_pool", is_stretch_pool());

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit a7f3b7b749acabd235d615a3f5b80e3398a6d80d)

qa: Added tests for disabling stretch mode

Test disabling stretch mode with the following scenario:

1. Healthy Stretch Mode
2. Degraded Stretch Mode

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 4d2f8879bed2abd10c00e5a1c5008bd56c11bf61)

doc/rados/operations/stretch-mode.rst: Added Exiting Stretch Mode

Added documentation about exiting stretch mode.

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0680f17d7bab386429a013c254dd90c70fbabeb7)

mon [stretch mode]: support disable_stretch_mode

Problem:

Currently, Ceph lacks the ability
to exit stretch mode and move back
to normal cluster (non-stretched).

Solution:

Provide a command to allow
the user to exit stretch mode gracefully:

`ceph mon disable_stretch_mode <crush_rule> --yes-i-really-mean-it`

User can either specify a crush rule that
they want all pools to move to or not specify
a rule and Ceph will use a default replicated crush rule.

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 78ce68de41b1d5278e14cf56dff7f15394969255)

Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitor.cc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.

Merge pull request #65473 from rhcs-dashboard/wip-72963-reef

reef: monitoring: add user-agent headers to the urllib

monitoring: add user-agent headers to the urllib

Requests for the documentation suddenly started returning 403. Add
User-Agent headers to the request.
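
A minimal sketch of setting the header with urllib. The User-Agent
string, the URL and the build_request() helper are hypothetical and are
not taken from the monitoring code.

```python
import urllib.request

# Hypothetical value; some servers reject the default Python-urllib agent.
USER_AGENT = "Mozilla/5.0 (compatible; ceph-docs-check)"


def build_request(url):
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://docs.example.invalid/page")
# urllib normalizes header names, hence the "User-agent" spelling here.
print(req.get_header("User-agent"))
# The request would then be sent with urllib.request.urlopen(req).
```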

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit b8fe487010483681bbc8ddb8dfe18b40ebfd346b)

mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock

Calling back into python functions whilst holding the lock can result in
this thread being queued for the GIL, causing extended delays
for threads waiting to acquire the lock.
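
The general pattern (copy what is needed under the lock, then call out
after releasing it) can be sketched as below. This is a Python analogy
with hypothetical names, not the C++ change in mgr/DaemonState.

```python
import threading


class DaemonIndex:
    """Hypothetical index guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._daemons = {}

    def insert(self, key, state):
        with self._lock:
            self._daemons[key] = state

    def for_each(self, callback):
        # Hold the lock only long enough to take a snapshot.
        with self._lock:
            snapshot = list(self._daemons.items())
        # Callbacks run without the lock, so a slow callback (or one
        # that needs the index itself) does not block other threads.
        for key, state in snapshot:
            callback(key, state)


index = DaemonIndex()
index.insert("osd.0", "up")

seen = []


def callback(key, state):
    seen.append(key)
    # Would deadlock if for_each() still held the non-reentrant lock.
    index.insert("osd.1", "up")


index.for_each(callback)
print(seen)
```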

Fixes: https://tracker.ceph.com/issues/72337
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit b4304d521f61b61515cade872824210e7d67f6db)

test/libcephfs: use more entries to reproduce snapdiff fragmentation
issue.

Snapdiff listing fragments have different boundaries in Reef and Squid+
releases hence original reproducer (made for Reef) doesn't work properly
in S+ releases. This patch fixes that at cost of longer execution.
This might be redundant/senseless when backporting to Reef.

Related-to: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 23397d32607fc307359d63cd651df3c83ada3a7f)

mds: rollback the snapdiff fragment entries with the same name if needed.

This is required when more entries with the same name don't fit into the
fragment. With the existing means of specifying the fragment offset, such splitting must be
prohibited.

Fixes: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 24955e66f4826f8623d2bec1dbfc580f0e4c39ae)

test/libcephfs: Polishing SnapdiffDeletionRecreation case

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit daf3350621cfafa383cd9deea81b60b775a53093)

Test failure: LibCephFS.SnapdiffDeletionRecreation
Reproduces: https://tracker.ceph.com/issues/72518
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
(cherry picked from commit 4ff71386ac1529dc1f7c2640511f509bd6842862)
(cherry picked from commit 48f5a5d04fb2cef52c5e4a3daf452ccf988666d2)

Merge pull request #65002 from aaSharma14/wip-68481-reef

reef: mgr/dashboard: show non default realm sync status in rgw overview page

Reviewed-by: Afreen Misbah <afreen@ibm.com>

mgr/volumes: Fix json.loads for test on mon caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit b008ef9eb690618608f902c67f8df1fb8a587e33)

mgr/volumes: Add test for mon caps if auth key has remaining mds/osd caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit 403d5411364e2fddd70d98a6f120b26e416c1d99)

mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit 0882bbe8a4470f82993d87b7c02b19aa7fe7fbcc)

qa: Add test for subvolume_ls on osd full

Fixes: https://tracker.ceph.com/issues/72260
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 8547e57ebc4022ca6750149f49b68599a8af712e)

mds: Fix readdir when osd is full.

Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.

Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.

The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to auth mds if it's not the auth. But if the osd is full,
the inode is fetched in the 'dispatch_client_request' before
calling the handler function of respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for operations
like getattr, where the inode might not be in memory on the non-auth
mds; returning ESTALE is confusing and the client wouldn't retry. This
was introduced by the commit 6db81d8479b539d which fixes subvolume
deletion when osd is full.

Fix:
Fetch the inode required for the FULL cap access check for the
relevant operations in osd full scenario. This makes sense because
all the operations would mostly be preceded with lookup and load
the inode in memory or they would handle ESTALE gracefully.

Fixes: https://tracker.ceph.com/issues/72260
Introduced-by: 6db81d8479b539d3ca6b98dc244c525e71a36437
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 1ca8f334f944ff78ba12894f385ffb8c1932901c)

qa: test failure for duplicate retention spec

Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 074f05ae294a50f8b6a22fb58d03b46bfb956966)