
Merge PR #60397 into wip-jcollin-testing-20251009.130945-reef

* refs/pull/60397/head:
PendingReleaseNotes: add a release note about confirm flag for max_mds
doc/cephfs: update about changing max_mds FS setting variable
qa/cephfs: add tests for confirmationn required to change max_mds
mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

Merge PR #65430 into wip-jcollin-testing-20251009.130945-reef

* refs/pull/65430/head:
qa: Disable a test for kernel mount
src/test/mds: Fix TestMDSAuthCaps
client: Fix the multifs auth caps check
mds: Fix multifs auth caps check
qa: Fix validation of client_version
qa: Test cross fs access by single client in multifs
qa: Run test_admin with the reef client

Merge PR #65765 into wip-jcollin-testing-20251009.130945-reef

* refs/pull/65765/head:
qa: Fix test_with_health_warn_with_2_active_MDSs
qa/cephfs: allow detecting MDS ID from FS object for method the..
qa/cephfs: allow passing MDS ID to method that generate...

Merge pull request #65845 from phlogistonjohn/jjm-bwc-backports-r

reef: sync build-with-container patches from main

script/build-with-container: improve error handling for invalid distros

Instead of throwing a long, obnoxious traceback at the user when the value
supplied to -d/--distro is invalid, do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie

```
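A minimal sketch of how such an error can be produced with argparse, assuming
the check lives in a `type=` callable. The names `KNOWN_DISTROS` and
`distro_kind` are hypothetical and the distro list is a shortened stand-in;
the actual script may be structured differently.

```python
import argparse

# Hypothetical, shortened list; the real script accepts many more names.
KNOWN_DISTROS = ("centos9", "ubuntu22.04", "jammy", "debian13", "trixie")


def distro_kind(value: str) -> str:
    """Validate a distro name; argparse turns the exception into a usage error."""
    if value not in KNOWN_DISTROS:
        raise argparse.ArgumentTypeError(
            f"unknown distro: {value!r} not in {', '.join(KNOWN_DISTROS)}"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-with-container.py")
    parser.add_argument("--distro", "-d", type=distro_kind, default="centos9")
    return parser
```

Raising `argparse.ArgumentTypeError` makes argparse print the usage line plus
a one-line error and exit with status 2, rather than showing a traceback.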

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 72f3ad9549e84bdba7bdfd97d2ede3c55e02f103)

script/build-with-container: add debian 13 (trixie)

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a13fa091dd6bad35c44076cb7c46cb7bcc17a7ac)

script/build-with-container: add ubuntu 20.04 (focal)

Add ubuntu 20.04 (focal) to the available list of distro kinds.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 7c40f7bd07ac935d0657b9284118da8590a5cf0d)

script/build-with-container: add a pair of fedora distro versions

Add fedora 42 and the soon-to-be-released fedora 43.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 76fe5ad298ee5626eeb63591a702e8f8cc9be7d0)

script/build-with-container: lightly organize the distro kind aliases

Do a tiny reorg of the distro kind aliases and container images to keep
the EL distros together and comment out each "section".

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 4430a5ad6be6f26309d5f5bea0e448a4bbd432e1)

script/build-with-container: be consistent with naming in distro kinds

Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun indented) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.
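The naming scheme can be illustrated with a small sketch. Only the `DistroKind`
name and the "<name><version>" convention come from this commit message; the
member set and the `aliases()` and `from_arg()` helpers are assumptions made
for illustration.

```python
import enum


class DistroKind(str, enum.Enum):
    # Canonical names follow the "<name><version>" form.
    UBUNTU2204 = "ubuntu22.04"
    DEBIAN12 = "debian12"

    @classmethod
    def aliases(cls) -> dict:
        # Codenames and alternate spellings map to a canonical member.
        return {
            "jammy": cls.UBUNTU2204,
            "ubuntu-jammy": cls.UBUNTU2204,
            "bookworm": cls.DEBIAN12,
            "debian-bookworm": cls.DEBIAN12,
        }

    @classmethod
    def from_arg(cls, value: str) -> "DistroKind":
        # Try the canonical value first, then fall back to the alias table.
        try:
            return cls(value)
        except ValueError:
            pass
        try:
            return cls.aliases()[value]
        except KeyError:
            raise ValueError(f"unknown distro: {value!r}") from None
```

With this layout, `DistroKind.from_arg("jammy")` and
`DistroKind.from_arg("ubuntu22.04")` resolve to the same member.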

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit ac11a80a63ab1909fbdf682d830acde96856f502)

src/script: add bookworm to build-with-container.py

..and its friend buildcontainer-setup.sh

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 34b497c2f3652e7d30c7b7476b711fd9f1f4ecac)

build-with-container: ensure npm dir is set up before configure

When the npm cache path option is passed, the npm cache dir is passed
to all container `run` commands. Ensure the dir has been created
before the first container command (configure) is used.
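The idea can be sketched as follows, assuming the cache dir is bind-mounted
into the container. The function name `container_run_args` and the
`/npm-cache` mount point are hypothetical and not taken from the script.

```python
import pathlib
from typing import List, Optional


def container_run_args(npm_cache_path: Optional[str]) -> List[str]:
    """Return extra `run` arguments, creating the cache dir first."""
    if not npm_cache_path:
        return []
    cache_dir = pathlib.Path(npm_cache_path).absolute()
    # Create the directory up front so that even the first container
    # command (configure) finds an existing bind-mount source.
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The in-container mount point is a placeholder for illustration.
    return ["--volume", f"{cache_dir}:/npm-cache"]
```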

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 79166af192ea0b4b982b56ce521516d5a29e7a0d)

Merge pull request #65295 from joscollin/wip-71436-reef

reef: mgr/snap_schedule: fix typo in error message during retention add

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #65348 from joscollin/wip-72804-reef

reef: mds: Fix readdir when osd is full.

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #65364 from ifed01/wip-ifed-fix-snapdiff-fragment-reef

reef: mds: fix snapdiff result fragmentation

Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #63129 from kshtsk/wip-71210-reef

reef: qa/tasks/cephfs/mount: use 'ip route' instead 'route'

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

qa: Fix test_with_health_warn_with_2_active_MDSs

The test was intended to validate that the 'mds fail'
cmd fails on any active mds when one of them has a warning.

The commit 221700273a82658c642a282c5761c0cbb00ec5b6
(PR 61554) changes this behavior and allows 'mds fail'
on an mds without the warning. The test should always have
failed with this commit. But it never failed until
tested extensively, because it mostly generated
warnings for both active mdses. Occasionally, the test
generated a warning on a single mds and failed. So it's a
race. This patch fixes it with the following changes.

a. Changed the mds_cache_memory_limit to '50K' from '1K',
as '1K' was too low and generated warnings on both mdses.
b. Create a directory, pin it to a single mds and open 400 files
in the backend to create cache pressure on one mds.

Also, there are two tests with the same name,
'test_with_health_warn_with_2_active_MDSs', though in different
classes. So the test name was changed to
'test_with_health_warn_on_1_mds_with_2_active_MDSs' to avoid
confusion and indicate what the test actually does.

Fixes: https://tracker.ceph.com/issues/71915
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit f990e7d1ea90a93b3f17a500be9178f979bf7e39)

qa/cephfs: allow detecting MDS ID from FS object for method the..

that generates MDS_CACHE_OVERSIZE warning.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d53be13c04dd05a94862affd9fab56efa6c2b98e)

qa/cephfs: allow passing MDS ID to method that generate...

MDS_CACHE_OVERSIZE warning.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5a7834b6cd8b7dd2427e62c16f7955fa63518284)

Merge pull request #61512 from mchangir/wip-68768-reef

reef: mds: add an asok command to dump export states

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65710 from aaSharma14/wip-73294-reef

reef: monitoring: fix MTU Mismatch alert rule and expr

Reviewed-by: Afreen Misbah <afreen@ibm.com>

monitoring: fix MTU Mismatch alert rule and expr

Fixes: https://tracker.ceph.com/issues/73290
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit bee24dec441b9e6b263e4498c2ab333b0a60a52d)

Conflicts:
monitoring/ceph-mixin/prometheus_alerts.yml
monitoring/ceph-mixin/tests_alerts/test_alerts.yml
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/datatable/table-key-value/table-key-value.component.scss

Merge pull request #65621 from aaSharma14/wip-73165-reef

reef: mgr/dashboard: fix zone update API forcing STANDARD storage class

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge pull request #62436 from rishabh-d-dave/mgr-vol-no-del-reef

reef: mgr/volumes: allow disabling async job threads

Reviewed-by: Jos Collin <jcollin@redhat.com>

PendingReleaseNote: add note for pause_purging and pause_cloninig

Added release notes for mgr/vol config options "pause_purging" and
"pause_cloning".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 03b90d589ca5759701164ea54b0dbf9b92c4efef)

doc/cephfs: add note for config option pause_purging and pause_cloning

Update documentation to add information about mgr/vol config options
"pause_purging" and "pause_cloning".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1f0bfe1c599769ac67d3b1b41f37bb3482e27839)

qa/cephfs: add tests for mgr/vol config pause_cloning

mgr/vol config option pause_cloning allows pausing of cloner threads.
Add tests for this.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e7eb36e4393c8401c7cf1aa1f714b52c1ced9ca0)

qa/cephfs: extend wait for trash empty

Trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py encounters absence of trash
directory, return true.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6d6be8b41c990acf2d9c08f35eb382996d59d5a7)

qa/cephfs: add tests for config option pause_purging

Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f38fcbc6109494e23e4948d794f59c927a9303ff)

Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred due to the missing import of safe_while in
  Reef branch compared to main branch. Along with resolving this
  conflict, it has been imported as it is used by the tests.
- Second conflict occurred due to absence of some test methods right
  before where TestPausePurging was to be added.
- Third conflict occurred because entire contextutil was imported instead
  of just safe_while and only CommandFailedError was imported from
  teuthology.exceptions while this commit imports MaxWhileTries too.

qa/cephfs: don't strip any whitespace for get_shell_stdout

Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9582b9b13a42fae4c7f38c22f9202eb893b6f1bc)

Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occurred for 2 reasons -
  - One, method get_shell_stdout() is absent on Reef branch but not in
    main, so this patch, which modifies it, will obviously run
    into conflict.
  - Two, run_shell_payload() lies right next to get_shell_stdout() in
    main branch and its definition is quite different, leading to
    conflict again.

mgr/vol: add pause/resume mechanism for async jobs

Add a mechanism that allows pausing/resuming of the entire async job
machinery that queues, launches and picks the next async job; this covers
both kinds of async jobs, clones as well as purges.

And then add mgr/vol config option pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
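A minimal sketch of one way to build such a pause/resume mechanism, using a
`threading.Event` as the gate. The `PausableWorker` class is hypothetical and
is not the mgr/volumes implementation; it only illustrates the technique.

```python
import queue
import threading


class PausableWorker:
    """Runs queued jobs on one thread; can be paused and resumed."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._running = threading.Event()
        self._running.set()  # start in the resumed state
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, job):
        self._jobs.put(job)

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def join(self):
        # Block until every submitted job has finished.
        self._jobs.join()

    def _loop(self):
        while True:
            job = self._jobs.get()
            self._running.wait()  # a picked job is not started while paused
            try:
                job()
            finally:
                self._jobs.task_done()
```

Using one such worker per job type (one for clones, one for purges) would let
each kind be paused and resumed individually, in the spirit of the
pause_cloning and pause_purging options.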

Fixes: https://tracker.ceph.com/issues/61903
Fixes: https://tracker.ceph.com/issues/68630
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)

Conflicts:
src/pybind/mgr/volumes/module.py
- Code where patch was to be applied was slightly different

qa: add test for 'dump_export_states'

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 5506ed63c2f14cc5f4c3e72998d8a47fdd97f200)

Conflicts:
qa/tasks/cephfs/test_exports.py
- conflicts due to new test class addition at the bottom of file

mds: add an asok command to dump export states

Task to export subtree may be blocked, use this command
to find out what's going on.

Fixes: https://tracker.ceph.com/issues/58835
Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit d34f33055d25ba78f63369f661eb75515b5f465d)

Conflicts:
src/mds/MDSCacheObject.h
src/mds/Migrator.cc
- conflicts due to quiesce additions in main branch

Merge pull request #60630 from kamoltat/wip-68841-reef

reef: mon [stretch mode]: support disable_stretch_mode & qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode

Merge pull request #65637 from adk3798/reef-cephadm-pin-cheroot

reef: pybind/mgr: pin cheroot version in requirements-required.txt

Reviewed-by: John Mulligan <jmulligan@redhat.com>

mgr/dashboard: bump cheroot to > 10.0

Fixes: https://tracker.ceph.com/issues/55837
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 1ec74a8360d1c4abb39754320eba118d080e3499)

Merge pull request #65638 from zdover23/wip-doc-2025-09-23-reef-remove-cloud-restore-rst

reef: doc/radosgw: remove cloud-restore from reef

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #61279 from vshankar/wip-68765-reef

reef: qa: increase the http.maxRequestBuffer to 100MB and enable the git debug logs

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65630 from phlogistonjohn/jjm-r-65514

reef: build-with-container: add argument groups to organize options

pybind/mgr: pin cheroot version in requirements-required.txt

With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user:
https://github.com/cherrypy/cheroot/issues/769

It is worth noting that the workaround described in that
issue does also work for us. If you add

```
import cheroot
cheroot.server.HTTPServer._serve_unservicable = lambda: None
```

after the existing imports in test_node_proxy.py the
test hanging issue also disappears. Also worth noting the
particular pin of

cheroot~=10.0

was chosen as it matches the existing pin being used
in pybind/mgr/dashboard/constraints.txt

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 6231955b5d00ae6b3630ee94e85b2449092ef0fe)

doc/radosgw: remove cloud-restore from reef

Remove doc/radosgw/cloud-restore.rst from the reef branch.

cloud-restore does not appear in index.rst, so its removal from
index.rst is unnecessary.

Signed-off-by: Zac Dover <zac.dover@proton.me>

build-with-container: add argument groups to organize options

Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
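For illustration, a small sketch of `add_argument_group` in use. The group
titles and the `--execute`, `--image-repo` and `--tag` option names are
assumptions chosen for the example and are not copied from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-with-container.py")

    # Group titles and most option names here are illustrative only.
    basic = parser.add_argument_group("Basic options")
    basic.add_argument("--distro", "-d", default="centos9")
    basic.add_argument("--execute", "-e", action="append")

    image = parser.add_argument_group("Container image options")
    image.add_argument("--image-repo")
    image.add_argument("--tag")
    return parser
```

In the `--help` output, each group is printed under its own titled section
instead of one flat list of options.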

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 71a1be4dd0aea004da56c2f518ee70a281a3f7d3)

mgr/dashboard: fix zone update API forcing STANDARD storage class

The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.

Changes:
Combine the add-placement-target and add-storage-class methods into a single
add_placement_targets_storage_class_zone method, which takes the storage
class as a parameter alongside the rest of the placement params.

Fixes: https://tracker.ceph.com/issues/73105
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 135f3adb4973be493925839e946e7a5fc75e7d5c)

Merge pull request #65297 from joscollin/wip-71832-reef

reef: mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Reviewed-by: Kotresh HR <khiremat@redhat.com>

Merge pull request #61297 from batrick/wip-68451-reef

reef: qa: ignore pg availability/degraded warnings

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #62092 from batrick/wip-70155-reef

reef: qa: ignore variant of down fs

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #63017 from batrick/wip-71092-reef

reef: qa/workunits/fs/misc: remove data pool cleanup

Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #65595 from aaSharma14/wip-73134-reef

reef: Handle failures in metric parsing

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

Handle failures in metric parsing

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2345460
Signed-off-by: Anmol Babu <anmolbabu@Anmols-MacBook-Pro.local>
(cherry picked from commit f29e3f307c46401328e920204cbe893fbd837c65)

Conflicts:
src/exporter/DaemonMetricCollector.cc

Merge pull request #61978 from batrick/wip-70066-reef

reef: mds: dump next_snap when checking dentry corruption

Merge pull request #62278 from dparmar18/wip-70034-reef

reef: mgr/nfs: validate path when modifying cephfs export

Merge pull request #62409 from neesingh-rh/wip-70418-reef

reef: cephfs-shell: add option to remove xattr

Merge pull request #65251 from joscollin/wip-70031-reef

reef: qa: enable debug mds/client for fs/nfs suite

Merge pull request #65253 from joscollin/wip-71379-reef

reef: cephfs: session tracker accounts for killing sessions

qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode

The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, e.g. "classic", is picked, then the test will fail.

We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.

Fixes: https://tracker.ceph.com/issues/69107
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit de2d9186bddbd452d2e7939723418c200e3fec46)

src/mon/MonMap: modify dump function

Problem:
The current dump for "removed_ranks" and "disallowed_leaders"
doesn't have the correct format for the python test
script to parse these values.

Solution:
Modified the values so that they are in the correct format.

Conflict: src/osd/osd_types.cc: Added f->dump_bool("is_stretch_pool", is_stretch_pool());

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit a7f3b7b749acabd235d615a3f5b80e3398a6d80d)

qa: Added tests for disabling stretch mode

Test disabling stretch mode with the following scenario:

1. Healthy Stretch Mode
2. Degraded Stretch Mode

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 4d2f8879bed2abd10c00e5a1c5008bd56c11bf61)

doc/rados/operations/stretch-mode.rst: Added Exitting Stretch Mode

Added documentation about exiting stretch mode.

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0680f17d7bab386429a013c254dd90c70fbabeb7)

mon [stretch mode]: support disable_stretch_mode

Problem:

Currently, Ceph lacks the ability
to exit stretch mode and move back
to normal cluster (non-stretched).

Solution:

Provide a command to allow
the user to exit stretch mode gracefully:

`ceph mon disable_stretch_mode <crush_rule> --yes-i-really-mean-it`

User can either specify a crush rule that
they want all pools to move to or not specify
a rule and Ceph will use a default replicated crush rule.

Fixes: https://tracker.ceph.com/issues/67467
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 78ce68de41b1d5278e14cf56dff7f15394969255)

Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitor.cc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.

PendingReleaseNotes: add a release note about confirm flag for max_mds

Add a release note for the fact that users now need to pass the
confirmation flag for modifying "max_mds" when cluster is unhealthy.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit a71c8e8d1186823cf5d01f23d7b922c5e2665aa5)

doc/cephfs: update about changing max_mds FS setting variable

Update the documentation for CephFS administration as well as
troubleshooting.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 2d28faaeea11988867471a53e40145f309951307)

Conflicts:
doc/cephfs/troubleshooting.rst
- Code where patch was to be applied was a bit different in Reef
compared to main branch.

qa/cephfs: add tests for confirmationn required to change max_mds

Add tests to ensure that when cluster has any health warning, especially
MDS_TRIM, confirmation flag is mandatory to change max_mds.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4d5ec87ab404c2b94aab6865061175eb5870fa33)

Conflicts:
qa/tasks/cephfs/test_admin.py
- Being slightly different from main branch version, patch couldn't be
applied seamlessly.

mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

User must pass the confirmation flag (--yes-i-really-mean-it) to change
the value of CephFS setting variable "max_mds" when the Ceph cluster is
unhealthy.

This measure was decided upon to prevent users from changing "max_mds"
as a means of troubleshooting an unhealthy cluster.
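The gating rule can be sketched as below. This is not the monitor code: the
function name, the use of the overall health status string as the
"unhealthy" test, the EPERM return code and the message text are all
assumptions for illustration.

```python
import errno
from typing import Tuple


def check_max_mds_change(health_status: str, confirmed: bool) -> Tuple[int, str]:
    """Return (retcode, message) for a requested max_mds change."""
    # Assumption: anything other than HEALTH_OK counts as unhealthy.
    if health_status != "HEALTH_OK" and not confirmed:
        return (
            -errno.EPERM,
            "cluster is unhealthy; pass --yes-i-really-mean-it "
            "to change max_mds anyway",
        )
    return (0, "")
```

On a healthy cluster the change goes through without the flag; on an
unhealthy one it is refused unless the confirmation flag was passed.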

Fixes: https://tracker.ceph.com/issues/66301
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit a55a75c57e7a42a1317e4d7fc86c1964b71137f0)

Conflicts:

src/mon/FSCommands.cc
- Method set_val() is present in this file on main branch but not on
Reef branch

Merge pull request #65473 from rhcs-dashboard/wip-72963-reef

reef: monitoring: add user-agent headers to the urllib

monitoring: add user-agent headers to the urllib

The documentation site suddenly started returning 403 errors. Add
User-Agent headers to the request.
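A minimal sketch of setting a User-Agent header with the standard library.
The `build_request` helper and the header value are placeholders and are not
taken from the patch.

```python
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    # Placeholder User-Agent value; the patch may use a different string.
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
```

The request object can then be passed to `urllib.request.urlopen()` in place
of the bare URL.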

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit b8fe487010483681bbc8ddb8dfe18b40ebfd346b)

test/libcephfs: use more entries to reproduce snapdiff fragmentation
issue.

Snapdiff listing fragments have different boundaries in Reef and Squid+
releases hence original reproducer (made for Reef) doesn't work properly
in S+ releases. This patch fixes that at cost of longer execution.
This might be redundant/senseless when backporting to Reef.

Related-to: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 23397d32607fc307359d63cd651df3c83ada3a7f)

mds: rollback the snapdiff fragment entries with the same name if needed.

This is required when more entries with the same name don't fit into the
fragment. With the existing means for fragment offset specification, such
splitting has to be prohibited.

Fixes: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 24955e66f4826f8623d2bec1dbfc580f0e4c39ae)

test/libcephfs: Polisihing SnapdiffDeletionRecreation case

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit daf3350621cfafa383cd9deea81b60b775a53093)

Test failure: LibCephFS.SnapdiffDeletionRecreation
Reproduces: https://tracker.ceph.com/issues/72518
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
(cherry picked from commit 4ff71386ac1529dc1f7c2640511f509bd6842862)
(cherry picked from commit 48f5a5d04fb2cef52c5e4a3daf452ccf988666d2)

Merge pull request #65002 from aaSharma14/wip-68481-reef

reef: mgr/dashboard: show non default realm sync status in rgw overview page

Reviewed-by: Afreen Misbah <afreen@ibm.com>

qa: Disable a test for kernel mount

The kclient fix hasn't yet landed in the kernel and hence
the test 'test_multifs_single_client_cross_access_r_caps_end'
would fail for kernel mount. So disable the failing validation
in the test for kclient.

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 485f37ac1fe7d233685ce1a1f9ac5360c142b1f5)

Conflicts:
qa/tasks/cephfs/test_admin.py: The commit
9d0ab233d822668e88c873bc1314e984feaf1296 is not backported

src/test/mds: Fix TestMDSAuthCaps

Fix the TestMDSAuthCaps after fixing
multifs authcaps comparison.

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 92ab603e110e349342f3611e29f92fc64ae7d3ec)
(cherry picked from commit a3e3f27d5243c741b1f10e9433d341fac60d07d6)

client: Fix the multifs auth caps check

The fsname needs to be passed to validate the mds
auth caps check. This patch fixes the same.

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit b1d6bb95d3c805af37883ef481b96a1aa33cedf0)
(cherry picked from commit cac4ee5d1a5e769c6bf90619d93043ade15fc27b)

mds: Fix multifs auth caps check

The fsname is not taken into consideration while validating
the access check for the operations. This patch fixes
the same.

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 1a5e083eb297dc611c8098abb67faf34fd8e4499)

qa: Fix validation of client_version

The multifs auth caps bug has a fix both in client and mds.
If it's an old, unpatched client, we expect that the fs
with 'rw' would end up having 'r' caps with the multifs
auth caps used as in the test
'test_multifs_single_client_cross_access_r_caps_end'.
This patch adds the conditional to validate the same.

This commit makes use of the PR #64005

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit f10e34d0350d216f02d4e73ec695340daae11dd4)

Conflicts:
qa/tasks/cephfs/test_admin.py - The commit
9d0ab233d822668e88c873bc1314e984feaf1296 is not backported

qa: Test cross fs access by single client in multifs

Fixes: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 3516db300d3688cd048542dbed2e0318f9ac5ad3)

Conflicts:
qa/tasks/cephfs/test_admin.py: The commit
9d0ab233d822668e88c873bc1314e984feaf1296 is not backported

qa: Run test_admin with the reef client

This is required to test the features involving
fixes both in client and mds. This is to make
sure the older clients are not broken with the
fix. The version 18.2.6 is reef without the
client fix.

The test suite sets up the cluster with reef
18.2.6 and upgrades only the ceph cluster node
leaving the client node.

NOTE: The version is changed to 18.2.6 because
this is reef backport whereas it's 19.2.2 in
higher releases. Please check commit a4f97c0aa92

Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit a4f97c0aa92c37113b33d63b57f2fae870f403a1)

mgr/volumes: Fix json.loads for test on mon caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit b008ef9eb690618608f902c67f8df1fb8a587e33)

mgr/volumes: Add test for mon caps if auth key has remaining mds/osd caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit 403d5411364e2fddd70d98a6f120b26e416c1d99)

mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Signed-off-by: Enrico Bocchi <enrico.bocchi@cern.ch>
(cherry picked from commit 0882bbe8a4470f82993d87b7c02b19aa7fe7fbcc)

qa: Add test for subvolume_ls on osd full

Fixes: https://tracker.ceph.com/issues/72260
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 8547e57ebc4022ca6750149f49b68599a8af712e)

mds: Fix readdir when osd is full.

Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.

Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.

The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to auth mds if it's not the auth. But if the osd is full,
the inode is fetched in the 'dispatch_client_request' before
calling the handler function of respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for the operations
like getattr, where the inode might not be in memory on the non-auth
mds and returning ESTALE is confusing and client wouldn't retry. This
is introduced by the commit 6db81d8479b539d which fixes subvolume
deletion when osd is full.

Fix:
Fetch the inode required for the FULL cap access check for the
relevant operations in osd full scenario. This makes sense because
all the operations would mostly be preceded with lookup and load
the inode in memory or they would handle ESTALE gracefully.

Fixes: https://tracker.ceph.com/issues/72260
Introduced-by: 6db81d8479b539d3ca6b98dc244c525e71a36437
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 1ca8f334f944ff78ba12894f385ffb8c1932901c)

qa: test failure for duplicate retention spec

Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 074f05ae294a50f8b6a22fb58d03b46bfb956966)

mgr/snap_schedule: fix message format error

Fixes: https://tracker.ceph.com/issues/64042
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 59ec7a9bcda76aa6a71a1d34a1e6ca609af467f0)

Merge pull request #65418 from ceph/fix-api-tests-reef

reef: pybind/mgr/dashboard: Use teuthology's actual requirements

pybind/mgr/dashboard: Use teuthology's actual requirements

Signed-off-by: David Galloway <david.galloway@ibm.com>
(cherry picked from commit 22a87d959bca74478de1e2d9f86859676385491d)

Merge pull request #65380 from zdover23/wip-doc-2025-09-04-backport-65325-to-reef

reef: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

doc/cephfs: edit troubleshooting.rst

Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit f4b40422fefaa993441396a5c31fbfd3d8714595)

Merge pull request #65250 from ceph/reef-pipeline-backports

reef: Recent pipeline backports

Merge pull request #65094 from zdover23/wip-doc-2025-08-18-backport-64931-to-reef

reef: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65207 from zdover23/wip-doc-2025-08-26-backport-64074-to-reef

reef: doc/rados/configuration: Mention show-with-defaults and ceph-conf

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65212 from zdover23/wip-doc-2025-08-26-backport-65180-to-reef

reef: doc/dev:update blkin.rst doc for lttng trace

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65239 from zdover23/wip-doc-2025-08-26-backport-65230-to-reef

reef: doc/rados/operations: Improve health-checks.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #64843 from NitzanMordhai/wip-72419-reef

reef: monitor: Enhance historic ops command output and error handling

Merge pull request #63134 from kshtsk/wip-71215-reef

reef: tasks/cephfs/mount: use 192.168.144.0.0/20 for brxnet

mgr/dashboard: show non default realm sync status in rgw overview page

Currently, we only show the sync status of the default realm in the rgw
overview page. This PR shows the sync status of non-default realms
as well. Multisite sync status can be viewed for any active daemon
that runs in a default or non-default realm.

Fixes: https://tracker.ceph.com/issues/68329
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit ea53aceb8d72187f7f8629aa6d3b66c7cca88a86)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-overview-dashboard/rgw-overview-dashboard.component.ts
src/pybind/mgr/dashboard/openapi.yaml

Merge pull request #65201 from zdover23/wip-doc-2025-08-25-backport-65185-to-reef

reef: doc/cephfs: edit troubleshooting.rst (Slow MDS)

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65184 from zdover23/wip-doc-2025-08-22-backport-64726-to-reef

reef: doc/man/8: Improve mount.ceph.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65138 from zdover23/wip-doc-2025-08-20-backport-65128-to-reef

reef: doc/rados: repair short underline

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65091 from zdover23/wip-doc-2025-08-18-backport-64928-to-reef

reef: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>