Jos Collin [Thu, 16 Oct 2025 14:23:06 +0000 (19:53 +0530)]
Merge PR #61517 into wip-jcollin-testing-20251016.142247-reef
* refs/pull/61517/head:
mds: fix rank root doesn't insert root ino into its subtree map when starting
mds: flush mds log before finishing STATE_STARTING
mds/FSMap: go back to STARTING state when rank doesn't make it pass STARTING
Jos Collin [Thu, 16 Oct 2025 14:23:00 +0000 (19:53 +0530)]
Merge PR #65768 into wip-jcollin-testing-20251016.142247-reef
* refs/pull/65768/head:
client: resolve bogus self-assignment
mds: add issue_seq to all cap messages
include/ceph_fs: correct ceph_mds_cap_peer field name
include/ceph_fs: correct ceph_mds_cap_item field name
messages/MClientCaps: use correct ceph_seq_t for cap sequence types
messages/MClientCaps: dump issue_seq for debugging
mds: remove dead code
Jos Collin [Thu, 16 Oct 2025 14:22:56 +0000 (19:52 +0530)]
Merge PR #65818 into wip-jcollin-testing-20251016.142247-reef
* refs/pull/65818/head:
src/common: add helper to prepend "..." to trimmed paths
mds/ScrubStack: avoid generating inode path since it is unused
mds: fix few log entries
client: trim path before logging it
mds: log trimmed path wherever generating full path is necessary
mds: for logging generate only 10 final components of dentry path
mds: for logging generate only 10 final components of inode path
qa, test: run unit tests for cephfs.pyx with non-root user
test/pybind: add unit tests for rmtree() in cephfs python bindings
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Rishabh Dave [Tue, 2 Sep 2025 17:37:36 +0000 (23:07 +0530)]
client: trim path before logging it
Path can be virtually infinitely long and logging a long long path
(imagine around 2000 path components) is un-useful as well as lowers
readability of the log. Therefore, trim before logging.
Fixes: https://tracker.ceph.com/issues/72993 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdc8aae400fbbdd61df811455d49176deab1f331)
Conflicts:
src/mds/Server.cc
src/mds/SessionMap.cc
- Inclusion of lesser header files in both these files confused git
while applying patch automatically.
src/mds/MDSAuthCaps.cc
src/test/mds/TestMDSAuthCaps.cc
- is_capable() takes one less argument in reef compared to main.
Rishabh Dave [Thu, 21 Aug 2025 11:51:48 +0000 (17:21 +0530)]
mds: for logging generate only 10 final components of dentry path
Generating full absolute path for dentries for printing in MDS logs
slows the down the FS to a great extent especially when the path is very
long (imagine a path with 2000 components). Printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of MDS logs.
Therefore, generate only 10 final components of the dentry paths for logging.
Fixes: https://tracker.ceph.com/issues/72779 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1430cd67d8f7bd7d98b241a7511fa3ceb7e5ba2e)
Conflicts:
src/include/filepath.cc
- this file is absent in reef, set_trimmed() has been moved to
filepath.h instead.
Rishabh Dave [Sun, 17 Aug 2025 18:13:40 +0000 (23:43 +0530)]
mds: for logging generate only 10 final components of inode path
Generating full absolute path for inodes for printing in MDS logs slows
down the FS to a great extent especially when the path is very long
(imagine a path with 2000 components). Also printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of the MDS logs.
Therefore, generate only 10 final components of inode paths for logging.
Fixes: https://tracker.ceph.com/issues/72779 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1518690210f3a4473978c7a9274e902fccaad862)
Conflicts:
src/mds/CDir.cc
- Code region where modification were made in main is completely absent
in Reef.
Rishabh Dave [Fri, 25 Jul 2025 08:20:06 +0000 (13:50 +0530)]
qa, test: run unit tests for cephfs.pyx with non-root user
Run test_python.sh with non-root user. This makes it necessary to change
the owner user and group of file system root to be same as this non-root
user. This brings testing closer to the real-world scenario and also
allows exercising negative tests where an FS op would fail for a non-root
user but it would pass for root user.
There are few tests that exercise FS operations where root user is
needed. Group these tests under a separate class and add extra code for
this class that allows these tests to run with root UID and GID.
Conflicts:
src/test/pybind/test_cephfs.py
- PR #64999 is merged in main but not in Reef (bug is absent in Reef),
causing this file to have less tests that need to be moved to class
TestWithRootUser.
Allow the user to control the content of the build image with a
high-level `--image-variant=` switch. Currently the supported values are
`default` (the same maximal image we have been generating) and
`packages` a slimmer image that avoids installing certain test-only
dependencies.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Oct 2025 20:23:10 +0000 (16:23 -0400)]
install-deps.sh: let FOR_MAKE_CHECK variable take precedence
Previously, the FOR_MAKE_CHECK variable could only enable installing
extra (test) dependencies when install-deps.sh was used and it was
ignored if `tty -s` exited true. This change allows FOR_MAKE_CHECK to
take precedence over the tty check and to specify one of true, 1, yes to
enable extra "for make check" deps or false, 0, no to explicitly disable
the extra deps.
Based-on-work-by: Dan Mick <dan.mick@redhat.com> Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 8 Oct 2025 20:41:36 +0000 (16:41 -0400)]
script/build-with-container: improve error handling for invalid distros
Instead of throwing a long obnoxious traceback at the user if the value
supplied to -d/--distro is invalid do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie
John Mulligan [Wed, 8 Oct 2025 14:23:25 +0000 (10:23 -0400)]
script/build-with-container: be consistent with naming in distro kinds
Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun indented) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.
John Mulligan [Thu, 28 Aug 2025 23:39:06 +0000 (19:39 -0400)]
build-with-container: ensure npm dir is set up before configure
When the npm cache path option is passed the npm cache dir is passed
to all container `run` commands, ensure the dir has been created
before the first container command (configure) is used.
John Mulligan [Sat, 15 Mar 2025 16:44:00 +0000 (12:44 -0400)]
install-deps: extract SUDO variable logic into a reusable function
While the function is pretty simple and could be copy-pasted I
prefer to extract things into functions to indicate that the
logic is used/repeated elsewhere to ward off making changes to
one copy vs the other.
Rishabh Dave [Fri, 13 Jun 2025 07:13:51 +0000 (12:43 +0530)]
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Method purge() in trash.py calls rmtree() which is recursive method. To
avoid Python's recurision limit, switch to non-recursive approach.
Path to directory along directory handle are clubbed in to a tuple and
that tuple is stored on the stack. Storing directory handle reduces call
to opendir() dramatically.
Fixes: https://tracker.ceph.com/issues/71648 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f9046ca052d10a884a59c1d928cb0c8f0235696b)
Patrick Donnelly [Wed, 26 Jun 2024 13:49:17 +0000 (09:49 -0400)]
include/ceph_fs: correct ceph_mds_cap_item field name
Originally, the last_sent sequence from the MDS was sent by the client during
bulk cap release but it was shortly after changed to the last_issue which is
the sequence number that the cap was originally "issued" by the MDS rank (which
may be updated after import of caps).
Fixes: 6208f57f487ac170df24a9018f1cc87a5ac8b4b3 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 655cddb7c9f32c9dd9cddf40ac17f385d539c8f9)
Rishabh Dave [Fri, 11 Oct 2024 19:03:29 +0000 (00:33 +0530)]
qa/cephfs: extend wait for trash empty
Trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py encounters absence of trash
directory, return true.
Rishabh Dave [Sat, 6 Jan 2024 14:42:31 +0000 (20:12 +0530)]
qa/cephfs: add tests for config option pause_purging
Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.
Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred due to missing import of safe_while which
in Reef branch compared to main branch. Along with resolving this
conflict this has been imported as it used by the tests.
- Second conflict occured due to absence of some test methods right
before where TestPausePurging was to be added.
- Third conflict occured because entire contextutil was imported instead
of just safe_while and only CommandFailedError was imported from
teuthology.exceptions while this commit imports MaxWhileTries too.
Rishabh Dave [Fri, 12 Jan 2024 10:28:41 +0000 (15:58 +0530)]
qa/cephfs: don't strip any whitespace for get_shell_stdout
Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.
Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occured for 2 reasons -
- One, method get_shell_stdout() is absent on Reef branch but not in
main so this patch which makes modification to it will obviously run
in to conflict
- Two, run_shell_payload() lies right next to get_shell_stdout() in
main branch and its definition is quite different, leading to
conflict again.
Rishabh Dave [Tue, 3 Sep 2024 10:01:07 +0000 (15:31 +0530)]
mgr/vol: add pause/resume mechanism for async jobs
Add mechansim that allows pausing/resuming of the entire async job
machinery that queues, launches and picks next async job; both async
jobs, clones as well as purges.
And then add mgr/vol config option pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
Fixes: https://tracker.ceph.com/issues/61903 Fixes: https://tracker.ceph.com/issues/68630 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)
Conflicts:
src/pybind/mgr/volumes/module.py
- Code where patch was to be applied was slighty different
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package and a github issues describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
mgr/dashboard: fix zone update API forcing STANDARD storage class
The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.
Changes:
Club add placement target and add storage class methods into one single
add_placement_targets_storage_class_zone method which takes the storage
class as a param as well alongside the rest of the placement params.
Laura Flores [Tue, 3 Dec 2024 22:15:19 +0000 (16:15 -0600)]
qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode
The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, i.e. "classic", is picked, then the test will fail.
We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.
Problem:
Current dump for "removed_ranks" and "disallowed_leaders"
doesn't have the correct format so the python test
script can parse through these values.
Solution:
Modified the values such that it is in the correct format
Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitorcc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.
mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock
Calling back into python functions whilst holding the lock can result in
this thread being queued for the GIL and resulting in extended delays
for threads waiting to acquire the lock.
Igor Fedotov [Thu, 21 Aug 2025 10:42:54 +0000 (13:42 +0300)]
test/libcephfs: use more entries to reproduce snapdiff fragmentation
issue.
Snapdiff listing fragments have different boundaries in Reef and Squid+
releases hence original reproducer (made for Reef) doesn't work properly
in S+ releases. This patch fixes that at cost of longer execution.
This might be redundant/senseless when backporting to Reef.
Related-to: https://tracker.ceph.com/issues/72518 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 23397d32607fc307359d63cd651df3c83ada3a7f)
Igor Fedotov [Tue, 12 Aug 2025 13:17:49 +0000 (16:17 +0300)]
mds: rollback the snapdiff fragment entries with the same name if needed.
This is required when more entries with the same name don't fit into the
fragment. With the existing means for fragment offset specification such a splitting to be
prohibited.
Fixes: https://tracker.ceph.com/issues/72518 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 24955e66f4826f8623d2bec1dbfc580f0e4c39ae)
Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.
Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.
The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to auth mds if it's not the auth. But if the osd is full,
the indode is fetched in the 'dispatch_client_request' before
calling the handler function of respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for the operations
like getattr, where the inode might not be in memory on the non-auth
mds and returning ESTALE is confusing and client wouldn't retry. This
is introduced by the commit 6db81d8479b539d which fixes subvolume
deletion when osd is full.
Fix:
Fetch the inode required for the FULL cap access check for the
relevant operations in osd full scenario. This makes sense because
all the operations would mostly be preceded with lookup and load
the inode in memory or they would handle ESTALE gracefully.