Jos Collin [Thu, 16 Oct 2025 05:53:07 +0000 (11:23 +0530)]
Merge PR #65817 into wip-jcollin-testing-20251016.055245-squid
* refs/pull/65817/head:
src/common: add helper to prepend "..." to trimmed paths
mds/ScrubStack: avoid generating inode path since it is unused
mds: fix few log entries
client: trim path before logging it
mds: log trimmed path wherever generating full path is necessary
mds: for logging generate only 10 final components of dentry path
mds: for logging generate only 10 final components of inode path
qa, test: run unit tests for cephfs.pyx with non-root user
test/pybind: add unit tests for rmtree() in cephfs python bindings
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Jos Collin [Thu, 16 Oct 2025 05:52:51 +0000 (11:22 +0530)]
Merge PR #65824 into wip-jcollin-testing-20251016.055245-squid
* refs/pull/65824/head:
mds: fix test that directory has no snaps
qa: test for child dir with first beyond parent snaps
qa: remove extraneous directory from test
qa: correct test description
Allow the user to control the content of the build image with a
high-level `--image-variant=` switch. Currently the supported values are
`default` (the same maximal image we have been generating) and
`packages` a slimmer image that avoids installing certain test-only
dependencies.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Oct 2025 20:23:10 +0000 (16:23 -0400)]
install-deps.sh: let FOR_MAKE_CHECK variable take precedence
Previously, the FOR_MAKE_CHECK variable could only enable installing
extra (test) dependencies when install-deps.sh was used and it was
ignored if `tty -s` exited true. This change allows FOR_MAKE_CHECK to
take precedence over the tty check and to specify one of true, 1, yes to
enable extra "for make check" deps or false, 0, no to explicitly disable
the extra deps.
Based-on-work-by: Dan Mick <dan.mick@redhat.com> Signed-off-by: John Mulligan <jmulligan@redhat.com>
Abhishek Desai [Thu, 9 Oct 2025 07:49:34 +0000 (13:19 +0530)]
mgr/dashboard : Fixed usage bar for secondary site in rbd mirroing
fixes : https://tracker.ceph.com/issues/73447 Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 60140b1ccc8006325632320e39fc209724524aef)
John Mulligan [Wed, 8 Oct 2025 20:41:36 +0000 (16:41 -0400)]
script/build-with-container: improve error handling for invalid distros
Instead of throwing a long obnoxious traceback at the user if the value
supplied to -d/--distro is invalid do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie
John Mulligan [Wed, 8 Oct 2025 14:23:25 +0000 (10:23 -0400)]
script/build-with-container: be consistent with naming in distro kinds
Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun indented) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.
John Mulligan [Thu, 28 Aug 2025 23:39:06 +0000 (19:39 -0400)]
build-with-container: ensure npm dir is set up before configure
When the npm cache path option is passed the npm cache dir is passed
to all container `run` commands, ensure the dir has been created
before the first container command (configure) is used.
John Mulligan [Sat, 15 Mar 2025 16:44:00 +0000 (12:44 -0400)]
install-deps: extract SUDO variable logic into a reusable function
While the function is pretty simple and could be copy-pasted I
prefer to extract things into functions to indicate that the
logic is used/repeated elsewhere to ward off making changes to
one copy vs the other.
Rishabh Dave [Tue, 2 Sep 2025 17:37:36 +0000 (23:07 +0530)]
client: trim path before logging it
Path can be virtually infinitely long and logging a long long path
(imagine around 2000 path components) is un-useful as well as lowers
readability of the log. Therefore, trim before logging.
Fixes: https://tracker.ceph.com/issues/72993 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdc8aae400fbbdd61df811455d49176deab1f331)
Conflicts:
src/include/filepath.cc
src/include/filepath.h
- Unlink main branch, filepath.cc is absent in this branch. Therefore,
the changes must be moved to filepath.h.
Conflicts:
src/mds/MDSAuthCaps.cc
src/test/mds/TestMDSAuthCaps.cc
- is_capable takes one less argument in Squid compared to main branch
version.
src/mds/Server.cc
src/mds/SessionMap.cc
- There is lesser including of other header files in both these files
leading to difficulty in patch application in this region.
mds: fix rank 0 marked damaged if stopping fails after Elid flush and log trimmed
steps to reproduce
../src/vstart.sh --debug --new -x --localhost --bluestore
./bin/ceph tell mds.<rank 0> config set mds_kill_shutdown_at 10
./bin/ceph fs set <fs name> down true
wait for a few seconds and will see the following log from take-over mds
and rank 0 is marked damaged
2025-09-11T16:47:24.591+0800 785dabeaa6c0 -1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
2025-09-11T16:47:24.591+0800 785dabeaa6c0 5 mds.beacon.b set_want_state: up:rejoin -> down:damaged
During shutdown_pass after submitting Elid and trimming mdlog, mds log
will now have only ELid event which does nothing at replay.
After replay, no subtree is found.
Fix this by checking whther MDLog contains only one event.
If so, skip the subtree check for rank 0, and allow it to request
STATE_STOPPED just like the other ranks.
mds/FSMap: fix join_fscid being incorrectly reset for active MDS during filesystem removal
Fix bug where active MDS daemons in remaining filesystems incorrectly
have their join_fscid cleared to FS_CLUSTER_ID_NONE when any other
filesystem is removed.
The issue was caused by variable name shadowing in erase_filesystem()
where the loop variable 'fscid' shadowed the function parameter 'fscid':
Inside loop: if (info.join_fscid == fscid) compared against the
loop variable (remaining FS ID) instead of parameter (removed FS ID)
Renamed the loop variable to 'remaining_fscid' to eliminate the shadowing
and ensure the comparison uses the correct filesystem ID.
Reproducer:
../src/vstart.sh --new -x --localhost --bluestore
FS=b
./bin/ceph osd pool create cephfs.${FS}.meta 64 64 replicated
./bin/ceph osd pool create cephfs.${FS}.data 64 64 replicated
./bin/ceph fs new ${FS} cephfs.${FS}.meta cephfs.${FS}.data
./bin/ceph config set mds.a mds_join_fs a
./bin/ceph config set mds.b mds_join_fs a
./bin/ceph fs fail ${FS}
./bin/ceph fs rm ${FS} --yes-i-really-mean-it
Then from ./bin/ceph fs dump
We can see join_fscid in all active mds filesystem 'a' are reset.
Since there are standby mds with join_fscid=1
MDSMonitor think they have better affinity and trigger switch over.
Kotresh HR [Wed, 23 Oct 2024 19:00:41 +0000 (00:30 +0530)]
client: Fix a deadlock when osd is full
Problem:
When osd is full, the client receives the notification
and cancels the ongoing writes. If the ongoing writes
are async, it could cause a dead lock as the async
callback registered also takes the 'client_lock' which
the handle_osd_map takes at the beginning.
The op_cancel_writes calls the callback registered for
the async write synchronously holding the 'client_lock'
causing the deadlock.
Earlier approach:
It was tried to solve this issue by calling 'op_cancel_writes'
without holding 'client_lock'. But this failed lock dependency
between objecter's 'rwlock' and async write's callback taking
'client_lock'. The 'client_lock' should always be taken before
taking 'rwlock'. So this approach is dropped against the current
approach.
Solution:
Use C_OnFinisher for objecter async write callback i.e., wrap
the async write's callback using the Finisher. This queues the
callback to the Finisher's context queue which the finisher
thread picks up and executes thus avoiding the deadlock.
Testing:
The fix is tested in the vstart cluster with the following reproducer.
1. Mount the cephfs volume using nfs-ganesha at /mnt
2. Run fio on /mnt on one terminal
3. On the other terminal, blocklist the nfs client session
4. The fio would hang
It is reproducing in the vstart cluster most of the times. I think
that's because it's slow. The same test written for teuthology is
not reproducing the issue. The test expects one or more writes
to be on going in rados when the client is blocklisted for the deadlock
to be hit.
Stripped down version of Traceback:
----------
0 0x00007f4d77274960 in __lll_lock_wait ()
1 0x00007f4d7727aff2 in pthread_mutex_lock@@GLIBC_2.2.5 ()
2 0x00007f4d7491b0a1 in __gthread_mutex_lock (__mutex=0x7f4d200f99b0)
3 std::mutex::lock (this=<optimized out>)
4 std::scoped_lock<std::mutex>::scoped_lock (__m=..., this=<optimized out>, this=<optimized out>, __m=...)
5 Client::C_Lock_Client_Finisher::finish (this=0x7f4ca0103550, r=-28)
6 0x00007f4d74888dfd in Context::complete (this=0x7f4ca0103550, r=<optimized out>)
7 0x00007f4d7498850c in std::__do_visit<...>(...) (__visitor=...)
8 std::visit<Objecter::Op::complete(...) (__visitor=...)
9 Objecter::Op::complete(...) (e=..., e=..., r=-28, ec=..., f=...)
10 Objecter::Op::complete (e=..., r=-28, ec=..., this=0x7f4ca022c7f0)
11 Objecter::op_cancel (this=0x7f4d200fab20, s=<optimized out>, tid=<optimized out>, r=-28)
12 0x00007f4d7498ea12 in Objecter::op_cancel_writes (this=0x7f4d200fab20, r=-28, pool=103)
13 0x00007f4d748e1c8e in Client::_handle_full_flag (this=0x7f4d200f9830, pool=103)
14 0x00007f4d748ed20c in Client::handle_osd_map (m=..., this=0x7f4d200f9830)
15 Client::ms_dispatch2 (this=0x7f4d200f9830, m=...)
16 0x00007f4d75b8add2 in Messenger::ms_deliver_dispatch (m=..., this=0x7f4d200ed3e0)
17 DispatchQueue::entry (this=0x7f4d200ed6f0)
18 0x00007f4d75c27fa1 in DispatchQueue::DispatchThread::entry (this=<optimized out>)
19 0x00007f4d77277c02 in start_thread ()
20 0x00007f4d772fcc40 in clone3 ()
--------
Rishabh Dave [Thu, 21 Aug 2025 11:51:48 +0000 (17:21 +0530)]
mds: for logging generate only 10 final components of dentry path
Generating full absolute path for dentries for printing in MDS logs
slows the down the FS to a great extent especially when the path is very
long (imagine a path with 2000 components). Printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of MDS logs.
Therefore, generate only 10 final components of the dentry paths for logging.
Fixes: https://tracker.ceph.com/issues/72779 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1430cd67d8f7bd7d98b241a7511fa3ceb7e5ba2e)
Conflicts:
src/include/filepath.cc
- Unlike main branch, this file is absent in squid
Rishabh Dave [Sun, 17 Aug 2025 18:13:40 +0000 (23:43 +0530)]
mds: for logging generate only 10 final components of inode path
Generating full absolute path for inodes for printing in MDS logs slows
down the FS to a great extent especially when the path is very long
(imagine a path with 2000 components). Also printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of the MDS logs.
Therefore, generate only 10 final components of inode paths for logging.
Fixes: https://tracker.ceph.com/issues/72779 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1518690210f3a4473978c7a9274e902fccaad862)
Conflicts:
src/mds/CDir.cc
- Certain code region where trimmed inode path was generated was
modified to generated inode path but that code region is absent on
this branch.
Rishabh Dave [Fri, 25 Jul 2025 08:20:06 +0000 (13:50 +0530)]
qa, test: run unit tests for cephfs.pyx with non-root user
Run test_python.sh with non-root user. This makes it necessary to change
the owner user and group of file system root to be same as this non-root
user. This brings testing closer to the real-world scenario and also
allows exercising negative tests where an FS op would fail for a non-root
user but it would pass for root user.
There are few tests that exercise FS operations where root user is
needed. Group these tests under a separate class and add extra code for
this class that allows these tests to run with root UID and GID.
Rishabh Dave [Fri, 13 Jun 2025 07:13:51 +0000 (12:43 +0530)]
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Method purge() in trash.py calls rmtree() which is recursive method. To
avoid Python's recurision limit, switch to non-recursive approach.
Path to directory along directory handle are clubbed in to a tuple and
that tuple is stored on the stack. Storing directory handle reduces call
to opendir() dramatically.
Fixes: https://tracker.ceph.com/issues/71648 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f9046ca052d10a884a59c1d928cb0c8f0235696b)
Rishabh Dave [Wed, 2 Apr 2025 15:25:32 +0000 (20:55 +0530)]
mgr/vol: add command to get snapshot path
Fixes: https://tracker.ceph.com/issues/70815 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 50d28992d99fcd67390815aa42f9da8ffaa82575)
Conflicts:
src/pybind/mgr/volumes/fs/volume.py
- Line where the original patch makes the change is slightly different
in main compared to Squid branch, leading to conflict.
1. Fixes the promql expr used to calculate "In" OSDs in
ceph-cluster-advanced.json.
2. Fixes the color coding for the single state panels used in the OSDs
grafana panel like "In", "Out" etc
according to `dpkg-buildflags`, ubuntu 24 raised this value to
`-D_FORTIFY_SOURCE=3` which causes `error: "_FORTIFY_SOURCE" redefined`
compilation failures because Ceph itself adds `-D_FORTIFY_SOURCE=2`
`_FORTIFY_SOURCE` is a hardening option. both our rpm and debian builds
already specify that via environment variables, so Ceph's cmake should
leave it alone
Anoop C S [Mon, 23 Sep 2024 07:06:55 +0000 (12:36 +0530)]
client: Gracefully handle empty pathname for statxat()
man statx(2)[1] says the following:
. . .
AT_EMPTY_PATH
If pathname is an empty string, operate on the file referred to by
dirfd (which may have been obtained using the open(2) O_PATH flag).
In this case, dirfd can refer to any type of file, not just a
directory.
If dirfd is AT_FDCWD, the call operates on the current working
directory.
. . .
Look out for an empty pathname and use the relative fd's inode in the
presence of AT_EMPTY_PATH flag before calling internal _getattr().
Fixes: https://tracker.ceph.com/issues/68189
Review with: git show -w
Anoop C S [Thu, 17 Oct 2024 16:15:17 +0000 (21:45 +0530)]
libcephfs.h: Fix API documentation for ceph_statxat
flags parameter for ceph_statxat() API is supposed to accept only
AT_STATX_DONT_SYNC and AT_SYMLINK_NOFOLLOW. Modify the corresponding
documentation to reflect the acceptance of above two flags.
Anoop C S [Fri, 20 Sep 2024 08:49:01 +0000 (14:19 +0530)]
client: Gracefully handle empty pathname for chownat()
man fchownat(2)[1] says the following:
. . .
AT_EMPTY_PATH (since Linux 2.6.39)
If pathname is an empty string, operate on the file referred to by
dirfd (which may have been obtained using the open(2) O_PATH flag).
In this case, dirfd can refer to any type of file, not just a
directory. If dirfd is AT_FDCWD, the call operates on the current
working directory.
. . .
Look out for an empty pathname and use the relative fd's inode in the
presence of AT_EMPTY_PATH flag before calling internal _setattr().
Fixes: https://tracker.ceph.com/issues/68189
Review with: git show -w
test/rbd-mirror: eliminate a race in ResyncRequestedRemoteNotPrimary
Adjust the wait_for_notification call in TestMockImageReplayerSnapshotReplayer.ResyncRequestedRemoteNotPrimary
to expect 2 notifications instead of 1. This allows the test to correctly wait for both expected events
i.e for finish_sync() and handle_replay_complete(locker, -EREMOTEIO, "remote image demoted"), ensuring the
replayer transitions to STATE_COMPLETE and is_replaying() returns false as intended.
Fixes: https://tracker.ceph.com/issues/72325 Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit b5a013f6170bb4445da8f5469243e4869b760a81)
VinayBhaskar-V [Tue, 13 May 2025 20:25:44 +0000 (01:55 +0530)]
rbd-mirror: prevent image deletion if remote image is not primary
A resync on a mirrored image may incorrectly results in the local
image being deleted even when the remote image is no longer primary.
This issue can occur under the following conditions:
* if resync is requested on the secondary before the remote image has
been fully demoted
* if the demotion of the primary image is not mirrored
due to the rbd-mirror daemon being offline.
This can be fixed by ensuring that image deletion during a resync is
only allowed when the remote image is confirmed to be primary.
This commit fixes the issue only for snapshot based mirroring mode
Fixes: https://tracker.ceph.com/issues/70948 Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit e14afbc95a5fb8f5a33e7ea23a035992b966d671)
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package and a github issues describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.