Jos Collin [Thu, 9 Oct 2025 08:25:05 +0000 (13:55 +0530)]
Merge PR #65808 into wip-jcollin-testing-20251009.082449-tentacle
* refs/pull/65808/head:
doc: added a note for damaged hard links in scrub documentation
qa: add a test to verify that a damaged hard link is detected during scrub
mds: identify damaged hard links during scrub
Jos Collin [Thu, 9 Oct 2025 08:25:00 +0000 (13:55 +0530)]
Merge PR #65812 into wip-jcollin-testing-20251009.082449-tentacle
* refs/pull/65812/head:
src/common: add helper to prepend "..." to trimmed paths
mds/ScrubStack: avoid generating inode path since it is unused
mds: fix a few log entries
client: trim path before logging it
mds: log trimmed path wherever generating full path is necessary
mds: for logging generate only 10 final components of dentry path
mds: for logging generate only 10 final components of inode path
qa, test: run unit tests for cephfs.pyx with non-root user
test/pybind: add unit tests for rmtree() in cephfs python bindings
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
John Mulligan [Wed, 8 Oct 2025 20:41:36 +0000 (16:41 -0400)]
script/build-with-container: improve error handling for invalid distros
Instead of throwing a long, obnoxious traceback at the user when the value
supplied to -d/--distro is invalid, do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie
```
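One common way to get this kind of short usage error from argparse, rather than a traceback, is a `type=` callable that raises `argparse.ArgumentTypeError`. The sketch below assumes that approach; `distro_type`, `build_parser` and the shortened `KNOWN_DISTROS` list are hypothetical names, not the script's actual code.

```python
import argparse

# Hypothetical, shortened list of accepted names (the real script derives
# its list from its DistroKind enum and aliases).
KNOWN_DISTROS = ["centos9", "rocky10", "ubuntu24.04", "noble", "debian13", "trixie"]


def distro_type(value: str) -> str:
    """argparse 'type' callable that rejects unknown distro names."""
    if value not in KNOWN_DISTROS:
        # argparse catches ArgumentTypeError and reports it as a one-line
        # "argument --distro/-d: ..." usage error (exit status 2).
        raise argparse.ArgumentTypeError(
            f"unknown distro: {value!r} not in {', '.join(KNOWN_DISTROS)}"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-with-container.py")
    parser.add_argument("--distro", "-d", type=distro_type, default="centos9")
    return parser
```

With this sketch, `build_parser().parse_args(["-d", "trixy"])` prints the usage line plus the error message to stderr and exits with status 2.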
John Mulligan [Wed, 8 Oct 2025 14:23:25 +0000 (10:23 -0400)]
script/build-with-container: be consistent with naming in distro kinds
Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun intended) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm", are aliases. This matches the previously existing code.
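A minimal sketch of the naming scheme described above, assuming an `enum.Enum` whose values are the canonical "<name><version>" strings and a separate alias table for codenames. The member names, `aliases()` and `from_name()` are illustrative only and not the script's real DistroKind API.

```python
import enum


class DistroKind(enum.Enum):
    # Hypothetical subset: each value is the canonical "<name><version>" form.
    UBUNTU22 = "ubuntu22.04"
    UBUNTU24 = "ubuntu24.04"
    DEBIAN12 = "debian12"

    @classmethod
    def aliases(cls) -> dict:
        # Codenames and alternate spellings map onto canonical members.
        return {
            "ubuntu-jammy": cls.UBUNTU22,
            "jammy": cls.UBUNTU22,
            "ubuntu-noble": cls.UBUNTU24,
            "noble": cls.UBUNTU24,
            "debian-bookworm": cls.DEBIAN12,
            "bookworm": cls.DEBIAN12,
        }

    @classmethod
    def from_name(cls, name: str) -> "DistroKind":
        try:
            return cls(name)  # canonical form, e.g. "ubuntu22.04"
        except ValueError:
            return cls.aliases()[name]  # raises KeyError for unknown names
```

Keeping aliases out of the enum values means each distro has exactly one canonical spelling, while lookups still accept the familiar codenames.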
John Mulligan [Thu, 28 Aug 2025 23:39:06 +0000 (19:39 -0400)]
build-with-container: ensure npm dir is set up before configure
When the npm cache path option is passed, the npm cache dir is passed
to all container `run` commands. Ensure the dir has been created
before the first container command (configure) is run.
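A sketch of the ordering this commit enforces: create the host-side cache directory before building the volume arguments for the first `run`. The helper name and the in-container mount point `/npm-cache` are hypothetical. Depending on the container runtime, a missing host path in a bind mount may be rejected outright or created with unexpected ownership, which is why creating it up front helps.

```python
import os
from typing import List, Optional


def npm_cache_volume_args(npm_cache_dir: Optional[str]) -> List[str]:
    """Return volume args for a container `run`, creating the dir first.

    Hypothetical helper; the container-side path /npm-cache is an assumption.
    """
    if not npm_cache_dir:
        return []
    # Create the host directory (and any parents) before the first container
    # command; exist_ok makes repeated calls for later commands harmless.
    os.makedirs(npm_cache_dir, exist_ok=True)
    return ["-v", f"{os.path.abspath(npm_cache_dir)}:/npm-cache"]
```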
Rishabh Dave [Tue, 2 Sep 2025 17:37:36 +0000 (23:07 +0530)]
client: trim path before logging it
A path can be virtually infinitely long, and logging a very long path
(imagine around 2000 path components) is not useful and also lowers the
readability of the log. Therefore, trim the path before logging it.
Fixes: https://tracker.ceph.com/issues/72993
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit bdc8aae400fbbdd61df811455d49176deab1f331)
Conflicts:
src/include/filepath.cc
- filepath.cc is absent in reef branches; patches for it are manually
applied to filepath.h instead.
src/include/filepath.h
- This file was modified in one of the previous commits to have the
definition of set_trimmed() instead of its declaration, since filepath.cc
is absent in this branch.
Conflicts:
src/mds/MDSAuthCaps.cc
src/mds/Server.cc
src/test/mds/TestMDSAuthCaps.cc
- All three files differed from their main branch versions, which
led to these conflicts.
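The client-side trimming above (and the src/common helper that prepends "..." to trimmed paths) can be pictured as keeping only the last few components of an already-built path string. This is a language-neutral sketch in Python; `trimmed_path` and its default of 10 components are illustrative assumptions, not the signature of the Ceph helper.

```python
def trimmed_path(path: str, max_components: int = 10) -> str:
    """Keep only the final `max_components` components of `path`.

    Hypothetical helper: paths that are already short are returned unchanged;
    longer ones are cut and prefixed with "..." to mark the trimming.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) <= max_components:
        return path
    return ".../" + "/".join(parts[-max_components:])
```

For a 2000-component path this returns a string of 10 components, so the log line length no longer grows with directory depth.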
Rishabh Dave [Thu, 21 Aug 2025 11:51:48 +0000 (17:21 +0530)]
mds: for logging generate only 10 final components of dentry path
Generating the full absolute path of dentries for printing in MDS logs
slows down the FS to a great extent, especially when the path is very
long (imagine a path with 2000 components). Printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of MDS logs.
Therefore, generate only 10 final components of the dentry paths for logging.
Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1430cd67d8f7bd7d98b241a7511fa3ceb7e5ba2e)
Conflicts:
src/include/filepath.cc
- this file is absent in tentacle so changes need to be moved to
filepath.h.
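Unlike trimming a path string after the fact, the dentry and inode changes avoid generating the full path at all. A sketch of that idea, assuming a simplified node with a name and a parent link (the `Dentry` class and `log_path` are hypothetical stand-ins for the MDS structures): walk parent links and stop after 10 names, so the cost is bounded by the limit rather than by the path depth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dentry:
    # Hypothetical stand-in for an MDS dentry: a name plus a parent link.
    # The root is represented by parent=None rather than a named node.
    name: str
    parent: Optional["Dentry"] = None


def log_path(dn: Dentry, max_components: int = 10) -> str:
    """Collect at most `max_components` trailing names by walking parents."""
    names = []
    cur: Optional[Dentry] = dn
    while cur is not None and len(names) < max_components:
        names.append(cur.name)
        cur = cur.parent
    names.reverse()
    # If the walk stopped before reaching the root, mark the path as trimmed.
    prefix = "..." if cur is not None else ""
    return prefix + "/" + "/".join(names)
```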
Rishabh Dave [Sun, 17 Aug 2025 18:13:40 +0000 (23:43 +0530)]
mds: for logging generate only 10 final components of inode path
Generating the full absolute path of inodes for printing in MDS logs slows
down the FS to a great extent, especially when the path is very long
(imagine a path with 2000 components). Also, printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of the MDS logs.
Therefore, generate only 10 final components of inode paths for logging.
Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 1518690210f3a4473978c7a9274e902fccaad862)
Conflicts:
src/mds/CDir.cc
- the "else if" clause where changes were made in main's version is absent
in tentacle.
Rishabh Dave [Fri, 25 Jul 2025 08:20:06 +0000 (13:50 +0530)]
qa, test: run unit tests for cephfs.pyx with non-root user
Run test_python.sh as a non-root user. This makes it necessary to change
the owner user and group of the file system root to be the same as this
non-root user. This brings testing closer to real-world scenarios and also
allows exercising negative tests, where an FS op fails for a non-root
user but would pass for the root user.
There are a few tests that exercise FS operations for which the root user
is needed. Group these tests under a separate class and add extra code for
this class that allows these tests to run with root UID and GID.
Rishabh Dave [Fri, 13 Jun 2025 07:13:51 +0000 (12:43 +0530)]
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Method purge() in trash.py calls rmtree(), which is a recursive method. To
avoid hitting Python's recursion limit, switch to a non-recursive approach.
The path to a directory and its directory handle are combined into a tuple,
and that tuple is stored on the stack. Storing the directory handle
dramatically reduces the number of calls to opendir().
Fixes: https://tracker.ceph.com/issues/71648
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f9046ca052d10a884a59c1d928cb0c8f0235696b)
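The stack-of-(path, handle) technique described for purge() can be sketched against the local file system, using `os.scandir()` as a stand-in for the libcephfs opendir()/readdir() calls the real code uses. `rmtree_iterative` is a hypothetical name; this is an illustration of the approach, not the cephfs binding's implementation.

```python
import os


def rmtree_iterative(top: str) -> None:
    """Remove `top` and everything below it without recursion.

    Each stack entry pairs a directory path with its open iterator, so a
    directory is opened once even though we come back to it after each
    subdirectory is finished.
    """
    stack = [(top, os.scandir(top))]
    while stack:
        path, it = stack[-1]
        entry = next(it, None)
        if entry is None:
            # Directory exhausted: close its handle and remove it (now empty).
            it.close()
            stack.pop()
            os.rmdir(path)
        elif entry.is_dir(follow_symlinks=False):
            # Descend: push the child with its own handle; the parent's
            # iterator stays open and resumes where it left off.
            stack.append((entry.path, os.scandir(entry.path)))
        else:
            # Regular files and symlinks (including links to dirs) are unlinked.
            os.unlink(entry.path)
```

Depth is now limited by the number of handles the process may keep open (one per level) instead of by Python's recursion limit.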
For a remote link, try to open the dentry (this part of the code
is copied from the mds path_traversal), which internally pushes
the dir_frag onto the damage list if applicable.
John Mulligan [Thu, 13 Feb 2025 20:59:42 +0000 (15:59 -0500)]
ceph.spec.in: use rpm macro for python shebang pathfix
To support EL 10 distros, update the source of the pathfix tool (on EL
9+ distros) and use the macro for updating python shebangs, which has
been available since at least EL 9.
Casey Bodley [Tue, 19 Aug 2025 13:44:52 +0000 (09:44 -0400)]
rpm: require gcc >= 13.3 regardless of gts_version
when gts_version is not set, bump the required version of gcc-c++ to >= 13.3.
move this into a `%if 0%{?gts_version} == 0` block to prevent it from
applying to builds using gts, because the distro probably doesn't
provide a recent enough gcc-c++
John Mulligan [Fri, 27 Jun 2025 15:08:39 +0000 (11:08 -0400)]
ceph.spec.in: conditionalize crimson gts version on el10
EL10 distros come with GCC 14. When crimson was enabled, the spec always
tried to set gts_version to 13 (the gcc-toolset version). Make the use of
the gts version conditional on EL versions lower than 10.
Casey Bodley [Sat, 7 Jun 2025 01:43:33 +0000 (21:43 -0400)]
valgrind: wildcard glibc version for dlopen() leak suppression
the original suppression for "dlopen@@GLIBC_2.2.5" is very similar to
several later suppressions for "dlopen@@GLIBC_2.34". add a wildcard to
the original suppression so the rest can be removed
this also helps suppress a new leak, seen with gcc-13:
{
<insert_a_suppression_name_here>
Memcheck:Leak
match-leak-kinds: reachable
fun:malloc
fun:UnknownInlinedFun
fun:decompose_rpath
fun:_dl_map_object
fun:dl_open_worker_begin
fun:_dl_catch_exception
fun:dl_open_worker
fun:_dl_catch_exception
fun:_dl_open
fun:dlopen_doit
fun:_dl_catch_exception
fun:_dl_catch_error
fun:_dlerror_run
fun:dlopen@@GLIBC_2.34
fun:_sub_I_65535_0.0
fun:call_init
fun:call_init
fun:_dl_init
obj:/usr/lib64/ld-linux-x86-64.so.2
obj:*
obj:*
obj:*
obj:*
obj:*
obj:*
obj:*
}
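Valgrind suppression `fun:` lines accept `*` and `?` wildcards, which is what lets a single pattern cover both symbol versions mentioned above. The sketch uses Python's `fnmatchcase` purely as a stand-in for Valgrind's own matcher, and the exact pattern `dlopen@@GLIBC_2.*` is an assumption about what the updated suppression looks like.

```python
from fnmatch import fnmatchcase

# Assumed wildcard form of the suppression's "fun:" line.
PATTERN = "dlopen@@GLIBC_2.*"

SYMBOLS = ["dlopen@@GLIBC_2.2.5", "dlopen@@GLIBC_2.34", "dlopen"]

# Keep only the symbols that the single wildcard pattern would cover.
matched = [s for s in SYMBOLS if fnmatchcase(s, PATTERN)]
print(matched)  # ['dlopen@@GLIBC_2.2.5', 'dlopen@@GLIBC_2.34']
```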
Casey Bodley [Sat, 7 Jun 2025 01:27:20 +0000 (21:27 -0400)]
valgrind: update rocksdb ObjectLibrary leak suppression for gcc-13
the suppression for gcc-13 only differs on two lines, so add wildcards
to match either. the diff between the current suppression and the new
one follows:
rpm: reenable lto for gcc-toolset-13 by requiring 13.3
referenced gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
shows that it was resolved for 13.3. reenable lto for gcc-toolset-13 by
requiring 13.3 or later
mgr/dashboard: Local storage class creation via dashboard doesn't handle creation of pool.
Fixes: https://tracker.ceph.com/issues/72569
Signed-off-by: Dnyaneshwari <dtalweka@redhat.com>
mgr/dashboard: handle creation of new pool
Commit includes:
1) Provide link to create a new pool
2) Refactored validation on ACL mapping; removed the required validator as the default
3) Fixed a runtime error on the console caused by the ACL length, which prevented the details section from opening
4) Used rxjs operators to make the API calls and mark the form ready only once all data is available, fixing the form patch issues
5) Refactored some parts of the code to improve performance
6) Added zone and pool information in details section for local storage class
Fixes: https://tracker.ceph.com/issues/72569
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 2d0e71c845643a26d4425ddac8ee0ff30153eff2)
cephadm: support custom distros by falling back to ID_LIKE
This change enables cephadm to work on custom or derivative distributions
that are based on supported distros without requiring code changes for
each new custom/derivative distro.
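A minimal sketch of an ID_LIKE fallback, assuming input in os-release(5) format, where ID_LIKE is a space-separated list ordered from the closest relative. `SUPPORTED`, `parse_os_release` and `resolve_distro` are hypothetical names, not cephadm's functions, and the parser ignores the shell-style escaping that os-release technically allows.

```python
from typing import Dict, Optional

# Hypothetical set of distro IDs the tool already knows how to handle.
SUPPORTED = {"centos", "rhel", "fedora", "ubuntu", "debian"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, stripping optional surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def resolve_distro(text: str) -> Optional[str]:
    info = parse_os_release(text)
    distro_id = info.get("ID", "").lower()
    if distro_id in SUPPORTED:
        return distro_id
    # Fall back to ID_LIKE, trying the closest relative first.
    for like in info.get("ID_LIKE", "").lower().split():
        if like in SUPPORTED:
            return like
    return None
```

A derivative distro declaring `ID=mycustom` and `ID_LIKE="rhel centos fedora"` would then be handled as `rhel` without a code change.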
Update `rate()` queries to be more accurate. The use of `irate()` leads
to misleading graphs because it only looks at the last 2 samples over
the selected time range step interval. Also use `$__rate_interval`
consistently in order to scale over short and long time ranges.
* Replace `irate()` with `rate()` to avoid sample bias.
* Use `$__rate_interval` consistently.
* Update auto_count/min to provide higher detail graphs.
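The sample bias of `irate()` can be shown with a simplified calculation over one window of counter samples. This sketch ignores counter resets and the extrapolation to window boundaries that PromQL's `rate()` performs; `simple_rate` and `simple_irate` are illustrative names only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase across the whole window (rate()-like)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase from only the last two samples (irate()-like)."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# A 60s window scraped every 15s, with a burst between t=15 and t=30.
window = [(0, 0), (15, 0), (30, 300), (45, 300), (60, 300)]
print(simple_rate(window))   # 5.0
print(simple_irate(window))  # 0.0
```

Here the burst disappears entirely from the `irate()`-style value, while the `rate()`-style value reflects it as an average over the whole window.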
1. Fixes the PromQL expression used to calculate "In" OSDs in
ceph-cluster-advanced.json.
2. Fixes the color coding for the single-stat panels used in the OSDs
Grafana panel, such as "In", "Out", etc.
Nizamudeen A [Tue, 16 Sep 2025 07:02:45 +0000 (12:32 +0530)]
mgr/dashboard: fix total capacity value in dashboard
Regression from a different commit
https://github.com/ceph/ceph/commit/2609d4f62e9e3906cf3e3fcc042bfdf0bcc633bf#diff-caee5ab662130fe721d15eca7a6e2dc79b671df025bde3bfd78c3c3ca4c578d1R249
In file included from /home/pdonnell/ceph/src/mds/FSMap.h:31,
from /home/pdonnell/ceph/src/mon/PaxosFSMap.h:20,
from /home/pdonnell/ceph/src/mon/MDSMonitor.h:26,
from /home/pdonnell/ceph/src/mon/FSCommands.cc:17:
/home/pdonnell/ceph/src/mds/MDSMap.h: In member function ‘int FileSystemCommandHandler::set_val(Monitor*, FSMap&, MonOpRequestRef, const cmdmap_t&, std::ostream&, FileSystemCommandHandler::fs_or_fscid, std::string, std::string)’:
/home/pdonnell/ceph/src/mds/MDSMap.h:223:40: warning: ‘fsp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
223 | bool test_flag(int f) const { return flags & f; }
| ^~~~~
/home/pdonnell/ceph/src/mon/FSCommands.cc:417:21: note: ‘fsp’ was declared here
417 | const Filesystem* fsp;
| ^~~
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With Python 3.10 (it didn't seem to happen with Python 3.12), the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user:
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add