Jos Collin [Thu, 16 Oct 2025 11:14:39 +0000 (16:44 +0530)]
Merge PR #65826 into wip-jcollin-testing-20251016.111424-tentacle
* refs/pull/65826/head:
pybind/cephfs: fix including of platform_errno.h
pybind: convert ceph errno to host-based errno
src/include: move ceph_to_hostos_errno() to separate header file
qa: set -x for qa/workunits/libcephfs/test.sh
Jos Collin [Thu, 16 Oct 2025 11:14:33 +0000 (16:44 +0530)]
Merge PR #65913 into wip-jcollin-testing-20251016.111424-tentacle
* refs/pull/65913/head:
test/libcephfs: add test for fsync on a write delegated inode
client: adjust `Fb` cap ref count check during synchronous fsync()
client: crash caused by invalid iterator in _readdir_cache_cb
The capacity of `readdir_cache` may change after `client_lock` is unlocked while iterating over `readdir_cache`,
which can invalidate the iterator; using the invalid iterator in the next iteration then causes a crash.
The crash may happen at `Dentry *dn = *pd` (pd points to invalid memory),
or at `if (pd >= dir->readdir_cache.end() || *pd != dn)` (pd is smaller than begin() if idx is negative).
Use an index instead of an iterator to solve this problem.
Allow the user to control the content of the build image with a
high-level `--image-variant=` switch. Currently the supported values are
`default` (the same maximal image we have been generating) and
`packages`, a slimmer image that avoids installing certain test-only
dependencies.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Oct 2025 20:23:10 +0000 (16:23 -0400)]
install-deps.sh: let FOR_MAKE_CHECK variable take precedence
Previously, the FOR_MAKE_CHECK variable could only enable installing
extra (test) dependencies when install-deps.sh was used and it was
ignored if `tty -s` exited true. This change allows FOR_MAKE_CHECK to
take precedence over the tty check and to specify one of true, 1, yes to
enable extra "for make check" deps or false, 0, no to explicitly disable
the extra deps.
Based-on-work-by: Dan Mick <dan.mick@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
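The precedence rule described in this commit can be sketched as follows. This is a minimal Python illustration, not the shell code in install-deps.sh: the function name `want_make_check_deps` and the `tty_fallback` parameter are hypothetical, and falling back to the tty-based decision when the variable is unset or unrecognized is an assumption.

```python
from typing import Optional

# Values the commit message lists for enabling / disabling the extra deps.
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def want_make_check_deps(for_make_check: Optional[str], tty_fallback: bool) -> bool:
    """Decide whether to install the extra "for make check" deps.

    for_make_check: value of the FOR_MAKE_CHECK variable, or None if unset.
    tty_fallback:   hypothetical stand-in for whatever the tty check would
                    decide when the variable gives no explicit answer.
    """
    if for_make_check in TRUE_VALUES:
        return True
    if for_make_check in FALSE_VALUES:
        return False
    # Unset or unrecognized value: defer to the tty-based decision (assumption).
    return tty_fallback
```

With this shape, an explicit `FOR_MAKE_CHECK=no` wins even when the tty-based decision would have enabled the extra deps.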
Venky Shankar [Mon, 2 Jun 2025 05:08:01 +0000 (05:08 +0000)]
test/libcephfs: validate asynchronous write and fsync executing concurrently
This synthetic reproducer does three things:
- sets up a client mount with a configuration to delay write operations and
initiates a write operation via a thread
- starts a thread that invokes asynchronous fsync
- starts a thread that invokes setxattr for the client to track early replies
Without the fix[0], the test reproduces the following crash:
Venky Shankar [Tue, 3 Jun 2025 10:04:44 +0000 (10:04 +0000)]
client: catch buggy reference count drop for MetaRequest
The prior commit introduces a synthetic delay in the write
operation so that a test reproducer can interleave
asynchronous fsync with an operation that makes the MDS send an early
reply to the client (therefore, having the client track the early
replied response for an inode in Inode::unsafe_ops). This is
enough to trick the client into the code path that causes a buggy
reference drop for the request (MetaRequest), but hitting the
_exact_ crash backtrace requires the request to be in various
[x]list's.
This last bit is tricky to synthetically massage in the test. So,
in order to catch the buggy reference drop, it would suffice to
assert on the reference count dropping to less than zero (0).
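The idea of asserting on the count itself can be sketched with a toy reference-counted object. This is a Python illustration only: the class `RefCounted` and its `get()`/`put()` methods are hypothetical and are not the client's MetaRequest code.

```python
class RefCounted:
    """Toy reference-counted object (hypothetical, not Ceph's MetaRequest)."""

    def __init__(self) -> None:
        self.nref = 1  # the creator holds the first reference

    def get(self) -> "RefCounted":
        """Take an additional reference."""
        self.nref += 1
        return self

    def put(self) -> None:
        """Drop one reference and fail loudly on a surplus drop."""
        self.nref -= 1
        # Catch the extra put() right where it happens instead of relying
        # on a crash that only shows up later under specific conditions.
        assert self.nref >= 0, "reference count dropped below zero"
```

The assertion fires at the buggy drop itself, so a test does not need to reproduce the exact list state behind the original backtrace.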
Abhishek Desai [Thu, 9 Oct 2025 07:49:34 +0000 (13:19 +0530)]
mgr/dashboard: Fixed usage bar for secondary site in rbd mirroring
Fixes: https://tracker.ceph.com/issues/73447
Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 60140b1ccc8006325632320e39fc209724524aef)
This commit refactors setup_metadata_devices into smaller helper methods.
It keeps the distinction between existing logical volumes and raw devices
explicit, centralizes tag handling and path assignment to make the
control flow obvious and separates responsibilities for checking, creating,
and tagging devices.
ceph-volume: use udev data instead of LVM subprocess in get_devices()
Replace the check using `lvm.get_device_lvs(diskname)`, which
spawned a `pvs` subprocess, with a direct check on `/run/udev/data`
via `UdevData(diskname).is_lvm`.
This avoids spawning subprocesses while scanning devices. It improves
performance on systems with many disks, and keeps the device filtering
logic intact.
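Reading a udev database record instead of spawning a subprocess can be sketched like this. It is a hedged illustration, not ceph-volume's `UdevData` class: the helper names are hypothetical, and treating a `DM_UUID` that starts with `LVM-` as the LVM marker is an assumption about what such a check might look at.

```python
from pathlib import Path
from typing import Dict


def parse_udev_properties(text: str) -> Dict[str, str]:
    """Collect the `E:KEY=VALUE` property lines of a udev database record."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("E:") and "=" in line:
            key, _, value = line[2:].partition("=")
            props[key] = value
    return props


def looks_like_lvm(props: Dict[str, str]) -> bool:
    """Assumed heuristic: LVM-created device-mapper nodes carry an `LVM-` DM_UUID."""
    return props.get("DM_UUID", "").startswith("LVM-")


def is_lvm_device(record: Path) -> bool:
    """Check one record file (e.g. under /run/udev/data) with a plain file read."""
    try:
        return looks_like_lvm(parse_udev_properties(record.read_text()))
    except OSError:
        # Missing or unreadable record: treat the device as not LVM.
        return False
```

Each device costs one small file read here, which is where the saving over a `pvs` subprocess per disk would come from.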
client: adjust `Fb` cap ref count check during synchronous fsync()
The cephfs client holds a ref on Fb caps when handing out a write delegation[0].
As a result, fsync from a (Ganesha) client holding a write delegation will block indefinitely[1]
waiting for the cap ref for Fb to drop to 0, which will never happen until the
delegation is returned/recalled.
If an inode has been write delegated, adjust the cap reference count
check in fsync() accordingly.
Note: This only works for synchronous fsync() since `client_lock` is
held for the entire duration of the call (at least till the path leading
up to the reference count check). Asynchronous fsync() needs to be fixed
separately (as that can drop `client_lock`).
Naman Munet [Mon, 29 Sep 2025 04:51:06 +0000 (10:21 +0530)]
mgr/dashboard: Rename side-nav panel items
Fixes: https://tracker.ceph.com/issues/73252
Commit includes changes:
1) Renaming Topic to Notification destination
2) Renaming Tiering to Storage class
3) Renaming Users to User Management
4) fix storage class table refresh after delete
5) Also made changes to internal routing for topic and storage class
John Mulligan [Wed, 8 Oct 2025 20:41:36 +0000 (16:41 -0400)]
script/build-with-container: improve error handling for invalid distros
Instead of throwing a long obnoxious traceback at the user if the value
supplied to -d/--distro is invalid, do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie
```
John Mulligan [Wed, 8 Oct 2025 14:23:25 +0000 (10:23 -0400)]
script/build-with-container: be consistent with naming in distro kinds
Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun intended) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.
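The naming scheme above, combined with a friendlier error for an unknown `-d/--distro` value, can be sketched as follows. This is a reduced illustration under assumptions, not the code in build-with-container.py: only three distros are listed, and the alias table, `from_alias()`, `distro_arg()` and `build_parser()` are hypothetical names.

```python
import argparse
import enum


class DistroKind(str, enum.Enum):
    # Canonical values follow "<name><version>"; only a few are shown.
    UBUNTU2204 = "ubuntu22.04"
    DEBIAN12 = "debian12"
    CENTOS9 = "centos9"

    @classmethod
    def aliases(cls):
        """Map every accepted spelling to its canonical member (hypothetical table)."""
        table = {member.value: member for member in cls}
        table.update(
            {
                "ubuntu-jammy": cls.UBUNTU2204,
                "jammy": cls.UBUNTU2204,
                "debian-bookworm": cls.DEBIAN12,
                "bookworm": cls.DEBIAN12,
                "centos9stream": cls.CENTOS9,
            }
        )
        return table

    @classmethod
    def from_alias(cls, value):
        return cls.aliases()[value]


def distro_arg(value: str) -> DistroKind:
    """argparse `type=` callable: turn a lookup failure into a usage error."""
    try:
        return DistroKind.from_alias(value)
    except KeyError:
        known = ", ".join(DistroKind.aliases())
        raise argparse.ArgumentTypeError(
            f"unknown distro: {value!r} not in {known}"
        ) from None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-with-container.py")
    parser.add_argument("--distro", "-d", type=distro_arg)
    return parser
```

Because argparse reports an `ArgumentTypeError` raised by a `type=` callable as a normal usage error, an invalid value ends in a short message and exit status 2 rather than a traceback.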
John Mulligan [Thu, 28 Aug 2025 23:39:06 +0000 (19:39 -0400)]
build-with-container: ensure npm dir is set up before configure
When the npm cache path option is passed, the npm cache dir is passed
to all container `run` commands. Ensure the dir has been created
before the first container command (configure) is used.
Nizamudeen A [Thu, 11 Sep 2025 05:29:47 +0000 (10:59 +0530)]
mgr/dashboard: improve search and pagination behavior
Add a throttle to the pagination cycle so that if you repeatedly try to
cycle through the pages, it increases the delay. Doing this because,
unlike search, the button click to change page is deliberate and the
first click on the button should respond immediately.
Another issue is that a keyword search stores every keystroke typed
in the search field and then, after the debounce interval, sends
all those requests one by one.
For example, if I type 222 it waits 1s for the
debounce timer and then sends a request to find the osd with id 2 first, then
again 2 and then again 2. Instead it should only send 222 at the end.
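The intended search behavior can be sketched as a trailing-edge debounce. This is a small Python simulation over timestamped events, not the dashboard's TypeScript/rxjs code: the function name `debounce` and the millisecond timestamps are assumptions made for the illustration.

```python
from typing import List, Tuple


def debounce(events: List[Tuple[int, str]], interval_ms: int) -> List[Tuple[int, str]]:
    """Simulate a trailing-edge debounce over (timestamp_ms, value) events.

    `events` must be sorted by timestamp. A value is emitted only once
    `interval_ms` passes with no newer event, so a burst of keystrokes
    collapses into the final value of the search field.
    """
    emitted: List[Tuple[int, str]] = []
    for i, (ts, value) in enumerate(events):
        is_last = i + 1 == len(events)
        # Emit only if no later event arrives before the timer expires.
        if is_last or events[i + 1][0] - ts >= interval_ms:
            emitted.append((ts + interval_ms, value))
    return emitted
```

Typing "2", "22", "222" at 0 ms, 200 ms and 400 ms with a 1000 ms interval yields a single request for "222" at 1400 ms.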
Rishabh Dave [Wed, 20 Aug 2025 07:41:04 +0000 (13:11 +0530)]
src/include: move ceph_to_hostos_errno() to separate header file
Including src/include/types.h in src/pybind/cephfs/types.pxd leads to
a compilation error: "fatal error: acconfig.h: No such file or directory".
Both types.h and int_types.h include the acconfig.h header file.
Move the code to be included in types.pxd to a separate file where
acconfig.h won't be included, thus preventing this error.
Rishabh Dave [Sun, 31 Aug 2025 18:50:19 +0000 (00:20 +0530)]
qa: set -x for qa/workunits/libcephfs/test.sh
LibCephFS unit tests are compiled into different binary files and run
one after another, but without logging the name of the binary being executed,
which can make it a bit difficult to find out which binary/test group is
being run. Therefore, "set -x" in the script so that the binary name/test
group is printed before the tests run.
mon/OSDMonitor.cc: optionally display availability status in json
This commit enables users to specify the format option for the
data availability feature. Now, if the user specifies json-pretty,
the output will be displayed in the given format.
John Mulligan [Thu, 13 Feb 2025 20:59:42 +0000 (15:59 -0500)]
ceph.spec.in: use rpm macro for python shebang pathfix
To support EL 10 distros, update the source of the pathfix tool (on EL
9+ distros) and use the macro for updating python shebangs that has been
available since at least EL 9.
Casey Bodley [Tue, 19 Aug 2025 13:44:52 +0000 (09:44 -0400)]
rpm: require gcc >= 13.3 regardless of gts_version
when gts_version is not set, bump the required version of gcc-c++ >= 13.3.
move this into a `%if 0%{?gts_version} == 0` block to prevent that from
applying to builds using gts, because the distro probably doesn't
provide a recent enough gcc-c++
John Mulligan [Fri, 27 Jun 2025 15:08:39 +0000 (11:08 -0400)]
ceph.spec.in: conditionalize crimson gts version on el10
EL10 distros come with GCC 14. When crimson was enabled it was always
trying to set gts_version to 13 (gcc-toolset version). Make the use of
gts version conditional on using el versions lower than 10.
Casey Bodley [Sat, 7 Jun 2025 01:43:33 +0000 (21:43 -0400)]
valgrind: wildcard glibc version for dlopen() leak suppression
the original suppression for "dlopen@@GLIBC_2.2.5" is very similar to
several later suppressions for "dlopen@@GLIBC_2.34". add a wildcard to
the original suppression so the rest can be removed
this also helps suppress a new leak, seen with gcc-13:
{
<insert_a_suppression_name_here>
Memcheck:Leak
match-leak-kinds: reachable
fun:malloc
fun:UnknownInlinedFun
fun:decompose_rpath
fun:_dl_map_object
fun:dl_open_worker_begin
fun:_dl_catch_exception
fun:dl_open_worker
fun:_dl_catch_exception
fun:_dl_open
fun:dlopen_doit
fun:_dl_catch_exception
fun:_dl_catch_error
fun:_dlerror_run
fun:dlopen@@GLIBC_2.34
fun:_sub_I_65535_0.0
fun:call_init
fun:call_init
fun:_dl_init
obj:/usr/lib64/ld-linux-x86-64.so.2
obj:*
obj:*
obj:*
obj:*
obj:*
obj:*
obj:*
}
Casey Bodley [Sat, 7 Jun 2025 01:27:20 +0000 (21:27 -0400)]
valgrind: update rocksdb ObjectLibrary leak suppression for gcc-13
the suppression for gcc-13 only differs on two lines, so add wildcards
to match either. the diff between the current suppression and the new
one follows:
rpm: reenable lto for gcc-toolset-13 by requiring 13.3
referenced gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
shows that it was resolved for 13.3. reenable lto for gcc-toolset-13 by
requiring 13.3 or later
mgr/dashboard: Local storage class creation via dashboard doesn't handle creation of pool.
Fixes: https://tracker.ceph.com/issues/72569
Signed-off-by: Dnyaneshwari <dtalweka@redhat.com>
mgr/dashboard: handle creation of new pool
Commit includes:
1) Provide link to create a new pool
2) Refactored validation on ACL mapping, removed required validator as default
3) Fixed a runtime error on the console caused by ACL length, which prevented the details section from opening
4) Used rxjs operators to make API calls and make the form ready once all data is available, fixing the form patch issues
5) Refactored some part of code to improve the performance
6) Added zone and pool information in details section for local storage class
Fixes: https://tracker.ceph.com/issues/72569
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 2d0e71c845643a26d4425ddac8ee0ff30153eff2)