Rishabh Dave [Tue, 2 Sep 2025 17:37:36 +0000 (23:07 +0530)]
client: trim path before logging it
A path can be arbitrarily long, and logging a very long path
(imagine around 2000 path components) is not useful and lowers the
readability of the log. Therefore, trim the path before logging it.
Fixes: https://tracker.ceph.com/issues/72993
Signed-off-by: Rishabh Dave <ridave@redhat.com>
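The commit message does not say how the path is trimmed. As a minimal
sketch of the idea only, the following assumes that the tail of the path
is kept and that the cut is marked with "...". The function name
trim_path_for_log and the max_len parameter are hypothetical and are not
taken from the Ceph client code.

```python
def trim_path_for_log(path: str, max_len: int = 64) -> str:
    """Return path unchanged if it is short, else only its tail.

    Hypothetical helper; assumes max_len is larger than the marker.
    """
    if len(path) <= max_len:
        return path
    marker = "..."
    # Keep the end of the path: the final components are usually the
    # most useful part when reading a log line.
    return marker + path[-(max_len - len(marker)):]
```

With this sketch a 2000-component path is logged as a fixed-size string,
while short paths are logged unchanged.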
Rishabh Dave [Thu, 21 Aug 2025 11:51:48 +0000 (17:21 +0530)]
mds: for logging generate only 10 final components of dentry path
Generating full absolute path for dentries for printing in MDS logs
slows down the FS to a great extent, especially when the path is very
long (imagine a path with 2000 components). Printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of MDS logs.
Therefore, generate only 10 final components of the dentry paths for logging.
Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Sun, 17 Aug 2025 18:13:40 +0000 (23:43 +0530)]
mds: for logging generate only 10 final components of inode path
Generating full absolute path for inodes for printing in MDS logs slows
down the FS to a great extent especially when the path is very long
(imagine a path with 2000 components). Also printing such long paths in
MDS logs is not only pointless but also greatly reduces the readability
of the MDS logs.
Therefore, generate only 10 final components of inode paths for logging.
Fixes: https://tracker.ceph.com/issues/72779
Signed-off-by: Rishabh Dave <ridave@redhat.com>
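The saving described in the two MDS commits above comes from not walking
all the way to the root. A minimal sketch of that idea follows. It is an
illustration only and not the MDS code: the Node class, the short_path
name, and the "..." prefix for a truncated path are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an inode or dentry with a parent link."""
    name: str
    parent: Optional["Node"] = None


def short_path(node: Node, keep: int = 10) -> str:
    """Build a path from at most the final `keep` components of node."""
    parts = []
    cur = node
    # Walk towards the root, but stop after `keep` components so the
    # cost no longer depends on how deep the node is.
    while cur.parent is not None and len(parts) < keep:
        parts.append(cur.name)
        cur = cur.parent
    # If we stopped before reaching the root, mark the path as truncated.
    prefix = "..." if cur.parent is not None else ""
    return prefix + "/" + "/".join(reversed(parts))
```

The loop runs at most `keep` times, so a node that is 2000 levels deep
costs the same to print as one that is 10 levels deep.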
Rishabh Dave [Fri, 25 Jul 2025 08:20:06 +0000 (13:50 +0530)]
qa, test: run unit tests for cephfs.pyx with non-root user
Run test_python.sh as a non-root user. This makes it necessary to change
the owner user and group of the file system root to match this non-root
user. This brings testing closer to real-world scenarios and also
allows exercising negative tests where an FS op fails for a non-root
user but passes for the root user.
A few tests exercise FS operations that need the root user. Group these
tests under a separate class and add extra code for this class that
allows them to run with root UID and GID.
Rishabh Dave [Fri, 13 Jun 2025 07:13:51 +0000 (12:43 +0530)]
pybind/cephfs, mgr/volumes: refactor purge() to be non-recursive
Method purge() in trash.py calls rmtree(), which is a recursive method. To
avoid hitting Python's recursion limit, switch to a non-recursive approach.
The path to a directory and its directory handle are combined into a tuple,
and that tuple is stored on the stack. Storing the directory handle
dramatically reduces the number of calls to opendir().
Fixes: https://tracker.ceph.com/issues/71648
Signed-off-by: Rishabh Dave <ridave@redhat.com>
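The (path, handle) stack described above can be sketched on a local file
system. This is not the code from trash.py: os.scandir() stands in for
the cephfs opendir() call as an assumption, and the function name
rmtree_iterative is hypothetical.

```python
import os


def rmtree_iterative(root: str) -> None:
    """Remove a directory tree without recursion (local-FS sketch)."""
    # Each stack entry pairs a directory path with its open handle, so
    # a directory is opened once and resumed after a child is removed.
    stack = [(root, os.scandir(root))]
    while stack:
        path, handle = stack[-1]
        entry = next(handle, None)
        if entry is None:
            # Directory is exhausted: close the handle, remove the
            # now-empty directory and return to its parent.
            handle.close()
            os.rmdir(path)
            stack.pop()
        elif entry.is_dir(follow_symlinks=False):
            # Descend by pushing the child instead of recursing.
            stack.append((entry.path, os.scandir(entry.path)))
        else:
            os.unlink(entry.path)
```

Because the parent's handle stays on the stack, the parent is never
reopened after a child directory is removed. The trade-off in this
sketch is that one handle stays open per level of depth.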
Anikait Sehwag [Wed, 25 Jun 2025 06:48:44 +0000 (12:18 +0530)]
mgr/dashboard: Carbonised Toast Notification
Used carbon toast component to carbonise toast notifications
Dashboard: Toast Notification carbonised
This PR replaces the existing ngx-toastr implementation with Carbon Design System toast notifications to maintain UI consistency across the Ceph dashboard application.
libcephfs_proxy: fix userperm pointer decoding for older protocols
The random data used to decode pointers coming from the old protocol was
taken from the client instead of using the global_random data, which is
the correct one.
libcephfs_proxy: remove unnecessary protocol references in daemon
With the new protocol structure definitions, it's not necessary to
explicitly access each field inside its version substructure (v0, for
example). Now all fields of the latest version are declared inside an
anonymous substructure that can be accessed without a prefix.
libcephfs_proxy: remove unnecessary protocol references in client
With the new protocol structure definitions, it's not necessary to
explicitly access each field inside its version substructure (v0, for
example). Now all fields of the latest version are declared inside an
anonymous substructure that can be accessed without a prefix.
libcephfs_proxy: fix protocol structures for backward compatibility
The structures used for transferring data between the proxy client and
the proxy daemon had been reworked in a recent change to be able to
expand the protocol. This caused an inconsistency in the size of the
data transferred when communicating with a peer that uses the older version.
The result was that the peer receiving the data with an unexpected size
was closing the connection, causing unexpected errors.
The discrepancy in size is the result of how compilers pad structures
combined with the change in the structure layout introduced when
extending the protocol. With these changes, the computation of the size
of each version of the structures was not done correctly.
This change makes the layout equal to the older version, so that
computing the size of the structures becomes easier and doesn't depend
on unexpected padding.
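The effect of compiler padding on structure size can be seen with
Python's struct module. The layout below (a 1-byte field followed by an
8-byte field) is a made-up example and not one of the proxy protocol
structures.

```python
import struct

# Hypothetical layout: a 1-byte flag followed by an 8-byte value.
NATIVE = "@BQ"  # native alignment, as a C compiler would lay it out
PACKED = "=BQ"  # standard sizes with no padding between fields

native_size = struct.calcsize(NATIVE)  # may include padding after the flag
packed_size = struct.calcsize(PACKED)  # always the plain sum of the fields
padding = native_size - packed_size
```

Here packed_size is 9, while native_size is typically larger because the
8-byte field is aligned. A size computed by summing the fields therefore
disagrees with the size of the structure that is actually sent.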
doc: update RGW HTTPS configuration to use certmgr and new fields
With the introduction of certmgr, RGW services now support three
certificate sources: cephadm-signed (default), inline, and reference.
Docs have been updated to:
- Show how to provide inline certificates using the new ssl_cert/ssl_key
fields instead of the deprecated rgw_frontend_ssl_certificate.
- Explain how to register and reference user-provided certs/keys
- Clarify that cephadm-signed certificates remain the default, with
optional wildcard SANs support.
The usage of rgw_frontend_ssl_certificate is still supported for
backward compatibility, but is now documented as deprecated.
Venky Shankar [Mon, 2 Jun 2025 05:08:01 +0000 (05:08 +0000)]
test/libcephfs: validate asynchronous write and fsync executing concurrently
This synthetic reproducer does three things:
- set up a client mount with a configuration that delays write operations
and initiate a write operation via a thread.
- a thread that invokes asynchronous fsync
- a thread that invokes setxattr for the client to track early replies
Without the fix[0], the test reproduces the following crash:
Venky Shankar [Tue, 3 Jun 2025 10:04:44 +0000 (10:04 +0000)]
client: catch buggy reference count drop for MetaRequest
The prior commit introduces a synthetic delay in write operations
so that a test reproducer can interleave asynchronous fsync with an
operation that makes the MDS send an early reply to the client
(therefore, having the client track the early replied response for
an inode in Inode::unsafe_ops). This is enough to trick the client
into the code path that causes a buggy reference drop for the
request (MetaRequest), but hitting the _exact_ crash backtrace
requires the request to be in various [x]lists.
This last bit is tricky to synthetically massage in the test. So,
in order to catch the buggy reference drop, it would suffice to
assert on the reference count dropping to less than zero (0).
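The check described above can be illustrated with a small
reference-counted object. This is a sketch of the idea only: the
RefCounted class and its get()/put() methods are hypothetical and are
not the MetaRequest implementation.

```python
class RefCounted:
    """Hypothetical reference-counted object that starts with one ref."""

    def __init__(self) -> None:
        self.nref = 1

    def get(self) -> "RefCounted":
        self.nref += 1
        return self

    def put(self) -> None:
        self.nref -= 1
        # An extra put() drives the count negative. Failing here
        # catches the bug at the faulty call rather than later.
        if self.nref < 0:
            raise AssertionError("reference count dropped below zero")
```

The check fires on the unbalanced put() itself, so it does not depend on
the object being in any particular list when the bug is hit.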
Laura Flores [Fri, 5 Sep 2025 21:46:20 +0000 (16:46 -0500)]
doc/rados/operations: add kernel client procedure to read balancer documentation
As of now, the kernel client does not support `pg-upmap-primary`. I have
added some troubleshooting steps to help users who are unable to
mount images and filesystems with the kernel client while using `pg-upmap-primary`.
Once the feature is supported by the kernel client, users will be able
to perform mounts along with `pg-upmap-primary`.
Fixes: https://tracker.ceph.com/issues/72897
Signed-off-by: Laura Flores <lflores@ibm.com>
N Balachandran [Thu, 28 Aug 2025 06:22:23 +0000 (11:52 +0530)]
rgw/logging: fixes data loss during rollover
Multiple threads attempting to roll over the same log object can result
in the creation of numerous orphan tail objects, each with a single record.
This occurs when a NULL RGWObjVersionTracker is used during the creation of
a new logging object. These records are inaccessible, leading to data loss,
which is particularly critical in Journal mode.
Furthermore, valid log tail objects may be added to the Garbage Collection (GC)
list, exacerbating data loss.
Fixes: https://tracker.ceph.com/issues/72740
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
Bill Scales [Wed, 9 Apr 2025 09:58:15 +0000 (10:58 +0100)]
test: add replica pool support to ceph_test_rados_io_sequence
Make 'ceph_test_rados_io_sequence --pool rbd' work. Replica
pools don't have an erasure code profile and do not have the
ec_allow_overwrites or ec_allow_optimizations flags.
Fixes: https://tracker.ceph.com/issues/70844
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Jon Bailey [Mon, 14 Jul 2025 12:52:28 +0000 (13:52 +0100)]
common: Added values to json::OSDPoolGetReply
OSDPoolGetReply actually returns many more values than are currently supplied. These have been added as optionals (as they may also not be given), so it is possible to query whether they exist and use them if they do.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Modify collect_omap_stats() to guarantee that only
one 'large omap entry' warning message is logged
per chunk, thus maintaining the existing behaviour.
Unlike the existing behaviour, all 'large omap'
entries are counted.
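The "warn once per chunk, count every entry" behaviour can be sketched
as follows. This is not the collect_omap_stats() code: the function
name, the (name, key count) tuples, the threshold parameter and the log
message are all assumptions made for illustration.

```python
import logging
from typing import Iterable, Tuple

log = logging.getLogger("scrub")  # hypothetical logger name


def collect_large_omap(chunk: Iterable[Tuple[str, int]], threshold: int) -> int:
    """Count large omap entries in a chunk, warning at most once."""
    large = 0
    warned = False
    for name, num_keys in chunk:
        if num_keys <= threshold:
            continue
        large += 1  # every large entry is counted
        if not warned:
            # Only the first large entry in the chunk is logged.
            log.warning("large omap object found: %s (%d keys)", name, num_keys)
            warned = True
    return large
```

The counter and the warning are decoupled, so the count stays accurate
even though the log shows a single message per chunk.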