The snapshot data on any part of the subtree should be correct regardless of
whether 'mds_use_global_snaprealm_seq_for_subvol' is enabled or disabled, even
with a directory marked as a subvolume.
mds: Don't use global snaprealm seq for subvolumes
Don't use the global snaprealm seq number while doing COW
on old inodes for a subvolume inode and the inodes under it,
i.e., for directories marked with the 'ceph.dir.subvolume'
vxattr. This is safe because all hardlinks and renames
are contained within the same subvolume snaprealm and
don't cross subvolume snaprealms.
For the directories between / and the subvolume snapshot
directory, use the global snaprealm seq to COW the
old inodes only if at least one snapshot has been taken.
The above behavior is made optional with the MDS config
'mds_use_global_snaprealm_seq_for_subvol'. The option
is enabled by default, which means the above behavior
is disabled by default. Disabling the option is suggested
only on CephFS volumes used purely for the subvolume
use case.
mds: Fix mds crash while removing ceph.dir.subvolume
Removing the vxattr 'ceph.dir.subvolume' from a directory on which
it was never set causes the MDS to crash. This is because the
snaprealm is null for such a directory and the null check
is missing. Setting the vxattr creates the snaprealm for
the directory, so the MDS doesn't crash when the vxattr is
set and then removed. This patch fixes the crash.
Traceback:
-------
Core was generated by `./ceph/build/bin/ceph-mds -i a -c ./ceph/build/ceph.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 0x00007f33f1aa8034 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f33f1a4ef1e in raise () from /lib64/libc.so.6
#2 0x0000562b148a6fd0 in reraise_fatal (signum=signum@entry=11) at /ceph/src/global/signal_handler.cc:88
#3 0x0000562b148a83d9 in handle_oneshot_fatal_signal (signum=11) at /ceph/src/global/signal_handler.cc:367
#4 <signal handler called>
#5 Server::handle_client_setvxattr (this=0x562b4ee3f800, mdr=..., cur=0x562b4ef9cc00) at /ceph/src/mds/Server.cc:6406
#6 0x0000562b145fadc2 in Server::handle_client_removexattr (this=0x562b4ee3f800, mdr=...) at /ceph/src/mds/Server.cc:7022
#7 0x0000562b145fbff0 in Server::dispatch_client_request (this=0x562b4ee3f800, mdr=...) at /ceph/src/mds/Server.cc:2825
#8 0x0000562b145fcfa2 in Server::handle_client_request (this=0x562b4ee3f800, req=...) at /ceph/src/mds/Server.cc:2676
#9 0x0000562b1460063c in Server::dispatch (this=0x562b4ee3f800, m=...) at /ceph/src/mds/Server.cc:382
#10 0x0000562b1450eb22 in MDSRank::handle_message (this=this@entry=0x562b4ef42008, m=...) at /ceph/src/mds/MDSRank.cc:1222
#11 0x0000562b14510c93 in MDSRank::_dispatch (this=this@entry=0x562b4ef42008, m=..., new_msg=new_msg@entry=true)
at /ceph/src/mds/MDSRank.cc:1045
#12 0x0000562b14511620 in MDSRankDispatcher::ms_dispatch (this=this@entry=0x562b4ef42000, m=...) at /ceph/src/mds/MDSRank.cc:1019
#13 0x0000562b144ff117 in MDSDaemon::ms_dispatch2 (this=0x562b4ee64000, m=...) at /ceph/src/common/RefCountedObj.h:56
#14 0x00007f33f2f4974a in Messenger::ms_deliver_dispatch (this=0x562b4ee70000, m=...) at /ceph/src/msg/Messenger.h:746
#15 0x00007f33f2f467e2 in DispatchQueue::entry (this=0x562b4ee703b8) at /ceph/src/msg/DispatchQueue.cc:202
#16 0x00007f33f2ff61cb in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /ceph/src/msg/DispatchQueue.h:101
#17 0x00007f33f2df4b5d in Thread::entry_wrapper (this=0x562b4ee70518) at /ceph/src/common/Thread.cc:87
#18 0x00007f33f2df4b6f in Thread::_entry_func (arg=<optimized out>) at /ceph/src/common/Thread.cc:74
#19 0x00007f33f1aa6088 in start_thread () from /lib64/libc.so.6
#20 0x00007f33f1b29f8c in clone3 () from /lib64/libc.so.6
---------
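The failure mode described above (clearing a flag on an object whose
backing structure was never created) can be illustrated with a minimal
sketch. The types and the function name below are hypothetical stand-ins,
not the real MDS classes or the actual patch; the sketch only shows the
kind of null check the message refers to.

```cpp
#include <memory>

// Hypothetical stand-ins for illustration only; not the real Ceph types.
struct SnapRealmSketch {
  bool is_subvolume = false;
};

struct DirSketch {
  // Null until the subvolume flag is set for the first time.
  std::unique_ptr<SnapRealmSketch> snaprealm;
};

// Sets or clears the subvolume flag and returns true if the flag changed.
// Clearing the flag on a directory that never had a snaprealm is a no-op
// instead of a null-pointer dereference.
bool set_subvolume(DirSketch& dir, bool enable) {
  if (!dir.snaprealm) {
    if (!enable) {
      return false;  // nothing to remove
    }
    // Setting the flag creates the snaprealm as a side effect.
    dir.snaprealm = std::make_unique<SnapRealmSketch>();
  }
  const bool changed = dir.snaprealm->is_subvolume != enable;
  dir.snaprealm->is_subvolume = enable;
  return changed;
}
```

In this sketch, removing the flag before it was ever set simply reports
that nothing changed, while the set-then-remove sequence behaves as before.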
Zac Dover [Thu, 8 May 2025 02:29:25 +0000 (12:29 +1000)]
doc/mgr: edit alerts.rst
Edit doc/mgr/alerts.rst as part of the project to determine where the
error is in https://github.com/ceph/ceph/pull/62782 that prevents the
Jenkins tests from passing.
This commit adds to the work done in
https://github.com/ceph/ceph/pull/62782 by correcting some of the
English that was present in that PR.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Zac Dover [Thu, 8 May 2025 00:08:06 +0000 (10:08 +1000)]
doc/mgr/ceph_api: edit index.rst
Edit doc/mgr/ceph_api/index.rst as part of the project to determine
where the error is in https://github.com/ceph/ceph/pull/62782 that
prevents the Jenkins tests from passing.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Kefu Chai [Wed, 7 May 2025 00:42:52 +0000 (08:42 +0800)]
librbd, tools: migrate from boost::variant to std::variant
Complete migration started in commit 017f333, replacing boost::variant with
std::variant throughout the librbd codebase. This change is part of our ongoing
effort to reduce third-party dependencies by leveraging C++ standard library
alternatives where possible.
Benefits include:
- Improved code readability and maintainability
- Reduced external dependency surface
- More consistent API usage with other components
Implementation note: Unlike Boost.Variant, std::variant lacks built-in
operator<< support. This commit implements the necessary operator<< for
AttributeValue, our specific std::variant instantiation, to preserve the
existing behavior.
Also, although `apply_visit()` calls can be replaced with `visit()`
without being qualified with `std::` because of ADL, we are taking this
opportunity to add the `std::` prefix for better readability.
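A minimal sketch of what such a stream operator can look like follows.
The alias AttributeValueSketch and its two alternatives are assumptions
made for illustration; the real AttributeValue may hold different types,
and the actual implementation in the commit may differ.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical instantiation; the alternatives are assumed, not taken
// from the librbd sources.
using AttributeValueSketch = std::variant<std::string, std::uint64_t>;

// std::variant has no operator<< of its own, so stream whichever
// alternative is currently held by dispatching through std::visit.
std::ostream& operator<<(std::ostream& os, const AttributeValueSketch& value) {
  std::visit([&os](const auto& held) { os << held; }, value);
  return os;
}

// Small helper used to show the operator in action.
std::string to_text(const AttributeValueSketch& value) {
  std::ostringstream oss;
  oss << value;
  return oss.str();
}
```

The call is written as std::visit even though an unqualified visit would
also resolve through ADL, matching the readability choice described above.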
Matan Breizman [Sun, 4 May 2025 14:22:38 +0000 (14:22 +0000)]
crimson/osd: Logging fixes
* Fix "failed to log message"
* Move PGRecovery to the new logging macro
* Make PGRecovery print the pg prefix, as it's impossible to debug a
specific pg's recovery ops without it.
crimson/osd/pg: Let PGListener use start_peering_event_operation
PG::start_peering_event_operation is a template function, while
PGRecovery::pg is of type PGRecoveryListener*. We can't expose a template
function through the PGRecoveryListener interface, since it would also
have to be virtual and member function templates cannot be virtual.
Instead, introduce start_peering_event_operation_listener, which acts
as a wrapper around PG::start_peering_event_operation for PGRecovery to
use freely.
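The wrapper pattern described above can be sketched as follows. The
class names, the string payload, and the bookkeeping vector are
hypothetical and exist only to make the example self-contained; only
the two function names come from the commit message.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface standing in for PGRecoveryListener.
struct RecoveryListenerSketch {
  virtual ~RecoveryListenerSketch() = default;
  // A virtual function cannot be a template, so the interface exposes
  // a wrapper with a fixed signature.
  virtual void start_peering_event_operation_listener(std::string event) = 0;
};

// Hypothetical concrete class standing in for PG.
struct PGSketch : RecoveryListenerSketch {
  std::vector<std::string> started;  // records started events for the demo

  // Template member function: allowed on the concrete class, but it
  // cannot be declared virtual.
  template <typename... Args>
  void start_peering_event_operation(Args&&... args) {
    started.emplace_back(std::forward<Args>(args)...);
  }

  // Non-template virtual wrapper that forwards to the template.
  void start_peering_event_operation_listener(std::string event) override {
    start_peering_event_operation(std::move(event));
  }
};
```

A caller holding only a RecoveryListenerSketch* can then reach the
template function indirectly through the virtual wrapper.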
Rishabh Dave [Wed, 2 Apr 2025 15:31:31 +0000 (21:01 +0530)]
mgr/vol: handle case where path goes missing for a clone
A thread is spawned to get the value of a certain extended attribute in
order to generate the progress statistics for ongoing clone operations.
If the source and/or destination path for a clone operation goes missing,
this thread crashes. Instead of crashing, handle this case gracefully.
Fixes: https://tracker.ceph.com/issues/71019
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Ronen Friedman [Fri, 2 May 2025 08:03:15 +0000 (03:03 -0500)]
osd/scrub: check all(*) conditions in restrictions_on_scrubbing()
Modify OsdScrub::restrictions_on_scrubbing() to check all(*)
conditions, instead of stopping at the first one that is true.
The "new" (since Tentacle) scrub-type-to-conditions mapping is no
longer a simple one (it is not "monotonic" in the sense of restrictions
always being removed as the scrub type becomes more important),
and the caller may want to know them all.
(*) The somewhat costly check for the random backoff is still only
performed if the OSD is not already running too many scrubs.
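The difference between stopping at the first restriction and collecting
all of them can be sketched as below. The struct fields, the parameter
names, and the injected backoff callback are assumptions for
illustration; they are not the actual OsdScrub members or signature.

```cpp
#include <functional>

// Hypothetical result type: one flag per restriction, all reported.
struct ScrubRestrictionsSketch {
  bool too_many_scrubs = false;
  bool random_backoff = false;
  bool cpu_overload = false;
  bool outside_hours = false;
};

// Evaluates every condition instead of returning at the first true one.
// The comparatively costly random-backoff roll is still skipped when the
// OSD is already running too many scrubs.
ScrubRestrictionsSketch collect_restrictions(
    bool too_many_scrubs,
    bool cpu_overload,
    bool outside_hours,
    const std::function<bool()>& roll_backoff) {
  ScrubRestrictionsSketch r;
  r.too_many_scrubs = too_many_scrubs;
  if (!too_many_scrubs) {
    r.random_backoff = roll_backoff();
  }
  r.cpu_overload = cpu_overload;
  r.outside_hours = outside_hours;
  return r;
}
```

With every flag populated, the caller can decide per scrub type which
restrictions apply, which a single early-exit result could not express.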
rgw-admin: report correct error code for non-existent bucket on deletion
The admin API should return the correct error code when the bucket doesn't
exist on bucket deletion. This is apparently a regression introduced by
9ae2d8c4e95807179fc17f84be6754d2b19fe639.
Ville Ojamo [Wed, 30 Apr 2025 07:37:57 +0000 (14:37 +0700)]
doc/radosgw: Improve language, capitalization and use config database
Use "RADOS Gateway" instead of "Rados Gateway", "rados gateway" etc.
I am aware of the term "Ceph Object Gateway", but this change is intended
to be an uncontroversial, low-hanging-fruit fix of obviously incorrectly
capitalized terms.
Use "RGW daemon" instead of "Gateway", "Rados Gateway" etc.
Use "RGW instance" instead of "rados gateway" for consistency with
other, equivalent occurrences.
Where the text clearly refers to an instance of the daemon using a
non-preferred term, change it to "RGW daemon"; for example,
when talking about restarting the RGW.
Do not touch other occurrences that are not 100% clear.
The files touched mostly do not use "Ceph Object Gateway", so changing
the term to it would create inconsistency, or several more changes
would be needed to update all occurrences to use this terminology.
Use the configuration database instead of ceph.conf in d3n_datacache.rst.
Improve language in d3n_datacache.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Correct the presentation of an example string in doc/cephadm/rgw.rst in
order to obviate an error reading "rgw.rst:202: WARNING: Inline emphasis start-string without end-string."
doc/rados: Update mClock doc on steps to override OSD IOPS capacity config
Describe the steps involved to:
- Specify a global value for osd_mclock_max_capacity_iops_{ssd,hdd}, and
- Override existing individually scoped values for OSDs determined during
start-up for osd_mclock_max_capacity_iops_{ssd,hdd}.
The above is to help with the following:
- Override existing settings with a global value.
- Reduce the number of entries in the mon store and instead use a single
global specification for all OSDs in the cluster in case the underlying
hardware is the same for all OSDs.
N Balachandran [Wed, 30 Apr 2025 05:15:13 +0000 (10:45 +0530)]
rbd: write image mirror status if state is CREATING
It can take up to 30s for the image mirror status to be written
to rbd_mirroring on the secondary for a newly created image. This fix
attempts to reduce that time by writing the status to rbd_mirroring even
if the image state is set to CREATING.
Fixes: https://tracker.ceph.com/issues/71138
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>