Afreen Misbah [Tue, 6 May 2025 14:27:03 +0000 (19:57 +0530)]
mgr/dashboard: Fix delete listener
- pass gw_group to delete API in frontend
- when more than one gateway group is present, deleting a listener fails with the error message: Multiple NVMe-oF gateway groups are configured. Please specify the 'gw_group' parameter in the request.
- added missing types, i18n
Afreen Misbah [Thu, 8 May 2025 04:09:59 +0000 (09:39 +0530)]
mgr/dashboard: Add default state when gateway groups are empty
Fixes https://tracker.ceph.com/issues/71247
- after upgrades, the nvmeof service spec does not contain the `group` field
- this causes UI combobox internal errors
- checking for `group` in spec and disabling the selector
Venky Shankar [Mon, 12 May 2025 13:02:40 +0000 (18:32 +0530)]
Merge PR #62250 into main
* refs/pull/62250/head:
qa/cephfs: increase data to be delay data sync by mirror daemon
cephfs-mirror: integrate blockdiff API for regular file transfers
mds: dout snapdiff snapid's before validation check
cephfs-mirror: current sync mechanism uses sync mechanism subclass'ing
qa: add test for syncing already existing snapshots
cephfs_mirror: avoid latest changes on the source fs to enable mirroring
Matan Breizman [Mon, 12 May 2025 11:16:26 +0000 (11:16 +0000)]
crimson/osd/../client_request: add logs around get_obc stage
If this stage is in use by another operation, we would keep waiting for it
to finish. Add logs before entering the stage and after to keep track of
stuck requests.
Matan Breizman [Mon, 12 May 2025 11:14:47 +0000 (11:14 +0000)]
crimson/osd/pg: set log_entry_update_waiting_on prior to sending requests
Before this patch, we would first send the MOSDPGUpdateLogMissing to
all peers and only then insert this rep_tid to log_entry_update_waiting_on.
This could have resulted in a race where we receive the reply prior to
actually inserting the rep_tid.
The reply would have been discarded with "reply on unknown tid" (which
is now aborting).
The unhandled reply would not have let submit_error return and would
keep holding the lock on this obc.
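The ordering issue described above can be sketched in a few lines. This is a minimal illustration, not the crimson implementation: `LogUpdateTracker`, `submit`, `handle_reply` and the `send` callback are hypothetical names, and only the "register the tid before sending" ordering is taken from the commit message.

```python
import threading


class LogUpdateTracker:
    """Hypothetical stand-in for the bookkeeping of in-flight log updates."""

    def __init__(self, send):
        # send(peer, rep_tid) is an assumed callback; it may trigger a reply
        # immediately, which is exactly the case that used to race.
        self._send = send
        self._lock = threading.Lock()
        self.waiting_on = {}  # rep_tid -> set of peers that have not replied

    def submit(self, rep_tid, peers):
        # Register the tid *before* sending anything, so that even an
        # immediate reply finds its entry.
        with self._lock:
            self.waiting_on[rep_tid] = set(peers)
        for peer in peers:
            self._send(peer, rep_tid)

    def handle_reply(self, rep_tid, peer):
        """Return True once every peer has replied for rep_tid."""
        with self._lock:
            pending = self.waiting_on.get(rep_tid)
            if pending is None:
                # Mirrors the "reply on unknown tid" condition.
                raise KeyError(f"reply on unknown tid {rep_tid}")
            pending.discard(peer)
            if pending:
                return False
            del self.waiting_on[rep_tid]
            return True
```

With the registration moved after the send loop, a transport that replies synchronously would hit the unknown-tid branch, and the waiter would never be completed.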
Ronen Friedman [Sun, 11 May 2025 05:24:33 +0000 (00:24 -0500)]
osd/scrub: remove the 'deadline' attribute from the scrub job
The scrub job's 'overdue' attribute is no longer calculated -
the only 'scrub is overdue' status remaining after the latest
scheduling refactor is the one performed in PGMap.cc (the
one affecting the 'health warning' status of the cluster).
Thus - there is no longer any reason to maintain any 'deadline'
attribute for the scrub scheduler.
Ronen Friedman [Fri, 9 May 2025 12:46:26 +0000 (07:46 -0500)]
osd/scrub: remove the deep-scrubs deadline attribute
As it is no longer meaningful in the context of the new
scrub scheduling design.
The change mandates fixes to the way 'schedule-[deeps]crub'
commands are implemented. The offset to use when forcing the
last-scrub timestamp to a new value is now calculated in
ScrubJob::guaranteed_offset(), as ScrubJob is where all
schedule adjustments (which employ the same logic) are
implemented.
Samuel Just [Fri, 9 May 2025 16:46:48 +0000 (16:46 +0000)]
crimson/osd/pg_recovery: only reset_pglog_based_recovery_op if complete
ce4e9aaad, as part of the start_recovery_ops changes, changed the call to
reset_pglog_based_recovery_op to occur unconditionally rather than only
if recovery has completed.
Note, this fix only restores the prior behavior. There's actually still
a race here where a DeferRecovery could be processed between the call to
reset_pglog_based_recovery_op and the RequestBackfill or
AllReplicasRecovered being processed.
Introduced: ce4e9aaad8f2cafae24511fe1687c61dc41affc1
Related: https://tracker.ceph.com/issues/71267
Fixes: https://tracker.ceph.com/issues/70337
Signed-off-by: Samuel Just <sjust@redhat.com>
Kefu Chai [Wed, 7 May 2025 01:09:00 +0000 (09:09 +0800)]
tools/ceph_dedup: remove 'using namespace std'
Remove 'using namespace std' from common.h to maintain consistent coding
practices. Although common.h is only used by ceph_dedup implementation,
keeping namespace declarations out of header files prevents potential
name conflicts and follows best practices for C++ code organization.
This change improves code clarity and reduces the risk of symbol collisions
when standard library elements are used alongside custom
implementations.
Ville Ojamo [Fri, 9 May 2025 08:17:00 +0000 (15:17 +0700)]
doc/radosgw: Use ref for hyperlinks, 1st batch
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Add a label at beginning of referenced files if missing.
- Remove unused "target definitions".
The rendered PR should look the same as the old docs, only differing in
the source RST.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
tools/cephfs/first-damage: default conf if CEPH_CONF not set
If $CEPH_CONF isn't set, you get:
unable to get monitor info from DNS SRV with service name: ceph-mon
2025-05-08T22:40:06.091-0400 7f604286f1c0 -1 failed for service _ceph-mon._tcp
2025-05-08T22:40:06.091-0400 7f604286f1c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Traceback (most recent call last):
File "/root/first-damage.py", line 144, in <module>
R.connect()
File "rados.pyx", line 690, in rados.Rados.connect
rados.ObjectNotFound: [errno 2] RADOS object not found (error connecting to the cluster)
Make it default to /etc/ceph/ceph.conf if the env isn't set.
Signed-off-by: Dan van der Ster <dan.vanderster@clyso.com>
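A minimal sketch of the fallback described above. `resolve_conffile` and `DEFAULT_CEPH_CONF` are hypothetical names, not taken from first-damage.py, and treating an empty `CEPH_CONF` the same as an unset one is an assumption.

```python
import os

# Default path named in the commit message.
DEFAULT_CEPH_CONF = "/etc/ceph/ceph.conf"


def resolve_conffile(environ=None):
    """Return $CEPH_CONF if it is set and non-empty, else the default path."""
    env = os.environ if environ is None else environ
    # Assumption: an empty value is treated like an unset variable.
    return env.get("CEPH_CONF") or DEFAULT_CEPH_CONF
```

The resulting path could then be handed to the rados binding, for example `rados.Rados(conffile=resolve_conffile())`, before calling `connect()`.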
Samuel Just [Thu, 24 Apr 2025 22:13:04 +0000 (15:13 -0700)]
vstart.sh: simplify crimson core assignment, use assign_crimson_cores.py
This commit simplifies the internal flow in a few ways:
- core assignment is entirely handled by prep_balance_cpu and
do_balance_cpu. The latter simply does as the cpu_table
instructs.
- assign_crimson_cores calls lscpu and taskset internally, no
need for temp files.
It also changes some defaults:
- if crimson-balance-cpu is unset or set to none, crimson-osd will not
pin cpus at all rather than using the simple sequential allocation
scheme, which could be much less efficient on platforms where
cpuids 0,1,2,3,... are on sockets 0,1,2,3,... The "osd" and "socket"
options provide NUMA-aware assignments when requested.
New features:
- Alienstore cores are now assigned with assign_crimson_cores
using the same balance strategy, via
--crimson-alien-num-cores.
- --crimson-reactor-physical-only and
--crimson-alienstore-physical-only will cause reactor or
alienstore cpus respectively to be allocated with one
cpu per physical core rather than including smt siblings.
Fixes: https://tracker.ceph.com/issues/71096
Signed-off-by: Samuel Just <sjust@redhat.com>
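To illustrate why sequential allocation can be inefficient when cpu ids interleave across sockets, here is a small sketch. It is not the algorithm used by assign_crimson_cores.py; the function names and the toy topology (cpu i on socket i % 2) are assumptions made for the example.

```python
from collections import defaultdict


def sequential(cpu_to_socket, num_osds, cores_per_osd):
    """Hand out cpu ids in plain numeric order, ignoring topology."""
    cpus = sorted(cpu_to_socket)
    return [cpus[i * cores_per_osd:(i + 1) * cores_per_osd]
            for i in range(num_osds)]


def socket_aware(cpu_to_socket, num_osds, cores_per_osd):
    """Group cpu ids by socket first, then hand them out in that order."""
    by_socket = defaultdict(list)
    for cpu in sorted(cpu_to_socket):
        by_socket[cpu_to_socket[cpu]].append(cpu)
    ordered = [cpu for sock in sorted(by_socket) for cpu in by_socket[sock]]
    return [ordered[i * cores_per_osd:(i + 1) * cores_per_osd]
            for i in range(num_osds)]


def sockets_touched(assignment, cpu_to_socket):
    """Number of distinct sockets each OSD's cpu set spans."""
    return [len({cpu_to_socket[c] for c in osd}) for osd in assignment]


# Toy topology (assumed): 8 cpus, cpu i lives on socket i % 2.
TOPOLOGY = {cpu: cpu % 2 for cpu in range(8)}
```

On this topology, `sequential(TOPOLOGY, 2, 4)` gives each OSD cpus from both sockets, while `socket_aware(TOPOLOGY, 2, 4)` keeps each OSD on a single socket.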
Jos Collin [Mon, 27 Jan 2025 12:42:34 +0000 (18:12 +0530)]
cephfs_mirror: avoid latest changes on the source fs to enable mirroring
This avoids considering latest changes from the source filesystem for
the mirroring of already existing snapshots. Thus the destination
filesystem and snapshots would be created based only on the source snapshots.
The destination fs would be a replica of the last snapshot taken.
Fixes: https://tracker.ceph.com/issues/68567
Signed-off-by: Jos Collin <jcollin@redhat.com>
Ronen Friedman [Thu, 8 May 2025 13:45:23 +0000 (08:45 -0500)]
osd/scrub: fix deadline calculations
The scrub scheduling deadlines are calculated based on pool and OSD
configuration parameters. The specifics of the calculations are
modified to match the new scrub scheduling design.
Comments and documentation are updated to reflect the fact that
the deadlines no longer have any meaningful effect on scrub
scheduling.
Zac Dover [Thu, 8 May 2025 02:29:25 +0000 (12:29 +1000)]
doc/mgr: edit alerts.rst
Edit doc/mgr/alerts.rst as part of the project to determine where the
error is in https://github.com/ceph/ceph/pull/62782 that prevents the
Jenkins tests from passing.
This commit adds to the work done in
https://github.com/ceph/ceph/pull/62782 by correcting some of the
English that was present in that PR.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Zac Dover [Thu, 8 May 2025 00:08:06 +0000 (10:08 +1000)]
doc/mgr/ceph_api: edit index.rst
Edit doc/mgr/ceph_api/index.rst as part of the project to determine
where the error is in https://github.com/ceph/ceph/pull/62782 that
prevents the Jenkins tests from passing.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Kefu Chai [Wed, 7 May 2025 00:42:52 +0000 (08:42 +0800)]
librbd, tools: migrate from boost::variant to std::variant
Complete migration started in commit 017f333, replacing boost::variant with
std::variant throughout the librbd codebase. This change is part of our ongoing
effort to reduce third-party dependencies by leveraging C++ standard library
alternatives where possible.
Benefits include:
- Improved code readability and maintainability
- Reduced external dependency surface
- More consistent API usage with other components
Implementation note: Unlike Boost.variant, std::variant lacks built-in
operator<< support. This commit implements the necessary operator<< for
AttributeValue, our specific std::variant instantiation, to preserve the
existing behavior.
Also, although `apply_visit()` calls can be replaced with `visit()`
without the `std::` qualifier because of ADL, we are taking this
opportunity to add the `std::` prefix for better readability.
Samuel Just [Tue, 29 Apr 2025 01:53:11 +0000 (01:53 +0000)]
tools/contrib: add assign_crimson_cores as a more general replacement for balance_cpu
Improvements:
- shorter
- has tests
- uses lscpu -e --json to get logical<->physical mappings and avoid
needing to parse cpu ranges in lscpu --json
- supports allocating alienstore threads
- supports requiring physical cores only independently for alienstore
and seastar reactors
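A rough sketch of the "physical cores only" idea, working from per-cpu rows such as those `lscpu -e --json` emits. The sample JSON below is hand-written for illustration, the field names (`cpus`, `cpu`, `socket`, `core`) are assumed and may differ between util-linux versions, and `physical_only` is a hypothetical helper rather than a function from assign_crimson_cores.

```python
import json

# Hand-written sample (assumed shape): cpus 0/2 and 1/3 are SMT siblings.
SAMPLE = """{"cpus": [
  {"cpu": 0, "socket": 0, "core": 0},
  {"cpu": 1, "socket": 0, "core": 1},
  {"cpu": 2, "socket": 0, "core": 0},
  {"cpu": 3, "socket": 0, "core": 1}
]}"""


def physical_only(lscpu_json):
    """Keep the lowest-numbered logical cpu of each (socket, core) pair."""
    rows = json.loads(lscpu_json)["cpus"]
    seen = set()
    picked = []
    # int() tolerates versions that emit the ids as strings.
    for row in sorted(rows, key=lambda r: int(r["cpu"])):
        key = (int(row["socket"]), int(row["core"]))
        if key not in seen:
            seen.add(key)
            picked.append(int(row["cpu"]))
    return picked
```

On a real host the input could come from `subprocess.run(["lscpu", "-e", "--json"], capture_output=True, text=True).stdout`. Because each row already carries the logical-to-physical mapping, no cpu-range parsing is needed.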