Anoop C S [Fri, 5 Dec 2025 09:25:58 +0000 (14:55 +0530)]
mon/MonMap: Dump addr in backward compatible format
Prior to c5b43e9b2765ff98419c649a5ae53ec16601975d, we dumped only the
legacy string component from public_addrs as `addr`. Ensure that this
backward compatible filtering is retained when dumping MonMap.
Patrick Donnelly [Fri, 10 Apr 2026 17:23:45 +0000 (13:23 -0400)]
Merge PR #66541 into squid
* refs/pull/66541/head:
include: detect corrupt frag from byteswap
mds: dump frag_t as an object
common/frag: produce valid fragments for test instances
common: simplify fragment printing
common: properly convert frag_t to net/store endianness
mds: include sysinfo in status command output
include/frag.h: un-inline methods to reduce header dependencies
* refs/pull/67322/head:
qa: set column for insertion
qa: bail sqlite3 on any error
qa: use actual sqlite3 blob instead of string
test: use json_extract instead of awkward json_tree
Previously s3tests_java.py set JAVA_HOME using the `alternatives`
command. This was problematic because `alternatives` is not present on all
Ubuntu systems, and some installations of Java don't update
alternatives. So instead we look for a "java-8" jvm in /usr/lib/jvm/
and set JAVA_HOME to the first one we find.
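A minimal Python sketch of the lookup described above. It assumes that JVMs are installed as directories under /usr/lib/jvm/ whose names contain "java-8". The function name find_java8_home is hypothetical, and this is not the actual s3tests_java.py code. The candidates are sorted so that "the first one we find" is deterministic rather than dependent on directory-listing order.

```python
import glob
import os


def find_java8_home(jvm_root="/usr/lib/jvm"):
    """Return the first directory under jvm_root whose name contains "java-8", or None.

    Hypothetical helper: it scans the JVM root directly instead of asking the
    `alternatives` command, which is not present on every system.
    """
    # Sorting makes the choice stable across filesystems.
    for path in sorted(glob.glob(os.path.join(jvm_root, "*java-8*"))):
        if os.path.isdir(path):  # ignore stray files that happen to match
            return path
    return None
```

A caller would then set `os.environ["JAVA_HOME"] = find_java8_home()` once it has checked that the result is not None.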
* refs/pull/61894/head:
pybind/rados: add note for reversed arguments to WriteOp.zero()
test/pybind/test_rados.py: add test for reversed arguments offset,length in WriteOp.zero
pybind/rados: fix the incorrect order of offset,length in WriteOp.zero
Wang Chao [Sat, 10 Aug 2024 11:40:52 +0000 (19:40 +0800)]
pybind/rados: fix the incorrect order of offset,length in WriteOp.zero
The offset and length parameters in the rados pybind `WriteOp.zero()` method were being passed to the rados_write_op_zero() function in the wrong order.
The incorrect order caused OP_ZERO to work incorrectly when used through pybind's rados module.
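A minimal Python sketch of the fix. To let it run without librados, it uses a fake stand-in for the C call. The real binding is written in Cython (rados.pyx), and the WriteOp below is a toy class, not the pybind one. The only point of the sketch is that offset must precede length, matching the (offset, len) order of rados_write_op_zero().

```python
# Records every call so the argument order can be checked.
calls = []


def rados_write_op_zero(write_op, offset, length):
    """Fake stand-in for the C function; its parameter order is (op, offset, len)."""
    calls.append((write_op, offset, length))


class WriteOp:
    """Toy version of the pybind WriteOp, reduced to zero()."""

    def __init__(self):
        self.write_op = object()  # placeholder for the rados_write_op_t handle

    def zero(self, offset, length):
        # Fixed order: offset first, then length. The bug was equivalent to
        # calling rados_write_op_zero(self.write_op, length, offset).
        rados_write_op_zero(self.write_op, offset, length)
```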
Patrick Donnelly [Sat, 28 Mar 2026 07:24:42 +0000 (12:54 +0530)]
Merge PR #67884 into squid
* refs/pull/67884/head:
qa/standalone: shorten bluefs test durations
qa/standalone: increase WAL volume size to 1GB
qa/standalone: fix bluefs expand test case
Patrick Donnelly [Thu, 19 Mar 2026 15:00:04 +0000 (11:00 -0400)]
Merge PR #66838 into squid
* refs/pull/66838/head:
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.
test/bluestore: add volume selector tests
os/bluestore:fix bluestore_volume_selection_reserved_factor usage
os/bluestore: print the first RocksDB level which doesn't fit into fast
The readable.sh script has forward incompat checks, but no
backward incompat checks.
This fix will:
1. Add a check for a backward_incompat directory for each type, covering either
specific objects or all objects of that type, and skip those objects during testing.
2. Add version comparison helper functions (version_lt, version_le, version_ge,
versions_span) for robust version handling
3. Replace 'sort -n' with 'sort -V' for proper version number sorting
4. Add CORPUS_PATH environment variable to allow teuthology tests to execute this script
5. Improve readability of the script
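The version helpers and the switch to `sort -V` are shell changes in readable.sh. What follows is only a Python analogue of the same idea: dotted versions are compared component by component as integers, so that 19.2.10 sorts after 19.2.9 (a plain string sort puts "19.2.10" first). It assumes purely numeric dotted versions. versions_span is omitted because its exact semantics are not described here.

```python
def parse_version(v):
    """Split a dotted version such as "19.2.3" into a tuple of ints.

    Assumes every component is numeric; `sort -V` also handles suffixes.
    """
    return tuple(int(part) for part in v.split("."))


def version_lt(a, b):
    return parse_version(a) < parse_version(b)


def version_le(a, b):
    return parse_version(a) <= parse_version(b)


def version_ge(a, b):
    return parse_version(a) >= parse_version(b)


def sort_versions(versions):
    """Numeric-aware ordering, similar to `sort -V` for dotted versions."""
    return sorted(versions, key=parse_version)
```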
The difference between backward and forward incompat:
- forward_incompat: Marks objects from older versions that newer ceph-dencoder
versions cannot read. Example: Version 19.2.x objects marked incompat at version 20.2.x
means ceph-dencoder v20.2.x+ can't decode them. Skip when testing old objects
with a new ceph-dencoder.
- backward_incompat: Marks objects from newer versions that older ceph-dencoder
versions cannot read. Example: Version 19.2.x objects marked backward_incompat at v19.2.x
means ceph-dencoder < v19.2.x can't decode them. Skip when testing new objects
with an old ceph-dencoder.
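A sketch of the skip decision implied by these two definitions, written in Python for illustration. The function should_skip and its parameter names are hypothetical, and versions are assumed to be purely numeric dotted strings. An object marked forward_incompat at version X is skipped when the dencoder is at X or newer. An object marked backward_incompat at version X is skipped when the dencoder is older than X.

```python
def _v(version):
    """Parse "20.2.0" into (20, 2, 0); assumes numeric components only."""
    return tuple(int(p) for p in version.split("."))


def should_skip(dencoder_version, forward_incompat_at=None, backward_incompat_at=None):
    """Return True if this ceph-dencoder version cannot decode the object."""
    # forward_incompat: dencoder >= marked version cannot read the old object.
    if forward_incompat_at is not None and _v(dencoder_version) >= _v(forward_incompat_at):
        return True
    # backward_incompat: dencoder < marked version cannot read the new object.
    if backward_incompat_at is not None and _v(dencoder_version) < _v(backward_incompat_at):
        return True
    return False
```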
NitzanMordhai [Mon, 2 Feb 2026 07:34:24 +0000 (07:34 +0000)]
workunits/dencoder: use readable.sh script instead of python script
The Python script test_readable.py was added for backward and forward
compatibility. Maintaining two scripts that ultimately do the same thing is
wasteful, so revert it, use readable.sh, and drop the Python script.
2026-02-08T13:02:24.439 INFO:tasks.workunit.client.0.trial031.stderr:Parse error near line 2: no such column: "start" - should this be a string literal in single-quotes?
test: use json_extract instead of awkward json_tree
Ideally this should port better across sqlite3 versions. The sqlite3
on rocky10 failed because it started requiring components of the keys
to be quoted:
sqlite> select * from p as a, p as b where a.i=1 and b.i = 2 and a.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount' and b.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount';
i key value type atom id parent fullkey path i key value type atom id parent fullkey
- -------- ----- ------- ---- --- ------ ----------------------------------------- -------------------------------- - -------- ----- ------- ---- --- ------ ------------------
1 avgcount 4 integer 4 581 570 $."libcephsqlite_vfs"."opf_sync".avgcount $."libcephsqlite_vfs"."opf_sync" 2 avgcount 5 integer 5 581 570 $."libcephsqlite_v
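A small sketch of the json_extract approach, using Python's sqlite3 module and an in-memory table. The table layout (p with columns i and v) and the stored documents are illustrative assumptions, not the actual test schema. Rather than joining json_tree() rows on a fullkey string, whose quoting changed between sqlite3 versions, it reads one value by path from each row.

```python
import json
import sqlite3

# Two hypothetical perf-counter dumps, stored as JSON text.
dumps = [
    {"libcephsqlite_vfs": {"opf_sync": {"avgcount": 4}}},
    {"libcephsqlite_vfs": {"opf_sync": {"avgcount": 5}}},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (i INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO p (i, v) VALUES (?, ?)",
    [(n + 1, json.dumps(d)) for n, d in enumerate(dumps)],
)

# json_extract returns the value at a path directly, so no fullkey matching is needed.
path = "$.libcephsqlite_vfs.opf_sync.avgcount"
row = conn.execute(
    "SELECT json_extract(a.v, ?), json_extract(b.v, ?) "
    "FROM p AS a, p AS b WHERE a.i = 1 AND b.i = 2",
    (path, path),
).fetchone()
```

It requires an SQLite build that includes the JSON functions. These are built in by default from SQLite 3.38 onward.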
Fixes: https://tracker.ceph.com/issues/74755
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit f304daa74ace4e6b856b585d71b8ff9c6e8a024a)
Patrick Donnelly [Wed, 19 Nov 2025 23:16:21 +0000 (18:16 -0500)]
mon/HealthMonitor: avoid MON_DOWN for freshly added Monitor
In testing, we often have the scenario where cephadm has created a
cluster but doesn't add more monitors until well past
mon_down_mkfs_grace. This causes spurious MON_DOWN warnings to be raised,
which fail QA jobs. Avoid this situation entirely by giving a
reasonable grace period for a monitor added to the MonMap to join
quorum.
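A minimal Python sketch of the grace-period rule. HealthMonitor itself is C++, so this is only an illustration. The MonInfo fields (added_at, in_quorum), the function mons_to_report_down, and the join_grace parameter are all hypothetical names. An out-of-quorum monitor is reported only after it has been in the MonMap longer than the grace period.

```python
from dataclasses import dataclass


@dataclass
class MonInfo:
    name: str
    added_at: float  # time the monitor was added to the MonMap (hypothetical field)
    in_quorum: bool


def mons_to_report_down(mons, now, join_grace):
    """Names of monitors that should raise MON_DOWN.

    A freshly added monitor gets join_grace seconds to join quorum before it counts as down.
    """
    return [
        m.name
        for m in mons
        if not m.in_quorum and now - m.added_at > join_grace
    ]
```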
Fixes: https://tracker.ceph.com/issues/73934
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit b028a41e1f000b87aab3f263ab3259a0ca439555)
Patrick Donnelly [Thu, 13 Nov 2025 19:51:20 +0000 (14:51 -0500)]
include: detect corrupt frag from byteswap
If a big-endian MDS writes frag_t values into the metadata pool, these
will persist and confuse the MDS after it tries properly parsing them as
little-endian. Fortunately, detecting this situation is fairly easy: the
number of bits is restricted, and the number of bits in turn restricts
the mask that the value must fit within.
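A Python sketch of the sanity check. It assumes the frag_t encoding from ceph_fs.h: the high 8 bits hold the bit count (at most 24), the low 24 bits hold the value, and only the top `bits` bits of that value may be set. frag_is_sane() and byteswap32() are hypothetical names, not the helpers added by this commit.

```python
VALUE_BITS = 24
VALUE_MASK = 0xFFFFFF


def frag_is_sane(enc):
    """Check that a 32-bit encoded frag_t obeys the bits/mask constraints."""
    bits = enc >> VALUE_BITS
    if bits > VALUE_BITS:
        return False  # more split bits than the value field can hold
    # Only the top `bits` bits of the 24-bit value may be set.
    mask = (VALUE_MASK << (VALUE_BITS - bits)) & VALUE_MASK
    return (enc & VALUE_MASK) & ~mask == 0


def byteswap32(x):
    """Reverse the byte order of a 32-bit integer (simulates a big-endian writer)."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")
```

For example, the frag with 1 bit and value 0x800000 encodes as 0x01800000. Byteswapped, it becomes 0x00008001, which claims 0 bits but has value bits set, so it is rejected. Values that are unchanged by a byteswap, such as 0, cannot be detected this way.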
Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 6bf91e4f6e49d99711b8be845eb77c883d662704)
Patrick Donnelly [Wed, 18 Mar 2026 00:22:23 +0000 (20:22 -0400)]
Merge PR #67623 into squid
* refs/pull/67623/head:
mgr/orchestrator: make group parameter optional for nvmeof (squid)
pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id
Nitzan Mordechai [Mon, 17 Nov 2025 11:51:14 +0000 (11:51 +0000)]
qa/tasks/mgr: test_module_selftest set influx hostname to avoid warnings
The self-test hits the MGR_INFLUX_NO_SERVER error because no hostname is
configured. The following command adds a test hostname so that the error
does not appear and fail the test.
Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}
In this scenario all commands that are invoked from the loop body
are going to time out anyway.
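The actual change lives in the shell workunit. What follows is a Python sketch of the same short-circuit: probe each cluster with `ceph --cluster <name> -s`, and stop the whole status() call as soon as one probe fails or times out, because every later command would time out as well. The run parameter is injectable so the logic can be exercised without a cluster. The structure is hypothetical.

```python
import subprocess


def status(clusters, run=subprocess.run, timeout=60):
    """Print status for each cluster; return False as soon as `ceph -s` fails."""
    for cluster in clusters:
        try:
            result = run(
                ["ceph", "--cluster", cluster, "-s"],
                timeout=timeout, capture_output=True, text=True,
            )
        except subprocess.TimeoutExpired:
            result = None
        if result is None or result.returncode != 0:
            # The cluster is effectively stopped (e.g. after a watchdog bark);
            # everything else in the loop body would just time out.
            print(f"{cluster}: 'ceph -s' failed, skipping remaining status checks")
            return False
        print(result.stdout)
        # The real status() goes on to query daemons, pools and images here.
    return True
```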
Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected
In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:
In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).
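In the worst case, 720 iterations × (300 s timeout + 10 s sleep) = 223,200 s, or about 62 hours. Below is a Python sketch of a wait loop that errors out instead. A failing or timed-out `rbd ls` raises an exception rather than being treated as "no images yet". The constants come from the figures above. The function names and structure are hypothetical, and this is not the shell script's code.

```python
import subprocess
import time

MAX_ITERATIONS = 720
COMMAND_TIMEOUT = 5 * 60  # seconds per `rbd ls`
SLEEP_BETWEEN = 10        # seconds between iterations

# Worst-case duration if every command hangs until its timeout.
WORST_CASE_HOURS = MAX_ITERATIONS * (COMMAND_TIMEOUT + SLEEP_BETWEEN) / 3600


def rbd_ls(pool):
    """List images; raises CalledProcessError or TimeoutExpired on failure."""
    out = subprocess.run(
        ["rbd", "ls", pool], check=True, timeout=COMMAND_TIMEOUT,
        capture_output=True, text=True,
    ).stdout
    return out.split()


def wait_for_images(expected, list_images, sleep=time.sleep):
    """Wait for all expected images; any listing failure propagates immediately."""
    for _ in range(MAX_ITERATIONS):
        images = list_images()
        if set(expected) <= set(images):
            return
        sleep(SLEEP_BETWEEN)
    raise TimeoutError(f"images {sorted(expected)} did not appear")
```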
Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet
Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") made it
required for a thrasher to have stop_and_join() method, but the
preceding commit a035b5a22fb8 ("thrashers: standardize stop and join
method names") failed to add it to rbd_mirror_thrash (whether as an
ad-hoc implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception which is something that practically
never happens.
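A simplified sketch of the interface contract involved. The watchdog expects each thrasher to provide stop_and_join(), which signals the thrasher to stop and then waits for it. The real ThrasherGreenlet is gevent-based. This sketch uses threading instead, and the class ThrasherSketch and the function bark are hypothetical.

```python
import threading


class ThrasherSketch(threading.Thread):
    """Toy thrasher exposing the stop()/stop_and_join() contract."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        # Thrash until asked to stop, checking the event between iterations.
        while not self._stop_event.is_set():
            self.iterations += 1
            self._stop_event.wait(0.01)

    def stop(self):
        self._stop_event.set()

    def stop_and_join(self, timeout=None):
        self.stop()
        self.join(timeout)


def bark(thrashers):
    """Watchdog bark: every thrasher must support stop_and_join()."""
    for t in thrashers:
        t.stop_and_join()
```

A thrasher that lacks stop_and_join() would raise AttributeError inside bark(), which is the failure mode the commit describes for rbd_mirror_thrash.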
We saw slow requests because, during the test, 16 concurrent 4 MB writes were running while recovery and backfill were disabled. At the same time, osd.0 was marked out and then back in, causing PG remapping. Because recovery/backfill was disabled, some PGs could not restore their replicas after the remap, leaving them in degraded/remapped states. As a result, a batch of writes remained stuck in the replicated write path, leading to an IO stall and slow ops being reported. The solution is to ignore this warning, since we are testing the progress module, not the OSD write path. We intentionally disable backfill and recovery to prevent the recovery event from finishing quickly; we want to prolong it until the progress event appears.
qa: make test_progress atomically capture OSD marked in/out events
Problem:
Test had a race condition where events could complete and disappear
between checking the event count and fetching the event, causing
test failures.
Solution:
Refactor to atomically capture events during the wait condition check.
Added helper methods _wait_for_osd_marked_out_event() and
_wait_for_osd_marked_in_event() that capture events at the moment
they're detected, eliminating the race window.
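A Python sketch of the "capture while checking" pattern. A single snapshot of events is both searched and returned, so the event cannot vanish between the check and the fetch. wait_until_capture and the event dict shape are hypothetical. The "marked out" message text in the usage example is also an assumption.

```python
import time


def wait_until_capture(fetch_events, predicate, timeout=5.0, interval=0.05):
    """Poll fetch_events() and return the first matching event from the same snapshot.

    The racy alternative checks a count in one call and fetches the event in a
    second call, giving the event time to complete and disappear in between.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for ev in fetch_events():  # one snapshot per check
            if predicate(ev):
                return ev          # captured at the moment it was detected
        time.sleep(interval)
    raise TimeoutError("event not observed")
```

A hypothetical _wait_for_osd_marked_out_event() could then be `wait_until_capture(get_events, lambda e: "marked out" in e["message"])`.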
Kefu Chai [Tue, 3 Mar 2026 04:51:32 +0000 (12:51 +0800)]
mgr/orchestrator: make group parameter optional for nvmeof (squid)
Add default value for group parameter in nvmeof commands to maintain
backward compatibility with existing squid tests and deployments.
Context:
--------
On the main branch, when commit 6bee4e10f7f added the group parameter, the
tests were subsequently updated to provide the group argument explicitly:
Main test: ceph orch apply nvmeof foo default
Expected: nvmeof.foo.default
However, on the squid branch, the existing tests still use the older syntax
without specifying a group:
The previous cherry-pick (e1612d048a1) fixed the service_id construction
logic to handle empty groups correctly, but the group parameter was still
required without a default value, causing "ceph orch apply nvmeof foo" to
fail with EINVAL (missing required argument).
This commit adds the missing default value (group: str = '') to make the
parameter optional, maintaining backward compatibility with existing squid
tests and user scripts that don't specify a group.
With both changes:
1. Cherry-picked e1612d048a1: service_id logic handles empty group
2. This commit: group parameter has default value ''
Result:
"ceph orch apply nvmeof foo" works (creates nvmeof.foo)
"ceph orch apply nvmeof foo mygroup" also works (creates nvmeof.foo.mygroup)
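A minimal Python sketch of the naming behavior that results from combining the two changes. The function nvmeof_service_name is hypothetical, and its first argument is assumed to be the pool name passed to `ceph orch apply nvmeof`. The key point is the default group of '' and the omission of the group suffix when it is empty.

```python
def nvmeof_service_name(pool: str, group: str = '') -> str:
    """Service name for `ceph orch apply nvmeof <pool> [<group>]`.

    With the default empty group, the old squid syntax keeps working.
    """
    service_id = f"{pool}.{group}" if group else pool
    return f"nvmeof.{service_id}"
```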
Test: qa/suites/orch/cephadm/smoke-roleless/2-services/nvmeof.yaml
Fixes job 50373 failure from test run dgalloway-2026-02-13_23:06:25
Please note, this change was not cherry-picked from the main branch, because
main intentionally still requires the CLI group argument for orch
apply/add nvmeof, and its tests were updated accordingly.
On squid, however, the earlier cherry-pick 6bee4e10 introduced the
required group parameter, but squid still has the old test/behavior
(ceph orch apply nvmeof foo expecting nvmeof.foo) and does not contain
the later main commits 3e5e85aadc1 and b377085c302.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.ts
- kept only the import that is relevant
src/pybind/mgr/dashboard/frontend/src/app/shared/api/mgr-module.service.ts
- same as above
Nizamudeen A [Wed, 5 Mar 2025 16:46:03 +0000 (22:16 +0530)]
mgr/dashboard: fix access control permissions for roles
Since Prometheus is used on the dashboard page, we need to make
sure every role has Prometheus read-only access so that the dashboard
page can load the utilization metrics.
I also saw a permission issue with the OSD settings endpoint when it
tries to get the nearfull/full ratio. So instead of failing the entire
page, I am proceeding with a chart that doesn't have those details when
the user doesn't have permission to access the config option.
The Multisite page was not accessible for the rgw-manager or
read-only user because it tries to show the status of the rgw module. This
is now also handled gracefully by showing the alert only when the user has
sufficient permission.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.ts
- kept only the changes relevant to the bug fix and ignored the other changes
like h/w monitoring
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.html
- ignored multisite wizard changes
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/administration/administration.component.html
- kept the current changes since carbon is not present in squid,
so this issue does not occur there
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.html
- kept the current changes for the same reason above
src/pybind/mgr/dashboard/services/access_control.py
- ignored the SMB role manager and kept only what's available in squid
- make service_id better aligned with default/empty group
(https://github.com/ceph/ceph/commit/f6d552d7c777f1160545188dcffa6b685b05ca8a)
- fix service_id in nvmeof daemon add
Ilya Dryomov [Tue, 24 Feb 2026 11:46:35 +0000 (12:46 +0100)]
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest
If two instances of UnlinkPeerRequest race with each other (e.g. due
to rbd-mirror daemon unlinking from a previous mirror snapshot and the
user taking another mirror snapshot at the same time), the snapshot that
UnlinkPeerRequest was created for may be in the process of being removed
(which may mean trashed by SnapshotRemoveRequest::trash_snap()) or fully
removed by the time unlink_peer() grabs the image lock. Because trashed
snapshots weren't handled explicitly, UnlinkPeerRequest could spuriously
fail with EINVAL ("not mirror snapshot" case) instead of the expected
ENOENT ("missing snapshot" case). This in turn could lead to spurious
ImageReplayer failures, with the replayer stopping prematurely.
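A Python sketch of the intended classification. librbd is C++, and SnapInfo and classify_unlink_target are hypothetical. The namespace strings follow librbd's snapshot namespace kinds (user, group, trash, mirror). A snapshot that is missing or already trashed is treated as "missing" (ENOENT). Only a live snapshot from another namespace is "not mirror snapshot" (EINVAL).

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapInfo:
    namespace: str  # "user", "group", "trash" or "mirror"


def classify_unlink_target(snap: Optional[SnapInfo]) -> int:
    """Return 0 or a negative errno for the snapshot UnlinkPeerRequest targets."""
    # Fully removed, or trashed by a concurrent removal: treat as missing.
    if snap is None or snap.namespace == "trash":
        return -errno.ENOENT
    if snap.namespace != "mirror":
        return -errno.EINVAL
    return 0
```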
ImageUpdateWatchers::flush() requests aren't tracked with an
m_in_flight-like mechanism the way ImageUpdateWatchers::send_notify()
requests are, but in both cases callbacks that represent delayed work
that is very likely to (indirectly) reference ImageCtx are involved.
When the image is getting closed, ImageUpdateWatchers::shut_down() is
called before anything that belongs to ImageCtx is destroyed. However,
the shutdown can complete prematurely in the face of a pending flush if
one gets sent shortly before CloseRequest is invoked. The callback for
that flush will then race with CloseRequest and may execute after parts
of or even the entire ImageCtx is destroyed, leading to use-after-free
and various segfaults.
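A simplified Python sketch of in-flight tracking for flushes. Each flush increments a counter, and shut_down() does not complete until that counter drops to zero, so the flush callback always runs before teardown. The class and its method names are hypothetical and are not the librbd implementation. Deferred callbacks are simulated with a pending list.

```python
class UpdateWatchersSketch:
    """Toy model: shutdown waits for in-flight flush callbacks."""

    def __init__(self):
        self.in_flight = 0
        self.shutdown_waiter = None
        self.pending = []  # deferred callbacks standing in for async work

    def flush(self, on_finish):
        self.in_flight += 1  # track flushes the same way notifies are tracked

        def complete():
            on_finish()
            self._finish_op()

        self.pending.append(complete)

    def _finish_op(self):
        self.in_flight -= 1
        if self.in_flight == 0 and self.shutdown_waiter is not None:
            waiter, self.shutdown_waiter = self.shutdown_waiter, None
            waiter()

    def shut_down(self, on_finish):
        # Complete immediately only if nothing is outstanding.
        if self.in_flight == 0:
            on_finish()
        else:
            self.shutdown_waiter = on_finish

    def run_pending(self):
        while self.pending:
            self.pending.pop(0)()
```

With this ordering, a flush sent just before close cannot run after the object has been shut down.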
Krunal Chheda [Tue, 10 Feb 2026 21:01:03 +0000 (16:01 -0500)]
rgw/notification: fix reserved_size drift in 2pc_queue causing ENOSPC errors
The urgent_data.reserved_size field was accumulating incorrect values over time due to a mismatch between what was added during reserve() and what was subtracted during commit()/abort(). This caused the reserved_size to grow unbounded, eventually hitting the queue capacity limit and returning ENOSPC errors even when the queue had plenty of actual space.
Solution:
Add a one-time self-healing step: during reserve(), the reserved size is recalculated and the counter is updated with the correct value.
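A toy Python model of the self-healing idea. The real code is C++ in the RGW 2pc queue. The class, its fields, and the choice to recompute from a per-reservation map are all assumptions made for illustration. On the first reserve(), the cached reserved_size is rebuilt from the live reservations, so a drifted counter no longer produces a false ENOSPC. commit() and abort() subtract exactly what reserve() added.

```python
import errno


class TwoPCQueueSketch:
    """Toy 2-phase-commit queue with a cached reserved_size counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0            # bytes held by committed entries
        self.reservations = {}   # reservation id -> reserved bytes
        self.reserved_size = 0   # cached sum; the value that could drift
        self._next_id = 0
        self._healed = False

    def reserve(self, size):
        if not self._healed:
            # One-time self-healing: rebuild the counter from live reservations.
            self.reserved_size = sum(self.reservations.values())
            self._healed = True
        if self.used + self.reserved_size + size > self.capacity:
            raise OSError(errno.ENOSPC, "queue full")
        rid = self._next_id
        self._next_id += 1
        self.reservations[rid] = size
        self.reserved_size += size
        return rid

    def commit(self, rid, actual_size):
        # Subtract what reserve() added, not actual_size, so the counter stays consistent.
        self.reserved_size -= self.reservations.pop(rid)
        self.used += actual_size

    def abort(self, rid):
        self.reserved_size -= self.reservations.pop(rid)
```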