Patrick Donnelly [Tue, 10 Feb 2026 17:12:25 +0000 (12:12 -0500)]
Merge PR #66244 into wip-pdonnell-testing-20260210.171210
* refs/pull/66244/head:
mgr/Gil.cc: simplify Gil(), ~Gil()
mgr/Gil.cc: do not use PyGILState_Check()
mgr: add mgr_subinterpreter_modules config
python-common/.../service_spec: implement ServiceSpec.__getnewargs__ to allow unpickle to work correctly
mgr: serialize python objects sent between subinterpreters via remote
Patrick Donnelly [Tue, 10 Feb 2026 17:12:24 +0000 (12:12 -0500)]
Merge PR #66294 into wip-pdonnell-testing-20260210.171210
* refs/pull/66294/head:
qa: update fs:upgrade to N-2 release for U
qa: update fs suite to rocky10
qa: skip dashboard install due to dependency noise
qa/suites/fs: use rocky-10 with cephadm
qa: use nft instead iptables
qa: use py3 builtin ipaddress module
Patrick Donnelly [Tue, 10 Feb 2026 17:12:19 +0000 (12:12 -0500)]
Merge PR #67102 into wip-pdonnell-testing-20260210.171210
* refs/pull/67102/head:
qa/workunits/rados/test_envlibrados_for_rocksdb.sh: Add Rocky support
qa/workunits/ceph-helpers-root: Add Rocky support for install packages
Patrick Donnelly [Tue, 10 Feb 2026 17:12:17 +0000 (12:12 -0500)]
Merge PR #67124 into wip-pdonnell-testing-20260210.171210
* refs/pull/67124/head:
mds: indicate whether SnapRealm is a subvolume in dump
mds: dump SnapRealm for src/dest in link operations
mds: abbreviate snaprealm in CInode dump
* refs/pull/67251/head:
qa: set column for insertion
qa: bail sqlite3 on any error
qa: use actual sqlite3 blob instead of string
test: use json_extract instead of awkward json_tree
We observed that in Seastore, deleting a large batch (default osd_target_transaction_size=30)
can take a significant amount of time.
Because this happens inside the peering_pp.process stage, it blocks the PG's peering pipeline.
During this block, any incoming OSDMap updates (PGAdvanceMap) are stalled behind the deletion work.
This eventually causes an OSD-wide map progression hang because
the OSD cannot advance past an epoch until all PGs have processed it.
To fix this, we are reducing osd_target_transaction_size to 5 to lower
conflict rates and allow deletion transactions to complete.
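As a rough illustration of how the batch size changes the amount of work done per transaction, the following Python sketch splits a list of pending deletions into batches of at most a given size. This is a toy model, not Seastore or OSD code: the function name `split_into_transactions` and the object names are hypothetical, and only the values 30 and 5 come from the message above.

```python
from typing import Iterator, List


def split_into_transactions(objects: List[str], target_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most target_size objects (hypothetical helper)."""
    if target_size < 1:
        raise ValueError("target_size must be positive")
    for start in range(0, len(objects), target_size):
        yield objects[start:start + target_size]


# 60 made-up object names standing in for a PG's pending deletions.
pending = [f"obj_{i}" for i in range(60)]

for size in (30, 5):
    batches = list(split_into_transactions(pending, size))
    largest = max(len(b) for b in batches)
    print(f"size={size}: {len(batches)} transactions, largest has {largest} objects")
```

With the smaller size, each transaction touches fewer objects, so each individual step is shorter, at the cost of more transactions overall.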
2026-02-08T13:02:24.439 INFO:tasks.workunit.client.0.trial031.stderr:Parse error near line 2: no such column: "start" - should this be a string literal in single-quotes?
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
test: use json_extract instead of awkward json_tree
Ideally this should port better across sqlite3 versions. The sqlite3
on rocky10 failed because it started requiring components of the keys
to be quoted:
sqlite> select * from p as a, p as b where a.i=1 and b.i = 2 and a.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount' and b.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount';
i key value type atom id parent fullkey path i key value type atom id parent fullkey
- -------- ----- ------- ---- --- ------ ----------------------------------------- -------------------------------- - -------- ----- ------- ---- --- ------ ------------------
1 avgcount 4 integer 4 581 570 $."libcephsqlite_vfs"."opf_sync".avgcount $."libcephsqlite_vfs"."opf_sync" 2 avgcount 5 integer 5 581 570 $."libcephsqlite_v
Fixes: https://tracker.ceph.com/issues/74755
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
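The approach described above, reading a nested value directly by path instead of matching on the `fullkey` column that `json_tree` produces, can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch and not the actual test script: the helper name `avgcount` and the sample JSON document are made up for illustration, and it assumes the linked SQLite library has the JSON functions available.

```python
import json
import sqlite3


def avgcount(perf_json: str) -> int:
    """Return libcephsqlite_vfs.opf_sync.avgcount from a JSON document (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        # json_extract takes the path directly, so the result does not depend
        # on how json_tree chooses to quote the components of fullkey.
        row = conn.execute(
            "SELECT json_extract(?, '$.libcephsqlite_vfs.opf_sync.avgcount')",
            (perf_json,),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


# Made-up sample document shaped like the keys shown in the output above.
sample = json.dumps({"libcephsqlite_vfs": {"opf_sync": {"avgcount": 4}}})
print(avgcount(sample))
```

Because no string comparison against `fullkey` is involved, a change in `fullkey` quoting between sqlite3 versions should not affect this query.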
Patrick Donnelly [Wed, 19 Nov 2025 17:25:45 +0000 (12:25 -0500)]
qa: skip dashboard install due to dependency noise
2025-11-18T19:46:46.226 INFO:teuthology.orchestra.run.smithi008.stdout:/usr/bin/ceph: stderr Error ENOTSUP: Module 'alerts' is not enabled/loaded (required by command 'dashboard set-ssl-certificate'): use `ceph mgr module enable alerts` to enable it
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Kefu Chai [Sun, 8 Feb 2026 12:34:15 +0000 (20:34 +0800)]
doc/_ext: fix ceph_commands.py for new decorator-based command system
After commit 4aa9e246f, mgr modules migrated from using a class-level
COMMANDS list to decorator-based command registration using per-module
CLICommand instances (e.g., @BalancerCLICommand.Read('balancer status')).
This broke the ceph_commands.py Sphinx extension which was hardcoded to
expect m.COMMANDS to be a list, causing documentation builds to fail.
However, not all modules use this per-module CLICommand yet. Some modules are
fully migrated (balancer, hello, etc.) and use decorators, while others
are partially migrated (volumes, progress, stats, influx, k8sevents,
osd_perf_query, osd_support): they have CLICommand defined but still
use the old COMMANDS list.
This fix updates _collect_module_commands() to handle three scenarios:
1. Fully migrated modules: Check CLICommand.dump_cmd_list() and use it
if it returns commands
2. Partially migrated modules: Fall back to the old COMMANDS list if
dump_cmd_list() returns empty
3. Legacy modules: Use COMMANDS list if CLICommand doesn't exist
This ensures the Sphinx extension works with modules in any migration
state, maintaining backwards compatibility while supporting the new
decorator pattern.
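The three scenarios above can be sketched as a single fallback chain. This is an illustrative sketch, not the code in ceph_commands.py: the function name `collect_module_commands`, the stand-in classes, and the placeholder command entries are hypothetical; only the `CLICommand` attribute, `dump_cmd_list()`, and the `COMMANDS` list are taken from the message.

```python
from typing import Any, Dict, List


def collect_module_commands(module: Any) -> List[Dict[str, Any]]:
    """Return a module's commands regardless of its migration state (sketch)."""
    cli = getattr(module, "CLICommand", None)
    if cli is not None:
        commands = cli.dump_cmd_list()
        if commands:
            # 1. Fully migrated: the decorator registry has commands.
            return list(commands)
    # 2. Partially migrated (registry empty) or 3. legacy (no CLICommand):
    # fall back to the class-level COMMANDS list, if there is one.
    return list(getattr(module, "COMMANDS", []))


class FakeCLICommand:
    """Stand-in for a per-module CLICommand; not the real mgr class."""

    def __init__(self, commands: List[Dict[str, Any]]) -> None:
        self._commands = commands

    def dump_cmd_list(self) -> List[Dict[str, Any]]:
        return list(self._commands)


class FullyMigrated:
    CLICommand = FakeCLICommand([{"cmd": "balancer status"}])


class PartiallyMigrated:
    CLICommand = FakeCLICommand([])
    COMMANDS = [{"cmd": "placeholder partial command"}]


class Legacy:
    COMMANDS = [{"cmd": "placeholder legacy command"}]


for mod in (FullyMigrated, PartiallyMigrated, Legacy):
    print(mod.__name__, collect_module_commands(mod))
```

Checking the decorator registry first and treating an empty result as "not migrated yet" keeps one code path for all three cases.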
David Galloway [Mon, 26 Jan 2026 17:05:01 +0000 (12:05 -0500)]
qa: allowlist bpf podman denials on Rocky 10
Rocky Linux 10 logs SELinux AVCs for systemd BPF operations during container startup due to incomplete SELinux policy coverage. These AVCs occur in permissive mode, are reproducible without Ceph, and do not indicate functional failure. Tests should ignore this specific AVC class while continuing to fail on enforced denials.
Signed-off-by: David Galloway <david.galloway@ibm.com>
Laura Flores [Thu, 29 Jan 2026 22:09:25 +0000 (16:09 -0600)]
qa/suites/upgrade: update upgrade paths
Reef is EOL, so it should be removed from the upgrade
paths. Upgrade paths only go as far back as two releases,
and since the last two releases are Tentacle and Squid,
Reef is outside of this "N-2" schema.
Since Squid and Tentacle are the latest stable releases
if we look back from "main", then we need to make sure to
add Tentacle to the upgrade paths.
This modification is prompted by the fact that we can't
test upgrade paths from Reef to main on Rocky10 packages.
But this change should happen regardless.
Fixes: https://tracker.ceph.com/issues/74609
Signed-off-by: Laura Flores <lflores@ibm.com>
Standard library containers (like std::map) correctly call delete, but
Valgrind falsely interprets this as a call to delete[] because GCC 14
folds the identical aligned delete operators into a single symbol. This
causes Valgrind to flag a mismatch against the non-array allocation.
Kefu Chai [Fri, 9 Jan 2026 23:53:29 +0000 (07:53 +0800)]
rgw: fix memory leak in RGWHTTPManager thread cleanup
Fix memory leak detected by AddressSanitizer in unittest_http_manager.
The test was failing with ASan enabled due to rgw_http_req_data objects
not being properly cleaned up when the HTTP manager thread exits.
ASan reported the following leaks:
Direct leak of 17152 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:33
#2 HTTPManager_SignalThread_Test::TestBody()
/ceph/src/test/rgw/test_http_manager.cc:132:10
Indirect leak of 768 byte(s) in 32 object(s) allocated from:
#0 operator new(unsigned long)
#1 rgw_http_req_data::rgw_http_req_data()
/ceph/src/rgw/rgw_http_client.cc:52:22
#2 RGWHTTPManager::add_request(RGWHTTPClient*)
/ceph/src/rgw/rgw_http_client.cc:946:37
SUMMARY: AddressSanitizer: 17920 byte(s) leaked in 64 allocation(s).
Root cause: The rgw_http_req_data class uses reference counting
(inherits from RefCountedObject). When a request is unregistered,
unregister_request() calls get() to increment the refcount, expecting
a corresponding put() to be called later.
In manage_pending_requests(), unregistered requests are properly
handled with both _unlink_request() and put(). However, in the thread
cleanup code (reqs_thread_entry exit path), only _unlink_request() was
called without the matching put(), causing a reference count leak.
The fix adds the missing put() call in the thread cleanup code to match
the reference counting pattern used in manage_pending_requests().
Test results:
- Before: 17,920 bytes leaked in 64 allocations
- After: 0 leaks, unittest_http_manager passes with ASan
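The get()/put() imbalance described in the root cause can be modelled with a small toy in Python. This is a sketch under stated assumptions, not the RGW code: the `RefCounted` class and the helper functions are hypothetical, and the model assumes each request starts with one reference held by the manager that the unlink step drops. The count of 32 requests mirrors the 32 objects in the ASan report.

```python
from typing import List


class RefCounted:
    """Toy reference-counted object; freed when the count reaches zero."""

    def __init__(self) -> None:
        self.nref = 1  # assumption: one reference held by the manager
        self.freed = False

    def get(self) -> "RefCounted":
        self.nref += 1
        return self

    def put(self) -> None:
        assert self.nref > 0, "put() without a matching reference"
        self.nref -= 1
        if self.nref == 0:
            self.freed = True


def unregister_request(req: RefCounted, unregistered: List[RefCounted]) -> None:
    req.get()  # extra reference taken while the request waits to be unlinked
    unregistered.append(req)


def unlink_request(req: RefCounted) -> None:
    req.put()  # assumption: unlinking drops the manager's own reference


def drain(unregistered: List[RefCounted], release: bool) -> None:
    """Process unregistered requests; release=False models the buggy exit path."""
    for req in unregistered:
        unlink_request(req)
        if release:
            req.put()  # matches the get() in unregister_request()
    unregistered.clear()


def leaked_after_cleanup(count: int, release: bool) -> int:
    reqs = [RefCounted() for _ in range(count)]
    unregistered: List[RefCounted] = []
    for req in reqs:
        unregister_request(req, unregistered)
    drain(unregistered, release)
    return sum(1 for req in reqs if not req.freed)


print("without put():", leaked_after_cleanup(32, release=False))
print("with put():   ", leaked_after_cleanup(32, release=True))
```

In this model every request drained without the final put() is left with one outstanding reference and is never freed, which is the shape of the leak the fix addresses.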