Patrick Donnelly [Fri, 20 Mar 2026 21:49:53 +0000 (17:49 -0400)]
Merge PR #67102 into main
* refs/pull/67102/head:
qa/workunits/rados/test_envlibrados_for_rocksdb.sh: Add Rocky support
qa/workunits/ceph-helpers-root: Add Rocky support for install packages
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Fri, 20 Mar 2026 21:49:06 +0000 (17:49 -0400)]
Merge PR #66396 into main
* refs/pull/66396/head:
neorados: specify alignments for aligned_storage
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Mark Kogan <mkogan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Fri, 20 Mar 2026 21:44:45 +0000 (17:44 -0400)]
Merge PR #66244 into main
* refs/pull/66244/head:
mgr/Gil.cc: simplify Gil(), ~Gil()
mgr/Gil.cc: do not use PyGILState_Check()
mgr: add mgr_subinterpreter_modules config
python-common/.../service_spec: implement ServiceSpec.__getnewargs__ to allow unpickle to work correctly
mgr: serialize python objects sent between subinterpreters via remote
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Fri, 20 Mar 2026 21:39:45 +0000 (17:39 -0400)]
Merge PR #63859 into main
* refs/pull/63859/head:
qa/workunits/mgr: account for nvmeof module being "always-on"
mgr, qa: clarify module checks in DaemonServer
mgr, qa: add `pending_modules` to asock command
mgr, common, qa, doc: issue health error after max expiration is exceeded
mgr: ensure that all modules have started before advertising active mgr
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Ilya Dryomov [Wed, 11 Mar 2026 11:04:24 +0000 (12:04 +0100)]
librbd/migration/QCOWFormat: avoid use-after-free in execute_request()
Both L2TableCache and QCOWFormat can be destroyed after the completion
for the last L2 cache request is posted, particularly so in unit tests.
The strand destructor doesn't drain the handler queue in any way but
merely ensures that previously posted handlers would get dispatched in
a non-concurrent fashion. As a result, use-after-free can ensue when
execute_request() unnecessarily dispatches itself for the last time.
David Galloway [Mon, 26 Jan 2026 17:05:01 +0000 (12:05 -0500)]
qa: allowlist bpf podman denials on Rocky 10
Rocky Linux 10 logs SELinux AVCs for systemd BPF operations during container startup due to incomplete SELinux policy coverage. These AVCs occur in permissive mode, are reproducible without Ceph, and do not indicate functional failure. Tests should ignore this specific AVC class while continuing to fail on enforced denials.
Signed-off-by: David Galloway <david.galloway@ibm.com>
Normally, when fast devices are passed to the batch command but
no fast allocations can be found, the batch command does
nothing and returns an empty plan. This causes problems,
however, because the empty return makes the failure silent,
which makes it hard to debug in certain scenarios. I propose
changing this to raise an error, and have made changes in osd.py
to better log the errors and handle the exceptions. This
should not affect processes much, and the change in
osd.py ensures the raised errors will not interrupt the returned
output. I've also updated the unit tests to account for the
change.
Ilya Dryomov [Mon, 9 Mar 2026 11:57:28 +0000 (12:57 +0100)]
qa/workunits/rbd: drop racy assert in test_tasks_recovery()
Even though "ceph rbd task list" is executed immediately after
a successful "ceph rbd task add flatten", the operation may complete
in the interim and the task listing may come back empty legitimately.
Given that we are asserting that flatten actually occurs based on
"rbd info" output, there is no real need to try to briefly observe
the flatten task in the task list.
Alex Ainscow [Wed, 18 Mar 2026 14:51:57 +0000 (14:51 +0000)]
src: Move the decision to build the ISA plugin to the top level make file
Previously, the first time you built ceph, common did not see the correct
value of WITH_EC_ISA_PLUGIN. As a consequence, global.yaml got
built with osd_erasure_code_plugins not including isa. This is not great,
given that it is our default plugin.
We considered simply removing this parameter from make entirely, but this
may require more discussion about supporting old hardware.
So the slightly ugly fix is to move this erasure-code-specific declaration
to the top level.
Fixes: https://tracker.ceph.com/issues/75537
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
myoungwon oh [Sat, 7 Mar 2026 11:38:53 +0000 (20:38 +0900)]
crimson/os/seastore: handle duplicate keys in LogNode::remove_entry
Previously, LogNode::remove_entry returned early when a log_key was
found, assuming uniqueness. However, duplicate keys can exist in the
node if an older entry was previously removed.
This commit also adds a unit test to verify this scenario.
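A minimal sketch of the difference, assuming a node can be modelled as a plain vector of key/value entries. LogEntry, remove_entry_first_match and remove_entry_all_matches are hypothetical names for illustration, not the seastore code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the entries of a LogNode; the real
// crimson/os/seastore types are more involved.
struct LogEntry {
  std::string key;
  std::string value;
};

// Old behaviour: return as soon as the first matching key is found,
// assuming keys are unique within the node.
std::size_t remove_entry_first_match(std::vector<LogEntry>& node,
                                     const std::string& key) {
  auto it = std::find_if(node.begin(), node.end(),
                         [&](const LogEntry& e) { return e.key == key; });
  if (it == node.end()) {
    return 0;
  }
  node.erase(it);
  return 1;
}

// Fixed behaviour: keep scanning and remove every entry with this key,
// because an older duplicate may still be present in the node.
std::size_t remove_entry_all_matches(std::vector<LogEntry>& node,
                                     const std::string& key) {
  auto new_end = std::remove_if(node.begin(), node.end(),
                                [&](const LogEntry& e) { return e.key == key; });
  auto removed = static_cast<std::size_t>(std::distance(new_end, node.end()));
  node.erase(new_end, node.end());
  return removed;
}
```

With entries {k, x, k}, the first-match variant leaves the second "k" behind, while the all-matches variant leaves only "x".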
myoungwon oh [Tue, 3 Mar 2026 15:42:51 +0000 (00:42 +0900)]
crimson/os/seastore: reload head if modified
This commit also fixes the test case to verify that
the head is correctly allocated and updated
during omap_set_keys operations involving multiple keys.
myoungwon oh [Sat, 28 Feb 2026 04:38:16 +0000 (13:38 +0900)]
crimson/os/seastore, osd/PGLog: handle omap_iterate retry to avoid duplicate entries
Seastore omap_iterate may retry internally on conflicts, which can
cause PGLog to process the same entries multiple times when entries
are handled directly in the iteration callback.
Introduce a conflict hook in omap_iterate so callers can reset
iteration state on retry. PGLog now buffers entries during iteration and
applies process_entry() only after a successful pass, clearing the buffer
on retry to avoid duplicates.
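A simplified sketch of the buffer-and-reset pattern, assuming retries can be modelled as whole passes that are invalidated after they finish. omap_iterate_sketch, load_direct and load_buffered are hypothetical; the real omap_iterate signature and conflict hook differ.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

// Hypothetical iterator modelling an omap_iterate that may retry internally.
// The first `conflicts` passes are treated as invalidated after they finish;
// before each retry, on_conflict() lets the caller reset its state.
void omap_iterate_sketch(const std::map<std::string, std::string>& store,
                         int conflicts,
                         const std::function<void(const KV&)>& on_entry,
                         const std::function<void()>& on_conflict) {
  for (int attempt = 0;; ++attempt) {
    for (const auto& kv : store) {
      on_entry(kv);
    }
    if (attempt >= conflicts) {
      return;  // this pass succeeded
    }
    on_conflict();  // pass invalidated; iterate again from the start
  }
}

// Handling entries directly in the callback: every retry re-adds them.
std::vector<KV> load_direct(const std::map<std::string, std::string>& store,
                            int conflicts) {
  std::vector<KV> processed;
  omap_iterate_sketch(store, conflicts,
                      [&](const KV& kv) { processed.push_back(kv); },
                      [] {});
  return processed;
}

// Buffer during iteration and clear the buffer on conflict; the caller would
// run process_entry() over the result only after the successful pass.
std::vector<KV> load_buffered(const std::map<std::string, std::string>& store,
                              int conflicts) {
  std::vector<KV> buffered;
  omap_iterate_sketch(store, conflicts,
                      [&](const KV& kv) { buffered.push_back(kv); },
                      [&] { buffered.clear(); });
  return buffered;
}
```

With three stored entries and two simulated conflicts, load_direct yields nine entries while load_buffered yields three.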
myoungwon oh [Fri, 27 Feb 2026 08:01:59 +0000 (17:01 +0900)]
crimson/os/seastore: ensure data integrity with deep copy in omap_get_value
Previously, omap_get_value could return a bufferlist pointing to
memory without guaranteed lifetime. This patch introduces LogNode::copy_t
to distinguish between DEEP and SHALLOW copies.
- Default get_value to DEEP copy for external safety.
- Use SHALLOW copy in internal paths (e.g., remove_kv) to maintain performance.
- Refactor LogManager::omap_get_value to simplify coroutine flow.
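A rough illustration of the DEEP/SHALLOW distinction. Only the enumerator names come from the message above; Value and get_value here are hypothetical and use std::string_view and std::shared_ptr in place of Ceph's bufferlist.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical mirror of LogNode::copy_t.
enum class copy_t { DEEP, SHALLOW };

struct Value {
  std::shared_ptr<const std::string> owned;  // non-null only for DEEP copies
  std::string_view view;                      // the value bytes
};

// DEEP: copy the bytes so the result stays valid independently of the node.
// SHALLOW: return a view into node memory; only safe while the node is alive
// and unmodified, e.g. in short-lived internal paths.
Value get_value(const std::string& node_buf, std::size_t off, std::size_t len,
                copy_t how = copy_t::DEEP) {
  std::string_view src = std::string_view(node_buf).substr(off, len);
  if (how == copy_t::SHALLOW) {
    return Value{nullptr, src};
  }
  auto copy = std::make_shared<const std::string>(src);
  return Value{copy, std::string_view(*copy)};
}
```

Defaulting to DEEP trades an extra copy for safety at external call sites; internal callers that know the node outlives the result can opt into SHALLOW.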
myoungwon oh [Fri, 13 Feb 2026 02:06:02 +0000 (11:06 +0900)]
crimson/os/seastore: support for large kv pair in LogNode
Each log_key_t contains a chunk_idx field to manage values
that span multiple LogNodes when the value size exceeds the
maximum capacity of a single LogNode.
See detailed description in log_manager.h.
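A sketch of the chunking idea, assuming a value is split into fixed-size pieces that share a key and differ only in chunk_idx. ChunkKey, split_value and join_chunks are hypothetical; the real log_key_t layout is described in log_manager.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified key; the real log_key_t carries more fields.
struct ChunkKey {
  std::string key;
  std::uint32_t chunk_idx;
};

struct Chunk {
  ChunkKey k;
  std::string data;
};

// Split `value` into pieces of at most `max_chunk` bytes, numbering them
// 0, 1, 2, ... so they can be stored in consecutive LogNodes.
std::vector<Chunk> split_value(const std::string& key, const std::string& value,
                               std::size_t max_chunk) {
  std::vector<Chunk> out;
  if (max_chunk == 0) {
    return out;  // invalid capacity; avoid an infinite loop
  }
  std::uint32_t idx = 0;
  std::size_t off = 0;
  do {
    std::size_t n = std::min(max_chunk, value.size() - off);
    out.push_back({{key, idx++}, value.substr(off, n)});
    off += n;
  } while (off < value.size());
  return out;
}

// Reassemble the value; assumes chunks are ordered by chunk_idx.
std::string join_chunks(const std::vector<Chunk>& chunks) {
  std::string v;
  for (const auto& c : chunks) {
    v += c.data;
  }
  return v;
}
```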
myoungwon oh [Mon, 19 Jan 2026 17:14:24 +0000 (02:14 +0900)]
crimson/os/seastore: optimize handling of batched requests
During 4KB random write workloads, SeaStore receives
batched dup_* entries in both omap_set_keys.
This change enables efficient batch processing of these
requests to reduce overhead.
myoungwon oh [Sat, 30 Aug 2025 12:18:12 +0000 (21:18 +0900)]
crimson/os/seastore: introduce omap_rm_keys interface in omap_manager
Deletion of pg_log_entry_t entries is performed by omap_rm_keys using a set.
For example, omap_rm_keys might be called with a set containing
pg_log_entry_t entries ranging from 0011.0001 to 0011.0010.
In this case, calling omap_rm_key individually for each entry is inefficient,
because each call triggers a traversal of the entire list.
To avoid this, omap_rm_keys with a set is introduced in omap_manager
to handle removal requests more efficiently.
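A sketch of why a set-based call helps, modelling the node as a vector and counting inspected entries. rm_key, rm_keys_one_by_one and rm_keys_batched are hypothetical names, not the omap_manager interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Entry {
  std::string key;
  std::string value;
};

// One key per call: each call scans the list from the head.
// `inspected` accumulates how many entries were examined.
bool rm_key(std::vector<Entry>& list, const std::string& key,
            std::size_t& inspected) {
  for (auto it = list.begin(); it != list.end(); ++it) {
    ++inspected;
    if (it->key == key) {
      list.erase(it);
      return true;
    }
  }
  return false;
}

std::size_t rm_keys_one_by_one(std::vector<Entry>& list,
                               const std::set<std::string>& keys,
                               std::size_t& inspected) {
  std::size_t removed = 0;
  for (const auto& k : keys) {
    removed += rm_key(list, k, inspected) ? 1 : 0;
  }
  return removed;
}

// Whole set at once: a single pass, with a set lookup per entry.
std::size_t rm_keys_batched(std::vector<Entry>& list,
                            const std::set<std::string>& keys,
                            std::size_t& inspected) {
  inspected += list.size();
  auto new_end = std::remove_if(list.begin(), list.end(), [&](const Entry& e) {
    return keys.count(e.key) > 0;
  });
  auto removed = static_cast<std::size_t>(list.end() - new_end);
  list.erase(new_end, list.end());
  return removed;
}
```

Removing the last three of ten entries inspects 24 entries one key at a time versus 10 in a single pass; the gap grows with the size of the set.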
myoungwon oh [Fri, 13 Feb 2026 05:04:14 +0000 (14:04 +0900)]
crimson/os/seastore: remove duplicate keys for non-log entries
When writing a non-log key, remove any existing duplicate key
before inserting the new KV pair. With this change, full list
traversal is no longer required during remove_kv.
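A sketch of the invariant, assuming a vector-backed node: the write path removes any existing copy of a non-log key, so removal can stop at the first match. set_non_log_kv and remove_kv here are hypothetical simplifications.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Entry {
  std::string key;
  std::string value;
};

// Hypothetical write path for a non-log key: drop any existing entry with the
// same key before appending, so the node never holds two copies of it.
void set_non_log_kv(std::vector<Entry>& node, const std::string& key,
                    const std::string& value) {
  node.erase(std::remove_if(node.begin(), node.end(),
                            [&](const Entry& e) { return e.key == key; }),
             node.end());
  node.push_back({key, value});
}

// With that invariant, removal can stop at the first match instead of
// traversing the rest of the list looking for further duplicates.
bool remove_kv(std::vector<Entry>& node, const std::string& key) {
  auto it = std::find_if(node.begin(), node.end(),
                         [&](const Entry& e) { return e.key == key; });
  if (it == node.end()) {
    return false;
  }
  node.erase(it);
  return true;
}
```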
myoungwon oh [Thu, 1 Jan 2026 09:23:47 +0000 (18:23 +0900)]
crimson/os/seastore: make _fastinfo overwritable to minimize space overhead
This commit forces _fastinfo to be stored at the last position of a LogNode.
By doing so, _fastinfo can be overwritten by the next pg_log_entry.
Since _fastinfo has a fixed key with varying contents and is included in
every write transaction, placing it at the tail enables efficient overwrites.
As a result, this change reduces LogNode allocation and deallocation,
thereby lowering space overhead. Moreover, garbage collection for obsolete
key-value pairs is unnecessary due to overwrite semantics.
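A sketch of the tail-overwrite idea, assuming each write transaction appends one log entry plus a fresh _fastinfo. write_txn and the vector-backed node are hypothetical; only the _fastinfo key name comes from the message above.

```cpp
#include <string>
#include <vector>

struct Entry {
  std::string key;
  std::string value;
};

const std::string kFastInfoKey = "_fastinfo";

// Hypothetical write path: _fastinfo is always the node's last entry, so the
// next write reuses (overwrites) its slot instead of leaving a stale copy
// behind that would later need garbage collection.
void write_txn(std::vector<Entry>& node, const Entry& log_entry,
               const std::string& fastinfo) {
  if (!node.empty() && node.back().key == kFastInfoKey) {
    node.back() = log_entry;  // overwrite the previous _fastinfo in place
  } else {
    node.push_back(log_entry);
  }
  node.push_back({kFastInfoKey, fastinfo});  // _fastinfo goes last again
}
```

After three writes, the node holds three log entries and a single, current _fastinfo (four entries) instead of six.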