Ilya Dryomov [Wed, 11 Mar 2026 11:04:24 +0000 (12:04 +0100)]
librbd/migration/QCOWFormat: avoid use-after-free in execute_request()
Both L2TableCache and QCOWFormat can be destroyed after the completion
for the last L2 cache request is posted, particularly so in unit tests.
The strand destructor doesn't drain the handler queue in any way but
merely ensures that previously posted handlers would get dispatched in
a non-concurrent fashion. As a result, use-after-free can ensue when
execute_request() unnecessarily dispatches itself for the last time.
ceph-volume: fix inventory without /dev/vg/lv (slashed paths)
This makes ceph-volume use UdevData.preferred_block_path() in
get_devices() so it keeps /dev/vg/lv (slashed path form) when
it exists, else /dev/mapper/<name> (dashed path form).
This is needed for thin-pool LVs and environments where udev
does not create slashed paths.
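The path preference described above can be sketched roughly as follows. This is an illustration only, not the ceph-volume code: pick_lv_path() and dm_escape() are hypothetical helpers, and the sketch assumes the usual device-mapper convention of doubling hyphens inside VG and LV names.

```python
import os


def dm_escape(name):
    # Assumption: device-mapper names double any hyphen that is part of
    # a VG or LV name, so "lv-data" becomes "lv--data".
    return name.replace("-", "--")


def pick_lv_path(vg, lv, exists=os.path.exists):
    """Hypothetical helper: prefer /dev/<vg>/<lv>, else /dev/mapper/<name>."""
    slashed = f"/dev/{vg}/{lv}"
    if exists(slashed):
        return slashed
    # Fall back to the dashed form when udev did not create the slashed path.
    return f"/dev/mapper/{dm_escape(vg)}-{dm_escape(lv)}"
```

The exists parameter is injectable only so the fallback can be exercised without real devices.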
David Galloway [Mon, 26 Jan 2026 17:05:01 +0000 (12:05 -0500)]
qa: allowlist bpf podman denials on Rocky 10
Rocky Linux 10 logs SELinux AVCs for systemd BPF operations during container startup due to incomplete SELinux policy coverage. These AVCs occur in permissive mode, are reproducible without Ceph, and do not indicate functional failure. Tests should ignore this specific AVC class while continuing to fail on enforced denials.
Signed-off-by: David Galloway <david.galloway@ibm.com>
Dan Mick [Mon, 16 Mar 2026 20:13:30 +0000 (13:13 -0700)]
container/make-manifest-list.py: add version support
Add mandatory -v/--version to select version to examine (to allow
multiple prerelease tags to exist). Reorder arguments so that
usage help in the 'missing version' case shows the long option names.
Requires change to ceph-release-containers job as well to pass
the --version argument.
This commit is part of a PR that includes an update to the "promote"
invocation of make-manifest-list.py, which is done manually and must
also contain the --version argument.
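As a rough illustration of why argument order matters for the usage text, here is a minimal argparse sketch. The prog name and help string are assumptions and this is not the script's actual parser; the point is only that argparse uses the first option string listed when it builds the usage line.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the --version/-v option mirrors the commit text.
    parser = argparse.ArgumentParser(prog="make-manifest-list-sketch")
    # Listing the long name first makes the usage line show "--version VERSION".
    parser.add_argument(
        "--version", "-v",
        required=True,
        help="version to examine (assumed help text)",
    )
    return parser
```

Calling build_parser().parse_args([]) exits with an error naming the required option, while passing -v or --version with a value stores it in args.version.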
Normally when fast devices are passed to batch command but
no fast allocations could be found the batch command will
do nothing and return an empty plan. This causes problems,
however, because the empty return silently hides the failure,
which makes it hard to debug in certain scenarios. I propose
raising an error instead, and have made changes in osd.py
to better log the errors and process the exceptions. This
shouldn't affect existing processes much, and the change in
osd.py ensures the raised errors will not interrupt the return
output. I've also changed the unit tests to account for the
change.
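The behaviour change could look something like the sketch below. All names here (NoFastAllocationError, plan_batch, run_batch) are hypothetical and the planning logic is deliberately trivial; it only illustrates raising instead of returning an empty plan, with the caller logging the error and still producing its return output.

```python
import logging

logger = logging.getLogger("batch_sketch")


class NoFastAllocationError(RuntimeError):
    """Hypothetical error: fast devices were passed but none can be allocated."""


def plan_batch(data_devs, fast_devs, fast_allocations):
    # Previously (per the commit text) this case returned an empty plan.
    if fast_devs and not fast_allocations:
        raise NoFastAllocationError(
            f"no fast allocations available on {fast_devs}"
        )
    if not fast_devs:
        return [(dev, None) for dev in data_devs]
    return list(zip(data_devs, fast_allocations))


def run_batch(data_devs, fast_devs, fast_allocations):
    # Caller-side handling: log the failure loudly, keep returning a plan list.
    try:
        return plan_batch(data_devs, fast_devs, fast_allocations)
    except NoFastAllocationError as exc:
        logger.error("batch planning failed: %s", exc)
        return []
```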
Ilya Dryomov [Mon, 9 Mar 2026 11:57:28 +0000 (12:57 +0100)]
qa/workunits/rbd: drop racy assert in test_tasks_recovery()
Even though "ceph rbd task list" is executed immediately after
a successful "ceph rbd task add flatten", the operation may complete
in the interim and the task listing may come back empty legitimately.
Given that we are asserting that flatten actually occurs based on
"rbd info" output, there is no real need to try to briefly observe
the flatten task in the task list.
Alex Ainscow [Wed, 18 Mar 2026 14:51:57 +0000 (14:51 +0000)]
src: Move the decision to build the ISA plugin to the top level make file
Previously, the first time you build ceph, common did not see the correct
value of WITH_EC_ISA_PLUGIN. The consequence is that the global.yaml gets
built with osd_erasure_code_plugins not including isa. This is not great
given it's our default plugin.
We considered simply removing this parameter from make entirely, but this
may require more discussion about supporting old hardware.
So the slightly ugly fix is to move this erasure-code specific declaration
to the top-level.
Fixes: https://tracker.ceph.com/issues/75537
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
CompleteMultipartUpload depends on this lock to ensure consistency of
uploads and protect against data loss, so we should try very hard to
hold this lock as long as it takes to complete successfully
MPRadosSerializer accomplishes this by spawning a background lock
renewal coroutine. this coroutine is started during a successful call to
try_lock(), and stopped before unlock() releases the lock
this duration ultimately gets passed down to cls_lock's set_duration()
function, which has overloads for both utime_t and ceph::timespan.
prefer ceph::timespan because it also works with boost asio timers
Casey Bodley [Thu, 12 Mar 2026 14:39:02 +0000 (10:39 -0400)]
rgw: check for broken lock before multipart complete
if lock renewal fails, is_locked() will return false. check that just
before upload->complete() goes on to write/overwrite the head object,
and return the same ERR_INTERNAL_ERROR from lock contention
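A minimal sketch of the check described above, written in Python for brevity rather than the C++ of rgw. SerializerSketch and complete_multipart are hypothetical stand-ins, and the string return values only mimic the error code named in the commit message.

```python
class SerializerSketch:
    """Hypothetical stand-in for a serializer with a renewable lock."""

    def __init__(self):
        self._locked = False

    def try_lock(self):
        self._locked = True
        return True

    def renewal_failed(self):
        # Models the background renewal losing the lock.
        self._locked = False

    def is_locked(self):
        return self._locked


def complete_multipart(serializer, write_head):
    # Re-check the lock immediately before touching the head object.
    if not serializer.is_locked():
        return "ERR_INTERNAL_ERROR"
    write_head()
    return "OK"
```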
myoungwon oh [Sat, 7 Mar 2026 11:38:53 +0000 (20:38 +0900)]
crimson/os/seastore: handle duplicate keys in LogNode::remove_entry
Previously, LogNode::remove_entry returned early when a log_key was
found, assuming uniqueness. However, duplicate keys can exist in the
node if an older entry was previously removed.
This commit also adds a unit test to verify this scenario.
myoungwon oh [Tue, 3 Mar 2026 15:42:51 +0000 (00:42 +0900)]
crimson/os/seastore: reload head if modified
This commit also fixes the test case to verify that
the head is correctly allocated and updated
during omap_set_keys operations involving multiple keys.
myoungwon oh [Sat, 28 Feb 2026 04:38:16 +0000 (13:38 +0900)]
crimson/os/seastore, osd/PGLog: handle omap_iterate retry to avoid duplicate entries
Seastore omap_iterate may retry internally on conflicts, which can
cause PGLog to process the same entries multiple times when entries
are handled directly in the iteration callback.
Introduce a conflict hook in omap_iterate so callers can reset
iteration state on retry. PGLog now buffers entries during iteration and
applies process_entry() only after a successful pass, clearing the buffer
on retry to avoid duplicates.
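The buffer-then-apply pattern can be sketched in a few lines. This is a Python simulation, not the crimson code: omap_iterate() here is a toy that injects one conflict partway through, and read_log() and process_entry() are hypothetical names.

```python
def omap_iterate(entries, on_entry, on_conflict, conflicts=1):
    """Toy iterator: restarts from the beginning after each injected conflict."""
    for attempt in range(conflicts + 1):
        for i, entry in enumerate(entries):
            if attempt < conflicts and i == 1:
                on_conflict()  # tell the caller that a retry is about to happen
                break
            on_entry(entry)
        else:
            return  # a full pass completed without conflict


def process_entry(entry):
    return entry.upper()


def read_log(entries):
    buffered = []
    # Buffer during iteration; drop everything buffered when a retry occurs.
    omap_iterate(entries, buffered.append, buffered.clear)
    # Apply process_entry() only after a successful pass.
    return [process_entry(e) for e in buffered]
```

Processing entries directly in the callback, without the conflict hook, would see the first entry twice in this simulation.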
myoungwon oh [Fri, 27 Feb 2026 08:01:59 +0000 (17:01 +0900)]
crimson/os/seastore: ensure data integrity with deep copy in omap_get_value
Previously, omap_get_value could return a bufferlist pointing to
memory without guaranteed lifetime. This patch introduces LogNode::copy_t
to distinguish between DEEP and SHALLOW copies.
- Default get_value to DEEP copy for external safety.
- Use SHALLOW copy in internal paths (e.g., remove_kv) to maintain performance.
- Refactor LogManager::omap_get_value to simplify coroutine flow.
myoungwon oh [Fri, 13 Feb 2026 02:06:02 +0000 (11:06 +0900)]
crimson/os/seastore: support for large kv pair in LogNode
Each log_key_t contains a chunk_idx field to manage values
that span multiple LogNodes when the value size exceeds the
maximum capacity of a single LogNode.
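To illustrate the idea of a chunk index, the following sketch splits an oversized value into pieces keyed by (key, chunk_idx) and reassembles them. It is an assumption-laden illustration: split_value(), join_value(), and the max_chunk parameter are hypothetical, and the real layout is the one described in log_manager.h.

```python
def split_value(key, value, max_chunk):
    """Split value into (key, chunk_idx) -> chunk pairs of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [value[i:i + max_chunk] for i in range(0, len(value), max_chunk)]
    if not chunks:
        chunks = [b""]  # an empty value still occupies one chunk
    return [((key, idx), chunk) for idx, chunk in enumerate(chunks)]


def join_value(pairs):
    """Reassemble the value by ordering chunks on their chunk_idx."""
    ordered = sorted(pairs, key=lambda pair: pair[0][1])
    return b"".join(chunk for _, chunk in ordered)
```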
See detailed description in log_manager.h.
myoungwon oh [Mon, 19 Jan 2026 17:14:24 +0000 (02:14 +0900)]
crimson/os/seastore: optimize handling of batched requests
During 4KB random write workloads, SeaStore receives
batched dup_* entries in both omap_set_keys.
This change enables efficient batch processing of these
requests to reduce overhead.
myoungwon oh [Sat, 30 Aug 2025 12:18:12 +0000 (21:18 +0900)]
crimson/os/seastore: introduce omap_rm_keys interface in omap_manager
Deletion of pg_log_entry_t entries is performed by omap_rm_keys using a set.
For example, omap_rm_keys might be called with a set containing
pg_log_entry_t entries ranging from 0011.0001 to 0011.0010.
In this case, calling omap_rm_key individually for each entry is inefficient,
because each call triggers a traversal of the entire list.
To avoid this, omap_rm_keys with a set is introduced in omap_manager
to handle removal requests more efficiently.
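The cost difference can be seen in a small simulation. The functions below are hypothetical and model the log as a plain Python list; they only count how many entries are visited when keys are removed one at a time versus in a single pass with a set.

```python
def rm_keys_one_by_one(log, keys):
    """Remove each key with its own full traversal; returns (log, visited)."""
    visited = 0
    for key in keys:
        kept = []
        for entry in log:
            visited += 1
            if entry != key:
                kept.append(entry)
        log = kept
    return log, visited


def rm_keys_batched(log, keys):
    """Remove all keys in one traversal using set membership."""
    keys = set(keys)
    visited = 0
    kept = []
    for entry in log:
        visited += 1
        if entry not in keys:
            kept.append(entry)
    return kept, visited
```

For a 20-entry log and 10 keys to remove, the batched version visits each entry once, while the per-key version walks the remaining list again for every key.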
myoungwon oh [Fri, 13 Feb 2026 05:04:14 +0000 (14:04 +0900)]
crimson/os/seastore: remove duplicate keys for non-log entries
When writing a non-log key, remove any existing duplicate key
before inserting the new KV pair. With this change, full list
traversal is no longer required during remove_kv.
myoungwon oh [Thu, 1 Jan 2026 09:23:47 +0000 (18:23 +0900)]
crimson/os/seastore: make _fastinfo overwritable to minimize space overhead
This commit forces _fastinfo to be stored at the last position of a LogNode.
By doing so, _fastinfo can be overwritten by the next pg_log_entry.
Since _fastinfo has a fixed key with varying contents and is included in
every write transaction, placing it at the tail enables efficient overwrites.
As a result, this change reduces LogNode allocation and deallocation,
thereby lowering space overhead. Moreover, garbage collection for obsolete
key-value pairs is unnecessary due to overwrite semantics.
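The overwrite semantics can be pictured with a toy node. NodeSketch is a hypothetical model, not the LogNode layout: it keeps _fastinfo in the last slot and lets the next appended entry overwrite that slot.

```python
FASTINFO = "_fastinfo"


class NodeSketch:
    """Toy append-only node whose tail _fastinfo slot may be overwritten."""

    def __init__(self):
        self.slots = []

    def append(self, key, value):
        if self.slots and self.slots[-1][0] == FASTINFO:
            # The stale _fastinfo sits at the tail, so reuse its slot.
            self.slots[-1] = (key, value)
        else:
            self.slots.append((key, value))

    def write_txn(self, log_key, log_value, fastinfo):
        # Every transaction writes a log entry followed by a fresh _fastinfo.
        self.append(log_key, log_value)
        self.append(FASTINFO, fastinfo)
```

After three transactions the toy node holds three log entries and one _fastinfo, instead of six slots with two obsolete _fastinfo copies.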
qa/tasks/backfill_toofull.py: Fix assert failures with & without compression
The following issues with the test are addressed:
1. The test was encountering assertion failure (assert backfillfull < 0.9) with
compression enabled. This was because the condition was not factoring in the
compression ratio. Without it the backfillfull ratio can easily exceed 1. By
factoring in the compression ratio, the backfillfull ratio will be in the
range (0 - n), where n can vary depending on the type of compression used.
2. The main contributing factor for (1) above is the amount of data written to
the pool. The writes were time-bound earlier leading to excess data and
eventually the assertion failure. By limiting the data written to the OSDs
to 50% of the OSD capacity in the first phase and only 20% in the re-write
phase, the outcome of the test is more deterministic regardless of
compression being enabled or not.
3. A potential false cluster error is avoided by swapping the setting of
the nearfull-ratio and backfill-ratio after the re-write phase.
- removes storage type
- stabilizes overview card for loading data
- raw capacity shown when Prometheus is not available
- multiple refresh intervals may cause sync issues and bugs, hence the logic is moved to the parent overview component
- all queries are now updated at a 5 s interval except data consumption, which uses the Prometheus interval; this needs more refactoring and will be done in a later PR
lvshuo2016 [Wed, 22 Oct 2025 10:09:52 +0000 (18:09 +0800)]
common,arch,cmake: add RISC-V crc32c support
This adds hardware-accelerated crc32c support for the RISC-V
architecture. It includes the feature implementation, necessary
CMake configuration, and plumbing in src/arch/riscv.c to correctly
detect and select the optimized instructions.