CompleteMultipartUpload depends on this lock to ensure consistency of
uploads and protect against data loss, so we should try very hard to
hold this lock as long as it takes to complete successfully.
MPRadosSerializer accomplishes this by spawning a background lock
renewal coroutine. this coroutine is started during a successful call to
try_lock(), and stopped before unlock() releases the lock.
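A minimal sketch of the renewal pattern described above, written in Python asyncio rather than the C++ used by rgw. The RenewingLock class, its fields, and the choice to renew at half the lock duration are assumptions made for illustration; they are not MPRadosSerializer's actual interface.

```python
import asyncio


class RenewingLock:
    """Hypothetical stand-in for a serializer that renews its lock in the background."""

    def __init__(self, duration: float):
        self.duration = duration  # lock lifetime in seconds
        self.locked = False
        self.renewals = 0
        self._task = None

    async def _renew_loop(self):
        # Assumption: renew at half the duration so a renewal lands before expiry.
        while True:
            await asyncio.sleep(self.duration / 2)
            self.renewals += 1  # a real implementation would re-take the lock here

    async def try_lock(self) -> bool:
        self.locked = True  # assume the initial lock attempt succeeds
        self._task = asyncio.create_task(self._renew_loop())
        return True

    async def unlock(self):
        # Stop renewal first so it cannot race with releasing the lock.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None
        self.locked = False


async def demo() -> RenewingLock:
    lock = RenewingLock(duration=0.02)
    await lock.try_lock()
    await asyncio.sleep(0.1)  # simulate a completion that outlives the lock duration
    await lock.unlock()
    return lock
```

Running asyncio.run(demo()) holds the toy lock for several lock durations; the renewal counter shows the background task kept it alive in the meantime.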
this duration ultimately gets passed down to cls_lock's set_duration()
function, which has overloads for both utime_t and ceph::timespan.
prefer ceph::timespan because it also works with boost asio timers.
Casey Bodley [Thu, 12 Mar 2026 14:39:02 +0000 (10:39 -0400)]
rgw: check for broken lock before multipart complete
if lock renewal fails, is_locked() will return false. check that just
before upload->complete() goes on to write/overwrite the head object,
and return the same ERR_INTERNAL_ERROR from lock contention.
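The check can be pictured with a small Python sketch. complete_upload(), its callable parameters, and the string placeholder for the error code are hypothetical; the real code path is C++ and returns a numeric error.

```python
ERR_INTERNAL_ERROR = "ERR_INTERNAL_ERROR"  # placeholder, not the real numeric code


def complete_upload(is_locked, write_head):
    """Write the head object only if the lock is still held (illustrative only)."""
    if not is_locked():
        # Renewal failed at some point, so another writer may own the lock now.
        return ERR_INTERNAL_ERROR
    write_head()
    return 0
```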
myoungwon oh [Sat, 7 Mar 2026 11:38:53 +0000 (20:38 +0900)]
crimson/os/seastore: handle duplicate keys in LogNode::remove_entry
Previously, LogNode::remove_entry returned early when a log_key was
found, assuming uniqueness. However, duplicate keys can exist in the
node if an older entry was previously removed.
This commit also adds a unit test to verify this scenario.
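As a rough illustration of the difference, the sketch below models a node as a Python list of (key, value) pairs and removes every match instead of stopping at the first one. The list model and the remove_entry() signature are assumptions, not the LogNode layout.

```python
def remove_entry(entries, key):
    """Remove every entry whose key matches; return how many were removed."""
    kept = [entry for entry in entries if entry[0] != key]
    removed = len(entries) - len(kept)
    entries[:] = kept  # update the caller's list in place
    return removed
```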
myoungwon oh [Tue, 3 Mar 2026 15:42:51 +0000 (00:42 +0900)]
crimson/os/seastore: reload head if modified
This commit also fixes the test case to verify that
the head is correctly allocated and updated
during omap_set_keys operations involving multiple keys.
myoungwon oh [Sat, 28 Feb 2026 04:38:16 +0000 (13:38 +0900)]
crimson/os/seastore, osd/PGLog: handle omap_iterate retry to avoid duplicate entries
Seastore omap_iterate may retry internally on conflicts, which can
cause PGLog to process the same entries multiple times when entries
are handled directly in the iteration callback.
Introduce a conflict hook in omap_iterate so callers can reset
iteration state on retry. PGLog now buffers entries during iteration and
applies process_entry() only after a successful pass, clearing the buffer
on retry to avoid duplicates.
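The buffering approach can be sketched in Python as follows. The toy omap_iterate() below restarts once from the beginning to imitate an internal retry; its parameters and the read_log() helper are hypothetical and only mirror the idea of a conflict hook.

```python
def omap_iterate(entries, on_entry, on_conflict, conflicts=1):
    """Toy iterator that aborts midway and restarts `conflicts` times."""
    while True:
        for index, entry in enumerate(entries):
            if conflicts > 0 and index == len(entries) // 2:
                conflicts -= 1
                on_conflict()  # let the caller reset its iteration state
                break
            on_entry(entry)
        else:
            return  # a full pass completed without a conflict


def read_log(entries):
    buffered = []
    # Buffer during iteration and clear the buffer whenever a retry happens.
    omap_iterate(entries, buffered.append, buffered.clear)
    processed = []
    for entry in buffered:
        processed.append(entry)  # stand-in for process_entry()
    return processed
```

Handling entries directly in the callback, with a conflict hook that does nothing, would see the first half of the entries twice in this toy model.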
myoungwon oh [Fri, 27 Feb 2026 08:01:59 +0000 (17:01 +0900)]
crimson/os/seastore: ensure data integrity with deep copy in omap_get_value
Previously, omap_get_value could return a bufferlist pointing to
memory without guaranteed lifetime. This patch introduces LogNode::copy_t
to distinguish between DEEP and SHALLOW copies.
- Default get_value to DEEP copy for external safety.
- Use SHALLOW copy in internal paths (e.g., remove_kv) to maintain performance.
- Refactor LogManager::omap_get_value to simplify coroutine flow.
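The DEEP versus SHALLOW distinction can be illustrated in Python with bytes and memoryview. The Copy enum and this get_value() signature are assumptions chosen to echo LogNode::copy_t; they are not the C++ interface.

```python
from enum import Enum


class Copy(Enum):
    DEEP = 1
    SHALLOW = 2


def get_value(node: bytearray, start: int, length: int, copy: Copy = Copy.DEEP):
    view = memoryview(node)[start:start + length]
    if copy is Copy.SHALLOW:
        return view  # aliases the node's memory, valid only while the node is unchanged
    return bytes(view)  # independent copy that outlives later changes to the node


node = bytearray(b"hello world")
deep = get_value(node, 0, 5)
shallow = get_value(node, 0, 5, Copy.SHALLOW)
node[0] = ord("H")  # mutate the underlying node after the reads
```

After the mutation, the deep copy still holds the original bytes while the shallow view reflects the change, which is why the external default is DEEP.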
myoungwon oh [Fri, 13 Feb 2026 02:06:02 +0000 (11:06 +0900)]
crimson/os/seastore: support for large kv pair in LogNode
Each log_key_t contains a chunk_idx field to manage values
that span multiple LogNodes when the value size exceeds the
maximum capacity of a single LogNode.
See detailed description in log_manager.h.
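A simplified sketch of chunking by index, assuming a key is a (name, chunk_idx) tuple and max_chunk is the largest value a single node can hold. Both helpers are hypothetical; the actual layout is described in log_manager.h.

```python
def split_value(key, value: bytes, max_chunk: int):
    """Return [((key, chunk_idx), chunk_bytes), ...] covering the whole value."""
    count = max(1, -(-len(value) // max_chunk))  # ceiling division, at least one chunk
    return [
        ((key, idx), value[idx * max_chunk:(idx + 1) * max_chunk])
        for idx in range(count)
    ]


def join_value(chunks):
    """Reassemble a value from its chunks, ordered by chunk_idx."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0][1])
    return b"".join(data for _key, data in ordered)
```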
myoungwon oh [Mon, 19 Jan 2026 17:14:24 +0000 (02:14 +0900)]
crimson/os/seastore: optimize handling of batched requests
During 4KB random write workloads, SeaStore receives
batched dup_* entries in both omap_set_keys.
This change enables efficient batch processing of these
requests to reduce overhead.
myoungwon oh [Sat, 30 Aug 2025 12:18:12 +0000 (21:18 +0900)]
crimson/os/seastore: introduce omap_rm_keys interface in omap_manager
Deletion of pg_log_entry_t entries is performed by omap_rm_keys using a set.
For example, omap_rm_keys might be called with a set containing
pg_log_entry_t entries ranging from 0011.0001 to 0011.0010.
In this case, calling omap_rm_key individually for each entry is inefficient,
because each call triggers a traversal of the entire list.
To avoid this, omap_rm_keys with a set is introduced in omap_manager
to handle removal requests more efficiently.
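The cost difference can be sketched in Python: one pass over the entries with a set lookup per entry, instead of one full traversal per key. The list-of-pairs model and this omap_rm_keys() signature are assumptions for illustration.

```python
def omap_rm_keys(entries, keys):
    """Remove all entries whose key is in `keys` using a single traversal."""
    doomed = set(keys)
    kept = [(key, value) for key, value in entries if key not in doomed]
    removed = len(entries) - len(kept)
    entries[:] = kept  # update the caller's list in place
    return removed
```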
myoungwon oh [Fri, 13 Feb 2026 05:04:14 +0000 (14:04 +0900)]
crimson/os/seastore: remove duplicate keys for non-log entries
When writing a non-log key, remove any existing duplicate key
before inserting the new KV pair. With this change, full list
traversal is no longer required during remove_kv.
myoungwon oh [Thu, 1 Jan 2026 09:23:47 +0000 (18:23 +0900)]
crimson/os/seastore: make _fastinfo overwritable to minimize space overhead
This commit forces _fastinfo to be stored at the last position of a LogNode.
By doing so, _fastinfo can be overwritten by the next pg_log_entry.
Since _fastinfo has a fixed key with varying contents and is included in
every write transaction, placing it at the tail enables efficient overwrites.
As a result, this change reduces LogNode allocation and deallocation,
thereby lowering space overhead. Moreover, garbage collection for obsolete
key-value pairs is unnecessary due to overwrite semantics.
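A toy model of the tail-placement idea, assuming a node is an append-only Python list and each write transaction carries one log entry plus a fresh _fastinfo. The append_with_fastinfo() helper is hypothetical.

```python
FASTINFO = "_fastinfo"


def append_with_fastinfo(node, log_key, log_value, fastinfo_value):
    """Append a log entry, reusing the slot of a trailing _fastinfo if present."""
    if node and node[-1][0] == FASTINFO:
        node.pop()  # the new log entry overwrites the old _fastinfo
    node.append((log_key, log_value))
    node.append((FASTINFO, fastinfo_value))  # _fastinfo always stays last
```

After any number of transactions the node holds a single _fastinfo at the tail, so no stale copies are left behind to collect.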
qa/tasks/backfill_toofull.py: Fix assert failures with & without compression
The following issues with the test are addressed:
1. The test was encountering assertion failure (assert backfillfull < 0.9) with
compression enabled. This was because the condition was not factoring in the
compression ratio. Without it the backfillfull ratio can easily exceed 1. By
factoring in the compression ratio, the backfillfull ratio will be in the
range (0 - n), where n can vary depending on the type of compression used.
2. The main contributing factor for (1) above is the amount of data written to
the pool. The writes were time-bound earlier leading to excess data and
eventually the assertion failure. By limiting the data written to the OSDs
to 50% of the OSD capacity in the first phase and only 20% in the re-write
phase, the outcome of the test is more deterministic regardless of
compression being enabled or not.
3. A potential false cluster error is avoided by swapping the setting of
the nearfull-ratio and backfill-ratio after the re-write phase.
- removes storage type
- stabilizes overview card for loading data
- raw capacity shown when Prometheus is not available
- multiple refresh intervals may cause sync issues and bugs, hence the logic moved to the parent overview component
- all queries are now updated at a 5 s interval except data consumption, which uses the Prometheus interval. This needs more refactoring and will be done in a later PR
lvshuo2016 [Wed, 22 Oct 2025 10:09:52 +0000 (18:09 +0800)]
common,arch,cmake: add RISC-V crc32c support
This adds hardware-accelerated crc32c support for the RISC-V
architecture. It includes the feature implementation, necessary
CMake configuration, and plumbing in src/arch/riscv.c to correctly
detect and select the optimized instructions.
Oguzhan Ozmen [Fri, 13 Mar 2026 21:56:18 +0000 (21:56 +0000)]
rgw/pubsub: fix uninitialized num_shards causing topic deletion hang
The num_shards member of rgw_pubsub_dest was not included in JSON
serialization (dump/decode_json), causing garbage values when topic
metadata synced between zones. This resulted in topic deletion
iterating millions of times over non-existent shards, blocking
frontend pause during realm reload for extended periods.
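The failure mode can be illustrated with a JSON round trip in Python. The Dest dataclass, its push_endpoint field, and the default of 0 are simplifications assumed for this sketch; they do not reproduce rgw_pubsub_dest.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Dest:
    push_endpoint: str = ""
    num_shards: int = 0  # assumed default for illustration


def dump(dest: Dest) -> str:
    # Every member that must survive a sync has to be written here.
    return json.dumps(asdict(dest))


def decode(text: str) -> Dest:
    data = json.loads(text)
    return Dest(
        push_endpoint=data.get("push_endpoint", ""),
        num_shards=data.get("num_shards", 0),  # well-defined even if the field is absent
    )
```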
Redouane Kachach [Fri, 28 Nov 2025 08:38:45 +0000 (09:38 +0100)]
mgr/cephadm: Fix mgmt-gateway default port in get_port_start()
The mgmt-gateway port was already defaulted to 443 in most places, but
get_port_start() did not apply this default. Since the output of
get_port_start() is used both to configure the daemon ports and, later,
to open them in firewalld, this inconsistency meant the
HTTPS port was not opened when the firewalld service was active.
This change makes get_port_start() also default to port 443, ensuring
the daemon is configured correctly and the corresponding firewalld port
is opened as expected.
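A simplified sketch of the defaulting behaviour. The function below takes an optional port directly, which is an assumption for illustration; the real get_port_start() in cephadm works on a service spec.

```python
DEFAULT_MGMT_GATEWAY_PORT = 443


def get_port_start(spec_port=None):
    """Return the list of ports to configure and open (illustrative signature)."""
    port = spec_port if spec_port is not None else DEFAULT_MGMT_GATEWAY_PORT
    return [port]
```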
Vallari Agrawal [Fri, 13 Mar 2026 08:47:46 +0000 (14:17 +0530)]
qa: ignore NVMEOF_GATEWAY_DOWN in nvmeof_scalability.yaml
Sometimes during scale-up/scale-down, a gateway goes into the
UNAVAILABLE state (which triggers the NVMEOF_GATEWAY_DOWN warning)
for a couple of seconds and then self-recovers.
In this case, none of the scale test asserts fail.
So NVMEOF_GATEWAY_DOWN can be added to the ignorelist, because the scale test asserts
on the expected gw count and checks that all expected gws are AVAILABLE
between each iteration of scale-up/scale-down.
Vallari Agrawal [Fri, 13 Mar 2026 08:32:06 +0000 (14:02 +0530)]
qa/tasks/nvmeof.py: retry do_check if gw in CREATED
In do_check(), ensure all the namespaces+listeners are
added to the gateway (i.e. the gateway is not in CREATED state)
after the gateway is restarted. This prevents going into the
next iteration of thrashing while gateways are still being
updated.
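The retry idea can be sketched as a small polling loop. wait_until_ready(), its get_states callable, and the retry and delay parameters are hypothetical and do not mirror the task's actual do_check() code.

```python
import time


def wait_until_ready(get_states, retries=5, delay=0.0):
    """Poll gateway states until none of them is still CREATED."""
    for _attempt in range(retries):
        if all(state != "CREATED" for state in get_states()):
            return True
        time.sleep(delay)  # give the gateway time to finish updating
    return False
```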
Indira Sawant [Tue, 11 Nov 2025 17:51:43 +0000 (11:51 -0600)]
os/bluestore: add health warning for oversized BlueFS usage
Add a BLUESTORE_BLUEFS_OVERSIZED health warning when total BlueFS usage
(DB, WAL, and spillover on the slow device) exceeds a configurable ratio
of the main device size.
The threshold is controlled by the new configuration option
`bluestore_bluefs_warn_ratio` (default 0.06).
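The threshold check amounts to a simple comparison, sketched below in Python. The function and parameter names are assumptions; only the 0.06 default and the "DB + WAL + spillover versus main device size" rule come from the description above.

```python
def bluefs_oversized(db_bytes, wal_bytes, slow_bytes, main_device_bytes,
                     warn_ratio=0.06):
    """Return True when total BlueFS usage exceeds warn_ratio of the main device."""
    total = db_bytes + wal_bytes + slow_bytes
    return total > warn_ratio * main_device_bytes
```

For example, with a main device of 100 units and the default ratio, a total BlueFS usage of 7 units would trip the warning while 5 units would not.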
John Mulligan [Fri, 13 Mar 2026 17:42:09 +0000 (13:42 -0400)]
script/build-with-container: add CONFIGURE_ARGS env var to configure step
Add a new optional CONFIGURE_ARGS environment variable to the configure
step so that there's a mechanism to pass custom cmake options that
aren't handled elsewhere in the run-make.sh script.
Because configure is a rather fundamental build step it's probably
preferable to set this via an env file so that it persists across
rebuilds. Using an environment var here also avoids both changing
run-make.sh and adding another CLI option to BWC, which already has
too many.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
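One way to picture how an optional environment variable could feed extra options into a configure command is sketched below. configure_command() and the -DWITH_FOO=ON option are hypothetical; this is not the script's actual implementation.

```python
import os
import shlex


def configure_command(base, environ=None):
    """Append whitespace-separated CONFIGURE_ARGS, if set, to a base command."""
    env = os.environ if environ is None else environ
    extra = shlex.split(env.get("CONFIGURE_ARGS", ""))  # honours shell-style quoting
    return list(base) + extra
```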
Instead of "ceph orch daemon restart",
wait for the daemon to come back up on its own
during revival.
Also improve do_check retry logic.
And some logging improvements in nvmeof.thrasher task.