Bill Scales [Wed, 1 Oct 2025 14:52:23 +0000 (15:52 +0100)]
osd: Optimized EC missing list not updated on recovering shard
Shards that are recovering (last_complete != last_update) that
skip transactions and log entries because of partial writes are
using pwlc to advance last_update. However simply incrementing
last_update is not sufficient - there are scenarios where the
needed version of a missing object has to be updated.
If the shard is already missing object X at version V1 and there was
a partial write at V2 that did not update the shard, it does not need
to retain the log entry, but it does need to update the missing list
to say it needs V2 rather than V1. This ensures all shards report
a need for an object at the same version and avoids an assert in
MissingLoc::add_active_missing when the primary is trying to
combine the missing lists from all the shards to work out what has
to be recovered. Avoiding applying pwlc during the early phase
of the peering process ensures the missing list gets updated.
However if a shard is not missing object X and there was a partial
write at V2 that did not update the shard then at the end of peering
it is still necessary to advance last_update by applying pwlc. This
ensures that in later peering cycles the code does not change its
mind and think the shard is now missing object X.
The fix is to be more sophisticated about when pwlc can be used
to advance last_update for a recovering shard. The code now
passes in a parameter indicating whether we are in the early
(pre activate) or later phase of peering. This also means that
additional calls to apply_pwlc are needed when peering gets to
activating and is searching for missing to make updates that were
not made earlier.
Fixes: https://tracker.ceph.com/issues/73249
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
benaryorg [Tue, 28 Oct 2025 03:24:53 +0000 (03:24 +0000)]
doc: detailed explanation of set_choose_tries
- specifically call the *crushtool* output a histogram
- include a surface explanation of how PG placement calculation works
- more info on `choose_total_tries`
- small but complete example for explanatory purposes
- that way people can follow along locally and test out things
Edwin Rodriguez [Mon, 27 Oct 2025 12:13:40 +0000 (08:13 -0400)]
compressor/zstd: modernize ZSTD API usage in ZstdCompressor
Update ZstdCompressor to use the modern ZSTD compression context API
and improve error handling:
- Replace deprecated ZSTD_createCStream() with ZSTD_createCCtx()
- Use ZSTD_CCtx_reset() with ZSTD_reset_session_and_parameters flag
instead of separate ZSTD_reset_session_only and ZSTD_CCtx_refCDict()
calls. This consolidates session and parameter reset into a single
operation.
- Add proper return value checking for all ZSTD API calls using
ZSTD_isError() to catch compression failures
- Ensure proper cleanup with ZSTD_freeCCtx() on all error paths to
prevent memory leaks
- Update corresponding free function from ZSTD_freeCStream() to
ZSTD_freeCCtx()
These changes align with ZSTD's recommended API usage patterns and
improve robustness by properly handling potential compression errors.
Casey Bodley [Fri, 31 Oct 2025 15:26:20 +0000 (11:26 -0400)]
cmake: fix checks for WITH_SYSTEM_QATLIB
commit 30681236678c7ee006a699b658233388b0f884c8 introduced the cmake
option WITH_SYSTEM_QATLIB, but the checks were based on the nonexistent
variable WITH_SYSTEM_QAT.
Ville Ojamo [Sat, 25 Oct 2025 08:18:09 +0000 (15:18 +0700)]
doc: Pin pip to <25.3 for RTD as a workaround for pybind
Readthedocs now uses pip 25.3 by default which requires PEP 517.
src/pybind/* does not provide pyproject.toml files for PEP 517.
For an immediate workaround to allow RTD builds to succeed, pin pip
version to earlier than 25.3.
Details for pybind in https://tracker.ceph.com/issues/73645
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Naman Munet [Fri, 24 Oct 2025 05:59:09 +0000 (11:29 +0530)]
mgr/dashboard: Edit user via UI throwing multiple server errors
Fixes: https://tracker.ceph.com/issues/73637
Commit includes:
Return the default user ratelimit when no ratelimit is set for the user, which eliminates the 500 error in the UI
Dan Mick [Thu, 23 Oct 2025 21:58:12 +0000 (14:58 -0700)]
container/build.sh: add 'rocky-10' suffix if necessary
'fromtag' is already available as distillation of the
FROM_IMAGE environment variable: everything after last
slash, s/:/-/. Use it as a suffix if it's anything other than
"centos-9stream" so that multiple CI container tags can coexist.
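The derivation described above can be sketched in Python for illustration. This is not the code from build.sh: the function names, the leading "-" on the suffix, and the example image names are assumptions made for this sketch.

```python
def fromtag(from_image: str) -> str:
    """Everything after the last slash, with the first ':' replaced by '-'."""
    last_component = from_image.rsplit("/", 1)[-1]
    # s/:/-/ without a 'g' flag replaces only the first match, so count=1.
    return last_component.replace(":", "-", 1)


def tag_suffix(from_image: str, default: str = "centos-9stream") -> str:
    """Return a suffix for the container tag, or '' for the default image.

    The leading '-' separator is an assumption of this sketch.
    """
    tag = fromtag(from_image)
    return "" if tag == default else f"-{tag}"


if __name__ == "__main__":
    # Hypothetical image names, used only to exercise the functions.
    print(fromtag("registry.example/library/rocky:10"))
    print(repr(tag_suffix("registry.example/library/centos:9stream")))
```

With these hypothetical inputs, the first image yields "rocky-10" and the second yields an empty suffix, so tags built from different base images would not collide.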
prik73 [Tue, 27 May 2025 19:04:37 +0000 (00:34 +0530)]
mgr/dashboard: remove hardcoded strings in About component
Replaced inline values for user role, version prefix, and
localStorage keys with shared constants (`USER`, `VERSION_PREFIX`,
and `LocalStorage.DASHBOARD_USERNAME`). This keeps things DRY and
makes future updates easier.
Addresses part of the constant reuse cleanup in the dashboard.
prik73 [Fri, 16 May 2025 17:05:10 +0000 (22:35 +0530)]
mgr/dashboard: fix misaligned text links on login page
Fixes a UI regression introduced after the Carbon update where
the help-related links (Help, Security, Trademarks) on the login
page were misaligned. The links are now left-aligned under the
Ceph logo for visual consistency.
Afreen Misbah [Tue, 21 Oct 2025 18:20:19 +0000 (23:50 +0530)]
mgr/dashboard: Fix timestamps in APIs
- remove 'Z' from rbd APIs, which now return `aware` timestamps
- `datetime.utcfromtimestamp` is deprecated, so use `datetime.fromtimestamp(timestamp, tz=timezone.utc)`, which returns an `aware` timestamp, and remove the 'Z'
- similarly, `datetime.utcnow()` is deprecated; migrated to `datetime.now(timezone.utc)`
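A minimal sketch of the migration described above, using only the standard library. The helper names `to_aware_utc` and `now_aware_utc` are hypothetical and do not come from the dashboard code.

```python
from datetime import datetime, timezone


def to_aware_utc(timestamp: float) -> datetime:
    """Replacement for the deprecated datetime.utcfromtimestamp(timestamp)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def now_aware_utc() -> datetime:
    """Replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


if __name__ == "__main__":
    epoch = to_aware_utc(0)
    # An aware datetime serializes with its UTC offset included.
    print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00
    print(now_aware_utc().tzinfo)  # UTC
```

Because the serialized value already carries the `+00:00` offset, appending a literal 'Z' would presumably produce a malformed timestamp, which is consistent with the commit removing it.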
Nitzan Mordechai [Wed, 22 Oct 2025 05:41:56 +0000 (05:41 +0000)]
tasks/cbt_performance: Tolerate exceptions during performance data updates
If an exception occurs during the POST request to update CBT performance,
log the error instead of failing the entire job. This ensures that
intermittent update failures do not block the main workflow.
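The pattern described above might look roughly like the following sketch. The function name `update_performance_data` and the injected `post` callable are assumptions for illustration; the actual task code and its HTTP client are not shown in this commit message.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)


def update_performance_data(post: Callable[[dict], None], payload: dict) -> bool:
    """Try to send the payload; log and continue if the update fails.

    `post` stands in for whatever performs the POST request (hypothetical).
    Returns True on success and False if an exception was swallowed.
    """
    try:
        post(payload)
    except Exception as exc:
        # Log the failure instead of propagating it, so the job continues.
        log.error("CBT performance update failed: %s", exc)
        return False
    return True


if __name__ == "__main__":
    def failing_post(_payload: dict) -> None:
        raise ConnectionError("simulated intermittent failure")

    ok = update_performance_data(failing_post, {"benchmark": "example"})
    print("job continues, update succeeded:", ok)
```

Returning a boolean is a design choice of this sketch; it lets a caller record that the update was skipped without having to handle an exception.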
Tom Sollers [Tue, 21 Oct 2025 15:52:07 +0000 (16:52 +0100)]
qa: Add a new test to test-erasure-code-plugins.sh to test for new health warn
This commit adds a new test to test-erasure-code-plugins.sh that
tests for the health warning caused by having an erasure-code-profile
with the blaum-roth technique and a w+1 value that is not prime.
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
Tom Sollers [Thu, 2 Oct 2025 13:36:16 +0000 (14:36 +0100)]
mon: Add a new health warning for blaum-roth erasure code profiles with a w+1 that is not prime
This commit adds a new health warning for when a user has an erasure code
profile using the blaum-roth technique which has a w+1 value that is not
prime.
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
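The condition behind the warning can be illustrated with a small sketch. This is not the monitor's implementation: the function names are hypothetical, and the check only mirrors the rule stated above, that w+1 should be prime for the blaum-roth technique.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small values of w."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def blaum_roth_w_needs_warning(w: int) -> bool:
    """True when w+1 is not prime, the case the health warning targets."""
    return not is_prime(w + 1)


if __name__ == "__main__":
    for w in (4, 6, 7, 8):
        print(w, "warn" if blaum_roth_w_needs_warning(w) else "ok")
```

For example, w=6 gives w+1=7, which is prime, so no warning would be expected, whereas w=7 gives w+1=8, which is not prime.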