Kefu Chai [Tue, 12 May 2026 09:17:56 +0000 (17:17 +0800)]
debian: drop explicit libprotobuf dependency from ceph-osd-crimson
The ceph-osd-crimson package already lists ${shlibs:Depends} in its
Depends field, which generates the correct libprotobuf dependency for
the target distribution at build time (e.g. libprotobuf32t64 on
Trixie/Noble). The hardcoded libprotobuf23 entry is redundant and
breaks installations on distributions where protobuf ships under a
different package name.
Kefu Chai [Sat, 9 May 2026 06:39:17 +0000 (14:39 +0800)]
rgw/d4n: fix deprecated async_run overload in RedisPool
The async_run overload taking a logger argument has been deprecated since
Boost 1.89. Use the 2-arg async_run(config, token) overload when
building with Boost >= 1.89, and fall back to the 3-arg overload
for Boost 1.87-1.88.
See https://www.boost.org/doc/libs/1_89_0/libs/redis/doc/html/redis/reference/boost/redis/basic_connection/async_run-04.html
Jon Bailey [Thu, 7 May 2026 12:28:01 +0000 (13:28 +0100)]
doc: Clarification of text in ec stretch cluster design
The description of min_size in the EC stretch cluster design doc did not make clear what we intend to develop. This commit rewords that text so the intention is clear to readers.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
Kefu Chai [Sun, 29 Mar 2026 05:47:47 +0000 (13:47 +0800)]
crimson/osd: acquire throttle when scanning replica/primary for backfill
The backfill state machine called budget_available() before deciding to
scan, but request_primary_scan() and request_replica_scan() never
actually acquired the throttle slot. This meant scans could proceed
without any resource reservation, defeating the QoS intent of the
throttler introduced in 791772f1c0.
In this change, we fix this by acquiring the throttle before initiating
each scan.
John Mulligan [Mon, 20 Apr 2026 20:07:19 +0000 (16:07 -0400)]
mgr/smb: add --wildcard and --recursive to smb cluster rm
Add new --wildcard and --recursive flags to the smb cluster rm
subcommands. These allow deleting clusters in bulk. The --wildcard
option works like the same option for share rm in that it allows the use
of globbing for the cluster IDs; this includes '*' to delete all
clusters. The --recursive option tells the command to also delete all
child resources (shares) when deleting a cluster.
This was previously doable by streaming the output of `ceph smb show
...` through (sed or) jq and flipping the intent to removed and piping
that to `ceph smb apply`, but that is neither obvious nor as easy to
document as these new options.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 20 Apr 2026 19:14:56 +0000 (15:14 -0400)]
mgr/smb: add glob style wildcard support to matcher object
Add glob/wildcard support to the matcher type in the handler.py file.
This will be used in future changes to make matching shares and/or
clusters easier by supporting glob style wildcards on some commands.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
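A minimal sketch of glob-style matching, for illustration only. It is not the
matcher in handler.py: the class name `GlobMatcher`, its `matches` method, and
the sample cluster IDs are all hypothetical, and the sketch assumes that
shell-style patterns as implemented by Python's `fnmatch` module are close to
what the commit means by "glob style".

```python
from fnmatch import fnmatchcase


class GlobMatcher:
    """Hypothetical stand-in for a matcher object; not the handler.py type."""

    def __init__(self, patterns):
        self._patterns = list(patterns)

    def matches(self, resource_id):
        # fnmatchcase is case-sensitive on every platform.
        return any(fnmatchcase(resource_id, p) for p in self._patterns)


# Made-up cluster IDs, used only to exercise the matcher.
cluster_ids = ["prod1", "prod2", "test1"]
matcher = GlobMatcher(["prod*"])
selected = [cid for cid in cluster_ids if matcher.matches(cid)]
```

A pattern of `*` selects every ID, which corresponds to the "delete all" case
mentioned for the wildcard option.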
msgr/async: make msgr_active_connections counter a gauge
msgr_active_connections tracks the current number of active connections
rather than a monotonic total. Register it as a gauge so that perf reset
does not zero it while live connections remain that will later decrement it.
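The effect of the distinction can be sketched with a toy counter registry. This
is an illustration under assumptions, not Ceph's perf counter code: the class
`PerfCountersSketch` and its methods are hypothetical, and the only behavior
taken from the commit message is that a reset leaves gauges alone.

```python
class PerfCountersSketch:
    """Toy registry: reset() zeroes plain counters but skips gauges."""

    def __init__(self):
        self._values = {}
        self._gauges = set()

    def add(self, name, gauge=False):
        self._values[name] = 0
        if gauge:
            self._gauges.add(name)

    def inc(self, name, amount=1):
        self._values[name] += amount

    def dec(self, name, amount=1):
        self._values[name] -= amount

    def get(self, name):
        return self._values[name]

    def reset(self):
        for name in self._values:
            if name not in self._gauges:
                self._values[name] = 0


def run(as_gauge):
    counters = PerfCountersSketch()
    counters.add("msgr_active_connections", gauge=as_gauge)
    counters.inc("msgr_active_connections", 3)  # three connections open
    counters.reset()                            # operator resets perf counters
    counters.dec("msgr_active_connections")     # one connection closes
    return counters.get("msgr_active_connections")


as_counter = run(as_gauge=False)
as_gauge = run(as_gauge=True)
```

Registered as a plain counter, the value goes negative after the reset;
registered as a gauge, it still reflects the two connections that remain open.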
John Mulligan [Thu, 16 Apr 2026 17:47:04 +0000 (13:47 -0400)]
ceph.spec.in: add new --with pypkg to be passed on to cmake
Add a new --with pypkg option that passes WITH_PYPKG to cmake.
This allows building with the new (experimental) python packaging
support. If this proves useful, a future change can consider enabling it
by default under some conditions.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Apr 2026 21:24:39 +0000 (17:24 -0400)]
python-common: update CMakeLists.txt to optionally use new packaging
Add support for invoking the new pep517 based packaging mode added
in a previous commit. Because this approach will not work on older
distros and there seems to be spotty support for the new packaging
form on debian/ubuntu (when nested within an additional layer like
CMake) I am choosing not to enable the new stuff by default.
View with `git diff -w`
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Apr 2026 21:24:46 +0000 (17:24 -0400)]
python-common: add a pyproject.toml file
In order to support the current python packaging standards we need
to have a pyproject.toml [1] file. This file defines the project's
metadata and build tool.
For continuity, I have left setuptools in place as the build backend
so the existing setup.py is still in play. I also experimented with
flit as a back-end. Flit seemed to work OK but I was a bit unsure
how distro support for it would be when we started to roll out this
option. Thus, to be safe I decided to stay with setuptools for now.
John Mulligan [Mon, 13 Apr 2026 21:24:24 +0000 (17:24 -0400)]
cmake/modules: add PythonPackage.cmake
Time is marching on and the state of the art with python packaging has
not stood still. In Python 3.12, distutils has been removed after being
deprecated for a couple of versions. According to the Python Packaging
User Guide [1]: "However, `python setup.py` and the use of `setup.py` as a
command line tool are deprecated."
Currently, ceph provides a decent sized and growing library of python
code in `src/python-common/ceph`. It currently relies on `setup.py` and
the deprecated `python setup.py install` command. This change aims to be
the first step in moving toward a more contemporary approach so that we
don't get caught late when the older approaches really stop working.
Because ceph's primary driver of "build stuff" is CMake, there was an
existing `cmake/modules/Distutils.cmake` that invokes a `python setup.py
install` command. Rather than risk breaking older distros we add a new
`cmake/modules/PythonPackage.cmake` file that uses the PEP 517/518
[2][3] style of packaging. I could not find any existing CMake support
for this so unfortunately I had to write this.
The approach taken is loosely based on what the rpm build process does.
It invokes pip's wheel subcommand to build a wheel (during the build
phase) and then uses pip to install that wheel's content onto the
system.
A future commit will add conditional support for using this approach
in src/python-common.
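The two-step flow described above (build a wheel, then install it) can be
sketched as the pair of pip invocations below. This is not the content of
PythonPackage.cmake: the helper names and the example paths are hypothetical,
and the particular flags (`--no-deps`, `--no-index`, `--prefix`) are an
assumption about what such a build would want, not something the commit states.
The helpers only assemble argument lists; nothing is executed.

```python
import sys


def wheel_build_cmd(src_dir, wheel_dir):
    """argv for the build phase: turn a source tree into a wheel."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-deps",                  # assumption: dependencies come from the distro
        "--wheel-dir", str(wheel_dir),
        str(src_dir),
    ]


def wheel_install_cmd(wheel_path, prefix):
    """argv for the install phase: install the wheel built earlier."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-deps",
        "--no-index",                 # assumption: never reach out to a package index
        "--prefix", str(prefix),
        str(wheel_path),
    ]


# Hypothetical paths, for illustration only.
build = wheel_build_cmd("src/python-common", "build/wheels")
install = wheel_install_cmd("build/wheels/example.whl", "/usr")
```

Either list could be handed to `subprocess.run(cmd, check=True)` by a build
script once the paths are real.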
Jamie Pryde [Fri, 1 May 2026 09:45:42 +0000 (10:45 +0100)]
cmake: Fix ISA-L build on arm
A typo in CFLAGS means we're passing an empty string to configure_cmd.
We are then overwriting the build environment CFLAGS with our empty string CFLAGS,
which can result in build failures in certain environments, as seen in the tracker.
This fix gets any build environment CFLAGS and appends the other flags
we want to use when building ISA-L 2.32.0.
mgr/dashboard: "Access Denied" being shown on overview page for read-only user
Fix: https://tracker.ceph.com/issues/76293
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
Alex Ainscow [Fri, 24 Apr 2026 14:57:55 +0000 (15:57 +0100)]
osd: Fix incorrect rollback logic for partial write OI
Before this fix, when rolling back an OI, the system used the OI
from the primary to rollback to. This is wrong if the previous
write was a partial write. This may have a few consequences
during recovery (although it's not clear any are serious) and in
EC direct reads, where a false-positive version mismatch will be
detected.
The test provided recreates the issue.
The fix provided modifies the rollback as it is being written.
Fixes: https://tracker.ceph.com/issues/76213
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
# Conflicts:
# src/test/osd/TestECFailoverWithPeering.cc
Alex Ainscow [Mon, 27 Apr 2026 16:46:40 +0000 (17:46 +0100)]
osd: Avoid assertion on empty object read when reading multiple objects
Tracker 75432 hits an assert which is attempting to protect the system
against hanging, due to generating a read request which sends no messages.
The assert fired because recovery was attempting to read multiple objects
in a single read request. One object did not require any further shard
reads in order to recover, while the other did. The consequence is that
the assert fired on one of the objects.
The problem is simply that the assert is in the wrong place.
Fixes: https://tracker.ceph.com/issues/75432
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Kefu Chai [Wed, 6 May 2026 03:32:03 +0000 (11:32 +0800)]
test/osd: add perf test for calc_pg_upmaps with mutual overfull upmap pairs
Reproduces the scenario from https://tracker.ceph.com/issues/63137: 500
cohort OSDs all at deviation +2, linked by pg_upmap_items rings with R=5
pairs each. Before the fix, every incoming pair triggered a test_change
call that got rejected -- calc_pg_upmaps took ~11 minutes. After, it
exits in ~200ms.
Hu-Yuxuan [Mon, 9 Oct 2023 09:49:54 +0000 (17:49 +0800)]
osd/OSDMap: skip upmap-items drop when um_from would become equally overfull
Improves pg-upmap balancer convergence speed on clusters where many OSDs
are mutually overfull and linked by pg_upmap_items -- a common pattern
after a scale-up that increases replica count.
When try_drop_remap_overfull finds an incoming [um_from -> osd] pair on an
overfull osd, it calls test_change to check whether dropping the pair would
improve distribution. If um_from carries the same excess load as osd,
though, dropping just returns the PG to an equally crowded OSD -- test_change
rejects the move anyway. With many such mutual pairs, the wasted calls add
up to minutes before the balancer gives up.
This adds an early check: skip test_change when um_from would end up at
least as overfull as osd after receiving the PG back. This eliminates
O(n_cohort * R) wasted test_change calls per balancer pass and reduces
calc_pg_upmaps from minutes to milliseconds in the affected scenario.
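A rough sketch of the early check, under stated assumptions. It is not the
OSDMap code: the function names, the `deviation` dictionary, and the small ring
layout are hypothetical, and the comparison used (um_from's deviation after
taking the PG back versus osd's current deviation) is one reading of "at least
as overfull as osd". The sketch only counts how many test_change calls each
variant would make.

```python
def should_skip_drop(deviation, um_from, osd):
    # Assumption: dropping the pair returns exactly one PG to um_from.
    return deviation[um_from] + 1 >= deviation[osd]


def count_test_change_calls(deviation, incoming_pairs, early_check):
    calls = 0
    for um_from, osd in incoming_pairs:
        if early_check and should_skip_drop(deviation, um_from, osd):
            continue      # the move would be rejected anyway, so skip it
        calls += 1        # stands in for one expensive test_change call
    return calls


# Tiny version of the scenario: every cohort OSD sits at deviation +2 and
# is linked to its next `ring` neighbours by upmap pairs.
n_cohort, ring = 6, 2
deviation = {osd: 2.0 for osd in range(n_cohort)}
incoming_pairs = [
    ((osd + k) % n_cohort, osd)
    for osd in range(n_cohort)
    for k in range(1, ring + 1)
]

calls_before = count_test_change_calls(deviation, incoming_pairs, early_check=False)
calls_after = count_test_change_calls(deviation, incoming_pairs, early_check=True)
```

In this toy layout the unchecked loop makes n_cohort * ring calls and the
checked loop makes none, which mirrors the O(n_cohort * R) saving described
above.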
This commit removes centos9 from crimson's supported distros. This is in
line with the wider ceph moving on to rocky10 from centos9. We have
established that crimson is compatible with rocky10. More details can be
found in this tracker: https://tracker.ceph.com/issues/75823.
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Ronen Friedman [Sat, 2 May 2026 15:53:49 +0000 (15:53 +0000)]
crimson/osd: defer snap trimming while scrubbing
Classic OSD enforces mutual exclusion between scrubbing and snap
trimming via the WaitScrub state in the snap trim state machine.
Crimson was missing this, allowing both to run concurrently on the
same PG (visible as active+clean+scrubbing+deep+snaptrim), which
could prevent snap trimming from completing within the expected
timeout.
Defer snap trim initiation while PG_STATE_SCRUBBING is set, and
re-trigger it from notify_scrub_end() via kick_snap_trim().
This is a temporary fix until the full scrub scheduling code,
including is_scrub_queued_or_active(), is merged.
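The deferral can be pictured with a toy PG object. This is only a sketch and
not Crimson code: the class `PGSketch` and the `trim_pending` flag are
hypothetical (the commit does not say how a deferred trim is remembered), while
`kick_snap_trim` and `notify_scrub_end` borrow the names used in the message.

```python
class PGSketch:
    """Toy PG: snap trimming is deferred while a scrub is in progress."""

    def __init__(self):
        self.scrubbing = False     # stands in for PG_STATE_SCRUBBING
        self.trim_pending = False  # assumption: remember a deferred request
        self.trims_started = 0

    def kick_snap_trim(self):
        if self.scrubbing:
            self.trim_pending = True
            return False           # deferred, nothing started
        self.trim_pending = False
        self.trims_started += 1
        return True

    def notify_scrub_end(self):
        self.scrubbing = False
        if self.trim_pending:
            self.kick_snap_trim()  # re-trigger the deferred trim


pg = PGSketch()
pg.scrubbing = True
started_during_scrub = pg.kick_snap_trim()
pg.notify_scrub_end()
```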
Add a handler for Transaction::OP_MERGE_COLLECTION in SeaStore's
_do_transaction_step() and the corresponding _merge_collection()
implementation. Since coll_t is not part of the onode key, no
onode re-keying is needed: the operation updates the destination
collection's split_bits and removes the source collection from the
collection B-tree, all within a single transaction.
Add a handler for Transaction::OP_MERGE_COLLECTION in CyanStore's
do_transaction_no_callbacks() and the corresponding _merge_collection()
implementation. Moves all objects from the source collection into the
destination, updates the destination's split_bits, and removes the
source collection from the store's collection map.
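As an illustration of the three steps listed for the in-memory case, here is a
minimal sketch. It is not CyanStore code: `CollectionSketch`,
`merge_collection`, and the collection and object names are hypothetical, and a
plain dictionary stands in for the store's collection map.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionSketch:
    split_bits: int
    objects: dict = field(default_factory=dict)  # object name -> data


def merge_collection(coll_map, src, dest, bits):
    """Move src's objects into dest, update dest's split_bits, drop src."""
    source = coll_map[src]
    destination = coll_map[dest]
    destination.objects.update(source.objects)   # move all objects
    destination.split_bits = bits                # update the destination
    del coll_map[src]                            # remove the source collection


# Made-up collections, for illustration only.
coll_map = {
    "1.0": CollectionSketch(split_bits=1, objects={"a": b"x"}),
    "1.1": CollectionSketch(split_bits=1, objects={"b": b"y"}),
}
merge_collection(coll_map, src="1.1", dest="1.0", bits=0)
```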
osd/SnapMapper::update_snaps() to handle a missing OBJ_ record
by falling back to add_oid() instead of silently creating an
inconsistent state (OBJ_ without matching SNA_ entries). This
was observed on replicas that had recently recovered objects:
the snap mapper entries created during recovery were not visible
to a subsequent snap-trim repop's update_snaps() call, leaving
the clone with no snap mapper entries. Scrub would then detect
and report the inconsistency as an error.
Promote snap mapper remove_oid/clear_snaps logging to dout(10)
and add apply_op_stats tracing to aid diagnosis of any remaining
stat or snap mapper drift.
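The fallback can be sketched with two in-memory indexes. This is a simplified
model and not the SnapMapper implementation: the class, its fields, and the
`consistent` helper are hypothetical, with a dictionary standing in for OBJ_
records and a set of (snap, oid) pairs standing in for SNA_ entries.

```python
class SnapMapperSketch:
    def __init__(self):
        self.obj = {}     # oid -> set of snaps (models OBJ_ records)
        self.sna = set()  # (snap, oid) pairs (models SNA_ entries)

    def add_oid(self, oid, snaps):
        self.obj[oid] = set(snaps)
        self.sna |= {(snap, oid) for snap in snaps}

    def update_snaps(self, oid, new_snaps):
        new_snaps = set(new_snaps)
        if oid not in self.obj:
            # Missing OBJ_ record: fall back to add_oid() so that both
            # indexes are written together.
            self.add_oid(oid, new_snaps)
            return
        removed = self.obj[oid] - new_snaps
        self.sna -= {(snap, oid) for snap in removed}
        self.obj[oid] = new_snaps

    def consistent(self):
        expected = {(s, oid) for oid, snaps in self.obj.items() for s in snaps}
        return expected == self.sna


mapper = SnapMapperSketch()
mapper.update_snaps("clone1", {4, 5})  # no prior record for this object
mapper.update_snaps("clone1", {5})     # normal update after a trim
```

Without the fallback branch, the first call would write only the OBJ_ side and
`consistent()` would report a mismatch of the kind scrub flags.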
Gil Bregman [Tue, 5 May 2026 08:53:25 +0000 (11:53 +0300)]
mgr/dashboard: Allow empty port value when adding a listener in NVMEoF CLI
Fixes: https://tracker.ceph.com/issues/76410
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
Kushal Deb [Tue, 5 May 2026 05:30:12 +0000 (11:00 +0530)]
mgr/cephadm: allow nvmeof group assignment for NVMe-oF services
NVMe-oF services created in older releases may have a service_id that does
not include the gateway group name, because those services could be created
without a group.
The current validation requires an NVMe-oF service_id to end with
the configured group name when spec.group is set. This is correct for new
services, but it blocks the documented upgrade flow for existing legacy
services where the user exports the existing spec, adds spec.group, and
applies it back.
Allow this narrow update path when the service already exists, the stored
spec has no group, and the incoming spec keeps the same service_id while
adding a non-empty group. Keep the existing validation for new services and
for services that already have a group, so duplicate or misleading group
configurations are still rejected.
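The rule and its narrow exception can be sketched as a single validation
function. This is not the cephadm code: `NvmeofSpecSketch`, `validate_group`,
and the sample IDs are hypothetical, and a plain `endswith` test is an
assumption about how "ends with the configured group name" is checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NvmeofSpecSketch:
    service_id: str
    group: str = ""


def validate_group(incoming: NvmeofSpecSketch,
                   stored: Optional[NvmeofSpecSketch]) -> None:
    """Raise ValueError unless the incoming spec's group is acceptable."""
    if not incoming.group:
        return  # nothing to validate when no group is set
    if incoming.service_id.endswith(incoming.group):
        return  # the normal rule for new services
    legacy_upgrade = (
        stored is not None                           # service already exists
        and not stored.group                         # stored spec has no group
        and stored.service_id == incoming.service_id # service_id is unchanged
    )
    if legacy_upgrade:
        return  # narrow path: adding a group to a legacy service
    raise ValueError(
        f"service_id {incoming.service_id!r} must end with group "
        f"{incoming.group!r}"
    )


def is_accepted(incoming, stored):
    try:
        validate_group(incoming, stored)
    except ValueError:
        return False
    return True


# Made-up service IDs, for illustration only.
legacy = NvmeofSpecSketch("mypool")
upgrade_ok = is_accepted(NvmeofSpecSketch("mypool", "grp1"), legacy)
new_rejected = not is_accepted(NvmeofSpecSketch("mypool", "grp1"), None)
regroup_rejected = not is_accepted(
    NvmeofSpecSketch("mypool.grp1", "grp2"), NvmeofSpecSketch("mypool.grp1", "grp1")
)
```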
rgw/test: add Journal mode support to bucket logging test suite
Add --logging-type flag to run the Python bucket logging test suite
in either Standard or Journal mode. The same tests run against both
logging types with no changes to test logic or assertions.
- Add --logging-type pytest CLI option (Standard default, Journal opt-in)
- Detect boto3 LoggingType extension availability at session startup
- Thread logging_type through helpers and test functions
- Add teuthology task YAML for Journal mode suite runs
- Install service-2.sdk-extras.json in the teuthology task when
logging_type is Journal (s3tests cleans it up after its own run,
so the file isn't available by the time our Journal job runs)
- Document Journal mode local usage in the test suite README
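For readers unfamiliar with pytest command-line options, the first bullet could
look roughly like the conftest.py fragment below. It is a sketch and not the
code added by this commit: the help text and the `logging_type_from` helper are
hypothetical. `pytest_addoption`, `parser.addoption`, and `config.getoption`
are standard pytest hooks and methods.

```python
LOGGING_TYPES = ("Standard", "Journal")


def pytest_addoption(parser):
    # Standard is the default; Journal has to be requested explicitly.
    parser.addoption(
        "--logging-type",
        action="store",
        default="Standard",
        choices=LOGGING_TYPES,
        help="bucket logging type to test (hypothetical help text)",
    )


def logging_type_from(config):
    """Hypothetical helper: read the option back from the pytest config."""
    return config.getoption("--logging-type")
```

With such an option in place, a Journal run would be selected by passing
`--logging-type Journal` on the pytest command line.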