Shweta Bhosale [Mon, 11 May 2026 10:02:14 +0000 (15:32 +0530)]
mgr/nfs: reuse CephfsClient for path checks and earmark resolver
cephfs_path_is_dir defined an inner function decorated with lru_cache, so
each call got a new function object and an empty cache, and CephfsClient(mgr)
ran every time. Moved caching to module-level cephfs_client_for_mgr(mgr)
and called it from cephfs_path_is_dir.
Passed that shared client into CephFSEarmarkResolver from the NFS module so
export create/apply does not construct a separate CephfsClient for
earmarks.
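The caching problem described above can be reproduced in isolation. The following is a minimal sketch, not the actual mgr/nfs code: FakeClient is a hypothetical stand-in for CephfsClient, and the function names are illustrative only.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for CephfsClient; it only counts constructions."""

    count = 0

    def __init__(self, mgr):
        FakeClient.count += 1
        self.mgr = mgr


def path_check_uncached(mgr):
    # Anti-pattern: the decorated function is re-created on every call,
    # so its cache starts empty and FakeClient is rebuilt each time.
    @lru_cache(maxsize=1)
    def _client(m):
        return FakeClient(m)

    return _client(mgr)


@lru_cache(maxsize=1)
def client_for_mgr(mgr):
    # Module-level cache: one function object and one cache for all callers.
    return FakeClient(mgr)


def path_check_cached(mgr):
    return client_for_mgr(mgr)
```

Calling path_check_uncached twice with the same mgr builds two clients, while path_check_cached returns the same client object on every call.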
Kefu Chai [Sat, 9 May 2026 06:39:17 +0000 (14:39 +0800)]
rgw/d4n: fix deprecated async_run overload in RedisPool
The async_run overload taking a logger argument has been deprecated since
Boost 1.89. Use the 2-arg async_run(config, token) overload when
building with Boost >= 1.89, and fall back to the 3-arg overload
for Boost 1.87-1.88.
See https://www.boost.org/doc/libs/1_89_0/libs/redis/doc/html/redis/reference/boost/redis/basic_connection/async_run-04.html
Jon Bailey [Thu, 7 May 2026 12:28:01 +0000 (13:28 +0100)]
doc: Clarification of text in ec stretch cluster design
The description of min_size in the EC stretch cluster design doc did not clearly state what we intend to develop. This commit rewords that text so the intention is clear to readers.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
John Mulligan [Mon, 20 Apr 2026 20:07:19 +0000 (16:07 -0400)]
mgr/smb: add --wildcard and --recursive to smb cluster rm
Add new --wildcard and --recursive flags to the smb cluster rm
subcommands. These allow deleting clusters in bulk. The --wildcard
option works like the same option for share rm in that it allows the use
of globbing for the cluster IDs; this includes '*' to delete all
clusters. The --recursive option tells the command to also delete all
child resources (shares) when deleting a cluster.
This was previously doable by streaming the output of `ceph smb show
...` through (sed or) jq, flipping the intent to removed, and piping
that to `ceph smb apply`, but that is neither obvious nor as easy to
document as these new options.
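For comparison, the older workaround amounts to rewriting one field in the JSON before re-applying it. The sketch below assumes the `ceph smb show` output is a JSON object with a top-level "resources" list whose entries carry an "intent" field; that layout is an assumption made for illustration, and mark_removed is a hypothetical helper.

```python
import json


def mark_removed(show_output: str) -> str:
    """Return the input document with every resource's intent set to removed.

    Assumes a top-level "resources" list (an illustrative assumption).
    """
    doc = json.loads(show_output)
    for resource in doc["resources"]:
        resource["intent"] = "removed"
    return json.dumps(doc, indent=2)
```

The result would then be fed to `ceph smb apply`, which is the step the new flags make unnecessary.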
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 20 Apr 2026 19:14:56 +0000 (15:14 -0400)]
mgr/smb: add glob style wildcard support to matcher object
Add glob/wildcard support to the matcher type in the handler.py file.
This will be used in future changes to make matching shares and/or
clusters easier by supporting glob style wildcards on some commands.
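Glob-style matching of IDs can be sketched with the standard library. This is an illustration only and not the matcher in handler.py: GlobMatcher and its matches method are hypothetical names.

```python
from fnmatch import fnmatchcase


class GlobMatcher:
    """Hypothetical matcher: an ID matches if any glob pattern accepts it."""

    def __init__(self, patterns):
        self._patterns = list(patterns)

    def matches(self, resource_id: str) -> bool:
        # fnmatchcase is case-sensitive on every platform.
        return any(fnmatchcase(resource_id, p) for p in self._patterns)


def select(ids, patterns):
    """Return the IDs accepted by the patterns, preserving input order."""
    matcher = GlobMatcher(patterns)
    return [i for i in ids if matcher.matches(i)]
```

With IDs smb1, smb2 and test-a, the pattern smb* selects the first two, and * selects all three.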
Signed-off-by: John Mulligan <jmulligan@redhat.com>
msgr/async: make msgr_active_connections counter a gauge
msgr_active_connections tracks the current number of active connections
rather than a monotonic total. Register it as a gauge so that perf reset
does not zero it while live connections can still decrement it later.
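Why the registration type matters can be shown with a toy model. The sketch assumes, as the message implies, that a reset zeroes plain counters and leaves gauges alone; ToyPerfCounters is a hypothetical class and not Ceph's PerfCounters API.

```python
class ToyPerfCounters:
    """Toy model: reset() zeroes counters but skips values marked as gauges."""

    def __init__(self):
        self.values = {}
        self.gauges = set()

    def register(self, name, gauge=False):
        self.values[name] = 0
        if gauge:
            self.gauges.add(name)

    def inc(self, name):
        self.values[name] += 1

    def dec(self, name):
        self.values[name] -= 1

    def reset(self):
        for name in self.values:
            if name not in self.gauges:
                self.values[name] = 0


def simulate(gauge: bool) -> int:
    """Open two connections, reset, close one; return the reported value."""
    pc = ToyPerfCounters()
    pc.register("active_connections", gauge=gauge)
    pc.inc("active_connections")
    pc.inc("active_connections")
    pc.reset()
    pc.dec("active_connections")
    return pc.values["active_connections"]
```

In this model the non-gauge registration ends at a negative value after the reset, while the gauge still reports the one connection that remains open.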
Jamie Pryde [Fri, 1 May 2026 09:45:42 +0000 (10:45 +0100)]
cmake: Fix ISA-L build on arm
A typo in CFLAGS means we're passing an empty string to configure_cmd.
We are then overwriting the build environment CFLAGS with our empty string CFLAGS,
which can result in build failures in certain environments, as seen in the tracker.
This fix gets any build environment CFLAGS and appends the other flags
we want to use when building ISA-L 2.32.0.
mgr/dashboard: "Access Denied" being shown on overview page for read-only user
Fix: https://tracker.ceph.com/issues/76293
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
Alex Ainscow [Mon, 27 Apr 2026 16:46:40 +0000 (17:46 +0100)]
osd: Avoid assertion on empty object read when reading multiple objects
Tracker 75432 hits an assert which is attempting to protect the system
against hanging, due to generating a read request which sends no messages.
The assert fired because recovery was attempting to read multiple objects
in a single read request. One object did not require any further shard
reads in order to recover, while the other did. The consequence is that
the assert fired on one of the objects.
The problem is simply that the assert is in the wrong place.
Fixes: https://tracker.ceph.com/issues/75432
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Kefu Chai [Wed, 6 May 2026 03:32:03 +0000 (11:32 +0800)]
test/osd: add perf test for calc_pg_upmaps with mutual overfull upmap pairs
Reproduces the scenario from https://tracker.ceph.com/issues/63137: 500
cohort OSDs all at deviation +2, linked by pg_upmap_items rings with R=5
pairs each. Before the fix, every incoming pair triggered a test_change
call that got rejected -- calc_pg_upmaps took ~11 minutes. After, it
exits in ~200ms.
Hu-Yuxuan [Mon, 9 Oct 2023 09:49:54 +0000 (17:49 +0800)]
osd/OSDMap: skip upmap-items drop when um_from would become equally overfull
Improves pg-upmap balancer convergence speed on clusters where many OSDs
are mutually overfull and linked by pg_upmap_items -- a common pattern
after a scale-up that increases replica count.
When try_drop_remap_overfull found an incoming [um_from -> osd] pair on an
overfull osd, it called test_change to check whether dropping the pair would
improve distribution. If um_from carries the same excess load as osd,
though, dropping just returns the PG to an equally crowded OSD, and test_change
rejects the move anyway. With many such mutual pairs, the wasted calls add
up to minutes before the balancer gives up.
This adds an early check: skip test_change when um_from would end up at
least as overfull as osd after receiving the PG back. This eliminates
O(n_cohort * R) wasted test_change calls per balancer pass and reduces
calc_pg_upmaps from minutes to milliseconds in the affected scenario.
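The early check can be sketched as a small predicate. This is an illustration under stated assumptions, not the OSDMap code: deviations are modelled as PG counts above target, dropping the pair moves exactly one PG back to um_from, and the comparison uses both deviations as they would be after the move. should_skip_drop is a hypothetical name.

```python
def should_skip_drop(deviation, um_from, osd):
    """Return True if dropping [um_from -> osd] cannot improve balance.

    deviation maps OSD id to PGs above (+) or below (-) target.
    Assumption: the drop moves one PG from osd back to um_from.
    """
    from_after = deviation[um_from] + 1  # um_from receives the PG back
    osd_after = deviation[osd] - 1       # osd gives the PG up
    return from_after >= osd_after


def count_test_change_calls(deviation, pairs):
    """Count the pairs that would still need the expensive test_change call."""
    return sum(1 for um_from, osd in pairs
               if not should_skip_drop(deviation, um_from, osd))
```

For two OSDs that are both at deviation +2, the predicate skips the pair. If um_from is well under target instead, the pair still goes through test_change.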
This commit removes centos9 from crimson's supported distros. This is in
line with the wider Ceph project moving from centos9 to rocky10. We have
established that crimson is compatible with rocky10. More details can be
found in this tracker: https://tracker.ceph.com/issues/75823.
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Ronen Friedman [Sat, 2 May 2026 15:53:49 +0000 (15:53 +0000)]
crimson/osd: defer snap trimming while scrubbing
Classic OSD enforces mutual exclusion between scrubbing and snap
trimming via the WaitScrub state in the snap trim state machine.
Crimson was missing this, allowing both to run concurrently on the
same PG (visible as active+clean+scrubbing+deep+snaptrim), which
could prevent snap trimming from completing within the expected
timeout.
Defer snap trim initiation while PG_STATE_SCRUBBING is set, and
re-trigger it from notify_scrub_end() via kick_snap_trim().
This is a temporary fix until the full scrub scheduling code,
including is_scrub_queued_or_active(), is merged.
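The defer-and-retrigger behaviour can be modelled in a few lines. This is a toy sketch and not Crimson code: ToyPG is hypothetical, and the snap_trim_pending flag is an assumption about how the deferred request is remembered.

```python
class ToyPG:
    """Toy PG: snap trim is deferred while scrubbing, re-kicked at scrub end."""

    def __init__(self):
        self.scrubbing = False
        self.snap_trim_pending = False  # assumed bookkeeping for a deferred trim
        self.trims_started = 0

    def kick_snap_trim(self) -> bool:
        if self.scrubbing:
            # Defer: remember the request instead of trimming concurrently.
            self.snap_trim_pending = True
            return False
        self.snap_trim_pending = False
        self.trims_started += 1
        return True

    def notify_scrub_start(self):
        self.scrubbing = True

    def notify_scrub_end(self):
        self.scrubbing = False
        if self.snap_trim_pending:
            self.kick_snap_trim()
```

A trim requested during a scrub does not start until notify_scrub_end() runs, at which point it starts exactly once.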
Add a handler for Transaction::OP_MERGE_COLLECTION in SeaStore's
_do_transaction_step() and the corresponding _merge_collection()
implementation. Since coll_t is not part of the onode key, no
onode re-keying is needed: the operation updates the destination
collection's split_bits and removes the source collection from the
collection B-tree, all within a single transaction.
Add a handler for Transaction::OP_MERGE_COLLECTION in CyanStore's
do_transaction_no_callbacks() and the corresponding _merge_collection()
implementation. Moves all objects from the source collection into the
destination, updates the destination's split_bits, and removes the
source collection from the store's collection map.
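The three steps of the CyanStore-style merge described above (move objects, update split_bits, remove the source) can be sketched with plain dictionaries. ToyStore and ToyCollection are hypothetical types for illustration and do not mirror the real classes.

```python
class ToyCollection:
    def __init__(self, split_bits):
        self.split_bits = split_bits
        self.objects = {}  # object id -> data


class ToyStore:
    def __init__(self):
        self.coll_map = {}  # collection id -> ToyCollection

    def merge_collection(self, src, dest, bits):
        """Merge collection src into dest and drop src from the map."""
        source = self.coll_map[src]
        target = self.coll_map[dest]
        # 1. Move every object from the source into the destination.
        target.objects.update(source.objects)
        source.objects.clear()
        # 2. Update the destination's split_bits.
        target.split_bits = bits
        # 3. Remove the source collection from the collection map.
        del self.coll_map[src]
```

After the call, the destination holds the objects of both collections and the source ID is no longer present in coll_map.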
Fix osd/SnapMapper::update_snaps() to handle a missing OBJ_ record
by falling back to add_oid() instead of silently creating an
inconsistent state (OBJ_ without matching SNA_ entries). This
was observed on replicas that had recently recovered objects:
the snap mapper entries created during recovery were not visible
to a subsequent snap-trim repop's update_snaps() call, leaving
the clone with no snap mapper entries. Scrub would then detect
and report the inconsistency as an error.
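The fallback can be illustrated with a toy mapper that keeps the two record kinds in separate containers. This is a simplified model and not the SnapMapper implementation: ToySnapMapper, its storage layout, and the consistent() check are assumptions made for illustration.

```python
class ToySnapMapper:
    """Toy model: obj holds OBJ_ records, sna holds SNA_ (snap, oid) entries."""

    def __init__(self):
        self.obj = {}     # oid -> set of snaps
        self.sna = set()  # (snap, oid) pairs

    def add_oid(self, oid, snaps):
        # Writes the OBJ_ record together with all matching SNA_ entries.
        self.obj[oid] = set(snaps)
        self.sna.update((s, oid) for s in snaps)

    def update_snaps(self, oid, new_snaps):
        if oid not in self.obj:
            # Missing OBJ_ record: fall back to add_oid() so that both
            # record kinds are created together.
            self.add_oid(oid, new_snaps)
            return
        new = set(new_snaps)
        for s in self.obj[oid] - new:
            self.sna.discard((s, oid))
        self.obj[oid] = new

    def consistent(self) -> bool:
        expected = {(s, oid) for oid, snaps in self.obj.items() for s in snaps}
        return expected == self.sna
```

With the fallback in place, calling update_snaps() on an object that has no OBJ_ record leaves the toy mapper consistent, which is the property scrub checks for.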
Promote snap mapper remove_oid/clear_snaps logging to dout(10)
and add apply_op_stats tracing to aid diagnosis of any remaining
stat or snap mapper drift.
Gil Bregman [Tue, 5 May 2026 08:53:25 +0000 (11:53 +0300)]
mgr/dashboard: Allow empty port value when adding a listener in NVMEoF CLI
Fixes: https://tracker.ceph.com/issues/76410
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
rgw/test: add Journal mode support to bucket logging test suite
Add --logging-type flag to run the Python bucket logging test suite
in either Standard or Journal mode. The same tests run against both
logging types with no changes to test logic or assertions.
- Add --logging-type pytest CLI option (Standard default, Journal opt-in)
- Detect boto3 LoggingType extension availability at session startup
- Thread logging_type through helpers and test functions
- Add teuthology task YAML for Journal mode suite runs
- Install service-2.sdk-extras.json in the teuthology task when
logging_type is Journal (s3tests cleans it up after its own run,
so the file isn't available by the time our Journal job runs)
- Document Journal mode local usage in the test suite README
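A pytest CLI option of this shape is typically registered in conftest.py. The sketch below shows one way to do it, assuming only the option name and default given above; the helper logging_type_from is hypothetical and the real suite may wire the value through differently.

```python
VALID_LOGGING_TYPES = ("Standard", "Journal")


def pytest_addoption(parser):
    # pytest calls this hook at startup; addoption accepts argparse-style
    # keyword arguments such as default and choices.
    parser.addoption(
        "--logging-type",
        action="store",
        default="Standard",
        choices=VALID_LOGGING_TYPES,
        help="bucket logging type to run the suite against",
    )


def logging_type_from(config):
    """Hypothetical helper: read the selected logging type from pytest config."""
    return config.getoption("--logging-type")
```

Running the suite without the flag would select Standard, and passing --logging-type Journal would opt in to Journal mode.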
The LibRadosAio.PoolEIOFlag test was unstable and was skipped due to:
- Test Unreliability (Timing Dependency): The test used a fixed
iteration count for IO submission and finished too quickly, missing the
target error.
* Fix: The submission loop now runs continuously (time-bounded)
until the EIO error is reliably caught, eliminating the timing issue.
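The difference between a fixed iteration count and a time-bounded loop can be sketched as follows. This is an illustration, not the test's code: submit_until_eio is a hypothetical helper, and submit is assumed to return 0 on success or a negative errno value on failure.

```python
import errno
import time


def submit_until_eio(submit, timeout_s, clock=time.monotonic):
    """Call submit() until it returns -EIO or the time budget runs out.

    Returns the number of submissions made before -EIO was seen.
    Raises TimeoutError if -EIO never shows up within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while clock() < deadline:
        attempts += 1
        if submit() == -errno.EIO:
            return attempts
    raise TimeoutError(f"no EIO after {attempts} submissions")
```

The loop ends as soon as the error is observed, so a fast machine does not finish early without seeing it, and the deadline keeps a broken setup from spinning forever.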
Nitzan Mordechai [Wed, 26 Nov 2025 14:36:42 +0000 (14:36 +0000)]
aio_cxx: Fix mutual deadlock in PoolEIOFlag test
The LibRadosAio.PoolEIOFlag test was unstable; it was hanging due to:
- Deadlock: The main thread held the shared mutex ('my_lock') while
calling thread join. This created a mutual deadlock.
* Fix: Mutex is unlocked before thread join using RAII scopes.
- Also convert to std::jthread and drop the explicit join.
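The deadlock pattern and its fix translate directly to other threading APIs. The sketch below uses Python's threading module as an analogue of the C++ test, with a `with` block playing the role of the RAII scope; the join timeout exists only so the broken variant can be demonstrated without hanging.

```python
import threading


def join_while_holding_lock(timeout_s=0.2):
    """Broken pattern: join while holding the lock the worker needs.

    Returns True if the worker was still stuck when the join timed out.
    """
    lock = threading.Lock()

    def worker():
        with lock:
            pass

    t = threading.Thread(target=worker)
    with lock:
        t.start()
        t.join(timeout=timeout_s)  # worker cannot finish: we hold the lock
        stuck = t.is_alive()
    t.join()  # the lock is released now, so the worker can exit
    return stuck


def join_after_releasing_lock():
    """Fixed pattern: the locked scope ends before the join."""
    lock = threading.Lock()

    def worker():
        with lock:
            pass

    t = threading.Thread(target=worker)
    with lock:
        t.start()
    t.join(timeout=5)
    return not t.is_alive()
```

Without the timeout, the first variant would hang forever, which matches the symptom described for the test.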
seastore/omap_manager/btree: change omap manager funcs to coroutines
This commit changes funcs in BTree OMap manager to coroutines. Apart
from producing cleaner code that's easier to follow, this is done to fix
ASan heap-use-after-free asserts.
Example QA job with the error: https://pulpito.ceph.com/shraddhaag-2026-04-20_07:04:25-crimson-rados-main-distro-debug-trial/164374/
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
seastore/omap_manager/btree: change node insert/del funcs to coroutines
This commit changes OMapLeafNode and OMapInnerNode funcs to coroutines
to improve readability and prevent any ASan heap-use-after-free asserts.
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
* refs/pull/67536/head:
qa/multisite: enable the multisite test for oidc.
rgw/oidc: plumb RGWObjVersionTracker through load/store for race detection
rgw/oidc rados: add rgwrados::oidcs namespace abstraction for cls_user for accounts.
rgw/rest-oidc: Forward all oidc mutation requests to master zone.
rgw/oidc: add rgwrados::oidc interface to support multisite.