Adam Kupczyk [Thu, 25 Sep 2025 07:03:12 +0000 (03:03 -0400)]
extblkdev/fcm: Refuse to operate on multi-device lvm block devices
BlueStore decides where data is placed on the device.
Merging 2 FCM devices together means that BlueStore will see free space
on one of the devices, but not know that the other one, where it is asking
to put data, is full. This causes -ENOSPC while free space is still reported.
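To make the failure mode concrete, here is a toy model (pure illustration, not Ceph code) of how a merged free-space view diverges from per-device reality:
```
# Toy model: BlueStore sees one merged free-space total, but each write
# actually lands on one specific device, which may already be full.
devices = {
    "fcm0": {"size": 100, "used": 100},  # full
    "fcm1": {"size": 100, "used": 10},   # mostly free
}

def merged_free() -> int:
    # The view after two FCM devices are merged: a single free-space number.
    return sum(d["size"] - d["used"] for d in devices.values())

def write(dev: str, length: int) -> int:
    # Reality: the write targets one device.
    free = devices[dev]["size"] - devices[dev]["used"]
    if length > free:
        return -28  # -ENOSPC, even though merged_free() still reports space
    devices[dev]["used"] += length
    return 0

print(merged_free())      # 90 -> free space is reported
print(write("fcm0", 10))  # -28 -> but the chosen device is full
```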
Adam Kupczyk [Thu, 22 Jan 2026 15:23:56 +0000 (15:23 +0000)]
os/bluestore: Add config bluestore_use_ebd
When an EBD (extblkdev) plugin is in use, it usually needs to be present all the time.
For BlueStore deployed with an EBD plugin, it is now an error if BlueStore tries
to mount while the EBD plugin is not present.
The preload of extblkdev plugins was misplaced.
Plugin loading has been moved into BlueStore.
This way both the OSD and tools can load plugins.
Plugins are now loaded only (see the sketch below):
- before mkfs
- when an extblkdev plugin is signalled in the label meta
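A minimal sketch of that loading rule (a hypothetical Python stand-in; the real loader is C++ inside BlueStore, and `label_meta` here just models the device-label metadata):
```
def should_load_extblkdev_plugin(action: str, label_meta: dict) -> bool:
    """Hypothetical model: load plugins only before mkfs, or when the
    device label metadata signals that an extblkdev plugin is in use."""
    if action == "mkfs":
        return True  # always load before mkfs
    # On mount, load only if the label meta names a plugin.
    return "extblkdev_plugin" in label_meta

print(should_load_extblkdev_plugin("mount", {}))                           # False
print(should_load_extblkdev_plugin("mount", {"extblkdev_plugin": "fcm"}))  # True
print(should_load_extblkdev_plugin("mkfs", {}))                            # True
```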
* refs/pull/68347/head:
nvmeofgw: propagate quorum feature to the NVMeofMonClient,
fix upgrade
code review changes
nvmeofgw: disaster set/clear command, introduced disaster-locations map,
nvmeofgw: added support to nvmeof stretched cluster:
nvmeofgw: prevent map corruption while processing beacons from deleted gws
mon: add NVMEOF_BEACON_DIFF to mon_feature_t and mon CompatSet
nvmeofgw: beacon diff implementation in the monitor and in the MonClient.
Reviewed-by: Alexander Indenbaum <aindenba@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
* refs/pull/67321/head:
qa: set column for insertion
qa: bail sqlite3 on any error
qa: use actual sqlite3 blob instead of string
test: use json_extract instead of awkward json_tree
Patrick Donnelly [Thu, 30 Apr 2026 21:01:45 +0000 (14:01 -0700)]
Merge PR #68371 into tentacle
* refs/pull/68371/head:
qa/tasks/pykmip: archive pykmip log after server down
qa/tasks/pykmip: use OpenSSL names instead of IANA
qa/tasks/pykmip: drop py2 deps
Revert "qa/rgw/crypt: disable failing kmip testing"
Patrick Donnelly [Thu, 30 Apr 2026 21:00:45 +0000 (14:00 -0700)]
Merge PR #66358 into tentacle
* refs/pull/66358/head:
rgw/auth: a forwarded CreateBucket request in case of multisite has an empty
rgw/s3: Always include x-amz-content-sha256 header in AWS v4 signatures for S3 compatibility
Patrick Donnelly [Thu, 30 Apr 2026 15:52:36 +0000 (08:52 -0700)]
Merge PR #68512 into tentacle
* refs/pull/68512/head:
mgr/dashboard: sync policy created for a bucket in Object >> Multi-site >> Sync-policy, is not reflecting under bucket's replication
Patrick Donnelly [Thu, 30 Apr 2026 15:51:59 +0000 (08:51 -0700)]
Merge PR #67840 into tentacle
* refs/pull/67840/head:
mgr/dashboard: Fix make check failures
mgr/dashboard: Round off y-axis value of area chart
mgr/dashboard: Fix padding of overview page
mgr/dashboard: Add capacity thresholds
mgr/dashboard: Fix loading states in storage overview card
mgr/dashboard: Add tooltip to storage overview
mgr/dashboard: Fixing message when prometheus is disabled in performance charts
mgr/dashboard: show miscellaneous data used
mgr/dashboard: fix consumption chart units
mgr/dashboard: rename expand-cluster to add-storage
mgr/dashboard: update onboarding screen as per design
mgr/dashboard: Fix scrubbing state
mgr/dashboard: Fix snapshot Api firing twice
mgr/dashboard: Add data resiliency panel
mgr/dashboard: Add data resiliency card
mgr/dashboard: add storage consumption card
mgr/dashboard: update telemetry notification for simple mode
mgr/dashboard: revamp onboarding screen
mgr/dashboard: Generic Performance Chart - Carbon
mgr/dashboard: Add filtering of alerts via route
mgr/dashboard: Add skeleton states for alerts card
mgr/dashboard: Fix css in alerts card
mgr/dashboard: Fix breaking layout in overview page
mgr/dashboard: Add hardware tab to health card
mgr/dashboard: Added variations of alerts card sub total layout
mgr/dashboard: Css fixes for health card and alerts card
fix for quorum in API
mgr/dashboard: Add systems tab to health card
mgr/dashboard: Add alerts card
mgr/dashboard: Add health check panel
mgr/dashboard: Add health card
mgr/dashboard: side-panel enhancements
mgr/dashboard: introduce side panel as a reusable component
mgr/dashboard: Removed Raw capacity toggle
mgr/dashboard: Added unit tests
Added query data
mgr/dashboard: Added tool definition tip
Added query total and used capacity data
mgr/dashboard: Add storage card to overview page
Leonid Chernin [Tue, 17 Mar 2026 15:40:16 +0000 (17:40 +0200)]
nvmeofgw: propagate quorum feature to the NVMeofMonClient,
reverted feature bit NVMEOF_BEACON_DIFF:
- NVMeofGwMon adds a quorum_features indication to the MonClient map.
- MonClient initially sends beacons without applying the BEACON_DIFF logic.
- MonClient begins applying the BEACON_DIFF logic only when the BEACON_DIFF bit
  is set in the quorum_features field of the NVMeoF monitor map.
- added mon commands:
  nvme-gw set beacon-diff disable
  nvme-gw set beacon-diff enable
- performed changes in encode/decode of the BEACON_DIFF feature
- reverted the NVMEOF_BEACON_DIFF bit
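The gating described above can be sketched as follows (names such as `quorum_features` and the bit value are illustrative, not the actual C++ MonClient code):
```
NVMEOF_BEACON_DIFF = 1 << 0  # illustrative feature-bit value

def build_beacon(subsystems: dict, last_acked: dict, quorum_features: int) -> dict:
    """Send full beacons until the quorum advertises BEACON_DIFF support."""
    if quorum_features & NVMEOF_BEACON_DIFF:
        # Diff mode: encode only subsystem state that changed since the
        # last acknowledged beacon.
        return {"diff": {k: v for k, v in subsystems.items()
                         if last_acked.get(k) != v}}
    # Legacy mode: full beacon every time.
    return {"full": dict(subsystems)}

subs = {"sub1": "optimized", "sub2": "inaccessible"}
print(build_beacon(subs, {}, quorum_features=0))           # full beacon
print(build_beacon(subs, dict(subs), NVMEOF_BEACON_DIFF))  # empty diff
```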
Leonid Chernin [Thu, 23 Oct 2025 05:48:24 +0000 (08:48 +0300)]
nvmeofgw: added support to nvmeof stretched cluster:
- added GW commands: set location, and set admin state enable/disable
- added start-failback <location> command
- failover logic now takes GW location into account
- implemented the GW admin enable/disable commands
- added a map for location-failback-in-progress
- failback between locations happens only via a monitor command
- implemented a new ana-group relocation process, used
  when an inter-location failback command is sent
- added upgrade rules
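A rough sketch of location-aware failover under these rules (all names hypothetical; the real logic lives in the NVMeofGw monitor code):
```
# Hypothetical model: on GW failure, prefer an enabled, up GW in the same
# location; crossing locations for failback is never automatic -- it only
# happens via the monitor's start-failback <location> command.
def pick_failover_target(failed_gw: dict, gws: list) -> dict | None:
    candidates = [g for g in gws
                  if g["up"] and g["admin_state"] == "enabled"
                  and g["id"] != failed_gw["id"]]
    same_loc = [g for g in candidates if g["location"] == failed_gw["location"]]
    if same_loc:
        return same_loc[0]          # stay within the location when possible
    return candidates[0] if candidates else None  # cross-location takeover

gws = [
    {"id": "gw1", "location": "dc1", "admin_state": "enabled", "up": False},
    {"id": "gw2", "location": "dc1", "admin_state": "enabled", "up": True},
    {"id": "gw3", "location": "dc2", "admin_state": "enabled", "up": True},
]
print(pick_failover_target(gws[0], gws)["id"])  # gw2 -- same location preferred
```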
Leonid Chernin [Mon, 8 Dec 2025 20:54:44 +0000 (22:54 +0200)]
nvmeofgw: prevent map corruption while processing beacons from deleted gws
Fix a race causing map corruption when a deleted gw sends beacons:
the gw's data was removed from the pending map but still exists in the main map.
Beacons are now processed only if the GW's data exists in both maps,
main-map and pending-map; otherwise the beacons are simply ignored.
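In outline (illustrative Python; the actual fix is in the C++ monitor):
```
def handle_beacon(gw_id: str, main_map: dict, pending_map: dict, beacon: dict) -> None:
    # Process only when the GW exists in BOTH maps; a GW deleted from the
    # pending map but lingering in the main map is ignored, closing the race.
    if gw_id not in main_map or gw_id not in pending_map:
        return  # deleted GW: drop the beacon instead of corrupting the maps
    pending_map[gw_id].update(beacon)

main_map = {"gw1": {"state": "down"}}
pending_map = {}  # gw1 already removed from the pending map
handle_beacon("gw1", main_map, pending_map, {"state": "up"})
print(pending_map)  # {} -- the stale beacon was safely ignored
```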
Samuel Just [Thu, 6 Nov 2025 23:54:50 +0000 (23:54 +0000)]
mon: add NVMEOF_BEACON_DIFF to mon_feature_t and mon CompatSet
In order for the client to safely send BEACON_DIFF messages, it
needs to be the case that the leader at the time of receipt will
support BEACON_DIFF.
Simply using the connection features for the MonClient's target mon is
insufficient, because it might be a peon. If the peon supports
BEACON_DIFF and the leader does not, the leader will either crash or
interpret it as a full BEACON. Neither outcome is acceptable.
Instead, we need to wire up a feature bit to the MonMap mon_feature_t
members and the CompatSet.
Adding FEATURE_BEACON_DIFF to ceph::features::mon get_supported()
and get_persistent() ensures that once all monitors in the quorum
support it, MonMap::get_required_features() will include it.
See Elector::propose_to_peers, Monitor::(win|lose)_election,
MonmapMonitor::apply_mon_features.
Once FEATURE_BEACON_DIFF is present in MonMap::get_required_features():
- Monitor::apply_monmap_to_compatset_features() will prevent
downgrades of the monitors by updating the CompatSet to include
CEPH_MON_FEATURE_INCOMPAT_NVMEOF_BEACON_DIFF
- Monitor::calc_quorum_requirements() will set
Monitor::required_features to require the NVMEOF_BEACON_DIFF
for any monitor peers.
- MonClient::get_monmap_required_features() will eventually include
ceph::features::mon::FEATURE_NVMEOF_BEACON_DIFF.
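The quorum-wide mechanism can be modeled as a set intersection (a sketch of the idea, not the mon_feature_t/CompatSet implementation):
```
def required_features(quorum_mon_features: list) -> set:
    # A persistent feature becomes required only once EVERY monitor in the
    # quorum supports it -- which is what lets the MonClient trust that the
    # leader, whoever it turns out to be, can decode BEACON_DIFF.
    return set.intersection(*map(set, quorum_mon_features))

old_mon = {"KRAKEN", "LUMINOUS"}
new_mon = {"KRAKEN", "LUMINOUS", "NVMEOF_BEACON_DIFF"}
print(required_features([new_mon, new_mon, old_mon]))  # BEACON_DIFF not yet required
print(required_features([new_mon, new_mon, new_mon]))  # now includes BEACON_DIFF
```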
Leonid Chernin [Mon, 15 Sep 2025 11:04:04 +0000 (14:04 +0300)]
nvmeofgw: beacon diff implementation in the monitor and in the MonClient.
- monclient encodes subsystems by the beacon-diff rules if the BEACON_DIFF
  bit is enabled by the quorum
- monitor processes beacons by the new beacon-diff schema
- monitor detects a sequence out-of-order (ooo) condition and handles it
- when ooo is detected, the monitor sends an ack to the gw with the expected correct sequence
- monitor skips failovers for some interval when ooo is detected
- monitor ignores all beacons with incorrect sequences until the gw sends the expected one
- coded the upgrade rules
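A compact model of the out-of-order handling (hypothetical field names; the real implementation spans the monitor and the MonClient):
```
class BeaconSeqTracker:
    """Tracks the expected beacon sequence for one GW, per the rules above."""
    def __init__(self) -> None:
        self.expected = 0
        self.failover_paused = False  # monitor skips failovers while set

    def on_beacon(self, seq: int):
        if seq != self.expected:
            # Out of order: ack with the expected sequence and pause
            # failovers; incorrect sequences are ignored until it arrives.
            self.failover_paused = True
            return ("ack", self.expected)
        self.expected += 1
        self.failover_paused = False
        return ("process", seq)

t = BeaconSeqTracker()
print(t.on_beacon(0))  # ('process', 0)
print(t.on_beacon(2))  # ('ack', 1) -- gap detected, request the expected seq
print(t.on_beacon(1))  # ('process', 1)
```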
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
Fixes: https://tracker.ceph.com/issues/72394
(cherry picked from commit 3555a28e45c5b44289f12abe2fc843e21c7ebf87)
Conflicts:
src/mon/NVMeofGwMon.h
check_beacon_timeout() was present in both tentacle and main, but was not part of this commit.
Conflict resolution: keep it, since it is required for fast failover.
src/test/CMakeLists.txt
The conflict was an empty line at the end of the file in main that is absent in tentacle.
Patrick Donnelly [Tue, 28 Apr 2026 19:26:46 +0000 (12:26 -0700)]
Merge PR #68357 into tentacle
* refs/pull/68357/head:
qa/suites/upgrade/telemetry-upgrade: ignore expected health warning
qa/suites/orch/cephadm: replace "reef" with "v18.2.8"
qa/suites/fs/upgrade/mds_upgrade_sequence: replace "reef" with "v18.2.8"
qa/suites/upgrade: use tagged versions of reef
rgw: read_obj_policy() consults s3:prefix when deciding between 403/404
When read_obj_policy() gets ENOENT, it only returns 404 NoSuchKey if the
requester has s3:ListBucket permission. However, a policy that allows
s3:ListBucket may be conditional on s3:prefix to restrict listings
to certain paths/object names. Add the requested object name to the IAM
environment as s3:prefix to match AWS behavior here.
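The 403/404 decision can be sketched like this (illustrative only; the real evaluation happens in RGW's IAM policy engine):
```
def status_for_missing_object(policy_allows_list, object_name: str) -> int:
    # New behavior: the requested object name is placed in the IAM
    # environment as s3:prefix, so prefix-conditioned s3:ListBucket
    # policies behave as they do on AWS.
    env = {"s3:prefix": object_name}
    if policy_allows_list(env):
        return 404  # NoSuchKey: the requester could have listed this path
    return 403      # AccessDenied: don't reveal whether the key exists

# Example policy: ListBucket allowed only under the "public/" prefix.
allow_public = lambda env: env["s3:prefix"].startswith("public/")
print(status_for_missing_object(allow_public, "public/a.txt"))   # 404
print(status_for_missing_object(allow_public, "private/a.txt"))  # 403
```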
Patrick Donnelly [Mon, 27 Apr 2026 20:46:49 +0000 (16:46 -0400)]
Merge PR #68148 into tentacle
* refs/pull/68148/head:
qa: ignore NVMEOF_GATEWAY_DOWN in nvmeof_scalability.yaml
qa/tasks/nvmeof.py: retry do_check if gw in CREATED
qa/tasks/nvmeof.py: Fix thrasher daemon_rm revival
Patrick Donnelly [Mon, 27 Apr 2026 19:28:45 +0000 (15:28 -0400)]
Merge PR #68038 into tentacle
* refs/pull/68038/head:
librbd: store CRC32C with initial value -1 to match msgr2 validation
librbd: add rbd_aio_write_with_crc32c API for precomputed checksums
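For context on the CRC32C commits above: seeding the checksum with -1 (all bits set) is what lets a precomputed value match msgr2-style validation. Here is a small pure-Python reference of CRC-32C (Castagnoli) with that seed, as a sketch only (no final XOR, following Ceph's ceph_crc32c convention as I understand it; the real code is C):
```
def crc32c(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78), seeded with -1."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc

# A client could precompute this over a write buffer and hand it to the
# new API mentioned in the commit above, instead of having the messenger
# recompute it.
print(hex(crc32c(b"hello rbd")))
```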
Patrick Donnelly [Wed, 22 Apr 2026 17:22:51 +0000 (13:22 -0400)]
Merge PR #67418 into tentacle
* refs/pull/67418/head:
mgr/cephadm: validate hostname in NodeProxyCache
node-proxy: improve HTTP error logging in client
node-proxy: get serial number instead of SKU
node-proxy: allow multiple sources per component
node-proxy: re-auth and retry once on 401
node-proxy: fix flake8 E721 in _dict_diff
node-proxy: make the update loop interval configurable
mgr/node-proxy: fix "ceph orch hardware status --category criticals"
node-proxy: normalize storage data per member
node-proxy: encapsulate send logic in dedicated method
node-proxy: log actual data delta in reporter
node-proxy: add periodic heartbeats in main and reporter loops
node-proxy: adjust log levels
node-proxy: add unit tests
node-proxy: add tox config for mypy, flake8, isort, black
node-proxy: black and isort formatting pass
node-proxy: fix mypy errors
node-proxy: handle nested Redfish paths for components
node-proxy: split out config, bootstrap and redfish logic
node-proxy: refactor config loading
node-proxy: add 'vendor based' redfish system selection
node-proxy: introduce component spec registry and overrides for updates
mgr/cephadm: safe status/health access in node-proxy agent and inventory
node-proxy: narrow build_data exception handling and re-raise
node-proxy: refactor Endpoint/EndpointMgr and fix chassis paths
node-proxy: use safe field access in storage update
node-proxy: reduce log verbosity for missing optional fields
Patrick Donnelly [Wed, 22 Apr 2026 17:21:13 +0000 (13:21 -0400)]
Merge PR #67343 into tentacle
* refs/pull/67343/head:
ceph-volume: fix test_reject_readonly_device unit test
ceph-volume: single lvs call to speed up exclude_lvm_osd_devices
ceph-volume: avoid Device() instantiation in lvm OSD filtering
ceph-volume: avoid RuntimeError on ceph-volume raw list with non-existent loop devices
Patrick Donnelly [Wed, 22 Apr 2026 14:38:38 +0000 (10:38 -0400)]
Merge PR #67850 into tentacle
* refs/pull/67850/head:
mgr, qa: clarify module checks in DaemonServer
mgr, qa: add `pending_modules` to asock command
mgr, common, qa, doc: issue health error after max expiration is exceeded
mgr: ensure that all modules have started before advertising active mgr
mgr/dashboard: Round off y-axis value of area chart
- by default the y-axis is set to 1 for all
- the value round-off for the area chart is separated from the y-axis ticks
- also fixes a bug where all IOPS y-ticks were repeated as 1,1,0,0
Laura Flores [Fri, 12 Sep 2025 20:14:30 +0000 (20:14 +0000)]
mgr, qa: clarify module checks in DaemonServer
The current check groups modules that are not
enabled together with modules that failed to initialize.
In this commit, we reorder the checks:
1. Screen for whether a module is enabled. If it's not,
issue an EOPNOTSUPP with instructions on how
to enable it.
2. Screen for whether a module is active. If a module
is enabled, then the cluster expects it to
be active to support commands. If the module
took too long to initialize, though, we will
catch this and issue an ETIMEDOUT error with
a link for troubleshooting.
Now these two separate issues are no longer grouped
together, and they are checked in the right order.
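A condensed sketch of the reordered checks (illustrative; the real checks live in DaemonServer, and the message texts here are paraphrased):
```
import errno

def check_module(name: str, enabled: set, active: set):
    # Check 1: is the module enabled? If not, EOPNOTSUPP with how to fix it.
    if name not in enabled:
        return (-errno.EOPNOTSUPP,
                f"Module '{name}' is not enabled: use "
                f"`ceph mgr module enable {name}` to enable it")
    # Check 2: is it active? An enabled module that is still initializing
    # gets ETIMEDOUT (with a troubleshooting link) rather than "not enabled".
    if name not in active:
        return (-errno.ETIMEDOUT,
                f"Module '{name}' is enabled but has not finished initializing")
    return (0, "ok")

print(check_module("volumes", set(), set()))              # EOPNOTSUPP path
print(check_module("volumes", {"volumes"}, set()))        # ETIMEDOUT path
print(check_module("volumes", {"volumes"}, {"volumes"}))  # ok
```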
Laura Flores [Thu, 11 Sep 2025 22:13:51 +0000 (22:13 +0000)]
mgr, qa: add `pending_modules` to asock command
Now, the command `ceph tell mgr mgr_status` will show a
"pending_modules" field. This is another way for Ceph operators
to check which modules haven't been initialized yet (in addition
to the health error).
This command was also added to testing scenarios in the workunit.
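Sketched as data, the new field might be consumed like this (a hypothetical output shape; the exact keys of `ceph tell mgr mgr_status` depend on the Ceph version):
```
import json

# Hypothetical shape of the asock command output with the new field.
status_json = '{"available": false, "pending_modules": ["telemetry", "volumes"]}'

status = json.loads(status_json)
if status["pending_modules"]:
    print("still initializing:", ", ".join(status["pending_modules"]))
else:
    print("all mgr modules initialized")
```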
Laura Flores [Tue, 29 Jul 2025 22:46:46 +0000 (22:46 +0000)]
mgr, common, qa, doc: issue health error after max expiration is exceeded
----------------- Enhancement to the Original Fix -----------------
During a mgr failover, the active mgr is marked available if:
1. The mon has chosen a standby to be active
2. The chosen active mgr has all of its modules initialized
Now that we've improved the criteria for sending the "active" beacon
by making it retry until the mgr modules have initialized, we need to account
for extreme cases in which the modules are stuck loading for a very long
time, or even indefinitely. In these extreme cases where the modules might
never initialize, we don't want to delay sending the "active" beacon for
too long. This can result in blocking other important mgr functionality,
such as reporting PG availability in the health status. We want
to avoid sending warnings about PGs being unknown in the health status when
that's not ultimately the problem.
To account for an exceptionally long module loading time, I added a new
configurable `mgr_module_load_expiration`. If we exceed this maximum amount
of time (in ms) allotted for the active mgr to load the mgr modules before declaring
availability, the standby will then proceed to mark itself "available" and
send the "active" beacon to the mon and unblock other critical mgr functionality.
If this happens, a health error will be issued at this time, indicating
which mgr modules got stuck initializing (See src/mgr/PyModuleRegistry.cc). The
idea is to unblock the rest of the mgr's critical functionality while making it
clear to Ceph operators that some modules are unusable.
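A timing sketch of the expiration rule (a simplification with hypothetical names, including the health-code string; the real logic spans MgrStandby and PyModuleRegistry):
```
MGR_MODULE_LOAD_EXPIRATION_MS = 5000  # illustrative value of the new config

def availability(pending: set, started_at_ms: int, now_ms: int):
    """Return (available, health_error) following the rule described above."""
    if not pending:
        return True, None  # normal path: every module finished starting
    if now_ms - started_at_ms > MGR_MODULE_LOAD_EXPIRATION_MS:
        # Expired: declare availability anyway so PG reporting etc. is not
        # blocked, but raise a health error naming the stuck modules.
        # ("MGR_MODULE_STUCK" is a made-up code for this sketch.)
        return True, "MGR_MODULE_STUCK: " + ", ".join(sorted(pending))
    return False, None  # keep waiting; the "active" beacon stays delayed

print(availability(set(), 0, 100))           # (True, None)
print(availability({"telemetry"}, 0, 100))   # (False, None)
print(availability({"telemetry"}, 0, 6000))  # (True, 'MGR_MODULE_STUCK: telemetry')
```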
The workunit was rewritten so it tests for these scenarios:
1. Normal module loading behavior (no health error should be issued)
2. Acceptable delay in module loading behavior (no health error should be
issued)
3. Unacceptable delay in module loading behavior (a health error should be
issued)
This documentation explains the "Module failed to initialize"
cluster error.
Users are advised to try failing over
the mgr to restart the module initialization process,
and then, if the error persists, to file a bug report. I decided
to write it this way instead of providing more complex
debugging tips, such as advising to disable some mgr modules,
since every case will be different depending on which modules
failed to initialize.
In the bug report, developers can ask for the health detail
output to narrow down which module is causing a bottleneck,
and then ask the user to try disabling certain modules until
the mgr is able to fully initialize.
Laura Flores [Fri, 25 Apr 2025 22:11:19 +0000 (22:11 +0000)]
mgr: ensure that all modules have started before advertising active mgr
----------------- Explanation of Problem ----------------
When the mgr is restarted or failed over via `ceph mgr fail` or during an
upgrade, mgr modules sometimes take longer to start up (this includes
loading their class, commands, and module options, and being removed
from the `pending_modules` map structure). This startup delay can happen
due to a cluster's specific hardware or if a code bottleneck is triggered in
a module’s `serve()` function (each mgr module has a `serve()` function that
performs initialization tasks right when the module is loaded).
When this startup delay occurs, any mgr module command issued against the
cluster around the same time fails with an error saying that the command is not
supported:
```
$ ceph mgr fail; ceph fs volume ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'volumes' is not enabled/loaded (required by command 'fs volume ls'): use `ceph mgr module enable volumes` to enable it
```
We should try to lighten any bottlenecks in the mgr module `serve()`
functions wherever possible, but the root cause of this failure is that the
mgr sends a beacon to the mon too early, indicating that it is active before
the module loading has completed. Specifically, some of the mgr modules
have loaded their class but have not yet been deleted from the `pending_modules`
structure, indicating that they have not finished starting up.
--------------------- Explanation of Fix --------------------
This commit improves the criteria for sending the “active” beacon to the mon so
the mgr does not signal that it’s active too early. We do this through the following additions:
1. A new context `ActivePyModules::recheck_modules_start` that will be set if not all modules
have finished startup.
2. A new function `ActivePyModules::check_all_modules_started()` that checks if modules are
still pending startup; if all have started up (`pending_modules` is empty), then we send
the beacon right away. But if some are still pending, we pass the beacon task on to the new
recheck context `ActivePyModules::recheck_modules_start` so we know to send the beacon later.
3. Logic in ActivePyModules::start_one() that only gets triggered if the modules did not all finish
startup the first time we checked. We know this is the case if the new recheck context
`recheck_modules_start` was set from `nullptr`. The beacon is only sent once `pending_modules` is
confirmed to be empty, which means that all the modules have started up and are ready to support commands.
4. Adjustment of when the booleans `initializing` and `initialized` are set. These booleans come into play in
MgrStandby::send_beacon() when we check that the active mgr has been initialized (thus, it is available).
We only send the beacon when this boolean is set. Currently, we set these booleans at the end of Mgr::init(),
which means that it gets set early before `pending_modules` is clear. With this adjustment, the bools are set
only after we check that all modules have started up. The send_beacon code is triggered on mgr failover AND on
every Mgr::tick(), which occurs by default every two seconds. If we don’t adjust when these bools are set, we
only fix the mgr failover part, but the mgr still sends the beacon too early via Mgr::tick(). Below is the relevant
code from MgrStandby::send_beacon(), which is triggered in Mgr::background_init() AND in Mgr::tick():
```
// Whether I think I am available (request MgrMonitor to set me
// as available in the map)
bool available = active_mgr != nullptr && active_mgr->is_initialized();
auto addrs = available ? active_mgr->get_server_addrs() : entity_addrvec_t();
dout(10) << "sending beacon as gid " << monc.get_global_id() << dendl;
```
--------------------- Reproducing the Bug ----------------------
At face value, this issue is not deterministically reproducible, since it
can depend on environmental factors or specific cluster workloads.
However, I was able to deterministically reproduce it by injecting a
bottleneck into the balancer module:
```
diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index d12d69f..91c83fa8023 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -772,10 +772,10 @@ class Module(MgrModule):
self.update_pg_upmap_activity(plan) # update pg activity in `balancer status detail`
self.optimizing = False
+ # causing a bottleneck
+ for i in range(0, 1000):
+ for j in range (0, 1000):
+ x = i + j
+ self.log.debug("hitting the bottleneck in the balancer module")
self.log.debug('Sleeping for %d', sleep_interval)
self.event.wait(sleep_interval)
self.event.clear()
```
Then, the error reproduces every time by running:
```
$ ./bin/ceph mgr fail; ./bin/ceph telemetry show
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
```
With this fix, the active mgr is marked as "initialized" only after all
the modules have started up, and this error goes away. The command may
take a bit longer to execute depending on the extent of the delay.
This commit adds a dev-only config that can inject a longer
loading time into the mgr module loading sequence so we can
simulate this scenario in a test.
The config is 0 ms by default since we do not add any delay
outside of testing scenarios. The config can be adjusted
with the following command:
`ceph config set mgr mgr_module_load_delay <ms>`
A second dev-only config also allows you to specify which
module you want to be delayed in loading time. You may change
this with the following command:
`ceph config set mgr mgr_module_load_delay_name <module name>`
The workunit added here tests a simulated slow loading module
scenario to ensure that this case is properly handled.