ftbfs:
```
/home/jenkins-build/build/workspace/ceph-pull-requests/src/neorados/cls/fifo/detail/fifo.h:630:14: error: no member named 'parse' in namespace 'ceph'; did you mean 'pause'?
630 | auto n = ceph::parse<decltype(m.num)>(num);
| ^~~~~~~~~~~
```
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
(cherry picked from commit 3df1133c17b174c27c250cf7ac018199cc40b15b) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Mon, 30 Jun 2025 20:54:46 +0000 (16:54 -0400)]
rgw/datalog: Manage and shutdown tasks properly
This is slightly ugly but good enough for now. Make sure we can block
when shutting down background tasks.
Remove a few `driver` parameters that are unused. This lets us
simplify the IAM Policy and Lua tests and not construct stores we
never use. (Which is good since we aren't running them under a cluster.)
Adam C. Emerson [Fri, 11 Jul 2025 18:57:02 +0000 (14:57 -0400)]
neorados/fifo: Rewrite as proper I/O object
Split nominal handle object and reference-counted
implementation. While we're at it, add lazy-open functionality.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3097297dd39432d172d69454419fa83a908075f6) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Thu, 26 Jun 2025 17:58:57 +0000 (13:58 -0400)]
{neorados,osdc}: Support subsystem cancellation
Tag operations with a subsystem so we can cancel them all in one go.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 2526eb573b789b33b7d9ebf1169491f13e2318bb) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Conflicts:
src/rgw/driver/rados/rgw_service.cc
src/rgw/rgw_sal.cc
- `#ifdef`s for standalone Rados
src/rgw/driver/rados/rgw_datalog.cc
- Periodic re-run of recovery removed in main and pending backport
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Fri, 30 May 2025 20:54:45 +0000 (16:54 -0400)]
neorados: Hold reference to implementation across operations
Asynchrony combined with cancellations keeps leading to occasional
lifetime issues, so follow the best-practices of Asio I/O objects by
having completions keep a reference live.
The original NeoRados backing implements Asio's two-phase shutdown
properly.
The RadosClient backing does not, because it shares an Objecter with
completions that do not belong to it. In practice I don't think this
will matter since librados and neorados get shut down around the same
time.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 57c9723928b4d2b2148ca0dd4d505acdc071f8eb) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Alex Ainscow [Fri, 27 Jun 2025 15:00:56 +0000 (16:00 +0100)]
osd: Deduplicate zeros in EC slice iterator
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 06658fdac16dde95d20a8907511afb7fde7313da) Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
mgr/dashboard: Allow the user to re-use existing r
ealm/zg/zone and setup replication
1. Currently, we just allow the user to create a new realm/zg/zone and setup replication using the multi-site replication wizard. The ask is to allow the user to select the pre-existing realm/zg/zone and setup replication via automatic export and import of token as well.
2. Enable rgw module automatically in the selected cluster if its not
enabled
Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062
according to `dpkg-buildflags`, ubuntu 24 raised this value to
`-D_FORTIFY_SOURCE=3` which causes `error: "_FORTIFY_SOURCE" redefined`
compilation failures because Ceph itself adds `-D_FORTIFY_SOURCE=2`
`_FORTIFY_SOURCE` is a hardening option. both our rpm and debian builds
already specify that via environment variables, so Ceph's cmake should
leave it alone
Dan Mick [Tue, 26 Aug 2025 00:45:21 +0000 (17:45 -0700)]
Remove git clean -fdx
either
1) a source tarball is supplied, in which case the local dir is
irrelevant, or
2) make-debs calls make-dist, which doesn't care about a dirty cwd
so it just punishes the unaware by removing things that they may
have wanted to keep.
Dan Mick [Sat, 23 Aug 2025 00:43:24 +0000 (17:43 -0700)]
make-debs.sh: invoke tar with --no-same-owner
When running as a normal user, tar does not attempt to preserve
owners set on the tar content files. When running as root, it does.
Containerized builds are running as root. Stop make-debs.sh from
trying to set other owners for files, and leaving files in the
host system with mapped UIDs other than the user running the container
(which causes jenkins to be unable to clear the workspace).
Dan Mick [Thu, 21 Aug 2025 20:00:43 +0000 (13:00 -0700)]
make-debs.sh: make "skip debug packages" conditional
Now that we're using make-debs.sh as a builder inside containers,
the default should be to build all the packages, including debug.
(Also, fix a typo.)
Afreen Misbah [Wed, 13 Aug 2025 06:49:02 +0000 (12:19 +0530)]
mgr/dashboard: Add /health/snapshot api
Fixes https://tracker.ceph.com/issues/72609
- The current minimal API relies on fetching data from osdmap and pgmap.
- These commands produce large, detailed payloads that become a performance bottleneck and impact scalability, especially in large clusters.
- To address this, we propose switching to the ceph snapshot API using ceph status command, which retrieves essential information directly from the cluster map.
- ceph status is significantly more lightweight compared to osdmap/pgmap, reducing payload sizes and processing overhead.
- This change ensures faster response times, improves system efficiency in large deployments, and minimizes unnecessary data transfer.
- update tests
- clipboard icon not displaying breaking several places
- cliboard icon on click gets filed primary green color losing the visibilty of icon. The icon now remain visible on click
- clipboard button for path and copy in tables on mouseover does not give `hand` but `cursor`. which was not ideal from a usability standpoint. This behavior has been updated to use the hand cursor making the interaction semantically correct and more intuitive for users.
Afreen Misbah [Mon, 11 Aug 2025 09:03:32 +0000 (14:33 +0530)]
mgr/dashboard: Replace capacity threshold data with prometheus metrics
- Fixes https://tracker.ceph.com/issues/72519
- the osd dump metrics is used in /api/osd/settings
- this metrics creates perf bottleneck when osds are 1000s
- replacing with similar prometheus metrics
- minor refactors - including renaming, comments.
Afreen Misbah [Wed, 6 Aug 2025 07:37:16 +0000 (13:07 +0530)]
mgr/dashboard: Stop rules api being polled on every page
- /rules ar epolled every 5 seconds on every page
- it is only required for alerts page where full rules list is shown in `Alerts` tab
- also added observable for getting rules instead of plain array
Niklas Hambüchen [Sat, 21 Jun 2025 17:46:13 +0000 (19:46 +0200)]
doc/rados/configuration: Mention show-with-defaults and ceph-conf
A small improvement based on
"Why is it still so difficult to just dump all config and where it comes from?"
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EZSLRYBYEWDA6YIARQVMUKQUWHAE3PGR/
`show-with-defaults` is very useful, and `ceph-conf` is mentioned
so that it's clear that it's legacy, and the user doesn't have to
wonder if it's actually useful but was forgotten in the list.
Zac Dover [Fri, 22 Aug 2025 08:39:29 +0000 (18:39 +1000)]
doc/cephfs: edit troubleshooting.rst (Slow MDS)
Move the "Slow requests (MDS)" section immediately after the first
section in this document ("Slow/Stuck Operations"), because the first
procedure on the page directs the reader to undertake the operation in
"Slow requests (MDS)" before trying anything else.
Ilya Dryomov [Thu, 21 Aug 2025 19:39:29 +0000 (21:39 +0200)]
mon/MonClient: post version request completions outside of monc_lock
dispatch() is allowed to invoke the completion object in the current
thread, before control returns from dispatch(). This isn't desirable
when it comes to discarding version requests in MonClient::shutdown()
and MonClient::_reopen_session() because completion objects could then
be invoked under monc_lock. In case of MonClient::_reopen_session() in
particular, this leads to an attempt to acquire monc_lock once again in
MonClient::get_version() on a retry due to monc_errc::session_reset
that is converted to errc::resource_unavailable_try_again:
MonClient::ms_handle_reset
< takes monc_lock >
MonClient::_reopen_session
< invokes the completion object via dispatch() with ec == monc_errc::session_reset >
Objecter::CB_Objecter_GetVersion::operator() [ ec == errc::resource_unavailable_try_again ]
Objecter::_wait_for_latest_osdmap
MonClient::get_version
< attempts to take monc_lock in the body of the lambda >
The end result is either a lockup or some form of undefined behavior.
The best possible outcome here is an exception (std::system_error with
"Resource deadlock avoided" error) and a successive call to
std::terminate().
This is a regression introduced in commit e81d4eae4e76 ("common/async:
Update `use_blocked` for newer asio"). Revert to posting version
request completions for the error cases in a way that is uniform with
the success case in MonClient::handle_get_version_reply().
Fixes: https://tracker.ceph.com/issues/72692 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bd449f0ac823413a55069e3df9e163a4b4adbebd) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Improve source rpm detection by adding a new detection method that
executes and rpm command in a container to get exactly the version of
the source rpm that the ceph.spec file would have generated. For
backwards compatibility and that I don't entirely trust myself to have
tested this the old methods are still available.
The old `--rpm-no-match-sha` is now an alias for `--srpm-match=any` to
cause it to build any (unique) ceph srpm it finds.
`--srpm-match=versionglob` retains the previous default behavior of
using a glob matching on the git id or ceph version value. The new
default of `--srpm-match=auto` implements the rpm command based behavior
described above.
All of this is wrapped in a new step `find-rpm` but that's mostly an
implementation detail and for testing.
Dan Mick [Wed, 13 Aug 2025 19:16:45 +0000 (12:16 -0700)]
pybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc
Add an optional NPM_CACHEDIR environment variable to serve as the
cache parameter for npm in the dashboard frontend build. The idea
is to allow it to persist across builds so that we decrease the load
on registry.npmjs.org, which has been throttling our requests when
using build-with-container.py, and also hopefully improve the time
of the frontend npm operations.
build-with-container.py also grows a --npm-cache-path option to allow
setting it for container builds and passing the envvar to the build.
John Mulligan [Wed, 21 May 2025 21:46:40 +0000 (17:46 -0400)]
dashboard: fix the workaround for unpacking node sources
My previous workaround in the dashboard for the unpacking of non-root
own tarball as the fake root of a container did not work because of the
strange quoting/escaping behavior of cmake (it tried to run `id -u` as a
single command, not a command and an argument).
Use single quoted string and old school backticks to work around this issue.
Fixes: 24dbfb5da4813c6588f9cd199b9f527bb67f1e88 Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3a36180a373d91adcf9726660204f0cc1dcecba3)