mon/MonClient: wipe secrets and invalidate tickets on auth epoch change
* This causes service daemons to drop all known service tickets and request new
ones from the auth server.
* This causes the clients (and service daemons) to request new tickets from the
auth server which will include tickets signed with the new service keys.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
This will be used to indicate to clients / service daemons that the auth
service keys have been rotated. Clients and service daemons are expected to
invalidate their tickets and reauth. Service daemons should wipe their service
keys.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
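The auth epoch handling described in the two entries above can be sketched roughly as follows. This is a minimal illustration only: `TicketCache`, its fields, and `handle_auth_epoch` are hypothetical names, not MonClient's actual data structures, and the sketch assumes the auth epoch only ever increases.

```python
from dataclasses import dataclass, field


@dataclass
class TicketCache:
    """Hypothetical client-side state, not the real MonClient layout."""
    auth_epoch: int = 0
    tickets: dict = field(default_factory=dict)       # service name -> ticket blob
    service_keys: dict = field(default_factory=dict)  # only held by service daemons

    def handle_auth_epoch(self, new_epoch: int, is_service_daemon: bool) -> bool:
        """Return True when the caller must request new tickets."""
        if new_epoch <= self.auth_epoch:
            return False  # no rotation since the last epoch we saw
        self.auth_epoch = new_epoch
        # Any cached ticket may have been signed with a retired service key.
        self.tickets.clear()
        if is_service_daemon:
            # Service daemons also wipe their rotating service keys.
            self.service_keys.clear()
        return True
```

A caller would re-authenticate with the auth server whenever `handle_auth_epoch` returns True.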
Patrick Donnelly [Wed, 26 Mar 2025 01:59:34 +0000 (21:59 -0400)]
mon/AuthMonitor: add dump-keys and wipe-rotating-service-keys
`auth dump-keys` allows examining the key types for each entity and also the
rotating session keys. This lets us confirm key upgrades are done as expected.
`wipe-rotating-service-keys` clears out existing non-auth service keys so that we do not
need to wait for the rotating key expiration. It is not disruptive so long as clients
renew their tickets when prompted by the auth epoch change.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Matan Breizman [Mon, 9 Jun 2025 12:07:49 +0000 (12:07 +0000)]
include/common_fwd: Include Crypto classes
CryptoManager::cct is now used in the CephContext ctor. To provide this
definition, any target that builds ceph_context.cc must also include Crypto.cc.
The crimson-alien-common library, which only had ceph_context.cc, must now
also include Crypto.cc.
However, because crimson-common also includes Crypto.cc, this would
cause multiple definitions of the Crypto classes' methods.
To resolve this, wrap all Crypto classes in TOPNSPC::common, which is
forwarded using the common_fwd logic.
Yehuda Sadeh [Wed, 28 May 2025 19:51:19 +0000 (15:51 -0400)]
cephx: sign messages using hmac_sha256
If the key type is newer than the original AES, calculate the message
hash using HMAC-SHA256.
We cannot use plain aes256k like we do with the aes key because
of the confounder. The other option would be to inject a
confounder, but that would weaken the cipher.
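As a rough illustration of HMAC-SHA256 message signing, here is a sketch using Python's standard library. The function names are hypothetical, and the sketch makes no claim about which message fields cephx actually feeds into the hash or how the resulting tag is encoded on the wire.

```python
import hashlib
import hmac


def sign_message(session_key: bytes, payload: bytes) -> bytes:
    """Return a 32-byte HMAC-SHA256 tag over the payload."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()


def verify_message(session_key: bytes, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_message(session_key, payload), tag)
```

Because the tag depends only on the key and the payload, the receiver can recompute it and compare, which is not possible with a cipher that mixes a random confounder into every encryption.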
Yehuda Sadeh [Fri, 7 Mar 2025 18:20:58 +0000 (13:20 -0500)]
auth: add a configurable to control rotating keys cipher type
auth_service_cipher: a mon configurable that determines what type of cipher
the rotating keys use. The configurable can change at runtime. Note
that the change does not invalidate existing keys; these expire
based on their TTL.
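The runtime behaviour described above (a cipher change affects only newly generated keys, while older keys age out by TTL) can be sketched as below. `RotatingKey`, `RotatingKeyRing`, and the cipher strings are hypothetical stand-ins for illustration, not the AuthMonitor's real types or the accepted values of `auth_service_cipher`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RotatingKey:
    cipher: str        # cipher type the key was created with
    expires_at: float  # absolute expiry time, in seconds


class RotatingKeyRing:
    """Hypothetical model of a set of rotating service keys."""

    def __init__(self, cipher: str, ttl: float) -> None:
        self.cipher = cipher
        self.ttl = ttl
        self.keys = []

    def set_cipher(self, cipher: str) -> None:
        # Runtime config change: existing keys are deliberately left alone.
        self.cipher = cipher

    def rotate(self, now: float) -> None:
        # Drop keys whose TTL has elapsed, then mint one with the current cipher.
        self.keys = [k for k in self.keys if k.expires_at > now]
        self.keys.append(RotatingKey(self.cipher, now + self.ttl))
```

In this model a key created before the change keeps its old cipher until its TTL runs out, so both cipher types can coexist for a while.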
Yehuda Sadeh [Thu, 27 Feb 2025 21:14:06 +0000 (16:14 -0500)]
auth/cephx: modify client + server challenges hashing
This applies when using ciphers that are not the original
AES-128 one. Use the hmac-sha256 hash now. With AES256KRB5
the original method of encrypting the combined challenges
doesn't work as the confounder randomizes the result.
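A sketch of the keyed-hash approach for the combined challenges, under stated assumptions: both challenges are treated as unsigned 64-bit integers packed little-endian, server first. That layout and the function name are illustrative guesses, not the cephx wire format.

```python
import hashlib
import hmac
import struct


def challenge_hash(key: bytes, server_challenge: int, client_challenge: int) -> bytes:
    """Keyed hash over both challenges; client and server each compute it."""
    # Assumed layout: two unsigned 64-bit little-endian integers.
    combined = struct.pack("<QQ", server_challenge, client_challenge)
    return hmac.new(key, combined, hashlib.sha256).digest()
```

Each side computes the value independently from the shared key and the two challenges, so the results match only if the computation is deterministic.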
Yehuda Sadeh [Thu, 27 Feb 2025 16:55:37 +0000 (11:55 -0500)]
ceph-authtool: support --key-type param
Also move the encryption handlers out of the ceph_context.
Handlers are now returned as a shared_ptr, to support the
creation of new handlers with different params (such as
the usage param).
Ilya Dryomov [Fri, 23 Jan 2026 13:48:53 +0000 (14:48 +0100)]
qa: don't assume that /dev/sda or /dev/vda is present in unmap.t
Instead of hard-coding the block device name, use the block device that
is backing the filesystem that the test is running on. We can be quite
sure it won't be an RBD device ;)
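The idea of looking up the device behind the current filesystem, rather than naming one, can be restated in Python as below. The actual test is a shell-based .t file and its commands are not shown here; the helper names are hypothetical. Note that on filesystems with anonymous device numbers (for example tmpfs or overlayfs) the resulting sysfs path may not exist.

```python
import os


def backing_device_numbers(path: str = ".") -> tuple:
    """Return (major, minor) of the device holding the filesystem at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def sysfs_block_path(path: str = ".") -> str:
    """Build the Linux sysfs path for that device number."""
    major, minor = backing_device_numbers(path)
    return f"/sys/dev/block/{major}:{minor}"
```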
Disable OSD bench from benchmarking the OSDs for teuthology tests. This is to
help prevent a cluster warning pertaining to the IOPS value not lying within
a typical threshold range from being raised.
The tests can rely on the built-in static values as defined by
osd_mclock_max_capacity_iops_[ssd|hdd] which should be good enough.
Ilya Dryomov [Wed, 21 Jan 2026 18:41:41 +0000 (19:41 +0100)]
qa: krbd_blkroset.t: eliminate a race in the open_count test
Even at QD=1, dd may take less than 10 seconds to work its way to the
end of a 10M image, producing "No space left on device" error instead
of the expected "Operation not permitted" error which is supposed to
arise from the device getting marked read-only while opened.
With https://github.com/ceph/ceph-build/pull/2497 merged we no longer
build Tentacle+Crimson regularly. As Crimson no longer backports changes
into Tentacle, there's no reason to keep testing it.
Matan Breizman [Thu, 22 Jan 2026 10:00:25 +0000 (12:00 +0200)]
container/build.sh: Use dedicated debug tags
https://github.com/ceph/ceph-build/pull/2497 introduced a debug flavor.
This seems to cause conflicts with the image being pushed to quay as one
of the flavors might override the other.
Tag debug build containers explicitly.
An alternative solution would be to skip debug containers altogether.
However, these might be useful for development purposes.
Note, prune-quay might also need to be updated once this is merged.
Kefu Chai [Thu, 22 Jan 2026 03:57:37 +0000 (11:57 +0800)]
cmake: fix undefined PY_LDFLAGS in distutils_install_cython_module
The distutils_install_cython_module() function was using ${PY_LDFLAGS}
without defining it, causing the linker to fail with:
/opt/rh/gcc-toolset-13/root/usr/libexec/gcc/x86_64-redhat-linux/13/ld:
cannot find -lrados: No such file or directory
This bug was introduced in commit d22734f6cb0 which changed:
set(ENV{LDFLAGS} "-L${CMAKE_LIBRARY_OUTPUT_DIRECTORY}")
to:
set(ENV{LDFLAGS} "${PY_LDFLAGS}")
However, PY_LDFLAGS was only defined in distutils_add_cython_module(),
not in distutils_install_cython_module(). This meant that during the
install phase, LDFLAGS was set to an empty string, and the linker
couldn't find librados.so and other Ceph libraries in the build
directory.
The bug was exposed by commit 719b74984605b490f23004eb41583a22c934c5fb
which changed rados.pxd to use C preprocessor conditionals (#ifdef
BUILD_DOC) instead of Cython's compile-time IF statements. This meant
the build now required proper linking during the install phase.
Fix by defining PY_LDFLAGS in distutils_install_cython_module():
Ville Ojamo [Fri, 16 Jan 2026 09:43:31 +0000 (16:43 +0700)]
doc/radosgw: change all intra-docs links to use ref (2 of 6)
Part 2 of 6 to make backporting easier. Depends on part 1.
Use the ref role for all remaining links in doc/radosgw/ with the
exception of config-ref.rst which will depend on changes to rgw.yaml.in.
The external link definitions syntax being removed is intended for
linking to external websites and not for intra-docs links. Validity of
ref links will be checked during the docs build process.
Add labels for link targets if necessary.
Remove unused external link definitions in the modified files.
Use confval instead of literal text for 2 configuration keys in
vault.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Fri, 16 Jan 2026 08:55:27 +0000 (15:55 +0700)]
doc/radosgw: change all intra-docs links to use ref (1 of 6)
Part 1 of 6 to make backporting easier. Many of the following parts
depend on this.
Use the ref role for all remaining links in doc/radosgw/ with the
exception of config-ref.rst which will depend on changes to rgw.yaml.in.
The external link definitions syntax being removed is intended for
linking to external websites and not for intra-docs links. Validity of
ref links will be checked during the docs build process.
Add labels for link targets if necessary.
Remove unused external link definitions in the modified files.
Use confval instead of literal text for 2 configuration keys in
vault.rst.
Use Ceph Object Gateway consistently in multisite.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ronen Friedman [Wed, 21 Jan 2026 12:37:24 +0000 (14:37 +0200)]
Merge pull request #66626 from ronen-fr/wip-rf-aborthp-justdoc
doc/ceph.rst: scrub-related 'tell pgid' commands
Related to https://github.com/ceph/ceph/pull/66515
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Afreen Misbah [Tue, 13 Jan 2026 20:47:40 +0000 (02:17 +0530)]
mgr/dashboard: Add productive card component
- add a generic productive card component
- based on the Carbon Design System
- there are two versions of the card: with shadow (tinted effect) and without
- applies the gray10 theme, as decided by the new designs
Fix on_operator_abort_scrub() to handle the case where
the operator-initiated abort request arrives while the
'start scrub' message is still in the queue (i.e.,
is_queued_or_active() is true, but is_scrub_active()
is false).
Unlike our handling of, for example, FullReset in
PrimaryIdle::clear_state(), here we choose to ignore
the request:
Considering the added complexity to the FSM versus
the minimal benefit, it is better to just ignore this
very rare case, leaving it to the operator to re-issue
the abort command if needed.
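The decision described in this entry can be sketched as a small function. The two boolean parameters mirror the predicates named above; `AbortOutcome` and `on_operator_abort` are hypothetical names, and the real handler lives in the scrubber's C++ state machine.

```python
from enum import Enum, auto


class AbortOutcome(Enum):
    ABORTED = auto()           # a running scrub was stopped
    IGNORED_QUEUED = auto()    # 'start scrub' still queued: request dropped
    NOTHING_TO_ABORT = auto()  # no scrub queued or active


def on_operator_abort(is_queued_or_active: bool, is_scrub_active: bool) -> AbortOutcome:
    """Classify an operator abort request against the two scrubber predicates."""
    if not is_queued_or_active:
        return AbortOutcome.NOTHING_TO_ABORT
    if not is_scrub_active:
        # Rare race: the operator may simply re-issue the abort later.
        return AbortOutcome.IGNORED_QUEUED
    return AbortOutcome.ABORTED
```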
Ronen Friedman [Thu, 4 Dec 2025 14:49:29 +0000 (08:49 -0600)]
osd/scrub: support an operator-abort command
The new explicit command aborts any ongoing scrub of the target PG,
including operator-initiated scrubs. That additional capability is needed now that
operator-initiated scrubs are no longer blocked by 'no-scrub' settings.
The scenario we are trying to help the operator with is:
- an operator issues a set of operator-initiated scrubs (e.g., via a
script), then realizes the mistake and wants to abort them all.
The abort command also downgrades the urgency level of the scrub target
(as otherwise the target would immediately restart, against the operator's
wishes).
This commit implements the changes to the state machine and to the abort
logic, assuming the operator command was translated into an event.
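Why the abort also has to downgrade urgency can be seen in a toy model. Everything here is hypothetical: the two urgency levels, `ScrubTarget`, and `may_start` are simplifications for illustration and do not reflect the scrubber's actual urgency scale or scheduling code.

```python
from dataclasses import dataclass
from enum import IntEnum


class Urgency(IntEnum):
    PERIODIC = 0  # regular scrub, blocked by 'no-scrub' settings
    OPERATOR = 1  # operator-initiated, not blocked by 'no-scrub'


@dataclass
class ScrubTarget:
    urgency: Urgency
    active: bool = False


def may_start(target: ScrubTarget, noscrub_set: bool) -> bool:
    """Operator-level targets ignore 'no-scrub'; periodic ones respect it."""
    return target.urgency == Urgency.OPERATOR or not noscrub_set


def operator_abort(target: ScrubTarget) -> None:
    """Stop the scrub and lower the urgency so it does not restart at once."""
    target.active = False
    target.urgency = Urgency.PERIODIC
```

Without the downgrade in `operator_abort`, the target would still qualify under `may_start` and could be rescheduled immediately.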