Nizamudeen A [Thu, 6 Nov 2025 04:53:47 +0000 (10:23 +0530)]
mgr/dashboard: start node virtual-env after starting ceph cluster
in frontend e2e.sh file, we don't need to start the node venv early on
before the ceph cluster is started. we only need it for the `npm` or
`npx` commands. Starting node virtual env and then starting ceph will
cause the ceph cluster to assume the node-env python as the python
environment which breaks the cryptotools call.
So moving the node-env venv start after the ceph is created
John Mulligan [Fri, 25 Apr 2025 15:22:26 +0000 (11:22 -0400)]
mgr/dashboard: add an option to control the dashboard crypto caller
Add a mgr config option `crypto_caller` that lets a ceph user override
the default behavior of using the remote crypto caller. Supported
values are `internal` and `remote`.
John Mulligan [Fri, 25 Apr 2025 15:06:41 +0000 (11:06 -0400)]
mgr/cephadm: always use the internal cryptocaller
The cephadm modules needs to use python cryptography module for ssh (via
asyncssh) and thus there's no need to use the remote crypto caller in
cephadm. Configure cephadm to always use the internal cryptocaller.
John Mulligan [Fri, 25 Apr 2025 15:05:46 +0000 (11:05 -0400)]
python-common/cryptotools: catch all failures to read cert
Previously, the internal crypto caller would catch (and convert) some
errors when reading the cert but not all cases. Move the logic to catch
the errors to a common location and do it once consistently.
John Mulligan [Thu, 24 Apr 2025 18:36:58 +0000 (14:36 -0400)]
python-common/cryptotools: unify and organize all endpoint functions
Lightly reorganize and make the "endpoint" functions in cryptotools.py more
consistent and uniform. Use small functions for input and output
handling so that the handling is done the same way throughout. Pass a
pre-constructed crypto caller via the args to then endpoint functions.
Make generating the private key it's own named function rather than
one single (and only) function with overloaded behavior controlled by
a cli switch.
John Mulligan [Wed, 23 Apr 2025 15:23:43 +0000 (11:23 -0400)]
pybind/mgr: fix test case in test_tls.py
Why violate the typing in a test? mypy never noticed this because tests
are not type checked but there seems to be no need to turn a str into
bytes to pass to a function that is typed only as taking str!
John Mulligan [Wed, 23 Apr 2025 15:25:07 +0000 (11:25 -0400)]
python-common/cryptotools: fix error path in verify tls function
The remote verify_tls function was not raising errors when it should.
Fix the function so that it always returns an object when it succeeds or
fails gracefully. Always parse that function in the crypto caller class.
John Mulligan [Wed, 16 Apr 2025 18:56:28 +0000 (14:56 -0400)]
python-common/cryptotools: create CrytpoCaller interface class
Create a class to act as a common shim between the cryptotools external
functions and the mgr. It provides common conversion mechanisms and
could possibly act as an abstraction in case we decide to make
the external function calls in different ways in the future.
Paulo E. Castro [Sat, 5 Apr 2025 20:47:55 +0000 (21:47 +0100)]
pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be initialized once per interpreter process' issue.
Fixes: https://tracker.ceph.com/issues/64213 Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 717d0a6f3530ad3e07f4423002810327b2addcf1)
Conflicts:
src/pybind/mgr/mgr_util.py
- accepted the incoming changes
Kefu Chai [Sat, 3 Feb 2024 13:09:27 +0000 (21:09 +0800)]
mgr: add site package paths in PyModuleRegistry
before this change, we add the paths of site packages to sys.path
when starting subinterpretors for each of the mgr modules. this
works just fine. but in Python 3.11, it deprecates `PySys_SetPath()`
in favor of PyConfig machinary, which sets the module search paths
in PyConfig, before calling `Py_InitializeFromConfig()`. so, to
set the module search paths with the new machinary, we need to do
this in `PyModuleRegistry`, where we initialize the global Python
interpretor using the new PyConfig machinary. and since we've
switched to the new PyConfig machinary when compiling with Python 3.8
and up.
in this change, we set the module search paths in PyModuleRegistry.
because PyConfig imports the site packages by default, and we are
allowed to append a new path to the existing search paths, we just
append the configured `mgr_module_path`.
this change should silence the compiling warning like:
Kefu Chai [Sat, 3 Feb 2024 11:49:13 +0000 (19:49 +0800)]
mgr: set argv for python in PyModuleRegistry
before this change, we setup the progname for Python interpreter,
but setup the argv for it in PyModule. and we are using deprecated
API to initialize Python interpreter.
in this change, let's do this in a single place for better
maintainability. also, take this opportunity, to use the non-deprecated
API to initialize interpreter on Python >= 3.8.
this silence the warning when compiling ceph-mgr with CPython 3.12:
```
/var/ssd/ceph/src/mgr/PyModule.cc: In member function ‘int PyModule::load(PyThreadState*)’:
/var/ssd/ceph/src/mgr/PyModule.cc:363:20: warning: ‘void PySys_SetArgv(int, wchar_t**)’ is deprecated [-Wdeprecated-declarations]
363 | PySys_SetArgv(1, (wchar_t**)argv);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.12/Python.h:96,
from /var/ssd/ceph/src/mgr/BaseMgrModule.h:4,
from /var/ssd/ceph/src/mgr/PyModule.cc:14:
/usr/include/python3.12/sysmodule.h:13:38: note: declared here
13 | Py_DEPRECATED(3.11) PyAPI_FUNC(void) PySys_SetArgv(int, wchar_t **);
| ^~~~~~~~~~~~~
```
Kefu Chai [Sat, 3 Feb 2024 11:22:15 +0000 (19:22 +0800)]
mgr: stop using deprecated API to initialize Python
Py_SetProgramName() is deprecated since CPython 3.11, see
https://docs.python.org/3/c-api/init_config.html .
`Py_InitializeFromConfig()` and friends were introduced by CPython 3.8,
but we still need to support CPython 3.6 which is shipped by CentOS8.
so we have to be backward compatible with the older Python versions.
so let's use new machinary to initialize the Python interpretor, since
the minimal supported Python version is now CPython 3.9 which comes with
CentOS 9.
this change addresses following compiling warning:
```
[428/753] Building CXX object src/mgr/CMakeFiles/ceph-mgr.dir/PyModuleRegistry.cc.o
/var/ssd/ceph/src/mgr/PyModuleRegistry.cc: In member function ‘void PyModuleRegistry::init()’:
/var/ssd/ceph/src/mgr/PyModuleRegistry.cc:49:20: warning: ‘void Py_SetProgramName(const wchar_t*)’ is deprecated [-Wdeprecated-declarations]
49 | Py_SetProgramName(const_cast<wchar_t*>(WCHAR(MGR_PYTHON_EXECUTABLE)));
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.12/Python.h:94,
from /var/ssd/ceph/src/mgr/PyModule.h:22,
from /var/ssd/ceph/src/mgr/PyModuleRegistry.h:18,
from /var/ssd/ceph/src/mgr/PyModuleRegistry.cc:14:
/usr/include/python3.12/pylifecycle.h:37:38: note: declared here
37 | Py_DEPRECATED(3.11) PyAPI_FUNC(void) Py_SetProgramName(const wchar_t *);
| ^~~~~~~~~~~~~~~~~`
```
Kefu Chai [Sat, 22 Jun 2024 01:10:10 +0000 (09:10 +0800)]
cmake: bump up required Python3 to 3.9
since we've dropped the support of CentOS 8 in favor of CentOS 9, and
the minmum Python3 version used by the suppored distros are 3.9. let's
bump up the Python3 version to 3.9. as we are going to remove the code
for older versions like Python 3.6 and 3.8 backward compatibility.
Kefu Chai [Fri, 14 Feb 2025 10:53:58 +0000 (18:53 +0800)]
do_cmake: use Python 3.12 on ubuntu >= 24
the "official" Python shipped along with Ubuntu 24.04 (Noble Numbat) is
Python 3.12. And some of our building have been upgraded to Ubuntu
24.04. But we are still using Python 3.10 on Ubuntu >= 22, this breaks
the build. And CMake fails like:
```
CMake Error at /usr/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Python3 (missing: Python3_EXECUTABLE Python3_INCLUDE_DIRS
Python3_LIBRARIES Interpreter Development Development.Module
Development.Embed) (Required is exact version "3.10")
Reason given by package:
Interpreter: Wrong version for the interpreter "/bin/python3"
Kyr Shatskyy [Fri, 21 Nov 2025 21:20:04 +0000 (22:20 +0100)]
qa/workunits/rgw: drop netstat usage
The `netstat` is deprecated now in modern Linux and usually
requires an extra package dependency to be installed.
Usually it is `net-tools`, however, for example, opensuse,
`netstat` does not present in it. Thus, let us use `ss` as
an alternative.
When using `netstat -nltp` we get lines like:
'tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 25156/valgrind.bin \ntcp6 0 0 :::443 :::* LISTEN 25156/valgrind.bin \n'
When using `ss -nltp` we get lines like:
'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))'
so we need to filter processes by `memcheck`. However further
parsing code works equivalently as for netstat.
Ilya Dryomov [Fri, 23 Jan 2026 13:48:53 +0000 (14:48 +0100)]
qa: don't assume that /dev/sda or /dev/vda is present in unmap.t
Instead of hard-coding the block device name, use the block device that
is backing the filesystem that the test is running on. We can be quite
sure it won't be an RBD device ;)
Ilya Dryomov [Wed, 21 Jan 2026 18:41:41 +0000 (19:41 +0100)]
qa: krbd_blkroset.t: eliminate a race in the open_count test
Even at QD=1, dd may take less than 10 seconds to work its way to the
end of a 10M image, producing "No space left on device" error instead
of the expected "Operation not permitted" error which is supposed to
arise from the device getting marked read-only while opened.
Ville Ojamo [Mon, 19 Jan 2026 13:06:46 +0000 (20:06 +0700)]
doc/cephadm: remove sections not apply to Squid in rgw.rst
4949311 backported changes that do not apply to Squid.
PR #63073 body and the commit referenced therein as cherry-pick do not
correspond to the diff. Remove the additions that do not apply to Squid:
- Wildcard SAN feature in 3c24753 only since Tentacle.
- Shutdown delay feature in b84bb72 only since Tentacle.
The third feature doc addition is valid, d620ba6 was backported to Squid
in PR #61350 for disable multisite sync traffic, commit 59b3f28. This
backport cherry-picked only the feature addition and missed the docs
commit 8878619. Leave this section in.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Kefu Chai [Wed, 24 Dec 2025 05:55:26 +0000 (13:55 +0800)]
debian/control: add iproute2 to build dependencies
Test scripts like qa/tasks/cephfs/mount.py expect the ip command to be
available in the container environment. Without it, tests fail with:
```
/bin/bash: line 1: ip: command not found
File "/ceph/qa/tasks/cephfs/mount.py", line 96, in cleanup_stale_netnses_and_bridge
p = remote.run(args=['ip', 'netns', 'list'],
...
teuthology.exceptions.CommandFailedError: Command failed with status 127: 'ip netns list'
```
Add iproute2 to the debian package build dependencies when the
<pkg.ceph.check> build profile is enabled. This ensures the package is
available during container-based builds, since buildcontainer-setup.sh
→ script/run-make.sh → install-deps.sh → debian/control → generated
dependency package chain respects build profiles configured via
`FOR_MAKE_CHECK` and `WITH_CRIMSON` environment variables set in
Dockerfile.build.
David Galloway [Tue, 16 Dec 2025 22:08:00 +0000 (17:08 -0500)]
install-deps: Replace apt-mirror
apt-mirror.front.sepia.ceph.com has happened to always work because we set up CNAMEs to gitbuilder.ceph.com.
That host is making its way to a new home upstate (literally and figuratively) so we'll get rid of the front subdomain since it's publicly accessible anyway and add TLS while we're at it.
Adam Kupczyk [Tue, 15 Apr 2025 08:37:25 +0000 (08:37 +0000)]
os/bluestore: Fix dirty_range in BlueStore::_do_remove
dirty_range used to have length = 1 byte.
This is good if whole extent is inside shard.
But this has proven not to be the case.
dirty_range(offset, length) is slower only when it crosses shard.
Rishabh Dave [Tue, 18 Feb 2025 12:30:03 +0000 (18:00 +0530)]
qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs
Health warning "pg .* is stuck peering" is seen while Ceph cluster is
under the upgrade process during fs/upgrade QA job. Being an expected
warning, it should be added to the ignorelist.
And besides this one, we already ignore more severe warnings ("pg is
stuck inactive" and "pg is degrarded") for fs/upgrade jobs.
Fixes: https://tracker.ceph.com/issues/70023 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)
Conflicts:
qa/cephfs/overrides/pg_health.yaml
- Line before the point where the patch was to be applied is different
comapred to main branch.
In statfs, when the quota root for a dir is discovered,
it uses that dir to base values for max_files and max_bytes.
This can be an issue when a dir is found with only one of two potential quota
fields. Take for instance, a dir with only max_files set and parent dir
has only max_bytes set. During a statfs call, it will then use the max_files
value for provided dir, but does not have a value for max_bytes. In this case,
this behavior will cause the size of the filesystem to be displayed.
Instead, find the quota root for max_files and max_bytes separately. This will
allow for mixed quotas to inherit missing values from its parent. In the above
example, max_files from current dir and max_bytes from parent dir will be
displayed.
Fixes: https://tracker.ceph.com/issues/73487 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit dd02ea9b18502b87ce815eba4286ae3516e334b3)
In cases where there is a single element in a batch_op_map,new_batch_head
is a nullptr, when this is retried at Finisher we'd hit one of the asserts when
dereferencing
In SingletonClient::init(), objecter->start() called before
monc->authenticate(), it makes conns of monc authencated before
monc->authenticate() called if mons reply faster, in this case,
monc will not subsribe monmap/config.
Naveen Naidu [Thu, 5 Dec 2024 04:21:21 +0000 (09:51 +0530)]
qa/workunit: update telemetry quincy/reef workunits with "basic_stretch_cluster" collection
Note, this is not a clean cherry pick. The 4dac20e updated the
`test_telemetry_reef_x.sh` and `test_telemetry_squid_x.sh` upgrade
workunits. These upgrade workunits test the upgrade of a cluster from
reef and squid (X-2) releases to the X version of cluster.
Since we are cherry picking the commit to squid (X release), we would
instead have to update the workunit files of quicy and reef i,e the
(X-2) releases.
DanWritesCode [Mon, 18 Dec 2023 21:09:07 +0000 (16:09 -0500)]
osd: add clear_shards_repaired command
This command will allow us to clear the OSD_TOO_MANY_REPAIRS alert
by setting the shard repair count to 0. This will help in cases where
the alert was a false positive, or a condition that has since cleared
at the disk level. Often, zeroing out the repair count is
better than muting the alert or restarting the OSD.
Fixes: https://tracker.ceph.com/issues/54182 Co-authored-by: David Zafman <dzafman@redhat.com> Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
(cherry picked from commit 78d6bfe54c3b9b60fab36a640b1ce77c8f022fa9)
mds: client is evicted when an export subtree task is interrupted
The importer will force open some sessions provided by the exporter but the client does not know about
the new sessions until the exporter notifies it, and the notifications cannot be sent if the exporter
is interrupted. The client does not renew the sessions regularly that it does not know about, so the client
will be evicted by the importer after `session_autoclose` seconds (300 seconds by default).
The sessions that are forced opened in the importer need to be closed when the import process is reversed.
Zhansong Gao [Fri, 26 May 2023 04:20:17 +0000 (12:20 +0800)]
mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking
The related sessions in the importer are in the importing state(`Session::is_importing` return true) when the state of importer is `acking`,
`Migrator::import_reverse` called by `MDCache::handle_resolve` should reverse the process to clear the importing state if the exporter restarts
at this time, but it doesn't do that actually because of its bug. And it will cause these sessions to not be cleared when the client is
unmounted(evicted or timeout) until the mds is restarted.
The bug in `import_reverse` is that it contains the code to handle state `IMPORT_ACKING` but it will never be executed because
the state is modified to `IMPORT_ABORTING` at the beginning. Move `stat.state = IMPORT_ABORTING` to the end of import_reverse
so that it can handle the state `IMPORT_ACKING`.
Casey Bodley [Fri, 3 Oct 2025 16:24:18 +0000 (12:24 -0400)]
rgw: fix 'bucket rm --bypass-gc' for copied objects
the `--bypass-gc` argument to `radosgw-admin bucket rm` causes us to
call `RadosBucket::remove_bypass_gc()`, which loops over the tail
objects and removes each with `RGWRados::delete_raw_obj_aio()`
however, this was removing the objects with `cls_rgw_remove_obj()`,
which is for head objects, not tails. tail objects must be removed with
`cls_refcount_put()`, which preserves them until the last copy is
removed
rename `delete_raw_obj_aio()` to `delete_tail_obj_aio()` to clarify its
purpose
Nitzan Mordechai [Wed, 22 Oct 2025 05:41:56 +0000 (05:41 +0000)]
tasks/cbt_performance: Tolerate exceptions during performance data updates
If an exception occurs during the POST request to update CBT performance,
log the error instead of failing the entire job. This ensures that
intermittent update failures do not block the main workflow.
The unlink subcommand did not handle unsharded bucket indices
appropriately. These are when the number of shards listed in the
bucket instance object is 0. In that case there will actually be 1
shard.
When number of shards as 0 is passed into the function that maps
object names to shards, it returns -1. And that was not handled
properly. That is now fixed.