Zac Dover [Mon, 3 Feb 2025 13:37:34 +0000 (23:37 +1000)]
doc/rados: improve pg_num/pgp_num info
Improve the guidance around setting pg_num, and clear up confusion
around whether pgp_num should be set manually or, indeed, if it even can
be set manually.
This PR was raised in response to Mark Schouten's email here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CBDJTLTTIEZVG7GVZBX37UAWGYNSSMPD/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c43e7337212fe38e8db63d00345fa9858b3cb10a)
Patrick Donnelly [Fri, 28 Feb 2025 00:29:26 +0000 (19:29 -0500)]
Merge PR #57190 into reef
* refs/pull/57190/head:
pybind/mgr/mgr_module: turn off all automatic transactions
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Patrick Donnelly [Wed, 12 Feb 2025 02:28:40 +0000 (21:28 -0500)]
pybind/mgr/mgr_module: turn off all automatic transactions
I misunderstood autocommit=False in prior patches. The sqlite3 binding will
still create transactions automatically which confused newer bindings using
autocommit.
So, turn off automatic transaction management completely to maintain backwards
compatibility.
Fixes: https://tracker.ceph.com/issues/69912 Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit df49652987019d5eeec31c86332d8e69995d931a)
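As an illustration of what "no automatic transactions" looks like from the caller's side, here is a minimal sketch using Python's standard sqlite3 module. It is not the code from this patch: the table `kv` and the helper `put_all` are hypothetical, and passing `isolation_level=None` is assumed here as one way to stop the binding from opening transactions implicitly.

```python
import sqlite3

# Assumption: isolation_level=None keeps the sqlite3 module from issuing
# an implicit BEGIN before data-modifying statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")


def put_all(db, items):
    """Hypothetical helper: insert all items in one explicit transaction."""
    db.execute("BEGIN")
    try:
        for k, v in items:
            db.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (k, v))
    except Exception:
        # Undo every insert made since BEGIN, then re-raise.
        db.execute("ROLLBACK")
        raise
    else:
        db.execute("COMMIT")


put_all(conn, [("a", "1"), ("b", "2")])
in_txn_after_commit = conn.in_transaction

# The second batch violates the primary key, so the whole batch is rolled back.
try:
    put_all(conn, [("c", "3"), ("a", "dup")])
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT k FROM kv ORDER BY k").fetchall()
```

With implicit transactions off, every BEGIN, COMMIT, and ROLLBACK is visible in the calling code, so the behavior should not depend on the defaults of a particular binding version.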
Naman Munet [Wed, 22 Jan 2025 10:59:20 +0000 (16:29 +0530)]
mgr/dashboard: Add confirmation textbox for resource name on delete action
Before:
=====
Users were able to delete one or more critical resources (images, snapshots, subvolumes, subvolume-groups, pools, hosts, OSDs, buckets, file systems, services) just by clicking on a checkbox.
After:
=====
Users now have to type the name of the resource they are deleting into the textbox on the delete modal. Only then can they delete the critical resource.
Multiple selection is also no longer possible when deleting critical resources, so users can delete only one such resource at a time. Non-critical resources can still be deleted in one go.
Ilya Dryomov [Tue, 18 Feb 2025 16:51:47 +0000 (17:51 +0100)]
test/rbd_mirror: clear Namespace::s_instance at the end of a test
TestMockPoolReplayer.Namespaces and NamespacesError tests leave behind
a dangling pointer to a stack-allocated MockNamespace which leads to an
easily reproducible use-after-free and segfault when tests are shuffled.
Ilya Dryomov [Mon, 17 Feb 2025 11:41:51 +0000 (12:41 +0100)]
test/rbd_mirror: flush watch/notify callbacks in TestImageReplayer
TestImageReplayer establishes its own (i.e. outside of the SUT code)
watch on the header of the remote image to be able to synchronize the
execution of the test with certain notifications. This watch is
established before the remote image is opened and is not torn down until
after the remote image is closed but while the image replayer is still
running. The flush that is part of image close sequence thus isn't
guaranteed to cover all callbacks, especially for snapshot-based
mirroring where UnlinkPeerRequest spawned from Replayer::unlink_peer()
generates a notification on the remote image for each completed unlink.
Since TestImageReplayer further immediately deletes C_WatchCtx, pretty
much any test can segfault when C_WatchCtx::handle_notify() is invoked
by TestWatchNotify infrastructure. Because it's a virtual method, the
segfault often involves a completely bogus instruction pointer:
John Mulligan [Thu, 27 Jul 2023 18:17:36 +0000 (14:17 -0400)]
python-common: fix valid_addr on python 3.11
The behavior on python 3.11 regarding IPv4 addresses in brackets has
changed:
```
$ python3.8 -c 'from urllib.parse import urlparse; urlparse("http://[192.168.0.1]")'
[john@edfu ~]$ python3.11 -c 'from urllib.parse import urlparse; urlparse("http://[192.168.0.1]")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib64/python3.11/urllib/parse.py", line 395, in urlparse
splitresult = urlsplit(url, scheme, allow_fragments)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/urllib/parse.py", line 500, in urlsplit
_check_bracketed_host(bracketed_host)
File "/usr/lib64/python3.11/urllib/parse.py", line 448, in _check_bracketed_host
raise ValueError(f"An IPv4 address cannot be in brackets")
ValueError: An IPv4 address cannot be in brackets
```
This breaks the test in test_valid_addr that asserts that function
valid_addr returns the string "IPv4 address wrapped in brackets is
invalid".
Move the step that checks for brackets and dots above the urllib
check so that the function continues returning the expected string.
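The reordering described above can be sketched as follows. This is a simplified stand-in, not the real valid_addr: the name `check_addr`, its return convention (an error string, or an empty string when nothing was found), and the exact bracket test are assumptions made for illustration. Only the error message text is taken from the commit message.

```python
from urllib.parse import urlparse


def check_addr(addr: str) -> str:
    """Hypothetical stand-in for valid_addr.

    Returns an error message, or "" if no problem was found.
    """
    # Check for brackets and dots first, so the result no longer depends
    # on whether urlparse raises for bracketed IPv4 (as python 3.11 does).
    # Simplification: this also flags bracketed IPv6 addresses that embed
    # a dotted IPv4 part.
    if addr.startswith("[") and addr.endswith("]") and "." in addr:
        return "IPv4 address wrapped in brackets is invalid"

    # Only now hand the address to urllib.
    try:
        urlparse(f"http://{addr}")
    except ValueError as exc:
        return str(exc)
    return ""


bracketed = check_addr("[192.168.0.1]")
plain = check_addr("192.168.0.1")
ipv6 = check_addr("[::1]")
```

Because the bracket check runs before urlparse, `bracketed` holds the same message on python 3.8 and python 3.11.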
Adam King [Wed, 12 Feb 2025 16:32:24 +0000 (11:32 -0500)]
mgr/cephadm: use double quotes for NFSv4 RecoveryBackend in ganesha conf
This came directly from someone on the ganesha team. We've actually had
this use single quotes for a long time (at least since mid 2020) but I
believe recent feature work on the ganesha side exposed the issue.
Adam King [Thu, 30 Jan 2025 14:15:37 +0000 (09:15 -0500)]
mgr/cephadm: create OSD daemon deploy specs through make_daemon_spec
That function handles setting up the extra container/entrypoint
args for the daemon during initial deployment. Having the
CephadmDaemonDeploySpec made directly in the OSD deployment
workflow means initial deployments of OSDs won't have the
extra container/entrypoint args from the spec.
Michal Nasiadka [Wed, 11 Sep 2024 12:26:37 +0000 (14:26 +0200)]
cephadm: Support Docker Live Restore
Currently, with Docker Live Restore [1] enabled, restarting Docker
Engine restarts all Ceph containers, even though the feature
allows restarting docker.service without container downtime.
This is due to Requires=docker.service in the systemd unit templates,
which mandates that when docker.service restarts, the ceph container
systemd units are restarted as well.
Rework Requires= to Wants=, which is a weaker version of the former;
see [2].
Leaving After= entries, because they should allow systemd to correctly
order the startup (first docker, then ceph containers).
orch: refactor boolean handling in drive group spec
The intent of 42721c03ee6f was to address an issue where boolean
parameters weren't handled correctly.
I noticed that a parameter (`tpm2`) was missed, which made me realize
that maintaining a list of these boolean parameters is necessary.
To simplify things, we should only accept `"true"` or `"false"` (in any case),
allowing us to avoid the need to maintain a list of boolean parameters.
This change introduces a `list_drive_group_spec_bool_arg` to store boolean
arguments related to drive group specifications, simplifying the validation
process for boolean values by directly checking if the values are 'true' or 'false'.
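A minimal sketch of the validation described above, assuming key=value style arguments. The names `BOOL_ARGS`, `parse_bool_arg`, and `parse_args` are hypothetical, as is the `data_devices` example argument; `tpm2` is the only parameter named in the commit message, and the real `list_drive_group_spec_bool_arg` may hold different entries.

```python
# Illustrative list only: tpm2 is the parameter named in the commit
# message; the real list in the Ceph source may differ.
BOOL_ARGS = ["tpm2"]


def parse_bool_arg(name: str, value: str) -> bool:
    """Accept only 'true' or 'false', in any letter case."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"{name}: expected 'true' or 'false', got {value!r}")


def parse_args(pairs):
    """Convert key=value strings; only names in BOOL_ARGS become booleans."""
    parsed = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        if name in BOOL_ARGS:
            parsed[name] = parse_bool_arg(name, value)
        else:
            parsed[name] = value
    return parsed


result = parse_args(["tpm2=TRUE", "data_devices=/dev/sdb"])
```

Rejecting anything other than 'true' or 'false' surfaces a mistyped value as an error instead of letting it be silently treated as a string.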
Adam King [Wed, 5 Feb 2025 22:00:06 +0000 (17:00 -0500)]
mgr/cephadm: fix typo with vrrp_interfaces in keepalive setup
This was intended to be vrrp_interfaces, the variable actually
used later in the code. Instead, due to a typo, it was setting
a variable that is unused other than in the log line right after
it is set. Issue was introduced in
https://github.com/ceph/ceph/commit/58ddc4e20f7cead1f2594241450f4beb5230c746
Improve the grammar and correct the formatting of the "Upgrading root ca
certificates" procedure that was added to the documentation in https://github.com/ceph/ceph/pull/61867
John Mulligan [Tue, 20 Aug 2024 19:01:05 +0000 (15:01 -0400)]
src/script: add a script to help build ceph using containers
The build-with-container script tries to encapsulate nearly all major
build tasks using docker/podman containers. If there's no build image
locally it will create one for you. It provides targets for building
(make), testing (make check), building rpm packages or deb packages and
is designed to be fairly easily extended.
View the comment at the top of the source file for usage details.
John Mulligan [Tue, 20 Aug 2024 19:00:57 +0000 (15:00 -0400)]
build: add files needed to create a build container
A build container contains all the tools and dependencies needed to
build ceph. It provides a Container file and a small script that
helps bootstrap the container setup. This script installs a few extra
things we need before farming most of the work out to install-deps.sh.
John Mulligan [Sat, 14 Sep 2024 10:31:23 +0000 (06:31 -0400)]
build: small script tweak to allow different build dirs
Move the mkdir line to allow for other build dir naming schemes outside
of what appears in the .gitignore file. A tiny bit of added flexibility
at little cost.
John Mulligan [Mon, 14 Nov 2022 15:57:25 +0000 (10:57 -0500)]
src/script: add helper function has_build_dir
This function returns successfully if $BUILD_DIR exists and is valid.
This is a useful building block for automation around the build and
can be used to avoid re-running commands that fail if the build dir
exists already.
John Mulligan [Tue, 1 Nov 2022 18:58:16 +0000 (14:58 -0400)]
script: add gcc-toolset-11 support to discover_compiler
In order to configure, build, and run tests in a CentOS 8 (or similar)
container we need a functioning gcc-toolset compiler. This relies on
the environment script being sourced as well. cmake does not appear
to be able to discover this compiler on its own.
John Mulligan [Tue, 1 Nov 2022 18:51:57 +0000 (14:51 -0400)]
script: add discover_compiler function to lib-build.sh
The discover_compiler function is an abstraction over the current
compiler detection code in run-make.sh. It is intended to be flexible
enough to work on {centos,rhel} systems, but currently is just an
updated version of the logic from run-make.sh. The intent is that this
function will grow and become useful for other scripts used for
building (possibly do_cmake.sh for example).
John Mulligan [Tue, 1 Nov 2022 13:57:16 +0000 (09:57 -0400)]
script: move get_processors to lib-build.sh
This function can be more useful because the NPROC value can
be provided regardless of how many cores nproc actually detects
and may be handy in a restricted environment like a container.
The new version quotes some values and uses $((...)), as per the shellcheck
warning that "expr is antiquated".