Improve the grammar and correct the formatting of the "Upgrading root ca
certificates" procedure that was added to the documentation in https://github.com/ceph/ceph/pull/61867
John Mulligan [Tue, 20 Aug 2024 19:01:05 +0000 (15:01 -0400)]
src/script: add a script to help build ceph using containers
The build-with-container script tries to encapsulate nearly all major
build tasks using docker/podman containers. If there's no build image
locally, it will create one for you. It provides targets for building
(make), testing (make check), and building rpm or deb packages, and it
is designed to be fairly easy to extend.
View the comment at the top of the source file for usage details.
John Mulligan [Tue, 20 Aug 2024 19:00:57 +0000 (15:00 -0400)]
build: add files needed to create a build container
A build container contains all the tools and dependencies needed to
build ceph. It provides a container file and a small script that
helps bootstrap the container setup. The script installs a few extra
things we need before farming most of the work out to install-deps.sh.
John Mulligan [Sat, 14 Sep 2024 10:31:23 +0000 (06:31 -0400)]
build: small script tweak to allow different build dirs
Move the mkdir line to allow for build dir naming schemes other than
the ones that appear in the .gitignore file. A tiny bit of added
flexibility at little cost.
John Mulligan [Mon, 14 Nov 2022 15:57:25 +0000 (10:57 -0500)]
src/script: add helper function has_build_dir
This function returns successfully if $BUILD_DIR exists and is valid.
This is a useful building block for automation around the build and
can be used to avoid re-running commands that fail if the build dir
already exists.
John Mulligan [Tue, 1 Nov 2022 18:58:16 +0000 (14:58 -0400)]
script: add gcc-toolset-11 support to discover_compiler
In order to configure, build, and run tests in a CentOS 8 (or similar)
container we need a functioning gcc-toolset compiler. This relies on
the environment script being sourced as well. cmake does not appear
to be able to discover this compiler on its own.
John Mulligan [Tue, 1 Nov 2022 18:51:57 +0000 (14:51 -0400)]
script: add discover_compiler function to lib-build.sh
The discover_compiler function is an abstraction over the current
compiler detection code in run-make.sh. It is intended to be flexible
enough to work on {centos,rhel} systems, but currently is just an
updated version of the logic from run-make.sh. The intent is that this
function will grow and become useful for other scripts used for
building (possibly do_cmake.sh for example).
John Mulligan [Tue, 1 Nov 2022 13:57:16 +0000 (09:57 -0400)]
script: move get_processors to lib-build.sh
This function is more broadly useful because the NPROC value can be
provided regardless of how many cores nproc actually detects, which
may be handy in a restricted environment like a container.
The new version quotes some values and uses $((...)), per the shellcheck
warning that "expr is antiquated".
John Mulligan [Mon, 31 Oct 2022 19:06:25 +0000 (15:06 -0400)]
script: add a common ci_debug function to print ci debug lines
Reduces some of the boilerplate around emitting the "CI_DEBUG:"
prefixed debug lines for the CI. Additionally, the FORCE_CI_DEBUG
variable can be used to emit CI debug lines even when not running
in a Jenkins environment.
John Mulligan [Mon, 31 Oct 2022 17:50:56 +0000 (13:50 -0400)]
script: add lib-build.sh for common high level funcs and no main
The intention of this file is to collect some of the most basic or common
shell functions used across the various build scripts.
I would also like to ensure that functions added here are validated
using `shellcheck`. Currently, there's no automation for this, just
the honor system, but eventually we can start automatically validating
this and other scripts with shellcheck.
John Mulligan [Wed, 5 Oct 2022 14:19:32 +0000 (10:19 -0400)]
script: use install-deps.sh to install extra packages wanted by run-make.sh
The run-make.sh script's prepare method pulls in additional dependencies
that are needed by the CI build and tests. To avoid issues such as these
packages not being available until after install-deps.sh has run in a
container environment, we let install-deps.sh and its new
INSTALL_EXTRA_PKGS input variable handle all of the dependency
installation.
John Mulligan [Mon, 3 Oct 2022 19:08:30 +0000 (15:08 -0400)]
install-deps.sh: copy ubuntu/apt retry logic from run-make.sh
Copy the logic from run-make.sh into install-deps.sh so that we can later
remove it from run-make.sh. It helps prevent breakage when apt-get is
interrupted.
John Mulligan [Mon, 3 Oct 2022 18:43:19 +0000 (14:43 -0400)]
install-deps.sh: support INSTALL_EXTRA_PKGS
Instead of requiring other scripts to install packages independently,
teach install-deps.sh to install additional packages from the variable
INSTALL_EXTRA_PKGS. Now, other scripts should just set
INSTALL_EXTRA_PKGS and call install-deps.sh.
In particular, this fixes an issue installing packages in a clean (e.g., a
container) system that doesn't yet have repositories set up. Since this
task is already performed by install-deps.sh, we avoid a chicken-and-egg
issue (or the redundant work of setting up repos) in other scripts.
John Mulligan [Thu, 29 Sep 2022 14:34:12 +0000 (10:34 -0400)]
install-deps.sh: move functions above all "main" script body
Previously, the main part (top-level body) of the script started,
then some function definitions occurred, and then the main part of the
script resumed. I, and others, find this confusing, so this change
moves the function definitions before the main body of the
install-deps.sh script.
John Mulligan [Thu, 6 Oct 2022 17:43:41 +0000 (13:43 -0400)]
script: have run-make.sh honor BUILD_DIR like do_cmake.sh does
The BUILD_DIR environment variable is honored by do_cmake.sh in order to
create multiple build output directories. Before this change, run-make.sh
did not support BUILD_DIR the same way as do_cmake.sh. This change makes
it possible to use BUILD_DIR with run-make.sh.
Yuval Lifshitz [Tue, 18 Feb 2025 19:09:17 +0000 (19:09 +0000)]
reef: fix issue with bucket notification test
The get_ip() function now uses the get_ip_http() implementation
(without it, tests sometimes fail with multiple RGWs).
However, the name was changed to get_ip(), and this change was not
applied in all tests.
Zac Dover [Mon, 10 Feb 2025 08:12:34 +0000 (18:12 +1000)]
doc/cephadm: improve "Activate Existing OSDs".
Make three minor changes to doc/cephadm/services/osd.rst. These three
changes were suggested by Eugen Block, who reviewed this procedure after
developing it.
Zac Dover [Fri, 7 Feb 2025 01:32:20 +0000 (11:32 +1000)]
doc/cephadm: improve "Activate Existing OSDs"
Improve the section "Activate Existing OSDs".
Supplement the information in the "Activate Existing OSDs" section with
a procedure developed by Eugen Block, here:
https://heiterbiswolkig.blogs.nde.ag/2025/02/06/cephadm-activate-existing-osds/
This procedure explains how to activate OSDs on a host that, for
whatever reason, has had to have its operating system reinstalled.
Matthew Vernon [Wed, 28 Aug 2024 15:37:46 +0000 (16:37 +0100)]
cephadm: emit warning if daemon's image is not to be used
If an image is not specified, cephadm shell will use the image
corresponding to a Ceph daemon running on the host (and will log a
debug message to that effect).
However, it will only use that image if it appears in the output of:
This commit means that cephadm will emit a warning if the container
image it was going to use fails this check, so the operator has more
of a clue to what is going on.
Fixes: https://tracker.ceph.com/issues/67778
Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>
(cherry picked from commit b863c93ef1a1ce85164584dd17c5e71441bc550f)
Yonatan Zaken [Mon, 12 Aug 2024 20:00:39 +0000 (23:00 +0300)]
mgr/orchestrator: fix encrypted flag handling in orch daemon add osd
The current implementation incorrectly parses the `encrypted` flag as a string rather than as a boolean value.
As a result, an LVM encryption layer is created regardless of whether `encrypted=True` or `encrypted=False` is passed.
The only way to prevent this behavior is to omit the `encrypted` flag entirely.
This change parses the flag as a boolean, preventing potential errors and aligning the behavior with user expectations.
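A minimal Python sketch of the bug class and of explicit parsing. Any non-empty string is truthy in Python, so a raw "False" taken from `encrypted=False` evaluates as true. The helper `parse_bool_flag` and the spellings it accepts are hypothetical; this is not the orchestrator's actual code.

```python
from typing import Union

# Hypothetical spellings accepted for a boolean key=value CLI argument.
_TRUE_STRINGS = {"true", "yes", "1"}
_FALSE_STRINGS = {"false", "no", "0"}


def parse_bool_flag(value: Union[str, bool]) -> bool:
    """Interpret a flag value as a boolean instead of relying on truthiness."""
    if isinstance(value, bool):
        return value
    normalized = value.strip().lower()
    if normalized in _TRUE_STRINGS:
        return True
    if normalized in _FALSE_STRINGS:
        return False
    raise ValueError(f"invalid boolean value: {value!r}")


# The buggy pattern: bool("False") is True because the string is non-empty.
buggy = bool("False")
# Explicit parsing yields the value the user asked for.
fixed = parse_bool_flag("False")
```

Rejecting unrecognized values, rather than defaulting to one side, keeps a typo from silently enabling or disabling encryption.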
Adam King [Mon, 15 Jul 2024 19:19:22 +0000 (15:19 -0400)]
qa/upgrade: use staggered upgrade features for reef-x/stress-split
This test was trying to partially upgrade the mons and OSDs by
kicking off an upgrade and then checking every 2 seconds if
enough had been upgraded. Since staggered upgrade parameters
were present in the initial reef release (not true for quincy)
it makes sense to use them instead in order to do this in a
more controlled manner.
Adam King [Mon, 15 Jul 2024 19:02:27 +0000 (15:02 -0400)]
qa/upgrade: fix checks to make sure upgrade is still in progress
Without checking both that the upgrade is in progress and that
the status isn't reporting an error, we can end up in a scenario
where the test is waiting for an upgrade that has already
been marked failed and will never complete. The same sort of
change was already made in the orch suite upgrade tests and
has helped with jobs timing out there.
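The two-condition check can be sketched in Python as below. It assumes a status dictionary shaped like the JSON output of `ceph orch upgrade status`, with an `in_progress` boolean and a `message` string in which a failure contains the word "Error"; `upgrade_still_running` and `wait_for_upgrade` are hypothetical helpers, not the qa suite's actual code.

```python
import time
from typing import Callable, Dict


def upgrade_still_running(status: Dict) -> bool:
    """Return True only if the upgrade is in progress AND has not failed.

    Assumes "in_progress" is a boolean and that a failure is reported
    through a "message" string containing "Error".
    """
    in_progress = bool(status.get("in_progress"))
    message = status.get("message") or ""
    return in_progress and "Error" not in message


def wait_for_upgrade(get_status: Callable[[], Dict], interval: float = 30.0,
                     timeout: float = 3600.0) -> Dict:
    """Poll until the upgrade stops running (finished or failed) or time out."""
    deadline = time.monotonic() + timeout
    status = get_status()
    while upgrade_still_running(status):
        if time.monotonic() > deadline:
            raise TimeoutError("upgrade did not finish in time")
        time.sleep(interval)
        status = get_status()
    # The caller decides whether the final status means success or failure.
    return status
```

Checking only `in_progress` would keep the loop spinning on an upgrade that has already failed; the second condition lets the wait end so the failure can be reported.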
Adam King [Mon, 1 Jul 2024 17:44:29 +0000 (13:44 -0400)]
cephadm: turn off cgroups_split setting when bootstrapping with --no-cgroups-split
If users provide the --no-cgroups-split flag when bootstrapping a
cluster, they probably want the cluster to continue not using
cgroups split for daemons after bootstrap. Setting the
mgr/cephadm/cgroups_split setting to false accomplishes that.
Adam King [Thu, 27 Jun 2024 21:09:20 +0000 (17:09 -0400)]
mgr/rgw: fix setting rgw realm token in secondary site rgw spec
This was setting a field called "rgw_token" in the rgw spec
but this is not a real field in rgw specs. Instead we should
be setting "rgw_realm_token" which is what the field is
actually called.
Setting this nonexistent field causes the spec to be deleted
the first time cephadm needs to convert it from a JSON string
back into a Python object (which happens whenever the module
restarts or the active mgr changes), which then causes all the
rgw daemons attached to the service to be removed.
Dan van der Ster [Tue, 11 Jun 2024 20:31:05 +0000 (13:31 -0700)]
cephadm: disable ms_bind_ipv4 if we will enable ms_bind_ipv6
While bootstrapping an IPv6 cluster with an IPv6 initial mon, cephadm correctly enables ms_bind_ipv6=true.
However, it leaves ms_bind_ipv4 at its default (true).
As a result, daemons (osd, mds, ...) will attempt to bind to both IPv6 and IPv4.
Usually this results in an osdmap and fsmap like the following:
```
osd.2 up in weight 1 up_from 26 up_thru 909 down_at 0 last_clean_interval [0,0) [v2:[xxxx:4f8:d0:4401:3::29]:6800/3680761436,v1:[xxxx:4f8:d0:4401:3::29]:6801/3680761436,v2:0.0.0.0:6802/3680761436,v1:0.0.0.0:6803/3680761436] [v2:[xxxx:4f8:d0:4401:3::29]:6804/3680761436,v1:[xxxx:4f8:d0:4401:3::29]:6805/3680761436,v2:0.0.0.0:6806/3680761436,v1:0.0.0.0:6807/3680761436] exists,up 0978a571-cd00-4eba-b00b-f863603a9a70
```
Dual stack is not supported by kernels (https://tracker.ceph.com/issues/49581), which leads to hard-to-debug issues for end users (corrupt map messages in dmesg).
Fix by disabling ms_bind_ipv4 when IPv6 is desired.
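A minimal Python sketch of the decision, assuming a hypothetical helper that receives the initial mon address; it is not cephadm's bootstrap code. The option names `ms_bind_ipv6` and `ms_bind_ipv4` come from the text above.

```python
import ipaddress
from typing import Dict


def msgr_bind_options(mon_ip: str) -> Dict[str, str]:
    """Hypothetical helper: pick ms_bind_* options from the initial mon IP.

    For an IPv6 mon, enable ms_bind_ipv6 and explicitly disable
    ms_bind_ipv4 (default true) so daemons do not also bind to 0.0.0.0.
    """
    # Accept bracketed IPv6 literals such as "[2001:db8::29]".
    addr = ipaddress.ip_address(mon_ip.strip("[]"))
    if addr.version == 6:
        return {"ms_bind_ipv6": "true", "ms_bind_ipv4": "false"}
    # IPv4 clusters keep the defaults.
    return {}
```

Each returned pair could then be applied with `ceph config set global <name> <value>`.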
Fixes: https://tracker.ceph.com/issues/66436
Signed-off-by: Dan van der Ster <dan.vanderster@clyso.com>
Signed-off-by: Joshua Blanch <joshua.blanch@clyso.com>
(cherry picked from commit 75f0ba5703200f4420a4b53d1c728167daf19909)
Adam King [Wed, 19 Jun 2024 19:04:21 +0000 (15:04 -0400)]
mgr/rgw: fix error handling in rgw zone create
This was returning either a list of strings or
a HandleCommandResult, and in the latter case
it would error out trying to build the final
return message, which covered up the original
error.
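A Python sketch of handling both return types explicitly. `HandleCommandResult` here is a local stand-in mirroring the mgr module's `(retval, stdout, stderr)` tuple, and `format_zone_create_result` is a hypothetical name; the real mgr/rgw code may differ.

```python
from typing import List, NamedTuple, Union


class HandleCommandResult(NamedTuple):
    """Local stand-in for the mgr's (retval, stdout, stderr) result tuple."""
    retval: int = 0
    stdout: str = ""
    stderr: str = ""


def format_zone_create_result(
        result: Union[List[str], HandleCommandResult]) -> HandleCommandResult:
    """Build the final reply without masking an error from the create step."""
    if isinstance(result, HandleCommandResult):
        # The create step already failed: pass its error through unchanged
        # instead of treating it like a list of strings.
        return result
    # Success path: the list of strings becomes the command output.
    return HandleCommandResult(stdout="\n".join(result))
```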
John Mulligan [Mon, 10 Jun 2024 18:36:33 +0000 (14:36 -0400)]
cephadm: rename test_enclosure to test_host_facts
There was a whole file dedicated to the enclosure class from host_facts,
but no other tests for host facts. Rename the enclosure test file so it
can cover the whole host_facts module (for the future).
John Mulligan [Mon, 10 Jun 2024 18:30:31 +0000 (14:30 -0400)]
cephadm: update hosts_facts to read apparmor profile names with spaces
Fixes: https://tracker.ceph.com/issues/66389
Update the host_facts class kernel_security method to correctly read
apparmor profile names that have spaces in them. Update the test to
verify this functionality.
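A minimal Python sketch of the parsing idea, assuming each line of the AppArmor profiles listing looks like `<name> (<mode>)`. Because the name itself may contain spaces, a naive whitespace split truncates it; splitting on the last space keeps the whole name. `parse_apparmor_profiles` is a hypothetical name, not cephadm's `kernel_security` method.

```python
from typing import Dict


def parse_apparmor_profiles(text: str) -> Dict[str, str]:
    """Map profile name -> mode from lines shaped like "<name> (<mode>)".

    Profile names may contain spaces, so split on the LAST space only.
    """
    profiles: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, mode = line.rpartition(" ")
        if not name:
            continue  # malformed line without a "(mode)" suffix
        profiles[name] = mode.strip("()")
    return profiles
```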
Original-version-by: Sebastian Marsching <sebastian.marsching-git-2016@aquenos.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit d40fe10b8a75402d518fb54f58c689331c854778)
John Mulligan [Mon, 10 Jun 2024 18:27:51 +0000 (14:27 -0400)]
cephadm: add a test case to cover reading apparmor profiles
Add a test case that covers the HostFacts functionality regarding
the apparmor kernel security (lsm) feature.
Put it in the test_enclosure.py file for now because enclosure is
part of the host_facts module.
Ilya Dryomov [Wed, 29 Jan 2025 11:56:34 +0000 (12:56 +0100)]
librbd: stop filtering async request error codes
The roots of this go back to 2015 when snap create was changed to
filter EEXIST in commit 63f6c9bac9a4 ("librbd: fixed snap create race
conditions") and flatten was changed to filter EINVAL in commit ef7e210c3f74
("librbd: better handling for duplicate flatten requests"). From there
this pattern made it into most other operations that can be proxied,
including "rbd migration execute".
The motivation was to suppress generation of an "expected" error in
response to a duplicate async request notification for the operation.
However, doing this at the top of the handler (right before returning
to the caller) and for an error as generic as EINVAL is super fragile.
It's trivial for an error that is being filtered to sneak in with
a lower level change completely unnoticed. For example, live migration
recently added NBD stream which is implemented on top of libnbd and it
turns out that some libnbd APIs return EINVAL on various occasions when
the NBD endpoint disappears and an error like ENOTCONN would make more
sense. If this occurs during an "rbd migration execute" operation, the
rest of librbd never learns that migration was disrupted and the image
is transitioned to MIGRATION_STATE_EXECUTED, thus handing a partially
imported (read: corrupted) image to the user.
Luckily, with commits 07fbc4b71df4 ("librbd: track complete async
operation requests") and 96bc20445afb ("librbd: track complete async
operation return code"), the scenario which originally prompted error
code filtering isn't an issue anymore. Despite a few shortcomings
(e.g. when an async request notification is acked with result 0, it's
impossible to tell whether a) a new operation was kicked off, b) there
is an operation that is still in progress or c) it's for an operation
that completed earlier but hasn't "expired" yet), even just commit
07fbc4b71df4 by itself prevents a duplicate notification from kicking
off a second operation that could generate an error for something that
actually succeeded. With that in mind, eradicate error code filtering
from the Operations class.
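The alternative to error filtering can be sketched in Python as a tracker keyed by request id: a duplicate notification for a finished request replays the stored return code, and a duplicate for a running request is acked with 0 without starting a second operation. This is a simplified illustration, not librbd's C++ code; `AsyncRequestTracker` and the `(client, number)` request id are hypothetical, and completed entries never "expire" here.

```python
from typing import Callable, Dict, Set, Tuple

RequestId = Tuple[str, int]  # hypothetical (client id, request number)


class AsyncRequestTracker:
    """Deduplicate async request notifications instead of filtering errors."""

    def __init__(self) -> None:
        self._in_progress: Set[RequestId] = set()
        self._completed: Dict[RequestId, int] = {}

    def handle(self, request_id: RequestId, operation: Callable[[], int]) -> int:
        if request_id in self._completed:
            # Duplicate of a finished request: replay its real return code.
            return self._completed[request_id]
        if request_id in self._in_progress:
            # Duplicate of a running request: ack with 0, do not restart it.
            return 0
        self._in_progress.add(request_id)
        try:
            result = operation()
        finally:
            self._in_progress.discard(request_id)
        # Every error code is recorded and propagated unchanged.
        self._completed[request_id] = result
        return result
```

Because no error code is special-cased, a genuine failure such as an EINVAL from a lower layer reaches the caller.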
edef [Thu, 16 Mar 2023 09:43:58 +0000 (09:43 +0000)]
common: use close_range on Linux
Fix rook/rook#10110, which occurs when _SC_OPEN_MAX/RLIMIT_NOFILE is
set to very large values (2^30), leaving fork_function busy-looping and
pegging a core.
The glibc wrappers closefrom(3)/close_range(3) are not available before
glibc 2.34, so we invoke the syscall directly. When glibc 2.34 is old
enough to be a reasonable hard minimum dependency, we should switch to
using closefrom.
If we're not running on (recent enough) Linux, we fall back to the
existing approach.
Thrashers that do not inherit from ThrasherGreenlet previously used a
method called do_join, which combined stop and join functionality. To
ensure consistency and clarity, we want all thrashers to use separate
stop, join, and stop_and_join methods.
This commit renames methods and implements missing stop and stop_and_join
methods in thrashers that did not inherit from ThrasherGreenlet.
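A minimal Python sketch of the separate stop, join, and stop_and_join interface, using a plain `threading.Thread` rather than greenlets. The `Thrasher` class here is hypothetical and only illustrates the split; it is not the qa suite's code.

```python
import threading
from typing import Optional


class Thrasher:
    """Hypothetical thrasher with separate stop, join and stop_and_join."""

    def __init__(self, interval: float = 0.01) -> None:
        self.interval = interval
        self.iterations = 0
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # One unit of "thrashing" per loop until asked to stop.
        while not self._stopping.is_set():
            self.iterations += 1
            self._stopping.wait(self.interval)

    def stop(self) -> None:
        """Ask the loop to exit; returns immediately."""
        self._stopping.set()

    def join(self, timeout: Optional[float] = None) -> None:
        """Wait for the loop to exit (call stop() first)."""
        self._thread.join(timeout)

    def stop_and_join(self) -> None:
        """Equivalent of the old combined do_join: stop, then wait."""
        self.stop()
        self.join()
```

Keeping stop and join separate lets a caller signal several thrashers first and then wait for all of them, instead of stopping them one at a time.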
John Mulligan [Tue, 21 Jan 2025 21:28:42 +0000 (16:28 -0500)]
container: add label ceph=True back
Add a label used internally by cephadm that was always set by
ceph-container [1] back to the new containerfile. This should
prevent the cephadm shell command from treating official ceph
images as unofficial.