John Mulligan [Tue, 20 Aug 2024 19:01:05 +0000 (15:01 -0400)]
src/script: add a script to help build ceph using containers
The build-with-container script tries to encapsulate nearly all major
build tasks using docker/podman containers. If there's no build image
locally it will create one for you. It provides targets for building
(make), testing (make check), building rpm packages or deb packages and
is designed to be fairly easily extended.
View the comment at the top of the source file for usage details.
John Mulligan [Tue, 20 Aug 2024 19:00:57 +0000 (15:00 -0400)]
build: add files needed to create a build container
A build container contains all the tools and dependencies needed to
build ceph. This change provides a Container file and a small script that
helps bootstrap the container setup. This script installs a few extra
things we need before farming most of the work out to install-deps.sh.
John Mulligan [Sat, 14 Sep 2024 10:31:23 +0000 (06:31 -0400)]
build: small script tweak to allow different build dirs
Move the mkdir line to allow for other build dir naming schemes outside
of what appears in the .gitignore file. A tiny bit of added flexibility
at little cost.
John Mulligan [Mon, 14 Nov 2022 15:57:25 +0000 (10:57 -0500)]
src/script: add helper function has_build_dir
This function returns successfully if $BUILD_DIR exists and is valid.
This is a useful building block for automation around the build and
can be used to avoid re-running commands that fail if the build dir
exists already.
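As a rough illustration of the idea, a check of this kind might look like the sketch below. The commit message does not say what "valid" means; testing for a CMakeCache.txt file is an assumption made here for illustration, not necessarily what has_build_dir does.

```shell
# Sketch only: treating "valid" as "contains CMakeCache.txt" is an assumption.
has_build_dir() {
    [ -n "${BUILD_DIR:-}" ] \
        && [ -d "${BUILD_DIR}" ] \
        && [ -f "${BUILD_DIR}/CMakeCache.txt" ]
}
```

A caller could then guard a configure step with something like `has_build_dir || ./do_cmake.sh`, so the step is skipped when the directory is already set up.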
John Mulligan [Tue, 1 Nov 2022 18:58:16 +0000 (14:58 -0400)]
script: add gcc-toolset-11 support to discover_compiler
In order to configure, build, and run tests in a CentOS 8 (or similar)
container we need a functioning gcc-toolset compiler. This relies on
the environment script being sourced as well. cmake does not appear
to be able to discover this compiler on its own.
John Mulligan [Tue, 1 Nov 2022 18:51:57 +0000 (14:51 -0400)]
script: add discover_compiler function to lib-build.sh
The discover_compiler function is an abstraction over the current
compiler detection code in run-make.sh. It is intended to be flexible
enough to work on {centos,rhel} systems, but currently is just an
updated version of the logic from run-make.sh. The intent is that this
function will grow and become useful for other scripts used for
building (possibly do_cmake.sh for example).
John Mulligan [Tue, 1 Nov 2022 13:57:16 +0000 (09:57 -0400)]
script: move get_processors to lib-build.sh
This function can be more useful because the NPROC value can
be provided regardless of how many cores nproc actually detects
and may be handy in a restricted environment like a container.
The new version quotes some values and uses $((...)) as per shellcheck
warning that "expr is antiquated".
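A minimal sketch of the behavior described above: an explicit NPROC takes precedence over detection, values are quoted, and arithmetic uses $((...)) rather than expr. The fallback of using half of the detected cores is an assumption for illustration; the commit message does not state the fallback rule.

```shell
get_processors() {
    # An explicit NPROC wins over whatever nproc detects.
    if [ -n "${NPROC:-}" ]; then
        echo "${NPROC}"
        return
    fi
    local detected
    detected="$(nproc)"
    # Assumed fallback: use half of the detected cores, but at least one.
    if [ "${detected}" -ge 2 ]; then
        echo "$(( detected / 2 ))"
    else
        echo 1
    fi
}
```

In a container with a restricted CPU quota, a caller might run `NPROC=4 ./run-make.sh` to pin the parallelism instead of trusting detection.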
John Mulligan [Mon, 31 Oct 2022 19:06:25 +0000 (15:06 -0400)]
script: add a common ci_debug function to print ci debug lines
Reduces some of the boilerplate around emitting the "CI_DEBUG:"
prefixed debug lines for the CI. Additionally, enables using
the FORCE_CI_DEBUG var to enable ci debug lines even when not
in a jenkins environment.
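A helper along these lines could be as small as the sketch below. Detecting Jenkins through the JENKINS_HOME variable is an assumption made for this example; the commit message only says "a jenkins environment".

```shell
ci_debug() {
    # Assumption: JENKINS_HOME being set signals a Jenkins run.
    if [ -n "${JENKINS_HOME:-}" ] || [ -n "${FORCE_CI_DEBUG:-}" ]; then
        echo "CI_DEBUG: $*"
    fi
}
```

Outside of CI, running a script with `FORCE_CI_DEBUG=1` set would then show the same debug lines the CI prints.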
John Mulligan [Mon, 31 Oct 2022 17:50:56 +0000 (13:50 -0400)]
script: add lib-build.sh for common high level funcs and no main
The intention of this file is to collect some of the most basic or common
shell functions used across the various build scripts.
I would also like to ensure that functions added here are validated
using `shellcheck`. Currently, there's no automation for this, just
the honor system, but eventually we can automate validation of this
and other scripts with shellcheck.
John Mulligan [Wed, 5 Oct 2022 14:19:32 +0000 (10:19 -0400)]
script: use install-deps.sh to install extra packages wanted by run-make.sh
The run-make.sh script's prepare method pulls in additional dependencies
that are needed by the CI build and tests. To avoid issues such as these
packages not being available until after install-deps.sh has run in a
container environment, we allow install-deps.sh and its new
INSTALL_EXTRA_PKGS input variable to handle all of the dependency
installation.
John Mulligan [Mon, 3 Oct 2022 19:08:30 +0000 (15:08 -0400)]
install-deps.sh: copy ubuntu/apt retry logic from run-make.sh
Copy the logic from run-make.sh into install-deps.sh so that we can later
remove it from run-make.sh. It helps prevent breakage when apt-get is
interrupted.
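The commit message does not reproduce the copied logic. As a generic illustration of retrying a flaky package-manager command, and not the code from run-make.sh, a retry wrapper might look like this; the function name, attempt count, and delay are all hypothetical.

```shell
# Hypothetical helper: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_cmd() {
    local attempts="$1" delay="$2"
    shift 2
    local n=1
    # Re-run the command until it succeeds or the attempts are used up.
    until "$@"; do
        if [ "${n}" -ge "${attempts}" ]; then
            return 1
        fi
        n=$(( n + 1 ))
        sleep "${delay}"
    done
}
```

A call such as `retry_cmd 3 5 apt-get install -y some-package` would try up to three times, waiting five seconds between attempts.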
John Mulligan [Mon, 3 Oct 2022 18:43:19 +0000 (14:43 -0400)]
install-deps.sh: support INSTALL_EXTRA_PKGS
Instead of requiring other scripts to install packages independently,
teach install-deps.sh to install additional packages from the variable
INSTALL_EXTRA_PKGS. Now, other scripts should just set
INSTALL_EXTRA_PKGS and call install-deps.sh.
In particular, this fixes an issue installing packages in a clean (ex.
container) system that doesn't yet have repositories set up. Since this
task is performed by install-deps.sh already we avoid a chicken-and-egg
issue (or doing redundant work of setting up repos) in other scripts.
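One way a script could consume such a variable is sketched below. The space-separated format and both function names are assumptions for illustration; install_pkgs is a hypothetical stand-in that prints instead of calling a real package manager.

```shell
# Hypothetical stand-in for the real package-manager call; it only prints.
install_pkgs() {
    echo "would install: $*"
}

install_extra_pkgs() {
    # Nothing to do when the variable is unset or empty.
    [ -n "${INSTALL_EXTRA_PKGS:-}" ] || return 0
    # Left unquoted on purpose so the space-separated list splits into arguments.
    # shellcheck disable=SC2086
    install_pkgs ${INSTALL_EXTRA_PKGS}
}
```

From the caller's side this reduces to setting the variable and invoking the script, for example `INSTALL_EXTRA_PKGS="pkg-a pkg-b" ./install-deps.sh`, where the package names are placeholders.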
John Mulligan [Thu, 29 Sep 2022 14:34:12 +0000 (10:34 -0400)]
install-deps.sh: move functions above all "main" script body
Previously, the main part (top level body) of the script started and
then some function definitions occurred and then the main part of the
script resumed after that. I, and others, find this confusing so this
change moves the function definitions to occur before the main body of
the install-deps.sh script.
John Mulligan [Thu, 6 Oct 2022 17:43:41 +0000 (13:43 -0400)]
script: have run-make.sh honor BUILD_DIR like do_cmake.sh does
The BUILD_DIR environment variable is honored by do_cmake.sh in order to
create multiple build output directories. Before this change run-make.sh
did not support BUILD_DIR the same way as do_cmake.sh. This change makes
it possible to use BUILD_DIR with run-make.sh.
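Honoring an environment variable with a fallback is a common shell idiom. The sketch below shows the pattern only; the helper name is hypothetical and the default of "build" is an assumption, not taken from the commit message.

```shell
# Hypothetical helper: print the build directory the scripts should use.
resolve_build_dir() {
    # Use the caller's BUILD_DIR when set and non-empty, else the assumed default.
    echo "${BUILD_DIR:-build}"
}
```

With both scripts reading the variable the same way, one invocation such as `BUILD_DIR=build.debug ./run-make.sh` can target the same directory that do_cmake.sh created.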
Zac Dover [Mon, 10 Feb 2025 08:12:34 +0000 (18:12 +1000)]
doc/cephadm: improve "Activate Existing OSDs".
Make three minor changes to doc/cephadm/services/osd.rst. These three
changes were suggested by Eugen Block, who reviewed this procedure after
developing it.
Zac Dover [Fri, 7 Feb 2025 01:32:20 +0000 (11:32 +1000)]
doc/cephadm: improve "Activate Existing OSDs"
Improve the section "Activate Existing OSDs".
Supplement the information in the "Activate Existing OSDs" section with
a procedure developed by Eugen Block, here:
https://heiterbiswolkig.blogs.nde.ag/2025/02/06/cephadm-activate-existing-osds/
This procedure explains how to activate OSDs on a host that, for
whatever reason, has had to have its operating system reinstalled.
Ilya Dryomov [Wed, 29 Jan 2025 11:56:34 +0000 (12:56 +0100)]
librbd: stop filtering async request error codes
The roots of this go back to 2015 when snap create was changed to
filter EEXIST in commit 63f6c9bac9a4 ("librbd: fixed snap create race
conditions") and flatten was changed to filter EINVAL in commit ef7e210c3f74
("librbd: better handling for duplicate flatten requests"). From there
this pattern made it to most other operations that can be proxied
including "rbd migration execute".
The motivation was to suppress generation of an "expected" error in
response to a duplicate async request notification for the operation.
However, doing this at the top of the handler (right before returning
to the caller) and for an error as generic as EINVAL is super fragile.
It's trivial for an error that is being filtered to sneak in with
a lower level change completely unnoticed. For example, live migration
recently added NBD stream which is implemented on top of libnbd and it
turns out that some libnbd APIs return EINVAL on various occasions when
the NBD endpoint disappears and an error like ENOTCONN would make more
sense. If this occurs during "rbd migration execute" operation, the
rest of librbd never learns that migration was disrupted and the image
is transitioned to MIGRATION_STATE_EXECUTED, thus handing a partially
imported (read: corrupted) image to the user.
Luckily, with commits 07fbc4b71df4 ("librbd: track complete async
operation requests") and 96bc20445afb ("librbd: track complete async
operation return code"), the scenario which originally prompted error
code filtering isn't an issue anymore. Despite a few shortcomings
(e.g. when an async request notification is acked with result 0, it's
impossible to tell whether a) a new operation was kicked off, b) there
is an operation that is still in progress or c) it's for an operation
that completed earlier but hasn't "expired" yet), even just commit
07fbc4b71df4 by itself prevents a duplicate notification from kicking
off a second operation that could generate an error for something that
actually succeeded. With that in mind, eradicate error code filtering
from Operations class.
edef [Thu, 16 Mar 2023 09:43:58 +0000 (09:43 +0000)]
common: use close_range on Linux
Fix rook/rook#10110, which occurs when _SC_OPEN_MAX/RLIMIT_NOFILE is
set to very large values (2^30), leaving fork_function pegging a core
busylooping.
The glibc wrappers closefrom(3)/close_range(3) are not available before
glibc 2.34, so we invoke the syscall directly. When glibc 2.34 is old
enough to be a reasonable hard minimum dependency, we should switch to
using closefrom.
If we're not running on (recent enough) Linux, we fall back to the
existing approach.
Thrashers that do not inherit from ThrasherGreenlet previously used a
method called do_join, which combined stop and join functionality. To
ensure consistency and clarity, we want all thrashers to use separate
stop, join, and stop_and_join methods.
This commit renames methods and implements missing stop and stop_and_join
methods in thrashers that did not inherit from ThrasherGreenlet.
John Mulligan [Tue, 21 Jan 2025 21:28:42 +0000 (16:28 -0500)]
container: add label ceph=True back
Add a label used by cephadm internally that was always set by
ceph-container [1] back to the new containerfile. This should
prevent issues with cephadm shell command thinking official ceph images
are not official ceph images.
Ilya Dryomov [Thu, 30 Jan 2025 19:30:18 +0000 (20:30 +0100)]
doc/rbd: use https links in live import examples
Even though it's explicitly said that "http" stream can be used to
import via both HTTP and HTTPS, it can still be confusing that "type":
"http" is expected to go with "url": "https://...". Switch example
URLs from HTTP to HTTPS to make it more obvious.
Ilya Dryomov [Mon, 27 Jan 2025 11:29:54 +0000 (12:29 +0100)]
osd/OSDCap: fix misleading grammar comments
The restrictions on pool name and namespace have been independent of
each other for ages. Specifying namespace[=]<namespace> doesn't require
specifying pool[=]<pool> like is currently suggested -- neither for
regular "allow" grants nor for "profile" grants.
Ilya Dryomov [Fri, 24 Jan 2025 19:47:11 +0000 (20:47 +0100)]
mon/OSDMonitor: relax cap enforcement for unmanaged snapshots
Since commit 4972e054b32c ("mon/OSDMonitor: enforce caps when
creating/deleting unmanaged snapshots"), a) write access to the MON
service, b) write access to the OSD service for a pool or c) permission
for "osd pool op unmanaged-snap" command for a pool is required. For
"profile rbd" we configure read-only access to the MON service and rely
on write access to the OSD service, however the corresponding check in
is_osd_writable() is too strict.
An OSD cap like "profile rbd namespace=myns" or "allow w namespace=myns"
allows write access to myns namespace of any pool, but is_osd_writable()
disallows operations with unmanaged snapshots with such a cap because
its match.pool_namespace.pool_name.empty() is true. This condition
appears to serve as the "doesn't include support for the application
tag" guard, but it should actually be match.pool_tag.is_match_all()
(or match.pool_tag.application.empty() if open-coded) -- no restriction
on the pool name doesn't automatically mean that there is a restriction
on the application tag.
Dan Mick [Thu, 23 Jan 2025 02:28:15 +0000 (18:28 -0800)]
container/build.sh: fix up org vs. repo naming
Release builds were using the wrong container repo name because of
confused variable naming and inadequate separation. Keep the hostname,
org name, and repo name in separate variables, and assemble the full
path with a version when tagging is done.
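The separation described above can be sketched as follows. The function name, argument order, and example values are hypothetical and do not reflect the variable names used in container/build.sh.

```shell
# Hypothetical helper: assemble a full image reference only at tagging time.
full_image_ref() {
    local host="$1" org="$2" repo="$3" version="$4"
    echo "${host}/${org}/${repo}:${version}"
}
```

For example, `full_image_ref registry.example.com myorg ceph v1` prints `registry.example.com/myorg/ceph:v1`. Keeping the three parts apart until this point means a release build can swap the repo name without touching the hostname or org.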