Kefu Chai [Mon, 27 Jun 2022 13:18:57 +0000 (21:18 +0800)]
ceph.spec.in: use %enable_devtoolset11 to enable GTS-11
%enable_devtoolset11 redefines %___build_pre by appending
`source scl_source enable gcc-toolset-11` to it. `___build_pre` should
be able to populate this setting to both %build and %install. and hence
address the FTBFS where we need to use the tool chain from GTS-11.
This commit also explicilty sets the toolset when calling
boost's bootstrap.sh, because the latter is only capable
of autodetecting gcc if you have an unversioned `g++` in
the path, which may not be the case if only gcc11-c++ is
installed (the unversioned gcc-c++ package includes the
/usr/bin/g++ symlink, but depending on which version of
openSUSE we build on, that might point to a different gcc
version, so we don't want to use that).
This can be surprising but we actually compile things during
the `install` stage of `rpm-build`. The example is the pybind's
`setup.py` which builds `rados_dummy.c`.
ceph.spec.in: use gcc-toolset-10 for building crimson
This commit bumps up the toolset version but only to build crimson.
That is, the classical OSD stays unaffected.
The reason behind the upgrade is the following FTBFS:
```
[ 32%] Building CXX object src/seastar/CMakeFiles/seastar.dir/src/core/reactor.cc.o
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc: In constructor ‘seastar::reactor::reactor(std::shared_ptr<seastar::smp>, seastar::alien::instance&, unsigned int, seastar::reactor_backend_selector, seastar::reactor_config)’:
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc:926:90: error: use of deleted function ‘seastar::condition_variable::condition_variable()’
926 | , _thread_pool(std::make_unique<thread_pool>(this, seastar::format("syscall-{}", id))) {
| ^
In file included from /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/include/seastar/core/reactor.hh:74,
from /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc:32:
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/include/seastar/core/condition-variable.hh:157:5: note: ‘seastar::condition_variable::condition_variable() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘’
157 | condition_variable() noexcept = default;
```
osd: Handle oncommits and wait for future work items from mClock queue
When a worker thread with the smallest thread index waits for future work
items from the mClock queue, oncommit callbacks are called. But after the
callback, the thread has to continue waiting instead of returning back to
the ShardedThreadPool::shardedthreadpool_worker() loop. Returning results
in the threads with the smallest index across all shards to busy loop
causing very high CPU utilization.
The fix involves reacquiring the shard_lock and waiting on sdata_cond
until notified or until time period lapses. After this, the smallest
thread index repopulates the oncommit queue from the context_queue
if there were any additions.
Kefu Chai [Fri, 5 Aug 2022 08:40:41 +0000 (16:40 +0800)]
cmake: remove spaces in macro used for compiling cython code
we are facing following FTBFS on jammy + GCC-11.2 + Cython 0.29 +
CMake 3.22:
creating /home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/temp.linux-x86_64-3.10/home/jenkins-build/build/workspace/ceph-api/build/src/pybind/cephfs
compile options: '-I/usr/include/python3.10 -I/usr/include/python3.10 -c'
extra options: '-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -iquote/home/jenkins-build/build/workspace/ceph-api/src/include -w -Dvoid0=dead_function(void) -D__Pyx_check_single_interpreter(ARG)=ARG ## 0 -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2'
cc: /home/jenkins-build/build/workspace/ceph-api/build/src/pybind/cephfs/cephfs.c
cc: warning: ##: linker input file unused because linking not done
cc: error: ##: linker input file not found: No such file or directory
cc: warning: 0: linker input file unused because linking not done
cc: error: 0: linker input file not found: No such file or directory
it seems cython is not able to escape the space in the "extra options"
anymore, so the "##" and "0" are considered as object files passed to
compiler in addition to cephfs.c.
in this change the spaces are removed to help cython to make the right
decision.
We don't backport PRs merged into doc/releases. Therefore, when one browses to an older Ceph release version on docs.ceph.com (e.g., https://docs.ceph.com/en/pacific/), the information is out of date at best.
The doc/releases page is only accurate if browsing https://docs.ceph.com/en/latest/, for example.
So this post_checkout command will make sure we've checked out doc/releases from main before building and publishing.
Luis Domingues [Tue, 19 Jul 2022 09:04:34 +0000 (11:04 +0200)]
mgr/cephadm: add parsing for config on osd specs
Cephadm, while parsing spec files, can parse ceph configuration
for almost all the services, except for OSDs, where it fails
with a nasty "unexpected keyword argument config".
mgr/volumes: Fix subvolume creation in FIPS enabled system.
The md5 checksum is used in the construction of legacy
subvolume config filename. It's not used for security reason.
Hence marking the 'usedforsecurity' flag to false to
make it FIPs compliant.
The usage of md5 was always in there. The commit 373a04cf734
made it to get exercised in 'open_subvol' which is pre-requisite
for all the subvolume operations and hence subvolume
creation has failed.
When the class `Device` is instantiated with a path instead of a
block device, it fails like following.
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 130, in __init__
self._parse()
File "/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 233, in _parse
self.ceph_device = disk.has_bluestore_label(self.path)
File "/usr/lib/python3.6/site-packages/ceph_volume/util/disk.py", line 906, in has_bluestore_label
with open(device_path, "rb") as fd:
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/osd/ceph-0/'
```
passing a path instead of a block device is valid, `simple scan` needs it.
ceph-volume unit tests shouldn't actually create contents on the
filesystem from where it runs (even though they are written in a tmp
dir), let's use pyfakefs.
ceph-volume shouldn't report devices `/dev/sdy` and `/dev/sdz`, they will never be
available in such a scenario. Considering this, in a host with a bunch of mpath devices
it will pollute the inventory output.
Zack Cerza [Tue, 21 Jun 2022 17:28:30 +0000 (11:28 -0600)]
ceph-volume: Rename env var; add warning
So that we can have a nice big warning that fires once per invocation, I
think using a callable class with a class attribute seems like a decent
approach. A closure could work too.
Zack Cerza [Tue, 17 May 2022 17:29:02 +0000 (11:29 -0600)]
ceph-volume: Optionally consume loop devices
A similar proposal was rejected in #24765; I understand the logic
behind the rejection, but this will allow us to run Ceph clusters on
machines that lack disk resources for testing purposes. We just need to
make it impossible to accidentally enable, and make it clear it is
unsupported.
This check was added in commit ecd3778a6f9a ("rbd-mirror: ensure that
the last non-primary snapshot cannot be pruned") as an additional
safeguard against pruning an incomplete non-primary snapshot in case
there is no predecessor mirror snapshot. However it still fires if the
predecessor is there but happens to be a primary demotion snapshot.
A bogus "incomplete local non-primary snapshot" error is reported and
the replayer gets stuck.
Remove completed_non_primary_snapshots_exist tracking as the presence
of the predecessor in the incomplete non-primary snapshot pruning arm
is already ensured by "m_local_snap_id_start > 0" condition.
Incomplete non-primary snapshot handling is bifurcated depending
on whether any data objects have been copied. If no data objects
have been copied, an incomplete non-primary snapshot is assumed to
be malformed and gets pruned; the sync is restarted from scratch.
It doesn't really thrash anything, just repeatedly restarts the
workload on top of a dirty cache file. rbd_pwl_cache_recovery is
more on point and gets covered by existing CODEOWNERS.
Yin Congmin [Fri, 7 Jan 2022 07:03:44 +0000 (15:03 +0800)]
qa/tasks: add thrash test for persistent write log cache
add thrash test for persistent write log cache. run rbd bench
on persistent write log cache, thrashes rbd bench, test the
recovery function of persistent write log cache.
rbd: don't default empty pool name unless namespace is specified
Commit 96f05a7956b3 ("rbd: delay determination of default pool name")
broke "rbd perf image iostat" and "rbd perf image iotop" GLOBAL_POOL_KEY
support (the ability to blend all rbd pools together into a single
view).
Ilya Dryomov [Fri, 20 May 2022 12:05:03 +0000 (14:05 +0200)]
qa/suites/rbd: disable workunit timeout for dynamic_features_no_cache
The I/O workload in this test is xfstests (qa/run_xfstests_qemu.sh)
which isn't subjected to any timeout other than global max_job_time
limit in any other subsuite (e.g. qemu/workloads/qemu_xfstests.yaml).
But here, there is a parallel "op" workload defined as a workunit.
The workunit task has a default timeout of 3 hours which is effectively
imposed on the entire job. In the "rbd cache = false" configuration,
it's sometimes exceeded.
librbd: bail from schedule_request_lock() if already lock owner
Race condition may be hit if there are multiple pending locks for the
same image and pending callbacks. Abort exclusive lock process if
already exclusive lock owner.
Fixes: https://tracker.ceph.com/issues/56549 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 3527d2c764626c09c5ede80ae844551fd8845756)
when option --gpg-url is specified, the name used for the gpg filename is missing and throws an exception
this adds the string "manual" to the gpg key : /etc/apt/trusted.gpg.d/ceph.manual.gpg
Adam King [Tue, 26 Jul 2022 13:55:05 +0000 (09:55 -0400)]
mgr/cephadm: clear error message when resuming upgrade
the message field in the output of "ceph orch upgrade status"
will first take the value of the error field of the UpgradeState,
and if only if it blank/None, display an info string we periodically
update throughout the upgrade with useful info such as that
we're upgrading a daemon of a particular type or pulling an image
on a certain host. When an upgrade fails, we set the error field
of the UpgradeState, pause the upgrade and raise a health warning.
Sometimes, the user is able to resolve the issue and simply resume
the upgrade. The issue here is, in that case, the error field of
the UpgradeState is still set, so instead of seeing the useful info
messages, it will continue to display an error message that may
no longer be relevant. By emptying the error field of the UpgradeState
when upgrades are resumed, we return to normal behavior of
displaying the info string, and will only show another error message
if another error actually occurs.