Kefu Chai [Thu, 24 Jun 2021 03:43:03 +0000 (11:43 +0800)]
ceph.spec.in: move ceph_volume python module into ceph-osd
the ceph-volume tool is composed of the cli frontend and ceph_volume
python module. in 02bc369e052125f50c7d3a7fe9b311215291c84d, its cli
frontend is moved from ceph-base package to ceph-osd. but the python
module was left in ceph-base.
since the only consumer of ceph_volume python package is ceph-volume,
better off moving this python module into ceph-osd. this also aligns
the rpm packaging with the deb packaging, where ceph-osd deb package
also includes ceph_volume python module.
we could extract ceph-volume into its own package, so it can be an
arch-independent package. let's leave it as a follow-up change.
Kefu Chai [Thu, 24 Jun 2021 03:06:31 +0000 (11:06 +0800)]
ceph.spec.in: move "Requires: python3-setuptools" from ceph-base to ceph-osd
python3-setuptools was originally added to ceph-base as a dependency of
ceph-detect-init, see https://tracker.ceph.com/issues/14864. but since
ceph-disk and ceph-detect-init were replaced by ceph-volume, and were
removed from the debian packaging in ee6bc23e892369b14668faea04c3ef1b5776a6b6,
there is no need to have python3-setuptools in the ceph-base packages
anymore.
but since we are still using pkg_resources module provided by setuptools
in ceph-volume, we need to preserve this runtime dependency in ceph-osd,
as ceph-osd packages ceph-volume.
please note, pkg_resources module is also used by cephadm to poke around
ceph_iscsi python module installed in a container, so python-setuptools
should be installed along with ceph-iscsi if we need better
interoperability between ceph-iscsi and cephadm. this is not in the
scope of this change.
Kefu Chai [Wed, 23 Jun 2021 05:37:40 +0000 (13:37 +0800)]
crimson: s/crimson::do_until/crimson::repeat/
seastar::do_until() takes a predicate functor and a continuation, while
seastar::repeat() takes a single continuation which returns
stop_iteration::yes or stop_iteration::no. in general, we want to mirror
and extend the facilities offered by seastar instead of changing them in
an unexpected way. but crimson::do_until() only takes a single
continuation which returns a bool. in the hope of being more consistent,
all occurrences of crimson::do_until are replaced with crimson::repeat
in this change.
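The difference between the two loop shapes can be sketched without futures. The block below is a simplified, synchronous stand-in: `stop_iteration`, `do_until` and `repeat` only mirror the seastar names, and `count_with_do_until` / `count_with_repeat` are hypothetical helpers made up for illustration. It is not the crimson or seastar implementation.

```cpp
#include <cassert>

// Stand-in for seastar::stop_iteration (assumption: simplified, no futures).
enum class stop_iteration { no, yes };

// do_until() shape: a separate predicate decides when the loop stops.
template <typename StopCond, typename Action>
void do_until(StopCond stop_cond, Action action) {
  while (!stop_cond()) {
    action();
  }
}

// repeat() shape: the action itself reports whether to stop.
template <typename Action>
void repeat(Action action) {
  while (action() == stop_iteration::no) {
  }
}

// Hypothetical helper: count up to `limit` with the predicate-based loop.
inline int count_with_do_until(int limit) {
  int n = 0;
  do_until([&] { return n >= limit; }, [&] { ++n; });
  return n;
}

// Hypothetical helper: the same loop expressed with a single action.
inline int count_with_repeat(int limit) {
  int n = 0;
  repeat([&] {
    ++n;
    return n >= limit ? stop_iteration::yes : stop_iteration::no;
  });
  return n;
}
```

Note that in this sketch the repeat() shape always runs the action at least once, while the do_until() shape checks the predicate first.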
Kefu Chai [Wed, 23 Jun 2021 08:11:24 +0000 (16:11 +0800)]
crimson/common/errorator: consider return value as ready future in maybe_handle_error_t
this behavior mirrors seastar::futurize::apply(), where non-future and
non-void return values are converted to future<>, and returned instead.
this change could simplify some use cases where we always return an
immediately available future.
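The idea of treating a plain return value as a ready future can be sketched as follows. This is a toy under stated assumptions: `ready_future`, `is_future` and `futurize_call` are hypothetical names, the future type is always ready, and void-returning callables are not handled. It needs C++17 and is not the errorator code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy stand-in for a future that is always ready (assumption: real
// seastar futures may also be pending; this sketch ignores that case).
template <typename T>
struct ready_future {
  T value;
  T get() { return std::move(value); }
};

template <typename T> struct is_future : std::false_type {};
template <typename T> struct is_future<ready_future<T>> : std::true_type {};

// Hypothetical helper: call func() and make sure the result is a future.
template <typename Func>
auto futurize_call(Func&& func) {
  using Ret = std::invoke_result_t<Func>;
  if constexpr (is_future<Ret>::value) {
    return func();                     // already a future, pass it through
  } else {
    return ready_future<Ret>{func()};  // plain value, wrap it as ready
  }
}
```

With such a helper, a handler that always has its result at hand can simply return the value instead of constructing a ready future itself.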
Patrick Donnelly [Wed, 23 Jun 2021 02:34:01 +0000 (19:34 -0700)]
Merge PR #41385 into master
* refs/pull/41385/head:
mon/FSCommands: add command to rename a file system
qa/cephfs: split test_admin.TestAdminCommands
mds: remove 'fs_name' from MDSRank
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Fri, 18 Jun 2021 16:27:54 +0000 (09:27 -0700)]
mds: avoid journaling overhead for ceph.dir.subvolume for no-op case
In preparation for acquiring the xlock on the directory inode, the MDS
must journal a few events before continuing on with the setvxattr. This
can cause significant delays in the volumes ceph-mgr module which needs
to regularly enable this vxattr from multiple code paths. We could cache
in that module whether the vxattr is set but it's also pretty easy to
adjust the MDS to acquire a rdlock on the directory to check if the
subvolume flag is already set. That is much lighter weight and the lock
is generally readily available.
Fixes: https://tracker.ceph.com/issues/51276
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
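The "check under a read lock first" pattern described in this commit can be sketched with a local reader/writer lock. This is only an in-process analogy: the MDS locks are distributed and the real code path is different. `DirInode`, `journal_events` and `set_subvolume` are hypothetical names, and the counter merely stands in for the journaling work.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for a directory inode with a subvolume flag.
struct DirInode {
  std::shared_mutex lock;
  bool subvolume = false;
  int journal_events = 0;  // counts how often the expensive path ran
};

inline void set_subvolume(DirInode& dir, bool value) {
  {
    // Cheap path: a shared (read) lock is enough to detect the no-op case.
    std::shared_lock<std::shared_mutex> rd(dir.lock);
    if (dir.subvolume == value) {
      return;
    }
  }
  // Expensive path: take the exclusive lock and re-check, since the flag
  // may have changed between dropping the read lock and getting here.
  std::unique_lock<std::shared_mutex> wr(dir.lock);
  if (dir.subvolume != value) {
    ++dir.journal_events;
    dir.subvolume = value;
  }
}
```

Repeated calls with the same value then only ever take the read lock, which is the case the volumes module hits regularly.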
chunmei-liu [Mon, 21 Jun 2021 21:52:54 +0000 (14:52 -0700)]
crimson/seastore: fix reactor stalled and rbd_open failed
the omap manager uses references to the keys, so the set of key strings
should be kept alive during the omap operation. use value capture instead
of reference capture.
otherwise, during the omap operation, the keys' contents will be changed,
causing the logger's insert_iterator to take a very long time and stall
the reactor. omap_get_value also failed, and so did rbd_open.
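The capture difference can be illustrated with a deferred operation that runs after the caller has touched its key set. This is a minimal sketch: `key_set`, `make_count_by_value` and `make_count_by_ref` are hypothetical names and have nothing to do with the seastore API. The by-reference variant is only well defined here because the caller's set is still alive when the closure runs. If the set had been destroyed, it would be a dangling reference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>

using key_set = std::set<std::string>;

// Value capture: the closure owns its own copy of the keys, so later
// changes to the caller's set do not affect the deferred operation.
inline std::function<std::size_t()> make_count_by_value(const key_set& keys) {
  return [keys] { return keys.size(); };
}

// Reference capture: the closure only refers to the caller's set, so it
// observes every later change and dangles once that set is destroyed.
inline std::function<std::size_t()> make_count_by_ref(const key_set& keys) {
  return [&keys] { return keys.size(); };
}
```

For example, if the caller clears its set after creating both closures, the by-value closure still sees the original keys while the by-reference one sees an empty set.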
crimson/osd: fix construction of InternalClientRequest in DEBUG builds.
The assert in the ctor of `InternalClientRequest` actually operates on
the ctor's argument we `std::moved` from, not on the class' member.
When a debug build is used, this translates into failures like the one
below:
crimson/os: synchronize producers with consumers in AlienStore's queues.
Some time ago we replaced the single, `boost::lockfree`-based queue
in `ThreadPool` with the in-house, lockish `ShardedWorkQueue` vector.
Unfortunately, pushing into such a queue isn't synchronized with
consuming from it -- the former happens without locking the `mutex`.
As the underlying primitive behind `ShardedWorkQueue::pending` is
plain `std::deque`, it's unsafe to operate that way in a multi-threaded
environment. Indeed, weird-looking crashes have been spotted at Sepia:
```
(virtualenv) rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-06-21_14:49:36-rados-master-distro-basic-smithi/6182668$ less ./remote/smithi196/log/ceph-osd.7.log.gz
...
0# 0x000055862FD67ADF in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
4# 0x00005586357540E4 in ceph-osd
5# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
6# pthread_cond_timedwait in /lib64/libpthread.so.0
7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
8# 0x00005586313E303B in ceph-osd
9# 0x00007FB22CC51BA3 in /lib64/libstdc++.so.6
10# 0x00007FB22CF2C14A in /lib64/libpthread.so.0
11# clone in /lib64/libc.so.6
Fault at location: 0x18
daemon-helper: command crashed with signal 11
```
This fix introduces the synchronization to the `push_back()` method of
`ShardedWorkQueue`. The side effect is that it may stall the reactor.
Therefore, a follow-up change that switches to e.g. `boost::lockfree`
is expected.
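A minimal sketch of the synchronization described above, assuming a queue guarded by one mutex and one condition variable. `WorkQueue` and its members are hypothetical names and this is not the `ShardedWorkQueue` code. The point is only that the producer takes the same mutex the consumer waits on before touching the `std::deque`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

template <typename T>
class WorkQueue {
public:
  // Producer side: mutate the deque only while holding the mutex.
  void push_back(T item) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      pending_.push_back(std::move(item));
    }
    cond_.notify_one();
  }

  // Consumer side: wait up to `timeout` for an item under the same mutex.
  std::optional<T> pop_front(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cond_.wait_for(lock, timeout, [this] { return !pending_.empty(); })) {
      return std::nullopt;  // timed out with nothing queued
    }
    T item = std::move(pending_.front());
    pending_.pop_front();
    return item;
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<T> pending_;
};
```

Because `push_back()` may block on the mutex, a caller running on the reactor thread can stall, which is the side effect the commit message mentions.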
Kefu Chai [Sat, 19 Jun 2021 11:12:36 +0000 (19:12 +0800)]
crimson/common: extract parallel_for_each_state out
if `parallel_for_each_state` is defined as a nested class in errorator,
clang fails to compile it:
../src/crimson/common/errorator.h:716:47: error: no class named 'parallel_for_each_state' in 'errorator<AllowedErrors...>'
friend class errorator<AllowedErrors...>::parallel_for_each_state;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
and the forward declaration does not help. so we have to extract it
out of the errorator. to speed up the compilation, it is moved into
errorator-loop.h. its name mirrors `include/seastar/core/loop.h`.
we could extract the `errorator<>::parallel_for_each()` out as well,
as its return type can be deduced from the type of Iterator and Func.
Patrick Donnelly [Sat, 19 Jun 2021 02:49:15 +0000 (19:49 -0700)]
Merge PR #40997 into master
* refs/pull/40997/head:
test: add test to verify adding an active peer back to source
pybind/mirroring: disallow adding an active peer back to source
pybind/cephfs: interface to fetch file system id
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 19 Jun 2021 02:47:53 +0000 (19:47 -0700)]
Merge PR #36823 into master
* refs/pull/36823/head:
qa : add a test for the cmd, dump cache
mds : add timeout to the command, dump cache, to prevent it from running too long and affecting the service
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Fri, 18 Jun 2021 03:14:29 +0000 (11:14 +0800)]
ceph.spec.in: bump up the required version of fmt-devel to 6.2.1
6.2.1 is the version packaged by EPEL8, in other words, this is the
version we've been testing. so to be more consistent with the
known-to-be-good version, let's bump up the required version.
Kefu Chai [Thu, 17 Jun 2021 01:35:40 +0000 (09:35 +0800)]
crimson/tools/store_nbd: better cleanup
* remove unix domain socket file on cleanup,
so we don't need to remove it manually after each run.
* shut down input and output streams on cleanup,
so the reactor does not watch them anymore.
Greg Farnum [Thu, 17 Jun 2021 19:56:20 +0000 (19:56 +0000)]
mon: Sanely set the default CRUSH rule when creating pools in stretch mode
If we get a pool create request while in stretch mode that does not explicitly
specify a crush rule, look at the stretch-mode pools and their rules, and
select the most common one.
Also update set_up_stretch_mode.sh to add a few more rules that let me test
this locally.
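The selection step can be sketched as a simple frequency count. This is an illustration only: `most_common_rule` is a hypothetical helper, rule ids are modelled as plain ints, and breaking ties in favour of the lowest rule id is an assumption of this sketch, not something the commit message states. It needs C++17.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// Given the CRUSH rule id used by each existing stretch-mode pool,
// return the rule that occurs most often, or nothing if there are no pools.
inline std::optional<int> most_common_rule(const std::vector<int>& pool_rules) {
  std::map<int, int> counts;  // rule id -> number of pools using it
  for (int rule : pool_rules) {
    ++counts[rule];
  }
  std::optional<int> best;
  int best_count = 0;
  for (const auto& [rule, count] : counts) {
    // Strict '>' plus ordered iteration: ties go to the lowest rule id.
    if (count > best_count) {
      best = rule;
      best_count = count;
    }
  }
  return best;
}
```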
Ramana Raja [Tue, 20 Apr 2021 21:36:07 +0000 (17:36 -0400)]
mon/FSCommands: add command to rename a file system
The fs_name of the relevant MDSMap is set to the new name. Also,
the application tags of the data pools and the metadata pool of
the file system are set to the new name.
Fixes: https://tracker.ceph.com/issues/47276
Signed-off-by: Ramana Raja <rraja@redhat.com>
Rishabh Dave [Tue, 15 Jun 2021 08:49:17 +0000 (14:19 +0530)]
qa/cephfs: split test_admin.TestAdminCommands
Splitting TestAdminCommands makes it easier to exercise a particular
"ceph fs" subcommand. Also, rename class TestSubCmdFsAuthorize to
TestFsAuthorize.
Ramana Raja [Tue, 18 May 2021 05:00:29 +0000 (01:00 -0400)]
mds: remove 'fs_name' from MDSRank
There isn't a need to store a file system's name in an MDSRank
object. The MDSRank has a pointer to an MDSMap object, which already
stores the name.
Also, there isn't a need to pass the file system name in the MMDSMap
message. It should be sufficient to pass the MDSMap in the MMDSMap
message as the file system name is stored in the MDSMap. Pass an
empty string as map_fs_name in the MMDSMap message. It is simpler than
removing map_fs_name from the message payload altogether.
Fixes: https://tracker.ceph.com/issues/50852
Signed-off-by: Ramana Raja <rraja@redhat.com>