Kefu Chai [Wed, 23 Jun 2021 05:37:40 +0000 (13:37 +0800)]
crimson: s/crimson::do_until/crimson::repeat/
seastar::do_until() takes a predicate functor and a continuation, while
seastar::repeat() takes a single continuation which returns
stop_iteration::yes or stop_iteration::no. in general, we want to mirror
and extend the facilities offered by seastar instead of changing them in
an unexpected way. crimson::do_until(), however, only takes a single
continuation which returns a bool. in the hope of being more consistent,
this change replaces all occurrences of crimson::do_until with
crimson::repeat.
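The difference between the two loop shapes can be sketched with toy, synchronous stand-ins. This is only an illustration: the `do_until` and `repeat` templates and the `sum_with_*` helpers below are hypothetical and are not the future-based seastar or crimson implementations.

```cpp
// Toy, synchronous stand-ins for the two loop shapes. These are NOT the
// seastar or crimson implementations, which are future-based.
enum class stop_iteration { no, yes };

// do_until shape: a predicate decides when to stop and a separate action
// does the work. The predicate is checked before each call to the action.
template <typename StopCond, typename Action>
void do_until(StopCond stop_cond, Action action) {
  while (!stop_cond()) {
    action();
  }
}

// repeat shape: a single callable does the work and reports whether to stop.
template <typename Action>
void repeat(Action action) {
  while (action() == stop_iteration::no) {
  }
}

// Sums 1..n with the do_until shape (hypothetical helper).
inline int sum_with_do_until(int n) {
  int i = 0, sum = 0;
  do_until([&] { return i == n; }, [&] { sum += ++i; });
  return sum;
}

// Sums 1..n with the repeat shape (hypothetical helper).
inline int sum_with_repeat(int n) {
  int i = 0, sum = 0;
  repeat([&] {
    if (i == n) {
      return stop_iteration::yes;
    }
    sum += ++i;
    return stop_iteration::no;
  });
  return sum;
}
```

Both helpers compute the same sum; the only difference is whether the stop condition lives in a separate predicate or in the return value of the loop body.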
Kefu Chai [Wed, 23 Jun 2021 08:11:24 +0000 (16:11 +0800)]
crimson/common/errorator: consider return value as ready future in maybe_handle_error_t
this behavior mirrors seastar::futurize::apply(), where non-future and
non-void return values are converted to future<>, and returned instead.
this change could simplify some use cases where we always return an
immediately available future.
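The idea of treating a plain return value as a ready future can be sketched as below. This is a simplified illustration under stated assumptions: `ready_future`, `is_ready_future` and `toy_futurize` are hypothetical names, the void case is left out, and none of this is the errorator or seastar code.

```cpp
#include <type_traits>
#include <utility>

// Toy "ready future" that holds an already-available value.
// Hypothetical type, not seastar::future.
template <typename T>
struct ready_future {
  T value;
};

template <typename T>
struct is_ready_future : std::false_type {};
template <typename T>
struct is_ready_future<ready_future<T>> : std::true_type {};

// Invokes func. A result that is already a future is passed through;
// any other (non-void) result is wrapped into a ready future.
template <typename Func>
auto toy_futurize(Func&& func) {
  using result_t = std::invoke_result_t<Func>;
  if constexpr (is_ready_future<result_t>::value) {
    return std::forward<Func>(func)();
  } else {
    return ready_future<result_t>{std::forward<Func>(func)()};
  }
}
```

With such a wrapper, a handler can return a plain value directly instead of constructing a ready future by hand.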
Patrick Donnelly [Wed, 23 Jun 2021 02:34:01 +0000 (19:34 -0700)]
Merge PR #41385 into master
* refs/pull/41385/head:
mon/FSCommands: add command to rename a file system
qa/cephfs: split test_admin.TestAdminCommands
mds: remove 'fs_name' from MDSRank
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
crimson/osd: fix construction of InternalClientRequest in DEBUG builds.
The assert in the ctor of `InternalClientRequest` actually operates on
the ctor's argument we `std::moved` from, not on the class' member.
When a debug build is used, this translates into failures like the one
below:
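A minimal sketch of the bug pattern itself (not of the failure output), assuming a hypothetical `Request` type that takes ownership of a `std::unique_ptr`; the real `InternalClientRequest` constructor and its arguments are not reproduced here:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a request type; not the real InternalClientRequest.
struct Request {
  std::unique_ptr<int> pg;

  explicit Request(std::unique_ptr<int> pg_arg) : pg(std::move(pg_arg)) {
    // Buggy form: `assert(pg_arg);` would check the moved-from parameter.
    // A moved-from std::unique_ptr is guaranteed to be null, so that assert
    // fires whenever assertions are enabled and is skipped under NDEBUG.
    // Fixed form: check the member that now owns the object.
    assert(pg);
  }
};

// Shows that the source of a unique_ptr move is null afterwards
// (hypothetical helper for illustration).
inline bool moved_from_is_null() {
  auto source = std::make_unique<int>(1);
  auto target = std::move(source);
  return source == nullptr && target != nullptr;
}
```

Because the incorrect check only exists when assertions are compiled in, a release build never notices it, which matches the DEBUG-only failure described above.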
crimson/os: synchronize producers with consumers in AlienStore's queues.
Some time ago we replaced the single, `boost::lockfree`-based queue
in `ThreadPool` with the in-house, lockish `ShardedWorkQueue` vector.
Unfortunately, pushing into such a queue isn't synchronized with
consuming from it -- the former happens without locking the `mutex`.
As the underlying primitive behind `ShardedWorkQueue::pending` is a
plain `std::deque`, it's unsafe to operate that way in a multi-threaded
environment. Indeed, weird-looking crashes have been spotted at Sepia:
```
(virtualenv) rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-06-21_14:49:36-rados-master-distro-basic-smithi/6182668$ less ./remote/smithi196/log/ceph-osd.7.log.gz
...
0# 0x000055862FD67ADF in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
4# 0x00005586357540E4 in ceph-osd
5# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
6# pthread_cond_timedwait in /lib64/libpthread.so.0
7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
8# 0x00005586313E303B in ceph-osd
9# 0x00007FB22CC51BA3 in /lib64/libstdc++.so.6
10# 0x00007FB22CF2C14A in /lib64/libpthread.so.0
11# clone in /lib64/libc.so.6
Fault at location: 0x18
daemon-helper: command crashed with signal 11
```
This fix introduces the synchronization to the `push_back()` method of
`ShardedWorkQueue`. The side effect is that it may stall the reactor.
Therefore, a follow-up change that switches to e.g. `boost::lockfree`
is expected.
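The kind of synchronization described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `WorkQueue` class with a `pop_front()` that takes a timeout; it is not the actual `ShardedWorkQueue` code.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical minimal work queue; not the real ShardedWorkQueue.
template <typename T>
class WorkQueue {
public:
  // Producer side: takes the same mutex the consumer uses before touching
  // the deque, then wakes one waiting consumer.
  void push_back(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push_back(std::move(item));
    }
    cond_.notify_one();
  }

  // Consumer side: waits up to `timeout` for an item to become available.
  std::optional<T> pop_front(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cond_.wait_for(lock, timeout, [this] { return !pending_.empty(); })) {
      return std::nullopt;
    }
    T item = std::move(pending_.front());
    pending_.pop_front();
    return item;
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<T> pending_;
};
```

Taking the mutex in `push_back()` is what makes the deque safe to share, and it is also why the producer (the reactor thread in the commit's case) can block briefly while a consumer holds the lock.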
Kefu Chai [Sat, 19 Jun 2021 11:12:36 +0000 (19:12 +0800)]
crimson/common: extract parallel_for_each_state out
if `parallel_for_each_state` is defined as a nested class in errorator,
clang fails to compile it:
../src/crimson/common/errorator.h:716:47: error: no class named 'parallel_for_each_state' in 'errorator<AllowedErrors...>'
friend class errorator<AllowedErrors...>::parallel_for_each_state;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
and the forward declaration does not help. so we have to extract it
out of the errorator. to speed up the compilation, it is moved into
errorator-loop.h. its name mirrors `include/seastar/core/loop.h`.
we could extract the `errorator<>::parallel_for_each()` out as well,
as its return type can be deduced from the type of Iterator and Func.
Patrick Donnelly [Sat, 19 Jun 2021 02:49:15 +0000 (19:49 -0700)]
Merge PR #40997 into master
* refs/pull/40997/head:
test: add test to verify adding an active peer back to source
pybind/mirroring: disallow adding an active peer back to source
pybind/cephfs: interface to fetch file system id
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 19 Jun 2021 02:47:53 +0000 (19:47 -0700)]
Merge PR #36823 into master
* refs/pull/36823/head:
qa : add a test for the cmd, dump cache
mds : add timeout to the command, dump cache, to prevent it from running too long and affecting the service
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Fri, 18 Jun 2021 03:14:29 +0000 (11:14 +0800)]
ceph.spec.in: bump up the required version of fmt-devel to 6.2.1
6.2.1 is the version packaged by EPEL8, in other words, this is the
version we've been testing. so to be more consistent with the
known-to-be-good version, let's bump up the required version.
Kefu Chai [Thu, 17 Jun 2021 01:35:40 +0000 (09:35 +0800)]
crimson/tools/store_nbd: better cleanup
* remove unix domain socket file when cleanup
so we don't need to remove it manually after each run.
* shutdown input and output streams when cleanup
so reactor does not watch them anymore.
Greg Farnum [Thu, 17 Jun 2021 19:56:20 +0000 (19:56 +0000)]
mon: Sanely set the default CRUSH rule when creating pools in stretch mode
If we get a pool create request while in stretch mode that does not explicitly
specify a crush rule, look at the stretch-mode pools and their rules, and
select the most common one.
Also update set_up_stretch_mode.sh to add a few more rules that let me test
this locally.
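The "select the most common one" step can be sketched as below. This is an illustration only: `most_common_rule` is a hypothetical helper working on plain integer rule ids, and the tie-break (smallest rule id wins) is an assumption, not something the commit states.

```cpp
#include <map>
#include <optional>
#include <vector>

// Returns the rule id used by the most pools, or nullopt if there are none.
// Ties go to the smallest rule id here; that tie-break is an assumption.
inline std::optional<int> most_common_rule(const std::vector<int>& pool_rules) {
  std::map<int, int> counts;  // rule id -> number of pools using it
  for (int rule : pool_rules) {
    ++counts[rule];
  }
  std::optional<int> best;
  int best_count = 0;
  // std::map iterates in ascending key order, so with a strict comparison
  // the smallest rule id wins among equally common rules.
  for (const auto& [rule, count] : counts) {
    if (count > best_count) {
      best = rule;
      best_count = count;
    }
  }
  return best;
}
```

In this sketch, each element of `pool_rules` stands for the rule of one existing stretch-mode pool.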
Ramana Raja [Tue, 20 Apr 2021 21:36:07 +0000 (17:36 -0400)]
mon/FSCommands: add command to rename a file system
The fs_name of the relevant MDSMap is set to the new name. Also,
the application tags of the data pools and the metadata pool of
the file system are set to the new name.
Fixes: https://tracker.ceph.com/issues/47276
Signed-off-by: Ramana Raja <rraja@redhat.com>
Rishabh Dave [Tue, 15 Jun 2021 08:49:17 +0000 (14:19 +0530)]
qa/cephfs: split test_admin.TestAdminCommands
Splitting TestAdminCommands makes it easier to exercise a particular
"ceph fs" subcommand. Also, rename class TestSubCmdFsAuthorize to
TestFsAuthorize.
Ramana Raja [Tue, 18 May 2021 05:00:29 +0000 (01:00 -0400)]
mds: remove 'fs_name' from MDSRank
There isn't a need to store a file system's name in an MDSRank
object. The MDSRank has a pointer to an MDSMap object, which already
stores the name.
Also, there isn't a need to pass the file system name in the MMDSMap
message. It should be sufficient to pass the MDSMap in the MMDSMap
message, as the file system name is stored in the MDSMap. Pass an
empty string as map_fs_name in the MMDSMap message. It is simpler than
removing map_fs_name from the message payload altogether.
Fixes: https://tracker.ceph.com/issues/50852
Signed-off-by: Ramana Raja <rraja@redhat.com>
Joseph Sawaya [Tue, 15 Jun 2021 22:07:51 +0000 (18:07 -0400)]
mgr/rook: fix some mypy typing errors in rook_cluster.py
This commit fixes some errors caught by mypy in rook_cluster.py. Most of the
errors were caused by the update of the rook-client-python submodule in a previous
commit.
Joseph Sawaya [Tue, 15 Jun 2021 18:14:40 +0000 (14:14 -0400)]
mgr/rook: pass zone attribute to CephObjectStore CR when creating rgw
This commit passes the zone attribute to the CephObjectStore CR when
creating a RGW instance using the Rook Orchestrator backend:
`ceph orch apply rgw <rgw-name> --realm=<realm-name> --zone=<zone-name>`
Nizamudeen A [Tue, 15 Jun 2021 08:47:58 +0000 (14:17 +0530)]
mgr/dashboard: Fix 500 error while exiting out of maintenance
When you add a host in maintenance mode and then exit the maintenance
mode, a 500 server error will pop up, which will interrupt the whole
exit maintenance process and leave the host in an unknown/offline state.
It happened when I was setting the status of the host through the
HostSpec(). With this change, I am using the enter_maintenance api of
the orch to enable the maintenance.
Fixes: https://tracker.ceph.com/issues/51218
Signed-off-by: Nizamudeen A <nia@redhat.com>
Patrick Donnelly [Wed, 16 Jun 2021 16:30:41 +0000 (09:30 -0700)]
mon/MDSMonitor: check fscid exists for legacy case
If a client does not have permission to see the legacy fs, the monitor
will throw an exception when looking up the mdsmap later in the code.
We need to check existence for both code paths.
Fixes: https://tracker.ceph.com/issues/51077
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Wed, 16 Jun 2021 16:04:37 +0000 (00:04 +0800)]
crimson/osd: use stop_signal from seastar
and disable app_cfg.auto_handle_sigint_sigterm, otherwise app template
handles SIGINT and SIGTERM by itself, and calls app.stop(). but we don't
use this machinery at all. we use seastar::defer() instead of
seastar::at_exit() for doing graceful shutdown and cleanup.
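The scope-guard style of cleanup mentioned above can be sketched with a toy equivalent. This is only an illustration: `deferred_action`, `defer` and `shutdown_order` below are hypothetical and are not seastar's implementation, and the sketch assumes C++17.

```cpp
#include <string>
#include <utility>

// Toy scope guard in the spirit of seastar::defer(); not seastar's code.
// The stored callable runs when the guard goes out of scope.
template <typename Func>
class deferred_action {
public:
  explicit deferred_action(Func func) : func_(std::move(func)) {}
  deferred_action(const deferred_action&) = delete;
  deferred_action& operator=(const deferred_action&) = delete;
  ~deferred_action() { func_(); }

private:
  Func func_;
};

// Returns the guard as a prvalue; C++17 copy elision makes this valid even
// though the guard is neither copyable nor movable.
template <typename Func>
deferred_action<Func> defer(Func func) {
  return deferred_action<Func>(std::move(func));
}

// Records the order in which the deferred steps run (hypothetical helper).
inline std::string shutdown_order() {
  std::string log;
  {
    auto stop_service = defer([&] { log += "stop;"; });
    auto cleanup = defer([&] { log += "cleanup;"; });
    log += "run;";
  }
  return log;
}
```

Guards run in reverse order of declaration when the scope ends, so cleanup steps are tied to the scope that set them up rather than to a global exit hook.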