Ronen Friedman [Wed, 23 Jun 2021 17:02:28 +0000 (20:02 +0300)]
osd/scrub: replace a ceph_assert() with a test
We are using two distinct conditions to decide whether a candidate PG is already being scrubbed: the OSD checks pgs_scrub_active(), while the PG asserts on the value of PG_STATE_FLAG.
There is a time window in which PG_STATE_FLAG is already set but is_scrub_active() is not yet. is_reserving() covers most of that period, but the ceph_assert sits just before the is_reserving() check.
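The shape of the change can be sketched as follows (names and types are illustrative stand-ins, not the actual Ceph scrub code): the hard assert becomes an ordinary check that simply rejects the candidate, so hitting the race window no longer aborts the OSD.

```cpp
// Illustrative stand-in for the PG-side check (not the real Ceph code).
// Before: ceph_assert(!scrubbing_flag_set) could fire inside the window
// where the state flag is set but "scrub active" is not yet.
// After: the same condition is tolerated and the PG is just skipped.
bool try_start_scrub(bool scrubbing_flag_set) {
    if (scrubbing_flag_set) {
        // was: ceph_assert(!scrubbing_flag_set)
        return false;  // not a candidate right now; will be retried later
    }
    return true;
}
```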
Kefu Chai [Wed, 23 Jun 2021 05:37:40 +0000 (13:37 +0800)]
crimson: s/crimson::do_until/crimson::repeat/
seastar::do_until() takes a predicate functor and a continuation, while seastar::repeat() takes a single continuation which returns stop_iteration::yes or stop_iteration::no. In general, we want to mirror and extend the facilities offered by seastar instead of changing them in unexpected ways, yet crimson::do_until() takes a single continuation which returns a bool, matching the shape of seastar::repeat(), not seastar::do_until(). In hope of being more consistent, this change replaces all occurrences of crimson::do_until with crimson::repeat.
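The difference between the two contracts can be sketched synchronously (the real seastar facilities are asynchronous and return futures; the names below are illustrative stand-ins only):

```cpp
// Toy stand-in for seastar::stop_iteration; the real type lives in seastar.
enum class stop_iteration { no, yes };

// seastar::repeat() style: a single continuation that signals termination
// itself by returning stop_iteration::yes.
int count_with_repeat(int limit) {
    int i = 0;
    auto step = [&]() -> stop_iteration {
        ++i;
        return i < limit ? stop_iteration::no : stop_iteration::yes;
    };
    while (step() == stop_iteration::no) {
    }
    return i;
}

// seastar::do_until() style: a separate stop predicate plus a body.
// crimson::do_until()'s bool-returning single continuation matched the
// repeat() shape, not this one -- hence the rename.
int count_with_do_until(int limit) {
    int i = 0;
    auto done = [&] { return i >= limit; };
    auto body = [&] { ++i; };
    while (!done()) {
        body();
    }
    return i;
}
```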
Kefu Chai [Wed, 23 Jun 2021 08:11:24 +0000 (16:11 +0800)]
crimson/common/errorator: consider return value as ready future in maybe_handle_error_t
This behavior mirrors seastar::futurize::apply(), where non-future and non-void return values are converted to future<> and returned instead.
This change can simplify use cases where we always return an immediately available future.
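The futurize-style idea can be illustrated with a toy ready-future type (a sketch only; `ready_future`, `is_ready_future`, and `futurize_invoke` here are made-up stand-ins, not the crimson or seastar types):

```cpp
#include <type_traits>

// Toy "already resolved" future used only for illustration.
template <typename T>
struct ready_future { T value; };

template <typename>
struct is_ready_future : std::false_type {};
template <typename T>
struct is_ready_future<ready_future<T>> : std::true_type {};

// futurize-style helper: if the callable already returns a ready_future,
// pass it through unchanged; otherwise wrap the plain value into one.
template <typename Func>
auto futurize_invoke(Func&& f) {
    using ret_t = std::invoke_result_t<Func>;
    if constexpr (is_ready_future<ret_t>::value) {
        return f();
    } else {
        return ready_future<ret_t>{f()};
    }
}
```

With this shape, a callback that returns a plain value and one that returns a ready future can be treated uniformly by the caller.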
Patrick Donnelly [Wed, 23 Jun 2021 02:34:01 +0000 (19:34 -0700)]
Merge PR #41385 into master
* refs/pull/41385/head:
mon/FSCommands: add command to rename a file system
qa/cephfs: split test_admin.TestAdminCommands
mds: remove 'fs_name' from MDSRank
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
chunmei-liu [Mon, 21 Jun 2021 21:52:54 +0000 (14:52 -0700)]
crimson/seastore: fix reactor stalled and rbd_open failed
The omap manager uses references to the keys, so the keys' string set should be kept alive for the duration of the omap operation. Use value capture instead of reference capture.
Otherwise the keys' contents get changed mid-operation, causing the logger's insert_iterator to take a very long time and stall the reactor;
omap_get_value also failed, and rbd_open failed.
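The capture bug is a general C++ lifetime pattern; a minimal sketch (types and names are simplified stand-ins for the seastore code): capturing the key set by value makes the callback own its copy, whereas a reference capture would dangle once the caller's set goes out of scope.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Build a deferred "omap operation" over a set of keys.  The init-capture
// moves the set into the lambda, so the keys stay alive for as long as
// the callback does (the fix); [&keys] would dangle (the bug).
std::function<std::size_t()> make_omap_op(std::set<std::string> keys) {
    return [keys = std::move(keys)] { return keys.size(); };
}

std::function<std::size_t()> deferred_count() {
    std::set<std::string> keys{"a", "b", "c"};
    return make_omap_op(std::move(keys));
    // the local `keys` dies here, but the returned callback holds its own copy
}
```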
crimson/osd: fix construction of InternalClientRequest in DEBUG builds.
The assert in the ctor of `InternalClientRequest` actually operates on the ctor's argument that we `std::move`d from, not on the class's member.
When a debug build is used, this translates into failures like the one
below:
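The bug pattern is easy to reproduce in isolation (names below are illustrative, not the actual crimson types): asserting on a constructor argument after moving from it checks a moved-from object rather than the freshly initialized member.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Buggy shape: `t` has just been moved from in the init-list, so for
// std::string it is typically empty by the time the assert body runs,
// and the (commented-out) assert could fire in debug builds.
struct BuggyRequest {
    std::string target;
    explicit BuggyRequest(std::string t) : target(std::move(t)) {
        // assert(!t.empty());  // BUG: checks the moved-from argument
    }
};

// Fixed shape: assert on the member that actually received the value.
struct FixedRequest {
    std::string target;
    explicit FixedRequest(std::string t) : target(std::move(t)) {
        assert(!target.empty());
    }
};
```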
crimson/os: synchronize producers with consumers in AlienStore's queues.
Some time ago we replaced the single, `boost::lockfree`-based queue
in `ThreadPool` with the in-house, lock-based `ShardedWorkQueue` vector.
Unfortunately, pushing into such a queue isn't synchronized with
consuming from it: the former happens without locking the `mutex`.
As the underlying primitive behind `ShardedWorkQueue::pending` is a
plain `std::deque`, it's unsafe to operate that way in a multi-threaded
environment. Indeed, weird-looking crashes have been spotted at Sepia:
```
(virtualenv) rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-06-21_14:49:36-rados-master-distro-basic-smithi/6182668$ less ./remote/smithi196/log/ceph-osd.7.log.gz
...
0# 0x000055862FD67ADF in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
4# 0x00005586357540E4 in ceph-osd
5# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
6# pthread_cond_timedwait in /lib64/libpthread.so.0
7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
8# 0x00005586313E303B in ceph-osd
9# 0x00007FB22CC51BA3 in /lib64/libstdc++.so.6
10# 0x00007FB22CF2C14A in /lib64/libpthread.so.0
11# clone in /lib64/libc.so.6
Fault at location: 0x18
daemon-helper: command crashed with signal 11
```
This fix introduces the synchronization to the `push_back()` method of
`ShardedWorkQueue`. The side effect is that it may stall the reactor.
Therefore, a follow-up change that switches to e.g. `boost::lockfree`
is expected.
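A minimal sketch of the fixed shape (heavily simplified; the real `ShardedWorkQueue` lives in crimson/os/alienstore and holds work items, not ints): producer and consumer both take the mutex, since `std::deque` is not safe for concurrent access.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

class WorkQueue {
    std::deque<int> pending;
    std::mutex m;
    std::condition_variable cv;
public:
    void push_back(int item) {
        {
            std::lock_guard lk{m};   // the previously missing synchronization
            pending.push_back(item);
        }
        cv.notify_one();
    }
    int pop_front() {
        std::unique_lock lk{m};
        cv.wait(lk, [this] { return !pending.empty(); });
        int item = pending.front();
        pending.pop_front();
        return item;
    }
};
```

Taking a lock in `push_back()` is what may stall the reactor, which is why the commit anticipates a follow-up switch to a lock-free structure.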
Kefu Chai [Sat, 19 Jun 2021 11:12:36 +0000 (19:12 +0800)]
crimson/common: extract parallel_for_each_state out
If `parallel_for_each_state` is defined as a nested class of errorator,
clang fails to compile it:
../src/crimson/common/errorator.h:716:47: error: no class named 'parallel_for_each_state' in 'errorator<AllowedErrors...>'
friend class errorator<AllowedErrors...>::parallel_for_each_state;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
and a forward declaration does not help, so we have to extract it
out of the errorator. To speed up compilation, it is moved into
errorator-loop.h; its name mirrors `include/seastar/core/loop.h`.
We could extract `errorator<>::parallel_for_each()` out as well,
as its return type can be deduced from the types of Iterator and Func.
Patrick Donnelly [Sat, 19 Jun 2021 02:49:15 +0000 (19:49 -0700)]
Merge PR #40997 into master
* refs/pull/40997/head:
test: add test to verify adding an active peer back to source
pybind/mirroring: disallow adding an active peer back to source
pybind/cephfs: interface to fetch file system id
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 19 Jun 2021 02:47:53 +0000 (19:47 -0700)]
Merge PR #36823 into master
* refs/pull/36823/head:
qa: add a test for the cmd, dump cache
mds: add timeout to the command, dump cache, to prevent it from running too long and affecting the service
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Fri, 18 Jun 2021 03:14:29 +0000 (11:14 +0800)]
ceph.spec.in: bump up the required version of fmt-devel to 6.2.1
6.2.1 is the version packaged by EPEL8; in other words, it is the
version we've been testing. So, to stay consistent with the
known-to-be-good version, let's bump up the required version.
Kefu Chai [Thu, 17 Jun 2021 01:35:40 +0000 (09:35 +0800)]
crimson/tools/store_nbd: better cleanup
* remove the unix domain socket file on cleanup,
so we don't need to remove it manually after each run.
* shut down the input and output streams on cleanup,
so the reactor does not watch them anymore.
Greg Farnum [Thu, 17 Jun 2021 19:56:20 +0000 (19:56 +0000)]
mon: Sanely set the default CRUSH rule when creating pools in stretch mode
If we get a pool create request while in stretch mode that does not explicitly
specify a crush rule, look at the stretch-mode pools and their rules, and
select the most common one.
Also update set_up_stretch_mode.sh to add a few more rules that let me test
this locally.
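The "select the most common one" step amounts to a frequency count over the existing pools' rules; a sketch only (the real logic in OSDMonitor inspects the stretch-mode pools, and rule ids here are arbitrary):

```cpp
#include <map>
#include <vector>

// Pick the CRUSH rule id used by the largest number of pools.
// Returns -1 when there are no pools to inspect (illustrative choice).
int most_common_rule(const std::vector<int>& pool_rules) {
    std::map<int, int> counts;
    for (int r : pool_rules) {
        ++counts[r];
    }
    int best = -1;
    int best_count = 0;
    for (const auto& [rule, n] : counts) {
        if (n > best_count) {
            best = rule;
            best_count = n;
        }
    }
    return best;
}
```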
Ramana Raja [Tue, 20 Apr 2021 21:36:07 +0000 (17:36 -0400)]
mon/FSCommands: add command to rename a file system
The fs_name of the relevant MDSMap is set to the new name. Also,
the application tags of the data pools and the metadata pool of
the file system are set to the new name.
Fixes: https://tracker.ceph.com/issues/47276
Signed-off-by: Ramana Raja <rraja@redhat.com>
Rishabh Dave [Tue, 15 Jun 2021 08:49:17 +0000 (14:19 +0530)]
qa/cephfs: split test_admin.TestAdminCommands
Splitting TestAdminCommands makes it easier to exercise a particular
"ceph fs" subcommand. Also, rename class TestSubCmdFsAuthorize to
TestFsAuthorize.
Ramana Raja [Tue, 18 May 2021 05:00:29 +0000 (01:00 -0400)]
mds: remove 'fs_name' from MDSRank
There isn't a need to store a file system's name in an MDSRank
object. The MDSRank has a pointer to an MDSMap object, which already
stores the name.
Also, there isn't a need to pass the file system name in the MMDSMap
message. It is sufficient to pass the MDSMap in the MMDSMap message,
as the file system name is stored in the MDSMap. Pass an empty string
as map_fs_name in the MMDSMap message; this is simpler than removing
map_fs_name from the message payload altogether.
Fixes: https://tracker.ceph.com/issues/50852
Signed-off-by: Ramana Raja <rraja@redhat.com>
Joseph Sawaya [Tue, 15 Jun 2021 22:07:51 +0000 (18:07 -0400)]
mgr/rook: fix some mypy typing errors in rook_cluster.py
This commit fixes some errors caught by mypy in rook_cluster.py. Most of the
errors were caused by the update of the rook-client-python submodule in a previous
commit.
Joseph Sawaya [Tue, 15 Jun 2021 18:14:40 +0000 (14:14 -0400)]
mgr/rook: pass zone attribute to CephObjectStore CR when creating rgw
This commit passes the zone attribute to the CephObjectStore CR when
creating a RGW instance using the Rook Orchestrator backend:
`ceph orch apply rgw <rgw-name> --realm=<realm-name> --zone=<zone-name>`