crimson/osd: fix assertion failure in OpSequencer.
`OpSequencer` assumes that ID of a previous client request
is always lower than ID of current one. This is reflected
by the assertion in `OpSequencer::start_op()`. It triggered
the following failure [1] in Teuthology:
```
DEBUG 2021-05-07 08:01:41,227 [shard 0] osd - client_request(id=1, detail=osd_op(client.4171.0:1 2.2 2.7c339972 (undecoded) ondisk+retry+read+known_if_redirected e29) v8) same_interval_since: 31
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph- 17.0.0-3910-g1b18e076/src/crimson/osd/osd_operation_sequencer.h:38: seastar::futurize_t<Result> crimson::osd::OpSequencer::start_op(HandleT&, uint64_t, uint64_t, FuncT&&) [with HandleT = crimson::PipelineHa
ndle; FuncT = crimson::interruptible::interruptor<InterruptCond>::wrap_function(Func&&) [with Func = crimson::osd::ClientRequest::start()::<lambda()> mutable::<lambda(Ref<crimson::osd::PG>)> mutable::<lambd
a()> mutable::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition]::<lambda()>; Result = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<>
>; seastar::futurize_t<Result> = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; uint64_t = long unsigned int]: Assertion `prev_op < this_op' fai
led.
Aborting on shard 0.
Backtrace:
Segmentation fault.
Backtrace:
0# 0x00005592B028932F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F57B72E7B20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007F57B58E2B09 in /lib64/libc.so.6
7# 0x00007F57B58F0DE6 in /lib64/libc.so.6
8# 0x00005592ABB8484D in ceph-osd
9# 0x00005592ABB8ACB3 in ceph-osd
10# seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<boost::intrusive_ptr<crimson::osd::PG> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&, seastar::future_state<boost::intrusive_ptr<crimson::osd::PG> >&&)#1}, boost::intrusive_ptr<crimson::osd::PG> >::run_and_dispose() in ceph-osd
11# 0x00005592B357F88F in ceph-osd
12# 0x00005592B3584DD0 in ceph-osd
```
Crash analysis resulted in two observations:
1. during the request execution the acting set got
changed, the request was interrupted and a try
to re-execute it emerged;
2. the interrupted request was the very first client
request the OSD has ever seen.
Code analysis showed a problem in how `ClientRequest`
establishes `prev_op_id`: although supposed to be performed
only once for a request, it can get executed twice but only
for the very first request `OpSequencer` saw.
```cpp
void ClientRequest::may_set_prev_op()
{
// set prev_op_id if it's not set yet
if (__builtin_expect(prev_op_id == 0, true)) {
prev_op_id = sequencer.get_last_issued();
}
}
```
Unfortunately, `0` isn't a distincted value that cannot
be returned by `get_last_issued()`:
// the id of last op which is issued
uint64_t last_issued = 0;
```
As a result, `OpSequencer` returned on the second call
a new value (actually `this_op`) violating the assertion.
The commit fixes the problem by switching from a designated
value to `std::optional`.
Kefu Chai [Wed, 19 May 2021 10:02:59 +0000 (18:02 +0800)]
test/crimson/seastore: teardown in reactor
otherwise, we rely on the destructor of TMTestState to teardown the
fixuture created in TMTestState::_init(), but TMTestState::_init() is
called in reactor. the objects like seastar::metric_groups are
supposed to be destroyed on the same thread where they are created.
because they use thread local storage of storing persisting their status.
if we destroy objects like seastar::metric_groups on different reactor
or thread where they are created, we would have memory leak and
unexpected behavior.
Kefu Chai [Wed, 19 May 2021 04:32:39 +0000 (12:32 +0800)]
crimson/os: do not create PerfService for SeaStore
because we are replacing PerfCounter with seastart::metrics in crimson,
and the former has already a sharded service builtin in seastar, there
is no need to create PerfService or its counterpart for SeaStore
anymore.
The autoscaler by default will start out each pool with minimal
pgs and `scale-up` the pgs when there is more usage in each pool.
Users can now use the commands:
`osd pool set autoscale-profile scale-down` to make the pools
start out with a full complement of pgs and only `scale-down`
when usage ratio across the pools are not even.
`osd pool set autoscale-profile scale-up` (by default) to make the pools
start out with minimal pgs and `scale-up` the pgs when there
is more usage in each pool.
Edited KVMonitor.cc file to make the `autoscale_profile` variable
persistent.
Edited tests/test_cal_final_pg_target.py so that it takes into account
the new `profile` argument when calling cal_final_pg_target(). Also,
added some new test cases for when profile is `scale-up`
Renamed tests/test_autoscaler.py to a more appropriate name:
tests/test_cal_ratio.py
Sage Weil [Wed, 19 May 2021 11:55:30 +0000 (07:55 -0400)]
Merge PR #41286 into master
* refs/pull/41286/head:
qa/suites/orch/rook: disable centos for now
qa/suites/orch/rook/smoke: initial smoke suite
qa/tasks/rook: ROOK_HOSTPATH_REQUIRES_PRIVILEGED=true on centos
qa/tasks/rook: simplify shutdown
qa/tasks/rook: archive logs
qa/tasks/rook: more orderly cluster teardown
qa/tasks/rook: deploy ceph via rook on top of kubernetes
qa/tasks/kubeadm: install kubernetes with kubeadm
qa/suites: move rados/cephadm -> orch/cephadm; symlink
qa/tasks/cephadm: add whitespace between functions
qa/tasks/cephadm: clean up ctx.manager setup
luo rixin [Wed, 19 May 2021 02:27:18 +0000 (10:27 +0800)]
common/crc32c_aarch64: fix crc32c unittest failed on aarch64
On centos 8.2 for aarch64 with gcc 8.3, the complier will use
register v0 conflicting with the register v0 be usded in inline
asm code. Adding the related registers into clobber list to inform
complier avoiding the confict.
Fixes: https://tracker.ceph.com/issues/50835 Signed-off-by: luo rixin <luorixin@huawei.com>
Patrick Donnelly [Tue, 18 May 2021 20:36:05 +0000 (13:36 -0700)]
Merge PR #40885 into master
* refs/pull/40885/head:
doc: document cephfs-mirror configuration options
cephfs-mirror: use sensible mount timeout when mounting local/remote fs
test: add tests for settting mount timeout
pybind/cephfs: add interface to set mount timeout
libcephfs: add interface to set mount timeout
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Tue, 18 May 2021 14:54:15 +0000 (09:54 -0500)]
qa/tasks/rook: simplify shutdown
For some reason deleting common.yaml sometimes fails. Not really
sure why, but since we will tear down kubernetes anyway this
cleanup isn't really needed.
Sage Weil [Fri, 7 May 2021 18:24:31 +0000 (13:24 -0500)]
qa/tasks/kubeadm: install kubernetes with kubeadm
- install k8s with kubeadm
- initial support for flannel only
- remove taint from bootstrap/master node
- create PVs for all scratch_devs + a 'scratch' SC
- kubeadm.kubectl task
Ronen Friedman [Sat, 15 May 2021 19:14:38 +0000 (22:14 +0300)]
test: recovery_scrub: do not display 'repair' status on auto-repair deep-scrub
A new test: auto_repair_bluestore_tag.
Based on auto_repair_bluestore_basic. Sets auto-repair, starts a periodic
deep-scrub, then verifies that the PG state, while scrubbing, is 'scrubbing+deep'
and not 'scrubbing+deep+repair'.
Kefu Chai [Sat, 15 May 2021 04:18:07 +0000 (12:18 +0800)]
crimson/os: define SeastoreCollection as a class
to be consistent with the forward declaration. C++ standard does not
differentiate class from struct in this perspective. but Clang warngs at
seeing it. so silence the warning.
Patrick Donnelly [Tue, 18 May 2021 02:46:56 +0000 (19:46 -0700)]
Merge PR #41239 into master
* refs/pull/41239/head:
librbd: use uint64_t instead of size_t for SparseExtent::length
mgr/PyModule: use Py_ssize_t for the PyList index
os/bluestore: print size_t using %xz
client: print int64_t using PRId64
Reviewed-by: Neha Ojha <nojha@redhat.com> Reviewed-by: Igor Fedotov <ifedotov@suse.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>