Sage Weil [Thu, 6 May 2021 22:47:27 +0000 (18:47 -0400)]
mgr/nfs: take --ingress argument to 'nfs cluster create'
It is likely that the rook/k8s variation of ingress will not take a
virtual_ip argument. We want to make sure that ingress yes/no can be
specified independent of the virtual_ip.
Sage Weil [Thu, 6 May 2021 14:57:46 +0000 (10:57 -0400)]
cephadm: --stop-signal=SIGTERM
haproxy's container image tells docker|podman to send SIGUSR1 for a "clean"
shutdown. For NFS, the connections never close, so we will always hit the
podman|docker 10s timeout and get a SIGKILL. That, in turn, causes haproxy
to exit with 143, and puts the systemd unit in a failed state.
This highlights a general problem(?) with stopping containers: if they don't
do it quickly then we'll end up in this error state. We don't directly
address that here.
Avoid this problem by always stopping containers with SIGTERM. In the
haproxy case, that means an immediate shutdown (no graceful drain of
open connections). In theory we could do this only for haproxy with
NFS, but we can easily imagine RGW connections that don't close in 10s
either, and we don't want containers exiting in error state--we just
want the proxy to stop quickly.
Sage Weil [Mon, 3 May 2021 15:48:45 +0000 (11:48 -0400)]
mgr/orchestrator: default nfs pool, namespaces
Apply nfs default pool (currently 'nfs-ganesha'), and default the
namespace to the service_id.
There is no practical reason for users to ever need to change this, and
requiring them to provide this information at config/apply time just
complicates life.
Sage Weil [Wed, 5 May 2021 16:59:44 +0000 (12:59 -0400)]
mgr/nfs: remove 'nfs cluster update'
This command is very awkward to implement unless all service spec fields
are always required. That will soon mean both the placement *and*
virtual_ip (if any), making the command much less useful for a human.
Instead, let them update yaml, or adjust the nfs and/or ingress specs
directly. I don't think this command is needed.
crimson/osd: fix assertion failure in OpSequencer.
`OpSequencer` assumes that ID of a previous client request
is always lower than ID of current one. This is reflected
by the assertion in `OpSequencer::start_op()`. It triggered
the following failure [1] in Teuthology:
```
DEBUG 2021-05-07 08:01:41,227 [shard 0] osd - client_request(id=1, detail=osd_op(client.4171.0:1 2.2 2.7c339972 (undecoded) ondisk+retry+read+known_if_redirected e29) v8) same_interval_since: 31
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph- 17.0.0-3910-g1b18e076/src/crimson/osd/osd_operation_sequencer.h:38: seastar::futurize_t<Result> crimson::osd::OpSequencer::start_op(HandleT&, uint64_t, uint64_t, FuncT&&) [with HandleT = crimson::PipelineHandle; FuncT = crimson::interruptible::interruptor<InterruptCond>::wrap_function(Func&&) [with Func = crimson::osd::ClientRequest::start()::<lambda()> mutable::<lambda(Ref<crimson::osd::PG>)> mutable::<lambda()> mutable::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition]::<lambda()>; Result = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; seastar::futurize_t<Result> = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; uint64_t = long unsigned int]: Assertion `prev_op < this_op' failed.
Aborting on shard 0.
Backtrace:
Segmentation fault.
Backtrace:
0# 0x00005592B028932F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F57B72E7B20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007F57B58E2B09 in /lib64/libc.so.6
7# 0x00007F57B58F0DE6 in /lib64/libc.so.6
8# 0x00005592ABB8484D in ceph-osd
9# 0x00005592ABB8ACB3 in ceph-osd
10# seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<boost::intrusive_ptr<crimson::osd::PG> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&, seastar::future_state<boost::intrusive_ptr<crimson::osd::PG> >&&)#1}, boost::intrusive_ptr<crimson::osd::PG> >::run_and_dispose() in ceph-osd
11# 0x00005592B357F88F in ceph-osd
12# 0x00005592B3584DD0 in ceph-osd
```
Crash analysis resulted in two observations:
1. during the request execution the acting set
changed, the request was interrupted, and an
attempt to re-execute it followed;
2. the interrupted request was the very first client
request the OSD has ever seen.
Code analysis showed a problem in how `ClientRequest`
establishes `prev_op_id`: although supposed to be performed
only once for a request, it can get executed twice but only
for the very first request `OpSequencer` saw.
```cpp
void ClientRequest::may_set_prev_op()
{
// set prev_op_id if it's not set yet
if (__builtin_expect(prev_op_id == 0, true)) {
prev_op_id = sequencer.get_last_issued();
}
}
```
Unfortunately, `0` isn't a distinct value that cannot
be returned by `get_last_issued()`:
```cpp
// the id of last op which is issued
uint64_t last_issued = 0;
```
As a result, `OpSequencer` returned on the second call
a new value (actually `this_op`) violating the assertion.
The commit fixes the problem by switching from a designated
value to `std::optional`.
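A minimal sketch of the idea, not the actual patch: `ToySequencer`,
`ToyRequest` and `prev_op_after_retry()` are hypothetical stand-ins, and
only `prev_op_id`, `get_last_issued()` and `last_issued` are names taken
from the message above. It replays the failing scenario (first request
ever seen, interrupted, then restarted) with an empty `std::optional`
meaning "not set yet", so a stored `0` is no longer mistaken for "unset".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the sequencer: it only tracks the last issued id.
struct ToySequencer {
  std::uint64_t last_issued = 0;  // 0 is a legitimate value before any op ran
  std::uint64_t get_last_issued() const { return last_issued; }
};

// Hypothetical stand-in for a client request.
struct ToyRequest {
  std::optional<std::uint64_t> prev_op_id;  // empty means "not set yet"

  void may_set_prev_op(const ToySequencer& sequencer) {
    // Set prev_op_id only once, even when the recorded value is 0.
    if (!prev_op_id) {
      prev_op_id = sequencer.get_last_issued();
    }
  }
};

// Replays the scenario: first request ever, interrupted and re-executed.
inline std::uint64_t prev_op_after_retry() {
  ToySequencer sequencer;
  ToyRequest request;
  request.may_set_prev_op(sequencer);  // records 0
  sequencer.last_issued = 1;           // the request itself is issued as op 1
  request.may_set_prev_op(sequencer);  // retry: must not overwrite the 0
  assert(request.prev_op_id.has_value());
  return *request.prev_op_id;
}
```

With the `prev_op_id == 0` check from the original code, the second call
would have stored `1` instead, which is the condition that tripped the
assertion.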
sunilkumarn417 [Wed, 19 May 2021 10:02:45 +0000 (15:32 +0530)]
qa/tasks/cephadm: Include bootstrap registry options for downstream
- registry-url, registry-username and registry-password bootstrap options are
supported now. This is needed to access monitoring service container images.
- usage of RHEL distribution based cephadm in download_cephadm task.
Kefu Chai [Mon, 24 May 2021 02:21:52 +0000 (10:21 +0800)]
vstart.sh: specify mon_data_avail_crit in ceph.conf
ceph-mon consumes this option when it boots, and exits if the ratio
of free space is lower than the specified number, which is 5% by
default. but we use `ceph -c $conf_fn config assimilate-conf -i -`
to absorb these options after the monitor starts. so, without this change,
the default value of mon_data_avail_crit is always used. if the machine
has a lower ratio of free space on the partition where the mon store is
located, ceph-mon just exits with an error message like:
2021-05-24T01:53:14.644+0000 7ff64961e580 -1 error: monitor data
filesystem reached concerning levels of available storage space
(available: 4% 17 GiB)
after this change, the option is written in ceph.conf, and can be
read by ceph-mon when it boots. so the overridden value of 1% has
the chance to take effect. this helps to address some test failures
found in our "make check" runs performed by jenkins on machines whose
disk space is enough for completing the test, but whose ratio of free
space is lower than 5%.
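The arithmetic behind the failure can be sketched as follows. This is
an illustration only: `below_avail_crit()` is a hypothetical helper, the
425 GiB partition size is made up to reproduce the "4% 17 GiB" figure
from the log, and the strict `<` comparison with integer percentages is
an assumption, not a statement about how ceph-mon computes it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the free-space percentage of the
// partition falls below the critical threshold (both in whole percent).
inline bool below_avail_crit(std::uint64_t avail_bytes,
                             std::uint64_t total_bytes,
                             unsigned crit_percent) {
  assert(total_bytes > 0);
  const std::uint64_t avail_percent = avail_bytes * 100 / total_bytes;
  return avail_percent < crit_percent;
}
```

Under these assumptions, 17 GiB free on a 425 GiB partition is 4%:
below the default of 5%, but above the overridden 1%.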
Kefu Chai [Thu, 20 May 2021 02:20:50 +0000 (10:20 +0800)]
common/cmdparse: use string_view for the key
for better usability and performance. the main use case of
cmd_getval() and cmd_putval() only uses a literal string for the key,
so it's a waste to build a std::string out of it and throw it away after
looking up the cmdmap with it.
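A minimal sketch of the technique, not the actual cmdparse code:
`toy_cmdmap_t`, `toy_getval()` and `demo_lookup()` are hypothetical, and
the value type is simplified to `std::string`. It assumes the map uses a
transparent comparator (`std::less<>`), which is what lets `find()`
accept a `std::string_view` without constructing a temporary
`std::string`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical simplified command map; std::less<> enables
// heterogeneous lookup with string_view keys.
using toy_cmdmap_t = std::map<std::string, std::string, std::less<>>;

inline std::optional<std::string> toy_getval(const toy_cmdmap_t& cmdmap,
                                             std::string_view key) {
  // The string_view is compared against the stored keys directly,
  // so no std::string is built just for the lookup.
  auto it = cmdmap.find(key);
  if (it == cmdmap.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Usage with a string literal as the key, the main use case named above.
inline void demo_lookup() {
  const toy_cmdmap_t cmdmap{{"prefix", "osd pool ls"}};
  assert(toy_getval(cmdmap, "prefix").has_value());
  assert(!toy_getval(cmdmap, "missing").has_value());
}
```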
rbd-mirror: fix segfault in snapshot replayer shutdown
If an error arises in the init flow of the snapshot replayer and the
function returns before the call to `register_local_update_watcher`,
the value of `m_update_watch_ctx` will not be initialized. Therefore,
during the shutdown phase, the replayer will try to free this pointer
and segfault.
This commit fixes this issue by setting `m_update_watch_ctx` to
`nullptr`.
Fixes: https://tracker.ceph.com/issues/50931
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
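The failure mode and the fix can be sketched as below. `ToyWatchCtx`,
`ToyReplayer` and `demo_early_failure()` are hypothetical stand-ins;
only the member name `m_update_watch_ctx` comes from the message above.
Using a default member initializer is one possible way to set the
pointer to `nullptr`; the message does not say where the real code does
it.

```cpp
#include <cassert>

struct ToyWatchCtx {};  // hypothetical stand-in for the watch context

class ToyReplayer {
public:
  // Simulates an init flow that may return before the watcher is registered.
  bool init(bool fail_early) {
    if (fail_early) {
      return false;  // m_update_watch_ctx was never assigned
    }
    m_update_watch_ctx = new ToyWatchCtx();
    return true;
  }

  void shut_down() {
    delete m_update_watch_ctx;  // deleting nullptr is a no-op
    m_update_watch_ctx = nullptr;
  }

  bool has_watch_ctx() const { return m_update_watch_ctx != nullptr; }

private:
  // Without this initializer the pointer would hold an indeterminate
  // value after an early return, and shut_down() would free garbage.
  ToyWatchCtx* m_update_watch_ctx = nullptr;
};

inline void demo_early_failure() {
  ToyReplayer replayer;
  const bool ok = replayer.init(/*fail_early=*/true);
  assert(!ok && !replayer.has_watch_ctx());
  replayer.shut_down();  // safe: the pointer is nullptr
}
```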
Sage Weil [Fri, 21 May 2021 13:51:47 +0000 (09:51 -0400)]
qa/tasks/cephadm.conf: log_to_journald=false
For teuthology runs, we set log_to_stderr=false, so that we only see
derr-level events in the container log (and teuthology.log). Now that we
log directly to journald, set log_to_journald=false too, so that we don't
see level-20 logs in teuthology.log.
Kefu Chai [Fri, 21 May 2021 12:10:38 +0000 (20:10 +0800)]
crimson/osd: disable allow_guessing when parsing command line options
we pass "--id <n>" to ceph-osd for specifying the osd id, but seastar
app template also provides an option of "--idle-poll-time-us arg".
boost::program_options::command_line_parser() uses default_style when
parsing options. and default_style includes allow_guessing, which in
turn matches partial option as well, so "--id" matches with "--idle"
when we are trying to figure out which options are consumed by seastar
app template, and which are not. see
https://www.boost.org/doc/libs/1_76_0/doc/html/boost/program_options/command_line_style/style_t.html
so, in this change, style is specified explicitly, and "allow_guessing"
is removed from the "default_style" before being passed to style(), so
that only the full option names are matched.
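The effect of allow_guessing can be illustrated with a toy model. This
is not Boost code and does not reproduce Boost's parser:
`match_option()` is a hypothetical function that treats guessing as
plain prefix matching, which is how the message above describes it. In
Boost terms, the style passed to style() would presumably be
`default_style & ~allow_guessing`.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Toy model, not Boost code: returns the known option names that a
// command-line token (given without the leading "--") would match.
inline std::vector<std::string> match_option(
    std::string_view token,
    const std::vector<std::string>& known,
    bool allow_guessing) {
  assert(!token.empty());
  std::vector<std::string> hits;
  for (const auto& name : known) {
    const bool exact = (name == token);
    // With guessing enabled, a token that is a prefix of a longer
    // option name also counts as a match.
    const bool prefix =
        allow_guessing && name.compare(0, token.size(), token) == 0;
    if (exact || prefix) {
      hits.push_back(name);
    }
  }
  return hits;
}
```

In this model, "id" matches "idle-poll-time-us" only while guessing is
enabled; with guessing disabled it matches nothing and is left for
ceph-osd to consume.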
Kefu Chai [Fri, 21 May 2021 04:10:50 +0000 (04:10 +0000)]
os/bluestore/bluestore_tool: use boost::filesystem as an alternative
the libstdc++ shipped with GCC 7.5 does not have good support of
std::filesystem. among other things, it does not offer
std::filesystem::weakly_canonical(), but boost::filesystem does,
and boost::filesystem is compatible with std::filesystem to some
degree. so let's use it if <filesystem> is not available; we can
take that as a signal that std::filesystem is not quite ready yet.
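One way to express this selection is sketched below. It assumes the
check is done with `__has_include`; the alias `fs` and the helper
`normalize()` are hypothetical and not taken from bluestore_tool. The
fallback branch needs the Boost.Filesystem headers and library, and is
only compiled when <filesystem> is missing.

```cpp
#include <cassert>
#include <string>

#if __has_include(<filesystem>)
#include <filesystem>
namespace fs = std::filesystem;
#else
// Assumed fallback: Boost.Filesystem offers a largely compatible API.
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
#endif

// Hypothetical helper: resolves "." and ".." even when the tail of the
// path does not exist, which canonical() would reject.
inline std::string normalize(const std::string& p) {
  assert(!p.empty());
  return fs::weakly_canonical(fs::path(p)).generic_string();
}
```

Both libraries provide `weakly_canonical()` and `generic_string()`, so
the calling code stays the same whichever branch is taken.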