Patrick Donnelly [Tue, 18 May 2021 02:46:56 +0000 (19:46 -0700)]
Merge PR #41239 into master
* refs/pull/41239/head:
librbd: use uint64_t instead of size_t for SparseExtent::length
mgr/PyModule: use Py_ssize_t for the PyList index
os/bluestore: print size_t using %zx
client: print int64_t using PRId64
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
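The printf-related fixes in this merge follow a common portability pattern: size_t is printed with the 'z' length modifier, and int64_t with the PRId64 macro from <cinttypes>. A minimal sketch of that pattern, not the actual Ceph code (format_size_hex and format_i64 are hypothetical helper names used only for illustration):

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a size_t as hex. The 'z' length modifier
// matches size_t on every platform, unlike %lx or %llx.
std::string format_size_hex(std::size_t n) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "0x%zx", n);
  return buf;
}

// Hypothetical helper: format an int64_t. PRId64 expands to the right
// conversion ("ld" or "lld") for the platform's int64_t.
std::string format_i64(std::int64_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRId64, v);
  return buf;
}
```

Hard-coding %lu or %ld for these types tends to compile cleanly on one platform and warn (or print garbage) on another where the underlying type differs.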
Kefu Chai [Mon, 17 May 2021 17:13:03 +0000 (01:13 +0800)]
crimson/os: use compile-time validation
libfmt performs compile-time validation of the format string and its
arguments when the user-defined literal is used. but the
downside is that the formatter materializes the whole formatted string
into a std::string before printing it into seastar's log buffer
inserter. presumably, the inserter would be more efficient in
comparison to the pre-format approach. so this validation is only
enabled for non-NDEBUG builds, where it helps us identify
errors like
Patrick Donnelly [Mon, 17 May 2021 15:38:41 +0000 (08:38 -0700)]
Merge PR #41314 into master
* refs/pull/41314/head:
qa/tasks/nfs: add test to check if cmds fail on not passing required arguments
mgr/nfs: fix flake8 missing whitespace around parameter equals error
mgr/nfs: annotate _cmd_nfs_* methods return value
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
crimson/monc: fix send_message() racing with reopen_session().
The `send_message()` method is a high-level facility for
communicating with a monitor. If there is an active conn
available, it sends the message immediately; otherwise
the message is queued. This method assumes the queue is
already drained if the connection is available.
`active_con` is managed by `reopen_session()` where it's
first cleared and then reset after finding a new alive mon.
This is followed by draining the `pending_messages` queue
which happens in `on_session_opened()` after the `MAuth`
exchange is finished.
Unfortunately, the path from the `active_con` clearing
to draining the queue is long and divided into multiple
continuations, which results in a lack of atomicity. When
e.g. `run_command()` interleaves the stages, the following
crash happens:
```
INFO 2021-05-07 08:13:43,914 [shard 0] monc - do_auth_single: mon v2:172.21.15.82:6805/34166 => v2:172.21.15.82:3300/0 returns auth_reply(proto 2 0 (0) Success) v1: 0
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph-17.0.0-3910-g1b18e076/src/crimson/mon/MonClient.cc:1034: seastar::future<> crimson::mon::Client::send_message(MessageRef): Assertion `pending_messages.empty()' failed.
Aborting on shard 0.
Backtrace:
0# 0x000055CDE6DB532F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FC1BF20BB20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007FC1BD806B09 in /lib64/libc.so.6
7# 0x00007FC1BD814DE6 in /lib64/libc.so.6
8# crimson::mon::Client::send_message(boost::intrusive_ptr<Message>) in ceph-osd
9# crimson::mon::Client::renew_subs() in ceph-osd
10# 0x000055CDE764FB0B in ceph-osd
11# 0x000055CDE10457F0 in ceph-osd
12# 0x000055CDEA0AB88F in ceph-osd
13# 0x000055CDEA0B0DD0 in ceph-osd
14# 0x000055CDEA2689FB in ceph-osd
15# 0x000055CDE9DC0EDA in ceph-osd
16# main in ceph-osd
17# __libc_start_main in /lib64/libc.so.6
18# _start in ceph-osd
```
The problem caused the following failure at Sepia:
http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-07_07:41:02-rados-master-distro-basic-smithi/6104549
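The interleaving described above can be modelled with a toy, single-threaded stand-in. This is only a sketch of the failure mode, not the crimson code and not the fix: ToyMonClient, its stage methods and interleaved_send_hits_invariant are hypothetical names, and only `active_con` and `pending_messages` come from the commit message.

```cpp
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Toy model: each method stands for one continuation of the real code.
struct ToyMonClient {
  std::optional<int> active_con;             // stand-in for a connection
  std::deque<std::string> pending_messages;  // queued while disconnected
  std::vector<std::string> sent;

  // Returns false where the real code would fail
  // assert(pending_messages.empty()).
  bool send_message(const std::string& m) {
    if (active_con) {
      if (!pending_messages.empty()) {
        return false;  // connection is up, but the queue was never drained
      }
      sent.push_back(m);
    } else {
      pending_messages.push_back(m);
    }
    return true;
  }

  void clear_active_con() { active_con.reset(); }  // start of reopen
  void set_active_con(int c) { active_con = c; }   // new mon found
  void drain_pending() {                           // after auth finishes
    while (!pending_messages.empty()) {
      sent.push_back(pending_messages.front());
      pending_messages.pop_front();
    }
  }
};

// Another caller runs between set_active_con() and drain_pending().
bool interleaved_send_hits_invariant() {
  ToyMonClient c;
  c.clear_active_con();
  c.send_message("queued while reconnecting");
  c.set_active_con(1);
  const bool ok = c.send_message("sent by an interleaved caller");
  c.drain_pending();
  return !ok;
}
```

In the model, the invariant only holds if nothing can call send_message() between the point where `active_con` is set and the point where the queue is drained.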
Xuehan Xu [Thu, 11 Mar 2021 05:21:31 +0000 (13:21 +0800)]
crimson/osd: optimize crimson-osd's client requests process parallelism
Make client requests go to the concurrent pipeline stage "wait_repop" once they
are "submitted" to the underlying objectstore, which means their on-disk order
is guaranteed, so that successive client requests can go into the "process"
pipeline stage.
WriteLogCacheEntry gets appended to persist_log_entries before
write_data_pos is updated with the actual media offset. Because
push_back() makes a copy, the updated write_data_pos value never
makes it to media, making recovery impossible.
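The copy semantics behind this bug can be shown in isolation. The sketch below uses a hypothetical ToyLogEntry in place of WriteLogCacheEntry. Updating the field before appending is shown only as one possible ordering; the commit message does not say how the actual fix was made.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the on-media log entry.
struct ToyLogEntry {
  std::uint64_t write_data_pos = 0;
};

// Problematic ordering: push_back() stores a copy, so the later
// assignment only changes the local object.
std::uint64_t append_then_update(std::uint64_t media_offset) {
  std::vector<ToyLogEntry> persist_log_entries;
  ToyLogEntry entry;
  persist_log_entries.push_back(entry);
  entry.write_data_pos = media_offset;
  return persist_log_entries.back().write_data_pos;  // still the default
}

// One possible ordering that avoids the problem: update first, then append.
std::uint64_t update_then_append(std::uint64_t media_offset) {
  std::vector<ToyLogEntry> persist_log_entries;
  ToyLogEntry entry;
  entry.write_data_pos = media_offset;
  persist_log_entries.push_back(entry);
  return persist_log_entries.back().write_data_pos;
}
```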
Ilya Dryomov [Thu, 13 May 2021 11:11:57 +0000 (13:11 +0200)]
librbd/cache/pwl/ssd: actually use first_{valid,free}_entry on recovery
first_valid_entry and first_free_entry pointers are read from media
but not actually used: both m_first_valid_entry and m_first_free_entry
get assigned 0 (or garbage). next_log_pos gets the same value as well
meaning that not only no recovery is attempted but the cache also gets
corrupted because DATA_RING_BUFFER_OFFSET is not applied.
Ilya Dryomov [Sat, 8 May 2021 08:24:37 +0000 (10:24 +0200)]
librbd/cache/pwl/ssd: don't count log entries
In ssd mode log entries are variable size. Attempting to count and
impose watermarks on the number of log entries is bogus because the
total number of entries it would take to fill the cache to capacity
is also variable and can't be precisely estimated.
All parameters are integers and none of them are (in-)out, so don't
take them by reference. Additionally num_lanes, num_log_entries and
num_unpublished_reserves don't need to be 64-bit as their respective
fields in AbstractWriteLog are 32-bit.
* faster compilation, so the cmake generator can process .yaml.in files
in parallel.
* allow each daemon to include only the subset of options it is
interested in.
* better maintainability. by grouping options in different .yaml.in
files, developers can see which components consume an option.
in this change, options only read by mgr are extracted into mgr.yaml.in,
and options only read by osd are extracted into osd.yaml.in.
so all options in mgr.yaml.in should have "services: mgr" in their
definition by default. the ones in osd.yaml.in have "services: osd".
in the case where options are consumed by multiple services or tools,
the option should add "common" to its "services" if it is supposed to be
consumed by a tool, or "mon" if it is read by monitor as well.
but it takes time to audit all the options, so only part of them are
processed.
the list of .yaml.in files might change over time before we finish the
.yaml.in file split, but cmake would fail to figure out the list without
rerunning "cmake", so when a new .yaml.in file is introduced, a developer
might end up with a FTBFS after pulling the change from the remote repo.
so, we need to revert the file(GLOB ..) change until all .yaml.in files
are created.
Kefu Chai [Thu, 13 May 2021 16:02:47 +0000 (00:02 +0800)]
crimson/net/Socket: do not reset FixedCPUServerSocket::shutdown_gate
the copy constructor of seastar::gate is deleted explicitly. so we
cannot reset FixedCPUServerSocket::shutdown_gate by assigning a new
seastar::gate to it.
since we don't reuse a FixedCPUServerSocket after calling
FixedCPUServerSocket::destroy(), it's safe to leave a closed gate after
calling FixedCPUServerSocket::reset()
Sage Weil [Thu, 13 May 2021 13:57:14 +0000 (09:57 -0400)]
Merge PR #39550 into master
* refs/pull/39550/head:
mgr/cephadm: induce retune of osd memory on osd creation
qa/tasks/cephadm.conf: autotune osd memory by default
mgr/cephadm: do not autotune when _no_autotune_memory label is present
mgr/cephadm: autotune osd memory
common: add osd_memory_target_autotune
mgr/cephadm: report memory usage, request (limit) in 'orch ps'
doc/cephadm/host-management: document _admin group
mgr/orchestrator: fix help formatting