Merge pull request #43010 from mgfritch/cephadm-log-thread-ident
cephadm: add thread ident to log messages
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
This is a regression introduced by 9212420: when the host uses a
logical partition, lsblk reports that partition as a child of the
physical device.
The logical partition is prefixed with the `└─` characters.
This leads the `raw list` subcommand to print the lsblk error on stderr.
```
$ ceph-volume raw list
{}
stderr: lsblk: `-/dev/sda1: not a block device
```
Merge pull request #42919 from sebastian-philipp/cephadm-async-close-conn
mgr/cephadm: Also make ssh._reset_con async
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Melissa Li <li.melissa.kun@gmail.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
crimson/common: explicitly reraise handled signal in FatalSignal.
Compared to the current approach, where we just reset the handler to
the default and let the CPU re-execute the segfaulting instruction,
the explicit `::reraise()` is:
1. immune to a race condition when multiple threads run into
trouble at the same time;
2. easier to understand and similar to the classic OSD.
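For illustration, a minimal sketch of the explicit re-raise pattern, assuming a free-standing handler; the names are assumptions, not the actual crimson code:

```
#include <csignal>
#include <pthread.h>

// Minimal sketch (names assumed): restore the default disposition,
// then deliver the signal to the current thread explicitly instead of
// returning from the handler and letting the CPU re-execute the
// faulting instruction. Each faulting thread re-raises for itself,
// so concurrent faults don't race on a shared handler reset.
static void reraise(int signum) {
  ::signal(signum, SIG_DFL);
  ::pthread_kill(::pthread_self(), signum);
}
```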
admin/doc-requirements: use funcparserlib from github
funcparserlib is pulled in as a dependency by blockdiag. the latest version of
funcparserlib available on PyPI is v0.3.6, which is not compatible with
Python 3.8.
in this change, funcparserlib is installed from GitHub instead, to
address build failures like:
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/sphinxcontrib/seqdiag.py", line 26, in <module>
import seqdiag.utils.rst.nodes
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/seqdiag/utils/rst/nodes.py", line 16, in <module>
from blockdiag.utils.rst import nodes
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/blockdiag/utils/rst/nodes.py", line 21, in <module>
import blockdiag.builder
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/blockdiag/builder.py", line 16, in <module>
from blockdiag import parser
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/blockdiag/parser.py", line 43, in <module>
from funcparserlib.parser import (a, finished, forward_decl, many, maybe, skip,
File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/41855/lib/python3.8/site-packages/funcparserlib/parser.py", line 123
except NoParseError, e:
^
SyntaxError: invalid syntax
once https://github.com/vlasovskikh/funcparserlib/issues/65 is
addressed, we should drop this change.
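For reference, a sketch of the pip requirements syntax for installing a package from GitHub; the exact ref pinned by this change is an assumption:

```
# requirements.txt -- pull funcparserlib from GitHub instead of PyPI;
# the ref below is an assumption, not necessarily the one pinned here
funcparserlib @ git+https://github.com/vlasovskikh/funcparserlib.git@master
```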
In SyncPointLogOperation::clear_earlier_sync_point(),
sync_point->log_entry->next_sync_point_entry was prematurely set to
nullptr. This happens during the write op stage, but
next_sync_point_entry is still used during the writeback stage, in
handle_flushed_sync_point().
handle_flushed_sync_point() may therefore pass a nullptr and
cause an assert in m_work_queue. The solution is to move the statement
that sets next_sync_point_entry to nullptr to after it is used.
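A hedged sketch of the resulting ordering; only the names come from this message, the types and body are assumptions:

```
#include <memory>

// Sketch only: the types and call sites are assumptions.
struct SyncPointLogEntry {
  std::shared_ptr<SyncPointLogEntry> next_sync_point_entry;
};

void handle_flushed_sync_point(std::shared_ptr<SyncPointLogEntry> entry) {
  // Read the next pointer before anything clears it.
  auto next = entry->next_sync_point_entry;
  // ... writeback-stage work that relies on `next` ...
  // Only after the last use is it safe to drop the link.
  entry->next_sync_point_entry = nullptr;
}
```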
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
Like other ceph-volume commands (lvm activate/batch/zap or raw activate),
we also need to be able to use the --no-systemd flag.
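For example, a migration run without systemd might then look like this (a hypothetical invocation; the IDs and LV names are placeholders):

```
$ ceph-volume lvm migrate --osd-id 0 --osd-fsid $OSD_FSID \
      --from db --target vgname/new_db_lv --no-systemd
```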
On top of "profile rbd" permissions, "profile rbd-mirror-peer" also
allows getting the rbd/mirror config key and setting rbd/mirror/peer/*
config keys.
This is what "rbd mirror pool peer bootstrap create" does.
also avoid using `map[key] = val` for setting an item in a map: if
the key does not exist in the map, `map[key]` has to create a value
using its default ctor, and then call `operator=(bufferlist&&)` to
set it.
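A minimal sketch of the difference, using std::string in place of bufferlist (an assumption, for brevity):

```
#include <map>
#include <string>

int main() {
  std::map<int, std::string> m;

  // operator[]: if the key is absent, the mapped value is first
  // default-constructed, then move-assigned -- two steps.
  m[1] = std::string("one");

  // emplace constructs the value in place, avoiding the extra
  // default construction followed by assignment.
  m.emplace(2, std::string("two"));
  return 0;
}
```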
crimson/osd: fix Watch::connect() behaviour on reconnect.
It's perfectly legal for a client to reconnect to a particular `Watch`
using a different socket / `Connection` than the original one. This shall
include proper handling of the watch timer, which is currently broken
as, when reconnecting, we don't cancel the timer. This led to the
following crash at Sepia:
```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-09-02_07:44:51-rados-master-distro-basic-smithi/6372357$ less ./remote/smithi183/log/ceph-osd.4.log.gz
...
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - client_request(id=12, detail=m=[osd_op(client.5087.0:93 7.1e 7:7c7084bd:::repobj:head {watch reconnect cookie 94478891024832 gen 1} snapc 0={} ondisk+write+known_if_redirected e40) v8]): got obc lock
...
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - do_op_watch
INFO 2021-09-02 08:10:45,462 [shard 0] osd - found existing watch by client.5087
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - do_op_watch_subop_watch
INFO 2021-09-02 08:10:45,462 [shard 0] osd - found existing watch watch(cookie 94478891024832 30s 172.21.15.150:0/3544196211) by client.5087
...
INFO 2021-09-02 08:10:45,462 [shard 0] osd - op_effect: found existing watcher: 94478891024832,client.5087
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7406-g9d30203c/rpm/el8/BUILD/ceph-17.0.0-7406-g9d30203c/src/seastar/include/seastar/core/timer.hh:95: void seastar::timer<Clock>::arm_state(seastar::timer<Clock>::time_point, std::optional<typename Clock::duration>) [with Clock = seastar::lowres_clock; seastar::timer<Clock>::time_point = std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long int, std::ratio<1, 1000> > >; typename Clock::duration = std::chrono::duration<long int, std::ratio<1, 1000> >]: Assertion `!_armed' failed.
Aborting on shard 0.
Backtrace:
0# 0x000055CC052CF0B6 in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const&) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FA58349FB20 in /lib64/libpthread.so.0
4# gsignal in /lib64/libc.so.6
5# abort in /lib64/libc.so.6
6# 0x00007FA581A98C89 in /lib64/libc.so.6
7# 0x00007FA581AA6A76 in /lib64/libc.so.6
8# 0x000055CC0BEEE9DD in ceph-osd
9# crimson::osd::Watch::connect(seastar::shared_ptr<crimson::net::Connection>, bool) in ceph-osd
10# 0x000055CC00B1D246 in ceph-osd
11# 0x000055CBFFEF01AE in ceph-osd
...
```
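A hedged sketch of the fix pattern, with a minimal stand-in for seastar::timer (the member names and re-arm logic are assumptions, not the actual crimson code):

```
#include <cassert>

// Stand-in for seastar::timer: arm() asserts the timer is not
// already armed, matching the assertion in the log above.
struct Timer {
  bool armed = false;
  void arm() { assert(!armed); armed = true; }
  void cancel() { armed = false; }
};

struct Watch {
  Timer timeout_timer;
  // On reconnect, cancel before re-arming; calling arm() on an
  // already-armed timer would trip the `!_armed' assertion.
  void connect() {
    timeout_timer.cancel();
    timeout_timer.arm();
  }
};
```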
qa: Use osd_op_queue=wpq for tests using filestore backend.
Force a subset of tests that explicitly employ the filestore backend to
use the WPQ scheduler. This is because the mclock scheduler will not be
optimized for filestore.
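For reference, a sketch of how the scheduler is typically pinned via a teuthology override fragment (the exact placement in these suites is an assumption):

```
overrides:
  ceph:
    conf:
      osd:
        osd op queue: wpq
```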
Samuel Just [Wed, 1 Sep 2021 21:44:14 +0000 (14:44 -0700)]
crimson/os/seastore/lba_manager/btree/lba_btree: fix FTBFS on gcc 9
gcc-9 doesn't seem to consider the iterator nothrow move constructible with
the defaulted move constructor implementation, yielding the following build
failure:
```
m/el8/BUILD/ceph-17.0.0-7373-gfc349212/src/seastar/include/seastar/core/future.hh:584:58: error: static assertion failed: Types must be no-throw move constructible
584 | static_assert(std::is_nothrow_move_constructible<T>::value,
```
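A minimal sketch of the usual workaround: spell out a noexcept move constructor instead of relying on a defaulted one. The class below is hypothetical, not the actual lba_btree iterator:

```
#include <type_traits>

// Hypothetical iterator: an explicitly noexcept move constructor
// satisfies the static_assert even when gcc 9 fails to deduce
// nothrow-movability for a defaulted implementation.
struct iterator {
  int pos = 0;
  iterator() = default;
  iterator(const iterator&) = default;
  iterator(iterator&& other) noexcept : pos(other.pos) {}
};

static_assert(std::is_nothrow_move_constructible<iterator>::value,
              "Types must be no-throw move constructible");
```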
Joseph Sawaya [Wed, 18 Aug 2021 15:52:39 +0000 (11:52 -0400)]
qa/tasks/rook: add OSD creation to Rook QA
This commit adds OSD creation to the Rook QA tasks. The Rook task will
explicitly wait for the mgr to start and the CLI to work (instead of
implicitly doing so while waiting for 'ceph osd dump' to work).
Then it will do `ceph orch apply osd --all-available-devices` to create
OSDs on the rest of the PVs.