Tatjana Dehler [Tue, 28 Jul 2020 11:18:56 +0000 (13:18 +0200)]
mgr/dashboard: wait longer for health status to be cleared
Because of reasons the cluster needs more time to recover from
HEALTH_WARN while changes are made by `test_pool_update_metadata`.
Lets wait several times for the cluster status to be HEALTH_OK
again.
Fixes: https://tracker.ceph.com/issues/46573 Signed-off-by: Tatjana Dehler <tdehler@suse.com>
`ceph-volume lvm zap` command fails under certain conditions.
when passing `--osd-id` or `--osd-fsid` to `ceph-volume lvm zap` command
it tries to zap additionnal devices that have nothing to do with the osd
being zapped.
When calling `api.get_lvs()` in `ensure_associated_lvs()` we have to
pass the osd-id/osd-fsid information so only related devices are
returned by `get_lvs()` method
crimson/os/alienstore: always use fsid in bluestore
alienstore should not be stateful in this perspective, it should proxy
all acccess of fsid to bluestore.
there are couple issues in existing implementation:
* when mkfs, bluestore tries to generate a new osd_fsid if the specified
one is empty. but we explicitly pass the given uuid down to
AlienStore::mkfs() so the bluestore can use it. so we should pass it
down instad of storing it locally.
* when persisting superblock in OSD::mkfs(), superblock.osd_fsid() is
read from store->get_fsid(), if user specifies an empty uuid, we
should persist the generated uuid in the superblock.
in this change, all access to fsid is proxied to the underlying
bluestore.
osd sends a MOSDMarkMeDown message to monitor and waits for its ack
before timeout, so if we can stop osd before stopping mon, stop.sh can
return sooner without waiting until the timeout.
to avoid the attempts to connect an OSD which is bound to a v2
address to a v1 address of a mgr.
in general, osd is bound to both v1 and v2 addresses, but crimson
msgr does not support multiple bound address at the time of writing, so
to avoid the failures when trying to connect to incompatible addresses,
let's filter out them when connecting to monitor. this change
silence warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
to avoid the attempts to connect an OSD which is bound to a v2 address
to a v1 addrss of a monitor.
in general, osd is bound to both v1 and v2 addresses, but crimson msgr
does not support multiple bound address at the time of writing, so to
avoid the failures when trying to connect to incompatible addresses,
let's filter out them when connecting to monitor. this change silence
warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
This will crash the choose_acting() procedure as it will mistakenly
think that peer 3 should continue to perform asynchronous recovery
(e.g., due to num_objects_missing = 1) in contrast to fully
backfill-recovered.
While I did not dig into the real cause, there are a couple of
possible explanations of how num_objects can be off. I think that
if a roll forward or log replay could delete something twice, maybe
there would be an undercount. Or maybe something as simple as a
corruption.
Since _update_calc_stats() is going to fix num_objects_missing
for that peer anyway, let's make sure it always starts with a
clean state.
crimson/net: do not reset need_addr before learning it
because we don't bind both v1 and v2 addresses, when monitor returns a
v1 peer address, as the client side, crimson-osd just drops the
connection. but this failed attempt to learn the myaddr resets
`need_addr`. and this prevents crimson-osd from learning the v2 address
returned by monitor.
in this change, we reset need_addr only after it is learned from the
peer.
* add --flavor option, which is "default" by default, so one can, for
example, pass "--flavor crimson" to ceph-debug-docker
* extract $repo_url to avoid repeating the shared bits between centos
and debian derivatives envs.
ceph.spec.in: cull _FORTIFY_SOURCE macro from CXXFLAGS for seastar
seastar uses setjmp() and longjmp() to implement coroutine, but
longjmp() is defined as ____longjmp_chk() by GCC if _FORTIFY_SOURC is
defined. ____longjmp_chk() simply bails out with an error message if
the dest stack pointer is higher than the src stack pointer, or the dest
stack pointer is not in the sigaltstack. in the case of seastar, the dst
%sp is not necessarily higher than src stack pointer, and it's not
handling a signal for switching the thread context. that's why we have
the "longjmp causes uninitialized stack frame" error when running
crimson-osd on RHEL/CentOS 8 using the prebuilt rpm packages.
the optflags rpm macro adds -D_FORTIFY_SOURCE=2 to CFLAGS and CXXFLAGS,
so even seastar tries to pass -U_FORTIFY_SOURCE to GCC, there is chance
that cmake append CXXFLAGS at the end of the option list passed to GCC.
and this renders seastar's attempt to undefine _FORTIFY_SOURCE useless.
another way to address this issue is to undefine this macro in
seastar:src/core/thread.cc. but since seastar tries neutralize the macro
in its cmake script instead of source file, i assume they have their
considerations. let's drop it in the rpm recipe instead.
8987f94416f453829eae6dda08837ef5a42531c6 introduced the osd_lock for the
bench command. Taking the osd_lock in bench can lead to deadlocks, causing the
command to hang as seen in https://tracker.ceph.com/issues/43888.
Fixes: https://tracker.ceph.com/issues/43888 Signed-off-by: Adam Kupczyk <akupczyk@redhat.com> Signed-off-by: Neha Ojha <nojha@redhat.com>
Sebastian Wagner [Wed, 22 Jul 2020 20:12:00 +0000 (22:12 +0200)]
Merge pull request #35667 from jschmid1/cephadm_deterministic_simplescheduler
mgr/cephadm: rework --dry-run/previews
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com> Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com> Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com> Reviewed-by: Stephan Müller <smueller@suse.com>
Jason Dillaman [Wed, 22 Jul 2020 15:25:56 +0000 (11:25 -0400)]
librbd: flush all queued object IO from simple scheduler
Normally IO is tracked via the AioCompletion's async_op but the
scheduler will "complete" writes while the IO might be still
executing. Therefore, prior to shutting down this dispatch layer
we need to wait for all IO to complete.
Fixes: https://tracker.ceph.com/issues/46668 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* default to centos:8, as we've moved to centos:8 now
* do not assume that the base image is centos:7, use centos:8 if it is
specified.
* install python3-* packages for centos:8 and install python36-*
packages for centos:7. as el8 is now a python3 distro, and
centos:7 now has python36.
* s/screen/tmux/. because screen is now offered by EPEL, while tmux
is in BaseOS.
qa/keystone: pin python-openstackclient and osc-lib
keystone's dependencies are installed using its tox.ini,
which in turn uses a constraints file of
https://releases.openstack.org/constraints/upper/ussuri,
and it pins cliff to 3.1.0, which is not able to fulfill the requirement
of osc-lib 2.2.0. as it needs needs cliff>=3.2.0. per
https://releases.openstack.org/ussuri/, the latest osc-lib for
ussuri is 2.0.0. and osc-lib>=2.0.0 is required by
python-openstackclient 2.5.1, so let's use it instead of using the latest
one.
if we install cliff==3.1.0 along with python-openstackclient==5.2.1,
we will have following error, as `CommandManager.add_command_group()`
method was added to cliff in 3.2.0. see
https://opendev.org/openstack/cliff/commit/8477c4dbd0cf651b9b4ba4a4934de69d5942bfc2,
so cliff failed to work with the latest openstackclient, like:
2020-06-29T17:26:23.402 INFO:teuthology.orchestra.run.smithi039.stderr:'CommandManager' object has no attribute 'add_command_group'
2020-06-29T17:26:23.402 INFO:teuthology.orchestra.run.smithi039.stderr:Traceback (most recent call last):
2020-06-29T17:26:23.403 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/cliff/app.py", line 264, in run
2020-06-29T17:26:23.403 INFO:teuthology.orchestra.run.smithi039.stderr: self.initialize_app(remainder)
2020-06-29T17:26:23.403 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/openstackclient/shell.py", line 133, in
initialize_app
2020-06-29T17:26:23.403 INFO:teuthology.orchestra.run.smithi039.stderr: super(OpenStackShell, self).initialize_app(argv)
2020-06-29T17:26:23.403 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/osc_lib/shell.py", line 442, in initialize_app
2020-06-29T17:26:23.404 INFO:teuthology.orchestra.run.smithi039.stderr: self._load_plugins()
2020-06-29T17:26:23.404 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/openstackclient/shell.py", line 104, in
_load_plugins
2020-06-29T17:26:23.404 INFO:teuthology.orchestra.run.smithi039.stderr: self.command_manager.add_command_group(cmd_group)
2020-06-29T17:26:23.404 INFO:teuthology.orchestra.run.smithi039.stderr:AttributeError: 'CommandManager' object has no attribute 'add_command_group'
2020-06-29T17:26:23.404 INFO:teuthology.orchestra.run.smithi039.stderr:Traceback (most recent call last):
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/osc_lib/shell.py", line 134, in run
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: ret_val = super(OpenStackShell, self).run(argv)
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/cliff/app.py", line 264, in run
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: self.initialize_app(remainder)
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/openstackclient/shell.py", line 133, in
initialize_app
2020-06-29T17:26:23.405 INFO:teuthology.orchestra.run.smithi039.stderr: super(OpenStackShell, self).initialize_app(argv)
2020-06-29T17:26:23.406 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/osc_lib/shell.py", line 442, in initialize_app
2020-06-29T17:26:23.406 INFO:teuthology.orchestra.run.smithi039.stderr: self._load_plugins()
2020-06-29T17:26:23.406 INFO:teuthology.orchestra.run.smithi039.stderr: File "/home/ubuntu/cephtest/keystone/.tox/venv/lib/python3.6/site-packages/openstackclient/shell.py", line 104, in
_load_plugins
2020-06-29T17:26:23.406 INFO:teuthology.orchestra.run.smithi039.stderr: self.command_manager.add_command_group(cmd_group)
2020-06-29T17:26:23.406 INFO:teuthology.orchestra.run.smithi039.stderr:AttributeError: 'CommandManager' object has no attribute 'add_command_group'
in this change the openstackclients version is pin'ed to the
latest stable of 5.2.1. will have a separated PR to bump up
the cliff version on teuthology side.
replace boost::bind() with std::bind(), the use of the Boost
placeholders in the default namespace is deprecated. let's take
this opportunity to use its counterpart in standard library