mon/MonClient: reset authenticate_err in _reopen_session()
Otherwise, if the "mon host" list has at least one unqualified IP address
without a port and both msgr1 and msgr2 are turned on, there is a race
affecting MonClient::authenticate().
For backwards compatibility reasons such an address is expanded into
two entries, each being treated as a separate monitor. For example,
"mon host = 1.2.3.4" generates the following initial monmap:
0: v1:1.2.3.4:6789/0
1: v2:1.2.3.4:3300/0
See MonMap::_add_ambiguous_addr() for details.
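As an illustration only, the expansion can be sketched in Python. The function name and port constants below are hypothetical and merely mirror the example monmap above; the real logic lives in MonMap::_add_ambiguous_addr().

```python
from typing import List

# Ports taken from the example monmap above (assumed defaults).
MSGR1_PORT = 6789
MSGR2_PORT = 3300


def expand_ambiguous_addr(ip: str) -> List[str]:
    """Expand a bare IP (no protocol, no port) into a v1 and a v2 entry.

    Hypothetical helper for illustration; not the Ceph implementation.
    """
    return [f"v1:{ip}:{MSGR1_PORT}/0", f"v2:{ip}:{MSGR2_PORT}/0"]


# Each entry is then treated as a separate monitor with its own rank.
for rank, addr in enumerate(expand_ambiguous_addr("1.2.3.4")):
    print(f"{rank}: {addr}")
```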
Then, the following can happen:
1. we connect to both endpoints and attempt to authenticate
2. authenticate() sets authenticate_err to 1 and sleeps on auth_cond
3. msgr1 authenticates first (i.e. it gets the final MAuth message
before msgr2 gets the monmap)
4. active_con is set to msgr1 connection, msgr2 connection is closed
as redundant
5. _finish_auth() sets authenticate_err to 0 and signals auth_cond,
but before either the monmap is received or authenticate() wakes
up, msgr1 connection is closed due to a network hiccup
6. ms_handle_reset() calls _reopen_session() which clears active_con
and again connects to both endpoints and attempts to authenticate
7. authenticate() wakes up, sees that there is no active_con and goes
back to sleep, but this time with authenticate_err == 0
8. msgr2 authenticates first but doesn't call _finish_auth() because
it is called only if authenticate_err == 1
9. active_con is set to msgr2 connection, msgr1 connection is closed
as redundant
10. authenticate() hangs on auth_cond until the timeout, which defaults
to 5 minutes
The discrepancy between msgr1 and msgr2 plays a key role. For msgr1,
authentication is considered to be complete as soon as the final MAuth
message is received -- the monmap is not waited for. For msgr2,
authentication is considered to be complete only after the monmap is
received.
Avoid the race by setting authenticate_err to 1 in _reopen_session(),
so that _finish_auth() is called on/after every authentication attempt
instead of just the first one.
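The sequence above can be replayed with a toy, single-threaded Python model. This is a sketch under simplifying assumptions, not the MonClient code: the class, its method names and the "wakeups" counter (standing in for signalling auth_cond) are all hypothetical, and the monmap handling is left out.

```python
class MonClientModel:
    """Toy, single-threaded model of the authenticate_err handshake."""

    def __init__(self, reset_in_reopen):
        self.reset_in_reopen = reset_in_reopen  # True models the fix
        self.authenticate_err = 0
        self.active_con = None
        self.wakeups = 0  # stands in for signalling auth_cond

    def start_authenticate(self):
        # step 2: mark authentication as pending before sleeping
        self.authenticate_err = 1

    def reopen_session(self):
        # step 6: drop the active connection and retry both endpoints
        self.active_con = None
        if self.reset_in_reopen:
            self.authenticate_err = 1

    def connection_authenticated(self, con):
        # steps 4 and 9: the first connection to authenticate becomes active
        self.active_con = con
        if self.authenticate_err == 1:
            self.finish_auth()

    def finish_auth(self):
        # step 5: clear the pending marker and wake up authenticate()
        self.authenticate_err = 0
        self.wakeups += 1


def replay(reset_in_reopen):
    """Replay steps 2-9 and return how often authenticate() was woken up."""
    client = MonClientModel(reset_in_reopen)
    client.start_authenticate()
    client.connection_authenticated("msgr1")
    client.reopen_session()  # network hiccup closes the msgr1 connection
    client.connection_authenticated("msgr2")
    return client.wakeups


print("without reset:", replay(False))
print("with reset:", replay(True))
```

Without the reset, the second authentication never produces a wakeup, which corresponds to the hang in step 10; with the reset, every authentication attempt does.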
qa/workunits/mon/test_mon_config_key: use subprocess.run() instead of proc.communicate()
on python3.6, the loop around proc.communicate() never finishes, because
we are always able to get something out of the stdout and/or stderr PIPEs.
so `stdout` and `stderr` keep growing until we run out of memory, and
teuthology considers the command crashed after a while.
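A minimal sketch of the subprocess.run() pattern, assuming the caller only needs the exit code and the complete output once the command has finished. The helper name is hypothetical; explicit PIPE arguments are used rather than capture_output, which was only added in python3.7.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=None):
    """Run cmd to completion and return (returncode, stdout, stderr).

    subprocess.run() waits for the process and reads both pipes exactly
    once, so there is no loop that could keep accumulating output.
    """
    proc = subprocess.run(cmd,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          timeout=timeout)
    return proc.returncode, proc.stdout.decode(), proc.stderr.decode()


# Usage: run a trivial child process with the current interpreter.
rc, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
print(rc, out.strip())
```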
AddCephTest and googletest's CMake scripts also call
find_package(Python3...), but they do not specify the required minor
version of Python3. by default, find_package(Python3...) picks the highest
available python3. so, if multiple python3 versions are installed on the
system, and the highest one is not the version specified by
-DWITH_PYTHON3=3.x.y on the cmake command line, we might end up using a
different python3 for the ceph CLI. even worse, the required python3
packages might not be available for the python3 interpreter picked by
googletest, because, in general, only a single python3 has full access to
the prepackaged python3-* packages shipped by a GNU/Linux distro.
in this change, the configure_file() calls are rearranged to the top of
src/CMakeLists.txt, so they have less chance to use the "polluted" cmake
variable for their subvars.
this change addresses the test failure where we have, for instance, python3.8
installed on RHEL8/CentOS8, where python3.6 is the python3 which has
access to the python3-* packages.
we should leave it to do_cmake.sh to decide which python3 version to use.
there are cases where multiple python3 versions are installed, but only one
of them is fully supported by the distro, in the sense that python3-*
packages are packaged for that python3.
Merge pull request #40901 from tchaikov/wip-mgr-rook
cmake: let WITH_MGR_ROOK_CLIENT depend on WITH_MGR
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
cmake: let WITH_MGR_ROOK_CLIENT depend on WITH_MGR
it does not depend on WITH_MGR_DASHBOARD_FRONTEND, which is disabled by
default and is used to enable/disable the inclusion of dashboard
support, while the rook client is used by the orchestrator. so it should
depend on WITH_MGR, not WITH_MGR_DASHBOARD_FRONTEND.
read_extents in all except one case was used to read a known single extent
-- replace those users with read_extent. store-nbd uses read_extents as
intended, but other users will need to be able to deal with zero mappings.
monitoring/grafana: Remove erroneous elements in hosts-overview Grafana dashboard
The hosts-overview Grafana dashboard json file contains a repeated element, making
it invalid JSON. Some JSON parsers handle this. However, this prevents Jsonnet
from parsing the dashboard, which prevents the deployment of this dashboard via
Jsonnet.
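One way to catch this kind of problem before it reaches Jsonnet is sketched below, assuming the repeated element is a duplicated object key. Python's json module silently keeps the last value for a duplicated key, so the sketch uses object_pairs_hook to record duplicates instead; the helper name and the sample document are made up for illustration.

```python
import json


def find_duplicate_keys(text):
    """Return the object keys that appear more than once in a JSON text."""
    duplicates = []

    def record_duplicates(pairs):
        # Called once per JSON object with its (key, value) pairs in order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=record_duplicates)
    return duplicates


# Made-up sample with "title" repeated in the same object.
sample = '{"title": "a", "panels": [], "title": "b"}'
print(find_duplicate_keys(sample))
```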
Taken together with the `mgr.4100 v2:172.21.15.52:6800/30259] closed!`
debug message, this suggests that a call to `report()` occurred (likely
from the timer) while we were in the middle of the non-atomic reconnect
sequence:
Sage Weil [Fri, 16 Apr 2021 22:10:46 +0000 (18:10 -0400)]
Merge PR #40888 into master
* refs/pull/40888/head:
qa/tasks/cephadm: ignore --keep-logs failure
qa/tasks/cephadm: use yaml.dump_all()
qa/suites/rados/cephadm/smoke-*: use cephadm.wait_for_service
qa/tasks/cephadm: tear down cluster before gathering logs
qa/suites/rados/cephadm/smoke-roleless: test rgw-ingress
mgr/cephadm: remove virtual_ip check during scheduling
mgr/orchestrator: orch ls: leave off virtual_ip prefixlen
qa/tasks/cephadm: add wait_for_service
qa/tasks/cephadm: allow skip_monitor_stack=true
qa/tasks/cephadm: do subst_vip for cephadm.shell and .apply
qa/tasks/vip: add vip task to allocate virtual IPs
qa/suites/rados/cephadm/smoke-roleless: add rgw-ingress test case
qa/tasks/cephadm: shell: take 'all-roles' or 'all-hosts'
qa/tasks/cephadm: let cephadm.shell take string or list
Merge pull request #40732 from neha-ojha/wip-50217
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Mark Nelson <mnelson@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Igor Fedotov <ifedotov@suse
Sage Weil [Mon, 12 Apr 2021 14:17:17 +0000 (10:17 -0400)]
mgr/cephadm: allow mgr colo if mgr_standby_modules=false
If the standby mgr daemons' modules aren't listening on any ports, then we
can schedule multiple on the same host.
Note that this may make 'orch ps' output misleading, as ports will be
reported for each mgr instance, but only one of them will actually be
listening at any one time (if they are behaving, at least!). Treat a
mgr port check error as non-fatal.
Sage Weil [Tue, 13 Apr 2021 14:11:31 +0000 (10:11 -0400)]
mgr/MgrStandby: add mgr_standby_modules option
Add config option to control whether the standby modules are started.
Default to true (no change in behavior), but if set to false the standby
mgr modules don't do the redirect business.
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
This option controls the rate of trimming of onodes and the earlier default of
64 has been seen to be too low for large clusters, leading to buildup of
onodes resulting in memory growth.
Increase the default value to 1000, since there are no known downsides to it.
rgw: add support to consume user given ca cert for vault
Currently RGW can authenticate with vault via SSL using system certs.
With this patch, the user can provide a custom CA cert. The location of
the file can be specified in ceph.conf like this:
rgw_crypt_require_ssl = <file path>
Fixes: https://tracker.ceph.com/issues/47776
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
common/options,doc: extract formatted desc into .yaml.in
* add a field named "fmt_desc", which is the description formatted using
reStructuredText. it is preserved as it is if it's different from the
desc or long_desc of an option. we can consolidate it with long_desc
in future, and use a pretty printer which has minimal support for
reStructuredText to print the formatted descriptions, for a better
command line user experience. but at this moment, fmt_desc has
only one consumer: the "ceph_confval" sphinx extension, which extracts
and translates the options yaml file to reStructuredText, which is in
turn rendered by sphinx.
* remove unused options from the doc
- journal_queue_max_ops
- journal_queue_max_bytes