AddCephTest and googletest's CMake scripts also call
find_package(Python3...), but they do not specify the required minor
version of Python3. by default, find_package(Python3...) picks the highest
available python3. so, if we have multiple python3 versions installed in the
system, and the highest python3 version is not the one specified by the
-DWITH_PYTHON3=3.x.y in the cmake command line, we might end up using a
different python3 for the ceph CLI. even worse, the required python3
packages might not be available for the python3 interpreter picked by
googletest, because, in general, only a single python3 has full access to
the prepackaged python3-* packages shipped by a GNU/Linux distro.
in this change, the configure_file() calls are rearranged to the top of
src/CMakeLists.txt, so they have less chance to use the "polluted" cmake
variable for their subvars.
this change addresses the test failure where we have, for instance, python3.8
installed on RHEL8/CentOS8, where python3.6 is the python3 which has
access to the python3-* packages.
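The difference between an unpinned lookup and a pinned one can be sketched in Python. This only illustrates the selection logic described above and is not how CMake implements find_package(); the function name `pick_python3` and the version tuples are hypothetical.

```python
def pick_python3(installed, requested=None):
    """Return the (major, minor) interpreter version to use.

    installed: iterable of (major, minor) tuples found on the system.
    requested: the pinned (major, minor), or None for "highest available".
    """
    candidates = sorted(installed)
    if not candidates:
        raise LookupError("no python3 interpreter found")
    if requested is None:
        # Unpinned lookup: the highest installed version wins.
        return candidates[-1]
    if requested not in candidates:
        raise LookupError("python%d.%d is not installed" % requested)
    # Pinned lookup: only the requested version is acceptable.
    return requested


# Hypothetical host: the distro python3.6 plus an extra python3.8.
installed = [(3, 6), (3, 8)]
unpinned = pick_python3(installed)
pinned = pick_python3(installed, requested=(3, 6))
print(unpinned, pinned)
```

With the unpinned call the extra interpreter is selected, which is the situation the commit message describes; pinning keeps every caller on the same interpreter.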
we should leave it to do_cmake.sh to decide which python3 version to use.
there are cases where we have multiple python3 versions installed, but only
one of them is fully supported by the distro, in the sense that python3-*
packages are packaged for that python3.
Merge pull request #40901 from tchaikov/wip-mgr-rook
cmake: let WITH_MGR_ROOK_CLIENT depend on WITH_MGR
Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Willem Jan Withagen <wjw@digiware.nl> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
cmake: let WITH_MGR_ROOK_CLIENT depend on WITH_MGR
it does not depend on WITH_MGR_DASHBOARD_FRONTEND, which is disabled by
default and is used to enable/disable the inclusion of dashboard
support, while the rook client is used by the orchestrator. so it should
depend on WITH_MGR, not WITH_MGR_DASHBOARD_FRONTEND.
Sage Weil [Fri, 16 Apr 2021 22:10:46 +0000 (18:10 -0400)]
Merge PR #40888 into master
* refs/pull/40888/head:
qa/tasks/cephadm: ignore --keep-logs failure
qa/tasks/cephadm: use yaml.dump_all()
qa/suites/rados/cephadm/smoke-*: use cephadm.wait_for_service
qa/tasks/cephadm: tear down cluster before gathering logs
qa/suites/rados/cephadm/smoke-roleless: test rgw-ingress
mgr/cephadm: remove virtual_ip check during scheduling
mgr/orchestrator: orch ls: leave off virtual_ip prefixlen
qa/tasks/cephadm: add wait_for_service
qa/tasks/cephadm: allow skip_monitor_stack=true
qa/tasks/cephadm: do subst_vip for cephadm.shell and .apply
qa/tasks/vip: add vip task to allocate virtual IPs
qa/suites/rados/cephadm/smoke-roleless: add rgw-ingress test case
qa/tasks/cephadm: shell: take 'all-roles' or 'all-hosts'
qa/tasks/cephadm: let cephadm.shell take string or list
Merge pull request #40732 from neha-ojha/wip-50217
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Mark Nelson <mnelson@redhat.com> Reviewed-by: Adam Kupczyk <akupczyk@redhat.com> Reviewed-by: Igor Fedotov <ifedotov@suse
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
This option controls the rate of trimming of onodes and the earlier default of
64 has been seen to be too low for large clusters, leading to buildup of
onodes resulting in memory growth.
Increase the default value to 1000, since there are no known downsides to it.
common/options,doc: extract formatted desc into .yaml.in
* add a field named "fmt_desc", which is the description formatted using
reStructuredText. it is preserved as it is if it's different from the
desc or long_desc of an option. we can consolidate it with long_desc
in future, and use a pretty printer with minimal support for
reStructuredText to print the formatted descriptions for a better
user experience on the command line. but at this moment, fmt_desc has
only one consumer: the "ceph_confval" sphinx extension, which extracts
and translates the options yaml file to reStructuredText, which is in
turn rendered by sphinx.
* remove unused options from the doc
- journal_queue_max_ops
- journal_queue_max_bytes
Commit 5505fc0051a3 ("common: generate legacy_config_opts.h from
.yaml.in files") inadvertently reverted a change of a default value by
adding a second "default" key with the old value. This was corrected
in commit 75e07f8638ef ("common/options/global: correct default of
auth_mon_ticket_ttl"), but highlights that mis-merging a yaml file
is rather easy.
To prevent this happening again, fail the build if duplicate keys
exist in any of src/common/options/*.yaml.in files.
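A minimal sketch of such a check, using only the Python standard library. It is not the check this commit adds: it is a line-based approximation that assumes a flat layout where each entry starts with `- key:` at column 0 and its remaining keys are indented by exactly two spaces. The function name `find_duplicate_keys` and the sample option are hypothetical; a real check would more likely hook into the YAML loader itself.

```python
import re

# A list item starts at column 0; its other keys are indented by two spaces.
ENTRY_START = re.compile(r"^- ([A-Za-z_]\w*):")
ENTRY_KEY = re.compile(r"^  ([A-Za-z_]\w*):")


def find_duplicate_keys(text):
    """Return a list of (line_number, key) for keys repeated within one entry."""
    duplicates = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = ENTRY_START.match(line)
        if start:
            # A new list item begins: reset the per-entry key set.
            seen = {start.group(1)}
            continue
        key = ENTRY_KEY.match(line)
        if key:
            name = key.group(1)
            if name in seen:
                duplicates.append((lineno, name))
            seen.add(name)
    return duplicates


# Made-up option with a mis-merged second "default" key.
sample = """\
- name: example_option
  type: float
  default: 1
  default: 2
- name: another_example_option
  type: int
  default: 1
"""

problems = find_duplicate_keys(sample)
for lineno, key in problems:
    print("line %d: duplicate key %r" % (lineno, key))
# A build step would exit with a non-zero status when `problems` is non-empty.
```

Resetting the key set at each list item matters: `default` legitimately appears once per option, so only repeats inside the same entry are reported.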
Sage Weil [Thu, 15 Apr 2021 22:55:00 +0000 (17:55 -0500)]
qa/tasks/cephadm: tear down cluster before gathering logs
We don't always stop all services, because teuthology doesn't know about
things it didn't start. Use rm-cluster to tear things down, but do not
remove the logs themselves. After we get logs, we'll clean up completely.
Sage Weil [Fri, 16 Apr 2021 12:14:28 +0000 (08:14 -0400)]
Merge PR #40870 into master
* refs/pull/40870/head:
auth/cephx: make KeyServer::build_session_auth_info() less confusing
auth/cephx: cap ticket validity by expiration of "next" key
auth/cephx: drop redundant KeyServerData::get_service_secret() overload
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com> Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com> Reviewed-by: Michael Fritch <mfritch@suse.com>
Patrick Donnelly [Fri, 16 Apr 2021 04:06:31 +0000 (21:06 -0700)]
Merge PR #40539 into master
* refs/pull/40539/head:
cephfs-top: set the cursor to be invisible
cephfs-top: self-adapt the display according to the window size
cephfs-top: use the default window object from curses.wrapper()
cephfs-top: improve the output
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
Patrick Donnelly [Fri, 16 Apr 2021 04:03:19 +0000 (21:03 -0700)]
Merge PR #39660 into master
* refs/pull/39660/head:
qa: Update the mdsmap schema in mgr/dashboard/test_health.py
doc: add lsflags command to Administrative Commands document
qa: test fs lsflags command
mon: add command to print fs flags
mds: print each flag value
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
core: fix compiler warning due to difference in order of struct members
Clang on FreeBSD reports:
```
Building CXX object src/global/CMakeFiles/libglobal_objs.dir/pidfile.cc.o
../src/global/pidfile.cc:170:5: warning: ISO C++ requires field designators to be specified in declaration order; field 'l_whence' will be initialized after field 'l_start' [-Wreorder-init-list]
.l_start = 0,
^~~~~~~~~~~~
../src/global/pidfile.cc:169:17: note: previous initialization for field 'l_whence' is here
.l_whence = SEEK_SET,
^~~~~~~~
```
Linux and BSD declare the members of `struct flock` in different orders,
so a single designated-initializer order cannot match both.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
auth/cephx: make KeyServer::build_session_auth_info() less confusing
The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication. The monitor passes in
service_secret (mon secret) and secret_id (-1). The TTL is irrelevant
because there is no rotation.
However the signature doesn't make it obvious. Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.
to silence the health warning of "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. a couple of
tests expect a warning-free cluster before they start.
this option is enabled by default to appease old clients, but for most
upstream testing we can just disable it.
auth/cephx: cap ticket validity by expiration of "next" key
If auth_mon_ticket_ttl is increased by several times as done in
commit 522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity. The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL. As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once. When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.
Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.
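The capping rule can be sketched as a small calculation. This is an illustration under stated assumptions, not the KeyServerData code: the function name `cap_ticket_ttl`, the use of plain seconds, and all numbers below are made up.

```python
def cap_ticket_ttl(now, requested_ttl, next_key_expiration):
    """Return a ticket validity that does not outlive the "next" key.

    `now` and `next_key_expiration` are absolute timestamps in seconds;
    `requested_ttl` is a duration in seconds.
    """
    remaining = next_key_expiration - now
    if remaining <= 0:
        raise ValueError('"next" key has already expired')
    # The ticket must not be valid for longer than the "next" key.
    return min(requested_ttl, remaining)


HOUR = 3600
now = 1_000_000
# Hypothetical case: the keys were scheduled under a short TTL and the
# configured TTL was then raised, so the "next" key expires well before
# the newly requested validity would.
next_key_expiration = now + 18 * HOUR
ttl = cap_ticket_ttl(now, 72 * HOUR, next_key_expiration)
print(ttl // HOUR)
```

Capping against the "next" key rather than the "current" one avoids issuing tickets with a near-zero validity when the "current" key is about to rotate.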
the left-hand operand is promoted to off_t, which is a signed
integer, while rgw_max_chunk_size will be unsigned after the
yaml-to-cxx migration. so let's cast it to `off_t` before comparing
them.
the same applies to rgw_copy_obj_progress_every_bytes.
common: generate legacy_config_opts.h from .yaml.in files
* add a setting named "with_legacy" to .yaml.in files, so
each option with a true "with_legacy" will have an entry
in legacy_config_opts.h.
* preserve the comments from legacy_config_opts.h to .yaml.in,
some of them are solely for developers, but some of them are
good reading for users as well. we can use them for "desc"
field in a follow-up change.
* move common/legacy_config_opts.h to common/options/legacy_config_opts.h
as legacy_config_opts.h is "closer" to the options directory
than other source files under src/common.
* update y2c.py to generate separate .h files which are in turn
included by legacy_config_opts.h
* add a target named "legacy-option-headers", and let
some targets depend on it so that these headers generated by
y2c.py can be generated before the .cc files including them
are compiled.
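The filtering step for "with_legacy" can be sketched as below. This is not y2c.py: the function name `legacy_header_lines`, the `OPTION(name, OPT_TYPE)` output format, the type mapping, and the sample options are all assumptions made for illustration.

```python
def legacy_header_lines(options):
    """Yield one header line per option whose with_legacy flag is true."""
    for opt in options:
        # Options without the setting are treated as with_legacy: false.
        if opt.get("with_legacy", False):
            yield "OPTION(%s, OPT_%s)" % (opt["name"], opt["type"].upper())


# Hypothetical options as they might look after loading a .yaml.in file.
options = [
    {"name": "example_a", "type": "int", "with_legacy": True},
    {"name": "example_b", "type": "str"},
    {"name": "example_c", "type": "bool", "with_legacy": False},
]
header = "\n".join(legacy_header_lines(options))
print(header)
```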
test/cls_cas: allow multi hobjects tracked by cls_cas
in d2737fd41a146e8efe3162cdc39845226bd5a756, we started to use a multiset
for tracking the references of hobjects for snapshot support, as the same
hobject maps to multiple snapshots, and we don't want to consider
different snapshots as the same entry tracked by cls_cas.
but cls_cas.dup_get() still tries to verify that the `get` operation
is able to dedup the same referenced "source". this does not apply
to the "by_object" chunk ref type anymore.
since we cannot check/choose the chunk ref type used by the OSD from the
client of cls_cas, in this change, cls_cas.dup_get() is updated
to adapt to the change solely for "by_object". alternatively, we could skip
this test for the "by_object" type and/or define another test for other
chunk ref types.
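The behavioural difference the test has to account for can be sketched with two toy trackers. These classes are hypothetical and are not the cls_cas API; they only contrast a deduplicating set with a multiset, here modelled with `collections.Counter`.

```python
from collections import Counter


class RefsBySet:
    """Deduplicating tracker: a repeated source is stored once."""

    def __init__(self):
        self.refs = set()

    def get(self, source):
        self.refs.add(source)

    def count(self):
        return len(self.refs)


class RefsByMultiset:
    """Multiset tracker: every get of the same source adds a reference."""

    def __init__(self):
        self.refs = Counter()

    def get(self, source):
        self.refs[source] += 1

    def put(self, source):
        if self.refs[source] <= 0:
            raise KeyError(source)
        self.refs[source] -= 1
        if self.refs[source] == 0:
            del self.refs[source]

    def count(self):
        return sum(self.refs.values())


dedup, multi = RefsBySet(), RefsByMultiset()
for tracker in (dedup, multi):
    tracker.get("obj1")
    tracker.get("obj1")  # duplicate get of the same source

print(dedup.count(), multi.count())
```

A test that expects the duplicate `get` to leave a single reference holds for the set-based tracker but not for the multiset one, which is why the expectation had to change.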
J. Eric Ivancich [Wed, 14 Apr 2021 17:55:22 +0000 (13:55 -0400)]
rgw: during reshard lock contention, adjust logging
When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>