Merge pull request #40732 from neha-ojha/wip-50217
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Mark Nelson <mnelson@redhat.com> Reviewed-by: Adam Kupczyk <akupczyk@redhat.com> Reviewed-by: Igor Fedotov <ifedotov@suse
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
This option controls the rate of trimming of onodes and the earlier default of
64 has been seen to be too low for large clusters, leading to buildup of
onodes resulting in memory growth.
Increase the default value to 1000, since there are no known downsides to it.
Commit 5505fc0051a3 ("common: generate legacy_config_opts.h from
.yaml.in files") inadvertently reverted a change of a default value by
adding a second "default" key with the old value. This was corrected
in commit 75e07f8638ef ("common/options/global: correct default of
auth_mon_ticket_ttl"), but highlights that mis-merging a yaml file
is rather easy.
To prevent this happening again, fail the build if duplicate keys
exist in any of src/common/options/*.yaml.in files.
Sage Weil [Fri, 16 Apr 2021 12:14:28 +0000 (08:14 -0400)]
Merge PR #40870 into master
* refs/pull/40870/head:
auth/cephx: make KeyServer::build_session_auth_info() less confusing
auth/cephx: cap ticket validity by expiration of "next" key
auth/cephx: drop redundant KeyServerData::get_service_secret() overload
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com> Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com> Reviewed-by: Michael Fritch <mfritch@suse.com>
Patrick Donnelly [Fri, 16 Apr 2021 04:06:31 +0000 (21:06 -0700)]
Merge PR #40539 into master
* refs/pull/40539/head:
cephfs-top: set the cursor to be invisible
cephfs-top: self-adapt the display according the window size
cephfs-top: use the default window object from curses.wrapper()
cephfs-top: improve the output
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
Patrick Donnelly [Fri, 16 Apr 2021 04:03:19 +0000 (21:03 -0700)]
Merge PR #39660 into master
* refs/pull/39660/head:
qa: Update the mdsmap schema in mgr/dashboard/test_health.py
doc: add lsflags command to Administrative Commands document
qa: test fs lsflags command
mon: add command to print fs flags
mds: print each flag value
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
auth/cephx: make KeyServer::build_session_auth_info() less confusing
The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication. The monitor passes in
service_secret (mon secret) and secret_id (-1). The TTL is irrelevant
because there is no rotation.
However the signature doesn't make it obvious. Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.
to silence the health warning of "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. couple
tests are expecting a warning free cluster before they starts.
as this option is enabled by default for appeasing the old clients, but when it
comes to most of upstream testing, we can just disable it.
auth/cephx: cap ticket validity by expiration of "next" key
If auth_mon_ticket_ttl is increased by several times as done in
commit 522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity. The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL. As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once. When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.
Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.
as the left-hand operator is promoted to off_t which is a signed
integer, while rgw_max_chunk_size will be an unsigned after the
yaml-to-cxx migration. so let's cast it to `off_t` before comparing
them.
the same applies to rgw_copy_obj_progress_every_bytes.
common: generate legacy_config_opts.h from .yaml.in files
* add a setting named "with_legacy" to .yaml.in files, so
each option with a true "with_legacy" will have an entry
in legacy_config_opts.h.
* preserve the comments from legacy_config_opts.h to .yaml.in,
some of them are solely for developers, but some of them are
good reading for users as well. we can use them for "desc"
field in a follow-up change.
* move common/legacy_config_opts.h to common/options/legacy_config_opts.h
as legacy_config_opts.h is "closer" to the options directory
than other sources files under src/common.
* update y2c.py to generate separate .h files which are in turn
included by legacy_config_opts.h
* add a target named "legacy-option-headers", and let
some targets depend on it so that these headers generated by
y2c.py can be generated before the .cc files including them
are compiled.
test/cls_cas: allow multi hobjects tracked by cls_cas
in d2737fd41a146e8efe3162cdc39845226bd5a756, we started to use multiset
for tracking the references of hobject for snapshot support. as the same
hobject maps to multiple snapshots. and we don't want to consider
different snapshots as the same entry tracked by cls_cas.
but cls_cas.dup_get() still tries to verify that the `get` operation
is able to dedup the same referenced "source". but this does not apply
to "by_object" trunk ref type anymore.
since we cannot check/choose the chunk ref type used by OSD from the
client of the cls_cas, in this change, cls_cas.dup_get() is updated
to adapt the change solely for "by_object". otherwise we could skip
this test for "by_object" type and/or define another test for other
chunk ref types.
J. Eric Ivancich [Wed, 14 Apr 2021 17:55:22 +0000 (13:55 -0400)]
rgw: during reshard lock contention, adjust logging
When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
* CVE-2021-20288:
qa/standalone: default to disable insecure global id reclaim
qa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings
qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts
cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap
mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]
auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
auth/cephx: rotate auth tickets less often
mon: fail fast when unauthorized global_id (re)use is disallowed
auth/cephx: option to disallow unauthorized global_id (re)use
auth/cephx: make cephx_decode_ticket() take a const ticket_blob
auth/AuthServiceHandler: keep track of global_id and whether it is new
auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific
auth/AuthServiceHandler: drop unused start_session() args
mon/MonClient: drop global_id arg from _add_conn() and _add_conns()
mon/MonClient: reset auth state in shutdown()
mon/MonClient: preserve auth state on reconnects
mon/MonClient: claim active_con's auth explicitly
mon/MonClient: resurrect "waiting for monmap|config" timeouts
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 14 Apr 2021 13:39:38 +0000 (09:39 -0400)]
Merge PR #40734 into master
* refs/pull/40734/head:
mgr/cephadm: make prometheus scrape ingress haproxy
doc/cephadm: remove big warning about stability
doc/cepham/compatibility: rgw-ha -> ingress; note possibility of breaking changes
mgr/cephadm: ingress: add optional virtual_interface_networks
doc/cephadm/rgw: clean up example spec
mgr/cephadm/services/ingress: less verbose about prepare_create
doc/cephadm/rgw: add note about which ethernet interface is used
cephadm: make keepalived unit fiddle sysctl settings
mgr/orchestrator: report external endpoints from 'orch ls'
mgr/orchestrator: drop - when no ports
doc/cephadm/rgw: update docs for ingress service
mgr/cephadm: use per_host_daemon feature in scheduler
mgr/cephadm/schedule: add per_host_daemon_type support
mgr/cephadm: HA_RGW -> Ingress
mgr/cephadm: include daemon_type in DaemonPlacement
mgr/cephadm: update list-networks to report interface names too
mgr/orchestrator: streamline 'orch ps' PORTS formatting
mgr/cephadm/schedule: handle multiple ports per daemon
mgr/cephadm/utils: resolve_ip(): prefer IPv4
instead of using a dedicated target for updating civetweb.h, use
add_custom_command() with OUTPUT option.
so the generated rule does not rerun the command specified by the
civetweb_h target every time we build the targets depending on it.
use the header file as the dependency helps to improve the readability
as there is one less link in the dependency chain, and by specifying the
OUTPUT, cmake is able to figure out the dependency on the header file so
it does not try to regenerate it every time.
tools/osdmaptool: mark unused variable [[maybe_unused]]
to silence warning from GCC when performing release build, like:
../src/tools/osdmaptool.cc: In function ‘int main(int, const char**)’:
../src/tools/osdmaptool.cc:472:9: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
472 | int r = clock_gettime(CLOCK_MONOTONIC, &round_start);
| ^
in the latest document generated from RtD, the spacing after `ul li p`
elements is set to 24px as the plain `p` elements. but this the lists
more sparse and difficult to read.
in this change, the spacing is restored to 0 as it was in old theme.css
in sphinx_rtd_theme.
mon: MMonProbe: direct MMonJoin messages to the leader, instead of the first mon
When monitors are joining a cluster, they may send an MMonJoin message to place
themselves correctly in the map in either handle_probe_reply() or
finish_election(). These messages must be sent to the leader -- monitors do not
forward each other's messages.
Unfortunately, this scenario was missed when converting the monitors to support
connectivity-based elections, and they're sending these messages to
quorum.begin(). Fix this by including an explicit leader in MMonProbe (that the
new monitor may reference in handle_probe_reply) and using the leader
value in both locations.
It may be that the virtual IP we want to use is not in the same network
as any existing IPs on the host. In that case, allow the spec to specify
a list of networks to match against existing IPs so that a match will
identify an ethernet interface to use.