Sage Weil [Sun, 21 Mar 2021 18:25:06 +0000 (13:25 -0500)]
Merge PR #40247 into pacific
* refs/pull/40247/head:
common: reset last_log_sent when clog_to_monitors is updated
logclient: move LogChannel::set_log_to_monitors(bool v) to LogClient.cc
Sage Weil [Sun, 21 Mar 2021 14:38:49 +0000 (09:38 -0500)]
Merge PR #40129 into pacific
* refs/pull/40129/head:
osd: PeeringState: implement an acting_set_writeable() function
osd: PeeringState: fix a boolean conditional direction
osd: PeeringState: fix stretch peering so PGs can go peered but not active
osd: PeeringState: don't add acting-set OSDs to candidate set in stretch mode
osd: PeeringState: fix calc_replicated_acting_stretch() syntax/logic
osd: PeeringState: respect stretch peering constraints for async recovery
osd: PeeringState: add a comment about using size as a proxy for activateable
osd: check for is_stretch_pool() in stretch_set_can_peer()
scripts: some additions to help with local testing
script: set_up_stretch_mode: include OSDs in root=default so pg creation works
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus: v1.72
- octopus, pacific: v1.73
they share the same set of test nodes. and these ceph-libboost packages
conflict with each other, because they install files to the same places.
in order to avoid the conflict, we should uninstall existing packages
before installing a different version of ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
Kefu Chai [Fri, 19 Mar 2021 04:05:45 +0000 (12:05 +0800)]
pybind/mgr/dashboard: bump flake8 to 3.9.0
to address the failure of
ERROR: Cannot install -r requirements-lint.txt (line 2) and -r requirements-lint.txt (line 8) because these package versions have conflicting dependencies.
The conflict is caused by:
flake8 3.8.4 depends on pycodestyle<2.7.0 and >=2.6.0a1
autopep8 1.5.6 depends on pycodestyle>=2.7.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
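The conflict reported above can be checked by hand: the two pycodestyle ranges do not overlap. A minimal sketch in Python, with versions simplified to tuples (the "a1" pre-release tag is dropped) and a hypothetical helper name. The flake8 3.9.0 range used in the second call is an assumption (pycodestyle >=2.7.0, <2.8.0), not something stated in the commit message.

```python
def satisfiable(lower_bounds, upper_bounds):
    """Return True if some version v has max(lower) <= v < min(upper).

    Hypothetical helper for illustration; versions are plain tuples.
    """
    lo = max(lower_bounds)
    if not upper_bounds:
        return True
    return lo < min(upper_bounds)


# flake8 3.8.4 wants pycodestyle >=2.6.0a1,<2.7.0; autopep8 1.5.6 wants >=2.7.0.
before = satisfiable([(2, 6, 0), (2, 7, 0)], [(2, 7, 0)])

# Assumed range for flake8 3.9.0: pycodestyle >=2.7.0,<2.8.0.
after = satisfiable([(2, 7, 0), (2, 7, 0)], [(2, 8, 0)])

print(before, after)
```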
Gerald Yang [Wed, 3 Mar 2021 04:37:15 +0000 (04:37 +0000)]
common: reset last_log_sent when clog_to_monitors is updated
When clog_to_monitors is disabled, "last_log" still keeps increasing via
get_next_seq() whenever the OSD writes info to clog, but "last_log_sent"
does not increase. If we disable clog_to_monitors for a while and then
re-enable it, num_unsent can be bigger than log_queue_size(), which
triggers an assertion in _get_mon_log_message.
We need to reset last_log_sent to last_log before updating clog_to_monitors.
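The failure mode can be reproduced with a toy model of the two counters. This is a sketch, not Ceph code: the class, its methods, and the `reset` switch are hypothetical and only mirror the names used in the commit message.

```python
class ToyLogClient:
    """Toy model of the counters described above (not the real LogClient)."""

    def __init__(self):
        self.log_queue = []
        self.last_log = 0
        self.last_log_sent = 0
        self.log_to_monitors = True

    def log(self, msg):
        self.last_log += 1  # stands in for get_next_seq()
        if self.log_to_monitors:
            self.log_queue.append((self.last_log, msg))

    def set_log_to_monitors(self, enabled, reset=True):
        if reset:
            # The fix: forget entries that were never queued.
            self.last_log_sent = self.last_log
        self.log_to_monitors = enabled

    def get_mon_log_message(self):
        num_unsent = self.last_log - self.last_log_sent
        assert num_unsent <= len(self.log_queue), "unsent entries exceed queue"
        batch = self.log_queue[len(self.log_queue) - num_unsent:]
        self.last_log_sent = self.last_log
        return batch


def demo(reset):
    client = ToyLogClient()
    client.set_log_to_monitors(False, reset=reset)
    for i in range(3):
        client.log(f"msg {i}")  # sequence advances, nothing is queued
    client.set_log_to_monitors(True, reset=reset)
    client.log("after re-enable")
    return client.get_mon_log_message()
```

With `demo(True)` only the entry logged after re-enabling is returned; with `demo(False)` the counters disagree with the queue and the assertion fires.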
Jason Dillaman [Wed, 17 Mar 2021 19:29:37 +0000 (15:29 -0400)]
test: ignore failures to force-enable lockdep
PR #40062 tweaked the behavior of lockdep to compile it out
of the code entirely for release builds. This fixes several
gtests where lockdep was force-enabled.
Ilya Dryomov [Wed, 17 Mar 2021 10:00:33 +0000 (11:00 +0100)]
qa: krbd_blkroset.t: update for separate hw and user read-only flags
Since kernel 5.12, hardware read-only state and user read-only
policy (BLKROGET/SET ioctls) are tracked separately in the block
layer. As the purpose of our ->set_read_only() method was exactly
that, it was removed.
As a side effect, BLKROSET no longer returns EROFS on an attempt
to make a read-only mapping read-write with "blockdev --setrw".
The policy gets updated, but the device remains read-only as before
because the hardware (== mapping) state is controlled by the driver.
Xiubo Li [Thu, 4 Feb 2021 06:14:13 +0000 (14:14 +0800)]
mgr: enhance the rados service
For some use cases, like tcmu-runner, there may be hundreds or
thousands of LUNs, and each LUN registers one service daemon, so
the `ceph -s` output fills up with useless info.
This change allows the service daemons to be classified in a
specified format by adding two pairs to the metadata:
TYPE: will be used to replace the default "daemon(s)"
shown in `ceph -s`. If absent, "daemon" will be used.
PREFIX: if present, the active members will be classified
by the prefix instead of "daemon_name".
For example, for iscsi gateways, it will be something like:
"daemon_type" : "portal"
"daemon_prefix" : "gw${N}"
Then the `ceph -s` output will be:
...
services:
mon: 3 daemons, quorum a,b,c (age 50m)
mgr: x(active, since 49m)
mds: a:1 {0=c=up:active} 2 up:standby
osd: 3 osds: 3 up (since 49m), 3 in (since 49m)
iscsi: 8 portals active (gw0, gw1, gw2, gw3, gw4, gw5, gw6, gw7)
...
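The grouping described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the mgr implementation: the function name, the shape of the `daemons` mapping, and the per-LUN daemon names are hypothetical; only the "daemon_type" and "daemon_prefix" keys come from the commit message.

```python
def summarize_service(name, daemons):
    """Build one `ceph -s`-style line from per-daemon metadata.

    `daemons` maps daemon_name -> metadata dict (hypothetical shape).
    """
    daemon_type = "daemon"  # default when "daemon_type" is absent
    members = set()
    for daemon_name, meta in daemons.items():
        daemon_type = meta.get("daemon_type", daemon_type)
        # Classify by prefix if present, otherwise by the daemon name.
        members.add(meta.get("daemon_prefix", daemon_name))
    ordered = sorted(members)
    plural = "" if len(ordered) == 1 else "s"
    return f"{name}: {len(ordered)} {daemon_type}{plural} active ({', '.join(ordered)})"


# Eight gateways, each registering three per-LUN daemons (names are made up).
daemons = {
    f"gw{n}:lun{lun}": {"daemon_type": "portal", "daemon_prefix": f"gw{n}"}
    for n in range(8)
    for lun in range(3)
}
line = summarize_service("iscsi", daemons)
print(line)
```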
Rachanaben Patel [Tue, 16 Mar 2021 22:37:46 +0000 (15:37 -0700)]
doc/RBD:fixes for ceph-immutable-object-cache daemon enable command
The rbd-persistent-read-only-cache document shows how to manage the
ceph-immutable-object-cache daemon using systemd.
The command example needs fixing. It should be
osd: Disable sleep times for all best effort clients of mclock
If mClockScheduler is scheduling IOs then the various sleep options
for the best effort clients of mclock viz. pg_delete, snaptrim and
scrub are disabled so as to not affect the QoS being applied.
osd: Add config options for cost per io & byte for the mclock scheduler
The cost per io and cost per byte options for hdd and ssd are specified
and set to default values determined using experiments on hdds and ssds
using a cost model. The values are used in calc_scaled_cost() to
determine the scaled cost for every OpSchedulerItem that is enqueued
within the mClockScheduler.
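The commit message does not spell out the cost model. As a rough sketch only, assuming a linear model (a fixed per-IO cost plus a per-byte term, floored at 1), the scaled cost could look like the following. The function name, the formula, and the numbers are assumptions for illustration and are not the defaults mentioned above.

```python
def scaled_cost(item_bytes, cost_per_io, cost_per_byte):
    """Assumed linear model: per-IO cost plus per-byte cost, never below 1."""
    return max(1, round(cost_per_io + cost_per_byte * item_bytes))


# Hypothetical values, chosen only to make the arithmetic visible.
small = scaled_cost(item_bytes=0, cost_per_io=0.0, cost_per_byte=0.0)
large = scaled_cost(item_bytes=4096, cost_per_io=0.5, cost_per_byte=0.001)
print(small, large)
```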
qa/tasks: Add additional wait_for_clean() check in lost_unfound tasks.
At the end of the lost_unfound tests add an additional wait_for_clean()
check to ensure that recoveries get enough time to complete before
proceeding and avoid failures down the line. For example, a failure like
"Scrubbing terminated -- not all pgs were active and clean." occurs because
recoveries on the PGs did not get sufficient time to complete even though
they were bound to eventually complete.
mgr/dashboard: select any object gateway on local cluster.
Dashboard backend settings:
- Refactoring: now accepting more than 1 type of value.
- RGW_API_ACCESS_KEY & RGW_API_SECRET_KEY accept string (backward compatibility: legacy behavior) as well as dictionary of strings for connecting multiple daemons.
- Ease of use: deprecated: mgr/dashboard/RGW_API_USER_ID: not useful anymore (kept for backward compatibility).
UI/UX:
- Created context component (to be shown only on rgw-related routes) for selecting operating daemon.
- Daemon selector only shown if there is more than 1 daemon running on a local cluster (to reduce cognitive load).
Fixes: https://tracker.ceph.com/issues/47375
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 94fe271b06f1e87d37850ac20dd31fa2314e8dfe)
Sage Weil [Tue, 16 Mar 2021 20:14:19 +0000 (15:14 -0500)]
Merge PR #40135 into pacific
* refs/pull/40135/head:
pybind/mgr: correct a MgrModule annotation
mgr/ceph_module: add type annotation to BaseMgrModule
mgr/prometheus: fix warning of possibly unbound variables
mgr/prometheus: flake8 cleanups
mgr/prometheus: fix import failure (flake8)
mgr/prometheus: add type annotations
mgr/prometheus: raise at seeing unknown status
mgr/prometheus: implement command using CLIReadCommand
mgr/{prometheus,telemetry}: appease mypy
mgr/prometheus: add prometheus to flake8 test
mgr/prometheus: escape special chars using r-string
pybind/mgr/prometheus: PEP8 cleanups
pybind/mgr/prometheus: add typing annotations
mgr/prometheus: introduce metric for collection time
mgr/cephadm: fix 'auth caps' fallback
mgr/cephadm: ensure mgr metadata is not none
qa/suites/rados/cephadm: add back centos+rhel with kubic podman
qa/suites/rados/cephadm/upgrade: deploy a legacy r.z-style rgw
qa/suites/rados/cephadm/upgrade: start at 15.2.9 to test iscsi upgrade
qa/tasks/cephadm.py: don't set mgr count to +1
doc/cephadm: add note about deprecation of NFSv3
doc/cephadm: remove step to restart the mgr
doc/cephadm: use `reconfig` instead of `redeploy`
doc/cephadm: update custom j2 config-key name
doc/cephadm: use 'apt' to install cephadm on Ubuntu
mgr/cephadm: remove duplicate labels when adding a host
mgr/cephadm: tolerate failure to update daemon caps
mgr/cephadm: fix get_keyring_with_caps
python-common: fix PlacementSpec target size method
python-common: count-per-host must be combined with label or hosts or host_pattern
mgr/cephadm: handle bare 'count-per-host:NNN', fix comments
mgr/cephadm/schedule: remove Scheduler abstraction (for now at least)
mgr/cephadm/schedule: calculate additions/removals in place()
mgr/cephadm/schedule: allow colocation of certain daemon types
mgr/cephadm/schedule: shuffle candidates, not final placements
mgr/cephadm/schedule: pass per-type allow_colo to the scheduler
mgr/cephadm/services/cephadmservice: fix typo
mgr/cephadm/schedule: pass daemons, not get_daemons_func
mgr/cephadm: use local var
mgr/cephadm/schedule: move host filtering into get_candidates()
python-common/ceph/deployment/service_spec: disallow max-per-host + explicit placement
mgr/cephadm/schedule: respect count-per-host
mgr/cephadm: adjust deployment logic to allow multiple daemons per host
python-common: add count-per-host to PlacementSpec
mgr/cephadm: do not worry about even # of monitors
mgr/cephadm: add iscsi and nfs to upgrade
mgr/cephadm: update caps if necessary when getting keyring
mgr/cephadm: add cephfs-mirror to CEPH_UPGRADE_ORDER
cephadm: Add cephfs-mirror
qa/cephadm: Add cephfs-mirror test
qa/tasks: some type annotations
mgr/orch: Add cephfs-mirror to enum
mgr/cephadm: Add CephfsMirrorService
mgr/orch: replace def add_{type}(...) with generic add_daemon()
mgr/cephadm: drop `create_func` arg from _add_daemon
mgr/cephadm: move CephadmExporter to new module
mgr/cephadm: fix CephadmExporter deployment
cephadm: exporter: use os.path.realpath(__file__)
mgr/cephadm: root mode: call (and deploy) cephadm binary
cephadm: Get rid of injected_argv
cephadm: Make path to cephadm binary unique
python-common: continue to allow RGWSpec(realm=r,zone=z)
PendingReleaseNodes: note changes in cephadm rgw behavior
qa/tasks/cephadm: drop realm.zone convention for rgw
doc: update docs
doc/cephadm: rewrite "adoption process"
doc/cephadm: rewrite "preparation" in adoption.rst
doc/cephadm: add prompts to adoption.rst
doc/cephadm: rewrite part of adoption.rst
python-common/ceph/deployment: RGWSpec: accept (and drop) subcluster arg
mgr/orchestrator: drop $realm.$zone naming convention
mgr/cephadm: rgw: do not mess with realm configuration
mgr/cephadm:Document the cephadm config-check feature
mgr/cephadm:fix to resolve mypy issue
mgr/cephadm:add unit test for the lookup_check helper
mgr/cephadm:Drop active healthcheck during a disable request
mgr/cephadm:Added helper function to return a specific healthcheck
mgr/cephadm:unit test added for nics better than most
mgr/cephadm:skip an alert if the linkspeed is better than most
mgr/cephadm:fix mypy warning
mgr/cephadm:Remove check from ceph metadata gathering
mgr/cephadm:Add unit test for hosts without public network NIC
mgr/cephadm:Minor updates to address review comments
mgr/cephadm:Added CLI interface for the configuration checker
mgr/cephadm:Multiple updates related to the addition of the CLI
mgr/cephadm:Moved 'ownership' of the checker to cephadm
mgr/cephadm:Unit tests updated to account for upgrades
mgr/cephadm:Updates to CephadmConfigChecks class
mgr/cephadm:Adds unit tests for the CephadmConfigChecks class
mgr/cephadm:add module option to enable configuration checks
mgr/cephadm:added ceph version consistency check
mgr/cephadm: added config checker to main serve loop
mgr/cephadm: adding check logic
mgr/cephadm: resolve rebase conflicts
mgr/cephadm:Document the intergration with libstoragemgmt
mgr/cephadm:Enable cephadm device scan to use LSM
mgr/cephadm: prevent traceback when invalid osd id passed to 'orch osd rm stop'
mgr/cephadm: do not prime service cache on reconfig
mgr/cephadm/osd: PEP-8 fix
mgr/cephadm: Activate existing OSDs
mgr/cephadm: osd: Use _run_cephadm_json()
mgr/cephadm: document ok_to_stop output argument for clarity
mgr/DaemonServer: make warning language a bit friendlier
mgr/cephadm/upgrade: improve language a bit
mgr/cephadm/upgrade: restart multiple osds at once
mgr/cephadm: gather other osds that are safe to stop
mgr/cephadm: optional pass 'known' through to ok_to_stop
mgr/cephadm/upgrade: log start/stop/pause/resume
mgr/cephadm: add CEPHADM_STRAY_DAEMON unittest
mgr/cephadm: alias rgw-nfs -> nfs
qa/tasks/cephadm: remove mirror code
cephadm: fixup `alrady` -> `already`
cephadm: Change outer quotes to avoid escaping inner quotes (Q003)
cephadm: Remove bad quotes from multiline string (Q001)
cephadm: Remove bad quotes (Q000)
cephadm: introduce flake8-quotes
cephadm: line break after binary operator (W504)
cephadm: blank line contains whitespace (W293)
cephadm: trailing whitespace (W291)
cephadm: local variable 'e' is assigned to but never used (F841)
cephadm: 'select' imported but unused (F401)
cephadm: ambiguous variable name 'l' (E741)
cephadm: do not use bare 'except' (E722)
cephadm: statement ends with a semicolon (E703)
cephadm: module level import not at top of file (E402)
cephadm: expected 1 blank line before a nested definition (E306)
cephadm: expected 2 blank lines after end of function or class (E305)
cephadm: too many blank lines (E303)
cephadm: expected 2 blank lines, found 1 (E302)
cephadm: expected 1 blank line, found 0 (E301)
cephadm: too many leading '#' for block comment (E266)
cephadm: block comment should start with '# ' (E265)
cephadm: at least two spaces before inline comment (E261)
cephadm: unexpected spaces around keyword / parameter equals (E251)
cephadm: multiple spaces after ',' (E241)
cephadm: missing whitespace after ':' (E231)
cephadm: missing whitespace around arithmetic operator (E226)
cephadm: missing whitespace around operator (E225)
cephadm: whitespace before ':' (E203)
cephadm: whitespace after '{' (E201)
cephadm: continuation line unaligned for hanging indent (E131)
cephadm: continuation line under-indented for visual indent (E128)
cephadm: continuation line over-indented for visual indent (E127)
cephadm: continuation line over-indented for hanging indent (E126)
cephadm: continuation line with same indent as next logical line (E125)
cephadm: closing bracket does not match visual indentation (E124)
cephadm: ... does not match indentation of opening bracket's line (E123)
cephadm: continuation line missing indentation or outdented (E122)
cephadm: continuation line under-indented for hanging indent (E121)
cephadm: over-indented (E117)
cephadm: introduce flake8
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Kefu Chai [Wed, 10 Feb 2021 07:49:03 +0000 (15:49 +0800)]
mgr/prometheus: add prometheus to flake8 test
for an explanation of why we should add a line break before a binary
operator, see
https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator
Sage Weil [Mon, 15 Mar 2021 22:20:25 +0000 (17:20 -0500)]
mgr/cephadm: ensure mgr metadata is not none
This hunk is from aca45d7d08fd8c3f32849331eba4620e2726282a, a much
larger change in master that added type annotations all over the place.
It just brings src/pybind/mgr/cephadm fully in sync with master.
Sage Weil [Mon, 15 Mar 2021 16:55:36 +0000 (11:55 -0500)]
mgr/cephadm: tolerate failure to update daemon caps
If we're upgrading from 15.2.0, we may fail to update caps. Instead of
failing the upgrade hard, warn to the log and continue. This is less
than ideal, but the caps will get corrected the next time the daemon is
redeployed on the next upgrade, and most likely the previous caps will
continue to work (given they were presumably working before the upgrade).
Sage Weil [Fri, 12 Mar 2021 16:15:35 +0000 (10:15 -0600)]
mgr/cephadm: fix get_keyring_with_caps
1- Pass caps to 'auth get-or-create'
2- Only try 'auth caps' if the get-or-create failed
Note that the 'auth caps' step can fail if upgrading from 15.2.0 since
'profile mgr' didn't include 'auth caps' until 15.2.1. We're not
addressing that for now...
Fixes: 7c0d532f3a4839f4199a13773fb5fa8b6fb3f183
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 6127d7f20bc8a6ad02d8ea144584eaf2bfc9590e)
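The two-step order described in this commit can be sketched as follows. This is not the cephadm code: `mon_command` is a hypothetical callable returning an (rc, stdout, stderr) tuple, `FakeMon` is a stand-in for testing, and the final 'auth get' re-read after a successful 'auth caps' is an assumption.

```python
def get_keyring_with_caps(mon_command, entity, caps):
    """Return a keyring for `entity`, making sure it carries `caps`."""
    # Step 1: pass the desired caps to 'auth get-or-create'.
    rc, keyring, err = mon_command("auth get-or-create", entity, caps)
    if rc == 0:
        return keyring
    # Step 2: only on failure, fall back to 'auth caps' and re-read the key.
    rc, _, err = mon_command("auth caps", entity, caps)
    if rc != 0:
        raise RuntimeError(f"unable to update caps for {entity}: {err}")
    rc, keyring, err = mon_command("auth get", entity, None)
    if rc != 0:
        raise RuntimeError(f"unable to fetch keyring for {entity}: {err}")
    return keyring


class FakeMon:
    """Hypothetical monitor stub: get-or-create fails on a caps mismatch."""

    def __init__(self, existing=None):
        self.caps = dict(existing or {})
        self.calls = []

    def __call__(self, prefix, entity, caps):
        self.calls.append(prefix)
        if prefix == "auth get-or-create":
            if entity in self.caps and self.caps[entity] != caps:
                return (-22, "", "caps mismatch")
            self.caps[entity] = caps
            return (0, f"[{entity}]", "")
        if prefix == "auth caps":
            self.caps[entity] = caps
            return (0, "", "")
        if prefix == "auth get":
            return (0, f"[{entity}]", "")
        return (-22, "", "unknown command")
```

For a new entity only the first command runs; for an existing entity with different caps the stub records all three calls in order.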
Sage Weil [Wed, 10 Mar 2021 22:31:31 +0000 (17:31 -0500)]
python-common: count-per-host must be combined with label or hosts or host_pattern
I think this is better for the same reason we made PlacementSpec() not
mean 'all hosts' by default. If you really want N daemons for every host
in the cluster, be specific with 'count-per-host:2 *'.
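The rule in this commit amounts to a small validation check. A minimal sketch, assuming a standalone function with hypothetical parameter names; it is not the PlacementSpec class from python-common.

```python
def validate_placement(count_per_host=None, label=None, hosts=(), host_pattern=None):
    """Reject a bare count-per-host that names no set of hosts."""
    if count_per_host is not None and not (label or hosts or host_pattern):
        raise ValueError(
            "count-per-host must be combined with label or hosts or host_pattern"
        )


def is_valid(**kwargs):
    try:
        validate_placement(**kwargs)
        return True
    except ValueError:
        return False


print(is_valid(count_per_host=2))                    # bare count-per-host
print(is_valid(count_per_host=2, host_pattern="*"))  # like 'count-per-host:2 *'
```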