- Remove `ProtectClock=true` for our systemd service templates
Fixes: https://tracker.ceph.com/issues/50347
Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>
(cherry picked from commit 85bc551b179d940a50cbdfd0c20848e3187c70a6)
Nizamudeen A [Sun, 25 Apr 2021 11:01:00 +0000 (16:31 +0530)]
mgr/dashboard: Generate NPM dependencies manifest
A txt file listing all the dependencies with their versions and URL links.
Fixes: https://tracker.ceph.com/issues/50515
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 825ea98915bcb0ab4bbaefb478ee0a0b8e506933)
Sage Weil [Thu, 6 May 2021 15:00:11 +0000 (10:00 -0500)]
Merge PR #41151 into pacific
* refs/pull/41151/head:
mgr/cephadm: ceph-volume verbose only when fails
qa/workunits/cephadm/test_cephadm: test zap-osds
cephadm: add --zap-osds argument to rm_cluster
cephadm: implement zap-osds --fsid ... command
doc/cephadm: add podman version note to install
mgr/cephadm: check hostname resolution before adding host
cephadm: provide a way to checkhost connection without /etc/hosts passed the shell
doc/cephadm: remove /etc/hosts from list of hostname resoltion methods
qa/suites/rados/cephadm/smoke-roleless: test client-keyring
qa/tasks/cephadm.py: adjust client.admin key mode; place on all hosts
cephadm: distribute client.admin keyring+conf to label:_admin on bootstrap
doc/cephadm: document the default 'admin' label
mgr/cephadm: 'ceph orch client-keyring ...' commands to manage keyring files
mgr/cephadm: reimplement ceph.conf pushing
mgr/cephadm: use _write_remote_file for ceph.conf
mgr/cephadm: _write_remote_file helper
mgr/cephadm: add placementspec for which hosts get ceph.conf
mgr/cephadm: skip ok-to-stop for mons in upgrade if < 3 mons
mgr/cephadm: don't allow upgrade start with less than 2 mgrs
cephadm: re-assimilate user provided conf after mgr created
cephadm: allow several public networks be matched
mgr/cephadm: The command of 'ceph orch daemon restart mgr.xxx' may case mgr daemon loop to restart
doc/cephadm: add a single word
doc/cephadm: adding "device" to a sentence
doc/cephadm: rewrite "nfs.rst"
mgr/cephadm: s/_hosts_with_daemon_inventory/_schedulable_hosts/
mgr/cephadm: don't remove daemons from hosts in maintenance or offline mode
mgr/cephadm: default status for daemons on maintenance hosts to stopped
qa/tasks/cephadm: fix ctx archive check for teuthology
python-common: use OrderedDict instead of Set to remove duplicates from host labels list
mgr/cephadm: less noise about osd specs
mgr/cephadm: kick serve loop when adding/removing labels
mgr/cephadm: do not place osds on _no_schedule hosts
doc/cephadm: document _no_schedule label
mgr/cephadm: fix 'orch ls' count to reflect schedulable hosts
mgr/cephadm: do not schedule on _no_schedule hosts
doc/cephadm: osd.rst -- removing colons
doc/cephadm: osd: rewrite "additional opts"
doc/cephadm: rewrite "advanced osd s. specs"
doc/cephadm: rewrite "delcarative state" in osd.rst
mgr/MgrStandby: fix config observer
mgr/MgrStandby: respawn if mgr_standby_modules changes
qa/tasks/mgr/test_dashboard: skip test_standby if mgr_standby_modules=false
qa/suites/rados/cephadm/smoke-*: use cephadm.wait_for_service
qa/suites/rados/cephadm/smoke-singlehost: test --single-host-defaults
cephadm: add --single-host-defaults option to bootstrap
mgr/cephadm: allow mgr colo if mgr_standby_modules=false
mgr/MgrStandby: add mgr_standby_modules option
cephadm: ignore apparmor if profiles file is empty
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Zac Dover [Tue, 30 Mar 2021 15:31:28 +0000 (01:31 +1000)]
doc/cephadm: add podman version note to install
This PR adds a note to the cephadm installation
guide that informs users that only podman version
2.0.0 and higher work with Ceph, with the
exception of podman version 2.2.1, which does
not work with Ceph. There is also a note regarding
kubic stable 3.0.1 working, but only with newer
kernels.
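The version rule described in this note can be sketched as a small check. This is only an illustration of the rule as stated above (2.0.0 and higher, except 2.2.1); the names `parse_version` and `podman_version_supported` are hypothetical and are not part of cephadm, and the kernel caveat for kubic stable 3.0.1 is not modeled.

```python
from typing import Tuple

# Values taken from the note above; everything else is an assumption.
MIN_PODMAN_VERSION = (2, 0, 0)
BROKEN_PODMAN_VERSIONS = {(2, 2, 1)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.2.1' into a 3-tuple of ints."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.0' to (2, 0, 0)
    return tuple(parts[:3])


def podman_version_supported(text: str) -> bool:
    """Return True if the version satisfies the rule in the note."""
    version = parse_version(text)
    return version >= MIN_PODMAN_VERSION and version not in BROKEN_PODMAN_VERSIONS
```

For example, `podman_version_supported("2.2.1")` is rejected while `podman_version_supported("2.2.0")` is accepted.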
Sage Weil [Thu, 22 Apr 2021 12:12:49 +0000 (08:12 -0400)]
cephadm: distribute client.admin keyring+conf to label:_admin on bootstrap
If we are placing ceph.conf in /etc/ceph (the default), tell the cluster
to continue doing this going forward to hosts with the '_admin' label.
This doesn't induce the user to add the admin label to other hosts too,
unfortunately--we probably want them to add the admin label to other mons,
for instance--but it is a start.
Sage Weil [Wed, 21 Apr 2021 17:06:21 +0000 (13:06 -0400)]
mgr/cephadm: 'ceph orch client-keyring ...' commands to manage keyring files
Teach cephadm to manage keyring files on cluster hosts. These keys must
already exist in the mon auth database--cephadm does not create them if
they don't exist (and will issue warnings to the log if they do not).
A ceph.conf is pushed implicitly along with the keyring file.
Each keyring added will be pushed to the hosts described by the placement
spec with the appropriate ownership and mode. If the ownership, mode, or
path are modified, the files are rewritten or removed as needed.
If the client-keyring entry is removed, the keyring files are removed.
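The behavior described above amounts to reconciling a desired set of keyring files against what is already on the hosts. A minimal sketch of that idea follows; it is not the cephadm implementation, and `KeyringSpec`, `plan_keyring_actions`, and the shape of `existing` are hypothetical names chosen for illustration. Placement is simplified to an explicit set of hostnames, and file content is not compared.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class KeyringSpec:
    """Hypothetical stand-in for one client-keyring entry."""
    entity: str            # e.g. "client.admin"
    path: str              # destination path on each host
    mode: int              # file mode, e.g. 0o600
    owner: str             # "uid:gid"
    hosts: FrozenSet[str]  # hosts matched by the placement spec


def plan_keyring_actions(
    specs: Iterable[KeyringSpec],
    known_entities: Set[str],
    existing: Dict[Tuple[str, str], Tuple[int, str]],
) -> List[Tuple[str, str, str]]:
    """Return ("warn" | "write" | "remove", host_or_entity, path) actions.

    `existing` maps (host, path) -> (mode, owner) for files that were
    written by a previous run (an assumption of this sketch).
    """
    actions: List[Tuple[str, str, str]] = []
    wanted: Dict[Tuple[str, str], Tuple[int, str]] = {}
    for spec in specs:
        if spec.entity not in known_entities:
            # Keys are never created here; only a warning is logged.
            actions.append(("warn", spec.entity, ""))
            continue
        for host in sorted(spec.hosts):
            wanted[(host, spec.path)] = (spec.mode, spec.owner)
    for key in sorted(wanted):
        if existing.get(key) != wanted[key]:
            actions.append(("write", key[0], key[1]))
    for key in sorted(existing):
        if key not in wanted:
            actions.append(("remove", key[0], key[1]))
    return actions
```

Removing a spec from `specs` makes every file it previously produced show up as a "remove" action, which mirrors the last sentence of the commit message.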
Sage Weil [Tue, 20 Apr 2021 16:58:13 +0000 (12:58 -0400)]
mgr/cephadm: add placementspec for which hosts get ceph.conf
Add a config option to control which hosts (by default, *) get a
ceph.conf (if the bool manage_etc_ceph_ceph_conf option is enabled).
We don't modify the existing option because changing a type makes for a
messy migration: we have to sort out which section the config option is
in to change it. Also, a simple on/off is more friendly than
specifying "*" to enable something.
mgr/cephadm: The command of 'ceph orch daemon restart mgr.xxx' may case mgr daemon loop to restart
Scenario:
The mgr daemon is active. After the restart command is executed, it may save "scheduled_daemon_actions": {"mgr.xxx": "restart"} to config-key.
The mgr daemon then restarts before rm_scheduled_daemon_action is called, so on startup it loads the scheduled restart again and restarts forever.
Fix the mgr infinite restart issue using the same solution as 'ceph orch daemon redeploy'.
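The loop described in the scenario can be illustrated with a toy model in which the persisted action is cleared before the daemon restarts itself. This is a sketch of the general ordering idea only, not the code from this commit; `run_scheduled_actions`, `ProcessRestarted`, and the plain dict used as a stand-in for config-key are all hypothetical.

```python
from typing import Callable, Dict


class ProcessRestarted(Exception):
    """Models the current process dying because it restarted itself."""


def run_scheduled_actions(
    store: Dict[str, str],
    active_mgr: str,
    perform: Callable[[str, str], None],
) -> None:
    """Run persisted daemon actions; `store` stands in for config-key."""
    for daemon, action in list(store.items()):
        if daemon == active_mgr:
            # Clear the persisted entry first: once we restart ourselves,
            # nothing after perform() runs, and a leftover entry would be
            # picked up again on the next start.
            del store[daemon]
            perform(daemon, action)
        else:
            perform(daemon, action)
            del store[daemon]
```

With the removal placed after `perform()` for the active mgr, the entry would survive the restart and trigger it again on every start.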
Adam King [Wed, 14 Apr 2021 20:07:46 +0000 (16:07 -0400)]
mgr/cephadm: default status for daemons on maintenance hosts to stopped
We do not refresh the daemons on maintenance hosts, so our info
on them is always outdated. Therefore, the best option is to
assume maintenance mode is working correctly and the daemons
are stopped.
Sage Weil [Mon, 12 Apr 2021 14:17:17 +0000 (10:17 -0400)]
mgr/cephadm: allow mgr colo if mgr_standby_modules=false
If the standby mgr daemons' modules aren't listening on any ports, then we
can schedule multiple on the same host.
Note that this may make 'orch ps' output misleading, as ports will be
reported for each mgr instance, but only one of them will actually be
listening at any one time (if they are behaving, at least!). Treat a
mgr port check error as non-fatal.
Sage Weil [Tue, 4 May 2021 14:57:16 +0000 (09:57 -0500)]
mgr/MgrStandby: add mgr_standby_modules option
Add config option to control whether the standby modules are started.
Default to true (no change in behavior), but if set to false the standby
mgr modules don't do the redirect business.
When the host pattern is displayed in the OSDs placement tab, it gets split with semicolons. Also adjusted the column size of the Container Image ID and Placement columns.
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. whenever the monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker, and
the failure report message is unregistered when the monitor acks it
by either canceling it or replying to the reporter with a new
osdmap marking the target osd down. but if this does not happen,
the failure reports just pile up in OpTracker, the monitor considers
them slow ops, and they are reported as a SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target osd down. in that case the
target osd never gets an osdmap marking it down, so it won't send
an alive message to the monitor to fix this.
in this change, we check for stale failure info in tick() and
simply drop the stale reports, so the messages can be released and
marked "done".
will add a trim failures call in the loop, which mutates failure_info
while we are still iterating this map, so the loop has to be restructured
a little bit.
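The "mutating the map while iterating it" concern can be shown with a small model. The monitor code is C++; the sketch below only illustrates the pattern in Python, and `drop_stale_failures`, the dict layout, and `max_age` are assumptions made for the example rather than names from the OSDMonitor source.

```python
from typing import Dict, List, Tuple


def drop_stale_failures(
    failure_info: Dict[int, Dict[str, float]],
    now: float,
    max_age: float,
) -> List[Tuple[int, str]]:
    """Drop reports older than max_age; return the (target, reporter) pairs dropped.

    failure_info maps a target osd id to {reporter: report_timestamp}.
    """
    dropped: List[Tuple[int, str]] = []
    # Iterate over a snapshot of the keys so that deleting entries from
    # failure_info inside the loop is safe.
    for target in list(failure_info):
        reports = failure_info[target]
        fresh = {r: t for r, t in reports.items() if now - t <= max_age}
        dropped.extend((target, r) for r in sorted(reports) if r not in fresh)
        if fresh:
            failure_info[target] = fresh
        else:
            del failure_info[target]
    return dropped
```

Iterating over `failure_info` directly and deleting inside the loop would raise `RuntimeError` in Python; the snapshot plays the role of the loop restructuring mentioned in the commit message.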
Kefu Chai [Thu, 11 Mar 2021 09:45:49 +0000 (17:45 +0800)]
mon/OSDMonitor: do not return no_reply() again
we always return a "no_op" message to the proxy monitor in
`OSDMonitor::prepare_failure()` at the very beginning of this method, so
there is no need to reply to the peon again when discarding the failure report.
Kefu Chai [Thu, 11 Mar 2021 09:09:57 +0000 (17:09 +0800)]
mon/Monitor: early return if routed request is not found
* early return if routed request is not found in routed_requests.
reduce the indent level, for better readability.
* do not look up the request twice. for better performance.
* use unique_ptr<> for holding the request, for better readability
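The three points above (early return, a single lookup, and a holder that owns the request) have a close Python analogue. The monitor code itself is C++; the sketch below is only an illustration, and `handle_route_reply` and the dict of strings are hypothetical. Whether the original code removes the entry from routed_requests at this point is an assumption of the sketch.

```python
from typing import Dict, Optional


def handle_route_reply(routed_requests: Dict[int, str], tid: int) -> Optional[str]:
    """Look up a routed request once, return early if it is missing."""
    # pop() finds and removes the entry in one lookup, so the caller ends
    # up holding the only reference, loosely like a unique_ptr<> in C++.
    request = routed_requests.pop(tid, None)
    if request is None:
        return None  # early return keeps the main path unindented
    return f"replied to {request}"
```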
Sage Weil [Wed, 28 Apr 2021 15:44:21 +0000 (10:44 -0500)]
Merge PR #40922 into pacific
* refs/pull/40922/head:
pybind/ceph_argparse: print --format flag name in help descs
mgr/cephadm: don't list non ceph daemons as needing upgrade in upgrade check
qa/tasks/cephadm: ignore --keep-logs failure
qa/tasks/cephadm: use yaml.dump_all()
qa/suites/rados/cephadm/smoke-*: use cephadm.wait_for_service
qa/tasks/cephadm: tear down clsuter before gathering logs
qa/suites/rados/cephadm/smoke-roleless: test rgw-ingress
mgr/cephadm: remove virtual_ip check during scheduling
mgr/orchestrator: orch ls: leave off virtual_ip prefixlen
qa/tasks/cephadm: add wait_for_service
qa/tasks/cephadm: allow skip_monitor_stack=true
qa/tasks/cephadm: do subst_vip for cephadm.shell and .apply
qa/tasks/vip: add vip task to allocate virtual IPs
qa/suites/rados/cephadm/smoke-roleless: add rgw-ingress test case
qa/tasks/cephadm: shell: take 'all-roles' or 'all-hosts'
qa/tasks/cephadm: let cephadm.shell take string or list
doc/cephadm: wrong command for single daemon events
mgr/cephadm: place maximum on placement count based on host count
mgr/cephadm: fix nfs-rgw stray daemon
mgr/cephadm: skip-ssh flag enables cephadm mgr module
mgr/cephadm: report exception during upgrade in upgrade status
qa/suites/rados/thrash: shorten radosbench
mgr/cephadm: remove old haproxy and keepalived templates
mgr/orchestrator: validate lists in spec jsons
python-common: Verify service spec is not None
python-common: Verify data_devices is not None
mgr/orchestrator: DG loads properly the unmanaged attribute
mgr/orchestractor: rgw realm and zone flags must both be provided
mgr/cephadm: make prometheus scrape ingress haproxy
doc/cephadm: remove big warning about stability
doc/cepham/compatibility: rgw-ha -> ingress; note possibility of breaking changes
doc/cephadm: rewrite "dry run" section in osd.rst
doc/cephadm: rewrite part of "deploy osds"
doc/cephadm: rewrite osd.rst "Remove an OSD"
doc/cephadm: rewrite osd.rst - list devices
doc/cephadm: break mon section into sections
doc/cephadm: rewrite "deploying add. mons"
doc: fixes for cephadm documentation
doc/cephadm: remove warning about cephadm in production
doc/cephadm: Add Compatibility with Podman Versions
doc/cephadm: rewrite "index.rst"
doc/cephadm: explicitly show host requirments in adding host section
mgr/cephadm: ingress: add optional virtual_interface_networks
doc/cephadm/rgw: clean up example spec
mgr/cephadm/services/ingress: less verbose about prepare_create
doc/cephadm/rgw: add note about which ethernet interface is used
cephadm: make keepalived unit fiddle sysctl settings
mgr/orchestrator: report external endpoints from 'orch ls'
mgr/orchestrator: drop - when no ports
doc/cephadm/rgw: update docs for ingress service
mgr/cephadm: use per_host_daemon feature in scheduler
cephadm: fix a typo
mgr/cephadm/schedule: add per_host_daemon_type support
mgr/cephadm: HA_RGW -> Ingress
mgr/cephadm: include daemon_type in DaemonPlacement
mgr/cephadm: update list-networks to report interface names too
mgr/orchestrator: streamline 'orch ps' PORTS formatting
mgr/cephadm/schedule: handle multiple ports per daemon
mgr/cephadm/utils: resolve_ip(): prefer IPv4
cephadm: cleanup extra slash in runtime dir
cephadm: use split cgroup strategy for podman
cephadm: use class to represent container engine
mgr/cephadm: don't cleanup the daemon keyring on failed redeploy
mgr/cephadm: fix orch host add with multiple labels and no addr
doc/cephadm: remove keepalived_user from haproxy docs
rpm: re-disable SUSE lttng build on z390x
ceph.spec.in: enable tcmalloc and lttng on s390x
pacific: mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>