Sage Weil [Tue, 20 Apr 2021 16:58:13 +0000 (12:58 -0400)]
mgr/cephadm: add placementspec for which hosts get ceph.conf
Add a config option to control which hosts (by default, *) get a
ceph.conf (if the bool manage_etc_ceph_ceph_conf option is enabled).
We don't modify the existing option because changing a type makes for a
messy migration: we have to sort out which section the config option is
in to change it. Also, a simple on/off is friendlier than
specifying "*" to enable something.
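A minimal sketch of how the two options could combine, assuming the new option can be approximated by a shell-style glob over hostnames. Real placement specs are richer than a glob; `hosts_receiving_conf` and its parameter names are hypothetical and not taken from the cephadm source.

```python
from fnmatch import fnmatch
from typing import List


def hosts_receiving_conf(manage_conf: bool, host_pattern: str, hosts: List[str]) -> List[str]:
    """Return the hosts that should get a ceph.conf.

    manage_conf stands in for the existing boolean on/off option;
    host_pattern is a simplified stand-in for the new placement option
    (default "*", meaning every host).
    """
    if not manage_conf:
        # The boolean stays the master switch: off means no host is managed.
        return []
    return [h for h in hosts if fnmatch(h, host_pattern)]
```

Keeping the boolean separate means existing deployments only have to flip one switch, and the pattern narrows the set of hosts only when someone needs it.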
mgr/cephadm: The command of 'ceph orch daemon restart mgr.xxx' may case mgr daemon loop to restart
Scene:
The mgr daemon is active. After the restart command is executed, "scheduled_daemon_actions": {"mgr.xxx": "restart"} may be saved to config-key.
The mgr daemon then restarts before rm_scheduled_daemon_action is called, so after every restart it loads the pending restart action again and restarts forever.
Fix the mgr infinite restart issue using the same solution as 'ceph orch daemon redeploy'.
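The ordering problem described above can be sketched as a small simulation. This is an illustration only: `run_scheduled_actions`, the `store` dict, and the `clear_first` flag are hypothetical, and clearing the action before acting on it is one way to break the loop, not necessarily what this commit does.

```python
from typing import Dict


def run_scheduled_actions(store: Dict[str, str], clear_first: bool, max_boots: int = 5) -> int:
    """Simulate an active mgr processing its own scheduled restart.

    store stands in for the persisted scheduled_daemon_actions mapping.
    Returns how many times the daemon restarted (capped at max_boots so
    the simulation stays finite).
    """
    restarts = 0
    for _ in range(max_boots):  # each iteration is one boot of the daemon
        if store.get("mgr.xxx") != "restart":
            break  # nothing pending: the daemon stays up
        if clear_first:
            # Drop the persisted action before acting on it.
            del store["mgr.xxx"]
        # The restart kills the process, so any cleanup placed after this
        # point never runs and the action survives into the next boot.
        restarts += 1
    return restarts
```

With `clear_first=False` the pending action is still present on every boot, which is the loop described in this commit message.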
Adam King [Wed, 14 Apr 2021 20:07:46 +0000 (16:07 -0400)]
mgr/cephadm: default status for daemons on maintenance hosts to stopped
We do not refresh the daemons on maintenance hosts, so our info
on them is always outdated. Therefore, the best option is to
assume maintenance mode is working correctly and the daemons
are stopped.
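A minimal sketch of that rule, assuming daemon status is a plain string. `reported_status` and the "unknown" fallback are hypothetical names chosen for illustration, not the module's actual API.

```python
from typing import Optional


def reported_status(last_known: Optional[str], host_in_maintenance: bool) -> str:
    """Pick the status to report for a daemon.

    Cached information for a host in maintenance is stale because the host
    is not refreshed, so it is ignored and the daemon is reported as stopped.
    """
    if host_in_maintenance:
        return "stopped"
    return last_known if last_known is not None else "unknown"
```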
Sage Weil [Mon, 12 Apr 2021 14:17:17 +0000 (10:17 -0400)]
mgr/cephadm: allow mgr colo if mgr_standby_modules=false
If the standby mgr daemons' modules aren't listening on any ports, then we
can schedule multiple on the same host.
Note that this may make 'orch ps' output misleading, as ports will be
reported for each mgr instance, but only one of them will actually be
listening at any one time (if they are behaving, at least!). Treat a
mgr port check error as non-fatal.
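The scheduling consequence can be sketched as follows. This is a simplified illustration: `place_mgrs` is a hypothetical helper, and the real scheduler tracks ports per daemon rather than a single flag.

```python
from typing import List


def place_mgrs(hosts: List[str], count: int, standby_modules: bool) -> List[str]:
    """Choose a host for each of `count` mgr daemons.

    If standby modules run, every mgr is assumed to listen on its ports,
    so at most one mgr fits per host. If they do not run, only the active
    mgr listens and hosts may be reused.
    """
    if not hosts:
        return []
    if standby_modules:
        # One mgr per host: the placement is capped by the host count.
        return hosts[:count]
    # Colocation allowed: cycle through the hosts until count is reached.
    return [hosts[i % len(hosts)] for i in range(count)]
```

For example, asking for two mgrs on a single host yields one placement when standby modules are enabled and two when they are disabled.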
Sage Weil [Tue, 4 May 2021 14:57:16 +0000 (09:57 -0500)]
mgr/MgrStandby: add mgr_standby_modules option
Add config option to control whether the standby modules are started.
Default to true (no change in behavior), but if set to false the standby
mgr modules don't do the redirect business.
Sage Weil [Wed, 28 Apr 2021 15:44:21 +0000 (10:44 -0500)]
Merge PR #40922 into pacific
* refs/pull/40922/head:
pybind/ceph_argparse: print --format flag name in help descs
mgr/cephadm: don't list non ceph daemons as needing upgrade in upgrade check
qa/tasks/cephadm: ignore --keep-logs failure
qa/tasks/cephadm: use yaml.dump_all()
qa/suites/rados/cephadm/smoke-*: use cephadm.wait_for_service
qa/tasks/cephadm: tear down clsuter before gathering logs
qa/suites/rados/cephadm/smoke-roleless: test rgw-ingress
mgr/cephadm: remove virtual_ip check during scheduling
mgr/orchestrator: orch ls: leave off virtual_ip prefixlen
qa/tasks/cephadm: add wait_for_service
qa/tasks/cephadm: allow skip_monitor_stack=true
qa/tasks/cephadm: do subst_vip for cephadm.shell and .apply
qa/tasks/vip: add vip task to allocate virtual IPs
qa/suites/rados/cephadm/smoke-roleless: add rgw-ingress test case
qa/tasks/cephadm: shell: take 'all-roles' or 'all-hosts'
qa/tasks/cephadm: let cephadm.shell take string or list
doc/cephadm: wrong command for single daemon events
mgr/cephadm: place maximum on placement count based on host count
mgr/cephadm: fix nfs-rgw stray daemon
mgr/cephadm: skip-ssh flag enables cephadm mgr module
mgr/cephadm: report exception during upgrade in upgrade status
qa/suites/rados/thrash: shorten radosbench
mgr/cephadm: remove old haproxy and keepalived templates
mgr/orchestrator: validate lists in spec jsons
python-common: Verify service spec is not None
python-common: Verify data_devices is not None
mgr/orchestrator: DG loads properly the unmanaged attribute
mgr/orchestractor: rgw realm and zone flags must both be provided
mgr/cephadm: make prometheus scrape ingress haproxy
doc/cephadm: remove big warning about stability
doc/cepham/compatibility: rgw-ha -> ingress; note possibility of breaking changes
doc/cephadm: rewrite "dry run" section in osd.rst
doc/cephadm: rewrite part of "deploy osds"
doc/cephadm: rewrite osd.rst "Remove an OSD"
doc/cephadm: rewrite osd.rst - list devices
doc/cephadm: break mon section into sections
doc/cephadm: rewrite "deploying add. mons"
doc: fixes for cephadm documentation
doc/cephadm: remove warning about cephadm in production
doc/cephadm: Add Compatibility with Podman Versions
doc/cephadm: rewrite "index.rst"
doc/cephadm: explicitly show host requirments in adding host section
mgr/cephadm: ingress: add optional virtual_interface_networks
doc/cephadm/rgw: clean up example spec
mgr/cephadm/services/ingress: less verbose about prepare_create
doc/cephadm/rgw: add note about which ethernet interface is used
cephadm: make keepalived unit fiddle sysctl settings
mgr/orchestrator: report external endpoints from 'orch ls'
mgr/orchestrator: drop - when no ports
doc/cephadm/rgw: update docs for ingress service
mgr/cephadm: use per_host_daemon feature in scheduler
cephadm: fix a typo
mgr/cephadm/schedule: add per_host_daemon_type support
mgr/cephadm: HA_RGW -> Ingress
mgr/cephadm: include daemon_type in DaemonPlacement
mgr/cephadm: update list-networks to report interface names too
mgr/orchestrator: streamline 'orch ps' PORTS formatting
mgr/cephadm/schedule: handle multiple ports per daemon
mgr/cephadm/utils: resolve_ip(): prefer IPv4
cephadm: cleanup extra slash in runtime dir
cephadm: use split cgroup strategy for podman
cephadm: use class to represent container engine
mgr/cephadm: don't cleanup the daemon keyring on failed redeploy
mgr/cephadm: fix orch host add with multiple labels and no addr
doc/cephadm: remove keepalived_user from haproxy docs
rpm: re-disable SUSE lttng build on z390x
ceph.spec.in: enable tcmalloc and lttng on s390x
pacific: mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Thu, 15 Apr 2021 22:55:00 +0000 (17:55 -0500)]
qa/tasks/cephadm: tear down clsuter before gathering logs
We don't always stop all services, because teuthology doesn't know about
things it didn't start. Use rm-cluster to tear things down, but do not
remove the logs themselves. After we get logs, we'll clean up completely.
mgr/orchestrator: DG loads properly the unmanaged attribute
Fixes: https://tracker.ceph.com/issues/49805
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
(cherry picked from commit 0af4ad8614e426adf60eec32bd4b36974c5cb30b)
Zac Dover [Wed, 24 Mar 2021 15:47:17 +0000 (01:47 +1000)]
doc/cephadm: rewrite "dry run" section in osd.rst
This rewrites the "dry run" section of the "OSD Service"
chapter of the cephadm documentation. This commit makes
minor changes that reduce the cognitive load of the
reader.
Zac Dover [Wed, 24 Mar 2021 14:39:01 +0000 (00:39 +1000)]
doc/cephadm: rewrite part of "deploy osds"
This reorganizes the section "Deploy OSDs"
in the "OSD Service" chapter of the Cephadm
Guide. Two new sections, "Listing Storage
Devices" and "Creating New OSDs" gather
information under headings in a sensible way,
making the information more accessible to someone
skimming this Guide.
Zac Dover [Sun, 28 Mar 2021 19:23:08 +0000 (05:23 +1000)]
doc/cephadm: rewrite osd.rst "Remove an OSD"
This commit rewrites the entire "Remove an OSD"
section of the "OSD Service" chapter of the
cephadm book.
I got carried away and didn't break this one into
four smaller PRs, and I'm sorry in advance to
whomever ends up reviewing this. I'll break "Advanced
OSD Service Specifications", the next section in the
queue, into multiple sections.
Zac Dover [Mon, 15 Mar 2021 15:03:06 +0000 (01:03 +1000)]
doc/cephadm: break mon section into sections
This PR breaks the "Deploy Additional Monitors" section
of the cephadm documentation into several subsections
whose titles spotlight the matter under discussion in
those respective subsections.
inb4: Another PR is on deck that rewrites the sentences
in this chapter of the cephadm documentation. I'd like
to get this chapter broken up into these subsections before
I rewrite those sentences. So I'm hoping for no grammatical
mission creep on this one. The grammar and clarity updates
are coming.
Jeff Layton [Fri, 29 Jan 2021 19:15:26 +0000 (14:15 -0500)]
doc: fixes for cephadm documentation
Be sure to note that python 3 is a prerequisite. Minimal centos 8
installs don't have it, for instance.
Also, we probably don't want to hardcode an octopus URL into the
suggested curl command. Change it to fill that in with
"|stable-release|", which should always point to the latest released
version name.
Fixes: https://tracker.ceph.com/issues/49806
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit bf69cdc68970789a7410928bd8a1af34d0d9b6a2)