Merge branch 'nautilus' into wip-device-telemetry-nautilus

author Sage Weil <sage@newdream.net>

Fri, 15 Nov 2019 14:23:53 +0000 (08:23 -0600)

committer GitHub <noreply@github.com>

Fri, 15 Nov 2019 14:23:53 +0000 (08:23 -0600)
diff --cc PendingReleaseNotes

index 15191386d5da063dce33ab91c38acddf7bb85c36,cb764db5aaa0c1522ca273df76f619a6d7c3d99a..ad80bf43022567bac0fefd5b5117b9cb4bacb5bc
--- 1/PendingReleaseNotes
--- 2/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@@ -25,58 -25,18 +25,73 @@@
     objects and the other deletes them. Read the troubleshooting section
     of the dynamic resharding docs for details.
   
+ +14.2.5
+ +------
+ +
+ +* The telemetry module now has a 'device' channel, enabled by default, that
+ +  will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
+ +  in order to build and improve device failure prediction algorithms.  Because
+ +  the content of telemetry reports has changed, you will need to re-opt-in
+ +  with::
+ +
+ +    ceph telemetry on
+ +
+ +  You can view exactly what information will be reported first with::
+ +
+ +    ceph telemetry show
+ +    ceph telemetry show device   # specifically show the device channel
+ +
+ +  If you are not comfortable sharing device metrics, you can disable that
+ +  channel first before re-opting-in::
+ +
+ +    ceph config set mgr mgr/telemetry/channel_device false
+ +    ceph telemetry on
+ +
+ +* The telemetry module now reports more information about CephFS file systems,
+ +  including:
+ +
+ +    - how many MDS daemons (in total and per file system)
+ +    - which features are (or have been) enabled
+ +    - how many data pools
+ +    - approximate file system age (year + month of creation)
+ +    - how many files, bytes, and snapshots
+ +    - how much metadata is being cached
+ +
+ +  The report now also includes:
+ +
+ +    - which Ceph release the monitors are running
+ +    - whether msgr v1 or v2 addresses are used for the monitors
+ +    - whether IPv4 or IPv6 addresses are used for the monitors
+ +    - whether RADOS cache tiering is enabled (and which mode)
+ +    - whether pools are replicated or erasure coded, and
+ +      which erasure code profile plugin and parameters are in use
+ +    - how many hosts are in the cluster, and how many hosts have each type of daemon
+ +    - whether a separate OSD cluster network is being used
+ +    - how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
+ +    - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
+ +    - aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use
+ +
+ +  If you had telemetry enabled, you will need to re-opt-in with::
+ +
+ +    ceph telemetry on
+ +
+ +  You can view exactly what information will be reported first with::
+ +
+ +    ceph telemetry show        # see everything
+ +    ceph telemetry show basic  # basic cluster info (including all of the new info)
+ +
+ * A health warning is now generated if the average OSD heartbeat ping
+   time exceeds a configurable threshold for any of the intervals
+   computed.  The OSD computes 1 minute, 5 minute and 15 minute
+   intervals with average, minimum and maximum values.  New configuration
+   option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
+   ``osd_heartbeat_grace`` to determine the threshold.  A value of zero
+   disables the warning.  New configuration option
+   ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
+   computed value and causes a warning
+   when OSD heartbeat pings take longer than the specified amount.
+   New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
+   list all connections whose average ping time, for any of the 3 intervals,
+   exceeds the specified threshold or the value determined by the config options.
+   New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
+   do the same but only include heartbeats initiated by the specified OSD.
diff --cc src/mgr/ActivePyModules.cc
Simple merge
diff --cc src/pybind/mgr/balancer/module.py
Simple merge
diff --cc src/pybind/mgr/devicehealth/module.py
Simple merge
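The slow-ping threshold rule described in the PendingReleaseNotes hunk can be sketched in Python, the language of the mgr modules touched by this merge. This is a hedged illustration, not Ceph's implementation. The names `HeartbeatStats`, `slow_ping_threshold_ms` and `slow_connections` are hypothetical. The sketch also rests on several assumptions: the ratio is treated as a fraction of `osd_heartbeat_grace` (seconds converted to ms); an explicit `mon_warn_on_slow_ping_time` takes precedence, as "overrides" suggests; and a zero ratio with no explicit time disables the warning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeartbeatStats:
    """Hypothetical per-connection summary; not a Ceph data structure."""
    peer: str
    # Average ping time in milliseconds, keyed by interval ("1min", "5min", "15min").
    avg_ms: Dict[str, float]


def slow_ping_threshold_ms(osd_heartbeat_grace_s: float,
                           mon_warn_on_slow_ping_ratio: float,
                           mon_warn_on_slow_ping_time_ms: float) -> Optional[float]:
    """Return the warning threshold in ms, or None if the warning is disabled."""
    if mon_warn_on_slow_ping_time_ms > 0:
        # Assumption: an explicit time in ms overrides the computed value.
        return float(mon_warn_on_slow_ping_time_ms)
    if mon_warn_on_slow_ping_ratio == 0:
        # Assumption: a zero ratio with no explicit time disables the warning.
        return None
    # Ratio treated as a fraction of the heartbeat grace period (seconds -> ms).
    return osd_heartbeat_grace_s * 1000.0 * mon_warn_on_slow_ping_ratio


def slow_connections(stats: List[HeartbeatStats], threshold_ms: float) -> List[str]:
    """List peers whose average ping exceeds the threshold in any interval."""
    return [s.peer for s in stats
            if any(avg > threshold_ms for avg in s.avg_ms.values())]


if __name__ == "__main__":
    threshold = slow_ping_threshold_ms(20, 0.05, 0)  # illustrative values only
    stats = [
        HeartbeatStats("osd.1", {"1min": 1200.0, "5min": 900.0, "15min": 800.0}),
        HeartbeatStats("osd.2", {"1min": 10.0, "5min": 12.0, "15min": 11.0}),
    ]
    if threshold is not None:
        print(threshold, slow_connections(stats, threshold))
```

The filter mirrors the described `dump_osd_network` behaviour: a connection is listed if its average for any one of the three intervals exceeds the threshold, not only the most recent one.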