From: Sage Weil <sage@newdream.net>
Date: Fri, 15 Nov 2019 14:23:53 +0000 (-0600)
Subject: Merge branch 'nautilus' into wip-device-telemetry-nautilus
X-Git-Tag: v14.2.5~62^2~4
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=364f39c567ef029d303f3da4314265f9f00a14c9;p=ceph.git

Merge branch 'nautilus' into wip-device-telemetry-nautilus
---

364f39c567ef029d303f3da4314265f9f00a14c9
diff --cc PendingReleaseNotes
index 15191386d5d,cb764db5aaa..ad80bf43022
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@@ -25,58 -25,18 +25,73 @@@
    objects and the other deletes them. Read the troubleshooting section
    of the dynamic resharding docs for details.
  
 +14.2.5
 +------
 +
 +* The telemetry module now has a 'device' channel, enabled by default, that
 +  will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
 +  in order to build and improve device failure prediction algorithms.  Because
 +  the content of telemetry reports has changed, you will need to re-opt-in
 +  with::
 +
 +    ceph telemetry on
 +
 +  You can view exactly what information will be reported first with::
 +
 +    ceph telemetry show
 +    ceph telemetry show device   # specifically show the device channel
 +
 +  If you are not comfortable sharing device metrics, you can disable that
 +  channel before re-opting-in::
 +
 +    ceph config set mgr mgr/telemetry/channel_device false
 +    ceph telemetry on
 +
 +* The telemetry module now reports more information about CephFS file systems,
 +  including:
 +
 +    - how many MDS daemons (in total and per file system)
 +    - which features are (or have been) enabled
 +    - how many data pools
 +    - approximate file system age (year + month of creation)
 +    - how many files, bytes, and snapshots
 +    - how much metadata is being cached
 +
 +  We have also added:
 +
 +    - which Ceph release the monitors are running
 +    - whether msgr v1 or v2 addresses are used for the monitors
 +    - whether IPv4 or IPv6 addresses are used for the monitors
 +    - whether RADOS cache tiering is enabled (and which mode)
 +    - whether pools are replicated or erasure coded, and
 +      which erasure code profile plugin and parameters are in use
 +    - how many hosts are in the cluster, and how many hosts have each type of daemon
 +    - whether a separate OSD cluster network is being used
 +    - how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
 +    - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
 +    - aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use
 +
 +  If you had telemetry enabled, you will need to re-opt-in with::
 +
 +    ceph telemetry on
 +
 +  You can view exactly what information will be reported first with::
 +
 +    ceph telemetry show        # see everything
 +    ceph telemetry show basic  # basic cluster info (including all of the new info)
 +
+ * A health warning is now generated if the average OSD heartbeat ping
+   time exceeds a configurable threshold for any of the intervals
+   computed.  The OSD computes 1 minute, 5 minute and 15 minute
+   intervals with average, minimum and maximum values.  New configuration
+   option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
+   ``osd_heartbeat_grace`` to determine the threshold.  A value of zero
+   disables the warning.  New configuration option
+   ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
+   computed value and causes a warning
+   when OSD heartbeat pings take longer than the specified amount.
+   New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
+   list all connections whose average ping time over any of the 3 intervals
+   exceeds the specified threshold or the value determined by the config options.
+   New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
+   do the same but only including heartbeats initiated by the specified OSD.