From: Sage Weil
Date: Fri, 15 Nov 2019 14:23:53 +0000 (-0600)
Subject: Merge branch 'nautilus' into wip-device-telemetry-nautilus
X-Git-Tag: v14.2.5~62^2~4
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=364f39c567ef029d303f3da4314265f9f00a14c9;p=ceph.git

Merge branch 'nautilus' into wip-device-telemetry-nautilus
---

364f39c567ef029d303f3da4314265f9f00a14c9
diff --cc PendingReleaseNotes
index 15191386d5d,cb764db5aaa..ad80bf43022
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@@ -25,58 -25,18 +25,73 @@@
    objects and the other deletes them. Read the troubleshooting section
    of the dynamic resharding docs for details.
  
 +14.2.5
 +------
 +
 +* The telemetry module now has a 'device' channel, enabled by default, that
 +  will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
 +  in order to build and improve device failure prediction algorithms.  Because
 +  the content of telemetry reports has changed, you will need to re-opt-in
 +  with::
 +
 +    ceph telemetry on
 +
 +  You can view exactly what information will be reported first with::
 +
 +    ceph telemetry show
 +    ceph telemetry show device   # specifically show the device channel
 +
 +  If you are not comfortable sharing device metrics, you can disable that
 +  channel first before re-opting-in::
 +
 +    ceph config set mgr mgr/telemetry/channel_device false
 +    ceph telemetry on
 +
 +* The telemetry module now reports more information about CephFS file systems,
 +  including:
 +
 +    - how many MDS daemons (in total and per file system)
 +    - which features are (or have been) enabled
 +    - how many data pools
 +    - approximate file system age (year + month of creation)
 +    - how many files, bytes, and snapshots
 +    - how much metadata is being cached
 +
 +  We have also added:
 +
 +    - which Ceph release the monitors are running
 +    - whether msgr v1 or v2 addresses are used for the monitors
 +    - whether IPv4 or IPv6 addresses are used for the monitors
 +    - whether RADOS cache tiering is enabled (and which mode)
 +    - whether pools are replicated or erasure coded, and
 +      which erasure code profile plugin and parameters are in use
 +    - how many hosts are in the cluster, and how many hosts have each type of daemon
 +    - whether a separate OSD cluster network is being used
 +    - how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
 +    - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
 +    - aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use
 +
 +  If you had telemetry enabled, you will need to re-opt-in with::
 +
 +    ceph telemetry on
 +
 +  You can view exactly what information will be reported first with::
 +
 +    ceph telemetry show        # see everything
 +    ceph telemetry show basic  # basic cluster info (including all of the new info)
 +
+ * A health warning is now generated if the average OSD heartbeat ping
+   time exceeds a configurable threshold for any of the intervals
+   computed.  The OSD computes 1 minute, 5 minute and 15 minute
+   intervals with average, minimum and maximum values.  New configuration
+   option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
+   ``osd_heartbeat_grace`` to determine the threshold.  A value of zero
+   disables the warning.  New configuration option
+   ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
+   computed value and causes a warning
+   when OSD heartbeat pings take longer than the specified amount.
+   New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
+   list all connections with a ping time longer than the specified threshold or
+   value determined by the config options, for the average for any of the 3 intervals.
+   New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
+   do the same but only including heartbeats initiated by the specified OSD.
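
The slow-ping rule in the last release note can be illustrated with a minimal Python sketch. This is not Ceph's implementation: the function names and the sample numbers are hypothetical, and it assumes that ``mon_warn_on_slow_ping_ratio`` is expressed as a fraction (0.05 meaning 5%), that ``osd_heartbeat_grace`` is in seconds, and that a non-zero ``mon_warn_on_slow_ping_time`` takes precedence even when the ratio is zero.

```python
from typing import Mapping, Optional


def slow_ping_threshold_ms(grace_s: float, ratio: float,
                           ping_time_ms: float = 0.0) -> Optional[float]:
    """Return the warning threshold in milliseconds, or None if disabled.

    Hypothetical helper: mirrors the release note's description only.
    """
    # An explicit time in milliseconds overrides the computed value
    # (assumption: this holds even when the ratio is zero).
    if ping_time_ms > 0:
        return ping_time_ms
    # A ratio of zero disables the warning.
    if ratio <= 0:
        return None
    # Otherwise the threshold is a fraction of the heartbeat grace period.
    return grace_s * ratio * 1000.0


def is_slow(avg_ms_by_interval: Mapping[str, float],
            threshold_ms: Optional[float]) -> bool:
    """True if the average ping time of any interval exceeds the threshold."""
    if threshold_ms is None:
        return False
    return any(avg > threshold_ms for avg in avg_ms_by_interval.values())


# Illustrative values only, not taken from a real cluster.
threshold = slow_ping_threshold_ms(grace_s=20.0, ratio=0.05)
averages = {"1min": 1200.0, "5min": 800.0, "15min": 400.0}
print(threshold, is_slow(averages, threshold))
```

With these sample values only the 1 minute average is above the computed threshold, which is enough to trigger the warning because the rule applies to any of the three intervals.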