From: Sage Weil
Date: Mon, 3 Feb 2020 15:31:36 +0000 (-0600)
Subject: doc/releases: octopus draft notes
X-Git-Tag: v15.1.1~482^2
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=a216baa83e610b25d22120d1de3adf005724a0e6;p=ceph.git

doc/releases: octopus draft notes

Signed-off-by: Sage Weil
---

diff --git a/doc/bootstrap.rst b/doc/bootstrap.rst
index 0400fd4552c13..fe8b1b4b215f0 100644
--- a/doc/bootstrap.rst
+++ b/doc/bootstrap.rst
@@ -1,3 +1,5 @@
+.. _bootstrap:
+
 ========================
  Installation (cephadm)
 ========================
diff --git a/doc/mgr/cephadm.rst b/doc/mgr/cephadm.rst
index f2241ea229d99..c4a7ef386700f 100644
--- a/doc/mgr/cephadm.rst
+++ b/doc/mgr/cephadm.rst
@@ -1,3 +1,5 @@
+.. _cephadm:
+
 ====================
 cephadm orchestrator
 ====================
diff --git a/doc/releases/general.rst b/doc/releases/general.rst
index 1033116f86a98..344e6c5c6a2a4 100644
--- a/doc/releases/general.rst
+++ b/doc/releases/general.rst
@@ -121,6 +121,9 @@ Release timeline
 .. ceph_timeline:: releases.yml development nautilus mimic luminous kraken jewel infernalis hammer giant firefly emperor
 
+.. _Octopus: ../octopus
+.. _15.1.0: ../octopus#v15-1-0-octopus
+
 .. _Nautilus: ../nautilus
 .. _14.2.7: ../nautilus#v14-2-7-nautilus
 .. _14.2.6: ../nautilus#v14-2-6-nautilus
diff --git a/doc/releases/index.rst b/doc/releases/index.rst
index 374dc0a232108..b5806a7326fac 100644
--- a/doc/releases/index.rst
+++ b/doc/releases/index.rst
@@ -7,6 +7,14 @@ Ceph Releases (index)
 .. toctree::
    :maxdepth: 1
 
+Pending Release
+---------------
+
+.. toctree::
+   :maxdepth: 1
+
+   Octopus
+
 Active Releases
 ---------------
diff --git a/doc/releases/octopus.rst b/doc/releases/octopus.rst
new file mode 100644
index 0000000000000..f8c2d065ab603
--- /dev/null
+++ b/doc/releases/octopus.rst
@@ -0,0 +1,491 @@
+v15.1.0 Octopus
+===============
+
+.. note:: This is a release candidate and not (yet) intended for production use.
+
+These are draft notes for the upcoming Octopus release.
+
+Major Changes from Nautilus
+---------------------------
+
+- *General*:
+
+  * A new deployment tool called **cephadm** has been introduced that
+    integrates Ceph daemon deployment and management via containers
+    into the orchestration layer. For more information see
+    :ref:`cephadm` and :ref:`bootstrap`.
+  * Health alerts can now be muted, either temporarily or permanently
+    (see the example below).
+  * A simple 'alerts' capability has been introduced to send email
+    health alerts for clusters deployed without the benefit of an
+    existing external monitoring infrastructure.
+  * Health alerts are now raised for recent Ceph daemon crashes.
+
+- *RADOS*:
+
+  * Objects can now be brought in sync during recovery by copying only
+    the modified portion of the object, reducing tail latencies during
+    recovery.
+  * The PG autoscaler feature introduced in Nautilus is enabled for
+    new pools by default, allowing new clusters to autotune *pg num*
+    without any user intervention. The default values for new pools
+    and RGW/CephFS metadata pools have also been adjusted to perform
+    well for most users.
+  * BlueStore has received several improvements and performance
+    updates, including improved accounting for "omap" (key/value)
+    object data by pool, improved cache memory management, and a
+    reduced allocation unit size for SSD devices.
+  * Snapshot trimming metadata is now managed in a more efficient and
+    scalable fashion.
+
+- *RBD* block storage:
+
+  * Clone operations now preserve the sparseness of the underlying RBD image.
+  * The trash feature has been improved to (optionally) automatically
+    move old parent images to the trash when their children are all
+    deleted or flattened.
+  * The ``rbd-nbd`` tool has been improved to use more modern kernel interfaces.
+  * Caching has been improved to be more efficient and performant.
+
+- *RGW* object storage:
+
+  * Multi-site replication can now be managed on a per-bucket basis (EXPERIMENTAL).
+  * WORM?
+  * bucket tagging?
+
+- *CephFS* distributed file system:
+
+  * ?
+
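+For example, a health alert can be muted for a period of time and
+unmuted again later; the alert code and duration below are illustrative
+only, and any code shown by ``ceph health detail`` can be used::
+
+  ceph health detail
+  ceph health mute OSD_DOWN 4h
+  ceph health unmute OSD_DOWN
+
+Recent daemon crash alerts can be reviewed and acknowledged with::
+
+  ceph crash ls-new
+  ceph crash archive-all
+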
+Upgrading from Mimic or Nautilus
+--------------------------------
+
+Notes
+~~~~~
+
+* You can monitor the progress of your upgrade at each stage with the
+  ``ceph versions`` command, which will tell you which Ceph version(s)
+  are running for each type of daemon.
+
+Instructions
+~~~~~~~~~~~~
+
+#. Make sure your cluster is stable and healthy (no down or
+   recovering OSDs). (Optional, but recommended.)
+
+#. Set the ``noout`` flag for the duration of the upgrade. (Optional,
+   but recommended.)::
+
+     # ceph osd set noout
+
+#. Upgrade monitors by installing the new packages and restarting the
+   monitor daemons. For example, on each monitor host::
+
+     # systemctl restart ceph-mon.target
+
+   Once all monitors are up, verify that the monitor upgrade is
+   complete by looking for the ``octopus`` string in the mon
+   map. The command::
+
+     # ceph mon dump | grep min_mon_release
+
+   should report::
+
+     min_mon_release 15 (octopus)
+
+   If it does not, one or more monitors have not been upgraded and
+   restarted, and/or the quorum does not include all monitors.
+
+#. Upgrade ``ceph-mgr`` daemons by installing the new packages and
+   restarting all manager daemons. For example, on each manager host::
+
+     # systemctl restart ceph-mgr.target
+
+   Verify the ``ceph-mgr`` daemons are running by checking ``ceph -s``::
+
+     # ceph -s
+
+     ...
+     services:
+       mon: 3 daemons, quorum foo,bar,baz
+       mgr: foo(active), standbys: bar, baz
+     ...
+
+#. Upgrade all OSDs by installing the new packages and restarting the
+   ceph-osd daemons on all OSD hosts::
+
+     # systemctl restart ceph-osd.target
+
+   You can monitor the progress of the OSD upgrades with the
+   ``ceph versions`` or ``ceph osd versions`` commands::
+
+     # ceph osd versions
+     {
+        "ceph version 13.2.5 (...) mimic (stable)": 12,
+        "ceph version 15.2.0 (...) octopus (stable)": 22,
+     }
+
+#. Upgrade all CephFS MDS daemons. For each CephFS file system:
+
+   #. Reduce the number of ranks to 1. (Make note of the original
+      number of MDS daemons first if you plan to restore it later.)::
+
+        # ceph status
+        # ceph fs set <fs_name> max_mds 1
+
+   #. Wait for the cluster to deactivate any non-zero ranks by
+      periodically checking the status::
+
+        # ceph status
+
+   #. Take all standby MDS daemons offline on the appropriate hosts with::
+
+        # systemctl stop ceph-mds@<daemon_name>
+
+   #. Confirm that only one MDS is online and is rank 0 for your FS::
+
+        # ceph status
+
+   #. Upgrade the last remaining MDS daemon by installing the new
+      packages and restarting the daemon::
+
+        # systemctl restart ceph-mds.target
+
+   #. Restart all standby MDS daemons that were taken offline::
+
+        # systemctl start ceph-mds.target
+
+   #. Restore the original value of ``max_mds`` for the volume::
+
+        # ceph fs set <fs_name> max_mds <original_max_mds>
+
+#. Upgrade all radosgw daemons by upgrading packages and restarting
+   daemons on all hosts::
+
+     # systemctl restart ceph-radosgw.target
+
+#. Complete the upgrade by disallowing pre-Octopus OSDs and enabling
+   all new Octopus-only functionality::
+
+     # ceph osd require-osd-release octopus
+
+#. If you set ``noout`` at the beginning, be sure to clear it with::
+
+     # ceph osd unset noout
+
+#. Verify the cluster is healthy with ``ceph health``.
+
+   If your CRUSH tunables are older than Hammer, Ceph will now issue a
+   health warning. If you see a health alert to that effect, you can
+   revert this change with::
+
+     ceph config set mon mon_crush_min_required_version firefly
+
+   If Ceph does not complain, however, then we recommend you also
+   switch any existing CRUSH buckets to straw2, which was added back
+   in the Hammer release. If you have any 'straw' buckets, this will
+   result in a modest amount of data movement, but generally nothing
+   too severe::
+
+     ceph osd getcrushmap -o backup-crushmap
+     ceph osd crush set-all-straw-buckets-to-straw2
+
+   If there are problems, you can easily revert with::
+
+     ceph osd setcrushmap -i backup-crushmap
+
+   Moving to 'straw2' buckets will unlock a few recent features, like
+   the `crush-compat` :ref:`balancer <balancer>` mode added back in Luminous.
+
+#. If you are upgrading from Mimic, or did not already do so when you
+   upgraded to Nautilus, we recommend you enable the new :ref:`v2
+   network protocol <msgr2>`. To do so, issue the following command::
+
+     ceph mon enable-msgr2
+
+   This will instruct all monitors that bind to the old default port
+   6789 for the legacy v1 protocol to also bind to the new 3300 v2
+   protocol port. To see if all monitors have been updated, run::
+
+     ceph mon dump
+
+   and verify that each monitor has both a ``v2:`` and ``v1:`` address
+   listed.
+
+#. Consider enabling the :ref:`telemetry module <telemetry>` to send
+   anonymized usage statistics and crash information to the Ceph
+   upstream developers. To see what would be reported (without actually
+   sending any information to anyone)::
+
+     ceph mgr module enable telemetry
+     ceph telemetry show
+
+   If you are comfortable with the data that is reported, you can opt in to
+   automatically report the high-level cluster metadata with::
+
+     ceph telemetry on
+
+   For more information about the telemetry module, see :ref:`the
+   documentation <telemetry>`.
+
+
+Upgrading from pre-Mimic releases (like Luminous)
+-------------------------------------------------
+
+You *must* first upgrade to Mimic (13.2.z) or Nautilus (14.2.z) before
+upgrading to Octopus.
+
+
+Upgrade compatibility notes
+---------------------------
+
+* The RGW "num_rados_handles" option has been removed.
+  If you were using a value of "num_rados_handles" greater than 1,
+  multiply your current "objecter_inflight_ops" and
+  "objecter_inflight_op_bytes" parameters by the old
+  "num_rados_handles" to get the same throttle behavior.
+
+* Ceph now packages python bindings for python3.6 instead of
+  python3.4, because python3 in EL7/EL8 is now using python3.6
+  as the native python3. See the announcement for more details
+  on the background of this change.
+
+* librbd now uses a write-around cache policy by default,
+  replacing the previous write-back cache policy default.
+  This cache policy allows librbd to immediately complete
+  write IOs while they are still in-flight to the OSDs.
+  Subsequent flush requests will ensure that all in-flight
+  write IOs are completed before the flush itself completes. The
+  librbd cache policy can be controlled via a new
+  "rbd_cache_policy" configuration option.
+
+* librbd now includes a simple IO scheduler which attempts to
+  batch together multiple IOs against the same backing RBD
+  data block object. The librbd IO scheduler policy can be
+  controlled via a new "rbd_io_scheduler" configuration
+  option.
+
+* RGW: radosgw-admin introduces two subcommands that allow the
+  management of expire-stale objects that might be left behind after a
+  bucket reshard in earlier versions of RGW. One subcommand lists such
+  objects and the other deletes them. Read the troubleshooting section
+  of the dynamic resharding docs for details.
+
+* RGW: Bucket naming restrictions have changed and are likely to cause
+  InvalidBucketName errors. We recommend setting the
+  ``rgw_relaxed_s3_bucket_names`` option to true as a workaround.
+
+* In the Zabbix Mgr Module there was a typo in the key being sent
+  to Zabbix for PGs in the backfill_wait state. The key that was sent
+  was 'wait_backfill' and the correct name is 'backfill_wait'.
+  Update your Zabbix template accordingly so that it accepts the
+  new key being sent to Zabbix.
+
+* The zabbix plugin for the ceph manager now includes OSD and pool
+  discovery. An update of zabbix_template.xml is needed
+  to receive per-pool (read/write throughput, diskspace usage)
+  and per-OSD (latency, status, pgs) statistics.
+
+* The format of all date + time stamps has been modified to fully
+  conform to ISO 8601. The old format (``YYYY-MM-DD
+  HH:MM:SS.ssssss``) excluded the ``T`` separator between the date and
+  time and was rendered using the local time zone without any explicit
+  indication. The new format includes the separator as well as a
+  ``+nnnn`` or ``-nnnn`` suffix to indicate the time zone, or a ``Z``
+  suffix if the time is UTC. For example,
+  ``2019-04-26T18:40:06.225953+0100``.
+
+  Any code or scripts that were previously parsing date and/or time
+  values from the JSON or XML structured CLI output should be checked
+  to ensure they can handle ISO 8601 conformant values. Any code
+  parsing date or time values from the unstructured human-readable
+  output should be modified to parse the structured output instead, as
+  the human-readable output may change without notice.
+
+* The ``bluestore_no_per_pool_stats_tolerance`` config option has been
+  replaced with ``bluestore_fsck_error_on_no_per_pool_stats``
+  (default: false). The overall default behavior has not changed:
+  fsck will warn but not fail on legacy stores, and repair will
+  convert to per-pool stats.
+
+* The disaster-recovery related 'ceph mon sync force' command has been
+  replaced with 'ceph daemon <...> sync_force'.
+
+* The ``osd_recovery_max_active`` option now has
+  ``osd_recovery_max_active_hdd`` and ``osd_recovery_max_active_ssd``
+  variants, each with different default values for HDD and SSD-backed
+  OSDs, respectively. ``osd_recovery_max_active`` itself now defaults
+  to zero, which means that the OSD will conditionally use
+  the HDD or SSD option values. Administrators who have customized
+  this value may want to consider whether they have set this to a
+  value similar to the new defaults (3 for HDDs and 10 for SSDs) and,
+  if so, remove the option from their configuration entirely.
+
+* Monitors now have a `ceph osd info` command that provides information
+  on all OSDs, or on specified OSDs, removing the need to parse
+  `osd dump` for the same information.
+
+* The structured output of ``ceph status`` or ``ceph -s`` is now more
+  concise, particularly the `mgrmap` and `monmap` sections, and the
+  structure of the `osdmap` section has been cleaned up.
+
+* A health warning is now generated if the average OSD heartbeat ping
+  time exceeds a configurable threshold for any of the intervals
+  computed. The OSD computes 1 minute, 5 minute and 15 minute
+  intervals with average, minimum and maximum values. The new
+  configuration option ``mon_warn_on_slow_ping_ratio`` specifies a
+  percentage of ``osd_heartbeat_grace`` to determine the threshold. A
+  value of zero disables the warning. The new configuration option
+  ``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides
+  the computed value and causes a warning when OSD heartbeat pings take
+  longer than the specified amount. The new admin command ``ceph daemon
+  mgr.# dump_osd_network [threshold]`` will list all
+  connections with a ping time longer than the specified threshold or
+  the value determined by the config options, for the average of any of
+  the 3 intervals. The new admin command ``ceph daemon osd.#
+  dump_osd_network [threshold]`` will do the same, but only include
+  heartbeats initiated by the specified OSD.
+
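+  For example, the warning threshold can be pinned to a fixed value and
+  the offending connections then inspected; the 1000 ms value and the
+  ``mgr.x`` and ``osd.0`` daemon names below are illustrative only::
+
+    ceph config set mon mon_warn_on_slow_ping_time 1000
+    ceph daemon mgr.x dump_osd_network 1000
+    ceph daemon osd.0 dump_osd_network 1000
+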
+* Inline data support for CephFS has been deprecated. When setting the flag,
+  users will see a warning to that effect, and enabling it now requires the
+  ``--yes-i-really-really-mean-it`` flag. If the MDS is started on a
+  filesystem that has it enabled, a health warning is generated. Support for
+  this feature will be removed in a future release.
+
+* ``ceph {set,unset} full`` is not supported anymore. We have been using
+  the ``full`` and ``nearfull`` flags in the OSD map to track the fullness
+  status of a cluster since the Hammer release; if the OSD map is marked
+  ``full``, all write operations are blocked until the flag is removed.
+  With the Infernalis release and the Linux kernel 4.7 client, we introduced
+  per-pool full/nearfull flags to track the status at a finer granularity,
+  so clients will hold write operations if either the cluster-wide ``full``
+  flag or the per-pool ``full`` flag is set. This was a compromise, as we
+  needed to support clusters with and without per-pool ``full`` flag
+  support, but it practically defeated the purpose of introducing the
+  per-pool flags. So, in the Mimic release, the new flags finally took the
+  place of their cluster-wide counterparts, as the monitor started removing
+  these two flags from the OSD map. Clients of Infernalis and up benefit
+  from this change, as they are no longer blocked by full pools that they
+  are not writing to. In this release, ``ceph {set,unset} full`` is now
+  considered an invalid command, and clients will continue honoring both
+  the cluster-wide and per-pool flags to remain backward compatible with
+  pre-Infernalis clusters.
+
+* The telemetry module now reports more information.
+
+  First, there is a new 'device' channel, enabled by default, that
+  will report anonymized hard disk and SSD health metrics to
+  telemetry.ceph.com in order to build and improve device failure
+  prediction algorithms. If you are not comfortable sharing device
+  metrics, you can disable that channel first before re-opting-in::
+
+    ceph config set mgr mgr/telemetry/channel_device false
+
+  Second, we now report more information about CephFS file systems,
+  including:
+
+  - how many MDS daemons (in total and per file system)
+  - which features are (or have been) enabled
+  - how many data pools
+  - approximate file system age (year + month of creation)
+  - how many files, bytes, and snapshots
+  - how much metadata is being cached
+
+  We have also added:
+
+  - which Ceph release the monitors are running
+  - whether msgr v1 or v2 addresses are used for the monitors
+  - whether IPv4 or IPv6 addresses are used for the monitors
+  - whether RADOS cache tiering is enabled (and which mode)
+  - whether pools are replicated or erasure coded, and
+    which erasure code profile plugin and parameters are in use
+  - how many hosts are in the cluster, and how many hosts have each type of daemon
+  - whether a separate OSD cluster network is being used
+  - how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
+  - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
+  - aggregate stats about the CRUSH map, like which algorithms are used, how
+    big buckets are, how many rules are defined, and what tunables are in
+    use
+
+  If you had telemetry enabled, you will need to re-opt-in with::
+
+    ceph telemetry on
+
+  You can view exactly what information will be reported first with::
+
+    ceph telemetry show        # see everything
+    ceph telemetry show basic  # basic cluster info (including all of the new info)
+
+* The following invalid settings are no longer tolerated by the
+  `ceph osd erasure-code-profile set xxx` command:
+
+  * an invalid `m` for the "reed_sol_r6_op" erasure technique
+  * an invalid `m` and an invalid `w` for the "liber8tion" erasure technique
+
+* A new OSD daemon command, dump_recovery_reservations, reveals the
+  recovery locks held (in_progress) and waiting in priority queues.
+
+* A new OSD daemon command, dump_scrub_reservations, reveals the
+  scrub reservations that are held for local (primary) and remote (replica) PGs.
+
+* Previously, ``ceph tell mgr ...`` could be used to call commands
+  implemented by mgr modules. This is no longer supported. Since
+  Luminous, using ``tell`` has not been necessary: those same commands
+  are also accessible without the ``tell mgr`` portion (e.g., ``ceph
+  tell mgr influx foo`` is the same as ``ceph influx foo``). ``ceph
+  tell mgr ...`` will now call admin commands, the same set of
+  commands accessible via ``ceph daemon ...`` when you are logged into
+  the appropriate host.
+
+* The ``ceph tell`` and ``ceph daemon`` commands have been unified,
+  such that all such commands are accessible via either interface.
+  Note that ceph-mgr tell commands are accessible via either ``ceph
+  tell mgr ...`` or ``ceph tell mgr.<id> ...``, and it is only
+  possible to send tell commands to the active daemon (the standbys do
+  not accept incoming connections over the network).
+
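+  For example, the same admin command can now be issued either over the
+  network with ``tell`` or through the local admin socket with
+  ``daemon``; the daemon name and command shown are illustrative::
+
+    ceph tell osd.0 dump_historic_ops
+    ceph daemon osd.0 dump_historic_ops
+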
+* Ceph will now issue a health warning if a RADOS pool has a ``pg_num``
+  value that is not a power of two. This can be fixed by adjusting
+  the pool to a nearby power of two::
+
+    ceph osd pool set <pool-name> pg_num <new-pg-num>
+
+  Alternatively, the warning can be silenced with::
+
+    ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
+
+* The format of MDSs in `ceph fs dump` has changed.
+
+* The ``mds_cache_size`` config option is completely removed. Since
+  Luminous, the ``mds_cache_memory_limit`` config option has been
+  preferred to configure the MDS's cache limits.
+
+* The ``pg_autoscale_mode`` is now set to ``on`` by default for newly
+  created pools, which means that Ceph will automatically manage the
+  number of PGs. To change this behavior, or to learn more about PG
+  autoscaling, see :ref:`pg-autoscaler`. Note that existing pools in
+  upgraded clusters will still be set to ``warn`` by default.
+
+* The ``upmap_max_iterations`` config option of mgr/balancer has been
+  renamed to ``upmap_max_optimizations`` to better match its behaviour.
+
+* The ``mClockClientQueue`` and ``mClockClassQueue`` OpQueue
+  implementations have been removed in favor of a single
+  ``mClockScheduler`` implementation of a simpler OSD interface.
+  Accordingly, the ``osd_op_queue_mclock*`` family of config options
+  has been removed in favor of the ``osd_mclock_scheduler*`` family
+  of options.
+
+* The config subsystem now searches dot ('.') delineated prefixes for
+  options. That means that for an entity like ``client.foo.bar``, its
+  overall configuration will be a combination of the global options,
+  ``client``, ``client.foo``, and ``client.foo.bar``. Previously,
+  only global, ``client``, and ``client.foo.bar`` options would apply.
+  This change may affect the configuration for clients that include a
+  ``.`` in their name.
+
+  Note that this only applies to configuration options in the
diff --git a/doc/releases/releases.yml b/doc/releases/releases.yml
index 3cd3eec3cf876..fbc32e7861ad1 100644
--- a/doc/releases/releases.yml
+++ b/doc/releases/releases.yml
@@ -12,6 +12,11 @@
 #   If a version might represent an actual number (e.g. 0.80) quote it.
 #
 releases:
+  octopus:
+    releases:
+      - version: 15.1.0
+        released: 2020-01-29
+        target_eol: 2021-06-01
   nautilus:
     releases:
       - version: 14.2.7