From: Zac Dover
Date: Fri, 30 Aug 2024 22:43:04 +0000 (+1000)
Subject: doc: Add Squid 19.2.0 release notes
X-Git-Tag: v20.0.0~952^2~11
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=f78f7950dfa9c15c53c89aa755e356c3369522bd;p=ceph.git

doc: Add Squid 19.2.0 release notes

Add release notes for the first stable release of Ceph Squid (19.2.0).

Signed-off-by: Zac Dover
---

diff --git a/doc/releases/index.rst b/doc/releases/index.rst
index 8a84e19489681..90b9e3168ee05 100644
--- a/doc/releases/index.rst
+++ b/doc/releases/index.rst
@@ -21,6 +21,7 @@ security fixes.
    :maxdepth: 1
    :hidden:
+   Squid (v19.2.*) <squid>
    Reef (v18.2.*) <reef>
    Quincy (v17.2.*) <quincy>
@@ -59,8 +60,8 @@ receive bug fixes or backports).
 Release timeline
 ----------------
-.. ceph_timeline_gantt:: releases.yml reef quincy
-.. ceph_timeline:: releases.yml reef quincy
+.. ceph_timeline_gantt:: releases.yml squid reef quincy
+.. ceph_timeline:: releases.yml squid reef quincy
 .. _Reef: reef
 .. _18.2.0: reef#v18-2-0-reef
diff --git a/doc/releases/releases.yml b/doc/releases/releases.yml
index d6a18389567a3..713483cdc266b 100644
--- a/doc/releases/releases.yml
+++ b/doc/releases/releases.yml
@@ -12,6 +12,12 @@
 # If a version might represent an actual number (e.g. 0.80) quote it.
 #
 releases:
+  squid:
+    target_eol: 2026-XX-XX
+    releases:
+      - version: 19.2.0
+        released: 2024-XX-XX
+
   reef:
     target_eol: 2025-08-01
     releases:
diff --git a/doc/releases/squid.rst b/doc/releases/squid.rst
new file mode 100644
index 0000000000000..8c2a65abea992
--- /dev/null
+++ b/doc/releases/squid.rst
@@ -0,0 +1,314 @@
+=====
+Squid
+=====
+
+Squid is the 19th stable release of Ceph.
+
+v19.2.0 Squid
+=============
+
+Ceph
+~~~~
+
+* ceph: a new `--daemon-output-file` switch is available for `ceph tell`
+  commands to dump output to a file local to the daemon. For commands which
+  produce large amounts of output, this avoids a potential spike in memory
+  usage on the daemon, allows for faster streaming writes to a file local to
+  the daemon, and reduces the time spent holding any locks required to execute
+  the command. For analysis, the file must be retrieved manually from the host
+  running the daemon. Currently, only ``--format=json|json-pretty`` are
+  supported.
+* ``cls_cxx_gather`` is marked as deprecated.
+* Tracing: The blkin tracing feature (see
+  https://docs.ceph.com/en/reef/dev/blkin/) is now deprecated in favor of
+  Opentracing
+  (https://docs.ceph.com/en/reef/dev/developer_guide/jaegertracing/) and will
+  be removed in a later release.
+* PG dump: The default output of ``ceph pg dump --format json`` has changed.
+  The default JSON format produces a rather massive output in large clusters
+  and isn't scalable, so the 'network_ping_times' section has been removed
+  from the output. Details in the tracker:
+  https://tracker.ceph.com/issues/57460
+
+CephFS
+~~~~~~
+
+* CephFS: The MDS now evicts clients which are not advancing their request
+  tids, because such clients cause a large buildup of session metadata,
+  resulting in the MDS going read-only due to the RADOS operation exceeding
+  the size threshold. The `mds_session_metadata_threshold` config option
+  controls the maximum size to which the (encoded) session metadata can grow.
+* CephFS: A new "mds last-seen" command is available for querying the last time
+  an MDS was in the FSMap, subject to a pruning threshold.
+* CephFS: For clusters with multiple CephFS file systems, all the snap-schedule
+  commands now expect the '--fs' argument, as shown in the example below.
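+
+  For instance, a minimal sketch of the new requirement (the file system name
+  ``cephfs2`` and the subvolume path below are illustrative assumptions, not
+  part of this release note)::
+
+    ceph fs snap-schedule status /volumes/_nogroup/sv1 --fs cephfs2
+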
+* CephFS: The period specifier ``m`` now implies minutes and the period
+  specifier ``M`` now implies months. This has been made consistent with the
+  rest of the system.
+* CephFS: Running the command "ceph fs authorize" for an existing entity now
+  upgrades the entity's capabilities instead of printing an error. It can now
+  also change read/write permissions in a capability that the entity already
+  holds. If the capability passed by the user is the same as one of the
+  capabilities that the entity already holds, idempotency is maintained.
+* CephFS: Two FS names can now be swapped, optionally along with their IDs,
+  using the "ceph fs swap" command. The function of this API is to facilitate
+  file system swaps for disaster recovery. In particular, it avoids situations
+  where a named file system is temporarily missing, which would prompt a
+  higher-level storage operator (like Rook) to recreate the missing file
+  system. See
+  https://docs.ceph.com/en/latest/cephfs/administration/#file-systems
+  for more information.
+* CephFS: Before running the command "ceph fs rename", the filesystem to be
+  renamed must be offline and the config "refuse_client_session" must be set
+  for it. The config "refuse_client_session" can be removed/unset and the
+  filesystem can be brought online again after the rename operation is
+  complete.
+* CephFS: Disallow delegating preallocated inode ranges to clients. The config
+  `mds_client_delegate_inos_pct` defaults to 0, which disables async dirops
+  in the kclient.
+* CephFS: MDS log trimming is now driven by a separate thread which tries to
+  trim the log every second (`mds_log_trim_upkeep_interval` config). Also, a
+  couple of configs govern how much time the MDS spends in trimming its logs.
+  These configs are `mds_log_trim_threshold` and `mds_log_trim_decay_rate`.
+* CephFS: Full support for subvolumes and subvolume groups is now available
+  for the snap_schedule Manager module.
+* CephFS: The `subvolume snapshot clone` command now depends on the config
+  option `snapshot_clone_no_wait`, which is used to reject the clone operation
+  when all the cloner threads are busy. This config option is enabled by
+  default, which means that if no cloner threads are free, the clone request
+  errors out with EAGAIN. The value of the config option can be fetched using
+  `ceph config get mgr mgr/volumes/snapshot_clone_no_wait`, and it can be
+  disabled using `ceph config set mgr mgr/volumes/snapshot_clone_no_wait
+  false`.
+* CephFS: Commands ``ceph mds fail`` and ``ceph fs fail`` now require a
+  confirmation flag when some MDSs exhibit the health warning MDS_TRIM or
+  MDS_CACHE_OVERSIZED. This is to prevent accidental MDS failover from causing
+  further delays in recovery.
+* CephFS: Fixes to the implementation of the ``root_squash`` mechanism enabled
+  via cephx ``mds`` caps on a client credential require a new client feature
+  bit, ``client_mds_auth_caps``. Clients using credentials with ``root_squash``
+  without this feature will trigger the MDS to raise a HEALTH_ERR on the
+  cluster, MDS_CLIENTS_BROKEN_ROOTSQUASH. See the documentation on this warning
+  and the new feature bit for more information.
+* CephFS: Expanded removexattr support for cephfs virtual extended attributes.
+  Previously one had to use setxattr to restore the default in order to
+  "remove". You may now properly use removexattr to remove. You can also now
+  remove the layout on the root inode, which restores the layout to the
+  default.
+* CephFS: cephfs-journal-tool is guarded against running on an online file
+  system.
+  The 'cephfs-journal-tool --rank <fs_name>:<mds_rank> journal reset'
+  and 'cephfs-journal-tool --rank <fs_name>:<mds_rank> journal reset --force'
+  commands require '--yes-i-really-really-mean-it'.
+* CephFS: The "ceph fs clone status" command will now print statistics about
+  clone progress in terms of how much data has been cloned (in both percentage
+  as well as bytes) and how many files have been cloned.
+* CephFS: The "ceph status" command will now print a progress bar when cloning
+  is ongoing. If there are more clone jobs than cloner threads, it will print
+  one more progress bar that shows the total progress made by both ongoing and
+  pending clones. Both progress bars are accompanied by messages that show the
+  number of clone jobs in the respective categories and the amount of progress
+  made by each of them.
+* cephfs-shell: The cephfs-shell utility is now packaged for RHEL 9 / CentOS 9
+  as the required python dependencies are now available in EPEL9.
+
+CephX
+~~~~~
+
+* cephx: key rotation is now possible using `ceph auth rotate`. Previously,
+  this was only possible by deleting and then recreating the key.
+
+Dashboard
+~~~~~~~~~
+
+* Dashboard: Rearranged Navigation Layout: The navigation layout has been
+  reorganized for improved usability and easier access to key features.
+* Dashboard: CephFS Improvements
+
+  * Support for managing CephFS snapshots and clones, as well as snapshot
+    schedule management
+  * Manage authorization capabilities for CephFS resources
+  * Helpers on mounting a CephFS volume
+
+* Dashboard: RGW Improvements
+
+  * Support for managing bucket policies
+  * Add/Remove bucket tags
+  * ACL Management
+  * Several UI/UX Improvements to the bucket form
+
+MGR
+~~~
+
+* MGR/REST: The REST manager module will trim requests based on the
+  'max_requests' option. Without this feature, and in the absence of manual
+  deletion of old requests, the accumulation of requests in the array can lead
+  to Out Of Memory (OOM) issues, resulting in the Manager crashing.
+
+Monitoring
+~~~~~~~~~~
+
+* Monitoring: Grafana dashboards are now loaded into the container at runtime
+  rather than being built into a custom Grafana image. Official Ceph Grafana
+  images can be found in quay.io/ceph/grafana.
+* Monitoring: RGW S3 Analytics: A new Grafana dashboard is now available,
+  enabling you to visualize per-bucket and per-user analytics data, including
+  total GETs, PUTs, Deletes, Copies, and list metrics.
+* The ``mon_cluster_log_file_level`` and ``mon_cluster_log_to_syslog_level``
+  options have been removed. Henceforth, users should use the new generic
+  option ``mon_cluster_log_level`` to control the cluster log level verbosity
+  for the cluster log file as well as for all external entities.
+
+RADOS
+~~~~~
+
+* RADOS: A ``POOL_APP_NOT_ENABLED`` health warning will now be reported if the
+  application is not enabled for the pool, irrespective of whether the pool is
+  in use or not. Always tag a pool with an application using the ``ceph osd
+  pool application enable`` command to avoid reporting of the
+  POOL_APP_NOT_ENABLED health warning for that pool. The user might temporarily
+  mute this warning using ``ceph health mute POOL_APP_NOT_ENABLED``.
+* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
+  due to being prone to false negative results. Its safer replacement is
+  `pool_is_in_selfmanaged_snaps_mode`.
+* RADOS: For bug 62338 (https://tracker.ceph.com/issues/62338), we did not
+  choose to condition the fix on a server flag in order to simplify
+  backporting.
+  As a result, in rare cases it may be possible for a PG to flip
+  between two acting sets while an upgrade to a version with the fix is in
+  progress. If you observe this behavior, you should be able to work around it
+  by completing the upgrade or by disabling async recovery by setting
+  osd_async_recovery_min_cost to a very large value on all OSDs until the
+  upgrade is complete: ``ceph config set osd osd_async_recovery_min_cost
+  1099511627776``
+* RADOS: A detailed version of the `balancer status` CLI command in the
+  balancer module is now available. Users may run `ceph balancer status detail`
+  to see more details about which PGs were updated in the balancer's last
+  optimization. See https://docs.ceph.com/en/latest/rados/operations/balancer/
+  for more information.
+* RADOS: Read balancing may now be managed automatically via the balancer
+  manager module. Users may choose between two new modes: ``upmap-read``, which
+  offers upmap and read optimization simultaneously, or ``read``, which may be
+  used to only optimize reads. For more detailed information see
+  https://docs.ceph.com/en/latest/rados/operations/read-balancer/#online-optimization.
+
+RBD
+~~~
+
+* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
+  fast-diff mode (`whole_object == true` with the `fast-diff` image feature
+  enabled and valid), diff-iterate is now guaranteed to execute locally if the
+  exclusive lock is available. This brings a dramatic performance improvement
+  for QEMU live disk synchronization and backup use cases.
+* RBD: The ``try-netlink`` mapping option for rbd-nbd has become the default
+  and is now deprecated. If the NBD netlink interface is not supported by the
+  kernel, then the mapping is retried using the legacy ioctl interface.
+* RBD: The option ``--image-id`` has been added to the `rbd children` CLI
+  command, so it can be run for images in the trash.
+* RBD: The `Image::access_timestamp` and `Image::modify_timestamp` Python APIs
+  now return timestamps in UTC.
+* RBD: Support for cloning from non-user type snapshots is added. This is
+  intended primarily as a building block for cloning new groups from group
+  snapshots created with the `rbd group snap create` command, but has also been
+  exposed via the new `--snap-id` option for the `rbd clone` command.
+* RBD: The output of the `rbd snap ls --all` command now includes the original
+  type for trashed snapshots.
+* RBD: The `RBD_IMAGE_OPTION_CLONE_FORMAT` option has been exposed in Python
+  bindings via the `clone_format` optional parameter to the `clone`,
+  `deep_copy` and `migration_prepare` methods.
+* RBD: The `RBD_IMAGE_OPTION_FLATTEN` option has been exposed in Python
+  bindings via the `flatten` optional parameter to the `deep_copy` and
+  `migration_prepare` methods.
+
+RGW
+~~~
+
+* RGW: GetObject and HeadObject requests now return an x-rgw-replicated-at
+  header for replicated objects. This timestamp can be compared against the
+  Last-Modified header to determine how long the object took to replicate.
+* RGW: S3 multipart uploads using Server-Side Encryption now replicate
+  correctly in multi-site. Previously, the replicas of such objects were
+  corrupted on decryption. A new tool, ``radosgw-admin bucket resync encrypted
+  multipart``, can be used to identify these original multipart uploads. The
+  ``LastModified`` timestamp of any identified object is incremented by 1ns to
+  cause peer zones to replicate it again.
+  For multi-site deployments that make
+  any use of Server-Side Encryption, we recommend running this command
+  against every bucket in every zone after all zones have upgraded.
+* RGW: Introducing a new data layout for the Topic metadata associated with S3
+  Bucket Notifications, where each Topic is stored as a separate RADOS object
+  and the bucket notification configuration is stored in a bucket attribute.
+  This new representation supports multisite replication via metadata sync and
+  can scale to many topics. This is on by default for new deployments, but is
+  not enabled by default on upgrade. Once all radosgws have upgraded (on all
+  zones in a multisite configuration), the ``notification_v2`` zone feature can
+  be enabled to migrate to the new format. See
+  https://docs.ceph.com/en/squid/radosgw/zone-features for details. The "v1"
+  format is now considered deprecated and may be removed after 2 major
+  releases.
+* RGW: New tools have been added to radosgw-admin for identifying and
+  correcting issues with versioned bucket indexes. Historical bugs in the
+  versioned bucket index transaction workflow made it possible for the index
+  to accumulate extraneous "book-keeping" olh entries and plain placeholder
+  entries. In some specific scenarios where clients made concurrent requests
+  referencing the same object key, it was likely that a lot of extra index
+  entries would accumulate. When a significant number of these entries are
+  present in a single bucket index shard, they can cause high bucket listing
+  latencies and lifecycle processing failures. To check whether a versioned
+  bucket has unnecessary olh entries, users can now run ``radosgw-admin
+  bucket check olh``. If the ``--fix`` flag is used, the extra entries will
+  be safely removed. A distinct issue from the one described above: it is
+  also possible for some versioned buckets to maintain extra unlinked
+  objects that are not listable from the S3/Swift APIs. These extra objects
+  are typically a result of PUT requests that exited abnormally, in the middle
+  of a bucket index transaction, so the client would not have received a
+  successful response. Bugs in prior releases made these unlinked objects easy
+  to reproduce with any PUT request that was made on a bucket that was actively
+  resharding. Besides the extra space that these hidden, unlinked objects
+  consume, there can be another side effect in certain scenarios, caused by
+  the nature of the failure mode that produced them, where a client of a bucket
+  that was a victim of this bug may find the object associated with the key to
+  be in an inconsistent state. To check whether a versioned bucket has unlinked
+  entries, users can now run ``radosgw-admin bucket check unlinked``. If the
+  ``--fix`` flag is used, the unlinked objects will be safely removed. Finally,
+  a third issue made it possible for versioned bucket index stats to be
+  accounted inaccurately. The tooling for recalculating versioned bucket stats
+  also had a bug, and was not previously capable of fixing these inaccuracies.
+  This release resolves those issues, and users can now expect the existing
+  ``radosgw-admin bucket check`` command to produce correct results. We
+  recommend that users with versioned buckets, especially those that existed
+  on prior releases, use these new tools to check whether their buckets are
+  affected and to clean them up accordingly, as sketched in the example below.
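+
+  A rough sketch of the checks described above (the bucket name ``mybucket``
+  and the use of ``--bucket`` to scope the check are illustrative assumptions,
+  not part of this release note)::
+
+    # report extraneous olh entries, then remove them
+    radosgw-admin bucket check olh --bucket=mybucket
+    radosgw-admin bucket check olh --bucket=mybucket --fix
+    # remove unlinked objects left behind by aborted PUTs
+    radosgw-admin bucket check unlinked --bucket=mybucket --fix
+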
+* RGW: The User Accounts feature unlocks several new AWS-compatible IAM APIs
+  for the self-service management of users, keys, groups, roles, policy and
+  more. Existing users can be adopted into new accounts. This process is
+  optional but irreversible. See https://docs.ceph.com/en/squid/radosgw/account
+  and https://docs.ceph.com/en/squid/radosgw/iam for details.
+* RGW: On startup, radosgw and radosgw-admin now validate the ``rgw_realm``
+  config option. Previously, they would ignore invalid or missing realms and go
+  on to load a zone/zonegroup in a different realm. If startup fails with a
+  "failed to load realm" error, fix or remove the ``rgw_realm`` option.
+* RGW: The radosgw-admin commands ``realm create`` and ``realm pull`` no longer
+  set the default realm without ``--default``.
+* RGW: Fixed an S3 Object Lock bug with PutObjectRetention requests that
+  specify a RetainUntilDate after the year 2106. This date was truncated to 32
+  bits when stored, so a much earlier date was used for object lock
+  enforcement. This does not affect PutBucketObjectLockConfiguration, where a
+  duration is given in Days. The RetainUntilDate encoding is fixed for new
+  PutObjectRetention requests, but cannot repair the dates of existing object
+  locks. Such objects can be identified with a HeadObject request based on the
+  x-amz-object-lock-retain-until-date response header.
+* S3 ``Get/HeadObject`` now supports the query parameter ``partNumber`` to read
+  a specific part of a completed multipart upload.
+* RGW: The SNS CreateTopic API now enforces the same topic naming requirements
+  as AWS: Topic names must be made up of only uppercase and lowercase ASCII
+  letters, numbers, underscores, and hyphens, and must be between 1 and 256
+  characters long.
+* RGW: Notification topics are now owned by the user that created them. By
+  default, only the owner can read/write their topics. Topic policy documents
+  are now supported to grant these permissions to other users. Preexisting
+  topics are treated as if they have no owner, and any user can read/write them
+  using the SNS API. If such a topic is recreated with CreateTopic, the
+  issuing user becomes the new owner. For backward compatibility, all users
+  still have permission to publish bucket notifications to topics owned by
+  other users. A new configuration parameter,
+  ``rgw_topic_require_publish_policy``, can be enabled to deny ``sns:Publish``
+  permissions unless explicitly granted by topic policy; see the example at the
+  end of this section.
+* RGW: Fixed an issue with persistent notifications so that changes made to a
+  topic's parameters while notifications are still in the queue are now
+  reflected when those notifications are delivered. For example, if a user sets
+  up a topic with an incorrect configuration (password/SSL) that causes
+  notification delivery to the broker to fail, the user can now correct the
+  topic attribute, and the new configuration will be used when delivery of the
+  queued notifications is retried.
+* RGW: In bucket notifications, the ``principalId`` inside ``ownerIdentity``
+  now contains the complete user ID, prefixed with the tenant ID.
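+
+A hedged example of the topic publish-policy control mentioned above (the
+``client.rgw`` config target is an assumption about how the option is scoped
+in a given deployment; adjust it to match your RGW daemons)::
+
+   # require an explicit topic policy grant for sns:Publish
+   ceph config set client.rgw rgw_topic_require_publish_policy true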