From: Rishabh Dave
Date: Fri, 18 Oct 2024 14:34:18 +0000 (+0530)
Subject: Merge pull request #59420 from rishabh-d-dave/max-mds-confirm
X-Git-Tag: v20.0.0~802
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=8978b85bb063870ba9257877143c5f56a729586b;p=ceph.git

Merge pull request #59420 from rishabh-d-dave/max-mds-confirm

mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

Reviewed-by: Patrick Donnelly
Reviewed-by: Venky Shankar
---

8978b85bb063870ba9257877143c5f56a729586b
diff --cc PendingReleaseNotes
index 1a4e26e747fb,c35924c6e869..8af2a262dff9
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@@ -12,20 -12,15 +12,28 @@@
   of the column showing the state of a group snapshot in the unformatted CLI
   output is changed from 'STATUS' to 'STATE'. The state of a group snapshot
   that was shown as 'ok' is now shown as 'complete', which is more descriptive.
 +* Based on tests performed at scale on an HDD-based Ceph cluster, it was found
 +  that scheduling with mClock was not optimal with multiple OSD shards. For
 +  example, in the test cluster with multiple OSD node failures, the client
 +  throughput was found to be inconsistent across test runs, coupled with
 +  multiple reported slow requests. However, the same test with a single OSD
 +  shard and multiple worker threads yielded significantly better results in
 +  terms of consistency of client and recovery throughput across multiple test
 +  runs. Therefore, as an interim measure until the issue with multiple OSD
 +  shards (or multiple mClock queues per OSD) is investigated and fixed, the
 +  following change to the default HDD OSD shard configuration is made:
 +  - osd_op_num_shards_hdd = 1 (was 5)
 +  - osd_op_num_threads_per_shard_hdd = 5 (was 1)
 +  For more details see https://tracker.ceph.com/issues/66289.
+ * CephFS: Modifying the FS setting variable "max_mds" when a cluster is
+   unhealthy now requires users to pass the confirmation flag
+   (--yes-i-really-mean-it). This has been added as a precaution to warn
+   users that modifying "max_mds" may not help with troubleshooting or
+   recovery efforts. Instead, it might further destabilize the cluster.
+
+
  >=19.0.0

  * cephx: key rotation is now possible using `ceph auth rotate`. Previously,
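The two behavior changes in the release notes above can be sketched as CLI interactions. This is an illustrative sketch, not output from the patched build: the filesystem name `cephfs` and the target `max_mds` value are placeholder examples, and the exact refusal message on an unhealthy cluster may differ.

```shell
# Changing max_mds on a healthy cluster works as before:
ceph fs set cephfs max_mds 2

# On an unhealthy cluster, the same command is now refused unless the
# confirmation flag from the release note is supplied:
ceph fs set cephfs max_mds 2 --yes-i-really-mean-it

# The interim HDD shard defaults can also be applied explicitly via the
# config database (values taken from the release note above):
ceph config set osd osd_op_num_shards_hdd 1
ceph config set osd osd_op_num_threads_per_shard_hdd 5
```

These commands require a running cluster with admin credentials; they are configuration actions rather than testable code.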