From: Rishabh Dave
Date: Fri, 18 Oct 2024 14:34:18 +0000 (+0530)
Subject: Merge pull request #59420 from rishabh-d-dave/max-mds-confirm
X-Git-Tag: v20.0.0~802
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=8978b85bb063870ba9257877143c5f56a729586b;p=ceph.git

Merge pull request #59420 from rishabh-d-dave/max-mds-confirm

mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

Reviewed-by: Patrick Donnelly
Reviewed-by: Venky Shankar
---

8978b85bb063870ba9257877143c5f56a729586b
diff --cc PendingReleaseNotes
index 1a4e26e747fb,c35924c6e869..8af2a262dff9
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@@ -12,20 -12,15 +12,28 @@@
   of the column showing the state of a group snapshot in the unformatted CLI
   output is changed from 'STATUS' to 'STATE'. The state of a group snapshot
   that was shown as 'ok' is now shown as 'complete', which is more descriptive.
 +* Based on tests performed at scale on an HDD-based Ceph cluster, it was found
 +  that scheduling with mClock was not optimal with multiple OSD shards. For
 +  example, in the test cluster with multiple OSD node failures, the client
 +  throughput was found to be inconsistent across test runs, coupled with
 +  multiple reported slow requests. However, the same test with a single OSD
 +  shard and multiple worker threads yielded significantly better results in
 +  terms of consistency of client and recovery throughput across multiple test
 +  runs. Therefore, as an interim measure until the issue with multiple OSD
 +  shards (or multiple mClock queues per OSD) is investigated and fixed, the
 +  following change to the default HDD OSD shard configuration is made:
 +  - osd_op_num_shards_hdd = 1 (was 5)
 +  - osd_op_num_threads_per_shard_hdd = 5 (was 1)
 +  For more details see https://tracker.ceph.com/issues/66289.
+ * CephFS: Modifying the FS setting variable "max_mds" when a cluster is
+   unhealthy now requires users to pass the confirmation flag
+   (--yes-i-really-mean-it). This has been added as a precaution to warn
+   users that modifying "max_mds" may not help with troubleshooting or
+   recovery efforts. Instead, it might further destabilize the cluster.
+
+
  >=19.0.0

  * cephx: key rotation is now possible using `ceph auth rotate`. Previously,
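The two behavior changes in the release notes above can be sketched as CLI interactions. This is an illustrative sketch, not output from the patched build: the filesystem name `cephfs` and the target `max_mds` value are placeholder examples, and the exact refusal message on an unhealthy cluster may differ.

```shell
# Changing max_mds on a healthy cluster works as before:
ceph fs set cephfs max_mds 2

# On an unhealthy cluster, the same command is now refused unless the
# confirmation flag from the release note is supplied:
ceph fs set cephfs max_mds 2 --yes-i-really-mean-it

# The interim HDD shard defaults can also be applied explicitly via the
# config database (values taken from the release note above):
ceph config set osd osd_op_num_shards_hdd 1
ceph config set osd osd_op_num_threads_per_shard_hdd 5
```

These commands require a running cluster with admin credentials; they are configuration actions rather than testable code.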