git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 years ago doc/glossary: update bluestore entry 51694/head
Zac Dover [Mon, 22 May 2023 21:41:09 +0000 (07:41 +1000)]
doc/glossary: update bluestore entry

Update the BlueStore entry in the glossary, explaining that as of Reef,
BlueStore (and not FileStore) is the only storage backend
for Ceph.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit bcee264276128f622c35e3aab81fdecb2b8afc10)

2 years ago Merge pull request #50979 from batrick/i59294
Yuri Weinstein [Mon, 22 May 2023 23:27:51 +0000 (19:27 -0400)]
Merge pull request #50979 from batrick/i59294

quincy: MgrMonitor: batch commit OSDMap and MgrMap mutations

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
2 years ago Merge pull request #50964 from ajarr/wip-58998-quincy
Yuri Weinstein [Mon, 22 May 2023 23:26:45 +0000 (19:26 -0400)]
Merge pull request #50964 from ajarr/wip-58998-quincy

quincy: mgr: store names of modules that register RADOS clients in the MgrMap

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2 years ago Merge pull request #50893 from cfsnyder/wip-59329-quincy
Yuri Weinstein [Mon, 22 May 2023 23:26:10 +0000 (19:26 -0400)]
Merge pull request #50893 from cfsnyder/wip-59329-quincy

quincy: kv/RocksDBStore: Add CompactOnDeletion support

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2 years ago Merge pull request #50693 from kamoltat/wip-ksirivad-backport-quincy-50334
Yuri Weinstein [Mon, 22 May 2023 23:25:27 +0000 (19:25 -0400)]
Merge pull request #50693 from kamoltat/wip-ksirivad-backport-quincy-50334

quincy: pybind/mgr/pg_autoscaler: Reordered if statement for the func: _maybe_adjust

Reviewed-by: Laura Flores <lflores@redhat.com>
2 years ago Merge pull request #50480 from ljflores/wip-58954-quincy
Yuri Weinstein [Mon, 22 May 2023 23:24:51 +0000 (19:24 -0400)]
Merge pull request #50480 from ljflores/wip-58954-quincy

quincy: mgr/telemetry: make sure histograms are formatted in `all` commands

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>
2 years ago Merge pull request #51620 from zdover23/wip-doc-2023-05-21-backport-51618-to-quincy
zdover23 [Mon, 22 May 2023 01:27:31 +0000 (11:27 +1000)]
Merge pull request #51620 from zdover23/wip-doc-2023-05-21-backport-51618-to-quincy

quincy: doc: Add missing `ceph` command in documentation section `REPLACING A…

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago doc: Add missing `ceph` command in documentation section `REPLACING AN OSD` 51620/head
Alexander Proschek [Sat, 20 May 2023 21:06:09 +0000 (14:06 -0700)]
doc: Add missing `ceph` command in documentation section `REPLACING AN OSD`

Signed-off-by: Alexander Proschek <alexander.proschek@protonmail.com>
(cherry picked from commit 0557d5e465556adba6d25db62a40ba55a5dd2400)

2 years ago Merge pull request #51596 from zdover23/wip-doc-2023-05-20-backport-51594-to-quincy
zdover23 [Fri, 19 May 2023 20:19:48 +0000 (06:19 +1000)]
Merge pull request #51596 from zdover23/wip-doc-2023-05-20-backport-51594-to-quincy

quincy: doc/rados: edit data-placement.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago doc/rados: edit data-placement.rst 51596/head
Zac Dover [Fri, 19 May 2023 16:26:45 +0000 (02:26 +1000)]
doc/rados: edit data-placement.rst

Edit doc/rados/data-placement.rst.

Co-authored-by: Cole Mitchell <cole.mitchell@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 32600c27c4dca6b9d5fae9892c0a1660b672781c)

2 years ago Merge pull request #51586 from zdover23/wip-doc-2023-05-19-backport-51580-to-quincy
Anthony D'Atri [Fri, 19 May 2023 12:13:53 +0000 (08:13 -0400)]
Merge pull request #51586 from zdover23/wip-doc-2023-05-19-backport-51580-to-quincy

quincy: doc/radosgw: explain multisite dynamic sharding

2 years ago doc/radosgw: explain multisite dynamic sharding 51586/head
Zac Dover [Thu, 18 May 2023 21:07:02 +0000 (07:07 +1000)]
doc/radosgw: explain multisite dynamic sharding

Add a note to doc/radosgw/dynamicresharding.rst and a note to
doc/radosgw/multisite.rst that explains that dynamic resharding is not
supported in releases prior to Reef.

This commit is made in response to a request from Mathias Chapelain.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit d4ed4223d914328361528990f89f1ee4acd30e79)

2 years ago Merge pull request #51577 from zdover23/wip-doc-2023-05-19-backport-51572-to-quincy
Anthony D'Atri [Thu, 18 May 2023 22:42:16 +0000 (18:42 -0400)]
Merge pull request #51577 from zdover23/wip-doc-2023-05-19-backport-51572-to-quincy

quincy: doc/rados: line-edit devices.rst

2 years ago doc/rados: line-edit devices.rst 51577/head
Zac Dover [Thu, 18 May 2023 14:13:41 +0000 (00:13 +1000)]
doc/rados: line-edit devices.rst

Edit doc/rados/operations/devices.rst.

Co-authored-by: Cole Mitchell <cole.mitchell@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 8d589b43d76a4e291c96c3750d068dba18eb9309)

2 years ago Merge pull request #51490 from zdover23/wip-doc-2023-05-16-backport-51485-to-quincy
zdover23 [Thu, 18 May 2023 14:50:20 +0000 (00:50 +1000)]
Merge pull request #51490 from zdover23/wip-doc-2023-05-16-backport-51485-to-quincy

quincy: doc/start/os-recommendations: drop 4.14 kernel and reword guidance

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago Merge pull request #51543 from zdover23/wip-doc-2023-05-18-backport-51534-to-quincy
Anthony D'Atri [Wed, 17 May 2023 22:44:15 +0000 (18:44 -0400)]
Merge pull request #51543 from zdover23/wip-doc-2023-05-18-backport-51534-to-quincy

quincy: doc/cephfs: line-edit "Mirroring Module"

2 years ago doc/cephfs: line-edit "Mirroring Module" 51543/head
Zac Dover [Wed, 17 May 2023 12:25:38 +0000 (22:25 +1000)]
doc/cephfs: line-edit "Mirroring Module"

Line-edit the "Mirroring Module" section of
doc/cephfs/cephfs-mirroring.rst. Add prompts and formatting where such
things contribute to the realization of adequate sentences.

This commit is a follow-up to https://github.com/ceph/ceph/pull/51505.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit dd8855d9a934bcdd6a026f1308ba7410b1e143e3)

2 years ago Merge pull request #51521 from zdover23/wip-doc-2023-05-17-backport-51505-to-quincy
Anthony D'Atri [Wed, 17 May 2023 12:56:56 +0000 (08:56 -0400)]
Merge pull request #51521 from zdover23/wip-doc-2023-05-17-backport-51505-to-quincy

quincy: doc: explain cephfs mirroring `peer_add` step in detail

2 years ago Merge pull request #51525 from aaSharma14/wip-61179-quincy
Nizamudeen A [Wed, 17 May 2023 12:30:18 +0000 (18:00 +0530)]
Merge pull request #51525 from aaSharma14/wip-61179-quincy

quincy: mgr/dashboard: fix regression caused by cephPgImbalance alert

Reviewed-by: Nizamudeen A <nia@redhat.com>
2 years ago mgr/dashboard: fix regression caused by cephPgImbalance alert 51525/head
Aashish Sharma [Mon, 8 May 2023 07:19:13 +0000 (12:49 +0530)]
mgr/dashboard: fix regression caused by cephPgImbalance alert

An earlier fix introduced a regression that prevents alerts from being
displayed in the active alerts tab. This PR fixes that issue.

Fixes: https://tracker.ceph.com/issues/59666
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit d0a1431fb836f1dd227df85f9e75e098edfdeac9)

2 years ago doc: explain cephfs mirroring `peer_add` step in detail 51521/head
Venky Shankar [Tue, 16 May 2023 05:25:34 +0000 (10:55 +0530)]
doc: explain cephfs mirroring `peer_add` step in detail

@zdover23 reached out regarding the missing explanation for the `peer_add`
step in the cephfs mirroring documentation. Add some explanation and an
example to make the step clear.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 6a6e887ff1f7f7d76db7f30f8410783b2f8153b0)

2 years ago Merge pull request #51503 from zdover23/wip-doc-2023-05-16-backport-51492-to-quincy
Anthony D'Atri [Wed, 17 May 2023 00:49:59 +0000 (20:49 -0400)]
Merge pull request #51503 from zdover23/wip-doc-2023-05-16-backport-51492-to-quincy

quincy: doc/start: KRBD feature flag support note

2 years ago doc/start: KRBD feature flag support note 51503/head
Zac Dover [Mon, 15 May 2023 17:04:43 +0000 (03:04 +1000)]
doc/start: KRBD feature flag support note

Add KRBD feature flag support note to doc/start/os-recommendations.rst.

This change was suggested by Anthony D'Atri in https://github.com/ceph/ceph/pull/51485.

Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 2a619dba2d22749e6facaf8dd0d370e16a1672c4)

2 years ago Merge pull request #51478 from zdover23/wip-doc-2023-05-15-backport-51473-to-quincy
Anthony D'Atri [Mon, 15 May 2023 16:51:46 +0000 (12:51 -0400)]
Merge pull request #51478 from zdover23/wip-doc-2023-05-15-backport-51473-to-quincy

quincy: doc/rados: edit devices.rst

2 years ago doc/start/os-recommendations: drop 4.14 kernel and reword guidance 51490/head
Ilya Dryomov [Fri, 12 May 2023 11:55:32 +0000 (13:55 +0200)]
doc/start/os-recommendations: drop 4.14 kernel and reword guidance

The 4.14 LTS kernel has less than a year of maintenance left, so
drop it.

Also, the current wording with an explicit list of kernels tends to go
stale: it's missing the latest 6.1 LTS kernel.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b697f96f3872f178f024e85799445386204a96e1)

2 years ago doc/rados: edit devices.rst 51478/head
Zac Dover [Mon, 15 May 2023 01:01:19 +0000 (11:01 +1000)]
doc/rados: edit devices.rst

Line-edit doc/rados/operations/devices.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 8321b457a25a4394439f908c500091ce30e0736a)

2 years ago Merge pull request #51470 from zdover23/wip-doc-2023-05-14-backport-51175-to-quincy
Anthony D'Atri [Sun, 14 May 2023 11:07:08 +0000 (07:07 -0400)]
Merge pull request #51470 from zdover23/wip-doc-2023-05-14-backport-51175-to-quincy

quincy: doc: add link to "documenting ceph" to index.rst

2 years ago doc: add link to "documenting ceph" to index.rst 51470/head
Zac Dover [Fri, 21 Apr 2023 20:59:04 +0000 (22:59 +0200)]
doc: add link to "documenting ceph" to index.rst

Add a link to the landing page of docs.ceph.com to direct documentation
contributors to documentation-related information.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 155a382cb2e8b80dca260ca7abdc3cc89c805edb)

2 years ago Merge pull request #51466 from zdover23/wip-doc-2023-05-13-backport-51463-to-quincy
zdover23 [Sat, 13 May 2023 14:18:44 +0000 (00:18 +1000)]
Merge pull request #51466 from zdover23/wip-doc-2023-05-13-backport-51463-to-quincy

quincy: doc/cephfs: edit fs-volumes.rst (1 of x)

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago doc/cephfs: edit fs-volumes.rst (1 of x) 51466/head
Zac Dover [Fri, 12 May 2023 15:49:14 +0000 (01:49 +1000)]
doc/cephfs: edit fs-volumes.rst (1 of x)

Edit the syntax of the English language in the file
doc/cephfs/fs-volumes.rst up to (but not including) the section called
"FS Subvolumes".

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit a1184070a1a3d2f6c1462c62f88fe70df5626c36)

2 years ago Merge pull request #51459 from zdover23/wip-doc-2023-05-12-backport-51458-to-quincy
zdover23 [Fri, 12 May 2023 13:28:55 +0000 (23:28 +1000)]
Merge pull request #51459 from zdover23/wip-doc-2023-05-12-backport-51458-to-quincy

quincy: doc/cephfs: rectify prompts in fs-volumes.rst

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago doc/cephfs: rectify prompts in fs-volumes.rst 51459/head
Zac Dover [Fri, 12 May 2023 10:35:25 +0000 (20:35 +1000)]
doc/cephfs: rectify prompts in fs-volumes.rst

Make sure all prompts are unselectable. This PR is meant to be
backported to Reef, Quincy, and Pacific, to get all of the prompts into
a fit state so that a line-edit can be performed on the English language
in this file.

Follows https://github.com/ceph/ceph/pull/51427.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 1f88f10fe6d2069d3d474fe490e69a809afb1f56)

2 years ago Merge pull request #51435 from zdover23/wip-doc-2023-05-11-backport-51427-to-quincy
zdover23 [Fri, 12 May 2023 12:43:04 +0000 (22:43 +1000)]
Merge pull request #51435 from zdover23/wip-doc-2023-05-11-backport-51427-to-quincy

quincy: doc/cephfs: fix prompts in fs-volumes.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 years ago doc/cephfs: fix prompts in fs-volumes.rst 51435/head
Zac Dover [Wed, 10 May 2023 14:52:50 +0000 (00:52 +1000)]
doc/cephfs: fix prompts in fs-volumes.rst

Fixed a regression introduced in
e5355e3d66e1438d51de6b57eae79fab47cd0184 that broke the unselectable
prompts in the RST.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit e019948783adf41207d70e8cd2540d335e07b80b)

2 years ago Merge pull request #51372 from zdover23/wip-doc-2023-05-06-backport-51359-to-quincy
zdover23 [Wed, 10 May 2023 13:23:14 +0000 (23:23 +1000)]
Merge pull request #51372 from zdover23/wip-doc-2023-05-06-backport-51359-to-quincy

quincy: doc/cephfs: repairing inaccessible FSes

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago Merge pull request #51420 from zdover23/wip-doc-2023-05-10-backport-51403-to-quincy
Anthony D'Atri [Wed, 10 May 2023 12:24:07 +0000 (08:24 -0400)]
Merge pull request #51420 from zdover23/wip-doc-2023-05-10-backport-51403-to-quincy

quincy: doc/start: fix "Planet Ceph" link

2 years ago doc/start: fix "Planet Ceph" link 51420/head
Zac Dover [Tue, 9 May 2023 03:39:10 +0000 (13:39 +1000)]
doc/start: fix "Planet Ceph" link

Fix a link to Planet Ceph on the doc/start/get-involved.rst page.

Reported 2023 Apr 21, here:
https://pad.ceph.com/p/Report_Documentation_Bugs

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 67ebc206648144e533b627b9c22f29695764b26b)

2 years ago Merge pull request #51398 from zdover23/wip-doc-2023-05-09-backport-51394-to-quincy
Anthony D'Atri [Tue, 9 May 2023 08:48:53 +0000 (04:48 -0400)]
Merge pull request #51398 from zdover23/wip-doc-2023-05-09-backport-51394-to-quincy

quincy: doc/dev/encoding.txt: update per std::optional

2 years ago Merge pull request #51401 from zdover23/wip-doc-2023-05-09-backport-51392-to-quincy
Anthony D'Atri [Tue, 9 May 2023 08:37:46 +0000 (04:37 -0400)]
Merge pull request #51401 from zdover23/wip-doc-2023-05-09-backport-51392-to-quincy

quincy: doc: update multisite doc

2 years ago doc: update multisite doc 51401/head
parth-gr [Mon, 8 May 2023 13:53:29 +0000 (19:23 +0530)]
doc: update multisite doc

The command for getting the zonegroup was spelled incorrectly;
updated it to `radosgw-admin`.

Signed-off-by: parth-gr <paarora@redhat.com>
(cherry picked from commit edab93b2f15b19f05a86aab499ba11b56135aaf3)

2 years ago doc/dev/encoding.txt: update per std::optional 51398/head
Radoslaw Zarzynski [Mon, 8 May 2023 14:41:22 +0000 (14:41 +0000)]
doc/dev/encoding.txt: update per std::optional

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 622829cebcca8ae4ec6f0463a4d74c909998a92d)

2 years ago Merge pull request #49973 from sseshasa/wip-quincy-fix-mclk-rec-backfill-cost
Neha Ojha [Mon, 8 May 2023 18:49:22 +0000 (11:49 -0700)]
Merge pull request #49973 from sseshasa/wip-quincy-fix-mclk-rec-backfill-cost

quincy: osd: mClock recovery/backfill cost fixes

Reviewed-by: Samuel Just <sjust@redhat.com>
2 years ago Merge pull request #51390 from zdover23/wip-doc-2023-05-08-backport-51387-to-quincy
zdover23 [Mon, 8 May 2023 13:37:13 +0000 (23:37 +1000)]
Merge pull request #51390 from zdover23/wip-doc-2023-05-08-backport-51387-to-quincy

quincy: doc/rados: stretch-mode.rst (other commands)

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2 years ago doc/rados: stretch-mode.rst (other commands) 51390/head
Zac Dover [Mon, 8 May 2023 11:08:49 +0000 (21:08 +1000)]
doc/rados: stretch-mode.rst (other commands)

Edit the "Other Commands" section of
doc/rados/operations/stretch-mode.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit fde33f1a5b8dbd03c096140887e04038a82f3076)

2 years ago Merge pull request #51367 from rhcs-dashboard/quincy-crud
Nizamudeen A [Mon, 8 May 2023 10:45:07 +0000 (16:15 +0530)]
Merge pull request #51367 from rhcs-dashboard/quincy-crud

quincy: mgr/dashboard CRUD component backport

Reviewed-by: Pegonzal <NOT@FOUND>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2 years ago qa/: Override mClock profile to 'high_recovery_ops' for qa tests 49973/head
Sridhar Seshasayee [Sat, 29 Apr 2023 04:58:04 +0000 (10:28 +0530)]
qa/: Override mClock profile to 'high_recovery_ops' for qa tests

The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.

Therefore, an mClock profile that optimizes background operations is a
better fit for qa-related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the Rados suite,
as it fits the requirement.

Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.

A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago doc/: Modify mClock configuration documentation to reflect profile changes
Sridhar Seshasayee [Tue, 11 Apr 2023 17:57:05 +0000 (23:27 +0530)]
doc/: Modify mClock configuration documentation to reflect profile changes

Modify the relevant documentation to reflect:

- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs
Sridhar Seshasayee [Tue, 11 Apr 2023 16:48:51 +0000 (22:18 +0530)]
common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs

The osd_mclock_max_sequential_bandwidth_ssd is changed to 1200 MiB/s as
a reasonable middle ground considering the broad range of SSD capabilities.
This allows mClock's cost model to extract the SSD's capability
depending on the cost of the IO being performed.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/: Retain the default osd_max_backfills limit at 1 for mClock
Sridhar Seshasayee [Tue, 11 Apr 2023 16:28:35 +0000 (21:58 +0530)]
osd/: Retain the default osd_max_backfills limit at 1 for mClock

The earlier limit of 3 was still aggressive enough to have an impact on
the client and other competing operations. Retain the current default
for mClock. This can be modified if necessary after setting the
osd_mclock_override_recovery_settings option.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago common/options/osd.yaml.in: change mclock profile default to balanced
Samuel Just [Tue, 11 Apr 2023 15:15:38 +0000 (08:15 -0700)]
common/options/osd.yaml.in: change mclock profile default to balanced

Let's use the middle profile as the default.
Modify the standalone tests accordingly.

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd/scheduler/mClockScheduler: avoid limits for recovery
Samuel Just [Tue, 11 Apr 2023 15:10:04 +0000 (08:10 -0700)]
osd/scheduler/mClockScheduler: avoid limits for recovery

Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add counters for ops delayed due to degraded|unreadable target
Samuel Just [Mon, 10 Apr 2023 21:18:49 +0000 (14:18 -0700)]
osd/: add counters for ops delayed due to degraded|unreadable target

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add counters for queue latency for PGRecovery[Context]
Samuel Just [Thu, 6 Apr 2023 21:15:02 +0000 (14:15 -0700)]
osd/: add counters for queue latency for PGRecovery[Context]

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add per-op latency averages for each recovery related message
Samuel Just [Thu, 6 Apr 2023 20:50:48 +0000 (20:50 +0000)]
osd/: add per-op latency averages for each recovery related message

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: differentiate priority for PGRecovery[Context]
Samuel Just [Thu, 6 Apr 2023 07:04:05 +0000 (00:04 -0700)]
osd/: differentiate priority for PGRecovery[Context]

PGs with degraded objects should be higher priority.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages
Samuel Just [Thu, 6 Apr 2023 05:57:48 +0000 (22:57 -0700)]
osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages

Otherwise, these end up as PGOpItem and therefore as immediate:

class PGOpItem : public PGOpQueueable {
...
  op_scheduler_class get_scheduler_class() const final {
    auto type = op->get_req()->get_type();
    if (type == CEPH_MSG_OSD_OP ||
        type == CEPH_MSG_OSD_BACKOFF) {
      return op_scheduler_class::client;
    } else {
      return op_scheduler_class::immediate;
    }
  }
...
};

This was probably causing a bunch of extra interference with client
ops.

Signed-off-by: Samuel Just <sjust@redhat.com>
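The reclassification above can be sketched as follows (a hypothetical Python model for illustration only; the real logic is the C++ get_scheduler_class() quoted above combined with is_recovery_msg):

```python
# Message types named in the commit title, now treated as recovery work
# rather than falling through to the "immediate" class.
RECOVERY_MSGS = {
    "MSG_OSD_PG_BACKFILL",
    "MSG_OSD_PG_BACKFILL_REMOVE",
    "MSG_OSD_PG_SCAN",
}

def scheduler_class(msg_type: str) -> str:
    # Client ops keep the client class, as in PGOpItem::get_scheduler_class().
    if msg_type in ("CEPH_MSG_OSD_OP", "CEPH_MSG_OSD_BACKOFF"):
        return "client"
    # After this change, backfill/scan messages are scheduled as recovery
    # instead of bypassing mClock via the immediate queue.
    if msg_type in RECOVERY_MSGS:
        return "background_recovery"
    return "immediate"
```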
2 years ago osd/: differentiate scheduler class for undersized/degraded vs data movement
Samuel Just [Thu, 6 Apr 2023 05:57:42 +0000 (22:57 -0700)]
osd/: differentiate scheduler class for undersized/degraded vs data movement

Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg
Samuel Just [Thu, 6 Apr 2023 04:30:18 +0000 (04:30 +0000)]
osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg
Samuel Just [Thu, 6 Apr 2023 04:23:23 +0000 (04:23 +0000)]
osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/: move get_recovery_op_priority into PeeringState next to get_*_priority
Samuel Just [Thu, 6 Apr 2023 03:45:19 +0000 (03:45 +0000)]
osd/: move get_recovery_op_priority into PeeringState next to get_*_priority

Consolidate methods governing recovery scheduling in PeeringState.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: simplify qos specific params in OpSchedulerItem
Samuel Just [Tue, 4 Apr 2023 23:34:17 +0000 (23:34 +0000)]
osd/scheduler: simplify qos specific params in OpSchedulerItem

is_qos_item() was only used in operator<< for OpSchedulerItem.  However,
it's actually useful to see priority for mclock items since it affects
whether it goes into the immediate queues and, for some types, the
class.  Unconditionally display both class_id and priority.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op
Samuel Just [Tue, 4 Apr 2023 23:22:59 +0000 (23:22 +0000)]
osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery
Samuel Just [Tue, 4 Apr 2023 23:13:41 +0000 (23:13 +0000)]
osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery

Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery
Samuel Just [Tue, 4 Apr 2023 23:05:56 +0000 (23:05 +0000)]
osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery

Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago PeeringState::clamp_recovery_priority: use std::clamp
Samuel Just [Mon, 3 Apr 2023 20:31:46 +0000 (13:31 -0700)]
PeeringState::clamp_recovery_priority: use std::clamp

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago doc: Modify mClock configuration documentation to reflect new cost model
Sridhar Seshasayee [Sat, 25 Mar 2023 07:16:09 +0000 (12:46 +0530)]
doc: Modify mClock configuration documentation to reflect new cost model

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years ago osd: Retain overridden mClock recovery settings across osd restarts
Sridhar Seshasayee [Tue, 21 Feb 2023 13:01:32 +0000 (18:31 +0530)]
osd: Retain overridden mClock recovery settings across osd restarts

Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.

For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence was incorrectly resetting the backfill value to the
mclock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.

Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
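The boot-up guard described above amounts to a single comparison; a minimal Python sketch (hypothetical function name, not the actual C++ implementation):

```python
def reserver_limit_on_boot(configured: int, mclock_default: int) -> int:
    """Value loaded into the async local/remote reservers at OSD boot.

    Before the fix, boot-up unconditionally reset the limit to the mClock
    default, losing any operator override. The fix leaves an overridden
    value untouched.
    """
    if configured != mclock_default:
        return configured  # operator override survives the restart
    return mclock_default
```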
2 years ago osd: Set default max active recovery and backfill limits for mClock
Sridhar Seshasayee [Mon, 20 Mar 2023 13:24:57 +0000 (18:54 +0530)]
osd: Set default max active recovery and backfill limits for mClock

Client ops are sensitive to the recovery load and must be carefully
set for osds whose underlying device is HDD. Tests revealed that
recoveries with osd_max_backfills = 10 and osd_recovery_max_active_hdd = 5
were still aggressive and overwhelmed client ops. The built-in defaults
for mClock are now set to:

    1) osd_recovery_max_active_hdd = 3
    2) osd_recovery_max_active_ssd = 10
    3) osd_max_backfills = 3

The above may be modified if necessary by setting
osd_mclock_override_recovery_settings option.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
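The new built-in defaults listed above can be summarized as a config sketch (illustrative Python mapping; the actual options live in the OSD configuration):

```python
# mClock built-in recovery/backfill defaults from this change.
MCLOCK_RECOVERY_DEFAULTS = {
    "osd_recovery_max_active_hdd": 3,   # was 5, too aggressive for HDD clients
    "osd_recovery_max_active_ssd": 10,
    "osd_max_backfills": 3,             # was 10
}
```

These may be modified after setting the osd_mclock_override_recovery_settings option, as the commit notes.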
2 years ago osd/scheduler/mClockScheduler: make is_rotational const
Sridhar Seshasayee [Wed, 29 Mar 2023 19:33:08 +0000 (01:03 +0530)]
osd/scheduler/mClockScheduler: make is_rotational const

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd/scheduler/mClockScheduler: simplify profile handling
Sridhar Seshasayee [Wed, 29 Mar 2023 19:31:29 +0000 (01:01 +0530)]
osd/scheduler/mClockScheduler: simplify profile handling

Previously, setting default configs from the configured profile was
split across:
- enable_mclock_profile_settings
- set_mclock_profile - sets mclock_profile class member
- set_*_allocations - updates client_allocs class member
- set_profile_config - sets profile based on client_allocs class member

This made tracing the effect of changing the profile pretty challenging
due to passing state through class member variables.

Instead, define a simple profile_t with three constexpr values
corresponding to the three profiles and handle it all in a single
set_config_defaults_from_profile() method.

Signed-off-by: Samuel Just <sjust@redhat.com>
2 years ago osd: Modify mClock scheduler's cost model to represent cost in bytes
Sridhar Seshasayee [Thu, 9 Feb 2023 15:35:22 +0000 (21:05 +0530)]
osd: Modify mClock scheduler's cost model to represent cost in bytes

The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.

The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in secs, are inaccurate and therefore removed.

The new model considers the following aspects of an osd to calculate
the cost of an IO:

 - osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
   The measured random write IOPS at 4 KiB block size. This is
   measured during OSD boot-up using the OSD bench tool.
 - osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
   The maximum sequential bandwidth of the underlying device.
   For HDDs, 150 MiB/s is considered, and for SSDs 750 MiB/s is
   considered in the cost calculation.

The following important changes are made to arrive at the overall
cost of an IO,

1. Represent QoS reservation and limit config parameter as proportion:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard, which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.

2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth to
the max random write IOPS of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to represent the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost().

3. Cost calculation in Bytes:
The settings for reservation and limit in terms of a fraction of the OSD's
maximum IOPS capacity are converted to Bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard shown below:

    (res|lim)  = (IOPS proportion) * osd_bandwidth_capacity_per_shard
    (Bytes/sec)   (unitless)             (bytes/sec)

The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().

The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.

4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.

5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.

Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
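The arithmetic in points 2 and 3 can be sketched as follows (simplified Python for illustration; per-shard handling and the mClock tag calculation are elided):

```python
def bandwidth_cost_per_io(max_seq_bw_bytes: float, max_rand_iops: float) -> float:
    # Constant base cost of an IO, in bytes: ratio of the OSD's max
    # sequential bandwidth to its measured max random write IOPS.
    return max_seq_bw_bytes / max_rand_iops

def scaled_cost(io_size_bytes: int, base_cost: float) -> float:
    # Overall cost of an IO, in bytes: base cost plus the IO's actual size
    # (compare mClockScheduler::calc_scaled_cost()).
    return base_cost + io_size_bytes

def qos_bytes_per_sec(proportion: float, capacity_per_shard_bytes: float) -> float:
    # Reservation/limit, given as a proportion of IOPS capacity, converted
    # to bytes/sec per OSD op shard before updating ClientInfo.
    return proportion * capacity_per_shard_bytes
```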
2 years ago osd: update PGRecovery queue item cost to reflect object size
Sridhar Seshasayee [Fri, 3 Feb 2023 12:23:06 +0000 (17:53 +0530)]
osd: update PGRecovery queue item cost to reflect object size

Previously, we used a static value of osd_recovery_cost (20M
by default) for PGRecovery. For pools with relatively small
objects, this causes mclock to backfill very slowly, as
20M massively overestimates the amount of IO each recovery
queue operation requires. Instead, add a cost_per_object
parameter to OSDService::awaiting_throttle and set it to the
average object size in the PG being queued.
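
The cost change can be sketched as follows (a minimal Python
illustration; the helper name and the fallback behaviour are assumptions,
the real logic lives in the C++ recovery queue):

```python
# Sketch: derive a PGRecovery queue-item cost from the PG's average
# object size instead of the fixed osd_recovery_cost (20M by default).
# Names and the empty-PG fallback here are illustrative only.

OSD_RECOVERY_COST = 20 * 1024 * 1024  # legacy static cost in bytes

def recovery_item_cost(pg_num_bytes, pg_num_objects):
    """Average object size of the PG, falling back to the static cost."""
    if pg_num_objects == 0:
        return OSD_RECOVERY_COST
    return pg_num_bytes // pg_num_objects

# A PG full of small objects now yields a proportionally small cost, so
# mClock no longer treats every recovery op as a 20M IO.
cost = recovery_item_cost(64 * 1024 * 1024, 4096)  # 16 KiB per object
```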

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years agoosd: update OSDService::queue_recovery_context to specify cost
Sridhar Seshasayee [Fri, 3 Feb 2023 12:17:38 +0000 (17:47 +0530)]
osd: update OSDService::queue_recovery_context to specify cost

Previously, we always queued this with cost osd_recovery_cost which
defaults to 20M. With mclock, this caused these items to be delayed
heavily. Instead, base the cost on the operation queued.

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years agoosd/osd_types: use appropriate cost value for PullOp
Sridhar Seshasayee [Fri, 3 Feb 2023 12:12:46 +0000 (17:42 +0530)]
osd/osd_types: use appropriate cost value for PullOp

See included comments -- previous values did not account for object
size. This causes problems for mclock, which is much more strict
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58607
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years agoosd/osd_types: use appropriate cost value for PushReplyOp
Sridhar Seshasayee [Thu, 2 Feb 2023 12:16:27 +0000 (17:46 +0530)]
osd/osd_types: use appropriate cost value for PushReplyOp

See included comments -- previous values did not account for object
size. This causes problems for mclock, which is much more strict
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2 years agomgr/dashboard: Edit ceph authx users 51367/head
Pedro Gonzalez Gomez [Mon, 20 Feb 2023 13:37:00 +0000 (14:37 +0100)]
mgr/dashboard: Edit ceph authx users

Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 8177a748bd831568417df5c687109fbbbd9b981d)
(cherry picked from commit bc73c1aec686282547d6b920ef7ef239d0231f40)

2 years agoMerge pull request #51378 from zdover23/wip-doc-2023-05-07-backport-51322-to-quincy
Anthony D'Atri [Sun, 7 May 2023 10:37:39 +0000 (06:37 -0400)]
Merge pull request #51378 from zdover23/wip-doc-2023-05-07-backport-51322-to-quincy

quincy: doc/rados: stretch-mode: stretch cluster issues

2 years agodoc/rados: stretch-mode: stretch cluster issues 51378/head
Zac Dover [Wed, 3 May 2023 05:16:07 +0000 (15:16 +1000)]
doc/rados: stretch-mode: stretch cluster issues

Edit "Stretch Cluster Issues", which might better be called "Netsplits"
or "Recognizing Netsplits".

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 6c1baffb85556120672b45cce89b93a20e7b09a2)

2 years agodoc/cephfs: repairing inaccessible FSes 51372/head
Zac Dover [Fri, 5 May 2023 06:35:28 +0000 (16:35 +1000)]
doc/cephfs: repairing inaccessible FSes

Add a procedure to doc/cephfs/troubleshooting.rst that explains how to
restore access to FileSystems that became inaccessible after
post-Nautilus upgrades. The procedure included here was written by Harry
G Coin, and merely lightly edited by me. I include him here as a
"co-author", but it should be noted that he did the heavy lifting on
this.

See the email thread here for more context:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/HS5FD3QFR77NAKJ43M2T5ZC25UYXFLNW/

Co-authored-by: Harry G Coin <hgcoin@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2 years agomgr/dashboard: import/export authx users
Pere Diaz Bou [Thu, 6 Apr 2023 14:24:03 +0000 (16:24 +0200)]
mgr/dashboard: import/export authx users

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/59486
(cherry picked from commit 62d762f6965c5b8585d223c06cb23071a856cfcb)
(cherry picked from commit 8a883896ff9a006f778f49d583dd22b30aa4ce2b)

2 years agomgr/dashboard: delete-ceph-authx
Pedro Gonzalez Gomez [Thu, 6 Apr 2023 14:18:41 +0000 (16:18 +0200)]
mgr/dashboard: delete-ceph-authx

Fixes: https://tracker.ceph.com/issues/59365
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
(cherry picked from commit 6b5a00fb8e8b9a72d9308a069763dd86e9ecd153)
(cherry picked from commit c40ca918f3ef29b5738fd5e5e252f2df3976a5b5)

2 years agomgr/dashboard: rgw role creation form
Pere Diaz Bou [Thu, 2 Mar 2023 12:17:25 +0000 (13:17 +0100)]
mgr/dashboard: rgw role creation form

Fixes: https://tracker.ceph.com/issues/59187
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit bd0eb20c673d54b9be3440decc0f3a1449153385)
(cherry picked from commit c2e730b6afda85124a74ef6613a7d72adb60f672)

2 years agomgr/dashboard: replace ajsf with formly
Pere Diaz Bou [Mon, 6 Mar 2023 19:32:24 +0000 (20:32 +0100)]
mgr/dashboard: replace ajsf with formly

The ajsf JSON schema library for Angular doesn't seem to be actively
maintained. Formly, by contrast, is a well-maintained replacement with
extras built in, such as validators, support for JSON schemas, and
custom components.

Textareas weren't supported in ajsf, so it also made sense to move to
this dependency.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 2c43dd0c16e3cc3b3eada03ed11958a689cc4bcd)
(cherry picked from commit 768cfbcfe7d937fc34e8afde68974c04569ba962)

2 years agomgr/dashboard: rgw role listing
Pere Diaz Bou [Mon, 23 Jan 2023 11:34:12 +0000 (12:34 +0100)]
mgr/dashboard: rgw role listing

Listing is performed using the radosgw-admin command API available
through the mgr, for now, until the S3 API is fixed: https://tracker.ceph.com/issues/58547.

This commit fixes an issue in the _crud.py controller where
redefining `CRUDClassMetadata` caused the users table and the
roles table to share columns. We fixed this by creating
CRUDClassMetadata dynamically for each endpoint.

The issue described above is linked to a pitfall of NamedTuple, where
nested list defaults are risky because they can cause unexpected
issues when two or more instances are created. Moreover, NamedTuples
are read-only, making initialization even harder without factory
methods such as those dataclasses provide. Therefore, let's move to
the good old __init__ :).
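
The NamedTuple pitfall described above can be illustrated with a minimal
sketch (a hypothetical class, not the actual dashboard code):

```python
from typing import List, NamedTuple

# A NamedTuple default list is evaluated once at class definition, so
# every instance that relies on the default shares the same list object.
class CRUDMeta(NamedTuple):
    columns: List[str] = []

users = CRUDMeta()
roles = CRUDMeta()
users.columns.append("name")

# The mutation through one instance is visible through the other, which
# is how two tables can end up sharing columns.
assert roles.columns == ["name"]
assert users.columns is roles.columns
```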

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/58699
(cherry picked from commit 07e07b8d8a39e8cec3cce30e9cdd5439cc9b2906)

2 years agomgr/dashboard: create authx users
Pere Diaz Bou [Wed, 24 Aug 2022 17:28:38 +0000 (19:28 +0200)]
mgr/dashboard: create authx users

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
Co-authored-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 10f17bd9eb379c8a15d0e8b76179e374ed92a87d)

2 years agomgr/dashboard: add backend-driven UI tables
Ernesto Puerta [Tue, 2 Nov 2021 12:03:19 +0000 (13:03 +0100)]
mgr/dashboard: add backend-driven UI tables

As an example, this PR displays the list of Ceph Auth users, keys and
caps.

It tries to minimize the amount of UI code required by feeding a generic
table-like component (RestTable) with backend-generated JSON data.

This is just a proof of concept, and there's a lot of room for improvement.

Fixes: https://tracker.ceph.com/issues/52701
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit f6c88f4c30a8de8b03aaa96b2e8916943eab5f80)

2 years agoMerge pull request #51252 from rhcs-dashboard/fix-pg-imbalancy-quincy
Nizamudeen A [Fri, 5 May 2023 15:19:51 +0000 (20:49 +0530)]
Merge pull request #51252 from rhcs-dashboard/fix-pg-imbalancy-quincy

quincy: mgr/dashboard: fix CephPGImbalance alert

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
2 years agoMerge pull request #51358 from rhcs-dashboard/wip-59655-quincy
Nizamudeen A [Fri, 5 May 2023 09:45:45 +0000 (15:15 +0530)]
Merge pull request #51358 from rhcs-dashboard/wip-59655-quincy

quincy: mgr/dashboard: bump moment from 2.29.3 to 2.29.4 in /src/pybind/mgr/dashboard/frontend

Reviewed-by: Pegonzal <NOT@FOUND>
2 years agomgr/dashboard: bump moment in /src/pybind/mgr/dashboard/frontend 51358/head
dependabot[bot] [Wed, 6 Jul 2022 19:10:21 +0000 (19:10 +0000)]
mgr/dashboard: bump moment in /src/pybind/mgr/dashboard/frontend

Bumps [moment](https://github.com/moment/moment) from 2.29.3 to 2.29.4.
- [Release notes](https://github.com/moment/moment/releases)
- [Changelog](https://github.com/moment/moment/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/moment/moment/compare/2.29.3...2.29.4)

---
updated-dependencies:
- dependency-name: moment
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
(cherry picked from commit 9e8245e328e56755f13bffbec4b0740850696f94)

2 years agoMerge pull request #51149 from rhcs-dashboard/wip-59466-quincy
Nizamudeen A [Fri, 5 May 2023 05:27:31 +0000 (10:57 +0530)]
Merge pull request #51149 from rhcs-dashboard/wip-59466-quincy

quincy: mgr/dashboard: skip Create OSDs step in Cluster expansion

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
2 years agoMerge pull request #51112 from rhcs-dashboard/wip-59459-quincy
Nizamudeen A [Fri, 5 May 2023 05:26:16 +0000 (10:56 +0530)]
Merge pull request #51112 from rhcs-dashboard/wip-59459-quincy

quincy: mgr/dashboard: expose more grafana configs in service form

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2 years agoMerge pull request #51350 from zdover23/wip-doc-2023-05-05-backport-51348-to-quincy
Anthony D'Atri [Fri, 5 May 2023 03:10:35 +0000 (23:10 -0400)]
Merge pull request #51350 from zdover23/wip-doc-2023-05-05-backport-51348-to-quincy

quincy: doc: Use `ceph osd crush tree` command to display weight set weights

2 years agomon/MgrMonitor: plug PAXOS for batched MgrMap/OSDMap 50979/head
Patrick Donnelly [Mon, 6 Mar 2023 18:21:51 +0000 (13:21 -0500)]
mon/MgrMonitor: plug PAXOS for batched MgrMap/OSDMap

Plugging PAXOS has the effect of batching map updates into a single
PAXOS transaction. Since we're updating the OSDMap several times and the
MgrMap, plug PAXOS for efficiency. This also has the nice effect of
reducing any delay between the active mgr getting dropped and the
blocklisting of its clients. This doesn't resolve any race condition as
the two maps are never processed in one unit. So the former active
manager may process the OSDMap blocklists before learning it is dropped
from the MgrMap.

Fixes: https://tracker.ceph.com/issues/58923
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2e057bbf9ed4934443fb78e8cdd588aa2100969c)

2 years agomgr: force propose whenever the active changes
Patrick Donnelly [Wed, 8 Mar 2023 19:54:04 +0000 (14:54 -0500)]
mgr: force propose whenever the active changes

The race fixed by 23c3f7 exists wherever we drop the active mgr.
Resolve this by forcing immediate proposal (circumventing any delays)
whenever the active is dropped.

Fixes: 23c3f76018b446fb77bbd71fdd33bddfbae9e06d
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 30b20d3c13be532e5508dc65276e90c7973c0c51)

2 years agomon: fix semantic error prepare_update return
Patrick Donnelly [Mon, 13 Mar 2023 14:25:55 +0000 (10:25 -0400)]
mon: fix semantic error prepare_update return

The return value is used to indicate whether the pending state should be
committed. There is no concept of "handled message" here (unlike
preprocess_query).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8caa1fd5abe0fd16e579e10a1ad45a4147994862)

2 years agomon/MgrMonitor: do not propose again for "mgr fail"
Kefu Chai [Sat, 27 Aug 2022 15:46:00 +0000 (23:46 +0800)]
mon/MgrMonitor: do not propose again for "mgr fail"

In 23c3f76018b446fb77bbd71fdd33bddfbae9e06d, the change to fail the mgr
is proposed immediately, but `MgrMonitor::prepare_command()` still
returns `true` in this case. Its indirect caller,
`PaxosService::dispatch()`, takes this as a sign that it needs to
propose the change with `propose_pending()`. But the pending change has
already been proposed by `MgrMonitor::prepare_command()`, and
`have_pending` is also cleared by that call. As we don't allow
consecutive paxos proposals, the second `propose_pending()` call is
delayed by a configured latency. When the timer fires, this postponed
call finds itself trying to propose nothing: the change to fail the mgr
has already been proposed. That's why we hit the
`ceph_assert(have_pending)` assertion failures.

With this change, the second proposal is no longer made when the
proposal was already proposed immediately. This should avoid the
assertion failure.

This change should address the regression introduced by
23c3f76018b446fb77bbd71fdd33bddfbae9e06d.
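
The failure mode described above can be sketched in a few lines
(illustrative Python pseudologic; the class and method names only
loosely mirror the C++ `PaxosService`/`MgrMonitor` interaction):

```python
# Sketch: if prepare_command() already proposed immediately, dispatch()
# must not propose again, or the second (delayed) propose_pending()
# fires with nothing pending and trips ceph_assert(have_pending).

class PaxosServiceSketch:
    def __init__(self):
        self.have_pending = False

    def propose_pending(self):
        assert self.have_pending  # mirrors ceph_assert(have_pending)
        self.have_pending = False

    def prepare_command(self, fail_mgr):
        self.have_pending = True
        if fail_mgr:
            self.propose_pending()  # "mgr fail": proposed immediately
            return False            # the fix: tell dispatch not to propose
        return True

    def dispatch(self, fail_mgr):
        if self.prepare_command(fail_mgr):
            self.propose_pending()  # normal, possibly delayed, proposal

svc = PaxosServiceSketch()
svc.dispatch(fail_mgr=True)   # before the fix, this path asserted
svc.dispatch(fail_mgr=False)  # normal path still proposes once
```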

Fixes: https://tracker.ceph.com/issues/56850
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 5b1c6ad4967196cb97afd8c04848b13ee5a198f0)

2 years agomon: fix a race between `mgr fail` and MgrMonitor::prepare_beacon()
Radosław Zarzyński [Mon, 16 May 2022 23:41:40 +0000 (01:41 +0200)]
mon: fix a race between `mgr fail` and MgrMonitor::prepare_beacon()

There is a race condition between the `mgr fail` handling and `mgrbeacon` processing.

```diff
diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc
index 8ada44e2628..9000b2e0687 100644
--- a/src/mon/MgrMonitor.cc
+++ b/src/mon/MgrMonitor.cc
@@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op)
     }

     if (changed && pending_map.active_gid == 0) {
+      dout(5) << "========== changed and active_state == 0" << dendl;
       promote_standby();
+      dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl;
     }
   } else if (prefix == "mgr module enable") {
     string module;
```

```
2022-05-17T00:11:19.602+0200 7f6bd5769700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1
...
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 prepare_command done, r=0
...
2022-05-17T00:11:19.630+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 )
```

```cpp
bool MgrMonitor::prepare_beacon(MonOpRequestRef op)
  if (pending_map.active_gid == m->get_gid()) {
    // ...
  } else if (pending_map.active_gid == 0) {
    // There is no currently active daemon, select this one.
    if (pending_map.standbys.count(m->get_gid())) {
      drop_standby(m->get_gid(), false);
    }
    dout(4) << "selecting new active " << m->get_gid()
            << " " << m->get_name()
            << " (was " << pending_map.active_gid << " "
            << pending_map.active_name << ")" << dendl;
    pending_map.active_gid = m->get_gid();
    pending_map.active_name = m->get_name();
    pending_map.active_change = ceph_clock_now()
```

Version `25` of the `MgrMap`, when handled at `mgr.x`, doesn't trigger the `respawn()` path:

```
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map received map epoch 25
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map active in map: 1 active is 4210
2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17
2022-05-17T00:10:11.197+0200 7fa3d6613700  5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active
```

Fixes: https://tracker.ceph.com/issues/55711
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
(cherry picked from commit 23c3f76018b446fb77bbd71fdd33bddfbae9e06d)

2 years agodoc: Use `ceph osd crush tree` command to display weight set weights 51350/head
James Lakin [Thu, 4 May 2023 17:02:36 +0000 (18:02 +0100)]
doc: Use `ceph osd crush tree` command to display weight set weights

The previously used `ceph osd tree` command doesn't show pool-defined weight sets as the documentation above suggests.

Signed-off-by: James Lakin <james@jameslakin.co.uk>
(cherry picked from commit 15c3d72a43a37798de823b26f1429f7776f67aaa)

2 years agoMerge pull request #51325 from rhcs-dashboard/wip-59623-quincy
Nizamudeen A [Thu, 4 May 2023 15:29:09 +0000 (20:59 +0530)]
Merge pull request #51325 from rhcs-dashboard/wip-59623-quincy

quincy: mgr/dashboard: fix the rbd mirroring configure check

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
2 years agoMerge pull request #51338 from zdover23/wip-doc-2023-05-04-backport-51292-to-quincy
Anthony D'Atri [Thu, 4 May 2023 02:19:12 +0000 (22:19 -0400)]
Merge pull request #51338 from zdover23/wip-doc-2023-05-04-backport-51292-to-quincy

quincy: doc/rados: edit stretch-mode.rst