Adam King [Wed, 15 Mar 2023 17:55:26 +0000 (13:55 -0400)]
cephadm: handle exceptions applying secondary services during bootstrap
Otherwise we risk hitting a mismatch between the cephadm binary version
and the container image version we're bootstrapping with, resulting in
bootstrap failing. An example is in the tracker.
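A minimal sketch of the guard this describes (function and parameter names
are illustrative, not cephadm's actual code):

```python
def apply_secondary_services(specs, apply_fn, logger):
    """Apply each service spec; log failures instead of aborting bootstrap."""
    for spec in specs:
        try:
            apply_fn(spec)
        except Exception:
            # e.g. a cephadm-binary / container-image version mismatch;
            # bootstrap continues and the spec can be re-applied later
            logger.exception('failed to apply service spec %s', spec)
```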
Filestore is no longer supported in cephadm, and both the doc [1] and the
DriveGroupValidation [2] raise an exception if it is used. This patch
removes the legacy code that produced filestore-related ceph-volume
commands.
Adam King [Fri, 3 Mar 2023 20:31:03 +0000 (15:31 -0500)]
mgr/prometheus: remove dependency on cephadm module
https://github.com/ceph/ceph/commit/f967ac061ebee362cdc82c458e955da75a9045e9
introduced an import of something from the cephadm module
into the prometheus module. This seems to break the prometheus
module in some non-cephadm setups. For example, the ceph-ansible
CI hit:
failed: [mgr0 -> mon0] (item=prometheus) => changed=true
ansible_loop_var: item
cmd:
- ceph
- -n
- client.admin
- -k
- /etc/ceph/ceph.client.admin.keyring
- --cluster
- ceph
- mgr
- module
- enable
- prometheus
delta: '0:00:00.389965'
end: '2023-03-03 15:30:07.631308'
item: prometheus
rc: 2
start: '2023-03-03 15:30:07.241343'
stderr: 'Error ENOENT: module ''prometheus'' reports that it cannot run on the active manager daemon: No module named ''cephadm'' (pass --force to force enablement)'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
So we need to be a bit more careful with this import and make sure the
prometheus module works fine without cephadm.
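A sketch of the kind of defensive import this implies (the import path and
helper name are hypothetical, not the actual change):

```python
# guard the optional import so the module loads without cephadm installed
try:
    from cephadm import some_helper  # hypothetical; only present with cephadm
except ImportError:
    some_helper = None

def collect_orchestrator_metrics():
    # degrade gracefully instead of failing module load / enablement
    if some_helper is None:
        return []
    return some_helper()
```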
Adam King [Wed, 15 Feb 2023 22:07:09 +0000 (17:07 -0500)]
mgr/cephadm: be aware of host's shortname and FQDN
The idea is to gather the shortname and FQDN as part
of gather-facts. Then, whenever we check whether a certain
host is in our internal inventory by hostname, we can also
check these other known names. This should avoid issues where
we think a hostname specified by FQDN is not in our
inventory because we know the host by its shortname,
or vice versa.
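A sketch of the lookup this enables (the field names are assumptions, not
the actual gather-facts schema):

```python
def host_in_inventory(name, all_host_facts):
    """True if `name` matches any name we know for an inventory host."""
    for facts in all_host_facts:
        if name in (facts['hostname'], facts['shortname'], facts['fqdn']):
            return True
    return False

facts = [{'hostname': 'node1', 'shortname': 'node1',
          'fqdn': 'node1.example.com'}]
assert host_in_inventory('node1.example.com', facts)  # FQDN now matches
assert host_in_inventory('node1', facts)              # and so does shortname
```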
John Mulligan [Mon, 27 Feb 2023 19:38:50 +0000 (14:38 -0500)]
cephadm: fix timeout argument to call function
The timeout argument to the call function, used for executing
subprocesses, did not function; this patch makes the timeout work as
(probably) intended. Use the `process.communicate()` method rather than
the `tee` functions to handle IO collection. Since no logging is done
until after the exit code is known, the tee calls are not necessary. Add
calls to kill the child process when the timeout occurs. This helps
prevent event loop "leaks" that generate Python warnings.
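A minimal sketch of the fixed approach, assuming an asyncio-based call
function (this is not the actual cephadm code):

```python
import asyncio

async def call(cmd, timeout=None):
    """Run cmd, returning (stdout, stderr, returncode); sketch only."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() collects all output; no tee helpers needed since
        # nothing is logged until the exit code is known
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # kill the child on timeout...
        await proc.wait()  # ...and reap it to avoid event loop "leaks"
        raise
    return out.decode(), err.decode(), proc.returncode

# usage: asyncio.run(call(['echo', 'hi'], timeout=5))
```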
John Mulligan [Thu, 23 Feb 2023 19:51:13 +0000 (14:51 -0500)]
cephadm/tests: add initial test coverage for call function
The call function provides the ability to run subprocesses and log their
output, and it accepts an optional timeout parameter. This timeout
parameter does not appear to function correctly today, so we make use of
pytest.param/pytest.mark.xfail to mark these cases as already known to
fail.
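Roughly the pattern in use, with subprocess.run standing in for cephadm's
call function (an illustrative stand-in, not the real test):

```python
import subprocess
import pytest

@pytest.mark.parametrize(
    'cmd,timeout',
    [
        (['true'], None),
        # timeout cases marked xfail while the bug is outstanding
        pytest.param(['sleep', '10'], 1, marks=pytest.mark.xfail),
    ],
)
def test_call_timeout(cmd, timeout):
    # stand-in for calling cephadm's call(); the xfail param is the point
    subprocess.run(cmd, timeout=timeout, check=True)
```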
John Mulligan [Wed, 22 Feb 2023 18:57:21 +0000 (13:57 -0500)]
cephadm: disable coverage for some compatibility blocks
This change disables reporting of missing coverage for blocks that
contain code copied from newer Python versions and that exist only to
make those functions available on older Python versions.
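Such compatibility blocks typically look like the following generic
example (not the exact cephadm code), with the shim branch excluded from
coverage via a pragma:

```python
import sys

if sys.version_info >= (3, 9):
    from functools import cache
else:  # pragma: no cover
    # copy of the newer stdlib helper for older Pythons; coverage
    # reporting skips this branch since CI runs a single interpreter
    import functools

    def cache(func):
        return functools.lru_cache(maxsize=None)(func)
```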
drive_group: fix limit filter in drive_selection.selector
When multiple OSD service specs with a 'limit' filter are applied, the
current logic makes the second service spec try to pick devices that are
already used by the first service spec.
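In effect, the fix makes selection skip devices another spec has already
claimed before applying the limit; a toy model (not the actual
drive_selection code):

```python
def select_with_limit(devices, claimed, limit):
    free = [d for d in devices if d not in claimed]  # skip already-used disks
    chosen = free[:limit]
    claimed.update(chosen)
    return chosen

claimed = set()
disks = ['/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde']
first = select_with_limit(disks, claimed, 2)   # ['/dev/sdb', '/dev/sdc']
second = select_with_limit(disks, claimed, 2)  # ['/dev/sdd', '/dev/sde']
assert not set(first) & set(second)            # specs no longer overlap
```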
doc/start: edit first 150 lines of documenting-ceph
Edit the first 150 lines of doc/start/documenting-ceph.rst. This is part
of an initiative to harvest the fruits of Cephalocon 2023, at which
documentation proved to be in demand to a surprising degree.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit dd37f94aa4f1de947b1eaf5d82cc529925f5823e)
Conrad Hoffmann [Wed, 22 Mar 2023 22:03:57 +0000 (23:03 +0100)]
doc: account for PG autoscaling being the default
The current documentation tries really hard to convince people to set
both `osd_pool_default_pg_num` and `osd_pool_default_pgp_num` in their
configs, but at least the latter has undesirable side effects on any
Ceph version that has PG autoscaling enabled by default (at least quincy
and beyond).
Assume a cluster with defaults of `64` for `pg_num` and `pgp_num`.
Starting `radosgw` will fail as it tries to create various pools without
providing values for `pg_num` or `pgp_num`. This triggers the following
in `OSDMonitor::prepare_new_pool()`:
- `pg_num` is set to `1`, because autoscaling is enabled
- `pgp_num` is set to `osd pool default pgp_num`, which we set to `64`
- This is an invalid setup, so the pool creation fails
Likewise, `ceph osd pool create mypool` (without providing values for
`pg_num` or `pgp_num`) does not work.
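A toy Python model of the interaction described above (names and defaults
are illustrative, not Ceph's actual implementation):

```python
def prepare_new_pool(pg_num=None, pgp_num=None, autoscale=True,
                     default_pg_num=64, default_pgp_num=64):
    if pg_num is None:
        pg_num = 1 if autoscale else default_pg_num   # autoscaler starts at 1
    if pgp_num is None:
        pgp_num = default_pgp_num                     # config default wins
    if pgp_num > pg_num:
        raise ValueError(f'pgp_num {pgp_num} > pg_num {pg_num}: invalid')
    return pg_num, pgp_num

# with osd_pool_default_pgp_num=64 set, a plain pool create fails:
#   prepare_new_pool()  ->  ValueError: pgp_num 64 > pg_num 1: invalid
```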
Following this rationale:
- Not providing a default value for `pgp_num` will always do the right
thing, unless you use advanced features, in which case you can be
expected to set both values on pool creation
- Setting `osd_pool_default_pgp_num` in your config breaks pool creation
for various cases
This commit:
- Removes `osd_pool_default_pgp_num` from all example configs
- Adds mentions of autoscaling and how it interacts with the default
values in various places
For each file that was touched, the following maintenance was also
performed:
- Change internal spaces to underscores for config values
- Remove mentions of filestore or any of its settings
- Fix minor inconsistencies, like indentation etc.
There is also a ticket which I think is very relevant and fixed by this,
though it only captures part of the broader issue addressed here:
qa/suites/rbd: install qemu-utils in addition to qemu-block-extra on Ubuntu
qemu-utils is usually pre-installed but, due to what appears to be
an Ubuntu packaging bug, it's not upgraded when qemu-block-extra is
installed:
The following NEW packages will be installed:
qemu-block-extra
The following packages will be upgraded:
qemu-system-common qemu-system-data qemu-system-gui qemu-system-x86
However, the version of the block driver must exactly match the version
of the qemu-img tool, so the above leads to:
$ qemu-img convert -f qcow2 -O raw /home/ubuntu/cephtest/qemu/base.client.0.0.qcow2 rbd:rbd/client.0.0
Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
Note: only modules from the same build can be loaded.
qemu: module block-block-rbd not found, do you want to install qemu-block-extra package?
qemu-img: Unknown protocol 'rbd'
```
error: /var/cache/dnf/baseos-00fe51d07def85f0/packages/kernel-core-4.18.0-483.el8.x86_64.rpm: signature hdr data: BAD, no. of bytes(459772) out of range
```
ceph-volume tests are failing and OSDs never get up and running. For
some reason, updating the OS early in the testing workflow addresses the
issue in the CI.
Remove confusing parentheses from doc/rados/operations/monitoring-osd-pg.rst
and add in their place a clearer hyphen (actually an em-dash, or at least
it is intended to be an em-dash).
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 0c965c18d0e6ab1461b5fad42d481f25e4207940)
Ilya Dryomov [Tue, 28 Mar 2023 18:03:05 +0000 (20:03 +0200)]
librbd: avoid generating ESHUTDOWN in ManagedLock
EBLOCKLISTED has a very special meaning but happens to be an alias for
ESHUTDOWN. If the client gets blocklisted, we always want to propagate
EBLOCKLISTED error code since it's generated by the OSD.
For the ManagedLock use case of indicating that an operation on the lock
raced with lock shutdown, meaning that a higher-level request can simply
be restarted, ERESTART should do.
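For context, a sketch of the collision (errno values as defined on Linux;
the EBLOCKLISTED alias is described above, the rest is illustrative):

```python
import errno

ESHUTDOWN = errno.ESHUTDOWN   # 108 on Linux
EBLOCKLISTED = ESHUTDOWN      # Ceph's alias: same numeric value
ERESTART = errno.ERESTART     # 85 on Linux; distinct, so "retry" is clear

# a locally generated ESHUTDOWN would be indistinguishable from a genuine
# OSD-generated EBLOCKLISTED, while ERESTART cannot be confused with it
assert EBLOCKLISTED != ERESTART
```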
Ilya Dryomov [Tue, 28 Mar 2023 17:52:42 +0000 (19:52 +0200)]
librbd: fix recursive locking on owner_lock in ImageDispatch
needs_exclusive_lock() calls acquire_lock() with owner_lock held.
If lock acquisition races with lock shutdown, ManagedLock completes the
ImageDispatch context directly and the dispatch is retried immediately on
the same thread (due to DISPATCH_RESULT_RESTART). This results in
recursion into needs_exclusive_lock() and, barring locking issues, can
lead to unbounded stack growth if lock shutdown takes its time.
During send_acquire_lock, there is a case where no watcher handle is
present and the lock request is delayed. If the client is blocklisted,
the delayed request will not continue and the call that requested the
lock will never complete. The lock process will now propagate
-EBLOCKLISTED to the callback instead of delaying indefinitely.
Fixes: https://tracker.ceph.com/issues/59115
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 6a0aeadc31ab1942c42c6e466183148f1d3752be)
Ilya Dryomov [Thu, 30 Mar 2023 11:58:20 +0000 (13:58 +0200)]
librbd: clear Image::list_watchers() list before populating it
The "append to the passed list" behavior is confusing and not what the
corresponding C API (rbd_watchers_list) or other similar C++ APIs (e.g.
list_lockers) do.
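The fixed semantics, sketched in Python (the Image class and
fetch_watchers are stand-ins, not the librbd bindings):

```python
class Image:
    def fetch_watchers(self):
        return ['client.4151', 'client.4985']  # pretend cluster state

    def list_watchers(self, out):
        out.clear()                    # the fix: drop any stale entries
        out.extend(self.fetch_watchers())

watchers = ['leftover-from-last-call']
Image().list_watchers(watchers)
assert watchers == ['client.4151', 'client.4985']
```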
Dongsheng Yang [Wed, 15 Mar 2023 06:54:39 +0000 (06:54 +0000)]
librbd: fix wrong attribute for rbd_quiesce_complete api
When we used the rbd_quiesce_complete API, we got an error:
/usr/bin/ld: undefined reference to `rbd_quiesce_complete'
Then we found that the symbol for rbd_quiesce_complete in
librbd.so is LOCAL. After some investigation, we found that
the attribute of the rbd_quiesce_complete API is CEPH_RADOS_API
rather than the expected CEPH_RBD_API.
Fixes: https://tracker.ceph.com/issues/59208
Signed-off-by: Dongsheng Yang <dongsheng.yang.linux@gmail.com>
(cherry picked from commit 51a2b707a3074e000b310fc20901d5038b15ea0c)
Edit doc/rados/operations/health-checks.rst (2 of x). PR#50674, the PR
that immediately precedes this one in the series of PRs that line-edit
health-checks.rst, wrongly identified the series as having five
sections. This has been rectified by using the "2 of x" formulation.
Cory Snyder [Mon, 27 Feb 2023 09:45:47 +0000 (04:45 -0500)]
ceph-volume: add test case to reproduce bug in get_physical_fast_allocs
Adds a test case to reproduce a bug in get_physical_fast_allocs for
clusters that have multiple fast-device PVs in a single VG (deployed
prior to v15.2.8). Also fixes other test cases for this function so
that they more accurately represent reality.