'ceph-volume raw list' is broken for a specific use case (Rook).
Rook copies devices from /dev/ to /mnt for its own internal needs.
When 'ceph-volume raw list' is passed a device from /mnt,
ceph-volume ignores it and returns an empty dict.
This prevents Rook from creating OSDs properly.
Zac Dover [Tue, 14 Nov 2023 13:40:42 +0000 (23:40 +1000)]
doc/glossary: add "Quorum" to glossary
Add the term "Quorum" to the glossary and link to the part of
architecture.rst concerning Monitors. The sticky header at the top of
the docs.ceph.com website gets in the way of the location linked to in
this commit, but fatigue and disgust prevent me from spending time today
trial-and-erroring my way through the hostile and ill-documented
wilderness of scroll-margin so that the link goes where it should.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c2f6a770bf0e12296c334d99ac86ff4732ec29b7)
Zac Dover [Mon, 13 Nov 2023 10:57:07 +0000 (20:57 +1000)]
doc/rados: format "initial troubleshooting"
Format the steps in the "Initial Troubleshooting" section of
doc/rados/troubleshooting/troubleshooting-mon.rst. A near-future PR (not
this one) will add context to this section and explain that the steps
described here are the first steps that you should undertake when you
determine that you have an unresponsive or down Monitor. This PR is
merely for formatting.
Zac Dover [Sun, 12 Nov 2023 10:21:41 +0000 (20:21 +1000)]
doc/config: edit "ceph-conf.rst"
Edit the first section of doc/rados/configuration/ceph-conf.rst.
Initially I just wanted to change "series" to "set", but once I got my
hands dirty I ended up simplifying some sentences.
Zac Dover [Sun, 12 Nov 2023 10:52:09 +0000 (20:52 +1000)]
doc/rados: parallelize t-mon headings
Give parallel structure to the questions in the Q&A section of the "The
Cluster Has Quorum But At Least One Monitor Is Down" subsection of the
"Most Common Monitor Issues" section of
doc/rados/troubleshooting/troubleshooting-mon.rst.
Aashish Sharma [Tue, 7 Nov 2023 13:27:24 +0000 (18:57 +0530)]
mgr/dashboard: fix rgw multi-site import form helper
Before: To obtain the token, generate it from your primary Ceph cluster. This token includes encoded information about the secondary cluster's endpoint, access key, and secret key.
Fix: To obtain the token, generate it from your primary Ceph cluster. This token includes encoded information about the primary cluster's endpoint, access key, and secret key.
Prashant D [Wed, 18 Oct 2023 20:07:47 +0000 (16:07 -0400)]
qa/smoke,orch,perf-basic: add POOL_APP_NOT_ENABLED to ignorelist
Some of the smoke, orch and perf-basic tests are failing due
to POOL_APP_NOT_ENABLED health check failure. Add
POOL_APP_NOT_ENABLED to ignorelist for these tests.
Casey Bodley [Tue, 24 Oct 2023 20:48:06 +0000 (16:48 -0400)]
rgw: fetch_remote_obj() uses uncompressed size for encrypted objects
use the original size from RGW_ATTR_COMPRESSION as the accounted size in
the bucket index for objects that were transferred in their
encrypted/compressed form
Ville Ojamo [Fri, 3 Nov 2023 05:44:00 +0000 (12:44 +0700)]
doc/cephadm/services: remove excess rendered indentation in osd.rst
Start bash command blocks at the left margin, removing
excessive padding/indentation that would render the
block too far to the right.
At the same time, indent the source consistently:
- Two spaces for command blocks and output blocks.
- Four spaces for notes, code blocks.
There seems to be no uniform style for this. Sometimes
commands are indented with three spaces, but two spaces
seems to be common. In the end it all renders
the same, I guess.
Ramana Raja [Mon, 18 Sep 2023 02:52:56 +0000 (22:52 -0400)]
qa/suites/rbd: add test to check rbd_support module recovery
... on repeated blocklisting of its client.
There were issues with the rbd_support module not being able to recover
from its RADOS client being repeatedly blocklisted. This occurred, for
example, in clusters with OSDs slow to process RBD requests while the
module's mirror_snapshot_scheduler was taking mirror snapshots by
requesting exclusive locks on the RBD images and workloads were running
on the snapshotted images via kernel clients.
There is no need for CreateSnapshotRequests.__del__() that calls
CreateSnapshotRequests.wait_for_pending().
MirrorSnapshotScheduleHandler.shutdown() already calls
CreateSnapshotRequests.wait_for_pending().
Ramana Raja [Thu, 26 Oct 2023 17:18:52 +0000 (13:18 -0400)]
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock
The MirrorSnapshotScheduleHandler's run thread issues asynchronous
create snapshot requests using a CreateSnapshotRequests instance. When
the thread invokes a CreateSnapshotRequests instance's get_ioctx(),
the instance's class variable lock is acquired. With the class
variable lock held, garbage collection of a CreateSnapshotRequests
instance may kick in on the same thread. The thread would then call
CreateSnapshotRequests.__del__(), which tries to acquire the class
variable lock that the thread already holds. Fix this
recursive deadlock by converting the CreateSnapshotRequests lock from
a class variable to an instance variable. There is no need to share
the lock across CreateSnapshotRequests instances.
Also convert the MirrorSnapshotScheduleHandler, PerfHandler and
TrashPurgeScheduleHandler class variables that don't need to be
shared across instances to instance variables.
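
The lock conversion described above can be illustrated with a minimal
Python sketch. The class names here are hypothetical stand-ins, not the
rbd_support code, and try_cleanup() only imitates a finalizer that needs
the lock. It uses a non-blocking acquire so that the conflict is
reported instead of hanging.

```python
import threading


class SharedLockRequests:
    """Hypothetical stand-in: one lock shared by every instance."""

    lock = threading.Lock()  # class variable, non-reentrant

    def try_cleanup(self):
        # Imitates a finalizer that needs the lock. Non-blocking, so the
        # sketch reports a conflict instead of deadlocking.
        acquired = self.lock.acquire(blocking=False)
        if acquired:
            self.lock.release()
        return acquired


class PerInstanceLockRequests:
    """Hypothetical stand-in: each instance owns its own lock."""

    def __init__(self):
        self.lock = threading.Lock()  # instance variable

    def try_cleanup(self):
        acquired = self.lock.acquire(blocking=False)
        if acquired:
            self.lock.release()
        return acquired


def cleanup_while_other_holds_lock(cls):
    """Hold one instance's lock, then run cleanup on a second instance."""
    active, stale = cls(), cls()
    with active.lock:
        return stale.try_cleanup()


shared_ok = cleanup_while_other_holds_lock(SharedLockRequests)
per_instance_ok = cleanup_while_other_holds_lock(PerInstanceLockRequests)
print(shared_ok, per_instance_ok)
```

With the shared class-level lock, cleanup of the second instance cannot
take the lock while the same thread holds it through the first instance.
With per-instance locks the two do not interfere.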
Fixes: https://tracker.ceph.com/issues/62994
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-Authored-By: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4452bc22d1c6c8499cf55d6e39090adf7ae1dcbf)
Zac Dover [Wed, 1 Nov 2023 01:53:59 +0000 (11:53 +1000)]
doc/cephadm: edit troubleshooting.rst (1 of x)
Edit doc/cephadm/troubleshooting.rst. This commit and the PR of which it
is a part were raised in response to
https://github.com/ceph/ceph/pull/53976. The limits of reStructuredText
are particularly visible here in every instance of a BASH for-loop and
in every instance of a command stretched over multiple lines.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 69472c26af5419faa9ed93c071ed5933d03fa67f)
Laura Flores [Thu, 28 Sep 2023 17:52:11 +0000 (17:52 +0000)]
osd: fix logic in check_pg_upmaps
The logic in check_pg_upmaps was changed in a Reef
refactor. As a result, the upmap balancer makes
recommendations even when it says there are
no optimizations.
Zac Dover [Mon, 30 Oct 2023 02:37:39 +0000 (12:37 +1000)]
doc/glossary: improve "BlueStore" entry
Initially s/backend/back end/ but then I added a little more information
about BlueStore's use of RocksDB to map object names to block locations
on disk.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 8713cca328c9373636efdb92449d743b5bd56584)
test/librbd/fsx: wait for resize to propagate in krbd_resize()
With this change, a resize request no longer blocks until the resize is
completed. Because of this, the fsx test fails, as it assumes that a
resize request immediately changes the device size.
Hence we have to add a wait in the resize handler of fsx for the device to
actually get resized.
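
The wait described above can be sketched as a bounded polling loop. This
is a minimal Python illustration and not the fsx code: wait_for_size(),
FakeDevice and the timeout values are all assumptions made for the
example.

```python
import time


def wait_for_size(get_size, expected, timeout=5.0, interval=0.01):
    """Poll get_size() until it returns `expected` or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if get_size() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDevice:
    """Hypothetical device whose size catches up after a few polls."""

    def __init__(self, size, new_size, polls_until_update):
        self.size = size
        self._new_size = new_size
        self._remaining = polls_until_update

    def get_size(self):
        # Report the old size for the first few polls, then the new one.
        if self._remaining > 0:
            self._remaining -= 1
        else:
            self.size = self._new_size
        return self.size


dev = FakeDevice(size=1024, new_size=4096, polls_until_update=3)
resized = wait_for_size(dev.get_size, expected=4096)

# A device that never changes size makes the wait give up at the timeout.
never = wait_for_size(lambda: 1024, expected=4096, timeout=0.05)
print(resized, never)
```

The timeout keeps the test from hanging forever if the new size never
shows up on the device.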
Problem:
-------
Trying to disable any feature on an rbd image mapped with nbd causes
rbd-nbd to get stuck.
rbd-nbd registers a watcher callback to detect image resize in
NBDWatchCtx::handle_notify(). handle_notify() calls the image info method,
which calls refresh_if_required() and gets stuck there.
It is getting stuck in ImageState::refresh_if_required() because
DisableFeaturesRequest issues update notifications while still holding onto
the exclusive lock with everything that has to do with it blocked.
Solution:
--------
Set only a notify flag in NBDWatchCtx::handle_notify() and handle
the resize detection in a different thread.
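
The pattern in the solution above (the callback only sets a flag, and a
separate thread does the work that may block) can be sketched in Python.
The actual rbd-nbd change is not shown in this log, so ResizeWatcher and
its methods are hypothetical names used only to illustrate the idea.

```python
import threading


class ResizeWatcher:
    """Hypothetical sketch: the callback flags, a worker thread inspects."""

    def __init__(self, get_size, on_resize):
        self._get_size = get_size
        self._on_resize = on_resize
        self._notified = threading.Event()
        self._stop = threading.Event()
        self._last_size = get_size()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def handle_notify(self):
        # Notification callback: only set the flag. It never queries the
        # image, so it cannot block the caller.
        self._notified.set()

    def _run(self):
        while True:
            self._notified.wait()
            if self._stop.is_set():
                return
            self._notified.clear()
            # The potentially slow size query happens on this thread.
            size = self._get_size()
            if size != self._last_size:
                self._last_size = size
                self._on_resize(size)

    def shutdown(self):
        self._stop.set()
        self._notified.set()  # wake the worker so it can exit
        self._thread.join()


sizes = {"current": 1024}
seen = []
done = threading.Event()


def on_resize(new_size):
    seen.append(new_size)
    done.set()


watcher = ResizeWatcher(lambda: sizes["current"], on_resize)
sizes["current"] = 2048
watcher.handle_notify()
done.wait(timeout=5)
watcher.shutdown()
print(seen)
```

Because handle_notify() returns immediately, whoever sends the
notification is never held up by the size query.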
Aashish Sharma [Mon, 30 Oct 2023 07:47:37 +0000 (13:17 +0530)]
mgr/dashboard: update rgw multisite import form helper info
Change 'To obtain the token, generate it from your secondary Ceph cluster' to 'To obtain the token, generate it from your primary Ceph cluster' in rgw multisite import form helper