Zac Dover [Fri, 23 Feb 2024 16:05:42 +0000 (02:05 +1000)]
doc/rbd: repair ordered list
Fix the numbering in an ordered list. The numbering was thrown off
because a ".. prompt" directive was improperly indented (it wasn't
indented at all).
See https://github.com/ceph/ceph/pull/55540#discussion_r1500051264
Casey Bodley [Thu, 22 Feb 2024 21:54:54 +0000 (16:54 -0500)]
rgw/aio: avoid infinite recursion in aio_abstract()
A recent regression from 320a2179a3c6c1981a0fd2494938515997c1bfad causes
aio_abstract() to recurse infinitely when given an empty optional_yield.
This is exposed by the librgw_file tests.
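The message does not include the code, so this is only a generic C++ sketch of how a function that dispatches on an empty optional can end up calling itself. It is not the rgw implementation and does not claim to show what 320a2179 changed. maybe_yield, submit, submit_async and submit_blocking are hypothetical names; std::optional<int> stands in for optional_yield.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for optional_yield: empty means "no coroutine context".
using maybe_yield = std::optional<int>;

// Blocking path under a distinct name, so the "no yield" branch below cannot
// resolve back to submit() itself.
std::string submit_blocking(const std::string& op) {
    return "blocking:" + op;
}

// Asynchronous path, only taken when a yield context is present.
std::string submit_async(const std::string& op, int yield_ctx) {
    return "async(" + std::to_string(yield_ctx) + "):" + op;
}

// Dispatcher. Writing the "no yield" branch as `return submit(op);` would
// still compile (the default argument supplies an empty maybe_yield) and
// would recurse forever, which is the general shape of this class of bug.
std::string submit(const std::string& op, maybe_yield y = std::nullopt) {
    if (y) {
        return submit_async(op, *y);
    }
    return submit_blocking(op);
}
```

Giving the blocking path its own name makes the termination of the empty-yield case obvious from the call site rather than dependent on overload resolution.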
Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that images under workloads
are correctly mirrored between two clusters using snapshot-based
mirroring.
Run a workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
clusters. This ensures that the non-primary image in the other
cluster is ready to be promoted. Then promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is made primary
alternately in each of the two clusters, as happens during
failover and failback scenarios.
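The demote/promote/compare cycle described above can be modelled in a few lines. This is a hypothetical in-memory sketch, not the rbd or rbd-mirror API: MirroredImage and fail_over are made-up names, demotion is modelled as a final snapshot sync to the other cluster, and a 64-bit FNV-1a hash stands in for MD5 to keep the example dependency-free.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of one mirrored image in two clusters (not the rbd API).
struct MirroredImage {
    std::string site_a, site_b;   // image contents in each cluster
    bool a_is_primary = true;

    std::string& primary() { return a_is_primary ? site_a : site_b; }
    std::string& secondary() { return a_is_primary ? site_b : site_a; }
};

// 64-bit FNV-1a, used here instead of MD5.
std::uint64_t checksum(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// One failover (or failback) step: checksum the primary, demote it (modelled
// as a final snapshot sync), promote the other side, compare checksums.
bool fail_over(MirroredImage& img) {
    const std::uint64_t before = checksum(img.primary());
    img.secondary() = img.primary();        // demotion snapshot synced
    // The real test waits for 'up+unknown' in both clusters at this point.
    img.a_is_primary = !img.a_is_primary;   // promote the other side
    return checksum(img.primary()) == before;
}
```

Calling fail_over repeatedly, with writes to the current primary in between, mirrors the second workunit's alternating failover and failback.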
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
Ramana Raja [Fri, 9 Feb 2024 00:32:37 +0000 (19:32 -0500)]
qa/workunits: make wait_for_status_in_pool_dir() reentrant
In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper
stored `mirror image status` and `mirror pool status` command outputs
in files that could be shared across successive calls or by concurrent
calls from multiple threads. Instead, store the command outputs in local
variables to make `wait_for_status_in_pool_dir()` reentrant.
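The helper itself is a shell function; the same idea in C++ is sketched below. run_status_command is a hypothetical stand-in for running `rbd mirror image status` and capturing its output. Because the output lives only in a local variable, concurrent or nested callers cannot overwrite each other's results the way they could with a fixed file path.

```cpp
#include <string>

// Hypothetical stand-in for running `rbd mirror image status <image>`;
// a real helper would execute the command and capture stdout.
std::string run_status_command(const std::string& image) {
    return image + " state: up+replaying";
}

// Reentrant check: the command output is held in a local variable, so each
// call (from any thread) sees only its own output. Storing it in a shared
// file instead would let one call clobber another's output before it is read.
bool status_matches(const std::string& image, const std::string& expected) {
    const std::string output = run_status_command(image);
    return output.find(expected) != std::string::npos;
}
```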
This allows overriding the persistent min_alloc_size if needed.
This might help to troubleshoot and work around issues like
https://tracker.ceph.com/issues/63618
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
The new version of the code only takes about 38% of the time of the old
one. See https://github.com/ronen-fr/hobjtostr/tree/rf-2 for the code
used to benchmark the new version.
hobject_fmt.h is folded into hobject.h, as fmtlib is now an accepted
dependency in all of Ceph.
Ronen Friedman [Mon, 12 Feb 2024 16:23:15 +0000 (10:23 -0600)]
osd: improve hobject_t::to_str() performance
The new version of the code takes only ~70% of the time of the old one.
See https://github.com/ronen-fr/hobjtostr/tree/rf-1 for the code used
to benchmark the performance of this and various other implementations.
Afreen [Tue, 13 Feb 2024 10:26:09 +0000 (15:56 +0530)]
mgr/dashboard: Handle errors for /api/osd/settings
Fixes https://tracker.ceph.com/issues/62089
issue:
=====
/api/osd/settings sometimes returns "TypeError: string indices must be
integers".
The result comes from the `osd dump` command, which in these cases
returns an error message instead of an object; that message then
causes an error to be displayed on the dashboard.
fix:
====
Added a try-catch block to handle the error and updated the frontend
code to handle such errors.
Matan Breizman [Mon, 19 Feb 2024 12:24:52 +0000 (12:24 +0000)]
common/buffer_seastar: fix alien threads memory
The underlying raw_seastar_foreign_ptr::ptr is allocated from seastar.
This ptr is wrapped with seastar::foreign_ptr:
```
/// \c foreign_ptr<> wraps smart pointers -- \ref seastar::shared_ptr<>,
/// or similar, and remembers on what core this happened.
/// When the \c foreign_ptr<> object is destroyed, it sends a message to
/// the original core so that the wrapped object can be safely destroyed.
```
The issue is that when the pointer is deallocated from an alien thread,
foreign_ptr is unable to send a message to the original core.
Fix this issue by making use of seastar::alien, which integrates Seastar with non-Seastar applications.
If ~raw_seastar_foreign_ptr() is called from an alien thread, we submit the release to the origin core *and wait*
for the memory to be released there.
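The "submit and wait" pattern can be sketched with standard threads. This is an analogue under stated assumptions, not seastar::alien: OwnerThread plays the role of a reactor core (one thread draining a task queue), OwnedBuffer plays the role of raw_seastar_foreign_ptr, and submit_and_wait is a hypothetical name. Plain new/delete does not care which thread frees memory; the sketch only mirrors the control flow of checking the current thread and, if it is not the owner, handing the release to the owner and blocking until it completes.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a reactor core: one thread draining a task queue.
class OwnerThread {
public:
    OwnerThread() : worker_([this] { run(); }) {}
    ~OwnerThread() {
        post([this] { stop_ = true; });  // executed on the owner thread
        worker_.join();
    }
    OwnerThread(const OwnerThread&) = delete;
    OwnerThread& operator=(const OwnerThread&) = delete;

    std::thread::id id() const { return worker_.get_id(); }

    // Runs fn on the owner thread and blocks the caller until it finishes.
    // Must not be called from the owner thread itself (it would deadlock).
    void submit_and_wait(std::function<void()> fn) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        std::future<void> done = task->get_future();
        post([task] { (*task)(); });
        done.get();  // rethrows anything fn threw
    }

private:
    void post(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(std::move(fn));
        }
        cv_.notify_one();
    }
    void run() {
        while (!stop_) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); });
                fn = std::move(q_.front());
                q_.pop_front();
            }
            fn();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stop_ = false;   // only touched on the owner thread
    std::thread worker_;  // declared last so it starts after the members above
};

// Hypothetical analogue of raw_seastar_foreign_ptr: the buffer is always
// released on its owner thread, whichever thread destroys the wrapper.
class OwnedBuffer {
public:
    OwnedBuffer(OwnerThread& owner, char* data) : owner_(owner), data_(data) {}
    ~OwnedBuffer() {
        if (std::this_thread::get_id() == owner_.id()) {
            delete[] data_;  // already on the owner thread
        } else {
            char* p = data_;  // "alien" thread: submit and wait
            owner_.submit_and_wait([p] { delete[] p; });
        }
    }
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;

private:
    OwnerThread& owner_;
    char* data_;
};
```

Blocking until the release completes keeps the caller from racing ahead while memory it dropped is still live on the owner side, at the cost of a round trip per release.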
Redouane Kachach [Wed, 21 Feb 2024 07:27:53 +0000 (08:27 +0100)]
mgr/rook: adding empty calls to upgrade_ls and upgrade_status
Added empty calls to upgrade_ls and upgrade_status to avoid
dashboard errors when entering the Cluster > Upgrade view. Empty
calls are used because we don't support the upgrade functionality
in Rook as we do for normal Ceph deployments. With Rook, the user
has to follow a different process to upgrade Ceph.
luo rixin [Wed, 7 Feb 2024 03:21:50 +0000 (11:21 +0800)]
cmake/AddCephTest: Specify resources to crimson unittest
When running a crimson unittest, the seastar framework always uses
cpu0 and only cpu0. With many crimson unittest jobs running in
parallel, all of them run on cpu0 while the other CPU cores stay
idle, which makes make check run very slowly and can even cause
timeouts. Use set_property RESOURCE_GROUPS to assign CPU resources
to the crimson unittests and speed up make check.
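When a test declares RESOURCE_GROUPS, CTest reports the allocation through environment variables such as CTEST_RESOURCE_GROUP_0_<TYPE>, whose value is a semicolon-separated list of `id:<id>,slots:<n>` items. The sketch below shows one way a test wrapper could turn that into seastar's `--cpuset` option. It is an assumption for illustration: the resource type name "cpus" (hence CTEST_RESOURCE_GROUP_0_CPUS) and the helper names are hypothetical, and this is not necessarily how AddCephTest wires it up.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts resource ids from a CTest allocation string such as
// "id:0,slots:1;id:3,slots:1" (returns {"0", "3"}).
std::vector<std::string> parse_ctest_resource_ids(const std::string& alloc) {
    std::vector<std::string> ids;
    std::stringstream items(alloc);
    std::string item;
    while (std::getline(items, item, ';')) {
        const std::string key = "id:";
        auto start = item.find(key);
        if (start == std::string::npos) continue;  // malformed item: skip it
        start += key.size();
        const auto end = item.find(',', start);
        ids.push_back(end == std::string::npos ? item.substr(start)
                                               : item.substr(start, end - start));
    }
    return ids;
}

// Builds a seastar "--cpuset" argument, or "" when no allocation was given
// (for example when the test runs outside CTest's resource allocation).
std::string seastar_cpuset_arg(const char* env_value) {
    if (env_value == nullptr) return "";
    std::string joined;
    for (const auto& id : parse_ctest_resource_ids(env_value)) {
        if (!joined.empty()) joined += ',';
        joined += id;
    }
    if (joined.empty()) return "";
    return "--cpuset " + joined;
}
```

A wrapper would call it as `seastar_cpuset_arg(std::getenv("CTEST_RESOURCE_GROUP_0_CPUS"))` (with `<cstdlib>`), so each parallel crimson test is pinned to the CPUs CTest assigned to it instead of all landing on cpu0.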
Fixes: https://tracker.ceph.com/issues/64117
Co-authored-by: Kefu Chai <tchaikov@gmail.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
Samuel Just [Tue, 20 Feb 2024 23:56:26 +0000 (23:56 +0000)]
cmake/.../FindSanitizers: add check for Sanitizers_FIBER_SUPPORT
With newer clang and gcc versions (observed on clang-17.0.6 as
well as gcc 12/13), asan is throwing stack-use-after-return
during OSD startup related to usage of seastar::async, which
relies on swapcontext internally.
seastar/src/core/thread.cc supports asan's hooks, but only if
SEASTAR_HAVE_ASAN_FIBER_SUPPORT is set. seastar's CMakeLists.txt
sets it based on Sanitizers_FIBER_SUPPORT, which probably should
be set by the module at src/seastar/cmake/FindSanitizers.cmake,
but that module doesn't seem to be actually invoked anywhere.
Ceph's version of that module (cmake/modules/FindSanitizers.cmake)
does not set Sanitizers_FIBER_SUPPORT.
This commit adds that check as well as the related code snippet.
Fixes: https://tracker.ceph.com/issues/64512
Signed-off-by: Samuel Just <sjust@redhat.com>
Venky Shankar [Tue, 20 Feb 2024 04:58:48 +0000 (10:28 +0530)]
Merge PR #52670 into main
* refs/pull/52670/head:
doc: add the reject the clone when threads are not available feature in the document
qa: add test cases for the support to reject clones feature
mgr/volumes: support to reject CephFS clones if cloner threads are not available
luo rixin [Sat, 27 Jan 2024 06:59:11 +0000 (14:59 +0800)]
CMakeLists: Modify CEPH_TEST_TIMEOUT from 3600s to 7200s
Some older Arm servers run pretty slowly, and make check jobs like
`check-generated.sh` are killed when they hit the job timeout.
Make CEPH_TEST_TIMEOUT longer.
Patrick Donnelly [Thu, 15 Feb 2024 15:28:32 +0000 (10:28 -0500)]
mds: reverse MDSMap encoding of max_xattr_size/bal_rank_mask
Commit e134c890 adds the bal_rank_mask with encoded (ev) version 17. This was
merged into main in Oct 2022 and made it into the reef release normally.
Commit 7b8def5c adds the max_xattr_size also with encoded (ev) version 17 but
places it before bal_rank_mask. This is problematic because there were no plans to
backport e134c890 to quincy or pacific, so piggybacking on the ev 17 bump would
not work; otherwise, the backports would have to be done as a set to
ensure consistency (including with the kernel client).
However, the real issue is that 7b8def5c was not merged until after reef was
already cut. This required 7b8def5c to be backported separately in [1] which
was not merged until after v18.2.1 (current reef HEAD as of this commit).
Ultimately, this means that there are reef versions (v18.2.[01]) in the wild
which expect bal_rank_mask to be encoded at ev17 and not (max_xattr_size,
bal_rank_mask). Adding to the complications, the kernel client has already
merged code [2] expecting max_xattr_size for ev17.
It was decided in a GitHub discussion [3] to move bal_rank_mask to ev18 to
avoid updating the kernel client. This was done in the main branch via 36ee8e7e,
and the reef max_xattr_size backport was updated with the same change (d8cebd67).
Unfortunately, this breaks upgrades from v18.2.[01] to newer reef versions or to
main. The reason is that monitors will encode v17 with bal_rank_mask
(max_xattr_size is not merged yet) and send that to upgraded mgrs (which are
upgraded first). The mgr will attempt to decode bal_rank_mask as a uint64_t
(max_xattr_size) but fail because an empty (by default) bal_rank_mask is simply
encoded as a signed 32-bit integer. Consequently, the mgr will fail decoding
with:
failed to decode message of type 45 v1: End of buffer [buffer:2]
Of course, the problem does not stop there: even if the mgr were able to handle
this, the monitors/MDS/clients would fail in a similar fashion.
So the only choice left is to fix max_xattr_size to be encoded at ev18.
Fortunately, v18.2.2 has not been released nor has any max_xattr_size backport
to quincy/pacific been merged. The main downside will be that kernels will
wrongly decode ev17 (which is already true for ceph clusters running
v18.2.[01]). A follow-up kernel fix will be required.
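The decode failure and the fix can be reproduced with a toy versioned encoding. This is a hypothetical sketch, not Ceph's bufferlist or MDSMap code: Buf, MapFields and the encode/decode functions are made-up names, a string is written as a 32-bit length followed by its bytes (so an empty bal_rank_mask occupies only 4 bytes), and integers use host byte order. The pre-fix decoder reads an 8-byte max_xattr_size at ev17 and runs off the end of what a v18.2.[01] monitor sends; the fixed layout puts bal_rank_mask at ev17 and max_xattr_size at ev18.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Toy byte buffer in host byte order (stand-in for ceph::bufferlist only).
struct Buf {
    std::vector<std::uint8_t> bytes;
    std::size_t off = 0;

    void put(const void* p, std::size_t n) {
        const auto* b = static_cast<const std::uint8_t*>(p);
        bytes.insert(bytes.end(), b, b + n);
    }
    void get(void* p, std::size_t n) {
        if (off + n > bytes.size()) throw std::runtime_error("End of buffer");
        std::memcpy(p, bytes.data() + off, n);
        off += n;
    }
};

struct MapFields {
    std::string bal_rank_mask;  // empty by default
    std::uint64_t max_xattr_size = 0;
};

void put_str(Buf& b, const std::string& s) {
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    b.put(&len, sizeof len);  // 32-bit length prefix, then the bytes
    b.put(s.data(), s.size());
}

std::string get_str(Buf& b) {
    std::uint32_t len = 0;
    b.get(&len, sizeof len);
    std::string s(len, '\0');
    if (len > 0) b.get(&s[0], len);
    return s;
}

// Fixed layout: ev17 adds bal_rank_mask, ev18 adds max_xattr_size.
void encode_fixed(const MapFields& m, std::uint8_t ev, Buf& b) {
    b.put(&ev, 1);
    if (ev >= 17) put_str(b, m.bal_rank_mask);
    if (ev >= 18) b.put(&m.max_xattr_size, sizeof m.max_xattr_size);
}

MapFields decode_fixed(Buf& b) {
    MapFields m;
    std::uint8_t ev = 0;
    b.get(&ev, 1);
    if (ev >= 17) m.bal_rank_mask = get_str(b);
    if (ev >= 18) b.get(&m.max_xattr_size, sizeof m.max_xattr_size);
    return m;
}

// Pre-fix layout on main: ev17 = max_xattr_size, ev18 = bal_rank_mask.
MapFields decode_pre_fix(Buf& b) {
    MapFields m;
    std::uint8_t ev = 0;
    b.get(&ev, 1);
    if (ev >= 17) b.get(&m.max_xattr_size, sizeof m.max_xattr_size);
    if (ev >= 18) m.bal_rank_mask = get_str(b);
    return m;
}
```

An ev17 message with an empty bal_rank_mask is 5 bytes here (1 version byte plus a 4-byte length), so decode_pre_fix asks for 8 bytes after the version and fails, while decode_fixed reads it back cleanly.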
Zac Dover [Mon, 19 Feb 2024 08:41:45 +0000 (18:41 +1000)]
doc/cephfs: edit add-remove-mds
Disambiguate a note in doc/cephfs/add-remove-mds.rst to help readers
distinguish between cases in which they might want to use an automated
tool such as cephadm to deploy MDSes and cases in which they might want
to manually deploy MDSes.
See: https://github.com/ceph/ceph/pull/45639
Tracker: https://tracker.ceph.com/issues/54551
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>