Ionut Balutoiu [Wed, 1 Nov 2023 16:07:58 +0000 (18:07 +0200)]
rgw: fix cloud-sync multi-tenancy scenario
At the moment, we cannot set buckets prefixed with a tenant ID in the
`source_bucket` field of cloud-sync profiles (non-trivial config):
https://docs.ceph.com/en/latest/radosgw/cloud-sync-module/#non-trivial-configuration
This is because the `do_find_profile` function searches the configured
profiles by `bucket.name` only and ignores `bucket.tenant`.
This is problematic in the RGW multi-tenancy scenario:
https://docs.ceph.com/en/latest/radosgw/multitenancy/#rgw-multi-tenancy
At the moment, only a bucket name can be configured in the profile's
`source_bucket` field. In the multi-tenancy scenario, a bare bucket name
matches that bucket in every tenant, so the buckets of all tenants get
synced.
Without this fix, we cannot configure a cloud-sync profile that syncs
all the buckets from a tenant to a particular S3 target.
For example, we cannot do this:
* `tenantA/test-bucket` -> S3 target A
* `tenantB/test-bucket` -> S3 target B
* `tenantC/test-bucket` -> S3 target C
We can only do this at the moment:
* `test-bucket` -> S3 target A
If `test-bucket` is present in both `tenantA` and `tenantB`, both
buckets will be synced to S3 target A.
The goal is to be able to do this:
* `tenantA/*` -> S3 target A
* `tenantB/*` -> S3 target B
* `tenantC/*` -> S3 target C
If `test-bucket` is present in all tenants, each tenant's bucket is
synced to its own S3 target.
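A minimal Python sketch of the tenant-aware matching this fix enables;
the real change is in RGW's C++ `do_find_profile`, and the profile shape
and names below are illustrative only:
```
def find_profile(profiles, bucket):
    # Match a bucket against cloud-sync profiles by tenant and name.
    for profile in profiles:
        source = profile['source_bucket']     # e.g. 'tenantA/*' or 'test-bucket'
        if '/' in source:
            tenant, name = source.split('/', 1)
            if tenant != bucket['tenant']:    # tenant must match exactly
                continue
        else:
            name = source                     # legacy form: bucket name only
        if name == '*' or name == bucket['name']:
            return profile
    return None

profiles = [{'source_bucket': 'tenantA/*', 'target': 'S3 target A'},
            {'source_bucket': 'tenantB/*', 'target': 'S3 target B'}]
# tenantA/test-bucket and tenantB/test-bucket now resolve to different targets:
assert find_profile(profiles, {'tenant': 'tenantA', 'name': 'test-bucket'})['target'] == 'S3 target A'
assert find_profile(profiles, {'tenant': 'tenantB', 'name': 'test-bucket'})['target'] == 'S3 target B'
```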
Ramana Raja [Mon, 18 Sep 2023 02:52:56 +0000 (22:52 -0400)]
qa/suites/rbd: add test to check rbd_support module recovery
... on repeated blocklisting of its client.
There were issues with the rbd_support module not being able to recover
from its RADOS client being repeatedly blocklisted. This occurred, for
example, in clusters with OSDs that were slow to process RBD requests
while the module's mirror_snapshot_scheduler was taking mirror snapshots
(requesting exclusive locks on the RBD images) and workloads were
running on the snapshotted images via kernel clients.
There is no need for a CreateSnapshotRequests.__del__() that calls
CreateSnapshotRequests.wait_for_pending();
MirrorSnapshotScheduleHandler.shutdown() already calls
CreateSnapshotRequests.wait_for_pending().
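A hedged sketch of the redundancy (class and method names follow the
commit message; the surrounding structure is illustrative):
```
class CreateSnapshotRequests:
    def wait_for_pending(self):
        pass  # stand-in: blocks until all in-flight snapshot requests finish

class MirrorSnapshotScheduleHandler:
    def __init__(self):
        self.create_snapshot_requests = CreateSnapshotRequests()

    def shutdown(self):
        # The explicit shutdown path already drains pending requests, so a
        # __del__() on CreateSnapshotRequests doing the same work is redundant
        # and only risks running at unpredictable garbage-collection time.
        self.create_snapshot_requests.wait_for_pending()
```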
Ramana Raja [Thu, 26 Oct 2023 17:18:52 +0000 (13:18 -0400)]
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock
The MirrorSnapshotScheduleHandler's run thread issues asynchronous
snapshot-creation requests using a CreateSnapshotRequests instance. When
the thread invokes a CreateSnapshotRequests instance's get_ioctx(), the
class variable lock is acquired. With the class variable lock held,
garbage collection of a CreateSnapshotRequests instance may run in that
same thread. The thread would then call
CreateSnapshotRequests.__del__(), which tries to acquire the class
variable lock that the thread already holds. Fix this recursive deadlock
by converting the CreateSnapshotRequests lock from a class variable to
an instance variable. There is no need to share the lock across
CreateSnapshotRequests instances.
Also convert MirrorSnapshotScheduleHandler, PerfHandler, and
TrashPurgeScheduleHandler class variables that don't need to be shared
across instances into instance variables.
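A simplified Python sketch of the change (the real code is in
mgr/rbd_support; method bodies here are stand-ins):
```
import threading

class CreateSnapshotRequests:
    # Before the fix (illustrative): one lock shared by every instance.
    # lock = threading.Lock()

    def __init__(self):
        # After the fix: a per-instance lock. Nothing it guards is shared
        # across instances, so sharing the lock was never necessary.
        self.lock = threading.Lock()
        self.ioctxs = {}

    def get_ioctx(self, pool_id):
        with self.lock:
            # With a class-level lock, garbage collection of *another*
            # instance running at this point would invoke its __del__(),
            # which would try to take the same non-reentrant lock this
            # thread already holds: a self-deadlock. A per-instance lock
            # cannot collide that way.
            return self.ioctxs.setdefault(pool_id, object())
```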
Fixes: https://tracker.ceph.com/issues/62994
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-Authored-By: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4452bc22d1c6c8499cf55d6e39090adf7ae1dcbf)
Zac Dover [Wed, 1 Nov 2023 01:53:59 +0000 (11:53 +1000)]
doc/cephadm: edit troubleshooting.rst (1 of x)
Edit doc/cephadm/troubleshooting.rst. This commit and the PR of which it
is a part were raised in response to
https://github.com/ceph/ceph/pull/53976. The limits of reStructuredText
are particularly visible here in every instance of a BASH for-loop and
in every instance of a command stretched over multiple lines.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 69472c26af5419faa9ed93c071ed5933d03fa67f)
Zac Dover [Mon, 30 Oct 2023 02:37:39 +0000 (12:37 +1000)]
doc/glossary: improve "BlueStore" entry
Initially s/backend/back end/ but then I added a little more information
about BlueStore's use of RocksDB to map object names to block locations
on disk.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 8713cca328c9373636efdb92449d743b5bd56584)
Aashish Sharma [Mon, 30 Oct 2023 07:47:37 +0000 (13:17 +0530)]
mgr/dashboard: update rgw multisite import form helper info
Change 'To obtain the token, generate it from your secondary Ceph
cluster' to 'To obtain the token, generate it from your primary Ceph
cluster' in the RGW multisite import form helper.
Zac Dover [Fri, 27 Oct 2023 06:58:28 +0000 (16:58 +1000)]
doc/rados: remove cache-tiering-related keys
Remove information related to cache-tiering-related keys from
doc/rados/operations/pools.rst. Cache-tiering is deprecated in Reef.
This PR is suitable for backporting to the Reef release branch, but not
to release branches prior to Reef.
John Mulligan [Wed, 11 Oct 2023 18:05:17 +0000 (14:05 -0400)]
cephadm: add a --dry-run option to cephadm shell
Instead of creating the shell, the --dry-run option prints the container
command that would be used. This can be used as a starting point for
creating custom container commands similar to what `cephadm shell` would
generate, but with tweaks.
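cephadm is itself a Python script; here is a hedged sketch of the
dry-run behavior (function and attribute names are hypothetical, not
cephadm's actual internals):
```
import subprocess
from argparse import Namespace

def build_shell_container_cmd(ctx):
    # Stand-in for cephadm's real container-command assembly.
    return ['podman', 'run', '--rm', '-it', ctx.image, 'bash']

def command_shell(ctx):
    cmd = build_shell_container_cmd(ctx)
    if ctx.dry_run:
        print(' '.join(cmd))   # show the command instead of executing it
        return 0
    return subprocess.call(cmd)

ctx = Namespace(image='quay.io/ceph/ceph:v18', dry_run=True)
command_shell(ctx)   # prints: podman run --rm -it quay.io/ceph/ceph:v18 bash
```
The printed command can then be copied and edited before being run by
hand.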
Zac Dover [Wed, 25 Oct 2023 23:48:57 +0000 (09:48 +1000)]
doc/rados: remove HitSet-related key information
Remove HitSet-related key information from
doc/rados/operations/pools.rst. HitSet-related keys are relevant only to
releases of Ceph that support cache tiering. Only Quincy and earlier
(inclusive) releases of Ceph support cache tiering. Backport this commit
from main to Reef, but not to Quincy or to release branches earlier than
Quincy.
```
Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f5c76eb6367 in operator new(unsigned long) (/lib64/libasan.so.6+0xb6367)
    #1 0x7f5c76a2fb81 in MallocExtension::Register(MallocExtension*) (/lib64/libtcmalloc.so.4+0x2fb81)
SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s)
```
crimson/os/seastore/onode_manager: populate value recorders of onodes to be erased
Otherwise, the following modification sequence within the same
transaction might lead to onode extents' CRC inconsistency during
journal replay:
1. modify the last mapping in an onode extent;
2. erase the last mapping in that onode extent.
During journal replay, if the first modification is not recorded in the
delta, the onode extent's content would be inconsistent with its content
before the system reboot.
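A toy Python model of the failure mode (this is not SeaStore code;
offsets and contents are contrived): if the first modification is never
recorded as a delta, replay reconstructs different bytes than the
run-time extent held, so the recorded CRC no longer matches.
```
import zlib

def apply_deltas(extent, deltas):
    buf = bytearray(extent)
    for off, data in deltas:
        buf[off:off + len(data)] = data
    return bytes(buf)

extent = b'AAAABBBB'          # onode extent before the transaction
modify = (2, b'XXXX')         # 1. modify the last mapping
erase  = (4, b'....')         # 2. erase the last mapping

runtime = apply_deltas(extent, [modify, erase])   # b'AAXX....'
recorded_crc = zlib.crc32(runtime)                # CRC stored with the extent

# If step 1 was never populated into the delta, replay only sees the erase:
replayed = apply_deltas(extent, [erase])          # b'AAAA....'
assert zlib.crc32(replayed) != recorded_crc       # CRC mismatch during replay
```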
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@ibm.com>
(cherry picked from commit 9bb63bdc8969f2ecdfeedfc8396890ad59f0d796)
Mark Nelson [Wed, 27 Apr 2022 15:06:22 +0000 (15:06 +0000)]
crimson: Enable tcmalloc when using seastar
Classic OSDs have always suffered significant memory fragmentation when
using the libc memory allocator, due to the way that Ceph tends to
utilize memory. In recent testing, crimson-osd was found to use 25-27GB
of RAM with the stock 3GB bluestore cache settings (osd_memory_target is
only used when tcmalloc is available). Upon further testing, it was
found that the classic OSD is even worse, using 32-33GB of RAM after a
5 minute 4K sequential write test when using libc malloc.
The good news is that crimson-osd appears able to use tcmalloc for
alienstore without significant modification. Better still, it
drastically reduces memory usage. In the same test that resulted in 25GB
RSS memory usage for crimson-osd with libc malloc, a tcmalloc-linked
version took around 9GB (with an 8GB osd_memory_target). Since we do not
yet (as far as we know) expose classic OSD debugging in crimson, it is
tough to tell why we are still a little over, but it's clear that for
alienstore we are going to need to use tcmalloc as we do in classic.