Ilya Dryomov [Sun, 12 May 2024 09:15:36 +0000 (11:15 +0200)]
qa/suites/krbd: drop pre-single-major test
The single-major mapping scheme was introduced in 2014 and became the
default in 2017. It's getting increasingly difficult to build and,
more importantly, to boot a 10-year-old kernel with recent userspace
(systemd, etc). If someone is still running such a kernel, it's
very unlikely that they have the most recent rbd CLI tool
installed.
Cleanup of variables, queries and tests to enable showMultiCluster=True
Rendering the dashboards with showMultiCluster=True allows for
them to work with multiple clusters storing their metrics in a single
Prometheus instance. This works via the cluster label and that functionality
already existed. This just fixes some inconsistencies in applying the label
filters.
Additionally, this contains updates to the tests so that they succeed
with both configurations and guard against future regressions in
multi-cluster support.
There also are some consistency cleanups here and there:
* `datasource` was not used consistently
* `cluster` label_values are determined from `ceph_health_status`
* `job` template and filters on this label were removed so that multi-cluster
support relies solely on the `cluster` label
* `ceph_hosts` filter now uses label_values from any ceph_metadata metric
to not show all instance values, but only those of hosts with some Ceph
component / daemon.
* Enable showMultiCluster=True since `cluster` label is now always present,
via https://github.com/ceph/ceph/pull/54964
Improves: https://tracker.ceph.com/issues/64321
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
(cherry picked from commit 090b8e17f1e84d8b20e05143d5dd7ff107031176)
Zac Dover [Sun, 12 May 2024 01:39:34 +0000 (11:39 +1000)]
doc/cephfs: edit fs-volumes.rst (1 of x) followup
Include the suggestions for improving doc/cephfs/fs-volumes.rst made by
Anthony D'Atri here
https://github.com/ceph/ceph/pull/57415#discussion_r1597362110
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit cb700d804b4390fd9f55444dcfc04dfebac3a1bf)
Ilya Dryomov [Mon, 6 May 2024 06:16:01 +0000 (08:16 +0200)]
qa/workunits/rbd: wait for replaying status in bootstrap tests
wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.
rbd-mirror: remove callout when destroying pool replayer
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.
rbd-mirror: shut down and remove pool replayer if peer changes
The code in Mirror::update_pool_replayers() responsible for shutting
down and removing stale pool replayers kicks in only in case the peer
is removed, but not if the peer changes. However, the code responsible
for (re)starting pool replayers in the same method _does_ create and
start a new pool replayer in that case. As a result, we can end up
with nearly identical pool replayers running at the same time, hogging
OS resources and confusing instance_id tracking logic and mirror status
reporting at the very least.
The root cause is that PeerSpec is matched normally (i.e. based on all
fields) when it comes to m_pool_replayers, and based only on UUID when
it comes to pool_peers. This was missed in commit 5463e1a1e1b7
("rbd-mirror: extract optional peer mon_host/key values from MON").
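The mismatch described above can be illustrated with a minimal, self-contained sketch. The `PeerSpec` field set and both helper functions are simplified stand-ins invented for illustration; they are not the actual rbd-mirror code, and the sketch does not claim to show how the fix was implemented. It only shows why a UUID-only lookup fails to flag a replayer as stale when the peer's other fields change, while an all-fields lookup does.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Simplified stand-in for rbd-mirror's PeerSpec (the field set is an assumption).
struct PeerSpec {
  std::string uuid;
  std::string mon_host;
  std::string key;

  // "Normal" matching: every field takes part in the comparison.
  bool operator<(const PeerSpec& rhs) const {
    return std::tie(uuid, mon_host, key) <
           std::tie(rhs.uuid, rhs.mon_host, rhs.key);
  }
};

// Hypothetical helper: a running replayer counts as stale only if no
// configured peer shares its UUID (the UUID-only matching from the text).
inline bool stale_by_uuid(const PeerSpec& running,
                          const std::set<PeerSpec>& pool_peers) {
  for (const auto& peer : pool_peers) {
    if (peer.uuid == running.uuid) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: a running replayer counts as stale unless a
// configured peer matches it on all fields.
inline bool stale_by_all_fields(const PeerSpec& running,
                                const std::set<PeerSpec>& pool_peers) {
  return pool_peers.count(running) == 0;
}
```

With a peer whose `mon_host` changed but whose UUID stayed the same, `stale_by_uuid()` reports the old replayer as still valid, whereas a full-field lookup for the new peer among the running replayers finds nothing and therefore starts a second one.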
Re-enabling the Reserver-based scrub queuing (undoing
https://github.com/ceph/ceph/pull/56750), as all known issues
related to the reservation queuing have been fixed and back-ported.
Casey Bodley [Fri, 3 May 2024 19:43:39 +0000 (15:43 -0400)]
rgw: move publish_complete() back to RGWCompleteMultipart::execute()
move publish_complete() and meta_obj->delete_object() back to execute()
so they only run on success. this allows several member variables to
move back to execute()'s stack as well
Casey Bodley [Fri, 3 May 2024 19:29:00 +0000 (15:29 -0400)]
rgw: CompleteMultipart uses s->object for Notification
get_notification() should be associated with the target object
s->object. the meta_obj has the wrong object name, so required passing
s->object->get_name() as an extra argument
importantly, Notification no longer depends on the lifetime of meta_obj
to avoid a dangling pointer, while the lifetime of s->object is guaranteed
Matan Breizman [Mon, 19 Feb 2024 12:24:52 +0000 (12:24 +0000)]
common/buffer_seastar: fix alien threads memory
The underlying raw_seastar_foreign_ptr::ptr is allocated from seastar.
This ptr is wrapped with seastar::foreign_ptr:
```
/// \c foreign_ptr<> wraps smart pointers -- \ref seastar::shared_ptr<>,
/// or similar, and remembers on what core this happened.
/// When the \c foreign_ptr<> object is destroyed, it sends a message to
/// the original core so that the wrapped object can be safely destroyed.
```
The issue is that once the pointer is de-allocated from an alien thread
it is unable to send a message to the original core.
Fix this issue by making use of seastar::alien integration with
non-seastar applications. If ~raw_seastar_foreign_ptr() is called from
an alien thread, we submit *and wait* for the memory to be released
from the origin core.
Samuel Just [Thu, 18 Apr 2024 22:19:31 +0000 (15:19 -0700)]
os/: modify getattrs to clear attrs out param before populating
Passing in a non-empty map would otherwise exhibit quite unexpected
behavior. For the bufferptr overload, any preexisting entries would
not be overwritten due to how std::map::emplace behaves. For the
bufferlist overload, it would result in appending to any pre-existing
entries.
The prior commit cleans up one such inadvertent caller which resulted
in the below bug.
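The two failure modes described above can be reproduced with a small sketch. It uses `std::string` values as a stand-in for bufferptr and bufferlist, and the function names are hypothetical; only the `std::map::emplace` behavior (an existing key is left untouched) is taken from the standard library.

```cpp
#include <cassert>
#include <map>
#include <string>

using AttrMap = std::map<std::string, std::string>;

// Stand-in for the bufferptr overload: emplace() does not overwrite an
// entry that is already present in the out param.
inline void fill_with_emplace(const AttrMap& stored, AttrMap& out) {
  for (const auto& kv : stored) {
    out.emplace(kv.first, kv.second);
  }
}

// Stand-in for the bufferlist overload: new data is appended to whatever
// the out param already holds under the same key.
inline void fill_with_append(const AttrMap& stored, AttrMap& out) {
  for (const auto& kv : stored) {
    out[kv.first].append(kv.second);
  }
}

// Clearing the out param first makes the result independent of what the
// caller passed in.
inline void fill_cleared(const AttrMap& stored, AttrMap& out) {
  out.clear();
  for (const auto& kv : stored) {
    out.emplace(kv.first, kv.second);
  }
}
```

Passing a map that already contains a stale entry to the first two helpers leaves either the stale value or a concatenation of stale and fresh data, while the third always returns exactly the stored attributes.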
rgw/multisite-notification: retry storing bucket notification attrs for ECANCELED(ConcurrentModification) errors.
An ECANCELED error coming from our write of the bucket instance metadata
is a common error for metadata writes on secondary zones, because the
secondary's write races with metadata sync from the write that is
forwarded to the master zone.
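A retry loop of this kind might look roughly like the sketch below. The helper name, the callbacks, the refresh step and the attempt limit are all assumptions made for illustration; the actual rgw code and its retry policy are not shown in this log.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper: run `write_attrs` and, while it fails with
// -ECANCELED, call `refresh` (e.g. to re-read the latest bucket info)
// and try again, up to `max_attempts` writes in total.
// Any other error, or an error from `refresh`, is returned immediately.
inline int retry_on_ecanceled(const std::function<int()>& write_attrs,
                              const std::function<int()>& refresh,
                              int max_attempts) {
  int r = write_attrs();
  for (int attempt = 1; r == -ECANCELED && attempt < max_attempts; ++attempt) {
    r = refresh();
    if (r < 0) {
      return r;
    }
    r = write_attrs();
  }
  return r;
}
```

Bounding the number of attempts keeps a persistently racing write from looping forever; the caller still sees -ECANCELED if every attempt loses the race.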
Zac Dover [Thu, 2 May 2024 08:36:34 +0000 (18:36 +1000)]
doc/cephadm: Squid default images procedure
Address Adam King's request for version-specific
cephadm-container-image-retrieval procedures, made here:
https://github.com/ceph/ceph/pull/57208#discussion_r1586614140
Co-authored-by: Adam King <adking@redhat.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a list of default monitor images to the documentation. This commit
is made in response to a request from Eugen Block, and is made using the
information developed by Mr Block here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/.
Explain that an error message received in response to
"redirect_resolve_ip_addr True" might be caused by having an
insufficiently recent release of Ceph running in your cluster.
Soumya Koduri [Mon, 25 Mar 2024 18:08:57 +0000 (23:38 +0530)]
rgw/lc: advance head if the current entry doesn't exist
This is an extension to https://github.com/ceph/ceph/pull/47595.
When skipping a nonexistent LC entry, use advance_head() instead of
get_next_entry() to fetch the next entry. If the cycle is finished
for that shard, head should be reset to avoid the LC process getting
stuck in an infinite loop.
Casey Bodley [Fri, 22 Mar 2024 14:23:31 +0000 (10:23 -0400)]
rgw: increase default metadata cache size for accounts
account users will put some extra pressure on the metadata cache,
because each request has to load metadata for the account and zero
or more groups, in addition to the user's access key and user metadata
librbd: make group and group snapshot IDs more random
Image IDs suffered from the same issue -- it was addressed in commit be8373688c1b ("librbd: block_name_prefix is not created randomly").
The code for generating group IDs is duplicated in api/Group.cc and
got missed.
Instead of cut-and-pasting the fix, just call generate_image_id()
directly and rename variables to be more explicit.
Adam King [Tue, 23 Apr 2024 16:04:39 +0000 (12:04 -0400)]
doc/cephadm: remove downgrade reference from upgrade docs
This has been in here for years, but cephadm will block
attempted upgrades to lower versions and we generally
don't want people to think this is supported or safe.