Zac Dover [Sat, 4 Jan 2025 20:54:48 +0000 (06:54 +1000)]
doc: README.md - improve "Tshooting" and "Tips & Tricks"
Improve the formatting and the English in the "Troubleshooting" and
"Tips and Tricks" sections, and move those sections to a place where
they don't interrupt the flow of the vstart cluster installation
instructions. Some of the strings in "Tips and Tricks" are not yet
unambiguous sentences that make sense to the uninitiated, but this PR
is a step in that direction.
This PR is part of a series of PRs meant to preserve the integrity of
the README.md file after some recent additions that break the flow of
the document.
This PR follows https://github.com/ceph/ceph/pull/61226 and
https://github.com/ceph/ceph/pull/61221.
Zac Dover [Fri, 3 Jan 2025 19:52:24 +0000 (05:52 +1000)]
doc: README.md - format "Troubleshooting"
Format "Troubleshooting" into its own section so that it doesn't confuse
readers of the vstart installation procedure.
This PR is part of a series of PRs meant to preserve the integrity of
the README.md file after some recent additions that break the flow of
the document.
This PR follows https://github.com/ceph/ceph/pull/61221.
Once we destruct a SharedLRU, its SharedLRU::weak_refs map is
destroyed. Since a weak reference might outlive the SharedLRU itself,
destroying an object via the custom Deleter could then access the
already-destroyed SharedLRU instance's weak_refs map.
Instead, invalidate the custom Deleter (Deleter::cache) when
destructing the SharedLRU.
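A minimal sketch of the pattern described above (the type and member
names follow the commit text, but this is simplified, single-threaded
code, not the actual Ceph SharedLRU; a real implementation also needs
locking):

    #include <map>
    #include <memory>

    template <typename K, typename V>
    struct SharedLRU {
      // Custom deleter: erases the weak_refs entry once the last
      // shared_ptr to a value drops. Its back-pointer to the cache is
      // cleared in ~SharedLRU() so that values outliving the cache no
      // longer touch the destroyed map.
      struct Deleter {
        SharedLRU* cache;  // nulled out when the cache is destructed
        K key;
        void operator()(V* v) {
          if (cache)
            cache->weak_refs.erase(key);
          delete v;
        }
      };

      std::map<K, std::pair<std::weak_ptr<V>, Deleter*>> weak_refs;

      std::shared_ptr<V> add(const K& key, V* value) {
        std::shared_ptr<V> p(value, Deleter{this, key});
        weak_refs[key] = {p, std::get_deleter<Deleter>(p)};
        return p;
      }

      ~SharedLRU() {
        // The fix: invalidate every outstanding Deleter so it skips
        // the weak_refs cleanup once the cache itself is gone.
        for (auto& [key, entry] : weak_refs)
          entry.second->cache = nullptr;
      }
    };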
Ronen Friedman [Sun, 29 Dec 2024 11:26:28 +0000 (05:26 -0600)]
qa/standalone/scrub: add build_pg_dicts()
A helper function that builds bash dictionaries (associative arrays):
PG to acting set, PG to primary, and PG to pool.
Also added are two helper functions that make use of the dictionaries:
count_common_active(), which counts the OSDs common to the acting sets
of two PGs, and find_disjoint_but_primary(), which finds a PG that is
disjoint from the first PG, apart from possibly sharing its primary
OSD.
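A rough bash sketch of what such dictionaries look like (the ceph/jq
query and the field handling are illustrative, not the actual helper
code):

    # one associative array per mapping
    declare -A pg_acting pg_primary pg_pool

    build_pg_dicts() {
      local pg acting primary
      while read -r pg acting primary; do
        pg_acting[$pg]=$acting      # e.g. pg_acting[1.0]='[2,0,1]'
        pg_primary[$pg]=$primary    # e.g. pg_primary[1.0]='2'
        pg_pool[$pg]=${pg%%.*}      # the pool id is the pg-id prefix
      done < <(ceph pg dump pgs --format=json 2>/dev/null |
               jq -r '.pg_stats[] | "\(.pgid) \(.acting) \(.acting_primary)"')
    }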
Benedikt Heine [Mon, 30 Dec 2024 14:26:16 +0000 (15:26 +0100)]
doc/mgr/dashboard: Fix HAProxy TLS example
With `ssl` set on the `server` option, HAProxy strips the TLS layer
for all clients; you would have to connect to it with
`http://<ip>:443`.
To have an active health check that uses TLS but does not strip TLS
for clients, you'd need to:
- add `check`, to enable active health checks
- add `check-ssl`, to instruct the health check to use TLS
- add `verify none`, to skip verification of the health-check requests
  sent by HAProxy
- remove `ssl`, to stop stripping TLS
The active health checks are required so that no requests are routed
to the standby managers; those would redirect clients to an IP of the
active mgr that may be unusable for them.
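A minimal sketch of a backend following these rules (the backend name,
addresses, and ports are illustrative, not taken from the dashboard
documentation):

    backend ceph_dashboard
        mode tcp
        # TLS from the client is passed through to the mgr untouched
        # (no `ssl` keyword), while the health check itself speaks TLS
        # (`check-ssl`) without verifying the mgr's certificate.
        server mgr1 192.0.2.10:8443 check check-ssl verify none
        server mgr2 192.0.2.11:8443 check check-ssl verify none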
---
Alternatively, you could terminate TLS with another certificate in the
frontend and then re-encrypt the traffic. But this would also require
tracking the certificates in HAProxy.
Ronen Friedman [Thu, 19 Dec 2024 16:02:08 +0000 (10:02 -0600)]
osd/scrub: abort reserving scrub if an operator-initiated scrub is
requested
Handle the case of receiving an operator command while the PG is
scrubbing but is still waiting for the replicas' reservations: now
that reservation requests are queued, the wait may be very prolonged.
Usually an operator's direct scrub command has a priority high enough
not to require waiting for reservations. But in the current
implementation it would wait until the running scrub session
terminates, and only then rerun at that high priority. This is not the
intended behavior. The solution is to abort the existing scrub session
and start the new one.
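A sketch of the intended handling (the function and member names are
illustrative, not the actual scrubber state-machine code):

    // On an operator-initiated scrub request: if the ongoing session is
    // still queued for replica reservations, do not let the high-priority
    // request wait behind it; tear the session down and start over at the
    // operator's priority, which does not wait for reservations.
    void on_operator_scrub_request(ScrubJob& job) {
      if (job.session_active() && job.waiting_for_reservations()) {
        job.abort_session();  // cancels the queued reservation requests
      }
      start_scrub_session(job, operator_priority);
    }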
Ronen Friedman [Thu, 26 Dec 2024 13:06:10 +0000 (07:06 -0600)]
osd/scrub: register for 'osd_max_scrubs' config changes
Since https://github.com/ceph/ceph/pull/55340, osd_max_scrubs also
affects the parameters of the async scrub reserver used by the
replicas. Thus, the code must notice and act upon changes to this
config option.
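The usual way a Ceph component notices such changes is the config
observer interface; a minimal sketch (the observer class and the
reserver member are illustrative, not the actual scrubber code):

    class ScrubReserverTuner : public md_config_obs_t {
      const char** get_tracked_conf_keys() const override {
        static const char* keys[] = { "osd_max_scrubs", nullptr };
        return keys;
      }
      void handle_conf_change(const ConfigProxy& conf,
                              const std::set<std::string>& changed) override {
        if (changed.count("osd_max_scrubs")) {
          // propagate the new limit to the async scrub reserver
          scrub_reserver.set_max(conf->osd_max_scrubs);
        }
      }
      // registered once at startup via cct->_conf.add_observer(this)
    };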
Venky Shankar [Fri, 27 Dec 2024 11:06:10 +0000 (16:36 +0530)]
Merge PR #55616 into main
* refs/pull/55616/head:
PendingReleaseNotes: add note for replay completion warning
qa: test to verify `MDS_ESTIMATED_REPLAY_TIME` warning
doc: add a note for `MDS_ESTIMATED_REPLAY_TIME` MDS warning
mds: emit warning for estimated replay time
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Ronen Friedman [Fri, 22 Nov 2024 18:00:50 +0000 (12:00 -0600)]
osd/scrub: show reservation status in 'pg dump' output
Whenever a PG is selected for scrubbing and is waiting for remote
reservations, the 'pg dump' output includes the following text (under
the 'SCRUB_SCHEDULING' column):
Reserving. Waiting Ns for OSD.k (n/m)
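An illustrative rendering, with made-up values (plausibly: N is the
number of seconds spent waiting so far, k the replica OSD currently
being waited for, and n/m the progress through the replicas):

    Reserving. Waiting 3s for OSD.5 (2/4)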
Ilya Dryomov [Fri, 20 Dec 2024 10:16:58 +0000 (11:16 +0100)]
rbd: drop --pool option from "rbd group image {add,rm}"
It stopped working with the removal of get_special_pool_group_names()
in commit 3e8624f157a1 ("rbd: add support for namespaces") over six
years ago. Given how much time has passed, stop accepting this option.
Ilya Dryomov [Tue, 17 Dec 2024 15:06:17 +0000 (16:06 +0100)]
rbd: handle --{group,image}-namespace in "rbd group image {add,rm}"
Currently, only passing the namespace as part of the group or image
spec works. If the --group-namespace or --image-namespace options are
used, the namespace isn't picked up.
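For example, after this change both invocations below should behave
the same (the pool, namespace, group, and image names are made up):

    # namespace embedded in the specs
    rbd group image add mypool/ns1/mygroup mypool/ns1/myimage
    # namespace passed via the previously-ignored options
    rbd group image add mypool/mygroup mypool/myimage \
        --group-namespace ns1 --image-namespace ns1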
Laura Flores [Tue, 17 Dec 2024 23:18:11 +0000 (17:18 -0600)]
PendingReleaseNotes: add note about tracker #69012
We merged a fix for v19.2.1 that helps alleviate
the worst of this problem (https://tracker.ceph.com/issues/68657),
but it still comes up on occasion. This release note addresses the
remaining issues tracked in https://tracker.ceph.com/issues/69012.
Zac Dover [Thu, 19 Dec 2024 13:19:22 +0000 (23:19 +1000)]
doc/radosgw: edit uadk-accel.rst
Edit the sections of doc/radosgw/uadk-accel.rst that concern the
automatic and manual building of UADK.
This is one in a series of uadk-accel.rst-related changes that includes
the following PRs:
https://github.com/ceph/ceph/pull/60953
https://github.com/ceph/ceph/pull/61128
Kotresh HR [Wed, 23 Oct 2024 19:00:41 +0000 (00:30 +0530)]
client: Fix a deadlock when osd is full
Problem:
When an OSD is full, the client receives the notification and cancels
the ongoing writes. If the ongoing writes are async, this can cause a
deadlock: the async callback that was registered also takes the
'client_lock', which handle_osd_map takes at the beginning.
op_cancel_writes invokes the callback registered for the async write
synchronously while holding the 'client_lock', causing the deadlock.
Earlier approach:
An attempt was made to solve this issue by calling 'op_cancel_writes'
without holding 'client_lock'. But that ran into a lock-ordering
dependency between the objecter's 'rwlock' and the async write's
callback taking 'client_lock': the 'client_lock' must always be taken
before the 'rwlock'. So that approach was dropped in favor of the
current one.
Solution:
Use C_OnFinisher for the objecter async-write callback, i.e. wrap the
async write's callback using the Finisher. This queues the callback on
the Finisher's context queue, which the finisher thread picks up and
executes, thus avoiding the deadlock.
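A sketch of the resulting wiring (the callback class name appears in
the backtrace below; the finisher member and the exact wiring are
simplified assumptions, not the literal Client code):

    // Before: the Objecter completed the write callback synchronously
    // from op_cancel_writes(), i.e. from inside handle_osd_map(), which
    // already holds client_lock; the callback then tried to take
    // client_lock again and deadlocked.
    Context* cb = new C_Lock_Client_Finisher(this, onfinish);

    // After: wrap the callback so that completion merely queues it; the
    // finisher thread runs it later, without client_lock held.
    Context* wrapped = new C_OnFinisher(cb, &objecter_finisher);
    // hand `wrapped` (not `cb`) to the Objecter as the async-write
    // completion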
Testing:
The fix is tested in the vstart cluster with the following reproducer:
1. Mount the cephfs volume using nfs-ganesha at /mnt
2. Run fio on /mnt in one terminal
3. In another terminal, blocklist the nfs client session
4. fio hangs
The issue reproduces in the vstart cluster most of the time; I think
that's because vstart is slow. The same test written for teuthology
does not reproduce the issue: it requires one or more writes to be
ongoing in RADOS when the client is blocklisted for the deadlock to be
hit.
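A rough shell rendering of the reproducer (host names, paths, fio
parameters, and the client address are placeholders):

    # terminal 1: mount via nfs-ganesha and generate async writes
    mount -t nfs <ganesha-host>:/ /mnt
    fio --name=writes --directory=/mnt --rw=write --size=1G --numjobs=4

    # terminal 2: blocklist the nfs client session while writes are
    # in flight; without the fix, fio hangs
    ceph osd blocklist add <client-addr>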
Stripped-down version of the backtrace:
----------
#0  0x00007f4d77274960 in __lll_lock_wait ()
#1  0x00007f4d7727aff2 in pthread_mutex_lock@@GLIBC_2.2.5 ()
#2  0x00007f4d7491b0a1 in __gthread_mutex_lock (__mutex=0x7f4d200f99b0)
#3  std::mutex::lock (this=<optimized out>)
#4  std::scoped_lock<std::mutex>::scoped_lock (__m=..., this=<optimized out>, this=<optimized out>, __m=...)
#5  Client::C_Lock_Client_Finisher::finish (this=0x7f4ca0103550, r=-28)
#6  0x00007f4d74888dfd in Context::complete (this=0x7f4ca0103550, r=<optimized out>)
#7  0x00007f4d7498850c in std::__do_visit<...>(...) (__visitor=...)
#8  std::visit<Objecter::Op::complete(...) (__visitor=...)
#9  Objecter::Op::complete(...) (e=..., e=..., r=-28, ec=..., f=...)
#10 Objecter::Op::complete (e=..., r=-28, ec=..., this=0x7f4ca022c7f0)
#11 Objecter::op_cancel (this=0x7f4d200fab20, s=<optimized out>, tid=<optimized out>, r=-28)
#12 0x00007f4d7498ea12 in Objecter::op_cancel_writes (this=0x7f4d200fab20, r=-28, pool=103)
#13 0x00007f4d748e1c8e in Client::_handle_full_flag (this=0x7f4d200f9830, pool=103)
#14 0x00007f4d748ed20c in Client::handle_osd_map (m=..., this=0x7f4d200f9830)
#15 Client::ms_dispatch2 (this=0x7f4d200f9830, m=...)
#16 0x00007f4d75b8add2 in Messenger::ms_deliver_dispatch (m=..., this=0x7f4d200ed3e0)
#17 DispatchQueue::entry (this=0x7f4d200ed6f0)
#18 0x00007f4d75c27fa1 in DispatchQueue::DispatchThread::entry (this=<optimized out>)
#19 0x00007f4d77277c02 in start_thread ()
#20 0x00007f4d772fcc40 in clone3 ()
----------