This commit updates the `is_device` function to correctly handle
loop devices.
The function now validates loop devices when they are explicitly
allowed, by checking their type (`loop`) in addition to `disk`
and `mpath`.
Changes include:
- Extending the type check to include `loop` in the list of
supported device types.
- Enhancing the docstring for better documentation of the
function's purpose and behavior.
These changes ensure that loop devices are properly recognized
and handled when configuring OSDs in ceph-volume.
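The extended type check can be sketched roughly as follows. This is a minimal illustration and not the actual ceph-volume code: the helper name `is_supported_device_type`, the `allow_loop_devices` flag, and the lsblk-style dictionary with a `TYPE` key are all assumptions made for the example.

```python
from typing import Dict


def is_supported_device_type(lsblk_info: Dict[str, str],
                             allow_loop_devices: bool = False) -> bool:
    """Hypothetical helper: accept `disk` and `mpath`, plus `loop` when allowed."""
    supported_types = ['disk', 'mpath']
    if allow_loop_devices:
        # Loop devices are only accepted when explicitly allowed.
        supported_types.append('loop')
    return lsblk_info.get('TYPE') in supported_types
```

With this sketch, a `loop` entry is rejected by default and accepted only when the caller opts in.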
ceph-volume: add python hints to util.prepare.create_id()
This commit introduces type annotations to the `create_id` function in `ceph_volume.util.prepare`.
The parameters and return value are now typed as follows:
- `fsid` is a `str`.
- `json_secrets` is a `str`.
- `osd_id` is an optional `str` (`Optional[str]`).
- The function returns a `str`.
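The annotated signature described above might look like the sketch below. Only the types come from the commit message; the `None` default for `osd_id` and the stand-in body are assumptions, since the real function needs a running cluster.

```python
from typing import Optional, get_type_hints


def create_id(fsid: str, json_secrets: str, osd_id: Optional[str] = None) -> str:
    """Stand-in body for illustration only; it does not contact a cluster."""
    # Assumption: echo the requested id, or fall back to a placeholder value.
    return osd_id if osd_id is not None else "0"


# The annotations can be inspected at runtime.
hints = get_type_hints(create_id)
```

Tools such as mypy can then flag callers that pass, for example, an `int` as `osd_id`.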
Vallari Agrawal [Wed, 25 Dec 2024 05:01:21 +0000 (10:31 +0530)]
mon/NVMeofGwMap: add delay to NVMEOF_GATEWAY_DELETING warning
Instead of immediately triggering, have this healthcheck trigger
after some time has elapsed. This delay can be configured by
mon_nvmeofgw_delete_grace.
Track the time when gateways go into DELETING state in a new
member var (of NVMeofGwMon) 'gws_deleting_time'.
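The grace-period logic can be illustrated with a short Python sketch. The real change lives in the C++ monitor code; the function name `gateways_past_grace`, the use of plain float timestamps in seconds, and the `>=` comparison are assumptions made for this example.

```python
from typing import Dict, List


def gateways_past_grace(gws_deleting_time: Dict[str, float],
                        now: float,
                        grace_seconds: float) -> List[str]:
    """Return the gateways that have been in DELETING state for at least the grace period."""
    return sorted(
        gw for gw, deleting_since in gws_deleting_time.items()
        if now - deleting_since >= grace_seconds
    )


# Hypothetical timestamps: gw-a entered DELETING at t=0, gw-b at t=800.
deleting = {'gw-a': 0.0, 'gw-b': 800.0}
overdue = gateways_past_grace(deleting, now=900.0, grace_seconds=900.0)
```

In this sketch the warning would only mention `gw-a`, because `gw-b` has not yet been in DELETING state for the full grace period.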
Casey Bodley [Wed, 18 Dec 2024 16:28:02 +0000 (11:28 -0500)]
rgw: don't use merge_and_store_attrs() when recreating a bucket
https://github.com/ceph/ceph/pull/56583 recently fixed
merge_and_store_attrs() to preserve existing attrs, but this broke the
swift api's ability to remove container metadata. RGWCreateBucket
handles this merging itself with prepare_add_del_attrs(), so we should
just assign createparams.attrs to the bucket and store it with
bucket->put_info()
make the same change for RGWPutMetadataBucket which swift uses to
add/remove existing metadata
This config sets the delay before the
NVMEOF_GATEWAY_DELETING healthcheck warning is triggered.
The warning is raised when NVMeoF gateways are in DELETING state
for too long (indicating a problem in namespace
load-balancing).
The default value for this config is 15 minutes.
Zac Dover [Sat, 4 Jan 2025 20:54:48 +0000 (06:54 +1000)]
doc: README.md - improve "Tshooting" and "Tips & Tricks"
Improve the formatting and English language in the sections
"Troubleshooting" and "Tips and Tricks", and move those sections to a
place where they don't interrupt the flow of the vstart cluster
installation instructions. Some of the strings in "Tips and Tricks" are
not yet unambiguous sentences that will make sense to the uninitiated,
but this PR represents a step in that direction.
This PR is part of a series of PRs meant to preserve the integrity of
the README.md file after some recent additions that break the flow of
the document.
This PR follows https://github.com/ceph/ceph/pull/61226 and
https://github.com/ceph/ceph/pull/61221.
Zac Dover [Fri, 3 Jan 2025 19:52:24 +0000 (05:52 +1000)]
doc: README.md - format "Troubleshooting"
Format "Troubleshooting" into its own section so that it doesn't confuse
readers of the vstart installation procedure.
This PR is part of a series of PRs meant to preserve the integrity of
the README.md file after some recent additions that break the flow of
the document.
This PR follows https://github.com/ceph/ceph/pull/61221.
Add a warning when NVMeoF gateways are in DELETING state.
This happens when there are namespaces under the deleted gateway's
ANA group ID.
The gateways are removed completely after users manually move these
namespaces to another load balancing group, or when a new gateway is
deployed on that host.
Once we destruct SharedLRU, SharedLRU::weak_refs map is destroyed.
As a weak reference might outlive the SharedLRU itself, when destroying
the object via the custom Deleter, we try to access the already
destroyed SharedLRU instance's weak ref map.
Instead, invalidate the custom Deleter (Deleter::cache), when
destructing the SharedLRU.
Ronen Friedman [Sun, 29 Dec 2024 11:26:28 +0000 (05:26 -0600)]
qa/standalone/scrub: add build_pg_dicts()
a helper function that builds bash dictionaries:
pg to acting set, pg to primary & pg to pool.
Also added are two helper functions that make use of the dictionaries:
count_common_active() to count the number of common OSDs
in the acting set of two PGs, and find_disjoint_but_primary()
to find a PG that is disjoint from the first PG, apart from
possibly having the same primary OSD.
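The intent of the two helpers can be shown with a Python analogue. The actual helpers are bash functions operating on bash dictionaries; the data layout below and the exact reading of "disjoint, apart from possibly having the same primary" are assumptions.

```python
from typing import Dict, List, Optional


def count_common_active(pg_acting: Dict[str, List[int]], pg1: str, pg2: str) -> int:
    """Count the OSDs that appear in the acting sets of both PGs."""
    return len(set(pg_acting[pg1]) & set(pg_acting[pg2]))


def find_disjoint_but_primary(pg_acting: Dict[str, List[int]],
                              pg_primary: Dict[str, int],
                              first_pg: str) -> Optional[str]:
    """Find a PG sharing no OSD with first_pg, except possibly a common primary."""
    first_primary = pg_primary[first_pg]
    for pg, acting in sorted(pg_acting.items()):
        if pg == first_pg:
            continue
        common = set(pg_acting[first_pg]) & set(acting)
        # Accept no overlap, or an overlap consisting only of a shared primary.
        if not common or (common == {first_primary}
                          and pg_primary[pg] == first_primary):
            return pg
    return None


# Hypothetical cluster layout: pg -> acting set, pg -> primary.
pg_acting = {'1.0': [0, 1, 2], '1.1': [0, 3, 4], '1.2': [1, 5, 6]}
pg_primary = {'1.0': 0, '1.1': 0, '1.2': 1}
```

Here `1.1` qualifies as a partner for `1.0` because the only OSD they share is their common primary.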
Benedikt Heine [Mon, 30 Dec 2024 14:26:16 +0000 (15:26 +0100)]
doc/mgr/dashboard: Fix HAProxy TLS example
With `ssl` set on the `server` option, HAProxy strips the TLS protocol
for all clients. You would need to connect to it with `http://<ip>:443`.
To have an active health check, which uses SSL, but does not strip it
for clients, you'd need to add:
- `check` to enable active health checks.
- `check-ssl` to instruct the health check to use TLS
- `verify none` to skip verification on the health check requests from
HAProxy
- _REMOVE_ `ssl` to stop stripping TLS
The active health checks are required to not route any requests to the
inactive managers. These would redirect to any unusable IP from the
active mgr.
---
Alternatively you could add another certificate in the frontend and then
re-encrypt the traffic. But this would require tracking the certs also
in HAProxy.
Ronen Friedman [Thu, 19 Dec 2024 16:02:08 +0000 (10:02 -0600)]
osd/scrub: abort reserving scrub if an operator-initiated scrub is requested
Handling the case of receiving an operator command while the PG is
scrubbing, but is waiting for replicas' reservations:
Now that the reservations are queued, the wait may be a very prolonged one.
Usually, an operator's direct scrub command has a priority high enough
to not require waiting for reservations. But in the current implementation,
it would wait until the running scrub session terminates, and only then
would it rerun at that high priority. This is not the intended behavior.
The solution is to abort the existing scrub session, and start the new one.
Ronen Friedman [Thu, 26 Dec 2024 13:06:10 +0000 (07:06 -0600)]
osd/scrub: register for 'osd_max_scrubs' config changes
Since https://github.com/ceph/ceph/pull/55340,
osd_max_scrubs also affects the parameters of the
async scrub reserver used by the replicas. Thus,
the code must notice and react to changes to this config.
Venky Shankar [Fri, 27 Dec 2024 11:06:10 +0000 (16:36 +0530)]
Merge PR #55616 into main
* refs/pull/55616/head:
PendingReleaseNotes: add note for replay completion warning
qa: test to verify `MDS_ESTIMATED_REPLAY_TIME` warning
doc: add a note for `MDS_ESTIMATED_REPLAY_TIME` MDS warning
mds: emit warning for estimated replay time
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Ronen Friedman [Fri, 22 Nov 2024 18:00:50 +0000 (12:00 -0600)]
osd/scrub: show reservation status in 'pg dump' output
Whenever a PG is selected for scrubbing, and is waiting for
remote reservations, the 'pg dump' output will include the
following text (under the 'SCRUB_SCHEDULING' column):
Reserving. Waiting Ns for OSD.k (n/m)
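A script that consumes the 'pg dump' output could pick the fields out of this text along the following lines. This is a sketch only: the function name `parse_reserving_status` is hypothetical, and it assumes that N, k, n and m are all printed as non-negative decimal integers.

```python
import re
from typing import Dict, Optional

# Mirrors the "Reserving. Waiting Ns for OSD.k (n/m)" template.
_RESERVING_RE = re.compile(
    r"Reserving\. Waiting (?P<waited_s>\d+)s for OSD\.(?P<osd>\d+) "
    r"\((?P<n>\d+)/(?P<m>\d+)\)"
)


def parse_reserving_status(text: str) -> Optional[Dict[str, int]]:
    """Return the numeric fields of a reservation status, or None if absent."""
    match = _RESERVING_RE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Hypothetical SCRUB_SCHEDULING value used for illustration.
status = parse_reserving_status("Reserving. Waiting 12s for OSD.3 (1/2)")
```

Any other SCRUB_SCHEDULING text makes the function return `None`, so callers can tell reserving PGs apart from the rest.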