Zac Dover [Thu, 15 May 2025 13:24:58 +0000 (23:24 +1000)]
doc/mgr: edit dashboard.rst
Edit doc/mgr/dashboard.rst. Add prompts.
This changes eighty-nine prompts. Because this makes so many changes,
all other edits included in https://github.com/ceph/ceph/pull/63255 will
be made in a separate commit. This is done for the sake of the patience of
the reviewers (probably Anthony, if history is any guide).
This commit is part of a project to separate out the twenty-five files
that were committed to https://github.com/ceph/ceph/pull/63255.
John Mulligan [Wed, 18 Sep 2024 03:21:31 +0000 (20:21 -0700)]
pybind/mgr: attempt to fix mypy importing from python-common
For some reason mypy on python 3.12 can no longer automatically find
imports from python-common. Help it out by expanding the MYPYPATH
value for the tox.ini.
Ville Ojamo [Thu, 15 May 2025 09:46:21 +0000 (16:46 +0700)]
doc/radosgw: Use ref for hyperlinking to multisite
Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing label in multisite.rst.
- Remove unused "target definitions".
Also use existing label for linking from multisite.rst.
Fix a broken link within multisite.rst.
The rendered PR should look the same as the old docs, only differing in
the source RST.
Zac Dover [Tue, 13 May 2025 06:31:42 +0000 (16:31 +1000)]
doc/dev/cephfs-mirroring: edit file 1 of x
Add prompts (and perform necessary corrections to glaring grammatical
errors) to doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the first quarter of the doc/dev/cephfs-mirroring.rst
file. This commit encompasses about one-hundred lines of RST.
Zac Dover [Tue, 13 May 2025 06:58:39 +0000 (16:58 +1000)]
doc/dev/cephfs-mirroring: edit file 2 of x
Add prompts (and perform necessary corrections to glaring grammatical
errors) to doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the second quarter of the doc/dev/cephfs-mirroring.rst
file. This commit encompasses about one-hundred lines of RST.
Zac Dover [Thu, 8 May 2025 02:29:25 +0000 (12:29 +1000)]
doc/mgr: edit alerts.rst
Edit doc/mgr/alerts.rst as part of the project to determine where the
error is in https://github.com/ceph/ceph/pull/62782 that prevents the
Jenkins tests from passing.
This commit adds to the work done in
https://github.com/ceph/ceph/pull/62782 by correcting some of the
English that was present in that PR.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Zac Dover [Thu, 8 May 2025 00:08:06 +0000 (10:08 +1000)]
doc/mgr/ceph_api: edit index.rst
Edit doc/mgr/ceph_api/index.rst as part of the project to determine
where the error is in https://github.com/ceph/ceph/pull/62782 that
prevents the Jenkins tests from passing.
This is a change to one of twenty-five files in
https://github.com/ceph/ceph/pull/62782, and this commit represents one
of what will be at least twenty-five other commits made to track this
error down.
Adam Kupczyk [Tue, 1 Apr 2025 14:01:23 +0000 (14:01 +0000)]
os/bluestore/bluefs: Fix race condition between truncate() and unlink()
It was possible for unlink() to interrupt an ongoing truncate().
As a result, unlink() finished properly, but truncate() was not aware
of it and:
1) updated a file that was already removed
2) released the same allocations again
Now fixed by checking whether the file is deleted under the FILE lock.
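The fix described above can be illustrated with a minimal Python sketch. It is not the BlueFS code: the File class, the released list, and both function signatures are hypothetical stand-ins, and a threading.Lock plays the role of the FILE lock. The point is only that truncate() re-checks the deleted flag while holding the lock, so it never touches a removed file or releases extents twice.

```python
import threading

released = []  # hypothetical record of every extent handed back to the allocator


class File:
    """Hypothetical stand-in for a BlueFS file; not the real structure."""

    def __init__(self, extents):
        self.lock = threading.Lock()  # plays the role of the FILE lock
        self.extents = list(extents)
        self.deleted = False


def unlink(f):
    """Mark the file deleted and release all of its extents exactly once."""
    with f.lock:
        if f.deleted:
            return False
        f.deleted = True
        released.extend(f.extents)
        f.extents.clear()
        return True


def truncate(f, keep):
    """Keep the first `keep` extents, unless the file was already removed."""
    with f.lock:
        if f.deleted:
            # The check happens under the lock, so a concurrent unlink()
            # cannot slip in between the check and the update.
            return False
        released.extend(f.extents[keep:])
        del f.extents[keep:]
        return True
```

With this shape, a truncate() that loses the race simply returns without updating the file or releasing anything.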
When extending the log, the sequence was left in a bad state: a
transaction was first created to update with the current seq number,
but the "real" transaction was left with the same sequence number, when
it should have been `extend_log_transaction.seq + 1`.
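The intended numbering can be shown with a small Python sketch. The Transaction class and the extend_log() helper are hypothetical and exist only for illustration; the one detail taken from the message above is that the "real" transaction carries `extend_log_transaction.seq + 1`.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical log transaction with a sequence number and a list of ops."""
    seq: int
    ops: list = field(default_factory=list)


def extend_log(current_seq, pending_ops):
    """Return (extend_log_transaction, real_transaction) with distinct seqs."""
    # The extension itself is recorded with the current sequence number.
    extend_log_transaction = Transaction(seq=current_seq, ops=["extend_log"])
    # The "real" transaction must follow it, so it takes the next number
    # instead of reusing the same one.
    real_transaction = Transaction(
        seq=extend_log_transaction.seq + 1, ops=list(pending_ops)
    )
    return extend_log_transaction, real_transaction
```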
Correct the presentation of an example string in doc/cephadm/rgw.rst in
order to obviate an error reading "rgw.rst:202: WARNING: Inline emphasis start-string without end-string."
doc/rados: Update mClock doc on steps to override OSD IOPS capacity config
Describe the steps involved to
- Specify a global value for osd_mclock_max_capacity_iops_{ssd,hdd}, and
- Override existing individually scoped values for OSDs determined during
start-up for osd_mclock_max_capacity_iops_{ssd,hdd}.
The above helps to:
- Override existing settings with a global value.
- Reduce the number of entries in the mon store by using a single
global specification for all OSDs in the cluster when the underlying
hardware is the same for all OSDs.
qa/suites/orch/cephadm: add PG_DEGRADED to ignorelist
Issue: tests are failing in rados/cephadm due to
PG_DEGRADED warning in cluster log.
Cause: This is expected as we are intentionally killing OSDs.
Adding PG_DEGRADED warning to ignorelist will prevent the
test from failing when this warning is raised.
Ville Ojamo [Sat, 26 Apr 2025 04:17:16 +0000 (11:17 +0700)]
doc/radosgw: Fix RST syntax rendered as text in oidc.rst
An empty line is required after starting a pre-formatted block with the
double-colon syntax. Otherwise the double-colon does nothing: it is
rendered as-is as "::" and no pre-formatted block follows.
Add empty lines after the double-colon syntax so that the following
block is rendered pre-formatted.
Also add bash privileged prompts to a block with 2 example CLI commands.
librbd: disallow "rbd trash mv" if image is in a group
Removing an image that is a member of a group has always been
disallowed. However, moving an image that is a member of a group to
trash is currently allowed and this is deceptive -- the only reason for
a user to move an image to trash should be the intent to remove it.
More importantly, group APIs operate in terms of image names -- there
are no corresponding variants that would operate in terms of image IDs.
For example, even though internally GroupImageSpec struct stores an
image ID, the public rbd_group_image_info_t struct insists on an image
name. When rbd_group_image_list() encounters a trashed member image
(i.e. one that doesn't have a name), it just fails with ENOENT and no
listing gets produced at all until the offending image is restored from
trash. Something like this can be very hard to debug for an average
user, so let's make rbd_trash_move() fail with EMLINK the same way as
rbd_remove() does in this scenario.
The one case where moving a member image to trash makes sense is live
migration where the source image gets trashed to be almost immediately
replaced by the destination image as part of preparing migration.
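The behaviour described above might be modelled roughly as follows. This is a sketch in plain Python, not librbd: the Image class, the trash_move() function, and the for_migration parameter are hypothetical names. Only the EMLINK error code and the live-migration exception come from the text.

```python
import errno


class Image:
    """Hypothetical image record; `group` is None when it is not a member."""

    def __init__(self, name, group=None):
        self.name = name
        self.group = group
        self.trashed = False


def trash_move(image, for_migration=False):
    """Return 0 on success or a negative errno, in the style of a C API."""
    if image.group is not None and not for_migration:
        # Same error that removing a group member image reports.
        return -errno.EMLINK
    image.trashed = True
    return 0
```

Keeping the migration case as an explicit flag makes the one permitted exception visible at the call site.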
EMLINK is returned by rbd_remove() if the image is a member of a group.
Add a dedicated exception similar to ImageBusy or ImageHasSnapshots and
a test for it.
Conflicts:
src/test/pybind/test_rbd.py [ commits 68eea0eb814e
("src/tools/rbd: add group info command to output group id")
and e5ccce14c4b0 ("rbd: add group snap info command") not in
reef ]
Ilya Dryomov [Tue, 25 Mar 2025 08:13:27 +0000 (09:13 +0100)]
mgr/rbd_support: always parse interval and start_time in Schedules::remove()
Commit 1b62447071a9 ("mgr/rbd_support: fix schedule remove") addressed
the issue that it was concerned with in a rather suboptimal way: instead
of moving the parsing of interval and start_time upfront to be able to
bail early, it wrapped from_string() constructors with try/finally and
left the conditional behavior in place.
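The "parse upfront, bail early" shape that this message argues for could look like the sketch below. It is not the rbd_support module: parse_interval(), remove_schedule(), the interval syntax, and the dictionary layout are all assumptions made for illustration. What matters is that both arguments are parsed before any lookup, so invalid input fails in the same way whether or not a schedule exists.

```python
from datetime import datetime


def parse_interval(text):
    """Hypothetical parser: '<n>m', '<n>h' or '<n>d' converted to minutes."""
    units = {"m": 1, "h": 60, "d": 1440}
    if len(text) < 2 or text[-1] not in units or not text[:-1].isdecimal():
        raise ValueError(f"invalid interval: {text!r}")
    return int(text[:-1]) * units[text[-1]]


def remove_schedule(schedules, key, interval=None, start_time=None):
    """Remove matching entries; `schedules` maps key -> [(minutes, start)]."""
    # Parse first: a malformed argument raises ValueError before any state
    # is inspected or modified.
    want_interval = parse_interval(interval) if interval is not None else None
    want_start = (
        datetime.fromisoformat(start_time) if start_time is not None else None
    )

    entries = schedules.get(key)
    if entries is None:
        return False
    if want_interval is None:
        del schedules[key]  # no interval given: drop every entry for the key
        return True
    kept = [e for e in entries if e != (want_interval, want_start)]
    if len(kept) == len(entries):
        return False
    if kept:
        schedules[key] = kept
    else:
        del schedules[key]
    return True
```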
Ilya Dryomov [Fri, 21 Mar 2025 13:43:50 +0000 (14:43 +0100)]
librbd: respect rbd_default_snapshot_quiesce_mode in group_snap_create()
Make group_snap_create() behave the same as snap_create() and
mirror_image_create_snapshot(): APIs that don't take RBD_SNAP_CREATE_
flags explicitly should respect rbd_default_snapshot_quiesce_mode
option.
N Balachandran [Mon, 21 Apr 2025 11:34:08 +0000 (17:04 +0530)]
rbd: display correct mirror state when creating
The mirror image state is set to MIRROR_IMAGE_STATE_CREATING
when the image is first created on the secondary, but was displayed
as "unknown" by the rbd info command. This has been fixed.
Fixes: https://tracker.ceph.com/issues/70963
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
(cherry picked from commit f2e35646721ed3076e3da54124f8d783c456b2dc)
mon: Track and process pending pings after election
Problem:
Monitors stop pinging each other when the quorum_mon_feature
flag is empty. This happens when a monitor freshly starts
up and has never formed a quorum before, or when you restart
the monitors in the cluster; in short, on monitor startup.
This problem can easily be reproduced every time.
Steps to reproduce:
1. Start 3 MONs with `connectivity` election strategy
2. Fail 1 mon.
3. Restart all the monitors (including the down monitor)
4. Observe that the connection scores of each monitor
report that not all monitors are alive, which
is not true because all 3 monitors are in quorum.
What happens is that during monitor startup
quorum_mon_feature is empty, and although
all monitors participate in the election,
when they reach the function begin_peer_ping
some monitors, if not all, do not send pings
because of the empty quorum_mon_feature flag.
Therefore, after the election the monitors
have the wrong connection scores. However,
this resolves itself in the next election:
now that quorum_mon_feature is populated,
the monitors start pinging each other again and
the connectivity scores become correct.
Solution:
In begin_peer_ping, instead of just returning out of
the function when quorum_mon_feature is empty, we
keep track of the peers that we should ping once
the election is finished. In Monitor::win_election
and Monitor::lose_elections,
we process the pending pings by calling begin_peer_ping
on each of the peers (both peons and leader).
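The solution above amounts to deferring work instead of dropping it. A minimal Python sketch of that idea follows; the Elector class here is a toy model, and the attribute names and the election_finished() method are hypothetical (the message itself names Monitor::win_election and Monitor::lose_elections as the places where pending pings are processed).

```python
class Elector:
    """Toy model of deferred peer pings; not the Ceph Elector class."""

    def __init__(self):
        self.quorum_features_known = False  # stands in for quorum_mon_feature
        self.pending_pings = set()          # peers to ping after the election
        self.pinging = set()                # peers we are actively pinging

    def begin_peer_ping(self, peer):
        if not self.quorum_features_known:
            # Previously this path just returned; now the peer is remembered.
            self.pending_pings.add(peer)
            return False
        self.pinging.add(peer)
        return True

    def election_finished(self):
        """Called on both winning and losing an election in this sketch."""
        self.quorum_features_known = True
        # Iterate over a sorted copy so the set can be cleared afterwards.
        for peer in sorted(self.pending_pings):
            self.begin_peer_ping(peer)
        self.pending_pings.clear()
```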
Additionally:
Improved logging in the Elector class so
that debugging the pinging process gets easier.
Ville Ojamo [Fri, 18 Apr 2025 07:43:27 +0000 (14:43 +0700)]
doc/radosgw: Improve and more consistent formatting
Use inline code formatting consistently for command
line switches, data, hostnames, etc.
Correctly indent text and child lists in list items.
Remove mid-sentence double spaces.
Capitalize "RGW" and "API" in text.
Remove unordered lists that are just regular text
everywhere else.
Use correct prompt # instead of $ for privileged
commands.
Use line continuation for multi-line example commands
instead of rendering them incorrectly as separate
single-line commands.
Use Title Case in a few section headers that
missed it.
multisite.rst: Don't repeat "(RGW)" after "RADOS
Gateway" beyond the first instance in the same
paragraph.
multisite.rst: Change one "multisite" to "multi-site"
because all other instances use this spelling (EXCEPT
the title of the document??).
multisite.rst: Fix indentation of continuation lines in
prompted example commands.
Use pre-formatted block, as seen elsewhere in docs,
instead of strange unordered list plus inline code for
syntax example.
Add space before backslash for multi-line command
continuation.
Ville Ojamo [Thu, 10 Apr 2025 10:34:57 +0000 (17:34 +0700)]
doc/radosgw: Promptify CLI, cosmetic fixes
Use the more modern prompt block for CLI commands
and use the right one, $ vs #.
Fix indentation on JSON example outputs and
some CLI command switches.
Add some arguably missing commas in JSON example output.
Add a full stop at the end of a one-sentence paragraph.
Remove extra comma mid-sentence in another.
Fix missing backslashes or typo at end of multiline commands.
Make lines under section headings as long as the heading text.
Fix hyperlinks. Fix list items prefixed with - instead of *.
Format configuration syntax in the middle of text as code.
Fix typo "PI" to "API" and remove extra space.
Remove colons at the end of section headers in a few places.
Use Title Case in section titles consistently with short words lowercase.
Possibly controversial: don't add whitespace before and
after main title section header text.
Possibly controversial: don't indent line continuation
backslashes, leave only 1 space before them.
Adam Kupczyk [Tue, 1 Apr 2025 14:01:23 +0000 (14:01 +0000)]
os/bluestore/bluefs: Fix race condition between truncate() and unlink()
It was possible for unlink() to interrupt an ongoing truncate().
As a result, unlink() finished properly, but truncate() was not aware
of it and:
1) updated a file that was already removed
2) released the same allocations again
Now fixed by checking whether the file is deleted under the FILE lock.
Fixes: https://tracker.ceph.com/issues/70855
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit e5f8892a1249a0ce631082d1fbf8884237434a0f)
Nitzan Mordechai [Thu, 28 Nov 2024 11:44:00 +0000 (11:44 +0000)]
common/pick_address: Add IPv6 support to is_addr_in_subnet
Updated the is_addr_in_subnet function to work with both
IPv4 and IPv6 addresses. Previously, it only supported IPv4,
which caused failures when IPv6 addresses were passed in.
Changes:
- Use inet_pton to detect IPv4 (AF_INET) or IPv6 (AF_INET6).
- Added sockaddr_in6 for IPv6 handling while keeping sockaddr_in for IPv4.
- Adjust the family and ifa_addr dynamically based on the address type.
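The family-detection approach listed above can be sketched in Python, whose socket module exposes inet_pton with the same AF_INET and AF_INET6 constants. This is an illustration, not the C++ function: the detect_family() helper and the string-based "address, network/prefix" signature are assumptions.

```python
import socket


def detect_family(addr):
    """Return (family, packed_bytes), trying IPv4 first and then IPv6."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            return family, socket.inet_pton(family, addr)
        except OSError:
            continue
    raise ValueError(f"not an IPv4 or IPv6 address: {addr!r}")


def is_addr_in_subnet(addr, subnet):
    """Check whether `addr` falls inside `subnet`, given as 'network/prefix'."""
    network, _, prefix = subnet.partition("/")
    addr_family, addr_raw = detect_family(addr)
    net_family, net_raw = detect_family(network)
    if addr_family != net_family:
        return False  # an IPv4 address is never inside an IPv6 subnet
    bits = len(addr_raw) * 8  # 32 for IPv4, 128 for IPv6
    prefix_len = int(prefix) if prefix else bits
    if not 0 <= prefix_len <= bits:
        raise ValueError(f"invalid prefix length: {prefix!r}")
    shift = bits - prefix_len
    # Compare only the leading prefix_len bits of both addresses.
    return (int.from_bytes(addr_raw, "big") >> shift) == (
        int.from_bytes(net_raw, "big") >> shift
    )
```

The same comparison code serves both families because only the packed length differs.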