Ville Ojamo [Sat, 26 Apr 2025 04:17:16 +0000 (11:17 +0700)]
doc/radosgw: Fix RST syntax rendeded as text in oidc.rst
Empty line after starting a pre-formatted block with the double-colon
syntax is required, otherwise the double-colon does nothing and is just
rendered as-is as "::" and there would be no following pre-formatted
block.
Add empty lines after the double-colon syntax so that the following
block is rendered pre-formatted.
Also add bash privileged prompts to a block with 2 example CLI commands.
librbd: disallow "rbd trash mv" if image is in a group
Removing an image that is a member of a group has always been
disallowed. However, moving an image that is a member of a group to
trash is currently allowed and this is deceptive -- the only reason for
a user to move an image to trash should be the intent to remove it.
More importantly, group APIs operate in terms of image names -- there
are no corresponding variants that would operate in terms of image IDs.
For example, even though internally GroupImageSpec struct stores an
image ID, the public rbd_group_image_info_t struct insists on an image
name. When rbd_group_image_list() encounters a trashed member image
(i.e. one that doesn't have a name), it just fails with ENOENT and no
listing gets produced at all until the offending image is restored from
trash. Something like this can be very hard to debug for an average
user, so let's make rbd_trash_move() fail with EMLINK the same way as
rbd_remove() does in this scenario.
The one case where moving a member image to trash makes sense is live
migration where the source image gets trashed to be almost immediately
replaced by the destination image as part of preparing migration.
EMLINK is returned by rbd_remove() if the image is a member of a group.
Add a dedicated exception similar to ImageBusy or ImageHasSnapshots and
a test for it.
Conflicts:
src/test/pybind/test_rbd.py [ commits 68eea0eb814e
("src/tools/rbd: add group info command to output group id")
and e5ccce14c4b0 ("rbd: add group snap info command") not in
reef ]
Ilya Dryomov [Tue, 25 Mar 2025 08:13:27 +0000 (09:13 +0100)]
mgr/rbd_support: always parse interval and start_time in Schedules::remove()
Commit 1b62447071a9 ("mgr/rbd_support: fix schedule remove") addressed
the issue that it was concerned with in a rather suboptimal way: instead
of moving the parsing of interval and start_time upfront to be able to
bail early, it wrapped from_string() constructors with try/finally and
left the conditional behavior in place.
Ilya Dryomov [Fri, 21 Mar 2025 13:43:50 +0000 (14:43 +0100)]
librbd: respect rbd_default_snapshot_quiesce_mode in group_snap_create()
Make group_snap_create() behave the same as snap_create() and
mirror_image_create_snapshot(): APIs that don't take RBD_SNAP_CREATE_
flags explicitly should respect rbd_default_snapshot_quiesce_mode
option.
N Balachandran [Mon, 21 Apr 2025 11:34:08 +0000 (17:04 +0530)]
rbd: display correct mirror state when creating
The mirror image state is set to MIRROR_IMAGE_STATE_CREATING
when the image is first created on the secondary, but was displayed
as "unknown" by the rbd info command. This has been fixed.
Fixes: https://tracker.ceph.com/issues/70963 Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
(cherry picked from commit f2e35646721ed3076e3da54124f8d783c456b2dc)
Ville Ojamo [Thu, 10 Apr 2025 10:34:57 +0000 (17:34 +0700)]
doc/radosgw: Promptify CLI, cosmetic fixes
Use the more modern prompt block for CLI commands
and use right one $ vs #.
Fix indentation on JSON example outputs and
some CLI command switches.
Add some arguably missing comma in JSON example output.
Add a full stop at the end of a one-sentence paragraph.
Remove extra comma mid-sentence in another.
Fix missing backslashes or typo at end of multiline commands.
Lines under section headings as long as heading text.
Fix hyperlinks. Fix list items prefixed with - insted of *.
Format configuration syntax in the middle of text as code.
Fix typo "PI" to "API" and remove extra space.
Remove colons at the end of section headers in a few places.
Use Title Case in section titles consistently with short words lowercase.
Possibly controversial: don't add whitespace before and
after main title section header text.
Possibly controversial: don't indent line continuation
backslashes, leave only 1 space before them.
Adam Kupczyk [Tue, 1 Apr 2025 14:01:23 +0000 (14:01 +0000)]
os/bluestore/bluefs: Fix race condition between truncate() and unlink()
It was possible for unlink() to interrupt ongoing truncate().
As the result, unlink() finishes properly, but truncate() is not aware
of it and does:
1) updates file that is already removed
2) releases same allocations again
Now fixed by checking if file is deleted under FILE lock.
Fixes: https://tracker.ceph.com/issues/70855 Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit e5f8892a1249a0ce631082d1fbf8884237434a0f)
Nitzan Mordechai [Thu, 28 Nov 2024 11:44:00 +0000 (11:44 +0000)]
common/pick_address: Add IPv6 support to is_addr_in_subnet
Updated the is_addr_in_subnet function to work with both
IPv4 and IPv6 addresses. Previously, it only supported IPv4,
which caused failures when IPv6 addresses were passed in.
Changes:
- Use inet_pton to detect IPv4 (AF_INET) or IPv6 (AF_INET6).
- Added sockaddr_in6 for IPv6 handling while keeping sockaddr_in for IPv4.
- Adjust the family and ifa_addr dynamically based on the address type.
Nitzan Mordechai [Thu, 28 Nov 2024 11:44:00 +0000 (11:44 +0000)]
common/pick_address: Add IPv6 support to is_addr_in_subnet
Updated the is_addr_in_subnet function to work with both
IPv4 and IPv6 addresses. Previously, it only supported IPv4,
which caused failures when IPv6 addresses were passed in.
Changes:
- Use inet_pton to detect IPv4 (AF_INET) or IPv6 (AF_INET6).
- Added sockaddr_in6 for IPv6 handling while keeping sockaddr_in for IPv4.
- Adjust the family and ifa_addr dynamically based on the address type.
test/librbd/test_notify.py: force line-buffered output
"master" and "slave" invocations are intended to run in parallel and
coordinate between themselves. Ensure that their respective output is
properly timestamped and ordered in teuthology.log file.
mgr/dashboard: Fix empty ceph version in GET api/hosts
Fixes https://tracker.ceph.com/issues/70821
Due to the pagination the host list is being fetched from orchestrator which caused a regression as via orchestrator list ceph version is always marked empty.
Caused by https://github.com/ceph/ceph/pull/52154
Also fixed tests , as the new version addition causing whole json object mock to fail in tests
test/librbd/test_notify.py: conditionally ignore some errors
In 2020, commit 01ff1530544c ("librbd: make all maintenance op
notifications async") introduced a backwards compatibility issue where
if exclusive lock is held by an older (octopus and below) client and
a maintenance op is proxied to it from a newer client, the newer client
interprets the notification for the in-place completion of the op as
the notification for the acceptance of an async request and expects
another notification for the completion of the op which never comes.
In 2021, this bug was discovered and test_notify.py was amended to
ignore it in commit 9c0b239d70cd ("qa/upgrade: conditionally disable
update_features tests").
However the two update_features tests that started hanging and got
disabled weren't the only ones to misbehave. Rename, create_snap and
remove_snap tests were affected too but didn't hang or fail because
librbd also filtered certain errors codes like EEXIST and EINVAL.
Taking rename is an example:
1. a rename request is sent to from a newer client (N) to an octopus
client (O)
2. O successfully renames the image and sends a completion notification
with result = 0
3. N mistakes it for async request acceptance
4. after a timeout, N resends the rename request to O
5. O sees that an image already has that name (after step 2) and sends
a completion notification with result = EEXIST
6. N interprets it as async request denial and bubbles up EEXIST,
however right before returning control from Operations::rename()
EEXIST is filtered and 0 is returned to the user
So back then rename, create_snap and remove_snap tests continued to
pass but started taking 30+ seconds instead of completing immediately.
In 2025 we did away with filtering error codes in commit 66508cdaa190
("librbd: stop filtering async request error codes") and these tests
started to fail. Following the approach taken in commit 9c0b239d70cd
("qa/upgrade: conditionally disable update_features tests"), let's
ignore these failures based on the same environment variable.
test/librbd/test_notify.py: conditionally ignore some errors
In 2020, commit 01ff1530544c ("librbd: make all maintenance op
notifications async") introduced a backwards compatibility issue where
if exclusive lock is held by an older (octopus and below) client and
a maintenance op is proxied to it from a newer client, the newer client
interprets the notification for the in-place completion of the op as
the notification for the acceptance of an async request and expects
another notification for the completion of the op which never comes.
In 2021, this bug was discovered and test_notify.py was amended to
ignore it in commit 9c0b239d70cd ("qa/upgrade: conditionally disable
update_features tests").
However the two update_features tests that started hanging and got
disabled weren't the only ones to misbehave. Rename, create_snap and
remove_snap tests were affected too but didn't hang or fail because
librbd also filtered certain errors codes like EEXIST and EINVAL.
Taking rename is an example:
1. a rename request is sent to from a newer client (N) to an octopus
client (O)
2. O successfully renames the image and sends a completion notification
with result = 0
3. N mistakes it for async request acceptance
4. after a timeout, N resends the rename request to O
5. O sees that an image already has that name (after step 2) and sends
a completion notification with result = EEXIST
6. N interprets it as async request denial and bubbles up EEXIST,
however right before returning control from Operations::rename()
EEXIST is filtered and 0 is returned to the user
So back then rename, create_snap and remove_snap tests continued to
pass but started taking 30+ seconds instead of completing immediately.
In 2025 we did away with filtering error codes in commit 66508cdaa190
("librbd: stop filtering async request error codes") and these tests
started to fail. Following the approach taken in commit 9c0b239d70cd
("qa/upgrade: conditionally disable update_features tests"), let's
ignore these failures based on the same environment variable.
Ronen Friedman [Thu, 30 Jan 2025 09:27:58 +0000 (03:27 -0600)]
osd/scrub: discard repair_oinfo_oid()
repair_oinfo_oid(), called every scrub, has a very specific
functionality: fix the object ID specified in the Object Info
attribute, if different from the ID of the owning object.
This fix was added in 2017, as a response to a unique failure
scenario that was observed in Sepia - probably following a
filesystem bug. See https://tracker.ceph.com/issues/18409 &
https://tracker.ceph.com/issues/20471.
The limited functionality of repair_oinfo_oid() -
only repairing this one specific issue, and only if the OI_ATTR
exists and is decodable - does not justify the overhead of
running it every scrub.
Kefu Chai [Sun, 30 Mar 2025 03:59:12 +0000 (11:59 +0800)]
cephfs-top: Removes unused `global` statements
Recent flake8 runs were failing with:
```
py3: flake8==7.2.0,mccabe==0.7.0,pip==25.0.1,pycodestyle==2.13.0,pyflakes==3.3.0,setuptools==75.8.0,wheel==0.45.1
py3: commands[0] /home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/cephfs/top> flake8 --ignore=W503 --max-line-length=100 cephfs-top
cephfs-top:344:9: F824 `global fs_list` is unused: name is never assigned in scope
cephfs-top:466:13: F824 `global current_states` is unused: name is never assigned in scope
cephfs-top:872:9: F824 `global metrics_dict` is unused: name is never assigned in scope
cephfs-top:872:9: F824 `global current_states` is unused: name is never assigned in scope
cephfs-top:911:9: F824 `global fs_list` is unused: name is never assigned in scope
cephfs-top:981:9: F824 `global current_states` is unused: name is never assigned in scope
cephfs-top:1126:13: F824 `global current_states` is unused: name is never assigned in scope
py3: exit 1 (0.77 seconds) /home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/cephfs/top> flake8 --ignore=W503 --max-line-length=100 cephfs-top pid=2309605
py3: FAIL code 1 (8.15=setup[7.38]+cmd[0.77] seconds)
evaluation failed :( (8.24 seconds)
```
Since these variables are only being referenced and not assigned within
their scopes, the `global` declarations are unnecessary and can be
safely removed. This change:
- Removes all flagged `global` statements
- Fixes the failing flake8 checks in the CI pipeline
- Maintains the original code behavior as variable references still work without the `global` keyword
The `global` keyword is only needed when assigning to global variables
within a function scope, not when simply referencing them.
Kefu Chai [Sun, 30 Mar 2025 03:48:28 +0000 (11:48 +0800)]
qa: Remove unnecessary global statements in tests
Removes unused `global` statements from Python test files to fix flake8
F824 errors.
Recent flake8 runs were failing with:
```
./tasks/radosgw_admin.py:330:5: F824 `global log` is unused: name is never assigned in scope
./workunits/dencoder/test_readable.py:99:5: F824 `global incompat_paths` is unused: name is never assigned in scope
./workunits/dencoder/test_readable.py:164:5: F824 `global backward_compat` is unused: name is never assigned in scope
./workunits/dencoder/test_readable.py:165:5: F824 `global fast_shouldnt_skip` is unused: name is never assigned in scope
```
Since these variables are only being referenced and not assigned within
their scopes, the `global` declarations are unnecessary and can be
safely removed. This change:
- Removes all flagged `global` statements
- Fixes the failing flake8 checks in the CI pipeline
- Maintains the original code behavior as variable references still work
without the `global` keyword
The `global` keyword is only needed when assigning to global variables
within a function scope, not when simply referencing them.
ceph-volume: allow zapping partitions on multipath devices
ceph-volume refuses to zap a device if it is a partition on a multipath
device due to an overly strict condition. This change ensures that only
full mapper devices (excluding partitions) are blocked from being zapped,
allowing partitions on multipath devices to be processed correctly.
Zac Dover [Mon, 24 Mar 2025 12:26:11 +0000 (22:26 +1000)]
src/common: add guidance for deep-scrubbing ratio warning
Add an explanation of how to set the value of
"mon_warn_pg_not_deep_scrubbed_ratio" to the confval definition of that
variable. Although this variable contains the string "mon", it is set on
the Manager. I have added a note to direct users to set this value on
the Manager.
This issue was pointed out by Petr Tlapa on Slack in late March of 2025.