librbd: disallow "rbd trash mv" if image is in a group
Removing an image that is a member of a group has always been
disallowed. However, moving an image that is a member of a group to
trash is currently allowed and this is deceptive -- the only reason for
a user to move an image to trash should be the intent to remove it.
More importantly, group APIs operate in terms of image names -- there
are no corresponding variants that would operate in terms of image IDs.
For example, even though internally GroupImageSpec struct stores an
image ID, the public rbd_group_image_info_t struct insists on an image
name. When rbd_group_image_list() encounters a trashed member image
(i.e. one that doesn't have a name), it just fails with ENOENT and no
listing gets produced at all until the offending image is restored from
trash. Something like this can be very hard to debug for an average
user, so let's make rbd_trash_move() fail with EMLINK the same way as
rbd_remove() does in this scenario.
The one case where moving a member image to trash makes sense is live
migration where the source image gets trashed to be almost immediately
replaced by the destination image as part of preparing migration.
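The rule above can be sketched with a toy in-memory model. This is only an illustration, not the librbd API: ToyPool, its methods and the for_migration flag are hypothetical names standing in for the real group membership check and the live migration exception.

```python
import errno


class ToyPool:
    """In-memory stand-in for a pool; not the librbd API."""

    def __init__(self):
        self.images = {}    # image name -> image ID
        self.group_of = {}  # image ID -> group name
        self.trash = {}     # image ID -> original name

    def create(self, name, image_id, group=None):
        self.images[name] = image_id
        if group is not None:
            self.group_of[image_id] = group

    def trash_move(self, name, for_migration=False):
        image_id = self.images[name]
        # Same rule as removal: a group member cannot be trashed,
        # except when live migration is swapping the image.
        if image_id in self.group_of and not for_migration:
            raise OSError(errno.EMLINK, "image is a member of a group")
        del self.images[name]
        self.trash[image_id] = name


pool = ToyPool()
pool.create("img1", "id1", group="grp")
pool.create("img2", "id2")

try:
    pool.trash_move("img1")
    rejected = None
except OSError as e:
    rejected = e.errno  # EMLINK: img1 is still a group member

pool.trash_move("img2")  # not in a group, so this succeeds
```

Because the member image never loses its name, a name-based group listing in this model always has a name to report.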
EMLINK is returned by rbd_remove() if the image is a member of a group.
Add a dedicated exception similar to ImageBusy or ImageHasSnapshots and
a test for it.
Fix stray example command block leftover from rebase in
cloud-transition.rst.
Remove extra character > in cloud-sync-module.rst.
Add missing formatting char ` in cloud-sync-module.rst.
Remove extra empty line between example commands that
resulted in a line with just a "#" prompt.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
rbd-mirror: release lock before calling m_async_op_tracker.finish_op()
m_async_op_tracker.finish_op() in InstanceReplayer::start_image_replayers
may invoke a completion that re-enters code paths that attempt to acquire
the same mutex (m_lock), violating the non-recursive lock constraint.
This can be fixed by releasing the lock before calling
m_async_op_tracker.finish_op().
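A minimal sketch of the pattern, in Python rather than the actual rbd-mirror C++ code. AsyncOpTracker and Replayer here are simplified stand-ins, and the non-blocking acquire is used only so that the example reports the problem instead of hanging.

```python
import threading


class AsyncOpTracker:
    """Toy tracker: runs a completion when the last op finishes."""

    def __init__(self):
        self._pending = 0
        self._on_finish = None

    def start_op(self):
        self._pending += 1

    def wait_for_ops(self, on_finish):
        self._on_finish = on_finish

    def finish_op(self):
        self._pending -= 1
        if self._pending == 0 and self._on_finish is not None:
            callback, self._on_finish = self._on_finish, None
            callback()


class Replayer:
    def __init__(self):
        self.lock = threading.Lock()  # non-recursive, like the mutex above
        self.tracker = AsyncOpTracker()
        self.events = []

    def _on_ops_finished(self):
        # The completion re-enters code that needs the same lock.
        if not self.lock.acquire(blocking=False):
            self.events.append("would deadlock")
            return
        try:
            self.events.append("completed")
        finally:
            self.lock.release()

    def start_holding_lock(self):
        self.tracker.start_op()
        self.tracker.wait_for_ops(self._on_ops_finished)
        with self.lock:
            self.tracker.finish_op()  # completion runs while lock is held

    def start_releasing_lock(self):
        self.tracker.start_op()
        self.tracker.wait_for_ops(self._on_ops_finished)
        with self.lock:
            pass  # work that needs the lock goes here
        self.tracker.finish_op()  # lock is already released


r = Replayer()
r.start_holding_lock()
r.start_releasing_lock()
print(r.events)  # ['would deadlock', 'completed']
```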
Merge pull request #62818 from ronen-fr/wip-rf-iocnt-plus
osd/scrub: performance counters: count I/Os, use unlabeled counters
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Ville Ojamo [Thu, 10 Apr 2025 10:34:57 +0000 (17:34 +0700)]
doc/radosgw: Promptify CLI, cosmetic fixes
Use the more modern prompt block for CLI commands
and use the right prompt, $ vs #.
Fix indentation on JSON example outputs and
some CLI command switches.
Add some arguably missing commas in JSON example output.
Add a full stop at the end of a one-sentence paragraph.
Remove extra comma mid-sentence in another.
Fix missing backslashes or typo at end of multiline commands.
Make lines under section headings as long as the heading text.
Fix hyperlinks. Fix list items prefixed with - instead of *.
Format configuration syntax in the middle of text as code.
Fix typo "PI" to "API" and remove extra space.
Remove colons at the end of section headers in a few places.
Use Title Case in section titles consistently with short words lowercase.
Possibly controversial: don't add whitespace before and
after main title section header text.
Possibly controversial: don't indent line continuation
backslashes, leave only 1 space before them.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
osd/scrub: a single counters selection mechanism - step 1
Following the preceding PR, the Scrubber now employs
two methods for selecting the specific subset of performance
counters to update (the replicated pool set or the EC one).
The first method is using labeled counters, with 4 optional labels
(Primary/Replica X Replicated/EC Pool). The second method
is by naming the specific OSD counters to use in ScrubIoCounterSet
objects, then selecting the appropriate set based on the pool type.
This commit is the first step on the path to unifying the two
methods - discarding the use of labeled counters, and only
naming OSD counters.
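The second method could look roughly like the sketch below. Only the ScrubIoCounterSet name comes from the text above. The field names and counter names are made up for illustration, and a plain Counter stands in for the OSD's performance counters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrubIoCounterSet:
    # Names of the OSD counters to update (hypothetical fields).
    read_count: str
    read_bytes: str


# Hypothetical counter names, one set per pool type.
REPLICATED_SET = ScrubIoCounterSet("scrub_replicated_read_cnt",
                                   "scrub_replicated_read_bytes")
EC_SET = ScrubIoCounterSet("scrub_ec_read_cnt", "scrub_ec_read_bytes")


def counters_for(is_ec_pool):
    """Select the counter set once, based on the pool type."""
    return EC_SET if is_ec_pool else REPLICATED_SET


def record_read(perf, is_ec_pool, nbytes):
    """Update the counters named by the selected set."""
    counters = counters_for(is_ec_pool)
    perf[counters.read_count] += 1
    perf[counters.read_bytes] += nbytes


perf = Counter()
record_read(perf, is_ec_pool=False, nbytes=4096)
record_read(perf, is_ec_pool=True, nbytes=8192)
record_read(perf, is_ec_pool=True, nbytes=8192)
```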
osd/scrub: perf-counters for I/O performed by the scrubber
Define two sets of performance counters to track I/O performed
by the scrubber - one set to be used when scrubbing a PG
in a replicated pool, and one - for EC PGs.
The tested version of https://github.com/ceph/ceph/pull/62080 was **different**
from the one that got merged.
The untested change was changing the boolean returned from start_recovery_ops.
While the seastar::repeat loop in BackgroundRecoveryT<T>::start() was changed accordingly,
other do_recovery() return cases were not considered.
See Tested / Merged here: https://github.com/Matan-B/ceph/pull/2/files
start_recovery_ops, used by do_recovery, should return whether the iteration (i.e. recovery) should keep going.
Direct users to upgrade only to Squid v19.2.2, and warn readers not to
upgrade to Squid 19.2.1. This PR is raised in response to a request from
Neha Ojha.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
rgw: metadata and data sync fairness notifications to retry upon any error case
This is a complementary fix to the earlier one described at #62156.
When the sync shard notification fails for any reason, including a timeout,
this change keeps the loop going for both metadata and data sync.
The 'delay_ready_t' parameter was used in the past to
control whether, when a change in the scrub scheduling inputs
occurs (e.g. a configuration change), even those scheduling targets
that are already ripe for scrubbing will have their schedule recomputed.
This parameter, however, is ignored: all "regular-periodic"
scrubbing targets are always rescheduled when the scheduling inputs
change.
The commit removes the 'delay_ready_t' parameter from the codebase.
The ceph_ll_io_info structure has recently been extended to support
zerocopy operations. The proxy was initializing just the known members,
so, after the zerocopy support, it was passing garbage in some fields,
causing failures.
This patch completely clears the whole structure to be sure that
everything is initialized to its default value.
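The effect can be illustrated with a small ctypes sketch. The ToyIoInfo layout is hypothetical and is not the real ceph_ll_io_info definition. The structure is backed by non-zero bytes to imitate uninitialized memory.

```python
import ctypes


class ToyIoInfo(ctypes.Structure):
    # Hypothetical layout for illustration; not the real ceph_ll_io_info.
    _fields_ = [
        ("fd", ctypes.c_int64),
        ("offset", ctypes.c_int64),
        ("write", ctypes.c_int32),
        ("new_flags", ctypes.c_uint32),  # stands in for a later-added member
    ]


# Back the structure with non-zero bytes to imitate uninitialized memory.
raw = bytearray(b"\xaa" * ctypes.sizeof(ToyIoInfo))
info = ToyIoInfo.from_buffer(raw)

# Old approach: set only the members known at the time.
info.fd, info.offset, info.write = 3, 0, 1
garbage = info.new_flags  # still holds whatever was in memory

# New approach: clear the whole structure, then set what is needed.
ctypes.memset(ctypes.addressof(info), 0, ctypes.sizeof(info))
info.fd, info.write = 3, 1
```

Clearing first means that members added later start from zero without the caller having to know about them.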
Naman Munet [Thu, 10 Apr 2025 11:40:02 +0000 (17:10 +0530)]
mgr/dashboard: fix bucket rate limit API on owner change
Fixes: https://tracker.ceph.com/issues/70874
This PR covers and fixes the scenarios below:
Whenever the owner of a bucket was changed from non-tenanted to tenanted, or
vice versa, together with rate-limit changes, there was an issue in sending the bucket name.
Scenario 1: Changing the bucket owner from tenanted to non-tenanted
Scenario 2: Changing the bucket owner from non-tenanted to tenanted
Scenario 3: Keeping the (tenanted) owner the same and changing only the rate limit
Xuehan Xu [Thu, 27 Feb 2025 05:54:49 +0000 (13:54 +0800)]
crimson/os/seastore/cache: do `prepare_commit` before retiring extents
Linked tree nodes in logical trees need to take parents from the prior
instances when being rewritten, which has to be done before the prior
instances are retired.
mgr/dashboard: fix typo in User Management form
Fixes: https://tracker.ceph.com/issues/70719
- Corrected the label from 'logon' to 'login' in the User Management form
osd/scrub: additional configuration params to trigger scrub reschedule
Adding the following parameters to the (small) set of configuration
options that, if changed, trigger re-computation of the next scrub
schedule:
- osd_scrub_interval_randomize_ratio,
- osd_deep_scrub_interval_cv, and
- osd_deep_scrub_interval (which was missing in the list of
parameters watched by the OSD).
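As a rough sketch of the idea (not the OSD's actual config observer code), a handler can check changed keys against a watched set and recompute the schedule only on a match. The three option names are taken from the list above. The real set contains other options not shown here, and ToyScrubScheduler and some_unrelated_option are hypothetical.

```python
# Options named in this commit; the real watched set has further members.
RESCHEDULE_KEYS = frozenset({
    "osd_scrub_interval_randomize_ratio",
    "osd_deep_scrub_interval_cv",
    "osd_deep_scrub_interval",
})


class ToyScrubScheduler:
    """Hypothetical stand-in for the component that owns the schedule."""

    def __init__(self, config):
        self.config = dict(config)
        self.reschedules = 0

    def handle_conf_change(self, changed):
        """Apply changed options; reschedule only if a watched key changed."""
        self.config.update(changed)
        if any(key in RESCHEDULE_KEYS for key in changed):
            self.reschedules += 1


sched = ToyScrubScheduler({"osd_deep_scrub_interval": 604800})
sched.handle_conf_change({"some_unrelated_option": 1})        # no reschedule
sched.handle_conf_change({"osd_deep_scrub_interval_cv": 0.2})  # reschedule
```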
Ville Ojamo [Fri, 11 Apr 2025 05:04:52 +0000 (12:04 +0700)]
doc/releases: Fix invalid triple backticks in reef.rst squid.rst
Triple backtick does not create a code block in RST,
instead it renders as an inline code with the third
backtick rendered as-is.
This merges the newlines in multiline code into a
single line, which turns the whole thing into nonsense.
Change the second intended code block to use a code
block with a bash prompt.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Bill Scales [Mon, 31 Mar 2025 08:17:35 +0000 (09:17 +0100)]
test: Add unittests for pgtemp_primaryfirst/pgtemp_undo_primaryfirst
Add unittests for pgtemp_primaryfirst and pgtemp_undo_primaryfirst
to prove the latter is a reverse transform and that neither has any
effect until an optimized EC pool configures non-primary shards.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 12:20:52 +0000 (12:20 +0000)]
osd: Restrict choice of primary shard for ec_optimizations pools
Pools with ec_optimizations enabled have restrictions on which
shards are permitted to become the primary because not all shards
are updated for every I/O.
To preserve backwards compatibility with downlevel clients,
pg_temp is used as the method to override the selection of
primary by OSDMap. Directly changing the logic in OSDMap
would have meant that all clients need to be upgraded to
tentacle before using optimized EC pools, so was discounted.
Using primary_temp to set the primary for an EC pool is
not reliable because under error conditions an OSD can store
multiple shards for the same PG and primary_temp cannot
define which of these shards will be chosen.
For optimized EC pools pg_temp is shuffled so that the
non-primary shards are listed last. This means that the
existing logic in OSDMap that picks the first available
shard as the primary will avoid selecting a non-primary
shard. OSDMonitor applies the shuffle when pg_temp is set;
this is then reverted in PeeringState when initializing the
acting set after OSDMap has selected the primary.
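The shuffle and its reversal can be sketched as a stable partition and its inverse. This is an illustration under assumptions, not the OSDMap code: the function names, the OSD IDs and the choice of which shard indexes count as non-primary are all made up for the example.

```python
def primary_first(pg_temp, nonprimary_shards):
    """Move OSDs of non-primary shards to the end, keeping relative order."""
    capable = [osd for shard, osd in enumerate(pg_temp)
               if shard not in nonprimary_shards]
    nonprimary = [osd for shard, osd in enumerate(pg_temp)
                  if shard in nonprimary_shards]
    return capable + nonprimary


def undo_primary_first(shuffled, nonprimary_shards):
    """Put each OSD back at the index of the shard it belongs to."""
    n = len(shuffled)
    # Shard index found at each position of the shuffled list.
    order = ([s for s in range(n) if s not in nonprimary_shards] +
             [s for s in range(n) if s in nonprimary_shards])
    restored = [None] * n
    for position, shard in enumerate(order):
        restored[shard] = shuffled[position]
    return restored


# Illustrative 6-shard PG where shards 1, 2 and 3 may not become primary.
pg_temp = [10, 11, 12, 13, 14, 15]
nonprimary = {1, 2, 3}

shuffled = primary_first(pg_temp, nonprimary)
restored = undo_primary_first(shuffled, nonprimary)
print(shuffled)  # [10, 14, 15, 11, 12, 13]
print(restored)  # [10, 11, 12, 13, 14, 15]
```

With an empty set of non-primary shards both functions return the list unchanged, so picking the first available entry behaves as it did before.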
PeeringState::choose_acting is modified to set pg_temp if
OSDMap has selected a non-primary shard; this causes
a new OSDMap to be published, which will persuade
OSDMap to select a primary shard instead.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>