Bill Scales [Wed, 26 Mar 2025 10:46:07 +0000 (10:46 +0000)]
osd: EC Optimizations: Relax reset_complete_to for partial writes
EC Optimized pools can have shards missing log entries because
of partial writes. This means it is possible to have a missing
entry with a newer version than the log. Relax an assert in
reset_complete_to to avoid this.
reset_complete_to also resets last_complete to 0 when the
oldest missing object is before the first log entry. This
is too aggressive for partial writes and needs to be relaxed.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Wed, 26 Mar 2025 10:05:07 +0000 (10:05 +0000)]
osd: EC Optimizations: Add shard_id_sets for backfill_target and ...
acting_recovery_backfill
Optimized EC code uses shard_id_sets as a convenient and fast way of
representing sets of shards. Peering calculates a backfill_target set
and an acting_recovery_backfill set as a map of pg_shard_ids during
peering and these are then used while processing I/O requests.
Modify peering so that it initializes a shard_id_set version of
these two sets and makes these available to ECBackend code.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
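As a rough illustration of why a set of shard ids is convenient and fast, the sketch below packs shard ids into an integer bitmask. It is a Python toy, not the C++ shard_id_set from the tree: the class name ShardIdSet, its methods, and the sample (osd, shard) pairs are all hypothetical.

```python
class ShardIdSet:
    """Toy bitmask-backed set of small non-negative shard ids (hypothetical)."""

    def __init__(self, shards=()):
        self._bits = 0
        for shard in shards:
            self.insert(shard)

    def insert(self, shard):
        if shard < 0:
            raise ValueError("shard id must be non-negative")
        self._bits |= 1 << shard

    def __contains__(self, shard):
        # Membership is a single shift and mask.
        return shard >= 0 and bool((self._bits >> shard) & 1)

    def __iter__(self):
        # Yield the shard ids whose bit is set, in ascending order.
        bits, shard = self._bits, 0
        while bits:
            if bits & 1:
                yield shard
            bits >>= 1
            shard += 1


# Hypothetical peering result: sets of (osd, shard) pairs.
backfill_targets = {(3, 1), (7, 4)}
acting_recovery_backfill = {(0, 0), (3, 1), (5, 2), (7, 4)}

# Shard-only views that I/O code could test cheaply.
backfill_shards = ShardIdSet(shard for _osd, shard in backfill_targets)
acting_recovery_backfill_shards = ShardIdSet(
    shard for _osd, shard in acting_recovery_backfill
)
```

With these views, a check such as `4 in backfill_shards` does not need to walk the (osd, shard) pairs.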
Bill Scales [Wed, 26 Mar 2025 08:30:32 +0000 (08:30 +0000)]
osd: EC Optimizations: Share pwlc between peers
Optimized EC pools add partial_writes_last_complete (pwlc) data to
pg_info_t to track shards that were not updated because of partial
writes. During peering the primary collects the info structure from
all the replica/strays and then having reconciled the log can send
the info back to peers. Different shards may have newer/older
versions of pwlc, the primary merges these together to create
the definitive copy and then redistributes this to the other shards.
The primary also adjusts the last_update and last_complete values
in the info structure received from peers using the pwlc data to
advance these where shards were not updated because of a partial
write.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
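The merge and adjustment described above can be sketched as follows. This is a simplified model under stated assumptions, not the peering code: versions are plain integers, pwlc is modelled as a dict mapping a shard id to a (start, end) version range the shard did not need to write, and the function names merge_pwlc and advance_last_update are hypothetical.

```python
def merge_pwlc(copies):
    """Merge several pwlc copies, keeping the newest range for each shard.

    Assumption: the entry with the larger end version is the newer one.
    """
    merged = {}
    for pwlc in copies:
        for shard, (start, end) in pwlc.items():
            if shard not in merged or end > merged[shard][1]:
                merged[shard] = (start, end)
    return merged


def advance_last_update(last_update, pwlc_entry):
    """Advance last_update across a range the shard was not involved in.

    Assumption: the value may only move forward when it lies inside the
    recorded range; otherwise the shard is missing real updates.
    """
    start, end = pwlc_entry
    if start <= last_update < end:
        return end
    return last_update


# Hypothetical copies collected by the primary during peering.
primary_pwlc = {1: (10, 15)}
peer_pwlc = {1: (10, 18), 2: (5, 9)}
definitive = merge_pwlc([primary_pwlc, peer_pwlc])
```

In this model a shard whose last_update is 12 is treated as current up to version 18, while a shard at version 7 is left alone because it falls outside the recorded range.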
Bill Scales [Tue, 25 Mar 2025 17:41:57 +0000 (17:41 +0000)]
osd: EC Optimizations: Update pwlc in pg_info_t
Optimized EC pools add extra data to the log entry to track
which shards were updated by a partial write. When the log
entry is completed this needs to be summarized in the
partial_writes_last_complete map in pg_info_t.
Summarising this data in pg_info_t makes it easy to determine
whether a shard is behind because it is missing updates or
because it has just not been involved in recent updates. This
also ensures that when a long sequence of updates all skip a
shard, a record of this is retained in the info structure
even after the log has been trimmed.
Edited by aainscow as suggested in comment here:
https://github.com/ceph/ceph/pull/62522/files#r2050803678
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
pybind: switch from pkgutil.find_loader() to importlib.util.find_spec()
Replace pkgutil.find_loader() with importlib.util.find_spec() throughout
the Python bindings. pkgutil.find_loader() has been deprecated since
Python 3.12 and is scheduled for removal in 3.14; the resulting warning
appeared when generating the librbd Python bindings.
The importlib.util.find_spec() API has been available since Python 3.4
and is compatible with our minimum required Python version (3.9, since
commit 51f71fc1).
The warning resolved:
```
/home/kefu/dev/ceph/src/pybind/rbd/setup.py:8: DeprecationWarning: 'pkgutil.find_loader' is deprecated and slated for removal in Python 3.14; use importlib.util.find_spec() instead
if not pkgutil.find_loader('setuptools'):
```
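A minimal sketch of the replacement pattern, assuming the check only needs to know whether a top-level module is importable. The helper name module_available is hypothetical and is not taken from the bindings' setup.py.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    # find_spec() returns None when no module of that name is found.
    return importlib.util.find_spec(name) is not None


if not module_available("setuptools"):
    print("setuptools is not installed")
```

For dotted names, find_spec() imports the parent package first, so this helper is best kept to top-level module names.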
J. Eric Ivancich [Wed, 16 Apr 2025 16:38:33 +0000 (12:38 -0400)]
rgw: prevent crash in `radosgw-admin bucket object shard ...`
This subcommand is used to ask radosgw-admin which bucket index shard
a given object in a given bucket would have its bucket index entry
on. The user is required to supply the number of shards (i.e., the
command doesn't look that up). If 0 is provided it would result in a
divide-by-zero runtime exception. The command now guards against
values less than or equal to zero.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
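The kind of guard described above can be sketched as follows. This is only an illustration: the function name bucket_index_shard is hypothetical, and zlib.crc32 stands in for whatever hash RGW actually uses to place an object on a bucket index shard.

```python
import zlib


def bucket_index_shard(object_name, num_shards):
    """Map an object name to a shard number in [0, num_shards)."""
    # Reject non-positive shard counts before the modulo, which would
    # otherwise divide by zero (or give a meaningless negative result).
    if num_shards <= 0:
        raise ValueError("number of shards must be greater than zero")
    # crc32 is a placeholder hash for this sketch, not RGW's hash.
    return zlib.crc32(object_name.encode("utf-8")) % num_shards
```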
Fix stray example command block leftover from rebase in
cloud-transition.rst.
Remove extra character > in cloud-sync-module.rst.
Add missing formatting char ` in cloud-sync-module.rst.
Remove extra empty line between example commands that
resulted in a line with just a "#" prompt.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
rbd-mirror: release lock before calling m_async_op_tracker.finish_op()
m_async_op_tracker.finish_op() in InstanceReplayer::start_image_replayers
may invoke a completion that re-enters code paths that attempt to acquire
the same mutex (m_lock), violating the non-recursive lock constraint.
This can be fixed by releasing the lock before calling
m_async_op_tracker.finish_op().
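The locking pattern can be sketched in Python with a non-recursive threading.Lock. The Tracker and Replayer classes below are hypothetical stand-ins for the async op tracker and InstanceReplayer, not a port of the C++ code, and the toy tracker is not itself thread-safe.

```python
import threading


class Tracker:
    """Toy op tracker: runs a callback once the last op finishes."""

    def __init__(self):
        self._pending = 0
        self._on_finish = None

    def start_op(self):
        self._pending += 1

    def wait_for_ops(self, callback):
        self._on_finish = callback

    def finish_op(self):
        self._pending -= 1
        if self._pending == 0 and self._on_finish is not None:
            callback, self._on_finish = self._on_finish, None
            callback()  # may re-enter the owner of the tracker


class Replayer:
    def __init__(self):
        self._lock = threading.Lock()  # non-recursive, like m_lock
        self._tracker = Tracker()
        self.stopped = False

    def _on_ops_finished(self):
        # Re-entrant path: takes the same lock, so the caller of
        # finish_op() must not be holding it.
        with self._lock:
            self.stopped = True

    def start_image_replayers(self):
        self._tracker.start_op()
        self._tracker.wait_for_ops(self._on_ops_finished)
        with self._lock:
            pass  # work that needs the lock goes here
        # The lock is released before finish_op(), so the completion
        # can acquire it without deadlocking.
        self._tracker.finish_op()
```

Calling finish_op() inside the `with self._lock:` block instead would block forever in this sketch, because threading.Lock cannot be acquired twice by the same thread.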
Merge pull request #62818 from ronen-fr/wip-rf-iocnt-plus
osd/scrub: performance counters: count I/Os, use unlabeled counters
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Ville Ojamo [Thu, 10 Apr 2025 10:34:57 +0000 (17:34 +0700)]
doc/radosgw: Promptify CLI, cosmetic fixes
Use the more modern prompt block for CLI commands
and use the right prompt character ($ vs #).
Fix indentation on JSON example outputs and
some CLI command switches.
Add some arguably missing commas in JSON example output.
Add a full stop at the end of a one-sentence paragraph.
Remove extra comma mid-sentence in another.
Fix missing backslashes or typo at end of multiline commands.
Make lines under section headings as long as the heading text.
Fix hyperlinks. Fix list items prefixed with - instead of *.
Format configuration syntax in the middle of text as code.
Fix typo "PI" to "API" and remove extra space.
Remove colons at the end of section headers in a few places.
Use Title Case in section titles consistently with short words lowercase.
Possibly controversial: don't add whitespace before and
after main title section header text.
Possibly controversial: don't indent line continuation
backslashes, leave only 1 space before them.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
osd/scrub: a single counters selection mechanism - step 1
Following the preceding PR, the Scrubber now employs
two methods for selecting the specific subset of performance
counters to update (the replicated pool set or the EC one).
The first method is using labeled counters, with 4 optional labels
(Primary/Replica X Replicated/EC Pool). The second method
is by naming the specific OSD counters to use in ScrubIoCounterSet
objects, then selecting the appropriate set based on the pool type.
This commit is the first step on the path to unifying the two
methods - discarding the use of labeled counters, and only
naming OSD counters.
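The second method, naming the counters in a set object and choosing the set by pool type, might look like the sketch below. Only the ScrubIoCounterSet name comes from the commit message; its fields, the counter names, and the helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrubIoCounterSet:
    """Names of the OSD counters to update for one pool type."""
    read_cnt: str
    read_bytes: str


# Hypothetical counter names, one set per pool type.
REPLICATED_SET = ScrubIoCounterSet("scrub_rppool_read_cnt",
                                   "scrub_rppool_read_bytes")
EC_SET = ScrubIoCounterSet("scrub_ecpool_read_cnt",
                           "scrub_ecpool_read_bytes")


def select_counter_set(pool_is_ec):
    """Pick the counter set once, based on the pool type."""
    return EC_SET if pool_is_ec else REPLICATED_SET


def account_read(perf, counter_set, nbytes):
    """Update whichever counters the selected set names."""
    perf[counter_set.read_cnt] += 1
    perf[counter_set.read_bytes] += nbytes


perf = Counter()
account_read(perf, select_counter_set(pool_is_ec=True), 4096)
account_read(perf, select_counter_set(pool_is_ec=False), 512)
```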
osd/scrub: perf-counters for I/O performed by the scrubber
Define two sets of performance counters to track I/O performed
by the scrubber - one set to be used when scrubbing a PG
in a replicated pool, and one - for EC PGs.
The tested version of https://github.com/ceph/ceph/pull/62080 was **different**
from the one that got merged.
The untested change altered the boolean returned from start_recovery_ops.
While the seastar::repeat loop in BackgroundRecoveryT<T>::start() was changed accordingly,
other do_recovery() return cases were not considered.
See Tested / Merged here: https://github.com/Matan-B/ceph/pull/2/files
start_recovery_ops, used by do_recovery, should return whether the iteration (i.e., recovery) should keep going.
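The return-value contract can be sketched with a plain loop standing in for seastar::repeat. The function bodies below are hypothetical and only model the contract: True means "run another iteration", False means "stop".

```python
def start_recovery_ops(queue, max_ops):
    """Start up to max_ops ops; return True if recovery should keep going."""
    for _ in range(min(max_ops, len(queue))):
        queue.pop(0)  # pretend to start one recovery op
    return bool(queue)  # more work left means keep iterating


def run_recovery(queue, max_ops):
    """Repeat until start_recovery_ops reports there is nothing left."""
    iterations = 0
    while True:
        iterations += 1
        if not start_recovery_ops(queue, max_ops):
            break
    return iterations
```

Every caller has to read the boolean the same way; if one return path inverts its meaning, the loop either stops early or never stops.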
Direct users to upgrade only to Squid v19.2.2, and warn readers not to
upgrade to Squid 19.2.1. This PR is raised in response to a request from
Neha Ojha.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
rgw: metadata and data sync fairness notifications to retry upon any error case
This is a complementary fix to the earlier one described at #62156.
When the sync shard notification fails for any reason, including a timeout,
this change keeps the loop going for both metadata and data sync.
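The retry behaviour can be sketched as a loop that treats every failure the same way. The names run_notifications and send_notification are hypothetical, and the real code is a coroutine-based loop rather than a bounded for loop.

```python
def run_notifications(send_notification, rounds):
    """Call send_notification once per round; an error never ends the loop."""
    failures = 0
    for _ in range(rounds):
        try:
            send_notification()
        except Exception:  # any failure, including a timeout
            failures += 1  # note it and retry on the next round
    return failures
```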
John Mulligan [Fri, 11 Apr 2025 17:02:15 +0000 (13:02 -0400)]
mgr/cephadm: do not delete smb fs cephx keys
This change effectively disables fencing for the smb service because
the previous attempt to implement fencing would destroy the only
cephx key. Deleting this key would prevent any smb service part of
the logical cluster from talking to cephfs, even ones that were not
to be fenced.
The whole concept of fencing and ranks needs a bit of a rethink in
regards to smb. For now, we're just going to rely on ctdb and not
cephadm for smb's HA.
Fixes: 60300360cc500091e9dadf929d00bb72afad033c
Signed-off-by: John Mulligan <jmulligan@redhat.com>
The 'delay_ready_t' parameter was used in the past to
control whether, when a change in the scrub scheduling inputs
occurs (e.g. a configuration change), even those scheduling targets
that are already ripe for scrubbing will have their schedule recomputed.
This parameter, however, is ignored: all "regular-periodic"
scrubbing targets are always rescheduled when the scheduling inputs
change.
The commit removes the 'delay_ready_t' parameter from the codebase.
The ceph_ll_io_info structure has recently been extended to support
zerocopy operations. The proxy was initializing just the known members,
so, after zerocopy support was added, it was passing garbage in some fields,
causing failures.
This patch completely clears the whole structure to be sure that
everything is initialized to its default value.
Naman Munet [Thu, 10 Apr 2025 11:40:02 +0000 (17:10 +0530)]
mgr/dashboard: fix bucket rate limit API on owner change
Fixes: https://tracker.ceph.com/issues/70874
This PR covers and fixes the scenarios below.
Whenever the owner of a bucket was changed from non-tenanted to tenanted, or
vice versa, together with rate-limit changes, there was an issue in sending the bucket name.
Scenario 1: Changing the bucket owner from tenanted to non-tenanted
Scenario 2: Changing the bucket owner from non-tenanted to tenanted
Scenario 3: Keeping the (tenanted) owner the same and changing only the rate limit
Xuehan Xu [Thu, 27 Feb 2025 05:54:49 +0000 (13:54 +0800)]
crimson/os/seastore/cache: do `prepare_commit` before retiring extents
Linked tree nodes in logical trees need to take parents from the prior
instances when being rewritten, which has to be done before the prior
instances are retired.