In https://github.com/ceph/ceph/pull/62080 the tested version was **different**
from the one that got merged.
The untested change modified the boolean returned from start_recovery_ops.
While the seastar::repeat loop in BackgroundRecoveryT<T>::start() was changed accordingly,
other do_recovery() return cases were not considered.
See Tested / Merged here: https://github.com/Matan-B/ceph/pull/2/files
start_recovery_ops, as used by do_recovery, should return whether the iteration
(i.e. recovery) should keep going.
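A minimal sketch of the new convention, assuming do_recovery() now returns a
seastar::future<bool> where true means "keep recovering" (the wrapper name
below is illustrative):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/loop.hh>

// Assumed signature after the change: `true` means more recovery work
// remains, so the loop should keep going.
seastar::future<bool> do_recovery();

seastar::future<> run_recovery_loop() {
  return seastar::repeat([] {
    return do_recovery().then([](bool keep_going) {
      // Every do_recovery() return path must follow the same convention;
      // a path still using the old meaning makes the loop stop too early
      // or spin forever.
      return keep_going ? seastar::stop_iteration::no
                        : seastar::stop_iteration::yes;
    });
  });
}
```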
The ceph_ll_io_info structure has recently been extended to support
zerocopy operations. The proxy was initializing just the known members,
so, after zerocopy support was added, it was passing garbage in some
fields, causing failures.
This patch clears the whole structure so that every field is initialized
to its default value.
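A minimal sketch of the pattern; the structure below is an illustrative
stand-in, not the real ceph_ll_io_info layout:

```cpp
#include <cstring>

// Illustrative stand-in; the real ceph_ll_io_info lives in the libcephfs
// headers and gained extra members for zerocopy support.
struct io_info_example {
  void *callback;
  long  result;
  int   zerocopy_flag;   // hypothetical newer member
};

void prepare_io_info(io_info_example *info) {
  // Clear the whole structure first (equivalently `*info = {};` in C++),
  // so members added by newer headers start from a defined default
  // instead of garbage.
  std::memset(info, 0, sizeof(*info));
  // ...then set only the members this caller knows about.
}
```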
Naman Munet [Thu, 10 Apr 2025 11:40:02 +0000 (17:10 +0530)]
mgr/dashboard: fix bucket rate limit API on owner change
Fixes: https://tracker.ceph.com/issues/70874
This PR covers and fixes the scenarios below:
Whenever the bucket owner was changed from non-tenanted to tenanted, or
vice versa, together with rate-limit changes, there was an issue in how
the bucket name was sent.
Scenario 1: Changing the bucket owner from tenanted to non-tenanted
Scenario 2: Changing the bucket owner from non-tenanted to tenanted
Scenario 3: Keeping the owner (tenanted) the same and changing only the rate limit
osd/scrub: additional configuration params to trigger scrub reschedule
Adding the following parameters to the (small) set of configuration
options that, if changed, trigger re-computation of the next scrub
schedule:
- osd_scrub_interval_randomize_ratio,
- osd_deep_scrub_interval_cv, and
- osd_deep_scrub_interval (which was missing in the list of
parameters watched by the OSD).
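A generic sketch of the mechanism, assuming a handler that is given the set of
changed option names (the function names below are illustrative, not the OSD's
actual observer interface):

```cpp
#include <set>
#include <string>

// Watched keys; the change adds the three options listed above.
static const std::set<std::string> scrub_sched_keys = {
  "osd_scrub_interval_randomize_ratio",
  "osd_deep_scrub_interval_cv",
  "osd_deep_scrub_interval",
  // ...plus the options that were already being watched
};

void recompute_next_scrub_schedule();  // hypothetical rescheduling hook

void handle_conf_change(const std::set<std::string>& changed) {
  for (const auto& key : changed) {
    if (scrub_sched_keys.count(key)) {
      // Any watched key changing invalidates the current schedule.
      recompute_next_scrub_schedule();
      break;
    }
  }
}
```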
Bill Scales [Mon, 31 Mar 2025 08:17:35 +0000 (09:17 +0100)]
test: Add unittests for pgtemp_primaryfirst/pgtemp_undo_primaryfirst
Add unittests for pgtemp_primaryfirst and pgtemp_undo_primaryfirst
to prove that the latter is the reverse transform and that neither has any
effect until an optimized EC pool configures non-primary shards.
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 12:20:52 +0000 (12:20 +0000)]
osd: Restrict choice of primary shard for ec_optimizations pools
Pools with ec_optimizations enabled have restrictions on which
shards are permitted to become the primary because not all shards
are updated for every I/O.
To preserve backwards compatibility with downlevel clients,
pg_temp is used as the method to override the selection of
primary by OSDMap. Directly changing the logic in OSDMap
would have meant that all clients need to be upgraded to
tentacle before using optimized EC pools, so that approach was discounted.
Using primary_temp to set the primary for an EC pool is
not reliable because under error conditions an OSD can store
multiple shards for the same PG, and primary_temp cannot
define which of these shards will be chosen.
For optimized EC pools pg_temp is shuffled so that the
non-primary shards are listed last. This means that the
existing logic in OSDMap that picks the first available
shard as the primary will avoid selecting a non-primary
shard. OSDMonitor applies the shuffle when pg_temp is set;
this is then reverted in PeeringState when initializing the
acting set after OSDMap has selected the primary.
PeeringState::choose_acting is modified to set pg_temp if
OSDMap has selected a non-primary shard; this will cause
a new OSDMap to be published, which will persuade
OSDMap to select a primary shard instead.
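A toy sketch of the shuffle and its inverse, assuming we already know which
shard positions are non-primary (all names here are illustrative; the real
transforms are pgtemp_primaryfirst/pgtemp_undo_primaryfirst and operate on
OSD/shard mappings):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stable reorder of a pg_temp-like vector so that entries for non-primary
// shards come last; being stable makes the transform reversible.
std::vector<int> primaryfirst(const std::vector<int>& pg_temp,
                              const std::function<bool(int)>& nonprimary) {
  std::vector<int> out;
  for (int s = 0; s < (int)pg_temp.size(); ++s)
    if (!nonprimary(s)) out.push_back(pg_temp[s]);
  for (int s = 0; s < (int)pg_temp.size(); ++s)
    if (nonprimary(s)) out.push_back(pg_temp[s]);
  return out;
}

// Inverse: put the trailing non-primary entries back into their original
// shard positions.
std::vector<int> undo_primaryfirst(const std::vector<int>& shuffled,
                                   const std::function<bool(int)>& nonprimary) {
  const int n = (int)shuffled.size();
  int primary_capable = 0;
  for (int s = 0; s < n; ++s)
    if (!nonprimary(s)) ++primary_capable;
  std::vector<int> out(n);
  int head = 0, tail = primary_capable;
  for (int s = 0; s < n; ++s)
    out[s] = nonprimary(s) ? shuffled[tail++] : shuffled[head++];
  return out;
}

int main() {
  // In this example shard positions 2 and 3 hold non-primary shards.
  auto nonprimary = [](int s) { return s == 2 || s == 3; };
  std::vector<int> pg_temp = {7, 3, 9, 4, 5};
  auto shuffled = primaryfirst(pg_temp, nonprimary);
  // Non-primary entries (9, 4) now come last, so "pick the first available
  // entry as primary" cannot land on a non-primary shard.
  assert(shuffled == std::vector<int>({7, 3, 5, 9, 4}));
  // The undo transform restores the original shard ordering for peering.
  assert(undo_primaryfirst(shuffled, nonprimary) == pg_temp);
}
```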
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
cls/rbd: drop overzealous CLS_ERR message in mirror_remote_namespace_get()
Currently it unnecessarily floods the log of the OSD which hosts
the rbd_mirroring object with "No such file or directory" errors. Just
drop it, as read_key() already logs all errors except ENOENT.
Credit to N Balachandran <nibalach@redhat.com> for spotting this.
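A small sketch of the convention being relied on (the helper name is
illustrative; CLS_ERR and cls_cxx_map_get_val are the usual objclass
facilities):

```cpp
#include <cerrno>
#include <string>
#include "objclass/objclass.h"

// Sketch of a read helper that logs every failure except ENOENT, so a
// caller such as mirror_remote_namespace_get() does not need its own
// CLS_ERR for the "key simply isn't there" case.
static int read_key_example(cls_method_context_t hctx,
                            const std::string& key,
                            ceph::buffer::list* out) {
  int r = cls_cxx_map_get_val(hctx, key, out);
  if (r < 0 && r != -ENOENT) {
    CLS_ERR("failed to read omap key %s: %d", key.c_str(), r);
  }
  return r;
}
```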
It looks like at some point the centos9 image started shipping with
curl-minimal, which conflicts with the regular curl package. Asking dnf to find
the binary avoids this, since both packages provide it. Since we were already
doing this with rpmbuild, we can go ahead and loop wget into that in case
something similar happens there.
Zack Cerza [Fri, 7 Mar 2025 20:53:23 +0000 (13:53 -0700)]
make-debs.sh: Optionally take Debian version
Our existing CI builds have names like:
ceph-base_20.0.0-194-g6efaea33-1jammy_amd64.deb
Before this change, make-debs.sh produced names like:
ceph-base_20.0.0-158-gb0de3a42-1_amd64.deb
With this change we can pass e.g. "jammy" to end up with names compatible with our CI
builds.
Ville Ojamo [Thu, 10 Apr 2025 08:09:11 +0000 (15:09 +0700)]
doc/ceph-volume: Promptify commands and fix formatting
Use the more modern prompt block for CLI commands,
fix a missing newline and messed-up line breaks.
Also change existing prompts to all use the same
indentation.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Adam Kupczyk [Tue, 1 Apr 2025 14:01:23 +0000 (14:01 +0000)]
os/bluestore/bluefs: Fix race condition between truncate() and unlink()
It was possible for unlink() to interrupt an ongoing truncate().
As a result, unlink() finishes properly, but truncate() is not aware
of it and:
1) updates a file that is already removed
2) releases the same allocations again
Now fixed by checking whether the file is deleted under the FILE lock.
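A simplified sketch of the described check, not the actual BlueFS code (member
names here are illustrative):

```cpp
#include <mutex>

struct FileSketch {
  std::mutex file_lock;   // stands in for the per-file FILE lock
  bool deleted = false;   // set by unlink()
};

void truncate_sketch(FileSketch& f /*, uint64_t new_size */) {
  std::lock_guard<std::mutex> l(f.file_lock);
  if (f.deleted) {
    // unlink() already removed the file; updating its metadata or
    // releasing its allocations again would corrupt BlueFS state.
    return;
  }
  // ...perform the truncate while still holding the lock...
}
```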
Casey Bodley [Thu, 7 Nov 2024 20:36:45 +0000 (15:36 -0500)]
rgw/rados: add concurrent io algorithms for sharded data
cls/rgw provides the base class CLSRGWConcurrentIO as a swiss army knife
for bucket index operations that visit every shard object. While it uses
asynchronous librados requests to perform the IO, it blocks on a
condition variable when waiting for the AioCompletions.
For use in coroutines, we need a version of this that suspends instead
of blocking. To support both stackful and stackless coroutines, we also
want a fully-generic async interface templated on CompletionToken.
While the CLSRGWConcurrentIO algorithm works for all current uses
(reads and writes, with/without retries, with/without cleanup), I chose
to break this into 3 algorithms with well-defined semantics:
1. reads: to produce a successful result, all shard operations must
succeed, so any shard's failure causes the rest to be cancelled or
skipped. Supports retries for ListBucket (RGWBIAdvanceAndRetryError).
2. writes: even if some shards fail, we still want to visit every shard
before returning the error. Supports retries for log trimming
operations (repeat until ENODATA).
3. revertible writes: similar to reads, requires all shard operations to
succeed. On any failure, the side effects of any successful writes
must be reverted before returning. Only used by IndexInit (any created
shards are removed on failure).
Each algorithm provides a pure virtual base class that must be
implemented for each type of operation, similar to how existing
operations inherit from CLSRGWConcurrentIO.
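As a rough illustration of those semantics only (the real algorithms issue the
shard operations concurrently and expose a generic async interface templated on
CompletionToken, rather than this synchronous toy):

```cpp
#include <functional>
#include <vector>

enum class Policy { Read, Write, RevertibleWrite };

// Toy model of the three policies over per-shard operations; retries are
// omitted for brevity.
int run_sharded(Policy policy, int num_shards,
                const std::function<int(int shard)>& op,
                const std::function<void(int shard)>& revert = {}) {
  int first_error = 0;
  std::vector<int> succeeded;
  for (int shard = 0; shard < num_shards; ++shard) {
    int r = op(shard);
    if (r == 0) {
      succeeded.push_back(shard);
      continue;
    }
    if (first_error == 0)
      first_error = r;
    if (policy != Policy::Write) {
      // Reads and revertible writes stop at the first failure; plain
      // writes keep visiting every shard and report the error afterwards.
      break;
    }
  }
  if (first_error != 0 && policy == Policy::RevertibleWrite && revert) {
    // Undo the side effects of the shards that did succeed, e.g. remove
    // shard objects created by IndexInit.
    for (int shard : succeeded)
      revert(shard);
  }
  return first_error;
}
```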