cls/rbd: drop overzealous CLS_ERR message in mirror_remote_namespace_get()
Currently it unnecessarily floods the log of the OSD which hosts the
rbd_mirroring object with "No such file or directory" errors. Just
drop it, as read_key() already logs all errors except ENOENT.
Credit to N Balachandran <nibalach@redhat.com> for spotting this.
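A minimal sketch of the shape of the fix (simplified; read_key() is the
existing cls/rbd helper, and the key name and signatures here are
illustrative):

    #include "objclass/objclass.h"  // cls method context

    // sketch: propagate errors from read_key() without logging them here;
    // read_key() already logs everything except ENOENT, and ENOENT is an
    // expected case that should not flood the OSD log
    int mirror_remote_namespace_get(cls_method_context_t hctx,
                                    ceph::buffer::list *in,
                                    ceph::buffer::list *out) {
      std::string mirror_ns;
      int r = read_key(hctx, "mirror_remote_namespace", &mirror_ns);
      if (r < 0) {
        return r;  // previously also CLS_ERR(...), dropped by this commit
      }
      encode(mirror_ns, *out);
      return 0;
    }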
Ville Ojamo [Thu, 10 Apr 2025 08:09:11 +0000 (15:09 +0700)]
doc/ceph-volume: Promptify commands and fix formatting
Use the more modern prompt block for CLI commands, and fix a missing
newline and broken line breaks.
Also change the existing prompts to indent with the same number of
spaces.
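For reference, the prompt block style used here is the sphinx-prompt
directive used throughout the Ceph docs (the command shown is
illustrative):

    .. prompt:: bash #

       ceph-volume lvm list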
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Adam Kupczyk [Tue, 1 Apr 2025 14:01:23 +0000 (14:01 +0000)]
os/bluestore/bluefs: Fix race condition between truncate() and unlink()
It was possible for unlink() to interrupt an ongoing truncate().
As a result, unlink() finishes properly, but truncate() is not aware
of it and:
1) updates a file that has already been removed
2) releases the same allocations again
This is now fixed by checking, under the FILE lock, whether the file
has been deleted.
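A simplified, self-contained sketch of the fix shape (File, its lock,
and truncate_file() are illustrative stand-ins for the BlueFS types,
not the exact patch):

    #include <mutex>

    struct File {
      std::mutex lock;       // stands in for the per-file FILE lock
      bool deleted = false;  // set by unlink()
    };

    // truncate re-checks 'deleted' under the FILE lock, so a concurrent
    // unlink() can no longer finish in between and leave truncate()
    // updating a removed file and double-releasing its allocations
    int truncate_file(File* f) {
      std::scoped_lock l(f->lock);
      if (f->deleted) {
        return 0;  // unlink() won the race: nothing to update or release
      }
      // ... update the fnode and release trailing extents here ...
      return 0;
    }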
Casey Bodley [Thu, 7 Nov 2024 20:36:45 +0000 (15:36 -0500)]
rgw/rados: add concurrent io algorithms for sharded data
cls/rgw provides the base class CLSRGWConcurrentIO as a swiss army knife
for bucket index operations that visit every shard object. while it uses
asynchronous librados requests to perform the io, it blocks on a
condition variable when waiting for the AioCompletions.
for use in coroutines, we need a version of this that suspends instead
of blocking. and to support both stackful and stackless coroutines, we
want a fully-generic async interface templated on CompletionToken.
while the CLSRGWConcurrentIO algorithm works for all current uses
(reads and writes, with/without retries, with/without cleanup), i chose
to break this into 3 algorithms with well-defined semantics:
1. reads: to produce a successful result, all shard operations must
succeed. so any shard's failure causes the rest to be cancelled or
skipped. supports retries for ListBucket (RGWBIAdvanceAndRetryError).
2. writes: even if some shards fail, we still want to visit every shard
before returning the error. supports retries for log trimming
operations (repeat until ENODATA).
3. revertible writes: similar to reads, requires all shard operations to
succeed. on any failure, the side effects of any successful writes
must be reverted before returning. only used by IndexInit (any created
shards are removed on failure).
each algorithm provides a pure virtual base class that must be
implemented for each type of operation, similar to how existing
operations inherit from CLSRGWConcurrentIO.
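as a rough illustration of the CompletionToken pattern (simplified and
hypothetical: async_visit_shards and its signature are not the actual
rgw interface, and the per-shard io is elided):

    #include <boost/asio.hpp>
    #include <boost/system/error_code.hpp>

    // a fully-generic async initiating function: the caller chooses how
    // to wait by passing a plain callback, asio::use_awaitable (stackless
    // coroutine) or asio::yield_context (stackful coroutine)
    template <typename CompletionToken>
    auto async_visit_shards(boost::asio::io_context& ctx, int num_shards,
                            CompletionToken&& token) {
      return boost::asio::async_initiate<CompletionToken,
                                         void(boost::system::error_code)>(
          [&ctx, num_shards](auto handler) {
            // the per-shard librados requests would be issued here; once
            // the last one completes, invoke the handler exactly once
            boost::asio::post(ctx, [h = std::move(handler)]() mutable {
              std::move(h)(boost::system::error_code{});
            });
          },
          token);
    }

the same function then serves all three calling styles, which is what
lets one implementation support stackful and stackless coroutines alike.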
mgr/dashboard: Fix empty ceph version in GET api/hosts
Fixes https://tracker.ceph.com/issues/70821
Due to the pagination, the host list is fetched from the orchestrator,
which caused a regression: in the orchestrator list the ceph version is
always marked empty.
Caused by https://github.com/ceph/ceph/pull/52154
Also fixed the tests, as the new version addition caused the whole JSON
object mock to fail.
test/librbd/test_notify.py: drop RBD_DISABLE_UPDATE_FEATURES
This was put in place in commit 9c0b239d70cd ("qa/upgrade:
conditionally disable update_features tests") to paper over a backwards
compatibility issue that arose from commit 01ff1530544c ("librbd: make
all maintenance op notifications async"). It's not needed in squid or
later because upgrades from octopus are tested only until reef.
test/librbd/test_notify.py: force line-buffered output
"master" and "slave" invocations are intended to run in parallel and
coordinate between themselves. Ensure that their respective output is
properly timestamped and ordered in the teuthology.log file.
Bill Scales [Thu, 6 Mar 2025 09:44:00 +0000 (09:44 +0000)]
osd: EC optimizations: add shard_versions to object_info_t
EC optimized pools do not always update every shard for every write
I/O; this includes not updating the object_info_t (OI attribute). This
means different shards can have OIs indicating that the object is at
different versions. When an I/O updates a subset of the shards, the OI
for the updated shards will record the old version number for the
unmodified shards in the shard_versions map. The latest OI therefore
has a record of the expected version number for all the shards, which
can be used to work out what needs to be backfilled.
An empty shard_versions map implies that the OI attribute should be the
same on all shards.
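A hedged sketch of the idea, reusing the existing shard_id_t and
eversion_t types; object_info_sketch and expected_shard_version are
illustrative names, not the real code:

    #include <map>

    struct object_info_sketch {
      eversion_t version;  // version recorded in this (latest) OI
      // older versions of shards that partial writes skipped; an empty
      // map means every shard should already be at 'version'
      std::map<shard_id_t, eversion_t> shard_versions;
    };

    // hypothetical helper: the version shard 's' is expected to hold,
    // which backfill can compare against what the shard actually has
    eversion_t expected_shard_version(const object_info_sketch& oi,
                                      shard_id_t s) {
      auto it = oi.shard_versions.find(s);
      return it != oi.shard_versions.end() ? it->second : oi.version;
    }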
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 08:01:49 +0000 (08:01 +0000)]
osd: EC optimizations: add written and present shard sets to pg_log_entry_t
Add two new sets to the pg_log_entry for use by EC optimization pools.
The written shards set tracks which shards were written to; the
present shards set tracks which shards were in the acting set at the
time of the write.
An empty set (default) is used to indicate all shards. For pools without
allow_ec_optimizations the written set is empty (indicating all shards are
written) and the present set is empty and unused.
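An illustrative sketch of the semantics (pg_log_entry_sketch and
shard_was_written are hypothetical names):

    struct pg_log_entry_sketch {
      shard_id_set written_shards;  // shards this write touched
      shard_id_set present_shards;  // acting-set shards at write time
      // an empty set means "all shards": the default, and the only
      // state for pools without allow_ec_optimizations
    };

    // hypothetical check: did shard 's' receive this entry's write?
    bool shard_was_written(const pg_log_entry_sketch& e, shard_id_t s) {
      return e.written_shards.empty() || e.written_shards.contains(s);
    }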
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 07:58:51 +0000 (07:58 +0000)]
osd: EC optimizations: add partial_writes_last_complete to pg_info_t
Add partial_writes_last_complete map to pg_info_t and pg_fast_info_t.
For optimized EC pools not all shards receive every log entry. As
log entries are marked completed, the partial_writes_last_complete
map is updated to track shards that did not receive the log entry.
Each map entry stores an eversion range. The first version is the last
completion the shard participated in, the second version tracks subsequent
updates where the shard was not updated. For example the range 88'10-88'12
means a shard completed update 10 and that updates 11 and 12 intentionally
did not update the shard. This information is used during peering to
distinguish a shard that is missing updates from a shard that intentionally
did not participate in an update to work out what recovery is required.
By default this map is empty indicating that every shard is expected to
participate in an update and have a copy of the log entry.
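A worked reading of the example above (pwlc and shard_needs_recovery
are illustrative names; eversion_t is written epoch'version):

    // pwlc[shard] = {88'10, 88'12} reads as: the shard completed update
    // 88'10, and updates 88'11 and 88'12 intentionally skipped it.
    // hypothetical peering check: a shard whose last-complete reached
    // the start of the range only skipped writes; anything older
    // genuinely missed updates and needs recovery
    bool shard_needs_recovery(eversion_t shard_last_complete,
                              eversion_t range_first /* e.g. 88'10 */) {
      return shard_last_complete < range_first;
    }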
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 07:53:02 +0000 (07:53 +0000)]
osd: EC optimizations: add nonprimary_shards set to pg_pool_t
EC optimization pools do not update every shard on every I/O. The
primary must have a complete log and requires objects to have up to
date object attributes, so the choice of primary has to be restricted.
Shards that cannot become a primary are listed in the nonprimary_shards
set.
For a K+M EC pool with optimizations enabled, the first data shard and
all M coding (parity) shards are always updated and can become a
primary; the other shards will be marked as nonprimary.
The new nonprimary_shards set stores shards that cannot become the
primary; by default it is an empty set, which retains existing
behavior. When optimizations are enabled on an EC pool this set will be
filled in to restrict the choice of primary.
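For example, a K=4, M=2 pool with optimizations enabled would be marked
up roughly as below (a sketch; the shard layout is illustrative):

    // data shards 0..3, parity shards 4..5: shard 0 (the first data
    // shard) and shards 4 and 5 (the M parity shards) are updated on
    // every write and stay primary-eligible; data shards 1..3 are not
    shard_id_set nonprimary_shards;
    for (int s = 1; s < 4; ++s) {
      nonprimary_shards.insert(shard_id_t(s));
    }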
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Sun, 23 Mar 2025 14:20:33 +0000 (14:20 +0000)]
osd: EC optimizations: additional types for EC
Add some extra types required by the EC optimizations code:
raw_shard_id_t is an equivalent type to shard_id_t but is used
for storing raw shards. Strong typing prevents bugs where code
forgets to translate between the two types.
shard_id_map is a mini_flat_map indexed by shard_id_t which will
be used by the EC optimizations I/O path to track updates to
each shard.
shard_id_set is a bitset_set of shard_id_t which is a compact
and fast way of storing a set of shards involved in an EC
operation.
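A sketch of how the strong typing helps (simplified definitions;
raw_to_shard is a hypothetical translation helper):

    #include <cstdint>
    #include <vector>

    // wrapping the same integer in distinct types makes mixing raw and
    // mapped shard ids a compile error instead of a silent bug
    struct shard_id_t     { int8_t id; };
    struct raw_shard_id_t { int8_t id; };

    // translation has to go through an explicit step, e.g. the erasure
    // code profile's chunk mapping
    shard_id_t raw_to_shard(const std::vector<int>& chunk_mapping,
                            raw_shard_id_t raw) {
      return shard_id_t{static_cast<int8_t>(chunk_mapping[raw.id])};
    }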
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Crimson's suite is relatively limited and is currently run for main
only (not for prior releases).
Changes to Crimson are more delicate, and having more main runs
to compare against might help with (git-)bisecting issues.
Shilpa Jagannath [Mon, 16 Dec 2024 20:28:36 +0000 (15:28 -0500)]
rgw/trim: fix ENOENT return response from bucket sync status query.
only treat ENOENT as an error when the bucket metadata is deleted.
there is a case where we get ENOENT when the status objects have not
been created yet, for example when bucket metadata is created and
synced but no data exists yet, so bucket sync status won't be
initialized. these cases don't need special handling.
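a sketch of the distinction (handle_status_read_result is a
hypothetical helper, not the actual rgw code):

    #include <cerrno>

    // two meanings of ENOENT when reading a bucket sync status object
    int handle_status_read_result(int ret, bool bucket_metadata_deleted) {
      if (ret == -ENOENT) {
        if (bucket_metadata_deleted) {
          return -ENOENT;  // bucket is gone: caller must handle it
        }
        return 0;  // status objects not created yet: benign, no error
      }
      return ret;
    }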