Bill Scales [Thu, 6 Mar 2025 09:44:00 +0000 (09:44 +0000)]
osd: EC optimizations: add shard_versions to object_info_t
EC optimized pools do not always update every shard for every write I/O,
and this includes not updating the object_info_t (OI attribute). This means
different shards can have an OI indicating that the object is at different
versions. When an I/O updates a subset of the shards, the OI for the
updated shards will record the old version number for the unmodified
shards in the shard_versions map. The latest OI therefore has a record
of the expected version number for all the shards which can be used to
work out what needs to be backfilled.
An empty shard_versions map implies that the OI attribute should be the
same on all shards.
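As a rough illustration of the shard_versions semantics above, the following
sketch models an OI that records the old version of every shard a partial
write skips. object_info_sketch, eversion and apply_partial_write are
simplified, hypothetical stand-ins, not the real object_info_t or eversion_t
interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Simplified stand-ins; Ceph's real eversion_t and shard_id_t differ.
struct eversion {
  uint32_t epoch = 0;
  uint64_t version = 0;
  bool operator==(const eversion& o) const {
    return epoch == o.epoch && version == o.version;
  }
};
using shard_id = int;

struct object_info_sketch {
  eversion version;                             // version of the latest update
  std::map<shard_id, eversion> shard_versions;  // old versions of unmodified shards

  // Version this OI expects the object to have on `shard`.
  eversion expected_version(shard_id shard) const {
    auto it = shard_versions.find(shard);
    // Shard not in the map (or map empty): it holds the latest version.
    return it == shard_versions.end() ? version : it->second;
  }
};

// Hypothetical helper: record a write that only touched the `written` shards.
void apply_partial_write(object_info_sketch& oi, eversion new_version,
                         const std::set<shard_id>& written, int num_shards) {
  for (shard_id s = 0; s < num_shards; ++s) {
    if (written.count(s)) {
      oi.shard_versions.erase(s);              // shard now at new_version
    } else if (!oi.shard_versions.count(s)) {
      oi.shard_versions[s] = oi.version;       // remember its old version
    }                                          // else keep the older record
  }
  oi.version = new_version;
}
```

A write that touches every shard empties the map again, matching the rule
that all shards then carry the same OI.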
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 08:01:49 +0000 (08:01 +0000)]
osd: EC optimizations: add written and present shard sets to pg_log_entry_t
Add two new sets to pg_log_entry_t for use by EC-optimized pools.
The written shards set tracks which shards were written to, the
present shards set tracks which shards were in the acting set at the
time of the write.
An empty set (default) is used to indicate all shards. For pools without
allow_ec_optimizations, the written set is empty (indicating all shards are
written) and the present set is empty and unused.
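A minimal sketch of the "empty set means all shards" convention, using a
hypothetical log_entry_sketch struct and std::set in place of the real
pg_log_entry_t fields and shard set type:

```cpp
#include <cassert>
#include <set>

using shard_id = int;  // simplified stand-in for shard_id_t

// Hypothetical, simplified view of the two new log entry fields.
struct log_entry_sketch {
  std::set<shard_id> written_shards;  // empty => every shard was written
  std::set<shard_id> present_shards;  // empty => all shards / unused

  bool is_written(shard_id shard) const {
    return written_shards.empty() || written_shards.count(shard) > 0;
  }
  bool is_present(shard_id shard) const {
    return present_shards.empty() || present_shards.count(shard) > 0;
  }
};
```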
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 07:58:51 +0000 (07:58 +0000)]
osd: EC optimizations: add partial_writes_last_complete to pg_info_t
Add partial_writes_last_complete map to pg_info_t and pg_fast_info_t.
For optimized EC pools, not all shards receive every log entry. As
log entries are marked complete, the partial_writes_last_complete
map is updated to track shards that did not receive the log entry.
Each map entry stores an eversion range. The first version is the last
completed update the shard participated in; the second version tracks
subsequent updates where the shard was not updated. For example, the range
88'10-88'12 means a shard completed update 10 and that updates 11 and 12
intentionally
did not update the shard. This information is used during peering to
distinguish a shard that is missing updates from a shard that intentionally
did not participate in an update to work out what recovery is required.
By default, this map is empty, indicating that every shard is expected to
participate in an update and have a copy of the log entry.
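The following hypothetical sketch shows one way the eversion ranges could be
maintained and consulted during peering, replaying the 88'10-88'12 example.
partial_writes_sketch, on_complete and up_to_date are illustrative names only,
and erasing a shard's entry once it is written again is an assumption of this
sketch, not necessarily what the OSD does.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

struct eversion {  // simplified stand-in for eversion_t
  uint32_t epoch = 0;
  uint64_t version = 0;
  bool operator==(const eversion& o) const {
    return epoch == o.epoch && version == o.version;
  }
};
using shard_id = int;
// first = last update the shard completed, second = latest update it skipped
using eversion_range = std::pair<eversion, eversion>;

struct partial_writes_sketch {
  std::map<shard_id, eversion_range> last_complete;  // empty => all shards current

  // Entry `v` completes; `prev` was the previously completed version.
  // `written` lists shards that received the entry (empty => all shards).
  void on_complete(eversion prev, eversion v, const std::set<shard_id>& written,
                   int num_shards) {
    for (shard_id s = 0; s < num_shards; ++s) {
      if (written.empty() || written.count(s)) {
        last_complete.erase(s);  // shard participated: nothing to track (assumed)
        continue;
      }
      auto it = last_complete.find(s);
      if (it == last_complete.end()) {
        last_complete[s] = eversion_range{prev, v};  // start a skipped range
      } else {
        it->second.second = v;                       // extend the skipped range
      }
    }
  }

  // True if a shard whose own last completed update is `shard_lc` only
  // skipped updates intentionally, given the PG completed up to `pg_lc`.
  bool up_to_date(shard_id s, eversion shard_lc, eversion pg_lc) const {
    auto it = last_complete.find(s);
    if (it == last_complete.end()) return shard_lc == pg_lc;
    return it->second.first == shard_lc && it->second.second == pg_lc;
  }
};
```

A shard reporting 88'10 with a recorded range of 88'10-88'12 needs no
recovery, whereas one reporting 88'9 is genuinely missing update 10.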
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Thu, 6 Mar 2025 07:53:02 +0000 (07:53 +0000)]
osd: EC optimizations: add nonprimary_shards set to pg_pool_t
EC-optimized pools do not update every shard on every I/O. The primary
must have a complete log and requires objects to have up-to-date object
attributes, so the choice of primary has to be restricted. Shards that
cannot become a primary are listed in the nonprimary_shards set.
For a K+M EC pool with optimizations enabled, the first data shard and all
M coding (parity) shards are always updated and can become the primary; the
other shards are marked as nonprimary.
The new set nonprimary_shards stores shards that cannot become the primary.
By default it is an empty set, which retains the existing behavior. When
optimizations are enabled on an EC pool, this set is filled in to
restrict the choice of primary.
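A sketch of how the nonprimary set could be derived for a K+M pool. It assumes
raw shard numbering (data shards 0..K-1, parity shards K..K+M-1) and ignores
any chunk remapping; compute_nonprimary_shards and can_be_primary are
hypothetical helpers.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: shards listed in nonprimary_shards for a K+M pool,
// assuming raw numbering with data shards first and parity shards last.
std::set<int> compute_nonprimary_shards(int k, int m, bool optimizations_enabled) {
  std::set<int> nonprimary;
  if (!optimizations_enabled) return nonprimary;  // empty => existing behavior
  for (int s = 1; s < k; ++s) nonprimary.insert(s);  // data shards after the first
  (void)m;  // parity shards k..k+m-1 are always updated and stay eligible
  return nonprimary;
}

bool can_be_primary(const std::set<int>& nonprimary, int shard) {
  return nonprimary.count(shard) == 0;
}
```

For a 4+2 pool this leaves shards 0, 4 and 5 (1 + M shards) eligible.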
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Sun, 23 Mar 2025 14:20:33 +0000 (14:20 +0000)]
osd: EC optimizations: additional types for EC
Add some extra types required by the EC optimizations code:
raw_shard_id_t is an equivalent type to shard_id_t but is used
for storing raw shards. Strong typing prevents bugs where code
forgets to translate between the two types.
shard_id_map is a mini_flat_map indexed by shard_id_t which will
be used by the EC optimizations I/O path to track updates to
each shard.
shard_id_set is a bitset_set of shard_id_t which is a compact
and fast way of storing a set of shards involved in an EC
operation.
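To illustrate why the strong typing helps, here is a hypothetical sketch with
two distinct wrapper types and a bitset-backed shard set. It is not the real
raw_shard_id_t, shard_id_t or bitset_set implementation; to_shard and the
chunk mapping array are assumptions for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two distinct wrapper types: passing one where the other is expected
// does not compile, so a missing translation is caught at build time.
struct shard_id_t {
  int8_t id;
  explicit shard_id_t(int8_t i) : id(i) {}
};
struct raw_shard_id_t {
  int8_t id;
  explicit raw_shard_id_t(int8_t i) : id(i) {}
};

// Hypothetical translation through a chunk mapping (raw index -> shard id).
shard_id_t to_shard(raw_shard_id_t raw, const int8_t* chunk_mapping) {
  return shard_id_t(chunk_mapping[raw.id]);
}

// Compact set of shard ids backed by a fixed-size bitset.
template <std::size_t MaxShards>
class shard_set_sketch {
  std::bitset<MaxShards> bits_;

 public:
  void insert(shard_id_t s) { bits_.set(static_cast<std::size_t>(s.id)); }
  void erase(shard_id_t s) { bits_.reset(static_cast<std::size_t>(s.id)); }
  bool contains(shard_id_t s) const { return bits_.test(static_cast<std::size_t>(s.id)); }
  std::size_t size() const { return bits_.count(); }
  bool empty() const { return bits_.none(); }
  // insert(raw_shard_id_t(1)) would be rejected by the compiler.
};
```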
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Shilpa Jagannath [Mon, 16 Dec 2024 20:28:36 +0000 (15:28 -0500)]
rgw/trim: fix ENOENT return response from bucket sync status query.
Only handle ENOENT responses when the bucket metadata has been deleted. There
is also a case where ENOENT is returned because the status objects have not
been created yet, for example when bucket metadata has been created and synced
but no data exists yet, so bucket sync status has not been initialized. These
responses don't need special handling.
Shilpa Jagannath [Wed, 26 Jun 2024 07:04:08 +0000 (03:04 -0400)]
rgw/multisite: in a multisite environment with bucket sync policies configured,
we may end up orphaning objects on remote zones when a delete bucket request
is issued on the metadata master. To avoid this, list the bucket on each remote
zone and delete the bucket only when it is empty. If a zone is unreachable, we
drop that zone and continue with bucket deletion; such zones might have
orphaned objects that will have to be cleaned up using the radosgw-admin tool.
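The deletion check described above could be sketched as follows.
zone_listing, delete_decision and check_remote_zones are hypothetical names;
the real RGW code performs the remote listing itself rather than receiving
precomputed results.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result of listing the bucket on one remote zone.
struct zone_listing {
  std::string zone;
  std::optional<bool> empty;  // nullopt => zone was unreachable
};

struct delete_decision {
  bool allowed = true;
  std::vector<std::string> skipped_zones;  // may keep orphaned objects
};

delete_decision check_remote_zones(const std::vector<zone_listing>& zones) {
  delete_decision d;
  for (const auto& z : zones) {
    if (!z.empty.has_value()) {
      d.skipped_zones.push_back(z.zone);  // unreachable: drop it and continue
    } else if (!*z.empty) {
      d.allowed = false;                  // objects remain: refuse deletion
    }
  }
  return d;
}
```

Zones reported in skipped_zones are the ones that may need cleanup with
radosgw-admin afterwards.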
rgw/multisite: handle the 'deleted' index log addition in RGWBucketInstanceMetadataHandler.
Create an async coroutine (CR) for removing bucket instance info in the bilog trimming logic.
cmake: Fix googletest deprecated warnings by using target_compile_options()
Previously, we attempted to disable deprecated declarations warnings when
building gtest by adding `-Wno-deprecated-declarations` to the COMPILE_OPTIONS
property of the googletest directory. However, this approach failed to apply
the option when actually building gtest.
This change applies the compile option directly to the `gtest` target using
target_compile_options() instead. Verified by forcing the condition to TRUE
and confirming the option is included when building `gtest-all.cc` through
`cmake --build ~/dev/ceph/build --target gtest --verbose`.
cmake: Fix warning suppression for googletest build
In commit 27e9d563, we attempted to disable deprecated warnings when building
googletest, but the implementation contained two errors:
1. The `set_property()` call occurred before adding the target directory,
making it impossible to set properties on non-existent objects.
2. The `-Wno-deprecated-declarations` flag was incorrectly passed as an
`APPEND` argument instead of a `PROPERTY` argument.
This caused build failures with libstdc++-12 and newer Clang versions:
```
CMake Error at src/CMakeLists.txt:772 (set_property):
set_property given invalid argument "-Wno-deprecated-declarations".
```
This commit fixes both issues by:
- Moving the `set_property()` call after `add_subdirectory()`
- Correctly passing the warning flag as a `PROPERTY` argument
The io sequencer has been written primarily to test the new EC code. This commit
turns on, by default, the flag that enables the new EC code. It also provides a
flag to disable the new optimizations, which we expect to drop in the future.
John Mulligan [Wed, 2 Apr 2025 20:36:29 +0000 (16:36 -0400)]
doc/mgr: add a warning about the smb clustering option & placement
Add a warning to the docs highlighting that `clustering` is an advanced
option and setting it without also setting a compatible placement
value may lead to unexpected behavior.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
common/mempool.cc: Improve performance of sharding
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Jose Juan Palacios-Perez <perezjos@uk.ibm.com>
Reviewed-by: John Agombar <agombar@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
On EC pool:
- Use host count instead of device count for host crush-failure-domain
- Host warning for k+m+1
On replicated:
- Set 'All devices' as default
Fixes: https://tracker.ceph.com/issues/70764
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>