git.apps.os.sepia.ceph.com Git

osd: Fix segfault in EC debug string

The old debug_string implementation was potentially reading up to 3
bytes off the end of an array. It was also doing lots of unnecessary
bufferlist reconstructions. Refactoring the function fixes both
issues.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit da3ccdf4d03e40b747f8876449199102e53e00ce)
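
A minimal sketch of the kind of over-read described above, assuming the
debug string is built from 4-byte words. `debug_words` is a hypothetical
helper for illustration, not the Ceph function; it zero-pads a short tail
instead of reading past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the Ceph implementation): renders a byte buffer
// as space-separated 32-bit hex words without ever reading beyond `len`.
std::string debug_words(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::string out;
  for (std::size_t off = 0; off < len; off += 4) {
    std::uint32_t word = 0;
    for (std::size_t i = 0; i < 4; ++i) {
      // A short tail is padded with zero bytes; a naive 4-byte load here
      // could touch up to 3 bytes past the end of the array.
      const unsigned char byte = (off + i < len) ? data[off + i] : 0;
      word = (word << 8) | byte;
    }
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(word));
    if (!out.empty()) out += ' ';
    out += buf;
  }
  return out;
}
```

The word is assembled byte by byte so the output does not depend on host
endianness.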

osd: Optimized EC backfill interval has wrong versions

Bug in the optimized EC code creating the backfill
interval on the primary. It is creating a map with
the object version for each backfilling shard. When
there are multiple backfill targets the code was
overwriting oi.version with the version
for a shard that has had partial writes which
can result in the object not being backfilled.

Can manifest as a data integrity issue, scrub
error or snapshot corruption.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit acca514f9a3d0995b7329f4577f6881ba093a429)

osd: Optimized EC choose_acting needs to use best primary shard

There have been a couple of corner case bugs with choose_acting
with optimized EC pools in the scenario where a new primary
with no existing log is chosen and find_best_info selects
a non-primary shard as the authoritative shard.

Non-primary shards don't have a full log so in this scenario
we need to get the log from a shard that does have a complete
log first (so our log is ahead of or equivalent to the authoritative shard)
and then repeat the get log for the authoritative shard.

Problems arise if we make different decisions about the acting
set and backfill/recovery based on these two different shards.
In one bug we oscillated between two different primaries
because one primary used one shard to make peering decisions
and the other primary used the other shard, resulting in
looping flip/flop changes to the acting_set.

In another bug we used one shard to decide that we could do
async recovery but then tried to get the log from another
shard and asserted because we didn't have enough history in
the log to do recovery and should have chosen to do a backfill.

This change makes optimized EC pools always choose the
best !non_primary shard when making decisions about peering
(irrespective of whether the primary has a full log or not).
The best overall shard is now only used for get log when
deciding how far to rollback the log.

It also sets repeat_getlog to false if peering fails because
the PG is incomplete to avoid looping forever trying to get
the log.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f3f45c2ef3e3dd7c7f556b286be21bd5a7620ef7)

osd: Do not send PDWs if read count > k

The main point of PDW (as currently implemented) is to reduce the amount
of reading performed by the primary when preparing for a read-modify-write (RMW).

It was making the assumption that if any recovery was required by a
conventional RMW, then a PDW is always better. This was an incorrect assumption
as a conventional RMW performs at most K reads for any plugin which
supports PDW. As such, we tweak this logic to perform a conventional RMW
if the PDW is going to read k or more shards.

This should improve performance in some minor areas.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit cffd10f3cc82e0aef29209e6e823b92bdb0291ce)
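
The decision described above can be sketched as a single comparison. This
is an illustration only: `WriteKind` and `choose_write_kind` are
hypothetical names, and the threshold follows the commit body ("k or more
shards"), not the actual Ceph code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, for illustration only.
enum class WriteKind { ConventionalRmw, Pdw };

// pdw_reads: number of shards a PDW would have to read.
// k: number of data shards; a conventional RMW reads at most k shards.
WriteKind choose_write_kind(std::size_t pdw_reads, std::size_t k) {
  assert(k > 0);
  // A PDW only pays off when it reads fewer shards than a conventional RMW.
  return pdw_reads >= k ? WriteKind::ConventionalRmw : WriteKind::Pdw;
}
```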

osd: Fix decode for some extent cache reads.

The extent cache in EC can cause the backend to perform some surprising reads. Some
of these read patterns, discovered in testing, caused the decode to attempt to
decode more data than was anticipated during read planning, leading to an
assert. This simple fix reduces the scope of the decode to the minimum.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 2ab45a22397112916bbcdb82adb85f99599e03c0)

osd: Optimized EC calculate_maxles_and_minlua needs to use ...
exclude_nonprimary_shards

When an optimized EC pool is searching for the best shard that
isn't a non-primary shard then the calculation for maxles and
minlua needs to exclude non-primary shards.

This bug was seen in a test run where activating a PG was
interrupted by a new epoch and only a couple of non-primary
shards became active and updated les. In the next epoch
a new primary (without log) failed to find a shard that
wasn't non-primary with the latest les. The les of
non-primary shards should be ignored when looking for
an appropriate shard to get the full log from.

This is safe because an epoch cannot start I/O without
at least K shards that have updated les, and there
are always K-1 non-primary shards. If I/O has started
then we will find the latest les even if we skip
non-primary shards. If I/O has not started then the
latest les ignoring non-primary shards is the
last epoch in which I/O was started and has a good
enough log+missing list.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 72d55eec85afa4c00fac8dc18a1fb49751e61985)
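
A simplified sketch of the maxles half of this calculation, assuming each
shard reports a les value and a flag saying whether it is a non-primary
shard. `ShardInfo` and `max_les` are hypothetical stand-ins, not the
structures used by calculate_maxles_and_minlua.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of what a peer reports during peering.
struct ShardInfo {
  std::uint32_t les;  // last epoch started recorded by this shard
  bool non_primary;   // true if the shard does not keep a full log
};

// Returns the highest les, optionally ignoring non-primary shards.
std::uint32_t max_les(const std::vector<ShardInfo>& shards,
                      bool exclude_nonprimary_shards) {
  assert(!shards.empty());
  std::uint32_t best = 0;
  for (const auto& s : shards) {
    if (exclude_nonprimary_shards && s.non_primary) continue;
    if (s.les > best) best = s.les;
  }
  return best;
}
```

With two non-primary shards at les 12 and the remaining shards at les 10,
the filtered result is 10, which matches the scenario in the commit
message where only non-primary shards updated les before the interval
changed.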

osd: Optimized EC choose_async_recovery_ec must use auth_shard

Optimized EC pools modify how GetLog and choose_acting work.
If the auth_shard is a non-primary shard and the (new) primary
is behind the auth_shard then we cannot just get the log from
the non-primary shard because it will be missing entries for
partial writes. Instead we need to get the log from a shard
that has the full log first and then repeat GetLog to get
the log from the auth_shard.

choose_acting was modifying auth_shard in the case where
we need to get the log from another shard first. This is
wrong - the remainder of the logic in choose_acting and
in particular choose_async_recovery_ec needs to use the
auth_shard to calculate what the acting set will be.
Using a different shard can occasionally cause a
different acting set to be selected (because of
thresholds on how many log entries behind
a shard needs to be to perform async recovery) and
this can lead to two shards flip/flopping with
different opinions about what the acting set should be.

Fix is to separate out which shard will be returned
to GetLog from the auth_shard which will be used
for acting set calculations.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 3c2161ee7350a05e0d81a23ce24cd0712dfef5fb)

osd: Optimized EC don't try to trim past crt

If there is an exceptionally long sequence of partial writes
that did not update a shard that is followed by a full write
then it is possible that the log trim point is ahead of the
previous write to the shard (and hence crt). We cannot trim
beyond crt. In this scenario it's fine to limit the trim to crt
because the shard doesn't have any of the log entries for the
partial writes so there is nothing more to trim.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 645cdf9f61e79764eca019f58a4d9c6b51768c81)
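
The clamp described above amounts to taking the smaller of the requested
trim point and crt. The sketch below assumes versions are ordered by
(epoch, version); `Version` and `clamp_trim_to` are hypothetical names,
not the Ceph types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a log version, ordered by (epoch, version).
struct Version {
  std::uint32_t epoch;
  std::uint64_t version;
};

inline bool operator<(const Version& a, const Version& b) {
  return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

// Limits the requested trim point so that it never goes beyond crt.
Version clamp_trim_to(Version trim_to, Version crt) {
  Version result = std::min(trim_to, crt);
  assert(!(crt < result));  // invariant: we never trim past crt
  return result;
}
```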

osd: Optimized EC missing call to apply_pwlc after updating pwlc

update_peer_info was updating pwlc with a newer version received
from another shard, but failed to update the peer_info's to
reflect the new pwlc by calling apply_pwlc.

Scenario was primary receiving an update from shard X which had
newer information about shard Y. The code was calling apply_pwlc
for shard X but not for shard Y.

The fix simplifies the logic in update_peer_info - if we are
the primary update all peer_info's that have pwlc. If we
are a non-primary and there is pwlc then update info.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d19f3a3bcbb848e530e4d31cbfe195973fa9a144)

osd: Optimized EC don't apply pwlc for divergent writes

Split pwlc epoch into a separate variable so that we
can use epoch and version number when comparing if
last_update is within a pwlc range. This ensures that
pwlc is not applied to a shard that has a divergent
write, but still tracks the most recent update of pwlc.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d634f824f229677aa6df7dded57352f7a59f3597)

osd: Optimized EC present_shards no longer needed

present_shards is no longer needed in the PG log entry; it has been
replaced with code in proc_master_log that calculates which shards were
in the last epoch started and are still present.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 880a17e39626d99a0b6cc8259523daa83c72802c)

osd: Optimized EC proc_master_log fix roll-forward logic when shard is absent

Fix bug in optimized EC code where proc_master_log incorrectly did not
roll forward a write if one of the written shards is missing in the current
epoch and there is a stray version of that shard that did not receive the
write.

As long as the currently present shards that participated in les and were
updated by a write have the update then the write should be rolled-forward.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit e0e8117769a8b30b2856f940ab9fc00ad1e04f63)

osd: Refactor find_best_info and choose_acting

Refactor find_best_info to have a separate function to calculate
maxles and minlua. The refactor makes history_les_bound
optional and tidies up the choose_acting interface, removing it
where it is not used.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f1826fdbf136dc7c96756f0fb8a047c9d9dda82a)

osd: EC Optimizations proc_master_log boundary case bug fixes

Fix a couple of bugs in proc_master_log for optimized EC
pools dealing with boundary conditions such as an empty
log and merging two logs that diverge from the very first
entry.

Refactor the code to handle the boundary conditions and
neaten up the code.

Predicate the code block with if (pool.info.allows_ecoptimizations())
to make it clear this code path is only for optimized EC pools.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 1b44fd9991f5f46b969911440363563ddfad94ad)

osd: Invalidate stats during peering if we are rolling a shard forwards.

With this change we always recalculate stats when rolling a shard
forwards. This prevents incorrect statistics that arise because peering
always takes the stats of the oldest shard: when the oldest shards are
shards that did not see partial writes, outdated pg stats were applied
even though num_bytes had changed elsewhere after that point.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b178ce476f4a5b2bb0743e36d78f3a6e23ad5506)

osd: ECTransaction.h includes OSDMap.h

Needed for crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6dd393e37f6afb9063c4bed3e573557bd0efb6bd)

osd: bypass messenger for local EC reads

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit b07d1f67625c8b621b2ebf5a7f744c588cae99d3)

osd: fix buildability after get_write_plan() shuffling

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 7f4cb19251345849736e83bd0c7cc15ccdcdf48b)

osd: just shuffle get_write_plan() from ECBackend to ECCommon

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 9d5bf623537b8ee29e000504d752ace1c05964d7)

osd: prepare get_write_plan() for moving from ECBackend to ECCommon

For the sake of sharing with crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit dc5b0910a500363b62cfda8be44b4bed634f9cd6)

osd: separate producing EC's WritePlan out into a dedicated method

For the sake of sharing with crimson in next commits.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e06c0c6dd08fd6d2418a189532171553d63a9deb)

osd: fix unused variable warning in ClientReadCompleter

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit eb3a3bb3a70e6674f6e23a88dd1b2b86551efda2)

osd: shuffle ECCommon::RecoveryBackend from ECBackend.cc to ECCommon.cc

It's just code movement; there are no changes apart from that.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit ef644c9d29b8adaef228a20fc96830724d1fc3f5)

osd: drop junky `#if 1` in recovery backend

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit d43bded4a02532c4612d53fc4418db8e4e829c3f)

osd: move ECCommon::RecoveryBackend from ECBackend.cc to ECCommon.cc

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit debd035a650768ead64e0707028bb862f4767bef)

osd: replace get_obc() with maybe_load_obc() in EC recovery

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 266773625f997ff6a1fda82b201e023948a5c081)

osd: abstract sending MOSDPGPush during EC recovery

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 1d54eaff41ec8d880bcf9149e4c71114e0ffdc09)

osd: prepare ECCommon::RecoveryBackend for shuffling to ECCommon.cc

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e3ade5167d3671524eb372a028157f2a46e7a219)

osd: squeeze RecoveryHandle out of ECCommon::RecoveryBackend

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 1e0feb73a4b91bd8b7b3ecc164d28fe005b97ed1)

osd: just shuffle RecoveryMessages to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit bc28c16a9a83b0f12d3d6463eaeacbab40b0890b)

osd: prepare RecoveryMessages for shuffling to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 0581926035113b1a9cb38f76233242d6b32a7dc6)

osd: ECCommon::RecoveryBackend doesn't depend on ECBackend anymore

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6ead960b23a95211847250d90e3d2945c6254345)

osd: fix buildability after the RecoveryBackend shuffling

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit c9d18cf3024e5ba681bed5dc315f70527e99b3f1)

osd: just shuffle RecoveryBackend from ECBackend.h to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 98480d2f75a7b99aa72562a6a6daa5f39db3d425)

Merge pull request #65411 from aainscow/wip-72561-tentacle

tentacle: Optimized Erasure Coding - Fixpack 2

Merge pull request #64030 from VallariAg/wip-71724-tentacle

tentacle: qa: reduce nvmeof thrasher fio to 32 devices from 200

Merge pull request #65429 from nbalacha/wip-72905-tentacle

tentacle: rgw/logging: fixes data loss during rollover

Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

Merge pull request #65271 from smanjara/wip-72570-tentacle

tentacle: rgw/multisite: url-encode list_bucket query param 'key-marker'

Reviewed-by: Adam Emerson <aemerson@redhat.com>

Merge pull request #64862 from adamemerson/wip-71066-tentacle

tentacle: rgw/multisite: Fix lifetime issues

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #65546 from stackhpc/doc-balancer-tentacle

tentacle: doc: Fixes a typo in balancer operations

doc: Fixes a typo in balancer operations

Signed-off-by: Tyler Brekke <tbrekke@digitalocean.com>
(cherry picked from commit b038b8093d01a5e676ffa419607489a79261ef29)

Merge pull request #65160 from adk3798/wip-72668-tentacle

tentacle: cephadm/cephadmlib: Eliminate false warnings about old sysctl conf files

Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

Merge pull request #65068 from adk3798/tentacle-smb-remotectl

tentacle: smb: add remote control server

Reviewed-by: John Mulligan <jmulligan@redhat.com>

Merge pull request #64724 from adk3798/wip-72268-tentacle

tentacle: mgr/cephadm: updating maintenance health status in the serve…

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>

Merge pull request #64723 from adk3798/wip-72264-tentacle

tentacle: mgr/cephadm: Provide appropriate exit codes for orch operations

Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>
Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

Merge pull request #64722 from adk3798/tentacle-cephadm-nvmeof-add-force-tls-flag

tentacle: mgr/cephadm/nvmeof: Add "force TLS" flag to NVMeOF spec file.

Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

Merge pull request #64721 from adk3798/tentacle-cephadm-nvmeof-increase-default-max-namespaces

tentacle: mgr/cephadm/nvmeof: Increase the default limit of max_namespaces

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

Merge pull request #64691 from adk3798/wip-72135-tentacle

tentacle: mgr/cephadm: disallow changing OSD service type to non-OSD types

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

Merge pull request #64674 from adk3798/tentacle-cephadm-undefined-variable-haproxy-config

tentacle: mgr/cephadm: handle possibly undefined template variable in haproxy.cfg.j2

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

Merge pull request #64673 from adk3798/wip-72138-tentacle

tentacle: mgr/rgw: don't fail realm bootstrap if system user exists already

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

Merge pull request #64672 from adk3798/tentacle-teuth-add-cephadm-file-path

tentacle: qa/tasks/cephadm: allow to select from 'cephadm' and 'cephadm.py'

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

Merge pull request #65466 from xhernandez/wip-72952-tentacle

tentacle: libcephfs_proxy: fix backward compatibility issue

Reviewed-by: Anoop C S <anoopcs@cryptolab.net>

Merge pull request #65447 from aaSharma14/wip-71903-tentacle

tentacle: mgr/dashboard: Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge pull request #65454 from rhcs-dashboard/wip-72928-tentacle

tentacle: mgr/dashboard:RGW- Storage Class ACL Mapping

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

Merge pull request #64891 from aainscow/wip-72372-tentacle

tentacle: osd: Reduce reads when rebalancing healthy Erasure Coded PGs

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

Merge pull request #65477 from cbodley/wip-tentacle-rgw-reshard-release-note

tentacle: doc/rgw: release note for bucket reshard optimization

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

Merge pull request #65476 from rhcs-dashboard/wip-72966-tentacle

tentacle: monitoring: add user-agent headers to the urllib

doc/rgw: release note for bucket reshard optimization

(note that this applies directly to tentacle, not main)

https://github.com/ceph/ceph/pull/56597 merged last year but was not
mentioned in release notes. this optimization is worth highlighting for
rgw

Signed-off-by: Casey Bodley <cbodley@redhat.com>

monitoring: add user-agent headers to the urllib

The documentation suddenly started raising 403 errors. Add User-Agent
headers to the request.

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit b8fe487010483681bbc8ddb8dfe18b40ebfd346b)

Merge pull request #64694 from adk3798/wip-72262-tentacle

tentacle: cephadm: Bind mount /var/lib/samba with 0755

Reviewed-by: John Mulligan <jmulligan@redhat.com>

Merge pull request #65038 from guits/backport-seastore-cv

(tentacle): ceph-volume: add seastore OSDs support

mgr/dashboard: Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation

1. Enable rgw module automatically in the primary and secondary cluster if not enabled during multi-site automation
2. Improve progress bar descriptions and add sub-descriptions for steps

Fixes: https://tracker.ceph.com/issues/71033
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit d6c657b7e6bb85f6d246b389c54941b777364f49)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-overview-dashboard/rgw-overview-dashboard.component.spec.ts

libcephfs_proxy: fix userperm pointer decoding for older protocols

The random data used to decode pointers coming from the old protocol was
taken from the client instead of using the global_random data, which is
the correct one.

Fixes: https://tracker.ceph.com/issues/72800
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit 674bd44001d6feb919a331d4a4586cc0d97847f8)

libcephfs_proxy: remove unnecessary protocol references in daemon

With the new protocol structure definitions, it's not necessary to
explicitly access each field inside its version substructure (v0, for
example). Now all fields of the latest version are declared inside an
anonymous substructure that can be accessed without a prefix.

Fixes: https://tracker.ceph.com/issues/72800
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit b5df01a605d0adfa332e08665faea00bf7b0fbd0)

libcephfs_proxy: remove unnecessary protocol references in client

With the new protocol structure definitions, it's not necessary to
explicitly access each field inside its version substructure (v0, for
example). Now all fields of the latest version are declared inside an
anonymous substructure that can be accessed without a prefix.

Fixes: https://tracker.ceph.com/issues/72800
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit 8cd9f8c808a09f8c2f1da5b15233a14effa41296)

libcephfs_proxy: fix protocol structures for backward compatibility

The structures used for transferring data between the proxy client and
the proxy daemon had been reworked in a recent change to be able to
expand the protocol. This caused an inconsistency in the size of the
data transferred when communicating with a peer using the older version.
The result was that the peer receiving the data with an unexpected size
was closing the connection, causing unexpected errors.

The discrepancy in size is the result of how compilers pad structures
combined with the change in the structure layout introduced when
extending the protocol. With these changes, the computation of the size
of each version of the structures was not done correctly.

This change makes the layout equal to the older version, so that
computing the size of the structures becomes easier and doesn't depend
on unexpected paddings.

Fixes: https://tracker.ceph.com/issues/72800
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit 62e917148496bce299f4cd48342765b73b9950a8)
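
To illustrate how compiler padding can make a structure larger than the
sum of its fields, here is a small sketch. The structures `ReqV0` and
`ReqV1` and their fields are invented for the example and are not the
libcephfs_proxy protocol definitions; exact sizes depend on the platform
ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical protocol structures, for illustration only.
struct ReqV0 {
  std::uint32_t op;      // on common 64-bit ABIs, padding follows this field
  std::uint64_t handle;
};

struct ReqV1 {
  ReqV0 v0;              // older layout embedded as a substructure
  std::uint32_t flags;   // new field; tail padding usually follows
};

// Sum of the field sizes, i.e. what a hand-written size computation that
// ignores padding would produce.
constexpr std::size_t payload_size_v1() {
  return sizeof(std::uint32_t) + sizeof(std::uint64_t) + sizeof(std::uint32_t);
}

// Bytes the compiler added for alignment.
std::size_t padding_bytes(std::size_t struct_size, std::size_t payload) {
  assert(struct_size >= payload);
  return struct_size - payload;
}
```

On a typical x86-64 build `padding_bytes(sizeof(ReqV1), payload_size_v1())`
is non-zero, so a peer that expects the field-sum size would see an
unexpected length.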

Merge pull request #64606 from cbodley/wip-72191-tentacle

tentacle: deb/cephadm: add explicit --home for cephadm user

Reviewed-by: Adam Emerson <aemerson@redhat.com>

Merge pull request #64935 from pritha-srivastava/wip-72465-tentacle

tentacle: rgw: check all JWKS for STS

Reviewed-by: Adam Emerson <aemerson@redhat.com>

mgr/dashboard:RGW- Storage Class ACL Mapping

Fixes: https://tracker.ceph.com/issues/72362
Signed-off-by: Dnyaneshwari Talwekar <dtalwekar@redhat.com>
(cherry picked from commit 38b237c8cb12cb77fd29a560c10c7e2225786955)

Merge pull request #65438 from ljflores/wip-72913-tentacle

tentacle: doc/rados/operations: add kernel client procedure to read balancer documentation

Merge pull request #65400 from ceph/tentacle-release

v20.1.0

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>
Reviewed-by: Yuri Weinstein <yweinste@redhat.com>

ceph-volume: add seastore OSDs support

This adds the seastore OSD objectstore support to ceph-volume.

Fixes: https://tracker.ceph.com/issues/71414
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 1b83235849fd875aac1f46b714a63f2c1e8ce836)

ceph-volume: refactor LvmBlueStore.setup_device()

This refactors redundant device setup calls in the LvmBlueStore class:
calling the same function twice with different arguments for WAL
and DB devices was inefficient and unnecessary.
The new implementation simplifies the logic by directly accessing
`self.args`, which removes the need to pass arguments manually.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
(cherry picked from commit 7626c12e5fd4800187963f3b7f8691eb2847c119)

Merge pull request #65301 from guits/wip-72782-tentacle

tentacle: ceph-volume: drop udevadm subprocess calls

Merge pull request #65446 from aaSharma14/wip-72910-tentacle

tentacle: mgr/dashboard: fix RGW Bucket Notification Dashboard units

Reviewed-by: Abhishek Desai <abhishek.desai1@ibm.com>

mgr/dashboard: fix RGW Bucket Notification Dashboard units

Fixes: https://tracker.ceph.com/issues/72868
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 2c2f1f83ecad42a4de0d4df3f4a44c9c2b69390f)

neorados/cls/fifo/detail/fifo: include strtol.h

https://github.com/ceph/ceph/commit/a2d26647c011274b61805f8ac17c3422e9b9b63c

ftbfs:
```
/home/jenkins-build/build/workspace/ceph-pull-requests/src/neorados/cls/fifo/detail/fifo.h:630:14: error: no member named 'parse' in namespace 'ceph'; did you mean 'pause'?
630 | auto n = ceph::parse<decltype(m.num)>(num);
| ^~~~~~~~~~~
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
(cherry picked from commit 3df1133c17b174c27c250cf7ac018199cc40b15b)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

rgw/datalog: Manage and shutdown tasks properly

This is slightly ugly but good enough for now. Make sure we can block
when shutting down background tasks.

Remove a few `driver` parameters that are unused. This lets us
simplify the IAM Policy and Lua tests and not construct stores we
never use. (Which is good since we aren't running them under a cluster.)

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3def99eec5df353a0fb34be79d9e98a08eb05985)

Conflicts:
src/rgw/driver/rados/rgw_service.cc
src/rgw/rgw_sal.h
src/rgw/rgw_sal.cc
- `#ifdef` changes
src/test/rgw/test_rgw_iam_policy.cc
src/test/rgw/test_rgw_lua.cc
- SAL renaming

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

Merge pull request #65323 from rhcs-dashboard/wip-72798-tentacle

tentacle: mgr/dashboard: expose image summary API

neorados/fifo: Rewrite as proper I/O object

Split nominal handle object and reference-counted
implementation. While we're at it, add lazy-open functionality.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3097297dd39432d172d69454419fa83a908075f6)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

{neorados,osdc}: Support subsystem cancellation

Tag operations with a subsystem so we can cancel them all in one go.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 2526eb573b789b33b7d9ebf1169491f13e2318bb)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
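
The idea of tagging operations so they can be cancelled as a group can be
sketched as follows. `OpRegistry` and its methods are hypothetical and
single-threaded; they only illustrate the bookkeeping and are not the
neorados or Objecter interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical registry of in-flight operations (illustration only).
class OpRegistry {
public:
  using Cancel = std::function<void()>;

  // Registers an operation under a subsystem tag and returns its id.
  std::uint64_t add(std::uint64_t subsystem, Cancel cancel) {
    assert(cancel);
    const std::uint64_t id = next_id_++;
    ops_.emplace(id, Entry{subsystem, std::move(cancel)});
    return id;
  }

  // Called when an operation finishes normally.
  void complete(std::uint64_t id) { ops_.erase(id); }

  // Cancels every pending operation carrying the given tag and returns
  // how many were cancelled. Callbacks must not re-enter the registry.
  std::size_t cancel_subsystem(std::uint64_t subsystem) {
    std::size_t cancelled = 0;
    for (auto it = ops_.begin(); it != ops_.end();) {
      if (it->second.subsystem == subsystem) {
        Cancel cancel = std::move(it->second.cancel);
        it = ops_.erase(it);
        cancel();
        ++cancelled;
      } else {
        ++it;
      }
    }
    return cancelled;
  }

  std::size_t pending() const { return ops_.size(); }

private:
  struct Entry {
    std::uint64_t subsystem;
    Cancel cancel;
  };
  std::map<std::uint64_t, Entry> ops_;
  std::uint64_t next_id_ = 1;
};
```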

rgw/multi: Give tasks a reference to RGWDataChangesLog

Also run them in strands. Also `datalog_rados` is a `shared_ptr`,
now. Probably make it intrusive later.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3c2b587ead6b0cb5acfd84788958dd957d020875)

Conflicts:
src/rgw/driver/rados/rgw_service.cc
src/rgw/rgw_sal.cc
- `#ifdef`s for standalone Rados
src/rgw/driver/rados/rgw_datalog.cc
- Periodic re-run of recovery removed in main and pending backport

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Hold reference to implementation across operations

Asynchrony combined with cancellations keeps leading to occasional
lifetime issues, so follow the best practices of Asio I/O objects by
having completions keep a reference live.

The original NeoRados backing implements Asio's two-phase shutdown
properly.

The RadosClient backing does not, because it shares an Objecter with
completions that do not belong to it. In practice I don't think this
will matter since librados and neorados get shut down around the same
time.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 57c9723928b4d2b2148ca0dd4d505acdc071f8eb)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

tentacle: update formatting to match across tentacle branches

I managed to create a bit of a mess with formatting changes after
a fix was cherry picked to `tentacle-release`. This change makes
the formatting on `tentacle-release` match that of `tentacle`.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

doc/rados/operations: add kernel client procedure to read balancer documentation

As of now, the kernel client does not support `pg-upmap-primary`. I have
added some troubleshooting steps to help users who are unable to
mount images and filesystems with the kernel client while using `pg-upmap-primary`.

Once the feature is supported by the kernel client, users will be able
to perform mounts along with `pg-upmap-primary`.

Fixes: https://tracker.ceph.com/issues/72897
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit 546d523147873c8fc6cf4208b4fa71eb2703e9c3)

Merge pull request #65306 from rhcs-dashboard/wip-72768-tentacle

tentacle: mgr/dashboard: About panel showing other icons in background while open

Reviewed-by: Aashish Sharma <aasharma@redhat.com>

Merge pull request #65307 from aaSharma14/wip-72767-tentacle

tentacle: mgr/dashboard: Fix duplicate selection on multi-select in table component

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge pull request #65381 from aaSharma14/wip-72841-tentacle

tentacle: mgr/dashboard: Allow the user to re-use existing realm/zg/zone and setup replication

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge pull request #65383 from aaSharma14/wip-72867-tentacle

tentacle: mgr/dashboard: Adding RGW Bucket Notification Dashboard for Grafana

Reviewed-by: Afreen Misbah <afreen@ibm.com>

rgw/logging: fixes data loss during rollover

Multiple threads attempting to roll over the same log object can result
in the creation of numerous orphan tail objects, each with a single record.
This occurs when a NULL RGWObjVersionTracker is used during the creation of
a new logging object. These records are inaccessible, leading to data loss,
which is particularly critical in Journal mode.
Furthermore, valid log tail objects may be added to the Garbage Collection (GC)
list, exacerbating data loss.

Fixes: https://tracker.ceph.com/issues/72740
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
(cherry picked from commit eea6525c031ae93f4ae846b06d55831e658faa2c)

osd: Optimized EC invalid pwlc for shards doing backfill/async

Shards performing backfill or async recovery receive log entries
(but not transactions) for updates to missing/yet to be backfilled
objects. These log entries get applied and completed immediately
because there is nothing that can be rolled back. This causes
pwlc to advance too early and causes problems if other shards
do not complete the update and end up rolling it backwards.

This fix sets pwlc to be invalid when such a log entry is
applied and completed and it then remains invalid until the
next interval when peering runs again. Other shards will
continue to update pwlc and any complete subset of shards
in a future interval will include at least one shard that
has continued to update pwlc.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 534fc76d40a86a49bfabab247d3a703cbb575e27)

osd: Optimized EC add_log_entry should not skip partial writes

Undo a previous attempt at a fix that made add_log_entry skip adding partial
writes to the log if the write did not update this shard. The only case where
this code path executed is when a partial write was to an object that needs
backfilling or async recovery. For async recovery we need to keep the
log entry because it is needed to update the missing list. For backfill it
does no harm to keep the log entry.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 9f0e883b710a06e3371bc7e0681e034727447f27)

osd: Optimized EC apply_pwlc needs to be careful about advancing last_complete

Fix bug in apply_pwlc where the primary was advancing last_complete for a
shard doing async recovery so that last_complete became equal to last_update
and it then thought that recovery had completed. It is only valid to advance
last_complete if it is equal to last_update.
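
One way to read the rule above, as a sketch only: last_complete may
follow last_update forward only if the two were equal before the
advance. ShardInfo, apply_pwlc_sketch and the integer versions are
hypothetical simplifications, not the real apply_pwlc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified shard info: versions reduced to integers.
struct ShardInfo {
  std::uint64_t last_update = 0;
  std::uint64_t last_complete = 0;
};

// Advance last_update to pwlc_to. last_complete moves with it only when
// the shard had nothing outstanding beforehand.
inline void apply_pwlc_sketch(ShardInfo& info, std::uint64_t pwlc_to) {
  assert(info.last_complete <= info.last_update);
  if (pwlc_to <= info.last_update) {
    return;  // nothing to advance
  }
  const bool was_complete = (info.last_complete == info.last_update);
  info.last_update = pwlc_to;
  if (was_complete) {
    info.last_complete = pwlc_to;
  }
}
```

A shard in async recovery has last_complete behind last_update, so in
this sketch its last_complete stays where it is and recovery is not
mistaken for finished.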

Tidy up the logging in this function as consecutive calls to this function
often logged that it could advance on the 1st call and then that it could not
on the 2nd call. We only want one log message.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 0b19ed49ff76d7470bfcbd7f26ea0c7e5a2bc358)

osd: Use std::cmp_greater to avoid signedness warnings.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 846879e6c2ec4ab5a65040981a617f4b603c379a)

osd: Always send EC messages to all shards following error.

Explanation of bug which is being fixed:

Log entry 204'784 is an error - "_rollback_to deleting head on smithi17019940-573 because
got ENOENT|whiteout on find_object_context" so this log entry is generated outside of EC by
PrimaryLogPG. It should be applied to all shards, however osd 13(2) was a little slow and
the update got interrupted by a new epoch so it didn't apply it. All the other shards
marked it as applied and completed (there isn't the usual interlock that EC has of making
sure all shards apply the update before any complete it).

We then processed 4 partial writes applying and completing them (they didn't update osd
13(2)), then we have a new epoch and go through peering. Peering says osd 13(2) didn't see
update 204'784 (it didn't) and therefore the error log entry and the 4 partial writes need
to be rolled back. The other shards had completed those 4 partial writes so we end up with
4 missing objects on all the shards which become unfound objects.

I think the underlying bug means that log entry 204'784 isn't really complete and may
"disappear" from the log in a subsequent peering cycle. Trying to forcefully rollback a
logged error doesn't generate a missing object or a miscompare, so the consequences of the
bug are hidden. It is however tripping up the new EC code where proc_master_log is being
much stricter about what a completed write means.

Fix:
After generating a logged error we could force the next write to EC to update metadata on
all shards even if it's a partial write. This means this write won't complete unless all
shards see the logged error. This will make new EC behave the same as old EC. There is
already an interlock with EC (call_write_ordered) which is called just before generating
the log error that ensures that any in-flight writes complete before submitting the log
error. We could set a boolean flag here (at the point call_write_ordered is called is fine,
don't need to wait for the callback) to say the next write has to be to all shards. The
flag can be cleared if we generate the transactions for the next write, or we get an
on_change notification (peering will then clear up the mess).
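
The flag handling proposed above could be sketched roughly as follows.
NextWriteTracker and its method names are hypothetical; only the three
events (logged error, next write, on_change) come from the description.

```cpp
// Hypothetical state holder; names are illustrative, not Ceph's.
class NextWriteTracker {
  bool next_write_all_shards_ = false;

 public:
  // Called at the point where the logged error is ordered behind
  // in-flight writes.
  void on_logged_error() { next_write_all_shards_ = true; }

  // Called when generating transactions for a write. Returns true if the
  // write must update metadata on every shard, then clears the flag.
  bool must_write_all_shards(bool is_partial_write) {
    const bool all = next_write_all_shards_ || !is_partial_write;
    next_write_all_shards_ = false;
    return all;
  }

  // Interval change: peering takes over, so the flag is dropped.
  void on_change() { next_write_all_shards_ = false; }
};
```

Only the first write after a logged error is widened to all shards;
later partial writes go back to updating a subset.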

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 4948f74331c13cd93086b057e0f25a59573e3167)

osd: Attribute re-reads in optimised EC

There were some bugs in attribute reads during recovery in optimised
EC where the attribute read failed. There were two scenarios:

1. It was not necessary to do any further reads to recover the data. This
can happen during recovery of many shards.
2. The re-read could be honoured from non-primary shards. There are
sometimes multiple copies of the shard which can be used, so a failed read
on one OSD can be replaced by a read from another.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 417fb71c9b5628726d3217909ba1b6d3e7bf251a)

osd: EC optimizations keep log entries on all shards

When a shard is backfilling it gets given log entries
for partial writes even if they do not apply to the
shard. The code was updating the missing list but
discarding the log entry. This is wrong because the
update can be rolled backwards and the log entry is
required to revert the update to the missing list.
Keeping the log entry has a small, insignificant
performance impact.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 1fa5302092cbbb37357142d01ca008cae29d4f5e)

osd: Remove some extraneous references to hinfo.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 1c2476560f1c4b2eec1c074a6eead5520b5474eb)

mon: Optimized EC pools preprocess_pgtemp incorrectly rejecting pgtemp as nop

Optimized EC pools store pgtemp with primary shards first. This was not
being taken into account by OSDMonitor::preprocess_pgtemp, which meant
that the change of pgtemp from [None,2,4] to [None,4,2] for a 2+1 pool
was being rejected as a nop because the primary-first encoded version
of [None,2,4] is [None,4,2].
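
The false nop can be reproduced with a small sketch. The functions
primary_first and is_nop, the kNone constant, and the assumption that
shard 1 is the only non-primary shard of the 2+1 pool are illustrative
choices that match the example above; they are not the monitor's code.

```cpp
#include <cstddef>
#include <set>
#include <vector>

constexpr int kNone = -1;  // stand-in for an unmapped OSD ("None")

// Reorder a shard-indexed pgtemp so primary-capable shards come first,
// followed by the non-primary shards, each group in shard order.
inline std::vector<int> primary_first(
    const std::vector<int>& pgtemp,
    const std::set<std::size_t>& nonprimary) {
  std::vector<int> out;
  out.reserve(pgtemp.size());
  for (std::size_t s = 0; s < pgtemp.size(); ++s) {
    if (nonprimary.count(s) == 0) out.push_back(pgtemp[s]);
  }
  for (std::size_t s = 0; s < pgtemp.size(); ++s) {
    if (nonprimary.count(s) != 0) out.push_back(pgtemp[s]);
  }
  return out;
}

// A request is a nop only if its encoded form equals what is stored.
inline bool is_nop(const std::vector<int>& stored_encoded,
                   const std::vector<int>& requested,
                   const std::set<std::size_t>& nonprimary) {
  return primary_first(requested, nonprimary) == stored_encoded;
}
```

Comparing the raw request [None,4,2] with the stored encoding
[None,4,2] looks like a nop; encoding the request first gives
[None,2,4], which differs, so the change is accepted.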

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 00aa1933d3457c377d9483072e663442a4ff8ffd)

osd: EC Optimizations proc_master_log bug fixes

1. proc_master_log can roll forward full-writes that have
been applied to all shards but not yet completed. Add a
new function consider_adjusting_pwlc to roll-forward
pwlc. Later partial_write can be called to process the
same writes again. This can result in pwlc being rolled
backwards. Modify partial_write so it does not undo pwlc.

2. At the end of proc_master_log we want the new
authoritative view of pwlc to persist - this may be
better or worse than the stale view of pwlc held by
other shards. consider_rollback_pwlc sometimes
updated the epoch in the toversion (second value of the
range fromversion-toversion). We now always do this.
Updating toversion.epoch causes problems because this
version sometimes gets copied to last_update and
last_complete - using the wrong epoch here messes
everything up in later peering cycles. Instead we
now update fromversion.epoch. This requires changes
to apply_pwlc and an assert in Stray::react(const MInfoRec&)

3. Calling apply_pwlc at the end of proc_master_log is
too early - updating last_update and last_complete here
breaks GetMissing. We need to do this later when activating
(change to search_missing and activate)

4. proc_master_log is calling partial_write with the
wrong previous version - this causes problems after a
split when the log is sparsely populated.

5. Merging PGs is not setting up pwlc correctly, which
can cause issues in future peering cycles. The
pwlc can simply be reset; we need to update the epoch
to make sure this view of pwlc persists vs stale
pwlc from other shards.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 0b8593a0112e31705acb581ac388a4ef1df31b4b)