git-server-git.apps.pok.os.sepia.ceph.com Git

Merge PR #68733 into tentacle

* refs/pull/68733/head:
tentacle: tentacle-p2p add centos9 to stress-split
tentacle: upgrade/tentacle-p2p install pytest for rbd-python tests
tentacle: suites/upgrade add centos to centos image upgrade

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68905 into tentacle

* refs/pull/68905/head:
mds: remove duplicate context completion calls
mds: add retry request to MDSRank wait queue rather than via finisher
mds: adjust scan_stray_dir after fixing up MDSContext class
Revert "mds: move MDSContext completion handling to finish method"

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68820 into tentacle

* refs/pull/68820/head:
qa/cephfs: treat "implicit declaration of function" for blogbench workunit for newer gcc version

Merge PR #68826 into tentacle

* refs/pull/68826/head:
rgw: fix lifecycle transition of encrypted multipart objects

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>

tentacle: tentacle-p2p add centos9 to stress-split

Adds CentOS 9 Stream as a supported host OS for the tentacle-p2p
stress-split suite.

Rocky10 is not added here. This suite tests a bare-metal p2p upgrade
between Tentacle point releases. Rocky10 package support was added during
that release cycle, meaning there is no valid "FROM" package baseline to
install on Rocky10, making a bare-metal point-to-point upgrade path on Rocky10
impossible to test.

Fixes: https://tracker.ceph.com/issues/76710
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>

tentacle: upgrade/tentacle-p2p install pytest for rbd-python tests

The rbd-python workload runs test_librbd_python.sh, which invokes
'python3 -m pytest' to run the librbd Python API tests.
On a bare-metal install, python3-pytest is not pulled in by the
Ceph packages, so it is not present and the workunit fails.
Add it via extra_system_packages so the tests can run.

Fixes: https://tracker.ceph.com/issues/76710
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>

tentacle: suites/upgrade add centos to centos image upgrade

Previously, each suite had a single upgrade-sequence.yaml that targeted
only one image.
This commit splits each upgrade sequence into two variants; teuthology
picks one per run:

upgrade-sequence$/centos-stream9.yaml - targets $sha1
upgrade-sequence$/rockylinux-10.yaml - targets $sha1-rockylinux-10

This applies to:
- reef-x/parallel
- reef-x/stress-split
- squid-x/parallel
- squid-x/stress-split
- telemetry/reef-x
- telemetry/squid-x

For the stress-split suites, the upgrade logic is split into a
first-half-sequence run concurrently with thrashosds,
and a second-half-sequence run after.
Both sequences contain the hardcoded target image, so each variant
needs its own copy. They were previously inlined in 1-start.yaml.

This commit extracts them into the upgrade-sequence$/ files so
each variant can target the right image.
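
As a rough illustration of the mapping described above, the sketch below models how each variant resolves to a target image tag. The names `VARIANT_SUFFIX` and `target_tag` are hypothetical and are not part of teuthology or the suite files; teuthology itself performs the per-run selection among the files in `upgrade-sequence$/`.

```python
# Hypothetical model of the two upgrade-sequence variants and their targets.
# Variant names follow the YAML file names from the commit message.
VARIANT_SUFFIX = {
    "centos-stream9": "",                # targets $sha1
    "rockylinux-10": "-rockylinux-10",   # targets $sha1-rockylinux-10
}


def target_tag(sha1, variant):
    """Return the image tag a given variant would upgrade to."""
    return sha1 + VARIANT_SUFFIX[variant]


centos_tag = target_tag("abc123", "centos-stream9")
rocky_tag = target_tag("abc123", "rockylinux-10")
print(centos_tag, rocky_tag)
```

Because both the first-half and second-half sequences embed this tag, each variant file has to carry its own copy of both.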

Fixes: https://tracker.ceph.com/issues/76710
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>

Merge PR #68803 into tentacle

* refs/pull/68803/head:
rgw/multisite: concurrency adjustment - consider the case caller provides 1
rgw/multisite: log concurrency state transitions in adj_concurrency
rgw/multisite: fix uninitialized LatencyMonitor average and use exponentially weighted moving average
rgw/multisite: expose lock latency as perf counter for data sync

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68877 into tentacle

* refs/pull/68877/head:
extblkdev: Fix FCM plugin asserting on multivolume devices

Reviewed-by: Jaya Prakash Madaka <jayaprakash@ibm.com>

Merge PR #68617 into tentacle

* refs/pull/68617/head:
qa: fix setting rbd_sparse_read_threshold_bytes in test_migration_clone()

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68406 into tentacle

* refs/pull/68406/head:
tools/ceph-kvstore-tool: fix crash on db close.
tools/kvstore_tool: reduce BlueStore.h exposure.
tools/kvstore_tool: add missing `#ifdef WITH_BLUESTORE`
tools/kvstore_tool: make load_bluestore() `private`

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

Merge PR #68710 into tentacle

* refs/pull/68710/head:
osd: FastEC: always update pwlc epoch when activating

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>

Merge PR #68776 into tentacle

* refs/pull/68776/head:
tentacle: test/neorados: Narrow Asio includes in `leak_watch_notify`
test/neorados: Don't leak watch handle
neorados: Avoid double cleanup in watch/notify
neorados: Actually enforce notification queue limit
neorados: Do not try to decode an empty response in notify
neorados: Go through linger_cancel on `io_context` shutdown
common/async: Fix removal from service list
osdc: remove implicit LingerOp reference between watch/unwatch
osdc: linger_register() returns intrusive_ptr<LingerOp>
neorados: NotifierHandler holds intrusive_ptr<LingerOp>
neorados: Notifier holds intrusive_ptr<LingerOp>
librados: aio_unwatch() delivers ENOTCONN to AioCompletion
osdc: Objecter::linger_by_cookie() for safe cast from uint64
librados: linger callbacks hold a reference to LingerOp

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge PR #68718 into tentacle

* refs/pull/68718/head:
osd: PGLog Attach correct version to missing list when ignoring log entries

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68717 into tentacle

* refs/pull/68717/head:
osd: Twiddle should create a full sized vector for optimized EC

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68716 into tentacle

* refs/pull/68716/head:
osd: Add asserts to look for potential missing list corruption.
osd: Change rmissing map key from version_t to eversion_t

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68715 into tentacle

* refs/pull/68715/head:
osd: Fix incorrect rollback logic for partial write OI

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

Merge PR #68739 into tentacle

* refs/pull/68739/head:
tentacle: test/rgw_multi: Import boto.s3.user for UserJSONENcoder
qa/radosgw_admin_rest: replace boto2 with boto3
qa/radosgw_admin_rest: pass endpoint to rgwadmin_rest()
rgw/rest: RESTArgs::get_string() url-decodes query params
qa/radosgw_admin: replace boto2 with boto3
qa/radosgw_admin: remove requestlog
qa/radosgw_admin: remove acl test cases

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>

Merge PR #68552 into tentacle

* refs/pull/68552/head:
ceph-volume: skip /dev/ram* devices in inventory

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67293 into tentacle

* refs/pull/67293/head:
qa/cephfs: lua to respect missing kernel in yaml

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

tentacle: test/neorados: Narrow Asio includes in `leak_watch_notify`

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

tentacle: test/rgw_multi: Import boto.s3.user for UserJSONENcoder

This fixes the run-tox-qa failure in make check.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

test/neorados: Don't leak watch handle

Add missing unwatch call.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit ccc40eb69b4e8b66f7b9d32622bb22614b410166)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Avoid double cleanup in watch/notify

An error coming in after `maybe_cleanup()` is called could trigger it
again. Add a flag to prevent that.
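
A minimal sketch of the guard pattern, assuming a handler whose teardown must run at most once. `WatchHandler`, `cleaned_up`, and `on_error` are illustrative names, not the neorados implementation.

```python
import errno


class WatchHandler:
    """Hypothetical stand-in for a watch/notify handler."""

    def __init__(self):
        self.cleaned_up = False   # guard flag (assumed name)
        self.cleanup_calls = 0    # counts how often real teardown ran

    def maybe_cleanup(self):
        # Run teardown at most once, even if a late error calls us again.
        if self.cleaned_up:
            return False
        self.cleaned_up = True
        self.cleanup_calls += 1
        return True

    def on_error(self, code):
        # An error can arrive after cleanup has already happened.
        return self.maybe_cleanup()


handler = WatchHandler()
handler.maybe_cleanup()                # normal teardown
handler.on_error(-errno.ENOTCONN)      # late error: the guard makes this a no-op
print(handler.cleanup_calls)
```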

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit d4ee87e985a6f2234050fd85d31564b7776bf1e6)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Actually enforce notification queue limit

We were adding an overflow marker on every message above capacity, not
just the first, vitiating the purpose of the bound. The shame, the
shame.
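
The intended behavior can be sketched as a bounded queue that records overflow exactly once. This is an illustrative model, not the neorados code: `BoundedNotifyQueue` and `OVERFLOW` are hypothetical names, and dropping further messages until the marker is consumed is an assumption of this sketch.

```python
from collections import deque

OVERFLOW = object()  # sentinel standing in for the overflow marker


class BoundedNotifyQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.overflowed = False

    def push(self, msg):
        if self.overflowed:
            return False  # already marked: drop without adding another marker
        if len(self.items) >= self.capacity:
            # First message above capacity: add a single marker and drop it.
            self.items.append(OVERFLOW)
            self.overflowed = True
            return False
        self.items.append(msg)
        return True

    def pop(self):
        item = self.items.popleft()
        if item is OVERFLOW:
            self.overflowed = False  # consumer has seen the gap
        return item


q = BoundedNotifyQueue(capacity=2)
for i in range(5):
    q.push(i)
print(len(q.items))
```

With one marker per overflow episode, the queue never holds more than `capacity + 1` entries.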

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 4cc2dbe316be9d8ac63ee00a6aff5455ecb4cd86)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Do not try to decode an empty response in notify

The response can be empty on some errors. Attempting to decode an
empty one loses the error value on valid errors. Also swallow any
decode errors.
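
A small sketch of the handling described above, under stated assumptions: `finish_notify` is a hypothetical function, and JSON stands in for Ceph's own binary encoding purely so the example is runnable.

```python
import json


def finish_notify(error_code, payload):
    """Return (error_code, decoded_reply_or_None)."""
    if not payload:
        # Empty response: nothing to decode, so keep the original error value.
        return error_code, None
    try:
        return error_code, json.loads(payload)
    except ValueError:
        # Swallow decode errors; the caller still sees error_code unchanged.
        return error_code, None


empty = finish_notify(-110, b"")
good = finish_notify(0, b'{"acks": 2}')
bad = finish_notify(-5, b"not json")
print(empty, good, bad)
```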

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 89134791c16a2332457cfea85321a07e9ff7ca87)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Go through linger_cancel on `io_context` shutdown

Rather than just dropping the reference, clean up the linger operation
properly within Objecter. Also, clear out handlers before
relinquishing reference to avoid use-after-free.

Fixes: https://tracker.ceph.com/issues/75164
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit f5affe06676bec2f32f856bce85b5d78932d807f)

Conflicts:
qa/workunits/rados/test.sh
- Just drop the `watch_leak` test on teuthology since it's of
marginal utility and not worth backporting all of
<https://github.com/ceph/ceph/pull/64219>.

Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

common/async: Fix removal from service list

Thanks to Seena Fallah <seenafallah@gmail.com> for this fix, part of a
larger commit.

Fixes: https://tracker.ceph.com/issues/75164
Co-authored-by: Seena Fallah <seenafallah@gmail.com>
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit b6a54e67cbcf11b23dcd5a5cd59795aeb7ff948e)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

osdc: remove implicit LingerOp reference between watch/unwatch

before this change set, linger_register() returned a raw LingerOp
pointer with an implicit reference for the caller. for librados,
this implicit reference is only dropped when the corresponding
unwatch() calls linger_cancel()

after commit 94f42b648feea77bd09dc3fdb48e6db2b48c7717 introduced
linger_by_cookie(), unwatch() no longer has a safe way to drop this
implicit reference. to prevent LingerOp leaks when unwatch() returns
ENOTCONN, we can't hold this implicit reference count until unwatch()

linger_register() now returns an explicit reference to the caller as
intrusive_ptr<LingerOp>. this helps to guarantee that this reference
count gets dropped before the completion of watch()/aio_watch()

because linger_register() no longer acquires an implicit reference for
the caller, linger_cancel() no longer drops it with info->put()

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 1b0f873162d4bc357b230a78452531fdf39a6b25)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

osdc: linger_register() returns intrusive_ptr<LingerOp>

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 4064da7ce716b42ce4924787024fd7ce01182762)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: NotifierHandler holds intrusive_ptr<LingerOp>

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 72eb48e041820095d20a035627999ec3db781180)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

neorados: Notifier holds intrusive_ptr<LingerOp>

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 9456aa73689924a0ad85005e824444b48d3c7d99)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

librados: aio_unwatch() delivers ENOTCONN to AioCompletion

94f42b648feea77bd09dc3fdb48e6db2b48c7717 added a new error condition to
IoCtx::aio_unwatch() that callers aren't prepared to handle. instead of
returning that error directly, report it asynchronously to the
AioCompletion
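
The change in error delivery can be sketched as follows. This is a simplified model with hypothetical Python names (`AioCompletion.complete`, `live_watches`), and it completes synchronously for brevity, whereas the real call reports the result asynchronously.

```python
import errno


class AioCompletion:
    """Hypothetical completion object that records its result."""

    def __init__(self):
        self.result = None

    def complete(self, rc):
        self.result = rc


def aio_unwatch(live_watches, cookie, completion):
    # The call itself always returns 0; the outcome, including a stale
    # cookie, is reported through the completion instead.
    if cookie not in live_watches:
        completion.complete(-errno.ENOTCONN)
        return 0
    live_watches.discard(cookie)
    completion.complete(0)
    return 0


stale = AioCompletion()
rc = aio_unwatch({1}, 2, stale)
print(rc, stale.result)
```

Callers that only check the return value keep working, and callers that wait on the completion see the new error.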

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit c0c146d37ae793fd1e1ab2a5118eac40149f2c6b)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

osdc: Objecter::linger_by_cookie() for safe cast from uint64

a `linger_ops_set` was added for `Objecter::handle_watch_notify()`
as a safety check before casting `uint64_t cookie` to `LingerOp*`
and dereferencing it

neorados also made use of this set through `Objecter::is_valid_watch()`
checks. however, this approach was still susceptible to use-after-free,
because the callers didn't preserve a LingerOp reference between this
check and its use - and the Objecter lock is dropped in between. in
addition, `neorados::RADOS::unwatch_()` was missing its check for
`is_valid_watch()`

librados did not make use of this `is_valid_watch()` at all, so was
casting cookies directly to LingerOp* and dereferencing. this results
in use-after-free for any cookies invalidated by `linger_cancel()` -
for example when called by `CB_DoWatchError`

replace `is_valid_watch()` with a `linger_by_cookie()` function that
* performs the validity check with `linger_ops_set`,
* safely reinterpret_casts the cookie to LingerOp*, and
* returns a reference to the caller via intrusive_ptr<LingerOp>

`librados::IoCtxImpl::watch_check()`, `unwatch()` and `aio_unwatch()`
now call `linger_by_cookie()`, so have to handle the null case by
returning `-ENOTCONN` (this matches neorados' existing behavior)
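
The lookup pattern can be approximated in Python as shown below. This is an analogy rather than the Objecter code: Python has no pointer casts, so a dict keyed by cookie stands in for `linger_ops_set` plus the cast, and returning the object stands in for handing back an `intrusive_ptr<LingerOp>`. All class and method bodies here are assumptions.

```python
import errno


class LingerOp:
    def __init__(self, name):
        self.name = name


class Objecter:
    def __init__(self):
        self._linger_ops = {}   # cookie -> live LingerOp
        self._next_cookie = 1

    def linger_register(self, name):
        op = LingerOp(name)
        cookie = self._next_cookie
        self._next_cookie += 1
        self._linger_ops[cookie] = op
        return cookie, op

    def linger_cancel(self, cookie):
        self._linger_ops.pop(cookie, None)

    def linger_by_cookie(self, cookie):
        # Validity check and lookup in one step; None means the cookie is stale.
        return self._linger_ops.get(cookie)


def watch_check(objecter, cookie):
    op = objecter.linger_by_cookie(cookie)
    if op is None:
        return -errno.ENOTCONN
    # 'op' is a strong reference here, so it stays alive while we use it.
    return 0


objecter = Objecter()
cookie, _ = objecter.linger_register("watch-a")
before = watch_check(objecter, cookie)
objecter.linger_cancel(cookie)
after = watch_check(objecter, cookie)
print(before, after)
```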

Fixes: https://tracker.ceph.com/issues/72771
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 94f42b648feea77bd09dc3fdb48e6db2b48c7717)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

librados: linger callbacks hold a reference to LingerOp

preserve a reference to LingerOp in case their invocation races with
another linger_cancel()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 2455a713d44babf979b55832dc6f75363357d270)
Fixes: https://tracker.ceph.com/issues/76434
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

mds: remove duplicate context completion calls

After reverting commit df404e0, the context completion in
MDSRank handling for `dump stray` command isn't required.

Also fixup the incorrect usage of std::unique_ptr with context
completion class (Context). Contexts delete themselves upon
completion.

Introduced-by: 801951e8c0d62dbbe724ce506fb44bc809bb7d4f
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit a77af14dd75bd4bda8b245ec7da0b85abccebc45)

mds: add retry request to MDSRank wait queue rather than via finisher

C_MDS_RetryRequest inherits from MDSInternalContext which does not
acquire mds_lock by itself. Adding to MDSRank wait queue will process
this via the progress thread which completes the context with mds_lock
acquired.
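
A toy model of why the wait queue is the safer path, assuming a context that does not take the lock itself. `Rank`, `queue_waiter`, and `progress_once` are hypothetical names chosen for this sketch and do not mirror the MDSRank API.

```python
import threading
from collections import deque


class Rank:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for mds_lock
        self.waiting = deque()
        self.log = []

    def queue_waiter(self, ctx):
        self.waiting.append(ctx)

    def progress_once(self):
        # The progress path holds the lock while completing queued contexts.
        with self.lock:
            while self.waiting:
                ctx = self.waiting.popleft()
                ctx(self)


def retry_request(rank):
    # This context does not acquire the lock itself; it relies on the caller.
    rank.log.append(rank.lock.locked())


rank = Rank()
rank.queue_waiter(retry_request)
rank.progress_once()
print(rank.log)
```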

Fixes: http://tracker.ceph.com/issues/76031
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 02735a362c685bfdb975ab86c9b042a8792f75b0)

mds: adjust scan_stray_dir after fixing up MDSContext class

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 66abd9a13f670d6ea87ad23daef497d7a26e69e9)

Revert "mds: move MDSContext completion handling to finish method"

This reverts commit df404e03915765ef5c854a48556fd716161f3add.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 71c7be41b524dd0446f0a98fd8be09b950e6428b)

extblkdev: Fix FCM plugin asserting on multivolume devices

The issue is that FCM does not work properly with BlueStore "block"
that spans over multiple devices, even FCM ones.
The current code detects it and issues a health warning.

But due to a coding mistake, the check for it applies to all deployments
and is an assert-grade error.

The original problem was introduced with:
https://github.com/ceph/ceph/pull/68024
(as backport of https://github.com/ceph/ceph/pull/66318)

It was supposed to be solved with:
https://github.com/ceph/ceph/pull/68663
(as backport of https://github.com/ceph/ceph/pull/68416)

But it was not, since the actual offending lines were removed
by this temporary solution: https://github.com/ceph/ceph/pull/68391

This commit now properly removes the lines that were not removed.
Fixes: https://tracker.ceph.com/issues/76581
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

Merge PR #68370 into tentacle

* refs/pull/68370/head:
mgr/dashboard/controllers/nvmeof: remove success_message_template
mgr/dashboard: Add port and secure-listeners to subsystem add NVMeoF CLI command
mgr/dashboard: Add 'network_mask' to nvmeof cli

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #68536 into tentacle

* refs/pull/68536/head:
mgr/dashboard: remove sync_from entry when sync_from_all is true

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #68476 into tentacle

* refs/pull/68476/head:
mgr/dashboard: Update permissions for pool-manager role
mgr/dashboard : Select replicated rule by default in pools form
mgr/dashboard : Fix application names in pools form
mgr/dashboard : add stretch cluster validation for pools form

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>

Merge PR #68523 into tentacle

* refs/pull/68523/head:
mgr/dashboard : Add bottom padding for dashboard screens

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #68554 into tentacle

* refs/pull/68554/head:
mgr/dashboard : Fix RGW restart/stop issue

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #68818 into tentacle

* refs/pull/68818/head:
qa: Remove cephadm e2e tests from teuthology

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

Merge PR #68382 into tentacle

* refs/pull/68382/head:
mgr/dashboard: Fix tags in subvolume list and subvolume groups list

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #68704 into tentacle

* refs/pull/68704/head:
test/rgw/notification: fix the cloudevents package version

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

qa/cephfs: lua to respect missing kernel in yaml

When teuthology-suite is called with the '-k none' option, which is valid,
no kernel record is created in the job config.

However, in some test cases the lua premerge dies with an exception:
KeyError: 'kernel'
and when branch is not set for '-k none' and kernel client is
overridden:
KeyError: 'branch'
so teuthology-suite quits unexpectedly without scheduling any jobs.
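
The premerge script itself is Lua, so the following is only a Python sketch of the same defensive lookup. The function name `kernel_branch` and the exact config layout are assumptions; only the `kernel` and `branch` keys come from the errors quoted above.

```python
def kernel_branch(job_config):
    """Return the kernel branch, or None when no kernel record or branch is set."""
    kernel = job_config.get("kernel") or {}   # '-k none' leaves no kernel record
    return kernel.get("branch")               # branch may also be unset


no_kernel = kernel_branch({})
no_branch = kernel_branch({"kernel": {}})
testing = kernel_branch({"kernel": {"branch": "testing"}})
print(no_kernel, no_branch, testing)
```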

Fixes: https://tracker.ceph.com/issues/73676
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit b410701b7c725605a641910873b488b37fbbca59)

mgr/dashboard/controllers/nvmeof: remove success_message_template

Remove 'success_message_template' from NvmeofCLICommand decorator of
NVMeoFSubsystem.add_network() and NVMeoFSubsystem.del_network().
This is because 'success_message_template' feature introduction PR
hasn't been backported to tentacle.
This commit can be reverted later in tentacle branch.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>

mgr/dashboard: Add port and secure-listeners to subsystem add NVMeoF CLI command

Fixes: https://tracker.ceph.com/issues/75998
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
(cherry picked from commit 624adc09431dc2fdfa617940161f188c0831bf97)

Conflicts:
src/pybind/mgr/dashboard/controllers/nvmeof.py
Resolve conflict to use "traddr" instead of "server_address"
in NVMeoFSubsystem().create() parameters.
Main branch renamed the param ("traddr") to "server_address".
Tentacle

mgr/dashboard: Add 'network_mask' to nvmeof cli

This commit adds the following to nvmeof cli:
0. Add new param `--network-mask` to 'subsystem add' cmd
   It's a list parameter so we can pass multiple netmasks by
   `subsystem add --network-mask <subnet1> --network-mask <subnet2>`
1. Add new cli `subsystem add_network --network-mask <subnet>`
2. Add new cli `subsystem del_network --network-mask <subnet>`
3. Add column 'network_mask' to `subsystem list` output
4. Add column 'manual' to `listener list` output
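
To illustrate how a repeatable list parameter behaves, here is a sketch using Python's standard `argparse`. This is not how the dashboard's NVMeoF CLI is implemented; the parser name and the subnet values are made up for the example.

```python
import argparse

# Hypothetical stand-alone parser; only the --network-mask flag name
# comes from the commit message.
parser = argparse.ArgumentParser(prog="subsystem-add")
parser.add_argument(
    "--network-mask",
    action="append",      # each occurrence appends one value to the list
    default=[],
    dest="network_mask",
)

args = parser.parse_args(
    ["--network-mask", "10.0.0.0/24", "--network-mask", "192.168.1.0/24"]
)
print(args.network_mask)
```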

Fixes: https://tracker.ceph.com/issues/75348
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
(cherry picked from commit 366702057e65857ca86702b278cd2fd836484a51)

Conflicts:
src/pybind/mgr/dashboard/controllers/nvmeof.py
        NVMeoFSubsystem controller uses param name "traddr"
        in tentacle branch and it's renamed to "server_address"
        in main branch. Since it's a breaking change, it would be
        changed to "server_address" in next major version.
        So in this backport commit, we use "traddr" in create(),
        add_network(), and del_network().

Merge PR #68846 into tentacle

* refs/pull/68846/head:
tentacle: qa/rgw/multisite: remove duplicate test_suspended_delete_marker_incremental_sync

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>

tentacle: qa/rgw/multisite: remove duplicate test_suspended_delete_marker_incremental_sync

PR #67318 (boto3 migration) cherry-picked onto tentacle conflicted with
PR #66168 which had added test_suspended_delete_marker_incremental_sync
using the old boto API. The conflict resolution added the boto3-rewritten
version of the function but left the original old-API version in place,
resulting in duplicate definitions with the same name.

Remove the stale old-API duplicate; keep the boto3 version added by PR #67318.

Fixes: https://tracker.ceph.com/issues/76505
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>

Merge PR #68415 into tentacle

* refs/pull/68415/head:
nvmeof: Change the NVMEOF image version to 1.6

Reviewed-by: Aviv.Caro@ibm.com

Merge PR #68505 into tentacle

* refs/pull/68505/head:
rgw: reenable 'bucket stats' on indexless buckets
rgw: 'bucket stats' omits usage for buckets on other zonegroups
rgw/rados: pass SiteConfig into bucket_stats()
rgw: bucket_stats() uses local variable 'index'

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67927 into tentacle

* refs/pull/67927/head:
rgw: handle plain-text object tags in RGWObjTags::decode()

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67440 into tentacle

* refs/pull/67440/head:
RGW: remove custom copy constructor for RGWObjectCtx and enforce no copy/move

Merge PR #66769 into tentacle

* refs/pull/66769/head:
test/rgw/logging: run teuthology on erasure coded pool
rgw/bucket-logging: support for EC pools
rgw/logging: do not create empty temporary objects
rgw/logging: deleting the object holding the temp object name on cleanup
rgw/logging: make sure source bucket is in the target's list
rgw/logging: removed unused APIs from header
rgw/logging: fix race condition when name update returns ECANCELED
rgw/logging: add error message when log_record fails
rgw/logging: rollover objects when conf changes
rgw/logging: allow committing empty objects
rgw/logging: verify http method exists
rgw/logging: fix/remove/add bucket logging op names
rgw/logging: refactor canonical_name()
rgw/logging: fix canonical names
rgw: RGWPostBucketLoggingOp uses yield context

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

Merge PR #66168 into tentacle

* refs/pull/66168/head:
rgw/multisite: check the local bucket's versioning status when replicating deletion from remote

Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>

Merge PR #66300 into tentacle

* refs/pull/66300/head:
rgw/zone: remove duplicated startup logic in RGWSI_Zone

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68594 into tentacle

* refs/pull/68594/head:
RGW: Change prerequest hook to run after authorization process

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

Merge PR #67923 into tentacle

* refs/pull/67923/head:
rgw/test [tentacle]: add missing ceph and call_ceph shell functions
RGW/test_multi/RGWBucketFullSyncCR: test bucket full sync while source bucket is deleted in the middle
RGW/multisite/RGWListRemoteBucketCR: clear reused bucket_list_result to avoid stale listings
RGW/multisite: bucket_list_result object provides a method to reset its entries
RGW/multisite: add some more debug logs to sync codepath
RGW/test_multi: remove unused import
RGW/test_multi: allow Cluster object to run ceph admin commands
RGW: add delay injection options for integration testing
RGW: make SSTR macro safe against variable name collisions

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #65557 into tentacle

* refs/pull/65557/head:
rgw: discard olh_ attributes when copying object from a versioning-suspended bucket to a versioning-disabled bucket

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>

Merge PR #68709 into tentacle

* refs/pull/68709/head:
osd: Deleting PG should discard pwlc

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>

Merge PR #68374 into tentacle

* refs/pull/68374/head:
qa: enforce centos9 for test
qa: rename distro
qa/suites/fs/bugs: use centos9 for squid upgrade test
qa: remove unused variables
qa: use centos9 for fs suites using k-testing
qa: update fs suite to rocky10
qa: skip dashboard install due to dependency noise
qa: only setup nat rules during bridge creation
qa: correct wording of comment
qa: use nft instead of iptables
qa: use py3 builtin ipaddress module

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

Merge PR #68569 into tentacle

* refs/pull/68569/head:
tentacle: only add package tasks if rocky is the final distro
qa/distros: add centos 9 stream back to supported distros
qa/distros: re-install nvme-cli package in rocky tests
qa: allowlist bpf podman denials on Rocky 10
qa/distros: bump rocky to 10.1
qa/distros: add rocky_10 as supported container host
qa/distros: bump rpm_latest.yaml to rocky_10.yaml
qa/distros: rename centos_latest.yaml to rpm_latest.yaml
qa/distros: add rocky_9 and rocky_10

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #68660 into tentacle

* refs/pull/68660/head:
qa/suites/upgrade: allow upgrades to Rocky 10–based containers
qa/suites/upgrade: exclude rocky when Reef is involved
qa/suites/upgrade: update upgrade paths
qa/suites/upgrade: exclude rocky when Squid is involved

Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67465 into tentacle

* refs/pull/67465/head:
mgr: isolated CherryPy to prevent global state sharing
Fix the prometheus module crash

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge PR #67820 into tentacle

* refs/pull/67820/head:
qa: fixing cephadm mgmt-gateway test to remove openssl dependency

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67895 into tentacle

* refs/pull/67895/head:
container/build.sh: add 'rocky-10' suffix if necessary

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: David Galloway <dgallowa@redhat.com>

Merge PR #67960 into tentacle

* refs/pull/67960/head:
container/Containerfile: fix crimson package naming
Containerfile: Support rocky/el10 in repo URLs
container: add CUSTOM_CEPH_REPO_URL build argument
container/build.sh: Use dedicated debug tags
container/build.sh: cleanup crimson flavors
container/build.sh: add 'rocky-10' suffix if necessary
container: pass CUSTOM_CEPH_REPO_URL thru container build script

Reviewed-by: Dan Mick <dmick@redhat.com>

Merge PR #67507 into tentacle

* refs/pull/67507/head:
qa/workunits/rados/test_envlibrados_for_rocksdb.sh: Add Rocky support
qa/workunits/ceph-helpers-root: Add Rocky support for install packages

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

Merge PR #67512 into tentacle

* refs/pull/67512/head:
neorados: specify alignments for aligned_storage

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>

Merge PR #67515 into tentacle

* refs/pull/67515/head:
pybind/orchestrator/cli: fix OrchestratorError retval sign
orchestrator/test/test_orchestrator: fix return code to negative
mgr/mgr_module: fix tox test missing a type annotation
mgr/selftest: mypy error fix missing a type annotation
mgr/dashboard: use __name__ for module-specific logging
selftest: Add logging self tests
pybind/mgr/mgr_module: isolate logging per mgr module
mgr/Gil.cc: simplify Gil(), ~Gil()
mgr/Gil.cc: do not use PyGILState_Check()
mgr: add mgr_subinterpreter_modules config
python-common/.../service_spec: implement ServiceSpec.__getnewargs__ to allow unpickle to work correctly
mgr: serialize python objects sent between subinterpreters via remote

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67518 into tentacle

* refs/pull/67518/head:
qa/tasks: update egrep to 'grep -E'

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

Merge PR #67519 into tentacle

* refs/pull/67519/head:
qa/suites: remove centos restriction from valgrind yaml

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>

qa: enforce centos9 for test

Avoids problem where rocky10 packages do not exist for squid.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 815d1d73a7127f1fd460b64d51798f9e8eaf600b)

qa: rename distro

The kernel mount overrides for the distro have no effect if they are
applied before `supported-random-distro`.

Fixes:

    2026-04-13T19:06:13.603 INFO:teuthology.task.pexec:sudo dnf remove nvme-cli -y
    2026-04-13T19:06:13.603 INFO:teuthology.task.pexec:sudo dnf install nvmetcli nvme-cli -y
    2026-04-13T19:06:13.626 INFO:teuthology.task.pexec:Running commands on host ubuntu@trial005.front.sepia.ceph.com
    2026-04-13T19:06:13.627 INFO:teuthology.task.pexec:sudo dnf remove nvme-cli -y
    2026-04-13T19:06:13.627 INFO:teuthology.task.pexec:sudo dnf install nvmetcli nvme-cli -y
    2026-04-13T19:06:13.652 INFO:teuthology.orchestra.run.trial148.stderr:sudo: dnf: command not found
    2026-04-13T19:06:13.653 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2026-04-13T19:06:13.654 ERROR:teuthology.run_tasks:Saw exception from tasks.
    Traceback (most recent call last):
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/run_tasks.py", line 105, in run_tasks
        manager = run_one_task(taskname, ctx=ctx, config=config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/run_tasks.py", line 83, in run_one_task
        return task(**kwargs)
               ^^^^^^^^^^^^^^
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/task/pexec.py", line 149, in task
        with parallel() as p:
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 84, in __exit__
        for result in self:
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 98, in __next__
        resurrect_traceback(result)
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 30, in resurrect_traceback
        raise exc.exc_info[1]
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/parallel.py", line 23, in capture_traceback
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/task/pexec.py", line 62, in _exec_host
        tor.wait([r])
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 485, in wait
        proc.wait()
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 161, in wait
        self._raise_for_status()
      File "/home/teuthworker/src/git.ceph.com_teuthology_426ec63bc4a39bba882efb593125294667afc593/teuthology/orchestra/run.py", line 181, in _raise_for_status
        raise CommandFailedError(
    teuthology.exceptions.CommandFailedError: Command failed on trial148 with status 1: 'TESTDIR=/home/ubuntu/cephtest bash -s'

This happened because these dnf commands were pulled from rocky10.yaml via the kclient overrides, but ubuntu_latest was used for the random distro.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 9f2cac88d9638c5b49d5c766483e9194417e6adb)

Conflicts:
qa/suites/fs/32bits/0-distro
qa/suites/fs/full/0-distro
qa/suites/fs/functional/0-distro
qa/suites/fs/libcephfs/0-distro
qa/suites/fs/mirror-ha/0-distro
qa/suites/fs/mirror/0-distro
qa/suites/fs/multifs/0-distro
qa/suites/fs/nfs/0-distro
qa/suites/fs/permission/0-distro
qa/suites/fs/snaps/0-distro
qa/suites/fs/thrash/multifs/0-distro
qa/suites/fs/thrash/workloads/0-distro
qa/suites/fs/top/0-distro
qa/suites/fs/traceless/0-distro
qa/suites/fs/volumes/0-distro

qa/suites/fs/bugs: use centos9 for squid upgrade test

To avoid a missing-package error on rocky10.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 1e31a6deaf47cd6ac692012c2acbb1ac0139490a)

Conflicts:
qa/suites/upgrade/reef-x/centos_9.stream.yaml: ???
qa/suites/fs/bugs/multifs_mdsauthcaps/centos_9.stream.yaml: use existing distro

qa: remove unused variables

To make tox-qa happy.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit d8cc2ea40c428ef1734b27f3006840797539031c)

qa: use centos9 for fs suites using k-testing

A better approach would be to include centos9 OR rocky10 for
distribution choice. Then we can just filter out rocky10 when we're
testing the `testing` kernel but keep rocky10 coverage for other
testing.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 5b2f5ee919476c30099b03abb9ae8871ba66a653)

qa: update fs suite to rocky10

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 6bf6fcb2b193bdd0c9c2077d3fd1c7296853eed3)

Conflicts:
qa/suites/fs/workload/0-centos_9.stream.yaml: link change

qa: skip dashboard install due to dependency noise

    2025-11-18T19:46:46.226 INFO:teuthology.orchestra.run.smithi008.stdout:/usr/bin/ceph: stderr Error ENOTSUP: Module 'alerts' is not enabled/loaded (required by command 'dashboard set-ssl-certificate'): use `ceph mgr module enable alerts` to enable it

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit b81cca651787f74c12a780102be79ded72fc7da0)

Conflicts:
qa/suites/fs/upgrade/mds_upgrade_sequence/tasks/0-from/reef/reef.yaml: keep reef

Conflicts:
  qa/suites/fs/upgrade/mds_upgrade_sequence/tasks/0-from/reef/v18.2.8.yaml:
    taking image with version and not reef branch name

qa: only setup nat rules during bridge creation

Currently the code recreates these NAT rules for every mount. This only
needs to be done once by the first mount.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 6fa39caea771a9499b83e50d6f6429e6c29983bb)

qa: correct wording of comment

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 83ecfdfe8b9dcb7048f21fda75d808d4189cacd9)

qa: use nft instead of iptables

rocky.10 does not support iptables with MASQUERADE targets. (Or maybe it
does with more prodding but it's easier to just switch to nft.)

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 60b31e50217ea05a818ff16eb83e02cae2548bf3)

qa: use py3 builtin ipaddress module

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 5c511835c3c1750cf9b3404ea262b97359a91d17)

tentacle: only add package tasks if rocky is the final distro

The `fs` suite sometimes overrides the distro for testing the kclient.
The dnf commands fail on Ubuntu.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

mgr/dashboard: remove sync_from entry when sync_from_all is true

Fixes: https://tracker.ceph.com/issues/76163
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 513686eaf76d1cf2f70c8e828003dd242f39c729)

Conflicts:
src/pybind/mgr/dashboard/services/rgw_client.py

pybind/orchestrator/cli: fix OrchestratorError retval sign

OrchestratorError stores errno as abs(), so e.errno is always positive.
Returning retval=e.errno (+22) caused the ceph CLI to exit 0 since it
only propagates the exit code when ret < 0.

Fix by returning retval=-e.errno.
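
A minimal sketch of the sign convention described above. The
OrchestratorError class and handle_command function below are simplified
stand-ins written for illustration; they are not the actual code in
pybind/mgr/orchestrator.

```python
import errno


class OrchestratorError(Exception):
    """Simplified stand-in: stores errno as a positive value via abs()."""

    def __init__(self, msg: str, errno_: int = -errno.EINVAL) -> None:
        super().__init__(msg)
        self.errno = abs(errno_)


def handle_command(fail: bool) -> tuple:
    """Return (retval, stdout, stderr), as a CLI command handler would."""
    try:
        if fail:
            raise OrchestratorError("invalid argument")
        return 0, "ok", ""
    except OrchestratorError as e:
        # e.errno is always positive, so negate it to signal failure
        # to a caller that only treats ret < 0 as an error.
        return -e.errno, "", str(e)


retval, out, err = handle_command(fail=True)
```

With this convention the failing call yields a negative retval, while a
successful call still returns 0.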

Fixes: https://tracker.ceph.com/issues/75282
Signed-off-by: Nitzan Mordhai <nmordech@redhat.com>
(cherry picked from commit 18546fa92a04f81ba0d85b57554e0291285fe02f)

orchestrator/test/test_orchestrator: fix return code to negative

After changes of PR #67652

Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit dede88077033917022b86fc6170a64a02530f728)

mgr/mgr_module: fix tox test missing a type annotation

After changes of PR #67327

Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit e2826197531cb0e4c48b79cafaa11098b16456f6)

mgr/selftest: fix mypy error from a missing type annotation

After changes of PR #67327

Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 3bfe66a2362d2ea5c6290e83d3e1526c4418c09f)

mgr/dashboard: use __name__ for module-specific logging

Previously, using a hard-coded logger name like 'rgw_client' created
a top-level logger that bypassed the 'mgr.dashboard' hierarchy.
By switching to __name__, we ensure the logger identity follows the
package structure (e.g., 'mgr.dashboard.services.rgw_client').

Since propagate=True is enabled, this allows log records to flow
upward through the 'mgr' parent loggers, ensuring they are correctly
captured, formatted, and attributed to the dashboard module rather than
falling back to the root logger.
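
The effect of the logger name can be sketched with the standard logging
module alone. The ListHandler class and the handler attached to 'mgr' are
assumptions made for illustration; the real mgr logging setup is not shown
in this commit message.

```python
import logging


class ListHandler(logging.Handler):
    """Collects the names of the loggers whose records reach it."""

    def __init__(self) -> None:
        super().__init__()
        self.names = []

    def emit(self, record: logging.LogRecord) -> None:
        self.names.append(record.name)


# Hypothetical setup: a handler attached to the 'mgr' parent logger.
mgr_handler = ListHandler()
mgr_logger = logging.getLogger("mgr")
mgr_logger.addHandler(mgr_handler)
mgr_logger.setLevel(logging.INFO)

# Hard-coded top-level name: not a descendant of 'mgr', so the record
# never reaches mgr_handler.
logging.getLogger("rgw_client").warning("bypasses the mgr hierarchy")

# Dotted name, as __name__ would produce inside the package: the record
# propagates upward through 'mgr.dashboard' to 'mgr'.
logging.getLogger("mgr.dashboard.services.rgw_client").warning("reaches mgr")
```

Only the record from the dotted logger name is captured by the handler on
'mgr'.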

Fixes: https://tracker.ceph.com/issues/74848
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 8c0f0d568f55b655f2a2d330ff79279196d972f1)

selftest: Add logging self tests

Fixes: https://tracker.ceph.com/issues/74848
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 9369434ba5a60c30f12a98bd6e5508edb23df24f)

pybind/mgr/mgr_module: isolate logging per mgr module

After PR #66244, all mgr modules run inside the same Python interpreter.
That means they also share the same logging subsystem.
Previously, each module attached its handlers to the root logger. In practice,
whichever module initialized logging last effectively “owned” the root logger,
and log messages from other modules could end up attributed incorrectly.

This change scopes logging per module. Each module now registers its handlers
on a dedicated logger named after the module itself, with propagate=False to avoid
leaking messages into the root logger.

Now, the getLogger() default (no args) returns the module's named logger
rather than the root logger. This ensures self.log routes correctly.
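
A rough sketch of per-module scoping with propagate=False. The
setup_module_logger helper and ListHandler class are hypothetical names
used only for this example; they do not reflect the actual mgr_module
implementation.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the routing can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def setup_module_logger(module_name: str) -> tuple:
    """Hypothetical helper: attach a handler to a logger named after the module."""
    handler = ListHandler()
    logger = logging.getLogger(module_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger, handler


# A handler on the root logger, to show that nothing leaks into it.
root_handler = ListHandler()
logging.getLogger().addHandler(root_handler)

dash_log, dash_handler = setup_module_logger("dashboard")
orch_log, orch_handler = setup_module_logger("orchestrator")

dash_log.info("from dashboard")
orch_log.info("from orchestrator")
```

Each module's handler sees only its own records, and the root handler sees
none, regardless of which module initialized logging last.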

Fixes: https://tracker.ceph.com/issues/74848
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit a89e1760566138fbfc8159d5ab3c01a61f8ae415)

mgr/Gil.cc: simplify Gil(), ~Gil()

Instead of restoring the passed ts and then swapping to
a fresh one, restore the fresh one in the first place.

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit fb26bcd9865b336a043453c9c50aa935f68fdf90)

mgr/Gil.cc: do not use PyGILState_Check()

See comment for explanation.

Fixes: https://tracker.ceph.com/issues/74220
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 19a9981e8f64c7808b0f0d29d1397e97300174ed)

mgr: add mgr_subinterpreter_modules config

This commit adds a mgr_subinterpreter_modules config to cause specified
modules (or all if * is specified) to be loaded in individual
subinterpreters.

This changes the default behavior of ceph-mgr from running each module
in a distinct subinterpreter to running them all in the same main
interpreter.  We can reintroduce subinterpreter support over time by
adding modules to the list as we test them.
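
The selection rule can be illustrated with a short Python sketch. The
actual logic lives in the C++ mgr code and is not shown here; the
uses_subinterpreter function is hypothetical, and the comma- or
whitespace-separated list format is an assumption.

```python
def uses_subinterpreter(module: str, config_value: str) -> bool:
    """Decide whether a module should get its own subinterpreter.

    Assumption: config_value is a comma- or whitespace-separated list of
    module names, where '*' selects every module.
    """
    names = set(config_value.replace(",", " ").split())
    return "*" in names or module in names


# An empty value keeps every module in the main interpreter.
default_choice = uses_subinterpreter("dashboard", "")
# '*' opts every module into its own subinterpreter.
wildcard_choice = uses_subinterpreter("dashboard", "*")
```

Under this reading, an empty list corresponds to the new default of running
all modules in the main interpreter.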

Fixes: https://tracker.ceph.com/issues/73857
Fixes: https://tracker.ceph.com/issues/73859
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 239b0dc8a9c42449ee1faa1bf78bdcc380345ae2)

Conflicts:
  - src/mgr/PyModule.cc
     #include "common/JSONFormatter.h" - removed (commit 3ab70dd is missing in tentacle)
     dtor - commit 3366ef5 is still missing on tentacle, causing conflicts;
      taking tentacle changes with use_main_interpreter and end the interpreter pMyThreadState.ts