qa/workunits/rbd: fix unbound variable in status()

It was missed in commit 5fe64fa806f3 ("qa: rbd_mirror.sh: change
parameters to cluster rather than daemon name").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1a280b9a320d51bdc4cb80be9bdd6ae265151132)

qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 82717e43a08a1262987f5e271fd72d4433c4fb3b)

qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).
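The worst-case figure can be checked with a quick calculation. This is a
minimal sketch. It assumes the loop budget of 7200 seconds is counted in
10-second steps, regardless of how long "rbd ls" actually takes, and that
the command timeout is 5 minutes. The function name is hypothetical.

```python
# Worst-case duration of the "wait for all images" loop when every
# "rbd ls" call hits its timeout. Assumes the 7200 s budget is counted
# in 10-second steps, independent of the time "rbd ls" takes.
LOOP_BUDGET_S = 7200      # '[' $retrying_seconds -le 7200 ']'
SLEEP_S = 10              # "sleep 10" per iteration
RBD_TIMEOUT_S = 5 * 60    # assumed 5-minute command timeout

def worst_case_hours(budget_s: int = LOOP_BUDGET_S,
                     sleep_s: int = SLEEP_S,
                     timeout_s: int = RBD_TIMEOUT_S) -> tuple:
    """Return (iterations, wall-clock hours) if every iteration times out."""
    iterations = budget_s // sleep_s
    total_s = iterations * (timeout_s + sleep_s)
    return iterations, total_s / 3600

iterations, hours = worst_case_hours()
print(f"{iterations} iterations, ~{hours:.0f} hours")  # 720 iterations, ~62 hours
```

That comes to roughly 62 hours, in line with the ~60-hour estimate.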

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2)

Merge PR #66829 into squid

* refs/pull/66829/head:
monitoring: fix CephPgImbalance alert rule expression

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #66897 into squid

* refs/pull/66897/head:
common: drop stack singleton object of temp messenger for foreground ceph daemons

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Merge PR #67066 into squid

* refs/pull/67066/head:
qa: Disable OSD benchmark from running for tests.

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Merge PR #67278 into squid

* refs/pull/67278/head:
librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
librbd: prepare lock_acquire() for changing between policies
librbd: fix RequestLockPayload log message in ImageWatcher
librbd: amend error message in lock_acquire()

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge PR #67280 into squid

* refs/pull/67280/head:
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Merge PR #67356 into squid

* refs/pull/67356/head:
osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Merge PR #67454 into squid

* refs/pull/67454/head:
qa: krbd_rxbounce.sh: do more reads to generate more errors

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge PR #66985 into squid

* refs/pull/66985/head:
monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #67761 into squid

* refs/pull/67761/head:
qa: whitelist slow requests progress.yaml
qa: make test_progress atomically capture OSD marked in/out events

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>

qa: whitelist slow requests progress.yaml

The slow requests occurred because, during the test, 16 concurrent 4 MB
writes were running while recovery and backfill were disabled. At the
same time, osd.0 was marked out and then back in, causing PG remapping.
With recovery/backfill disabled, some PGs could not restore their
replicas after the remap and were left in degraded/remapped states. As a
result, a batch of writes remained stuck in the replicated write path,
stalling I/O and causing slow ops to be reported. The solution is to
ignore this warning, since we are testing the progress module, not the
OSD write path. Backfill and recovery are disabled intentionally to keep
the recovery event from finishing quickly: we want to prolong it until
the progress event shows up.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 6b0c943c8bd004665529c5c5786ecec42bcc9ff7)

qa: make test_progress atomically capture OSD marked in/out events

Problem:
The test had a race condition in which events could complete and disappear
between checking the event count and fetching the event, causing
test failures.

Solution:
Refactor to atomically capture events during the wait condition check.
Added helper methods _wait_for_osd_marked_out_event() and
_wait_for_osd_marked_in_event() that capture events at the moment
they're detected, eliminating the race window.
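The difference between "check the count, then fetch" and "capture the
event in the check itself" can be shown with a small polling helper. This
is a minimal Python sketch; wait_for_event(), the event dictionaries and
the "marked out" message are hypothetical and are not the helpers or event
format used by the actual test.

```python
import time
from typing import Callable

def wait_for_event(get_events: Callable[[], list],
                   matches: Callable[[dict], bool],
                   timeout: float = 5.0,
                   interval: float = 0.01) -> dict:
    """Poll get_events() until an event satisfies matches(), then return it.

    The event is returned from the same snapshot that satisfied the
    check, so it cannot disappear between "check" and "fetch".
    """
    deadline = time.monotonic() + timeout
    while True:
        snapshot = get_events()  # one read per poll, used for both steps
        for ev in snapshot:
            if matches(ev):
                return ev
        if time.monotonic() >= deadline:
            raise TimeoutError("no matching event before timeout")
        time.sleep(interval)

# Simulated progress module: the event is visible for exactly one poll.
polls = iter([[], [{"message": "osd.0 marked out"}], []])
ev = wait_for_event(lambda: next(polls, []),
                    lambda e: "marked out" in e["message"])
print(ev["message"])  # osd.0 marked out
```

A racy version would first wait for a non-empty list and then call
get_events() again, which can return an empty list if the event completed
in between.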

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0ef66f6f2e1881061ecb49e457bb2b9061c0260b)

Merge PR #66480 into squid

* refs/pull/66480/head:
mgr/dashboard: service creation fails if service name is same as service type

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Merge PR #67582 into squid

* refs/pull/67582/head:
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge PR #67580 into squid

* refs/pull/67580/head:
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

Reviewed-by: Ramana Raja <rraja@redhat.com>

mgr/dashboard: service creation fails if service name is same as service type

Fixes: https://tracker.ceph.com/issues/73948
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 57d081d6b5efcbeac6c60e73d50aa5f1f8cab560)

librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

If two instances of UnlinkPeerRequest race with each other (e.g. due
to rbd-mirror daemon unlinking from a previous mirror snapshot and the
user taking another mirror snapshot at the same time), the snapshot that
UnlinkPeerRequest was created for may be in the process of being removed
(which may mean trashed by SnapshotRemoveRequest::trash_snap()) or fully
removed by the time unlink_peer() grabs the image lock. Because trashed
snapshots weren't handled explicitly, UnlinkPeerRequest could spuriously
fail with EINVAL ("not mirror snapshot" case) instead of the expected
ENOENT ("missing snapshot" case). This in turn could lead to spurious
ImageReplayer failures, causing it to stop prematurely.

Fixes: https://tracker.ceph.com/issues/68279
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3596ca077097a4e0ff8e8d05a410c2044332391e)

librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

ImageUpdateWatchers::flush() requests aren't tracked with an
m_in_flight-like mechanism the way ImageUpdateWatchers::send_notify()
requests are, but in both cases callbacks that represent delayed work
that is very likely to (indirectly) reference ImageCtx are involved.
When the image is getting closed, ImageUpdateWatchers::shut_down() is
called before anything that belongs to ImageCtx is destroyed. However,
the shutdown can complete prematurely in the face of a pending flush if
one gets sent shortly before CloseRequest is invoked. The callback for
that flush will then race with CloseRequest and may execute after parts
of or even the entire ImageCtx is destroyed, leading to use-after-free
and various segfaults.
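The general technique is to track pending flushes the same way in-flight
notifications are tracked and to complete shut_down() only once that
count drains to zero. Below is a minimal, single-threaded Python model of
that idea; the Watchers class, flush(), finish_flush() and the callback
plumbing are hypothetical stand-ins, not librbd's C++ API.

```python
from typing import Callable, Optional

class Watchers:
    """Toy model: shut_down() completes only after pending flushes finish."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.pending_callbacks: list = []
        self.shutdown_cb: Optional[Callable[[], None]] = None

    def flush(self, on_finish: Callable[[], None]) -> None:
        # Track the flush so that shut_down() can wait for it.
        self.in_flight += 1
        self.pending_callbacks.append(on_finish)

    def finish_flush(self) -> None:
        # Simulate the delayed flush work running (oldest first).
        cb = self.pending_callbacks.pop(0)
        cb()
        self.in_flight -= 1
        self._maybe_complete_shutdown()

    def shut_down(self, on_finish: Callable[[], None]) -> None:
        self.shutdown_cb = on_finish
        self._maybe_complete_shutdown()

    def _maybe_complete_shutdown(self) -> None:
        # Complete shutdown only when it was requested and nothing is pending.
        if self.shutdown_cb is not None and self.in_flight == 0:
            cb, self.shutdown_cb = self.shutdown_cb, None
            cb()

events = []
w = Watchers()
w.flush(lambda: events.append("flush callback"))
w.shut_down(lambda: events.append("shutdown complete"))
w.finish_flush()
print(events)  # ['flush callback', 'shutdown complete']
```

Without the in_flight check, "shutdown complete" would be recorded first
and the flush callback would run afterwards, which is the ordering that
lets it race with the destruction of ImageCtx.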

Fixes: https://tracker.ceph.com/issues/75161
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ea6ee62aa339d1ad9976fdcc6e207a505f9bf44)

Merge PR #67558 into squid

* refs/pull/67558/head:
test: disable known flaky tests in run-rbd-unit-tests

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

test: disable known flaky tests in run-rbd-unit-tests

The failures seem to be more frequent on newer hardware. In the
absence of immediate fixes, disable a few tests that have been known to
be flaky for a long time to avoid disrupting "make check" runs.

Fixes: https://tracker.ceph.com/issues/75163
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ecb616681b5347676198b33c80de60742bac7b69)

Merge PR #67501 into squid

* refs/pull/67501/head:
doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

This is a proactive PR to avoid breaking docs builds when Setuptools 81
starts to be used in the RTD builds process.

The sphinxcontrib-seqdiag Python package is not compatible with
Setuptools 81 or later due to its use of pkg_resources:
https://setuptools.pypa.io/en/latest/pkg_resources.html

The Setuptools 81 release should be imminent, with the Python deprecation
warning stating pkg_resources "removal as early as 2025-11-30".

Seqdiag seems to be unmaintained: its latest release on PyPI dates from
2021 and there have been no updates to the seqdiag git repo either.

There are no seqdiag directives left in the docs since the last seqdiag
diagrams were removed in PR #52308.

Two other options exist for fixing the situation (see the PR for
discussion), but this one seems to be the most suitable.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit 15481e509b4d644d0644188501d86a4ceda2c039)

Conflicts:
doc/conf.py: formatting difference

Merge PR #67497 into squid

* refs/pull/67497/head:
squid: qa: add missing .qa links

Reviewed-by: Yuri Weinstein <yweins@redhat.com>

squid: qa: add missing .qa links

So that "Check for missing .qa links" passes.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

qa: krbd_rxbounce.sh: do more reads to generate more errors

On faster hardware having each thread do 1024 reads isn't always
sufficient for the "two orders of magnitude" threshold that is used in
the test.

Fixes: https://tracker.ceph.com/issues/74712
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7f85b9ddb54cb3d83311daa7a72be27731be2806)

Merge PR #64815 into squid

* refs/pull/64815/head:
The compilation of ISAL compress in the current code depends on the macro HAVE_NASM_X64_AVX2. However, the macro HAVE_NASM_X64_AVX2 has been removed, resulting in the compression not using ISAL even if the compressor_zlib_isal parameter is set to true.

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Commit 0cf383da0741 ("ReplicatedPG: clamp SPARSE_READ to object size
for ec pool") didn't correctly handle the case of a sparse read that ends
up being empty: the OSD encodes only an empty extent map whereas
clients (both userspace and kernel) also expect to see an empty data
buffer. IOW the reply contains one 32-bit zero instead of the expected
two.
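The difference between the two replies can be sketched in Python. This
assumes the usual Ceph encoding, in which a map is prefixed by a
little-endian 32-bit element count and a bufferlist by a little-endian
32-bit length; encode_sparse_read_reply() is a hypothetical name and only
the byte layout is meant to be illustrative.

```python
import struct

def encode_sparse_read_reply(extents: dict, data: bytes) -> bytes:
    """Encode a sparse-read reply: extent map followed by a data buffer.

    Assumed layout (little-endian): u32 extent count, then (u64 offset,
    u64 length) pairs, then u32 data length and the data bytes.
    """
    out = struct.pack("<I", len(extents))
    for off, length in sorted(extents.items()):
        out += struct.pack("<QQ", off, length)
    out += struct.pack("<I", len(data)) + data
    return out

# An empty sparse read must still carry an (empty) data buffer:
empty = encode_sparse_read_reply({}, b"")
print(empty.hex())  # 0000000000000000 (two 32-bit zeros)
# The buggy reply stopped after the extent map, i.e. a single 32-bit zero:
buggy = struct.pack("<I", 0)
print(buggy.hex())  # 00000000
```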

Fixes: https://tracker.ceph.com/issues/74394
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c694c35bbfce6e3033b34fe6994b40b00fad11d9)

qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9.  On Ubuntu 22.04 and Rocky 10, it looks
as follows:

Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
   at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
   by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
   by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
   by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
   by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
   by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
   by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
   by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
   by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
   by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
   by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)

The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.

[1] https://github.com/ceph/ceph/pull/61689#issuecomment-2650179891

Fixes: https://tracker.ceph.com/issues/74672
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d85265eab4f9344f0d08330f737ce9d7be32d716)

librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT

The existing StandardPolicy, which is exposed as the RBD_LOCK_MODE_EXCLUSIVE
argument to rbd_lock_acquire(), disables automatic exclusive lock
transitions with "permanent" semantics: any request to release the lock
causes the peer to error out immediately.  Such a lock owner can
perform maintenance operations that are proxied from other peers, but
any write-like I/O issued by other peers will fail with EROFS.

This isn't suitable for use cases where one of the peers wants to
manage exclusive lock manually (i.e. rbd_lock_acquire() is used) but
the lock is acquired only for very short periods of time.  The rest of
the time the lock is expected to be held by other peers that stay in
the default "auto" mode (AutomaticPolicy) and run as usual, completely
unconcerned with each other or the manual-mode peer.  However, these
peers become acutely aware of the manual-mode peer because when it grabs
the lock with RBD_LOCK_MODE_EXCLUSIVE their I/O gets disrupted: higher
layers translate EROFS into generic EIO, filesystems shut down, etc.

Add a new TransientPolicy exposed as RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
to allow disabling automatic exclusive lock transitions with semantics
that would cause the other peers to block waiting for the lock to be
released by the manual-mode peer.  This is intended to be a low-level
interface -- no attempt to safeguard against potential misuse causing
e.g. indefinite blocking is made.

It's possible to switch between RBD_LOCK_MODE_EXCLUSIVE and
RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT modes of operation both while the
lock is held and after it's released.
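The user-visible difference between the three policies, as described
above, can be summarized as a small decision table. This is a toy Python
model of what a peer in the default "auto" mode experiences when it needs
the lock for a write; LockMode and peer_write_outcome() are hypothetical
and do not correspond to librbd functions.

```python
import errno
from enum import Enum

class LockMode(Enum):
    AUTO = "auto"                                              # AutomaticPolicy (default)
    EXCLUSIVE = "RBD_LOCK_MODE_EXCLUSIVE"                      # StandardPolicy
    EXCLUSIVE_TRANSIENT = "RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT"  # TransientPolicy

def peer_write_outcome(owner_mode: LockMode) -> str:
    """Toy model: outcome for an "auto" peer that needs the lock for a
    write while another peer holds it in owner_mode."""
    if owner_mode is LockMode.AUTO:
        return "owner releases lock; write proceeds"
    if owner_mode is LockMode.EXCLUSIVE:
        return f"write fails with {errno.errorcode[errno.EROFS]}"
    return "write blocks until the manual-mode peer releases the lock"

for mode in LockMode:
    print(mode.name, "->", peer_write_outcome(mode))
```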

Fixes: https://tracker.ceph.com/issues/73824
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8740544c51781211b82ac45d4bc93b9eb9623e76)

librbd: prepare lock_acquire() for changing between policies

In preparation for adding a new TransientPolicy, get rid of the check
implemented in terms of exclusive_lock::Policy::may_auto_request_lock()
that essentially makes it so that exclusive lock policy on a given
image handle can be changed from the default AutomaticPolicy only once.
In order to effect another change a new image handle would have been
needed which is pretty suboptimal.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1f9396ff8208accdf334c088745d2734225b34c1)

librbd: fix RequestLockPayload log message in ImageWatcher

exclusive_lock::Policy::lock_requested() isn't guaranteed to queue
the release of exclusive lock (and in fact only one of the two existing
implementations does that). Instead of talking about the lock, log the
response to the notification.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ff89abf5ddcca91c34cfd46d288d41fe93ec38b0)

librbd: amend error message in lock_acquire()

... since it went stale with commit 2914eef50d69 ("rbd: Changed
exclusive-lock implementation to use the new managed-lock"). In the
context of exclusive lock, requesting the lock refers to a specific
action which may or may not be performed as part of acquiring the lock
and lock_acquire() doesn't get visibility into that.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9159a2dd032bf47a030260660ff64b5be2d72648)

Merge pull request #66538 from Jayaprakash-ibm/wip-fix-cot-bz2404644-squid

squid: tools: handle get-attr as read-only ops in ceph_objectstore_tool

Merge pull request #66296 from idryomov/wip-69492-squid

squid: rbd-mirror: add cluster fsid to remote meta cache key

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>

Merge pull request #67162 from idryomov/wip-74676-squid

squid: qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #67153 from idryomov/wip-74669-squid

squid: qa/workunits/rbd: reduce randomized sleeps in live import tests

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

Merge pull request #67151 from idryomov/wip-74601-squid

squid: qa/workunits/rbd: adapt rbd_mirror.sh for trial nodes

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

Merge pull request #66627 from idryomov/wip-74168-squid

squid: librbd: fix ExclusiveLock::accept_request() when !is_state_locked()

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #64621 from idryomov/wip-71961-squid

squid: librbd: images aren't closed in group_snap_*_by_record() on error

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

Merge pull request #59647 from mchangir/wip-67940-squid

squid: mgr/snap_schedule: correctly fetch mds_max_snaps_per_dir from mds

Merge pull request #65289 from dparmar18/wip-72504-squid

squid: client: fix unmount hang after lookups

Merge pull request #67185 from kshtsk/wip-74593-squid

squid: qa/workunits/rgw: drop netstat usage

qa/workunits/rgw: drop netstat usage

`netstat` is deprecated in modern Linux distributions and usually
requires an extra package to be installed. That package is usually
`net-tools`; however, on openSUSE, for example, `netstat` is not
included in it. Thus, use `ss` as an alternative.

When using `netstat -nltp` we get lines like:
'tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 25156/valgrind.bin \ntcp6 0 0 :::443 :::* LISTEN 25156/valgrind.bin \n'
When using `ss -nltp` we get lines like:
'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))'
so we need to filter processes by `memcheck`. The rest of the parsing
code works the same way as it did for netstat.
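The filtering step can be sketched in Python, assuming the `ss -nltp`
column layout shown above (State, Recv-Q, Send-Q, Local Address:Port,
Peer Address:Port, Process). memcheck_listen_ports() is a hypothetical
helper, not the function used in the workunit.

```python
def memcheck_listen_ports(ss_output: str) -> list:
    """Return local ports of LISTEN sockets owned by a valgrind memcheck process."""
    ports = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "LISTEN":
            continue  # header or unrelated line
        if "memcheck" not in " ".join(fields[5:]):
            continue  # socket owned by some other process
        local = fields[3]
        # rsplit keeps IPv6 addresses such as [::]:443 intact
        ports.append(int(local.rsplit(":", 1)[1]))
    return ports

sample = "\n".join([
    "State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process",
    'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))',
    'LISTEN 0 4096 [::]:443 [::]:* users:(("memcheck-amd64-",pid=66045,fd=73))',
    'LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=812,fd=3))',
])
print(memcheck_listen_ports(sample))  # [443, 443]
```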

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit 82063f99024a8937dfa105e0828beda1bc730247)

Merge pull request #60567 from k0ste/wip-68781-squid

squid: osd: add clear_shards_repaired command

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>

qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats

This stopped working in Python 3.12:

  Changed in version 3.12: Automatic conversion of non-integer types
  is no longer supported. Calls such as randrange(10.0) and
  randrange(Fraction(10, 1)) now raise a TypeError.
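If float bounds (for example, hypothetical min/max delays read from a
YAML config) used to be passed straight to randrange(), two replacements
are possible depending on whether fractional values are wanted. This is a
sketch of the general fix, not necessarily the exact change made to
rbd_mirror_thrash; the helper names are hypothetical.

```python
import random

# Hypothetical thrasher settings that arrive as floats (e.g. from YAML).
min_delay, max_delay = 5.0, 30.0

# random.randrange(min_delay, max_delay) raises TypeError on Python 3.12+.

def pick_delay(lo: float, hi: float) -> float:
    """Random delay between lo and hi; random.uniform() accepts floats."""
    return random.uniform(lo, hi)

def pick_whole_delay(lo: float, hi: float) -> int:
    """Random whole number of seconds in [lo, hi): convert to int first."""
    return random.randrange(int(lo), int(hi))

print(pick_delay(min_delay, max_delay), pick_whole_delay(min_delay, max_delay))
```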

Fixes: https://tracker.ceph.com/issues/74676
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d663359fae135b2337e0ffbb86256768f61088c7)

qa/workunits/rbd: reduce randomized sleeps in live import tests

These tests were tuned for slower hardware than what we have now.
Currently "rbd migration execute" always finishes (successfully) before
the NBD server is killed.

Fixes: https://tracker.ceph.com/issues/74669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 592e3a9846b130c7321481f8b2bf9dba2fb05195)

Conflicts:
qa/workunits/rbd/cli_migration.sh [ commit afc89fdde80f
  ("qa/workunits/rbd: add test_import_nbd_stream_disconnected()")
  was originally skipped due to NBD stream not being in squid at
  the time ]

qa/workunits/rbd: drop randomized sleeps in "big image" tests

These tests were tuned for slower hardware than what we have now.
Even without these the image is often 25-30% synced by the time the
test gets to the "non-primary snapshot in question is still being
synced" assert.

Fixes: https://tracker.ceph.com/issues/74601
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ec868d5ca2e56bd6003b002eb5f15d575edabd4e)

qa/workunits/rbd: avoid unnecessary sleeping in stop_mirror()

There is no need to wait for anything if -KILL is passed for sig
because the process would disappear immediately. In teuthology runs
where multiple rbd-mirror daemons are deployed (and therefore need to
be stopped when stop_mirrors() is called by the test), the unnecessary
sleeping causes gratuitous delays of 4+ seconds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f18fe5d15f72e34ab74b8ae187a47b52883fc780)

Merge pull request #67074 from idryomov/wip-74513-squid

squid: qa: krbd_blkroset.t: eliminate a race in the open_count test

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #67076 from idryomov/wip-74529-squid

squid: qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #66518 from aclamk/aclamk-ifed-fix-70390-squid

squid: os/bluestore: compact patch to fix extent map resharding

qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Instead of hard-coding the block device name, use the block device that
is backing the filesystem that the test is running on. We can be quite
sure it won't be an RBD device ;)
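One way to find that device, sketched in Python: take the device number
of the filesystem the test runs on and look it up under /sys/dev/block.
This is an illustrative assumption, not how unmap.t itself does it.
backing_block_device() is hypothetical, and on filesystems without a real
backing device (e.g. tmpfs or overlayfs) the major number is 0 and no
sysfs entry exists.

```python
import os

def backing_block_device(path: str = ".") -> str:
    """Return the sysfs path for the block device holding `path`.

    os.stat() reports the device number (st_dev) of the filesystem the
    path lives on; /sys/dev/block/MAJOR:MINOR links to that device.
    """
    dev = os.stat(path).st_dev
    return f"/sys/dev/block/{os.major(dev)}:{os.minor(dev)}"

sysfs = backing_block_device("/")
print(sysfs, "->", os.path.realpath(sysfs))
```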

Fixes: https://tracker.ceph.com/issues/74529
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2b5f0f4e7396114f9944a4987c38e18d4ecfbb1f)

qa: krbd_blkroset.t: eliminate a race in the open_count test

Even at QD=1, dd may take less than 10 seconds to work its way to the
end of a 10M image, producing "No space left on device" error instead
of the expected "Operation not permitted" error which is supposed to
arise from the device getting marked read-only while opened.

Fixes: https://tracker.ceph.com/issues/74513
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 006e47e9ca691deb377fb76f7a23b6feec874865)

qa: Disable OSD benchmark from running for tests.

Disable the OSD bench for teuthology tests so that the OSDs are not
benchmarked. This helps prevent a cluster warning from being raised when
the measured IOPS value doesn't fall within a typical threshold range.

The tests can rely on the built-in static values as defined by
osd_mclock_max_capacity_iops_[ssd|hdd] which should be good enough.

Fixes: https://tracker.ceph.com/issues/74501
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit cf04790bede6be13cc27dcdf50fe19a953860321)

Conflicts:
qa/config/rados.yaml
- Removed a non-existing option under mon overrides

Merge pull request #66389 from Naveenaidu/wip-70082-squid

squid: mgr/telemetry: add stretch cluster data

Reviewed-by: Yaarit Hatuka <yhatuka@ibm.com>

monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Ceph 18.* adds a `cluster` label to all Prometheus metrics. When
upgrading from earlier releases, historical metrics lack this label
and are excluded by Grafana queries that strictly match on `cluster`.
Update the shared Grafana matcher logic to use a regex matcher that
also matches series without the `cluster` label, restoring visibility
of pre-upgrade metrics while preserving multi-cluster behavior.
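The regex matcher works because of a PromQL rule: regex matchers are
fully anchored, and a matcher that accepts the empty string also selects
series that lack the label entirely. The Python model below illustrates
that rule. The `cluster=~"ceph|"` pattern is an example, not necessarily
the exact expression generated by the shared Grafana matcher logic, and
label_matches() is hypothetical.

```python
import re

def label_matches(labels: dict, name: str, pattern: str) -> bool:
    """Model of a PromQL =~ matcher: the regex is fully anchored and a
    missing label is treated as the empty string."""
    return re.fullmatch(pattern, labels.get(name, "")) is not None

old_series = {"__name__": "ceph_health_status"}                     # pre-upgrade
new_series = {"__name__": "ceph_health_status", "cluster": "ceph"}

strict = "ceph"    # like cluster=~"ceph": drops series without the label
lenient = "ceph|"  # like cluster=~"ceph|": also matches a missing label

print(label_matches(old_series, "cluster", strict))   # False
print(label_matches(old_series, "cluster", lenient))  # True
```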

Fixes: https://tracker.ceph.com/issues/74342
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit f7f74e455423feab79b33ac8ab472be0b98cb29d)

Conflicts:
monitoring/ceph-mixin/dashboards_out/ceph-application-overview.json (file not in squid)
monitoring/ceph-mixin/dashboards_out/ceph-cluster-advanced.json
(conflicts with $rate_interval in main)
monitoring/ceph-mixin/dashboards_out/ceph-cluster.json (missing
cluster label in metrics)
monitoring/ceph-mixin/dashboards_out/cephfsdashboard.json (file
not in squid)
monitoring/ceph-mixin/dashboards_out/multi-cluster-overview.json
(file not in squid)

Merge pull request #66970 from bluikko/wip-doc-2026-01-19-fix-63073-to-squid

squid: doc/cephadm: remove sections that do not apply to Squid in rgw.rst

doc/cephadm: remove sections that do not apply to Squid in rgw.rst

4949311 backported changes that do not apply to Squid.
The body of PR #63073 and the commit it references as the cherry-pick
source do not correspond to the diff. Remove the additions that do not
apply to Squid:

- The wildcard SAN feature in 3c24753 is only available since Tentacle.
- The shutdown delay feature in b84bb72 is only available since Tentacle.

The third feature doc addition is valid: d620ba6 (disable multisite sync
traffic) was backported to Squid in PR #61350, commit 59b3f28. That
backport cherry-picked only the feature addition and missed the docs
commit 8878619. Leave this section in.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>

Merge pull request #65363 from ifed01/wip-ifed-fix-snapdiff-fragment-squi

squid: mds: fix snapdiff result fragmentation

Merge pull request #65602 from kotreshhr/wip-73130-squid

squid: cephfs-journal-tool: Journal trimming issue

Merge pull request #65267 from batrick/wip-72277-squid

squid: mds: include auth credential in session dump

Merge pull request #64886 from vshankar/wip-72390-squid

squid: mds/MDSDaemon: unlock `mds_lock` while shutting down Beacon and others

Merge pull request #60398 from rishabh-d-dave/wip-68621-squid

squid: mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

common: drop stack singleton object of temp messenger for foreground ceph daemons

During initialization, the OSD needs to create a temporary messenger to
read the config db from ceph-mon. This temporary messenger creates async
messenger threads according to the local/default value of
"ms_async_op_threads". If this option is not specified in ceph.conf, 3
threads are created by default and used to read the config db from
ceph-mon. Those threads are associated with a stack singleton object.
Here is the difference between an OSD running in the foreground and in
the background:
a. In background mode, this singleton object is dropped in
   notify_pre_fork() before the child process is forked, and the
   ms_async_op_threads value fetched from the mon config db is then used
   to create later messenger threads. This is what we want.
b. In foreground mode, this singleton object is not dropped and is
   reused by all later messengers, so the number of threads doesn't
   change and the value from the mon config db has no effect.

Fixes: https://tracker.ceph.com/issues/71401
Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
(cherry picked from commit 30d66ff075ca72c0c3759bfccee09302b221b25f)

monitoring: fix CephPgImbalance alert rule expression

The CephPGImbalance alert doesn't take configured device classes into
account. As a result, there can be false positives when OSD disks of
mixed sizes are used.
Ref: https://github.com/rook/rook/discussions/13126#discussioncomment-10043490
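Comparing each OSD against the average of its own device class, rather
than the cluster-wide average, can be sketched in Python. The 30%
threshold and the input shape are assumptions for illustration; the real
rule is a PromQL expression in prometheus_alerts.yml, and
imbalanced_osds() is hypothetical.

```python
from collections import defaultdict

def imbalanced_osds(osds: dict, threshold: float = 0.30) -> list:
    """Flag OSDs whose PG count deviates from the average of their own
    device class by more than `threshold` (a fraction).

    `osds` maps an OSD name to a (device_class, num_pgs) tuple.
    """
    per_class = defaultdict(list)
    for dev_class, num_pgs in osds.values():
        if num_pgs > 0:
            per_class[dev_class].append(num_pgs)
    avg = {c: sum(v) / len(v) for c, v in per_class.items()}
    flagged = []
    for osd, (dev_class, num_pgs) in sorted(osds.items()):
        if num_pgs > 0 and abs(num_pgs - avg[dev_class]) / avg[dev_class] > threshold:
            flagged.append(osd)
    return flagged

osds = {"osd.0": ("hdd", 200), "osd.1": ("hdd", 210),
        "osd.2": ("ssd", 40), "osd.3": ("ssd", 44)}
# Ignoring device classes is the same as putting every OSD in one class.
flat = {name: ("all", pgs) for name, (_, pgs) in osds.items()}
print(imbalanced_osds(osds))  # []
print(imbalanced_osds(flat))  # ['osd.0', 'osd.1', 'osd.2', 'osd.3']
```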

Fixes: https://tracker.ceph.com/issues/69690
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 5b4f7373655fa829af359d6e3cc61416964a97f0)

Conflicts:
monitoring/ceph-mixin/prometheus_alerts.yml (remove cluster
label from alert since it's not there in squid)
monitoring/ceph-mixin/tests_alerts/test_alerts.yml (remove
cluster label from the alert since it's not there in squid)

Merge pull request #65785 from NitzanMordhai/wip-71315-squid

squid: memory lock issues causing hangs during connection shutdown

Merge pull request #66797 from bluikko/wip-doc-revert-64033-from-squid

squid: Revert "doc: mgr/dashboard: add OAuth2 SSO documentation"

doc: Revert "doc: mgr/dashboard: add OAuth2 SSO documentation"

This reverts commit 2af5800f5a20ecc1fd592e024a8d03806ab67f89.

The dashboard OAuth2.0 feature was released in Tentacle.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>

Merge pull request #66739 from tchaikov/squid-backport-pr-66732

squid: debian/control: add iproute2 to build dependencies

Merge pull request #66707 from tchaikov/squid-backport-pr-66700

squid: mgr/dashboard: update teuth_ref hash in api test

debian/control: add iproute2 to build dependencies

Test scripts like qa/tasks/cephfs/mount.py expect the ip command to be
available in the container environment. Without it, tests fail with:

```
  /bin/bash: line 1: ip: command not found

  File "/ceph/qa/tasks/cephfs/mount.py", line 96, in cleanup_stale_netnses_and_bridge
    p = remote.run(args=['ip', 'netns', 'list'],
  ...
  teuthology.exceptions.CommandFailedError: Command failed with status 127: 'ip netns list'
```

Add iproute2 to the debian package build dependencies when the
<pkg.ceph.check> build profile is enabled. This ensures the package is
available during container-based builds, since buildcontainer-setup.sh
→ script/run-make.sh → install-deps.sh → debian/control → generated
dependency package chain respects build profiles configured via
`FOR_MAKE_CHECK` and `WITH_CRIMSON` environment variables set in
Dockerfile.build.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
(cherry picked from commit 599922aa582bbaa6fa8c8e274b780fabafb10a9b)

mgr/dashboard: update teuth_ref hash in api test

Update the hash to the latest commit where Kefu addressed the distutils
error.

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 36fb920c5e88f7da24d0c7289d7e6bafd8b367d2)

Merge pull request #66668 from ceph/apt-mirror-squid

squid: install-deps: Replace apt-mirror

install-deps: Replace apt-mirror

apt-mirror.front.sepia.ceph.com has happened to always work because we set up CNAMEs to gitbuilder.ceph.com.

That host is making its way to a new home upstate (literally and figuratively) so we'll get rid of the front subdomain since it's publicly accessible anyway and add TLS while we're at it.

Signed-off-by: David Galloway <david.galloway@ibm.com>
(cherry picked from commit 0b0c73ad860b20912c862b5376057153a5adab40)

librbd: fix ExclusiveLock::accept_request() when !is_state_locked()

To accept an async request, two conditions must be met: a) exclusive
lock must be in a firm STATE_LOCKED state and b) async requests shouldn't
be blocked or if they are blocked there should be an exception in place
for a given request_type.  If a) is met but b) isn't, ret_val is set
to m_request_blocked_ret_val, as expected -- the reason for denying
the request is that async requests are blocked.  However, if a) isn't
met, ret_val also gets set to m_request_blocked_ret_val.  This is wrong
because the reason for denying the request in this case isn't that
async requests are blocked (they may or may not be) but a much heavier
circumstance of exclusive lock being in a transient state or not held
at all.
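The corrected decision can be sketched as a small Python function that
checks the two conditions in order and reports the "blocked" reason only
when condition a) holds. The names, the string state and the choice to
leave ret_val at 0 when the lock isn't firmly held are hypothetical
assumptions; this is not librbd's C++ implementation.

```python
import errno

STATE_LOCKED = "locked"  # hypothetical stand-in for ExclusiveLock::STATE_LOCKED

def accept_request(state: str, requests_blocked: bool, blocked_ret_val: int,
                   exception_types: set, request_type: str) -> tuple:
    """Return (accepted, ret_val) following conditions a) and b)."""
    if state != STATE_LOCKED:
        # a) not met: the lock is transient or not held at all, so whether
        # async requests are blocked is irrelevant; don't report that reason.
        return False, 0
    if requests_blocked and request_type not in exception_types:
        # b) not met: deny because async requests are blocked.
        return False, blocked_ret_val
    return True, 0

print(accept_request("locked", True, -errno.EBUSY, set(), "resize"))
print(accept_request("acquiring", True, -errno.EBUSY, set(), "resize"))
```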

In such scenarios, whether async requests are blocked or not isn't
relevant and ExclusiveLock::accept_request() behaving otherwise can
lead to bogus "duplicate lock owners detected" errors getting raised
during an attempt to handle any maintenance operation notification in
ImageWatcher::handle_operation_request().  This error isn't considered
retryable so the entire operation that needed the exclusive lock would
be spuriously failed with EINVAL.
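A minimal sketch of the corrected decision logic, modelled in Python rather than taken from librbd. LockState, LockModel and their fields are hypothetical stand-ins for ExclusiveLock's state and m_request_blocked_* members, and returning 0 when condition a) fails is an assumption about what "not m_request_blocked_ret_val" means in practice:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class LockState(Enum):
    # Hypothetical, simplified states; librbd's real state machine has more.
    UNLOCKED = auto()
    ACQUIRING = auto()
    LOCKED = auto()
    RELEASING = auto()


@dataclass
class LockModel:
    state: LockState = LockState.UNLOCKED
    blocked_count: int = 0  # > 0 means async requests are blocked
    blocked_exceptions: set = field(default_factory=set)  # types allowed while blocked
    blocked_ret_val: int = 0  # error reported only when blocking is the reason

    def accept_request(self, request_type):
        """Return (accepted, ret_val) following conditions a) and b)."""
        if self.state is not LockState.LOCKED:
            # a) fails: the lock is transient or not held. Whether requests
            # are blocked is irrelevant, so blocked_ret_val is NOT reported
            # (assumption: 0 stands for "no blocking-related error").
            return False, 0
        if self.blocked_count > 0 and request_type not in self.blocked_exceptions:
            # b) fails: async requests are blocked and there is no exception.
            return False, self.blocked_ret_val
        return True, 0
```

For example, with `LockModel(blocked_count=1, blocked_ret_val=-16)` an unlocked lock yields `(False, 0)` rather than `(False, -16)`, so a caller can no longer mistake "lock not held" for "requests blocked".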

Fixes: https://tracker.ceph.com/issues/74168
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e4855895a9f14a07fda03fbc736f596b87f92327)

librbd: add ExclusiveLock::accept_request() overload

Make ret_val out parameter required for the existing method and
introduce an overload taking just request_type to compensate.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3327001c8f65aebf9eb08fe5d28c8e344f338d4d)

Merge pull request #66289 from henrichter/wip-73704-squid

squid: rgw: beast add ssl hot-reload

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #62055 from k0ste/wip-68956-squid

squid: mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking

Merge pull request #64954 from batrick/wip-72515-squid

squid: mds: skip charmap handler check for MDS requests

Merge pull request #65449 from rishabh-d-dave/wip-70174-squid

squid: qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs

Merge pull request #66471 from joscollin/wip-73879-squid

squid: cephfs: fix monclient not subscribed monmap/config

Merge pull request #66472 from joscollin/wip-73872-squid

squid: cephfs: MDCache request cleanup

Merge pull request #66473 from joscollin/wip-73870-squid

squid: client: account for mixed quotas in statfs

Merge pull request #64747 from kshtsk/wip-72330-squid

squid: qa/tasks/ceph_manager: population must be a sequence

tools: handle get-attr as read-only ops in ceph_objectstore_tool

Fixes: https://tracker.ceph.com/issues/73710
Signed-off-by: Jaya Prakash <jayaprakash@ibm.com>
(cherry picked from commit 85e0081a0023ef5dd725bd639f8f716149cfa26b)

os/bluestore: enforce extent split on shard boundary

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 5beee2ad46cfeb8ffc70d106c1180f531e455e3e)
(cherry picked from commit 0611ed6d8980b8b8839bb0d6c7af07b598fcc089)

Conflicts:
src/os/bluestore/BlueStore.cc

The conflict was not a logical one; it stemmed from a refactor that
renamed "e" to "extent".

os/bluestore: Fix dirty_range in BlueStore::_do_remove

dirty_range used to be called with length = 1 byte.
This is fine if the whole extent is inside a single shard,
but that has proven not to always be the case.
dirty_range(offset, length) is slower only when the range crosses a shard boundary.
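A small model of why a 1-byte range is not enough, assuming shards are contiguous byte ranges identified by sorted start offsets. shards_touched and the 4096-byte layout are hypothetical illustrations, not BlueStore's ExtentMap API:

```python
import bisect


def shards_touched(shard_starts, offset, length):
    """Indices of shards overlapped by the byte range [offset, offset + length).

    shard_starts is a sorted list of shard start offsets beginning at 0;
    each shard extends up to the next start (the last one is open-ended).
    """
    if length <= 0:
        return []
    first = bisect.bisect_right(shard_starts, offset) - 1
    last = bisect.bisect_right(shard_starts, offset + length - 1) - 1
    return list(range(first, last + 1))
```

With shards starting at 0, 4096 and 8192, an extent at offset 3000 of length 2000 spans shards 0 and 1, but marking only `(3000, 1)` dirty touches shard 0 alone, leaving shard 1 stale.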

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 4f566eaf6c4646e513ea6747c7df17383d8716e2)
(cherry picked from commit d6c61326a125f8bd278ec1c656d673e53edf47cd)
(cherry picked from commit 37248077f4550c85258b98f184193101a02dae0e)

os/bluestore: Fix reshard on spanning blobs

Make sure that spanning blobs are not allowed to have extents crossing
shard boundary.

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit ce05ade7980397cad48e8fc78bebc839c76ba327)
(cherry picked from commit 0f5e240e49a3a16b611fc80cf6ca06cfd8b1b303)
(cherry picked from commit c081b9eae7a7082dbc86b8d50a7044bc085729c5)

qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs

Health warning "pg .* is stuck peering" is seen while the Ceph cluster
is being upgraded during the fs/upgrade QA job. Since this warning is
expected, it should be added to the ignorelist.

Moreover, we already ignore more severe warnings ("pg is stuck
inactive" and "pg is degraded") for fs/upgrade jobs.

Fixes: https://tracker.ceph.com/issues/70023
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)

Conflicts:
qa/cephfs/overrides/pg_health.yaml
- The line preceding the point where the patch was to be applied
differs from the main branch.

test: Add statfs test case for mixed quotas

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 2b057ec7bb40855e3be3cb0de12b63f8c10b450e)

client: account for mixed quotas in statfs

In statfs, when the quota root for a dir is discovered,
that dir is used as the basis for both max_files and max_bytes.

This is a problem when the dir has only one of the two potential quota
fields set. Take, for instance, a dir with only max_files set whose parent
dir has only max_bytes set. During a statfs call, the max_files value of
the given dir is used, but there is no max_bytes value, so the size of the
whole filesystem is displayed instead.

Instead, find the quota root for max_files and max_bytes separately. This
allows mixed quotas to inherit missing values from their parents. In the
above example, max_files from the current dir and max_bytes from the parent
dir will be displayed.
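A minimal sketch of resolving each quota field from its own nearest ancestor. Dir, find_quota_root and statfs_limits are hypothetical names modelling the client's inode walk, not the actual Client.cc code; 0 is assumed to mean "quota not set", and None stands for "fall back to filesystem totals":

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dir:
    # Hypothetical stand-in for a client inode; 0 means "quota not set".
    name: str
    parent: Optional["Dir"] = None
    max_files: int = 0
    max_bytes: int = 0


def find_quota_root(d, field_name):
    """Nearest ancestor (including d itself) with a non-zero field_name."""
    while d is not None:
        if getattr(d, field_name) > 0:
            return d
        d = d.parent
    return None


def statfs_limits(d):
    """Resolve max_files and max_bytes from separate quota roots."""
    files_root = find_quota_root(d, "max_files")
    bytes_root = find_quota_root(d, "max_bytes")
    # None means no quota applies, i.e. report filesystem-wide values.
    return (files_root.max_files if files_root else None,
            bytes_root.max_bytes if bytes_root else None)
```

For the example in the message, a child with `max_files=100` under a parent with `max_bytes=1 << 30` resolves to `(100, 1 << 30)` instead of losing the byte limit.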

Fixes: https://tracker.ceph.com/issues/73487
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit dd02ea9b18502b87ce815eba4286ae3516e334b3)

mds: MDCache: check validity of mdr requests before dispatching

Ignore null requests

Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@cern.ch>
(cherry picked from commit 75cd8c074f37de2a492177c54b3ef1879ab87637)

mds: MDCache request cleanup handles potential null mdr

In cases where there is a single element in a batch_op_map, new_batch_head
is a nullptr; when this is retried at the Finisher, we'd hit one of the
asserts when dereferencing it.

Fixes: https://tracker.ceph.com/issues/70769
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@cern.ch>
(cherry picked from commit e63f8cc54d03dbdd147cdd2c301adef119a640da)

cephfs: make sure mon authenticate before objecter start

Signed-off-by: Shaohui Wang <wangshaohui.0512@bytedance.com>
(cherry picked from commit 1de46f335ea21c7369c67a021da79f3c7e929e66)

tests: add a test case for cephfs SingletonClient

In SingletonClient::init(), objecter->start() is called before
monc->authenticate(). If the mons reply quickly, the monc connections
become authenticated before monc->authenticate() is called; in this case,
monc will not subscribe to monmap/config.

Signed-off-by: Shaohui Wang <wangshaohui.0512@bytedance.com>
(cherry picked from commit 8cce3277edcb819e5e61a67948f35e5c5358379d)

Conflicts:
src/test/client/CMakeLists.txt
- syncio.cc and fscrypt_conf.cc not backported to squid

qa/workunit: update telemetry quincy/reef workunits with "basic_stretch_cluster" collection

Note: this is not a clean cherry-pick. Commit 4dac20e updated the
`test_telemetry_reef_x.sh` and `test_telemetry_squid_x.sh` upgrade
workunits. These workunits test upgrading a cluster from the reef and
squid releases (X-2 and X-1) to release X.

Since we are cherry-picking the commit to squid (the X release), we
instead have to update the workunit files of quincy and reef, i.e. the
X-2 and X-1 releases.

Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit 4dac20e8987e271e4d92a649a6812b655097c6e1)

mgr/telemetry: add stretch_mode information

Stretch Mode information helps us learn how deployments are done
for stretch clusters.

We add a basic_stretch_cluster collection to the "basic" channel
for this purpose.

Fixes: https://tracker.ceph.com/issues/67812
Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit 6472b6b9f94affb96be341c9d595e543d734f30b)

Merge pull request #66357 from k0ste/wip-70542-squid

squid: os/bluestore: Disable invoking unittest_deferred

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>