git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Yuval Lifshitz [Wed, 4 Mar 2026 14:53:13 +0000 (14:53 +0000)]

test/rgw/kafka: fix kafka release to more recent one

Fixes: https://tracker.ceph.com/issues/75323
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit dc412a7e519d037acbcac8a92c7ecf2dbde9875a)

Patrick Donnelly [Wed, 25 Mar 2026 00:07:52 +0000 (20:07 -0400)]

Merge PR #66884 into squid

* refs/pull/66884/head:
Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Tue, 24 Mar 2026 15:10:51 +0000 (11:10 -0400)]

Merge PR #62454 into squid

* refs/pull/62454/head:
mgr/dashboard: add types for mgr-module list
mgr/dashboard: fix access control permissions for roles

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Thu, 19 Mar 2026 23:41:52 +0000 (19:41 -0400)]

Merge PR #67796 into squid

* refs/pull/67796/head:
qa/workunits/rbd: fix unbound variable in status()
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Thu, 19 Mar 2026 23:41:25 +0000 (19:41 -0400)]

Merge PR #67794 into squid

* refs/pull/67794/head:
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Thu, 19 Mar 2026 23:40:58 +0000 (19:40 -0400)]

Merge PR #67704 into squid

* refs/pull/67704/head:
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Thu, 19 Mar 2026 15:00:04 +0000 (11:00 -0400)]

Merge PR #66838 into squid

* refs/pull/66838/head:
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.
test/bluestore: add volume selector tests
os/bluestore: fix bluestore_volume_selection_reserved_factor usage
os/bluestore: print the first RocksDB level which doesn't fit into fast

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

Dnyaneshwari Talwekar [Mon, 12 Jan 2026 07:53:59 +0000 (13:23 +0530)]

Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Fixes: https://tracker.ceph.com/issues/74376
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:31:29 +0000 (10:31 -0400)]

Merge PR #63344 into squid

* refs/pull/63344/head:
mgr/DaemonServer: fixed mistype for mgr_osd_messages

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:25:33 +0000 (10:25 -0400)]

Merge PR #61417 into squid

* refs/pull/61417/head:
qa/cephfs: update ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:23:07 +0000 (10:23 -0400)]

Merge PR #59688 into squid

* refs/pull/59688/head:
qa: some tests set `refuse_client_session`, so the cluster log is expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:19:00 +0000 (10:19 -0400)]

Merge PR #64686 into squid

* refs/pull/64686/head:
mon/MgrMonitor: add a space before "is already disabled"

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:15:16 +0000 (10:15 -0400)]

Merge PR #65298 into squid

* refs/pull/65298/head:
qa/suites/upgrade: update ignorelist with cephfs specific warnings (under stress-split)
qa/suites/upgrade: add "Replacing daemon mds" to ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:11:35 +0000 (10:11 -0400)]

Merge PR #65758 into squid

* refs/pull/65758/head:
.github: pin GH Actions to SHA-1 commit

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:09:30 +0000 (10:09 -0400)]

Merge PR #66126 into squid

* refs/pull/66126/head:
qa: ignore cluster warning (evicting unresponsive ...) with tasks/mgr-osd-full

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Igor Fedotov [Wed, 21 May 2025 08:30:15 +0000 (11:30 +0300)]

os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a9f591f4e1cb1e364879165250c55cb0f841d64f)

Igor Fedotov [Mon, 19 May 2025 19:20:53 +0000 (22:20 +0300)]

test/bluestore: add volume selector tests

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 158d1550a021ed60e5ad1c565b247e5b0b6d5946)

Conflicts:
src/test/objectstore/CMakeLists.txt - allocsim not present in Squid

Igor Fedotov [Mon, 19 May 2025 19:19:45 +0000 (22:19 +0300)]

os/bluestore: fix bluestore_volume_selection_reserved_factor usage

Fixes: https://tracker.ceph.com/issues/71368
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 43d7864093f92977a3fd084bbfd65229244b1cc9)

Igor Fedotov [Tue, 4 Feb 2025 16:45:13 +0000 (19:45 +0300)]

os/bluestore: print the first RocksDB level which doesn't fit into fast
device by default.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit d95aa620b315d9261cb50b0465ecfd2b6b534a60)

Patrick Donnelly [Wed, 18 Mar 2026 14:04:29 +0000 (10:04 -0400)]

Merge PR #66915 into squid

* refs/pull/66915/head:
monc: synchronize tick() of MonClient with shutdown()

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Patrick Donnelly [Wed, 18 Mar 2026 14:03:12 +0000 (10:03 -0400)]

Merge PR #66973 into squid

* refs/pull/66973/head:
qa/tasks/thrashosds-health: whitelist PG_BACKFILL_FULL

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Wed, 18 Mar 2026 13:59:32 +0000 (09:59 -0400)]

Merge PR #60391 into squid

* refs/pull/60391/head:
qa/cephfs: ignore when specific OSD is reported down during upgrade

Patrick Donnelly [Wed, 18 Mar 2026 13:57:44 +0000 (09:57 -0400)]

Merge PR #63026 into squid

* refs/pull/63026/head:
qa/workunits/cephtool: add extra privileges to cephtool script

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>

Patrick Donnelly [Wed, 18 Mar 2026 00:37:05 +0000 (20:37 -0400)]

Merge PR #63018 into squid

* refs/pull/63018/head:
qa/workunits/fs/misc: remove data pool cleanup

Patrick Donnelly [Wed, 18 Mar 2026 00:35:59 +0000 (20:35 -0400)]

Merge PR #61302 into squid

* refs/pull/61302/head:
qa: do not fail cephfs QA tests for slow bluestore ops

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

Patrick Donnelly [Wed, 18 Mar 2026 00:26:22 +0000 (20:26 -0400)]

Merge PR #67001 into squid

* refs/pull/67001/head:
doc: fetch releases from main branch

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>

Patrick Donnelly [Wed, 18 Mar 2026 00:22:23 +0000 (20:22 -0400)]

Merge PR #67623 into squid

* refs/pull/67623/head:
mgr/orchestrator: make group parameter optional for nvmeof (squid)
pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Tue, 17 Mar 2026 14:42:18 +0000 (10:42 -0400)]

Merge PR #66964 into squid

* refs/pull/66964/head:
monitoring: upgrade grafana version to 12.3.1

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Tue, 17 Mar 2026 14:41:38 +0000 (10:41 -0400)]

Merge PR #66990 into squid

* refs/pull/66990/head:
monitoring: fix rgw_servers filtering in rgw sync overview grafana

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Ilya Dryomov [Mon, 2 Mar 2026 11:07:48 +0000 (12:07 +0100)]

qa/workunits/rbd: fix unbound variable in status()

It was missed in commit 5fe64fa806f3 ("qa: rbd_mirror.sh: change
parameters to cluster rather than daemon name").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1a280b9a320d51bdc4cb80be9bdd6ae265151132)

Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]

qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 82717e43a08a1262987f5e271fd72d4433c4fb3b)

Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]

qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).
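
The worst-case figure above can be checked with a few lines of Python. This is only a sanity check of the arithmetic, using the numbers quoted in the message (720 iterations, a 5-minute timeout, a 10-second sleep); the function name is made up for illustration.

```python
# Worst-case duration of the "wait for all images" loop, using only the
# figures quoted in the commit message above.
ITERATIONS = 720          # upper bound on loop iterations
TIMEOUT_SECONDS = 5 * 60  # each "rbd ls" call times out after 5 minutes
SLEEP_SECONDS = 10        # sleep between iterations


def worst_case_hours(iterations=ITERATIONS,
                     timeout=TIMEOUT_SECONDS,
                     sleep=SLEEP_SECONDS):
    """Return the worst-case loop duration in hours."""
    return iterations * (timeout + sleep) / 3600


print(worst_case_hours())  # 62.0
```

720 iterations of 310 seconds each come to 223200 seconds, i.e. 62 hours, which matches the "~60-hour" estimate.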

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2)

Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]

qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") required
a thrasher to have a stop_and_join() method, but the preceding commit
a035b5a22fb8 ("thrashers: standardize stop and join method names")
failed to add it to rbd_mirror_thrash (whether as an ad-hoc
implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception which is something that practically
never happens.
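
The failure mode can be sketched in Python as follows. All classes here are hypothetical stand-ins, not the real qa/tasks code: `ThrasherBase` plays the role of a common base that provides stop_and_join(), and `bark()` imitates a watchdog that calls it on every thrasher.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# qa/tasks classes, just a sketch of why inheriting from a base helps.
class ThrasherBase:
    """Stand-in for a common base class providing stop_and_join()."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

    def join(self):
        pass  # a real thrasher would wait for its greenlet here

    def stop_and_join(self):
        self.stop()
        self.join()


class AdHocThrasher:
    """A thrasher written without the base class: no stop_and_join()."""

    def __init__(self):
        self.stopped = False


class FixedThrasher(ThrasherBase):
    """The same thrasher after inheriting from the base class."""


def bark(thrashers):
    """Stop every thrasher; return the class names of those that failed."""
    failed = []
    for thrasher in thrashers:
        try:
            thrasher.stop_and_join()
        except AttributeError:
            failed.append(type(thrasher).__name__)
    return failed


print(bark([AdHocThrasher(), FixedThrasher()]))  # ['AdHocThrasher']
```

Under these assumptions, only the thrasher that does not inherit from the base trips up the watchdog, and it does so on every bark rather than only when a thrasher throws.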

Fixes: https://tracker.ceph.com/issues/75200
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ebe3a0a43251b0f126497d4100bd1af9ca8afc5)

Patrick Donnelly [Fri, 13 Mar 2026 16:02:49 +0000 (21:32 +0530)]

Merge PR #66829 into squid

* refs/pull/66829/head:
monitoring: fix CephPgImbalance alert rule expression

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:36:24 +0000 (21:06 +0530)]

Merge PR #66897 into squid

* refs/pull/66897/head:
common: drop stack singleton object of temp messenger for foreground ceph daemons

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:34:48 +0000 (21:04 +0530)]

Merge PR #67066 into squid

* refs/pull/67066/head:
qa: Disable OSD benchmark from running for tests.

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:32:44 +0000 (21:02 +0530)]

Merge PR #67278 into squid

* refs/pull/67278/head:
librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
librbd: prepare lock_acquire() for changing between policies
librbd: fix RequestLockPayload log message in ImageWatcher
librbd: amend error message in lock_acquire()

Reviewed-by: Ramana Raja <rraja@redhat.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:28:10 +0000 (20:58 +0530)]

Merge PR #67280 into squid

* refs/pull/67280/head:
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:23:54 +0000 (20:53 +0530)]

Merge PR #67356 into squid

* refs/pull/67356/head:
osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:22:30 +0000 (20:52 +0530)]

Merge PR #67454 into squid

* refs/pull/67454/head:
qa: krbd_rxbounce.sh: do more reads to generate more errors

Reviewed-by: Ramana Raja <rraja@redhat.com>

Patrick Donnelly [Fri, 13 Mar 2026 15:18:11 +0000 (20:48 +0530)]

Merge PR #66985 into squid

* refs/pull/66985/head:
monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Fri, 13 Mar 2026 14:55:41 +0000 (20:25 +0530)]

Merge PR #67761 into squid

* refs/pull/67761/head:
qa: whitelist slow requests progress.yaml
qa: make test_progress atomically capture OSD marked in/out events

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>

Kamoltat (Junior) Sirivadhna [Fri, 6 Mar 2026 17:20:18 +0000 (17:20 +0000)]

qa: whitelist slow requests progress.yaml

Slow requests were reported because, during the test, 16 concurrent 4 MB writes were running while recovery and backfill were disabled. At the same time, osd.0 was marked out and then back in, causing PG remapping. Because recovery/backfill was disabled, some PGs could not restore their replicas after the remap, leaving them in degraded/remapped states. As a result, a batch of writes remained stuck in the replicated write path, leading to an I/O stall and slow ops being reported. The solution is to ignore this warning, as we are testing the progress module, not the write paths of OSDs. We intentionally disable backfill and recovery to prevent the recovery event from finishing quickly; we want to prolong it until the progress event pops up.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 6b0c943c8bd004665529c5c5786ecec42bcc9ff7)

Kamoltat (Junior) Sirivadhna [Wed, 4 Mar 2026 22:08:49 +0000 (22:08 +0000)]

qa: make test_progress atomically capture OSD marked in/out events

Problem:
Test had a race condition where events could complete and disappear
between checking the event count and fetching the event, causing
test failures.

Solution:
Refactor to atomically capture events during the wait condition check.
Added helper methods _wait_for_osd_marked_out_event() and
_wait_for_osd_marked_in_event() that capture events at the moment
they're detected, eliminating the race window.
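
The idea can be sketched in Python as below. This is not the code of the helper methods named above: `wait_for_event`, `fetch_events` and `predicate` are hypothetical names, and the event source is assumed to return a list of dicts on each poll. The point is that the event is returned by the same poll that detected it, so there is no second fetch that could find it gone.

```python
import time


def wait_for_event(fetch_events, predicate, timeout=5.0, period=0.01):
    """Poll until an event matches and return that very event.

    fetch_events is a hypothetical callable returning the current events;
    the matching event is captured inside the check itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        for event in fetch_events():
            if predicate(event):
                return event  # captured at the moment of detection
        if time.monotonic() >= deadline:
            raise TimeoutError("no matching event before timeout")
        time.sleep(period)
```

A racy variant would wait until the event count is non-zero and then fetch the events again; if the event completes between those two steps, the second fetch comes back empty and the test fails.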

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0ef66f6f2e1881061ecb49e457bb2b9061c0260b)

Patrick Donnelly [Thu, 12 Mar 2026 09:25:59 +0000 (14:55 +0530)]

Merge PR #66480 into squid

* refs/pull/66480/head:
mgr/dashboard: service creation fails if service name is same as service type

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Patrick Donnelly [Tue, 10 Mar 2026 09:33:47 +0000 (15:03 +0530)]

Merge PR #67582 into squid

* refs/pull/67582/head:
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

Reviewed-by: Ramana Raja <rraja@redhat.com>

Patrick Donnelly [Tue, 10 Mar 2026 09:28:36 +0000 (14:58 +0530)]

Merge PR #67580 into squid

* refs/pull/67580/head:
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

Reviewed-by: Ramana Raja <rraja@redhat.com>

Naman Munet [Fri, 21 Nov 2025 04:41:44 +0000 (10:11 +0530)]

mgr/dashboard: service creation fails if service name is same as service type

Fixes: https://tracker.ceph.com/issues/73948
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 57d081d6b5efcbeac6c60e73d50aa5f1f8cab560)

Kefu Chai [Tue, 3 Mar 2026 04:51:32 +0000 (12:51 +0800)]

mgr/orchestrator: make group parameter optional for nvmeof (squid)

Add default value for group parameter in nvmeof commands to maintain
backward compatibility with existing squid tests and deployments.

Context:
--------
On main branch, when commit 6bee4e10f7f added the group parameter, the
tests were subsequently updated to provide the group argument explicitly:

  Main test: ceph orch apply nvmeof foo default
  Expected: nvmeof.foo.default

However, on squid branch, the existing tests still use the older syntax
without specifying a group:

  Squid test: ceph orch apply nvmeof foo
  Expected: nvmeof.foo

The previous cherry-pick (e1612d048a1) fixed the service_id construction
logic to handle empty groups correctly, but the group parameter was still
required without a default value, causing "ceph orch apply nvmeof foo" to
fail with EINVAL (missing required argument).

This commit adds the missing default value (group: str = '') to make the
parameter optional, maintaining backward compatibility with existing squid
tests and user scripts that don't specify a group.

With both changes:
1. Cherry-picked e1612d048a1: service_id logic handles empty group
2. This commit: group parameter has default value ''

Result:
  "ceph orch apply nvmeof foo" works (creates nvmeof.foo)
  "ceph orch apply nvmeof foo mygroup" also works (creates nvmeof.foo.mygroup)
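
The combined behaviour can be illustrated with a small Python sketch. The function below is a hypothetical helper, not the orchestrator module's actual code; only the default value `group: str = ''` and the two expected names are taken from the message above.

```python
# Hypothetical helper for illustration; not the real orchestrator code.
def nvmeof_service_name(name: str, group: str = '') -> str:
    """Build the expected service name, omitting the group when empty."""
    service_id = f"{name}.{group}" if group else name
    return f"nvmeof.{service_id}"


print(nvmeof_service_name("foo"))             # nvmeof.foo
print(nvmeof_service_name("foo", "mygroup"))  # nvmeof.foo.mygroup
```

Without the default value, the first call would fail for lack of a required argument, which is the analogue of the EINVAL described above.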

Test: qa/suites/orch/cephadm/smoke-roleless/2-services/nvmeof.yaml
Fixes job 50373 failure from test run dgalloway-2026-02-13_23:06:25

Please note, this change was not cherry-picked from main branch, because
main intentionally still requires the CLI group argument for orch
apply/add nvmeof, and its tests were updated accordingly.
On squid, however, the earlier cherry-pick 6bee4e10 introduced the
required group parameter, but squid still has the old test/behavior
(ceph orch apply nvmeof foo expecting nvmeof.foo) and does not contain
the later main commits 3e5e85aadc1 and b377085c302.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Ilya Dryomov [Mon, 16 Feb 2026 21:24:47 +0000 (22:24 +0100)]

librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

The pointer is never initialized but gets printed by operator<<.
Luckily outside of that it's unused.

Fixes: https://tracker.ceph.com/issues/74971
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bffa11487cb7d68c0aa39994f50fbc3b4b00e415)

Nizamudeen A [Fri, 14 Mar 2025 07:10:45 +0000 (12:40 +0530)]

mgr/dashboard: add types for mgr-module list

also introducing a const for rgw

Fixes: https://tracker.ceph.com/issues/70331
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 3d6de8a669887c57711f176b3a75f2f2a635a23e)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.ts
- kept only the import that is relevant
src/pybind/mgr/dashboard/frontend/src/app/shared/api/mgr-module.service.ts
- same as above

Nizamudeen A [Wed, 5 Mar 2025 16:46:03 +0000 (22:16 +0530)]

mgr/dashboard: fix access control permissions for roles

Since prometheus is being used in the dashboard page we need to make
sure every role has prometheus read only access so that the dashboard
page can load the utilization metrics.

I also saw a permission issue with the osd settings endpoint when it's
trying to get the nearfull/full ratio. So instead of failing the entire
page, I am proceeding with a chart that doesn't have those details when
the user doesn't have permission to access the config opt.

The Multisite page was not accessible in the case of rgw-manager or
read-only user because it's trying to show the status of the rgw module.
This is also now gracefully handled to show the alert only when the user
has sufficient permission.

Fixes: https://tracker.ceph.com/issues/70331
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit f4bc03e4040ca32591d9b46b79309b162c3942db)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.ts
- kept changes only relevant to bug fix and ignored the other changes
like h/w monitoring
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.html
- ignored multisite wizard changes
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/administration/administration.component.html
- kept the current changes since carbon is not there in squid which
means this issue is not present
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.html
- kept the current changes for the same reason above
src/pybind/mgr/dashboard/services/access_control.py
- ignored the SMB role manager and kept only what's available in squid

Alexander Indenbaum [Mon, 23 Sep 2024 08:47:47 +0000 (08:47 +0000)]

pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id

- make service_id better aligned with default/empty group
(https://github.com/ceph/ceph/commit/f6d552d7c777f1160545188dcffa6b685b05ca8a)
- fix service_id in nvmeof daemon add

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
(cherry picked from commit e1612d048a102a716aaa8b5d0d91a45525828664)

Ilya Dryomov [Tue, 24 Feb 2026 11:46:35 +0000 (12:46 +0100)]

librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

If two instances of UnlinkPeerRequest race with each other (e.g. due
to rbd-mirror daemon unlinking from a previous mirror snapshot and the
user taking another mirror snapshot at the same time), the snapshot that
UnlinkPeerRequest was created for may be in the process of being removed
(which may mean trashed by SnapshotRemoveRequest::trash_snap()) or fully
removed by the time unlink_peer() grabs the image lock. Because trashed
snapshots weren't handled explicitly, UnlinkPeerRequest could spuriously
fail with EINVAL ("not mirror snapshot" case) instead of the expected
ENOENT ("missing snapshot" case). This in turn could lead to spurious
ImageReplayer failures with it stopping prematurely.
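
A rough Python sketch of the intended error mapping follows. It assumes, based only on the description above, that a trashed snapshot should be reported like a missing one; the snapshot kinds and the function name are hypothetical and do not correspond to librbd's real types.

```python
import errno

# Hypothetical snapshot kinds for illustration; not librbd's real types.
MIRROR, TRASH, USER = "mirror", "trash", "user"


def unlink_peer_check(snap_kind):
    """Return 0 if unlinking may proceed, else a negative errno.

    snap_kind is None when the snapshot is already fully removed.
    """
    if snap_kind is None or snap_kind == TRASH:
        # Removed, or in the process of being removed: "missing snapshot".
        return -errno.ENOENT
    if snap_kind != MIRROR:
        # Present but of the wrong kind: "not mirror snapshot".
        return -errno.EINVAL
    return 0
```

Without the explicit trash check, a trashed snapshot would fall through to the "not mirror snapshot" branch and yield EINVAL, which is the spurious failure described above.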

Fixes: https://tracker.ceph.com/issues/68279
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3596ca077097a4e0ff8e8d05a410c2044332391e)

Ilya Dryomov [Wed, 25 Feb 2026 10:37:16 +0000 (11:37 +0100)]

librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

ImageUpdateWatchers::flush() requests aren't tracked with
m_in_flight-like mechanism the way ImageUpdateWatchers::send_notify()
requests are, but in both cases callbacks that represent delayed work
that is very likely to (indirectly) reference ImageCtx are involved.
When the image is getting closed, ImageUpdateWatchers::shut_down() is
called before anything that belongs to ImageCtx is destroyed. However,
the shutdown can complete prematurely in the face of a pending flush if
one gets sent shortly before CloseRequest is invoked. The callback for
that flush will then race with CloseRequest and may execute after parts
of or even the entire ImageCtx is destroyed, leading to use-after-free
and various segfaults.
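
One general way to avoid this kind of race is to count pending requests and defer shutdown completion until the count drops to zero. The message does not spell out the actual fix, so the single-threaded Python sketch below only illustrates that general pattern; the class and method names are hypothetical.

```python
# Hypothetical sketch of in-flight tracking; not the librbd implementation.
class InFlightTracker:
    """Defer a shutdown callback until all tracked requests have finished."""

    def __init__(self):
        self._in_flight = 0
        self._on_shut_down = None

    def start_op(self):
        """Record that a request (e.g. a flush) has been sent."""
        self._in_flight += 1

    def finish_op(self):
        """Record that a request's callback has run."""
        self._in_flight -= 1
        self._maybe_complete()

    def shut_down(self, on_finish):
        """Request shutdown; on_finish runs once nothing is in flight."""
        self._on_shut_down = on_finish
        self._maybe_complete()

    def _maybe_complete(self):
        if self._on_shut_down is not None and self._in_flight == 0:
            callback, self._on_shut_down = self._on_shut_down, None
            callback()
```

With a flush still pending, shut_down() does not invoke its callback until finish_op() has been called, so nothing that the flush callback references can be destroyed first.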

Fixes: https://tracker.ceph.com/issues/75161
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ea6ee62aa339d1ad9976fdcc6e207a505f9bf44)

Patrick Donnelly [Fri, 27 Feb 2026 15:05:20 +0000 (10:05 -0500)]

Merge PR #67558 into squid

* refs/pull/67558/head:
test: disable known flaky tests in run-rbd-unit-tests

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Ilya Dryomov [Thu, 19 Feb 2026 14:45:39 +0000 (15:45 +0100)]

test: disable known flaky tests in run-rbd-unit-tests

The failures seem to be more frequent on newer hardware. In the
absence of immediate fixes, disable a few tests that have been known to
be flaky for a long time to avoid disrupting "make check" runs.

Fixes: https://tracker.ceph.com/issues/75163
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ecb616681b5347676198b33c80de60742bac7b69)

Patrick Donnelly [Thu, 15 Jan 2026 16:35:34 +0000 (11:35 -0500)]

doc: fetch releases from main branch

So we do not need to backport actual EOL dates.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 4e57701c59d43d8b2e51b99664ba529cbf9445a8)

Patrick Donnelly [Mon, 21 Apr 2025 15:20:46 +0000 (11:20 -0400)]

qa/workunits/fs/misc: remove data pool cleanup

This cleanup is at the very least incorrect as it can cause the MDS to throw
read-only errors because the data pool is removed before it can write out
backtraces.

We've not yet finalized a truly safe workflow to remove a data pool -- even
flushing the MDS journals first may not be enough (considering a large purge
queue).

Fixes: https://tracker.ceph.com/issues/70919
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit f45cf2d53729b95ed1257241efaaa97d3d63abe1)

Patrick Donnelly [Fri, 27 Sep 2024 00:39:40 +0000 (20:39 -0400)]

qa: do not fail cephfs QA tests for slow bluestore ops

Fixes: https://tracker.ceph.com/issues/68283
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 256b20de486337dde92fcb2067e0351ea6e67f54)

Patrick Donnelly [Wed, 25 Feb 2026 01:56:25 +0000 (20:56 -0500)]

Merge PR #67501 into squid

* refs/pull/67501/head:
doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Ville Ojamo [Mon, 5 Jan 2026 06:10:45 +0000 (13:10 +0700)]

doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

This is a proactive PR to avoid breaking docs builds when Setuptools 81
starts to be used in the RTD builds process.

The sphinxcontrib-seqdiag Python package is not compatible with
Setuptools 81 or later due to use of pkg_resources:
https://setuptools.pypa.io/en/latest/pkg_resources.html

Setuptools 81 release should be imminent, with the Python deprecation
warning stating pkg_resources "removal as early as 2025-11-30".

Seqdiag seems to be unmaintained, with the latest update on PyPI in
the year 2021 and also no updates to the seqdiag git repo.

There are no seqdiag directives left in the docs after the last seqdiags
were removed in PR #52308.

Two other options would exist for fixing the situation (see PR for
discussion) but this seems to be the suitable one.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit 15481e509b4d644d0644188501d86a4ceda2c039)

Conflicts:
doc/conf.py: formatting difference

Patrick Donnelly [Tue, 24 Feb 2026 21:46:02 +0000 (16:46 -0500)]

Merge PR #67497 into squid

* refs/pull/67497/head:
squid: qa: add missing .qa links

Reviewed-by: Yuri Weinstein <yweins@redhat.com>

Patrick Donnelly [Tue, 24 Feb 2026 21:39:48 +0000 (16:39 -0500)]

squid: qa: add missing .qa links

So that "Check for missing .qa links" passes.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

Ilya Dryomov [Sun, 8 Feb 2026 08:27:26 +0000 (09:27 +0100)]

qa: krbd_rxbounce.sh: do more reads to generate more errors

On faster hardware having each thread do 1024 reads isn't always
sufficient for the "two orders of magnitude" threshold that is used in
the test.

Fixes: https://tracker.ceph.com/issues/74712
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7f85b9ddb54cb3d83311daa7a72be27731be2806)

Patrick Donnelly [Wed, 18 Feb 2026 15:41:32 +0000 (10:41 -0500)]

Merge PR #64815 into squid

* refs/pull/64815/head:
The compilation of ISAL compress in the current code depends on the macro HAVE_NASM_X64_AVX2. However, the macro HAVE_NASM_X64_AVX2 has been removed, resulting in the compression not using ISAL even if the compressor_zlib_isal parameter is set to true.

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

Ilya Dryomov [Tue, 13 Jan 2026 19:32:14 +0000 (20:32 +0100)]

osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Commit 0cf383da0741 ("ReplicatedPG: clamp SPARSE_READ to object size
for ec pool") didn't handle the case of a sparse read that ends up
being empty correctly: the OSD encodes only an empty extent map whereas
clients (both userspace and kernel) also expect to see an empty data
buffer. IOW the reply contains one 32-bit zero instead of the expected
two.
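
The "one zero versus two" observation can be made concrete with a simplified Python sketch. The wire layout below is an assumption made for illustration (a little-endian 32-bit extent count, 64-bit offset/length pairs, then a 32-bit length-prefixed data buffer), and the function name is hypothetical.

```python
import struct


def encode_sparse_read_reply(extents, data):
    """Encode an extent map followed by a length-prefixed data buffer.

    Simplified, assumed layout: u32 extent count, (u64 offset, u64 length)
    per extent, then u32 data length and the data bytes.
    """
    out = struct.pack("<I", len(extents))
    for offset, length in extents:
        out += struct.pack("<QQ", offset, length)
    out += struct.pack("<I", len(data)) + data
    return out


empty_reply = encode_sparse_read_reply([], b"")
print(len(empty_reply))  # 8
```

Under this layout an empty sparse read encodes as two 32-bit zeros (8 bytes). A reply carrying only the empty extent map would be 4 bytes, leaving a client that expects the data buffer with nothing left to decode.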

Fixes: https://tracker.ceph.com/issues/74394
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c694c35bbfce6e3033b34fe6994b40b00fad11d9)

Ilya Dryomov [Tue, 11 Nov 2025 20:39:58 +0000 (21:39 +0100)]

qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9.  On Ubuntu 22.04 and Rocky 10, it looks
as follows:

Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
   at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
   by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
   by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
   by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
   by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
   by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
   by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
   by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
   by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
   by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
   by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)

The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.

[1] https://github.com/ceph/ceph/pull/61689#issuecomment-2650179891

Fixes: https://tracker.ceph.com/issues/74672
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d85265eab4f9344f0d08330f737ce9d7be32d716)

commit | commitdiff | tree

Ilya Dryomov [Tue, 23 Dec 2025 13:27:18 +0000 (14:27 +0100)]

librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT

The existing StandardPolicy, which is exposed as the RBD_LOCK_MODE_EXCLUSIVE
argument to rbd_lock_acquire(), disables automatic exclusive lock
transitions with "permanent" semantics: any request to release the lock
causes the peer to error out immediately.  Such a lock owner can
perform maintenance operations that are proxied from other peers, but
any write-like I/O issued by other peers will fail with EROFS.

This isn't suitable for use cases where one of the peers wants to
manage exclusive lock manually (i.e. rbd_lock_acquire() is used) but
the lock is acquired only for very short periods of time.  The rest of
the time the lock is expected to be held by other peers that stay in
the default "auto" mode (AutomaticPolicy) and run as usual, completely
unconcerned with each other or the manual-mode peer.  However, these
peers become acutely aware of the manual-mode peer because when it grabs
the lock with RBD_LOCK_MODE_EXCLUSIVE their I/O gets disrupted: higher
layers translate EROFS into generic EIO, filesystems shut down, etc.

Add a new TransientPolicy exposed as RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
to allow disabling automatic exclusive lock transitions with semantics
that would cause the other peers to block waiting for the lock to be
released by the manual-mode peer.  This is intended to be a low-level
interface -- no attempt to safeguard against potential misuse causing
e.g. indefinite blocking is made.

It's possible to switch between RBD_LOCK_MODE_EXCLUSIVE and
RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT modes of operation both while the
lock is held and after it's released.

Fixes: https://tracker.ceph.com/issues/73824
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8740544c51781211b82ac45d4bc93b9eb9623e76)
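
The difference between the two modes, as seen by another peer, can be sketched with a toy model in Python. It does not use librbd: the ToyLock class, its mode strings, and peer_write() are all hypothetical names that only mimic the behaviour the message describes (immediate EROFS versus blocking until release).

```python
import errno
import threading


class ToyLock:
    """Toy stand-in for a lock held by a manual-mode peer (not librbd code)."""

    def __init__(self, mode):
        # "exclusive" models the permanent semantics,
        # "exclusive_transient" models the new blocking semantics.
        self.mode = mode
        self._released = threading.Event()

    def release(self):
        # Called by the manual-mode peer when it is done with the lock.
        self._released.set()

    def peer_write(self, timeout=None):
        # Called by another peer that needs the lock for write-like I/O.
        if self._released.is_set():
            return "written"
        if self.mode == "exclusive":
            # Permanent semantics: the request errors out immediately.
            raise OSError(errno.EROFS, "lock owner refuses to release")
        # Transient semantics: block until the owner releases the lock.
        if not self._released.wait(timeout):
            raise TimeoutError("lock was never released")
        return "written"
```

Without a timeout, the transient branch waits forever if release() is never called, which mirrors the indefinite-blocking caveat in the message.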

commit | commitdiff | tree

Ilya Dryomov [Mon, 19 Jan 2026 16:43:41 +0000 (17:43 +0100)]

librbd: prepare lock_acquire() for changing between policies

In preparation for adding a new TransientPolicy, get rid of the check
implemented in terms of exclusive_lock::Policy::may_auto_request_lock()
that essentially makes it so that exclusive lock policy on a given
image handle can be changed from the default AutomaticPolicy only once.
To effect another change, a new image handle would have been
needed, which is pretty suboptimal.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1f9396ff8208accdf334c088745d2734225b34c1)

commit | commitdiff | tree

Ilya Dryomov [Mon, 22 Dec 2025 18:07:27 +0000 (19:07 +0100)]

librbd: fix RequestLockPayload log message in ImageWatcher

exclusive_lock::Policy::lock_requested() isn't guaranteed to queue
the release of exclusive lock (and in fact only one of the two existing
implementations does that). Instead of talking about the lock, log the
response to the notification.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ff89abf5ddcca91c34cfd46d288d41fe93ec38b0)

commit | commitdiff | tree

Ilya Dryomov [Mon, 22 Dec 2025 16:22:53 +0000 (17:22 +0100)]

librbd: amend error message in lock_acquire()

... since it went stale with commit 2914eef50d69 ("rbd: Changed
exclusive-lock implementation to use the new managed-lock"). In the
context of exclusive lock, requesting the lock refers to a specific
action which may or may not be performed as part of acquiring the lock
and lock_acquire() doesn't get visibility into that.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9159a2dd032bf47a030260660ff64b5be2d72648)

commit | commitdiff | tree

SrinivasaBharathKanta [Sat, 7 Feb 2026 00:28:52 +0000 (05:58 +0530)]

Merge pull request #66538 from Jayaprakash-ibm/wip-fix-cot-bz2404644-squid

squid: tools: handle get-attr as read-only ops in ceph_objectstore_tool

commit | commitdiff | tree

Ilya Dryomov [Fri, 6 Feb 2026 18:18:49 +0000 (19:18 +0100)]

Merge pull request #66296 from idryomov/wip-69492-squid

squid: rbd-mirror: add cluster fsid to remote meta cache key

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>

commit | commitdiff | tree

Yuri Weinstein [Fri, 6 Feb 2026 16:22:53 +0000 (08:22 -0800)]

Merge pull request #67162 from idryomov/wip-74676-squid

squid: qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Fri, 6 Feb 2026 16:22:09 +0000 (08:22 -0800)]

Merge pull request #67153 from idryomov/wip-74669-squid

squid: qa/workunits/rbd: reduce randomized sleeps in live import tests

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

commit | commitdiff | tree

Yuri Weinstein [Fri, 6 Feb 2026 16:21:20 +0000 (08:21 -0800)]

Merge pull request #67151 from idryomov/wip-74601-squid

squid: qa/workunits/rbd: adapt rbd_mirror.sh for trial nodes

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

commit | commitdiff | tree

Yuri Weinstein [Fri, 6 Feb 2026 16:20:38 +0000 (08:20 -0800)]

Merge pull request #66627 from idryomov/wip-74168-squid

squid: librbd: fix ExclusiveLock::accept_request() when !is_state_locked()

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Fri, 6 Feb 2026 16:19:46 +0000 (08:19 -0800)]

Merge pull request #64621 from idryomov/wip-71961-squid

squid: librbd: images aren't closed in group_snap_*_by_record() on error

Reviewed-by: Miki Patel <miki.patel132@gmail.com>

commit | commitdiff | tree

Neeraj Pratap Singh [Wed, 4 Feb 2026 12:12:27 +0000 (17:42 +0530)]

Merge pull request #59647 from mchangir/wip-67940-squid

squid: mgr/snap_schedule: correctly fetch mds_max_snaps_per_dir from mds

commit | commitdiff | tree

Neeraj Pratap Singh [Wed, 4 Feb 2026 12:08:08 +0000 (17:38 +0530)]

Merge pull request #65289 from dparmar18/wip-72504-squid

squid: client: fix unmount hang after lookups

commit | commitdiff | tree

David Galloway [Tue, 3 Feb 2026 21:21:17 +0000 (16:21 -0500)]

Merge pull request #67185 from kshtsk/wip-74593-squid

squid: qa/workunits/rgw: drop netstat usage

commit | commitdiff | tree

Kyr Shatskyy [Fri, 21 Nov 2025 21:20:04 +0000 (22:20 +0100)]

qa/workunits/rgw: drop netstat usage

`netstat` is deprecated on modern Linux and usually
requires an extra package to be installed.
That package is usually `net-tools`; however, on openSUSE,
for example, it does not include `netstat`. Thus, let us use `ss` as
an alternative.

When using `netstat -nltp` we get lines like:
'tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 25156/valgrind.bin \ntcp6 0 0 :::443 :::* LISTEN 25156/valgrind.bin \n'
When using `ss -nltp` we get lines like:
'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))'
so we need to filter processes by `memcheck`. However, the rest of the
parsing code works the same way as it did for netstat.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit 82063f99024a8937dfa105e0828beda1bc730247)
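
A minimal Python sketch of why the downstream parsing can stay the same: in both sample lines quoted above, the local address is the fourth whitespace-separated field. The helper name listening_ports and the constants are hypothetical and are not taken from the workunit itself.

```python
# Sample lines copied from the commit message above.
NETSTAT_OUTPUT = (
    "tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 25156/valgrind.bin \n"
    "tcp6 0 0 :::443 :::* LISTEN 25156/valgrind.bin \n"
)
SS_OUTPUT = (
    'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* '
    'users:(("memcheck-amd64-",pid=66045,fd=72))\n'
)


def listening_ports(output, process_marker):
    """Return the local ports of listeners whose line mentions process_marker."""
    ports = []
    for line in output.splitlines():
        if process_marker not in line:
            continue  # not the process we are looking for
        local_address = line.split()[3]  # same column in both tools' output
        # Split on the last colon so IPv6 forms such as ":::443" also work.
        ports.append(int(local_address.rsplit(":", 1)[1]))
    return ports
```

Only the process marker differs between the two tools: `valgrind` for the netstat output and `memcheck` for the ss output.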

commit | commitdiff | tree

Brad Hubbard [Mon, 2 Feb 2026 23:24:31 +0000 (09:24 +1000)]

Merge pull request #60567 from k0ste/wip-68781-squid

squid: osd: add clear_shards_repaired command

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>

commit | commitdiff | tree

Ilya Dryomov [Fri, 30 Jan 2026 15:32:35 +0000 (16:32 +0100)]

qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats

This stopped working in Python 3.12:

  Changed in version 3.12: Automatic conversion of non-integer types
  is no longer supported. Calls such as randrange(10.0) and
  randrange(Fraction(10, 1)) now raise a TypeError.

Fixes: https://tracker.ceph.com/issues/74676
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d663359fae135b2337e0ffbb86256768f61088c7)
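
As a hedged illustration of the Python 3.12 behaviour quoted above, the sketch below shows two ways to get a random delay from a float bound without passing a float to random.randrange(). The helper names are hypothetical, and this is not necessarily the change the commit made.

```python
import random


def random_delay(min_delay: float, max_delay: float) -> float:
    """Pick a float delay; random.uniform() accepts float bounds."""
    return random.uniform(min_delay, max_delay)


def random_whole_seconds(max_delay: float) -> int:
    """Pick an integer delay in [0, int(max_delay)].

    The float is converted explicitly because randrange() no longer
    converts non-integer arguments automatically in Python 3.12.
    """
    return random.randrange(int(max_delay) + 1)
```

Which variant fits depends on whether the caller needs fractional delays. The explicit int() truncates toward zero, so a bound of 2.9 yields values from 0 to 2.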

commit | commitdiff | tree

Ilya Dryomov [Thu, 29 Jan 2026 20:41:03 +0000 (21:41 +0100)]

qa/workunits/rbd: reduce randomized sleeps in live import tests

These tests were tuned for slower hardware than what we have now.
Currently "rbd migration execute" always finishes (successfully) before
the NBD server is killed.

Fixes: https://tracker.ceph.com/issues/74669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 592e3a9846b130c7321481f8b2bf9dba2fb05195)

Conflicts:
qa/workunits/rbd/cli_migration.sh [ commit afc89fdde80f
  ("qa/workunits/rbd: add test_import_nbd_stream_disconnected()")
  was originally skipped due to NBD stream not being in squid at
  the time ]

commit | commitdiff | tree

Ilya Dryomov [Wed, 28 Jan 2026 09:41:13 +0000 (10:41 +0100)]

qa/workunits/rbd: drop randomized sleeps in "big image" tests

These tests were tuned for slower hardware than what we have now.
Even without these the image is often 25-30% synced by the time the
test gets to the "non-primary snapshot in question is still being
synced" assert.

Fixes: https://tracker.ceph.com/issues/74601
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ec868d5ca2e56bd6003b002eb5f15d575edabd4e)

commit | commitdiff | tree

Ilya Dryomov [Tue, 27 Jan 2026 20:56:23 +0000 (21:56 +0100)]

qa/workunits/rbd: avoid unnecessary sleeping in stop_mirror()

There is no need to wait for anything if -KILL is passed for sig
because the process would disappear immediately. In teuthology runs
where multiple rbd-mirror daemons are deployed (and therefore need to
be stopped when stop_mirrors() is called by the test), it causes
gratuitous delays of 4+ seconds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f18fe5d15f72e34ab74b8ae187a47b52883fc780)

commit | commitdiff | tree

Ilya Dryomov [Thu, 29 Jan 2026 17:03:24 +0000 (18:03 +0100)]

Merge pull request #67074 from idryomov/wip-74513-squid

squid: qa: krbd_blkroset.t: eliminate a race in the open_count test

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

Ilya Dryomov [Thu, 29 Jan 2026 15:10:06 +0000 (16:10 +0100)]

Merge pull request #67076 from idryomov/wip-74529-squid

squid: qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

NitzanMordhai [Tue, 27 Jan 2026 07:23:52 +0000 (09:23 +0200)]

Merge pull request #66518 from aclamk/aclamk-ifed-fix-70390-squid

squid: os/bluestore: compact patch to fix extent map resharding

commit | commitdiff | tree

Ilya Dryomov [Fri, 23 Jan 2026 13:48:53 +0000 (14:48 +0100)]

qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Instead of hard-coding the block device name, use the block device that
is backing the filesystem that the test is running on. We can be quite
sure it won't be an RBD device ;)

Fixes: https://tracker.ceph.com/issues/74529
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2b5f0f4e7396114f9944a4987c38e18d4ecfbb1f)
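
The idea of looking up the device behind the current filesystem, rather than hard-coding a name, can be sketched in Python as follows. The test itself is not written in Python, and the helper names here are hypothetical. Filesystems without a real backing block device (tmpfs, overlayfs and similar) typically report major number 0, in which case the sysfs path will not exist.

```python
import os


def backing_device_numbers(path="."):
    """Return (major, minor) of the device holding the filesystem at path."""
    st_dev = os.stat(path).st_dev
    return os.major(st_dev), os.minor(st_dev)


def sysfs_block_path(path="."):
    """Build the /sys/dev/block/MAJOR:MINOR path for that device.

    The path is only constructed, not checked for existence.
    """
    major, minor = backing_device_numbers(path)
    return f"/sys/dev/block/{major}:{minor}"
```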

commit | commitdiff | tree

Ilya Dryomov [Wed, 21 Jan 2026 18:41:41 +0000 (19:41 +0100)]

qa: krbd_blkroset.t: eliminate a race in the open_count test

Even at QD=1, dd may take less than 10 seconds to work its way to the
end of a 10M image, producing "No space left on device" error instead
of the expected "Operation not permitted" error which is supposed to
arise from the device getting marked read-only while opened.

Fixes: https://tracker.ceph.com/issues/74513
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 006e47e9ca691deb377fb76f7a23b6feec874865)

commit | commitdiff | tree

Sridhar Seshasayee [Fri, 12 Sep 2025 08:08:30 +0000 (13:38 +0530)]

qa: Disable OSD benchmark from running for tests.

Disable OSD bench from benchmarking the OSDs for teuthology tests. This is to
help prevent a cluster warning pertaining to the IOPS value not lying within
a typical threshold range from being raised.

The tests can rely on the built-in static values as defined by
osd_mclock_max_capacity_iops_[ssd|hdd] which should be good enough.

Fixes: https://tracker.ceph.com/issues/74501
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit cf04790bede6be13cc27dcdf50fe19a953860321)

Conflicts:
qa/config/rados.yaml
- Removed a non-existing option under mon overrides

commit | commitdiff | tree

Naveen Naidu [Wed, 21 Jan 2026 04:38:43 +0000 (10:08 +0530)]

Merge pull request #66389 from Naveenaidu/wip-70082-squid

squid: mgr/telemetry: add stretch cluster data

Reviewed-by: Yaarit Hatuka <yhatuka@ibm.com>

commit | commitdiff | tree

Aashish Sharma [Mon, 5 Jan 2026 07:18:14 +0000 (12:48 +0530)]

monitoring: fix rgw_servers filtering in rgw sync overview grafana

Fix rgw daemon filtering in RGW Sync Overview --> Replication(Time) Delta per shard graph in grafana

Fixes: https://tracker.ceph.com/issues/74315
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit cd23e940c1412dc4153a244cf8f5777727f6477b)

commit | commitdiff | tree

Aashish Sharma [Wed, 17 Dec 2025 09:21:14 +0000 (14:51 +0530)]

monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Ceph 18.* adds a `cluster` label to all Prometheus metrics. When
upgrading from earlier releases, historical metrics lack this label
and are excluded by Grafana queries that strictly match on `cluster`.
Update the shared Grafana matcher logic to use a regex matcher that
also matches series without the `cluster` label, restoring visibility
of pre-upgrade metrics while preserving multi-cluster behavior.

Fixes: https://tracker.ceph.com/issues/74342
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit f7f74e455423feab79b33ac8ab472be0b98cb29d)

Conflicts:
monitoring/ceph-mixin/dashboards_out/ceph-application-overview.json (file not in squid)
monitoring/ceph-mixin/dashboards_out/ceph-cluster-advanced.json
(conflicts with $rate_interval in main)
monitoring/ceph-mixin/dashboards_out/ceph-cluster.json (missing
cluster label in metrics)
monitoring/ceph-mixin/dashboards_out/cephfsdashboard.json (file
not in squid)
monitoring/ceph-mixin/dashboards_out/multi-cluster-overview.json
(file not in squid)
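
The effect of a regex matcher that also accepts series without the `cluster` label can be modelled with a short Python sketch. It relies on two Prometheus rules: regex matchers are fully anchored, and a missing label is treated as an empty string. The pattern `ceph-a|`, the metric name, and the helper matcher_selects are illustrative assumptions, not the exact matcher used in the dashboards.

```python
import re


def matcher_selects(series_labels, label, pattern):
    """Model a Prometheus `label=~"pattern"` matcher on one series."""
    value = series_labels.get(label, "")  # absent label behaves like ""
    return re.fullmatch(pattern, value) is not None  # fully anchored match


# Illustrative series: one recorded before the upgrade, two after it.
pre_upgrade = {"__name__": "ceph_health_status"}
cluster_a = {"__name__": "ceph_health_status", "cluster": "ceph-a"}
cluster_b = {"__name__": "ceph_health_status", "cluster": "ceph-b"}

strict = "ceph-a"    # selects only series labelled cluster="ceph-a"
lenient = "ceph-a|"  # additionally selects series with no cluster label
```

With the lenient pattern, the unlabelled pre-upgrade series is selected alongside `ceph-a`, while `ceph-b` is still excluded, so multi-cluster filtering keeps working.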

commit | commitdiff | tree

bluikko [Mon, 19 Jan 2026 17:26:55 +0000 (00:26 +0700)]

Merge pull request #66970 from bluikko/wip-doc-2026-01-19-fix-63073-to-squid

squid: doc/cephadm: remove sections that do not apply to Squid in rgw.rst

commit | commitdiff | tree

Naveen Naidu [Mon, 13 Jan 2025 13:35:47 +0000 (19:05 +0530)]

qa/tasks/thrashosds-health: whitelist PG_BACKFILL_FULL

rados/thrash-old-clients tests are failing due to the
PG_BACKFILL_FULL error.

The low-space condition that hinders backfill is an expected
transitory state which resolves by itself when the PGs
are migrated out of the OSD during the test runs, freeing
up the needed space.

Yet, teuthology picks up these PG_BACKFILL_FULL errors
and fails the test.

The solution is to add these expected errors to the
ignore list.

Fixes: https://tracker.ceph.com/issues/65450
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
(cherry picked from commit 925bb875f23bd83559a70c5fb3c199373d1ea956)

commit | commitdiff | tree

Ville Ojamo [Mon, 19 Jan 2026 13:06:46 +0000 (20:06 +0700)]

doc/cephadm: remove sections that do not apply to Squid in rgw.rst

4949311 backported changes that do not apply to Squid.
PR #63073 body and the commit referenced therein as cherry-pick do not
correspond to the diff. Remove the additions that do not apply to Squid:

- Wildcard SAN feature in 3c24753 only since Tentacle.
- Shutdown delay feature in b84bb72 only since Tentacle.

The third feature doc addition is valid: d620ba6 (disable multisite sync
traffic) was backported to Squid in PR #61350 as commit 59b3f28. That
backport cherry-picked only the feature addition and missed the docs
commit 8878619. Leave this section in.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
