git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
7 weeks ago tools: respect set features when adding addresses 62061/head
Radosław Zarzyński [Tue, 8 Oct 2024 13:14:49 +0000 (15:14 +0200)]
tools: respect set features when adding addresses

Fixes: https://tracker.ceph.com/issues/53751
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
(cherry picked from commit 19545eb9864b002c1a37d4f2509d1b2baa833128)

7 weeks ago Merge PR #67527 into squid
Patrick Donnelly [Thu, 2 Apr 2026 12:39:39 +0000 (08:39 -0400)]
Merge PR #67527 into squid

* refs/pull/67527/head:
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
7 weeks ago Merge PR #67450 into squid
Patrick Donnelly [Tue, 31 Mar 2026 10:30:35 +0000 (16:00 +0530)]
Merge PR #67450 into squid

* refs/pull/67450/head:
qa/rgw: bucket notifications use pynose

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
8 weeks ago Merge PR #67575 into squid
Patrick Donnelly [Sat, 28 Mar 2026 07:31:52 +0000 (13:01 +0530)]
Merge PR #67575 into squid

* refs/pull/67575/head:
rgw/notification: fix reserved_size drift in 2pc_queue causing ENOSPC errors
rgw/notification: Prevent reserved_size leak by decrementing overhead on commit/abort.

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
8 weeks ago Merge PR #67398 into squid
Patrick Donnelly [Sat, 28 Mar 2026 07:27:51 +0000 (12:57 +0530)]
Merge PR #67398 into squid

* refs/pull/67398/head:
os/bluestore: Fix default base size for histogram

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
8 weeks ago Merge PR #67884 into squid
Patrick Donnelly [Sat, 28 Mar 2026 07:24:42 +0000 (12:54 +0530)]
Merge PR #67884 into squid

* refs/pull/67884/head:
qa/standalone: shorten bluefs test durations
qa/standalone: increase WAL volume size to 1GB
qa/standalone: fix bluefs expand test case

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
8 weeks ago Merge PR #67392 into squid
Patrick Donnelly [Sat, 28 Mar 2026 07:21:54 +0000 (12:51 +0530)]
Merge PR #67392 into squid

* refs/pull/67392/head:
test/encoding/readable: Add backward incompat checks
workunits/dencoder: use readable.sh script instead of python script

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 months ago Merge PR #66884 into squid
Patrick Donnelly [Wed, 25 Mar 2026 00:07:52 +0000 (20:07 -0400)]
Merge PR #66884 into squid

* refs/pull/66884/head:
Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #62454 into squid
Patrick Donnelly [Tue, 24 Mar 2026 15:10:51 +0000 (11:10 -0400)]
Merge PR #62454 into squid

* refs/pull/62454/head:
mgr/dashboard: add types for mgr-module list
mgr/dashboard: fix access control permissions for roles

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #67796 into squid
Patrick Donnelly [Thu, 19 Mar 2026 23:41:52 +0000 (19:41 -0400)]
Merge PR #67796 into squid

* refs/pull/67796/head:
qa/workunits/rbd: fix unbound variable in status()
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #67794 into squid
Patrick Donnelly [Thu, 19 Mar 2026 23:41:25 +0000 (19:41 -0400)]
Merge PR #67794 into squid

* refs/pull/67794/head:
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #67704 into squid
Patrick Donnelly [Thu, 19 Mar 2026 23:40:58 +0000 (19:40 -0400)]
Merge PR #67704 into squid

* refs/pull/67704/head:
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #66838 into squid
Patrick Donnelly [Thu, 19 Mar 2026 15:00:04 +0000 (11:00 -0400)]
Merge PR #66838 into squid

* refs/pull/66838/head:
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.
test/bluestore: add volume selector tests
os/bluestore:fix bluestore_volume_selection_reserved_factor usage
os/bluestore: print the first RocksDB level which doesn't fit into fast

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
2 months ago qa/standalone: shorten bluefs test durations 67884/head
Igor Fedotov [Mon, 9 Feb 2026 12:21:25 +0000 (15:21 +0300)]
qa/standalone: shorten bluefs test durations

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 5a901808dfe03dad5e34ef6374e34c0c03766e96)

2 months ago qa/standalone: increase WAL volume size to 1GB
Igor Fedotov [Mon, 9 Feb 2026 14:58:43 +0000 (17:58 +0300)]
qa/standalone: increase WAL volume size to 1GB

to avoid unexpected test case failures due to ENOSPC.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 2de79c64420ffba91becdf29f2d4f6b2d5931830)

2 months ago qa/standalone: fix bluefs expand test case
Igor Fedotov [Mon, 9 Feb 2026 12:19:52 +0000 (15:19 +0300)]
qa/standalone: fix bluefs expand test case

Fixes: https://tracker.ceph.com/issues/74525
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 9fc57f9ed1c61d54ca8ecd9e1b98782eee13848a)

2 months ago Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-servic... 66884/head
Dnyaneshwari Talwekar [Mon, 12 Jan 2026 07:53:59 +0000 (13:23 +0530)]
Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Fixes: https://tracker.ceph.com/issues/74376
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
2 months ago Merge PR #63344 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:31:29 +0000 (10:31 -0400)]
Merge PR #63344 into squid

* refs/pull/63344/head:
mgr/DaemonServer: fixed mistype for mgr_osd_messages

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #61417 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:25:33 +0000 (10:25 -0400)]
Merge PR #61417 into squid

* refs/pull/61417/head:
qa/cephfs: update ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #59688 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:23:07 +0000 (10:23 -0400)]
Merge PR #59688 into squid

* refs/pull/59688/head:
qa: some test set `refuse_client_session`, so the cluster log is expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #64686 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:19:00 +0000 (10:19 -0400)]
Merge PR #64686 into squid

* refs/pull/64686/head:
mon/MgrMonitor: add a space before "is already disabled"

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #65298 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:15:16 +0000 (10:15 -0400)]
Merge PR #65298 into squid

* refs/pull/65298/head:
qa/suites/upgrade: update ignorelist with cephfs specific warnings (under stress-split)
qa/suites/upgrade: add "Replacing daemon mds" to ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #65758 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:11:35 +0000 (10:11 -0400)]
Merge PR #65758 into squid

* refs/pull/65758/head:
.github: pin GH Actions to SHA-1 commit

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #66126 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:09:30 +0000 (10:09 -0400)]
Merge PR #66126 into squid

* refs/pull/66126/head:
qa: ignore cluster warning (evicting unresponsive ...) with tasks/mgr-osd-full

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago os/bluestore: rename row names in RocksDBBlueFSVolumeSelector. 66838/head
Igor Fedotov [Wed, 21 May 2025 08:30:15 +0000 (11:30 +0300)]
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a9f591f4e1cb1e364879165250c55cb0f841d64f)

2 months ago test/bluestore: add volume selector tests
Igor Fedotov [Mon, 19 May 2025 19:20:53 +0000 (22:20 +0300)]
test/bluestore: add volume selector tests

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 158d1550a021ed60e5ad1c565b247e5b0b6d5946)

 Conflicts:
   src/test/objectstore/CMakeLists.txt - allocsim not present in Squid

2 months ago os/bluestore:fix bluestore_volume_selection_reserved_factor usage
Igor Fedotov [Mon, 19 May 2025 19:19:45 +0000 (22:19 +0300)]
os/bluestore:fix bluestore_volume_selection_reserved_factor usage

Fixes: https://tracker.ceph.com/issues/71368
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 43d7864093f92977a3fd084bbfd65229244b1cc9)

2 months ago os/bluestore: print the first RocksDB level which doesn't fit into fast
Igor Fedotov [Tue, 4 Feb 2025 16:45:13 +0000 (19:45 +0300)]
os/bluestore: print the first RocksDB level which doesn't fit into fast
device by default.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit d95aa620b315d9261cb50b0465ecfd2b6b534a60)

2 months ago Merge PR #66915 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:04:29 +0000 (10:04 -0400)]
Merge PR #66915 into squid

* refs/pull/66915/head:
monc: synchronize tick() of MonClient with shutdown()

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 months ago Merge PR #66973 into squid
Patrick Donnelly [Wed, 18 Mar 2026 14:03:12 +0000 (10:03 -0400)]
Merge PR #66973 into squid

* refs/pull/66973/head:
qa/tasks/thrashosds-health: whitelist PG_BACKFILL_FULL

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #60391 into squid
Patrick Donnelly [Wed, 18 Mar 2026 13:59:32 +0000 (09:59 -0400)]
Merge PR #60391 into squid

* refs/pull/60391/head:
qa/cephfs: ignore when specific OSD is reported down during upgrade

2 months ago Merge PR #63026 into squid
Patrick Donnelly [Wed, 18 Mar 2026 13:57:44 +0000 (09:57 -0400)]
Merge PR #63026 into squid

* refs/pull/63026/head:
qa/workunits/cephtool: add extra privileges to cephtool script

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
2 months ago test/encoding/readable: Add backward incompat checks 67392/head
Nitzan Mordechai [Sun, 7 Dec 2025 09:06:14 +0000 (09:06 +0000)]
test/encoding/readable: Add backward incompat checks

The readable.sh script has forward incompat checks, but no
backward incompat checks.

This fix will:
 1. Add check for backward_incompat directory for each type for specific
    objects or all objects with the same type and skip those objects from being tested.
 2. Add version comparison helper functions (version_lt, version_le, version_ge,
    versions_span) for robust version handling
 3. Replace 'sort -n' with 'sort -V' for proper version number sorting
 4. Add CORPUS_PATH environment variable to allow teuthology tests to execute this script
 5. Improve readability of the script

The difference between backward and forward incompat:
- forward_incompat: Marks objects from older versions that newer ceph-dencoder
  versions cannot read. Example: Version 19.2.x objects marked incompat at version 20.2.x
  means ceph-dencoder v20.2.x+ can't decode them. Skip when testing old objects
  with a new ceph-dencoder.
- backward_incompat: Marks objects from newer versions that older ceph-dencoder
  versions cannot read. Example: Version 19.2.x objects marked backward_incompat at v19.2.x
  means ceph-dencoder < v19.2.x can't decode them. Skip when testing new objects
  with an old ceph-dencoder.

Fixes: https://tracker.ceph.com/issues/74074
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 011b25d8038e0f0bd3272fa57b0c7e068feb130c)
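
Note: the skip rule described in the commit message above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the contents of readable.sh (which is a shell script). The names `parse_version`, `version_lt`, `version_ge` and `should_skip` are hypothetical, versions are assumed to be dotted numeric strings, and `marker_version` stands for the version at which the object was marked incompatible.

```python
def parse_version(text):
    """Turn '19.2.3' or 'v19.2.3' into (19, 2, 3) so tuples compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def version_lt(a, b):
    """True if version a is strictly older than version b."""
    return parse_version(a) < parse_version(b)


def version_ge(a, b):
    """True if version a is the same as or newer than version b."""
    return not version_lt(a, b)


def should_skip(dencoder_version, marker_version, kind):
    """Decide whether an object should be skipped (assumed rule, see lead-in)."""
    if kind == "forward_incompat":
        # A dencoder at or past the marker version cannot read the old object.
        return version_ge(dencoder_version, marker_version)
    if kind == "backward_incompat":
        # A dencoder older than the marker version cannot read the new object.
        return version_lt(dencoder_version, marker_version)
    raise ValueError(f"unknown incompat kind: {kind}")


# Numeric-aware ordering, which is what 'sort -V' provides for version strings.
ordered = sorted(["19.2.10", "19.2.9"], key=parse_version)
```

Comparing the versions as plain strings would place "19.2.10" before "19.2.9", which is why a version-aware sort matters here.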

2 months ago workunits/dencoder: use readable.sh script instead of python script
NitzanMordhai [Mon, 2 Feb 2026 07:34:24 +0000 (07:34 +0000)]
workunits/dencoder: use readable.sh script instead of python script

The Python script test_readable.py was added for backward and forward
compatibility. Maintaining two scripts that ultimately do the same thing
is a waste, so revert to readable.sh and leave the Python script out.

https://tracker.ceph.com/issues/74074
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 9d289ed14e79fa8008ba30b77b425a4508030110)

2 months ago qa/rgw: bucket notifications use pynose 67450/head
Casey Bodley [Thu, 19 Feb 2026 15:09:44 +0000 (10:09 -0500)]
qa/rgw: bucket notifications use pynose

The nose incompatibility in multisite tests was fixed by switching to pynose
in https://github.com/ceph/teuthology/pull/1947, so I'm trying the same
here.

Fixes: https://tracker.ceph.com/issues/74573
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 915a5309a639333839829b5a554f3fdb6c560464)

2 months ago Merge PR #63018 into squid
Patrick Donnelly [Wed, 18 Mar 2026 00:37:05 +0000 (20:37 -0400)]
Merge PR #63018 into squid

* refs/pull/63018/head:
qa/workunits/fs/misc: remove data pool cleanup

2 months ago Merge PR #61302 into squid
Patrick Donnelly [Wed, 18 Mar 2026 00:35:59 +0000 (20:35 -0400)]
Merge PR #61302 into squid

* refs/pull/61302/head:
qa: do not fail cephfs QA tests for slow bluestore ops

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
2 months ago Merge PR #67001 into squid
Patrick Donnelly [Wed, 18 Mar 2026 00:26:22 +0000 (20:26 -0400)]
Merge PR #67001 into squid

* refs/pull/67001/head:
doc: fetch releases from main branch

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
2 months ago Merge PR #67623 into squid
Patrick Donnelly [Wed, 18 Mar 2026 00:22:23 +0000 (20:22 -0400)]
Merge PR #67623 into squid

* refs/pull/67623/head:
mgr/orchestrator: make group parameter optional for nvmeof (squid)
pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #66964 into squid
Patrick Donnelly [Tue, 17 Mar 2026 14:42:18 +0000 (10:42 -0400)]
Merge PR #66964 into squid

* refs/pull/66964/head:
monitoring: upgrade grafana version to 12.3.1

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #66990 into squid
Patrick Donnelly [Tue, 17 Mar 2026 14:41:38 +0000 (10:41 -0400)]
Merge PR #66990 into squid

* refs/pull/66990/head:
monitoring: fix rgw_servers filtering in rgw sync overview grafana

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago qa/workunits/rbd: fix unbound variable in status() 67796/head
Ilya Dryomov [Mon, 2 Mar 2026 11:07:48 +0000 (12:07 +0100)]
qa/workunits/rbd: fix unbound variable in status()

It was missed in commit 5fe64fa806f3 ("qa: rbd_mirror.sh: change
parameters to cluster rather than daemon name").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1a280b9a320d51bdc4cb80be9bdd6ae265151132)

2 months ago qa/workunits/rbd: short-circuit status() if "ceph -s" fails
Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]
qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 82717e43a08a1262987f5e271fd72d4433c4fb3b)

2 months ago qa: rbd_mirror_fsx_compare.sh doesn't error out as expected
Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2)
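
Note: the "~60-hour" figure in the commit message above can be checked with a short calculation. The sketch below uses only the numbers stated in the message (720 iterations, a 5-minute timeout, a 10-second sleep); the constant and function names are hypothetical. The 720 iterations are consistent with the 7200-second retry bound and 10-second step visible in the log excerpt, but that reading is an assumption.

```python
ITERATIONS = 720            # upper bound on loop iterations, from the commit message
RBD_LS_TIMEOUT_S = 5 * 60   # each "rbd ls" call times out after 5 minutes
SLEEP_S = 10                # sleep between iterations


def worst_case_hours(iterations=ITERATIONS,
                     timeout_s=RBD_LS_TIMEOUT_S,
                     sleep_s=SLEEP_S):
    """Worst-case wall-clock time of the wait loop, in hours."""
    return iterations * (timeout_s + sleep_s) / 3600


hours = worst_case_hours()  # 720 * 310 s = 223200 s = 62 hours
```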

2 months ago qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet 67794/head
Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") made it
required for a thrasher to have stop_and_join() method, but the
preceding commit a035b5a22fb8 ("thrashers: standardize stop and join
method names") failed to add it to rbd_mirror_thrash (whether as an
ad-hoc implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception which is something that practically
never happens.

Fixes: https://tracker.ceph.com/issues/75200
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ebe3a0a43251b0f126497d4100bd1af9ca8afc5)
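
Note: the failure mode described in the commit message above can be illustrated with a small Python sketch. It is not the teuthology code: `ThrasherBase`, `MirrorThrasher`, `AdHocThrasher` and `bark` are hypothetical stand-ins, and plain threads are used instead of greenlets to keep the example self-contained. The point is only that a watchdog which calls stop_and_join() on every thrasher breaks on any thrasher that never gained that method.

```python
import threading


class ThrasherBase:
    """Hypothetical shared base class that provides stop_and_join()."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Keep thrashing until asked to stop; wait() returns True once set.
        while not self._stop_event.wait(0.01):
            self.thrash_once()

    def thrash_once(self):
        raise NotImplementedError

    def stop_and_join(self):
        self._stop_event.set()
        self._thread.join()


class MirrorThrasher(ThrasherBase):
    """Inherits stop_and_join() from the base class."""

    def __init__(self):
        super().__init__()
        self.rounds = 0

    def thrash_once(self):
        self.rounds += 1


class AdHocThrasher:
    """A thrasher written without the base class: no stop_and_join()."""

    def stop(self):
        pass


def bark(thrashers):
    """Hypothetical watchdog action: stop every registered thrasher."""
    for thrasher in thrashers:
        thrasher.stop_and_join()
```

Calling `bark()` on a list containing `AdHocThrasher()` raises AttributeError, which is the kind of breakage that inheriting from a common base avoids.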

2 months ago Merge PR #66829 into squid
Patrick Donnelly [Fri, 13 Mar 2026 16:02:49 +0000 (21:32 +0530)]
Merge PR #66829 into squid

* refs/pull/66829/head:
monitoring: fix CephPgImbalance alert rule expression

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #66897 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:36:24 +0000 (21:06 +0530)]
Merge PR #66897 into squid

* refs/pull/66897/head:
common: drop stack singleton object of temp messenger for foreground ceph daemons

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 months ago Merge PR #67066 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:34:48 +0000 (21:04 +0530)]
Merge PR #67066 into squid

* refs/pull/67066/head:
qa: Disable OSD benchmark from running for tests.

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 months ago Merge PR #67278 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:32:44 +0000 (21:02 +0530)]
Merge PR #67278 into squid

* refs/pull/67278/head:
librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
librbd: prepare lock_acquire() for changing between policies
librbd: fix RequestLockPayload log message in ImageWatcher
librbd: amend error message in lock_acquire()

Reviewed-by: Ramana Raja <rraja@redhat.com>
2 months ago Merge PR #67280 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:28:10 +0000 (20:58 +0530)]
Merge PR #67280 into squid

* refs/pull/67280/head:
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months ago Merge PR #67356 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:23:54 +0000 (20:53 +0530)]
Merge PR #67356 into squid

* refs/pull/67356/head:
osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 months ago Merge PR #67454 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:22:30 +0000 (20:52 +0530)]
Merge PR #67454 into squid

* refs/pull/67454/head:
qa: krbd_rxbounce.sh: do more reads to generate more errors

Reviewed-by: Ramana Raja <rraja@redhat.com>
2 months ago Merge PR #66985 into squid
Patrick Donnelly [Fri, 13 Mar 2026 15:18:11 +0000 (20:48 +0530)]
Merge PR #66985 into squid

* refs/pull/66985/head:
monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #67761 into squid
Patrick Donnelly [Fri, 13 Mar 2026 14:55:41 +0000 (20:25 +0530)]
Merge PR #67761 into squid

* refs/pull/67761/head:
qa: whitelist slow requests progress.yaml
qa: make test_progress atomically capture OSD marked in/out events

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>
2 months ago qa: whitelist slow requests progress.yaml 67761/head
Kamoltat (Junior) Sirivadhna [Fri, 6 Mar 2026 17:20:18 +0000 (17:20 +0000)]
qa: whitelist slow requests progress.yaml

Slow requests appeared because, during the test, 16 concurrent 4 MB
writes were running while recovery and backfill were disabled. At the
same time, osd.0 was marked out and then back in, causing PG remapping.
With recovery/backfill disabled, some PGs could not restore their
replicas after the remap and were left in degraded/remapped states. As a
result, a batch of writes remained stuck in the replicated write path,
leading to an IO stall and slow ops being reported.

The solution is to ignore these warnings, since we are testing the
progress module, not the write paths of OSDs. We intentionally disable
backfill and recovery to prevent the recovery event from finishing
quickly; we want to prolong it until the progress event pops up.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 6b0c943c8bd004665529c5c5786ecec42bcc9ff7)

2 months ago qa: make test_progress atomically capture OSD marked in/out events
Kamoltat (Junior) Sirivadhna [Wed, 4 Mar 2026 22:08:49 +0000 (22:08 +0000)]
qa: make test_progress atomically capture OSD marked in/out events

Problem:
Test had a race condition where events could complete and disappear
between checking the event count and fetching the event, causing
test failures.

Solution:
Refactor to atomically capture events during the wait condition check.
Added helper methods _wait_for_osd_marked_out_event() and
_wait_for_osd_marked_in_event() that capture events at the moment
they're detected, eliminating the race window.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0ef66f6f2e1881061ecb49e457bb2b9061c0260b)
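
Note: the race described in the commit message above, and the "capture during the wait condition" remedy, can be sketched in Python as follows. This is an illustration only: `FlakyProgress`, `wait_until_true`, `racy_fetch` and `wait_for_marked_out_event` are hypothetical names, and the event source is a fake in which an event is visible for exactly one poll.

```python
import time


def wait_until_true(predicate, timeout=1.0, period=0.01):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(period)
    raise TimeoutError("condition not met")


class FlakyProgress:
    """Hypothetical event source: the event shows up on the first poll only."""

    def __init__(self):
        self.polls = 0

    def events(self):
        self.polls += 1
        if self.polls == 1:
            return [{"message": "Rebalancing after osd.0 marked out"}]
        return []


def racy_fetch(source):
    """Check first, fetch later: the event can vanish between the two polls."""
    wait_until_true(lambda: len(source.events()) > 0)
    return source.events()[0]


def wait_for_marked_out_event(source):
    """Capture the event inside the wait predicate, closing the race window."""
    captured = []

    def check():
        matches = [e for e in source.events() if "marked out" in e["message"]]
        if matches:
            captured.append(matches[0])
            return True
        return False

    wait_until_true(check)
    return captured[0]
```

With this fake source, `racy_fetch()` fails with IndexError on its second poll, while `wait_for_marked_out_event()` returns the event it saw during the check.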

2 months ago Merge PR #66480 into squid
Patrick Donnelly [Thu, 12 Mar 2026 09:25:59 +0000 (14:55 +0530)]
Merge PR #66480 into squid

* refs/pull/66480/head:
mgr/dashboard: service creation fails if service name is same as service type

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months ago Merge PR #67582 into squid
Patrick Donnelly [Tue, 10 Mar 2026 09:33:47 +0000 (15:03 +0530)]
Merge PR #67582 into squid

* refs/pull/67582/head:
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

Reviewed-by: Ramana Raja <rraja@redhat.com>
2 months ago Merge PR #67580 into squid
Patrick Donnelly [Tue, 10 Mar 2026 09:28:36 +0000 (14:58 +0530)]
Merge PR #67580 into squid

* refs/pull/67580/head:
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

Reviewed-by: Ramana Raja <rraja@redhat.com>
2 months ago mgr/dashboard: service creation fails if service name is same as service type 66480/head
Naman Munet [Fri, 21 Nov 2025 04:41:44 +0000 (10:11 +0530)]
mgr/dashboard: service creation fails if service name is same as service type

Fixes: https://tracker.ceph.com/issues/73948
Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit 57d081d6b5efcbeac6c60e73d50aa5f1f8cab560)

2 months ago mgr/orchestrator: make group parameter optional for nvmeof (squid) 67623/head
Kefu Chai [Tue, 3 Mar 2026 04:51:32 +0000 (12:51 +0800)]
mgr/orchestrator: make group parameter optional for nvmeof (squid)

Add default value for group parameter in nvmeof commands to maintain
backward compatibility with existing squid tests and deployments.

Context:
--------
On main branch, when commit 6bee4e10f7f added the group parameter, the
tests were subsequently updated to provide the group argument explicitly:

  Main test: ceph orch apply nvmeof foo default
  Expected: nvmeof.foo.default

However, on squid branch, the existing tests still use the older syntax
without specifying a group:

  Squid test: ceph orch apply nvmeof foo
  Expected: nvmeof.foo

The previous cherry-pick (e1612d048a1) fixed the service_id construction
logic to handle empty groups correctly, but the group parameter was still
required without a default value, causing "ceph orch apply nvmeof foo" to
fail with EINVAL (missing required argument).

This commit adds the missing default value (group: str = '') to make the
parameter optional, maintaining backward compatibility with existing squid
tests and user scripts that don't specify a group.

With both changes:
1. Cherry-picked e1612d048a1: service_id logic handles empty group
2. This commit: group parameter has default value ''

Result:
  "ceph orch apply nvmeof foo" works (creates nvmeof.foo)
  "ceph orch apply nvmeof foo mygroup" also works (creates nvmeof.foo.mygroup)

Test: qa/suites/orch/cephadm/smoke-roleless/2-services/nvmeof.yaml
Fixes job 50373 failure from test run dgalloway-2026-02-13_23:06:25

Please note that this change was not cherry-picked from the main branch,
because main intentionally still requires the CLI group argument for orch
apply/add nvmeof, and its tests were updated accordingly.
On squid, however, the earlier cherry-pick 6bee4e10 introduced the
required group parameter, but squid still has the old test/behavior
(ceph orch apply nvmeof foo expecting nvmeof.foo) and does not contain
the later main commits 3e5e85aadc1 and b377085c302.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
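
Note: the effect of the default value described in the commit message above can be sketched in Python. The only detail taken from the message is the default itself (group: str = ''); the function names below are hypothetical and are not the orchestrator module's API, and the TypeError raised by the strict variant merely stands in for the EINVAL reported by the CLI.

```python
def nvmeof_service_id(pool: str, group: str = "") -> str:
    """Build the service_id; an empty group is left out of the id."""
    return f"{pool}.{group}" if group else pool


def service_name(pool: str, group: str = "") -> str:
    """Optional group: 'apply nvmeof foo' and 'apply nvmeof foo mygroup' both work."""
    return "nvmeof." + nvmeof_service_id(pool, group)


def service_name_required_group(pool: str, group: str) -> str:
    """Behaviour before the change: group has no default and must be passed."""
    return "nvmeof." + nvmeof_service_id(pool, group)
```

Here `service_name("foo")` yields `nvmeof.foo` and `service_name("foo", "mygroup")` yields `nvmeof.foo.mygroup`, matching the results listed in the message, while `service_name_required_group("foo")` fails because the argument is missing.
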
2 months ago librbd/cache/pwl: WriteLogOperationSet::cell can be garbage 67704/head
Ilya Dryomov [Mon, 16 Feb 2026 21:24:47 +0000 (22:24 +0100)]
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

The pointer is never initialized but gets printed by operator<<.
Luckily outside of that it's unused.

Fixes: https://tracker.ceph.com/issues/74971
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bffa11487cb7d68c0aa39994f50fbc3b4b00e415)

2 months ago mgr/dashboard: add types for mgr-module list 62454/head
Nizamudeen A [Fri, 14 Mar 2025 07:10:45 +0000 (12:40 +0530)]
mgr/dashboard: add types for mgr-module list

also introducing a const for rgw

Fixes: https://tracker.ceph.com/issues/70331
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 3d6de8a669887c57711f176b3a75f2f2a635a23e)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.ts
 - kept only the import that is relevant
src/pybind/mgr/dashboard/frontend/src/app/shared/api/mgr-module.service.ts
 - same as above

2 months ago mgr/dashboard: fix access control permissions for roles
Nizamudeen A [Wed, 5 Mar 2025 16:46:03 +0000 (22:16 +0530)]
mgr/dashboard: fix access control permissions for roles

Since prometheus is being used in the dashboard page we need to make
sure every role has prometheus read only access so that the dashboard
page can load the utilization metrics.

I also saw a permission issue with the osd settings endpoint when it's
trying to get the nearfull/full ratio. So instead of failing the entire
page, I am proceeding with a chart that doesn't have those details when
the user doesn't have permission to access the config opt.

The Multisite page was not accessible in the case of rgw-manager or
read-only user because it's trying to show the status of the rgw module.
This is also now gracefully handled to show the alert only when the user
has sufficient permission.

Fixes: https://tracker.ceph.com/issues/70331
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit f4bc03e4040ca32591d9b46b79309b162c3942db)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.ts
 - kept only the changes relevant to the bug fix and ignored the other changes
   like h/w monitoring
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-details/rgw-multisite-details.component.html
 - ignored multisite wizard changes
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/administration/administration.component.html
 - kept the current changes since carbon is not there in squid which
   means this issue is not present
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.html
 - kept the current changes for the same reason above
src/pybind/mgr/dashboard/services/access_control.py
 - ignored the SMB role manager and kept only what's available in squid

2 months ago pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id
Alexander Indenbaum [Mon, 23 Sep 2024 08:47:47 +0000 (08:47 +0000)]
pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id

- make service_id better aligned with default/empty group
  (https://github.com/ceph/ceph/commit/f6d552d7c777f1160545188dcffa6b685b05ca8a)
- fix service_id in nvmeof daemon add

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
(cherry picked from commit e1612d048a102a716aaa8b5d0d91a45525828664)

2 months ago librbd/mirror: detect trashed snapshots in UnlinkPeerRequest 67582/head
Ilya Dryomov [Tue, 24 Feb 2026 11:46:35 +0000 (12:46 +0100)]
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

If two instances of UnlinkPeerRequest race with each other (e.g. due
to rbd-mirror daemon unlinking from a previous mirror snapshot and the
user taking another mirror snapshot at same time), the snapshot that
UnlinkPeerRequest was created for may be in the process of being removed
(which may mean trashed by SnapshotRemoveRequest::trash_snap()) or fully
removed by the time unlink_peer() grabs the image lock.  Because trashed
snapshots weren't handled explicitly, UnlinkPeerRequest could spuriously
fail with EINVAL ("not mirror snapshot" case) instead of the expected
ENOENT ("missing snapshot" case).  This in turn could lead to spurious
ImageReplayer failures with it stopping prematurely.

Fixes: https://tracker.ceph.com/issues/68279
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3596ca077097a4e0ff8e8d05a410c2044332391e)

2 months ago librbd: don't complete ImageUpdateWatchers::shut_down() prematurely 67580/head
Ilya Dryomov [Wed, 25 Feb 2026 10:37:16 +0000 (11:37 +0100)]
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

ImageUpdateWatchers::flush() requests aren't tracked with
m_in_flight-like mechanism the way ImageUpdateWatchers::send_notify()
requests are, but in both cases callbacks that represent delayed work
that is very likely to (indirectly) reference ImageCtx are involved.
When the image is getting closed, ImageUpdateWatchers::shut_down() is
called before anything that belongs to ImageCtx is destroyed.  However,
the shutdown can complete prematurely in the face of a pending flush if
one gets sent shortly before CloseRequest is invoked.  The callback for
that flush will then race with CloseRequest and may execute after parts
of or even the entire ImageCtx is destroyed, leading to use-after-free
and various segfaults.

Fixes: https://tracker.ceph.com/issues/75161
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ea6ee62aa339d1ad9976fdcc6e207a505f9bf44)
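
Note: the general idea of not completing a shutdown while a callback is still pending can be sketched with an in-flight counter. The Python model below is single-threaded and purely illustrative: the class and method names are hypothetical, and it is one possible way to defer shutdown completion, not a description of how the actual librbd fix is implemented.

```python
class Watchers:
    """Toy model: shutdown completes only after all pending callbacks ran."""

    def __init__(self):
        self.in_flight = 0       # operations whose callbacks are still pending
        self.on_shutdown = None  # completion callback for a requested shutdown
        self.log = []

    def start_op(self):
        """Register a pending operation (for example, a flush)."""
        self.in_flight += 1

    def finish_op(self, name):
        """Run when the operation's callback fires."""
        self.log.append(name)
        self.in_flight -= 1
        self._maybe_finish_shutdown()

    def shut_down(self, on_finish):
        """Request shutdown; completion is deferred while work is pending."""
        self.on_shutdown = on_finish
        self._maybe_finish_shutdown()

    def _maybe_finish_shutdown(self):
        if self.on_shutdown is not None and self.in_flight == 0:
            callback, self.on_shutdown = self.on_shutdown, None
            callback()


watchers = Watchers()
watchers.start_op()                                        # a flush is pending
watchers.shut_down(lambda: watchers.log.append("shutdown complete"))
pending_log = list(watchers.log)                           # nothing completed yet
watchers.finish_op("flush callback")                       # flush callback fires
```

In this model the shutdown callback always runs after the flush callback, so nothing the flush callback touches has been torn down yet.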

2 months agorgw/notification: fix reserved_size drift in 2pc_queue causing ENOSPC errors 67575/head
Krunal Chheda [Tue, 10 Feb 2026 21:01:03 +0000 (16:01 -0500)]
rgw/notification: fix reserved_size drift in 2pc_queue causing ENOSPC errors

The urgent_data.reserved_size field was accumulating incorrect values over time due to a mismatch between what was added during reserve() and what was subtracted during commit()/abort(). This caused the reserved_size to grow unbounded, eventually hitting the queue capacity limit and returning ENOSPC errors even when the queue had plenty of actual space.

Solution:
Add a one-time self-healing capability: the reservation value is recalculated during reserve() and the counter is updated with the correct value.
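
A toy model of the accounting may help illustrate the idea.  It is a sketch
under stated assumptions, not the cls_2pc_queue code: ToyQueue,
ENTRY_OVERHEAD and finish() are hypothetical, committed data is ignored,
and the counter is recomputed on every reserve() rather than only once.

```python
import errno

ENTRY_OVERHEAD = 8  # hypothetical per-entry bookkeeping cost, in bytes


class ToyQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved_size = 0   # running counter that may have drifted
        self.reservations = {}   # reservation id -> reserved bytes
        self._next_id = 1

    def reserve(self, size, entries):
        # Self-heal: recompute the counter from outstanding reservations.
        actual = sum(self.reservations.values())
        if self.reserved_size != actual:
            self.reserved_size = actual
        needed = size + entries * ENTRY_OVERHEAD
        if self.reserved_size + needed > self.capacity:
            raise OSError(errno.ENOSPC, "no space left in queue")
        res_id = self._next_id
        self._next_id += 1
        self.reservations[res_id] = needed
        self.reserved_size += needed
        return res_id

    def finish(self, res_id):
        # commit() and abort() both give back payload *and* overhead.
        self.reserved_size -= self.reservations.pop(res_id)


q = ToyQueue(capacity=100)
q.reserved_size = 96          # simulate a counter that drifted upward
res = q.reserve(size=10, entries=1)
healed = q.reserved_size      # counter after recomputation and reserve
q.finish(res)
print(healed, q.reserved_size)
```

Without the recomputation step the reserve() call above would fail with
ENOSPC even though no reservation is actually outstanding.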

Signed-off-by: Krunal Chheda <kchheda3@bloomberg.net>
(cherry picked from commit 7f4eaee30cba6efd3e0acc5b3c315c182a3bc8d9)

2 months agorgw/notification: Prevent reserved_size leak by decrementing overhead on commit/abort.
Krunal Chheda [Mon, 2 Feb 2026 21:04:52 +0000 (16:04 -0500)]
rgw/notification: Prevent reserved_size leak by decrementing overhead on commit/abort.

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
(cherry picked from commit 00ad83d3ab23b7da8e9fc1813f7214ffd153e314)

2 months agoMerge PR #67558 into squid
Patrick Donnelly [Fri, 27 Feb 2026 15:05:20 +0000 (10:05 -0500)]
Merge PR #67558 into squid

* refs/pull/67558/head:
test: disable known flaky tests in run-rbd-unit-tests

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months agotest: disable known flaky tests in run-rbd-unit-tests 67558/head
Ilya Dryomov [Thu, 19 Feb 2026 14:45:39 +0000 (15:45 +0100)]
test: disable known flaky tests in run-rbd-unit-tests

The failures seem to be more frequent on newer hardware.  In the
absence of immediate fixes, disable a few tests that have been known to
be flaky for a long time to avoid disrupting "make check" runs.

Fixes: https://tracker.ceph.com/issues/75163
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ecb616681b5347676198b33c80de60742bac7b69)

2 months agomgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon... 67527/head
Cory Snyder [Fri, 19 Apr 2024 15:42:00 +0000 (15:42 +0000)]
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state

Reverts the change from https://github.com/ceph/ceph/pull/53993
and directly clears daemon health metrics for down and out OSDs.
The former approach of removing down/out OSDs from the daemon
state has undesirable consequences for stat output, including
the prometheus exporter.

Fixes: https://tracker.ceph.com/issues/66168
Fixes: https://tracker.ceph.com/issues/70164 - squid backport
Signed-off-by: Cory Snyder <csnyder@1111systems.com>
(cherry picked from commit 282558cf40274366360bb3b1ec0fa102fbb592a6)

2 months agodoc: fetch releases from main branch 67001/head
Patrick Donnelly [Thu, 15 Jan 2026 16:35:34 +0000 (11:35 -0500)]
doc: fetch releases from main branch

So we do not need to backport actual EOL dates.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 4e57701c59d43d8b2e51b99664ba529cbf9445a8)

2 months agoqa/workunits/fs/misc: remove data pool cleanup 63018/head
Patrick Donnelly [Mon, 21 Apr 2025 15:20:46 +0000 (11:20 -0400)]
qa/workunits/fs/misc: remove data pool cleanup

This cleanup is at the very least incorrect as it can cause the MDS to throw
read-only errors because the data pool is removed before it can write out
backtraces.

We've not yet finalized a truly safe workflow to remove a data pool -- even
flushing the MDS journals first may not be enough (considering a large purge
queue).

Fixes: https://tracker.ceph.com/issues/70919
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit f45cf2d53729b95ed1257241efaaa97d3d63abe1)

2 months agoqa: do not fail cephfs QA tests for slow bluestore ops 61302/head
Patrick Donnelly [Fri, 27 Sep 2024 00:39:40 +0000 (20:39 -0400)]
qa: do not fail cephfs QA tests for slow bluestore ops

Fixes: https://tracker.ceph.com/issues/68283
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 256b20de486337dde92fcb2067e0351ea6e67f54)

2 months agoMerge PR #67501 into squid
Patrick Donnelly [Wed, 25 Feb 2026 01:56:25 +0000 (20:56 -0500)]
Merge PR #67501 into squid

* refs/pull/67501/head:
doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2 months agodoc: Remove sphinxcontrib-seqdiag Python package from RTD builds 67501/head
Ville Ojamo [Mon, 5 Jan 2026 06:10:45 +0000 (13:10 +0700)]
doc: Remove sphinxcontrib-seqdiag Python package from RTD builds

This is a proactive PR to avoid breaking docs builds when Setuptools 81
starts to be used in the RTD builds process.

The sphinxcontrib-seqdiag Python package is not compatible with
Setuptools 81 or later due to use of pkg_resources:
https://setuptools.pypa.io/en/latest/pkg_resources.html

Setuptools 81 release should be imminent, with the Python deprecation
warning stating pkg_resources "removal as early as 2025-11-30".

Seqdiag seems to be unmaintained: the latest update on PyPI dates from
2021 and there have been no updates to the seqdiag git repo either.

There are no seqdiag directives left in the docs after the last seqdiags
were removed in PR #52308.

Two other options exist for fixing the situation (see PR for
discussion) but this seems to be the most suitable one.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit 15481e509b4d644d0644188501d86a4ceda2c039)

Conflicts:
doc/conf.py: formatting difference

2 months agoMerge PR #67497 into squid
Patrick Donnelly [Tue, 24 Feb 2026 21:46:02 +0000 (16:46 -0500)]
Merge PR #67497 into squid

* refs/pull/67497/head:
squid: qa: add missing .qa links

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2 months agosquid: qa: add missing .qa links 67497/head
Patrick Donnelly [Tue, 24 Feb 2026 21:39:48 +0000 (16:39 -0500)]
squid: qa: add missing .qa links

So that "Check for missing .qa links" passes.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
3 months agoqa: krbd_rxbounce.sh: do more reads to generate more errors 67454/head
Ilya Dryomov [Sun, 8 Feb 2026 08:27:26 +0000 (09:27 +0100)]
qa: krbd_rxbounce.sh: do more reads to generate more errors

On faster hardware having each thread do 1024 reads isn't always
sufficient for the "two orders of magnitude" threshold that is used in
the test.

Fixes: https://tracker.ceph.com/issues/74712
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7f85b9ddb54cb3d83311daa7a72be27731be2806)

3 months agoos/bluestore: Fix default base size for histogram 67398/head
Adam Kupczyk [Thu, 6 Feb 2025 12:37:21 +0000 (12:37 +0000)]
os/bluestore: Fix default base size for histogram

Use allocator's unit size instead of 4096 as default value.
This makes "bluefs-db" and "bluefs-wal" histograms work on defaults.

+ Fixed error printout

Fixes: https://tracker.ceph.com/issues/69855
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 55198cfa05bb033e2de5da120b2ccf922318ea3e)

3 months agoMerge PR #64815 into squid
Patrick Donnelly [Wed, 18 Feb 2026 15:41:32 +0000 (10:41 -0500)]
Merge PR #64815 into squid

* refs/pull/64815/head:
The compilation of ISAL compress in the current code depends on the macro HAVE_NASM_X64_AVX2. However, the macro HAVE_NASM_X64_AVX2 has been removed, resulting in the compression not using ISAL even if the compressor_zlib_isal parameter is set to true.

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
3 months agoosd/PrimaryLogPG: encode an empty data_bl for empty sparse reads 67356/head
Ilya Dryomov [Tue, 13 Jan 2026 19:32:14 +0000 (20:32 +0100)]
osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Commit 0cf383da0741 ("ReplicatedPG: clamp SPARSE_READ to object size
for ec pool") didn't handle the case of a sparse read that ends up
being empty correctly: the OSD encodes only an empty extent map whereas
clients (both userspace and kernel) also expect to see an empty data
buffer.  IOW the reply contains one 32-bit zero instead of the expected
two.
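
The "one 32-bit zero instead of two" observation can be illustrated with a
small encoder.  This is a sketch and not the OSD code: it assumes a
little-endian layout in which the extent map is a 32-bit count followed by
64-bit (offset, length) pairs and the data buffer is a 32-bit length
followed by the bytes.  encode_sparse_read() is a hypothetical helper.

```python
import struct


def encode_sparse_read(extents, data):
    """Encode an extent map followed by a length-prefixed data buffer."""
    out = struct.pack("<I", len(extents))        # extent count
    for offset, length in extents:
        out += struct.pack("<QQ", offset, length)
    out += struct.pack("<I", len(data)) + data   # data buffer
    return out


# Reply that clients expect for an empty sparse read: two 32-bit zeros.
expected_reply = encode_sparse_read([], b"")

# Reply with only the empty extent map: a single 32-bit zero.
short_reply = struct.pack("<I", 0)

print(expected_reply.hex(), short_reply.hex())
```

A client that decodes the extent map and then tries to read the data
length runs out of bytes on the shorter reply.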

Fixes: https://tracker.ceph.com/issues/74394
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c694c35bbfce6e3033b34fe6994b40b00fad11d9)

3 months agoqa/valgrind.supp: make gcm_cipher_internal suppression more resilient 67280/head
Ilya Dryomov [Tue, 11 Nov 2025 20:39:58 +0000 (21:39 +0100)]
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9.  On Ubuntu 22.04 and Rocky 10, it looks
as follows:

Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
   at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
   by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
   by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
   by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
   by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
   by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
   by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
   by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
   by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
   by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
   by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
   by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
   by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)

The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.

[1] https://github.com/ceph/ceph/pull/61689#issuecomment-2650179891

Fixes: https://tracker.ceph.com/issues/74672
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d85265eab4f9344f0d08330f737ce9d7be32d716)

3 months agolibrbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT 67278/head
Ilya Dryomov [Tue, 23 Dec 2025 13:27:18 +0000 (14:27 +0100)]
librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT

The existing StandardPolicy that is exposed as the RBD_LOCK_MODE_EXCLUSIVE
argument to rbd_lock_acquire() disables automatic exclusive lock
transitions with "permanent" semantics: any request to release the lock
causes the peer to error out immediately.  Such a lock owner can
perform maintenance operations that are proxied from other peers, but
any write-like I/O issued by other peers will fail with EROFS.

This isn't suitable for use cases where one of the peers wants to
manage exclusive lock manually (i.e. rbd_lock_acquire() is used) but
the lock is acquired only for very short periods of time.  The rest of
the time the lock is expected to be held by other peers that stay in
the default "auto" mode (AutomaticPolicy) and run as usual, completely
unconcerned with each other or the manual-mode peer.  However, these
peers become acutely aware of the manual-mode peer because when it grabs
the lock with RBD_LOCK_MODE_EXCLUSIVE their I/O gets disrupted: higher
layers translate EROFS into generic EIO, filesystems shut down, etc.

Add a new TransientPolicy exposed as RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
to allow disabling automatic exclusive lock transitions with semantics
that would cause the other peers to block waiting for the lock to be
released by the manual-mode peer.  This is intended to be a low-level
interface -- no attempt to safeguard against potential misuse causing
e.g. indefinite blocking is made.

It's possible to switch between RBD_LOCK_MODE_EXCLUSIVE and
RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT modes of operation both while the
lock is held and after it's released.
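
The difference between the two sets of semantics, as seen by another peer,
can be sketched with a toy model.  It does not use librbd: ManualModePeer,
handle_peer_write() and try_write() are hypothetical, and a
threading.Event stands in for the lock being released.

```python
import errno
import threading


class ManualModePeer:
    """Toy lock owner; mode is 'exclusive' or 'transient'."""

    def __init__(self, mode):
        self.mode = mode
        self._released = threading.Event()

    def release(self):
        self._released.set()

    def handle_peer_write(self, timeout=5.0):
        if self._released.is_set():
            return "written"
        if self.mode == "exclusive":
            # "Permanent" semantics: the requesting peer errors out at once.
            raise OSError(errno.EROFS, "lock held in exclusive mode")
        # Transient semantics: block until the owner releases the lock.
        if not self._released.wait(timeout):
            raise TimeoutError("lock was never released")
        return "written"


def try_write(owner):
    try:
        return owner.handle_peer_write()
    except OSError as e:
        return errno.errorcode.get(e.errno, "timeout")


exclusive = ManualModePeer("exclusive")
transient = ManualModePeer("transient")
threading.Timer(0.05, transient.release).start()  # owner releases shortly
outcomes = (try_write(exclusive), try_write(transient))
print(outcomes)
```

The timeout exists only to keep the toy from hanging; as the message notes,
the real interface makes no attempt to guard against indefinite blocking.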

Fixes: https://tracker.ceph.com/issues/73824
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8740544c51781211b82ac45d4bc93b9eb9623e76)

3 months agolibrbd: prepare lock_acquire() for changing between policies
Ilya Dryomov [Mon, 19 Jan 2026 16:43:41 +0000 (17:43 +0100)]
librbd: prepare lock_acquire() for changing between policies

In preparation for adding a new TransientPolicy, get rid of the check
implemented in terms of exclusive_lock::Policy::may_auto_request_lock()
that essentially makes it so that exclusive lock policy on a given
image handle can be changed from the default AutomaticPolicy only once.
In order to effect another change a new image handle would have been
needed which is pretty suboptimal.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1f9396ff8208accdf334c088745d2734225b34c1)

3 months agolibrbd: fix RequestLockPayload log message in ImageWatcher
Ilya Dryomov [Mon, 22 Dec 2025 18:07:27 +0000 (19:07 +0100)]
librbd: fix RequestLockPayload log message in ImageWatcher

exclusive_lock::Policy::lock_requested() isn't guaranteed to queue
the release of exclusive lock (and in fact only one of the two existing
implementations does that).  Instead of talking about the lock, log the
response to the notification.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ff89abf5ddcca91c34cfd46d288d41fe93ec38b0)

3 months agolibrbd: amend error message in lock_acquire()
Ilya Dryomov [Mon, 22 Dec 2025 16:22:53 +0000 (17:22 +0100)]
librbd: amend error message in lock_acquire()

... since it went stale with commit 2914eef50d69 ("rbd: Changed
exclusive-lock implementation to use the new managed-lock").  In the
context of exclusive lock, requesting the lock refers to a specific
action which may or may not be performed as part of acquiring the lock
and lock_acquire() doesn't get visibility into that.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9159a2dd032bf47a030260660ff64b5be2d72648)

3 months agoMerge pull request #66538 from Jayaprakash-ibm/wip-fix-cot-bz2404644-squid
SrinivasaBharathKanta [Sat, 7 Feb 2026 00:28:52 +0000 (05:58 +0530)]
Merge pull request #66538 from Jayaprakash-ibm/wip-fix-cot-bz2404644-squid

squid: tools: handle get-attr as read-only ops in ceph_objectstore_tool

3 months agoMerge pull request #66296 from idryomov/wip-69492-squid
Ilya Dryomov [Fri, 6 Feb 2026 18:18:49 +0000 (19:18 +0100)]
Merge pull request #66296 from idryomov/wip-69492-squid

squid: rbd-mirror: add cluster fsid to remote meta cache key

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
3 months agoMerge pull request #67162 from idryomov/wip-74676-squid
Yuri Weinstein [Fri, 6 Feb 2026 16:22:53 +0000 (08:22 -0800)]
Merge pull request #67162 from idryomov/wip-74676-squid

squid: qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 months agoMerge pull request #67153 from idryomov/wip-74669-squid
Yuri Weinstein [Fri, 6 Feb 2026 16:22:09 +0000 (08:22 -0800)]
Merge pull request #67153 from idryomov/wip-74669-squid

squid: qa/workunits/rbd: reduce randomized sleeps in live import tests

Reviewed-by: Miki Patel <miki.patel132@gmail.com>
3 months agoMerge pull request #67151 from idryomov/wip-74601-squid
Yuri Weinstein [Fri, 6 Feb 2026 16:21:20 +0000 (08:21 -0800)]
Merge pull request #67151 from idryomov/wip-74601-squid

squid: qa/workunits/rbd: adapt rbd_mirror.sh for trial nodes

Reviewed-by: Miki Patel <miki.patel132@gmail.com>
3 months agoMerge pull request #66627 from idryomov/wip-74168-squid
Yuri Weinstein [Fri, 6 Feb 2026 16:20:38 +0000 (08:20 -0800)]
Merge pull request #66627 from idryomov/wip-74168-squid

squid: librbd: fix ExclusiveLock::accept_request() when !is_state_locked()

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 months agoMerge pull request #64621 from idryomov/wip-71961-squid
Yuri Weinstein [Fri, 6 Feb 2026 16:19:46 +0000 (08:19 -0800)]
Merge pull request #64621 from idryomov/wip-71961-squid

squid: librbd: images aren't closed in group_snap_*_by_record() on error

Reviewed-by: Miki Patel <miki.patel132@gmail.com>
3 months agoMerge pull request #59647 from mchangir/wip-67940-squid
Neeraj Pratap Singh [Wed, 4 Feb 2026 12:12:27 +0000 (17:42 +0530)]
Merge pull request #59647 from mchangir/wip-67940-squid

squid: mgr/snap_schedule: correctly fetch mds_max_snaps_per_dir from mds

3 months agoMerge pull request #65289 from dparmar18/wip-72504-squid
Neeraj Pratap Singh [Wed, 4 Feb 2026 12:08:08 +0000 (17:38 +0530)]
Merge pull request #65289 from dparmar18/wip-72504-squid

squid: client: fix unmount hang after lookups

3 months agoMerge pull request #67185 from kshtsk/wip-74593-squid
David Galloway [Tue, 3 Feb 2026 21:21:17 +0000 (16:21 -0500)]
Merge pull request #67185 from kshtsk/wip-74593-squid

squid: qa/workunits/rgw: drop netstat usage

3 months agoqa/workunits/rgw: drop netstat usage 67185/head
Kyr Shatskyy [Fri, 21 Nov 2025 21:20:04 +0000 (22:20 +0100)]
qa/workunits/rgw: drop netstat usage

`netstat` is deprecated in modern Linux and usually requires an
extra package to be installed. Usually that package is `net-tools`;
however, on openSUSE for example, `net-tools` does not include
`netstat`. Thus, let us use `ss` as an alternative.

When using `netstat -nltp` we get lines like:
   'tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      25156/valgrind.bin  \ntcp6       0      0 :::443                  :::*                    LISTEN      25156/valgrind.bin  \n'
When using `ss -nltp` we get lines like:
   'LISTEN 0      4096           0.0.0.0:443       0.0.0.0:*    users:(("memcheck-amd64-",pid=66045,fd=72))'
so we need to filter processes by `memcheck`. However, the rest of
the parsing code works the same as it does for netstat.
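
A minimal sketch of such filtering, not the actual workunit code, could
look like this.  The first sample line is the `ss -nltp` line quoted above;
the `sshd` line and the valgrind_listen_ports() helper are hypothetical
additions for illustration.

```python
SS_OUTPUT = (
    'LISTEN 0      4096           0.0.0.0:443       0.0.0.0:*    '
    'users:(("memcheck-amd64-",pid=66045,fd=72))\n'
    # Hypothetical unrelated listener, included to show the filtering.
    'LISTEN 0      128            0.0.0.0:22        0.0.0.0:*    '
    'users:(("sshd",pid=1001,fd=3))\n'
)


def valgrind_listen_ports(ss_output):
    """Return local ports of listeners whose process name has 'memcheck'."""
    ports = []
    for line in ss_output.splitlines():
        if "memcheck" not in line:
            continue
        # Columns: State, Recv-Q, Send-Q, Local Address:Port, Peer, Process
        local_address = line.split()[3]
        ports.append(int(local_address.rsplit(":", 1)[1]))
    return ports


ports = valgrind_listen_ports(SS_OUTPUT)
print(ports)
```

Splitting on the last colon keeps the port extraction working for IPv6
local addresses as well.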

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit 82063f99024a8937dfa105e0828beda1bc730247)

3 months agoMerge pull request #60567 from k0ste/wip-68781-squid
Brad Hubbard [Mon, 2 Feb 2026 23:24:31 +0000 (09:24 +1000)]
Merge pull request #60567 from k0ste/wip-68781-squid

squid: osd: add clear_shards_repaired command

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>