git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log


commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:27:47 +0000 (10:27 -0500)]

qa/suites/orch/cephadm: replace "reef" with "v18.2.8"

Since Reef is EOL, the "reef" tag was removed from quay.ceph.io.
The solution is to replace the reef references with the "v18.2.8" tag.

I also added some yaml files to test the reef upgrade path, as those
were missing. These exist in tentacle.

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:20:09 +0000 (10:20 -0500)]

qa/suites/fs/upgrade/mds_upgrade_sequence: replace "reef" with "v18.2.8"

Since Reef is EOL, the "reef" tag was removed from quay.ceph.io.
The solution is to replace it with a test for the last point release,
which is "v18.2.8".

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:17:24 +0000 (10:17 -0500)]

qa/suites/upgrade: use tagged versions of reef

We no longer have containers for the HEAD of these branches since the
lab relocation.

This is a direct commit to the squid branch since we got rid of the
reef upgrade path for main/umbrella.

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:13:56 +0000 (10:13 -0500)]

qa/suites/orch/cephadm: replace "quincy" with "v17.2.8"

Since Quincy is EOL, the "quincy" tag was removed from quay.ceph.io.
The solution is to replace the quincy references with the "v17.2.8" tag.

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:09:20 +0000 (10:09 -0500)]

qa/suites/fs/upgrade/mds_upgrade_sequence: replace "quincy" with "v17.2.8"

Since Quincy is EOL, the "quincy" tag was removed from quay.ceph.io.
The solution is to replace it with a test for the last point release
with a container image, which is "v17.2.8".

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 17 Apr 2026 15:03:58 +0000 (10:03 -0500)]

qa/suites/upgrade/telemetry-upgrade: ignore expected health warning

"Telemetry requires re-opt-in" briefly shows up when we upgrade.
The test already re-opts in to telemetry to get rid of this warning,
but the cluster badness check still sometimes picks it up.
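
A rough sketch of how an ignorelist entry can suppress this warning, assuming ignorelist entries are regular expressions matched against cluster log lines. The helper names `is_ignored` and `unexpected_warnings` are hypothetical and are not teuthology's implementation; only the warning text comes from this commit message.

```python
import re

# Illustrative ignorelist; the pattern text is quoted from the commit message.
IGNORELIST = [r"Telemetry requires re-opt-in"]


def is_ignored(line, patterns=IGNORELIST):
    """Return True if any ignorelist pattern matches the log line."""
    return any(re.search(pattern, line) for pattern in patterns)


def unexpected_warnings(lines, patterns=IGNORELIST):
    """Keep only the lines that no ignorelist pattern covers."""
    return [line for line in lines if not is_ignored(line, patterns)]
```

With such a filter, a warning that appears briefly during the upgrade no longer counts against the job, while any other warning still does.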

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Thu, 16 Apr 2026 22:40:03 +0000 (17:40 -0500)]

qa/suites/upgrade: use tagged versions of quincy

We no longer have containers for the HEAD of these branches since the
lab relocation.

We'll use v17.2.8 since v17.2.9 was a limited release without containers.
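
The tag choice can be sketched as "pick the newest point release that has a container image". The release data below is taken from the commit messages in this log; the table layout and the function names are hypothetical.

```python
# (tag, has_container_image) pairs; the facts come from the commit messages,
# the data structure itself is only an illustration.
POINT_RELEASES = {
    "quincy": [("v17.2.8", True), ("v17.2.9", False)],
    "reef": [("v18.2.8", True)],
}


def parse_tag(tag):
    """Turn 'v17.2.8' into (17, 2, 8) so tags compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def last_tag_with_container(release):
    """Return the newest tag of a release that still has a container image."""
    tags = [tag for tag, has_image in POINT_RELEASES[release] if has_image]
    if not tags:
        raise LookupError(f"no container image known for {release}")
    return max(tags, key=parse_tag)
```

For quincy this yields v17.2.8 rather than v17.2.9, which matches the choice described above.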

This is a direct commit to the squid branch since we got rid of the
quincy upgrade path for tentacle and main.

Fixes: https://tracker.ceph.com/issues/76028
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Mon, 13 Apr 2026 17:25:28 +0000 (13:25 -0400)]

Merge PR #68119 into squid

* refs/pull/68119/head:
qa/tasks/backfill_toofull.py: Fix assert failures with & without compression

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Sat, 11 Apr 2026 00:22:31 +0000 (20:22 -0400)]

Merge PR #68323 into squid

* refs/pull/68323/head:
mon/MonMap: Dump addr in backward compatible format

Reviewed-by: Yuri Weinstein <yweins@redhat.com>

commit | commitdiff | tree

Anoop C S [Fri, 5 Dec 2025 09:25:58 +0000 (14:55 +0530)]

mon/MonMap: Dump addr in backward compatible format

Prior to c5b43e9b2765ff98419c649a5ae53ec16601975d, we dumped only the
legacy string component from public_addrs as `addr`. Ensure that this
backward compatible filtering is retained when dumping MonMap.

Signed-off-by: Anoop C S <anoopcs@cryptolab.net>
(cherry picked from commit 01c85255bc9b266ecf9bd3b58d2a8d4cb4650d7f)

commit | commitdiff | tree

Patrick Donnelly [Fri, 10 Apr 2026 17:26:46 +0000 (13:26 -0400)]

Merge PR #66335 into squid

* refs/pull/66335/head:
RGW:fix obj by multipart upload cant get tag

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 10 Apr 2026 17:26:02 +0000 (13:26 -0400)]

Merge PR #67994 into squid

* refs/pull/67994/head:
test/rgw/kafka: fix kafka release to more recent one

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 10 Apr 2026 17:23:45 +0000 (13:23 -0400)]

Merge PR #66541 into squid

* refs/pull/66541/head:
include: detect corrupt frag from byteswap
mds: dump frag_t as an object
common/frag: produce valid fragments for test instances
common: simplify fragment printing
common: properly convert frag_t to net/store endianness
mds: include sysinfo in status command output
include/frag.h: un-inline methods to reduce header dependencies

Reviewed-by: Yuri Weinstein <yweins@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 9 Apr 2026 16:58:08 +0000 (12:58 -0400)]

Merge PR #68227 into squid

* refs/pull/68227/head:
rgw: enhanced java s3-tests change setting of JAVA_HOME
rgw: java s3-tests change setting of JAVA_HOME

Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 7 Apr 2026 14:25:14 +0000 (10:25 -0400)]

Merge PR #67322 into squid

* refs/pull/67322/head:
qa: set column for insertion
qa: bail sqlite3 on any error
qa: use actual sqlite3 blob instead of string
test: use json_extract instead of awkward json_tree

Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 7 Apr 2026 14:23:47 +0000 (10:23 -0400)]

Merge PR #67324 into squid

* refs/pull/67324/head:
mon/HealthMonitor: avoid MON_DOWN for freshly added Monitor
mon: add time_added to mon_info_t
common/options: add missing runtime flag
mon/MonMap: cleanup initialization

Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 7 Apr 2026 12:57:39 +0000 (08:57 -0400)]

Merge PR #62061 into squid

* refs/pull/62061/head:
tools: respect set features when adding addresses

commit | commitdiff | tree

J. Eric Ivancich [Tue, 7 Apr 2026 00:53:34 +0000 (20:53 -0400)]

rgw: enhanced java s3-tests change setting of JAVA_HOME

Under CentOS 9 the Java 8 version is recognized by the substring
"java-1.8" rather than "java-8". So the grep has been modified to
accept either.
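
A minimal Python sketch of the selection rule, not the actual s3tests_java.py code (which uses grep). The function name `pick_java8_home` is hypothetical, and sorting the directory entries before taking the first match is an assumption made here to keep the result deterministic.

```python
import posixpath

JVM_DIR = "/usr/lib/jvm"
# Either substring identifies a Java 8 installation, per the commit message.
JAVA8_MARKERS = ("java-8", "java-1.8")


def pick_java8_home(entries, jvm_dir=JVM_DIR):
    """Return the path of the first entry that looks like Java 8, else None."""
    for name in sorted(entries):
        if any(marker in name for marker in JAVA8_MARKERS):
            return posixpath.join(jvm_dir, name)
    return None
```

On a real host the entries would come from something like `os.listdir("/usr/lib/jvm")`, and the returned path would be exported as JAVA_HOME.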

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit a49d4446e4d84b28273b460b85a193011a9c4ed8)

commit | commitdiff | tree

J. Eric Ivancich [Wed, 1 Apr 2026 16:29:01 +0000 (12:29 -0400)]

rgw: java s3-tests change setting of JAVA_HOME

Previously s3tests_java.py set JAVA_HOME using the `alternatives`
command. That had issues in that `alternatives` is not present on all
Ubuntu systems, and some installations of Java don't update
alternatives. So instead we look for a "java-8" jvm in /usr/lib/jvm/
and set JAVA_HOME to the first one we find.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit b8e2796270f4558b406411682a9b916109d0c530)

commit | commitdiff | tree

Patrick Donnelly [Mon, 6 Apr 2026 15:02:49 +0000 (11:02 -0400)]

Merge PR #67800 into squid

* refs/pull/67800/head:
qa/tasks/mgr: test_module_selftest set influx hostname to avoid warnings

Reviewed-by: Laura Flores <lflores@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Mon, 6 Apr 2026 15:02:13 +0000 (11:02 -0400)]

Merge PR #61894 into squid

* refs/pull/61894/head:
pybind/rados: add note for reversed arguments to WriteOp.zero()
test/pybind/test_rados.py: add test for reversed arguments offset,length in WriteOp.zero
pybind/rados: fix the incorrect order of offset,length in WriteOp.zero

Reviewed-by: Laura Flores <lflores@redhat.com>

commit | commitdiff | tree

Wang Chao [Thu, 24 Oct 2024 01:10:44 +0000 (09:10 +0800)]

pybind/rados: add note for reversed arguments to WriteOp.zero()

Signed-off-by: Wang Chao <sean10reborn@gmail.com>
(cherry picked from commit e9ca8a01323d49c656c54d622a34280adc5b244b)

commit | commitdiff | tree

Wang Chao [Tue, 13 Aug 2024 13:34:12 +0000 (21:34 +0800)]

test/pybind/test_rados.py: add test for reversed arguments offset,length in WriteOp.zero

Before the fix, zero(0, 2) would have no effect, and read would get '12345' instead of the expected '\x00\x00345'.
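
The effect of the reversed arguments can be reproduced with a small pure-Python model. This does not call librados; `zero_range`, `zero_buggy`, and `zero_fixed` are hypothetical stand-ins that only model overwriting a byte range with NUL bytes.

```python
def zero_range(data, offset, length):
    """Model of a zero op: overwrite `length` bytes starting at `offset` with NULs."""
    buf = bytearray(data)
    end = min(offset + length, len(buf))
    for i in range(offset, end):
        buf[i] = 0
    return bytes(buf)


def zero_buggy(data, offset, length):
    # Forwards the arguments in the wrong order, as the binding did before the fix.
    return zero_range(data, length, offset)


def zero_fixed(data, offset, length):
    return zero_range(data, offset, length)
```

In this model `zero_buggy(b"12345", 0, 2)` becomes a zero-length operation at offset 2 and leaves the data unchanged, while `zero_fixed(b"12345", 0, 2)` clears the first two bytes.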

Signed-off-by: Wang Chao <sean10reborn@gmail.com>
(cherry picked from commit 3a27c3e58fca96d0f0c80a1d264cb3f5f156f5c3)

commit | commitdiff | tree

Wang Chao [Sat, 10 Aug 2024 11:40:52 +0000 (19:40 +0800)]

pybind/rados: fix the incorrect order of offset,length in WriteOp.zero

The offset and length parameters in the rados pybind `WriteOp.zero()` method are being passed to the rados_write_op_zero() function in the incorrect order.
The incorrect order causes OP_ZERO to not work correctly when using the rados pybind.

Signed-off-by: Wang Chao <sean10reborn@gmail.com>
(cherry picked from commit 049d7d35abe0aa2560e3bb9d4fafb43eefb4a0ed)

commit | commitdiff | tree

Radosław Zarzyński [Tue, 8 Oct 2024 13:14:49 +0000 (15:14 +0200)]

tools: respect set features when adding addresses

Fixes: https://tracker.ceph.com/issues/53751
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
(cherry picked from commit 19545eb9864b002c1a37d4f2509d1b2baa833128)

commit | commitdiff | tree

Patrick Donnelly [Thu, 2 Apr 2026 12:39:39 +0000 (08:39 -0400)]

Merge PR #67527 into squid

* refs/pull/67527/head:
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Sridhar Seshasayee [Mon, 9 Mar 2026 09:31:54 +0000 (15:01 +0530)]

qa/tasks/backfill_toofull.py: Fix assert failures with & without compression

The following issues with the test are addressed:

1. The test was encountering assertion failure (assert backfillfull < 0.9) with
   compression enabled. This was because the condition was not factoring in the
   compression ratio. Without it the backfillfull ratio can easily exceed 1. By
   factoring in the compression ratio, the backfillfull ratio will be in the
   range (0 - n), where n can vary depending on the type of compression used.

2. The main contributing factor for (1) above is the amount of data written to
   the pool. The writes were time-bound earlier leading to excess data and
   eventually the assertion failure. By limiting the data written to the OSDs
   to 50% of the OSD capacity in the first phase and only 20% in the re-write
   phase, the outcome of the test is more deterministic regardless of
   compression being enabled or not.

3. A potential false cluster error is avoided by swapping the setting of
   the nearfull-ratio and backfill-ratio after the re-write phase.

4. Fix a couple of typos - s/tartget/target.

Fixes: https://tracker.ceph.com/issues/71005
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
(cherry picked from commit 91de6a0b7b8b8c2531446555c25bf53e23635982)

commit | commitdiff | tree

Patrick Donnelly [Tue, 31 Mar 2026 10:30:35 +0000 (16:00 +0530)]

Merge PR #67450 into squid

* refs/pull/67450/head:
qa/rgw: bucket notifications use pynose

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>

commit | commitdiff | tree

Patrick Donnelly [Sat, 28 Mar 2026 07:31:52 +0000 (13:01 +0530)]

Merge PR #67575 into squid

* refs/pull/67575/head:
rgw/notification: fix reserved_size drift in 2pc_queue causing ENOSPC errors
rgw/notification: Prevent reserved_size leak by decrementing overhead on commit/abort.

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Sat, 28 Mar 2026 07:27:51 +0000 (12:57 +0530)]

Merge PR #67398 into squid

* refs/pull/67398/head:
os/bluestore: Fix default base size for histogram

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Sat, 28 Mar 2026 07:24:42 +0000 (12:54 +0530)]

Merge PR #67884 into squid

* refs/pull/67884/head:
qa/standalone: shorten bluefs test durations
qa/standalone: increase WAL volume size to 1GB
qa/standalone: fix bluefs expand test case

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Sat, 28 Mar 2026 07:21:54 +0000 (12:51 +0530)]

Merge PR #67392 into squid

* refs/pull/67392/head:
test/encoding/readable: Add backward incompat checks
workunits/dencoder: use readable.sh script instead of python script

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Yuval Lifshitz [Wed, 4 Mar 2026 14:53:13 +0000 (14:53 +0000)]

test/rgw/kafka: fix kafka release to more recent one

Fixes: https://tracker.ceph.com/issues/75323
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit dc412a7e519d037acbcac8a92c7ecf2dbde9875a)

commit | commitdiff | tree

Patrick Donnelly [Wed, 25 Mar 2026 00:07:52 +0000 (20:07 -0400)]

Merge PR #66884 into squid

* refs/pull/66884/head:
Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 24 Mar 2026 15:10:51 +0000 (11:10 -0400)]

Merge PR #62454 into squid

* refs/pull/62454/head:
mgr/dashboard: add types for mgr-module list
mgr/dashboard: fix access control permissions for roles

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 19 Mar 2026 23:41:52 +0000 (19:41 -0400)]

Merge PR #67796 into squid

* refs/pull/67796/head:
qa/workunits/rbd: fix unbound variable in status()
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 19 Mar 2026 23:41:25 +0000 (19:41 -0400)]

Merge PR #67794 into squid

* refs/pull/67794/head:
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 19 Mar 2026 23:40:58 +0000 (19:40 -0400)]

Merge PR #67704 into squid

* refs/pull/67704/head:
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 19 Mar 2026 15:00:04 +0000 (11:00 -0400)]

Merge PR #66838 into squid

* refs/pull/66838/head:
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.
test/bluestore: add volume selector tests
os/bluestore:fix bluestore_volume_selection_reserved_factor usage
os/bluestore: print the first RocksDB level which doesn't fit into fast

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

commit | commitdiff | tree

Igor Fedotov [Mon, 9 Feb 2026 12:21:25 +0000 (15:21 +0300)]

qa/standalone: shorten bluefs test durations

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 5a901808dfe03dad5e34ef6374e34c0c03766e96)

commit | commitdiff | tree

Igor Fedotov [Mon, 9 Feb 2026 14:58:43 +0000 (17:58 +0300)]

qa/standalone: increase WAL volume size to 1GB

to avoid unexpected test case failures due to ENOSPC.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 2de79c64420ffba91becdf29f2d4f6b2d5931830)

commit | commitdiff | tree

Igor Fedotov [Mon, 9 Feb 2026 12:19:52 +0000 (15:19 +0300)]

qa/standalone: fix bluefs expand test case

Fixes: https://tracker.ceph.com/issues/74525
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 9fc57f9ed1c61d54ca8ecd9e1b98782eee13848a)

commit | commitdiff | tree

Dnyaneshwari Talwekar [Mon, 12 Jan 2026 07:53:59 +0000 (13:23 +0530)]

Squid: mgr/dashboard: Changing placement of a mds to label - creates a new mds-service, mds.label

Fixes: https://tracker.ceph.com/issues/74376
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:31:29 +0000 (10:31 -0400)]

Merge PR #63344 into squid

* refs/pull/63344/head:
mgr/DaemonServer: fixed mistype for mgr_osd_messages

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:25:33 +0000 (10:25 -0400)]

Merge PR #61417 into squid

* refs/pull/61417/head:
qa/cephfs: update ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:23:07 +0000 (10:23 -0400)]

Merge PR #59688 into squid

* refs/pull/59688/head:
qa: some test set `refuse_client_session`, so the cluster log is expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:19:00 +0000 (10:19 -0400)]

Merge PR #64686 into squid

* refs/pull/64686/head:
mon/MgrMonitor: add a space before "is already disabled"

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:15:16 +0000 (10:15 -0400)]

Merge PR #65298 into squid

* refs/pull/65298/head:
qa/suites/upgrade: update ignorelist with cephfs specific warnings (under stress-split)
qa/suites/upgrade: add "Replacing daemon mds" to ignorelist

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:11:35 +0000 (10:11 -0400)]

Merge PR #65758 into squid

* refs/pull/65758/head:
.github: pin GH Actions to SHA-1 commit

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:09:30 +0000 (10:09 -0400)]

Merge PR #66126 into squid

* refs/pull/66126/head:
qa: ignore cluster warning (evicting unresponsive ...) with tasks/mgr-osd-full

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Igor Fedotov [Wed, 21 May 2025 08:30:15 +0000 (11:30 +0300)]

os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a9f591f4e1cb1e364879165250c55cb0f841d64f)

commit | commitdiff | tree

Igor Fedotov [Mon, 19 May 2025 19:20:53 +0000 (22:20 +0300)]

test/bluestore: add volume selector tests

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 158d1550a021ed60e5ad1c565b247e5b0b6d5946)

Conflicts:
src/test/objectstore/CMakeLists.txt - allocsim not present in Squid

commit | commitdiff | tree

Igor Fedotov [Mon, 19 May 2025 19:19:45 +0000 (22:19 +0300)]

os/bluestore:fix bluestore_volume_selection_reserved_factor usage

Fixes: https://tracker.ceph.com/issues/71368
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 43d7864093f92977a3fd084bbfd65229244b1cc9)

commit | commitdiff | tree

Igor Fedotov [Tue, 4 Feb 2025 16:45:13 +0000 (19:45 +0300)]

os/bluestore: print the first RocksDB level which doesn't fit into fast
device by default.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit d95aa620b315d9261cb50b0465ecfd2b6b534a60)

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:04:29 +0000 (10:04 -0400)]

Merge PR #66915 into squid

* refs/pull/66915/head:
monc: synchronize tick() of MonClient with shutdown()

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 14:03:12 +0000 (10:03 -0400)]

Merge PR #66973 into squid

* refs/pull/66973/head:
qa/tasks/thrashosds-health: whitelist PG_BACKFILL_FULL

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 13:59:32 +0000 (09:59 -0400)]

Merge PR #60391 into squid

* refs/pull/60391/head:
qa/cephfs: ignore when specific OSD is reported down during upgrade

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 13:57:44 +0000 (09:57 -0400)]

Merge PR #63026 into squid

* refs/pull/63026/head:
qa/workunits/cephtool: add extra privileges to cephtool script

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>

commit | commitdiff | tree

Nitzan Mordechai [Sun, 7 Dec 2025 09:06:14 +0000 (09:06 +0000)]

test/encoding/readable: Add backward incompat checks

The readable.sh script has forward incompat checks, but no
backward incompat checks.

This fix will:
1. Check for a backward_incompat directory for each type, covering either specific
    objects or all objects of that type, and skip those objects during testing.
2. Add version comparison helper functions (version_lt, version_le, version_ge,
    versions_span) for robust version handling
3. Replace 'sort -n' with 'sort -V' for proper version number sorting
4. Add CORPUS_PATH environment variable to allow teuthology tests to execute this script
5. Improve readability of the script

The difference between backward and forward incompat:
- forward_incompat: Marks objects from older versions that newer ceph-dencoder
  versions cannot read. Example: Version 19.2.x objects marked incompat at version 20.2.x
  means ceph-dencoder v20.2.x+ can't decode them. Skip when testing old objects
  with a new ceph-dencoder.
- backward_incompat: Marks objects from newer versions that older ceph-dencoder
  versions cannot read. Example: Version 19.2.x objects marked backward_incompat at v19.2.x
  means ceph-dencoder < v19.2.x can't decode them. Skip when testing new objects
  with an old ceph-dencoder.
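
The version helpers and the skip rules above can be sketched in Python as follows. The real helpers live in readable.sh and are written in shell; the function `should_skip` and its arguments are hypothetical, and the rule it encodes is only a reading of the two bullet points above.

```python
def parse_version(text):
    """Turn '19.2.10' or 'v19.2.10' into (19, 2, 10) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def version_lt(a, b):
    return parse_version(a) < parse_version(b)


def version_ge(a, b):
    return parse_version(a) >= parse_version(b)


def should_skip(dencoder_version, marked_at, kind):
    """Decide whether objects marked incompatible at `marked_at` are skipped."""
    if kind == "forward_incompat":
        # Newer decoders (marked_at and later) cannot read the old objects.
        return version_ge(dencoder_version, marked_at)
    if kind == "backward_incompat":
        # Older decoders (before marked_at) cannot read the new objects.
        return version_lt(dencoder_version, marked_at)
    raise ValueError(f"unknown marker kind: {kind}")
```

Comparing parsed tuples is what makes 19.2.10 sort after 19.2.9; a plain string comparison would order them the other way round.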

Fixes: https://tracker.ceph.com/issues/74074
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 011b25d8038e0f0bd3272fa57b0c7e068feb130c)

commit | commitdiff | tree

NitzanMordhai [Mon, 2 Feb 2026 07:34:24 +0000 (07:34 +0000)]

workunits/dencoder: use readable.sh script instead of python script

The Python script test_readable.py was added for backward and forward
compatibility checks. Maintaining two scripts that ultimately do the same
thing is a waste, so revert to readable.sh and drop the Python script.

https://tracker.ceph.com/issues/74074
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 9d289ed14e79fa8008ba30b77b425a4508030110)

commit | commitdiff | tree

Casey Bodley [Thu, 19 Feb 2026 15:09:44 +0000 (10:09 -0500)]

qa/rgw: bucket notifications use pynose

The nose incompatibility in the multisite tests was fixed by switching to
pynose in https://github.com/ceph/teuthology/pull/1947, so I'm trying the
same here.

Fixes: https://tracker.ceph.com/issues/74573
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 915a5309a639333839829b5a554f3fdb6c560464)

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 00:37:05 +0000 (20:37 -0400)]

Merge PR #63018 into squid

* refs/pull/63018/head:
qa/workunits/fs/misc: remove data pool cleanup

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 00:35:59 +0000 (20:35 -0400)]

Merge PR #61302 into squid

* refs/pull/61302/head:
qa: do not fail cephfs QA tests for slow bluestore ops

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Sun, 8 Feb 2026 15:48:54 +0000 (10:48 -0500)]

qa: set column for insertion

2026-02-08T13:02:24.439 INFO:tasks.workunit.client.0.trial031.stderr:Parse error near line 2: no such column: "start" - should this be a string literal in single-quotes?

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 6ebb625669afd6e112be26ff87a1e61cfc4ee979)

commit | commitdiff | tree

Patrick Donnelly [Sun, 8 Feb 2026 15:47:52 +0000 (10:47 -0500)]

qa: bail sqlite3 on any error

Otherwise it will wrongly proceed executing the next SQL statement.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit e6d10c23eb3c2b8b4aae15146f24bbfcf65ad1b6)

commit | commitdiff | tree

Patrick Donnelly [Sun, 8 Feb 2026 15:43:25 +0000 (10:43 -0500)]

qa: use actual sqlite3 blob instead of string

No functional change.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 1e0114bf2795e5c19348cec0eafa7d351d22dc81)

commit | commitdiff | tree

Patrick Donnelly [Sun, 8 Feb 2026 00:45:26 +0000 (19:45 -0500)]

test: use json_extract instead of awkward json_tree

Ideally this should port better across sqlite3 versions. The sqlite3
on rocky10 failed because it started requiring components of the keys
to be quoted:

    sqlite> select * from p as a, p as b where a.i=1 and b.i = 2 and a.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount' and b.fullkey = '$."libcephsqlite_vfs"."opf_sync".avgcount';
    i  key       value  type     atom  id   parent  fullkey                                    path                              i  key       value  type     atom  id   parent  fullkey
    -  --------  -----  -------  ----  ---  ------  -----------------------------------------  --------------------------------  -  --------  -----  -------  ----  ---  ------  ------------------
    1  avgcount  4      integer  4     581  570     $."libcephsqlite_vfs"."opf_sync".avgcount  $."libcephsqlite_vfs"."opf_sync"  2  avgcount  5      integer  5     581  570     $."libcephsqlite_v
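
For comparison, a minimal sketch of reading the same counter with json_extract through Python's sqlite3 module, assuming the linked SQLite library provides the JSON functions. The sample document reuses the key names and value from the output above; `read_avgcount` is a hypothetical helper, not part of the test.

```python
import json
import sqlite3


def read_avgcount(doc):
    """Extract the opf_sync avgcount from a JSON perf dump using json_extract."""
    conn = sqlite3.connect(":memory:")
    try:
        # The path goes straight to the value, so there is no comparison
        # against json_tree's fullkey text and its quoting rules.
        row = conn.execute(
            "SELECT json_extract(?, '$.libcephsqlite_vfs.opf_sync.avgcount')",
            (doc,),
        ).fetchone()
    finally:
        conn.close()
    return row[0]


sample = json.dumps({"libcephsqlite_vfs": {"opf_sync": {"avgcount": 4}}})
avgcount = read_avgcount(sample)
```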

Fixes: https://tracker.ceph.com/issues/74755
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit f304daa74ace4e6b856b585d71b8ff9c6e8a024a)

commit | commitdiff | tree

Patrick Donnelly [Wed, 19 Nov 2025 23:16:21 +0000 (18:16 -0500)]

mon/HealthMonitor: avoid MON_DOWN for freshly added Monitor

In testing, we often have the scenario where cephadm has created a
cluster but doesn't add more monitors until well past
mon_down_mkfs_grace. This causes useless MON_DOWN warnings to be thrown
which fails QA jobs. Avoid this situation entirely by giving a
reasonable grace period for a monitor added to the MonMap to join
quorum.
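
The idea can be sketched as a simple predicate. This is not the HealthMonitor code; the function and parameter names are hypothetical, and `grace_seconds` stands in for whatever grace period the monitor actually applies.

```python
def should_report_mon_down(in_quorum, time_added, now, grace_seconds):
    """Report a monitor as down only if it is out of quorum and past its grace period.

    `time_added` and `now` are timestamps in seconds.
    """
    if in_quorum:
        return False
    return (now - time_added) >= grace_seconds
```

A monitor added to the MonMap 10 seconds ago with a 60 second grace period would not raise MON_DOWN yet, even though it has not joined quorum.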

Fixes: https://tracker.ceph.com/issues/73934
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit b028a41e1f000b87aab3f263ab3259a0ca439555)

commit | commitdiff | tree

Patrick Donnelly [Wed, 19 Nov 2025 23:15:51 +0000 (18:15 -0500)]

mon: add time_added to mon_info_t

So we know when the Monitor was added to the map.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit c5b43e9b2765ff98419c649a5ae53ec16601975d)

Conflicts:
src/mon/MonMap.cc: generate_test_instances refactor missing

commit | commitdiff | tree

Patrick Donnelly [Wed, 19 Nov 2025 18:45:46 +0000 (13:45 -0500)]

common/options: add missing runtime flag

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 62c449e208aef23df35c311020f1518f62d3f013)

commit | commitdiff | tree

Patrick Donnelly [Wed, 19 Nov 2025 18:10:27 +0000 (13:10 -0500)]

mon/MonMap: cleanup initialization

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 42a37916ff5c69d7df4d917cd9e143b4e92d389f)

commit | commitdiff | tree

Patrick Donnelly [Thu, 13 Nov 2025 19:51:20 +0000 (14:51 -0500)]

include: detect corrupt frag from byteswap

If a big-endian MDS writes frag_t values into the metadata pool, these
will persist and confuse the MDS after it tries properly parsing them as
little-endian. Fortunately detecting this situation is fairly easy as we
restrict the number of bits and the number of bits restricts the mask
value.
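
A sketch of why a byteswapped value is often detectable, under the assumption that a frag is a 32-bit integer with the bit count in the top 8 bits and a 24-bit value whose set bits must fall inside the mask implied by that count. The helper names are hypothetical and the check is an illustration, not the code added by this commit.

```python
import struct

VALUE_BITS = 24
VALUE_MASK = (1 << VALUE_BITS) - 1


def frag_mask(bits):
    """Mask covering the top `bits` bits of the 24-bit value field."""
    return (VALUE_MASK << (VALUE_BITS - bits)) & VALUE_MASK


def frag_make(bits, value):
    return (bits << VALUE_BITS) | (value & frag_mask(bits))


def frag_is_valid(enc):
    """A frag is plausible if bits <= 24 and no value bit lies outside the mask."""
    bits = enc >> VALUE_BITS
    if bits > VALUE_BITS:
        return False
    return (enc & VALUE_MASK) & ~frag_mask(bits) == 0


def byteswap32(enc):
    """Reinterpret a 32-bit integer with the opposite byte order."""
    return struct.unpack("<I", struct.pack(">I", enc))[0]
```

In this model `frag_make(1, 0x800000)` is valid, but its byteswapped form has a bit count of 0 together with a non-zero value, which the check rejects. Some values, such as 0, look the same in either byte order, so the check cannot flag every case.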

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 6bf91e4f6e49d99711b8be845eb77c883d662704)

commit | commitdiff | tree

Patrick Donnelly [Mon, 1 Dec 2025 20:12:37 +0000 (15:12 -0500)]

mds: dump frag_t as an object

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 1d526c50de0712a180db1b6fa39ae6f51e346c3c)

Conflicts:
src/mds/mdstypes.h: add dump method

commit | commitdiff | tree

Patrick Donnelly [Mon, 1 Dec 2025 20:12:16 +0000 (15:12 -0500)]

common/frag: produce valid fragments for test instances

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 4fb5ad4a258a3c447e9d7413ce87fcdf6a89d056)

Conflicts:
src/common/frag.cc: missing test instance refactor
src/mds/mdstypes.cc: missing test instance refactor

Conflicts:
src/mds/mdstypes.cc

commit | commitdiff | tree

Patrick Donnelly [Thu, 13 Nov 2025 19:47:24 +0000 (14:47 -0500)]

common: simplify fragment printing

There's better tooling for this now and we can avoid magic numbers.

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 647de21c85f14d67e7941428c3af2ebeef39ad4f)

commit | commitdiff | tree

Patrick Donnelly [Tue, 11 Nov 2025 15:15:03 +0000 (10:15 -0500)]

common: properly convert frag_t to net/store endianness

The MDS/client are already accidentally doing the right thing unless
they are running on a big-endian machine.

Credit to Venky Shankar for originally hypothesizing an endianness issue
with the frag_t.

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 9e3837c837bc9f76805f998dd06fe386dce35722)

commit | commitdiff | tree

Patrick Donnelly [Thu, 13 Nov 2025 14:24:19 +0000 (09:24 -0500)]

mds: include sysinfo in status command output

Of particular interest is the CPU architecture and endianness.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit fa4078adfc4e54d0bdd472437c7dcd8bc55ba4dd)

commit | commitdiff | tree

Max Kellermann [Fri, 25 Oct 2024 16:14:34 +0000 (18:14 +0200)]

include/frag.h: un-inline methods to reduce header dependencies

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
(cherry picked from commit 5f1a893dc54dc579a8428100496adc27d638aab9)

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 00:26:22 +0000 (20:26 -0400)]

Merge PR #67001 into squid

* refs/pull/67001/head:
doc: fetch releases from main branch

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

Patrick Donnelly [Wed, 18 Mar 2026 00:22:23 +0000 (20:22 -0400)]

Merge PR #67623 into squid

* refs/pull/67623/head:
mgr/orchestrator: make group parameter optional for nvmeof (squid)
pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 17 Mar 2026 14:42:18 +0000 (10:42 -0400)]

Merge PR #66964 into squid

* refs/pull/66964/head:
monitoring: upgrade grafana version to 12.3.1

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 17 Mar 2026 14:41:38 +0000 (10:41 -0400)]

Merge PR #66990 into squid

* refs/pull/66990/head:
monitoring: fix rgw_servers filtering in rgw sync overview grafana

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Nitzan Mordechai [Mon, 17 Nov 2025 11:51:14 +0000 (11:51 +0000)]

qa/tasks/mgr: test_module_selftest set influx hostname to avoid warnings

The self-test hits the MGR_INFLUX_NO_SERVER error because no hostname
is configured. The added command sets a test hostname so that the error
does not appear and fail the test.

Fixes: https://tracker.ceph.com/issues/72747
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
(cherry picked from commit 6b170bb5366ec13239b768f7c344fa5e842af7ff)

commit | commitdiff | tree

Ilya Dryomov [Mon, 2 Mar 2026 11:07:48 +0000 (12:07 +0100)]

qa/workunits/rbd: fix unbound variable in status()

It was missed in commit 5fe64fa806f3 ("qa: rbd_mirror.sh: change
parameters to cluster rather than daemon name").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1a280b9a320d51bdc4cb80be9bdd6ae265151132)

commit | commitdiff | tree

Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]

qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 82717e43a08a1262987f5e271fd72d4433c4fb3b)

commit | commitdiff | tree

Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]

qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2)
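
The worst-case figure quoted in the message above can be checked with a few lines of arithmetic. This is only a sketch: the iteration count, the 5-minute timeout and the 10-second sleep are taken from the commit message, while the variable names are made up for illustration.

```python
# Values quoted in the commit message; names are illustrative only.
ITERATIONS = 720                 # "up to 720 iterations"
COMMAND_TIMEOUT_SECONDS = 5 * 60 # "5-minute timeout" per "rbd ls" call
SLEEP_SECONDS = 10               # "10-second sleep per iteration"

# Each iteration costs one timed-out command plus one sleep.
worst_case_seconds = ITERATIONS * (COMMAND_TIMEOUT_SECONDS + SLEEP_SECONDS)
worst_case_hours = worst_case_seconds / 3600

print(worst_case_seconds, worst_case_hours)
```

Under these assumptions the loop can block for 223200 seconds, or 62 hours, which is consistent with the "~60-hour" estimate.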

commit | commitdiff | tree

Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]

qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") required
every thrasher to have a stop_and_join() method, but the
preceding commit a035b5a22fb8 ("thrashers: standardize stop and join
method names") did not add it to rbd_mirror_thrash (whether as an
ad-hoc implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception, which is something that practically
never happens.

Fixes: https://tracker.ceph.com/issues/75200
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ebe3a0a43251b0f126497d4100bd1af9ca8afc5)
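
The interface problem described above can be illustrated with a minimal sketch. It does not reproduce the real ThrasherGreenlet or daemonwatchdog code: the classes Thrasher and MirrorThrasher and the function bark() are hypothetical, and standard-library threads are used as a stand-in. The point is only that a watchdog which calls stop_and_join() on every thrasher works when each thrasher inherits that method from a common base class.

```python
import threading
import time


class Thrasher:
    """Hypothetical base class that provides stop_and_join() to subclasses."""

    def __init__(self):
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Keep thrashing until a stop is requested.
        while not self._stopping.is_set():
            self.do_thrash()
            self._stopping.wait(0.01)

    def do_thrash(self):
        raise NotImplementedError

    def is_alive(self):
        return self._thread.is_alive()

    def stop_and_join(self):
        # Request a stop, then wait for the worker to exit.
        self._stopping.set()
        self._thread.join()


class MirrorThrasher(Thrasher):
    """Hypothetical thrasher; it inherits stop_and_join() from the base."""

    def __init__(self):
        super().__init__()
        self.rounds = 0

    def do_thrash(self):
        self.rounds += 1


def bark(thrashers):
    """Hypothetical watchdog action: stop every registered thrasher."""
    for thrasher in thrashers:
        thrasher.stop_and_join()


thrasher = MirrorThrasher()
thrasher.start()
time.sleep(0.05)
bark([thrasher])
print(thrasher.is_alive())
```

A thrasher class written without the shared base would lack stop_and_join(), and the loop in bark() would raise AttributeError on it.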

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 16:02:49 +0000 (21:32 +0530)]

Merge PR #66829 into squid

* refs/pull/66829/head:
monitoring: fix CephPgImbalance alert rule expression

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:36:24 +0000 (21:06 +0530)]

Merge PR #66897 into squid

* refs/pull/66897/head:
common: drop stack singleton object of temp messenger for foreground ceph daemons

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:34:48 +0000 (21:04 +0530)]

Merge PR #67066 into squid

* refs/pull/67066/head:
qa: Disable OSD benchmark from running for tests.

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:32:44 +0000 (21:02 +0530)]

Merge PR #67278 into squid

* refs/pull/67278/head:
librbd: introduce RBD_LOCK_MODE_EXCLUSIVE_TRANSIENT
librbd: prepare lock_acquire() for changing between policies
librbd: fix RequestLockPayload log message in ImageWatcher
librbd: amend error message in lock_acquire()

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:28:10 +0000 (20:58 +0530)]

Merge PR #67280 into squid

* refs/pull/67280/head:
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:23:54 +0000 (20:53 +0530)]

Merge PR #67356 into squid

* refs/pull/67356/head:
osd/PrimaryLogPG: encode an empty data_bl for empty sparse reads

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:22:30 +0000 (20:52 +0530)]

Merge PR #67454 into squid

* refs/pull/67454/head:
qa: krbd_rxbounce.sh: do more reads to generate more errors

Reviewed-by: Ramana Raja <rraja@redhat.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 15:18:11 +0000 (20:48 +0530)]

Merge PR #66985 into squid

* refs/pull/66985/head:
monitoring: make cluster matcher backward compatible for pre-7.1 metrics

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 13 Mar 2026 14:55:41 +0000 (20:25 +0530)]

Merge PR #67761 into squid

* refs/pull/67761/head:
qa: whitelist slow requests progress.yaml
qa: make test_progress atomically capture OSD marked in/out events

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>

commit | commitdiff | tree

Kamoltat (Junior) Sirivadhna [Fri, 6 Mar 2026 17:20:18 +0000 (17:20 +0000)]

qa: whitelist slow requests progress.yaml

Slow requests were reported because, during the test, 16 concurrent 4 MB writes were running while recovery and backfill were disabled. At the same time, osd.0 was marked out and then back in, causing PG remapping. Because recovery and backfill were disabled, some PGs could not restore their replicas after the remap, leaving them in degraded/remapped states. As a result, a batch of writes remained stuck in the replicated write path, leading to an IO stall and slow ops being reported.

The solution is to ignore this warning, since we are testing the progress module, not the write paths of OSDs. We intentionally disable backfill and recovery to prevent the recovery event from finishing quickly, so that it lasts until the progress event pops up.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 6b0c943c8bd004665529c5c5786ecec42bcc9ff7)

commit | commitdiff | tree

Kamoltat (Junior) Sirivadhna [Wed, 4 Mar 2026 22:08:49 +0000 (22:08 +0000)]

qa: make test_progress atomically capture OSD marked in/out events

Problem:
Test had a race condition where events could complete and disappear
between checking the event count and fetching the event, causing
test failures.

Solution:
Refactor to atomically capture events during the wait condition check.
Added helper methods _wait_for_osd_marked_out_event() and
_wait_for_osd_marked_in_event() that capture events at the moment
they're detected, eliminating the race window.

Fixes: https://tracker.ceph.com/issues/70320
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit 0ef66f6f2e1881061ecb49e457bb2b9061c0260b)
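
The race described above, and the fix of capturing the event inside the wait condition, can be sketched as follows. This is an illustration under stated assumptions, not the code from test_progress: wait_for_event(), TransientSource and the event dictionary layout are all hypothetical.

```python
import time


def wait_for_event(fetch_events, matches, timeout=5.0, period=0.01):
    """Poll until an event matches and return the event seen in that same poll.

    Returning the object found during the check avoids a second fetch,
    which could miss an event that has already completed and disappeared.
    """
    deadline = time.monotonic() + timeout
    while True:
        for event in fetch_events():
            if matches(event):
                return event
        if time.monotonic() >= deadline:
            raise TimeoutError("no matching event before the deadline")
        time.sleep(period)


class TransientSource:
    """Hypothetical event source whose event is visible for one poll only."""

    def __init__(self):
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls == 3:
            return [{"message": "osd.0 marked out"}]
        return []


source = TransientSource()
captured = wait_for_event(source.fetch, lambda ev: "marked out" in ev["message"])
print(captured["message"])
```

With a check-then-fetch pattern, the fourth call to fetch() would return an empty list and the test would fail even though the event had been observed.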

commit | commitdiff | tree

Patrick Donnelly [Thu, 12 Mar 2026 09:25:59 +0000 (14:55 +0530)]

Merge PR #66480 into squid

* refs/pull/66480/head:
mgr/dashboard: service creation fails if service name is same as service type

Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Tue, 10 Mar 2026 09:33:47 +0000 (15:03 +0530)]

Merge PR #67582 into squid

* refs/pull/67582/head:
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

Reviewed-by: Ramana Raja <rraja@redhat.com>
