From: Patrick Donnelly
Date: Tue, 3 Mar 2026 21:57:17 +0000 (-0500)
Subject: reef: qa: workaround pacific OSDs sending SERVER_REEF feature bits
X-Git-Tag: v18.2.8~1^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=5a52bfaf4400a7d7632602f455c601744c1e7114;p=ceph.git

reef: qa: workaround pacific OSDs sending SERVER_REEF feature bits

This is fun: A bug was introduced that started with
6c097015bbc1bcfa8abe518680a3d3a17ff39884. The MON_SINGLE_PAXOS was
deprecated but kept in CEPH_FEATURES_ALL and not removed until
f1ecf99a86edfe899392b6b734351f1015a93be6 which didn't get released
until Quincy. So Pacific OSDs are still advertising MON_SINGLE_PAXOS
which is interpreted as SERVER_REEF by reef monitors. So why didn't we
catch that during upgrades to reef from pacific for v18.2.0 QA testing?
WELL, have I got a surprise for you. We didn't check that all OSDs are
running reef until 25e8b22c6f29cd3947b501f6aaf7614ba204a2c8 which was
released in v18.2.5.

Fixes: 25e8b22c6f29cd3947b501f6aaf7614ba204a2c8
Signed-off-by: Patrick Donnelly
Fixes: https://tracker.ceph.com/issues/75034
---

diff --git a/qa/suites/upgrade/pacific-x/parallel/1-tasks.yaml b/qa/suites/upgrade/pacific-x/parallel/1-tasks.yaml
index f17bb9b5abd..99525b2f2bf 100644
--- a/qa/suites/upgrade/pacific-x/parallel/1-tasks.yaml
+++ b/qa/suites/upgrade/pacific-x/parallel/1-tasks.yaml
@@ -43,6 +43,18 @@ tasks:
 - test_telemetry_pacific.sh
 - print: "**** done end telemetry pacific..."
 
+# This is fun: A bug was introduced that started with
+# 6c097015bbc1bcfa8abe518680a3d3a17ff39884. The MON_SINGLE_PAXOS was
+# deprecated but kept in CEPH_FEATURES_ALL and not removed until
+# f1ecf99a86edfe899392b6b734351f1015a93be6 which didn't get released
+# until Quincy. So Pacific OSDs are still advertising MON_SINGLE_PAXOS
+# which is interpreted as SERVER_REEF by reef monitors. So why didn't we
+# catch that during upgrades to reef from pacific for v18.2.0 QA testing?
+# WELL, have I got a surprise for you. We didn't check that all OSDs are
+# running reef until 25e8b22c6f29cd3947b501f6aaf7614ba204a2c8 which was
+# released in v18.2.5.
+- ceph health mute OSD_UPGRADE_FINISHED --sticky
+
 - print: "**** done start parallel"
 - parallel:
 - workload
diff --git a/qa/suites/upgrade/pacific-x/stress-split/0-roles.yaml b/qa/suites/upgrade/pacific-x/stress-split/0-roles.yaml
index ad3ee43d38e..7fea077b875 100644
--- a/qa/suites/upgrade/pacific-x/stress-split/0-roles.yaml
+++ b/qa/suites/upgrade/pacific-x/stress-split/0-roles.yaml
@@ -29,3 +29,5 @@ overrides:
 conf:
 osd:
 osd shutdown pgref assert: true
+ log-ignorelist:
+ - OSD_UPGRADE_FINISHED
diff --git a/qa/suites/upgrade/pacific-x/stress-split/1-start.yaml b/qa/suites/upgrade/pacific-x/stress-split/1-start.yaml
index 352141f824d..19a697c0b6b 100644
--- a/qa/suites/upgrade/pacific-x/stress-split/1-start.yaml
+++ b/qa/suites/upgrade/pacific-x/stress-split/1-start.yaml
@@ -61,6 +61,19 @@ first-half-sequence:
 - ceph config set mgr mgr/cephadm/daemon_cache_timeout 60
 - ceph config set global log_to_journald false --force
 
+ # This is fun: A bug was introduced that started with
+ # 6c097015bbc1bcfa8abe518680a3d3a17ff39884. The MON_SINGLE_PAXOS was
+ # deprecated but kept in CEPH_FEATURES_ALL and not removed until
+ # f1ecf99a86edfe899392b6b734351f1015a93be6 which didn't get released
+ # until Quincy. So Pacific OSDs are still advertising MON_SINGLE_PAXOS
+ # which is interpreted as SERVER_REEF by reef monitors. So why didn't we
+ # catch that during upgrades to reef from pacific for v18.2.0 QA testing?
+ # WELL, have I got a surprise for you. We didn't check that all OSDs are
+ # running reef until 25e8b22c6f29cd3947b501f6aaf7614ba204a2c8 which was
+ # released in v18.2.5.
+
+ - ceph health mute OSD_UPGRADE_FINISHED --sticky
+
 - echo wait for mgr daemons to upgrade
 # upgrade the mgr daemons first
 - ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1 --daemon-types mgr
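
Note for readers of this patch: the feature-bit collision described in the commit message can be sketched as a toy model. This is a minimal illustration, not Ceph code. The bit position, the `osd_features` helper, and the `monitor_sees_reef` check are all hypothetical names and values chosen for the example; only the behaviour (a retired bit still advertised by Pacific being read as SERVER_REEF) comes from the commit message.

```python
# Illustrative model of a reused feature bit. The bit position is an
# assumption chosen for the example, not a value taken from the Ceph source.
REUSED_BIT = 1 << 21

MON_SINGLE_PAXOS = REUSED_BIT  # retired meaning, still set by Pacific OSDs
SERVER_REEF = REUSED_BIT       # meaning assigned to the same bit in Reef


def osd_features(release: str) -> int:
    """Return a toy feature mask for an OSD of the given release (hypothetical helper)."""
    # Per the commit message, Pacific still includes the retired bit in its
    # advertised features, while the removal was first released in Quincy.
    if release == "pacific":
        return MON_SINGLE_PAXOS
    if release == "quincy":
        return 0
    if release == "reef":
        return SERVER_REEF
    raise ValueError(f"unknown release: {release}")


def monitor_sees_reef(features: int) -> bool:
    """Toy version of a Reef monitor testing for the SERVER_REEF bit."""
    return (features & SERVER_REEF) == SERVER_REEF


for release in ("pacific", "quincy", "reef"):
    print(release, monitor_sees_reef(osd_features(release)))
```

In this model a Pacific OSD is indistinguishable from a Reef OSD by that one bit. That is consistent with the QA suites above muting or ignoring OSD_UPGRADE_FINISHED rather than relying on the check.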