git.apps.os.sepia.ceph.com Git - ceph.git/log
5 months agoqa/suites/upgrade/telemetry-upgrade: improve ignorelist 61723/head
Laura Flores [Fri, 7 Feb 2025 22:53:12 +0000 (16:53 -0600)]
qa/suites/upgrade/telemetry-upgrade: improve ignorelist

In this commit, I added some pattern matching for
warnings that show up in the cluster log detail that
are related to degraded PGs. In these tests, we are intentionally
restarting OSDs to upgrade them, which leads to these states
showing up in the cluster log. So, the warnings are intended
and can be ignored in the context of an upgrade.

Fixes: https://tracker.ceph.com/issues/67881
Signed-off-by: Laura Flores <lflores@ibm.com>
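
As a rough illustration of the ignorelist idea described above (not the actual teuthology implementation), the sketch below filters cluster log lines against a list of regular expressions. The patterns and the sample log lines are hypothetical; they are not copied from the yaml file this commit changes.

```python
import re

# Hypothetical ignorelist entries; the real patterns live in the suite's yaml.
IGNORELIST = [
    r"PG_DEGRADED",
    r"pg \d+\.[0-9a-f]+ is .*degraded",
]


def unexpected_warnings(log_lines, ignorelist=IGNORELIST):
    """Return the log lines that match none of the ignorelist patterns."""
    compiled = [re.compile(pattern) for pattern in ignorelist]
    return [
        line for line in log_lines
        if not any(rx.search(line) for rx in compiled)
    ]


# Made-up cluster log lines, used only to exercise the filter.
sample = [
    "cluster [WRN] Health check failed: Degraded data redundancy (PG_DEGRADED)",
    "cluster [WRN] pg 2.1f is active+undersized+degraded, acting [3,5]",
    "cluster [WRN] Health check failed: 1 slow ops (SLOW_OPS)",
]
print(unexpected_warnings(sample))
```

Only the lines that survive the filter would count against the test run, which is why warnings caused by intentionally restarted OSDs can be listed safely.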
5 months agoqa/tasks: improve ignorelist for thrashing OSDs
Laura Flores [Fri, 7 Feb 2025 21:53:21 +0000 (15:53 -0600)]
qa/tasks: improve ignorelist for thrashing OSDs

This yaml file is used in rados/thrash-old-clients.
In this commit, I added some pattern matching for
warnings that show up in the cluster log detail that
are related to degraded PGs. In these tests, we are intentionally
marking down or killing OSDs, which leads to these states
showing up in the cluster log. So, the warnings are intended
and can be ignored in the context of OSD thrashing.

Fixes: https://tracker.ceph.com/issues/67913
Signed-off-by: Laura Flores <lflores@ibm.com>
5 months agoMerge PR #61562 into main
Patrick Donnelly [Thu, 6 Feb 2025 17:53:28 +0000 (12:53 -0500)]
Merge PR #61562 into main

* refs/pull/61562/head:
qa: remove redundant and broken test
mds: skip scrubbing damaged dirfrag
tools/cephfs/DataScan: test equality of link including frag
tools/cephfs/DataScan: skip linkages that have been removed
tools/cephfs/DataScan: do not error out when failing to read a dentry
tools/cephfs/DataScan: create all ancestors during scan_inodes
tools/cephfs/DataScan: cleanup debug prints
qa: remove old MovedDir test
qa: add data scan tests for ancestry rebuild
qa: make the directory non-empty to force migration
qa: avoid unnecessary mds restart

Reviewed-by: Venky Shankar <vshankar@redhat.com>
5 months agoMerge PR #61555 into main
Patrick Donnelly [Thu, 6 Feb 2025 17:52:57 +0000 (12:52 -0500)]
Merge PR #61555 into main

* refs/pull/61555/head:
mds: do not path traverse a damaged dirfrag
qa: test file create on damaged directory

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
5 months agoMerge PR #60761 into main
Patrick Donnelly [Thu, 6 Feb 2025 17:50:44 +0000 (12:50 -0500)]
Merge PR #60761 into main

* refs/pull/60761/head:
client: resolve bogus self-assignment

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
5 months agoMerge PR #60252 into main
Patrick Donnelly [Thu, 6 Feb 2025 17:50:25 +0000 (12:50 -0500)]
Merge PR #60252 into main

* refs/pull/60252/head:
mds: combine several fixed-size `encode()` calls
common/fs_types: combine several fixed-size `encode()` calls

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
5 months agoMerge pull request #60102 from MaxKellermann/dispatch_notify_one
SrinivasaBharathKanta [Thu, 6 Feb 2025 13:24:44 +0000 (18:54 +0530)]
Merge pull request #60102 from MaxKellermann/dispatch_notify_one

msg/DispatchQueue: wake up only one dispatch thread

5 months agoMerge pull request #60763 from rzarzynski/wip-cls-dont-format-dead-logs
SrinivasaBharathKanta [Thu, 6 Feb 2025 10:28:58 +0000 (15:58 +0530)]
Merge pull request #60763 from rzarzynski/wip-cls-dont-format-dead-logs

objclass: don't do costly string formatting when not needed

5 months agoMerge pull request #60262 from MaxKellermann/ms_tcp_prefetch_max_size_64k
SrinivasaBharathKanta [Thu, 6 Feb 2025 10:28:36 +0000 (15:58 +0530)]
Merge pull request #60262 from MaxKellermann/ms_tcp_prefetch_max_size_64k

common/options: increase `ms_tcp_prefetch_max_size` default to 64 kB

5 months agoMerge pull request #60176 from MaxKellermann/protocol_locks
SrinivasaBharathKanta [Thu, 6 Feb 2025 10:28:23 +0000 (15:58 +0530)]
Merge pull request #60176 from MaxKellermann/protocol_locks

msg: Reduce ProtocolV[12] locks

5 months agoMerge pull request #57718 from NitzanMordhai/wip-nitzan-monclient-try-resent-mon...
SrinivasaBharathKanta [Thu, 6 Feb 2025 10:27:01 +0000 (15:57 +0530)]
Merge pull request #57718 from NitzanMordhai/wip-nitzan-monclient-try-resent-mon-command-to-same-mon

monclient: try to resend the mon commands to the same monitor if avai…

5 months agoMerge pull request #61623 from ronen-fr/wip-rf-m2ore-keys
Ronen Friedman [Thu, 6 Feb 2025 07:44:10 +0000 (09:44 +0200)]
Merge pull request #61623 from ronen-fr/wip-rf-m2ore-keys

common,osd: replace obsolete get_tracked_conf_keys()

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
5 months agoMerge pull request #59841 from phlogistonjohn/jjm-containerized-build-pyalt
Zack Cerza [Thu, 6 Feb 2025 01:23:28 +0000 (18:23 -0700)]
Merge pull request #59841 from phlogistonjohn/jjm-containerized-build-pyalt

containerized build tools [V2]

5 months agoMerge pull request #61616 from zdover23/wip-doc-2025-02-02-cephadm-services
Zac Dover [Wed, 5 Feb 2025 23:26:16 +0000 (09:26 +1000)]
Merge pull request #61616 from zdover23/wip-doc-2025-02-02-cephadm-services

doc/cephadm: clarify "Monitoring OSD State"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 months agosrc/script: add a script to help build ceph using containers 59841/head
John Mulligan [Tue, 20 Aug 2024 19:01:05 +0000 (15:01 -0400)]
src/script: add a script to help build ceph using containers

The build-with-container script tries to encapsulate nearly all major
build tasks using docker/podman containers. If there's no build image
locally it will create one for you. It provides targets for building
(make), testing (make check), building rpm packages or deb packages and
is designed to be fairly easily extended.

View the comment at the top of the source file for usage details.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
5 months agobuild: add files needed to create a build container
John Mulligan [Tue, 20 Aug 2024 19:00:57 +0000 (15:00 -0400)]
build: add files needed to create a build container

A build container contains all the tools and dependencies needed to
build ceph. This commit provides a Container file and a small script that
helps bootstrap the container setup. This script installs a few extra
things we need before farming most of the work out to install-deps.sh.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
5 months agobuild: small script tweak to allow different build dirs
John Mulligan [Sat, 14 Sep 2024 10:31:23 +0000 (06:31 -0400)]
build: small script tweak to allow different build dirs

Move the mkdir line to allow for other build dir naming schemes outside
of what appears in the .gitignore file. A tiny bit of added flexibility
at little cost.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
5 months agosrc/script: add helper function has_build_dir
John Mulligan [Mon, 14 Nov 2022 15:57:25 +0000 (10:57 -0500)]
src/script: add helper function has_build_dir

This function returns successfully if $BUILD_DIR exists and is valid.
This is a useful building block for automation around the build and
can be used to avoid re-running commands that fail if the build dir
exists already.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
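
The building-block role of such a helper can be sketched in Python, although the real helper is a shell function. This is a minimal sketch under stated assumptions: "valid" is approximated by the presence of a CMakeCache.txt file, and `configure_if_needed` is a hypothetical caller, neither of which is confirmed by the commit.

```python
import os
import tempfile


def has_build_dir(build_dir):
    """Return True if build_dir exists and looks like a configured build tree.

    Assumption: validity is approximated by a CMakeCache.txt file being
    present; the real shell helper may check something else.
    """
    return os.path.isdir(build_dir) and os.path.isfile(
        os.path.join(build_dir, "CMakeCache.txt"))


def configure_if_needed(build_dir, configure):
    """Call configure(build_dir) only when no valid build dir exists yet."""
    if has_build_dir(build_dir):
        return False  # skip the step that would fail on an existing dir
    configure(build_dir)
    return True


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(has_build_dir(tmp))
```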
5 months agoMerge pull request #61470 from aclamk/wip-aclamk-bluefs-bdev-expand-addendum
Adam Kupczyk [Wed, 5 Feb 2025 19:42:41 +0000 (20:42 +0100)]
Merge pull request #61470 from aclamk/wip-aclamk-bluefs-bdev-expand-addendum

os/bluestore: CBT bluefs-bdev-expand addendum

5 months agoMerge pull request #61019 from MaxKellermann/test_objectstore__WITH_BLUESTORE-1
Adam Kupczyk [Wed, 5 Feb 2025 19:37:13 +0000 (20:37 +0100)]
Merge pull request #61019 from MaxKellermann/test_objectstore__WITH_BLUESTORE-1

test/objectstore: extend `#ifdef WITH_BLUESTORE`

5 months agoMerge pull request #60547 from MaxKellermann/without_bluestore
Adam Kupczyk [Wed, 5 Feb 2025 19:36:50 +0000 (20:36 +0100)]
Merge pull request #60547 from MaxKellermann/without_bluestore

Fix two build failures with `WITH_BLUESTORE=no`

5 months agoMerge pull request #59633 from YiteGu/optimize-offline-trim-report-info
Adam Kupczyk [Wed, 5 Feb 2025 19:36:27 +0000 (20:36 +0100)]
Merge pull request #59633 from YiteGu/optimize-offline-trim-report-info

tools/ceph-bluestore-tool: optimize offline trim report info

5 months agoqa: remove redundant and broken test 61562/head
Patrick Donnelly [Wed, 5 Feb 2025 18:12:10 +0000 (13:12 -0500)]
qa: remove redundant and broken test

Scrub does not fix damaged dirfrags for any type of damage we currently mark
dirfrags damaged for (corrupt fnode / missing dirfrag object).

In any case, this scenario is covered in cephfs_data_scan with correct checks
for damage / handling.

Fixes: 7f0cf0b7a2d94dd2189de4bef5865b024f3c7d4b
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
5 months agoMerge pull request #61633 from chardan/wip-jfw-rgw-fix-editor-mode
Jesse Williamson [Wed, 5 Feb 2025 18:07:22 +0000 (10:07 -0800)]
Merge pull request #61633 from chardan/wip-jfw-rgw-fix-editor-mode

rgw: fixup for emacs/vim modes, moved to top of file.

5 months agoMerge pull request #61594 from adk3798/test-nfs-task-cluster-purge-fixup
Adam King [Wed, 5 Feb 2025 15:23:58 +0000 (10:23 -0500)]
Merge pull request #61594 from adk3798/test-nfs-task-cluster-purge-fixup

mgr/cephadm: continue in nfs service purge if grace file is already deleted

Reviewed-by: John Mulligan <jmulligan@redhat.com>
5 months agoMerge pull request #61593 from adk3798/cephadm-osd-extra-args-initial-deploy
Adam King [Wed, 5 Feb 2025 15:22:35 +0000 (10:22 -0500)]
Merge pull request #61593 from adk3798/cephadm-osd-extra-args-initial-deploy

mgr/cephadm: create OSD daemon deploy specs through make_daemon_spec

Reviewed-by: John Mulligan <jmulligan@redhat.com>
5 months agoMerge pull request #61579 from phlogistonjohn/jjm-cephadm-small-moves
Adam King [Wed, 5 Feb 2025 15:21:43 +0000 (10:21 -0500)]
Merge pull request #61579 from phlogistonjohn/jjm-cephadm-small-moves

cephadm: move three functions out of cephadm.py

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #61578 from omidyoosefi/monitor-port-nfs
Adam King [Wed, 5 Feb 2025 15:20:06 +0000 (10:20 -0500)]
Merge pull request #61578 from omidyoosefi/monitor-port-nfs

pybind/mgr/cephadm: allow setting custom monitoring_port for nfs

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #61571 from adk3798/cephadm-ganesha-server-scope
Adam King [Wed, 5 Feb 2025 15:17:56 +0000 (10:17 -0500)]
Merge pull request #61571 from adk3798/cephadm-ganesha-server-scope

mgr/cephadm: add Server_Scope = <fsid> to NFSv4 section of ganesha conf

Reviewed-by: John Mulligan <jmulligan@redhat.com>
5 months agoMerge pull request #61389 from Kushal-deb/fix-issue-69435_NVMe-of_service
Adam King [Wed, 5 Feb 2025 15:15:57 +0000 (10:15 -0500)]
Merge pull request #61389 from Kushal-deb/fix-issue-69435_NVMe-of_service

mgr/cephadm: Abort nvme deployment with pool that doesn't exist

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #60991 from ShwetaBhosale1/fix_issue_69153_nfs_to_show_ingress_mode
Adam King [Wed, 5 Feb 2025 15:14:47 +0000 (10:14 -0500)]
Merge pull request #60991 from ShwetaBhosale1/fix_issue_69153_nfs_to_show_ingress_mode

mgr/nfs: Show ingress mode in output of 'ceph nfs cluster info' command

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #60915 from Kushal-deb/fix-issue-2313279
Adam King [Wed, 5 Feb 2025 15:13:40 +0000 (10:13 -0500)]
Merge pull request #60915 from Kushal-deb/fix-issue-2313279

cephadm: Add pre_remove and ensure deployment values are reset and API settings are updated when removing Prometheus or Alertmanager daemons

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #61368 from cbodley/wip-69527
J. Eric Ivancich [Wed, 5 Feb 2025 14:04:06 +0000 (09:04 -0500)]
Merge pull request #61368 from cbodley/wip-69527

rgw/s3: remove local variable 'uri' that shadows member variable

Reviewed-by: Yixin Jin <yjin77@yahoo.ca>
5 months agoMerge pull request #61037 from thotz/make-restore-attrs-humanreadable
J. Eric Ivancich [Wed, 5 Feb 2025 14:00:59 +0000 (09:00 -0500)]
Merge pull request #61037 from thotz/make-restore-attrs-humanreadable

rgw/rgw_admin.cc : Make restore attrs readable in admin cli

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-by: Adam Emerson <aemerson@redhat.com>
Shreyansh Sancheti <ssanchet@redhat.com>

5 months agoMerge pull request #59937 from BBoozmen/oozmen-enhancing-fetch-remote-obj-logs
J. Eric Ivancich [Wed, 5 Feb 2025 13:59:31 +0000 (08:59 -0500)]
Merge pull request #59937 from BBoozmen/oozmen-enhancing-fetch-remote-obj-logs

RGW: add src/dest object info to fetch_remote_obj()'s debug log events

Reviewed-by: Adam Emerson <aemerson@redhat.com>
5 months agoMerge pull request #61505 from cbodley/wip-69582
J. Eric Ivancich [Wed, 5 Feb 2025 13:58:51 +0000 (08:58 -0500)]
Merge pull request #61505 from cbodley/wip-69582

examples/rgw: add type to HeadBucketOutput for old boto

Reviewed-by: Yuval Lifshitz <ylifshit@ibm.com>
5 months agoMerge pull request #59913 from clwluvw/bucketreplication-uid
J. Eric Ivancich [Wed, 5 Feb 2025 13:58:10 +0000 (08:58 -0500)]
Merge pull request #59913 from clwluvw/bucketreplication-uid

rgw: use effective owner in PutBucketReplication

Reviewed-by: Casey Bodley <cbodley@redhat.com>
5 months agoMerge pull request #61366 from aclamk/wip-aclamk-bluefs-unittest-string-fill-fix
Adam Kupczyk [Wed, 5 Feb 2025 10:55:14 +0000 (11:55 +0100)]
Merge pull request #61366 from aclamk/wip-aclamk-bluefs-unittest-string-fill-fix

os/bluestore: Fix unittest_bluefs

5 months agoMerge pull request #61466 from rhcs-dashboard/storage-class-management
afreen23 [Wed, 5 Feb 2025 09:52:02 +0000 (15:22 +0530)]
Merge pull request #61466 from rhcs-dashboard/storage-class-management

mgr/dashboard: Storage Class Management

Reviewed-by: Afreen Misbah <afreen@ibm.com>
5 months agomgr/dashboard: Storage Class Management 61466/head
Dnyaneshwari [Fri, 17 Jan 2025 10:06:50 +0000 (15:36 +0530)]
mgr/dashboard: Storage Class Management

Fixes: https://tracker.ceph.com/issues/69606
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
5 months agoMerge pull request #61067 from MaxKellermann/librados_static
SrinivasaBharathKanta [Wed, 5 Feb 2025 03:59:22 +0000 (09:29 +0530)]
Merge pull request #61067 from MaxKellermann/librados_static

librados: disable symbol versions when building statically

5 months agoMerge pull request #61025 from MaxKellermann/config_legacy_values__static
SrinivasaBharathKanta [Wed, 5 Feb 2025 03:59:08 +0000 (09:29 +0530)]
Merge pull request #61025 from MaxKellermann/config_legacy_values__static

common/config: make `legacy_values` static

5 months agoMerge pull request #58706 from xxhdx1985126/wip-67065
Radoslaw Zarzynski [Wed, 5 Feb 2025 00:22:49 +0000 (01:22 +0100)]
Merge pull request #58706 from xxhdx1985126/wip-67065

test: fix ld link errors

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
5 months agomgr/cephadm: continue in nfs service purge if grace file is already deleted 61594/head
Adam King [Wed, 29 Jan 2025 20:48:53 +0000 (15:48 -0500)]
mgr/cephadm: continue in nfs service purge if grace file is already deleted

The test_nfs task we run in teuthology creates and removes a number of
nfs clusters during the task. I think it's possible based on timing for
it to end up in a situation where it tries to remove an nfs service before
the grace file has been created. In that case, cephadm doesn't know it
hasn't created the grace file and just repeatedly fails forever attempting
to remove the nonexistent file. This patch adds handling for the error
case where we get a nonzero rc but the error message implies the command
failed because the file already does not exist.

Fixes: https://tracker.ceph.com/issues/69736
Signed-off-by: Adam King <adking@redhat.com>
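
The error handling described above can be sketched as follows. This is not the cephadm code: the function names are hypothetical, and the stderr substring used to detect an already-missing file is an assumption for illustration.

```python
def grace_file_removed(rc, stderr):
    """Decide whether a grace-file removal attempt can be treated as done.

    rc is the command's return code and stderr its error output. The
    substring checked below is an assumption, not the real message.
    """
    if rc == 0:
        return True
    # Nonzero rc, but the message implies the file is already gone.
    return "No such file or directory" in stderr


def purge(remove_cmd):
    """Run remove_cmd() -> (rc, stderr) and raise only on real failures."""
    rc, err = remove_cmd()
    if not grace_file_removed(rc, err):
        raise RuntimeError(f"grace file removal failed (rc={rc}): {err}")
```

Treating "already absent" as success makes the purge idempotent, so a removal that races with creation no longer retries forever.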
5 months agoMerge pull request #61589 from piyushagarwal1411/fix-69727-main
afreen23 [Tue, 4 Feb 2025 20:02:18 +0000 (01:32 +0530)]
Merge pull request #61589 from piyushagarwal1411/fix-69727-main

mgr/dashboard: Add 'Browse Dashboards' button in Grafana dashboards

Reviewed-by: Afreen Misbah <afreen@ibm.com>
5 months agoMerge pull request #61634 from VallariAg/wip-vallari-nvme-maxgroup-alert
Vallari Agrawal [Tue, 4 Feb 2025 14:56:49 +0000 (20:26 +0530)]
Merge pull request #61634 from VallariAg/wip-vallari-nvme-maxgroup-alert

monitoring: add NVMeoFMaxGatewayGroups alert

5 months agoMerge pull request #61357 from VallariAg/wip-nvmeof-teuthology-test-fix-ha
Vallari Agrawal [Tue, 4 Feb 2025 14:55:32 +0000 (20:25 +0530)]
Merge pull request #61357 from VallariAg/wip-nvmeof-teuthology-test-fix-ha

qa: fix nvmeof teuthology thrasher fix

5 months agoMerge pull request #61620 from anthonyeleven/remove-obsolete-sample-conf
Zac Dover [Tue, 4 Feb 2025 11:51:39 +0000 (21:51 +1000)]
Merge pull request #61620 from anthonyeleven/remove-obsolete-sample-conf

src: modernize sample.ceph.conf

Reviewed-by: Zac Dover <zac.dover@proton.me>
5 months agoqa/suites/nvmeof: use SCALING_DELAYS: '120' 61357/head
Vallari Agrawal [Tue, 4 Feb 2025 07:50:18 +0000 (13:20 +0530)]
qa/suites/nvmeof: use SCALING_DELAYS: '120'

Increase delays for qa/workunits/nvmeof/scalability_test.sh
as namespace rebalancing takes more time. After upscaling,
a gateway may initially be 'CREATED', which is a valid state during
gateway initialization, but the state should then progress
to 'AVAILABLE' within a couple of seconds.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
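
The state progression described above suggests a poll-until-ready loop. The sketch below is a generic illustration, not the logic in scalability_test.sh: `wait_for_state` and `get_state` are hypothetical names, and treating 120 as a number of seconds is an assumption.

```python
import time


def wait_for_state(get_state, wanted="AVAILABLE", timeout=120.0,
                   interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `wanted` or `timeout` elapses.

    Intermediate states such as 'CREATED' are tolerated while waiting.
    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example: a fake gateway that needs two polls before becoming available.
states = iter(["CREATED", "CREATED", "AVAILABLE"])
print(wait_for_state(lambda: next(states), sleep=lambda _s: None))
```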
5 months agocommon,osd: replace obsolete get_tracked_conf_keys() 61623/head
Ronen Friedman [Mon, 3 Feb 2025 07:19:44 +0000 (01:19 -0600)]
common,osd: replace obsolete get_tracked_conf_keys()

... with get_tracked_keys().

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
5 months agoMerge pull request #61567 from idryomov/wip-58185
Ilya Dryomov [Tue, 4 Feb 2025 09:58:24 +0000 (10:58 +0100)]
Merge pull request #61567 from idryomov/wip-58185

librbd: stop filtering async request error codes

Reviewed-by: Ramana Raja <rraja@redhat.com>
5 months agomgr/dashboard: Add 'Browse Dashboards' button in Grafana dashboards 61589/head
Piyush Agarwal [Thu, 30 Jan 2025 09:12:37 +0000 (14:42 +0530)]
mgr/dashboard: Add 'Browse Dashboards' button in Grafana dashboards

Fixes: https://tracker.ceph.com/issues/69727
Signed-off-by: Piyush Agarwal <piyushagarwal14.pa@gmail.com>
5 months agoMerge pull request #60686 from zhsgao/mds_bal_overload_epochs
Venky Shankar [Tue, 4 Feb 2025 07:51:37 +0000 (13:21 +0530)]
Merge pull request #60686 from zhsgao/mds_bal_overload_epochs

mds: fix option mds_bal_overload_epochs

Reviewed-by: Venky Shankar <vshankar@redhat.com>
5 months agocephadm: Add pre_remove and ensure deployment values are reset and API settings are... 60915/head
Kushal Deb [Fri, 29 Nov 2024 08:38:51 +0000 (14:08 +0530)]
cephadm: Add pre_remove and ensure deployment values are reset and API settings are updated when removing Prometheus or Alertmanager daemons

This fixes an issue where the dashboard API settings are not updated
properly when the active Prometheus or Alertmanager daemon is removed.
If the active daemon is removed, the settings are reconfigured to point
to a remaining daemon or reset if no daemons are available.

This avoids dashboard errors like "404 Not Found" caused by stale API
host settings.

Signed-off-by: Kushal Deb <Kushal.Deb@ibm.com>
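
The reconfigure-or-reset behavior described above can be sketched as a small decision function. The function and parameter names are hypothetical and do not mirror the cephadm `pre_remove` code.

```python
def pick_api_host(current, removed, remaining):
    """Return the API host setting to use after `removed` is taken down.

    current:   host the dashboard setting points to now (or None)
    removed:   host of the daemon being removed
    remaining: hosts of same-type daemons that stay deployed
    """
    if current != removed:
        return current       # the active daemon is untouched
    if remaining:
        return remaining[0]  # repoint to a surviving daemon
    return None              # nothing left: reset the setting


print(pick_api_host("host-a", "host-a", ["host-b"]))
```

Returning `None` stands in for resetting the setting, which is what keeps the dashboard from querying a host that no longer runs the daemon.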
5 months agoFixup for emacs/vim modes, moved to top of file. 61633/head
Jesse F. Williamson [Mon, 3 Feb 2025 22:36:14 +0000 (14:36 -0800)]
Fixup for emacs/vim modes, moved to top of file.

Signed-off-by: Jesse F. Williamson <jfw@ibm.com>
5 months agoMerge pull request #61632 from gbregman/main
Gil Bregman [Mon, 3 Feb 2025 23:52:08 +0000 (01:52 +0200)]
Merge pull request #61632 from gbregman/main

mgr/cephadm/nvmeof: Add max_hosts field to NVMeOF configuration

5 months agomgr/cephadm/nvmeof: Add max_hosts field to NVMeOF configuration and update default... 61632/head
Gil Bregman [Mon, 3 Feb 2025 21:13:49 +0000 (23:13 +0200)]
mgr/cephadm/nvmeof: Add max_hosts field to NVMeOF configuration and update default values
Fixes: https://tracker.ceph.com/issues/69759

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
5 months agoMerge pull request #61627 from petrutlucian94/zlib-fix
Ilya Dryomov [Mon, 3 Feb 2025 19:47:04 +0000 (20:47 +0100)]
Merge pull request #61627 from petrutlucian94/zlib-fix

win32_deps_build.sh: pin zlib tag

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
5 months agoMerge pull request #61374 from myoungwon/fix-68518
Radoslaw Zarzynski [Mon, 3 Feb 2025 19:07:49 +0000 (20:07 +0100)]
Merge pull request #61374 from myoungwon/fix-68518

src/test: allow ENOENT if target object of tier_flush has snapshots

Reviewed-by: Laura Flores <lflores@redhat.com>
5 months agomonitoring: add tests for NVMeoFMaxGatewayGroups 61634/head
Vallari Agrawal [Mon, 3 Feb 2025 18:27:30 +0000 (23:57 +0530)]
monitoring: add tests for NVMeoFMaxGatewayGroups

Add unit tests for alert NVMeoFMaxGatewayGroups
in monitoring/ceph-mixin/tests_alerts/test_alerts.yml

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agomonitoring: add alert NVMeoFMaxGatewayGroups
Vallari Agrawal [Mon, 3 Feb 2025 18:24:50 +0000 (23:54 +0530)]
monitoring: add alert NVMeoFMaxGatewayGroups

Add alert NVMeoFMaxGatewayGroups to prometheus_alerts.yml
and prometheus_alerts.libsonnet.

This alert indicates whether the maximum number of NVMeoF gateway
groups has been reached in a cluster.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agomonitoring: add NVMeoFMaxGatewayGroups
Vallari Agrawal [Mon, 3 Feb 2025 18:22:47 +0000 (23:52 +0530)]
monitoring: add NVMeoFMaxGatewayGroups

Add config NVMeoFMaxGatewayGroups to config.libsonnet
and set it to 4 (groups).

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agoMerge pull request #61628 from kamoltat/wip-ksirivad-fix-stretch-mode-doc
Zac Dover [Mon, 3 Feb 2025 17:41:14 +0000 (03:41 +1000)]
Merge pull request #61628 from kamoltat/wip-ksirivad-fix-stretch-mode-doc

doc/rados/operations/stretch-mode: fix mistake in stretch mode

Reviewed-by: Zac Dover <zac.dover@proton.me>
5 months agodoc/rados/operations/stretch-mode: fix mistake in stretch mode 61628/head
Kamoltat Sirivadhna [Mon, 3 Feb 2025 17:18:44 +0000 (17:18 +0000)]
doc/rados/operations/stretch-mode: fix mistake in stretch mode

Degraded stretch mode should only halve the "min_size", not
"size".

Fixes: No tracker (doc changes)
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
5 months agoMerge pull request #61232 from xxhdx1985126/wip-67888-followup
Yuri Weinstein [Mon, 3 Feb 2025 15:59:32 +0000 (07:59 -0800)]
Merge pull request #61232 from xxhdx1985126/wip-67888-followup

osd/PeeringState: rename "cancel_backfill" to "suspend_backfill"

Reviewed-by: Samuel Just <sjust@redhat.com>
5 months agoMerge pull request #61397 from amathuria/wip-amat-test-osdmap-pruning
SrinivasaBharathKanta [Mon, 3 Feb 2025 15:43:28 +0000 (21:13 +0530)]
Merge pull request #61397 from amathuria/wip-amat-test-osdmap-pruning

mon/test_mon_osdmap_prune: Use first_pinned instead of first_committed

5 months agoMerge pull request #61365 from Matan-B/wip-matanb-snapmapper-logs
SrinivasaBharathKanta [Mon, 3 Feb 2025 15:43:09 +0000 (21:13 +0530)]
Merge pull request #61365 from Matan-B/wip-matanb-snapmapper-logs

osd/SnapMapper: Improve logging

5 months agoMerge pull request #61328 from adamemerson/wip-64191
SrinivasaBharathKanta [Mon, 3 Feb 2025 15:42:43 +0000 (21:12 +0530)]
Merge pull request #61328 from adamemerson/wip-64191

test/neorados: Silence mismatched new/delete warning

5 months agoMerge pull request #60945 from NitzanMordhai/wip-nitzan-crushwrapper-corpus-squid
SrinivasaBharathKanta [Mon, 3 Feb 2025 15:42:19 +0000 (21:12 +0530)]
Merge pull request #60945 from NitzanMordhai/wip-nitzan-crushwrapper-corpus-squid

dencoder tests fix type backwards incompatible checks

5 months agowin32_deps_build.sh: pin zlib tag 61627/head
Lucian Petrut [Mon, 3 Feb 2025 14:53:05 +0000 (14:53 +0000)]
win32_deps_build.sh: pin zlib tag

The zlib Windows build started to fail, probably because of this:
https://github.com/madler/zlib/issues/1038

  Cloning into 'zlib'...
  make: *** No rule to make target 'zconf.h', needed by 'adler32.o'.

We'll pin the zlib version for now to unblock the Windows build.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
5 months agoqa/suites/nvmeof: Remove watchdog from thrasher
Vallari Agrawal [Thu, 30 Jan 2025 12:13:48 +0000 (17:43 +0530)]
qa/suites/nvmeof: Remove watchdog from thrasher

This commit does the following:
1. remove watchdog from thrasher
2. remove wait from fio_test
3. change thrasher switcher wait-time to 10 mins

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agosrc: modernize sample.ceph.conf 61620/head
Anthony D'Atri [Sun, 2 Feb 2025 21:38:14 +0000 (16:38 -0500)]
src: modernize sample.ceph.conf

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
5 months agoMerge pull request #61577 from ronen-fr/wip-rf-just-me
Ronen Friedman [Sun, 2 Feb 2025 14:22:07 +0000 (16:22 +0200)]
Merge pull request #61577 from ronen-fr/wip-rf-just-me

osd/scrub: remove unnecessary loop

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
5 months agoMerge pull request #61538 from leonidc/fix-duplicated-optimized
leonidc [Sun, 2 Feb 2025 14:05:02 +0000 (16:05 +0200)]
Merge pull request #61538 from leonidc/fix-duplicated-optimized

nvmeofgw* : fix duplicated optimized host's paths

5 months agoMerge pull request #61590 from ronen-fr/wip-rf-noinfo-repair
Ronen Friedman [Sun, 2 Feb 2025 14:02:21 +0000 (16:02 +0200)]
Merge pull request #61590 from ronen-fr/wip-rf-noinfo-repair

osd/scrub: discard repair_oinfo_oid()

Reviewed-by: Samuel Just <sjust@redhat.com>
5 months agoMerge pull request #61394 from ronen-fr/wip-rf-cacher-v2
Ronen Friedman [Sun, 2 Feb 2025 13:55:09 +0000 (15:55 +0200)]
Merge pull request #61394 from ronen-fr/wip-rf-cacher-v2

common: modify md_config_obs_impl API

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
5 months agoMerge pull request #60426 from ronen-fr/wip-rf-svwperf
Ronen Friedman [Sun, 2 Feb 2025 13:49:49 +0000 (15:49 +0200)]
Merge pull request #60426 from ronen-fr/wip-rf-svwperf

common/perf_counters: enabling 'find()' by logger name

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
5 months agodoc/cephadm: clarify "Monitoring OSD State" 61616/head
Zac Dover [Sat, 1 Feb 2025 21:50:07 +0000 (07:50 +1000)]
doc/cephadm: clarify "Monitoring OSD State"

Change "Remove an OSD" to "Monitoring OSD State During OSD Removal" and
reword a sentence so that it more clearly refers to the process under
discussion.

Signed-off-by: Zac Dover <zac.dover@proton.me>
5 months agoMerge pull request #61613 from zdover23/wip-doc-2025-02-02-architecture 61617/head
Zac Dover [Sat, 1 Feb 2025 21:38:32 +0000 (07:38 +1000)]
Merge pull request #61613 from zdover23/wip-doc-2025-02-02-architecture

doc/architecture: remove sentence

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 months agodoc/architecture: remove sentence 61613/head
Zac Dover [Sat, 1 Feb 2025 21:15:32 +0000 (07:15 +1000)]
doc/architecture: remove sentence

Remove a sentence that is more marketing than reference.

Signed-off-by: Zac Dover <zac.dover@proton.me>
5 months agoMerge pull request #61561 from athanatos/sjust/wip-crimson-recovery-69412
Samuel Just [Fri, 31 Jan 2025 18:44:49 +0000 (10:44 -0800)]
Merge pull request #61561 from athanatos/sjust/wip-crimson-recovery-69412

crimson: take obc lock during push commit on primary

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
5 months agoMerge pull request #61001 from MaxKellermann/common_includes
Ilya Dryomov [Fri, 31 Jan 2025 10:50:57 +0000 (11:50 +0100)]
Merge pull request #61001 from MaxKellermann/common_includes

common: add missing includes

Reviewed-by: Adam Emerson <aemerson@redhat.com>
5 months agoMerge pull request #61598 from idryomov/wip-rbd-migration-https-doc
Ilya Dryomov [Thu, 30 Jan 2025 23:01:10 +0000 (00:01 +0100)]
Merge pull request #61598 from idryomov/wip-rbd-migration-https-doc

doc/rbd: use https links in live import examples

Reviewed-by: Zac Dover <zac.dover@proton.me>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 months agocrimson/.../replicated_recovery_backend: take excl lock while pushes commit 61561/head
Samuel Just [Wed, 22 Jan 2025 02:41:48 +0000 (18:41 -0800)]
crimson/.../replicated_recovery_backend: take excl lock while pushes commit

Fixes: https://tracker.ceph.com/issues/69412
Signed-off-by: Samuel Just <sjust@redhat.com>
5 months agocrimson/.../replicated_recovery_backend: route pushes earlier
Samuel Just [Wed, 22 Jan 2025 02:47:09 +0000 (18:47 -0800)]
crimson/.../replicated_recovery_backend: route pushes earlier

Let ReplicatedRecoveryBackend::handle_recovery_op route pushes
between handle_push and handle_pull_response instead of
ReplicatedRecoveryBackend::handle_push.

Signed-off-by: Samuel Just <sjust@redhat.com>
5 months agopybind/mgr/cephadm: allow setting custom monitoring_port for nfs 61578/head
Omid Yoosefi [Wed, 29 Jan 2025 20:48:52 +0000 (15:48 -0500)]
pybind/mgr/cephadm: allow setting custom monitoring_port for nfs

ganesha config allows this, so allow users to set their own custom
ports in case they wish to do so.

Signed-off-by: Omid Yoosefi <omidyoosefi@ibm.com>
5 months agomgr/cephadm: add Server_Scope = <fsid> to NFSv4 section of ganesha conf 61571/head
Adam King [Wed, 29 Jan 2025 17:02:50 +0000 (12:02 -0500)]
mgr/cephadm: add Server_Scope = <fsid> to NFSv4 section of ganesha conf

From the ganesha team

"""
In the NFSv4 param block, we need a parameter Server_Scope set to some value common among all servers in a cluster.

The default with it blank is to use the hostname which may be different for each server in the cluster.
"""

This is related to ongoing work on high availability nfs. From the cephadm side
we just need to make sure all nfs daemons in the cluster end up with
the same value for the Server_Scope field. This patch uses the cluster
id (which we already brought into the template as the "namespace" attribute).

Signed-off-by: Adam King <adking@redhat.com>
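
The point of the change, that every NFS daemon in a cluster renders the same Server_Scope no matter which host it runs on, can be sketched like this. The function is hypothetical and the exact layout of the rendered block (indentation, quoting of the value) is an assumption; the real template ships with cephadm.

```python
def render_nfsv4_block(fsid):
    """Render a minimal NFSv4 block with Server_Scope set to the cluster fsid."""
    return (
        "NFSv4 {\n"
        f"        Server_Scope = {fsid};\n"
        "}\n"
    )


# The hostname is deliberately not an input, so two daemons on different
# hosts render identical blocks. The fsid below is a placeholder value.
fsid = "00000000-0000-0000-0000-000000000000"
print(render_nfsv4_block(fsid))
```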
5 months agodoc/rbd: use https links in live import examples 61598/head
Ilya Dryomov [Thu, 30 Jan 2025 19:30:18 +0000 (20:30 +0100)]
doc/rbd: use https links in live import examples

Even though it's explicitly said that "http" stream can be used to
import via both HTTP and HTTPS, it can still be confusing that "type":
"http" is expected to go with "url": "https://...".  Switch example
URLs from HTTP to HTTPS to make it more obvious.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
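
The pairing of "type": "http" with an HTTPS URL can be made concrete with a small sketch. The helper name and the example.com URL are placeholders, and the surrounding raw-format layout is my reading of the live import examples rather than something stated in this commit.

```python
import json


def http_stream_source_spec(url):
    """Build a raw-format source-spec whose stream type is "http".

    The stream type stays "http" for both http:// and https:// URLs.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must use http:// or https://")
    return {"type": "raw", "stream": {"type": "http", "url": url}}


spec = http_stream_source_spec("https://example.com/image.raw")
print(json.dumps(spec))
```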
5 months agoMerge pull request #57551 from linuxbox2/wip-rgwlc-66111
Matt Benjamin [Thu, 30 Jan 2025 17:15:36 +0000 (12:15 -0500)]
Merge pull request #57551 from linuxbox2/wip-rgwlc-66111

rgwlc: send pool transition notifications too

5 months agoMerge pull request #60250 from aainscow/interval_set_enhancements
Alex Ainscow [Thu, 30 Jan 2025 17:06:00 +0000 (17:06 +0000)]
Merge pull request #60250 from aainscow/interval_set_enhancements

include: interval_set: Relax requirements and enhance performance of interval sets

5 months agoMerge pull request #61135 from rkachach/fix_issue_cephadm_services_registry
Adam King [Thu, 30 Jan 2025 16:43:58 +0000 (11:43 -0500)]
Merge pull request #61135 from rkachach/fix_issue_cephadm_services_registry

mgr/cephadm: using service registry pattern for cephadm services

Reviewed-by: Adam King <adking@redhat.com>
5 months agoMerge pull request #59480 from bill-scales/ec_partial_read
Bill Scales [Thu, 30 Jan 2025 16:17:25 +0000 (16:17 +0000)]
Merge pull request #59480 from bill-scales/ec_partial_read

Further EC partial stripe read fixes

5 months agoMerge pull request #61591 from gbregman/main
Gil Bregman [Thu, 30 Jan 2025 15:59:11 +0000 (17:59 +0200)]
Merge pull request #61591 from gbregman/main

mgr/cephadm/nvmeof: Add verify_listener_ip field to NVMeOF configuration

5 months agomgr/cephadm: create OSD daemon deploy specs through make_daemon_spec 61593/head
Adam King [Thu, 30 Jan 2025 14:15:37 +0000 (09:15 -0500)]
mgr/cephadm: create OSD daemon deploy specs through make_daemon_spec

That function handles setting up the extra container/entrypoint
args for the daemon during initial deployment. Having the
CephadmDaemonDeploySpec made directly in the OSD deployment
workflow means initial deployments of OSDs won't have the
extra container/entrypoint args from the spec.
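
The difference can be sketched with toy classes. Everything below
(ServiceSpec, DeploySpec, and both functions) is a simplified stand-in
invented for this example and does not mirror the real cephadm
signatures; it only shows why bypassing the shared factory drops the
extra args.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceSpec:
    """Toy service spec carrying user-supplied extra args."""
    extra_container_args: List[str] = field(default_factory=list)
    extra_entrypoint_args: List[str] = field(default_factory=list)


@dataclass
class DeploySpec:
    """Toy stand-in for a daemon deploy spec."""
    host: str
    daemon_id: str
    extra_container_args: List[str] = field(default_factory=list)
    extra_entrypoint_args: List[str] = field(default_factory=list)


def make_daemon_spec(spec: ServiceSpec, host: str, daemon_id: str) -> DeploySpec:
    """Shared factory: copies the extra args from the service spec."""
    return DeploySpec(
        host=host,
        daemon_id=daemon_id,
        extra_container_args=list(spec.extra_container_args),
        extra_entrypoint_args=list(spec.extra_entrypoint_args),
    )


def make_osd_spec_directly(host: str, daemon_id: str) -> DeploySpec:
    """Direct construction: the extra args are silently left out."""
    return DeploySpec(host=host, daemon_id=daemon_id)
```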

Fixes: https://tracker.ceph.com/issues/69734
Signed-off-by: Adam King <adking@redhat.com>
5 months agoMerge PR #61537 into main
Venky Shankar [Thu, 30 Jan 2025 12:44:05 +0000 (18:14 +0530)]
Merge PR #61537 into main

* refs/pull/61537/head:
libcephfs_proxy: implement ceph_readdir_r()

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
5 months agoqa/workunits/rbd: add test_import_nbd_stream_disconnected() 61567/head
Ilya Dryomov [Tue, 28 Jan 2025 08:33:37 +0000 (09:33 +0100)]
qa/workunits/rbd: add test_import_nbd_stream_disconnected()

When the NBD server is killed, nbd_pread() can set errno to at least
ENOTCONN, EINVAL and 0 which is supposed to stand for "no additional
errno information is available for this error".  Add a test to ensure
that "rbd migration execute" command always fails and that the image
isn't transitioned to MIGRATION_STATE_EXECUTED in this scenario.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 months agolibrbd: stop filtering async request error codes
Ilya Dryomov [Wed, 29 Jan 2025 11:56:34 +0000 (12:56 +0100)]
librbd: stop filtering async request error codes

The roots of this go back to 2015 when snap create was changed to
filter EEXIST in commit 63f6c9bac9a4 ("librbd: fixed snap create race
conditions") and flatten was changed to filter EINVAL in commit
ef7e210c3f74 ("librbd: better handling for duplicate flatten
requests").  From there this pattern made it to most other operations
that can be proxied, including "rbd migration execute".

The motivation was to suppress generation of an "expected" error in
response to a duplicate async request notification for the operation.
However, doing this at the top of the handler (right before returning
to the caller) and for an error as generic as EINVAL is super fragile.
It's trivial for an error that is being filtered to sneak in with
a lower level change completely unnoticed.  For example, live migration
recently added NBD stream which is implemented on top of libnbd and it
turns out that some libnbd APIs return EINVAL on various occasions when
the NBD endpoint disappears and an error like ENOTCONN would make more
sense.  If this occurs during "rbd migration execute" operation, the
rest of librbd never learns that migration was disrupted and the image
is transitioned to MIGRATION_STATE_EXECUTED, thus handing a partially
imported (read: corrupted) image to the user.

Luckily, with commits 07fbc4b71df4 ("librbd: track complete async
operation requests") and 96bc20445afb ("librbd: track complete async
operation return code"), the scenario which originally prompted error
code filtering isn't an issue anymore.  Despite a few shortcomings
(e.g. when an async request notification is acked with result 0, it's
impossible to tell whether a) a new operation was kicked off, b) there
is an operation that is still in progress or c) it's for an operation
that completed earlier but hasn't "expired" yet), even just commit
07fbc4b71df4 by itself prevents a duplicate notification from kicking
off a second operation that could generate an error for something that
actually succeeded.  With that in mind, eradicate error code filtering
from the Operations class.
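
To make the failure mode concrete, here is a small sketch in Python
rather than the C++ of librbd. All function names are hypothetical, and
the assumption that a vanished endpoint surfaces as -EINVAL is taken
from the description above, not from the libnbd sources.

```python
import errno


def lower_level_import(endpoint_alive: bool) -> int:
    """Hypothetical stream read: returns 0 or a negative errno."""
    # Assumption for illustration: a lost endpoint is reported as
    # -EINVAL instead of something like -ENOTCONN.
    return 0 if endpoint_alive else -errno.EINVAL


def execute_with_filter(endpoint_alive: bool) -> int:
    """Old pattern: filter the 'expected' error right before returning."""
    r = lower_level_import(endpoint_alive)
    if r == -errno.EINVAL:
        r = 0  # meant for duplicate requests, but hides real failures too
    return r


def execute_without_filter(endpoint_alive: bool) -> int:
    """New pattern: return whatever the lower layers reported."""
    return lower_level_import(endpoint_alive)
```

With the filter in place a disrupted import is indistinguishable from
a successful one, while the unfiltered variant lets the caller see the
error and refuse to advance the migration state.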

Fixes: https://tracker.ceph.com/issues/58185
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 months agoqa/tasks/nvmeof.py: Add teardown() method
Vallari Agrawal [Wed, 29 Jan 2025 15:34:04 +0000 (21:04 +0530)]
qa/tasks/nvmeof.py: Add teardown() method

Add teardown method to remove the nvmeof service
before the rest of the cluster is torn down.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agoqa/tasks/nvmeof.py: Ignore systemctl_stop thrashing method
Vallari Agrawal [Tue, 28 Jan 2025 12:43:17 +0000 (18:13 +0530)]
qa/tasks/nvmeof.py: Ignore systemctl_stop thrashing method

Do not use systemctl_stop method to thrash daemons,
just use 'ceph orch daemon stop' and 'ceph orch daemon rm'
methods to thrash nvmeof gateways.

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
5 months agoqa/tasks/nvmeof.py: Fix do_checks() method
Vallari Agrawal [Tue, 28 Jan 2025 09:18:15 +0000 (14:48 +0530)]
qa/tasks/nvmeof.py: Fix do_checks() method

All checks currently run on the initiator node. Now
run all "ceph" commands on one of the gateway hosts
instead, and run only the "nvme list" and
"nvme list-subsys" checks on the initiator node.

Add retry (5 times) to do_checks if any command fails.
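
The retry behaviour could look roughly like the sketch below. The
helper run_checks_with_retry is hypothetical and is not the code in
qa/tasks/nvmeof.py; it also assumes that "5 times" means five attempts
in total and that a failing command raises an exception.

```python
import time
from typing import Callable, Optional, Sequence


def run_checks_with_retry(
    checks: Sequence[Callable[[], None]],
    retries: int = 5,
    delay: float = 0.0,
) -> int:
    """Run all checks, retrying the whole batch if any of them raises.

    Returns the number of the attempt that succeeded.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            for check in checks:
                check()
            return attempt
        except Exception as e:
            last_error = e
            if attempt < retries:
                time.sleep(delay)  # back off before the next attempt
    raise RuntimeError(f"checks failed after {retries} attempts") from last_error
```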

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>