git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 weeks ago qa/rgw: fix perl tests missing Amazon::S3 module 64279/head
Mark Kogan [Wed, 25 Jun 2025 12:21:49 +0000 (12:21 +0000)]
qa/rgw: fix perl tests missing Amazon::S3 module

and a second case where perl tests can fail without error output

1. fix errors like: `Can't locate Amazon/S3.pm in @INC (you may need to
   install the Amazon::S3 module)`
by priming the perl tests with installing the Amazon::S3 module from cpan

ex:
```
2025-06-23T19:18:40.162 INFO:tasks.workunit.client.0.smithi090.stderr:Can't locate Amazon/S3.pm in @INC (you may need to install the Amazon::S3 module) (@INC contains: /usr/local/lib64/perl5/5.32 ...
```

2. log an error when RGW process is not detected

Fixes: https://tracker.ceph.com/issues/71577
Signed-off-by: Mark Kogan <mkogan@redhat.com>
(cherry picked from commit 7faa23f160c9f4b40d25fe27f2345dbf999b0c84)

2 weeks ago Merge pull request #64029 from VallariAg/wip-71721-tentacle
Vallari Agrawal [Mon, 30 Jun 2025 16:13:50 +0000 (21:43 +0530)]
Merge pull request #64029 from VallariAg/wip-71721-tentacle

tentacle: mon: Revert "mon: Add nvmeof group/gateway name in "ceph -s""

2 weeks ago Merge pull request #64120 from yuvalif/wip-logging-backports
Yuval Lifshitz [Mon, 30 Jun 2025 13:49:37 +0000 (16:49 +0300)]
Merge pull request #64120 from yuvalif/wip-logging-backports

tentacle: rgw: bucket logging backports

2 weeks ago Merge pull request #64116 from vshankar/wip-ignore-osd-down
Venky Shankar [Mon, 30 Jun 2025 10:33:52 +0000 (16:03 +0530)]
Merge pull request #64116 from vshankar/wip-ignore-osd-down

tentacle: qa/cephfs: ignore `OSD_DOWN/osds down` warning

Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
2 weeks ago Merge pull request #64089 from vshankar/wip-cephfs-client-fixes
Venky Shankar [Mon, 30 Jun 2025 06:50:58 +0000 (12:20 +0530)]
Merge pull request #64089 from vshankar/wip-cephfs-client-fixes

tentacle: client: cephfs user-space client fixes

Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
2 weeks ago Merge pull request #63533 from vshankar/wip-revert-referent-inodes-tentacle
Venky Shankar [Mon, 30 Jun 2025 06:50:37 +0000 (12:20 +0530)]
Merge pull request #63533 from vshankar/wip-revert-referent-inodes-tentacle

tentacle: mds: revert referent inodes

Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
2 weeks ago Merge pull request #63457 from vshankar/wip-client-secfix-tentacle
Venky Shankar [Mon, 30 Jun 2025 06:50:22 +0000 (12:20 +0530)]
Merge pull request #63457 from vshankar/wip-client-secfix-tentacle

tentacle: client: disallow unprivileged users to escalate root privileges

Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
3 weeks ago Merge pull request #64098 from idryomov/wip-71335-tentacle
Ilya Dryomov [Fri, 27 Jun 2025 14:35:57 +0000 (16:35 +0200)]
Merge pull request #64098 from idryomov/wip-71335-tentacle

tentacle: librbd/cache/pwl: fix memory leak in SyncPoint persist context cleanup

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 weeks ago Merge pull request #64099 from idryomov/wip-67984-tentacle
Ilya Dryomov [Fri, 27 Jun 2025 14:35:38 +0000 (16:35 +0200)]
Merge pull request #64099 from idryomov/wip-67984-tentacle

tentacle: librbd: retry list_snap_orders() once instead of failing sort_snaps()

Reviewed-by: VinayBhaskar-V <vvarada@redhat.com>
3 weeks ago Merge pull request #64100 from idryomov/wip-71226-tentacle
Ilya Dryomov [Fri, 27 Jun 2025 14:34:56 +0000 (16:34 +0200)]
Merge pull request #64100 from idryomov/wip-71226-tentacle

tentacle: librbd/api/Mirror: return EINVAL from image_get_mode() when the image is disabled for mirroring

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 weeks ago Merge pull request #64101 from idryomov/wip-rbd-std-variant-tentacle
Ilya Dryomov [Fri, 27 Jun 2025 14:34:30 +0000 (16:34 +0200)]
Merge pull request #64101 from idryomov/wip-rbd-std-variant-tentacle

tentacle: librbd, tools: migrate from boost::variant to std::variant

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 weeks ago Merge pull request #64102 from idryomov/wip-cls-test-default-tentacle
Ilya Dryomov [Fri, 27 Jun 2025 14:34:08 +0000 (16:34 +0200)]
Merge pull request #64102 from idryomov/wip-cls-test-default-tentacle

tentacle: cls/rbd: use default values for non-decoded fields in test instances

Reviewed-by: Ramana Raja <rraja@redhat.com>
3 weeks ago Merge pull request #64212 from ronen-fr/wip-rf-64211-tentacle
Ronen Friedman [Fri, 27 Jun 2025 12:36:58 +0000 (15:36 +0300)]
Merge pull request #64212 from ronen-fr/wip-rf-64211-tentacle

tentacle: osd/scrub: 'starts' messages should name PGs, not shards

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
3 weeks ago Merge pull request #64154 from zdover23/wip-doc-2025-06-25-backport-64107-to-tentacle
Zac Dover [Thu, 26 Jun 2025 20:18:53 +0000 (06:18 +1000)]
Merge pull request #64154 from zdover23/wip-doc-2025-06-25-backport-64107-to-tentacle

tentacle: doc/radosgw: remove "pubsub_event_triggered"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Yuval Lifshitz <ylifshit@ibm.com>
3 weeks ago doc/radosgw: remove "pubsub_event_triggered" 64154/head
Zac Dover [Mon, 23 Jun 2025 08:07:40 +0000 (18:07 +1000)]
doc/radosgw: remove "pubsub_event_triggered"

Remove "pubsub_event_triggered" from the list of "Notification
Performance Statistics". It is obsolete.

Fixes: https://tracker.ceph.com/issues/71789
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 44dc57fc15749583fc13560c9409b7687df7c223)

3 weeks ago osd/scrub: 'starts' messages should name PGs, not shards 64212/head
Ronen Friedman [Thu, 26 Jun 2025 13:27:57 +0000 (08:27 -0500)]
osd/scrub: 'starts' messages should name PGs, not shards

By mistake, the 'scrub starts' message included the shard ID
of the primary OSD, instead of just the PG ID.

Fixes: https://tracker.ceph.com/issues/71780
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit e8cde5811f07f0847a1aac20279aa83f57e4562d)

3 weeks ago Merge pull request #64183 from ronen-fr/wip-rf-64182-tentacle
Ronen Friedman [Thu, 26 Jun 2025 10:34:52 +0000 (13:34 +0300)]
Merge pull request #64183 from ronen-fr/wip-rf-64182-tentacle

tentacle: osd/scrub: some perf counters had their priority set to '0'

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
3 weeks ago Merge pull request #63615 from ceph/djg-tentacle-rtd
David Galloway [Wed, 25 Jun 2025 23:37:08 +0000 (19:37 -0400)]
Merge pull request #63615 from ceph/djg-tentacle-rtd

tentacle: .github: Fix RTD build retrigger

3 weeks ago Merge pull request #64051 from cbodley/wip-71752-tentacle
Casey Bodley [Wed, 25 Jun 2025 17:47:48 +0000 (13:47 -0400)]
Merge pull request #64051 from cbodley/wip-71752-tentacle

tentacle: fix: the RGW crash caused by special characters

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
3 weeks ago osd/scrub: some perf counters priority was '0' 64183/head
Ronen Friedman [Wed, 25 Jun 2025 14:25:08 +0000 (09:25 -0500)]
osd/scrub: some perf counters priority was '0'

Some scrub perf counters were created without specifying
individual priorities, assuming by mistake that the
default priority is '_INTERESTING'. That was not the case,
and those perf counters were not reported.

Fixes: https://tracker.ceph.com/issues/71842
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit cf1864a61061bc9de05eedd987f64307bcf7c501)
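
The effect described in this commit message can be illustrated with a toy model. This is only a sketch, not the Ceph perf counter builder: the numeric priority values, the reporting threshold, and the counter names are all assumptions made up for illustration. It shows how counters that silently inherit a default priority of 0 fall below a reporting threshold, while an explicit higher default makes them visible.

```python
# Illustrative priority levels; the numeric values are assumptions for this sketch.
PRIO_DEBUGONLY = 0
PRIO_USEFUL = 5
PRIO_INTERESTING = 8


def build(names, prio_default=PRIO_DEBUGONLY, overrides=None):
    """Create a {counter name: priority} map, applying a default priority."""
    overrides = overrides or {}
    return {name: overrides.get(name, prio_default) for name in names}


def reported(counters, threshold=PRIO_USEFUL):
    """Return the names of counters whose priority reaches the threshold."""
    return sorted(name for name, prio in counters.items() if prio >= threshold)


# Hypothetical counter names, used only for the demonstration.
names = ["scrub_started", "scrub_finished"]

print(reported(build(names)))                                  # default priority 0
print(reported(build(names, prio_default=PRIO_INTERESTING)))   # explicit default
```

With the default of 0 nothing is reported; passing an explicit default (or per-counter overrides) is what makes the counters show up in this model.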

3 weeks ago Merge pull request #64133 from afreen23/wip-71809-tentacle
afreen23 [Wed, 25 Jun 2025 14:34:16 +0000 (20:04 +0530)]
Merge pull request #64133 from afreen23/wip-71809-tentacle

tentacle: mgr/dashboard: Add --force flag for listeners

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
3 weeks ago [DNM] change bucket logging branch 64120/head
Yuval Lifshitz [Wed, 25 Jun 2025 14:09:03 +0000 (14:09 +0000)]
[DNM] change bucket logging branch

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
3 weeks ago Merge pull request #64143 from zdover23/wip-doc-2025-06-25-backport-64105-to-tentacle
Zac Dover [Wed, 25 Jun 2025 13:59:06 +0000 (23:59 +1000)]
Merge pull request #64143 from zdover23/wip-doc-2025-06-25-backport-64105-to-tentacle

tentacle: doc/radosgw: add "persistent_topic_len"

Reviewed-by: Yuval Lifshitz <ylifshit@ibm.com>
3 weeks ago Merge pull request #64131 from afreen23/wip-71805-tentacle
afreen23 [Wed, 25 Jun 2025 09:41:55 +0000 (15:11 +0530)]
Merge pull request #64131 from afreen23/wip-71805-tentacle

tentacle: mgr/dashboard: Allow host with labels in listener form

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
3 weeks ago Merge pull request #64045 from shraddhaag/wip-71743-tentacle
Shraddha Agrawal [Wed, 25 Jun 2025 08:05:15 +0000 (13:35 +0530)]
Merge pull request #64045 from shraddhaag/wip-71743-tentacle

tentacle: mon: add config option to toggle availability score feature

3 weeks ago mgr/dashboard: Allow host with labels in listener form 64131/head
Afreen Misbah [Mon, 16 Jun 2025 17:09:46 +0000 (22:39 +0530)]
mgr/dashboard: Allow host with labels in listener form

- Currently, listeners cannot be added with the Ceph Dashboard if the gateway nodes are selected by label instead of hosts.

- Refactored the code to incorporate nodes with labels

- Also added missing typings and removed 'any'

Fixes https://tracker.ceph.com/issues/71686

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 0bd2704a88f517b48196a8b1a3c07b0f8032b0f6)

3 weeks ago mgr/dashboard: Add --force flag for listeners 64133/head
Afreen Misbah [Mon, 16 Jun 2025 15:16:39 +0000 (20:46 +0530)]
mgr/dashboard: Add --force flag for listeners

Fixes https://tracker.ceph.com/issues/71685

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 604d351a4e7bbf47baf57e10f67219a2eea919e0)

3 weeks ago Merge pull request #64112 from zdover23/wip-doc-2025-06-23-backport-64103-to-tentacle
Zac Dover [Wed, 25 Jun 2025 05:19:34 +0000 (15:19 +1000)]
Merge pull request #64112 from zdover23/wip-doc-2025-06-23-backport-64103-to-tentacle

tentacle: doc/radosgw: improve "pubsub_push_pending" info

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Yuval Lifshitz <ylifshit@ibm.com>
3 weeks ago Merge pull request #64044 from Naveenaidu/wip-71741-tentacle
Yuri Weinstein [Tue, 24 Jun 2025 18:31:13 +0000 (11:31 -0700)]
Merge pull request #64044 from Naveenaidu/wip-71741-tentacle

tentacle: doc/mgr/telemetry: add doc for telemetry upgrade tests

Reviewed-by: Laura Flores <lflores@redhat.com>
3 weeks ago doc/radosgw: add "persistent_topic_len" 64143/head
Zac Dover [Mon, 23 Jun 2025 08:26:09 +0000 (18:26 +1000)]
doc/radosgw: add "persistent_topic_len"

Add the labeled counter "persistent_topic_len" to the list of
"Notification Performance Statistics" in doc/radosgw/notifications.rst.

Fixes: https://tracker.ceph.com/issues/71791
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 5743303ae5cfba97ab4c6bdc7f12e115c4a847a6)

3 weeks ago Merge pull request #64138 from zdover23/wip-doc-2025-06-25-backport-64106-to-tentacle
Zac Dover [Tue, 24 Jun 2025 17:45:50 +0000 (03:45 +1000)]
Merge pull request #64138 from zdover23/wip-doc-2025-06-25-backport-64106-to-tentacle

tentacle: doc/radosgw: add "persistent_topic_size"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago Merge pull request #63814 from zdover23/wip-doc-2025-06-09-backport-62714-to-tentacle
Zac Dover [Tue, 24 Jun 2025 17:11:25 +0000 (03:11 +1000)]
Merge pull request #63814 from zdover23/wip-doc-2025-06-09-backport-62714-to-tentacle

tentacle: doc/rados/operations: Improve stretch-mode.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago Merge pull request #64125 from zdover23/wip-doc-2025-06-24-backport-64104-to-tentacle
Zac Dover [Tue, 24 Jun 2025 16:34:22 +0000 (02:34 +1000)]
Merge pull request #64125 from zdover23/wip-doc-2025-06-24-backport-64104-to-tentacle

tentacle: doc/radosgw: remove "pubsub_event_lost"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Yuval Lifshitz <ylifshit@ibm.com>
3 weeks ago doc/radosgw: add "persistent_topic_size" 64138/head
Zac Dover [Mon, 23 Jun 2025 08:35:05 +0000 (18:35 +1000)]
doc/radosgw: add "persistent_topic_size"

Add "persistent_topic_size" to the list of "Notification Performance
Statistics" in doc/radosgw/notifications.rst.

Fixes: https://tracker.ceph.com/issues/71792
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 5f96ade1734d1ab7053b792a1df4e316e31691a5)

3 weeks ago Merge pull request #64031 from VallariAg/wip-71723-tentacle
Yuri Weinstein [Tue, 24 Jun 2025 14:09:05 +0000 (07:09 -0700)]
Merge pull request #64031 from VallariAg/wip-71723-tentacle

tentacle: monitoring: Fix NVMeoF subsys/namespace limit alerts

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
3 weeks ago Merge pull request #64043 from Naveenaidu/wip-71718-tentacle
Naveen Naidu [Tue, 24 Jun 2025 10:20:06 +0000 (15:50 +0530)]
Merge pull request #64043 from Naveenaidu/wip-71718-tentacle

tentacle: osd/PeeringState: handle race condition of DeferBackfill event for Backfilling state

3 weeks ago doc/rados/operations: Improve stretch-mode.rst 63814/head
Anthony D'Atri [Mon, 7 Apr 2025 18:37:53 +0000 (14:37 -0400)]
doc/rados/operations: Improve stretch-mode.rst

doc/rados/operations: Improve stretch-mode.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit 8c27efcf0e41e5ed14a578a271b457ed3758cbda)

3 weeks ago doc/radosgw: remove "pubsub_event_lost" 64125/head
Zac Dover [Mon, 23 Jun 2025 08:18:07 +0000 (18:18 +1000)]
doc/radosgw: remove "pubsub_event_lost"

Remove "pubsub_event_lost" from the list of "Notification Performance
Statistics" in doc/radosgw/notifications.rst. "pubsub_event_lost" is now
obsolete.

Fixes: https://tracker.ceph.com/issues/71790
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit b308f50d1e0c2d238c3b2bf54df99cc7ac2ac679)

3 weeks ago Merge pull request #64014 from cbodley/wip-71715-tentacle
Dan Mick [Mon, 23 Jun 2025 17:09:43 +0000 (10:09 -0700)]
Merge pull request #64014 from cbodley/wip-71715-tentacle

tentacle: deb: use glob match to support systemd unit dir changes

3 weeks ago rgw/logging: make unique part of log file both random and incremental
Yuval Lifshitz [Mon, 16 Jun 2025 11:05:25 +0000 (11:05 +0000)]
rgw/logging: make unique part of log file both random and incremental

The new format will be: a 10 char incremental count (so a 32-bit uint fits in it)
and a 6 char alphanumeric random part.
This should fix possible race conditions in case of multisite.

Fixes: https://tracker.ceph.com/issues/71608
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit fa2bc49e2e8fb7a1d353a4b741e8d4f1edb9807d)
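
The name format described above can be sketched as follows. This is a minimal illustration, not the RGW implementation: the function name `unique_suffix`, the decimal zero-padding, and the choice of random source are assumptions. The only facts taken from the commit message are the two field widths (10 characters for the counter, 6 alphanumeric random characters) and that a 32-bit unsigned value must fit in the counter field.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits


def unique_suffix(counter):
    """Return a 16-char suffix: 10-digit zero-padded counter + 6 random chars.

    Zero-padded decimal is an assumption; it keeps names lexicographically
    ordered by counter, and 2**32 - 1 (4294967295) is exactly 10 digits.
    """
    if not 0 <= counter <= 0xFFFFFFFF:
        raise ValueError("counter must fit in an unsigned 32-bit integer")
    random_part = "".join(secrets.choice(ALPHANUMERIC) for _ in range(6))
    return f"{counter:010d}{random_part}"


print(unique_suffix(42))  # starts with 0000000042, followed by 6 random chars
```

In this sketch the counter gives ordering and the random part makes it unlikely that two writers using the same counter value produce the same name, which is the kind of collision the multisite case is concerned with.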

3 weeks ago rgw/logging: return the last object name that was actually comitted
Yuval Lifshitz [Tue, 3 Jun 2025 10:30:46 +0000 (10:30 +0000)]
rgw/logging: return the last object name that was actually comitted

when committing a pending object that was never created we should
not reply with the object name as the name of the committed object.
instead, we should return the name of the object that was actually
committed.

Fixes: https://tracker.ceph.com/issues/71219
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit efe4473000e1a4510549c9e39ce984e8f3bb1db6)

3 weeks ago rgw/logging: send flushed object name in API reply
Yuval Lifshitz [Wed, 7 May 2025 09:34:30 +0000 (09:34 +0000)]
rgw/logging: send flushed object name in API reply

Fixes: https://tracker.ceph.com/issues/71219
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit fb2b037355b265266556a772c9c8ed5dad934475)

3 weeks ago rgw/logging: log only object ACls in journal mode
Yuval Lifshitz [Thu, 5 Jun 2025 07:48:07 +0000 (07:48 +0000)]
rgw/logging: log only object ACls in journal mode

Fixes: https://tracker.ceph.com/issues/71563
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit ce2d09efebb14c79ad1f8540d0824c477ba0b836)

3 weeks ago rgw/logging: fix partitioned key format
Yuval Lifshitz [Tue, 3 Jun 2025 15:06:44 +0000 (15:06 +0000)]
rgw/logging: fix partitioned key format

Fixes: https://tracker.ceph.com/issues/71537
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 500534382e708e055fe248ca567639fe66e58450)

3 weeks ago rgw/logginag: make unique portion of log object name orderd
Yuval Lifshitz [Tue, 13 May 2025 12:12:19 +0000 (12:12 +0000)]
rgw/logginag: make unique portion of log object name orderd

Fixes: https://tracker.ceph.com/issues/71308
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 198fa75258bc82d4b9d681275ea7a61a24df8799)

3 weeks ago Merge pull request #64028 from zdover23/wip-doc-2025-06-19-backport-63328-to-tentacle
Zac Dover [Mon, 23 Jun 2025 14:54:37 +0000 (00:54 +1000)]
Merge pull request #64028 from zdover23/wip-doc-2025-06-19-backport-63328-to-tentacle

tentacle: doc/cephadm: Fix automodule generation in certmgr.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Adam King <adking@redhat.com>
3 weeks ago Merge pull request #64117 from zdover23/wip-doc-2025-06-24-backport-64075-to-tentacle
Zac Dover [Mon, 23 Jun 2025 14:51:32 +0000 (00:51 +1000)]
Merge pull request #64117 from zdover23/wip-doc-2025-06-24-backport-64075-to-tentacle

tentacle: doc/rados/operations: Actually mention `upmap_max_deviation` setting …

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago rgw/logging: add size in MPU complete in standard mode
Yuval Lifshitz [Mon, 12 May 2025 18:05:30 +0000 (18:05 +0000)]
rgw/logging: add size in MPU complete in standard mode

Fixes: https://tracker.ceph.com/issues/71288
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit ccb6635922f260a779a9779d7def8335058e5431)

3 weeks ago Merge pull request #64094 from zdover23/wip-doc-2025-06-23-backport-64026-to-tentacle
Anthony D'Atri [Mon, 23 Jun 2025 14:14:19 +0000 (10:14 -0400)]
Merge pull request #64094 from zdover23/wip-doc-2025-06-23-backport-64026-to-tentacle

tentacle: doc/radosgw: update aws specification link

3 weeks ago rgw/logging: part upload operation name should be REST.PUT.PART
Yuval Lifshitz [Tue, 13 May 2025 16:45:27 +0000 (16:45 +0000)]
rgw/logging: part upload operation name should be REST.PUT.PART

(in standard mode)

Fixes: https://tracker.ceph.com/issues/71312
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 8dbe7117b658835a6f4e158b18ca35ac5f49c097)

3 weeks ago rgw/logging: extract tenant from bucket name on admin flush
Yuval Lifshitz [Thu, 24 Apr 2025 15:34:46 +0000 (15:34 +0000)]
rgw/logging: extract tenant from bucket name on admin flush

test instructions:
https://gist.github.com/yuvalif/adfa186fdbe9ad4c5b689753a15ec480

bug was introduced in: 790c38eacc52cc4c14beb48fca8b204235632793

Fixes: https://tracker.ceph.com/issues/71231
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 1e117ce6e29a633fd9ef776972bd610ad76f13ff)

3 weeks ago rgw/logging: support object metadata changes in journal mode
Yuval Lifshitz [Mon, 12 May 2025 15:43:52 +0000 (15:43 +0000)]
rgw/logging: support object metadata changes in journal mode

Fixes: https://tracker.ceph.com/issues/71255
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 94016c7786f29f3061a181c132abd4069337f5c9)

3 weeks ago rgw/logging: add mtime to get-bucket-logging response
Yuval Lifshitz [Tue, 20 May 2025 08:44:05 +0000 (08:44 +0000)]
rgw/logging: add mtime to get-bucket-logging response

Fixes: https://tracker.ceph.com/issues/71385
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 81e09bee540b353e26e165dea175e59934adc79d)

3 weeks ago doc/rados/operations: Actually mention `upmap_max_deviation` setting name 64117/head
Niklas Hambüchen [Sat, 21 Jun 2025 17:53:34 +0000 (19:53 +0200)]
doc/rados/operations: Actually mention `upmap_max_deviation` setting name

Signed-off-by: Niklas Hambüchen <mail@nh2.me>
(cherry picked from commit 60797187f33ab69f1947d95106f33f4af3e8af5b)

3 weeks ago qa/cephfs: ignore `OSD_DOWN/osds down` warning 64116/head
Venky Shankar [Tue, 27 May 2025 07:26:12 +0000 (07:26 +0000)]
qa/cephfs: ignore `OSD_DOWN/osds down` warning

Runs have started failing a lot with the human friendly variant
of the warning. OSD_DOWN is in the ignore list, however, the human
friendly warning (osds down) isn't.

Fixes: http://tracker.ceph.com/issues/71446
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 6bdda0bb3411773a2fea5d4ff83db6be295c5a1f)
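
Why one ignore-list entry is not enough can be shown with a small pattern-matching sketch. It assumes the ignore list is a set of regular expressions searched against each warning line. The sample log line and the helper `is_ignored` are hypothetical and stand in for the QA framework's real matching logic.

```python
import re


def is_ignored(log_line, ignorelist):
    """Return True if any ignore-list pattern matches the log line."""
    return any(re.search(pattern, log_line) for pattern in ignorelist)


# Hypothetical sample: a warning that carries only the human-friendly text.
line = "cluster [WRN] Health check failed: 1 osds down"

before = ["OSD_DOWN"]               # health-check code only
after = ["OSD_DOWN", "osds down"]   # code plus human-friendly variant

print(is_ignored(line, before))  # False: the run would be flagged
print(is_ignored(line, after))   # True: the warning is tolerated
```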

3 weeks ago doc/radosgw: improve "pubsub_push_pending" info 64112/head
Zac Dover [Mon, 23 Jun 2025 08:47:05 +0000 (18:47 +1000)]
doc/radosgw: improve "pubsub_push_pending" info

Explain in greater detail what the counter "pubsub_push_pending" counts.

Fixes: https://tracker.ceph.com/issues/71793
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 26f2c0ebbe364cd27f8f0ae1adc954ce206371b7)

3 weeks ago cls/rbd: use default values for non-decoded fields in test instances 64102/head
Kefu Chai [Sat, 14 Jun 2025 13:44:05 +0000 (21:44 +0800)]
cls/rbd: use default values for non-decoded fields in test instances

Previously, test instances for cls_rbd_snap used non-default values
for the "parent" field, which is ignored during decoding. The
check-generated.sh test passed because they reused the same instance
for re-encoding, preserving undecoded fields.

An upcoming change will allocate new instances for each encode/decode
verification instead of reusing instances. This will expose
discrepancies between original test instances and re-encoded values
when fields contain non-default values but aren't decoded.

This change sets ignored fields to their default values in test
instances, ensuring consistency between encoding and decoding
operations regardless of the verification approach used.

Since the incompatibility of cls_rbd_snap's on-disk format was
introduced in 32b14ed1, which was introduced in Ceph v14, we will
mark this version as the first incompatible version in ceph-object-corpus
in the sense that the re-encoded cls_rbd_snap with v8 struct version
is different from the original copy if its parent field is set with
< v8 struct version.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit a329b00579546ad2c069d792bde34cd44cfa0b43)
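
The difference between the two verification approaches can be reproduced with a toy type. This is a sketch under stated assumptions, not the cls_rbd_snap encoding or the check-generated.sh logic: the `Snap` class, its dict-based `encode`/`decode`, and both helper functions are invented for illustration. The only property borrowed from the commit message is a `parent` field that decoding ignores.

```python
from dataclasses import dataclass


@dataclass
class Snap:
    name: str = ""
    parent: str = ""  # written by encode(), but ignored by decode()

    def encode(self):
        return {"name": self.name, "parent": self.parent}

    def decode(self, payload):
        self.name = payload["name"]  # "parent" is intentionally not decoded


def roundtrip_reusing_instance(inst):
    """Decode into the same object: undecoded fields keep their old values."""
    original = inst.encode()
    inst.decode(original)
    return inst.encode() == original


def roundtrip_fresh_instance(inst):
    """Decode into a new object: undecoded fields fall back to defaults."""
    original = inst.encode()
    fresh = type(inst)()
    fresh.decode(original)
    return fresh.encode() == original


print(roundtrip_reusing_instance(Snap("s1", parent="p")))  # True (masked)
print(roundtrip_fresh_instance(Snap("s1", parent="p")))    # False (exposed)
print(roundtrip_fresh_instance(Snap("s1")))                # True (default parent)
```

Setting the ignored field to its default in the test instance makes both approaches agree, which is the consistency the commit is after.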

3 weeks ago librbd, tools: migrate from boost::variant to std::variant 64101/head
Kefu Chai [Wed, 7 May 2025 00:42:52 +0000 (08:42 +0800)]
librbd, tools: migrate from boost::variant to std::variant

Complete migration started in commit 017f333, replacing boost::variant with
std::variant throughout the librbd codebase. This change is part of our ongoing
effort to reduce third-party dependencies by leveraging C++ standard library
alternatives where possible.

Benefits include:
- Improved code readability and maintainability
- Reduced external dependency surface
- More consistent API usage with other components

Implementation note: Unlike Boost.variant, std::variant lacks built-in
operator<< support. This commit implements the necessary operator<< for
AttributeValue, our specific std::variant instantiation, to preserve the
existing behavior.

Also, although `apply_visit()` calls can be replaced with `visit()`
without being qualified with `std::` because of ADL, we are taking this
opportunity to add the `std::` prefix for better readability.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 786ea203a28374d8d9bf0b8cc9c3c43dcf686328)

3 weeks ago pybind/mgr/dashboard: fetch image's mirror mode 64100/head
Ramana Raja [Tue, 13 May 2025 16:37:52 +0000 (12:37 -0400)]
pybind/mgr/dashboard: fetch image's mirror mode

... only if the image is not disabled for mirroring.

If the image is disabled for mirroring, fetching the image's
mirroring mode is invalid. So validate that the image is not disabled
for mirroring before fetching the mirroring mode.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 0a705115337d11231c21209c056c6c588c1bc8eb)

3 weeks ago pybind/mgr/rbd_support: check whether mirroring is enabled
Ramana Raja [Tue, 6 May 2025 00:07:18 +0000 (20:07 -0400)]
pybind/mgr/rbd_support: check whether mirroring is enabled

... before fetching the mirroring mode of the image.

In the CreateSnapshotRequests class, which asynchronously issues mirror
snapshot creation requests, prevalidation includes checking that the
image is enabled for snapshot-based mirroring and is marked as primary.
Since mirroring mode can only be queried if mirroring is already
enabled, the code first fetches the image’s mirroring info to verify
that mirroring is enabled, and only then retrieves the mirroring mode.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit b3f1d2d183d330d80d4e539e3a038f30586a2de0)
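
The ordering of the prevalidation checks can be sketched with a stand-in image object. Everything here is hypothetical: `FakeImage`, its `mirror_info()` and `mirror_mode()` methods, and the string constants are not the rbd Python bindings; they only mimic the behavior described above, where asking for the mode of an image that is not enabled for mirroring is an error.

```python
import errno

# Hypothetical state and mode labels for this sketch.
DISABLED, ENABLED = "disabled", "enabled"
SNAPSHOT, JOURNAL = "snapshot", "journal"


class FakeImage:
    """Stand-in for an image handle; not the real rbd bindings."""

    def __init__(self, state, mode=None, primary=False):
        self.state, self.mode, self.primary = state, mode, primary

    def mirror_info(self):
        return {"state": self.state, "primary": self.primary}

    def mirror_mode(self):
        if self.state != ENABLED:
            raise OSError(errno.EINVAL, "mirroring is not enabled")
        return self.mode


def can_create_mirror_snapshot(image):
    """Check state first, and only then query the mode."""
    info = image.mirror_info()
    if info["state"] != ENABLED:
        return False  # never reaches mirror_mode(), so no EINVAL
    if image.mirror_mode() != SNAPSHOT:
        return False
    return info["primary"]


print(can_create_mirror_snapshot(FakeImage(DISABLED)))
print(can_create_mirror_snapshot(FakeImage(ENABLED, SNAPSHOT, primary=True)))
```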

3 weeks ago librbd/api/Mirror: return EINVAL from image_resync()
Ramana Raja [Tue, 6 May 2025 20:19:09 +0000 (16:19 -0400)]
librbd/api/Mirror: return EINVAL from image_resync()

... when mirroring is not enabled for the image.

Mirror::image_resync() returns ENOENT when mirroring is disabled for the
image. Instead, make it return EINVAL indicating that the call is
invalid when mirroring is not enabled for the image. This also causes
the public facing C, C++, and Python APIs that resync an image to
return EINVAL or raise an equivalent exception when mirroring is not
enabled for the image.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 4c992e6c8555ba370717c11d9d8ead1a52f97968)

3 weeks ago librbd/mirror/PromoteRequest: return EINVAL
Ramana Raja [Mon, 5 May 2025 23:37:42 +0000 (19:37 -0400)]
librbd/mirror/PromoteRequest: return EINVAL

... instead of ENOENT when mirroring is not enabled for the image.

The PromoteRequest async state machine returns ENOENT when mirroring is
not enabled for the image. Instead, make it return EINVAL similar to
DemoteRequest's behavior, which is more appropriate. This also causes
the public facing C, C++, and Python APIs that promote an image
to return EINVAL or raise an equivalent exception when mirroring is
not enabled for the image.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit eb19563c93f55757e9dabd050dfdd02233f33532)

3 weeks ago librbd/api/Mirror: return EINVAL from image_get_mode()
Ramana Raja [Mon, 5 May 2025 17:31:34 +0000 (13:31 -0400)]
librbd/api/Mirror: return EINVAL from image_get_mode()

... when the image is disabled for mirroring.

When an image is disabled for mirroring, fetching the image's
mirroring mode is invalid. So, modify the Mirror::image_get_mode()
internal API to return EINVAL instead of success when mirroring is
disabled.

The Mirror::image_get_mode() method is called by the public C++, C, and
Python APIs that fetch the mirroring mode of an image. The behavior of
these public APIs will change. They will return an error code or raise
an exception indicating that it's an invalid operation to fetch the
image's mirroring mode when mirroring is disabled.

Fixes: https://tracker.ceph.com/issues/71226
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 479014e372a994813d8820e59f69479acc7ea06b)
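
From a caller's point of view, the behavior change means an "invalid operation" error has to be expected where success was returned before. The sketch below shows one way a caller might treat that case. It is an assumption-laden illustration: `OSError` with `errno.EINVAL` stands in for whatever error code or exception type the real C, C++, or Python API surfaces, and the two stub functions are hypothetical.

```python
import errno


def get_mirror_mode_or_none(fetch_mode):
    """Return the mirroring mode, or None when the query is invalid (EINVAL)."""
    try:
        return fetch_mode()
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None  # mirroring is disabled, so there is no mode to report
        raise            # any other failure is still an error


# Hypothetical stand-ins for the call that fetches an image's mirroring mode.
def disabled_image_mode():
    raise OSError(errno.EINVAL, "mirroring is disabled for this image")


def snapshot_image_mode():
    return "snapshot"


print(get_mirror_mode_or_none(disabled_image_mode))
print(get_mirror_mode_or_none(snapshot_image_mode))
```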

3 weeks ago tools/rbd/action/MirrorPool: remove dead branch
Ramana Raja [Mon, 5 May 2025 18:29:08 +0000 (14:29 -0400)]
tools/rbd/action/MirrorPool: remove dead branch

mirror_image_get_info() API doesn't fail with ENOENT when mirroring is
disabled since commit c9c8852. So, no need to handle ENOENT error from
mirror_image_get_info() API.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 268966a364468c12b675c8bfae762931a3ea102d)

3 weeks ago librbd: retry list_snap_orders() once instead of failing sort_snaps() 64099/head
VinayBhaskar-V [Thu, 15 May 2025 14:18:30 +0000 (19:48 +0530)]
librbd: retry list_snap_orders() once instead of failing sort_snaps()

If snapshot listing races with snapshot creation, rbd group snap ls
may spuriously return an unsorted listing and internal fail_if_not_sorted=true
consumers may generate a spurious EINVAL error. This is because snap orders
are listed before snaps themselves and the "missing order for snap" check in
sort_snaps() is driven by the list of snaps.
This can be improved by grabbing snap order keys one more time, adding the newly
discovered snap orders to m_snap_orders and retrying instead of immediately failing the sort.

Fixes: https://tracker.ceph.com/issues/67984
Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit 3c21aec85e986f5206e83fcdcf5b1e0441b35f56)
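
The retry-once idea can be sketched in a few lines. This is a simplified model, not the librbd state machine: snapshots are plain string IDs, `list_snap_orders` is a caller-supplied function returning an ID-to-order mapping, and `OSError` with `EINVAL` stands in for the internal error. It only illustrates refetching the order keys one more time before giving up on the sort.

```python
import errno


def sort_snaps(snaps, list_snap_orders):
    """Sort snap IDs by their order; refetch the orders once if one is missing."""
    orders = dict(list_snap_orders())
    if any(snap not in orders for snap in snaps):
        # A snapshot may have been created after the orders were listed:
        # merge in newly discovered orders and retry instead of failing.
        orders.update(list_snap_orders())
    missing = [snap for snap in snaps if snap not in orders]
    if missing:
        raise OSError(errno.EINVAL, f"missing order for snaps {missing}")
    return sorted(snaps, key=orders.__getitem__)


# Simulated race: the first listing is stale, the second one is complete.
responses = [{"a": 2}, {"a": 2, "b": 1}]
print(sort_snaps(["a", "b"], lambda: responses.pop(0)))
```

A single retry keeps the cost bounded; if the second listing is still incomplete, the sketch fails exactly as it would have without the retry.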

3 weeks ago doc/radosgw: update aws specification link 64094/head
Zac Dover [Thu, 19 Jun 2025 06:24:24 +0000 (16:24 +1000)]
doc/radosgw: update aws specification link

Update the link to the AWS specification format.

Fixes: https://tracker.ceph.com/issues/68619
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit cca1a17d07fd31ccf13acf846ad13e6fad85d5f2)

3 weeks ago librbd/cache/pwl: fix memory leak in SyncPoint persist context cleanup 64098/head
Kefu Chai [Tue, 3 Jun 2025 08:07:33 +0000 (16:07 +0800)]
librbd/cache/pwl: fix memory leak in SyncPoint persist context cleanup

Previously, SyncPoint allocated two C_Gather instances tracked by raw
pointers but failed to properly clean them up when only a single sync
point existed, causing memory leaks detected by AddressSanitizer.

This change fixes the leak by modifying AbstractWriteLog::shut_down()
to check for prior sync points in the chain. When the current sync point
is the only one present, we now activate the m_prior_log_entries_persisted
context to ensure:

- The onfinish callback executes and releases the captured strong
  reference to the enclosing SyncPoint
- The parent m_sync_point_persist context completes and gets properly
  released

This ensures all allocated contexts are cleaned up correctly during
shutdown, eliminating the memory leak.

The ASan report:

```
Indirect leak of 2064 byte(s) in 1 object(s) allocated from:
    #0 0x56440919ae2d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_librbd+0x2f3de2d) (BuildId: 6a04677c6ee5235f1a41815df807f97c5b96d4cd)
    #1 0x56440bd67751 in __gnu_cxx::new_allocator<Context*>::allocate(unsigned long, void const*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:127:27
    #2 0x56440bd676e0 in std::allocator<Context*>::allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/allocator.h:185:32
    #3 0x56440bd676e0 in std::allocator_traits<std::allocator<Context*>>::allocate(std::allocator<Context*>&, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:464:20
    #4 0x56440bd6730b in std::_Vector_base<Context*, std::allocator<Context*>>::_M_allocate(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:346:20
    #5 0x7fd33e00e8d1 in std::vector<Context*, std::allocator<Context*>>::reserve(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:78:22
    #6 0x7fd33e00c51c in librbd::cache::pwl::SyncPoint::SyncPoint(unsigned long, ceph::common::CephContext*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/SyncPoint.cc:20:27
    #7 0x56440bd65f26 in decltype(::new((void*)(0)) librbd::cache::pwl::SyncPoint(std::declval<unsigned long&>(), std::declval<ceph::common::CephContext*&>())) std::construct_at<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:97:39
    #8 0x56440bd65b98 in void std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPoint>>::construct<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>&, librbd::cache::pwl::SyncPoint*, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:518:4
    #9 0x56440bd657d3 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:519:4
    #10 0x56440bd65371 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(librbd::cache::pwl::SyncPoint*&, std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:651:6
    #11 0x56440bd65163 in std::__shared_ptr<librbd::cache::pwl::SyncPoint, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1342:14
    #12 0x56440bd650e6 in std::shared_ptr<librbd::cache::pwl::SyncPoint>::shared_ptr<std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::_Sp_alloc_shared_tag<std::allocator<librbd::cache::pwl::SyncPoint>>, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:409:4
    #13 0x56440bd65057 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::allocate_shared<librbd::cache::pwl::SyncPoint, std::allocator<librbd::cache::pwl::SyncPoint>, unsigned long&, ceph::common::CephContext*&>(std::allocator<librbd::cache::pwl::SyncPoint> const&, unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:862:14
    #14 0x56440bca97e7 in std::shared_ptr<librbd::cache::pwl::SyncPoint> std::make_shared<librbd::cache::pwl::SyncPoint, unsigned long&, ceph::common::CephContext*&>(unsigned long&, ceph::common::CephContext*&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:878:14
    #15 0x56440bd443c8 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::new_sync_point(librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1905:20
    #16 0x56440bd42e4c in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1951:3
    #17 0x56440bd9cbf2 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_new_sync_point_if_needed(librbd::cache::pwl::C_FlushRequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::DeferredContexts&) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1990:5
    #18 0x56440bd9c636 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&)::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2152:9
    #19 0x56440bd9b9b4 in boost::detail::function::void_function_obj_invoker<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*)::'lambda'(librbd::cache::pwl::GuardedRequestFunctionContext&), void, librbd::cache::pwl::GuardedRequestFunctionContext&>::invoke(boost::detail::function::function_buffer&, librbd::cache::pwl::GuardedRequestFunctionContext&) /opt/ceph/include/boost/function/function_template.hpp:100:11
    #20 0x56440bd29321 in boost::function_n<void, librbd::cache::pwl::GuardedRequestFunctionContext&>::operator()(librbd::cache::pwl::GuardedRequestFunctionContext&) const /opt/ceph/include/boost/function/function_template.hpp:789:14
    #21 0x56440bd28d85 in librbd::cache::pwl::GuardedRequestFunctionContext::finish(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/Request.h:335:5
    #22 0x5644091e0fe0 in Context::complete(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/include/Context.h:102:5
    #23 0x56440bd9b378 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::detain_guarded_request(librbd::cache::pwl::C_BlockIORequest<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>>*, librbd::cache::pwl::GuardedRequestFunctionContext*, bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:1202:20
    #24 0x56440bd96c50 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::internal_flush(bool, Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:2154:3
    #25 0x56440bd1e4b5 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/cache/pwl/AbstractWriteLog.cc:703:3
    #26 0x56440bdb9022 in librbd::cache::pwl::TestMockCacheSSDWriteLog_compare_and_write_compare_matched_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:403:7
```

Fixes: https://tracker.ceph.com/issues/71335
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 05fd6f90e6e528c628af7fbb106b73e89d57464c)

3 weeks agotest: multi client file read/write test for extending writes 64089/head
Venky Shankar [Tue, 20 May 2025 12:20:39 +0000 (12:20 +0000)]
test: multi client file read/write test for extending writes

Credit to @anoopcs9 for the reproducer.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 624cb4ce5931faa7a0ecb6673312ff7f9624050c)

 Conflicts:
src/test/libcephfs/test.cc

Test `InodeGetPut` is not in tentacle branch.

3 weeks agoclient: do not check file size when inode does not have Fc caps
Venky Shankar [Tue, 20 May 2025 12:19:41 +0000 (12:19 +0000)]
client: do not check file size when inode does not have Fc caps

Since the client is holding Fr caps, the read request can be
directly sent to the OSD. The offset/in->size comparison check
is causing the read request to return with no data since in->size
isn't yet updated when another client does an extending write.
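
The effect of the check can be sketched in Python as below. This is an
illustrative model only: `Inode`, the capability constants, and
`may_return_empty_locally` are hypothetical names and do not mirror the
actual client code.

```python
from dataclasses import dataclass

# Hypothetical capability bits; the values are illustrative only.
CAP_FILE_CACHE = 0x1  # stands in for "Fc"
CAP_FILE_RD = 0x2     # stands in for "Fr"


@dataclass
class Inode:
    size: int  # locally known size, may lag behind another client's write
    caps: int  # capabilities currently held by this client


def may_return_empty_locally(inode: Inode, offset: int) -> bool:
    """Return True if a read at `offset` may be answered with no data
    based on the locally known size alone."""
    if not (inode.caps & CAP_FILE_CACHE):
        # Without Fc the local size may be stale: skip the size check
        # and let the read go to the OSD.
        return False
    return offset >= inode.size
```

With only Fr held, a read past the locally known size is still sent out
rather than returning empty, which is the behaviour the commit message
describes.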

Introduced-by: 942474c2f5b4c696364f3b7411ae7d96444edfa8
Fixes: http://tracker.ceph.com/issues/70726
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 2b74598afc52d1a6cb98ef6b524ec162360cf040)

3 weeks agoclient: asynchronous fsync can decrement request ref twice
Venky Shankar [Fri, 30 May 2025 18:11:19 +0000 (18:11 +0000)]
client: asynchronous fsync can decrement request ref twice

After the asynchronous execution context is woken up while waiting
for the Fb caps reference to be released, the request reference can
be decremented a second time, causing the client to crash as per:

```
0x00007f3115b2452c in __pthread_kill_implementation () from /lib64/libc.so.6
0x00007f3115ad7686 in raise () from /lib64/libc.so.6
0x00007f3115ac1833 in abort () from /lib64/libc.so.6
0x00007f3113375d0a in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:74
0x00007f3113375e6f in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:79
0x00007f311237db1d in xlist<MetaRequest*>::item::~item (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/xlist.h:31
MetaRequest::~MetaRequest (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/MetaRequest.cc:65
Client::put_request (this=0x564b491726c0, request=0x7f301c0165c0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:2140
0x00007f31123c88ad in Client::C_nonblocking_fsync_state::advance (this=0x7f307002e9f0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11905
0x00007f3112331ccd in Context::complete (this=0x7f3070009250, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f311246a964 in Client::signal_context_list(std::__cxx11::list<Context*, std::allocator<Context*> >&) [clone .constprop.0] (ls=std::__cxx11::list = {...}, this=<optimized out>)
    at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:4257
0x00007f3112395f45 in Client::put_cap_ref (this=0x564b491726c0, in=0x7f306807be90, cap=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:3611
0x00007f31123331f3 in Client::C_Write_Finisher::finish_io (r=0, this=0x7f30240442d0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11381
Client::CWF_iofinish::finish (this=<optimized out>, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.h:1481
0x00007f3112331ccd in Context::complete (this=0x7f302401afd0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31123c5242 in Client::C_Lock_Client_Finisher::finish (this=0x7f302403c9d0, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11372
0x00007f3112331ccd in Context::complete (this=0x7f302403c9d0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31134374ad in Finisher::finisher_thread_entry (this=0x564b491730b0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/Finisher.cc:72
0x00007f3115b227e2 in start_thread () from /lib64/libc.so.6
0x00007f3115ba7800 in clone3 () from /lib64/libc.so.6
0x0000000000000000 in ?? ()
```

Fixes: http://tracker.ceph.com/issues/71510
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit ad5a42c75cacfe7cd28d053455e9612fc96d4191)

3 weeks agoclient: fix memory leak in Client::CRF_iofinish::complete
Shachar Sharon [Tue, 22 Oct 2024 12:06:54 +0000 (15:06 +0300)]
client: fix memory leak in Client::CRF_iofinish::complete

Commit 1210ddf7a ("Client: Add non-blocking helper classes") introduced
the Client::C_Read_Finisher Context object for async READ operations, but
it has a use-after-free bug which may cause a memory leak when calling
libcephfs's non-blocking ceph_ll_nonblocking_readv_writev API with async
READ:

ceph_ll_nonblocking_readv_writev (READ)
  Client::ll_preadv_pwritev
  ...
    Client::_read_async
      Context::complete
        Client::CRF_iofinish::complete
          Client::CRF_iofinish::finish
          CRF->finish_io()
            Client::C_Read_Finisher::finish_io
            ...
            delete this; // frees CRF_iofinish->CRF
          if (CRF->iofinished) // use-after-free of CRF
            delete this; // may not get here

Whether the leak occurs depends on timing: if an allocation in another
thread reuses the freed memory and changes the value read as
CRF->iofinished to false, the last delete operation is skipped.

The check of `if (CRF->iofinished)` is unnecessary: it is always set to
true upon calling CRF->finish_io(). Thus, there is no need to have the
override function Client::CRF_iofinish::complete() as it now has the
same logic as Context::complete(). Removed.

Signed-off-by: Shachar Sharon <ssharon@redhat.com>
(cherry picked from commit 6dc77563d4dac8c7e2f41dae445acba7694fa192)

4 weeks agoMerge pull request #64066 from ronen-fr/wip-rf-64048-tentacle
Ronen Friedman [Sat, 21 Jun 2025 07:41:49 +0000 (10:41 +0300)]
Merge pull request #64066 from ronen-fr/wip-rf-64048-tentacle

tentacle: osd/scrub: clarify that osd_scrub_auto_repair_num_errors counts objects

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks agoMerge pull request #63747 from aclamk/aclamk-fix-70911-envelope-dirty-recover-tentacle
Jaya Prakash [Fri, 20 Jun 2025 14:13:05 +0000 (19:43 +0530)]
Merge pull request #63747 from aclamk/aclamk-fix-70911-envelope-dirty-recover-tentacle

tentacle: os/bluestore: Fix bluefs_fnode_t::seek

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
4 weeks agoosd/scrub: clarify that osd_scrub_auto_repair_num_errors counts objects 64066/head
Ronen Friedman [Thu, 19 Jun 2025 15:27:38 +0000 (10:27 -0500)]
osd/scrub: clarify that osd_scrub_auto_repair_num_errors counts objects

'osd_scrub_auto_repair_num_errors' limits the number of damaged objects
that we will try to auto-repair during a scrub. Its documentation
referred to "number of errors", which did not fit the implementation.
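
The difference between counting errors and counting damaged objects can
be sketched in Python as follows. The `(object, error)` tuples, the
function names, and the inclusive `<=` boundary are assumptions made for
illustration; they are not taken from the OSD code.

```python
def count_damaged_objects(errors):
    """Count distinct objects in a list of (object_name, error_kind) pairs."""
    return len({obj for obj, _kind in errors})


def within_auto_repair_limit(errors, max_damaged_objects):
    """Compare the number of damaged objects, not the number of errors,
    against the limit (boundary assumed to be inclusive)."""
    return count_damaged_objects(errors) <= max_damaged_objects


# One object can carry several errors but still counts once.
scrub_errors = [
    ("obj_a", "size_mismatch"),
    ("obj_a", "digest_mismatch"),
    ("obj_b", "missing"),
]
```

Here three errors affect two objects, so a limit of 2 is not exceeded
even though a naive "number of errors" reading would suggest otherwise.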

Fixes: https://tracker.ceph.com/issues/71754
Fixes: Red Hat BZ2316244
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit 680b58ffd0bf5b213ec525f8d783297fb0b14343)

4 weeks agoMerge pull request #64057 from zdover23/wip-doc-2025-06-20-backport-63163-to-tentacle
Anthony D'Atri [Fri, 20 Jun 2025 13:25:53 +0000 (09:25 -0400)]
Merge pull request #64057 from zdover23/wip-doc-2025-06-20-backport-63163-to-tentacle

tentacle: doc/radosgw: Cosmetic improvements in dynamicresharding.rst

4 weeks agodoc/radosgw: Cosmetic improvements in dynamicresharding.rst 64057/head
Ville Ojamo [Wed, 7 May 2025 09:48:24 +0000 (16:48 +0700)]
doc/radosgw: Cosmetic improvements in dynamicresharding.rst

Make reference to config section a hyperlink.

Capitalization consistency: use title case in section titles, fix two
invalid capitalizations in text.

Promptify CLI example commands.

A JSON key-value pair is a "property" and not an "object".

Use an ordered list instead of inline code with hardcoded list numbers.

Use the American "canceled" (majority of occurrences in doc/) instead of
"cancelled".

Use admonitions instead of spelling out "Note:".
Clarify language on sharding cleanup for multisite.

Format JSON keys as inline code.

Indent example JSON output from radosgw-admin correctly (same as real
output) with 4 spaces.

Use colon instead of full stop at the end of text that describes the
following example command. Move admonition to after such example
command.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit cbb9ab7716ae98ab80e485a6a4e3149e49be88aa)

4 weeks agofix: typo remove whitespace 64051/head
mertsunacoglu [Tue, 10 Jun 2025 12:11:59 +0000 (14:11 +0200)]
fix: typo remove whitespace

Signed-off-by: mertsunacoglu <emin.sunacoglu@clyso.com>
(cherry picked from commit 343db61578adafc25716df8e071c4282ed084fbf)

4 weeks agofix: Revert url_decode to old behaviour
Emin [Tue, 10 Jun 2025 09:03:21 +0000 (11:03 +0200)]
fix: Revert url_decode to old behaviour

Signed-off-by: Emin <emin.sunacoglu@clyso.com>
(cherry picked from commit c603ce719aca906d75af60e7d31bf13db09d8ec6)

4 weeks agofix: remove double url_decode from the copy_source and fix url_decode
Emin [Mon, 26 May 2025 14:11:19 +0000 (16:11 +0200)]
fix: remove double url_decode from the copy_source and fix url_decode

Signed-off-by: Emin <emin.sunacoglu@clyso.com>
(cherry picked from commit 1510987b8606d8906ba53d4f343a788209707bcf)

4 weeks agofix:Add empty string check after url_decode
Emin [Wed, 21 May 2025 12:53:45 +0000 (14:53 +0200)]
fix:Add empty string check after url_decode

Signed-off-by: Emin <emin.sunacoglu@clyso.com>
(cherry picked from commit c43ea6253d01c538ea08b371b159a7360c2042cf)

4 weeks agoqa/standalone/mon/availability.sh: add test for config option 64045/head
Shraddha Agrawal [Thu, 29 May 2025 10:10:01 +0000 (15:40 +0530)]
qa/standalone/mon/availability.sh: add test for config option

This commit adds two tests. The first ensures that we get an error
message when the feature is disabled, which checks that the config
option enable_availability_tracking works properly. The second
ensures that we actually stop calculating the score when the
feature is disabled.

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit dc9ee94a8dc787324b898822fdaebfb83dfc7e37)

4 weeks agodoc: add docs and update release notes for the new config option
Shraddha Agrawal [Thu, 29 May 2025 08:05:40 +0000 (13:35 +0530)]
doc: add docs and update release notes for the new config option

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit 80c7492a808609e0f4933e5d2e5ee24af0b4e2d8)

4 weeks agomon/MgrStatMonitor: ignore duration for which feature is off
Shraddha Agrawal [Thu, 22 May 2025 10:26:41 +0000 (15:56 +0530)]
mon/MgrStatMonitor: ignore duration for which feature is off

When the availability tracking feature is disabled, we should not
be updating the score. We should start recalculating the score
when the user enables the feature again. Essentially, for the
purpose of calculating the score, we need to ignore the duration
for which the feature was turned off.

The score is calculated from the uptime and downtime durations
recorded in `pool_availability` object. These durations are updated
in `calc_pool_availability` by adding the diff between last_uptime/
last_downtime and now.

To discard the duration for which the feature was turned off, we
need to offset the uptime/downtime by this duration. A simple way
to do this is to update the last_uptime and last_downtime to the
timestamp when the feature is toggled on again. To implement
this, we record the time at which the feature is toggled from off
to on. When `calc_pool_availability` is invoked, if a reset is
required, it resets last_uptime and last_downtime before proceeding
with availability calculations.

We only care about the state when the feature is toggled from off to
on. All other toggle states for the config option will not have any
effect on the score.
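
The reset described above can be sketched in Python as follows. This is
a simplified model, not the monitor code: the `Tracker` class, its
method signatures, and the plain numeric timestamps are assumptions;
only the names `calc_pool_availability`, `last_uptime` and
`last_downtime` come from the text.

```python
class Tracker:
    def __init__(self, now):
        self.enabled = True
        self.reset_at = None       # set when the feature goes off -> on
        self.uptime = 0.0
        self.downtime = 0.0
        self.last_uptime = now
        self.last_downtime = now

    def set_enabled(self, enabled, now):
        # Only the off -> on transition needs to be remembered.
        if enabled and not self.enabled:
            self.reset_at = now
        self.enabled = enabled

    def calc_pool_availability(self, now, is_available):
        if not self.enabled:
            return  # score is frozen while the feature is off
        if self.reset_at is not None:
            # Discard the disabled period by moving both timestamps
            # forward to the moment the feature was re-enabled.
            self.last_uptime = self.reset_at
            self.last_downtime = self.reset_at
            self.reset_at = None
        if is_available:
            self.uptime += now - self.last_uptime
        else:
            self.downtime += now - self.last_downtime
        self.last_uptime = now
        self.last_downtime = now
```

For example, if the feature is on from t=0 to t=10, off until t=60, and
the next calculation runs at t=70, the sketch accumulates 20 units of
uptime rather than 70.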

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit d81d2af8fcd708f20f54f863dd613fade57af6e5)

4 weeks agoMgrStatMonitor: add config observer
Shraddha Agrawal [Thu, 22 May 2025 09:16:50 +0000 (14:46 +0530)]
MgrStatMonitor: add config observer

This commit adds a config observer to MgrStatMonitor so we
can track when a user enables/disables enable_availability_tracking
config option. The time difference between disabling and then
enabling the config option will be used to offset the uptime
and/or downtime from the availability score feature.

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit c318f80ee2eeefbba9865f07026e18c313cba558)

4 weeks agomon/MgrStatMonitor.cc: do not update score when disabled
Shraddha Agrawal [Thu, 22 May 2025 08:20:57 +0000 (13:50 +0530)]
mon/MgrStatMonitor.cc: do not update score when disabled

This commit adds changes to ensure the availability score
tracking is not updated when the feature is disabled. We
will preserve the score calculated before the feature is
turned off and start updating it again when the feature
is enabled.

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit 017c9b9f4fa0d3286d52b7acb7df912327b1f836)

4 weeks agosrc/common/options: add config option for availability score
Shraddha Agrawal [Tue, 6 May 2025 06:20:59 +0000 (11:50 +0530)]
src/common/options: add config option for availability score

This commit modifies src/common/options/mon.yaml.in to add a
new config option to enable/disable tracking availability
score. This config option can be modified dynamically at
runtime as well.

To enable tracking availability score, we can run the
following command:

  ceph config set mon enable_availability_tracking true

By default, tracking availability score is enabled.

To disable tracking availability score:

  ceph config set mon enable_availability_tracking false

When the feature is turned off, invoking the
`availability-status` command will display an error, prompting
the user to turn on the feature using the config option.

Fixes: https://tracker.ceph.com/issues/71494
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit 9ccef704550148b63f973ce69ab2147f7a162ba4)

4 weeks agodoc/mgr/telemetry: add doc for telemetry upgrade tests 64044/head
Naveen Naidu [Mon, 27 Jan 2025 14:29:38 +0000 (19:59 +0530)]
doc/mgr/telemetry: add doc for telemetry upgrade tests

Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit f11034b2a2203b0f9b1cf6da24ed45269c0f878d)

4 weeks agoosd/PeeringState: handle race condition of DeferBackfill event for Backfilling state 64043/head
Naveen Naidu [Thu, 29 May 2025 08:58:32 +0000 (14:28 +0530)]
osd/PeeringState: handle race condition of DeferBackfill event for Backfilling state

Currently, when a PG in the `Backfilling` state receives a
`DeferBackfill` event, that event can race with
`MOSDPGBackfill::OP_BACKFILL_FINISH` because the PG has already
finished backfilling. In such a case, the following
happens:
  1. PG state set to `PG_STATE_BACKFILL_WAIT`
  2. Suspend backfilling
  3. Discard the event

Notice that we do not reschedule backfill in the above steps. This can
lead to a situation where the PG gets stuck in a `backfill_wait` state
forever. This bug was introduced by the following commit:

`865839f`: osd/PeeringState: check racing with OP_BACKFILL_FINISH when defering
backfill
Link: https://github.com/ceph/ceph/pull/60185

This commit fixes that by making sure that in race conditions such as
the above we only discard the event.
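
The intended handling can be sketched in Python as below. This is a toy
model under stated assumptions: the `PG` class, the `retry_scheduled`
flag, and the steps taken in the non-racing branch are hypothetical and
do not reproduce the PeeringState state machine.

```python
from dataclasses import dataclass


@dataclass
class PG:
    state: str = "backfilling"
    retry_scheduled: bool = False


def on_defer_backfill(pg: PG, backfill_already_finished: bool) -> None:
    """Handle a DeferBackfill event received while backfilling."""
    if backfill_already_finished:
        # Race with OP_BACKFILL_FINISH: discard the event and leave the
        # PG state alone, so it cannot get stuck in backfill_wait.
        return
    # Assumed normal path: wait, and make sure backfill is retried later.
    pg.state = "backfill_wait"
    pg.retry_scheduled = True
```

The point of the sketch is the early return: in the racing case nothing
is changed, so no state is entered that would need a reschedule to leave.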

Fixes: https://tracker.ceph.com/issues/71010
Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit b2bd15b4485f367c3f599a3d233d6e506b3285d1)

4 weeks agoMerge pull request #64020 from zdover23/wip-doc-2025-06-19-backport-63983-to-tentacle
Anthony D'Atri [Thu, 19 Jun 2025 11:09:20 +0000 (07:09 -0400)]
Merge pull request #64020 from zdover23/wip-doc-2025-06-19-backport-63983-to-tentacle

tentacle: doc/radosgw/admin.rst: explain bucket and uid flags for bucket quota

4 weeks agoMerge pull request #64023 from zdover23/wip-doc-2025-06-19-backport-63907-to-tentacle
Anthony D'Atri [Thu, 19 Jun 2025 11:07:21 +0000 (07:07 -0400)]
Merge pull request #64023 from zdover23/wip-doc-2025-06-19-backport-63907-to-tentacle

tentacle: doc/radosgw: edit cloud-transition (1 of x)

4 weeks agoMerge pull request #63799 from ronen-fr/wip-rf-63758-tentacle
Ronen Friedman [Thu, 19 Jun 2025 10:58:10 +0000 (13:58 +0300)]
Merge pull request #63799 from ronen-fr/wip-rf-63758-tentacle

tentacle: osd/scrub: move m_session_started_at out of Session ctor

Reviewed-by: Samuel Just <sjust@redhat.com>
4 weeks agoMerge pull request #64032 from zdover23/wip-doc-2025-06-19-backport-60440-to-tentacle
Anthony D'Atri [Thu, 19 Jun 2025 10:50:18 +0000 (06:50 -0400)]
Merge pull request #64032 from zdover23/wip-doc-2025-06-19-backport-60440-to-tentacle

tentacle: doc: mgr/dashboard: add OAuth2 SSO documentation

4 weeks agodoc: mgr/dashboard: add OAuth2 SSO documentation 64032/head
Pedro Gonzalez Gomez [Tue, 22 Oct 2024 19:11:56 +0000 (21:11 +0200)]
doc: mgr/dashboard: add OAuth2 SSO documentation

Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
(cherry picked from commit 3e63860433a53d7d92d593beb3a4a02643b6ea98)

doc: mgr/dashboard: add --enable-auth flag

Add an instruction that includes the --enable-auth flag in a "ceph orch
apply mgmt-gateway" command, in accordance with a request made by
afreen23 here: https://github.com/ceph/ceph/pull/60440#discussion_r1953530599

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 30dc60b81493537daf0805faf50b47460c2f80d1)

4 weeks agomonitoring: Fix NVMeoF subsys/namespace limit alerts 64031/head
Vallari Agrawal [Thu, 24 Apr 2025 12:08:12 +0000 (17:38 +0530)]
monitoring: Fix NVMeoF subsys/namespace limit alerts

Change NVMeoFTooManyNamespaces and NVMeoFTooManySubsystems
alert to trigger for ">= $limit" instead of "> $limit".

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
(cherry picked from commit 73dadbd269bebe1529a7c247725c5b6aabb1a093)

4 weeks agoRevert PR "mon: Add nvmeof group/gateway name in "ceph -s"" 64029/head
Vallari Agrawal [Wed, 21 May 2025 12:20:32 +0000 (17:50 +0530)]
Revert PR "mon: Add nvmeof group/gateway name in "ceph -s""

Revert "mon: show count of active/total nvmeof gws in "ceph -s""
This reverts commit 3065ffeb01428dd319bdcd4f1c16c3f92a32c723.

Revert "mon: Add nvmeof group/gateway name in  "ceph -s""
This reverts commit e3fab2a50f1a1d8444dbf34a6df22733f4f1be17.

Fixes: https://tracker.ceph.com/issues/71435
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
(cherry picked from commit 4cfc6f105cba6e5366b060fe8afbe5b6cfa623fe)

4 weeks agodoc/cephadm: Fix automodule generation in certmgr.rst 64028/head
Ville Ojamo [Sun, 18 May 2025 04:28:48 +0000 (11:28 +0700)]
doc/cephadm: Fix automodule generation in certmgr.rst

The wrong path was passed to automodule, so no documentation was
generated. Use the right file name.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit f36746c2a472ef6876279a4d0d13b449327b95a1)

4 weeks agoMerge pull request #63679 from zdover23/wip-doc-2025-06-04-backport-63623-to-tentacle
Zac Dover [Thu, 19 Jun 2025 05:17:17 +0000 (15:17 +1000)]
Merge pull request #63679 from zdover23/wip-doc-2025-06-04-backport-63623-to-tentacle

tentacle: doc/mgr: edit iostat.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks agoMerge pull request #63763 from zdover23/wip-doc-2025-06-06-backport-63085-to-tentacle
Zac Dover [Thu, 19 Jun 2025 05:03:07 +0000 (15:03 +1000)]
Merge pull request #63763 from zdover23/wip-doc-2025-06-06-backport-63085-to-tentacle

tentacle: doc/src/common/options: mgr.yaml.in edit

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks agodoc/radosgw: edit cloud-transition (1 of x) 64023/head
Zac Dover [Thu, 12 Jun 2025 11:28:57 +0000 (21:28 +1000)]
doc/radosgw: edit cloud-transition (1 of x)

Edit the first hundred lines of doc/radosgw/cloud-transition.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 9ad5a65fe6cf883cf34bceae0314f55bcf599c96)

4 weeks agoMerge pull request #63954 from zdover23/wip-doc-2025-06-16-backport-63821-to-tentacle
Zac Dover [Thu, 19 Jun 2025 04:52:29 +0000 (14:52 +1000)]
Merge pull request #63954 from zdover23/wip-doc-2025-06-16-backport-63821-to-tentacle

tentacle: doc/src: edit osd.yaml.in (osd_deep_scrub_interval_cv)

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>