5 days agoMerge pull request #65679 from Hezko/wip-73256-tentacle
afreen23 [Mon, 29 Sep 2025 18:48:36 +0000 (00:18 +0530)]
Merge pull request #65679 from Hezko/wip-73256-tentacle

tentacle: mgr/dashboard: fix None force param handling in ns add_host so it won't raise exceptions

Reviewed-by: Nizamudeen A <nia@redhat.com>
5 days agoMerge pull request #65677 from Hezko/wip-73258-tentacle
afreen23 [Mon, 29 Sep 2025 18:48:05 +0000 (00:18 +0530)]
Merge pull request #65677 from Hezko/wip-73258-tentacle

tentacle: mgr/dashboard: add nsid param to ns add command

Reviewed-by: Nizamudeen A <nia@redhat.com>
5 days agoMerge pull request #65678 from Hezko/wip-73257-tentacle
Pedro Gonzalez Gomez [Mon, 29 Sep 2025 11:54:58 +0000 (13:54 +0200)]
Merge pull request #65678 from Hezko/wip-73257-tentacle

tentacle: mgr/dashboard: --no-group-append default value to False, aligned with old CLI

Reviewed-by: nizamial09 <nia@redhat.com>
5 days agoMerge pull request #65705 from rhcs-dashboard/wip-73275-tentacle
Pedro Gonzalez Gomez [Mon, 29 Sep 2025 11:50:54 +0000 (13:50 +0200)]
Merge pull request #65705 from rhcs-dashboard/wip-73275-tentacle

tentacle: mgr/dashboard: Blank entry for Storage Capacity in dashboard under Cluster > Expand Cluster > Review

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
5 days agoMerge pull request #65692 from aaSharma14/wip-73273-tentacle
Pedro Gonzalez Gomez [Mon, 29 Sep 2025 11:46:16 +0000 (13:46 +0200)]
Merge pull request #65692 from aaSharma14/wip-73273-tentacle

tentacle: ceph-mixin: Update monitoring mixin

Reviewed-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
5 days agoMerge pull request #65708 from aaSharma14/wip-73292-tentacle
Pedro Gonzalez Gomez [Mon, 29 Sep 2025 11:36:33 +0000 (13:36 +0200)]
Merge pull request #65708 from aaSharma14/wip-73292-tentacle

tentacle: monitoring: fix MTU Mismatch alert rule and expr

Reviewed-by: nizamial09 <nia@redhat.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
5 days agoMerge pull request #65672 from rhcs-dashboard/wip-73251-tentacle
Pedro Gonzalez Gomez [Mon, 29 Sep 2025 11:29:47 +0000 (13:29 +0200)]
Merge pull request #65672 from rhcs-dashboard/wip-73251-tentacle

tentacle: mgr/dashboard: fix data mismatch in Advanced section in Tiering.

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
5 days agomonitoring: fix MTU Mismatch alert rule and expr
Aashish Sharma [Wed, 2 Jul 2025 11:05:14 +0000 (16:35 +0530)]
monitoring: fix MTU Mismatch alert rule and expr

Fixes: https://tracker.ceph.com/issues/73290
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit bee24dec441b9e6b263e4498c2ab333b0a60a52d)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/prometheus/active-alert-list/active-alert-list.component.ts

5 days agoMerge pull request #65583 from VallariAg/wip-73109-tentacle
Vallari Agrawal [Mon, 29 Sep 2025 07:13:35 +0000 (12:43 +0530)]
Merge pull request #65583 from VallariAg/wip-73109-tentacle

tentacle: qa/suites/nvmeof: add upgrade sub-suite

6 days agomgr/dashboard: Blank entry for Storage Capacity in dashboard under Cluster > Expand...
Naman Munet [Wed, 24 Sep 2025 07:23:40 +0000 (12:53 +0530)]
mgr/dashboard: Blank entry for Storage Capacity in dashboard under Cluster > Expand Cluster > Review

https://tracker.ceph.com/issues/73220

Signed-off-by: Naman Munet <naman.munet@ibm.com>
(cherry picked from commit a01909e7588c7ff757079475e3ea6f1dc3054db7)

8 days agoMerge pull request #65587 from adamemerson/wip-perfcounters-unique-string-tentacle
Yuri Weinstein [Fri, 26 Sep 2025 19:56:11 +0000 (12:56 -0700)]
Merge pull request #65587 from adamemerson/wip-perfcounters-unique-string-tentacle

tentacle: common: Allow PerfCounters to return a provided service ID

Reviewed-by: Adam Emerson <aemerson@redhat.com>
8 days agoMerge pull request #65478 from pdvian/wip-72911-tentacle
Yuri Weinstein [Fri, 26 Sep 2025 15:11:55 +0000 (08:11 -0700)]
Merge pull request #65478 from pdvian/wip-72911-tentacle

tentacle: osd: stop scrub_purged_snaps() from ignoring osd_beacon_report_interval

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
8 days agoMerge pull request #65464 from pdvian/wip-72851-tentacle
Yuri Weinstein [Fri, 26 Sep 2025 15:11:10 +0000 (08:11 -0700)]
Merge pull request #65464 from pdvian/wip-72851-tentacle

tentacle: mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
8 days agoceph-mixin: reset auto_count to 10
Aashish Sharma [Wed, 17 Sep 2025 09:34:11 +0000 (15:04 +0530)]
ceph-mixin: reset auto_count to 10

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit ebca859c5d22740678baff0353b39c5642f1eb0b)

8 days agoceph-mixin: Update monitoring mixin
SuperQ [Mon, 21 Apr 2025 09:47:41 +0000 (11:47 +0200)]
ceph-mixin: Update monitoring mixin

Update `rate()` queries to be more accurate. The use of `irate()` leads
to misleading graphs because it only looks at the last 2 samples over
the selected time range step interval. Also use `$__rate_interval`
consistently in order to scale over short and long time ranges.
* Replace `irate()` with `rate()` to avoid sample bias.
* Use `$__rate_interval` consistently.
* Update auto_count/min to provide higher detail graphs.

Fixes: https://tracker.ceph.com/issues/72343
Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: Ankush Behl <cloudbehl@gmail.com>
(cherry picked from commit 9c4cd107a41292aba547fdd4a3721cbc554a6b6a)

8 days agoMerge pull request #65619 from aaSharma14/wip-73166-tentacle
Aashish Sharma [Fri, 26 Sep 2025 10:36:52 +0000 (16:06 +0530)]
Merge pull request #65619 from aaSharma14/wip-73166-tentacle

tentacle: mgr/dashboard: fix zone update API forcing STANDARD storage class

Reviewed-by: Afreen Misbah <afreen@ibm.com>
8 days agoMerge pull request #65622 from aaSharma14/wip-73168-tentacle
Aashish Sharma [Fri, 26 Sep 2025 10:33:24 +0000 (16:03 +0530)]
Merge pull request #65622 from aaSharma14/wip-73168-tentacle

tentacle: mgr/dashboard: Allow FQDN in Connect Cluster form -> Cluster API URL

Reviewed-by: Afreen Misbah <afreen@ibm.com>
9 days agoMerge pull request #65675 from rhcs-dashboard/wip-73234-tentacle
afreen23 [Fri, 26 Sep 2025 06:36:53 +0000 (12:06 +0530)]
Merge pull request #65675 from rhcs-dashboard/wip-73234-tentacle

tentacle: mgr/dashboard: FS - Attach Command showing undefined for MountData

Reviewed-by: Afreen Misbah <afreen@ibm.com>
9 days agoMerge pull request #65601 from kotreshhr/wip-73131-tentacle
Jos Collin [Fri, 26 Sep 2025 06:02:40 +0000 (11:32 +0530)]
Merge pull request #65601 from kotreshhr/wip-73131-tentacle

tentacle: cephfs-journal-tool: Journal trimming issue

Reviewed-by: Jos Collin <jcollin@redhat.com>
9 days agoMerge pull request #65670 from aaSharma14/wip-73229-tentacle
Aashish Sharma [Fri, 26 Sep 2025 05:11:33 +0000 (10:41 +0530)]
Merge pull request #65670 from aaSharma14/wip-73229-tentacle

tentacle: monitoring: fix "In" OSDs in Cluster-Advanced grafana panel. Also change units from decbytes to bytes wherever used in the panel

Reviewed-by: Afreen Misbah <afreen@ibm.com>
9 days agomgr/dashboard: fix None force param handling in ns add_host so it won't raise exceptions
Tomer Haskalovitch [Sun, 14 Sep 2025 06:10:22 +0000 (09:10 +0300)]
mgr/dashboard: fix None force param handling in ns add_host so it won't raise exceptions

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 38f62c4a379bfbe0bc57ebee4fc1aa6661c75dca)

9 days agomgr/dashboard: --no-group-append default value to False, aligned with old CLI
Tomer Haskalovitch [Fri, 12 Sep 2025 00:58:44 +0000 (03:58 +0300)]
mgr/dashboard: --no-group-append default value to False, aligned with old CLI

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 46b74faa763e7894e62558f14f786c870d740b29)

9 days agomgr/dashboard: add nsid param to ns add command
Tomer Haskalovitch [Wed, 10 Sep 2025 09:02:03 +0000 (12:02 +0300)]
mgr/dashboard: add nsid param to ns add command

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit ee37978e7341ad3c29f986f316d89cb76b26efb5)

9 days agomgr/dashboard: FS - Attach Command showing undefined for MountData
Dnyaneshwari [Fri, 19 Sep 2025 11:01:43 +0000 (16:31 +0530)]
mgr/dashboard: FS - Attach Command showing undefined for MountData

Fixes: https://tracker.ceph.com/issues/73137
Signed-off-by: Dnyaneshwari Talwekar <dtalwekar@redhat.com>
(cherry picked from commit 50ef955207e7095578dc09820885a3dd0d6b3d52)

9 days agomgr/dashboard: fix data mismatch in Advanced section in Tiering.
Dnyaneshwari [Thu, 21 Aug 2025 06:05:03 +0000 (11:35 +0530)]
mgr/dashboard: fix data mismatch in Advanced section in Tiering.

Fixes: https://tracker.ceph.com/issues/72641
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
(cherry picked from commit 300e5058a5e80f7d679fc1d1c0a646f03c5dcb1b)

9 days agoMerge pull request #65653 from rhcs-dashboard/wip-73224-tentacle
afreen23 [Thu, 25 Sep 2025 07:41:49 +0000 (13:11 +0530)]
Merge pull request #65653 from rhcs-dashboard/wip-73224-tentacle

tentacle: mgr/dashboard: Tiering form - Placement Target in Advanced Section

Reviewed-by: Afreen Misbah <afreen@ibm.com>
9 days agoMerge pull request #65650 from rhcs-dashboard/wip-73199-tentacle
afreen23 [Thu, 25 Sep 2025 07:36:41 +0000 (13:06 +0530)]
Merge pull request #65650 from rhcs-dashboard/wip-73199-tentacle

tentacle: mgr/dashboard:[NFS] add Subvolume Groups and Subvolumes in "Edit NFS Export" form

Reviewed-by: Afreen Misbah <afreen@ibm.com>
10 days agomonitoring/ceph_mixin: fix Cluster - Advanced OSD grafana panel
Aashish Sharma [Wed, 17 Sep 2025 06:58:16 +0000 (12:28 +0530)]
monitoring/ceph_mixin: fix Cluster - Advanced OSD grafana panel

1. Fixes the promql expr used to calculate "In" OSDs in
   ceph-cluster-advanced.json.
2. Fixes the color coding for the single state panels used in the OSDs
   grafana panel like "In", "Out" etc

Fixes: https://tracker.ceph.com/issues/72810
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 53a6856d603e0fe4ff31f76e19263a80359a9f1d)

10 days agoMerge pull request #65658 from rhcs-dashboard/wip-73239-tentacle
afreen23 [Thu, 25 Sep 2025 04:52:33 +0000 (10:22 +0530)]
Merge pull request #65658 from rhcs-dashboard/wip-73239-tentacle

tentacle: mgr/dashboard: add multiple ceph users deletion

Reviewed-by: Nizamudeen A <nia@redhat.com>
10 days agoMerge pull request #65657 from rhcs-dashboard/wip-73213-tentacle
afreen23 [Thu, 25 Sep 2025 04:52:11 +0000 (10:22 +0530)]
Merge pull request #65657 from rhcs-dashboard/wip-73213-tentacle

tentacle: mgr/dashboard: fix smb button and table column

Reviewed-by: Nizamudeen A <nia@redhat.com>
10 days agomgr/dashboard: add multiple ceph users deletion
Pedro Gonzalez Gomez [Wed, 27 Aug 2025 14:41:41 +0000 (16:41 +0200)]
mgr/dashboard: add multiple ceph users deletion

Fixes: https://tracker.ceph.com/issues/72752
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
(cherry picked from commit 14ca16576d16de49c07725fb4b0feb112c8a1a43)

10 days agomgr/dashboard: fix SMB custom DNS button and linked_to_cluster col
Pedro Gonzalez Gomez [Tue, 26 Aug 2025 12:05:45 +0000 (14:05 +0200)]
mgr/dashboard: fix SMB custom DNS button and linked_to_cluster col

- The 'add custom DNS' button in the smb cluster form should only appear for active directory, where it is relevant.

- The linked_to_cluster column data is missing for smb standalone clusters.

- Some refactoring to remove magic strings and use a FormControl for the publicAddrs field.

Fixes: https://tracker.ceph.com/issues/73096
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
(cherry picked from commit 9ce943e21558d17b3a214840b39bb57eab0cbd85)

10 days agoMerge pull request #65292 from adk3798/tentacle-cephadm-build-el10rpm
Adam King [Wed, 24 Sep 2025 15:20:03 +0000 (11:20 -0400)]
Merge pull request #65292 from adk3798/tentacle-cephadm-build-el10rpm

tentacle: cephadm: fix building rpm-sourced cephadm zippapp on el10

Reviewed-by: John Mulligan <jmulligan@redhat.com>
10 days agoMerge pull request #65654 from rhcs-dashboard/wip-73223-tentacle
afreen23 [Wed, 24 Sep 2025 13:00:19 +0000 (18:30 +0530)]
Merge pull request #65654 from rhcs-dashboard/wip-73223-tentacle

tentacle: Form retains old data when switching from edit to create

Reviewed-by: Afreen Misbah <afreen@ibm.com>
10 days agoMerge pull request #65254 from joscollin/wip-72505-tentacle
Venky Shankar [Wed, 24 Sep 2025 08:50:28 +0000 (14:20 +0530)]
Merge pull request #65254 from joscollin/wip-72505-tentacle

tentacle: client: fix unmount hang after lookups

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
10 days agoMerge pull request #65362 from ifed01/wip-ifed-fix-snapdiff-fragment-tent
Venky Shankar [Wed, 24 Sep 2025 08:49:18 +0000 (14:19 +0530)]
Merge pull request #65362 from ifed01/wip-ifed-fix-snapdiff-fragment-tent

tentacle: mds: fix snapdiff result fragmentation

Reviewed-by: Venky Shankar <vshankar@redhat.com>
10 days agomgr/dashboard: Form retains old data when switching from edit to create mode
pujashahu [Thu, 11 Sep 2025 13:40:27 +0000 (19:10 +0530)]
mgr/dashboard: Form retains old data when switching from edit to create mode

Fixes: https://tracker.ceph.com/issues/72989
Signed-off-by: pujashahu <pshahu@redhat.com>
(cherry picked from commit 918dff407d912b3a5ac068e0050467396668163c)

10 days agomgr/dashboard: Tiering form - Placement Target in Advanced Section
Dnyaneshwari [Wed, 20 Aug 2025 04:46:21 +0000 (10:16 +0530)]
mgr/dashboard: Tiering form - Placement Target in Advanced Section

Fixes: https://tracker.ceph.com/issues/72545
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
(cherry picked from commit aa3bb8adddac675ea3c6dcd0bd4e9143743124b8)

10 days agomgr/dashboard:[NFS] add Subvolume Groups and Subvolumes in "Edit NFS Export" form
Dnyaneshwari [Wed, 6 Aug 2025 09:42:43 +0000 (15:12 +0530)]
mgr/dashboard:[NFS] add Subvolume Groups and Subvolumes in "Edit NFS Export" form

Fixes: https://tracker.ceph.com/issues/72435
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
(cherry picked from commit aa7a586c5690e26bfc99687a601f8c0c3c221aa7)

11 days agoMerge pull request #65261 from joscollin/wip-72149-tentacle
Jos Collin [Wed, 24 Sep 2025 01:08:50 +0000 (06:38 +0530)]
Merge pull request #65261 from joscollin/wip-72149-tentacle

tentacle: mon/FSCommands: avoid unreachable code triggering compiler warning

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
11 days agomon/FSCommands: avoid unreachable code triggering compiler warning
Patrick Donnelly [Sat, 31 May 2025 01:44:32 +0000 (21:44 -0400)]
mon/FSCommands: avoid unreachable code triggering compiler warning

    In file included from /home/pdonnell/ceph/src/mds/FSMap.h:31,
                     from /home/pdonnell/ceph/src/mon/PaxosFSMap.h:20,
                     from /home/pdonnell/ceph/src/mon/MDSMonitor.h:26,
                     from /home/pdonnell/ceph/src/mon/FSCommands.cc:17:
    /home/pdonnell/ceph/src/mds/MDSMap.h: In member function ‘int FileSystemCommandHandler::set_val(Monitor*, FSMap&, MonOpRequestRef, const cmdmap_t&, std::ostream&, FileSystemCommandHandler::fs_or_fscid, std::string, std::string)’:
    /home/pdonnell/ceph/src/mds/MDSMap.h:223:40: warning: ‘fsp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      223 |   bool test_flag(int f) const { return flags & f; }
          |                                        ^~~~~
    /home/pdonnell/ceph/src/mon/FSCommands.cc:417:21: note: ‘fsp’ was declared here
      417 |   const Filesystem* fsp;
          |                     ^~~

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 24acaaf766336466021caadba1facd2901775435)

11 days agoMerge pull request #65635 from adk3798/tentacle-cephadm-pin-cheroot
Adam King [Tue, 23 Sep 2025 12:56:14 +0000 (08:56 -0400)]
Merge pull request #65635 from adk3798/tentacle-cephadm-pin-cheroot

tentacle: pybind/mgr: pin cheroot version in requirements-required.txt

Reviewed-by: John Mulligan <jmulligan@redhat.com>
12 days agoMerge pull request #65628 from phlogistonjohn/jjm-t-65514
David Galloway [Tue, 23 Sep 2025 02:27:32 +0000 (19:27 -0700)]
Merge pull request #65628 from phlogistonjohn/jjm-t-65514

tentacle: build-with-container: add argument groups to organize options

12 days agopybind/mgr: pin cheroot version in requirements-required.txt
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt

With python 3.10 (it didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user:
https://github.com/cherrypy/cheroot/issues/769

It is worth noting that the workaround described in that
issue does also work for us. If you add

```
import cheroot
cheroot.server.HTTPServer._serve_unservicable = lambda: None
```

after the existing imports in test_node_proxy.py the
test hanging issue also disappears. Also worth noting the
particular pin of

cheroot~=10.0

was chosen as it matches the existing pin being used
in pybind/mgr/dashboard/constraints.txt

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 6231955b5d00ae6b3630ee94e85b2449092ef0fe)

12 days agoMerge pull request #65485 from tobias-urdin/tentacle-rgw-admin-bucket-pagination
Yuri Weinstein [Mon, 22 Sep 2025 18:16:14 +0000 (11:16 -0700)]
Merge pull request #65485 from tobias-urdin/tentacle-rgw-admin-bucket-pagination

tentacle: rgw/admin: Add max-entries and marker to bucket list

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
12 days agoMerge pull request #65488 from BBoozmen/wip-72970-tentacle
Yuri Weinstein [Mon, 22 Sep 2025 18:15:25 +0000 (11:15 -0700)]
Merge pull request #65488 from BBoozmen/wip-72970-tentacle

tentacle: RGW: multi object delete op; skip olh update for all deletes but the last one

Reviewed-by: Casey Bodley <cbodley@redhat.com>
12 days agoMerge pull request #65594 from adk3798/tentacle-cephadm-nvmeof-stray
Adam King [Mon, 22 Sep 2025 15:08:56 +0000 (11:08 -0400)]
Merge pull request #65594 from adk3798/tentacle-cephadm-nvmeof-stray

tentacle: mgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
12 days agobuild-with-container: add argument groups to organize options
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options

Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 71a1be4dd0aea004da56c2f518ee70a281a3f7d3)
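
The grouping only affects how `--help` is rendered; a minimal Python sketch of the pattern, with hypothetical group titles and option names (not the script's actual ones):

```
import argparse

# Sketch of grouping options with argparse.add_argument_group; group titles
# and option names here are illustrative, not build-with-container's real ones.
parser = argparse.ArgumentParser(description="containerized build helper")

basic = parser.add_argument_group("basic options")
basic.add_argument("--distro", default="centos9", help="base distro for the build image")

advanced = parser.add_argument_group("advanced options")
advanced.add_argument("--build-dir", default="build", help="directory to run the build in")

args = parser.parse_args(["--distro", "ubuntu22.04"])
print(args.distro, args.build_dir)  # groups only change the --help layout
```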

12 days agoMerge pull request #65617 from spuiuk/tentacle-doc-provider
Anthony D'Atri [Mon, 22 Sep 2025 13:27:24 +0000 (09:27 -0400)]
Merge pull request #65617 from spuiuk/tentacle-doc-provider

tentacle: doc/mgr/smb: document the 'provider' option for smb share

12 days agoMerge pull request #65259 from joscollin/wip-72284-tentacle
Jos Collin [Mon, 22 Sep 2025 13:03:50 +0000 (18:33 +0530)]
Merge pull request #65259 from joscollin/wip-72284-tentacle

tentacle:  mds: wrong snap check for directory with parent snaps

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
12 days agomgr/dashboard: Allow FQDN in Connect Cluster form -> Cluster API URL
Aashish Sharma [Wed, 17 Sep 2025 11:53:01 +0000 (17:23 +0530)]
mgr/dashboard: Allow FQDN in Connect Cluster form -> Cluster API URL

Allow the user to enter a URL with an FQDN in the Cluster API URL field in
the Connect Cluster form inside the Multi-cluster tab.

Fixes: https://tracker.ceph.com/issues/73077
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8b088d41d751a9c98d152ddb63a9a01111b6340b)

12 days agomgr/dashboard: fix zone update API forcing STANDARD storage class
Aashish Sharma [Thu, 18 Sep 2025 10:59:52 +0000 (16:29 +0530)]
mgr/dashboard: fix zone update API forcing STANDARD storage class

The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.

Changes:
Combine the add-placement-target and add-storage-class methods into a single
add_placement_targets_storage_class_zone method, which takes the storage
class as a param alongside the rest of the placement params.

Fixes: https://tracker.ceph.com/issues/73105
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 135f3adb4973be493925839e946e7a5fc75e7d5c)
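
A hedged sketch of the consolidated call shape: only the method name and the storage-class parameter come from the commit message, everything else is illustrative.

```
# Illustrative sketch only: one method that configures a placement target
# together with its storage class, instead of hard-coding STANDARD.
# Parameter names other than the storage class are assumptions.
def add_placement_targets_storage_class_zone(zone_name, placement_id,
                                             storage_class, data_pool,
                                             compression=None):
    placement = {
        'rgw_zone': zone_name,
        'placement_id': placement_id,
        'storage_class': storage_class,   # taken from the request, not STANDARD
        'data_pool': data_pool,
    }
    if compression:
        placement['compression'] = compression
    return placement


print(add_placement_targets_storage_class_zone(
    'zone-a', 'default-placement', 'COLD', 'zone-a.rgw.cold.data'))
```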

12 days agoMerge pull request #64885 from vshankar/wip-72391-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 11:19:07 +0000 (16:49 +0530)]
Merge pull request #64885 from vshankar/wip-72391-tentacle

tentacle: mds/MDSDaemon: unlock `mds_lock` while shutting down Beacon and others

12 days agoMerge pull request #64888 from vshankar/wip-72285-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 10:59:30 +0000 (16:29 +0530)]
Merge pull request #64888 from vshankar/wip-72285-tentacle

tentacle: qa/suites/upgrade: add "Replacing daemon mds" to ignorelist

12 days agoMerge pull request #64953 from batrick/wip-72514-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 10:54:35 +0000 (16:24 +0530)]
Merge pull request #64953 from batrick/wip-72514-tentacle

tentacle: mds: skip charmap handler check for MDS requests

12 days agoMerge pull request #65132 from chrisphoffman/wip-72644-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 10:46:21 +0000 (16:16 +0530)]
Merge pull request #65132 from chrisphoffman/wip-72644-tentacle

tentacle: client: use path supplied in statfs

12 days agoMerge pull request #65163 from joscollin/wip-72153-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 10:19:41 +0000 (15:49 +0530)]
Merge pull request #65163 from joscollin/wip-72153-tentacle

tentacle: mds: dump export_ephemeral_random_pin as double

12 days agoMerge pull request #64650 from rishabh-d-dave/wip-72201-tentacle
Dhairya Parmar [Mon, 22 Sep 2025 10:15:00 +0000 (15:45 +0530)]
Merge pull request #64650 from rishabh-d-dave/wip-72201-tentacle

tentacle: mgr/vol: keep and show clone source info

12 days agoMerge pull request #65346 from joscollin/wip-72803-tentacle
Jos Collin [Mon, 22 Sep 2025 09:09:43 +0000 (14:39 +0530)]
Merge pull request #65346 from joscollin/wip-72803-tentacle

tentacle: mds: Fix readdir when osd is full.

Reviewed-by: Kotresh HR <khiremat@redhat.com>
12 days agodoc/mgr/smb: document the 'provider' option for smb share
Sachin Prabhu [Thu, 1 May 2025 10:59:54 +0000 (11:59 +0100)]
doc/mgr/smb: document the 'provider' option for smb share

Signed-off-by: Sachin Prabhu <sp@spui.uk>
(cherry picked from commit 742659b18a21cd8ccc36a0f0a53bea265a13a541)
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
12 days agoMerge pull request #65564 from xhernandez/wip-73075-tentacle
Jos Collin [Mon, 22 Sep 2025 08:20:20 +0000 (13:50 +0530)]
Merge pull request #65564 from xhernandez/wip-73075-tentacle

tentacle: Add normalization and casesensitive options to the subvolume group creation command

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
13 days agoMerge pull request #65262 from joscollin/wip-71831-tentacle
Jos Collin [Mon, 22 Sep 2025 06:17:16 +0000 (11:47 +0530)]
Merge pull request #65262 from joscollin/wip-71831-tentacle

tentacle: mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Reviewed-by: Kotresh HR <khiremat@redhat.com>
2 weeks agoMerge pull request #65540 from NitzanMordhai/wip-72996-tentacle
SrinivasaBharathKanta [Sat, 20 Sep 2025 12:27:53 +0000 (17:57 +0530)]
Merge pull request #65540 from NitzanMordhai/wip-72996-tentacle

tentacle: qa/workunits/rados: remove cache tier test

2 weeks agoMerge pull request #65369 from Naveenaidu/wip-72819-tentacle
SrinivasaBharathKanta [Sat, 20 Sep 2025 12:27:34 +0000 (17:57 +0530)]
Merge pull request #65369 from Naveenaidu/wip-72819-tentacle

tentacle: qa/suites/rados/thrash-old-clients: Add OSD warnings to ignore list

2 weeks agoMerge pull request #65213 from ifed01/wip-ifed-discard-threads-better-lifecycle-tent
Yuri Weinstein [Fri, 19 Sep 2025 22:43:31 +0000 (15:43 -0700)]
Merge pull request #65213 from ifed01/wip-ifed-discard-threads-better-lifecycle-tent

tentacle: blk/kernel: improve DiscardThread life cycle.

Reviewed-by: YiteGu <yitegu0@gmail.com>
2 weeks agocephfs-journal-tool: Don't reset the journal trim position
Kotresh HR [Thu, 18 Sep 2025 06:41:11 +0000 (06:41 +0000)]
cephfs-journal-tool: Don't reset the journal trim position

If the fs had to go through journal recovery and reset,
the cephfs-journal-tool resets the journal trim position,
so the old unused journal objects just stay forever in
the metadata pool. This patch fixes the issue: the old
stale journal objects are now trimmed during the regular
trimming cycle, helping to recover space in the metadata
pool.

Fixes: https://tracker.ceph.com/issues/69708
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 4f9a926a467c03a410e5ec5a81031e72f2193f25)

2 weeks agoqa: Validate cephfs-journal-tool reset trim
Kotresh HR [Thu, 18 Sep 2025 07:40:01 +0000 (07:40 +0000)]
qa: Validate cephfs-journal-tool reset trim

Validates that the cephfs-journal-tool reset
doesn't reset the trim position, so that the
journal trim takes care of trimming the older
unused journal objects and helps to recover
space in the metadata pool.

Fixes: https://tracker.ceph.com/issues/69708
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 7e85556318ae2707730ed0d5f2ef9a1d817ec6e0)

2 weeks agoMerge pull request #65560 from rhcs-dashboard/wip-73063-tentacle
Nizamudeen A [Fri, 19 Sep 2025 03:15:38 +0000 (08:45 +0530)]
Merge pull request #65560 from rhcs-dashboard/wip-73063-tentacle

tentacle: mgr/dashboard: fix missing schedule interval in rbd API

2 weeks agomgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray
Adam King [Wed, 7 May 2025 20:02:56 +0000 (16:02 -0400)]
mgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray

Cephadm's naming of these daemons always includes the pool and
group name associated with the nvmeof service. Nvmeof has recently
started to register with the cluster using names that
don't include that, resulting in warnings like:

```
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    stray daemon nvmeof.vm-01.hwwhfc on host vm-01 not managed by cephadm
```

while cephadm knew that nvmeof daemon as:

```
[ceph: root@vm-00 /]# ceph orch ps --daemon-type nvmeof
NAME                            HOST   PORTS                   STATUS   REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID
nvmeof.foo.group1.vm-01.hwwhfc  vm-01  *:5500,4420,8009,10008  stopped     5m ago  25m        -        -  <unknown>  <unknown>
```

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 695680876eb8af0891e3776888b6361dc8728c86)
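
A hedged sketch of the kind of name matching this implies, using the daemon names from the example above; this is purely illustrative, not the cephadm implementation.

```
# Illustrative only: treat a registered nvmeof name that lacks the pool/group
# segments as non-stray if a cephadm-managed nvmeof daemon on the same host
# shares the same random suffix. The real check lives in cephadm's serve loop.
def matches_known_nvmeof(registered, known_names):
    parts = registered.split('.')
    if parts[0] != 'nvmeof' or len(parts) < 3:
        return False
    host_and_suffix = parts[-2:]
    return any(
        name.startswith('nvmeof.') and name.split('.')[-2:] == host_and_suffix
        for name in known_names
    )


known = ['nvmeof.foo.group1.vm-01.hwwhfc']
print(matches_known_nvmeof('nvmeof.vm-01.hwwhfc', known))  # True -> not stray
```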

2 weeks agoMerge pull request #65570 from shraddhaag/wip-shraddhaag-availability-default-tentacle
Yuri Weinstein [Thu, 18 Sep 2025 19:24:12 +0000 (12:24 -0700)]
Merge pull request #65570 from shraddhaag/wip-shraddhaag-availability-default-tentacle

tentacle: options/mon: disable availability tracking by default

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 weeks agoMerge pull request #65562 from rzarzynski/ec_fixpack3_pr-tentacle
Yuri Weinstein [Thu, 18 Sep 2025 19:21:25 +0000 (12:21 -0700)]
Merge pull request #65562 from rzarzynski/ec_fixpack3_pr-tentacle

tentacle: EC fixpack 3 (with dependencies)

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
2 weeks agorgw: Record the `service_unique_id`, if present, in the ServiceMap
Adam C. Emerson [Mon, 8 Sep 2025 18:19:20 +0000 (14:19 -0400)]
rgw: Record the `service_unique_id`, if present, in the ServiceMap

For consistency and ease of associating the two.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3a94a7b2ed02d20b2bc839b283e60cf4778f69e4)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2 weeks agocommon: Allow PerfCounters to return a provided service ID
Adam C. Emerson [Fri, 5 Sep 2025 15:31:40 +0000 (11:31 -0400)]
common: Allow PerfCounters to return a provided service ID

Dashboard has asked for a unique identifier that can be associated
with services. This commit provides a component of that
functionality. Enforcing uniqueness is beyond the scope of this PR and
is the responsibility of cluster setup and orchestration. The scope of
uniqueness is a matter of policy and up to the design of cluster setup
and orchestration software.

We provide the `--service_unique_id` argument that can be passed on
the command line when executing a Ceph service that uses
`global_init`. If non-empty, a `service_unique_id` section is added to
the PerfCounters dump for that service. This section has a single
entry whose name is set to the argument of `service_unique_id` and
whose value is arbitrary. If unspecified or empty, no
`service_unique_id` section is added.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 6dc322421f7a3758251fe29e3f35934231358011)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
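
As the commit describes, the section holds a single entry whose name carries the identifier and whose value is arbitrary; a rough Python illustration of consuming such a dump (the exact JSON layout shown is an assumption, not taken from the code):

```
import json

# Hypothetical shape of a perf counters dump for a service started with
# --service_unique_id=gw-east-1; only the section name and the
# "single entry, arbitrary value" semantics come from the commit message.
dump = json.loads('{"service_unique_id": {"gw-east-1": ""}}')

section = dump.get("service_unique_id", {})
if section:
    # the single entry's *name* carries the identifier; its value is arbitrary
    print("service id:", next(iter(section)))
```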
2 weeks agoqa: Add nvmeof:upgrade suite
Vallari Agrawal [Wed, 17 Sep 2025 09:07:40 +0000 (14:37 +0530)]
qa: Add nvmeof:upgrade suite

Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>
(cherry picked from commit 6d1552dc6def87c1e2d85d3e2f687fd19d85b25e)

2 weeks agooptions/mon: disable availability tracking by default
Shraddha Agrawal [Tue, 16 Sep 2025 13:52:27 +0000 (19:22 +0530)]
options/mon: disable availability tracking by default

Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit ef7effaa33bd6b936d7433e668d36f80ed7bee65)

2 weeks agoMerge pull request #65520 from shraddhaag/wip-73013-tentacle
Shraddha Agrawal [Wed, 17 Sep 2025 20:15:51 +0000 (01:45 +0530)]
Merge pull request #65520 from shraddhaag/wip-73013-tentacle

tentacle: mon: add config option to change availability score update interval

2 weeks agoMerge pull request #65218 from cbodley/wip-72714-tentacle
Yuri Weinstein [Wed, 17 Sep 2025 20:11:39 +0000 (13:11 -0700)]
Merge pull request #65218 from cbodley/wip-72714-tentacle

tentacle: rgw/s3: remove 'aws-chunked' from Content-Encoding response

Reviewed-by: Adam Emerson <aemerson@redhat.com>
2 weeks agoMerge pull request #65543 from leonidc/wip-73048-tentacle
Yuri Weinstein [Wed, 17 Sep 2025 20:09:17 +0000 (13:09 -0700)]
Merge pull request #65543 from leonidc/wip-73048-tentacle

tentacle: nvmeofgw:

Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>
2 weeks agoMerge pull request #65542 from leonidc/wip-73045-tentacle
Yuri Weinstein [Wed, 17 Sep 2025 20:08:39 +0000 (13:08 -0700)]
Merge pull request #65542 from leonidc/wip-73045-tentacle

tentacle: nvmeofgw: cleanup pending map upon monitor restart

Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>
2 weeks agoMerge pull request #64610 from avanthakkar/wip-72209-tentacle
Adam King [Wed, 17 Sep 2025 13:18:28 +0000 (09:18 -0400)]
Merge pull request #64610 from avanthakkar/wip-72209-tentacle

tentacle: mgr/prometheus: add smb_metadata metric

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 weeks agoMerge pull request #64895 from adk3798/tentacle=cephadm-limit-list-server-calls
Adam King [Wed, 17 Sep 2025 13:05:24 +0000 (09:05 -0400)]
Merge pull request #64895 from adk3798/tentacle=cephadm-limit-list-server-calls

tentacle:  mgr/cephadm: limit calls to list_servers

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 weeks agoosd: Reduce the amount of status invalidations when rolling shards forwards during...
Jon Bailey [Wed, 20 Aug 2025 10:11:09 +0000 (11:11 +0100)]
osd: Reduce the amount of status invalidations when rolling shards forwards during peering

Currently stats invalidations happen during peering when rolling forward shards.
We can reduce this so we only invalidate the stats when we don't have any other shard at the version we want to roll the stats forwards to.
In cases where we have a shard with stats at the correct version, we use those stats instead of invalidating.
If we do not have any shards with the correct version of stats, we invalidate as before.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b5cad2694569b7f0eef173f87a7eecb2ddd6b27e)
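
A hedged sketch of that decision as Python pseudocode over made-up structures; the actual change is C++ in the OSD peering code.

```
# Illustrative only: prefer borrowing stats from a shard that is already at
# the target version; invalidate only when no such shard exists.
def roll_forward_stats(target_version, shard_stats):
    """shard_stats maps shard id -> (version, stats)."""
    for _shard, (version, stats) in shard_stats.items():
        if version == target_version:
            return stats, False   # reuse stats already at the target version
    return None, True             # no shard at the target version: invalidate


stats, invalidate = roll_forward_stats(
    (120, 45),
    {0: ((120, 45), {"num_bytes": 4096}), 2: ((120, 40), {"num_bytes": 1024})},
)
print(stats, invalidate)  # {'num_bytes': 4096} False
```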

2 weeks agoosd: Optimized EC incorrectly rolled backwards write
Bill Scales [Wed, 27 Aug 2025 13:44:08 +0000 (14:44 +0100)]
osd: Optimized EC incorrectly rolled backwards write

A bug in choose_acting in this scenario:

* Current primary shard has been absent so has missed the latest few writes
* All the recent writes are partial writes that have not updated shard X
* All the recent writes have completed

The authoritative shard is chosen from the set of primary-capable shards
that have the highest last epoch started; these have all got log entries
for the recent writes.

The get log shard is chosen from the set of shards that have the highest
last epoch started; this chooses shard X because it is furthest behind.

The primary shard's last update is not less than the get log shard's last
update, so this if statement decides that it has a good enough log:

if ((repeat_getlog != nullptr) &&
    get_log_shard != all_info.end() &&
    (info.last_update < get_log_shard->second.last_update) &&
    pool.info.is_nonprimary_shard(get_log_shard->first.shard)) {

We then proceed through peering using the primary log and the
log from shard X. Neither has details about the recent writes,
which are then incorrectly rolled back.

The if statement should be looking at last_update for the
authoritative shard rather than the get_log_shard; the code
would then realize that it needs to get the log from the
authoritative shard first and then make a second pass
where it gets the log from the get log shard.

Peering would then have information about the partial writes
(obtained from the authoritative shard's log) and could correctly
roll these writes forward by deducing that the get_log_shard
didn't have these log entries because they were partial writes.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit ac4e0926bbac4ee4d8e33110b8a434495d730770)

2 weeks agoosd: Clear zero_for_decode for shards where read failed on recovery
Alex Ainscow [Tue, 12 Aug 2025 16:12:45 +0000 (17:12 +0100)]
osd: Clear zero_for_decode for shards where read failed on recovery

Not clearing this can lead to a failed decode, which panics, rather than
a recovery or IO failure.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 6365803275b1b6a142200cc2db9735d48c86ae03)

2 weeks agoosd: Reduce buffer-printing debug strings to debug level 30
Alex Ainscow [Fri, 8 Aug 2025 15:20:32 +0000 (16:20 +0100)]
osd: Reduce buffer-printing debug strings to debug level 30

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
# Conflicts:
# src/osd/ECBackend.cc
(cherry picked from commit b4ab3b1dcef59a19c67bb3b9e3f90dfa09c4f30b)

2 weeks agoosd: Fix segfault in EC debug string
Alex Ainscow [Fri, 8 Aug 2025 09:25:53 +0000 (10:25 +0100)]
osd: Fix segfault in EC debug string

The old debug_string implementation was potentially reading up to 3
bytes off the end of an array. It was also doing lots of unnecessary
bufferlist reconstructs. This refactor of this function fixes both
issues.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit da3ccdf4d03e40b747f8876449199102e53e00ce)

2 weeks agoosd: Optimized EC backfill interval has wrong versions
Bill Scales [Fri, 8 Aug 2025 08:58:14 +0000 (09:58 +0100)]
osd: Optimized EC backfill interval has wrong versions

Bug in the optimized EC code creating the backfill
interval on the primary. It is creating a map with
the object version for each backfilling shard. When
there are multiple backfill targets the code was
overwriting oi.version with the version
for a shard that has had partial writes which
can result in the object not being backfilled.

This can manifest as a data integrity issue, scrub
error or snapshot corruption.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit acca514f9a3d0995b7329f4577f6881ba093a429)

2 weeks agoosd: Optimized EC choose_acting needs to use best primary shard
Bill Scales [Mon, 4 Aug 2025 15:24:41 +0000 (16:24 +0100)]
osd: Optimized EC choose_acting needs to use best primary shard

There have been a couple of corner case bugs with choose_acting
with optimized EC pools in the scenario where a new primary
with no existing log is chosen and find_best_info selects
a non-primary shard as the authoritative shard.

Non-primary shards don't have a full log, so in this scenario
we need to get the log from a shard that does have a complete
log first (so our log is ahead of or equivalent to the
authoritative shard) and then repeat the get log for the
authoritative shard.

Problems arise if we make different decisions about the acting
set and backfill/recovery based on these two different shards.
In one bug we oscillated between two different primaries
because one primary used one shard to make peering decisions
and the other primary used the other shard, resulting in
looping flip/flop changes to the acting_set.

In another bug we used one shard to decide that we could do
async recovery but then tried to get the log from another
shard and asserted because we didn't have enough history in
the log to do recovery and should have chosen to do a backfill.

This change makes optimized EC pools always choose the
best !non_primary shard when making decisions about peering
(irrespective of whether the primary has a full log or not).
The best overall shard is now only used for get log when
deciding how far to rollback the log.

It also sets repeat_getlog to false if peering fails because
the PG is incomplete to avoid looping forever trying to get
the log.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f3f45c2ef3e3dd7c7f556b286be21bd5a7620ef7)

2 weeks agoosd: Do not send PDWs if read count > k
Alex Ainscow [Fri, 1 Aug 2025 14:09:58 +0000 (15:09 +0100)]
osd: Do not send PDWs if read count > k

The main point of PDW (as currently implemented) is to reduce the amount
of reading performed by the primary when preparing for a read-modify-write (RMW).

It was making the assumption that if any recovery was required by a
conventional RMW, then a PDW is always better. This was an incorrect assumption
as a conventional RMW performs at most K reads for any plugin which
supports PDW. As such, we tweak this logic to perform a conventional RMW
if the PDW is going to read k or more shards.

This should improve performance in some minor areas.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit cffd10f3cc82e0aef29209e6e823b92bdb0291ce)
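
The tweak boils down to a single comparison; a minimal sketch with illustrative names (the real check sits in the EC read planning code).

```
# Illustrative only: a conventional read-modify-write reads at most k shards
# for plugins that support PDW, so a PDW is only worthwhile when it would
# read fewer than k shards.
def use_pdw(pdw_shard_reads, k):
    return pdw_shard_reads < k


print(use_pdw(3, 4))  # True: PDW reads fewer shards than a conventional RMW
print(use_pdw(4, 4))  # False: fall back to a conventional RMW
```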

2 weeks agoosd: Fix decode for some extent cache reads.
Alex Ainscow [Wed, 18 Jun 2025 19:46:49 +0000 (20:46 +0100)]
osd: Fix decode for some extent cache reads.

The extent cache in EC can cause the backend to perform some surprising reads. Some
of the patterns discovered in testing caused the decode to attempt to
decode more data than was anticipated during read planning, leading to an
assert. This simple fix reduces the scope of the decode to the minimum.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 2ab45a22397112916bbcdb82adb85f99599e03c0)

2 weeks agoosd: Optimized EC calculate_maxles_and_minlua needs to use ...
Bill Scales [Fri, 1 Aug 2025 10:48:18 +0000 (11:48 +0100)]
osd: Optimized EC calculate_maxles_and_minlua needs to use ...
exclude_nonprimary_shards

When an optimized EC pool is searching for the best shard that
isn't a non-primary shard, the calculation of maxles and
minlua needs to exclude non-primary shards.

This bug was seen in a test run where activating a PG was
interrupted by a new epoch and only a couple of non-primary
shards became active and updated les. In the next epoch
a new primary (without log) failed to find a shard that
wasn't non-primary with the latest les. The les of
non-primary shards should be ignored when looking for
an appropriate shard to get the full log from.

This is safe because an epoch cannot start I/O without
at least K shards that have updated les, and there
are always K-1 non-primary shards. If I/O has started
then we will find the latest les even if we skip
non-primary shards. If I/O has not started then the
latest les ignoring non-primary shards is the
last epoch in which I/O was started and has a good
enough log+missing list.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 72d55eec85afa4c00fac8dc18a1fb49751e61985)

2 weeks agoosd: Optimized EC choose_async_recovery_ec must use auth_shard
Bill Scales [Fri, 1 Aug 2025 09:39:16 +0000 (10:39 +0100)]
osd: Optimized EC choose_async_recovery_ec must use auth_shard

Optimized EC pools modify how GetLog and choose_acting work.
If the auth_shard is a non-primary shard and the (new) primary
is behind the auth_shard then we cannot just get the log from
the non-primary shard because it will be missing entries for
partial writes. Instead we need to get the log from a shard
that has the full log first and then repeat GetLog to get
the log from the auth_shard.

choose_acting was modifying auth_shard in the case where
we need to get the log from another shard first. This is
wrong - the remainder of the logic in choose_acting and
in particular choose_async_recovery_ec needs to use the
auth_shard to calculate what the acting set will be.
Using a different shard can occasionally cause a
different acting set to be selected (because of
thresholds on how many log entries behind
a shard needs to be to perform async recovery) and
this can lead to two shards flip/flopping with
different opinions about what the acting set should be.

Fix is to separate out which shard will be returned
to GetLog from the auth_shard which will be used
for acting set calculations.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 3c2161ee7350a05e0d81a23ce24cd0712dfef5fb)

2 weeks agoosd: Optimized EC don't try to trim past crt
Bill Scales [Fri, 1 Aug 2025 09:22:47 +0000 (10:22 +0100)]
osd: Optimized EC don't try to trim past crt

If there is an exceptionally long sequence of partial writes
that did not update a shard that is followed by a full write
then it is possible that the log trim point is ahead of the
previous write to the shard (and hence crt). We cannot trim
beyond crt. In this scenario it's fine to limit the trim to crt
because the shard doesn't have any of the log entries for the
partial writes so there is nothing more to trim.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 645cdf9f61e79764eca019f58a4d9c6b51768c81)
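
In effect the trim target is clamped to crt; a minimal sketch, with illustrative names, of that clamp.

```
# Illustrative only: never trim the PG log past the crt point.
# eversion-like (epoch, version) tuples compare lexicographically here.
def clamp_trim_to(requested_trim_to, crt):
    return min(requested_trim_to, crt)


print(clamp_trim_to((25, 310), (25, 290)))  # (25, 290): trim limited to crt
```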

2 weeks agoosd: Optimized EC missing call to apply_pwlc after updating pwlc
Bill Scales [Fri, 1 Aug 2025 08:56:23 +0000 (09:56 +0100)]
osd: Optimized EC missing call to apply_pwlc after updating pwlc

update_peer_info was updating pwlc with a newer version received
from another shard, but failed to update the peer_info's to
reflect the new pwlc by calling apply_pwlc.

Scenario was primary receiving an update from shard X which had
newer information about shard Y. The code was calling apply_pwlc
for shard X but not for shard Y.

The fix simplifies the logic in update_peer_info: if we are
the primary, update all peer_info's that have pwlc. If we
are a non-primary and there is pwlc, then update info.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d19f3a3bcbb848e530e4d31cbfe195973fa9a144)

2 weeks agoosd: Optimized EC don't apply pwlc for divergent writes
Bill Scales [Wed, 30 Jul 2025 11:44:10 +0000 (12:44 +0100)]
osd: Optimized EC don't apply pwlc for divergent writes

Split pwlc epoch into a separate variable so that we
can use epoch and version number when comparing if
last_update is within a pwlc range. This ensures that
pwlc is not applied to a shard that has a divergent
write, but still tracks the most recent update of pwlc.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d634f824f229677aa6df7dded57352f7a59f3597)

2 weeks agoosd: Optimized EC present_shards no longer needed
Bill Scales [Wed, 30 Jul 2025 11:41:34 +0000 (12:41 +0100)]
osd: Optimized EC present_shards no longer needed

present_shards is no longer needed in the PG log entry; it has been
replaced with code in proc_master_log that calculates which shards were
in the last epoch started and are still present.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 880a17e39626d99a0b6cc8259523daa83c72802c)

2 weeks agoosd: Optimized EC proc_master_log fix roll-forward logic when shard is absent
Bill Scales [Mon, 28 Jul 2025 08:26:36 +0000 (09:26 +0100)]
osd: Optimized EC proc_master_log fix roll-forward logic when shard is absent

Fix bug in optimized EC code where proc_master_log incorrectly did not
roll forward a write if one of the written shards is missing in the current
epoch and there is a stray version of that shard that did not receive the
write.

As long as the currently present shards that participated in les and were
updated by a write have the update then the write should be rolled-forward.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit e0e8117769a8b30b2856f940ab9fc00ad1e04f63)

2 weeks agoosd: Refactor find_best_info and choose_acting
Bill Scales [Mon, 28 Jul 2025 08:21:54 +0000 (09:21 +0100)]
osd: Refactor find_best_info and choose_acting

Refactor find_best_info to have a separate function to calculate
maxles and minlua. The refactor makes history_les_bound
optional and tidies up the choose_acting interface, removing it
where it is not used.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f1826fdbf136dc7c96756f0fb8a047c9d9dda82a)

2 weeks agoosd: EC Optimizations proc_master_log boundary case bug fixes
Bill Scales [Thu, 17 Jul 2025 18:17:27 +0000 (19:17 +0100)]
osd: EC Optimizations proc_master_log boundary case bug fixes

Fix a couple of bugs in proc_master_log for optimized EC
pools dealing with boundary conditions such as an empty
log and merging two logs that diverge from the very first
entry.

Refactor the code to handle the boundary conditions and
neaten up the code.

Predicate the code block with if (pool.info.allows_ecoptimizations())
to make it clear this code path is only for optimized EC pools.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 1b44fd9991f5f46b969911440363563ddfad94ad)

2 weeks agoosd: Invalidate stats during peering if we are rolling a shard forwards.
Jon Bailey [Fri, 25 Jul 2025 13:16:35 +0000 (14:16 +0100)]
osd: Invalidate stats during peering if we are rolling a shard forwards.

This change means we always recalculate stats upon rolling stats forwards. This prevents the situation where we end up with incorrect statistics because we always take the stats of the oldest shard during peering, which caused outdated pg stats to be applied in cases where the oldest shards are shards that don't see partial writes in which num_bytes has changed elsewhere after that point on that shard.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b178ce476f4a5b2bb0743e36d78f3a6e23ad5506)