git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
11 days ago - crimson/os/seastore: fix laddr_t formatter and its use 69476/head
Ronen Friedman [Mon, 15 Jun 2026 11:24:18 +0000 (11:24 +0000)]
crimson/os/seastore: fix laddr_t formatter and its use

The existing 'laddr_t' formatter did not support a ':x' format specifier
(in fact, its output was always hexadecimal).
Here we remove the ':x' from its uses, and also refactor the custom
formatter to avoid using the streambuf mechanism.
Note: SEASTORE_LADDR_USE_BOOST_U128 is no longer supported by the formatter.

Fixes: https://tracker.ceph.com/issues/77399
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2 weeks ago - Merge pull request #69005 from Jayaprakash-ibm/wip-jaya-mon-features-test-fix
SrinivasaBharathKanta [Sat, 13 Jun 2026 22:54:52 +0000 (04:24 +0530)]
Merge pull request #69005 from Jayaprakash-ibm/wip-jaya-mon-features-test-fix

qa: fix TEST_mon_features feature checks in mon/misc.sh

2 weeks ago - Merge pull request #68650 from leonidc/fix_force_exit_gw
SrinivasaBharathKanta [Sat, 13 Jun 2026 22:54:33 +0000 (04:24 +0530)]
Merge pull request #68650 from leonidc/fix_force_exit_gw

nvmeofgw: fix forcing unavailable gw exit by sending

2 weeks ago - Merge pull request #68435 from stzuraski898/wip-sz-76048
SrinivasaBharathKanta [Sat, 13 Jun 2026 22:53:49 +0000 (04:23 +0530)]
Merge pull request #68435 from stzuraski898/wip-sz-76048

mgr: ActivePyModules does not set Description in labeled get_perf_schema_python

2 weeks ago - Merge pull request #68018 from kotreshhr/mirror-asok-metrics
Kotresh HR [Sat, 13 Jun 2026 18:29:00 +0000 (23:59 +0530)]
Merge pull request #68018 from kotreshhr/mirror-asok-metrics

tools/cephfs_mirror: Add metrics

2 weeks ago - Merge pull request #69421 from ShwetaBhosale1/fix_issue_77340_remove_-P_from_shebang_...
Kefu Chai [Sat, 13 Jun 2026 01:13:48 +0000 (09:13 +0800)]
Merge pull request #69421 from ShwetaBhosale1/fix_issue_77340_remove_-P_from_shebang_flags

ceph.spec.in: disable -P in python shebang for cephadm binary

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Kefu Chai <k.chai@proxmox.com>
2 weeks ago - qa: Add mirror metrics testcases 68018/head
Kotresh HR [Mon, 25 May 2026 18:44:03 +0000 (00:14 +0530)]
qa: Add mirror metrics testcases

Add testcases for the newly introduced mirror
metrics and validate them via the 'fs mirror peer status'
asok interface.

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - doc: Update the mirroring doc with new metrics fields
Kotresh HR [Mon, 25 May 2026 18:22:29 +0000 (23:52 +0530)]
doc: Update the mirroring doc with new metrics fields

Update the mirroring documentation and the
release notes with the newly introduced metrics and their
availability via the 'fs mirror peer status' asok
interface.

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - qa: Fix the mirroring tests with new nested peer_status output
Kotresh HR [Mon, 25 May 2026 17:34:57 +0000 (23:04 +0530)]
qa: Fix the mirroring tests with new nested peer_status output

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Nest peer_status metrics by dir path and peer uuid
Kotresh HR [Fri, 5 Jun 2026 14:23:14 +0000 (19:53 +0530)]
tools/cephfs_mirror: Nest peer_status metrics by dir path and peer uuid

Restructure peer_status output so mirrored directory paths can be
shared by multiple peers without key collisions. Metrics are grouped
as metrics/<dir_path>/peer/<peer_uuid>/ instead of flat dir keys.
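With this layout, a consumer of the asok output walks two levels (directory path, then peer UUID) instead of one. A minimal Python sketch of that traversal, assuming the JSON has already been obtained from the 'fs mirror peer status' asok command and parsed into a dict; the helper iter_peer_states is hypothetical and not part of Ceph:

```python
import json
from typing import Iterator, Tuple


def iter_peer_states(peer_status: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (dir_path, peer_uuid, state) for each mirrored dir and peer.

    Assumes the nested metrics/<dir_path>/peer/<peer_uuid>/ layout;
    missing keys are treated as empty.
    """
    for dir_path, dir_entry in peer_status.get("metrics", {}).items():
        for peer_uuid, peer_entry in dir_entry.get("peer", {}).items():
            yield dir_path, peer_uuid, peer_entry.get("state", "unknown")


# Trimmed-down version of the idle sample output in this commit message.
sample = json.loads("""
{
  "metrics": {
    "/parent/d0": {"peer": {"8a85ab25-70f9-48e9-b82d-56324e75209b": {"state": "idle"}}},
    "/parent/d1": {"peer": {"8a85ab25-70f9-48e9-b82d-56324e75209b": {"state": "idle"}}}
  }
}
""")

for path, uuid, state in iter_peer_states(sample):
    print(f"{path} {uuid} {state}")
```

Because the same directory path can now appear under several peer UUIDs, keying results by the (dir_path, peer_uuid) pair avoids the collisions the flat layout had.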

Sample output:
--------------
1. When two dirs are syncing.
{
    "metrics": {
        "/parent/d0": {
            "peer": {
                "8a85ab25-70f9-48e9-b82d-56324e75209b": {
                    "state": "syncing",
                    "current_syncing_snap": {
                        "id": 2,
                        "name": "d0_snap0",
                        "sync-mode": "full",
                        "avg_read_throughput_bytes": "9.01 MiB/s",
                        "avg_write_throughput_bytes": "26.74 MiB/s",
                        "crawl": {
                            "state": "completed",
                            "duration": "2s"
                        },
                        "datasync_queue_wait": {
                            "state": "completed",
                            "duration": "0s"
                        },
                        "bytes": {
                            "sync_bytes": "60.83 MiB",
                            "total_bytes": "149.94 MiB",
                            "sync_percent": "40.57%"
                        },
                        "files": {
                            "sync_files": 2028,
                            "total_files": 5000,
                            "sync_percent": "40.56%"
                        },
                        "eta": "10s"
                    },
                    "snaps_synced": 0,
                    "snaps_deleted": 0,
                    "snaps_renamed": 0
                }
            }
        },
        "/parent/d1": {
            "peer": {
                "8a85ab25-70f9-48e9-b82d-56324e75209b": {
                    "state": "syncing",
                    "current_syncing_snap": {
                        "id": 3,
                        "name": "d1_snap0",
                        "sync-mode": "full",
                        "avg_read_throughput_bytes": "6.80 MiB/s",
                        "avg_write_throughput_bytes": "20.04 MiB/s",
                        "crawl": {
                            "state": "in-progress",
                            "duration": "2s"
                        },
                        "datasync_queue_wait": {
                            "state": "completed",
                            "duration": "1s"
                        },
                        "bytes": {
                            "sync_bytes": "4.12 MiB",
                            "total_bytes": "124.98 MiB",
                            "sync_percent": "3.30%"
                        },
                        "files": {
                            "sync_files": 125,
                            "total_files": 4189,
                            "sync_percent": "2.98%"
                        },
                        "eta": "18s"
                    },
                    "snaps_synced": 0,
                    "snaps_deleted": 0,
                    "snaps_renamed": 0
                }
            }
        }
    }
}
---------
2. When two directories are synced

------------------------------------------
{
    "metrics": {
        "/parent/d0": {
            "peer": {
                "8a85ab25-70f9-48e9-b82d-56324e75209b": {
                    "state": "idle",
                    "last_synced_snap": {
                        "id": 2,
                        "name": "d0_snap0",
                        "crawl_duration": "2s",
                        "datasync_queue_wait_duration": "0s",
                        "sync_duration": "30s",
                        "sync_time_stamp": "422538.254127s",
                        "sync_bytes": "149.94 MiB",
                        "sync_files": 5000
                    },
                    "snaps_synced": 1,
                    "snaps_deleted": 0,
                    "snaps_renamed": 0
                }
            }
        },
        "/parent/d1": {
            "peer": {
                "8a85ab25-70f9-48e9-b82d-56324e75209b": {
                    "state": "idle",
                    "last_synced_snap": {
                        "id": 3,
                        "name": "d1_snap0",
                        "crawl_duration": "2s",
                        "datasync_queue_wait_duration": "1s",
                        "sync_duration": "33s",
                        "sync_time_stamp": "422546.205798s",
                        "sync_bytes": "149.94 MiB",
                        "sync_files": 5000
                    },
                    "snaps_synced": 1,
                    "snaps_deleted": 0,
                    "snaps_renamed": 0
                }
            }
        }
    }
}

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Add datasync_queue_wait_duration metric
Kotresh HR [Fri, 8 May 2026 00:22:59 +0000 (05:52 +0530)]
tools/cephfs_mirror: Add datasync_queue_wait_duration metric

Add a metric that measures the time a snapshot spends
in the data queue waiting for the datasync threads.
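A rough Python sketch of how such a wait measurement could be tracked. The class QueueWaitTimer and its methods are hypothetical; the output only mimics the 'state'/'duration' pair in the samples (plain seconds, without any "d h m s" formatting), and the clock is injectable so the behaviour is deterministic:

```python
import time
from typing import Callable, Optional


class QueueWaitTimer:
    """Track how long a snapshot waits in the data-sync queue (sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._enqueued_at = clock()          # snapshot enters the queue
        self._dequeued_at: Optional[float] = None

    def mark_dequeued(self) -> None:
        # Called once, when a datasync thread picks the snapshot up.
        if self._dequeued_at is None:
            self._dequeued_at = self._clock()

    def to_dict(self) -> dict:
        # While waiting, report the time elapsed so far; afterwards, the total wait.
        if self._dequeued_at is None:
            state, end = "waiting", self._clock()
        else:
            state, end = "completed", self._dequeued_at
        return {"state": state, "duration": f"{int(end - self._enqueued_at)}s"}
```

Once dequeued, the reported duration is frozen, which matches how the samples keep showing the final wait after the state flips to 'completed'.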

Sample output:
When still 'waiting' in queue
{
    "/d1": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 18,
            "name": "d1_snap5",
            "sync-mode": "delta",
            "avg_read_throughput_bytes": "0.00 B/s",
            "avg_write_throughput_bytes": "0.00 B/s",
            "crawl": {
                "state": "in-progress",
                "duration": "13s"
            },
            "datasync_queue_wait": {
                "state": "waiting",
                "duration": "12s"
            },
            "bytes": {
                "sync_bytes": "0.00 B",
                "total_bytes": "110.99 MiB",
                "sync_percent": "0.00%"
            },
            "files": {
                "sync_files": 0,
                "total_files": 3719,
                "sync_percent": "0.00%"
            },
            "eta": "calculating..."
        },
        "last_synced_snap": {
            "id": 15,
            "name": "d1_snap4"
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
---------------
After 'completed'
{
    "/d1": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 18,
            "name": "d1_snap5",
            "sync-mode": "delta",
            "avg_read_throughput_bytes": "11.66 MiB/s",
            "avg_write_throughput_bytes": "34.55 MiB/s",
            "crawl": {
                "state": "completed",
                "duration": "17s"
            },
            "datasync_queue_wait": {
                "state": "completed",
                "duration": "19s"
            },
            "bytes": {
                "sync_bytes": "149.94 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "100.00%"
            },
            "files": {
                "sync_files": 5000,
                "total_files": 5000,
                "sync_percent": "100.00%"
            },
            "eta": "0s"
        },
        "last_synced_snap": {
            "id": 15,
            "name": "d1_snap4"
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
-----
Also stored in the last_synced_snap section
{
    "/d1": {
        "state": "idle",
        "last_synced_snap": {
            "id": 18,
            "name": "d1_snap5",
            "crawl_duration": "17s",
            "datasync_queue_wait_duration": "19s",
            "sync_duration": "44s",
            "sync_time_stamp": "8172.009480s",
            "sync_bytes": "149.94 MiB",
            "sync_files": 5000
        },
        "snaps_synced": 1,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Add eta metrics
Kotresh HR [Sat, 28 Mar 2026 11:23:33 +0000 (16:53 +0530)]
tools/cephfs_mirror: Add eta metrics

Add an estimated time of completion (ETA) for the
snapshot currently being synced. The calculation uses
the average read/write throughput since the start of
the snapshot sync, not the current read/write
throughput, so the ETA follows the average rate and
reacts slowly to short-term throughput changes.
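A minimal Python sketch of an ETA based on an average rate, under stated assumptions: the function name is hypothetical, the caller supplies the average bytes/s since the sync started, and "calculating..." is returned only when no positive rate exists (the real code may use other conditions, since the first sample shows "calculating..." despite a non-zero rate):

```python
def estimate_eta(total_bytes: int, synced_bytes: int, avg_bytes_per_sec: float) -> str:
    """Estimate time to completion from the average throughput since sync start."""
    remaining = max(total_bytes - synced_bytes, 0)
    if remaining == 0:
        return "0s"
    if avg_bytes_per_sec <= 0:
        return "calculating..."
    # Whole seconds, truncated.
    return f"{int(remaining / avg_bytes_per_sec)}s"


MiB = 1024 * 1024
# Byte counts and average read throughput taken from the second sample below.
print(estimate_eta(int(149.94 * MiB), int(26.64 * MiB), 12.17 * MiB))  # prints 10s
```

Plugging in the second sample's byte counts and its average read throughput gives 10s, which happens to match the reported ETA; whether the real implementation uses the read rate, the write rate, or another combination is not stated here.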

Sample output:
-------------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync-mode": "full",
            "avg_read_throughput_bytes": "3.28 MiB/s",
            "avg_write_throughput_bytes": "71.03 MiB/s",
            "crawl": {
                "state": "completed",
                "duration": "1s"
            },
            "bytes": {
                "sync_bytes": "2.31 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "1.54%"
            },
            "files": {
                "sync_files": 67,
                "total_files": 5000,
                "sync_percent": "1.34%"
            },
            "eta": "calculating..."
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
------------------------------------------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync-mode": "full",
            "avg_read_throughput_bytes": "12.17 MiB/s",
            "avg_write_throughput_bytes": "66.46 MiB/s",
            "crawl": {
                "state": "completed",
                "duration": "1s"
            },
            "bytes": {
                "sync_bytes": "26.64 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "17.77%"
            },
            "files": {
                "sync_files": 892,
                "total_files": 5000,
                "sync_percent": "17.84%"
            },
            "eta": "10s"
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Add read/write throughput
Kotresh HR [Sat, 28 Mar 2026 10:57:02 +0000 (16:27 +0530)]
tools/cephfs_mirror: Add read/write throughput

The new read throughput metric measures the bytes
read per second from the source Ceph filesystem.
Similarly, the new write throughput metric measures
the bytes written per second to the remote Ceph
filesystem. Both are derived from the time spent
in preadv and pwritev calls.
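The idea of deriving throughput only from time spent inside the I/O calls can be sketched in Python as follows. ThroughputMeter and timed_io are hypothetical names; io_call stands in for a single preadv/pwritev-style call that returns the number of bytes transferred:

```python
import time
from typing import Callable


class ThroughputMeter:
    """Average bytes/s, counting only the time spent inside I/O calls."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.total_bytes = 0
        self.total_seconds = 0.0

    def timed_io(self, io_call: Callable[[], int]) -> int:
        # Time exactly one read or write call and accumulate its byte count.
        start = self._clock()
        nbytes = io_call()
        self.total_seconds += self._clock() - start
        self.total_bytes += nbytes
        return nbytes

    def average(self) -> float:
        # Cumulative average since the meter was created; 0.0 before any I/O.
        return self.total_bytes / self.total_seconds if self.total_seconds > 0 else 0.0
```

Keeping separate meters for reads (source) and writes (remote) explains why the samples can report very different read and write rates for the same snapshot.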

Sample output:
-------------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync-mode": "full",
            "avg_read_throughput_bytes": "12.69 MiB/s",
            "avg_write_throughput_bytes": "54.49 MiB/s",
            "crawl": {
                "state": "completed",
                "duration": "1s"
            },
            "bytes": {
                "sync_bytes": "149.94 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "100.00%"
            },
            "files": {
                "sync_files": 5000,
                "total_files": 5000,
                "sync_percent": "100.00%"
            }
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
-------------

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Add crawl-state and sync-mode metric
Kotresh HR [Sat, 28 Mar 2026 10:12:43 +0000 (15:42 +0530)]
tools/cephfs_mirror: Add crawl-state and sync-mode metric

The 'crawl' and 'sync-mode' metrics are added.

sync-mode: full/delta,
"crawl": {
           "state": "completed",
           "duration": "37s"
       }

sync-mode:
---------
The 'sync-mode' field (full/delta) is added to peer status.
'delta' means that blockdiff along with snapdiff is
being used to sync the files, whereas 'full' means the
full directory is crawled and each file is synced
entirely.

crawl:
-----
The state is either in-progress or completed. It
indicates whether the crawler thread is done
queuing the files for the data sync threads.

The crawl duration is also shown. While the crawl
is in progress, the duration shows the time elapsed
since the start of the crawl. Once the crawl state
is completed, the duration indicates the total
time taken by the crawl.

The crawl duration is shown in "d h m s" format.
The existing 'sync_duration' in last_synced_snap
is now formatted the same way.
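A Python sketch of a "d h m s" formatter consistent with the sample values ("37s", "2m 38s"). The function name is hypothetical, and omitting zero-valued units (so 120 seconds prints as "2m") is an assumption; the real formatter may print those units differently:

```python
def format_duration(seconds: float) -> str:
    """Render a duration as "d h m s", e.g. 158 -> "2m 38s"."""
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    for value, unit in ((days, "d"), (hours, "h"), (minutes, "m")):
        if value:  # assumption: zero-valued units are skipped
            parts.append(f"{value}{unit}")
    if secs or not parts:  # always print something, e.g. "0s"
        parts.append(f"{secs}s")
    return " ".join(parts)


print(format_duration(158))  # prints 2m 38s
```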

Sample values are shown below. Once the crawl state
is completed, the 'total_files' metric no longer
grows.

crawl_duration:
--------------
The crawl_duration of the last synced snapshot is also saved
in the last_synced_snap section.

Sample outputs:
---------------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync-mode": "full",
            "crawl": {
                "state": "in-progress",
                "duration": "21s"
            },
            "bytes": {
                "sync_bytes": "149.25 MiB",
                "total_bytes": "176.47 MiB",
                "sync_percent": "84.57%"
            },
            "files": {
                "sync_files": 4931,
                "total_files": 5845,
                "sync_percent": "84.36%"
            }
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
------------------------------------------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync-mode": "full",
            "crawl": {
                "state": "completed",
                "duration": "37s"
            },
            "bytes": {
                "sync_bytes": "891.39 MiB",
                "total_bytes": "901.52 MiB",
                "sync_percent": "98.88%"
            },
            "files": {
                "sync_files": 29656,
                "total_files": 30000,
                "sync_percent": "98.85%"
            }
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
---------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 3,
            "name": "d0_snap1",
            "sync-mode": "delta",
            "crawl": {
                "state": "completed",
                "duration": "15s"
            },
            "bytes": {
                "sync_bytes": "120.20 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "80.16%"
            },
            "files": {
                "sync_files": 4032,
                "total_files": 5000,
                "sync_percent": "80.64%"
            }
        },
        "last_synced_snap": {
            "id": 2,
            "name": "d0_snap0",
            "crawl_duration": "17s",
            "sync_duration": 45,
            "sync_time_stamp": "5642.805770s",
            "sync_bytes": "300.85 MiB",
            "sync_files": 10000
        },
        "snaps_synced": 1,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
-------------
{
    "/d0": {
        "state": "idle",
        "last_synced_snap": {
            "id": 2,
            "name": "d0_snap0",
            "crawl_duration": "17s",
            "sync_duration": "2m 38s",
            "sync_time_stamp": "9259.225009s",
            "sync_bytes": "901.52 MiB",
            "sync_files": 30000
        },
        "snaps_synced": 1,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - tools/cephfs_mirror: Add inprogress bytes and files metric
Kotresh HR [Mon, 16 Feb 2026 10:59:31 +0000 (16:29 +0530)]
tools/cephfs_mirror: Add inprogress bytes and files metric

Add the following mirroring progress metrics to
current_syncing_snap:

bytes:
  sync_bytes - bytes synced so far
  total_bytes - total bytes to be synced
  sync_percent - percentage of bytes synced so far
files:
  total_files - total files to be synced
  sync_files - files synced so far
  sync_percent - percentage of files synced so far

sync_files and sync_bytes are also stored in last_synced_snap section
after the snapshot is synced.

Byte values are formatted in human-readable binary units, as shown below.

Sample output:
--------
{
    "/d0": {
        "state": "syncing",
        "current_syncing_snap": {
            "id": 3,
            "name": "d0_snap1",
            "bytes": {
                "sync_bytes": "120.20 MiB",
                "total_bytes": "149.94 MiB",
                "sync_percent": "80.16%"
            },
            "files": {
                "sync_files": 4032,
                "total_files": 5000,
                "sync_percent": "80.64%"
            }
        },
        "last_synced_snap": {
            "id": 2,
            "name": "d0_snap0",
            "sync_duration": 45,
            "sync_time_stamp": "5642.805770s",
            "sync_bytes": "300.85 MiB",
            "sync_files": 10000
        },
        "snaps_synced": 1,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}
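The byte and percentage strings in the sample ("120.20 MiB", "80.64%") can be reproduced with a small Python sketch. Both function names are hypothetical, and the choice of 1024-based units with two decimals is inferred from the sample values, not taken from the mirror daemon's source:

```python
def format_bytes(num_bytes: float) -> str:
    """Human-readable size using binary (1024-based) units, two decimals."""
    units = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")
    value = float(num_bytes)
    for unit in units[:-1]:
        if abs(value) < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


def format_percent(done: int, total: int) -> str:
    """Percentage with two decimals; a zero total is reported as 0.00%."""
    pct = 100.0 * done / total if total else 0.0
    return f"{pct:.2f}%"


print(format_bytes(150 * 1024 * 1024), format_percent(4032, 5000))  # prints 150.00 MiB 80.64%
```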

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR <khiremat@redhat.com>
2 weeks ago - Merge pull request #69300 from rkachach/fix_issue_mgmt_gw_qa
Redouane Kachach [Fri, 12 Jun 2026 13:16:14 +0000 (15:16 +0200)]
Merge pull request #69300 from rkachach/fix_issue_mgmt_gw_qa

qa: extend the ignore-list for the mgmt-gateway test suite

Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
2 weeks ago - Merge pull request #64618 from sseshasa/wip-fix-mclock-slow-ops-during-recovery
Sridhar Seshasayee [Fri, 12 Jun 2026 13:07:50 +0000 (18:37 +0530)]
Merge pull request #64618 from sseshasa/wip-fix-mclock-slow-ops-during-recovery

osd/scheduler: Classify subOp reads according to op priority for mClock

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2 weeks ago - Merge pull request #69284 from JonBailey1993/remove_incorrect_unit_tests
Jon Bailey [Fri, 12 Jun 2026 12:49:01 +0000 (13:49 +0100)]
Merge pull request #69284 from JonBailey1993/remove_incorrect_unit_tests

test: Remove invalid unit test

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
2 weeks ago - Merge pull request #68859 from ashjosh1git/ceph-tracker-74872-cephadm-debug-log
Redouane Kachach [Fri, 12 Jun 2026 11:04:13 +0000 (13:04 +0200)]
Merge pull request #68859 from ashjosh1git/ceph-tracker-74872-cephadm-debug-log

mgr/cephadm: Control cephadm files logging based on a mgr flag

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
Reviewed-by: Adam King <adking@redhat.com>
2 weeks ago - qa: extend the ignore-list for the mgmt-gateway test suite 69300/head
Redouane Kachach [Fri, 5 Jun 2026 08:14:28 +0000 (10:14 +0200)]
qa: extend the ignore-list for the mgmt-gateway test suite

Add the CEPHADM_AGENT_DOWN and CEPHADM_STRAY_DAEMON errors to the ignore-list.

Fixes: https://tracker.ceph.com/issues/77131
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks ago - Merge pull request #69247 from rkachach/fix_issue_76979
Redouane Kachach [Fri, 12 Jun 2026 10:51:21 +0000 (12:51 +0200)]
Merge pull request #69247 from rkachach/fix_issue_76979

mgr/cephadm: Don't skip OSDs with non-empty osdspec_affinity

Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
2 weeks ago - Merge pull request #69299 from rkachach/fix_issue_77130
Redouane Kachach [Fri, 12 Jun 2026 10:50:24 +0000 (12:50 +0200)]
Merge pull request #69299 from rkachach/fix_issue_77130

qa/cephadm: fix test_repos.sh for jammy nodes

Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Reviewed-by: Adam King <adking@redhat.com>
2 weeks ago - Merge pull request #67466 from rkachach/fix_new_secrets_mgr_module_v0
Redouane Kachach [Fri, 12 Jun 2026 10:48:32 +0000 (12:48 +0200)]
Merge pull request #67466 from rkachach/fix_new_secrets_mgr_module_v0

Adding new secrets mgr module support

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2 weeks ago - Merge pull request #67384 from ashjosh1git/ceph-tracker-74986-validate-pro-name
Redouane Kachach [Fri, 12 Jun 2026 10:36:59 +0000 (12:36 +0200)]
Merge pull request #67384 from ashjosh1git/ceph-tracker-74986-validate-pro-name

python-common: Improve profile name string validation

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
2 weeks ago - Merge pull request #69080 from kginonredhat/issue-75365-Grafana-container-fails-to...
Redouane Kachach [Fri, 12 Jun 2026 10:34:42 +0000 (12:34 +0200)]
Merge pull request #69080 from kginonredhat/issue-75365-Grafana-container-fails-to-start-reject-localhost

cephadm: set Grafana http_addr to 0.0.0.0 when unset

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
2 weeks ago - Merge pull request #68158 from rhcs-dashboard/fix-75826-main
Redouane Kachach [Fri, 12 Jun 2026 10:30:14 +0000 (12:30 +0200)]
Merge pull request #68158 from rhcs-dashboard/fix-75826-main

mgr/cephadm: set default prometheus template in config-key store unless overridden by the user

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
2 weeks ago - Merge pull request #68428 from kginonredhat/wip-74058-force-delete-data
Redouane Kachach [Fri, 12 Jun 2026 10:28:50 +0000 (12:28 +0200)]
Merge pull request #68428 from kginonredhat/wip-74058-force-delete-data

mgr/cephadm: plumb force_delete_data through daemon/service removal

Reviewed-by: Redouane Kachach <rkachach@ibm.com>
2 weeks ago - Merge pull request #66885 from amathuria/wip-amat-crimson-merge-support
Aishwarya Mathuria [Fri, 12 Jun 2026 09:48:37 +0000 (15:18 +0530)]
Merge pull request #66885 from amathuria/wip-amat-crimson-merge-support

Crimson PG merging support

2 weeks ago - doc/cephadm: Document cephadm_binary_logging_level option 68859/head
Ashwin M. Joshi [Fri, 12 Jun 2026 08:38:18 +0000 (14:08 +0530)]
doc/cephadm: Document cephadm_binary_logging_level option

Add documentation for the new cephadm binary logging level configuration

Fixes: https://tracker.ceph.com/issues/74872
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
2 weeks ago - mgr/cephadm: Control cephadm.log messages based on a new mgr logging level flag
Ashwin M. Joshi [Tue, 10 Feb 2026 06:29:49 +0000 (11:59 +0530)]
mgr/cephadm: Control cephadm.log messages based on a new mgr logging level flag

  Introduces a new 'cephadm_binary_logging_level' config option to control
  the verbosity of cephadm logging to persistent destinations (cephadm.log, syslog).

  - Adds --logging-level CLI flag (info, debug, error, warning)
  - Adds mgr/cephadm/cephadm_binary_logging_level config option
  - Applies logging level to file and syslog handlers
  - Console handlers maintain their defaults for terminal UX
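The handler selection described above can be sketched with Python's standard logging module. This is an illustration under assumptions, not cephadm's code: the function name and the level mapping are hypothetical, and only the four level names listed above are accepted:

```python
import logging
import logging.handlers

# Hypothetical mapping of the CLI values listed above to stdlib levels.
_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def apply_persistent_logging_level(logger: logging.Logger, level_name: str) -> None:
    """Set the level on file and syslog handlers only.

    Console (stream) handlers keep whatever level they already had.
    FileHandler subclasses StreamHandler, so file handlers are matched
    explicitly instead of "everything that is not a StreamHandler".
    """
    try:
        level = _LEVELS[level_name.lower()]
    except KeyError:
        raise ValueError(f"unsupported logging level: {level_name!r}") from None
    for handler in logger.handlers:
        if isinstance(handler, (logging.FileHandler, logging.handlers.SysLogHandler)):
            handler.setLevel(level)
```

Setting the level per handler (rather than on the logger) is what lets the persistent destinations become quieter or noisier without changing terminal output.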

Fixes: https://tracker.ceph.com/issues/74872
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
2 weeks ago - Merge pull request #69329 from tchaikov/wip-crimson-cleanup
Kefu Chai [Fri, 12 Jun 2026 07:43:57 +0000 (15:43 +0800)]
Merge pull request #69329 from tchaikov/wip-crimson-cleanup

crimson/osd: coroutinize OSD::start and remove OSD::startup_time

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks ago - Merge pull request #69412 from rhcs-dashboard/fix-77263-main
Aashish Sharma [Fri, 12 Jun 2026 07:32:10 +0000 (13:02 +0530)]
Merge pull request #69412 from rhcs-dashboard/fix-77263-main

mgr/dashboard: fix zone creation in rgw service creation form

Reviewed-by: Abhishek Desai <Abhishek.Desai1@ibm.com>
2 weeks ago - doc/rados/configuration: Remove wpq recommendation warning for EC clusters 64618/head
Sridhar Seshasayee [Thu, 4 Jun 2026 06:58:35 +0000 (12:28 +0530)]
doc/rados/configuration: Remove wpq recommendation warning for EC clusters

Remove the warning that recommends the wpq scheduler as a fallback for EC
clusters. The underlying issue is now addressed by treating EC recovery
reads as background operations, assigning an accurate cost to those reads,
and tuning the QoS parameters of the best-effort class of operations.

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks ago - mclock_common: adjust mClock profile parameters to prevent backfill starvation
Sridhar Seshasayee [Mon, 25 May 2026 12:14:54 +0000 (17:44 +0530)]
mclock_common: adjust mClock profile parameters to prevent backfill starvation

Adjust the 'background_best_effort' queue parameters across the
three standard mClock profiles (high_client_ops, balanced, and
high_recovery_ops) to ensure best effort ops are not starved.

Previously, the 'background_best_effort' queue had a 0% (MIN) reservation
and a weight of 1 under these profiles. Under dense concurrent client
traffic, this zero reservation can, for example, completely starve backfill
sub-ops (MSG_OSD_EC_READ) on pools with 'allow_ec_optimizations' set to
false. This starvation forces the primary OSD to hold internal BlueStore
transactions and PG object locks for extended windows, causing severe
inflation of client median (50th percentile) latency.

To prevent background starvation and resolve the effects of primary lock
retention, the profile configurations are tuned as follows. These changes
force low-cost sub-ops to clear out of peer queues rapidly so that the
primary drops its locks sooner, which improves client completion latency
and tail latency (95th, 99th and 99.5th percentiles).

1. high_client_ops profile:
   - Grant 'background_best_effort' a safe 5% minimum reservation.
   - Scale the queue weight to 4.

2. balanced profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

3. high_recovery_ops profile:
   - Grant 'background_best_effort' a 5% minimum reservation.
   - Set the queue weight to 2.

4. Modify the mClock config reference documentation to reflect the tuning
   changes to the best-effort QoS parameters across the profiles.

Note on Proportional Scaling Compatibility:
Configuring these changes shifts total reservations to 105% (e.g., 50%
client + 50% recovery + 5% best-effort under the balanced profile). Under
heavy concurrent saturation, mClock's internal controls resolve this
gracefully via proportional down-scaling, preserving the underlying
device bandwidth limits for the different classes of clients. For example,
instead of being allocated 50% of the bandwidth, the client is allocated a
slightly lower reservation, and the remaining bandwidth shifts to the
best-effort queue. This minor shift is virtually unnoticeable to the client
application, but it prevents the internal queue deadlocks.
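The arithmetic of proportional down-scaling for the balanced profile can be sketched in Python. This illustrates the idea only and is not mClock's (dmclock's) actual internal algorithm; the function name is hypothetical, and reservations are expressed as fractions of capacity:

```python
def scale_reservations(reservations: dict) -> dict:
    """Scale reservations down by a common factor if they exceed 100%."""
    total = sum(reservations.values())
    if total <= 1.0:
        return dict(reservations)
    # Every class keeps its share of the (over-committed) total.
    return {name: share / total for name, share in reservations.items()}


# Balanced profile figures quoted in the commit message.
balanced = {
    "client": 0.50,
    "background_recovery": 0.50,
    "background_best_effort": 0.05,
}
scaled = scale_reservations(balanced)
# client and recovery each end up at 0.50 / 1.05 (about 47.6%),
# best-effort at 0.05 / 1.05 (about 4.8%).
```

Under this model, the client gives up roughly 2.4 percentage points of reservation, which is the "slightly lower reservation" the note refers to.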

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks ago - mclock_common, mClockScheduler: Add perf counters for scheduler ops
Sridhar Seshasayee [Tue, 21 Apr 2026 12:30:50 +0000 (18:00 +0530)]
mclock_common, mClockScheduler: Add perf counters for scheduler ops

Add perf counters that report the number of ops, dynamic queue lengths,
queue latency and bytes read for the following ops handled in the
high-priority queues and in the scheduler queues:
 - peering
 - client
 - ec reads/writes
 - ec recovery reads

Additional counters can be added in the future as needed.
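A Python sketch of the kind of bookkeeping such counters imply: a cumulative op count, a live queue length, accumulated queue latency and bytes read, per op class. The class, method and op-class names are hypothetical labels, not Ceph's perf counter names:

```python
from collections import defaultdict


class OpClassStats:
    """Per-op-class counters: ops, live queue length, queue latency, bytes."""

    def __init__(self):
        self.ops = defaultdict(int)            # total ops ever enqueued
        self.queue_len = defaultdict(int)      # ops currently queued
        self.queue_latency = defaultdict(float)
        self.bytes_read = defaultdict(int)

    def on_enqueue(self, op_class: str) -> None:
        self.ops[op_class] += 1
        self.queue_len[op_class] += 1

    def on_dequeue(self, op_class: str, waited_seconds: float, nbytes: int = 0) -> None:
        self.queue_len[op_class] -= 1
        self.queue_latency[op_class] += waited_seconds
        self.bytes_read[op_class] += nbytes

    def avg_latency(self, op_class: str) -> float:
        # Average over ops that have left the queue; 0.0 if none have.
        done = self.ops[op_class] - self.queue_len[op_class]
        return self.queue_latency[op_class] / done if done else 0.0
```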

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks ago - src/messages, osd: Calculate and set cost for subOpReads for mClock scheduler
Sridhar Seshasayee [Mon, 28 Jul 2025 11:09:34 +0000 (16:39 +0530)]
src/messages, osd: Calculate and set cost for subOpReads for mClock scheduler

Previously, sub-op reads returned a hardcoded cost of 0, bypassing
mClock's background bandwidth and tag calculation mechanisms. This
allowed backfill operations to proceed unmetered, occasionally causing
backend resource contention and driving up client tail latencies.

Cost is calculated based on whether the complete chunk/shard or a subchunk
needs to be read. The possible cases are:
1. Read the complete chunk-aligned length:
   - Cost is set to the chunk-aligned extent size.
2. Fragmented reads:
   - The subchunk length and count are used to calculate the cost.
   - compute_cost evaluates the exact layout of fragmented shard bytes on
     disk by summing the active subchunk allocations exactly once
     (`fragmented_shard_bytes += k.second * subchunk_size`).
   - Linear extent scaling: this baseline footprint is multiplied by the
     number of extents read (`tl.size()`), giving O(N) time complexity.
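
As a rough illustration of the two cases above, the following Python sketch
computes a read cost from a list of (offset, length) extents and an optional
list of (subchunk index, count) pairs. The function name and argument shapes
are hypothetical; the real logic lives in ECSubRead::cost() in C++, and
treating the Case 1 cost as the sum of the extent lengths is an assumption.

```python
from typing import List, Optional, Tuple

Extent = Tuple[int, int]      # (offset, length), assumed chunk-aligned
Subchunk = Tuple[int, int]    # (first subchunk index, subchunk count)


def subread_cost(extents: List[Extent],
                 subchunks: Optional[List[Subchunk]] = None,
                 subchunk_size: int = 0) -> int:
    """Hypothetical cost model for one EC sub-read (sketch only)."""
    if not subchunks:
        # Case 1: whole chunk-aligned extents; cost is the bytes to read.
        return sum(length for _, length in extents)

    # Case 2: fragmented reads. Sum the active subchunk bytes once
    # (mirrors `fragmented_shard_bytes += k.second * subchunk_size`)...
    fragmented_shard_bytes = 0
    for _, count in subchunks:
        fragmented_shard_bytes += count * subchunk_size
    # ...then scale linearly by the number of extents read (`tl.size()`).
    return fragmented_shard_bytes * len(extents)


print(subread_cost([(0, 4096), (8192, 4096)]))                         # 8192
print(subread_cost([(0, 4096), (8192, 4096)], [(0, 1), (2, 1)], 512))  # 2048
```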

This linear cost model is compatible with pools running with
'allow_ec_optimizations' set to true. Under the FastEC optimized
pipeline, most operations are unified and bypass fragment slicing,
meaning requests will primarily match the Case 1 chunk-aligned path.
Where Case 2 applies, the O(N) loop ensures that the cost scales in
proportion to the layout.

It is important to note that the amount of data to read was previously set
to an upper bound defined by osd_recovery_max_chunk (8 MiB), rounded up to
the stripe width. The bound was set higher than the actual size because in
some cases the object does not yet have the xattrs needed to determine its
size. As a result, the amount to read was ultimately ~(8 MiB / k) per shard,
where k is the number of data shards. This can cause mClock to prolong
recovery times, as such items stay longer in the queue. To address this, the
amount to read is set to the remaining length of the object to recover if
the object size is known. Otherwise, the amount to read is set to the
recovery chunk size as before. Therefore, in some cases only the first
recovery read may be costly, namely when the object context is not known.
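
The length selection described above can be sketched as follows. The helper
names are hypothetical, the 4 KiB per-shard chunk is an assumed example
layout, and capping the known-size case at max_chunk is an assumption of this
sketch rather than something the commit states.

```python
from typing import Optional

MiB = 1024 * 1024


def round_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return -(-value // align) * align


def recovery_read_length(object_size: Optional[int], recovered: int,
                         max_chunk: int, stripe_width: int) -> int:
    """Bytes to read for the next recovery step (hypothetical helper)."""
    if object_size is None:
        # Size unknown (no xattrs yet): fall back to the chunk upper bound.
        length = max_chunk
    else:
        # Size known: read only what remains, still capped at max_chunk
        # (the cap is an assumption of this sketch).
        length = min(max_chunk, object_size - recovered)
    return round_up(length, stripe_width)


k = 4                     # data shards
stripe_width = k * 4096   # assumed 4 KiB chunk per shard
unknown = recovery_read_length(None, 0, 8 * MiB, stripe_width)
known = recovery_read_length(100 * 1024, 0, 8 * MiB, stripe_width)
print(unknown // k, known // k)  # per-shard bytes: 2097152 28672
```

For a 100 KiB object the per-shard read drops from 2 MiB to 28 KiB, which is
the reduction in queued cost the change is after.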

The MOSDECSubOpRead class introduces the following:
 - A cost member. This requires incrementing HEAD_VERSION and handling the
   new field in the encode and decode methods.
 - compute_cost(), which is called by ECCommonL::ReadPipeline::do_read_op()
   when creating the message. It calls into ECSubRead::cost(), which
   performs the actual calculation for the cases described above.
 - The same sequence applies to the EC optimized path in
   ECCommon::ReadPipeline::do_read_op().
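
The HEAD_VERSION bump follows the usual versioned-encoding pattern: the new
field is written only at the new version, and the decoder reads it only when
the sender's version is new enough. The Python sketch below uses struct to
show the idea; the version numbers, field layout and the default cost of 0 for
older encodings are all hypothetical, not the actual Ceph wire format.

```python
import struct
from typing import Tuple

# Hypothetical version numbers: the cost field exists from COST_VERSION on.
COST_VERSION = 4
HEAD_VERSION = COST_VERSION      # bumped when the cost member was added

_HEADER = "<BI"   # version (u8) + a stand-in for the pre-existing fields (u32)
_COST = "<Q"      # cost (u64)


def encode(other_fields: int, cost: int, version: int = HEAD_VERSION) -> bytes:
    data = struct.pack(_HEADER, version, other_fields)
    if version >= COST_VERSION:
        data += struct.pack(_COST, cost)   # only newer encodings carry cost
    return data


def decode(data: bytes) -> Tuple[int, int, int]:
    version, other_fields = struct.unpack_from(_HEADER, data, 0)
    cost = 0  # assumed default when an older peer sent no cost
    if version >= COST_VERSION:
        (cost,) = struct.unpack_from(_COST, data, struct.calcsize(_HEADER))
    return version, other_fields, cost
```

Decoding `encode(7, 65536)` yields `(4, 7, 65536)`, while a message encoded at
version 3 decodes as `(3, 7, 0)`.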

Fixes: https://tracker.ceph.com/issues/71655
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks agoosd/scheduler: Classify EC subOp reads according to op priority for mClock
Sridhar Seshasayee [Tue, 22 Jul 2025 08:39:16 +0000 (14:09 +0530)]
osd/scheduler: Classify EC subOp reads according to op priority for mClock

This change brings MSG_OSD_EC_READ under the mClock scheduler. This
improves the scheduling of client and other classes of operations, as they
are no longer unnecessarily preempted by the 'immediate' queue.
EC SubOps are now handled as follows:

 - EC SubOp reads generated during recovery will either go into the
   'background_recovery' or 'background_best_effort' class based on
   the recovery priority set for the op. EC SubOp reads generated due
   to client will continue to be classified as 'immediate'.

 - EC SubOp writes generated as a result of client operations will
   continue to be classified as 'immediate'.

 - EC SubOp replies are considered high priority and therefore
   continue to be classed as 'immediate'.
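
The classification rules above can be summarized as a small decision
function. This Python sketch is illustrative only: the enum, the `kind`
strings and the priority cutoff are hypothetical, since the commit only says
that recovery reads are split by the recovery priority set on the op.

```python
from enum import Enum


class SchedClass(Enum):
    IMMEDIATE = "immediate"
    BACKGROUND_RECOVERY = "background_recovery"
    BACKGROUND_BEST_EFFORT = "background_best_effort"


# Hypothetical cutoff; the real decision uses the recovery priority on the op.
RECOVERY_PRIORITY_CUTOFF = 5


def classify_ec_subop(kind: str, from_recovery: bool, priority: int) -> SchedClass:
    """kind is one of 'read', 'write' or 'reply' (sketch only)."""
    if kind == "read" and from_recovery:
        if priority >= RECOVERY_PRIORITY_CUTOFF:
            return SchedClass.BACKGROUND_RECOVERY
        return SchedClass.BACKGROUND_BEST_EFFORT
    # Client-driven reads, client writes and all replies stay immediate.
    return SchedClass.IMMEDIATE
```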

Fixes: https://tracker.ceph.com/issues/71655
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks agoosd/scheduler/mClockScheduler: Fix line alignments
Sridhar Seshasayee [Tue, 22 Jul 2025 08:23:07 +0000 (13:53 +0530)]
osd/scheduler/mClockScheduler: Fix line alignments

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks agoosd/scheduler/mClockScheduler: Log the size of high priority queues.
Sridhar Seshasayee [Tue, 22 Jul 2025 08:08:16 +0000 (13:38 +0530)]
osd/scheduler/mClockScheduler: Log the size of high priority queues.

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks agosrc/common/mclock_common: Fix output formatting of SchedulerClass
Sridhar Seshasayee [Tue, 22 Jul 2025 07:43:55 +0000 (13:13 +0530)]
src/common/mclock_common: Fix output formatting of SchedulerClass

The earlier output formatting concatenated the value and the string
representation of the SchedulerClass, e.g., "3client".

The formatting is now fixed to log SchedulerClass as "3 (client)".
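
The difference between the two formats is easy to see in a Python analogue
(the C++ fix itself is not shown here). Only the value 3 for "client" comes
from the commit message; the other SchedulerClass members are omitted.

```python
from enum import IntEnum


class SchedulerClass(IntEnum):
    # Only client = 3 is taken from the commit message; others are omitted.
    client = 3


def format_old(c: SchedulerClass) -> str:
    return f"{int(c)}{c.name}"      # value and name run together


def format_new(c: SchedulerClass) -> str:
    return f"{int(c)} ({c.name})"   # value, then the name in parentheses


print(format_old(SchedulerClass.client))  # 3client
print(format_new(SchedulerClass.client))  # 3 (client)
```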

Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
2 weeks agoceph.spec.in: disable -P in python shebang for cephadm binary 69421/head
Shweta Bhosale [Thu, 11 Jun 2026 05:39:50 +0000 (11:09 +0530)]
ceph.spec.in: disable -P in python shebang for cephadm binary

Fixes: https://tracker.ceph.com/issues/77340
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
2 weeks agoMerge pull request #69136 from syedali237/rhcs-dashboard/hosts
Afreen Misbah [Thu, 11 Jun 2026 19:08:30 +0000 (00:38 +0530)]
Merge pull request #69136 from syedali237/rhcs-dashboard/hosts

mgr/dashboard: migrated host table tabs to resource pages

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 weeks agoMerge pull request #69408 from tchaikov/wip-spec-update-alternatives
David Galloway [Thu, 11 Jun 2026 18:01:05 +0000 (14:01 -0400)]
Merge pull request #69408 from tchaikov/wip-spec-update-alternatives

ceph.spec.in: add update-alternatives as runtime dep and correct macro call

2 weeks agoMerge pull request #68031 from syedali237/rhcs-dashboard/osd-component
Afreen Misbah [Thu, 11 Jun 2026 15:16:32 +0000 (20:46 +0530)]
Merge pull request #68031 from syedali237/rhcs-dashboard/osd-component

mgr/dashboard: carbonize OSD form component

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>
2 weeks agoMerge pull request #69402 from Ericmzhang/wip-fix-mon-stretch_cluster
Kamoltat (Junior) Sirivadhna [Thu, 11 Jun 2026 13:47:19 +0000 (09:47 -0400)]
Merge pull request #69402 from Ericmzhang/wip-fix-mon-stretch_cluster

qa: Fix stretch_cluster.py missing function call

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
2 weeks agomgr/dashboard: fix zone creation in rgw service creation form 69412/head
Aashish Sharma [Thu, 11 Jun 2026 09:19:04 +0000 (14:49 +0530)]
mgr/dashboard: fix zone creation in rgw service creation form

The zone creation request from the rgw service creation form was missing
the tier_type, sync_from and sync_from_all properties; as a result, zone
creation was failing. This PR fixes the issue.

Fixes: https://tracker.ceph.com/issues/77263
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2 weeks agomgr: adding ceph_secrets_xxx.py files to the build/packaging 67466/head
Redouane Kachach [Thu, 4 Jun 2026 10:45:58 +0000 (12:45 +0200)]
mgr: adding ceph_secrets_xxx.py files to the build/packaging

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agodoc/mgr/ceph_secrets: add documentation for the ceph_secrets module
Redouane Kachach [Mon, 8 Jun 2026 09:54:17 +0000 (11:54 +0200)]
doc/mgr/ceph_secrets: add documentation for the ceph_secrets module

Document CLI commands (set/get/get-value/ls/rm), the Python API via
CephSecretsClient, secret URI embedding and resolution, and epoch-based
change detection.

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add unit tests for all modules
Redouane Kachach [Thu, 11 Jun 2026 08:55:43 +0000 (10:55 +0200)]
mgr/ceph_secrets: add unit tests for all modules

Add pytest coverage for the full stack: secret types and URI/path
parsing, storage backend contract, Mon KV store (CRUD, epoch,
serialization, corruption handling), SecretMgr (scan/resolve),
module RPC surface and CLI handlers, and the CephSecretsClient
wrapper. Gate test imports on the UNITTEST env var following the
SMB module pattern.

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add ceph_secrets module to tox.ini
Redouane Kachach [Thu, 21 May 2026 13:46:15 +0000 (15:46 +0200)]
mgr/ceph_secrets: add ceph_secrets module to tox.ini

Adds ceph_secrets to tox.ini mypy and flake8 targets.

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr: add CephSecretsClient wrapper for ceph_secrets RPC
Redouane Kachach [Mon, 26 Jan 2026 13:51:44 +0000 (14:51 +0100)]
mgr: add CephSecretsClient wrapper for ceph_secrets RPC

Add a thin typed client around mgr.remote() for consuming the
ceph_secrets module. Exposes get/set/rm, epoch and version queries,
batch version fetch, scan and resolve helpers. Lives alongside
ceph_secrets_types.py so any mgr module can import it without
depending on the ceph_secrets package directly.
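
A thin wrapper of this kind can be sketched as below. MgrModule.remote(module,
method, *args, **kwargs) is the existing mgr cross-module call; the RPC method
names are taken from the ceph_secrets commit further down, but the argument
lists, the class names and the FakeMgr stand-in are hypothetical and do not
reflect the actual CephSecretsClient API.

```python
from typing import Any, Dict, Optional, Protocol


class RemoteCaller(Protocol):
    """The subset of MgrModule used here: remote(module, method, *args, **kwargs)."""
    def remote(self, module_name: str, method_name: str,
               *args: Any, **kwargs: Any) -> Any: ...


class SecretsClientSketch:
    """Thin typed wrapper; argument lists are assumptions of this sketch."""
    MODULE = "ceph_secrets"

    def __init__(self, mgr: RemoteCaller) -> None:
        self._mgr = mgr

    def _call(self, method: str, *args: Any) -> Any:
        return self._mgr.remote(self.MODULE, method, *args)

    def get(self, path: str) -> Optional[Dict[str, Any]]:
        return self._call("secret_get", path)

    def set(self, path: str, data: Dict[str, Any]) -> None:
        self._call("secret_set", path, data)

    def rm(self, path: str) -> None:
        self._call("secret_rm", path)

    def get_version(self, path: str) -> int:
        return self._call("secret_get_version", path)


class FakeMgr:
    """In-memory stand-in for the mgr and the ceph_secrets module."""
    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, Any]] = {}
        self.versions: Dict[str, int] = {}

    def remote(self, module_name: str, method_name: str,
               *args: Any, **kwargs: Any) -> Any:
        assert module_name == "ceph_secrets"
        path = args[0]
        if method_name == "secret_set":
            self.data[path] = args[1]
            self.versions[path] = self.versions.get(path, 0) + 1
            return None
        if method_name == "secret_get":
            return self.data.get(path)
        if method_name == "secret_rm":
            self.data.pop(path, None)
            return None
        if method_name == "secret_get_version":
            return self.versions.get(path, 0)
        raise NotImplementedError(method_name)


client = SecretsClientSketch(FakeMgr())
client.set("ns/db", {"password": "s3cr3t"})
client.set("ns/db", {"password": "rotated"})
```

Keeping the wrapper free of ceph_secrets imports is what lets any mgr module
use it with nothing but a handle to its own MgrModule.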

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add 'ceph secret' CLI commands and input parsing
Redouane Kachach [Mon, 26 Jan 2026 14:14:35 +0000 (15:14 +0100)]
mgr/ceph_secrets: add 'ceph secret' CLI commands and input parsing

This commit has the following changes:

1) Add the ceph_secrets mgr module entrypoint and wire it to
SecretMgr. Implement the core RPC surface consumed by other mgr
modules (secret_ls/get/set/rm, secret_get_value, secret_get_version)
and keep the implementation focused on the internal API.

2) Add user-facing CLI commands (ceph secret ls/get/set/rm) using
parse_secret_path. Secret data is accepted via -i (inbuf) only for
script-friendly usage. Add secret get-value for plain-string output
without a JSON envelope. Ensure consistent JSON output and error
mapping to EINVAL/ENOENT, while preserving safe non-reveal defaults
unless explicitly requested.

3) Add the scanning and resolution helpers (scan_refs,
scan_unresolved_refs, resolve_object) through the ceph_secrets module
RPC API. This lets consumers reliably detect secret:/... references and
resolve them inside nested objects without duplicating logic. The
behavior is delegated to SecretMgr to keep parsing/resolution
consistent across the stack.

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add SecretMgr for secrets handling and resolution
Redouane Kachach [Mon, 26 Jan 2026 13:46:39 +0000 (14:46 +0100)]
mgr/ceph_secrets: add SecretMgr for secrets handling and resolution

Introduce SecretMgr to encapsulate higher-level behavior on top of
SecretStoreMon: listing helpers, scan_refs, scan_unresolved_refs, and
resolve_object (walk nested dict/list structures). This keeps
parsing/substitution logic out of the mgr module entrypoint and makes
consumer behavior consistent. The module can now resolve secret://…
references deterministically and provide structured scan output.
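
A recursive walk of nested dict/list structures is the core of both scanning
and resolution. The Python sketch below assumes a "secret://" reference
prefix and a caller-supplied lookup function; both are assumptions for
illustration, not the module's actual reference grammar or API.

```python
from typing import Any, Callable, List

SECRET_PREFIX = "secret://"  # assumption: reference scheme for this sketch


def scan_refs(obj: Any) -> List[str]:
    """Collect every secret reference string, in traversal order."""
    found: List[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str) and node.startswith(SECRET_PREFIX):
            found.append(node)

    walk(obj)
    return found


def resolve_object(obj: Any, lookup: Callable[[str], Any]) -> Any:
    """Return a copy of obj with each reference replaced by lookup(path)."""
    if isinstance(obj, dict):
        return {k: resolve_object(v, lookup) for k, v in obj.items()}
    if isinstance(obj, list):
        return [resolve_object(v, lookup) for v in obj]
    if isinstance(obj, str) and obj.startswith(SECRET_PREFIX):
        return lookup(obj[len(SECRET_PREFIX):])
    return obj


spec = {"a": "secret://ns/x", "b": [1, "secret://ns/y", {"c": "plain"}]}
resolved = resolve_object(spec, {"ns/x": "X", "ns/y": "Y"}.__getitem__)
```

Returning a new object rather than mutating the input keeps resolution
deterministic and leaves the stored spec, with its references, untouched.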

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add SecretStoreMon mon-kv store implementation
Redouane Kachach [Mon, 23 Feb 2026 09:44:22 +0000 (10:44 +0100)]
mgr/ceph_secrets: add SecretStoreMon mon-kv store implementation

Introduce SecretRecord (data, metadata, versioning, timestamps) and
the canonical KV prefix (secret_store/v1/…). Add JSON serialization
helpers (to_json) including the ability to omit secret data unless
explicitly requested. This commit defines the “what we store and how
it looks” without wiring any mgr interactions yet.

Add SecretStoreMon implementing the backend using mgr’s KV
store (get_store, set_store, prefix listing). Implement set/get/rm
semantics, version increments, and list-by-prefix queries for
namespace/scope/target. This isolates persistence logic from CLI/RPC
concerns and provides deterministic record behavior for later layers.
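
The record/store split described above can be sketched with a plain dict
standing in for the mgr KV store. Only the "secret_store/v1/" prefix comes
from the commit; the class names, the "namespace/scope/target/name" path
layout and the JSON field names are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional

PREFIX = "secret_store/v1/"   # canonical prefix from the commit message


@dataclass
class SecretRecordSketch:
    data: Any
    metadata: Dict[str, Any] = field(default_factory=dict)
    version: int = 1
    created: float = 0.0
    updated: float = 0.0

    def to_json(self, include_data: bool = False) -> Dict[str, Any]:
        out = asdict(self)
        if not include_data:
            out.pop("data")   # secret data is omitted unless explicitly requested
        return out


class KVSecretStoreSketch:
    """Dict-backed stand-in for a store built on mgr get_store/set_store."""

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}

    def get(self, path: str) -> Optional[SecretRecordSketch]:
        raw = self._kv.get(PREFIX + path)
        return SecretRecordSketch(**json.loads(raw)) if raw is not None else None

    def set(self, path: str, data: Any,
            metadata: Optional[Dict[str, Any]] = None) -> SecretRecordSketch:
        now = time.time()
        old = self.get(path)
        rec = SecretRecordSketch(
            data=data,
            metadata=metadata or {},
            version=old.version + 1 if old else 1,   # increments on overwrite
            created=old.created if old else now,
            updated=now,
        )
        self._kv[PREFIX + path] = json.dumps(asdict(rec))
        return rec

    def rm(self, path: str) -> bool:
        return self._kv.pop(PREFIX + path, None) is not None

    def ls(self, path_prefix: str = "") -> List[str]:
        full = PREFIX + path_prefix
        return sorted(k[len(PREFIX):] for k in self._kv if k.startswith(full))
```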

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add storage backend protocol for mgr KV secrets
Redouane Kachach [Mon, 26 Jan 2026 13:40:43 +0000 (14:40 +0100)]
mgr/ceph_secrets: add storage backend protocol for mgr KV secrets

Define a minimal backend protocol for secret persistence
operations (get/set/rm/list), keeping the module implementation
decoupled from the backing store details. For now, monstore-db is used as
the secure KV store, but the intent is to extend this to other backends
such as Vault.
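
A minimal get/set/rm/list contract can be expressed as a typing.Protocol, so
that higher layers depend only on the method shapes. In this sketch the
protocol, the DictBackend class and the copy_prefix helper are hypothetical;
a mon-store or Vault backend would simply need to provide the same methods.

```python
from typing import Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class SecretBackend(Protocol):
    """Minimal persistence contract (get/set/rm/list), sketch only."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...
    def rm(self, key: str) -> bool: ...
    def list(self, prefix: str) -> List[str]: ...


class DictBackend:
    """In-memory backend satisfying the protocol structurally."""
    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._kv.get(key)

    def set(self, key: str, value: str) -> None:
        self._kv[key] = value

    def rm(self, key: str) -> bool:
        return self._kv.pop(key, None) is not None

    def list(self, prefix: str) -> List[str]:
        return sorted(k for k in self._kv if k.startswith(prefix))


def copy_prefix(src: SecretBackend, dst: SecretBackend, prefix: str) -> int:
    """Backend-agnostic helper: works with any object satisfying the protocol."""
    keys = src.list(prefix)
    for key in keys:
        value = src.get(key)
        if value is not None:
            dst.set(key, value)
    return len(keys)
```

A helper like copy_prefix is the kind of code that stays unchanged when a new
backend is added, which is the point of keeping the contract this small.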

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agomgr/ceph_secrets: add secret reference types and parsing helpers
Redouane Kachach [Mon, 23 Feb 2026 09:39:49 +0000 (10:39 +0100)]
mgr/ceph_secrets: add secret reference types and parsing helpers

Introduce the shared types and parsing logic used across the secrets
module: secret scopes, secret references, and the exception hierarchy.
Includes validation for all supported addressing forms and clear
error messages on malformed input.

Fixes: https://tracker.ceph.com/issues/74562
Assisted-by: Claude <claude.ai>
Assisted-by: ChatGPT <chatgpt.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 weeks agoMerge pull request #69391 from guits/fix-raw-activate
Guillaume Abrioux [Thu, 11 Jun 2026 07:34:49 +0000 (09:34 +0200)]
Merge pull request #69391 from guits/fix-raw-activate

ceph-volume: fix raw activate when device path is stale

2 weeks agoMerge pull request #69375 from zdover23/2026-06-10-organizationmap-update
Zac Dover [Thu, 11 Jun 2026 06:32:20 +0000 (16:32 +1000)]
Merge pull request #69375 from zdover23/2026-06-10-organizationmap-update

organizationmap: add Zac Dover (Clyso)

Reviewed-by: Dan van der Ster <dan.vanderster@clyso.com>
2 weeks agoceph.spec.in: require update-alternatives for the osd scriptlets 69408/head
Kefu Chai [Thu, 11 Jun 2026 03:32:24 +0000 (11:32 +0800)]
ceph.spec.in: require update-alternatives for the osd scriptlets

ceph-osd-crimson and ceph-osd-classic call update-alternatives in their
%posttrans and %preun scriptlets but don't depend on it. declare it as a
scriptlet dependency so the binary is there when they run.

Fixes: https://tracker.ceph.com/issues/77323
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
2 weeks agoceph.spec.in: use %{_sbindir} instead of ${_sbindir} in osd %preun
Kefu Chai [Thu, 11 Jun 2026 03:29:05 +0000 (11:29 +0800)]
ceph.spec.in: use %{_sbindir} instead of ${_sbindir} in osd %preun

a37b5b5bde8c added %preun scriptlets that use ${_sbindir}, which is
shell syntax rather than an rpm macro, so it expands to empty at run
time and the scriptlet runs "/update-alternatives", failing on
uninstall/upgrade with:

  /var/tmp/rpm-tmp.K1fvm3: line 2: /update-alternatives: No such file or directory
  error: %preun(ceph-osd-crimson-2:20.3.0-5054.g33c1d671.el9.x86_64) scriptlet failed, exit status 127
  Error in PREUN scriptlet in rpm package ceph-osd-crimson.

use %{_sbindir}, like the %posttrans --install lines already do, so it
expands to /usr/sbin/update-alternatives at build time.

Fixes: https://tracker.ceph.com/issues/77323
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
2 weeks agocrimson/osd: reject Seastore PG merges across shards 66885/head
Aishwarya Mathuria [Mon, 18 May 2026 13:37:26 +0000 (13:37 +0000)]
crimson/osd: reject Seastore PG merges across shards

Seastore currently cannot merge collections between reactor shards.
On cross-shard detection, tell the monitor the source PG is not ready
(via MOSDPGReadyToMerge{ ready=false }) so the unsafe pg_num decrement
is never proposed, then send MOSDPGStopMerge to clamp pg_num_target and
permanently disable further shrink for the pool.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agoqa/suites/crimson: Add a test for PG merging in the crimson suite
Aishwarya Mathuria [Tue, 13 Jan 2026 07:59:31 +0000 (07:59 +0000)]
qa/suites/crimson: Add a test for PG merging in the crimson suite

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agomon/OSDMonitor: introduce per-pool crimson_allow_pg_merge flag
Aishwarya Mathuria [Tue, 28 Apr 2026 06:41:22 +0000 (12:11 +0530)]
mon/OSDMonitor: introduce per-pool crimson_allow_pg_merge flag

Switch Crimson PG merge gating from a global config to a pool-scoped flag.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agodoc/dev/crimson: Add explanation for PG merging
Aishwarya Mathuria [Mon, 19 Jan 2026 06:31:22 +0000 (12:01 +0530)]
doc/dev/crimson: Add explanation for PG merging

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd: implement PG merge detection and orchestration in PGAdvanceMap
Aishwarya Mathuria [Fri, 9 Jan 2026 10:48:16 +0000 (10:48 +0000)]
crimson/osd: implement PG merge detection and orchestration in PGAdvanceMap

Integrate PG merge handling into the map advancement pipeline.

When pg_num shrinks between epochs, check_for_merges() returns a
merge_result_t describing whether this PG is a merge source, target, or
not involved. start() stops advancing through later epochs once a merge
is detected, then either finish_merge_advance() or the normal activate
path runs so complete_rctx() always happens in one place.

- check_for_merges(): detect pg_num shrink and dispatch merge_pg().
- merge_pg(): merge-only work — Seastore eligibility, source handoff
  setup, target rendezvous collection and PG::merge_from().
- finish_merge_advance(): commit rctx and complete the role-specific
  steps (source: complete_rctx, stop, register_merge_source; target:
  handle_advance_map, handle_activate_map, complete_rctx).
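
The source/target/not-involved decision can be illustrated with Ceph's
stable-mod parent relation, where a PG seed's parent is obtained by clearing
its highest set bit (as pg_t::get_parent() does). The Python sketch below is
not the crimson code: MergeRole and the (role, detail) tuple are hypothetical
stand-ins for merge_result_t, with the detail being the target seed for a
source or the number of sources a target has to collect.

```python
from enum import Enum
from typing import Optional, Tuple


class MergeRole(Enum):
    SOURCE = "source"
    TARGET = "target"
    NOT_INVOLVED = "not_involved"


def parent_seed(ps: int) -> int:
    """Clear the highest set bit (the stable-mod parent of a PG seed)."""
    assert ps > 0
    return ps & ~(1 << (ps.bit_length() - 1))


def merge_parent(ps: int, new_pg_num: int) -> int:
    """Walk up the parent chain until the seed fits in the new pg_num."""
    while ps >= new_pg_num:
        ps = parent_seed(ps)
    return ps


def check_for_merges(ps: int, old_pg_num: int,
                     new_pg_num: int) -> Tuple[MergeRole, Optional[int]]:
    """Return (role, target seed) for a source, (role, source count) for a target."""
    if new_pg_num >= old_pg_num:
        return MergeRole.NOT_INVOLVED, None          # pg_num did not shrink
    if new_pg_num <= ps < old_pg_num:
        return MergeRole.SOURCE, merge_parent(ps, new_pg_num)
    n = sum(1 for s in range(new_pg_num, old_pg_num)
            if merge_parent(s, new_pg_num) == ps)
    if n:
        return MergeRole.TARGET, n
    return MergeRole.NOT_INVOLVED, None
```

For example, shrinking pg_num from 8 to 7 makes seed 7 a source whose target
is seed 3, and leaves every other PG uninvolved.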

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd/pg: implement PG::merge_from
Aishwarya Mathuria [Fri, 9 Jan 2026 07:20:44 +0000 (07:20 +0000)]
crimson/osd/pg: implement PG::merge_from

Add PG::merge_from to execute the merge of source PGs into a target PG.
This function builds a transaction to remove source-specific metadata
objects and merge source collections into the target collection.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd: per-PG rendezvous for cross-shard merge source handoff
Aishwarya Mathuria [Thu, 8 Jan 2026 16:44:01 +0000 (16:44 +0000)]
crimson/osd: per-PG rendezvous for cross-shard merge source handoff

Add infrastructure so source PGs can be extracted from their birth
shard, moved to the target shard, and collected by the target PG before
merge proceeds.

Cross-shard safety: PGs are tied to their birth_shard for destruction.
register_merge_source() uses extract_pg() to detach the source,
seastar::foreign_ptr to hop cores, and
crimson::local_shared_foreign_ptr on the target so release routes
destruction back to the birth shard.

Synchronization: replace the per-shard ShardServices merge_info_t
registry (shared_promise waiters, ready_pgs staging, and cleanup hooks)
with merge state on the target PG itself. Source-side
register_merge_source() delivers PGs via PG::add_merge_source(); the
target waits in PG::collect_merge_sources(n) on a per-PG semaphore.
Duplicate source registrations are ignored. PG::stop() breaks the
semaphore so shutdown does not hang.
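
The per-PG rendezvous semantics (deliver sources, ignore duplicates, wait for
n, break the wait on stop) can be modelled with asyncio. This sketch uses an
asyncio.Condition in place of the Seastar semaphore, and all class and method
names here are hypothetical analogues of add_merge_source(),
collect_merge_sources() and stop(); it does not model the cross-shard
foreign_ptr handoff.

```python
import asyncio
from typing import Dict, List


class BrokenRendezvous(Exception):
    """Raised to waiters when the target PG stops."""


class MergeRendezvousSketch:
    def __init__(self) -> None:
        self._sources: Dict[int, object] = {}
        self._cond = asyncio.Condition()
        self._stopped = False

    async def add_merge_source(self, pgid: int, pg: object) -> None:
        async with self._cond:
            if pgid in self._sources:
                return                      # duplicate registration is ignored
            self._sources[pgid] = pg
            self._cond.notify_all()

    async def collect_merge_sources(self, n: int) -> List[object]:
        async with self._cond:
            await self._cond.wait_for(
                lambda: self._stopped or len(self._sources) >= n)
            if self._stopped:
                raise BrokenRendezvous("PG stopped while waiting for sources")
            return [self._sources[k] for k in sorted(self._sources)]

    async def stop(self) -> None:
        async with self._cond:
            self._stopped = True            # analogous to breaking the semaphore
            self._cond.notify_all()


async def demo() -> List[object]:
    rv = MergeRendezvousSketch()            # created inside the running loop
    waiter = asyncio.create_task(rv.collect_merge_sources(2))
    await rv.add_merge_source(5, "pg5")
    await rv.add_merge_source(5, "pg5-duplicate")   # ignored
    await rv.add_merge_source(7, "pg7")
    return await waiter


async def demo_stop() -> bool:
    rv = MergeRendezvousSketch()
    waiter = asyncio.create_task(rv.collect_merge_sources(1))
    await asyncio.sleep(0)                  # let the waiter block
    await rv.stop()
    try:
        await waiter
    except BrokenRendezvous:
        return True
    return False
```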

ShardServices::register_merge_source() and extract_pg() live in
shard_services; rendezvous types and methods live on PG.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd/pg: modify stop() function
Aishwarya Mathuria [Thu, 8 Jan 2026 11:09:58 +0000 (11:09 +0000)]
crimson/osd/pg: modify stop() function

When a PG that is being removed or merged calls stop(), this function
now clears the primary state and notifies the monitor to clear any
pending merge flags.

It also calls client_request_orderer.clear_and_cancel() so that all
remaining client requests are properly completed. This is needed for
merging in particular, since on_change() is never called for the merge
epoch (handle_advance_map is skipped after merge detection), so
clear_and_cancel() would otherwise never be invoked on the source PG's
orderer.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd/shard_services: inherit from peering_sharded_service
Aishwarya Mathuria [Thu, 8 Jan 2026 10:57:37 +0000 (10:57 +0000)]
crimson/osd/shard_services: inherit from peering_sharded_service

Update ShardServices to inherit from seastar::peering_sharded_service.
This allows the service to access its own sharded container directly
via container() rather than manually storing a reference to it.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocrimson/osd: Add functions to notify mon when PGs are ready to merge
Aishwarya Mathuria [Wed, 7 Jan 2026 11:55:25 +0000 (11:55 +0000)]
crimson/osd: Add functions to notify mon when PGs are ready to merge

A PG is in the pending merge state when its placement seed (ps) is
>= pg_num_pending and < pg_num. In that state, IO is paused, and once the
PG peers we notify the mon that it is idle and safe to merge.
Use Gated for merge notify callbacks.
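
The range check above, and the point at which the mon would be notified, can
be written as two small predicates. Both function names are hypothetical, and
combining the checks with the "peered" and "IO paused" flags is a
simplification of the description rather than the crimson implementation.

```python
def in_pending_merge_range(ps: int, pg_num_pending: int, pg_num: int) -> bool:
    """True when the PG's seed lies in [pg_num_pending, pg_num)."""
    return pg_num_pending <= ps < pg_num


def ready_to_merge(ps: int, pg_num_pending: int, pg_num: int,
                   peered: bool, io_paused: bool) -> bool:
    """Hypothetical gate for telling the mon the PG is idle and safe to merge."""
    return in_pending_merge_range(ps, pg_num_pending, pg_num) and peered and io_paused
```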

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2 weeks agoMerge PR #69404 into main
Patrick Donnelly [Thu, 11 Jun 2026 01:58:01 +0000 (21:58 -0400)]
Merge PR #69404 into main

* refs/pull/69404/head:
.github/milestone: add umbrella

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2 weeks agoMerge pull request #60492 from anthonyeleven/more-pgs
Anthony D'Atri [Thu, 11 Jun 2026 00:24:36 +0000 (20:24 -0400)]
Merge pull request #60492 from anthonyeleven/more-pgs

src/common/options: Increase autoscaler PG target and overload values

2 weeks ago.github/milestone: add umbrella 69404/head
Patrick Donnelly [Wed, 10 Jun 2026 22:25:16 +0000 (18:25 -0400)]
.github/milestone: add umbrella

Fixes: https://tracker.ceph.com/issues/77308
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
2 weeks agoMerge PR #69399 into main v21.3.0
Patrick Donnelly [Wed, 10 Jun 2026 20:35:13 +0000 (16:35 -0400)]
Merge PR #69399 into main

* refs/pull/69399/head:
doc/dev/release-checklists: reset to skeleton

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
2 weeks agoqa: Fix task missing function call 69402/head
Eric Zhang [Wed, 10 Jun 2026 19:41:21 +0000 (12:41 -0700)]
qa: Fix task missing function call

The lambda was missing a function call, so its result was always truthy.
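
This is a common Python pitfall: a lambda that returns a bound method instead
of calling it yields a method object, which is always truthy. The sketch
below reproduces the pattern with hypothetical names; it is not the actual
stretch_cluster.py code.

```python
class Cluster:
    """Hypothetical object with a health check method."""
    def __init__(self, healthy: bool) -> None:
        self._healthy = healthy

    def is_healthy(self) -> bool:
        return self._healthy


def wait_until(pred, attempts: int = 3) -> bool:
    """Poll pred() a few times; True as soon as it is truthy."""
    for _ in range(attempts):
        if pred():
            return True
    return False


c = Cluster(healthy=False)
buggy = wait_until(lambda: c.is_healthy)     # returns the method object: truthy
fixed = wait_until(lambda: c.is_healthy())   # actually calls the method
```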

Signed-off-by: Eric Zhang <emzhang@ibm.com>
2 weeks agodoc/dev/release-checklists: reset to skeleton 69399/head
Patrick Donnelly [Wed, 10 Jun 2026 18:36:59 +0000 (14:36 -0400)]
doc/dev/release-checklists: reset to skeleton

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
2 weeks agoMerge PR #66726 into main v21.0.1
Patrick Donnelly [Wed, 10 Jun 2026 18:30:59 +0000 (14:30 -0400)]
Merge PR #66726 into main

* refs/pull/66726/head:
doc: Update documentation to reflect new functionality
test: Add integration tests for EC Omap operations and recovery
osd: Hook up omap operations in EC pools
osd: Allow for recovery of OMAP header and entries in EC pools
doc: Write design document to explain the reasoning behind implementing this feature
osd: Introduce functions required for EC OMAP support
osd: Add ECOmapJournal class and relocate OmapUpdateType enum class

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 weeks agoMerge pull request #69051 from mheler/wip-rgw-http-reqs-lock
mheler [Wed, 10 Jun 2026 18:11:19 +0000 (13:11 -0500)]
Merge pull request #69051 from mheler/wip-rgw-http-reqs-lock

rgw/http: take reqs_lock when appending to reqs_change_state

2 weeks agoMerge pull request #68784 from mheler/wip-checksum-special-char
mheler [Wed, 10 Jun 2026 18:10:51 +0000 (13:10 -0500)]
Merge pull request #68784 from mheler/wip-checksum-special-char

rgw/cloud-transition: url-encode rgwx-source-key metadata header

2 weeks agomgr/dashboard: migrated host table tabs to resource pages 69136/head
Syed Ali Ul Hasan [Sat, 6 Jun 2026 16:47:23 +0000 (22:17 +0530)]
mgr/dashboard: migrated host table tabs to resource pages

Fixes: https://tracker.ceph.com/issues/76712
Signed-off-by: Syed Ali Ul Hasan <syedaliulhasan19@gmail.com>
2 weeks agomgr/dashboard: carbonized OSD form component 68031/head
Syed Ali Ul Hasan [Wed, 10 Jun 2026 17:30:36 +0000 (23:00 +0530)]
mgr/dashboard: carbonized OSD form component

Fixes: https://tracker.ceph.com/issues/68265
Signed-off-by: Syed Ali Ul Hasan <syedaliulhasan19@gmail.com>
2 weeks agoMerge pull request #69256 from ronen-fr/wip-rf-stshards
Ronen Friedman [Wed, 10 Jun 2026 15:31:58 +0000 (18:31 +0300)]
Merge pull request #69256 from ronen-fr/wip-rf-stshards

crimson/osd: avoid calling get_sharded_store() for obj size

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
2 weeks agoMerge pull request #68888 from MattyWilliams22/mw-peering-state-rollforward
Matty Williams [Wed, 10 Jun 2026 15:20:23 +0000 (16:20 +0100)]
Merge pull request #68888 from MattyWilliams22/mw-peering-state-rollforward

osd: Fix condition for rolling forward pg log entries

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
2 weeks agoMerge pull request #69276 from afreen23/worktree-umbrella-release-notes
Afreen Misbah [Wed, 10 Jun 2026 14:33:03 +0000 (20:03 +0530)]
Merge pull request #69276 from afreen23/worktree-umbrella-release-notes

doc: add Dashboard and Monitoring release notes for Umbrella

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
2 weeks agoMerge pull request #68368 from kginonredhat/issue-75389-yaml-and-jinja2-deps-on-cento...
David Galloway [Wed, 10 Jun 2026 14:32:33 +0000 (10:32 -0400)]
Merge pull request #68368 from kginonredhat/issue-75389-yaml-and-jinja2-deps-on-centos-distro

ceph.spec: declare PyYAML and Jinja2 Requires for cephadm RPM

2 weeks agodoc: add Dashboard and Monitoring release notes for Umbrella 69276/head
Afreen Misbah [Mon, 25 May 2026 23:10:46 +0000 (04:40 +0530)]
doc: add Dashboard and Monitoring release notes for Umbrella

Signed-off-by: Afreen Misbah <afreen23@gmail.com>
2 weeks agoMerge pull request #68984 from Jayaprakash-ibm/wip-faster-alloc-recovery-testing
Jaya Prakash [Wed, 10 Jun 2026 11:31:07 +0000 (17:01 +0530)]
Merge pull request #68984 from Jayaprakash-ibm/wip-faster-alloc-recovery-testing

qa: Add Teuthology tests for BlueStore faster allocation recovery

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
2 weeks agoMerge pull request #64369 from aclamk/aclamk-bs-faster-start-more
Jaya Prakash [Wed, 10 Jun 2026 11:30:14 +0000 (17:00 +0530)]
Merge pull request #64369 from aclamk/aclamk-bs-faster-start-more

bluestore: Faster allocation recovery - evolution

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
2 weeks agoMerge pull request #68981 from aclamk/aclamk-kv-divide-range
Jaya Prakash [Wed, 10 Jun 2026 11:28:10 +0000 (16:58 +0530)]
Merge pull request #68981 from aclamk/aclamk-kv-divide-range

kv/KeyValueDB: New utility function util_divide_key_range

Reviewed-by: Jaya Prakash <jayaprakash@ibm.com>
2 weeks agoceph-volume: fix raw activate when device path is stale 69391/head
Guillaume Abrioux [Wed, 10 Jun 2026 11:22:14 +0000 (13:22 +0200)]
ceph-volume: fix raw activate when device path is stale

This changes unlink_bs_symlinks to use os.path.lexists instead
of os.path.exists. Devices can get renumbered; when that happens, the
OSD symlink still exists but its target device is gone, so
os.path.exists returns False, the symlink is never cleaned up, and
ceph-volume activate can fail later.
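
The difference is easy to reproduce with a dangling symlink:
os.path.exists() follows the link and reports False, while os.path.lexists()
checks the link itself. The helper name below is hypothetical (it is not the
ceph-volume function), and the demo needs a platform where os.symlink is
permitted.

```python
import os
import tempfile


def unlink_if_present(path: str) -> bool:
    """Remove path if it exists as a directory entry, even as a dangling symlink."""
    if os.path.lexists(path):      # os.path.exists() would follow the link
        os.unlink(path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    device = os.path.join(tmp, "sdb")      # stands in for the old device node
    link = os.path.join(tmp, "block")      # stands in for the OSD symlink
    open(device, "w").close()
    os.symlink(device, link)
    os.remove(device)                      # device renumbered: target is gone

    exists_before = os.path.exists(link)   # False: the link dangles
    lexists_before = os.path.lexists(link) # True: the link itself remains
    removed = unlink_if_present(link)
    lexists_after = os.path.lexists(link)  # False once cleaned up
```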

Fixes: https://tracker.ceph.com/issues/77295
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
2 weeks agoMerge pull request #69364 from eameh-LF/wip-doc-77191
Ilya Dryomov [Wed, 10 Jun 2026 10:00:45 +0000 (12:00 +0200)]
Merge pull request #69364 from eameh-LF/wip-doc-77191

doc/man: Remove stale EOL release names from deprecation notices

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2 weeks agocrimson/osd: move get_max_object_size() to store level 69256/head
Ronen Friedman [Wed, 3 Jun 2026 05:40:25 +0000 (05:40 +0000)]
crimson/osd: move get_max_object_size() to store level

is_offset_and_length_valid() called get_sharded_store() locally to
obtain the store-specific max_object_size. On alien cores (where
smp::count > store_shard_nums), the local store is inactive and the
call hits assert(shard_store.get_status() == true).

As the max object size is a store-specific property and not a
store-shard one, there is no reason to acquire the store shard to
obtain it. Instead, a get_max_object_size() method is added to the
Store interface.

Fixes: https://tracker.ceph.com/issues/76946
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2 weeks agodocs: organizationmap: add Zac Dover (Clyso) 69375/head
Zac Dover [Wed, 10 Jun 2026 01:00:18 +0000 (11:00 +1000)]
docs: organizationmap: add Zac Dover (Clyso)

Add Zac Dover (Clyso) to .organizationmap.

Signed-off-by: Zac Dover <zac.dover@clyso.com>
2 weeks agoMerge pull request #68990 from rhcs-dashboard/carbon-filter
Nizamudeen A [Wed, 10 Jun 2026 05:02:26 +0000 (10:32 +0530)]
Merge pull request #68990 from rhcs-dashboard/carbon-filter

mgr/dashboard: carbonize table filters

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
2 weeks agoMerge pull request #69374 from sunyuechi/wip-catch2-disconnected-guard
Kefu Chai [Wed, 10 Jun 2026 03:18:07 +0000 (11:18 +0800)]
Merge pull request #69374 from sunyuechi/wip-catch2-disconnected-guard

cmake: disable Catch2 tests when Catch2 is unavailable

Reviewed-by: Kefu Chai <k.chai@proxmox.com>
2 weeks agoMerge pull request #69120 from tchaikov/wip-crimson-fix-move-rctx
Kefu Chai [Wed, 10 Jun 2026 01:52:35 +0000 (09:52 +0800)]
Merge pull request #69120 from tchaikov/wip-crimson-fix-move-rctx

crimson/osd: give each split child its own PeeringCtx

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
2 weeks agocmake: disable Catch2 tests when Catch2 is unavailable 69374/head
Sun Yuechi [Wed, 10 Jun 2026 00:13:53 +0000 (08:13 +0800)]
cmake: disable Catch2 tests when Catch2 is unavailable

debhelper on noble passes -DFETCHCONTENT_FULLY_DISCONNECTED=ON, so CPM
cannot fetch Catch2 and silently skips it, leaving no Catch2 targets
behind and breaking the generate step. Fall back to WITH_CATCH2=OFF
with a warning instead.

Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>
2 weeks agoqa/workunits/mon: Update pg_autoscaler.sh in conjunction with https://github.com... 60492/head
Anthony D'Atri [Sat, 30 May 2026 01:36:48 +0000 (21:36 -0400)]
qa/workunits/mon: Update pg_autoscaler.sh in conjunction with https://github.com/ceph/ceph/pull/60492

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2 weeks agoMerge pull request #61256 from irq0/wip/rgw-kms-cache
Adam Emerson [Tue, 9 Jun 2026 20:22:35 +0000 (16:22 -0400)]
Merge pull request #61256 from irq0/wip/rgw-kms-cache

RGW SSE-KMS secrets cache

Reviewed-by: Adam Emerson <aemerson@redhat.com>
2 weeks agoMerge pull request #69085 from dheart-joe/wip-reconstruct-allocations
Adam Kupczyk [Tue, 9 Jun 2026 19:06:36 +0000 (21:06 +0200)]
Merge pull request #69085 from dheart-joe/wip-reconstruct-allocations

os/bluestore: fix reallocation and corruption when shared_blob key is missing/undecodable