From: Kotresh HR
Date: Mon, 25 May 2026 18:22:29 +0000 (+0530)
Subject: doc: Update the mirroring doc with new metrics fields
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=55eceaae3d70d6e0d721bc8fadb0e5b6726257e6;p=ceph.git

doc: Update the mirroring doc with new metrics fields

Update the mirroring documentation and the release notes with the newly
introduced metrics and their availability via the 'fs mirror peer status'
asok interface.

Fixes: https://tracker.ceph.com/issues/73453
Signed-off-by: Kotresh HR
---
diff --git a/PendingReleaseNotes b/PendingReleaseNotes
index dca256f0dc7..90903a9875c 100644
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@ -78,6 +78,11 @@
   ``ceph fs snapshot mirror daemon status`` now shows the remote cluster's
   monitor addresses and cluster ID for each configured peer, making it easier to
   verify peer connectivity and troubleshoot mirroring issues.
+* CephFS Mirroring: The ``fs mirror peer status`` admin socket command reports
+  additional per-directory sync metrics (sync mode, throughput, crawl and
+  data-sync queue timing, bytes/files progress, and ETA). Output is grouped
+  under ``metrics//peer/``. For more information,
+  see https://tracker.ceph.com/issues/73453
 * RBD: Mirror snapshot creation and trash purge schedules are now automatically
   staggered when no explicit "start-time" is specified. This reduces scheduling
   spikes and distributes work more evenly over time.
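The release note above says the per-directory output of ``fs mirror peer status`` is now grouped. In the updated examples, the nesting is ``metrics``, then the directory path, then ``peer``, then the peer UUID. Scripts that used to read ``status["/d0"]`` directly need an extra walk. The following is a minimal Python sketch, not part of Ceph or of this patch. It assumes the JSON shape shown in the documentation examples. The helper ``iter_peer_status`` is hypothetical, and the sample is a trimmed copy of the "failed status" example.

```python
import json

# Trimmed copy of the "failed status" example from doc/cephfs/cephfs-mirroring.rst.
SAMPLE = """
{
  "metrics": {
    "/d0": {
      "peer": {
        "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
          "state": "idle",
          "snaps_synced": 3,
          "snaps_deleted": 0,
          "snaps_renamed": 0
        }
      }
    },
    "/f0": {
      "peer": {
        "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
          "state": "failed",
          "snaps_synced": 0,
          "snaps_deleted": 0,
          "snaps_renamed": 0
        }
      }
    }
  }
}
"""


def iter_peer_status(status):
    """Yield (dir_path, peer_uuid, fields) tuples from the nested layout.

    Hypothetical helper: walks metrics -> directory path -> "peer" -> peer UUID.
    Missing levels are treated as empty instead of raising KeyError.
    """
    for dir_path, dir_entry in status.get("metrics", {}).items():
        for peer_uuid, fields in dir_entry.get("peer", {}).items():
            yield dir_path, peer_uuid, fields


if __name__ == "__main__":
    # In practice the JSON would come from the admin socket command shown in
    # the documentation; here the trimmed sample is parsed instead.
    for dir_path, peer_uuid, fields in iter_peer_status(json.loads(SAMPLE)):
        print(f"{dir_path} {peer_uuid} {fields['state']} synced={fields['snaps_synced']}")
```

The peer UUID is now a key rather than something implied by the command argument. As a result, the same loop should also handle a directory path that is reported for more than one peer.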
diff --git a/doc/cephfs/cephfs-mirroring.rst b/doc/cephfs/cephfs-mirroring.rst
index cbd51081dfc..5b12ed4dffc 100644
--- a/doc/cephfs/cephfs-mirroring.rst
+++ b/doc/cephfs/cephfs-mirroring.rst
@@ -356,21 +356,113 @@ command parameter is of format ``filesystem-name@filesystem-id peer-uuid``::

   $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
   {
-    "/d0": {
-      "state": "idle",
-      "last_synced_snap": {
-        "id": 120,
-        "name": "snap1",
-        "sync_duration": 3,
-        "sync_time_stamp": "274900.558797s",
-        "sync_bytes": 52428800
-      },
-      "snaps_synced": 2,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "idle",
+            "last_synced_snap": {
+              "id": 120,
+              "name": "snap1",
+              "crawl_duration": "2s",
+              "datasync_queue_wait_duration": "1s",
+              "sync_duration": "33s",
+              "sync_time_stamp": "274900.558797s",
+              "sync_bytes": "149.94 MiB",
+              "sync_files": 5000
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }

+The per-directory fields are nested under ``metrics//peer/``, so
+the same directory path can be reported for multiple peers without key collisions.
+
+.. _cephfs_mirror_peer_status_formatting:
+
+Value formatting
+----------------
+
+Several fields in the status output are formatted for readability rather than reported as raw
+numbers. The subsections below describe each format; field tables later in this section refer
+back to them.
+
+.. _cephfs_mirror_status_durations:
+
+Durations
+---------
+
+Fields: ``crawl_duration``, ``datasync_queue_wait_duration``, ``sync_duration``,
+``crawl.duration``, ``datasync_queue_wait.duration``, and ``eta``.
+
+Elapsed time is rounded to the nearest whole second and displayed as a combination of days,
+hours, minutes, and seconds. The format adapts to the magnitude:
+
+- ``s`` — seconds only, when less than one minute (for example, ``2s`` or ``33s``)
+- ``m s`` — minutes and seconds, when less than one hour (for example, ``5m 12s``)
+- ``h m s`` — hours, minutes, and seconds, when less than one day
+  (for example, ``1h 05m 30s``)
+- ``d h m s`` — days, hours, minutes, and seconds, when one day or longer
+  (for example, ``1d 02h 30m 45s``)
+
+.. _cephfs_mirror_status_data_sizes:
+
+Data sizes
+----------
+
+Fields: ``sync_bytes``, ``bytes.sync_bytes``, ``bytes.total_bytes``, and the byte counts in
+``last_synced_snap``.
+
+Byte counts use binary (IEC) units with two decimal places. The unit is chosen automatically
+from ``B``, ``KiB``, ``MiB``, ``GiB``, ``TiB``, or ``PiB`` (powers of 1024). For example,
+``149.94 MiB``.
+
+.. _cephfs_mirror_status_throughput:
+
+Throughput
+----------
+
+Fields: ``avg_read_throughput_bytes`` and ``avg_write_throughput_bytes``.
+
+Average transfer rate in bytes per second, using the same binary units as :ref:`data sizes
+` with a ``/s`` suffix. For example, ``13.03 MiB/s`` means
+13.03 mebibytes per second.
+
+.. _cephfs_mirror_status_percentages:
+
+Percentages
+-----------
+
+Fields: ``bytes.sync_percent`` and ``files.sync_percent``.
+
+Percentage complete with two decimal places and a ``%`` suffix (for example, ``40.29%``).
+
+.. _cephfs_mirror_status_counts:
+
+Counts
+------
+
+Fields: ``sync_files``, ``total_files``, ``snaps_synced``, ``snaps_deleted``, ``snaps_renamed``,
+and snapshot ``id``.
+
+Plain unsigned integers with no unit suffix.
+
+.. _cephfs_mirror_status_timestamp:
+
+Timestamp
+---------
+
+Field: ``sync_time_stamp``.
+
+Monotonic clock time in seconds (since daemon startup) when the snapshot sync finished,
+printed with sub-second precision and an ``s`` suffix (for example, ``274900.558797s``). This
+is not a wall-clock or epoch timestamp.
+
 Synchronization stats including ``snaps_synced``, ``snaps_deleted`` and ``snaps_renamed`` are reset on
 daemon restart and/or when a directory is reassigned to another mirror daemon (when multiple mirror
 daemons are deployed).
@@ -386,27 +478,128 @@ When a directory is currently being synchronized, the mirror daemon marks it as

   $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
   {
-    "/d0": {
-      "state": "syncing",
-      "current_syncing_snap": {
-        "id": 121,
-        "name": "snap2"
-      },
-      "last_synced_snap": {
-        "id": 120,
-        "name": "snap1",
-        "sync_duration": 3,
-        "sync_time_stamp": "274900.558797s",
-        "sync_bytes": 52428800
-      },
-      "snaps_synced": 2,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "syncing",
+            "current_syncing_snap": {
+              "id": 121,
+              "name": "snap2",
+              "sync-mode": "full",
+              "avg_read_throughput_bytes": "13.03 MiB/s",
+              "avg_write_throughput_bytes": "24.24 MiB/s",
+              "crawl": {
+                "state": "completed",
+                "duration": "2s"
+              },
+              "datasync_queue_wait": {
+                "state": "complete",
+                "duration": "1s"
+              },
+              "bytes": {
+                "sync_bytes": "60.40 MiB",
+                "total_bytes": "149.94 MiB",
+                "sync_percent": "40.29%"
+              },
+              "files": {
+                "sync_files": 2013,
+                "total_files": 5000,
+                "sync_percent": "40.26%"
+              },
+              "eta": "7s"
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }

 The mirror daemon marks it back to ``idle``, when the syncing completes.

+When ``state`` is ``syncing``, ``current_syncing_snap`` includes the following
+progress fields (see :ref:`cephfs_mirror_peer_status_formatting` for how values are
+displayed):
+
+.. list-table::
+   :widths: 35 65
+   :header-rows: 1
+
+   * - Field
+     - Description
+   * - ``sync-mode``
+     - Whether the snapshot is synchronized with a full tree copy (``full``) or incremental snapdiff/blockdiff (``delta``).
+   * - ``avg_read_throughput_bytes``
+     - Average read rate from the primary filesystem for this snapshot sync. See
+       :ref:`Throughput `.
+   * - ``avg_write_throughput_bytes``
+     - Average write rate to the remote peer for this snapshot sync. See
+       :ref:`Throughput `.
+   * - ``crawl.state``
+     - Whether the directory tree walk is ``in-progress`` or ``completed``. While
+       ``in-progress``, ``bytes.total_bytes`` and ``files.total_files`` reflect only what
+       has been discovered so far and may keep increasing; once ``completed``, those totals
+       are fully discovered for this snapshot sync.
+   * - ``crawl.duration``
+     - Elapsed crawl time so far, or total crawl time once ``crawl.state`` is ``completed``.
+       See :ref:`Durations `.
+   * - ``datasync_queue_wait.state``
+     - Whether the snapshot is still ``waiting`` in the data-sync queue or has started transfer (``complete``).
+   * - ``datasync_queue_wait.duration``
+     - Elapsed queue wait time so far, or total queue wait time once transfer has started.
+       See :ref:`Durations `.
+   * - ``bytes.sync_bytes``
+     - Amount of file data synchronized so far for this snapshot. See
+       :ref:`Data sizes `.
+   * - ``bytes.total_bytes``
+     - Total file data discovered for this snapshot sync. See
+       :ref:`Data sizes `. Increases during the crawl while
+       ``crawl.state`` is ``in-progress``; final once ``crawl.state`` is ``completed``.
+   * - ``bytes.sync_percent``
+     - Percentage of ``total_bytes`` synchronized so far. See
+       :ref:`Percentages `.
+   * - ``files.sync_files``
+     - Number of files synchronized so far for this snapshot. See
+       :ref:`Counts `.
+   * - ``files.total_files``
+     - Total number of files discovered for this snapshot sync. See
+       :ref:`Counts `. Increases during the crawl while
+       ``crawl.state`` is ``in-progress``; final once ``crawl.state`` is ``completed``.
+   * - ``files.sync_percent``
+     - Percentage of ``total_files`` synchronized so far. See
+       :ref:`Percentages `.
+   * - ``eta``
+     - Estimated time remaining to finish the snapshot sync, or ``calculating...`` until enough
+       samples are collected. See :ref:`Durations `.
+
+``last_synced_snap`` includes these additional fields for the last completed snapshot sync
+(see :ref:`cephfs_mirror_peer_status_formatting` for how values are displayed):
+
+.. list-table::
+   :widths: 35 65
+   :header-rows: 1
+
+   * - Field
+     - Description
+   * - ``crawl_duration``
+     - Total time spent walking the directory tree for that snapshot sync. See
+       :ref:`Durations `.
+   * - ``datasync_queue_wait_duration``
+     - Total time the snapshot waited in the data-sync queue before file transfer began. See
+       :ref:`Durations `.
+   * - ``sync_duration``
+     - Total elapsed time for the snapshot sync. See :ref:`Durations `.
+   * - ``sync_time_stamp``
+     - When the sync finished. See :ref:`Timestamp `.
+   * - ``sync_bytes``
+     - Total file data synchronized for that snapshot. See
+       :ref:`Data sizes `.
+   * - ``sync_files``
+     - Number of files synchronized for that snapshot. See :ref:`Counts `.
+
 When a directory experiences a configured number of consecutive synchronization failures, the
 mirror daemon marks it as ``failed``. Synchronization for these directories is retried.
 By default, the number of consecutive failures before a directory is marked as failed
@@ -419,24 +612,37 @@ E.g., adding a regular file for synchronization would result in failed status::
   $ ceph fs snapshot mirror add cephfs /f0
   $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
   {
-    "/d0": {
-      "state": "idle",
-      "last_synced_snap": {
-        "id": 121,
-        "name": "snap2",
-        "sync_duration": 5,
-        "sync_time_stamp": "500900.600797s",
-        "sync_bytes": 78643200
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "idle",
+            "last_synced_snap": {
+              "id": 121,
+              "name": "snap2",
+              "crawl_duration": "2s",
+              "datasync_queue_wait_duration": "1s",
+              "sync_duration": "44s",
+              "sync_time_stamp": "500900.600797s",
+              "sync_bytes": "149.94 MiB",
+              "sync_files": 5000
+            },
+            "snaps_synced": 3,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
       },
-      "snaps_synced": 3,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
-    },
-    "/f0": {
-      "state": "failed",
-      "snaps_synced": 0,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+      "/f0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "failed",
+            "snaps_synced": 0,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }

@@ -454,24 +660,38 @@ In the remote filesystem::

   $ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
   {
-    "/d0": {
-      "state": "failed",
-      "failure_reason": "snapshot 'snap2' has invalid metadata",
-      "last_synced_snap": {
-        "id": 120,
-        "name": "snap1",
-        "sync_duration": 3,
-        "sync_time_stamp": "274900.558797s"
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "failed",
+            "failure_reason": "snapshot 'snap2' has invalid metadata",
+            "last_synced_snap": {
+              "id": 120,
+              "name": "snap1",
+              "crawl_duration": "2s",
+              "datasync_queue_wait_duration": "1s",
+              "sync_duration": "33s",
+              "sync_time_stamp": "274900.558797s",
+              "sync_bytes": "149.94 MiB",
+              "sync_files": 5000
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
       },
-      "snaps_synced": 2,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
-    },
-    "/f0": {
-      "state": "failed",
-      "snaps_synced": 0,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+      "/f0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "failed",
+            "snaps_synced": 0,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }

diff --git a/doc/dev/cephfs-mirroring.rst b/doc/dev/cephfs-mirroring.rst
index 9fe072967f3..9cba6cce34c 100644
--- a/doc/dev/cephfs-mirroring.rst
+++ b/doc/dev/cephfs-mirroring.rst
@@ -393,20 +393,35 @@ status. Commands of this kind take the form ``filesystem-name@filesystem-id peer
 ::

   {
-    "/d0": {
-      "state": "idle",
-      "last_synced_snap": {
-        "id": 120,
-        "name": "snap1",
-        "sync_duration": 0.079997898999999997,
-        "sync_time_stamp": "274900.558797s"
-      },
-      "snaps_synced": 2,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "idle",
+            "last_synced_snap": {
+              "id": 120,
+              "name": "snap1",
+              "crawl_duration": "2s",
+              "datasync_queue_wait_duration": "1s",
+              "sync_duration": "33s",
+              "sync_time_stamp": "274900.558797s",
+              "sync_bytes": "149.94 MiB",
+              "sync_files": 5000
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }

+Several fields in the status output are formatted for readability rather than reported as raw
+numbers. See :ref:`Value formatting ` in
+:doc:`/cephfs/cephfs-mirroring` for duration, data size, throughput, percentage, count, and
+timestamp formats.
+
 Synchronization stats such as ``snaps_synced``, ``snaps_deleted`` and
 ``snaps_renamed`` are reset when the daemon is restarted or (when multiple
 mirror daemons are deployed), when a directory is reassigned to another mirror
@@ -419,6 +434,49 @@ A directory can be in one of the following states::
   - `syncing`: The directory is currently being synchronized
   - `failed`: The directory has hit upper limit of consecutive failures

+::
+
+  {
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "syncing",
+            "current_syncing_snap": {
+              "id": 121,
+              "name": "snap2",
+              "sync-mode": "full",
+              "avg_read_throughput_bytes": "13.03 MiB/s",
+              "avg_write_throughput_bytes": "24.24 MiB/s",
+              "crawl": {
+                "state": "completed",
+                "duration": "2s"
+              },
+              "datasync_queue_wait": {
+                "state": "complete",
+                "duration": "1s"
+              },
+              "bytes": {
+                "sync_bytes": "60.40 MiB",
+                "total_bytes": "149.94 MiB",
+                "sync_percent": "40.29%"
+              },
+              "files": {
+                "sync_files": 2013,
+                "total_files": 5000,
+                "sync_percent": "40.26%"
+              },
+              "eta": "7s"
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
+    }
+  }
+
 When a directory hits a configured number of consecutive synchronization
 failures, the mirror daemon marks it as ``failed``. Synchronization for these
 directories is retried. By default, the number of consecutive failures before a
@@ -439,23 +497,37 @@ status:
 ::

   {
-    "/d0": {
-      "state": "idle",
-      "last_synced_snap": {
-        "id": 120,
-        "name": "snap1",
-        "sync_duration": 0.079997898999999997,
-        "sync_time_stamp": "274900.558797s"
+    "metrics": {
+      "/d0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "idle",
+            "last_synced_snap": {
+              "id": 120,
+              "name": "snap1",
+              "crawl_duration": "2s",
+              "datasync_queue_wait_duration": "1s",
+              "sync_duration": "33s",
+              "sync_time_stamp": "274900.558797s",
+              "sync_bytes": "149.94 MiB",
+              "sync_files": 5000
+            },
+            "snaps_synced": 2,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
       },
-      "snaps_synced": 2,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
-    },
-    "/f0": {
-      "state": "failed",
-      "snaps_synced": 0,
-      "snaps_deleted": 0,
-      "snaps_renamed": 0
+      "/f0": {
+        "peer": {
+          "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
+            "state": "failed",
+            "snaps_synced": 0,
+            "snaps_deleted": 0,
+            "snaps_renamed": 0
+          }
+        }
+      }
     }
   }
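The "Value formatting" section in doc/cephfs/cephfs-mirroring.rst describes human-readable strings for durations, data sizes, throughput and percentages. A monitoring script that needs numbers, for example to alert on a large ``eta``, has to parse those strings back. The following is a minimal Python sketch and is not part of Ceph. It assumes the formats exactly as documented:

- space-separated ``d``/``h``/``m``/``s`` duration components;
- two-decimal IEC sizes with a space before the unit;
- a ``/s`` suffix for throughput;
- a ``%`` suffix for percentages.

The helper names are hypothetical. Results are approximate because the daemon rounds before printing. ``sync_time_stamp`` is a monotonic timestamp rather than a duration, so ``parse_duration`` deliberately rejects it.

```python
import re

# Seconds per duration component, per the "Durations" format description.
_DURATION_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
# Power of 1024 for each IEC unit listed under "Data sizes".
_SIZE_POWERS = {"B": 0, "KiB": 1, "MiB": 2, "GiB": 3, "TiB": 4, "PiB": 5}
# One whole-number duration component, e.g. "05m" or "30s".
_DURATION_TOKEN = re.compile(r"(\d+)([dhms])")


def parse_duration(text):
    """'1h 05m 30s' -> 3930 seconds; 'calculating...' (eta) -> None."""
    if text == "calculating...":
        return None
    tokens = text.split()
    if not tokens:
        raise ValueError("empty duration")
    total = 0
    for token in tokens:
        match = _DURATION_TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized duration: {text!r}")
        total += int(match.group(1)) * _DURATION_UNITS[match.group(2)]
    return total


def parse_size(text):
    """'149.94 MiB' -> approximate byte count (value is rounded to 2 decimals)."""
    value, unit = text.split()
    return float(value) * 1024 ** _SIZE_POWERS[unit]


def parse_throughput(text):
    """'13.03 MiB/s' -> approximate bytes per second."""
    if not text.endswith("/s"):
        raise ValueError(f"unrecognized throughput: {text!r}")
    return parse_size(text[:-2])


def parse_percent(text):
    """'40.29%' -> 40.29."""
    if not text.endswith("%"):
        raise ValueError(f"unrecognized percentage: {text!r}")
    return float(text[:-1])


if __name__ == "__main__":
    # Values copied from the "syncing" example in the documentation.
    remaining = parse_size("149.94 MiB") - parse_size("60.40 MiB")
    print(f"~{remaining / 1024 ** 2:.2f} MiB left, eta {parse_duration('7s')}s")
```

Parsing with strict suffix checks, rather than stripping arbitrary characters, makes an unexpected format change fail loudly instead of yielding a silently wrong number.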