osd: add last_degraded field to pg_stat_t 68910/head
author Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
Wed, 6 May 2026 15:11:33 +0000 (20:41 +0530)
committer Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
Thu, 4 Jun 2026 14:14:56 +0000 (19:44 +0530)
commit 252d14923858b6695dad4a7d70f70ed3881abd28
tree 84dd7033716cb49605dc7cb35e95b538b44e965e
parent 32435add6799aa62a52370530aae84b8a4a2c8c6
osd: add last_degraded field to pg_stat_t

Introduce a 'last_degraded' timestamp to the pg_stat_t structure to track
the initial point of redundancy loss. This field, used in conjunction
with 'last_clean', allows the manager to calculate a cluster-wide
durability score by measuring the duration of vulnerability windows.

Changes:
1) Add last_degraded (utime_t) to pg_stat_t in osd_types.h.
2) Increment pg_stat_t encoding version to 31. The decode logic
   defaults last_degraded to last_clean for backward compatibility
   during rolling upgrades.
3) Update operator==, dump(), and generate_test_instances() to
   support ceph-dencoder testing and JSON output.
4) Implement latching logic in PeeringState::prepare_stats_for_publish():
   - A PG is considered vulnerable if in DEGRADED or UNDERSIZED state.
   - last_degraded is set to 'now' only if it is <= last_clean,
     effectively latching the timestamp to the start of the failure
     event until the PG next becomes clean.
5) Add standalone tests that verify:
   - The last_degraded timestamp latching logic.
   - That last_degraded is updated when OSDs are marked 'out' for
     draining, in which case PGs are marked undersized.
6) Add a release note for the new 'last_degraded' field in PG stats.
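
The decode default (item 2) and the latching rule (item 4) can be
illustrated with a minimal, self-contained sketch. This is not the code
from PeeringState.cc or osd_types.cc: PgStatSketch, the SKETCH_STATE_*
bits, and both helper functions are hypothetical names, and plain
seconds stand in for utime_t.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two pg_stat_t timestamps described above.
// Real Ceph code uses utime_t; plain seconds keep the sketch self-contained.
struct PgStatSketch {
  double last_clean = 0.0;
  double last_degraded = 0.0;
};

// Hypothetical state bits; these are NOT Ceph's actual PG_STATE_* values.
constexpr uint64_t SKETCH_STATE_DEGRADED   = 1ull << 0;
constexpr uint64_t SKETCH_STATE_UNDERSIZED = 1ull << 1;

// A PG counts as vulnerable if it is DEGRADED or UNDERSIZED (item 4).
inline bool is_vulnerable(uint64_t state) {
  return (state & (SKETCH_STATE_DEGRADED | SKETCH_STATE_UNDERSIZED)) != 0;
}

// Latching rule (item 4): record 'now' only when last_degraded has not
// already been set since the last clean point, so the timestamp stays
// pinned to the start of the failure event.
inline void latch_last_degraded(PgStatSketch& s, uint64_t state, double now) {
  if (is_vulnerable(state) && s.last_degraded <= s.last_clean) {
    s.last_degraded = now;
  }
}

// Decode default (item 2): an encoding older than version 31 carries no
// last_degraded, so it falls back to last_clean.
inline void apply_decode_default(PgStatSketch& s, int struct_v) {
  if (struct_v < 31) {
    s.last_degraded = s.last_clean;
  }
}
```

With this rule, repeated stats publications during one failure event
leave last_degraded unchanged. It moves again only after last_clean has
advanced past it and the PG becomes vulnerable once more.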

Fixes: https://tracker.ceph.com/issues/76604
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
PendingReleaseNotes
qa/standalone/osd/osd-recovery-stats.sh
src/osd/PeeringState.cc
src/osd/osd_types.cc
src/osd/osd_types.h