Introduce a 'last_degraded' timestamp to the pg_stat_t structure to track
the initial point of redundancy loss. This field, used in conjunction
with 'last_clean', allows the manager to calculate a cluster-wide
durability score by measuring the duration of vulnerability windows.
Changes:
1) Add last_degraded (utime_t) to pg_stat_t in osd_types.h.
2) Increment pg_stat_t encoding version to 31. The decode logic
defaults last_degraded to last_clean for backward compatibility
during rolling upgrades.
3) Update operator==, dump(), and generate_test_instances() to
support ceph-dencoder testing and JSON output.
4) Implement latching logic in PeeringState::prepare_stats_for_publish():
- A PG is considered vulnerable if in DEGRADED or UNDERSIZED state.
- last_degraded is set to 'now' only if it is <= last_clean,
effectively latching the timestamp to the start of the failure
event until the PG next becomes clean.
5) Standalone tests to verify:
- The last_degraded timestamp latching logic.
- Verify last_degraded timestamp is modified when OSDs are marked 'out' for
draining purposes in which case PGs are marked undersized.
6) Release note the addition of 'last_degraded' field to PG stats.