author	Redouane Kachach <rkachach@ibm.com>
	Wed, 15 Apr 2026 16:05:36 +0000 (18:05 +0200)
committer	Redouane Kachach <rkachach@ibm.com>
	Thu, 16 Apr 2026 08:05:11 +0000 (10:05 +0200)
commit	7b2a9141053fd639138ab5bc1db09c32e354cf12
tree	f5bee9e45ff4c80b21e517ba3e6ac9b7f3ecf01f
parent	4f9a9b5f2d479e16e6b0ff058fad5954367ce76c

qa: fix misleading "in cluster log" failures during cluster log scan

Summary:

Fix misleading failure reasons reported as `"… in cluster log"` when
no such log entry actually exists.

The cephadm task currently treats `grep` errors from the cluster log
scan as if they were actual log matches. This can produce bogus
failure summaries when `ceph.log` is missing, especially after early
failures such as image pull or bootstrap problems.

Problem:

first_in_ceph_log() currently:

- returns stdout if a match is found
- otherwise returns stderr

The caller then treats any non-None value as a real cluster log hit
and formats it as:

"<value>" in cluster log

That means an error like:

grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory

can be misreported as if it came from the cluster log.
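
The failure mode can be reproduced with a minimal sketch. The helper
names below are hypothetical stand-ins, not the actual cephadm task
code, and the grep output is passed in as plain strings so the example
runs without a cluster:

```python
def old_style_first_hit(stdout, stderr):
    """Hypothetical stand-in for the old behavior: stdout if present, else stderr."""
    if stdout:
        return stdout
    if stderr:
        return stderr
    return None


def old_style_failure_reason(stdout, stderr):
    """Format any non-None value as if it were a cluster log hit."""
    value = old_style_first_hit(stdout, stderr)
    if value is not None:
        return '"{}" in cluster log'.format(value)
    return None


# grep found nothing and wrote only an error to stderr.
grep_error = "grep: /var/log/ceph/<fsid>/ceph.log: No such file or directory"
reason = old_style_failure_reason("", grep_error)
print(reason)
```

The printed reason wraps the grep error in the `"… in cluster log"`
format even though no log line was ever read.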

Fix:

This change makes cluster log scanning robust and accurate by:

- checking whether `/var/log/ceph/<fsid>/ceph.log` exists before scanning
- using `check_status=False` for the `grep` pipeline
- treating only stdout as a real log match
- treating stderr as a scan error instead of log content
- avoiding overwrite of a more accurate pre-existing `failure_reason`
- reporting scan failures separately as `"cluster log scan failed"`
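
The points above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in this commit:
`ScanResult`, `classify_scan` and `pick_failure_reason` are
hypothetical names, the grep output is passed in as strings, and a
missing log is assumed to yield neither a match nor an error.

```python
from collections import namedtuple

# Hypothetical container: a real log match and a scan error are kept apart.
ScanResult = namedtuple("ScanResult", ["match", "error"])


def classify_scan(log_exists, stdout, stderr):
    """Classify the outcome of a grep over the cluster log."""
    if not log_exists:
        # Assumption: a missing ceph.log means there is nothing to scan.
        return ScanResult(match=None, error=None)
    stdout = stdout.strip()
    if stdout:
        # Only stdout counts as log content; report the first matching line.
        return ScanResult(match=stdout.splitlines()[0], error=None)
    stderr = stderr.strip()
    if stderr:
        # stderr describes a failed scan, not something found in the log.
        return ScanResult(match=None, error=stderr)
    return ScanResult(match=None, error=None)


def pick_failure_reason(existing_reason, result):
    """Choose a failure reason without overwriting an earlier, more accurate one."""
    if existing_reason:
        return existing_reason
    if result.match is not None:
        return '"{}" in cluster log'.format(result.match)
    if result.error is not None:
        return "cluster log scan failed"
    return None


# Early bootstrap failure, no ceph.log on disk: the original reason survives.
print(pick_failure_reason("image pull failed",
                          classify_scan(False, "", "")))
# grep itself failed: reported as a scan failure, not as a log hit.
print(pick_failure_reason(None,
                          classify_scan(True, "", "grep: read error")))
```

Keeping the match and the error in separate fields means the caller
never has to guess whether a string came from the log or from `grep`.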

Fixes: https://tracker.ceph.com/issues/76051
Signed-off-by: Redouane Kachach <rkachach@ibm.com>