git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
2 weeks ago osd: reset_tp_timeout should reset timeout for all shards
Bill Scales [Mon, 24 Nov 2025 09:18:21 +0000 (09:18 +0000)]
osd: reset_tp_timeout should reset timeout for all shards

ShardedThreadPools are only used by the classic OSD process
which can have more than one thread for the same shard. Each
thread has a heartbeat timeout used to detect stalled threads.
Some code that is known to take a long time makes calls to
reset_tp_timeout to reset this timeout. However, for sharded
pools this can be ineffective because it is common for threads
for the same shard to use the same locks (e.g. PG Lock).
Therefore, if thread A is taking a long time and resetting
its timeout while holding a lock, thread B for the same shard
is liable to be waiting for the same lock; it will not be
resetting its timeout and can be timed out.

Debug for issue 72879 showed heartbeat timeouts occurring at
the same time for both shards. An attempt to fix the problem
by calling reset_tp_timeout for the slow thread still showed
the other threads for the shard timing out waiting for the PG
lock that was held by the slow thread. Looking at the OSD code,
in most places where reset_tp_timeout is called the thread is
holding the PG lock.

This commit moves the concept of shard_index from OSD into
ShardedThreadPool and modifies reset_tp_timeout so that it resets
the timeout for all threads for the same shard.
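
The shard-wide reset described above can be pictured with a minimal
sketch. This is not the actual ShardedThreadPool code: the class
ShardedPoolSketch, the struct WorkerHeartbeat, the is_healthy helper,
the explicit "now" parameter and the thread-to-shard mapping
(thread index modulo number of shards) are all assumptions made for
illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for one worker thread's heartbeat state.
struct WorkerHeartbeat {
  std::size_t shard_index;     // shard served by this thread
  Clock::time_point deadline;  // thread counts as stalled after this point
};

// Hypothetical pool; NOT the real ShardedThreadPool interface.
class ShardedPoolSketch {
public:
  ShardedPoolSketch(std::size_t num_shards, std::size_t threads_per_shard,
                    std::chrono::seconds grace, Clock::time_point now)
    : grace_(grace) {
    // Assumed mapping: thread t serves shard (t % num_shards).
    for (std::size_t t = 0; t < num_shards * threads_per_shard; ++t) {
      workers_.push_back({t % num_shards, now + grace});
    }
  }

  // Extend the deadline of every thread that shares the caller's shard,
  // not only the caller's own. Returns the number of threads reset.
  std::size_t reset_tp_timeout(std::size_t caller, Clock::time_point now) {
    const std::size_t shard = workers_.at(caller).shard_index;
    std::size_t reset_count = 0;
    for (auto& w : workers_) {
      if (w.shard_index == shard) {
        w.deadline = now + grace_;
        ++reset_count;
      }
    }
    return reset_count;
  }

  // True while the given worker's deadline has not yet passed.
  bool is_healthy(std::size_t worker, Clock::time_point now) const {
    return now < workers_.at(worker).deadline;
  }

private:
  std::chrono::seconds grace_;
  std::vector<WorkerHeartbeat> workers_;
};
```

With two shards and two threads per shard, a reset issued by thread 0
also extends the deadline of thread 2 (same shard), while threads 1 and
3 keep their original deadlines.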

Some code calls reset_tp_timeout from inside loops that can take
a long time, without consideration for how long the thread has
actually been running. There is a risk that this type of
call could repeatedly reset the timeout for another shard which
is genuinely stuck and hence defeat the heartbeat checks. To
prevent this, reset_tp_timeout is modified to be a NOP unless
the thread has been processing the current workitem for more
than 0.5 seconds. Therefore threads have to be slow but making
forward progress to be able to reset the timeout.
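
A minimal sketch of the 0.5 second guard, under stated assumptions:
the struct WorkItemTimer, the function reset_if_slow, the constant
kMinRuntime and the explicit "now" and "grace" parameters are
hypothetical names for illustration, not the real OSD code.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical per-thread bookkeeping; the real state may differ.
struct WorkItemTimer {
  Clock::time_point started;   // when the current workitem was picked up
  Clock::time_point deadline;  // heartbeat deadline for this thread
};

// Threshold taken from the commit message: 0.5 seconds.
constexpr std::chrono::milliseconds kMinRuntime{500};

// Extends the deadline only if the current workitem has been running for
// more than kMinRuntime. Returns true if the deadline was extended and
// false if the call was a no-op.
inline bool reset_if_slow(WorkItemTimer& t, Clock::time_point now,
                          std::chrono::seconds grace) {
  if (now - t.started <= kMinRuntime) {
    return false;  // not slow yet: leave the heartbeat deadline alone
  }
  t.deadline = now + grace;
  return true;
}
```

A call made 100 ms into a workitem leaves the deadline untouched, while
a call made 600 ms in extends it. A tight loop over many short
workitems therefore cannot keep pushing the deadline out.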

Fixes: https://tracker.ceph.com/issues/72879
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
3 weeks ago Merge pull request #66372 from tchaikov/wip-qa-encoder-exclude
Kefu Chai [Mon, 24 Nov 2025 08:27:14 +0000 (16:27 +0800)]
Merge pull request #66372 from tchaikov/wip-qa-encoder-exclude

qa/suites/rados/encoder: exclude ceph-osd-classic when installing LTS…

Reviewed-by: Matan Breizman <mbreizma@ibm.com>
3 weeks ago qa/suites/rados/encoder: exclude ceph-osd-* when installing LTS releases (66372/head)
Kefu Chai [Sat, 22 Nov 2025 00:24:36 +0000 (08:24 +0800)]
qa/suites/rados/encoder: exclude ceph-osd-* when installing LTS releases

In a37b5b5, the ceph-osd-classic and ceph-osd-crimson packages were
added to qa/packages/packages.yaml. The "install" task uses this file as
the default package list for all branches, including LTS releases like
Reef.

However, a37b5b5 only exists in the main branch and won't be backported
to LTS branches. This causes installation failures in the rados/encoder
test suite, which verifies forward compatibility by installing LTS
releases and testing whether they can decode the latest corpus.

Exclude ceph-osd-classic and ceph-osd-crimson from LTS installations to
ensure the test suite can successfully install ceph-dencoder, which is
required for the interoperability tests.

Fixes: https://tracker.ceph.com/issues/73957
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
3 weeks ago Merge pull request #66293 from anthonyeleven/instore.dbnoonecanhearyouscream
Anthony D'Atri [Mon, 24 Nov 2025 06:07:04 +0000 (01:07 -0500)]
Merge pull request #66293 from anthonyeleven/instore.dbnoonecanhearyouscream

doc: Improve start/hardware-recommendations.rst

3 weeks ago Merge pull request #65995 from pcuzner/rocksdb_compaction_metric
Laura Flores [Sat, 22 Nov 2025 00:04:21 +0000 (18:04 -0600)]
Merge pull request #65995 from pcuzner/rocksdb_compaction_metric

rados/osd: enable compact_running perfcounter at PRIO=5

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Neha Ojha <nojha@ibm.com>
3 weeks ago doc: Improve start/hardware-recommendations.rst (66293/head)
Anthony D'Atri [Mon, 17 Nov 2025 17:57:29 +0000 (12:57 -0500)]
doc: Improve start/hardware-recommendations.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
4 weeks ago Merge pull request #66179 from rhcs-dashboard/73766-remove-subalerts-detail
afreen23 [Mon, 17 Nov 2025 09:52:17 +0000 (15:22 +0530)]
Merge pull request #66179 from rhcs-dashboard/73766-remove-subalerts-detail

mgr/dashboard: Remove subalerts details for multiple subalerts

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Ankush Behl <cloudbehl@gmail.com>