PriorityQueueBase::do_clean() shouldn't remove ClientRec instances which
still have queued requests. Otherwise, very low priority clients might
have their queued requests lost, which shouldn't be possible.
In the OSD, this resulted in PGRecovery items being lost if queued with
background_best_effort while expanding a cluster. Such items can
legitimately sit in the queue for a long period of time as they
represent background data migration which is allowed to be starved by an
aggressive client workload. Dropping the items broke an assumption in
the OSD that all items enqueued would eventually be dequeued, resulting
in resources being leaked.
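
The guard can be illustrated in isolation with a minimal sketch. It is not
the dmclock implementation: ClientRec here is a simplified stand-in, and
ClientMap and clean_clients() are hypothetical names introduced only for
this example. Only has_request(), last_tick, erase_point, erased_num and
erase_max mirror names that appear in the change itself.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>

// Simplified, hypothetical stand-in for dmclock's ClientRec.
struct ClientRec {
  uint64_t last_tick = 0;      // last tick at which the client was active
  std::deque<int> requests;    // queued request ids (illustrative payload)

  bool has_request() const { return !requests.empty(); }
};

// Hypothetical container type; the real client_map differs.
using ClientMap = std::map<int, std::shared_ptr<ClientRec>>;

// Erase clients whose last_tick is at or before erase_point, but never a
// client that still has queued requests. Returns the number erased.
inline int clean_clients(ClientMap& client_map,
                         uint64_t erase_point,
                         int erase_max) {
  assert(erase_max >= 0);
  int erased_num = 0;
  for (auto i = client_map.begin(); i != client_map.end(); /* empty */) {
    auto i2 = i++;  // advance first so erasing i2 leaves i valid
    if (!i2->second->has_request() &&
        erase_point &&
        erased_num < erase_max &&
        i2->second->last_tick <= erase_point) {
      client_map.erase(i2);
      ++erased_num;
    }
  }
  return erased_num;
}
```

In this sketch, a client that is old enough to be erased but still holds a
request stays in the map, so its request can still be dequeued later.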
Fixes: https://tracker.ceph.com/issues/61594
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 35cf0f6c1355ff420c431340ac36d9a3c029d6c2)
 if (erase_point > 0 || idle_point > 0) {
   for (auto i = client_map.begin(); i != client_map.end(); /* empty */) {
     auto i2 = i++;
-    if (erase_point &&
+    if (!(i2->second->has_request()) &&
+        erase_point &&
         erased_num < erase_max &&
         i2->second->last_tick <= erase_point) {
       delete_from_heaps(i2->second);