If we post an rx buffer and the op has a timeout, the timeout can revoke
the buffer after the reader has consumed it but before the reader has
decoded and constructed the message. In particular, we calculate a crc32c
over the data portion of the message after we've taken the buffers and
dropped the lock.

Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc, which would be ugly), just skip the rx
buffer optimization entirely when a timeout is present.

Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.
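
For illustration only, the new posting condition can be sketched as a
standalone predicate. The Buffer, Context, and Op types below are
hypothetical stand-ins, not the real Objecter or bufferlist classes, and
should_post_rx_buffer() is an assumed helper name that does not exist in
the tree; only the three-part check mirrors the change in this patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a bufferlist; illustrative only.
struct Buffer {
  std::string data;
  std::size_t length() const { return data.size(); }
};

// Hypothetical stand-in for a timeout callback object.
struct Context {};

// Hypothetical, reduced view of an op: just the two fields the check reads.
struct Op {
  Buffer *outbl = nullptr;       // caller-provided buffer to read into, if any
  Context *ontimeout = nullptr;  // non-null when the op has a timeout
};

// Returns true only when the caller supplied a non-empty buffer and the op
// has no timeout that could revoke the buffer while the reader is using it.
bool should_post_rx_buffer(const Op &op) {
  return op.outbl != nullptr &&
         op.ontimeout == nullptr &&
         op.outbl->length() > 0;
}
```

With a timeout set, the predicate is false and the reply is read into a
freshly allocated buffer instead, trading the zero-copy optimization for
safety.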
Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
       ldout(cct, 20) << " revoking rx buffer for " << op->tid << " on " << op->con << dendl;
       op->con->revoke_rx_buffer(op->tid);
     }
-    if (op->outbl && op->outbl->length()) {
+    if (op->outbl &&
+        op->ontimeout == NULL &&  // only post rx_buffer if no timeout; see #9582
+        op->outbl->length()) {
       ldout(cct, 20) << " posting rx buffer for " << op->tid << " on " << con << dendl;
       op->con = con;
       op->con->post_rx_buffer(op->tid, *op->outbl);