Commit 918c12c2ab5d ("monclient: avoid key renew storm on clock skew")
made wait_auth_rotating() wait for a key set with a valid "current" key
(instead of any key set, including with all keys expired if the clocks
are skewed). While a good idea in general, this is a bit too stringent
because the monitors will hand out key sets with a "current" key that
is _just_ about to expire. There is nothing wrong with that, as the
"next" key is also there, valid for the entire auth_service_ticket_ttl.
So even if the daemon is talking to the leader, it is possible to get a
key set with an expired "current" key. If the daemon is talking to a
peon, this is pretty easy to run into in practice. This, coupled with
the fact
that _check_auth_rotating() explicitly allows the keys to go slightly
out of date, can lead to wait_auth_rotating() stalling the boot for up
to 30 seconds:

  15:41:11.824+0000 1 ... ==== auth_reply(proto 2 0 (0) Success)
  15:41:41.824+0000 0 monclient: wait_auth_rotating timed out after 30
  15:41:41.824+0000 -1 mds.b unable to obtain rotating service keys; retrying

Apply the same 30 second or less tolerance in wait_auth_rotating().
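
For illustration only, the tolerance can be modelled outside of Ceph as
below. This is a minimal sketch, not the MonClient code: times are plain
seconds in a double, the function names rotating_key_cutoff() and
would_keep_waiting() are hypothetical, and need_new_secrets() is assumed
to reduce to "the current key expired at or before the given time".

```cpp
#include <algorithm>

// Hypothetical stand-in for the cutoff computed in the patch: tolerate a
// "current" key that expired up to min(30 s, ttl / 4) ago.
double rotating_key_cutoff(double now, double auth_service_ticket_ttl) {
  return now - std::min(30.0, auth_service_ticket_ttl / 4.0);
}

// Assumed simplification of need_new_secrets(): new keys are needed once
// the "current" key expired at or before the given point in time.
bool need_new_secrets(double current_key_expiry, double at) {
  return current_key_expiry <= at;
}

// True when the wait loop in this model would keep waiting for new keys.
// tolerant == false models the old check against "now"; tolerant == true
// models the check against the cutoff.
bool would_keep_waiting(double now, double current_key_expiry,
                        double auth_service_ticket_ttl, bool tolerant) {
  double at = tolerant ? rotating_key_cutoff(now, auth_service_ticket_ttl)
                       : now;
  return need_new_secrets(current_key_expiry, at);
}
```

With an example TTL of 3600 seconds, a key that expired one second ago
keeps the old check waiting but passes the tolerant one, while a key that
expired more than 30 seconds ago still triggers a wait in both.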
Fixes: https://tracker.ceph.com/issues/50390
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6160ed75fcc2a648da4b696fd0ec20b95c4a0a61)
Conflicts:
	src/mon/MonClient.cc [ commit 85157d5aae3d ("mon:
	  s/Mutex/ceph::mutex/") not in nautilus ]
{
std::lock_guard l(monc_lock);
utime_t now = ceph_clock_now();
+ utime_t cutoff = now;
+ cutoff -= std::min(30.0, cct->_conf->auth_service_ticket_ttl / 4.0);
utime_t until = now;
until += timeout;
return 0;
while (auth_principal_needs_rotating_keys(entity_name) &&
- rotating_secrets->need_new_secrets(now)) {
+ rotating_secrets->need_new_secrets(cutoff)) {
if (now >= until) {
ldout(cct, 0) << __func__ << " timed out after " << timeout << dendl;
return -ETIMEDOUT;
ldout(cct, 10) << __func__ << " waiting (until " << until << ")" << dendl;
auth_cond.WaitUntil(monc_lock, until);
now = ceph_clock_now();
+ cutoff = now;
+ cutoff -= std::min(30.0, cct->_conf->auth_service_ticket_ttl / 4.0);
}
ldout(cct, 10) << __func__ << " done" << dendl;
return 0;