mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, auth requests from clients are put on a wait list. these
requests are re-enqueued once the monitors form a quorum. but there
is a small window, of up to mon_tick_interval, in which the monitors
cannot yet serve auth requests even though they claim to be able to
serve requests. if the re-enqueued requests happen to be served in
this window, and cephx is enabled, they are greeted with errors
like

  handle_auth_bad_method server allowed_methods [2] but i only support [2]

in the case of ceph cli, the error would look like:

  [errno 13] RADOS permission denied (error connecting to the cluster)

so, to address this issue, the EACCES error is ignored when waiting
for a quorum.

Signed-off-by: Kefu Chai <kchai@redhat.com>
"""
from functools import wraps
import contextlib
+import errno
import random
import signal
import time
                         tries=timeout // sleep,
                         action=f'wait for quorum size {size}') as proceed:
             while proceed():
-                if len(self.get_mon_quorum()) == size:
-                    break
+                try:
+                    if len(self.get_mon_quorum()) == size:
+                        break
+                except CommandFailedError as e:
+                    # could fail instead of blocking if the rotating key of
+                    # the connected monitor is not updated yet after the
+                    # monitors form the quorum
+                    if e.exitstatus == errno.EACCES:
+                        pass
+                    else:
+                        raise
         self.log("quorum is size %d" % size)
     def get_mon_health(self, debug=False):