From: Matthew Vernon
Date: Wed, 19 Sep 2018 12:26:26 +0000 (+0100)
Subject: restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK
X-Git-Tag: v3.2.0beta3~17
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=04f4991648568e079f19f8e531a11a5fddd45c87;p=ceph-ansible.git

restart_osd_daemon.sh.j2 - consider active+clean+* pgs as OK

After restarting each OSD, restart_osd_daemon.sh checks that the
cluster is in a good state before moving on to the next one. One of
these checks is that the number of pgs in the state "active+clean" is
equal to the total number of pgs in the cluster.

On large clusters (e.g. we have 173,696 pgs), it is likely that at
least one pg will be scrubbing and/or deep-scrubbing at any one time.
These pgs are in state "active+clean+scrubbing" or
"active+clean+scrubbing+deep", so the script erroneously excluded them
from the "good" count. Similar concerns apply to
"active+clean+snaptrim" and "active+clean+snaptrim_wait".

Fix this by counting as good any pg whose state contains
"active+clean", and compare that sum to num_pgs in pgmap as an integer
(test -eq) rather than as a string.

(Could this be backported to at least stable-3.0, please?)
Closes: #2008
Signed-off-by: Matthew Vernon
---
diff --git a/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2 b/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2
index 1d9db15b7..5aa3b714d 100644
--- a/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2
+++ b/roles/ceph-defaults/templates/restart_osd_daemon.sh.j2
@@ -9,7 +9,7 @@ check_pgs() {
     return 0
   fi
   while [ $RETRIES -ne 0 ]; do
-    test "[""$($docker_exec ceph $CEPH_CLI -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["pgmap"]["num_pgs"])')""]" = "$($docker_exec ceph $CEPH_CLI -s -f json | python -c 'import sys, json; print [ i["count"] for i in json.load(sys.stdin)["pgmap"]["pgs_by_state"] if i["state_name"] == "active+clean"]')"
+    test "$($docker_exec ceph $CEPH_CLI -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["pgmap"]["num_pgs"])')" -eq "$($docker_exec ceph $CEPH_CLI -s -f json | python -c 'import sys, json; print sum ( [ i["count"] for i in json.load(sys.stdin)["pgmap"]["pgs_by_state"] if "active+clean" in i["state_name"]])')"
     RET=$?
     test $RET -eq 0 && return 0
     sleep $DELAY
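For illustration, the old and new checks can be compared outside the script with a short Python 3 sketch (the one-liners in the diff use the Python 2 print statement). It assumes a hand-written sample of the "pgmap" section of `ceph -s -f json` that contains only the num_pgs and pgs_by_state fields the script reads. The counts are invented, and the helper names count_matching and pgs_ok are hypothetical.

```python
import json

# Hypothetical sample of the "pgmap" section of `ceph -s -f json` output;
# only the fields read by restart_osd_daemon.sh are included, counts are made up.
SAMPLE_STATUS = """
{
  "pgmap": {
    "num_pgs": 1024,
    "pgs_by_state": [
      {"state_name": "active+clean", "count": 1018},
      {"state_name": "active+clean+scrubbing", "count": 4},
      {"state_name": "active+clean+scrubbing+deep", "count": 2}
    ]
  }
}
"""


def count_matching(pgs_by_state, exact):
    """Sum PG counts whose state is exactly 'active+clean' (exact=True)
    or merely contains 'active+clean' (exact=False)."""
    if exact:
        # Old behaviour: only the bare "active+clean" state counts.
        return sum(s["count"] for s in pgs_by_state
                   if s["state_name"] == "active+clean")
    # New behaviour: any state containing "active+clean" counts.
    return sum(s["count"] for s in pgs_by_state
               if "active+clean" in s["state_name"])


def pgs_ok(status, exact=False):
    """Integer comparison of the matching count against num_pgs."""
    pgmap = status["pgmap"]
    return pgmap["num_pgs"] == count_matching(pgmap["pgs_by_state"], exact)


if __name__ == "__main__":
    status = json.loads(SAMPLE_STATUS)
    print("old check (exact match):", pgs_ok(status, exact=True))
    print("new check (substring):", pgs_ok(status, exact=False))
```

With this sample the exact-match check fails because the six scrubbing pgs are left out, while the substring check passes. Note that a substring test also accepts states such as active+clean+inconsistent, so every active+clean variant is treated as healthy.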