When wait() is mopping up pipes it may hit one that is
still in the process of accepting. wait() unregisters
the pipe while the accept() thread is trying to set it
up, which aborts the accept and gets the pipe reaped.
However, the pipe mop-up in wait() does not call
clear_pipe() the way that mark_down(), mark_down_all(),
and fault() do, which leads to this assert.
Pipe is accepting...
-161> 2016-12-22 17:31:45.460613 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept: setting up session_security.
-160> 2016-12-22 17:31:45.460733 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=0 pgs=0 cs=0 l=1 c=0x3e2a6f40).accept new session
-159> 2016-12-22 17:31:45.460846 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept success, connect_seq = 1, sending READY
-158> 2016-12-22 17:31:45.460959 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept features 1152921504336314367
wait() is shutting down...
-156> 2016-12-22 17:31:45.461882 9506ac0 20 -- 172.21.15.14:6804/20738 wait: stopping accepter thread
-155> 2016-12-22 17:31:45.462994 9506ac0 10 accepter.stop accept listening on: 15
...
-116> 2016-12-22 17:31:45.482137 9506ac0 10 -- 172.21.15.14:6804/20738 wait: closing pipes
-115> 2016-12-22 17:31:45.482850 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).unregister_pipe
-114> 2016-12-22 17:31:45.483421 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=2 pgs=7 cs=1 l=1 c=0x3e2a6f40).stop
...which interrupts the accept()...
-113> 2016-12-22 17:31:45.484164 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).accept fault after register
and makes accept() return failure, so reader() queues
the pipe for reaping and exits...
-110> 2016-12-22 17:31:45.486103 9506ac0 10 -- 172.21.15.14:6804/20738 wait: waiting for pipes 0x3e2a5c20 to close
-109> 2016-12-22 17:31:45.487146 37353700 10 -- 172.21.15.14:6804/20738 queue_reap 0x3e2a5c20
-108> 2016-12-22 17:31:45.487658 9506ac0 10 -- 172.21.15.14:6804/20738 reaper
-107> 2016-12-22 17:31:45.487722 9506ac0 10 -- 172.21.15.14:6804/20738 reaper reaping pipe 0x3e2a5c20 172.21.15.35:0/146098963
-106> 2016-12-22 17:31:45.487816 9506ac0 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).discard_queue
-105> 2016-12-22 17:31:45.494742 37353700 10 -- 172.21.15.14:6804/20738 >> 172.21.15.35:0/146098963 pipe(0x3e2a5c20 sd=31 :6804 s=4 pgs=7 cs=1 l=1 c=0x3e2a6f40).reader done
...
-92> 2016-12-22 17:31:45.527589 9506ac0 -1 /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 9506ac0 time 2016-12-22 17:31:45.488264
/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6151-ge1781dd/rpm/el7/BUILD/ceph-11.1.0-6151-ge1781dd/src/msg/simple/SimpleMessenger.cc: 235: FAILED assert(!cleared)
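
The failure mode can be sketched with a toy model. This is not the
Ceph code: ToyPipe, ToyConnection, mop_up() and the two helper
functions are hypothetical stand-ins, and the sketch assumes that the
reaper's "cleared" flag is the return value of a clear_pipe() call on
the pipe's connection. Under that assumption, a mop-up that only stops
the pipe leaves the connection still pointing at it, so the reaper's
check fails; a mop-up that also clears the pipe lets the check pass.

```cpp
#include <cassert>
#include <memory>

// Toy stand-ins for a pipe and its connection; NOT the real Ceph classes.
struct ToyPipe;

struct ToyConnection {
  ToyPipe* pipe = nullptr;  // back-pointer to the pipe serving this connection

  // Drop the back-pointer if it still refers to old_p.
  // Returns true only when something was actually cleared.
  bool clear_pipe(ToyPipe* old_p) {
    if (pipe == old_p) {
      pipe = nullptr;
      return true;
    }
    return false;
  }
};

struct ToyPipe {
  std::shared_ptr<ToyConnection> connection_state;
  bool stopped = false;
};

// Shutdown-time mop-up of one pipe (hypothetical helper).
// clear == false: stop the pipe only, as in the scenario described above.
// clear == true:  also detach the connection from the pipe.
void mop_up(ToyPipe& p, bool clear) {
  p.stopped = true;
  if (clear && p.connection_state)
    p.connection_state->clear_pipe(&p);
}

// Models the reaper's check (assumed shape): returns true when
// assert(!cleared) would hold, false when it would fire.
bool reaper_check_passes(ToyPipe& p) {
  assert(p.stopped);  // in this model the reaper only sees stopped pipes
  if (!p.connection_state)
    return true;
  bool cleared = p.connection_state->clear_pipe(&p);
  return !cleared;
}

// Runs mop-up followed by the reaper check on a freshly accepted pipe
// and reports whether the reaper's assert would have fired.
bool shutdown_trips_assert(bool clear) {
  ToyPipe p;
  p.connection_state = std::make_shared<ToyConnection>();
  p.connection_state->pipe = &p;  // accept() has wired pipe and connection
  mop_up(p, clear);
  return !reaper_check_passes(p);
}
```

In this model shutdown_trips_assert(false) reports that the assert
fires, while shutdown_trips_assert(true) reports that it does not.
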
Fixes: http://tracker.ceph.com/issues/15784
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 948f97b3bdd39269a38277238a61f24e5fec6196)