git.apps.os.sepia.ceph.com Git - ceph.git/commit
msgr: more conservative locking, thread join asserts
author Sage Weil <sage@newdream.net>
Fri, 12 Feb 2010 21:38:38 +0000 (13:38 -0800)
committer Sage Weil <sage@newdream.net>
Fri, 12 Feb 2010 21:38:38 +0000 (13:38 -0800)
commit 10ae652f0325a19e435685704f90d0b3327abcc6
tree e6a1c1256dcd4dba526ae58a6dde8eafb15b7cf7
parent f5209d750c8129c608d7c62d49fe0b1e33a59b6c

We caught a bunch of crashes like this:

10.02.11 17:01:01.600660 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).do_sendmsg error Broken pipe
10.02.11 17:01:01.600700 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).writer error sending 0x7fc27da1c570, 32: Broken pipe
10.02.11 17:01:01.600796 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=-1 pgs=2409 cs=1 l=0).fault initiating reconnect
...
./common/Thread.h: In function 'int Thread::join(void**)':
./common/Thread.h:66: FAILED assert(0)
 1: (Thread::join(void**)+0x73) [0x64fcd3]
 2: (SimpleMessenger::Pipe::join_reader()+0x68) [0x6555a2]
 3: (SimpleMessenger::Pipe::connect()+0xf5) [0x645be9]
 4: (SimpleMessenger::Pipe::writer()+0x157) [0x64793d]
 5: (SimpleMessenger::Pipe::Writer::entry()+0x19) [0x63e107]
 6: (Thread::_entry_func(void*)+0x20) [0x64e816]
 7: /lib/libpthread.so.0 [0x7fc2c3bbdfc7]
 8: (clone()+0x6d) [0x7fc2c2e005ad]

These look a bit like multiple threads racing into
join_reader().  Add an assert to catch that if it happens again,
and also wrap thread starts in pipe_lock so that the _running
flags stay in sync with reality.  Add a few other sanity checks
too.
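
The locking pattern described above can be sketched roughly as follows. This is not the SimpleMessenger code: PipeSketch, start_reader(), is_reader_running() and the use of std::thread and std::mutex are assumptions made for illustration (the original uses its own Thread class). It also goes one step further than the assert this commit adds, by claiming the thread handle under the lock so that only one caller can ever reach join().

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a pipe that owns a reader thread.
class PipeSketch {
public:
    // Start the reader while holding pipe_lock, so the flag changes in
    // the same critical section that creates the thread.
    void start_reader() {
        std::lock_guard<std::mutex> l(pipe_lock);
        assert(!reader_running);      // sanity check: no double start
        reader = std::thread([] { /* reader loop would go here */ });
        reader_running = true;
    }

    // Join the reader. The caller must not hold pipe_lock.
    void join_reader() {
        std::thread t;
        {
            std::lock_guard<std::mutex> l(pipe_lock);
            if (!reader_running)
                return;               // nothing to join, or already claimed
            assert(reader.joinable());
            t = std::move(reader);    // claim the handle under the lock
            reader_running = false;
        }
        t.join();                     // join outside the lock
    }

    bool is_reader_running() {
        std::lock_guard<std::mutex> l(pipe_lock);
        return reader_running;
    }

private:
    std::mutex pipe_lock;
    std::thread reader;
    bool reader_running = false;
};
```

With this shape, a second caller racing into join_reader() sees reader_running == false and returns instead of joining the same thread twice. One trade-off of the sketch: that second caller may return before the first caller's join() has finished, which may or may not be acceptable depending on what the caller does next.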
src/msg/SimpleMessenger.cc
src/msg/SimpleMessenger.h