Sage Weil [Sun, 18 Nov 2012 16:34:35 +0000 (08:34 -0800)]
mon: shutdown async signal handler sooner
Shut it down before the mon and, in particular, before lockdep.
#0 __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
#1 0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
#2 0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
#3 <signal handler called>
#4 0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
#5 0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
#6 0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
#7 0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
#8 0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
#9 0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
#0 0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
#1 0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
#2 0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
#3 0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
#4 0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
#5 0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
#6 0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439
Sage Weil [Sun, 4 Nov 2012 16:21:50 +0000 (08:21 -0800)]
mon/MonClient: use thread-safe RNG for picking monitors
Avoid using shared-state rand() when picking monitors. This way we don't
screw with library users like test_librbd_fsx that rely on srand() and
rand() being deterministic.
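A minimal sketch of the idea, not the actual MonClient code: draw the index from a generator owned by the caller instead of the process-wide rand() state. The helper name pick_monitor and its parameters are hypothetical, and the sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: pick one monitor name using a caller-owned generator.
// No global RNG state is touched, so srand()/rand() sequences used elsewhere
// in the process are left undisturbed.
inline const std::string& pick_monitor(const std::vector<std::string>& mons,
                                       std::mt19937& rng) {
  assert(!mons.empty());  // precondition: at least one monitor is known
  std::uniform_int_distribution<std::size_t> dist(0, mons.size() - 1);
  return mons[dist(rng)];
}
```

Each client (or thread) would keep its own std::mt19937, so no locking is needed around the draw.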
Sage Weil [Sat, 17 Nov 2012 00:10:30 +0000 (16:10 -0800)]
msg/Accepter: only close socket if >= 0
It is possible for rebind() to fail, in which case the OSD will go through
its shutdown procedure and call stop(). This is simpler than trying to
avoid calling stop() when rebind() fails.
Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
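The guard described above can be sketched as follows. This is an illustration under assumptions, not the Accepter code: the Listener struct and its listen_sd member are hypothetical stand-ins, and the sketch assumes a C++11 compiler.

```cpp
#include <unistd.h>

// Hypothetical stand-in for an accepter that owns a listening socket.
struct Listener {
  int listen_sd = -1;  // -1 means "no socket is open"

  // Safe to call even if bind/rebind never produced a valid descriptor.
  // Returns true only if a descriptor was actually closed.
  bool stop() {
    if (listen_sd < 0)
      return false;      // nothing to close; avoid close(-1)
    ::close(listen_sd);
    listen_sd = -1;      // make a second stop() a no-op
    return true;
  }
};
```

Making stop() tolerate an unopened socket keeps the failure path simple: the caller does not need to track whether rebind() succeeded.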
Josh Durgin [Fri, 16 Nov 2012 00:20:33 +0000 (16:20 -0800)]
ObjectCacher: fix off-by-one error in split
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before its original read completed. This would cause
a hang in any higher level waiting for a read to complete.
The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.
If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.
Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.
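A simplified sketch of a forward waiters-moving loop, assuming waiters are kept in a map keyed by offset. The WaiterMap type, the move_waiters name, and the use of int in place of completion pointers are all hypothetical simplifications, not the ObjectCacher code.

```cpp
#include <list>
#include <map>

// Hypothetical simplified type: offset -> list of waiter ids.
typedef std::map<long, std::list<int> > WaiterMap;

// Move every waiter list at an offset >= split_off from left to right.
// Iterating forward from lower_bound() means the first entry of the map
// is handled like any other entry, with no special case at begin().
inline void move_waiters(WaiterMap& left, WaiterMap& right, long split_off) {
  WaiterMap::iterator p = left.lower_bound(split_off);
  while (p != left.end()) {
    std::list<int>& dst = right[p->first];
    dst.splice(dst.end(), p->second);  // transfer waiters without copying
    left.erase(p++);                   // advance before the erase invalidates p
  }
}
```

With a split offset equal to the first key, every list moves to the right side, which is the case the backwards loop skipped.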
Josh Durgin [Fri, 16 Nov 2012 20:26:16 +0000 (12:26 -0800)]
ObjectCacher: retry reads when they are incomplete
Skipping these callbacks when there's a racing write or
a gap in the results causes the original reads they represent
to never be completed. If the read falls within the range
of a BufferHead, retry all waiters no matter what.
Sage Weil [Fri, 16 Nov 2012 22:19:25 +0000 (14:19 -0800)]
common/ceph_argparse: fix malloc failure check
CID 743418 (#1 of 1): Dereference before null check (REVERSE_INULL)
Null-checking "argv" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
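One shape of bug that produces this Coverity report is checking the out-parameter itself rather than the pointer that malloc() just stored through it. The sketch below shows the corrected form under that assumption; the function name to_argv and its signature are hypothetical, not the ceph_argparse API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Hypothetical helper: build a NULL-terminated argv array from a vector.
// The caller owns the returned array and releases it with free().
inline void to_argv(const std::vector<std::string>& args,
                    int* argc, const char*** argv) {
  *argc = static_cast<int>(args.size());
  *argv = static_cast<const char**>(
      std::malloc(sizeof(char*) * (args.size() + 1)));
  if (!*argv)  // check the allocation result, not the out-parameter itself
    throw std::bad_alloc();
  for (std::size_t i = 0; i < args.size(); ++i)
    (*argv)[i] = args[i].c_str();
  (*argv)[args.size()] = NULL;
}
```

The strings are borrowed from args, so the vector must outlive the array.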
Sage Weil [Fri, 16 Nov 2012 22:18:21 +0000 (14:18 -0800)]
mon/MonClient: initialize ptr in ctor
CID 743433 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "authorize_handler_registry" is not initialized in this constructor nor in any functions that it calls.
Sage Weil [Fri, 16 Nov 2012 22:11:05 +0000 (14:11 -0800)]
osdc/ObjectCacher: faux use-after-free
CID 743435 (#1 of 1): Use after free (USE_AFTER_FREE)
At (68): Passing freed pointer "rd" as an argument to function "std::basic_ostream<char, std::char_traits<char> >::operator <<(void const *)".
Josh Durgin [Tue, 13 Nov 2012 18:28:32 +0000 (10:28 -0800)]
test: add ObjectCacher stress test that does not use a cluster
Use a fake writeback handler and respond to all requests with -ENOENT.
This tests that all operations will complete, and the cache doesn't
lose waiters or callbacks.
Gary Lowell [Fri, 16 Nov 2012 08:46:41 +0000 (00:46 -0800)]
build: update for boost_thread library.
There is a difference in naming conventions between debian and
rpm based distributions for this library. In configure.ac we
check first for boost_thread-mt, then if it's not found check
for boost_thread. A side effect of the AC_CHECK_LIB macro is
to add the library to the $LIBS, so the explicit -llibboost_thread
in the Makefile has been removed.
(cherry picked from commit f0c7bb363000037bbf7d58ac6e2d39d0f10200fe)
Gary Lowell [Fri, 16 Nov 2012 08:46:41 +0000 (00:46 -0800)]
build: update for boost_thread library.
There is a difference in naming conventions between debian and
rpm based distributions for this library. In configure.ac we
check first for boost_thread-mt, then if it's not found check
for boost_thread. A side effect of the AC_CHECK_LIB macro is
to add the library to the $LIBS, so the explicit -llibboost_thread
in the Makefile has been removed.
Sage Weil [Fri, 16 Nov 2012 00:50:39 +0000 (16:50 -0800)]
os/FileStore: only try BTRFS_IOC_SUBVOL_CREATE on btrfs
Only try to create a btrfs subvolume if the fs is btrfs. Otherwise, just
create a directory. Then we can error out on *any* ioctl error, and not
rely on the ioctl error code to determine whether we failed because we
are on a non-btrfs filesystem or because of a real error.
Fixes: #3052
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
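One way to decide up front whether a path is on btrfs is to compare the statfs(2) filesystem type against the btrfs superblock magic. This is a Linux-only sketch under that assumption, not the FileStore code; the helper name is_btrfs is hypothetical.

```cpp
#include <sys/vfs.h>

// Hypothetical helper: report whether path lives on a btrfs filesystem.
// Returns false if statfs() fails, for example when the path is missing.
inline bool is_btrfs(const char* path) {
  struct statfs st;
  if (::statfs(path, &st) < 0)
    return false;
  // 0x9123683E is the btrfs superblock magic (BTRFS_SUPER_MAGIC).
  return static_cast<unsigned long>(st.f_type) == 0x9123683EUL;
}
```

A caller would attempt the subvolume ioctl only when this returns true and fall back to creating a plain directory otherwise, so any ioctl failure can be treated as a real error.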
Alex Elder [Thu, 15 Nov 2012 23:51:34 +0000 (17:51 -0600)]
run_xfstests.sh: activate more tests that now work
I've gone through the set of xfstests that were previously found to
not work. Some of those now do work, and with the addition of an
option to pass to "mkfs.xfs", a large number of other tests now
produce expected output as well.
This patch updates the default list of tests to run to reflect
the result of this exercise. The following 50 additional tests
are now run by default:
Sage Weil [Thu, 15 Nov 2012 01:00:57 +0000 (17:00 -0800)]
mon: calculate failed_since relative to message receive time
Instead of looking at the current time we process the message, look at the
receive time. This gives us a more accurate failure time given that messages
may be requeued.
It doesn't solve the problem when messages are forwarded between monitors
due to an election, but that's ok; this is still a net improvement.
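The calculation can be illustrated with plain integer seconds. The FailureReport struct and its field names are hypothetical and only mirror the idea described above, not the monitor's message types.

```cpp
// Hypothetical, simplified failure report; times are in whole seconds.
struct FailureReport {
  long recv_stamp;  // when the message was received
  long failed_for;  // how long the reporter says the peer has been down
};

// Anchor the failure time to the receive stamp, so time the message spends
// queued or requeued before processing does not shift the result.
inline long failed_since(const FailureReport& m) {
  return m.recv_stamp - m.failed_for;
}
```

A report received at t=1000 claiming 30 seconds of failure yields 970 whether it is processed at t=1000 or requeued and processed at t=1060.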
Yehuda Sadeh [Wed, 14 Nov 2012 19:30:34 +0000 (11:30 -0800)]
rgw: relax date format check
Don't try to parse beyond the GMT or UTC. Some clients use
special date formatting. If we end up misparsing the date,
authorization will fail anyway, so we don't need to be too
restrictive.
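A sketch of the relaxed check, assuming an RFC 1123 style date and POSIX strptime(): parse the date and time, require that the remainder begins with GMT or UTC, and ignore anything after that. The function names are hypothetical and this is not the rgw parser.

```cpp
#include <cctype>
#include <cstring>
#include <time.h>

// Accept the remainder if, after optional whitespace, it starts with
// "GMT" or "UTC". Anything following the zone name is deliberately ignored.
inline bool starts_with_gmt_or_utc(const char* s) {
  if (!s || !*s)
    return false;
  while (std::isspace(static_cast<unsigned char>(*s)))
    ++s;
  return std::strncmp(s, "GMT", 3) == 0 || std::strncmp(s, "UTC", 3) == 0;
}

// Hypothetical helper: parse "Wed, 14 Nov 2012 19:30:34 GMT" style dates.
inline bool parse_http_date(const char* s, struct tm* out) {
  std::memset(out, 0, sizeof(*out));
  const char* rest = strptime(s, "%a, %d %b %Y %H:%M:%S", out);
  return rest != NULL && starts_with_gmt_or_utc(rest);
}
```

Day and month names are matched according to the current locale; the sketch assumes the default "C" locale.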