Sage Weil [Sun, 23 Jun 2013 15:52:46 +0000 (08:52 -0700)]
mon: do not leak no_reply messages
I think I assumed no_reply() was releasing the references, but it is
not. Which is better, since send_reply() doesn't either. Fix the leaks
by dropping the message ref explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Thu, 20 Jun 2013 18:11:50 +0000 (11:11 -0700)]
mon: make 'log ...' command wait for commit before reply
Previously we would just dump the command argument to our local log client
and reply immediately, which could lose the message if we then restarted.
Instead, commit directly and wait before replying.
Also, log as the actual client, not as the monitor processing the message.
Fixes: #5409
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Samuel Just [Thu, 20 Jun 2013 01:57:05 +0000 (18:57 -0700)]
FileStore: apply changes after disabling m_filestore_replica_fadvise
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit ed8b0e65bde14d0a3a08bc233dee6a997e379dcc)
ceph-disk: make list_partition behave with unusual device names
When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda. Use /sys/block/$device/$partition existence
instead.
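A minimal sketch of the sysfs check described above, not the actual ceph-disk code. The function names and the `sys_block` parameter are hypothetical; the parameter exists only so the logic can be exercised against a temporary directory instead of the real /sys/block.

```python
import os


def is_partition_of(name, base, sys_block="/sys/block"):
    """Return True if `name` is a partition of block device `base`.

    A partition appears as a subdirectory of its parent device in sysfs,
    e.g. /sys/block/sda/sda1. A name-prefix test alone would wrongly treat
    the whole disk 'sdaa' as a partition of 'sda'.
    """
    return os.path.isdir(os.path.join(sys_block, base, name))


def list_partitions(base, sys_block="/sys/block"):
    """List partitions of `base` by looking inside its sysfs directory."""
    dev_dir = os.path.join(sys_block, base)
    return sorted(
        name
        for name in os.listdir(dev_dir)
        if name.startswith(base) and is_partition_of(name, base, sys_block)
    )
```

With this check, `sdaa` is never reported under `sda`, because /sys/block/sda/sdaa does not exist.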
mon: Monitor: make sure we backup a monmap during sync start
First of all, we must find a monmap to back up: the newest version.
Secondly, we must make sure we back it up before clearing the store.
Finally, we must make sure that we don't remove said backup while
clearing the store; otherwise, we would be out of a backup monmap if the
sync happened to fail (and if the monitor happened to be killed before a
new sync had finished).
This patch makes sure these conditions are met.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
mon: Monitor: obtain latest monmap on sync store init
Always use the highest version amongst all the typically available
monmaps: whatever we have in memory, whatever we have under the
MonmapMonitor's store, and whatever we have backed up from a previous
sync. This ensures we always use the newest version we have come
across.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
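The selection rule in the commit above (take the highest version among the in-memory map, the MonmapMonitor's store, and the sync backup) can be sketched as follows. This is an illustration in Python, not the monitor's C++ code; the `MonMap` tuple and `latest_monmap` are hypothetical names, and a missing source is modelled as `None`.

```python
from collections import namedtuple

# Hypothetical stand-in for a monmap; only the version matters here.
MonMap = namedtuple("MonMap", ["version", "source"])


def latest_monmap(candidates):
    """Return the candidate with the highest version, or None if none exist."""
    available = [m for m in candidates if m is not None]
    if not available:
        return None
    return max(available, key=lambda m: m.version)
```

For example, `latest_monmap([in_memory, from_store, from_backup])` returns whichever of the three carries the largest version, skipping sources that are unavailable.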
mon: Monitor: don't remove 'mon_sync' when clearing the store during abort
Otherwise, we will end up losing the monmap we backed up when we started
the sync, and the monitor may be unable to start if it is killed or
crashes in-between the sync abort and finishing a new sync.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 18 Jun 2013 03:32:15 +0000 (20:32 -0700)]
common/Preforker: fix warning
common/Preforker.h: In member function ‘int Preforker::signal_exit(int)’:
warning: common/Preforker.h:82:45: ignoring return value of ‘ssize_t safe_write(int, const void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
This is harder than it should be to fix. :(
http://stackoverflow.com/questions/3614691/casting-to-void-doesnt-remove-warn-unused-result-error
Whatever, I guess we can do something useful with this return value.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Tue, 18 Jun 2013 03:28:24 +0000 (20:28 -0700)]
client: fix warning
client/Client.cc: In member function 'virtual void Client::ms_handle_remote_reset(Connection*)':
warning: client/Client.cc:7892:9: enumeration value 'STATE_NEW' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_OPEN' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_CLOSED' not handled in switch [-Wswitch]
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Sun, 16 Jun 2013 03:06:33 +0000 (20:06 -0700)]
ceph-disk: do not stop activate-all on first failure
Keep going even if we hit one activation error. This avoids failing to
start some disks when only one of them won't start (e.g., because it
doesn't belong to the current cluster).
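The keep-going behaviour can be sketched like this. It is an illustration only, not the ceph-disk implementation: `activate_all` here takes a hypothetical `activate` callback and reports failures at the end instead of aborting on the first one.

```python
def activate_all(devices, activate):
    """Try to activate every device; return a list of (device, error) pairs.

    `activate` is a caller-supplied function (hypothetical here) that raises
    an exception when a device cannot be activated.
    """
    failures = []
    for dev in devices:
        try:
            activate(dev)
        except Exception as e:
            # Record the error and continue with the remaining devices.
            failures.append((dev, str(e)))
    return failures
```

A caller can then exit non-zero if the returned list is non-empty, while still having started every disk that could be started.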
Yehuda Sadeh [Fri, 14 Jun 2013 21:53:54 +0000 (14:53 -0700)]
rgw: escape prefix correctly when listing objects
Fixes: #5362
When listing objects, the prefix needs to be escaped correctly (the
same as with the marker). Otherwise, listing objects with a prefix
that starts with an underscore doesn't work.
Backport: bobtail, cuttlefish
Sage Weil [Fri, 14 Jun 2013 22:01:14 +0000 (15:01 -0700)]
ceph.spec: install/uninstall init script
This was commented out almost years ago in commit 9baf5ef4 but it is not
clear to me that it was correct to do so. In any case, we are not
installing the rc.d links for ceph, which means it does not start up after
a reboot.
Sage Weil [Fri, 14 Jun 2013 20:34:40 +0000 (13:34 -0700)]
ceph-disk: add 'activate-all'
Scan /dev/disk/by-parttypeuuid for ceph OSDs and activate them all. This
is useful when activation was not triggered by the initial udev event for
some reason.
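A rough sketch of the scan, under two assumptions that are not stated in the commit message: that entries in the directory are named `<type-uuid>.<partition-uuid>`, and that OSD data partitions carry the GPT type GUID shown below (the value commonly documented for Ceph OSD data). The function name and the `directory` parameter are hypothetical.

```python
import os

# Assumed GPT partition type GUID for Ceph OSD data partitions.
OSD_UUID = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"


def find_osd_partitions(directory="/dev/disk/by-parttypeuuid"):
    """Return paths of entries whose type UUID marks them as OSD data.

    Assumes entries are named '<type-uuid>.<partition-uuid>'.
    """
    if not os.path.isdir(directory):
        return []
    found = []
    for name in sorted(os.listdir(directory)):
        type_uuid, sep, _part_uuid = name.partition(".")
        if sep and type_uuid.lower() == OSD_UUID:
            found.append(os.path.join(directory, name))
    return found
```

Each returned path could then be handed to the normal activation step.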
Sage Weil [Mon, 17 Jun 2013 03:13:51 +0000 (20:13 -0700)]
mon: make mark_me_down asserts match check
The OSD may have sent a request where the message source does not match
the target in the message. Verify that the target matches so that the
check agrees with the assert.
Sage Weil [Sun, 16 Jun 2013 20:36:19 +0000 (13:36 -0700)]
ceph: do not print status to output file when talking to old mons
The old cli would send the status message to stdout instead of stderr;
we try to emulate that behavior when talking to old monitors because
they send some useful data to outs instead of the data payload.
However, when outputting to a *file*, the outs would still go to
stdout. Maintain that so that, e.g.,
ceph mon getmap -o /tmp/foo
doesn't prefix the monmap with 'got latest monmap\n'.
Sage Weil [Sat, 15 Jun 2013 15:14:40 +0000 (08:14 -0700)]
common/Preforker: fix broken recursion on exit(3)
If we exit via preforker, call exit(3) and not recursively back into
Preforker::exit(r). Otherwise you get a hang with the child blocked
at:
Thread 1 (Thread 0x7fa08962e7c0 (LWP 5419)):
#0 0x000000309860e0cd in write () from /lib64/libpthread.so.0
#1 0x00000000005cc906 in Preforker::exit(int) ()
#2 0x00000000005c8dfb in main ()
and the parent at
#0 0x000000309860eba7 in waitpid () from /lib64/libpthread.so.0
#1 0x00000000005cc87a in Preforker::parent_wait() ()
#2 0x00000000005c75ae in main ()
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 15 Jun 2013 00:30:44 +0000 (17:30 -0700)]
ceph: add newline when using old monitors
The old tool would print a newline after outs, e.g. from 'ceph osd create'.
Do the same when we are talking to old monitors. Also, put outs at the
top, not the bottom!
Tweak the json code to not add the newline again if we already did so
above.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Fri, 14 Jun 2013 04:56:23 +0000 (21:56 -0700)]
ceph-disk: do not use mount --move (or --bind)
The kernel does not let you mount --move when the parent mount is
shared (see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=917008
for another person who was also confused by this). We can't use --bind either
since that (on RHEL at least) screws up /etc/mtab so that the final
result looks like
osd.0: debug_ms=1/1
osd.1: debug_ms=1/1
osd.2: Problem getting command descriptions from ('osd', '2'), ENXIO
osd.3: Problem getting command descriptions from ('osd', '3'), ENXIO
osd.4: Problem getting command descriptions from ('osd', '4'), ENXIO
osd.5: Problem getting command descriptions from ('osd', '5'), ENXIO
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Fri, 14 Jun 2013 18:21:25 +0000 (11:21 -0700)]
upstart: start ceph-all on runlevel [2345]
Starting when only one network interface has started breaks machines with
multiple nics in very problematic ways.
There may be an earlier trigger that we can use for cases where other
services on the local machine depend on ceph, but for now this is better
than the existing behavior.
Sage Weil [Thu, 13 Jun 2013 22:54:58 +0000 (15:54 -0700)]
ceph-disk: implement 'activate-journal'
Activate an osd via its journal device. udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal. That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.
Similarly, it may be that they are on different disks that are hotplugged
with the journal second.
This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.
Sage Weil [Wed, 12 Jun 2013 01:35:01 +0000 (18:35 -0700)]
ceph-disk: call partprobe outside of the prepare lock; drop udevadm settle
After we change the final partition type, sgdisk may or may not trigger a
udev event, depending on how well udev is behaving (it varies between
distros, it seems). The old code would often settle and wait for udev to
activate the device, and then partprobe would uselessly fail because it
was already mounted.
Call partprobe only at the very end, after prepare is done. This ensures
that if partprobe calls udevadm settle (which it sometimes does) we do not
get stuck.
Drop the udevadm settle. I'm not sure what this accomplishes; take it out,
at least until we determine we need it.
Sage Weil [Fri, 14 Jun 2013 00:38:02 +0000 (17:38 -0700)]
librados: add missing #include
librados/librados.cc: In function 'int rados_mon_command_target(void*, const char*, const char**, size_t, const char*, size_t, char**, size_t*, char**, size_t*)':
error: librados/librados.cc:1877: 'LONG_MAX' was not declared in this scope
error: librados/librados.cc:1877: 'LONG_MIN' was not declared in this scope
Sage Weil [Thu, 13 Jun 2013 23:39:30 +0000 (16:39 -0700)]
librados: wait for osdmap for commands that need it
In commit 7e1cf87b5158c870e2a118ed6d316be8cb9818ce we stopped waiting for
the osdmap on start because the Objecter will normally wait, but for some
commands we assume the osdmap is recent(ish).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Thu, 13 Jun 2013 18:27:49 +0000 (11:27 -0700)]
mon/MonmapMonitor: remove unused label
mon/MonmapMonitor.cc: In member function 'bool MonmapMonitor::preprocess_command(MMonCommand*)':
mon/MonmapMonitor.cc:273:2: warning: label 'out' defined but not used [-Wunused-label]