Tommi Virtanen [Tue, 29 Mar 2011 16:21:09 +0000 (09:21 -0700)]
common: Make armor.h safe to use from C.
mount.ceph needs to base64-decode the secrets, so we can get rid of
the kernel-side base64 decode, but it doesn't need all of common lib.
And it is written in C.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
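A common way to make a header usable from both C and C++ is to wrap its declarations in an extern "C" guard. The sketch below illustrates that pattern only; whether this commit used exactly this approach is an assumption, and example_unarmor and b64_value are hypothetical names, not the declarations in armor.h.

```c
#include <stddef.h>

/* Guard pattern: C++ compilers see extern "C", C compilers see plain
 * declarations. example_unarmor is a hypothetical name. */
#ifdef __cplusplus
extern "C" {
#endif

/* Decode base64 from [src, src_end) into [dst, dst_end).
 * Returns the number of bytes written, or -1 on error. */
int example_unarmor(char *dst, const char *dst_end,
                    const char *src, const char *src_end);

#ifdef __cplusplus
}
#endif

/* Map one base64 character to its 6-bit value, or -1 if invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

int example_unarmor(char *dst, const char *dst_end,
                    const char *src, const char *src_end)
{
    char *out = dst;
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of valid bits in acc */

    /* Stop at the end of input or at the first padding character. */
    for (; src < src_end && *src != '='; src++) {
        int v = b64_value(*src);
        if (v < 0)
            return -1;              /* invalid input character */
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (out >= dst_end)
                return -1;          /* output buffer too small */
            *out++ = (char)((acc >> bits) & 0xff);
        }
    }
    return (int)(out - dst);
}
```

Because the code is plain C with no dependency on a larger library, a C program such as a mount helper could compile it in directly.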
Tommi Virtanen [Tue, 29 Mar 2011 00:32:24 +0000 (17:32 -0700)]
mount.ceph: Modprobe ceph before trying the mount.
This will be needed for the next few commits, where we try to load the
keys into the kernel; without ceph.ko loaded, the key type will not be
recognized.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Mon, 28 Mar 2011 22:45:45 +0000 (15:45 -0700)]
vstart.sh: Filter out IPv6 and localhost IP addresses.
On e.g. Ubuntu 10.10, hostname --ip-address outputs something
like "::1 10.1.2.3 127.0.1.1", and this makes the generated
config invalid. Get rid of the entries we can't use.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
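The filtering described above can be sketched in C, although vstart.sh itself is a shell script. This is an illustration under assumptions: first_usable_ip is a hypothetical helper, "IPv6" is detected simply by the presence of a colon, and "localhost" means any token starting with "127.". It keeps only the first usable entry.

```c
#include <string.h>

/* Copy the first whitespace-separated token of `list` that is neither
 * IPv6 (contains ':') nor loopback (starts with "127.") into `out`.
 * Returns 1 if such a token was found and fits, 0 otherwise.
 * Hypothetical helper, not part of vstart.sh. */
int first_usable_ip(const char *list, char *out, size_t out_len)
{
    const char *p = list;

    while (*p) {
        size_t len;

        p += strspn(p, " \t\n");        /* skip separators */
        len = strcspn(p, " \t\n");      /* length of next token */
        if (len == 0)
            break;

        if (memchr(p, ':', len) == NULL &&
            strncmp(p, "127.", 4) != 0 &&
            len < out_len) {
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        p += len;
    }
    return 0;
}
```

With the input "::1 10.1.2.3 127.0.1.1" from the message above, the helper copies "10.1.2.3" into the output buffer.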
Since NULL may be just a macro defined as 0, we must use
(char*)NULL or similar to force the compiler to pass a true pointer value
as the last argument to the run_cmd varargs function. Otherwise, the 0
is passed as an int, which is probably not the same size as a
pointer these days (32 vs. 64 bits).
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
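A minimal sketch of the sentinel issue, assuming a varargs function that reads char* arguments until it sees a null pointer. count_args is a hypothetical stand-in for run_cmd, whose real signature is not shown in this log.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the command name plus every argument up to the terminating
 * null pointer. Each variadic argument is read as a char *, so the
 * caller must pass a real pointer as the sentinel. */
int count_args(char *cmd, ...)
{
    va_list ap;
    int n = 1;                      /* cmd itself */

    (void)cmd;
    va_start(ap, cmd);
    while (va_arg(ap, char *) != NULL)
        n++;
    va_end(ap);
    return n;
}
```

A call such as count_args("modprobe", "ceph", (char *)NULL) returns 2. Passing a bare 0 as the last argument would hand va_arg an int where it expects a pointer, which is undefined behavior and can read garbage on platforms where the two differ in size.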
Sage Weil [Fri, 25 Mar 2011 19:37:48 +0000 (12:37 -0700)]
journaler: remove ack/safe distinction
Rip out old complexity to _only_ pay attention to when data is safely
committed on disk. No more ack/safe distinction or ack_barrier complexity
(to preserve ordering with some submissions waiting on ack and some on safe).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 25 Mar 2011 16:51:53 +0000 (09:51 -0700)]
journaler: issue separate reads per period
This lets us potentially digest any read data as soon as possible.
Previously, the Filer would issue a string of reads and we'd only get the
data back once _all_ those objects were read.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 25 Mar 2011 16:30:16 +0000 (09:30 -0700)]
journaler: make readahead/prefetch smarter
Always try to prefetch N segments ahead of the current read position. The
old implementation would read a bunch of data, process it all, then read
a bunch more. This was suboptimal on a couple different levels.
Also, make an internal _is_readable() _not_ do the prefetch step; only do
that for external callers.
Fixes: #929
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
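The "prefetch N segments ahead" idea can be sketched as a small offset calculation. This is a hypothetical illustration, not the Journaler code: prefetch_target, its parameters, and the choice to round up to a period boundary are all assumptions.

```c
#include <stdint.h>

/* Return the offset up to which data should have been requested so that
 * n_ahead periods beyond read_pos are in flight. The result never
 * exceeds write_pos and never moves behind requested_pos.
 * `period` must be non-zero. Hypothetical helper. */
uint64_t prefetch_target(uint64_t read_pos, uint64_t requested_pos,
                         uint64_t write_pos, uint64_t period,
                         unsigned n_ahead)
{
    uint64_t want = read_pos + (uint64_t)n_ahead * period;

    /* Round up so that each read ends on a period boundary. */
    if (want % period)
        want += period - want % period;

    /* There is nothing to read past the end of the journal. */
    if (want > write_pos)
        want = write_pos;

    return want > requested_pos ? want : requested_pos;
}
```

A caller would evaluate this each time the read position advances and issue one read per period between requested_pos and the returned offset, rather than waiting for a whole batch to be consumed first.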
Sage Weil [Thu, 24 Mar 2011 03:58:03 +0000 (20:58 -0700)]
mds: remove mds_log_unsafe mode
The mds_log_unsafe mode would wait for ack for some journal writes, and
safe for others. Now that we can reply to client requests without waiting
for the journal to flush (as of ~2 years ago), this distinction is no
longer useful. It is also more error-prone, as it complicates the code
and vastly expands the possible combinations of MDS failures and replay
scenarios we need to verify.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 24 Mar 2011 00:17:44 +0000 (17:17 -0700)]
mds: reimplement laggy
The goal is for the MDS to stop processing requests when it hasn't heard
from the monitors, to avoid a situation where a rogue process goes off
doing its own thing. Yes, if we fail it over, the cmds can no longer write
to the object store, but it can still reply to clients when that may not
be appropriate.
The old logic was fragile and wonky, with messages getting deferred, and
then re-deferred. This implementation is much cleaner and should be much
more efficient and less fragile. There are still improvements to be made
as far as which messages we do/do not process when we think we're laggy.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 24 Mar 2011 03:37:04 +0000 (20:37 -0700)]
mds: skip redundant flush before journal segment trim
Back in olden times when we would wait for acks for some journal
writes, we did an extra wait_for_safe() before discarding a journal segment
to make sure anything being discarded was safely committed in newer
segments. These days mds_log_unsafe is always false (and
journaler_safe is true), so we can skip this check.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 22 Mar 2011 04:38:36 +0000 (21:38 -0700)]
osd: factor pg get-or-create code into common helper
handle_pg_notify and _process_pg_info both lookup or create a PG based
on an incoming message. Factor that code into a common helper. There
were a few differences in that the pg notify handler code deals with
more cases (namely, pg creation), but this is harmless for the more
general _process_pg_info caller.
Closes: #577
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Samuel Just [Tue, 22 Mar 2011 21:52:15 +0000 (14:52 -0700)]
FileStore: replace op_queue_throttle with op_queue_reserve_throttle
Previously, queue_op would call op_queue_throttle while holding the
journal_lock. op_queue_throttle, however, can sleep.
We fix the problem as follows:
1) build_op is factored out of queue_op.
2) op_queue_throttle is now op_queue_reserve_throttle and takes an op as
an argument. op_queue_reserve_throttle can be called before the journal
lock is taken. This also avoids the race between calling throttle and
incrementing op_queue_bytes and op_queue_len.
3) queue_op now takes the op generated using build_op as an argument.
4) _journaled_ahead no longer needs to call throttle as
queue_transactions has already reserved space.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
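The ordering described in the steps above (reserve throttle space first, then take the journal lock) can be sketched with POSIX threads. This is a simplified model under assumptions, not the FileStore code: struct op, struct throttle, throttle_reserve, throttle_release and submit are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

struct op { size_t bytes; };

struct throttle {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    size_t max_ops, max_bytes;
    size_t cur_ops, cur_bytes;
};

void throttle_init(struct throttle *t, size_t max_ops, size_t max_bytes)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->max_ops = max_ops;
    t->max_bytes = max_bytes;
    t->cur_ops = 0;
    t->cur_bytes = 0;
}

/* Wait for room and account for the op in the same critical section,
 * so there is no window between checking and incrementing. An op is
 * always admitted when the queue is empty, to avoid blocking forever
 * on an op larger than max_bytes. */
void throttle_reserve(struct throttle *t, const struct op *o)
{
    pthread_mutex_lock(&t->lock);
    while (t->cur_ops > 0 &&
           (t->cur_ops + 1 > t->max_ops ||
            t->cur_bytes + o->bytes > t->max_bytes))
        pthread_cond_wait(&t->cond, &t->lock);
    t->cur_ops++;
    t->cur_bytes += o->bytes;
    pthread_mutex_unlock(&t->lock);
}

/* Give back the space held by a completed op and wake any waiters. */
void throttle_release(struct throttle *t, const struct op *o)
{
    pthread_mutex_lock(&t->lock);
    t->cur_ops--;
    t->cur_bytes -= o->bytes;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Reserve before locking: the possible sleep happens while the journal
 * lock is not held. The counter stands in for the real op queue. */
void submit(struct throttle *t, pthread_mutex_t *journal_lock,
            const struct op *o, int *queued)
{
    throttle_reserve(t, o);
    pthread_mutex_lock(journal_lock);
    (*queued)++;
    pthread_mutex_unlock(journal_lock);
}
```

Since the reservation already covers the op, a later stage that runs after journaling would not need to throttle again, which mirrors point 4 above.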