Sage Weil [Mon, 15 Feb 2010 21:47:41 +0000 (13:47 -0800)]
mds: infer 'follows' in journal_dirty_inode on non-head inodes
There are lots of callers to journal_dirty_inode that may
unwittingly be dealing with a non-head inode (e.g.
check_file_max). If the provided inode is snapped, infer an
appropriate 'follows' value so as not to cow_inode() again.
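A minimal sketch of the inference, assuming an inode version covers a snapid range [first, last], that the head inode has last == NOSNAP, and that a copy-on-write happens only when follows >= first. NOSNAP, Inode, infer_follows and needs_cow are hypothetical names for illustration, not the MDS code:

```python
NOSNAP = float("inf")  # hypothetical stand-in for CEPH_NOSNAP ("head")


class Inode:
    def __init__(self, first, last=NOSNAP):
        self.first = first  # first snapid this inode version covers
        self.last = last    # last snapid covered; NOSNAP for the head


def infer_follows(inode, follows=NOSNAP):
    """Pick a 'follows' that will not trigger another cow on a snapped inode.

    Assumption: a cow is needed only when follows >= inode.first, so for a
    non-head inode, first - 1 is the largest value that avoids a second cow.
    """
    if follows == NOSNAP and inode.last != NOSNAP:
        return inode.first - 1
    return follows  # head inode, or caller passed an explicit value


def needs_cow(inode, follows):
    # Hypothetical cow test matching the assumption above.
    return follows >= inode.first
```

A caller such as a check_file_max-style path can then pass whatever inode it holds without knowing whether it is the head.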
Sage Weil [Fri, 12 Feb 2010 22:45:02 +0000 (14:45 -0800)]
osd: fix recovery requeue race
If a recovery op finished right as another recovery op was
being started, we could get into start_recovery_ops() and get
max = 0 and not start anything. Since the PG wasn't being
requeued for later, it would never recover. So, requeue if we
race and get max == 0.
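A minimal sketch of the fix, assuming a simple global cap on in-flight recovery ops. RecoveryScheduler, max_active and the queue are hypothetical names, not the OSD's actual structures; the point is that losing the race (budget == 0) requeues the PG instead of dropping it:

```python
from collections import deque


class RecoveryScheduler:
    """Hypothetical model of an OSD recovery throttle."""

    def __init__(self, max_active):
        self.max_active = max_active  # cap on concurrent recovery ops
        self.active = 0               # recovery ops currently in flight
        self.queue = deque()          # PGs waiting for a recovery slot

    def start_recovery_ops(self, pg, wanted):
        """Start up to `wanted` ops for `pg`; return how many were started."""
        budget = self.max_active - self.active
        n = min(wanted, budget)
        if n <= 0:
            # Raced with another starter and got max == 0: requeue the PG so
            # it is retried later rather than never recovering.
            self.queue.append(pg)
            return 0
        self.active += n
        return n

    def finish_recovery_op(self):
        """Release a slot; return the next PG to retry, if any."""
        self.active -= 1
        return self.queue.popleft() if self.queue else None
```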
that look a bit like multiple procs were racing into
join_reader(). Add an assert to catch that if it happens again,
and also wrap thread starts in pipe_lock to ensure we keep the
_running flags in sync with reality. Add in a few other
sanity checks too.
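A minimal sketch of the locking pattern, assuming a pipe with a single reader thread and a running flag. Pipe, pipe_lock, reader_running and join_reader mirror the names above, but the class itself is hypothetical, and the explicit RuntimeError stands in for the assert:

```python
import threading


class Pipe:
    """Hypothetical model: reader thread start/join guarded by pipe_lock."""

    def __init__(self):
        self.pipe_lock = threading.Lock()
        self.reader_running = False
        self.reader_thread = None

    def _reader_loop(self):
        pass  # a real pipe would read messages here

    def start_reader(self):
        # Set the flag and start the thread under the same lock, so no other
        # caller can observe a _running flag that disagrees with reality.
        with self.pipe_lock:
            if self.reader_running:
                raise RuntimeError("reader already running")
            self.reader_running = True
            self.reader_thread = threading.Thread(target=self._reader_loop)
            self.reader_thread.start()

    def join_reader(self):
        with self.pipe_lock:
            # Sanity check: catch a second caller racing into join_reader().
            if self.reader_thread is None:
                raise RuntimeError("join_reader() with no reader to join")
            thread, self.reader_thread = self.reader_thread, None
        thread.join()  # join outside the lock so the reader is never blocked
        with self.pipe_lock:
            self.reader_running = False
```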
Sage Weil [Fri, 12 Feb 2010 21:35:57 +0000 (13:35 -0800)]
mon: note mds beacon times more carefully
We need to update the beacon timestamp even when we are updating
the mds state. Otherwise we can get caught in a busy loop
between marking an mds laggy and !laggy because the beacon stamp
never updates.
So even when we are updating and the reply will be slow, update
our timestamp so that we don't mark the mds laggy.
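A minimal sketch of the ordering, assuming a fixed grace period and a tick that marks any mds laggy once its last beacon is too old. MDSMonitor, grace and pending_updates are hypothetical names; the key line records the beacon stamp before (and regardless of) any state update:

```python
class MDSMonitor:
    """Hypothetical sketch of mds beacon handling."""

    def __init__(self, grace):
        self.grace = grace         # seconds without a beacon before 'laggy'
        self.last_beacon = {}      # mds name -> time of last beacon seen
        self.laggy = set()
        self.pending_updates = []  # state changes awaiting a slow commit

    def handle_beacon(self, mds, state, now):
        # Stamp the beacon first, unconditionally, even if this beacon also
        # triggers a state update whose reply will be delayed.
        self.last_beacon[mds] = now
        self.laggy.discard(mds)
        if state is not None:
            self.pending_updates.append((mds, state))

    def tick(self, now):
        for mds, stamp in self.last_beacon.items():
            if now - stamp > self.grace:
                self.laggy.add(mds)
```

If the stamp were only updated on the non-update path, every beacon during a pending update would be ignored and tick() would keep flipping the mds back to laggy.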
Sage Weil [Fri, 12 Feb 2010 21:27:49 +0000 (13:27 -0800)]
osd: bail out of interval loop completely
We're going backwards, so once this test fails, it always fails,
and we can break instead of continue. Any skipped intervals will
be pruned shortly anyway.
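A minimal sketch of the loop shape, assuming past intervals are (first, last) epoch pairs sorted oldest first and that an interval is skipped once it ends before last_epoch_started. relevant_intervals is a hypothetical helper, not the OSD function:

```python
def relevant_intervals(past_intervals, last_epoch_started):
    """Collect prior intervals that may matter, newest first.

    Walking backwards, the `last` epochs only decrease, so once one interval
    ends before last_epoch_started every older one does too: break, don't
    continue.
    """
    result = []
    for first, last in reversed(past_intervals):
        if last < last_epoch_started:
            break  # this and all older intervals fail the same test
        result.append((first, last))
    return result
```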
Sage Weil [Fri, 12 Feb 2010 21:26:19 +0000 (13:26 -0800)]
osd: always update up_thru if pg changes before going active
We already required this if prior PG members were down, so this
affected the 'failure' case. We now also require it for
non-failure PG changes (expansion, migration).
This fixes our maybe_went_rw calculation for prior PG intervals,
which is based on up_thru. If maybe_went_rw is false when the
pg actually went rw, we can lose (and have lost) data. But it is
not practical to calculate without up_thru being consistently
updated, because determining whether a pg would have been able to
go active depends on knowing last_epoch_started at a previous
point in time, which then determines how many prior intervals
may have been considered, which in turn determines whether
up_thru would have been updated, etc. Much simpler to update it
all the time.
This should not impose a significantly greater cost, since we
already need it for the failure case. And in general the
migration/expansion/whatever case is no more common or critical
than the failure case.
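A minimal sketch of the check this makes reliable, assuming a primary must have its up_thru bumped to at least an interval's first epoch before the PG can go active in that interval. Interval and maybe_went_rw are hypothetical stand-ins; the check is only sound if up_thru is always updated, which is what this commit guarantees:

```python
from dataclasses import dataclass


@dataclass
class Interval:
    first: int    # first epoch of the interval
    last: int     # last epoch of the interval
    primary: int  # primary osd id during the interval


def maybe_went_rw(interval, up_thru):
    """Could the PG have gone active (rw) during `interval`?

    `up_thru` maps osd id -> up_thru epoch recorded in the map by the end of
    the interval. If the primary's up_thru is older than interval.first, it
    never got permission to go active, so no writes could have happened.
    """
    return up_thru.get(interval.primary, 0) >= interval.first
```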
Sage Weil [Tue, 9 Feb 2010 18:27:08 +0000 (10:27 -0800)]
init-ceph: Required-start: $remote_fs
This ensures /usr is mounted before ceph daemons start. It seems like
this may be problematic for hosts that act as both servers and clients,
but nfs-kernel-server does the same, so whatev!
Josef Bacik [Tue, 9 Feb 2010 16:24:23 +0000 (08:24 -0800)]
ceph: fix manpages so they are only installed once
While creating a spec file for CEPH, rpmbuild was complaining because make
install was copying the manpages in, and then copying them in again. This is
because man_MANS and dist_man_MANS are supposed to be two separate lists that do
not overlap. So make install would install all the man pages in the man_MANS
list and the dist_man_MANS list. This patch kills the dist_man_MANS thing to
keep this from happening. This made rpmbuild happy, which makes me happy :).
Thanks,
Sage Weil [Tue, 9 Feb 2010 04:29:09 +0000 (20:29 -0800)]
osd: store local osd magic, whoami, and other static bits outside of ObjectStore
These values are immutable, and we also want to look at them prior to
forking and 'mounting' the ObjectStore. Just keep them in separate files
for simplicity.
This avoids the double filestore startup cost paid on cosd startup.
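A minimal sketch of keeping small immutable values in their own files, assuming one file per key under the osd data directory. The file names ("whoami", "magic") and helpers write_meta/read_meta are hypothetical illustrations of the idea, not the actual on-disk layout:

```python
import os
import tempfile


def write_meta(osd_dir, key, value):
    """Store one small immutable value in its own file."""
    path = os.path.join(osd_dir, key)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{value}\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: readers never see a partial file


def read_meta(osd_dir, key):
    """Read a value back without mounting any object store."""
    with open(os.path.join(osd_dir, key)) as f:
        return f.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_meta(d, "whoami", 3)
        print(read_meta(d, "whoami"))
```

Because these reads are plain file reads, they can happen before forking, and the store itself only needs to be mounted once.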
Sage Weil [Mon, 1 Feb 2010 23:44:26 +0000 (15:44 -0800)]
journal: make wrapping simpler
Take out weirdness that tries to keep journal items contiguous. No reason
not to split them across the end/beginning of the journal. In the general
case, this is the same # of seeks because we have to rewrite the header
anyway.
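A minimal sketch of splitting an entry across the end and beginning of a ring buffer. RingJournal is a hypothetical class; it ignores free-space accounting and the header update, both of which a real journal needs:

```python
class RingJournal:
    """Hypothetical ring-buffer journal that splits entries at the wrap."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.write_pos = 0

    def append(self, entry: bytes):
        """Write `entry` at write_pos, wrapping to the front if needed."""
        if len(entry) > len(self.buf):
            raise ValueError("entry larger than journal")
        start = self.write_pos
        first = min(len(entry), len(self.buf) - start)
        self.buf[start:start + first] = entry[:first]   # up to the end
        self.buf[0:len(entry) - first] = entry[first:]  # remainder at front
        self.write_pos = (start + len(entry)) % len(self.buf)
        return start

    def read(self, pos, length):
        """Read `length` bytes at `pos`, reassembling a split entry."""
        end = pos + length
        if end <= len(self.buf):
            return bytes(self.buf[pos:end])
        return bytes(self.buf[pos:]) + bytes(self.buf[:end - len(self.buf)])
```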
Yehuda Sadeh [Tue, 2 Feb 2010 00:03:51 +0000 (16:03 -0800)]
truncate: don't write beyond truncation with old trunc seq
In a scenario where a truncation that followed a write got to
the osd before the preceding write, we shouldn't write beyond
that truncation when the write is handled in the osd.
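A minimal sketch of the clipping rule, assuming each write carries the truncate seq it was issued under and the object records the seq and size of the last truncation it applied. clip_write and its parameters are hypothetical names, not the osd's:

```python
def clip_write(offset, length, op_trunc_seq, obj_trunc_seq, obj_trunc_size):
    """Return the allowed length of a write; 0 means drop it entirely.

    If the object has already applied a newer truncation than the one this
    write was issued against, the write must not extend past that point.
    """
    if op_trunc_seq < obj_trunc_seq:
        if offset >= obj_trunc_size:
            return 0  # entirely beyond the newer truncation
        return min(length, obj_trunc_size - offset)
    return length  # write is current; no clipping
```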