git.apps.os.sepia.ceph.com Git - ceph.git/log

Sage Weil [Mon, 17 Nov 2008 20:57:11 +0000 (12:57 -0800)]

cmonctl: pick new mon on timeout

Sage Weil [Mon, 17 Nov 2008 20:54:53 +0000 (12:54 -0800)]

mon: fix get_latest

Sage Weil [Mon, 17 Nov 2008 22:42:30 +0000 (14:42 -0800)]

msg: non-destructively copy data buffers in set_data()

Sage Weil [Mon, 17 Nov 2008 21:23:03 +0000 (13:23 -0800)]

mds: use last_sent (not last_open) to untangle cap release races

If we use last_open, the client has to be smart about ignoring
MDS revocations after it sends a release request.  (Or, the MDS has
to somehow know the ack is for an old cap.)  Instead, just
serialize release over all cap messages sent to the client.  It may
make for a slightly chattier cap release in some cases, but those
cases should be very rare, and this is simpler.

Sage Weil [Sat, 15 Nov 2008 00:54:32 +0000 (16:54 -0800)]

mds: be more forgiving on EPurgeFinish

Inode may not be in cache because of purge_stray() avoiding
journaling its intent to purge. If that changes down the line,
add the assertion back.

Sage Weil [Mon, 17 Nov 2008 18:47:30 +0000 (10:47 -0800)]

osd: adjust build_prior any_up logic

We mark the pg 'down' unless there is at least one osd alive specifically
from the last epoch started.

Yehuda Sadeh [Mon, 17 Nov 2008 18:38:16 +0000 (10:38 -0800)]

kclient: silence some warnings

Yehuda Sadeh [Mon, 17 Nov 2008 18:37:30 +0000 (10:37 -0800)]

mds: fix an erroneous assertion (sage)

Sage Weil [Mon, 17 Nov 2008 17:15:44 +0000 (09:15 -0800)]

osd todos

Sage Weil [Mon, 17 Nov 2008 17:13:08 +0000 (09:13 -0800)]

osd: fix deadlock on map_lock vs peer_stat_lock

Sage Weil [Mon, 17 Nov 2008 17:03:47 +0000 (09:03 -0800)]

osd: remember past intervals instead of recalculating each time

This _vastly_ improves the speed of build_prior (and thus activate_map).
There is no need to recalculate this information each time as it is fully
dependent on _old_ OSDMaps, not current cluster state.
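
The idea of remembering past intervals can be sketched as below. This is a minimal illustration, not the code from the commit: `Interval`, `PastIntervals`, `extend`, and the `acting_at` callback (a stand-in for loading an old OSDMap) are all hypothetical names. Because old epochs never change, only epochs newer than `computed_through` are ever examined.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified interval: a run of epochs sharing one acting set.
struct Interval {
    int first;
    int last;
    std::vector<int> acting;
};

struct PastIntervals {
    std::map<int, Interval> by_first;  // keyed by the interval's first epoch
    int computed_through = 0;          // newest epoch already folded in
    int computations = 0;              // counts lookups, to show the saving

    // Fold in epochs (computed_through, upto]. `acting_at(e)` is an assumed
    // callback returning the acting set recorded in the map for epoch e.
    template <class F>
    void extend(int upto, F acting_at) {
        for (int e = computed_through + 1; e <= upto; ++e) {
            std::vector<int> a = acting_at(e);
            ++computations;
            if (!by_first.empty() && by_first.rbegin()->second.acting == a)
                by_first.rbegin()->second.last = e;      // same set: grow interval
            else
                by_first.emplace(e, Interval{e, e, a});  // set changed: new interval
        }
        if (upto > computed_through)
            computed_through = upto;
    }
};
```

Calling `extend` again with the same epoch does no work, which is the property the commit message relies on.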

Sage Weil [Mon, 17 Nov 2008 17:01:30 +0000 (09:01 -0800)]

msgr: adjust mark_down locking to avoid possible race

Sage Weil [Mon, 17 Nov 2008 17:01:03 +0000 (09:01 -0800)]

cmonctl: reprobe every second

Sage Weil [Sat, 15 Nov 2008 05:14:13 +0000 (21:14 -0800)]

osd: clear_map_cache at end of activate_map

after we're done with it

Sage Weil [Sat, 15 Nov 2008 04:50:35 +0000 (20:50 -0800)]

osd: introduce map_lock RWLock, take read lock during heartbeat

This prevents a race between handle_osd_map updating the map while
heartbeat() is using it to ping peers.

Currently we take the write lock over the entirety of handle_osd_map; we may
be able to push that down a bit.
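
The reader/writer pattern described here might look roughly like the following. It is a sketch using the standard `std::shared_mutex` (C++17) rather than Ceph's own RWLock class; `MapHolder`, `peers_to_ping`, and `handle_map` are hypothetical names, and only `map_lock` comes from the commit message.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OSD's view of the cluster map.
struct MapHolder {
    std::shared_mutex map_lock;  // name taken from the commit message
    int epoch = 0;
    std::vector<int> up_peers;

    // Heartbeat path: takes the lock shared, so several readers can run
    // at once but none can overlap with a map update.
    std::vector<int> peers_to_ping() {
        std::shared_lock<std::shared_mutex> l(map_lock);
        return up_peers;  // copy out while the read lock is held
    }

    // Map update path: takes the lock exclusively for the whole update.
    void handle_map(int new_epoch, std::vector<int> new_peers) {
        std::unique_lock<std::shared_mutex> l(map_lock);
        assert(new_epoch >= epoch);  // assumed: epochs never go backwards
        epoch = new_epoch;
        up_peers = std::move(new_peers);
    }
};
```

Holding the exclusive lock for the entire update is the simple choice; narrowing it to the point where the map pointer is swapped would reduce the time heartbeats are blocked.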

Sage Weil [Sat, 15 Nov 2008 00:58:37 +0000 (16:58 -0800)]

msgr: small cleanup

Sage Weil [Sat, 15 Nov 2008 00:58:17 +0000 (16:58 -0800)]

lockdep: force backtraces on specific mutexes

Maintaining backtraces is expensive to do for every acquisition. Make a
per-mutex flag so that specific deadlocks can be tracked down.
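
A per-mutex flag of this kind could be sketched as follows. The class `TrackedMutex` and its members are hypothetical, and the string pushed onto `traces_` is only a placeholder for capturing a real stack trace, which is the expensive step the flag is meant to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wrapper; the actual lockdep code is not shown in this log.
class TrackedMutex {
    std::mutex m_;
    bool want_backtrace_;              // the per-mutex flag
    std::vector<std::string> traces_;  // one entry per traced acquisition
public:
    explicit TrackedMutex(bool want_backtrace = false)
        : want_backtrace_(want_backtrace) {}

    void lock(const char* where) {
        m_.lock();
        // Only flagged mutexes pay the cost of recording where they were taken.
        if (want_backtrace_)
            traces_.push_back(where);  // placeholder for a real backtrace
    }

    void unlock() { m_.unlock(); }

    std::size_t recorded() const { return traces_.size(); }
};
```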

Sage Weil [Sat, 15 Nov 2008 00:56:59 +0000 (16:56 -0800)]

osd: maintain a cache of past osd maps during repeering

It's expensive and stupid to load and reparse them for each PG.

Sage Weil [Sat, 15 Nov 2008 00:09:09 +0000 (16:09 -0800)]

osd: pause/unpause recovery thread while processing map

Otherwise bad things happen (everyone assumes *osdmap is static and
readable).

Sage Weil [Sat, 15 Nov 2008 00:31:53 +0000 (16:31 -0800)]

mds: journal updates _after_ predirty_parents (which adds parent context)

This ensures the dirlumps occur in an order that can be replayed
to reconstruct the hierarchy (ancestors first).

Sage Weil [Sat, 15 Nov 2008 00:02:15 +0000 (16:02 -0800)]

mds: remove bad assertion

Inode may still be dirty. bah.

Sage Weil [Fri, 14 Nov 2008 23:52:58 +0000 (15:52 -0800)]

mds: mark inode clean only when purge is complete

Otherwise we confuse CDir dirty vs commit rules.

Sage Weil [Fri, 14 Nov 2008 23:39:00 +0000 (15:39 -0800)]

mds: only mark clean if dirty

Sage Weil [Fri, 14 Nov 2008 23:31:07 +0000 (15:31 -0800)]

mds: adjust purge_stray sequence; include explicit ino destroy

First purge the inode content. Don't bother journaling our intent,
as that's implied by the fact that it's an unused stray.

Once purged, journal an event that destroys the inode and unlinks
the dentry. Don't remove null dentry itself, as we still need to
update the stray dir... it will get removed when that is committed.

Sage Weil [Fri, 14 Nov 2008 23:26:01 +0000 (15:26 -0800)]

mds: avoid unnecessary issue_caps in file_eval

Sage Weil [Fri, 14 Nov 2008 23:25:46 +0000 (15:25 -0800)]

mds: fix placement of eval_stray call on caps release

Sage Weil [Fri, 14 Nov 2008 23:02:34 +0000 (15:02 -0800)]

mds: restructure purge_stray to remove inode objects, _then_ dentry

This ensures that any inode we are purging is referenced in the
hierarchy, since we do not destroy the stray dentry until it is
completely gone.

Sage Weil [Fri, 14 Nov 2008 22:31:24 +0000 (14:31 -0800)]

mds: mark and pin dentries while purging, so they don't get trimmed out from under us

Also avoid purging more than once.

Previously it was possible to drop the dentry from the cache while
it was being purged.

Sage Weil [Fri, 14 Nov 2008 21:48:30 +0000 (13:48 -0800)]

mon: commit large numbers of state values quickly

Write them all, then sync once at the end.

Also include some infrastructure for using the latest stashed value
to recover. Don't use it yet, though. The interaction with
keeping last_committed and latest stashed values in sync wrt a
failure between the two is a bit tricky.

Sage Weil [Fri, 14 Nov 2008 20:42:56 +0000 (12:42 -0800)]

mon: use generic stash mechanism to manage latest version of paxos-managed object

Sage Weil [Fri, 14 Nov 2008 20:08:55 +0000 (12:08 -0800)]

kclient: use generic timeout/retry code for various monitor request types

Sage Weil [Fri, 14 Nov 2008 19:14:30 +0000 (11:14 -0800)]

kclient: pick new mon if statfs is unresponsive; clean up other retry code

Sage Weil [Fri, 14 Nov 2008 01:45:37 +0000 (17:45 -0800)]

streamtest: fix recursive locking

Sage Weil [Fri, 14 Nov 2008 00:48:15 +0000 (16:48 -0800)]

journal: detect size of raw block devices properly

Sage Weil [Fri, 14 Nov 2008 00:37:56 +0000 (16:37 -0800)]

osd: only trim pg log if pg contains complete set of osds

Eventually we may want to also impose some maximum pg log size. At
some point the cost of the long log will approach the cost of
building a backlog...

Sage Weil [Thu, 13 Nov 2008 23:45:36 +0000 (15:45 -0800)]

osdmap: fix type conversions

Sage Weil [Thu, 13 Nov 2008 23:22:24 +0000 (15:22 -0800)]

crush: mention license. minor cleanup

Sage Weil [Thu, 13 Nov 2008 22:38:08 +0000 (14:38 -0800)]

be quiet

Yehuda Sadeh [Thu, 13 Nov 2008 22:56:43 +0000 (14:56 -0800)]

Merge branch 'unstable' of ssh://ceph.newdream.net/git/ceph into unstable

Yehuda Sadeh [Thu, 13 Nov 2008 22:56:00 +0000 (14:56 -0800)]

kclient: fix small resource leak when mds is down on mount

Sage Weil [Thu, 13 Nov 2008 22:36:48 +0000 (14:36 -0800)]

csyn: fix msgr startup

Sage Weil [Thu, 13 Nov 2008 22:31:57 +0000 (14:31 -0800)]

kclient: whitespace

Sage Weil [Thu, 13 Nov 2008 22:26:53 +0000 (14:26 -0800)]

mon: standardize storage of monmap revisions

Sage Weil [Thu, 13 Nov 2008 22:19:53 +0000 (14:19 -0800)]

mon: indicate last_consumed state after writing "latest" full maps

Until then, we may need old incremental states. This way paxos
won't discard them.

Sage Weil [Thu, 13 Nov 2008 22:19:10 +0000 (14:19 -0800)]

paxos: only trim old states if they've been "consumed" by PaxosService

Higher level state may depend on these items. Only remove them
when it's clear they are unneeded.
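
One way to picture the rule is a trim routine that is clamped by a "consumed" watermark. The sketch below is illustrative only: `StateLog`, `trim`, and the use of a `std::map` are assumptions, and only the notion of a last consumed version comes from the commit messages.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical log of versioned states; names are illustrative.
struct StateLog {
    std::map<unsigned, std::string> states;  // version -> encoded state
    unsigned last_consumed = 0;  // newest version the service no longer needs

    // Drop versions older than `keep_from`, but never a version newer than
    // last_consumed, since higher-level state may still depend on it.
    void trim(unsigned keep_from) {
        unsigned limit = std::min(keep_from, last_consumed + 1);
        states.erase(states.begin(), states.lower_bound(limit));
    }
};
```

With versions 1 to 5 stored and `last_consumed == 2`, a call to `trim(4)` removes only versions 1 and 2; version 3 survives until the service reports it as consumed.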

Sage Weil [Thu, 13 Nov 2008 22:13:04 +0000 (14:13 -0800)]

mon: fix mon injectargs

Sage Weil [Thu, 13 Nov 2008 21:33:05 +0000 (13:33 -0800)]

cmonctl: seed random number generator so we pick a truly random mds

Sage Weil [Thu, 13 Nov 2008 21:30:26 +0000 (13:30 -0800)]

mds: treat open requests as non-idempotent

The problem is that the reply contains a capability, and as such
is stateful and can't be lost. Forwards by the MDS on behalf of
the client, however, introduce the possibility of multiple copies
of a request in flight if one of the MDSs fails, and the client
will drop any duplicate replies it receives.

Alternatively, the client could _also_ parse duplicate responses
(i.e. call fill_trace). I'm not sure if that's a good idea. In
any case, MDS forwarded requests are only really important for
dealing with flash flood scenarios on extremely large clusters,
so let's just set this aside for now.

Sage Weil [Thu, 13 Nov 2008 20:45:18 +0000 (12:45 -0800)]

mds todo

Sage Weil [Thu, 13 Nov 2008 20:43:03 +0000 (12:43 -0800)]

kclient: only kick requests when they may make progress

Kick requests if the mds they are waiting on
- failed. It may be possible to send the request elsewhere.
- became active.

The rest of the time we are just spinning our wheels.
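
The two conditions above can be written as a small predicate. This is a sketch under assumptions: `MdsState` is a reduced, hypothetical set of states and `should_kick` is not a function from the kernel client.

```cpp
#include <cassert>

// Hypothetical, reduced set of MDS states for illustration only.
enum class MdsState { Down, Replay, Active };

// Decide whether requests waiting on an MDS should be kicked after a map
// change, given its state in the old and new maps.
bool should_kick(MdsState before, MdsState after) {
    // The MDS failed: the request might be sent elsewhere.
    bool failed = before != MdsState::Down && after == MdsState::Down;
    // The MDS became active: the request can now make progress.
    bool became_active =
        before != MdsState::Active && after == MdsState::Active;
    return failed || became_active;
}
```

Any other transition, such as `Down` to `Replay`, leaves the waiting requests alone.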

Sage Weil [Thu, 13 Nov 2008 20:42:00 +0000 (12:42 -0800)]

kclient: only submit mds request if mds is active

Wait until we get a new map instead.

Sage Weil [Thu, 13 Nov 2008 19:32:34 +0000 (11:32 -0800)]

osd: always pick new mon during boot

Sage Weil [Thu, 13 Nov 2008 19:29:33 +0000 (11:29 -0800)]

kclient: fix connect_seq on connect-side after connect

Sage Weil [Thu, 13 Nov 2008 19:06:03 +0000 (11:06 -0800)]

mds: keep inode multiversion if it has snapped old_inodes

Once old_inodes gets cleaned out (snaps deleted), we can return
to normalcy.

Sage Weil [Thu, 13 Nov 2008 18:56:30 +0000 (10:56 -0800)]

mon: pave way for more per-client mount info in monitor

Eventually we'll need some security stuff in here

Sage Weil [Thu, 13 Nov 2008 18:48:25 +0000 (10:48 -0800)]

mon: client mon stat, dump commands; add to cmonctl -w

Sage Weil [Thu, 13 Nov 2008 18:40:11 +0000 (10:40 -0800)]

msgr: include entity type in negotiation

This allows the accepting end to determine the policy it will use
on this connection, and return the correct LOSSY flag to the peer.

Also, some kclient simplification, cleanup.

And a connection ref count fix in process_accept().

Sage Weil [Thu, 13 Nov 2008 18:01:54 +0000 (10:01 -0800)]

msgr: lotsa cleanup, protocol change, fixes, etc.

Sage Weil [Thu, 13 Nov 2008 01:05:51 +0000 (17:05 -0800)]

msgr: various locking fixes

Fixes some deadlock problems.

We also avoid the use of newsd, and do a sync join() on the reader
thread when killing a pipe, to avoid leaking sockets.

Also fix a 'bad seq #' error due to the reader not rechecking the
pipe state after retaking the lock after reading some data.

Sage Weil [Thu, 13 Nov 2008 01:03:42 +0000 (17:03 -0800)]

msgr: zero msgr header

Clears up some valgrind warnings

Sage Weil [Thu, 13 Nov 2008 01:03:24 +0000 (17:03 -0800)]

thread: complain on bad join() calls

Sage Weil [Thu, 13 Nov 2008 01:03:11 +0000 (17:03 -0800)]

testmsgr: messenger tester

Sage Weil [Wed, 12 Nov 2008 22:15:22 +0000 (14:15 -0800)]

mds: multiversion inodes with multiple links, too

We may have remote links that get snapped.  They need to be able to
find the (single) anchored multiversion inode to get the correct
version.  (The anchor table isn't versioned.)

We'll need to go a step further than this and create snaprealms for
some of these too in order to handle inodes linked into multiple
realms.  But that needs backpointers first...

Yehuda Sadeh [Thu, 13 Nov 2008 19:19:14 +0000 (11:19 -0800)]

kclient: grab reference on inode before async operation

Sage Weil [Wed, 12 Nov 2008 22:31:22 +0000 (14:31 -0800)]

client: use separate locks for SafeCond completions

We can't reuse client_lock because it is already held when the completion
Context is called.

Sage Weil [Wed, 12 Nov 2008 22:18:04 +0000 (14:18 -0800)]

Revert "client: adjust objecter locking"

This reverts commit 1c70b1d8f62ad8d9eeef3a86ef36d8e6933c5702.

Actually, we need client_lock to protect access to the objecter
and cache structures, as they have no internal locking.

Fix the safecond separately...

Sage Weil [Wed, 12 Nov 2008 22:05:31 +0000 (14:05 -0800)]

mds: cap may already be released in file_update_finish

There may be multiple release requests in the pipeline. Not ideal,
but whatever.

Also, drop locks later.

Sage Weil [Wed, 12 Nov 2008 19:23:01 +0000 (11:23 -0800)]

mds: remove cap _after_ journaling update, at the same time we send the msg

There was an ordering problem that could come up when we prepared
a release message and removed the cap, but then didn't send it to
the client until after the update was journaled.  This could cause
us to remove the _next_ instance of the capability (from a
subsequent open) in certain circumstances.

Instead, wait until after we journal the update before removing
the client cap and sending the ack.  Since time has passed,
reverify the release request seq is still >= the last_open at
that time.  Introduce a helper to avoid duplicating code for the
case where no journaling is necessary and the cap is immediately
released in _do_cap_update.

Sage Weil [Wed, 12 Nov 2008 18:23:15 +0000 (10:23 -0800)]

mds: use null snap context for purge if no realm

The inode may be unlinked, e.g. when we are replaying a journaled
purge_inode EUpdate. The snapc is not really important, as the
OSD will use the newer snapc it has for the object. And we only
really care when we're purging the HEAD anyway.

Sage Weil [Wed, 12 Nov 2008 20:25:33 +0000 (12:25 -0800)]

osd: keep head_exists accurate

Also, create object on setattr if it doesn't yet exist.

And don't munge ZERO -> DELETE, at least for now. What is the use case
for that, anyway?

Sage Weil [Wed, 12 Nov 2008 20:22:21 +0000 (12:22 -0800)]

filestore: implement touch

Sage Weil [Wed, 12 Nov 2008 20:22:12 +0000 (12:22 -0800)]

ebofs: implement touch

Sage Weil [Wed, 12 Nov 2008 20:21:57 +0000 (12:21 -0800)]

objectstore: introduce touch operation

Create an object if it doesn't exist.
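
The semantics of a touch operation can be shown with a toy in-memory store. `MemStore` and its `objects` map are hypothetical; the real ObjectStore interface and its transaction machinery are not shown in this log.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory store used only to illustrate the semantics.
struct MemStore {
    std::map<std::string, std::string> objects;  // object name -> data

    // Create an empty object if none exists; leave existing data untouched.
    // Returns true if the object was created by this call.
    bool touch(const std::string& oid) {
        return objects.emplace(oid, std::string()).second;
    }
};
```

Because `emplace` does nothing when the key is already present, touching an existing object never truncates or alters it.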

Yehuda Sadeh [Wed, 12 Nov 2008 20:26:18 +0000 (12:26 -0800)]

kclient: some osd endianness fixes

Yehuda Sadeh [Wed, 12 Nov 2008 20:13:29 +0000 (12:13 -0800)]

kclient: fix symbol overshadowing

Sage Weil [Wed, 12 Nov 2008 00:00:59 +0000 (16:00 -0800)]

protocol, disk format change

Sage Weil [Tue, 11 Nov 2008 23:52:17 +0000 (15:52 -0800)]

osdmap: move offload from crush map into osdmap as osd_weight

Sage Weil [Tue, 11 Nov 2008 23:10:46 +0000 (15:10 -0800)]

dstart.sh: larger cluster

Sage Weil [Mon, 10 Nov 2008 23:53:26 +0000 (15:53 -0800)]

mds: fix replay lookup of snapshotted null dentries

Look up replayed dentry by dnLAST, not dnfirst, as we do with
primary and remote dentries, because that is how we identify
dentry instances in the timeline.

Sage Weil [Mon, 10 Nov 2008 23:46:15 +0000 (15:46 -0800)]

objecter: fix read scatter/gather

Sage Weil [Mon, 10 Nov 2008 23:46:03 +0000 (15:46 -0800)]

osd: pass at_version by reference, so that cloning works

Sage Weil [Mon, 10 Nov 2008 23:45:46 +0000 (15:45 -0800)]

mds: remove spurious warning

Sage Weil [Mon, 10 Nov 2008 23:25:24 +0000 (15:25 -0800)]

osd: ignore logs i don't expect without freaking out

We may get a log we didn't think we requested if the prior_set gets rebuilt
or our peering is restarted for some other reason. Just ignore it, instead
of asserting.

Sage Weil [Mon, 10 Nov 2008 22:43:17 +0000 (14:43 -0800)]

osd: assert length on write, zero

Sage Weil [Mon, 10 Nov 2008 22:40:45 +0000 (14:40 -0800)]

objecter: whoops, do DELETE, not ZERO

Sage Weil [Mon, 10 Nov 2008 21:42:14 +0000 (13:42 -0800)]

osd: fix typo/bug when picking osd to pull missing object from

Sage Weil [Mon, 10 Nov 2008 21:41:48 +0000 (13:41 -0800)]

osd: fix missing vs lost counting idiocy

Sage Weil [Mon, 10 Nov 2008 21:40:59 +0000 (13:40 -0800)]

filestore: cope with zero-length attribute values

Sage Weil [Mon, 10 Nov 2008 21:40:50 +0000 (13:40 -0800)]

ebofs: cope with zero-length attribute values

Sage Weil [Mon, 10 Nov 2008 18:55:10 +0000 (10:55 -0800)]

osd: use modify flag to decide whether to take read fast path

Sage Weil [Mon, 10 Nov 2008 18:37:41 +0000 (10:37 -0800)]

kclient: fix up writes, reads for new op structure

Make sure osdc_readpages still returns bytes read, even though the
overall message result does not (pull it from the op.length).

Set MODIFY flag.

Sage Weil [Mon, 10 Nov 2008 18:36:24 +0000 (10:36 -0800)]

osd: MODIFY is a flag; fix up op_read

Sage Weil [Mon, 10 Nov 2008 17:46:53 +0000 (09:46 -0800)]

osd: simple higher-order append mutation

Sage Weil [Mon, 10 Nov 2008 04:21:43 +0000 (20:21 -0800)]

kclient: fix osd reply handler sanity check

Sage Weil [Mon, 10 Nov 2008 04:15:57 +0000 (20:15 -0800)]

objecter: destructively take ops[], bufferlist passed to read(), modify().

Small optimization.