Sage Weil [Fri, 29 May 2009 17:43:31 +0000 (10:43 -0700)]
kclient: make session cap list an LRU; combine touch with check for issued caps
This both makes the issued cap check better (only traverses caps until we
get what we want), and combines the touch_cap step to bump it to the front
of the LRU.
So far all callers do the touch, so the touch arg is of dubious value.
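The combined check-and-touch can be sketched roughly in C++. This is a simplified, hypothetical model. The real kclient is C and keeps caps on both per-inode and per-session lists. Here a single `std::list` stands in for the session's LRU, and `Cap`, `Session`, and `issued_mask()` are illustrative names. The scan stops as soon as the wanted bits are covered, and any cap that contributed bits is spliced to the front.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for a capability held through an MDS session.
struct Cap {
  int id;      // illustrative identifier
  int issued;  // bitmask of capability bits granted by this cap
};

// Hypothetical session whose cap list is kept in LRU order
// (most recently used at the front).
struct Session {
  std::list<Cap> caps;

  // Returns true once the union of issued bits covers `mask`. Traversal
  // stops as soon as the mask is satisfied; when `touch` is set, every cap
  // that contributed new bits is moved to the front of the LRU.
  bool issued_mask(int mask, bool touch) {
    assert(mask != 0);
    int have = 0;
    for (auto it = caps.begin(); it != caps.end();) {
      auto next = std::next(it);  // list iterators survive splice
      if (it->issued & mask & ~have) {
        have |= it->issued;
        if (touch)
          caps.splice(caps.begin(), caps, it);  // O(1) move to LRU front
      }
      if ((have & mask) == mask)
        return true;
      it = next;
    }
    return false;
  }
};
```

Because `std::list::splice` relinks nodes in constant time, the touch adds no extra pass over the list.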
Sage Weil [Thu, 28 May 2009 18:23:43 +0000 (11:23 -0700)]
osd: ship transaction (not op) to replicas
This simplifies the code path on the replica. It avoids duplicating
computation, which can be a win even for trivial ops when snaps are
involved. For active objects, it will avoid the computation as well.
Currently, parallel execution still happens if you specify a flag.
However, an object method should probably also be able to specify that
parallel (||) execution is better, in cases where the operation is
deterministic and || exec will mean less data over the network.
Disable unused vector<T*> template, since it confuses the
vector<const char *> encoder.
Sage Weil [Thu, 28 May 2009 04:43:41 +0000 (21:43 -0700)]
osd: move object_info_t, exists, size into ObjectState
The main goal is to capture everything needed by the op context
(the object_info_t, exists, size) in something that doesn't include all
of ObjectContext (such as the current access mode, num writers, and other
things that only the primary needs) so that the sub_op_modify setup is
as simple as possible.
This kills the annoying exists and size args to all the helpers.
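This hypothetical sketch shows the split, with heavily trimmed field lists. `version`, `mode`, `num_wr`, and `write_extent()` are illustrative and are not the real definitions. Helpers take one `ObjectState` instead of separate exists/size arguments. That means a replica can run them without building a full `ObjectContext`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily trimmed object metadata.
struct object_info_t {
  std::uint64_t version = 0;
};

// Everything the op context needs about the object, and nothing more.
struct ObjectState {
  object_info_t oi;
  bool exists = false;
  std::uint64_t size = 0;
};

// Primary-only bookkeeping wraps the state (field names illustrative).
struct ObjectContext {
  ObjectState obs;
  int mode = 0;    // current access mode
  int num_wr = 0;  // writers in flight
};

// A helper takes one ObjectState instead of separate exists/size args.
void write_extent(ObjectState& obs, std::uint64_t off, std::uint64_t len) {
  obs.exists = true;
  if (off + len > obs.size)
    obs.size = off + len;
  ++obs.oi.version;
}
```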
Sage Weil [Wed, 27 May 2009 19:06:38 +0000 (12:06 -0700)]
initscript: remove notreelog by default
This screws us if we put the journal file on the same btrfs volume. Not
that it works that well anyway, but it works even worse this way. And
since FileStore never calls fsync(), it's fine.
Sage Weil [Tue, 26 May 2009 23:09:34 +0000 (16:09 -0700)]
kclient: avoid d_time reuse s.t. leases and dir rdcache can coexist
We want to be able to remember a lease even when we are issued
RDCACHE on the entire dir. (Otherwise, we fail to release the
dentry when doing an op, and have to wait for the revoke round
trip.)
So avoid reusing d_time for the rdcache_gen. Unconditionally set
it in update_dentry_lease.
Also, force inclusion of a no-op inode update if we need to include
the dentry lease release, even when the dir caps aren't changing.
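The separation can be sketched as follows. This is a hypothetical C++ model, not kernel code. `DentryInfo`, `DirInfo`, `lease_expires`, and `rdcache_gen` are illustrative names, and `lease_expires` stands in for `d_time`. The lease and the directory cache generation no longer share a slot. As a result, recording a lease cannot clobber the cache generation, and either one can vouch for the dentry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-dentry state: lease expiry and dir cache generation
// live in separate fields instead of sharing one slot.
struct DentryInfo {
  std::uint64_t lease_expires = 0;  // set unconditionally on lease update
  std::uint32_t rdcache_gen = 0;    // dir generation when dentry was cached
};

struct DirInfo {
  bool has_rdcache = false;        // whether RDCACHE is issued on the dir
  std::uint32_t rdcache_gen = 0;   // bumped when RDCACHE is revoked
};

// Simplified stand-in for update_dentry_lease (hypothetical signature):
// records the lease regardless of the dir's caps.
void update_dentry_lease(DentryInfo& d, std::uint64_t now, std::uint64_t ttl) {
  d.lease_expires = now + ttl;
}

bool lease_valid(const DentryInfo& d, std::uint64_t now) {
  return now < d.lease_expires;
}

bool dir_cache_valid(const DentryInfo& d, const DirInfo& dir) {
  return dir.has_rdcache && d.rdcache_gen == dir.rdcache_gen;
}

// Either mechanism can validate the dentry; the two now coexist.
bool dentry_valid(const DentryInfo& d, const DirInfo& dir, std::uint64_t now) {
  return lease_valid(d, now) || dir_cache_valid(d, dir);
}
```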
Sage Weil [Mon, 25 May 2009 21:19:26 +0000 (14:19 -0700)]
osd: prepare clone before write op
This simplifies tracking (no need to set aside values we'll need for the
clone). Instead, we just make_writeable() before any write op to ensure
the clone exists.
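This is a minimal sketch of cloning before a write. It assumes a toy store keyed by (object name, snap id), in which snap id 0 denotes the head and a clone is keyed by the snap context's seq. `Store`, `Head`, and `write_full()` are hypothetical. The `make_writeable()` here only mirrors the name from the commit, not its real signature.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical store: (object name, snap id) -> contents; snap 0 = head.
using Store = std::map<std::pair<std::string, std::uint64_t>, std::string>;

struct Head {
  std::string name;
  std::uint64_t snap_seq = 0;  // newest snapshot already cloned for
};

// If a snapshot was taken since the head was last cloned, preserve the
// current head contents as a clone before any write runs.
void make_writeable(Store& store, Head& head, std::uint64_t snapc_seq) {
  if (snapc_seq > head.snap_seq) {
    auto it = store.find({head.name, 0});
    if (it != store.end())  // only an existing head needs preserving
      store[{head.name, snapc_seq}] = it->second;
    head.snap_seq = snapc_seq;
  }
}

// Every write op ensures the clone exists first, so nothing has to be
// set aside for a later clone step.
void write_full(Store& store, Head& head, std::uint64_t snapc_seq,
                const std::string& data) {
  make_writeable(store, head, snapc_seq);
  store[{head.name, 0}] = data;
}
```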
Sage Weil [Mon, 25 May 2009 20:18:14 +0000 (13:18 -0700)]
osd: track ObjectContext for cloned objects
This orders access to a newly cloned object. This is really only important
when you have a racing clone creation and a clone read. The read will
look at the head's context and expect the clone to be there, but we may
not have applied the write to disk yet. So, we set up an obc for the
cloned object too (with the same mode as the head).
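This rough sketch registers a context for the clone as soon as the clone is created. It assumes a hypothetical `PG` that tracks contexts by name and counts unapplied writes. The `Mode` values and all member names are illustrative. A read that races with the clone creation finds the clone's context and can see that its write has not reached disk yet.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Mode { IDLE, READ, WRITE };  // hypothetical access modes

struct ObjectContext {
  Mode mode = Mode::IDLE;
  int unapplied_writes = 0;  // writes queued but not yet on disk
};

struct PG {
  std::map<std::string, ObjectContext> obcs;

  // When a write on the head creates a clone, set up a context for the
  // clone too, inheriting the head's mode, before the write hits disk.
  void start_clone(const std::string& head, const std::string& clone) {
    ObjectContext& h = obcs[head];
    ObjectContext& c = obcs[clone];  // map references stay valid
    c.mode = h.mode;
    ++c.unapplied_writes;
  }

  // Called once the clone's write has been applied to disk.
  void applied(const std::string& clone) {
    ObjectContext& c = obcs.at(clone);
    assert(c.unapplied_writes > 0);
    --c.unapplied_writes;
  }

  // A racing read of the clone must wait until the clone is on disk.
  bool can_read(const std::string& oid) const {
    auto it = obcs.find(oid);
    return it == obcs.end() || it->second.unapplied_writes == 0;
  }
};
```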
Sage Weil [Sat, 23 May 2009 01:17:46 +0000 (18:17 -0700)]
osd: break apart write stages, transactions
We break the write preparation into three stages. First we run the ops
vector and build the op_t transaction. If it is non-empty, we build a
clone_t transaction to run before it, and a local_t that updates the osd's
PG log and metadata.
Take care to preserve the old exists, size, and version values before
running the ops vector, as those are clobbered but need to be sent to the
replica osds.
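The staging can be sketched with transactions modeled as lists of op descriptions. Everything here is hypothetical: `ObjState`, `WritePlan`, `prepare_write()`, `assemble()`, and the string ops. The sketch only shows the ordering. It runs the ops into `op_t`, and only if that is non-empty builds `clone_t` and `local_t`. Meanwhile, the pre-op exists/size/version are saved for the replicas.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Transaction = std::vector<std::string>;  // hypothetical op list

struct ObjState {
  bool exists = false;
  std::uint64_t size = 0;
  std::uint64_t version = 0;
};

struct WritePlan {
  Transaction clone_t, op_t, local_t;
  ObjState old_state;  // captured before the ops vector clobbers it
};

// Each entry in write_lens is a hypothetical full-object write of that size.
WritePlan prepare_write(ObjState& st, const std::vector<std::uint64_t>& write_lens,
                        bool need_clone) {
  WritePlan p;
  p.old_state = st;  // preserve exists/size/version for the replicas
  // Stage 1: run the ops vector, building op_t and updating the state.
  for (std::uint64_t len : write_lens) {
    p.op_t.push_back("write " + std::to_string(len));
    st.exists = true;
    st.size = len;
  }
  if (p.op_t.empty())
    return p;  // nothing to write: no clone_t or local_t
  ++st.version;
  // Stage 2: clone_t, which must run before op_t.
  if (need_clone)
    p.clone_t.push_back("clone head");
  // Stage 3: local_t, updating the PG log and metadata.
  p.local_t.push_back("log v" + std::to_string(st.version));
  return p;
}

// Order of execution: clone_t, then op_t, then local_t.
Transaction assemble(const WritePlan& p) {
  Transaction t = p.clone_t;
  t.insert(t.end(), p.op_t.begin(), p.op_t.end());
  t.insert(t.end(), p.local_t.begin(), p.local_t.end());
  return t;
}
```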