Sage Weil [Mon, 15 Aug 2011 23:06:22 +0000 (16:06 -0700)]
osd: fix heartbeats after bad markdown
The heartbeat start message comes from hbin messenger, which has no port
and a nonce of the pid (at startup). When we mark ourselves down/up, and
then resend a start, the peer will send a RESETSESSION and the start
message will get lost, and then we'll miss heartbeats.
Mark down all connections, so that when we reconnect, our start message
is not lost.
Sage Weil [Thu, 11 Aug 2011 20:55:17 +0000 (13:55 -0700)]
mds: avoid issue_caps on snapped inodes
Only head inodes have caps. Don't set need_issue if it's not a head inode.
This is cleaner than bailing out of issue_caps(); we shouldn't get to that
point. The places that set need_issue = true usually set gather++ too,
which also shouldn't happen on a snapped inode.
Fixes: #1390
Reported-by: Damien Churchill <damoxc@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Aug 2011 22:38:25 +0000 (15:38 -0700)]
mds: don't wait for lock 'flushing' flag on replicas
If we are a replica, the 'flushing' flag means that we had dirty scatterlock
data and are waiting for it to get flushed out to the auth copy (by
cycling from MIX->LOCK, normally). If we end up with 'flushing' set
while in the MIX state, we can't wait for it to clear before responding
to a lock request from the primary or we'll deadlock.
On the auth, flushing means flushing to the log, which makes sense; that
will always make progress despite scatterlock activity.
This fixes a hang from 3-mds fsstress with thrashing exports. (Strangely
I never hit this on fatty.)
Josh Durgin [Tue, 9 Aug 2011 23:14:23 +0000 (16:14 -0700)]
librbd: deduplicate sparse read interpretation
AioBlockCompletions and read_iterate each had their own copy of this
code, leading to bugs when only one was changed. Move this to a
separate function, handle_sparse_read.
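A minimal sketch of the kind of shared helper this describes, assuming a
sparse read returns a map of (offset -> length) extents plus the
concatenated data for those extents. The name expand_sparse_read and its
signature are hypothetical; this is not the actual handle_sparse_read
from librbd, only an illustration of keeping the interpretation in one
place so every caller zero-fills holes the same way.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper (not librbd's handle_sparse_read): expand a sparse
// read result into a flat buffer of buf_len bytes, zero-filling the holes.
// `extents` maps buffer-relative offset -> length; `data` holds the bytes
// of all extents back to back, in offset order.
std::string expand_sparse_read(const std::map<uint64_t, uint64_t>& extents,
                               const std::string& data,
                               uint64_t buf_len) {
  std::string out(buf_len, '\0');  // holes stay zero
  uint64_t consumed = 0;           // bytes of `data` used so far
  for (const auto& e : extents) {
    const uint64_t off = e.first;
    const uint64_t len = e.second;
    if (off > buf_len || len > buf_len - off ||
        len > data.size() - consumed) {
      throw std::out_of_range("extent outside buffer or data");
    }
    out.replace(off, len, data, consumed, len);
    consumed += len;
  }
  return out;
}
```

Both the synchronous and the asynchronous read paths could call one
function like this instead of carrying separate copies of the loop.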
Sage Weil [Mon, 8 Aug 2011 19:16:41 +0000 (12:16 -0700)]
debian: explicitly bind library users to matching version
We are cheating with the shared libs by making small API changes without
bumping the soname. Bind users to a matching version to minimize user
pain. When the APIs become fully stable these will need to go away.
Fixes: #1354
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 5 Aug 2011 21:28:29 +0000 (14:28 -0700)]
mds: chain rename subtree projections
We can have two renames for the same file in flight to the journal. Stack
them up in a list. The old project_subtree_rename() should have asserted
that the item wasn't already in the map before inserting it to catch this
at the front end. Now it doesn't matter; it's a list.
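As a rough illustration of the map-to-list change, the sketch below keeps
a queue of pending renames per inode instead of a single entry. The type
RenameProjections, its members, and the use of an int inode number and
string directory names are all assumptions made for the example; the real
MDS structures differ.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: several renames of the same inode can be in flight
// to the journal, so each inode maps to a list of (olddir, newdir) pairs.
struct RenameProjections {
  typedef std::pair<std::string, std::string> Rename;  // olddir, newdir
  std::map<int, std::list<Rename> > by_inode;

  // Queue another projected rename; an existing entry is not an error.
  void project(int ino, const std::string& olddir,
               const std::string& newdir) {
    by_inode[ino].push_back(Rename(olddir, newdir));
  }

  // Retire the oldest projection once it has been journaled.
  Rename pop_oldest(int ino) {
    std::map<int, std::list<Rename> >::iterator it = by_inode.find(ino);
    assert(it != by_inode.end() && !it->second.empty());
    Rename r = it->second.front();
    it->second.pop_front();
    if (it->second.empty())
      by_inode.erase(it);
    return r;
  }

  std::size_t pending(int ino) const {
    std::map<int, std::list<Rename> >::const_iterator it = by_inode.find(ino);
    return it == by_inode.end() ? 0 : it->second.size();
  }
};
```

With a plain map, the second project() call for the same inode would have
silently collided with the first; with a list the two simply stack up in
order.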
Don't allow string-valued configuration items to be changed using
injectargs unless they have observers. Otherwise, we could have
crashes, since one thread could be reading the std::string's internal
buffer after another thread frees that buffer during assignment.
Write a unit test to validate this behavior.
Also test that we can turn on and off the log_file using injectargs.
This is something that injectargs often gets used for in practice.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
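The rule can be sketched as follows. ConfigSketch and its methods are
hypothetical names for illustration, not the actual Ceph config API, and
the sketch is single-threaded: it only shows the policy (refuse to change
a string option at runtime unless something has registered to observe
it), not the locking a real implementation would need.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch of the policy, not Ceph's real config class.
class ConfigSketch {
 public:
  typedef std::function<void(const std::string&)> Observer;

  void add_string_option(const std::string& key, const std::string& initial) {
    values_[key] = initial;
  }

  void add_observer(const std::string& key, Observer cb) {
    observers_[key] = cb;
  }

  // Runtime change, as injectargs would request. Returns false when the
  // option is unknown or has no observer, leaving the value untouched.
  bool inject(const std::string& key, const std::string& value) {
    std::map<std::string, std::string>::iterator it = values_.find(key);
    if (it == values_.end())
      return false;
    std::map<std::string, Observer>::iterator obs = observers_.find(key);
    if (obs == observers_.end())
      return false;  // unobserved string: readers may hold its buffer
    it->second = value;
    obs->second(value);  // the observer applies the change safely
    return true;
  }

  // Returns a copy so callers never keep a pointer into the stored string.
  std::string get(const std::string& key) const { return values_.at(key); }

 private:
  std::map<std::string, std::string> values_;
  std::map<std::string, Observer> observers_;
};
```

In this sketch an option such as log_file would be changeable only after
an observer has been registered for it.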
Sage Weil [Thu, 4 Aug 2011 20:48:55 +0000 (13:48 -0700)]
osd: expect heartbeats from anyone peering depends on
We were getting heartbeats from just acting replicas. That's really not
enough if we want to be sure to detect failures of OSDs we depend on,
which includes any stray or up OSDs as well.
Sage Weil [Tue, 2 Aug 2011 21:19:41 +0000 (14:19 -0700)]
osd: change src_oid encoding -- FLAG DAY
The old encoding was mutually exclusive with putting any data payload on
the operation. That was stupid: it meant we couldn't, for example, do
xattr ops on a src_oid.
Fix this by just including the oid in the data payload inline whenever the
bit is set in the op code. This changes the client protocol in an
incompatible way, which means users of the CLONERANGE operation need to be
upgraded/downgraded in unison.
Josh Durgin [Mon, 1 Aug 2011 17:45:19 +0000 (10:45 -0700)]
librados: fix notify deadlock
The success of the notify call needs to be checked before waiting to
receive a notification. If we try notifying on an object that does not
exist, for example, it should fail with -ENOENT, and not hang.
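The ordering can be sketched as below. notify_and_wait and its two
callback parameters are hypothetical stand-ins for illustration; they are
not librados functions. The point is only that the wait is skipped when
sending the notify has already failed.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical sketch, not the librados implementation: send the notify,
// and only block waiting for the notification if the send succeeded.
int notify_and_wait(const std::function<int()>& send_notify,
                    const std::function<void()>& wait_for_notification) {
  int r = send_notify();
  if (r < 0)
    return r;  // e.g. -ENOENT for a missing object: fail fast, don't wait
  wait_for_notification();
  return 0;
}
```

If the check were missing, a failed send would leave the caller waiting
for a notification that can never arrive.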
Josh Durgin [Tue, 2 Aug 2011 19:14:06 +0000 (12:14 -0700)]
osd: put_object_context: tolerate pgs being deleted
PGs that are queued for deletion won't be in the osdmap,
and may not be in the pg_map, but if they are, it's safe to
put object_context. Otherwise, the pg is being deleted and
will clean up the object contexts itself.
osd, pg: clean up watchers on pg deletion and shutdown
Watchers and their object contexts need to be cleaned up so
they aren't used after the pg is gone. This happened if the
pool was deleted and the connection to the watcher was reset.
Sage Weil [Sat, 30 Jul 2011 05:10:17 +0000 (22:10 -0700)]
mds: fix create_subtree_map for new dirs
Currently mkdir foo ; rmdir foo fails because we can't get_subtree_map()
on a new directory that isn't linked in the committed plane. Since we are
journaling the projected subtree, it makes sense to use
get_projected_subtree_map() here.
It's easiest to keep both the old and new directories in the rename
project map instead of looking at the next-to-most-recent parent for the
inode. The committed version is irrelevant (could conceivably be multiple
renames behind) and the current projected parent is just newdir; we need
olddir too, and we don't project for cross-mds rename anyway.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This allows clients to determine whether they have the latest
mds, mon, or osd map. This is useful for figuring out if a pool
does not exist, or if the osdmap with it simply hasn't been
received yet.