Several functions examine argv in order to set options. Only the last
argument parsing pass should remove the '--' from the argument vector.
If it is removed any earlier, arguments that follow it may be parsed as
options even though the user did not intend that.
This change fixes the common argument parsing loops so that they do not
remove the double dash. It also rearranges some programs so that the
user's argument parsing loop comes last, rather than before the
common argument parsing loops.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
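A minimal C++ sketch of the rule above, under a simplified model: `common_pass`, `final_pass` and `FinalResult` are hypothetical helpers, not Ceph's argparse functions. The common pass consumes the one option it recognizes, but only before `--`, and it passes `--` and everything after it through untouched. Only the final pass drops `--`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical common parsing pass (not Ceph's argparse API). It removes the
// one option it knows about, but only before "--". The "--" itself and
// everything after it are passed through untouched for the next pass.
inline std::vector<std::string> common_pass(const std::vector<std::string>& args,
                                            const std::string& known_opt,
                                            bool* found) {
  assert(found != nullptr);
  *found = false;
  std::vector<std::string> rest;
  bool after_dashdash = false;
  for (const auto& a : args) {
    if (!after_dashdash && a == "--") {
      after_dashdash = true;
      rest.push_back(a);  // keep the separator for later passes
    } else if (!after_dashdash && a == known_opt) {
      *found = true;      // consumed by this pass
    } else {
      rest.push_back(a);
    }
  }
  return rest;
}

struct FinalResult {
  std::vector<std::string> options;     // dash-prefixed arguments before "--"
  std::vector<std::string> positional;  // everything else
};

// Hypothetical final pass (the program's own loop): the only pass that
// drops "--". Anything after it is positional, even if it starts with '-'.
inline FinalResult final_pass(const std::vector<std::string>& args) {
  FinalResult r;
  bool after_dashdash = false;
  for (const auto& a : args) {
    if (!after_dashdash && a == "--") {
      after_dashdash = true;  // removed here, and only here
    } else if (!after_dashdash && !a.empty() && a[0] == '-') {
      r.options.push_back(a);
    } else {
      r.positional.push_back(a);
    }
  }
  return r;
}
```

With the arguments `--debug -v -- --debug -x`, the common pass consumes the first `--debug` and keeps the rest, and the final pass then reports `-v` as an option and `--debug -x` as positional arguments. If `common_pass` also dropped `--`, the final pass would misparse the second `--debug` as an option.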
Sage Weil [Tue, 13 Sep 2011 04:23:00 +0000 (21:23 -0700)]
monclient: reopen session on monmap change
If our cur_mon is removed from the monmap, reopen the session. Do not
call _pick_new_mon() directly or we won't reset state, won't
reauthenticate, etc.
Instead of having global CompatSet objects, just have functions that can
return appropriate CompatSet objects. This avoids global constructor
and destructor ordering issues.
Fixes bug #1512
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
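A sketch of the pattern, assuming a hypothetical `FeatureSet` type as a stand-in for CompatSet (the feature names are made up). Instead of a namespace-scope global, a function builds and returns the object when it is called.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a CompatSet-like type: named feature bits.
struct FeatureSet {
  std::map<unsigned, std::string> features;

  void insert(unsigned bit, const std::string& name) {
    assert(features.count(bit) == 0);  // each bit is registered only once
    features[bit] = name;
  }
};

// Problematic form (shown only as a comment): a namespace-scope global such as
//   FeatureSet g_example_features = make_features();
// is built during static initialization, and its order relative to globals in
// other translation units is unspecified; the same goes for destruction.

// Replacement: build and return the object on demand. Nothing runs before
// main(), so there is no constructor or destructor ordering to get wrong.
inline FeatureSet get_example_features() {
  FeatureSet fs;
  fs.insert(1, "feature-a");  // hypothetical feature names
  fs.insert(2, "feature-b");
  return fs;
}
```

Each call returns an independent copy. That costs a little construction work per call, but it removes any dependency on static initialization order.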
Sage Weil [Mon, 12 Sep 2011 02:19:32 +0000 (19:19 -0700)]
librbd: implement rbd buffered write window
Normal disks have a write cache and acknowledge writes before they reach
the platter. Among other things, this masks write latency. A flush
operation is needed when the user really cares that the writes are stable.
Implement a librbd write window that allows the most recent N bytes of
writes to be acked immediately. A flush operation blocks until those
writes have been pushed out to disk.
This differs from the typical disk in that writes are always immediately
sent to the backend store, while disks will buffer small writes for a time
(and, in fact, can be made to hold small writes in the cache indefinitely
under certain workloads).
Thus, 'rbd_writeback_window' may be a bit of a misnomer...
Currently this applies only to aio writes, not sync writes. That could
most easily be fixed by reimplementing write in terms of aio_write.
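A simplified model of the window, assuming the submit and completion paths call into a single hypothetical `WriteWindow` object. This is not librbd's implementation. Every write is treated as already sent to the backend. `start_write` returns, which is the point where the write can be acked, as soon as the bytes in flight fit in the window. `flush` waits until all of them have completed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical model of an rbd-style write window (not librbd's code).
// Writes are assumed to be sent to the backend as soon as they are submitted;
// the window only decides when the caller may be acked.
class WriteWindow {
 public:
  explicit WriteWindow(std::size_t window_bytes) : window_(window_bytes) {}

  // Submit path: returns once a write of len bytes may be acked. That is
  // immediate while the bytes in flight still fit in the window; otherwise it
  // waits for earlier writes to complete. Assumption: a single write larger
  // than the window is let through once nothing else is in flight.
  void start_write(std::size_t len) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return in_flight_ == 0 || in_flight_ + len <= window_; });
    in_flight_ += len;
  }

  // Completion path: the backend reports that len bytes are on disk.
  void finish_write(std::size_t len) {
    std::lock_guard<std::mutex> l(lock_);
    assert(in_flight_ >= len);
    in_flight_ -= len;
    cond_.notify_all();
  }

  // Flush: blocks until every outstanding write has reached disk.
  void flush() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return in_flight_ == 0; });
  }

  std::size_t in_flight() {
    std::lock_guard<std::mutex> l(lock_);
    return in_flight_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  const std::size_t window_;
  std::size_t in_flight_ = 0;
};
```

The sketch lets an oversized write through when nothing else is outstanding. Without that rule, such a write could never be acked.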
Sage Weil [Mon, 12 Sep 2011 01:58:42 +0000 (18:58 -0700)]
client: fix odd crash on rename
If the old_dentry is in the same dir, and it is the last dentry, we need
to keep the dir open.
This is hard to hit because the rename itself will typically instantiate
a null dentry on the target, and it's hard to construct a workload where
a racing process makes us drop it. Fortunately this was triggered
reliably by the snaptest-git-ceph.sh workunit.
Fixes: #1519
Signed-off-by: Sage Weil <sage@newdream.net>
Samuel Just [Sat, 10 Sep 2011 04:39:27 +0000 (21:39 -0700)]
PG: generate backlog when confronted with corrupt log
Currently we throw out the log and start up anyway. With this change, we
throw out the log, generate a fresh backlog, and then start up.
That may not be the best possible thing, but it's better than what we
currently do. Indirectly fixes #1502.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Tommi Virtanen [Fri, 9 Sep 2011 23:25:14 +0000 (16:25 -0700)]
man: Generate manpages from doc/man.
Keeping the generated files in version control lets us
support builds from scratch without requiring the full
documentation toolchain to be installed.
The files were just copied over from build-doc/output/man,
after a ./admin/build-doc call. When redoing this, also
take care to remove any roff output if a file was removed
from doc/man, and update Makefile.am.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
We were previously setting up a reference loop. But the only way
to get Sessions is via the Connection, so let's just give Sessions
the pointer, and give Connections a counted ref.
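One reading of this change, sketched with `std::shared_ptr` in place of Ceph's own reference counting. The `Session` and `Connection` structs here are hypothetical. The Connection holds the counted reference, and the Session keeps only a plain back-pointer, so there is no cycle and the Session is freed along with the Connection. If both sides held counted references, neither count could ever reach zero.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Connection;

// Hypothetical Session: keeps a plain, non-owning back-pointer. The sketch
// assumes Sessions are only reached through their Connection, so no counted
// reference is needed in this direction.
struct Session {
  Connection* con = nullptr;
};

// Hypothetical Connection: holds the one counted reference to its Session.
struct Connection {
  std::shared_ptr<Session> session;

  void set_session(std::shared_ptr<Session> s) {
    assert(s != nullptr);
    s->con = this;
    session = std::move(s);
  }
};
```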
We can't do that if we're trying to be Valgrind-clean, so just
make the lock name part of the class.
As best I can tell, that ordered initialization is safe because
data members are initialized in the order they are declared. See, e.g.,
http://xenon.arcticus.com/c-morsels-initializer-list-execution-order
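A minimal sketch of the ordering argument, assuming a lock type that keeps the name pointer it is given. `NamedLock` and `ExampleConn` are hypothetical. Because `lock_name` is declared before `lock`, it is constructed first, so `lock_name.c_str()` is valid when `lock` is initialized. If the two declarations were swapped, `lock` would read a string that has not been constructed yet, however the initializer list is written.

```cpp
#include <cassert>
#include <string>

// Hypothetical lock that stores the name pointer it is given rather than
// copying the string, so the name must outlive the lock.
class NamedLock {
 public:
  explicit NamedLock(const char* name) : name_(name) { assert(name_ != nullptr); }
  const char* name() const { return name_; }

 private:
  const char* name_;
};

// Hypothetical owner that keeps the lock name as a member. Non-static data
// members are initialized in declaration order, not in the order the
// mem-initializer list is written, so lock_name is constructed before lock
// calls c_str() on it.
class ExampleConn {
 public:
  explicit ExampleConn(int id)
      : lock_name("ExampleConn::lock-" + std::to_string(id)),
        lock(lock_name.c_str()) {}

  // Copying would leave the copy's lock pointing at the original's string.
  ExampleConn(const ExampleConn&) = delete;
  ExampleConn& operator=(const ExampleConn&) = delete;

  std::string lock_name;  // must stay declared before lock
  NamedLock lock;
};
```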
Sage Weil [Wed, 7 Sep 2011 20:28:21 +0000 (13:28 -0700)]
osd: take ondisk_read_lock on src_oids
We need to take the ondisk read lock on src oids for multiobject operations
(like clonerange) to ensure that written data has hit disk before we
clone it elsewhere.
Order of acquisition doesn't actually matter here, since the ondisk locks
are all leaves in the lock dependency hierarchy.
Samuel Just [Wed, 7 Sep 2011 16:59:15 +0000 (09:59 -0700)]
OSD: use creating_pgs[pgid].history in get_or_create_pg for new pg
If info.pgid is in creating_pgs, we should use the history from
creating_pgs. The history passed in will be an empty history from a
creation probe in that case.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
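A sketch of the selection logic, assuming hypothetical `History` and `CreatingInfo` types and a `choose_history` helper. These are not the OSD's actual types or code. If the PG is being created, the recorded history wins over the empty one from the creation probe.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins; these are not Ceph's actual types.
using pg_id = unsigned;

struct History {
  unsigned epoch_created = 0;  // 0 is treated as "empty history" here
};

struct CreatingInfo {
  History history;
};

// Choose the history for a PG being instantiated. If the PG is in the
// creating set, the history passed in came from a creation probe and is
// empty, so the recorded one is used instead.
inline History choose_history(const std::map<pg_id, CreatingInfo>& creating_pgs,
                              pg_id pgid, const History& passed_in) {
  auto it = creating_pgs.find(pgid);
  if (it != creating_pgs.end()) {
    assert(passed_in.epoch_created == 0);  // sketch's assumption: probe history is empty
    return it->second.history;
  }
  return passed_in;
}
```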
Sage Weil [Wed, 7 Sep 2011 16:48:35 +0000 (09:48 -0700)]
client: add inode on IMPORT
If we get an IMPORT and don't have the inode, add it. This fixes a race
like:
mds0 -> client .. mknod reply (or similar)
mds0 -> mds1 .. migrate cap
mds1 -> client .. send IMPORT
client <- mds1 .. rx IMPORT, but don't have inode
client <- mds0 .. rx mknod reply, add it.
With this fix, we add the inode and set up the cap on IMPORT, and when we
get the mknod reply we update the inode immutable fields that aren't
present in the cap message (rdev, symlink target).
Fixes: #1513
Signed-off-by: Sage Weil <sage@newdream.net>
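A toy model of the fix, assuming a hypothetical `InodeCache` keyed by inode number. These are not the Ceph client's data structures. `handle_import` creates the inode instead of ignoring the IMPORT. The later reply fills in only the immutable fields and leaves the cap state set by the IMPORT intact.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, heavily simplified client-side inode.
struct Inode {
  std::uint64_t ino = 0;
  int cap_mds = -1;          // MDS we hold a cap from (-1: none)
  std::uint32_t rdev = 0;    // immutable fields not carried in cap messages
  std::string symlink;
};

class InodeCache {
 public:
  // IMPORT from an MDS: create the inode if we have not seen it yet, rather
  // than dropping the message, then record the cap.
  void handle_import(std::uint64_t ino, int mds) {
    assert(mds >= 0);
    Inode& in = inodes_[ino];  // default-constructs the inode if missing
    in.ino = ino;
    in.cap_mds = mds;
  }

  // Reply (e.g. to mknod) that may arrive after the IMPORT: fill in the
  // immutable fields only, without touching the cap state.
  void handle_reply(std::uint64_t ino, std::uint32_t rdev, const std::string& symlink) {
    Inode& in = inodes_[ino];
    in.ino = ino;
    in.rdev = rdev;
    in.symlink = symlink;
  }

  const Inode* lookup(std::uint64_t ino) const {
    auto it = inodes_.find(ino);
    return it == inodes_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::uint64_t, Inode> inodes_;
};
```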
Sage Weil [Thu, 1 Sep 2011 23:32:24 +0000 (16:32 -0700)]
mds: bracket LOCK|AUTH -> PREXLOCK transition with start/finish_locking
Unlike other lock transitions, we are moving to an _unstable_ state and
then taking our (x)lock. That means that if we don't actually finish for
some reason (lock order changes, request is canceled, whatever) we leave
things in an unstable state--in this case, PREXLOCK, where nothing else
will touch it.
Call cancel_lock in drop_locks (or in acquire_locks when the order changes)
to clean up after an aborted lock attempt.
Original bug reproduced (though not easily) by the multimds collection task
fsstress_thrash_subtrees.yaml.
Fixes: #1425
Signed-off-by: Sage Weil <sage@newdream.net>
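A toy state machine for the bracketing, assuming a hypothetical `ToyLock` with a single `locking_` flag. The real MDS lock code has many more states. The flag marks the window between starting and finishing the transition, and `cancel_lock` uses it to roll an abandoned attempt back out of PreXLock.

```cpp
#include <cassert>

// Hypothetical, simplified lock states (not the MDS's real state set).
enum class LockState { Lock, PreXLock, XLock };

class ToyLock {
 public:
  // Begin an xlock attempt: enter the unstable PreXLock state and mark the
  // transition as in progress (the analogue of start_locking).
  void start_xlock() {
    assert(state_ == LockState::Lock && !locking_);
    locking_ = true;
    state_ = LockState::PreXLock;
  }

  // The xlock was actually taken: settle into XLock (the analogue of
  // finish_locking).
  void finish_xlock() {
    assert(locking_ && state_ == LockState::PreXLock);
    state_ = LockState::XLock;
    locking_ = false;
  }

  // Drop/abort path: if an attempt was started but never finished, roll it
  // back so the lock is not stranded in PreXLock. Otherwise a no-op.
  void cancel_lock() {
    if (locking_) {
      state_ = LockState::Lock;
      locking_ = false;
    }
  }

  LockState state() const { return state_; }
  bool locking() const { return locking_; }

 private:
  LockState state_ = LockState::Lock;
  bool locking_ = false;
};
```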