We didn't update the buffer size according to the data actually read.
We also didn't update the total obj_size (we were doing it only
for the second chunk being put, so chunked input that
had only a single piece ended up with zero obj_size). Also
remove the assertion that a second call to handle_data() means that
ofs > chunk size. This isn't true for chunked input.
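A minimal sketch of the size accounting described above, written in Python rather than rgw's C++. ChunkedPut and its fields are hypothetical names used only for illustration; only handle_data() and obj_size are named in the commit message.

```python
class ChunkedPut:
    """Hypothetical stand-in for a PUT processor; not the rgw implementation."""

    def __init__(self):
        self.obj_size = 0   # total bytes accepted so far
        self.chunks = []    # (offset, data) pairs, as they would be written

    def handle_data(self, data: bytes) -> int:
        # The write is sized by what was actually read, not by a nominal chunk size.
        ofs = self.obj_size
        self.chunks.append((ofs, data))
        # Update the total on every call, including the first (and possibly only) one.
        self.obj_size += len(data)
        return ofs


put = ChunkedPut()
put.handle_data(b"only piece")
single_size = put.obj_size  # non-zero even though there was a single piece
```

Note that the second call may arrive with an offset smaller than any fixed chunk size, which is why no such assertion appears in the sketch.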
rgw: abort_early should initialize formatter if needed
The formatter might not have been initialized when we
abort early (e.g., when the protocol handler wasn't found),
so we need to initialize it in order to dump the error
status.
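The idea can be sketched as lazy initialization on the error path. This is an illustration in Python, not the rgw code: Request, its formatter attribute, and the JSON fallback are all assumptions.

```python
import json


class Request:
    """Hypothetical request state; the formatter may still be unset."""

    def __init__(self):
        self.formatter = None


def abort_early(req, err_code, message):
    # If we bail out before a handler picked a formatter, fall back to a
    # default one so the error status can still be dumped.
    if req.formatter is None:
        req.formatter = json.dumps  # assumed default, for illustration only
    return req.formatter({"code": err_code, "message": message})
```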
rgw: tie callbacks in different handlers directly to REST
Don't translate RESTful operations into a more meaningful
callback name. The handlers themselves should do that
translation. This way we can later register different
handlers with different meanings for the operations.
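A rough sketch of the dispatch shape this describes, in Python. The class names and the op_* method names are hypothetical; the point is only that the dispatcher maps the HTTP verb straight to a handler method and each handler decides what that verb means.

```python
class Handler:
    """Base handler: one method per REST verb, no renaming in the dispatcher."""

    def op_get(self):
        return None

    def op_put(self):
        return None

    def op_delete(self):
        return None


class BucketHandler(Handler):
    # The handler itself translates the verb into a meaningful operation.
    def op_get(self):
        return "list_bucket"

    def op_put(self):
        return "create_bucket"


class ObjectHandler(Handler):
    def op_get(self):
        return "get_obj"

    def op_put(self):
        return "put_obj"


def dispatch(handler, method):
    # The dispatcher only knows about REST verbs.
    op = getattr(handler, "op_" + method.lower(), None)
    if op is None:
        raise ValueError("unsupported method: " + method)
    return op()
```

With this layout, registering a new handler that gives GET or PUT a different meaning needs no change to dispatch().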
Sage Weil [Fri, 5 Oct 2012 16:10:31 +0000 (09:10 -0700)]
osd: Make --get-journal-fsid not really start the osd.
This way, it won't need -i ID and it won't access the osd_data_dir.
That makes it useful for locating the right osd to use with an
external journal partition.
Tommi Virtanen [Wed, 3 Oct 2012 19:38:38 +0000 (12:38 -0700)]
debian/control, ceph-disk-prepare: Depend on xfsprogs, use xfs by default.
Ext4 as a default is a bad choice, as we don't perform enough QA with
it. To use XFS as the default for ceph-disk-prepare, we need to depend
on xfsprogs.
btrfs-tools is already recommended, so no change there. If you set
osd_fs_type=btrfs, and don't have the package installed, you'll just
get an error message.
Tommi Virtanen [Wed, 3 Oct 2012 15:47:20 +0000 (08:47 -0700)]
ceph-disk-prepare: Avoid triggering activate before prepare is done.
Earlier testing never saw this, but now a mount of a disk triggers a
udev blockdev-added event, causing ceph-disk-activate to run even
before ceph-disk-prepare has had a chance to write the files and
unmount the disk.
Avoid this by using a temporary partition type uuid ("ceph 2 be"), and
only setting it to the permanent one ("ceph osd") once prepare is done.
A hotplug event during prepare won't match the permanent type uuid, and
thus won't trigger ceph-disk-activate.
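The ordering can be illustrated with a small simulation. This is not ceph-disk code: Partition, on_hotplug and prepare are hypothetical, the type labels stand in for the real GUIDs, and the assumption that a second event fires after the type change is made only for the sake of the example.

```python
TOBE = "ceph 2 be"  # temporary type label (the real value is a GUID)
OSD = "ceph osd"    # permanent type label (the real value is a GUID)


class Partition:
    def __init__(self):
        self.ptype = None
        self.prepared = False


def on_hotplug(part, activations):
    # Stand-in for the udev rule: only the permanent type triggers activate.
    if part.ptype == OSD:
        activations.append(part.prepared)


def prepare(part, activations):
    part.ptype = TOBE
    on_hotplug(part, activations)  # mounting fires an event; type does not match
    part.prepared = True           # files written, disk unmounted
    part.ptype = OSD               # switch to the permanent type last
    on_hotplug(part, activations)  # assumed: activate now sees a finished disk
```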
Tommi Virtanen [Tue, 2 Oct 2012 23:37:07 +0000 (16:37 -0700)]
ceph-disk-activate: Unmount on errors (if it did the mount).
This cleans up the error handling to not leave disks mounted
in /var/lib/ceph/tmp/mnt.* when something fails, e.g. when
the ceph command line tool can't talk to mons.
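A minimal sketch of the cleanup pattern, assuming injected mount, unmount and register callables (all hypothetical; this is not the ceph-disk-activate code). The did_mount flag models the "if it did the mount" condition from the subject line.

```python
def activate(dev, mount, unmount, register, mounted_path=None):
    # Only clean up mounts we created ourselves.
    did_mount = mounted_path is None
    path = mount(dev) if did_mount else mounted_path
    try:
        register(path)  # e.g. talking to the mons; may fail
    except Exception:
        if did_mount:
            unmount(path)
        raise
    return path
```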
Tommi Virtanen [Tue, 2 Oct 2012 23:04:15 +0000 (16:04 -0700)]
ceph-disk-prepare: Allow specifying fs type to use.
Either use ceph.conf variable osd_fs_type or command line option
--fs-type=
Default is still ext4, as currently nothing guarantees xfsprogs
or btrfs-tools are installed.
Currently both btrfs and xfs seem to trigger a disk hotplug event at
mount time, thus triggering a useless and unwanted ceph-disk-activate
run. This will be worked around in a later commit.
Currently mkfs and mount options cannot be configured.
Bug: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>
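A sketch of how the two sources might be combined. The precedence (command line over ceph.conf over the default) is an assumption, as the message does not state it; choose_fs_type and the dict-based conf are hypothetical.

```python
import argparse

DEFAULT_FS_TYPE = "ext4"  # default at the time of this commit


def choose_fs_type(argv, conf):
    parser = argparse.ArgumentParser()
    parser.add_argument("--fs-type", default=None)
    args, _unknown = parser.parse_known_args(argv)
    # Assumed precedence: --fs-type, then osd_fs_type, then the default.
    return args.fs_type or conf.get("osd_fs_type") or DEFAULT_FS_TYPE
```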
Dan Mick [Fri, 5 Oct 2012 18:17:44 +0000 (11:17 -0700)]
rbd: set_conf_param() rewhack:
1) comment set_conf_param and the loop that uses it
2) put back error checking for "called with full param list" in macro
3) make all the loop calls consistent
4) add a third arg placeholder to handle lock remove
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Fri, 5 Oct 2012 00:42:37 +0000 (17:42 -0700)]
OSD: separate class caps from normal read/write
This properly accounts for multi-op requests. Use MOSDOp->rmw_flags for
internal caps requirements, leaving MOSDOp->flags for client specified
options. Use accessors so the flags don't need to be known by the callers.
Also separate capability checks (need_*_cap) from the nature of the MOSDOp
(may_{read,write}). This preserves the semantics of may_{read,write},
which are used in several places outside of capability checks.
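The split can be sketched as follows, in Python instead of the OSD's C++. The flag values and the Op class are hypothetical; only the names flags, rmw_flags, may_read/may_write and the need_*_cap family come from the message.

```python
# Hypothetical internal flag bits, for illustration only.
RMW_READ, RMW_WRITE, RMW_CLASS_READ, RMW_CLASS_WRITE = 1, 2, 4, 8


class Op:
    def __init__(self, flags=0):
        self.flags = flags   # client-specified options, left untouched
        self.rmw_flags = 0   # internal requirements, accumulated per sub-op

    # Setters are called once per sub-op, so a multi-op request
    # ends up with the union of what its sub-ops need.
    def set_read(self):
        self.rmw_flags |= RMW_READ

    def set_write(self):
        self.rmw_flags |= RMW_WRITE

    def set_class_read(self):
        self.rmw_flags |= RMW_CLASS_READ

    def set_class_write(self):
        self.rmw_flags |= RMW_CLASS_WRITE

    # Capability checks.
    def need_read_cap(self):
        return bool(self.rmw_flags & RMW_READ)

    def need_write_cap(self):
        return bool(self.rmw_flags & RMW_WRITE)

    def need_class_read_cap(self):
        return bool(self.rmw_flags & RMW_CLASS_READ)

    def need_class_write_cap(self):
        return bool(self.rmw_flags & RMW_CLASS_WRITE)

    # Nature of the op, independent of which cap grants it.
    def may_read(self):
        return self.need_read_cap() or self.need_class_read_cap()

    def may_write(self):
        return self.need_write_cap() or self.need_class_write_cap()
```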
Dan Mick [Fri, 5 Oct 2012 00:52:05 +0000 (17:52 -0700)]
rbd: gracefully handle extra arguments
Instead of looping across all args, with increments inside the loop,
which can run off the end of the vector, demand that the final
argument parsing have exactly the right number of args, or complain
about the extras and die.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
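A short sketch of the "exact count or die" approach, in Python for brevity. bind_args and the error wording are hypothetical and do not reproduce the rbd tool's code or messages.

```python
def bind_args(names, args):
    """Bind positional args to names, refusing any leftovers."""
    if len(args) > len(names):
        extras = " ".join(args[len(names):])
        raise ValueError("extraneous parameter(s): " + extras)
    if len(args) < len(names):
        missing = " ".join(names[len(args):])
        raise ValueError("missing parameter(s): " + missing)
    # No index is incremented inside a loop, so nothing can run off the end.
    return dict(zip(names, args))
```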
Josh Durgin [Thu, 4 Oct 2012 23:50:42 +0000 (16:50 -0700)]
qa: report success if race is not detected
This test still verifies that the race is handled correctly if it
occurs, but will no longer clutter test results with spurious failures
when the race is not reproduced.
Sean Channel [Wed, 3 Oct 2012 21:02:55 +0000 (14:02 -0700)]
admin/build-doc: Use installed Sphinx and its dependencies, when possible.
This avoids the delay of installing Sphinx inside the virtualenv;
especially, compiling lxml is slow.
If Sphinx is not installed system-wide (or it's too old), this will
still install a copy inside the virtualenv, to keep working.
Thanks to Sean for the push to make this happen, and testing the
various scenarios; I (Tv) took the liberty of changing the commit to
use venv-python for the manpage build too, avoid the nonstandard
"which" command, be more careful about quoting, and explain more fully
what's going on in the comment.
Closes: https://github.com/ceph/ceph/pull/24
Signed-off-by: Sean Channel <pentabular@gmail.com>
Signed-off-by: Tommi Virtanen <tv@inktank.com>
We should never consider old 'acks' from monitors on a new election. We
usually discard them, but we didn't if an election expired, because this code
didn't foresee the possibility of monitors changing ranks in-between
elections -- which doesn't happen if we specify the monmap during the
monitor's mkfs, but may happen when relying on 'mon initial peers'.
Failing to do so triggered an assertion after fixing bug #3252.
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
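The rule can be illustrated with a toy model. This Elector class is hypothetical and far simpler than the monitor's: it only shows acks being dropped whenever a new election starts, including after an expiry, and stale acks being rejected by epoch.

```python
class Elector:
    """Toy model; not the Ceph monitor's Elector."""

    def __init__(self, rank):
        self.rank = rank
        self.epoch = 0
        self.acked_me = set()

    def start(self):
        self.epoch += 1
        # A new election never inherits acks from the previous one.
        self.acked_me = {self.rank}

    def handle_ack(self, from_rank, epoch):
        if epoch != self.epoch:
            return False  # ack belongs to an older election; ignore it
        self.acked_me.add(from_rank)
        return True

    def expire(self):
        # An expired election restarts through start(), which resets acks.
        self.start()
```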
mon: Elector: bootstrap on new monmap from elector
Whenever we update the monmap we should bootstrap, in order to reset the
monitor's on-going activities and re-probe.
Not doing so contributed to bug #3252, during which we entered an infinite
election cycle. This may only happen though when we rely on 'mon initial
peers'. Specifying a monmap during the monitor's mkfs should not trigger
this bug.
Fixes: #3252
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Sam Lang [Sat, 29 Sep 2012 02:26:16 +0000 (19:26 -0700)]
client: Fix #2215 with cache inval in thread
The client currently deadlocks with kernel buffer cache invalidation
enabled, because the invalidate callback is invoked while the client lock
is held, and the callback in turn sends upcalls back to the userspace
process, which try to take the same client lock. The fix is to invoke the
invalidate callback in a separate thread, allowing _release, _flushed, etc.
to complete and drop the client lock, so that the invalidate callback no
longer deadlocks when the upcall is made.
We construct a separate work queue (Finisher) that allows scheduling
the invalidate callbacks in a separate thread. The thread only starts
when the invalidate callback is set. If no callback is set, the cache
capability reference is decremented inline as before.
Some callers of invalidate_inode_cache (flush and update_inode_file_bits)
don't expect the cache capability to be decremented. Pass a keep_caps flag to
only decrement the capability ref in the _release case.
Also, we need to make sure the mds is aware that the client has dropped
the cache capability, so we add a call to check_caps in put_cap_ref for the
CEPH_CAP_FILE_CACHE capability.
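The threading arrangement can be sketched in Python. This is an illustration under stated assumptions, not the client code: Finisher here is a minimal work queue, and Client, release and upcall are hypothetical stand-ins for the real paths.

```python
import queue
import threading


class Finisher:
    """Minimal work queue that runs callbacks in its own thread."""

    def __init__(self):
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._run, daemon=True)
        self._t.start()

    def enqueue(self, fn):
        self._q.put(fn)

    def _run(self):
        while True:
            fn = self._q.get()
            if fn is None:  # sentinel: stop the worker
                break
            fn()

    def stop(self):
        self._q.put(None)
        self._t.join()


class Client:
    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like the client lock
        self.finisher = None
        self.invalidate_cb = None
        self.invalidated = []

    def set_invalidate_cb(self, cb):
        self.invalidate_cb = cb
        self.finisher = Finisher()  # thread starts only once a callback is set

    def release(self, ino):
        with self.lock:
            if self.invalidate_cb is not None:
                # Defer the callback; calling it here would re-take self.lock.
                self.finisher.enqueue(lambda: self.invalidate_cb(ino))

    def upcall(self, ino):
        # Stand-in for a call coming back from the kernel into the client.
        with self.lock:
            self.invalidated.append(ino)
```

Calling invalidate_cb directly inside release() would block forever on the non-reentrant lock, which is the deadlock the message describes.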
Sage Weil [Wed, 3 Oct 2012 16:12:40 +0000 (09:12 -0700)]
mds: make migrate_stray() specify full path
The handle_client_rename() check expects a full path rooted in the MDSDIR.
Do so in migrate_stray().
Also, use the committed (not projected) dn linkage; this was a carry-over
from the original switch to this API forever ago, but the current callers
don't need to migrate an uncommitted stray. This also aligns us with
reintegrate_stray().
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Oct 2012 15:23:08 +0000 (08:23 -0700)]
mds: fix stray reintegration check in handle_client_rename
The stray reintegration generates a source path that will be rooted in a
(possibly remote) MDS's MDSDIR; adjust this check accordingly. This is a
holdover from way back when the straydir was the base of the tree instead
of mdsdir.
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Remove all existing usage, but leave the definition so third-party
class plugins don't break.
The public flag let *any* user execute a class method, as long
as they had read and/or write access as the method required. This is
better managed by the new osd caps infrastructure, and it was
entirely undocumented and unused, so it should be safe to remove.
Yan, Zheng [Tue, 2 Oct 2012 08:55:52 +0000 (16:55 +0800)]
mds: Avoid creating unnecessary snaprealm
When moving a directory between snaprealms, we can avoid creating a snaprealm
if the directory doesn't have its own snaprealm and was created
after the newest snapshot of both realms.
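The condition reads naturally as a predicate. The function and parameter names below are hypothetical, and plain integers stand in for snapshot sequence numbers.

```python
def can_skip_new_snaprealm(has_own_realm, created_seq,
                           old_realm_newest, new_realm_newest):
    # No snapshot in either realm can reference the directory if it was
    # created after the newest snapshot of both.
    created_after_both = created_seq > max(old_realm_newest, new_realm_newest)
    return (not has_own_realm) and created_after_both
```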