Samuel Just [Mon, 29 Oct 2012 22:35:09 +0000 (15:35 -0700)]
osd/: add pool min_size parameter for min acting set size
Otherwise, a pg might go active with a single osd in the
acting set. If that osd subsequently dies, we potentially
lose client writes. Note: it's still possible for the
acting set to exceed min_size but fail to obey the spirit
of the user's crush settings (e.g., min_size is 2, but both
osds happen to be on the same node).
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
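The check described above can be sketched roughly as follows. This is an illustration in Python, not the OSD code; the names can_go_active, distinct_hosts and osd_to_host are hypothetical and do not come from the commit.

```python
def can_go_active(acting, min_size):
    """Return True if the acting set is large enough for the pg to go active.

    Hypothetical helper: `acting` is a list of osd ids and `min_size` is
    the pool's minimum acting set size.
    """
    return len(acting) >= min_size


def distinct_hosts(acting, osd_to_host):
    """Count the hosts behind the acting set (hypothetical osd -> host map).

    Illustrates the caveat in the commit message: an acting set can meet
    min_size while all of its osds sit on one node.
    """
    return len({osd_to_host[osd] for osd in acting})
```

With min_size set to 2, a pg whose acting set is [0] would stay inactive in this sketch, while [0, 1] would pass even if both osds map to the same host.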
Dan Mick [Tue, 6 Nov 2012 23:28:10 +0000 (15:28 -0800)]
cls_rbd: send proper format of key to "last_read" for dir_list
rbd ls of format-2 images was looping on the first 64 (when more than 64
were present). The key name passed to the omap layer needs to always
contain the prefix, and the "inside-the-loop next-chunk" statement
was missing the "add the prefix" call.
Also, add a test for listing 100 images, format 1 and 2.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
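A rough sketch of the paging pattern described above, written in Python rather than as the actual cls_rbd code. The prefix value, the omap_get_vals stand-in and list_images are assumptions made for illustration; only the chunk size of 64 and the "last_read must include the prefix" rule come from the commit message.

```python
PREFIX = "name_"  # assumed key prefix; the real prefix is not given in the log
CHUNK = 64        # chunk size mentioned in the commit message


def omap_get_vals(omap, start_after, max_return):
    """Stand-in for the omap layer: sorted keys strictly after start_after."""
    keys = sorted(k for k in omap if k > start_after)
    return keys[:max_return]


def list_images(omap):
    """Page through the prefixed keys, CHUNK at a time."""
    names = []
    last_read = PREFIX  # a full key position, prefix included
    while True:
        chunk = omap_get_vals(omap, last_read, CHUNK)
        keys = [k for k in chunk if k.startswith(PREFIX)]
        if not keys:
            break
        names.extend(k[len(PREFIX):] for k in keys)
        # The fix: the next chunk starts after the full key, prefix included.
        # Passing the bare image name here can restart the scan from the top.
        last_read = keys[-1]
        if len(keys) < CHUNK:
            break
    return names
```

Listing 100 entries takes two passes in this sketch: one full chunk of 64, then the remaining 36.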
Yehuda Sadeh [Mon, 29 Oct 2012 23:46:21 +0000 (16:46 -0700)]
rgw: resolve hostname dns cname record
Implements #3206
This allows using vanity domains. A CNAME record can now
be set for the domain that would point at the rgw instance,
with or without a bucket set as a subdomain.
Yehuda Sadeh [Wed, 24 Oct 2012 19:56:59 +0000 (12:56 -0700)]
rgw: don't reset multipart parts when updating their metadata
Fixes: #3401
The problem was that put_obj_meta() assumed the object was going
to be reset, so it reset the object unconditionally. This is not
true when dealing with the immutable multipart upload parts.
Dan Mick [Tue, 6 Nov 2012 00:13:19 +0000 (16:13 -0800)]
rbd: allow removal of image even if rbd_children deletion fails
Users have been seeing failures where rbd rm is half-done; could be
because of outstanding watches on the rbd_header object. The state
is that rbd_children no longer contains the child, but other pieces
remain; remove considers this a failure.
Fix: test for ENOENT from remove_child, and treat that as an ignorable
error and drive on. Simulate this in copy.sh by removing the
rbd_children object altogether, which also results in ENOENT return
from remove_child.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
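The error handling described in the fix can be sketched like this. It is a Python illustration, not the librbd code; remove_image and its two callable arguments are hypothetical, and only the "ENOENT from remove_child is ignorable" rule comes from the commit message.

```python
import errno


def remove_image(remove_child, remove_rest):
    """Hypothetical removal flow.

    Both arguments are callables that return 0 on success or a negative
    errno value on failure.
    """
    r = remove_child()
    if r < 0 and r != -errno.ENOENT:
        return r  # a real failure: stop here
    # -ENOENT means the child entry is already gone, so drive on.
    return remove_rest()
```

A half-done earlier removal, where the child entry has already been deleted, would therefore no longer block the rest of the cleanup.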
Samuel Just [Fri, 2 Nov 2012 20:02:15 +0000 (13:02 -0700)]
PG: use remove_object_with_snap_hardlinks for divergent objects
Otherwise, we end up leaving snap hardlinks in the snapshot
index directories. This eventually results in an EEXIST error
when we attempt to re-link the clone into place during
recovery.
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Noah Watkins [Fri, 2 Nov 2012 20:56:46 +0000 (13:56 -0700)]
client: return EBADF for invalid file desc
Adds get_filehandle to Client which resolves a file descriptor or
returns NULL if the file descriptor is invalid. Libcephfs calls that
accept a file descriptor are changed to return -EBADF when
get_filehandle returns NULL.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Reviewed-by: Sage Weil <sage@inktank.com>
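The lookup-then-check pattern described above, sketched in Python. The class SketchClient, its fd_map attribute and the fsync method body are hypothetical; only get_filehandle and the -EBADF return come from the commit message.

```python
import errno


class SketchClient:
    """Hypothetical stand-in for the client; not the real Client class."""

    def __init__(self):
        self.fd_map = {}  # file descriptor -> file handle (assumed layout)

    def get_filehandle(self, fd):
        """Resolve a file descriptor, or return None if it is invalid."""
        return self.fd_map.get(fd)

    def fsync(self, fd):
        """Example of a call that takes a file descriptor."""
        fh = self.get_filehandle(fd)
        if fh is None:
            return -errno.EBADF
        return 0  # the real work would happen here
```

Every descriptor-taking call follows the same shape in this sketch: resolve first, return -EBADF on a miss, and only then touch the handle.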
Alex Elder [Thu, 1 Nov 2012 18:30:11 +0000 (13:30 -0500)]
run_xfstests.sh: add optional iteration count
This adds a "-c <count>" option to the run_xfstests.sh script so
the full set of tests can be repeated more than once without having
to go through the setup process each time.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Tue, 30 Oct 2012 21:17:56 +0000 (14:17 -0700)]
ceph-disk-activate: avoid duplicating mounts if already activated
If the given device is already mounted at the target location, do not
mount --move it again and create a bunch of dup entries in the /etc/mtab
and kernel mount table.
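One way to express the "already mounted at the target" test is to scan a mount table for a matching device and mount point. This is only a sketch under that assumption; the commit message does not say how the script performs the check, and is_mounted_at and the sample device and paths below are hypothetical.

```python
def is_mounted_at(mounts_text, dev, path):
    """Return True if `dev` is listed as mounted on `path`.

    `mounts_text` is the content of a mount table in the usual
    "device mountpoint fstype options ..." layout.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == dev and fields[1] == path:
            return True
    return False


# Hypothetical sample table, for illustration only.
SAMPLE = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime 0 0\n"
)
```

If the check returns True, the caller would skip the mount --move step and avoid creating duplicate entries.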
Samuel Just [Wed, 31 Oct 2012 22:38:35 +0000 (15:38 -0700)]
osd/: add pg_log_entry_t::reverting_to for LOST_REVERT
Previously, we encoded the version to which we were
reverting in the prior_version field. Now, we explicitly
encode that version in the reverting_to field.
Using prior_version to encode the reverting_to version
could cause us to revert to the wrong version:
primary osd.1 writes foo 7'6(0'0)
primary osd.1 writes foo 9'9(7'6)
primary osd.0 learns the log up to 9'9 but recovers no objects
primary osd.1 dies
primary osd.0 reverts the foo in version 17'11(7'6) to 7'6
primary osd.1 comes back and starts to recover itself
foo is not missing on osd.1, and so the new log entry
17'11(7'6) causes osd.1's missing set to contain an entry
for foo with need=17'11 and have=7'6. recover_primary uses
this information to conclude that we can locally recover
the LOST_REVERT event from the local copy which it assumes
is 7'6 but in fact is 9'9.
This bug actually manifested as failing an assert in
populate_obc_watchers since the version on disk didn't
match the prior_version of the log event.
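The mismatch in the scenario above can be reproduced with a small model. This Python sketch only illustrates why reusing prior_version is misleading; MissingItem and both helper functions are hypothetical, versions are written as (epoch, version) tuples, and how the OSD consumes reverting_to during recovery is not covered by the commit message.

```python
from collections import namedtuple

# Versions are (epoch, version) tuples, e.g. 7'6 -> (7, 6).
MissingItem = namedtuple("MissingItem", "need have")


def missing_item_old_encoding(entry_version, prior_version):
    """Old scheme: prior_version carries the revert target and becomes `have`."""
    return MissingItem(need=entry_version, have=prior_version)


def local_copy_matches(item, on_disk_version):
    """True only if the local copy really is the version the log claims."""
    return item.have == on_disk_version


# osd.1 sees the LOST_REVERT entry 17'11(7'6) but still holds the 9'9 write.
item = missing_item_old_encoding((17, 11), (7, 6))
on_disk = (9, 9)
```

Here local_copy_matches(item, on_disk) is False, which is the situation in which recover_primary wrongly assumed the local copy was 7'6.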