Sage Weil [Wed, 12 Sep 2012 00:12:06 +0000 (17:12 -0700)]
filejournal: do not enforce that bdev size >= osd journal size
If the configured osd journal size is > the block device size, warn, but
do not generate an error and abort startup. This makes it safe to have
a default 'osd journal size' value of, say, 1 GB without fear of breaking
existing clusters with smaller journal block devices.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Tommi Virtanen <tv@inktank.com>
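The warn-instead-of-abort behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the FileJournal code itself: the function name and the choice to fall back to the device size are assumptions made for the example.

```python
import logging

log = logging.getLogger("filejournal_sketch")


def effective_journal_size(configured_bytes, bdev_bytes):
    """Pick a journal size without failing startup.

    Assumption for this sketch: when the configured size does not fit,
    the whole block device is used instead.
    """
    if configured_bytes > bdev_bytes:
        # Previously this case would have been a fatal error; now only warn.
        log.warning(
            "configured journal size %d exceeds block device size %d",
            configured_bytes,
            bdev_bytes,
        )
        return bdev_bytes
    return configured_bytes
```

With a 1 GB default and a 512 MB journal device, the sketch returns the 512 MB device size and logs a warning rather than raising.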
RGWRados::delete_obj() was updated in commit 93218aeab7615ced131f0f1af49f6cda41b4a661, but we
failed to update the corresponding RGWCache API.
This commit fixes it.
Yehuda Sadeh [Wed, 29 Aug 2012 22:34:17 +0000 (15:34 -0700)]
rgw: store cluster params in a special object
We now have a cluster root pool that should hold the
cluster params. The cluster params are now read from
this object on startup; if the object does not exist, we
set its defaults and write it.
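The read-or-initialize pattern described here might look like the sketch below. The pool is modeled as a plain dict, and the object name and default values are hypothetical placeholders, not the ones rgw actually uses.

```python
import json

# Hypothetical object name and default values, for illustration only.
PARAMS_OID = "cluster_params"
DEFAULT_PARAMS = {"domain_root": ".rgw"}


def load_cluster_params(root_pool, oid=PARAMS_OID, defaults=DEFAULT_PARAMS):
    """Read the params object; if it is missing, write the defaults first.

    root_pool is a dict standing in for the cluster root pool.
    """
    raw = root_pool.get(oid)
    if raw is None:
        # First startup: persist the defaults so later startups read them.
        raw = json.dumps(defaults)
        root_pool[oid] = raw
    return json.loads(raw)
```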
Yehuda Sadeh [Fri, 31 Aug 2012 05:21:15 +0000 (22:21 -0700)]
rgw: set atomic context for copy operation src and dest
This is required so that we handle both src and dest atomically. We
also set the prefetch flag on the src object, so that we read the
first chunk along with its attrs.
rbd: make --pool/--image args easier to understand for import
There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.
This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.
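One way to picture the resulting argument handling for import is the sketch below. The function is hypothetical; it only illustrates that --dest-pool wins when given, --pool is used otherwise, and the default pool is applied last. Treating --dest and --image the same way is an assumption of this example.

```python
DEFAULT_POOL = "rbd"


def resolve_import_dest(pool=None, image=None, dest_pool=None, dest=None):
    """Return (pool, image) for the import destination.

    The default pool is applied only at the end, so --pool is not
    ignored when --dest-pool is absent.
    """
    final_pool = dest_pool or pool or DEFAULT_POOL
    final_image = dest or image  # assumption: --dest takes precedence
    return final_pool, final_image
```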
librbd, cls_rbd: close snapshot creation race with old format
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.
Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.
Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
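The check-and-retry scheme described above can be sketched as follows. The class and function names are made up for the example and do not correspond to the cls_rbd or librbd interfaces; the retry limit is also an assumption.

```python
import errno


class SnapHeader:
    """Stand-in for the stored image header (hypothetical)."""

    def __init__(self):
        self.snap_seq = 0
        self.snaps = []

    def snapshot_add(self, snap_id, name):
        # Reject ids lower than the stored sequence number.
        if snap_id < self.snap_seq:
            return -errno.ESTALE
        self.snaps.append((snap_id, name))
        self.snap_seq = snap_id
        return 0


def create_snapshot(header, alloc_id, name, max_retries=5):
    """Client side: allocate an id, and retry with a new one on -ESTALE."""
    for _ in range(max_retries):
        snap_id = alloc_id()
        r = header.snapshot_add(snap_id, name)
        if r != -errno.ESTALE:
            return r, snap_id
    return -errno.ESTALE, None
```

If one client allocates id 1, a second client allocates id 2 and adds it first, the late add of id 1 is rejected; the first client then obtains a fresh id and succeeds.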
James Page [Wed, 12 Sep 2012 09:40:01 +0000 (10:40 +0100)]
Rejig the way the shared libraries are processed so that manual postinst/postrm scripts are not required for lib* packages, ensuring that the .so's in the ceph package are not detected
Tommi Virtanen [Tue, 11 Sep 2012 23:31:57 +0000 (16:31 -0700)]
upstart: Give everything a stop on stanza.
These are all tasks, and expected to exit somewhat quickly,
but e.g. ceph-create-keys has a loop where it waits for mon
to reach quorum, so it might still be in that loop when the
machine is shut down.
Tommi Virtanen [Tue, 11 Sep 2012 17:07:04 +0000 (10:07 -0700)]
upstart: Use "ceph osd crush create-or-move".
Now the weight is only set when adding the OSD to the CRUSH map for
the first time. Once it's there, it's only moved, and the weight is
left untouched.
Change the ceph.conf option for the initial weight from
osd_crush_weight to osd_crush_initial_weight, to reflect this.
If you don't want new OSDs to store data automatically (to minimize
balancing and keep a human in the control loop), you can now
set osd_crush_initial_weight=0.
Closes: #3101
Signed-off-by: Tommi Virtanen <tv@inktank.com>
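The create-or-move semantics can be illustrated with the sketch below. The CRUSH map is modeled as a plain dict and the function is a hypothetical stand-in, not the monitor implementation.

```python
def create_or_move(crush, osd_id, initial_weight, location):
    """Add the OSD with initial_weight if absent; otherwise only move it.

    crush is a dict mapping osd id to {"weight": ..., "location": ...}.
    """
    item = crush.get(osd_id)
    if item is None:
        # First time: the initial weight applies.
        crush[osd_id] = {"weight": initial_weight, "location": dict(location)}
    else:
        # Already present: update the location, leave the weight untouched.
        item["location"] = dict(location)
```

Calling it with initial_weight=0 for a new OSD models the osd_crush_initial_weight=0 case: the OSD is placed in the map but receives no data until someone raises its weight.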
Sage Weil [Tue, 11 Sep 2012 21:50:53 +0000 (14:50 -0700)]
obsync: if OrdinaryCallingFormat fails, try SubdomainCallingFormat
This blindly tries the Subdomain calling format if the ordinary method
fails. In particular, this works around buckets that present a
PermanentRedirect message.
See bug #3128.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Matthew Wodrich <matthew.wodrich@dreamhost.com>
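The fallback can be sketched generically as below. The helper is hypothetical and takes the operation as a callable, so it does not depend on any particular S3 client library; the format values passed in are whatever the caller uses to represent the ordinary and subdomain calling formats.

```python
def with_format_fallback(op, formats):
    """Try op(fmt) for each calling format in order; return the first success.

    Mirrors the "blindly try the next format" idea: any exception moves on
    to the next format, and the last error is re-raised if all of them fail.
    """
    if not formats:
        raise ValueError("at least one calling format is required")
    last_err = None
    for fmt in formats:
        try:
            return op(fmt)
        except Exception as err:  # deliberately broad, as in "blindly tries"
            last_err = err
    raise last_err
```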
Samuel Just [Tue, 11 Sep 2012 18:05:40 +0000 (11:05 -0700)]
ReplicatedPG: do not start_recovery_op if we are already pushing
Should fix bug #2761.
If we are already pushing soid, recovery_ops will only be decremented once for
all current pushes, so only increment recovery_ops if we are not currently
pushing it.
This bug causes us to leak a recovery op and get stuck in backfill.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
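The accounting rule can be shown with a small model. The class below is a simplified stand-in invented for this example, not ReplicatedPG; it only demonstrates why the counter must be incremented once per object rather than once per push.

```python
class RecoveryState:
    """Toy model of recovery_ops accounting (hypothetical)."""

    def __init__(self):
        self.recovery_ops = 0
        self.pushing = {}  # soid -> set of peers still being pushed to

    def start_push(self, soid, peer):
        # Only count a new recovery op if soid is not already being pushed.
        if soid not in self.pushing:
            self.recovery_ops += 1
            self.pushing[soid] = set()
        self.pushing[soid].add(peer)

    def push_done(self, soid, peer):
        peers = self.pushing[soid]
        peers.discard(peer)
        if not peers:
            # Decremented once for all pushes of this object.
            del self.pushing[soid]
            self.recovery_ops -= 1
```

If start_push incremented unconditionally, two pushes of the same object would add two ops but remove only one, leaving the counter permanently above zero.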
Sage Weil [Tue, 4 Sep 2012 22:25:20 +0000 (15:25 -0700)]
osd: fill in user log entry last after snapdir tran
Reorder the snapdir logic and the ctx->at_version adjustments so they come
before filling in the object_info_t, user_versions, and related fields.
Adjust at_version after appending the log entry, so that it points to the
next position/version we will write at, culminating in the actual user
event.
The user log entry contains the request id, which replay ops use
to put themselves in the correct place in the
waiting_for_commit/ack maps. The repop therefore needs to be tagged
with the same version as the log entry carrying the request id,
so that entry should be the last in the log entry vector.
This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 11 Sep 2012 15:48:34 +0000 (08:48 -0700)]
mon: make redundant osd.NNN argument optional
Instead of 'osd crush set NNN osd.NNN weight loc...', make the second
osd.NNN option optional, and allow either NNN or osd.NNN to specify the
osd id. This makes the usage much more sane, but maintains backward
compatibility.
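The relaxed argument handling might be sketched like this. The parser is a hypothetical illustration of the accepted forms, not the monitor's command parser; in particular, silently dropping the redundant name and the key=value form of the location arguments are assumptions of this example.

```python
def parse_osd_id(token):
    """Accept either 'NNN' or 'osd.NNN' and return the integer id."""
    if token.startswith("osd."):
        token = token[len("osd."):]
    if not token.isdigit():
        raise ValueError("invalid osd id: %r" % token)
    return int(token)


def parse_crush_set(args):
    """Parse the arguments that follow 'osd crush set'.

    Returns (osd_id, weight, location dict). The second osd.NNN
    argument is optional and is skipped when present.
    """
    osd_id = parse_osd_id(args[0])
    rest = list(args[1:])
    if rest and rest[0].startswith("osd."):
        rest = rest[1:]  # old-style redundant name, kept for compatibility
    weight = float(rest[0])
    location = dict(kv.split("=", 1) for kv in rest[1:])
    return osd_id, weight, location
```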
Instead of just keeping flat usage info per bucket, we
now maintain a list of categories in which request
usage is aggregated. Ops are put in categories based
on their names.
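A rough picture of per-category aggregation is given below. The class, the counter fields, and the rule that the category is simply the op name are all assumptions made for the example; the commit text does not specify them.

```python
from collections import defaultdict


def category_for(op_name):
    """Hypothetical rule: use the op name itself as the category."""
    return op_name


class UsageLog:
    """Per-bucket usage, broken down by category instead of one flat record."""

    def __init__(self):
        self.by_bucket = defaultdict(
            lambda: defaultdict(lambda: {"ops": 0, "bytes_sent": 0})
        )

    def record(self, bucket, op_name, bytes_sent):
        entry = self.by_bucket[bucket][category_for(op_name)]
        entry["ops"] += 1
        entry["bytes_sent"] += bytes_sent
```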