Yehuda Sadeh [Fri, 16 Jan 2015 01:30:24 +0000 (17:30 -0800)]
rgw: bilog marker related fixes
Fix the way we parse the marker. Instead of specifying whether it's a
sharded or not sharded bucket, we pass a shard_id. If string itself
points to a singe shard, we'll use the passed shard_id, otherwise we'll
parse the string and determine the shard id by that. In this way when
referencing a single shard we can get the marker with either shard id
specified or not. This works with the non-shard case too.
Adjust the bilog listing function, set it to work with the new
interface. It was broken before, and there are multiple fixes to it.
Yehuda Sadeh [Wed, 14 Jan 2015 19:47:18 +0000 (11:47 -0800)]
rgw: wait for completion only if not completion available
In a bucket aio operation, wait for completions only if there are no
completions available. Otherwise we might wait forever, as everything
already complete.
Yehuda Sadeh [Tue, 13 Jan 2015 21:31:39 +0000 (13:31 -0800)]
rgw: fix memory leak
We were iterating on both completion_objs, and completions assuming that
they follow each other. They don't do it. While at it, index completions
by id, so that we could update the completed objs correctly.
Yehuda Sadeh [Fri, 9 Jan 2015 18:23:35 +0000 (10:23 -0800)]
rgw: only keep track for cleanup of rados objects that were written
Fixes: #10311
We're keeping track of rados objects that we've written so that we could
clean them up if needed. Earlier we weren't too accurate about it and
were also setting the head object that is yet to be written. This now
only applies to the tail data, and a bit clearer.
Yehuda Sadeh [Wed, 7 Jan 2015 23:30:27 +0000 (15:30 -0800)]
rgw: max shards configuration is part of the zone config
The zone config params are set in the region configuration. Also,
there's a ceph.conf configurable (rgw_override_bucket_index_max_shards)
for overriding this per rgw.
Yehuda Sadeh [Tue, 9 Dec 2014 21:58:09 +0000 (13:58 -0800)]
rgw: pass num shards on bucket initialization
Need to pass the actual num shards that are going to be used for this
specific bucket. Bucket may be created by applying metadata from
different zone, so num shards might be different.
Yehuda Sadeh [Fri, 5 Dec 2014 23:52:26 +0000 (15:52 -0800)]
rgw, cls_rgw: keep shard ids with oids
Instead of just having the list of oids, keep the shard ids together, so
that we can know on which shard the operation happened.
Bucket markers are just using the shard numeric id, instead of the
bucket instance shard id. This makes it easier to parse the markers
appropriately.
Yehuda Sadeh [Fri, 5 Dec 2014 22:10:50 +0000 (14:10 -0800)]
cls_rgw: clean up CLSRGWConcurrentIO
Class is no longer a template, and keeps a map of oids by shard_id. Call
issue_op() using both shard_id and oids. Shard id is used for mapping
the results in the derived classes.
Jason Dillaman [Tue, 13 Jan 2015 04:17:50 +0000 (23:17 -0500)]
librbd: flush pending AIO requests under all existing flush scenarios
AIO requests that are waiting on the image lock should be flushed
during all existing RBD flush scenarios. A few flush cases were
missed in the original implementation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 3 Nov 2014 21:51:06 +0000 (16:51 -0500)]
librbd: differentiate between R/O vs R/W RBD features
The new RBD exclusive lock feature should be treated as a
feature that is only applied when the image is opened in
R/W mode.
Older clients will need to handle the updated
cls_rbd::get_features method in order to properly determine
the incompatible features for an image depending on the
current mode.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Sun, 16 Nov 2014 19:20:42 +0000 (14:20 -0500)]
librbd: Add convenience library to support unit tests
Unit tests need access to the private symbols of librbd no
longer exported from librbd.so. A new librbd_internal
convenience library was created to allow access.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 8 Oct 2014 12:41:53 +0000 (08:41 -0400)]
librbd: Integrate librbd with new exclusive lock feature
Operations that update the image now require the exclusive lock
if the feature is enabled. AIO write and discard operations will
automatically request the exclusive lock from the current leader
to support live-migration.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Matt Richards [Tue, 13 Jan 2015 00:59:42 +0000 (16:59 -0800)]
librados: bump rados version number
As a follow-on to 49d114f1fff90e5c0f206725a5eb82c0ba329376,
increment the "extra" version field so clients can easily
determine if they have a version of librados that properly
translates C API operation flags.
Signed-off-by: Matthew Richards <mattjrichards@gmail.com>
Sage Weil [Mon, 12 Jan 2015 22:00:21 +0000 (14:00 -0800)]
osd: enable filestore_extsize by default
Note that this will only get used if the kernel is new enough; if it is
older than 3.5 the option will get disabled and extsize will not be used
even if the option is set to true.
Sage Weil [Mon, 12 Jan 2015 21:59:39 +0000 (13:59 -0800)]
os/FileStore: verify kernel is new enough before using extsize ioctl
Old kernels have an XFS bug that exposes uninitialized data when the
extsize hint is set and only partially written. This is fixed by Linux
commit aff3a9edb7080f69f07fe76a8bd089b3dfa4cb5d, documented in XFS bug
http://oss.sgi.com/bugzilla/show_bug.cgi?id=874, and tested by XFS
test xfs/229 to prevent regressions.
Notably the original bug affects kernel 3.2, which is widely deployed with
ubuntu precise 12.04.
Backport: giant, firefly Signed-off-by: Sage Weil <sage@redhat.com>
John Spray [Mon, 5 Jan 2015 19:34:57 +0000 (19:34 +0000)]
mon: implement `fs reset`
This is for use in CephFS disaster recovery. When
the metadata pool has been forcibly reset to a single-MDS
metadata tree, we would like to reset the MDSMap to match.
Sage Weil [Mon, 17 Nov 2014 20:46:51 +0000 (12:46 -0800)]
osd/ReplicatedPG: drop unnecessary cache_mode checks
This currently enumerates all cache modes except none, and we don't
arrive in this function when caching is disabled. And creating a whiteout
is not cache_mode dependent. Simplify!