test: test_workload_gen: Add callback for collection destruction.
When we remove a collection, we must clean up after the coll_entry_t we
once had on the available collections set. For some reason, we weren't
doing this.
This commit adds a new callback, which inherits from the 'OnReadable'
callback on the WorkloadGenerator class, that will be responsible for
deleting the coll_entry_t once we know the collection transaction
destroying the collection has finished.
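The callback arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the 'OnReadable' base here is a simplified stand-in that only tracks an in-flight counter, and 'OnDestroyed', 'live_entries' and the fields of coll_entry_t are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Illustration only: counts live coll_entry_t objects so the cleanup is observable.
static int live_entries = 0;

// Simplified stand-in for the generator's per-collection bookkeeping (assumed shape).
struct coll_entry_t {
  std::string name;
  explicit coll_entry_t(std::string n) : name(std::move(n)) { ++live_entries; }
  coll_entry_t(const coll_entry_t&) = delete;
  coll_entry_t& operator=(const coll_entry_t&) = delete;
  ~coll_entry_t() { --live_entries; }
};

// Simplified stand-in for the 'OnReadable' callback: one transaction has completed.
class OnReadable {
 public:
  explicit OnReadable(int* in_flight) : in_flight_(in_flight) {}
  virtual ~OnReadable() = default;
  virtual void finish(int r) {
    (void)r;
    assert(*in_flight_ > 0);
    --*in_flight_;
  }

 private:
  int* in_flight_;
};

// Hypothetical derived callback: takes ownership of the entry that was removed
// from the available set and frees it once the destroy transaction is done.
class OnDestroyed : public OnReadable {
 public:
  OnDestroyed(int* in_flight, coll_entry_t* entry)
      : OnReadable(in_flight), entry_(entry) {}
  void finish(int r) override {
    OnReadable::finish(r);  // keep the base bookkeeping
    entry_.reset();         // the collection is gone, so drop its entry
  }

 private:
  std::unique_ptr<coll_entry_t> entry_;
};
```

The point of deferring the delete to finish() is that nothing touches the entry between queueing the destroy transaction and its completion.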
test: test_workload_gen: Change CLI option and add '--help' usage.
With this commit, we support the following options (and old ones are no
longer available):
  --test-num-colls VAL                 Set the number of collections
  --test-num-objs-per-coll VAL         Set the number of objects per
                                       collection
  --test-destroy-coll-per-N-trans VAL  Set how many transactions to run
                                       before destroying a collection.
And --help will show the program's usage description.
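For illustration, parsing these options could look something like the sketch below. It is not the parser used by test_workload_gen: 'Options' and 'parse_args' are hypothetical names, the defaults of 30 and 6000 are borrowed from the older '-C'/'-O' description and are only assumed to still apply, and 0 for the destroy interval is an invented "never destroy" placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical container for the parsed values; defaults are assumptions.
struct Options {
  int num_colls = 30;
  int num_objs_per_coll = 6000;
  int destroy_coll_per_n_trans = 0;  // invented placeholder: 0 means "never"
  bool show_help = false;
};

// Returns false on an unknown option, a missing VAL, or a non-numeric VAL.
bool parse_args(const std::vector<std::string>& args, Options* out) {
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    if (a == "--help") {
      out->show_help = true;
      continue;
    }
    int* target = nullptr;
    if (a == "--test-num-colls") {
      target = &out->num_colls;
    } else if (a == "--test-num-objs-per-coll") {
      target = &out->num_objs_per_coll;
    } else if (a == "--test-destroy-coll-per-N-trans") {
      target = &out->destroy_coll_per_n_trans;
    }
    if (target == nullptr || i + 1 >= args.size()) {
      return false;
    }
    try {
      *target = std::stoi(args[++i]);
    } catch (const std::exception&) {
      return false;
    }
  }
  return true;
}
```

Note that the old short options are rejected here, matching the statement that they are no longer available.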
test: test_workload_gen: Default arguments, and minor changes.
Besides adding support for default arguments, passed onto global_init(),
this commit fixes a conflict in Makefile.am, and a missing lib
dependency. Also, we previously ignored the return values from
store->mkfs() and store->mount(); we now check them.
test: test_workload_gen: CodeStyle compliance and cleanup.
This commit brings the code into compliance with Ceph's CodeStyle and
cleans up some lingering unused code.
Also, now we allow changing the default OSD data and journal
locations, as well as the OSD journal size, by providing the
options '--osd-data <PATH>', '--osd-journal <PATH>' and
'--osd-journal-size <VAL>' on the CLI arguments. If not provided,
these will default to 'workload_gen_dir', 'workload_gen_journal'
and '400', respectively.
In its current state, the workload generator will queue a lot of
transactions onto the FileStore, and will wait if needed in case
there are too many in-flight transactions.
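The "wait if there are too many in-flight transactions" behaviour can be sketched as a small counting throttle. This is an assumption about the general shape, not the generator's actual code: 'InFlightThrottle' and its methods are hypothetical names, and the limit is whatever the caller passes in.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical throttle: callers claim a slot before queueing a transaction
// and release it from the completion callback.
class InFlightThrottle {
 public:
  explicit InFlightThrottle(int max_in_flight) : max_(max_in_flight) {
    assert(max_ > 0);
  }

  // Blocks while the limit is reached, then claims a slot.
  void begin() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return in_flight_ < max_; });
    ++in_flight_;
  }

  // Non-blocking variant: returns false instead of waiting.
  bool try_begin() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (in_flight_ >= max_) {
      return false;
    }
    ++in_flight_;
    return true;
  }

  // Releases a slot and wakes one waiter, if any.
  void finish() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(in_flight_ > 0);
      --in_flight_;
    }
    cond_.notify_one();
  }

  int in_flight() {
    std::lock_guard<std::mutex> lock(mutex_);
    return in_flight_;
  }

 private:
  const int max_;
  int in_flight_ = 0;
  std::mutex mutex_;
  std::condition_variable cond_;
};
```

In this arrangement the queueing thread calls begin() before each transaction and the completion callback calls finish().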
The workload generator will perform the transactions over a
pre-determined number of collections and objects, which can be set
at runtime with the options '-C <VAL>' and
'-O <VAL>' for collections and objects per collection, respectively.
If these are not provided, the program will default to 30 collections
and 6000 objects per collection.
Yehuda Sadeh [Tue, 27 Mar 2012 21:12:55 +0000 (14:12 -0700)]
rgw: remove pool_list(), can't list_objects() on system buckets
pool_list() was broken and is now replaced with pool_iterate().
list_objects() should no longer be used with system buckets (raw pools),
since we can't have it return a sorted list of objects without reading
the entire list.
Greg Farnum [Fri, 23 Mar 2012 17:31:29 +0000 (10:31 -0700)]
mon: Paxos needs to store the latest version permanently on-disk.
Previously it was only storing this m->latest_value in the stash,
which of course got overwritten. And then when somebody tried to read
it back, it failed!
Instead, require that the message include the regular version (not
just the stashed version), which the previous commit provides. And then
write the regular version to disk alongside the stash.
This set of procedures still suffers from some of the same disk consistency
issues as we recently fixed in slurping, but it's better than it was, and
fixing those would require a good deal more work.
Greg Farnum [Mon, 26 Mar 2012 18:03:27 +0000 (11:03 -0700)]
paxos: share_state sends every unknown value, including the stashed one
Sage points out that the stashed object might not be the same as the
one we actually archive. For instance, OSDMonitor stashes the full
OSDMap but the items it stores in the regular machine_name dir are
incrementals.
Sage Weil [Thu, 22 Mar 2012 15:33:09 +0000 (08:33 -0700)]
doc: update dev/peering document
- fix discussion of last epoch started
- define terms for current and past intervals
- describe role of pg info
- remove mention of the backlog
- fix discussion of up_thru
- etc.
Sage Weil [Sun, 18 Mar 2012 16:08:15 +0000 (09:08 -0700)]
osd: fix object_info.size mismatch file due to truncate_seq on new object
If the first write that creates an object includes a truncate_seq and
truncate_size, we were taking the truncate path and doing a truncate op
in our transaction prior to the write, and then setting the object_info
size appropriately. However, if the object doesn't exist, the truncate
op fails even though the oi.size gets set.
Later, this turns up as a scrub error (see #2080).
Fix this by skipping the truncate if it is a new object. Instead, we
should just initialize our truncate_{seq,size} metadata so that we're all
up to date for any later writes.
Alternatively, we could touch the object and then truncate it (up) to the
large size, but this is sort of a waste; data beyond a short object eof is
defined to be zeros, so all we would accomplish is making recovery work
harder by copying zeros around.
Fixes: #2080
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
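The truncate_seq fix described in this commit can be illustrated with a simplified model. This is a sketch under stated assumptions, not the OSD code: 'ObjectInfo', 'WriteRequest', 'Op' and 'plan_write' are hypothetical names, and the real write path handles many more cases than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { Truncate, Write };

// Hypothetical, heavily reduced stand-in for the object_info metadata.
struct ObjectInfo {
  bool exists = false;
  uint64_t size = 0;
  uint32_t truncate_seq = 0;
  uint64_t truncate_size = 0;
};

// Hypothetical write request carrying the client's truncate state.
struct WriteRequest {
  uint64_t offset;
  uint64_t length;
  uint32_t truncate_seq;
  uint64_t truncate_size;
};

// Returns the ops to put in the transaction and updates oi to match them.
std::vector<Op> plan_write(ObjectInfo& oi, const WriteRequest& w) {
  std::vector<Op> ops;
  if (w.truncate_seq > oi.truncate_seq) {
    if (oi.exists) {
      // Existing object: truncating it is valid, so size may follow.
      ops.push_back(Op::Truncate);
      oi.size = w.truncate_size;
    }
    // New object: skip the truncate op (it would fail on disk) and leave
    // size alone; only the truncate metadata is brought up to date.
    oi.truncate_seq = w.truncate_seq;
    oi.truncate_size = w.truncate_size;
  }
  ops.push_back(Op::Write);
  oi.size = std::max(oi.size, w.offset + w.length);
  oi.exists = true;
  return ops;
}
```

With this shape, the recorded size of a freshly created object always equals what the write actually produced, which is the mismatch scrub was reporting.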
Sage Weil [Fri, 16 Mar 2012 20:07:25 +0000 (13:07 -0700)]
osd: explicitly create new object,snap contexts on push
We specifically want to use this during recovery to avoid loading the obc
or ssc for a previous version of the object and populating the watchers.
We know we won't have any existing obc here because it is missing (old or
dne).
For the snapset context, we provide it explicitly when we recover the head
or snapset object (which we always do first). For clones, we re-use the
existing get_snapset_context(), which will either have the ssc open or
can load it from the head/snapset object.
Samuel Just [Fri, 16 Mar 2012 05:13:09 +0000 (22:13 -0700)]
ReplicatedPG,FileStore: clone should copy xattrs as well
_make_clone (called from make_writeable) and _rollback_to included
attr reads from head or a clone. In that case, an ondisk read
lock would be necessary. Now, clone also handles xattrs, so the
attr read should not be necessary.
Signed-off-by: Samuel Just <sam.just@dreamhost.com>
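As a toy illustration of a clone that carries xattrs along with the data, consider the sketch below. It is an in-memory model only: 'ToyStore' and 'Object' are hypothetical names and do not correspond to the FileStore interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory object: payload plus extended attributes.
struct Object {
  std::string data;
  std::map<std::string, std::string> xattrs;
};

// Hypothetical toy store used only to show the clone semantics.
class ToyStore {
 public:
  void write(const std::string& oid, const std::string& data) {
    objects_[oid].data = data;
  }

  void setattr(const std::string& oid, const std::string& key,
               const std::string& value) {
    objects_[oid].xattrs[key] = value;
  }

  // Copies data and xattrs together, so the caller does not need a
  // separate attr read from the source object.
  bool clone(const std::string& src, const std::string& dst) {
    auto it = objects_.find(src);
    if (it == objects_.end()) {
      return false;
    }
    Object copy = it->second;
    objects_[dst] = copy;
    return true;
  }

  const Object* get(const std::string& oid) const {
    auto it = objects_.find(oid);
    return it == objects_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Object> objects_;
};
```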
Sage Weil [Thu, 15 Mar 2012 17:35:40 +0000 (10:35 -0700)]
osd: maybe clear DEGRADED on recovery completion
We set degraded if we don't have enough "active" replicas, which excludes
the backfill target. We need to recheck that when we finish recovery and
the backfill target is now complete.
Fixes: #2160
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
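The recheck described in this commit can be modelled roughly as below. This is a simplified sketch, not the PG state machine: 'PGStatus' and the function names are hypothetical, and the replica counts are plain integers rather than real acting sets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, reduced view of a placement group's replica state.
struct PGStatus {
  std::size_t wanted_replicas;
  std::size_t active_replicas;  // excludes the backfill target
  bool backfill_target_present;
  bool backfill_complete;
  bool degraded;
};

// A backfill target only counts once it has been fully backfilled.
std::size_t effective_replicas(const PGStatus& pg) {
  const bool counts = pg.backfill_target_present && pg.backfill_complete;
  return pg.active_replicas + (counts ? 1 : 0);
}

void update_degraded(PGStatus& pg) {
  pg.degraded = effective_replicas(pg) < pg.wanted_replicas;
}

// On recovery completion the backfill target is complete, so the degraded
// flag is recomputed and may (or may not) be cleared.
void on_recovery_complete(PGStatus& pg) {
  if (pg.backfill_target_present) {
    pg.backfill_complete = true;
  }
  update_degraded(pg);
}
```

The flag is only "maybe" cleared: if the completed backfill target still leaves the PG short of replicas, it stays degraded.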
Sage Weil [Wed, 14 Mar 2012 19:14:20 +0000 (12:14 -0700)]
osd: rev cluster internal protocol
This covers:
- the push/pull changes in 0.43 (which we forgot to protect against; see
#2132)
- the new omap stuff for 0.44
Maybe we could make this finer grained so that ceph-osd would fail only
when mismatched versions are talking _and_ there is actual omap data in
play, but it's not worth the effort at this point.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>