Sage Weil [Tue, 6 Dec 2011 23:21:15 +0000 (15:21 -0800)]
objectstore: make list by hash *next > instead of >=
This means we should set it to a hash boundary or the last item of our
result set (not the next item we didn't include).
It means that during backfill we can set our last_backfill to the last
object we actually recovered and be sure that any new local files will be
included in the next result set, and that we can bound that result set by
the last object recovered without including it in the resulting range.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
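A minimal, self-contained sketch of the '>' semantics described above (not the
actual ObjectStore API; the obj_t, list_after, and cursor names are
illustrative): listing resumes strictly after the cursor, so the cursor can
safely be the last item already handled, and anything created after it will
still show up in a later batch.

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    // stand-in for an hobject_t ordered by (reversed hash, name)
    using obj_t = std::string;

    // return up to 'max' items strictly greater than 'cursor' (i.e. '>' not '>=')
    std::vector<obj_t> list_after(const std::set<obj_t>& coll, const obj_t& cursor,
                                  size_t max, obj_t* next_cursor) {
      std::vector<obj_t> out;
      for (auto it = coll.upper_bound(cursor); it != coll.end() && out.size() < max; ++it)
        out.push_back(*it);
      if (!out.empty())
        *next_cursor = out.back();   // the last item we handled, not one we skipped
      return out;
    }

    int main() {
      std::set<obj_t> coll = {"a", "b", "c", "d", "e"};
      obj_t cursor;                  // "" ~ a hash boundary / nothing recovered yet
      obj_t next;
      for (auto batch = list_after(coll, cursor, 2, &next); !batch.empty();
           cursor = next, batch = list_after(coll, cursor, 2, &next)) {
        for (const auto& o : batch) std::cout << o << ' ';
        std::cout << '\n';
      }
    }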
Sage Weil [Mon, 5 Dec 2011 19:38:12 +0000 (11:38 -0800)]
osd: track backfill with last_backfill, not interval_set<>
We always fill from the bottom up anyway. Using an hobject_t also gives us
a precise bound, and it makes things conceptually simpler: last_complete
and last_backfill each bound one of the two dimensions of up-to-dateness.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
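A minimal sketch of that bound with a stand-in type (not the real hobject_t or
PG code): whether a backfill target already has an object becomes a single
comparison against last_backfill, instead of a membership test against an
interval_set<>.

    #include <cassert>
    #include <string>
    #include <tuple>

    struct hobj_sketch {              // stand-in for hobject_t's sort key
      unsigned reversed_hash;         // see the nibble-reversed sort further down
      std::string name;
      bool operator<=(const hobj_sketch& o) const {
        return std::tie(reversed_hash, name) <= std::tie(o.reversed_hash, o.name);
      }
    };

    // everything at or below last_backfill has already been backfilled
    bool replica_has(const hobj_sketch& oid, const hobj_sketch& last_backfill) {
      return oid <= last_backfill;
    }

    int main() {
      hobj_sketch last_backfill{0x4000, "foo"};
      assert(replica_has({0x1000, "bar"}, last_backfill));
      assert(!replica_has({0x9000, "baz"}, last_backfill));
    }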
New process:
- pick best log, with preferences for those that might end up primary
- pick best primary that is log-contiguous with best log, with preference
for longer tails that will result in more acting osds.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
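A heavily simplified, hypothetical sketch of that two-step choice (the field
and function names below are illustrative, not the actual peering code),
assuming one peer's log is usable by another whenever the two logs overlap:

    #include <algorithm>
    #include <tuple>
    #include <vector>

    struct peer_sketch {
      int osd;
      unsigned last_update;        // newest log entry this osd has
      unsigned log_tail;           // oldest entry it still has (lower == longer tail)
      bool could_be_primary;
    };

    // assumes 'peers' is non-empty
    int choose_primary(const std::vector<peer_sketch>& peers) {
      // step 1: best log = newest last_update, preferring possible primaries on ties
      const peer_sketch& best_log = *std::max_element(
          peers.begin(), peers.end(), [](const peer_sketch& a, const peer_sketch& b) {
            return std::make_tuple(a.last_update, a.could_be_primary) <
                   std::make_tuple(b.last_update, b.could_be_primary);
          });
      // step 2: best primary = log-contiguous with that log, preferring a longer tail
      int primary = -1;
      unsigned best_tail = 0;
      for (const auto& p : peers) {
        bool contiguous = p.last_update >= best_log.log_tail;   // the logs overlap
        if (p.could_be_primary && contiguous &&
            (primary == -1 || p.log_tail < best_tail)) {
          primary = p.osd;
          best_tail = p.log_tail;
        }
      }
      return primary;
    }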
Sage Weil [Wed, 30 Nov 2011 00:44:17 +0000 (16:44 -0800)]
osd: fix push_to_replica typo
We are always pushing soid. If we are missing snapdir locally, that means
we can't do an informed efficient clone, and should push the whole
object... NOT that we should push snapdir!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
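A simplified sketch of the corrected decision (illustrative names, not the real
ReplicatedPG::push_to_replica code): the object pushed is always soid; a
missing snapdir only changes how much of it gets pushed.

    #include <string>

    struct push_plan_sketch {
      std::string oid;          // what we push
      bool whole_object;        // full copy vs. clone-aware (delta) push
    };

    push_plan_sketch plan_push(const std::string& soid, bool snapdir_missing_locally) {
      if (snapdir_missing_locally) {
        // no clone/overlap info available -> push the whole object (still soid!)
        return {soid, true};
      }
      return {soid, false};     // we can compute an efficient clone-based push
    }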
Samuel Just [Fri, 9 Dec 2011 22:42:21 +0000 (14:42 -0800)]
pybind: add object locator support to pybind pool listing
list_objects returns Object instances. Object therefore now takes an optional
locator_key parameter, which sets up the object locator for Object methods so
that objects returned from list_objects with locator keys can be used
normally.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Fri, 9 Dec 2011 00:36:45 +0000 (16:36 -0800)]
udev: drop device number from name
The device number depends on how many rbd images have been
mapped. Removing it makes the name determined solely by the pool, image,
and snapshot that are mapped, for ease of scripting or persistence across
reboots.
Samuel Just [Thu, 1 Sep 2011 21:56:01 +0000 (14:56 -0700)]
librados,Objecter,PG: list objects now includes the locator key
Previously, there was no way to recover the locator key used to create
an object. Now, rados_objects_list_next and ObjectIterator will return
the key as well as the object name.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
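A hedged sketch of consuming the new listing output through the librados C API
(error handling omitted; assumes a connected rados_t handle and an existing
pool named "data"):

    #include <cstdio>
    #include <rados/librados.h>

    void list_with_locators(rados_t cluster) {
      rados_ioctx_t io;
      rados_ioctx_create(cluster, "data", &io);

      rados_list_ctx_t ctx;
      rados_objects_list_open(io, &ctx);
      const char *entry, *key;
      while (rados_objects_list_next(ctx, &entry, &key) == 0) {
        // 'key' is the locator key the object was created with (may be unset)
        printf("%s (locator: %s)\n", entry, key ? key : "<none>");
        // to operate on such an object later, feed the key back with
        // rados_ioctx_locator_set_key(io, key) before the read/write call
      }
      rados_objects_list_close(ctx);
      rados_ioctx_destroy(io);
    }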
Samuel Just [Wed, 30 Nov 2011 18:38:09 +0000 (10:38 -0800)]
object.h: Sort hobject_t by nibble reversed hash
To match the HashIndex ordering, we need to sort hobject_t by the
nibble-reversed hash. We store objects in the filestore in a directory
tree with the least significant nibble at the top and the most significant
at the bottom, to facilitate pg splitting in the future.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
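A self-contained sketch of the comparison key this implies (not the actual
hobject_t code): the 32-bit hash is compared nibble by nibble, least
significant nibble first, matching the HashIndex directory layout.

    #include <cassert>
    #include <cstdint>

    uint32_t nibble_reverse(uint32_t hash) {
      uint32_t out = 0;
      for (int i = 0; i < 8; ++i) {        // 8 nibbles in a 32-bit hash
        out = (out << 4) | (hash & 0xf);   // take the low nibble first
        hash >>= 4;
      }
      return out;
    }

    // objects whose hashes share low-order nibbles (i.e. live under the same
    // top-level HashIndex directories) end up adjacent in this order
    bool hash_less(uint32_t a, uint32_t b) {
      return nibble_reverse(a) < nibble_reverse(b);
    }

    int main() {
      assert(nibble_reverse(0x12345678) == 0x87654321);
      // a hash ending in 1 sorts before one ending in 2, whatever the high nibbles
      assert(hash_less(0xfffffff1, 0x00000002));
    }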
Samuel Just [Tue, 6 Dec 2011 21:23:03 +0000 (13:23 -0800)]
ReplicatedPG: don't crash on empty data_subset in sub_op_push
If data_subset is empty (i.e., the data we pulled is no longer useful),
we should mark complete false and continue rather than fail the
assert in range_end().
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
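A minimal sketch of the guard, using an interval-set-like stand-in (the real
code uses Ceph's interval_set<>, whose range_end() likewise asserts on an
empty set):

    #include <cassert>
    #include <cstdint>
    #include <map>

    struct interval_set_sketch {
      std::map<uint64_t, uint64_t> m;     // offset -> length
      bool empty() const { return m.empty(); }
      uint64_t range_end() const {        // illegal to call on an empty set
        assert(!m.empty());
        auto last = std::prev(m.end());
        return last->first + last->second;
      }
    };

    bool push_is_complete(const interval_set_sketch& data_subset, uint64_t obj_size) {
      if (data_subset.empty())
        return false;                     // nothing useful pulled; never call range_end()
      return data_subset.range_end() >= obj_size;
    }

    int main() {
      assert(!push_is_complete({}, 4096));           // empty: not complete, no assert
      assert(push_is_complete({{{0, 4096}}}, 4096));
    }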
Greg Farnum [Tue, 6 Dec 2011 22:24:08 +0000 (14:24 -0800)]
ReplicatedPG: do not ->put() scrub messages when adding to a WorkQueue.
This function is passing a reference from PG::active_rep_scrub to
the rep_scrub_wq, not eliminating the reference (and the WorkQueue
doesn't grab a new reference itself, either).
The other alternative is to convert the WorkQueue to grab a
reference, but since they can cycle through the WorkQueue more than
once, and need to be ->put() outside the WorkQueue, I don't like
that option.
This should fix #1758.
Also add an assert to PG::_request_scrub_map to check on the other
possible cause of this bug (and fix the indentation).
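A simplified sketch of the ownership rule being described (stand-in types, not
the actual Message/WorkQueue code): the queue borrows the reference already
held for the message, so the enqueuing path neither get()s nor put()s it.

    #include <atomic>
    #include <cassert>
    #include <deque>

    struct msg_sketch {                        // stand-in for a refcounted Message
      std::atomic<int> nref{1};
      void get() { ++nref; }
      void put() { if (--nref == 0) delete this; }
    };

    std::deque<msg_sketch*> rep_scrub_wq_sketch;

    void queue_rep_scrub(msg_sketch* m) {
      // hand the caller's reference over to the work queue: no put() here, and
      // the queue does not take a reference of its own either
      rep_scrub_wq_sketch.push_back(m);
    }

    int main() {
      msg_sketch* m = new msg_sketch;          // one reference, held by the sender
      queue_rep_scrub(m);
      assert(rep_scrub_wq_sketch.front()->nref == 1);
      rep_scrub_wq_sketch.front()->put();      // whoever dequeues it drops it when done
      rep_scrub_wq_sketch.pop_front();
    }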
Tommi Virtanen [Tue, 6 Dec 2011 20:13:03 +0000 (12:13 -0800)]
doc: Reorganize pip calls to use a requirements file.
The conditional before running pip install was unnecessary: "pip install"
on already-installed packages is fast (as long as it's not --upgrade), and
--quiet keeps it from spamming the console.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Sage Weil [Mon, 5 Dec 2011 18:52:24 +0000 (10:52 -0800)]
filejournal: remove bogus check in read_entry
It is perfectly fine to read events that are older than the fs's seq from
the journal; open() will skip them when positioning the read pointer on
open.
Also, this code is nonsensical; it always failed the assertion.
Sage Weil [Mon, 5 Dec 2011 17:34:44 +0000 (09:34 -0800)]
filejournal: set last_committed_seq based on fs, not journal
last_committed_seq is the last seq committed to the fs, not the journal.
Set it when we begin replay with the fs provided value, not from the newest
entry in the journal.
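In sketch form (illustrative names, not the actual FileJournal members or
replay code):

    #include <cstdint>

    // seed the committed pointer from what the fs reports, not from the journal
    uint64_t init_last_committed_seq(uint64_t fs_op_seq, uint64_t newest_journal_seq) {
      (void)newest_journal_seq;   // the journal may be ahead of the fs; ignore it
      return fs_op_seq;           // only the fs knows what has actually been committed
    }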
Sage Weil [Fri, 2 Dec 2011 23:35:38 +0000 (15:35 -0800)]
mon: stub perfcounters for monitor, cluster
The 'mon' perfcounter is for the local daemon and is always registered.
The 'cluster' perfcounter is for cluster state, and is only registered
(and thus only shows up via the admin socket) when the current daemon is
part of the cluster quorum.
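A hedged sketch of that registration policy (the collection and counter types
below are stand-ins, not the exact monitor or perfcounter API):

    struct perfcounters_sketch {};            // counters for one logical source

    struct collection_sketch {
      void add(perfcounters_sketch*) { /* becomes visible via the admin socket */ }
      void remove(perfcounters_sketch*) { /* hidden again */ }
    };

    struct mon_sketch {
      collection_sketch coll;
      perfcounters_sketch mon_perf;           // 'mon': local daemon, always registered
      perfcounters_sketch cluster_perf;       // 'cluster': only while in quorum
      bool cluster_registered = false;

      void init() { coll.add(&mon_perf); }

      void on_quorum_change(bool in_quorum) {
        if (in_quorum && !cluster_registered) {
          coll.add(&cluster_perf);
          cluster_registered = true;
        } else if (!in_quorum && cluster_registered) {
          coll.remove(&cluster_perf);
          cluster_registered = false;
        }
      }
    };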
This could conceivably cause the reply ordering mismatch seen in bug
#1490. Not sure why we didn't also fix this caller when we fixed that
bug last time :).
Sage Weil [Fri, 2 Dec 2011 17:58:45 +0000 (09:58 -0800)]
crush: ignore forcefed input that doesn't exist
This might happen if, e.g., the file_layout specifies an osd that later
is removed from the cluster entirely. Just ignore it instead of making
upper layers duplicate this check.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
And I changed my mind... I think this is most cleanly handled inside crush, so
we don't end up duplicating, with a different data structure, the same check
that is generating the error.
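A generic sketch of that guard inside crush (illustrative only; item_exists is
a hypothetical helper standing in for however the map checks membership):

    struct crush_map_sketch { int max_devices; };

    // hypothetical membership check, standing in for the real map lookup
    static bool item_exists(const crush_map_sketch& map, int item) {
      return item >= 0 && item < map.max_devices;
    }

    // a forcefed item that is no longer in the map is simply ignored, so upper
    // layers don't have to duplicate the check with a different data structure
    int sanitize_force(const crush_map_sketch& map, int force) {
      if (force >= 0 && !item_exists(map, force))
        return -1;                  // fall back to normal placement
      return force;
    }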