The client side behavior here is correct: we should feed the raw pg into
osdmap->pg_to_acting_osds. The real problem is (was!) that pgp_num > pg_num
in current maps, which is illegal.
Sage Weil [Sat, 15 Jan 2011 00:57:38 +0000 (16:57 -0800)]
osd: drop messages from before we moved back to boot state
We want to make sure we ignore any messages sent to us before we moved
back to the boot state (after being wrongly marked down). This is only
a problem currently while we are in the BOOT state and waiting to be
re-added to the map, because we may then call _share_map_incoming and
send something on the new rebound messenger to an old peer. Also assert
that we are !booting there to be sure.
Yehuda Sadeh [Sat, 15 Jan 2011 00:29:41 +0000 (16:29 -0800)]
auth: new rotating secret ttl should depend on now() + ttl
Previously it depended only on the previous rotating secret (which was
always bigger than g_clock.now()). Since ticket rotation never happens
exactly when the old ticket expires (it probably takes a few seconds
after that), we ended up with tickets that expire much sooner than we
expected.
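As a rough illustration of the difference, here is a minimal sketch. It assumes the old code chained the new expiry off the previous secret's expiry; the helper names and the plain-seconds time type are hypothetical and are not taken from the auth code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical time type: plain seconds, for illustration only.
using seconds_t = std::int64_t;

// Assumed old behaviour: the new expiry is chained off the previous
// secret's expiry, regardless of when the rotation actually runs.
seconds_t next_expiry_old(seconds_t prev_expiry, seconds_t ttl) {
  return prev_expiry + ttl;
}

// Behaviour described by this commit: the new expiry is measured from
// the moment of rotation.
seconds_t next_expiry_new(seconds_t now, seconds_t ttl) {
  return now + ttl;
}
```

If rotation runs a few seconds after the old secret expired, the first variant hands out a secret whose remaining lifetime is already shorter than ttl, while the second always yields a full ttl.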
Unit tests should not parse the normal "-c ceph.conf" command line
arguments, they should not read config files, etc. If something
needs initializing for a specific unit test, we'll either fix it
to not need it, initialize it just for that test, or figure out some
nicer way of doing this.
Greg Farnum [Sat, 15 Jan 2011 00:22:11 +0000 (16:22 -0800)]
MDS: Use new C_Gather::get_num_remaining() in MDCache.
It was using get_num(), which now reports the number created.
This probably wouldn't have worked previously except that
~C_Gather::C_GatherSub was inappropriately calling rm_sub().
Greg Farnum [Sat, 15 Jan 2011 00:11:01 +0000 (16:11 -0800)]
C_Gather: Rewrite for thread safety.
Previously, C_Gather wasn't thread safe at all,
and there was an issue with creating subs while some
subs were being finished.
These issues are now fixed.
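The distinction between "number created" and "number remaining" can be sketched as follows. This is a hypothetical stand-in guarded by a single mutex, not the actual C_Gather implementation; only the names get_num() and get_num_remaining() come from the commit messages, and everything else is assumed.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a gather-style counter; not Ceph's C_Gather.
class GatherSketch {
 public:
  // Register a new sub-operation. Safe to call while other subs finish,
  // because both counters are only touched under the lock.
  void new_sub() {
    std::lock_guard<std::mutex> l(lock_);
    ++created_;
    ++remaining_;
  }

  // Mark one sub-operation finished; returns true if it was the last one.
  bool finish_sub() {
    std::lock_guard<std::mutex> l(lock_);
    assert(remaining_ > 0);
    --remaining_;
    return remaining_ == 0;
  }

  // Total number of subs ever created.
  int get_num() const {
    std::lock_guard<std::mutex> l(lock_);
    return created_;
  }

  // Number of subs that have not finished yet.
  int get_num_remaining() const {
    std::lock_guard<std::mutex> l(lock_);
    return remaining_;
  }

 private:
  mutable std::mutex lock_;
  int created_ = 0;
  int remaining_ = 0;
};
```

A caller that wants to know whether work is still outstanding needs the remaining count; the created count never goes down.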
Sage Weil [Fri, 14 Jan 2011 06:08:56 +0000 (22:08 -0800)]
mds: use common helper to journal a client session close
We saw a bug where an ESession close was followed by an EMetaBlob on that
session (see 6d0dc4bf64b2792d6fc007268c5a42ae4e2e583c). My best guess is
that a session timeout raced with a request waiting on locks (only the
explicit client close path was calling request_kill). To avoid that,
introduce a helper to journal client close so that the common work (killing
any pending requests AND releasing prealloc inos) happens in all cases.
Sage Weil [Fri, 14 Jan 2011 06:08:40 +0000 (22:08 -0800)]
mds: tolerate (with warning) replayed op with bad prealloc_inos
This comes up when an ESession close is followed by an EMetaBlob that
uses a prealloc_ino. That isn't supposed to happen (it's probably a corner
case with session timeout vs a request waiting on locks that didn't
get killed/canceled?). But tolerate it during replay just the same.
Sage Weil [Thu, 13 Jan 2011 21:14:24 +0000 (13:14 -0800)]
filejournal: rewrite completion handling, fix ordering on full->notfull
Rewrite the completion handling to be simpler and clearer, so that it is
easier to maintain a strict completion ordering invariant.
This also fixes an ordering bug: When restarting the journal, we defer
initially until we get a committed_thru from the previous commit and then
do all those completions. That same logic needs to also apply to new items
submitted during that commit interval. This was broken before, but the
simpler structure fixes it. Fixes #666.
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
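The ordering invariant can be modelled with a single FIFO that is released by a commit watermark. The sketch below is a simplified, hypothetical model and not the FileJournal code; the class and its members are invented for illustration, and only the name committed_thru is borrowed from the message above.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of ordered completions; not FileJournal's real code.
class CompletionQueueSketch {
 public:
  // Queue a completion for sequence number `seq`. Sequence numbers must be
  // strictly increasing, so the deque stays sorted by construction.
  void queue(std::uint64_t seq, std::function<void()> fn) {
    assert(pending_.empty() || pending_.back().first < seq);
    pending_.emplace_back(seq, std::move(fn));
  }

  // Record that everything up to `seq` is committed, then fire every
  // completion at or below the watermark, oldest first.
  void committed_thru(std::uint64_t seq) {
    if (seq > committed_) committed_ = seq;
    while (!pending_.empty() && pending_.front().first <= committed_) {
      std::function<void()> fn = std::move(pending_.front().second);
      pending_.pop_front();
      fn();
    }
  }

 private:
  std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
  std::uint64_t committed_ = 0;
};
```

Because items submitted during the commit interval go through the same queue as the deferred ones, they cannot complete ahead of earlier entries.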
Samuel Just [Thu, 13 Jan 2011 20:18:17 +0000 (12:18 -0800)]
PG: activate should not enqueue snap_trimmer on a replica
Previously, activate would queue_snap_trim() for replicas if snap_trimq
ended up non-empty, guaranteeing a crash for any replica starting up
while purged_snaps lagged behind pool->cached_removed_snaps.
This should fix #702.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Samuel Just [Wed, 12 Jan 2011 23:09:51 +0000 (15:09 -0800)]
ReplicatedPG: Fix oi.size bug in _rollback_to
_rollback_to calls _delete_head before cloning the clone into place.
_delete_head sets the object info size to 0. _rollback_to now resets
the size to match the rolled back object. Previously, this bug
manifested as a failed assert in scrub when checking the object sizes.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Samuel Just [Wed, 12 Jan 2011 21:51:55 +0000 (13:51 -0800)]
ReplicatedPG: register_object_context and register_snapset_context cleanup
Previously, get_object_context and get_snapset_context did not register
the resulting objects. In some cases, these objects would not get
registered and multiple copies would end up created. This caused a bug
in find_object_context where get_snapset_context could return an object
distinct from the one referenced by the object returned from
get_object_context.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Samuel Just [Wed, 12 Jan 2011 20:07:44 +0000 (12:07 -0800)]
ReplicatedPG: snap_trimmer work around
Currently, an OSD bug is causing snap_trimq to contain some snaps
already in purged_snaps. This workaround should let kvmtest
come back up. A real fix is still needed.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
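The message does not say how the workaround is implemented. One plausible shape, shown here purely as an assumption, is to skip any queued snap that has already been purged. The sketch uses std::set and a hypothetical snapid_sketch type rather than the OSD's own containers.

```cpp
#include <cassert>
#include <set>

// Hypothetical snap id type, for illustration only.
using snapid_sketch = unsigned long;

// Return the snaps in `trimq` that are not already in `purged`.
// Assumed behaviour; the real workaround may differ.
std::set<snapid_sketch> snaps_still_to_trim(
    const std::set<snapid_sketch>& trimq,
    const std::set<snapid_sketch>& purged) {
  std::set<snapid_sketch> out;
  for (snapid_sketch s : trimq) {
    if (purged.count(s) == 0) {
      out.insert(s);
    }
  }
  return out;
}
```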
Greg Farnum [Tue, 4 Jan 2011 21:32:47 +0000 (13:32 -0800)]
uclient: Switch how inodes link to dentries a bit.
Inodes now have a set of parent dentries, rather than a single
pointer. This allows the cache to accurately represent multiple
hard links.
Various minor adjustments were made so that this change in
format works and is error checked.
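A minimal sketch of the data-structure change, assuming nothing about the real uclient types: every struct and function name below is hypothetical, and only the idea of an inode holding a set of parent dentries comes from the message.

```cpp
#include <cassert>
#include <set>
#include <string>

struct InodeSketch;

// Hypothetical dentry: a name that points at one inode.
struct DentrySketch {
  std::string name;
  InodeSketch* inode = nullptr;
};

// Hypothetical inode: one parent entry per hard link, instead of a
// single parent pointer.
struct InodeSketch {
  std::set<DentrySketch*> parents;
};

// Attach a dentry to an inode and record the back-reference.
void link_dentry(DentrySketch& dn, InodeSketch& in) {
  assert(dn.inode == nullptr);  // error check: dentry must be unlinked
  dn.inode = &in;
  in.parents.insert(&dn);
}

// Detach a dentry from its inode and drop the back-reference.
void unlink_dentry(DentrySketch& dn) {
  assert(dn.inode != nullptr);  // error check: dentry must be linked
  dn.inode->parents.erase(&dn);
  dn.inode = nullptr;
}
```

With a single pointer, linking a second name would overwrite the first parent; with a set, removing one hard link leaves the others in place.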
Making oldest_update a class variable complicates log merging and wastes
space in the PG struct. Even though memory is big, cachelines are still
small. Just calculate it when we need it.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Tommi Virtanen [Tue, 11 Jan 2011 22:02:16 +0000 (14:02 -0800)]
Git ignored files cleanup.
Make gitignore entries not match recursively.
I wanted to introduce a directory "osdmaptool" to contain cli tests
for that tool, but all the files there were ignored because of these
rules. Better be explicit about what you want ignored.
Move all ignores for generated binaries to be together.
Samuel Just [Mon, 10 Jan 2011 22:45:06 +0000 (14:45 -0800)]
ReplicatedPG: Fix bug in rollback
Previously, _rollback_to assumed that the rollback was a noop if
ctx->clone_obc was set and its prior version matched head's version.
However, this broke in sequences like:
Write "snap1 contents" to oid "blah"
create snapshot "snap1"
Write "snap2 contents" to oid "blah"
create snapshot "snap2"
rollback oid "blah" to snapshot "snap1"
In this case, make_writeable would have just cloned head to the snap2
clone, but the relevant clone is actually "snap1". _rollback_to now
verifies that the most recent clone is the correct one before assuming
that head is already correct.
Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
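The old and new checks can be contrasted in a small model. The struct below is a hypothetical, simplified view of the state involved; its field names are invented for illustration and do not mirror the ReplicatedPG code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the state the rollback check inspects.
struct RollbackStateSketch {
  bool head_just_cloned;              // make_writeable cloned head for this op
  std::uint64_t clone_prior_version;  // prior version recorded for that clone
  std::uint64_t head_version;         // current version of head
  std::uint64_t newest_clone_snap;    // snap covered by the clone just made
  std::uint64_t target_snap;          // snap requested by the rollback
};

// Old check as described: a fresh clone matching head means "nothing to do".
bool is_noop_old(const RollbackStateSketch& s) {
  return s.head_just_cloned && s.clone_prior_version == s.head_version;
}

// New check: the fresh clone must also be the one being rolled back to.
bool is_noop_new(const RollbackStateSketch& s) {
  return is_noop_old(s) && s.newest_clone_snap == s.target_snap;
}
```

In the sequence above, the fresh clone belongs to snap2 while the target is snap1, so the old check wrongly reports a noop and the new one does not.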
Tommi Virtanen [Fri, 7 Jan 2011 21:15:40 +0000 (13:15 -0800)]
Use Google Test framework for unit tests.
Use ``make check`` to run the tests.
The src/gtest directory comes from ``svn export
http://googletest.googlecode.com/svn/tags/release-1.5.0 src/gtest``
and running ``git add -f src/gtest``.
gtest is licensed under the New BSD license, see src/gtest/COPYING.
For more on Google Test, see http://code.google.com/p/googletest/
Changed autogen.sh to regenerate gtest automake files too. Make sure to
run ``./autogen.sh && ./configure`` after merging this commit, or
incremental builds may fail. The automake integration is inspired
heavily by the protobuf project, and may still be problematic.
Make git ignore files generated by gtest compilation.
Currently putting in just one new-style unit test; refactoring old
tests to fit will come in separate commits.
Note: if you are starting daemons, listening on TCP ports, using
multiple machines, mounting filesystems, etc, it's not a unit test
and does not belong in this setup. A framework for system/integration
tests will be provided later.