Josh Durgin [Wed, 18 May 2011 00:36:39 +0000 (17:36 -0700)]
PG: update same_acting_since when acting or up changes
This is a hack since we currently use same_up_since to denote the beginning of an interval.
We should probably change this usage or rename it to same_interval_since.
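A minimal sketch of the rule in the subject line, written in Python for brevity rather than the project's C++. The dict layout and the helper name update_since are hypothetical and are not taken from the PG code.

```python
def update_since(state, epoch, up, acting):
    """Hypothetical helper: bump the 'since' markers when the sets change."""
    up_changed = list(up) != state["up"]
    acting_changed = list(acting) != state["acting"]
    if up_changed:
        state["same_up_since"] = epoch
    if up_changed or acting_changed:
        # Per the commit subject: a change in acting OR up resets this marker.
        state["same_acting_since"] = epoch
    state["up"], state["acting"] = list(up), list(acting)
    return state
```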
Sage Weil [Tue, 17 May 2011 17:10:45 +0000 (10:10 -0700)]
msgr: avoid clearing connection_state on pipe replacement
read_message and write_message both dereference connection_state, so avoid
clearing it when replacing a pipe.
read_message still uses it to find rx_buffers in ways that may interfere
when two Pipes reference the connection, but currently that is only used
for lossy pipes. We could still take pipe_lock in that case, but it is
only an optimization (we copy the data if the buffers don't get used
directly) and probably not worth bothering with.
Sage Weil [Mon, 16 May 2011 21:47:29 +0000 (14:47 -0700)]
client: update ctime for auth, xattr
This mirrors the kclient fix in d8672d64. The client can have a newer
ctime due to auth or xattr excl caps. This fixes cases where ctime goes
backwards due to the right sequence of local operations and replies
from the MDS.
Sage Weil [Fri, 13 May 2011 20:01:52 +0000 (13:01 -0700)]
osd: lazily close connections to down peers
If we hear from a peer that should be dead, tell them, but mark our
connection so that it will close after that message is delivered or if
it encounters any errors.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 13 May 2011 20:01:08 +0000 (13:01 -0700)]
msgr: mark_down_on_empty and mark_disposable
Mark a connection to close when messages are sent, and to close on any
error. We can use this to tell people who should be dead that they should
be dead, but not waste resources reconnecting to them.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
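The two flags can be illustrated with a toy connection object. This is only a sketch in Python: the class Conn, its fields, and the choice to close immediately when the queue is already empty are assumptions, not the messenger's actual API or behavior.

```python
from collections import deque


class Conn:
    """Toy connection; all names here are illustrative only."""

    def __init__(self):
        self.queue = deque()
        self.open = True
        self.close_on_empty = False
        self.disposable = False

    def mark_down_on_empty(self):
        # Close once every queued message has been sent.
        self.close_on_empty = True
        if not self.queue:
            self.open = False  # assumption: nothing queued means close now

    def mark_disposable(self):
        # Close on any error instead of reconnecting.
        self.disposable = True

    def send_next(self):
        """Deliver one queued message; close if the queue drains."""
        msg = self.queue.popleft()
        if self.close_on_empty and not self.queue:
            self.open = False
        return msg

    def on_error(self):
        """Return True if a reconnect should be attempted."""
        if self.disposable:
            self.open = False
            return False
        return True
```

With both flags set, a final "you should be dead" message is still delivered, after which the connection goes away without any reconnect attempts.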
Samuel Just [Sat, 14 May 2011 00:30:50 +0000 (17:30 -0700)]
PG: Only pull the master log from a member of the prior_set
There must be a member of the prior_set such that no other
osd has a more recent last_update. This way, prior_set_affected
will ensure that we reset peering if the master log source
goes down.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Need to do this to get librgw to be usable as a standalone library
without unresolved symbols. Also, this makes it consistent with the rest
of the log level settings.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
libcommon depends on this file, and there's no other library that it
could go in. It is certainly silly to manually include it in every
application and library that uses libcommon.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
The DST_CONSISTENCY variable allows us to specify that the destination
is expected to use read-after-write consistency. If that is the case, we
don't have to do slow retries if certain operations fail.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Greg Farnum [Fri, 13 May 2011 23:22:50 +0000 (16:22 -0700)]
uclient: do not accept max_size changes unless they're from auth mds.
Unlike most of the cap options, max_size is an inode member. This meant
that if we got a shared cap grant from a replica MDS, we would set
the max_size to 0!
This caused hangs: when the client then requested a new, larger max_size
from the auth MDS, the auth MDS would see the new size as smaller than the
current max and drop the message as spurious.
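A rough Python sketch of the guard described here. The function apply_cap_grant and the dict fields are hypothetical stand-ins for the client's inode and cap message, not the real structures.

```python
def apply_cap_grant(inode, grant, from_auth):
    """Apply a cap grant; only the auth MDS may change max_size."""
    inode["issued"] = grant["caps"]
    if from_auth:
        # max_size lives on the inode, so a replica's grant (which may
        # carry max_size 0) must not overwrite it.
        inode["max_size"] = grant["max_size"]
    return inode
```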
Greg Farnum [Wed, 11 May 2011 00:09:07 +0000 (17:09 -0700)]
MDS: don't journal slave ops if we only have caps.
Previously we wanted to journal if we had caps on something. Now
that we're being strict about only journaling stuff we're auth for,
that's a bad choice to make.
Greg Farnum [Wed, 11 May 2011 00:05:12 +0000 (17:05 -0700)]
uclient: be more careful about sending caps.
This should prevent us from "losing" caps off the dirty list. See
#1063. If we have dirty caps we don't want to short-circuit out
of sending caps just because what we're issued matches what we want.
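The decision can be sketched as a small predicate. This is an illustration in Python; should_send_caps and its arguments are hypothetical names, and the real check involves more state than shown.

```python
def should_send_caps(issued, wanted, dirty):
    """Decide whether a cap message must go to the MDS."""
    if dirty:
        # Dirty caps must always be flushed, even when issued == wanted.
        return True
    return issued != wanted
```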
Sage Weil [Thu, 12 May 2011 00:56:32 +0000 (17:56 -0700)]
objecter: set pgls start_epoch field
For each pg, start out with start_epoch = 0 in the first request. For
subsequent requests, set it to the first reply's epoch. This forces the
OSD to ignore our cookie and "restart" if the pg mapping changes and there
is a possibility of incomplete results.
The price we pay is the possibility of duplicate results.
Fixes: #1030
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
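The client-side bookkeeping might look like the following Python sketch. The class PglsCursor and the request and reply shapes are assumptions made for illustration; they do not mirror the Objecter's real types.

```python
class PglsCursor:
    """Hypothetical per-pg listing state kept by the client."""

    def __init__(self):
        self.start_epoch = 0  # 0 in the first request
        self.cookie = 0

    def next_request(self):
        return {"start_epoch": self.start_epoch, "cookie": self.cookie}

    def handle_reply(self, reply_epoch, cookie):
        if self.start_epoch == 0:
            # Pin start_epoch to the first reply's epoch for later requests.
            self.start_epoch = reply_epoch
        self.cookie = cookie
```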
Sage Weil [Thu, 12 May 2011 00:55:00 +0000 (17:55 -0700)]
osd: add pgls start_epoch field
If the pgls.start_epoch is set, the cookie is only considered valid if the
osd pg interval has not changed since then. If it has, then the cookie
is no longer valid and is ignored, effectively restarting the pgls process.
Old clients never set this and are unaffected.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
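The validity check on the OSD side can be sketched as below, in Python. The name effective_cookie, the interval_start parameter, and the use of 0 as the "restart" cookie are assumptions for illustration.

```python
def effective_cookie(cookie, start_epoch, interval_start):
    """Return the cookie to resume from, or 0 to restart the listing."""
    if start_epoch and interval_start > start_epoch:
        # The pg interval changed after start_epoch: ignore the cookie.
        return 0
    # Old clients send start_epoch == 0 and keep the previous behavior.
    return cookie
```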
Sage Weil [Wed, 11 May 2011 23:27:18 +0000 (16:27 -0700)]
osd: prepend missing objects to pgls results
This will prepend any missing objects to the set of objects returned by
a sequence of PGLS operations. Because recovery can progress in parallel,
we may get some objects returned twice (first as a missing item, later
because it is on disk). This is better than not getting it at all. The
client will need to uniq the results as needed.
Because the missing set is guaranteed not to grow only within a given
mapping interval, the client should restart the whole PGLS sequence if the
pg primary changes.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
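Since an object may come back twice, a client could de-duplicate across replies along these lines. A minimal Python sketch; uniq_listing is a hypothetical helper and not part of any client library.

```python
def uniq_listing(batches):
    """Merge PGLS reply batches, keeping the first occurrence of each name."""
    seen = set()
    out = []
    for batch in batches:
        for name in batch:
            if name not in seen:
                seen.add(name)
                out.append(name)
    return out
```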
Sage Weil [Wed, 11 May 2011 20:58:09 +0000 (13:58 -0700)]
osd: key Missing::rmissing on version (not eversion)
This switches the key to the uint64_t (version_t) only, which is still
unique given a particular timeline (which is all we care about given a
particular Missing::missing). The last_requested pointer is updated
accordingly.
This will facilitate a hack to make PGLS work for degraded PGs (by using
the rmissing version offset as a cookie).
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
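One way to picture a version-keyed map doubling as a listing cookie is the Python sketch below. It is an assumption-laden illustration: list_missing, the plain dict standing in for rmissing, and the paging scheme are all hypothetical.

```python
import bisect


def list_missing(rmissing, cookie, limit):
    """Return (object names, next cookie) for versions greater than cookie.

    rmissing is a hypothetical dict mapping a version number to an object name.
    """
    versions = sorted(rmissing)
    start = bisect.bisect_right(versions, cookie)
    page = versions[start:start + limit]
    names = [rmissing[v] for v in page]
    next_cookie = page[-1] if page else cookie
    return names, next_cookie
```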
Fix the obsync unit tests to take into account the new ACL changes.
ACLs must be either translated or ignored when copying between the
main and alt test buckets.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>