Sage Weil [Mon, 29 Apr 2013 18:06:36 +0000 (11:06 -0700)]
mon: remap creating pgs on startup
After Monitor::init_paxos() has loaded all of the PaxosService state,
we should then map creating pgs to osds. This ensures we do so after the
osdmap has been loaded and the pgs actually map somewhere meaningful.
Fixes: #4675
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 29 Apr 2013 18:11:24 +0000 (11:11 -0700)]
mon: only map/send pg creations if osdmap is defined
This avoids calculating new pg creation mappings if the osdmap isn't
loaded yet, which currently happens during Monitor::init_paxos()
on startup. Assuming osdmap epoch is nonzero, it should always be
safe to do this (although possibly unnecessary).
More cleanup here is certainly possible, but this is one step toward fixing
the bad behavior for #4675.
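A minimal sketch of the guard described above, assuming an epoch of 0 means "no osdmap loaded". OsdMapSketch, PgCreateMapper and both method names are hypothetical stand-ins, not the monitor's real types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Ceph types.
struct OsdMapSketch {
  uint32_t epoch = 0;  // 0 means no osdmap has been loaded yet
};

struct PgCreateMapper {
  std::vector<int> creating_pgs;  // ids of pgs still being created
  int mapped = 0;                 // how many pg mappings we have calculated

  // Precondition: the osdmap is defined.
  void map_creating_pgs(const OsdMapSketch& osdmap) {
    assert(osdmap.epoch != 0);
    mapped += static_cast<int>(creating_pgs.size());
  }

  // Guarded entry point: returns false when the work was skipped.
  bool maybe_map_creating_pgs(const OsdMapSketch& osdmap) {
    if (osdmap.epoch == 0)
      return false;
    map_creating_pgs(osdmap);
    return true;
  }
};
```

Calling maybe_map_creating_pgs() before the osdmap is loaded is a no-op; calling it again once the epoch is nonzero does the mapping.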
Sage Weil [Mon, 29 Apr 2013 17:45:31 +0000 (10:45 -0700)]
client: make dup reply a louder error
If we get a dup reply something is probably wrong! We should make sure
it appears more loudly in the log. In particular, it can lead to out
of sync cap state; see #4853.
Sage Weil [Mon, 29 Apr 2013 17:44:28 +0000 (10:44 -0700)]
client: fix session open vs mdsmap race with request kicking
A sequence like:
- ceph-fuse starts, make_request on getattr
- waits for mds to be active
- tries to open a session
- mds restarts, recovers
- eventually gets session open reply
- sends first getattr (even though mds is in reconnect state)
- gets mdsmap update that mds is now active
- kicks request, resends getattr
- get first reply
- ignore second reply, caps get out of sync
The bug is that we send the first request when the MDS is still in
the reconnect state. The fix is to loop in make_request so that we
ensure all conditions are satisfied before sending the request. Any
time we wait, we loop, so that we know all conditions (still) pass if
we make it to the end.
Fixes: #4853
Signed-off-by: Sage Weil <sage@inktank.com>
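The "loop until every condition holds" idea can be sketched roughly as follows. This is a simplified model, not the client code: ClientSketch, MdsState, the wait helpers and the on_wait hook are all hypothetical, and the hook only exists to simulate state changing while we are blocked.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified model; names are illustrative only.
enum class MdsState { Reconnect, Active };

struct ClientSketch {
  MdsState mds_state = MdsState::Active;
  bool session_open = false;
  std::vector<std::string> log;

  // Lets a caller simulate other state changing while we are blocked.
  std::function<void(ClientSketch&)> on_wait;

  void wait_for_active() {
    log.push_back("wait_active");
    if (on_wait) on_wait(*this);
    mds_state = MdsState::Active;
  }

  void wait_for_session() {
    log.push_back("wait_session");
    session_open = true;
    if (on_wait) on_wait(*this);  // e.g. the mds restarted meanwhile
  }

  void make_request(const std::string& op) {
    while (true) {
      // Any time we wait, start over so earlier checks are re-evaluated.
      if (mds_state != MdsState::Active) { wait_for_active(); continue; }
      if (!session_open) { wait_for_session(); continue; }
      break;
    }
    // Reaching this point means all conditions (still) pass.
    assert(mds_state == MdsState::Active && session_open);
    log.push_back("send " + op);
  }
};
```

If the mds drops back to reconnect while the session open is pending, the loop waits for it to become active again before the request is sent.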
Sage Weil [Sun, 28 Apr 2013 00:59:24 +0000 (17:59 -0700)]
ceph-filestore-dump: fix warnings on i386 build
tools/ceph-filestore-dump.cc: In member function ‘int header::get_header()’:
warning: tools/ceph-filestore-dump.cc:454:19: comparison between signed and unsigned integer expressions [-Wsign-compare]
tools/ceph-filestore-dump.cc: In member function ‘int footer::get_footer()’:
warning: tools/ceph-filestore-dump.cc:471:19: comparison between signed and unsigned integer expressions [-Wsign-compare]
tools/ceph-filestore-dump.cc: In member function ‘int super_header::read_super()’:
warning: tools/ceph-filestore-dump.cc:697:30: comparison between signed and unsigned integer expressions [-Wsign-compare]
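The commit message does not show the change itself. One common way to silence -Wsign-compare, sketched here with hypothetical helper names, is to rule out the negative case first and then compare in a single unsigned type:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX)

// Convert a byte count that has already been checked for errors.
inline size_t to_unsigned(ssize_t v) {
  assert(v >= 0);
  return static_cast<size_t>(v);
}

// Hypothetical helper: did a read()-style call return exactly `want` bytes?
inline bool read_complete(ssize_t got, size_t want) {
  if (got < 0)
    return false;  // error return; never compare it as unsigned
  return to_unsigned(got) == want;
}
```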
Gary Lowell [Fri, 26 Apr 2013 08:53:08 +0000 (01:53 -0700)]
debian/rules: Fix tcmalloc breakage
Since all currently supported platforms have tcmalloc
available and it is now the default, remove broken check code
that turns it off if the package is not listed in build-depends.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Sage Weil [Fri, 26 Apr 2013 19:22:28 +0000 (12:22 -0700)]
mon: cache osd epochs
The monitor may get a series of messages from the OSD that prompt it to
send incremental maps (pg_temp updates, failures, probably more). Avoid
sending the same incremental maps twice by keeping a cache of what epochs
we think the OSDs have.
This reduces monitor load, especially when the mon is a bit behind and is
getting a stream of delayed messages, and the work associated with sending
the inc maps prevents it from catching up.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
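A rough sketch of such a cache, assuming it is keyed by osd id and stores the newest epoch we believe that osd has. OsdEpochCache and send_incremental() are hypothetical names, and the `sent` vector only records what would have gone on the wire:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using epoch_t = uint32_t;

struct OsdEpochCache {
  std::map<int, epoch_t> osd_epoch;                // osd id -> epoch we think it has
  std::vector<std::pair<epoch_t, epoch_t>> sent;   // (first, last) ranges sent

  // Send incrementals up to `current`, starting after whichever is newer:
  // the epoch the message reported or the epoch in our cache.
  // Returns the number of incremental maps sent.
  epoch_t send_incremental(int osd, epoch_t reported, epoch_t current) {
    assert(osd >= 0);
    epoch_t have = reported;
    auto it = osd_epoch.find(osd);
    if (it != osd_epoch.end() && it->second > have)
      have = it->second;
    if (have >= current)
      return 0;  // nothing new for this osd; skip the work
    sent.emplace_back(have + 1, current);
    osd_epoch[osd] = current;
    return current - have;
  }
};
```

A delayed message that still reports an old epoch then costs a map lookup rather than a resend of maps the osd was already given.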
There was an issue when limit was being set: we didn't
break from the iterating loop if limit was reached. Also,
S3 does not enforce any limit, so keep that behavior.
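The shape of that fix, in a generic sketch. list_entries() and its parameters are hypothetical, since the surrounding code is not shown here; "no limit" is modelled with a separate flag:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical listing helper. limit_set == false means "no limit".
std::vector<std::string> list_entries(const std::vector<std::string>& all,
                                      bool limit_set, size_t limit) {
  std::vector<std::string> out;
  for (const auto& e : all) {
    if (limit_set && out.size() >= limit)
      break;  // stop iterating once the limit is reached
    out.push_back(e);
  }
  assert(!limit_set || out.size() <= limit);
  return out;
}
```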
Sage Weil [Fri, 26 Apr 2013 17:32:38 +0000 (10:32 -0700)]
mon: mark PaxosServiceMessage forward fields deprecated
These are no longer used; we manage forward state explicitly via the
Monitor sessions instead. Mark them deprecated so we don't accidentally
rely on them. Also, fix the annoying "mon.-1" garbage debug output that
is confusing.
Sam Lang [Thu, 25 Apr 2013 23:52:06 +0000 (18:52 -0500)]
client: don't embed cap releases in clientreplay
If the client is sending replay requests, avoid sending embedded caps,
since the mds already has the client's caps from the reconnect.
This matches the behavior of the kernel client.
Fixes #4742.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Dan Mick [Fri, 26 Apr 2013 07:04:13 +0000 (00:04 -0700)]
debian/rules: use multiline search to look for Build-Depends
When Build-Depends was split into multiple lines (in commit 8f5c665744e58d6d51a1e86de55c1399f51cc1c3), the grep for
libgoogle-perftools-dev broke. Replace grep with perl for multiline
matching.
Fixes: #4818
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Thu, 25 Apr 2013 23:47:15 +0000 (16:47 -0700)]
mon: do not forward other mon's requests to other mons
The request forwarding infrastructure is there for client requests.
However, we (ab)use it for mons sending MLog messages: LogClient sends an
MLog message to itself, and that is either handled locally (if leader) or
forwarded to the leader.
If that races with an election, we were forwarding an MLog from another mon
to the leader. This is not necessary; the original MLog sender will resend
the request on election_finish() to the latest leader.
The fix is to adjust forward_request_leader() to only forward messages from
a mon if that mon is itself.
This was reproduced while testing the fix for #4748.
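The check can be sketched like this. Sender, MonSketch and should_forward_to_leader() are hypothetical; the real forward_request_leader() works on sessions and messages, which are left out here:

```cpp
#include <cassert>

enum class EntityType { Client, Mon };

struct Sender {
  EntityType type;
  int rank;  // only meaningful for mons
};

struct MonSketch {
  int my_rank;

  bool should_forward_to_leader(const Sender& src) const {
    assert(my_rank >= 0);  // we need our own rank to compare against
    if (src.type == EntityType::Mon && src.rank != my_rank)
      return false;  // another mon's message; its sender resends after the election
    return true;     // client requests and our own messages are forwarded
  }
};
```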
Samuel Just [Thu, 25 Apr 2013 21:08:57 +0000 (14:08 -0700)]
PG: clear want_acting when we leave Primary
This is somewhat annoying actually. Intuitively we want to
clear_primary_state when we leave primary, but when we restart
peering due to a change in prior set status, we can't afford
to forget most of our peering state. want_acting, on the
other hand, should never persist across peering attempts.
In fact, in the future, want_acting should be pulled into
the Primary state structure.
Fixes: #3904
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
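As a toy illustration of the distinction drawn above: on leaving Primary, want_acting is dropped while other peering state is kept. PgSketch and its members are hypothetical and far simpler than the real PG state machine:

```cpp
#include <cassert>
#include <set>
#include <vector>

struct PgSketch {
  bool is_primary = false;
  std::vector<int> want_acting;  // acting set we would like to request
  std::set<int> prior_set;       // stands in for peering state we must keep

  void enter_primary() { is_primary = true; }

  void exit_primary() {
    assert(is_primary);
    want_acting.clear();  // never carried into the next peering attempt
    is_primary = false;   // prior_set is deliberately left alone
  }
};
```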
Sage Weil [Thu, 25 Apr 2013 22:18:42 +0000 (15:18 -0700)]
mon: get own entity_inst_t via messenger, not monmap
There are intervals in bootstrap(*) during which we are part of the
monmap, but our name (mon->name) does not match the monmap's. This means
that calling monmap->get_inst(mon->name) is not a safe way to get our own
entity_inst_t.
Instead, use messenger->get_myinst(). This includes our addr (obviously)
and an up-to-date entity_name_t, too: in bootstrap we adjust the messenger
name at the same time as mon->rank, based on the contents of the monmap.
monmap->get_inst(mon->rank) would work too.
* During mkfs, the monmap may have noname-foo instead of the name if it was
generated from the mon_host lines or dns or whatever by
MonMap::build_initial(). This was the case for #4811.
Fixes: #4811
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
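To illustrate why the name lookup is fragile, here is a small model. EntityInst, MonMapSketch, MessengerSketch and find_inst() are hypothetical simplifications; only get_myinst() mirrors a name used in the message above:

```cpp
#include <cassert>
#include <map>
#include <string>

struct EntityInst {
  int rank;
  std::string addr;
};

struct MonMapSketch {
  std::map<std::string, EntityInst> by_name;

  // Returns nullptr when the name is not in the map.
  const EntityInst* find_inst(const std::string& name) const {
    auto it = by_name.find(name);
    return it == by_name.end() ? nullptr : &it->second;
  }
};

struct MessengerSketch {
  EntityInst myinst;  // kept up to date as rank changes during bootstrap

  const EntityInst& get_myinst() const {
    assert(!myinst.addr.empty());
    return myinst;
  }
};
```

With a monmap entry stored as "noname-a", a lookup by our configured name "a" finds nothing, while the messenger still knows its own address and rank.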