Sage Weil [Wed, 13 Mar 2013 02:44:20 +0000 (19:44 -0700)]
mds: mark con for closed session disposable
If there is a fault while delivering the message, close the con. This will
clean up the Session state from memory. If the client doesn't get the
CLOSED message, they will reconnect (from their perspective, it is still
a lossless connection) and get a remote_reset event telling them that the
session is gone. The client code already handles this case properly.
Note that way back in 4ac45200f10e0409121948cea5226ca9e23bb5fb we removed
this because the client would reuse the same connection when it reopened
the session. Now the client never does that; it will mark_down the con
as soon as it is closed and open a new one for a new session... which means
the MDS will get a remote_reset and close out the old session.
Sage Weil [Wed, 13 Mar 2013 23:06:02 +0000 (16:06 -0700)]
client: validate/lookup mds session in each message handler
For every message handler, look up the MetaSession by int mds and verify
that the Connection* matches properly. If so, proceed; otherwise, discard
the message.
In the future, we probably want to link the MetaSession to the Connection's
priv field, but that can come later.
Clean up a bunch of submethods that take int mds while we're here.
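A minimal sketch of the lookup-and-verify step described above. The types
and the names get_session_checked() and handle_message() are hypothetical
stand-ins, not the actual client code; only the shape of the check is meant
to carry over.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins for the real client types.
struct Connection { int id; };

struct MetaSession {
  int mds;           // rank of the MDS this session talks to
  Connection *con;   // connection the session was opened on
};

struct Client {
  std::map<int, std::unique_ptr<MetaSession>> mds_sessions;

  // Return the session for `mds` only if the message arrived on the
  // connection that session owns; otherwise return nullptr.
  MetaSession *get_session_checked(int mds, const Connection *con) {
    auto it = mds_sessions.find(mds);
    if (it == mds_sessions.end())
      return nullptr;              // no session for this mds
    if (it->second->con != con)
      return nullptr;              // stale or foreign connection
    return it->second.get();
  }

  // A handler discards the message when validation fails.
  bool handle_message(int mds, const Connection *con) {
    MetaSession *s = get_session_checked(mds, con);
    if (!s)
      return false;                // discard
    // ... process the message against *s ...
    return true;
  }
};
```

Keeping the check in one helper means every handler rejects messages from a
stale connection the same way.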
Sage Weil [Fri, 8 Mar 2013 21:17:23 +0000 (13:17 -0800)]
client: handle ESTALE redirection in make_request(), not reply handler
Resending the request in the reply handler is a bit fugly and throws a
small wrench into moving to a MetaSession*-based approach. Check for
the case(s) where we *do* return ESTALE explicitly and fall through.
Otherwise, kick the caller and let them retry.
Sage Weil [Fri, 8 Mar 2013 20:09:03 +0000 (12:09 -0800)]
client: pass around MetaSession* instead of int mds
This is mostly just shuffling argument types around. In a few cases we
now assert that the session actually exists; previously these cases would
have been problematic when we called get_inst() on bad addrs or something,
or would have been silently ignored bugs.
mon: Paxos: only finish a queued proposal if there's actually *any*
When proposing an older value learned during recovery, we don't create
a queued proposal -- we go straight through Paxos. Therefore, when
finishing a proposal, we must be sure that we have a proposal in the queue
before dereferencing it, otherwise we will segfault.
Fixes: #4250
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
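A minimal sketch of the guard described above, using a simplified and
hypothetical model of the proposal queue (the Proposal and Paxos structs
here are illustrative, not the monitor's real classes).

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical, simplified model of a queued proposal.
struct Proposal { std::string value; };

struct Paxos {
  std::list<Proposal> proposals;  // stays empty for values re-proposed in recovery
  int finished_count = 0;

  // Called when a proposed value has been committed.
  void finish_proposal() {
    // A value learned during recovery never entered the queue, so there
    // may be nothing to finish; check before touching front().
    if (proposals.empty())
      return;
    ++finished_count;
    proposals.pop_front();
  }
};
```

Without the empty() check, calling front() on an empty list is undefined
behaviour, which matches the segfault described in the message.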
Yehuda Sadeh [Tue, 12 Mar 2013 19:56:01 +0000 (12:56 -0700)]
rgw: set up curl with CURL_NOSIGNAL
Fixes: #4425
Backport: bobtail
Apparently, libcurl needs that in order to be thread safe. Side
effect is that if libcurl is not compiled with c-ares support,
domain name lookups are not going to time out.
Issue affected keystone.
Danny Al-Gaaf [Tue, 12 Mar 2013 17:21:40 +0000 (18:21 +0100)]
mon/Monitor.h: return string instead of 'char *' from get_state_name()
Return a string instead of 'char *' to avoid using std::string::c_str()
to return a 'char *' from get_state_name().
Returning the result of c_str() from a function is dangerous since the
result becomes invalid once the related string object is destroyed or
goes out of scope (which is the case with return). So you may
end up with garbage in this case.
Related warning from cppcheck:
[src/mon/Monitor.h:172]: (error) Dangerous usage of c_str(). The value
returned by c_str() is invalid after this call.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
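A small illustration of the pattern, not the actual Monitor.h code. The
State enum and its value names are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Illustrative states; the names are assumptions for this example.
enum State { STATE_PROBING, STATE_LEADER, STATE_PEON };

// Unsafe pattern (left as a comment on purpose): the temporary
// std::string is destroyed at the end of the return statement, so the
// caller receives a dangling pointer.
//
//   const char *get_state_name_unsafe(State s) {
//     return std::string("leader").c_str();
//   }

// Safe pattern: return the string by value; the caller owns its copy.
std::string get_state_name(State s) {
  switch (s) {
    case STATE_PROBING: return "probing";
    case STATE_LEADER:  return "leader";
    case STATE_PEON:    return "peon";
  }
  return "???";
}
```

Callers that still need a 'char *' can call c_str() on the returned string
while they keep that string alive.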
Josh Durgin [Sat, 9 Mar 2013 02:57:24 +0000 (18:57 -0800)]
librbd: invalidate cache when flattening
The cache stores which objects don't exist. Flatten bypasses the cache
when doing its copyups, so when it is done the -ENOENT from the cache
is treated as zeroes instead of 'need to read from parent'.
Clients that have the image open need to forget about the cached
non-existent objects as well. Do this during ictx_refresh, while the
parent_lock is held exclusively, so no new reads from the parent can
happen until the updated parent metadata is visible.
Josh Durgin [Sat, 9 Mar 2013 01:53:31 +0000 (17:53 -0800)]
ObjectCacher: add a method to clear -ENOENT caching
Clear the exists and complete flags for any objects that have exists
set to false, and force any in-flight reads to retry if they get
-ENOENT instead of generating zeros.
This is useful for getting the cache into a consistent state for rbd
after an image has been flattened, since many objects which previously
did not exist and went up to the parent to retrieve data may now exist
in the child.
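A rough sketch of what such a method could look like. It assumes that
"clearing" means resetting an object to exists = true and complete = false,
and it models in-flight reads with a hypothetical trust_enoent flag; none of
these names are taken from the real ObjectCacher.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of an in-flight read.
struct Read { bool trust_enoent = true; };

// Hypothetical, reduced model of a cached object.
struct Object {
  bool exists = true;         // false once a read returned -ENOENT
  bool complete = false;      // true when the cache holds the whole object
  std::vector<Read*> reads;   // reads still in flight for this object
};

// Forget cached non-existence and tell in-flight reads to retry on
// -ENOENT instead of producing zeros. Returns the number of objects reset.
int clear_nonexistence(std::map<std::string, Object> &objects) {
  int reset = 0;
  for (auto &kv : objects) {
    Object &ob = kv.second;
    if (!ob.exists) {
      ob.exists = true;
      ob.complete = false;
      ++reset;
    }
    for (Read *rd : ob.reads)
      rd->trust_enoent = false;   // force a retry if -ENOENT comes back
  }
  return reset;
}
```

Objects that are known to exist keep their cached state; only the negative
entries are dropped.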
Josh Durgin [Sat, 9 Mar 2013 01:49:27 +0000 (17:49 -0800)]
ObjectCacher: keep track of outstanding reads on an object
Reads always use C_ReadFinish as a callback (and they are the only
user of this callback). Keep an xlist of these for each object, so
they can remove themselves as they finish. To prevent races between
requests and with discard removing objects from the cache, clear the xlist
in the object destructor, so if the Object is still valid the set_item
will still be on the list.
Make the C_ReadFinish constructor take an Object* instead of the pool
and object id, which are derived from the Object* anyway.
On second thought, this will require a bit more care to ensure that all
of the paths radosgw needs to read/write from have the correct permissions
in the packages and so forth.
This increase only means that we'll keep more versions around before we
trim. It doesn't change the number of versions we'll keep around after
trimming (that's still as many as 'paxos_max_join_drift', i.e. 10), nor
does it change the criteria used to consider a monitor as having drifted
(same rule applies, 'paxos_max_join_drift').
This change however will enable the leader to put off trimming for a longer
period of time, giving a better chance for a monitor to join the cluster.
See, after going through the probing phase, at which point a monitor may
only be, say, 5 versions off, the same monitor may end up getting into the
quorum only to find that in-between probing and finally triggering an
election some 6 versions might have come into existence. Before this patch,
by then the state had been trimmed and the monitor would have to bootstrap
to perform a full store sync. With this patch in place, the monitor would
be able to sync the remaining 11 versions.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
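The arithmetic in the example above can be sketched as follows. This is a
simplified model, not the monitor's trimming logic: it assumes the leader
simply retains the newest `retained` versions, and the version numbers and
retention sizes used in the usage note are hypothetical.

```cpp
#include <cassert>

// Oldest version the leader still holds if it retains the newest
// `retained` versions (simplifying assumption for this sketch).
unsigned first_retained(unsigned last_committed, unsigned retained) {
  return last_committed >= retained ? last_committed - retained + 1 : 1;
}

// True if a monitor whose newest version is `peer_last` can catch up from
// the retained versions, i.e. the next version it needs is still held.
bool can_catch_up(unsigned first_committed, unsigned peer_last) {
  return peer_last + 1 >= first_committed;
}
```

With a leader at version 106 and a monitor at 95 (5 versions off at probe
time plus 6 created since, so 11 behind), retaining only 10 versions gives
first_retained(106, 10) == 97 and the monitor cannot catch up; retaining,
say, 30 gives 77 and it can.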