Gary Lowell [Wed, 24 Jul 2013 05:43:59 +0000 (22:43 -0700)]
configure.ac: Remove -rc suffix from the configure version number.
Remove the rc suffix since RPM complains about it. For rc release
builds the "rc" in the git describe string is sufficient for
everything but RPM. For rc release builds (i.e. not gitbuilder)
add a flag to the spec file.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Sage Weil [Wed, 24 Jul 2013 04:27:50 +0000 (21:27 -0700)]
global/signal_handler: use poll(2) instead of select(2)
Starting with commit 61a298c39c1a6684682e2b749e45a66d073182c8 we delay the
signal handler setup until after lots of other initialization has happened,
which can result in us having very large (>1024) open fds, which will
break the FD_SET macros for select(2). Use poll(2) instead.
Fixes: #5722 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
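To illustrate the select(2) limitation described above: FD_SET indexes a fixed-size bitmap, so descriptors at or above FD_SETSIZE (commonly 1024) cannot be used, while poll(2) takes the descriptor as a plain int. The following is a minimal sketch, not the signal handler code itself; the helper name wait_readable is hypothetical.

```cpp
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: wait until `fd` is readable or `timeout_ms` elapses.
// poll(2) carries the descriptor in a pollfd struct, so its numeric value
// is not limited by FD_SETSIZE the way FD_SET/select(2) is.
bool wait_readable(int fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int r = poll(&pfd, 1, timeout_ms);
  // r == 0 means timeout; r < 0 means error (including EINTR).
  return r > 0 && (pfd.revents & POLLIN);
}
```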
Sage Weil [Mon, 22 Jul 2013 22:21:11 +0000 (15:21 -0700)]
client: signal mds sessions with Contexts instead of Conds
If we try to open an mds session and the MDS responds with close (aka,
"no"), we call _closed_mds_session() which signals the Cond*'s but then
deallocates the list. wait_on_list() then does a use-after-free trying
to remove itself.
Instead, use Context*'s, so that the waiter does not reference the list.
Fixes: #5689 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
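A minimal sketch of the idea, assuming a simplified one-shot callback type: the waiters are detached from the session's list before anything is freed, and each callback carries its own state, so no waiter touches the list afterwards. The Context, Session, and finish_waiters definitions here are illustrative stand-ins, not the client code.

```cpp
#include <functional>
#include <list>
#include <utility>

// Illustrative stand-in for a one-shot completion callback.
// Instances must be heap-allocated: complete() deletes the object.
struct Context {
  std::function<void(int)> fn;
  explicit Context(std::function<void(int)> f) : fn(std::move(f)) {}
  void complete(int r) {
    fn(r);
    delete this;
  }
};

// Illustrative stand-in for a session that owns a list of waiters.
struct Session {
  std::list<Context*> waiting_for_open;
};

// Detach the waiters, free the session, then fire the callbacks.
// No callback refers back to the (now freed) list.
void finish_waiters(Session* s, int r) {
  std::list<Context*> waiters;
  waiters.swap(s->waiting_for_open);
  delete s;
  for (Context* c : waiters)
    c->complete(r);
}
```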
Dan Mick [Tue, 23 Jul 2013 04:54:16 +0000 (21:54 -0700)]
mon: "mds stat" must open/close section around dump_info
dump_info() got a new field outside the mdsmap section; it's ok for
the overall "report", but not for "mds stat". Add an enclosing section
in "mds stat". Fix test to expect new level.
Fixes: #5718 Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Danny Al-Gaaf [Tue, 23 Jul 2013 19:56:09 +0000 (21:56 +0200)]
ceph.spec.in: obsolete ceph-libs only on the affected distro
The ceph-libs package existed only on Red Hat based distros;
there was never such a package on SUSE, for example. Therefore, make
sure the 'Obsoletes' is only set on the affected distros.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Sage Weil [Tue, 23 Jul 2013 20:32:12 +0000 (13:32 -0700)]
mon/OSDMonitor: fix base case for 7fb3804fb workaround
After cluster creation, we have no full map stored and first_committed ==
1. In that case, there is no need for a full map, since we can get there
from OSDMap() and the incrementals.
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao@inktank.com>
mon: OSDMonitor: work around a full version bug introduced in 7fb3804fb
In 7fb3804fb860dcd0340dd3f7c39eec4315f8e4b6 we moved the full version
stashing logic to the encode_trim_extra() function. However, we forgot
to update the osdmap's 'latest_full' key that should always point to
the latest osdmap full version. This eventually degenerated into a missing
full version after a trim. This patch works around this bug by looking
for the latest available full osdmap version in the store and updating
'latest_full' to its proper value.
Related-to: #5704
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
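The workaround above amounts to a backwards scan for the newest full map that is still present. A minimal sketch, assuming an in-memory map as a stand-in for the monitor store; FullMapStore and find_latest_full are hypothetical names, not the OSDMonitor interface.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the store: version -> encoded full osdmap.
typedef std::map<uint64_t, std::string> FullMapStore;

// Walk from last_committed down to first_committed and return the newest
// version that still has a full map stored, or 0 if none is found.
// Versions are assumed to start at 1.
uint64_t find_latest_full(const FullMapStore& store,
                          uint64_t first_committed,
                          uint64_t last_committed) {
  for (uint64_t v = last_committed; v >= first_committed && v > 0; --v) {
    if (store.count(v))
      return v;
  }
  return 0;
}
```

The returned version is what 'latest_full' would then be set to; a result of 0 corresponds to the case where no full map is available in the scanned range.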
mon: OSDMonitor: update the osdmap's latest_full with the new full version
We used to do this on encode_full(), but since [1] we no longer rely on
PaxosService to manage the full maps for us. And we forgot to write down
the latest_full version to the store, leaving it in a truly outdated state.
rgw: translate swift request to s3 when forwarding
When forwarding a swift request to a different region, we
need to use the effective uri, and not just send the one
we got since we use S3 authentication for the forwarded
requests. This is achieved through a new 'effective_uri'
param on the request info (which in swift points to the
plain bucket/object uri without the swift/v1 prefix).
Also, rename the old req_state::effective_uri to relative_uri
in order to prevent confusion.
Currently doing it only when copying between regions. This is
needed so that the operation doesn't time out (as it can take
a long time and the web server may just hang on us since we're
not sending any data).
This is configurable and can be disabled. Currently only implemented
for S3.
Sage Weil [Sun, 21 Jul 2013 15:48:18 +0000 (08:48 -0700)]
mon/Paxos: fix pn for uncommitted value during collect/last phase
During the collect/last exchange, peers share any uncommitted values
with the leader. They are supposed to also share the pn under which
that value was accepted, but were instead using the just-accepted pn
value. This effectively meant that we *always* took the uncommitted
value; if there were multiples, which one we accepted depended on what
order the LAST messages arrived, not which pn the values were generated
under.
The specific failure sequence I observed:
- collect
- learned uncommitted value for 262 from myself
- send collect with pn 901
- got last with pn 901 (incorrect) for 200 (old) from peer
- discard our own value, remember the other
- finish collect phase
- ignore old uncommitted value
Fix this by storing a pending_v and pending_pn value whenever we accept
a value. Use this to send an appropriate pn value in the LAST reply
so that the leader can make its decision about which uncommitted value
to accept based on accurate information. Also use it when we learn
the uncommitted value from ourselves.
We could probably be more clever about storing less information here,
for example by omitting pending_v and clearing pending_pn at the
appropriate point, but that would be more fragile. Similarly, we could
store a pn for *every* commit if we wanted to lay some groundwork for
having multiple uncommitted proposals in flight, but I don't want to
speculate about what is necessary or sufficient for a correct solution
there.
Fixes: #5698
Backport: cuttlefish, bobtail Signed-off-by: Sage Weil <sage@inktank.com>
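A simplified sketch of the leader-side selection this fix makes possible: once each peer reports the pn under which its value was actually accepted, the leader can keep the value with the highest accepted pn, independent of the order in which LAST messages arrive. The type and function names are hypothetical, the pn numbers in the usage note are made up, and the per-version checks of the real exchange are left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical record of the best uncommitted value seen so far.
struct Uncommitted {
  bool valid;
  uint64_t pn;        // pn under which the value was accepted
  std::string value;
  Uncommitted() : valid(false), pn(0) {}
};

// Consider one reported uncommitted value; keep it only if it was accepted
// under a higher pn than the one currently held.
void consider(Uncommitted& best, uint64_t accepted_pn,
              const std::string& value) {
  if (!best.valid || accepted_pn > best.pn) {
    best.valid = true;
    best.pn = accepted_pn;
    best.value = value;
  }
}
```

For example, feeding (800, "new") then (700, "old"), or the same pairs in the opposite order, leaves "new" selected in both cases.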
Sage Weil [Mon, 22 Jul 2013 21:13:23 +0000 (14:13 -0700)]
mon/Paxos: only share uncommitted value if it is next
We may have an uncommitted value from our perspective (it is our lc + 1)
when the collector has a much larger lc (because we have been out for
the last few rounds). Only share an uncommitted value if it is in fact
the next value.
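One way to read the check above, as a sketch: a peer's pending value is worth sharing only if it would be the next version once the collector has caught up on commits. The function name and the use of max() over both last-committed values are assumptions made for illustration, not the Paxos code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical check: pending_v is the version of our uncommitted value,
// my_lc and collector_lc are the last committed versions on each side.
bool should_share_uncommitted(uint64_t pending_v, uint64_t my_lc,
                              uint64_t collector_lc) {
  // Assumption: after the exchange the collector is at least as far along
  // as the more advanced of the two sides.
  uint64_t lc_after_catch_up = std::max(my_lc, collector_lc);
  return pending_v == lc_after_catch_up + 1;
}
```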
Samuel Just [Fri, 19 Jul 2013 22:56:52 +0000 (15:56 -0700)]
OSD::RemoveWQ: do not apply_transaction while blocking _try_resurrect_pg
Some callbacks take the osd lock, so we need to avoid blocking an
osd lock holding thread while waiting on a filestore callback.
Instead, just queue the transaction, and allow _try_resurrect_pg
to cancel us while we are waiting for the transaction to go through
(CLEARING_WAITING).
Fixes: #5672 Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Danny Al-Gaaf [Sat, 20 Jul 2013 17:36:32 +0000 (19:36 +0200)]
test_cls_version.cc: don't free object twice, free the right one
Object 'librados::ObjectWriteOperation *op' is freed twice in the TEST
test_version_inc_read. Free instead 'librados::ObjectReadOperation *rop'
Related cppcheck warning:
[src/test/cls_version/test_cls_version.cc:79]: (error) Memory
pointed to by 'op' is freed twice.
This should also fix:
CID 1049247 (#1 of 1): Use after free (USE_AFTER_FREE)
deref_arg: Calling "librados::ObjectWriteOperation::~ObjectWriteOperation()"
dereferences freed pointer "op". (The dereference happens because this is
a virtual function call.)
CID 1049218 (#4 of 4): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "rop" going out of scope leaks the storage it
points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
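The commit itself simply deletes the right pointer. As an alternative illustration of how this class of bug (one object freed twice, another leaked) can be avoided, here is a sketch using std::unique_ptr with a hypothetical counting type in place of the librados operation objects.

```cpp
#include <memory>

// Hypothetical stand-in for an operation object; tracks live instances.
struct CountedOp {
  int* live;
  explicit CountedOp(int* l) : live(l) { ++*live; }
  ~CountedOp() { --*live; }
};

// Each object has exactly one owner, so each is destroyed exactly once
// when the owners go out of scope. Returns the number still alive.
int run_with_owned_ops() {
  int live = 0;
  {
    std::unique_ptr<CountedOp> op(new CountedOp(&live));
    std::unique_ptr<CountedOp> rop(new CountedOp(&live));
    // ... use op and rop here ...
  }
  return live;
}
```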
Danny Al-Gaaf [Sat, 20 Jul 2013 17:00:50 +0000 (19:00 +0200)]
rgw: change RGWOp::name() to return string instead of char*
Return 'const string' instead of 'const char *' from RGWOp::name() to
avoid the usage of std::string::c_str() to return 'const char *' in
some cases in rgw_rest_replica_log.h.
Returning the result of c_str() from a function is dangerous since the
result may become invalid after the related string object is
destroyed or goes out of scope (which is the case with return). So you
may end up with garbage in this case.
Related warning from cppcheck:
[src/rgw/rgw_rest_replica_log.h:39]: (error) Dangerous usage of
c_str(). The value returned by c_str() is invalid after this call.
[src/rgw/rgw_rest_replica_log.h:59]: (error) Dangerous usage of
c_str(). The value returned by c_str() is invalid after this call.
[src/rgw/rgw_rest_replica_log.h:79]: (error) Dangerous usage of
c_str(). The value returned by c_str() is invalid after this call
This should also fix:
CID 1049250 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
escape: The internal representation of "s" escapes, but is destroyed
when it exits scope.
CID 1049251 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
escape: The internal representation of "s" escapes, but is destroyed
when it exits scope.
CID 1049252 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
escape: The internal representation of "s" escapes, but is destroyed
when it exits scope.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
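A small sketch of the pattern being fixed, using a hypothetical ExampleOp type rather than the actual RGWOp classes: the dangerous variant is shown only in a comment, and the safe variant returns the string by value.

```cpp
#include <string>

// Hypothetical type for illustration only.
struct ExampleOp {
  std::string prefix;

  // Dangerous variant (do not use): the temporary string is destroyed
  // when the function returns, leaving the caller with a dangling pointer.
  //   const char* name() const { std::string s = prefix + "_op"; return s.c_str(); }

  // Safe variant: the caller receives its own copy of the string.
  const std::string name() const { return prefix + "_op"; }
};
```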
Sage Weil [Fri, 19 Jul 2013 23:36:01 +0000 (16:36 -0700)]
mon: OSDMonitor: only thrash and propose if we are the leader
'thrash_map' is only set if we are the leader, so we would thrash and
propose the pending value only if we are the leader. However, we should keep
the 'is_leader()' check not only for clarity's sake (an unfamiliar reader
may cry OMGBUG, prompting a patch much like this), but also because
we may lose a subsequent election and become a peon instead, while still
holding a 'thrash_map' value > 0 -- and we really don't want to propose
while being a peon.
Sage Weil [Fri, 19 Jul 2013 23:35:02 +0000 (16:35 -0700)]
mon/OSDMonitor: do not wait for readable in send_latest()
send_latest() checks for readable and, if untrue, will wait before sending
out the latest OSDMap. This is completely unnecessary; I think it is a
hold-over from when we had independent paxos states. An audit of all
callers confirms that everyone would be happy with whatever is committed,
even if we are in the process of committing an even newer version.
Effectively, everyone waits *above* this layer in the usual PaxosService
traps for whether we are readable or not. This means that waiting_for_map
and send_to_waiting() go away entirely, which is nice.
This addresses, among other things: send_to_waiting() is called from
update_from_paxos(), which can be called when we are not readable due to
the paxos commit/finish timing changes in f1ce8d7c955a24 and c711203c0d4b. If no subsequent update happens, those waiters never get
their maps.
Instead, we send them immediately--we know they are committed and old
history is as good as future history.
Fixes: #5643 Signed-off-by: Sage Weil <sage@inktank.com>
On peons, on_active() is only called when we *first* become active after an
election. Only on the leader is it called after each commit/update. This
makes this change cause other problems (broken subscriptions on peons, in
particular). We possibly should fix that, but there is also a simpler fix
for the original problem we were trying to solve.
The logic was a bit broken. Basically, we want to make sure
that region names are the same. However, if region name is not
set then we need to check whether it's the master region. This
can happen in upgrade cases where originally we didn't have
a region name set.
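The region check described above can be sketched as a small predicate. The function and parameter names are hypothetical and only mirror the rule as stated: compare names when one is set, otherwise fall back to whether the local region is the master.

```cpp
#include <string>

// Hypothetical predicate: does a bucket belong to the local region?
bool bucket_in_local_region(const std::string& bucket_region,
                            const std::string& local_region,
                            bool local_is_master) {
  // Upgrade case: no region name was recorded for the bucket, so it is
  // treated as belonging to the master region.
  if (bucket_region.empty())
    return local_is_master;
  return bucket_region == local_region;
}
```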
Multiple fixes:
- sync master, secondary entry point ver on creation
- use correct entry point version when removing entry point
- check correct version on bucket removal
was never initialized correctly anyway. It was only supposed to
be used for buckets, but it was never initialized in that case.
Using s->bucket_info.objv_tracker instead.
rgw: forward delete bucket request to master after removal
We can only forward the bucket removal to the master if it was
successfully removed locally.
The master region has no knowledge about whether the
bucket can be removed or not, e.g., there are still objects in the
bucket. If we send it to the master first, then it'll happily remove it
even though it might fail in the end.
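The ordering described above, as a minimal sketch: remove locally first and forward to the master only on success. The function name and the callback parameters are hypothetical placeholders for the local removal and the forwarding step.

```cpp
#include <functional>

// Hypothetical ordering sketch. Both callbacks return 0 on success and a
// negative error code on failure.
int delete_bucket(const std::function<int()>& remove_local,
                  const std::function<int()>& forward_to_master) {
  int r = remove_local();
  if (r < 0)
    return r;  // e.g. bucket not empty: the master is never contacted
  return forward_to_master();
}
```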
We had a problem with bucket recreation, where we identified
that the bucket already existed, but missed the fact that it's
the same bucket, so removal of the bucket index was wrong.