Dan Mick [Wed, 12 Jun 2013 02:46:53 +0000 (19:46 -0700)]
ceph: make life easier on developers by handling in-tree runs
If <path-to-ceph> contains pybind and .libs:
- prepend <path-to-ceph>/pybind to PYTHONPATH
- append <path-to-ceph>/.libs to LD_LIBRARY_PATH if not already there
and exec self so it takes effect
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
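The in-tree handling described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the helper names `in_tree_env` and `reexec_if_needed` are hypothetical, and the guard that avoids prepending pybind twice is an assumption. The re-exec is needed because the dynamic loader reads LD_LIBRARY_PATH when the process starts.

```python
import os
import sys


def in_tree_env(ceph_dir, environ):
    """Return (env, needs_exec) for a possible in-tree run.

    ceph_dir stands in for <path-to-ceph>; environ is a mapping such as
    os.environ. The input mapping is never modified.
    """
    env = dict(environ)
    pybind = os.path.join(ceph_dir, "pybind")
    libs = os.path.join(ceph_dir, ".libs")

    # Only a build tree has both directories; otherwise change nothing.
    if not (os.path.isdir(pybind) and os.path.isdir(libs)):
        return env, False

    # Prepend pybind to PYTHONPATH. Skipping this when it is already first
    # is an assumption, so that a re-exec does not keep growing the variable.
    py_path = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not py_path or py_path[0] != pybind:
        py_path.insert(0, pybind)
    env["PYTHONPATH"] = os.pathsep.join(py_path)

    # Append .libs to LD_LIBRARY_PATH only if it is not already there.
    ld_path = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    needs_exec = libs not in ld_path
    if needs_exec:
        ld_path.append(libs)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(ld_path)
    return env, needs_exec


def reexec_if_needed(ceph_dir):
    """Re-run the current script so the new library path takes effect."""
    env, needs_exec = in_tree_env(ceph_dir, os.environ)
    if needs_exec:
        os.execvpe(sys.executable, [sys.executable] + sys.argv, env)
```

Because the exec only happens when .libs was missing from LD_LIBRARY_PATH, the second run finds it present and continues normally instead of looping.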
Sage Weil [Tue, 11 Jun 2013 23:30:41 +0000 (16:30 -0700)]
mon: adjust trim defaults
User testing has shown that smaller values yield better results; see #4917.
Jim's testing has had good results with even more aggressive trimming, but I
would like to do more validation yet before changing defaults.
Sage Weil [Sun, 9 Jun 2013 00:38:07 +0000 (17:38 -0700)]
client: set issue_seq (not seq) in cap release
We have regularly been observing a stall where the MDS is blocked waiting
for a cap revocation (Ls, in our case) and never gets a reply. We finally
tracked down the sequence:
- mds issues cap seq 1 to client
- mds does revocation (seq 2)
- client replies
- much time goes by
- client trims inode from cache, sends release with seq == 2
- mds ignores release because its issue_seq is 1
- mds later tries to revoke other caps
- client discards message because it doesn't have the inode in cache
The problem is simply that we are using seq instead of issue_seq in the
cap release message. Note that the other release call site in
encode_inode_release() is correct. That one is much more commonly
triggered by short tests, as compared to this case where the inode needs to
get pushed out of the client cache.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
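The sequence above can be replayed with a toy model. This is a simplified sketch, not the MDS or client code: the class and method names are hypothetical, and the only behavior modeled is that an issue updates both counters, a revocation bumps seq alone, and the MDS ignores a release whose value does not match its issue_seq.

```python
class MdsCap:
    """Toy model of the MDS-side state for one client capability."""

    def __init__(self):
        self.seq = 0        # bumped on every message sent to the client
        self.issue_seq = 0  # seq of the most recent issue (not revocation)

    def issue(self):
        self.seq += 1
        self.issue_seq = self.seq
        return self.seq, self.issue_seq

    def revoke(self):
        self.seq += 1       # issue_seq is deliberately left alone
        return self.seq, self.issue_seq

    def accepts_release(self, released_seq):
        # The MDS ignores a release that does not match its issue_seq.
        return released_seq == self.issue_seq


class ClientCap:
    """Toy model of the client-side state for the same capability."""

    def __init__(self):
        self.seq = 0
        self.issue_seq = 0

    def handle(self, message):
        self.seq, self.issue_seq = message

    def release_buggy(self):
        return self.seq        # what the release carried before the fix

    def release_fixed(self):
        return self.issue_seq  # what the release carries after the fix


mds, client = MdsCap(), ClientCap()
client.handle(mds.issue())    # mds issues cap seq 1 to client
client.handle(mds.revoke())   # mds does revocation (seq 2)
buggy_ok = mds.accepts_release(client.release_buggy())
fixed_ok = mds.accepts_release(client.release_fixed())
print(buggy_ok, fixed_ok)
```

In this model the buggy release carries 2 and is ignored, while the fixed release carries 1 and is accepted.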
Loic Dachary [Fri, 7 Jun 2013 13:51:28 +0000 (15:51 +0200)]
unit tests for PGLog::merge_log
The tests cover 100% of the LOC of merge_log. They are broken down
into 7 cases to enumerate all the situations it must address. Each case
is isolated in an independent code block where the conditions are
reproduced. Where possible and sensible to read, a code block covers
as many lines as possible. For instance:
The log entry (1,3) deletes the object x9 but the olog entry (2,3)
modifies it and is authoritative : the log entry (1,3) is divergent.
is the only test case covering a dozen "if" statements and half a
dozen "while/for" loops. It covers all the lines but it would be
useful to create other scenarios in the future.
Each test is made of a comment describing the test case, the
definition of the data structures to create the desired conditions, a
sequence of EXPECT_* checking that they are met, a single call to
merge_log and another sequence of EXPECT_* (ordered to be easy to
compare with the first sequence) checking all the desired side
effects.
The TestPGLog.cc file was untabified to improve the display of ascii
art when it is output as part of a diff.
Sage Weil [Sun, 9 Jun 2013 04:38:18 +0000 (21:38 -0700)]
librados: fix pg command test
Stat a bunch of (non-existent) random objects in the pool to ensure the
pg exists on the OSD before we assert that we get a 0 from querying it.
Although it is somewhat tempting to make the pg commands block until the
pg exists, that defeats much of the value of the command as a diagnostic
tool as it could block indefinitely instead of informing the admin/dev
that "the pg isn't there yet".
Dan Mick [Fri, 7 Jun 2013 23:39:34 +0000 (16:39 -0700)]
ceph: old daemons output to outs and outbuf, combine
When talking to old daemons, if a command succeeds, there may be
output on outs, outbuf, or both; combine them if there's no error,
and clear outs so it's not treated as stderr fodder.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
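A minimal sketch of the combining rule, assuming both outputs are plain strings and that outbuf comes first in the combined result (the function name and the ordering are assumptions, not taken from the commit):

```python
def combine_old_output(ret, outbuf, outs):
    """Merge outs into outbuf for a successful command from an old daemon.

    ret is the command's return code, outbuf its data output and outs its
    status string. On error, outs is left alone so it can be reported.
    """
    if ret == 0:
        outbuf = outbuf + outs
        outs = ""  # cleared so it is not later printed as an error message
    return ret, outbuf, outs
```

For example, `combine_old_output(0, "data", "ok")` yields a single output string and an empty outs, while a non-zero ret passes all three values through unchanged.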
Dan Mick [Fri, 7 Jun 2013 23:24:28 +0000 (16:24 -0700)]
ceph: handle old OSDs as command destinations, fix status part of -w
For osd tell or pg <pgid> commands, the CLI sends the command directly
to the OSD; if the OSDs are still old, the command needs to be sent
in 'plain' (non-JSON) form. Also, the 'ceph status' from -w needs to
handle failure/fallback-to-old-command.
Refactor the guts of json_command() into send_command(), and call it
from json_command() and where needed for old-style commands.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Fri, 7 Jun 2013 04:53:00 +0000 (21:53 -0700)]
rgw: handle deep uri resources
In case of deep uri resources (ones created beyond a single level
of hierarchy, e.g. auth/v1.0) we want to create a new empty
handler for the path if no handler exists. E.g., for
auth/v1.0 we need to have a handler for 'auth', otherwise
the default S3 handler will be used, which we don't want.
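The idea can be illustrated with a small tree of handlers. This is a sketch under assumptions, not the rgw implementation: `RestMgr`, `register` and `lookup` are hypothetical names, and the root node plays the role of the default handler.

```python
class RestMgr:
    """Toy resource manager: a node in a tree keyed by path component."""

    def __init__(self, name=""):
        self.name = name
        self.children = {}

    def register(self, path, mgr):
        head, sep, rest = path.strip("/").partition("/")
        if not sep:
            self.children[head] = mgr
            return
        child = self.children.get(head)
        if child is None:
            # No handler for the intermediate level yet: create an empty one
            # so that lookups for it do not fall back to the default.
            child = RestMgr(head)
            self.children[head] = child
        child.register(rest, mgr)

    def lookup(self, path):
        head, sep, rest = path.strip("/").partition("/")
        child = self.children.get(head)
        if child is None:
            return self  # nothing more specific; this node handles it
        return child.lookup(rest) if sep else child


root = RestMgr("default-s3")
swift_auth = RestMgr("swift-auth")
root.register("auth/v1.0", swift_auth)
```

Registering 'auth/v1.0' here also creates an empty 'auth' node, so a lookup of 'auth' no longer resolves to the root (default) handler.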
Yehuda Sadeh [Fri, 7 Jun 2013 04:47:21 +0000 (21:47 -0700)]
rgw: fix get_resource_mgr() to correctly identify resource
Fixes: #5262
The original test was not comparing the correct string, and ended up
just checking whether a substring of the uri matched the resource.
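To illustrate the class of bug (this is not the code from get_resource_mgr(), and both function names are hypothetical), compare a naive prefix check with one that requires the match to end on a path boundary:

```python
def matches_naive(uri, resource):
    """Prefix-only check: also matches unrelated resources."""
    return uri.lstrip("/").startswith(resource)


def matches_resource(uri, resource):
    """Match only the whole resource or the resource followed by '/'."""
    path = uri.lstrip("/")
    return path == resource or path.startswith(resource + "/")
```

With the naive check, a uri such as '/authors' would be treated as the 'auth' resource; the boundary-aware check accepts '/auth' and '/auth/v1.0' but rejects '/authors'.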
Sage Weil [Thu, 6 Jun 2013 23:35:54 +0000 (16:35 -0700)]
osd: do not include logbl in scrub map
This is a potentially huge object/file, usually prefixed by a zeroed region
on disk, that is not used by scrub at all. It dates back to f51348dc8bdd5071b7baaf3f0e4d2e0496618f08 (2008) and the original version of
scrub.
This *might* fix #4179. It is not a leak per se, but I observed 1GB
scrub messages going over the wire. Maybe the allocations are causing
fragmentation, or the sub_op queues are growing.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Note that a cursory look at the recovery code makes me think this needs
a much more serious overhaul. In particular, I don't think we should
be triggering recovery when transitioning *from* a stable state, but
explicitly when we are flagged, or when gathering. We should probably
also hold a wrlock over the recovery period and remove the force_wrlock
kludge from the final size check. Opened ticket #5268.
Dan Mick [Fri, 7 Jun 2013 00:40:28 +0000 (17:40 -0700)]
ceph, librados, rados.py, librados tests: pass cmd as array
Using ceph to pass commands to the old monitor requires the
message to have words in a vector; this means that we need to pass
the command as an array to rados_mon_command. Really, all of the
rados_X_command functions should take an array for flexibility and
parallel structure, so change them all, and the Python bindings,
and the test programs that use them.
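For a ctypes-based Python binding, one way to turn a list of command words into a C array of strings is sketched below. The helper name `to_c_string_array` is hypothetical and this is not claimed to be the code in rados.py; it only shows the standard ctypes idiom for building a `char *` array.

```python
import ctypes


def to_c_string_array(words):
    """Build a ctypes array of C strings from a list of Python strings."""
    encoded = [w.encode("utf-8") for w in words]
    array_type = ctypes.c_char_p * len(encoded)
    return array_type(*encoded)


# The array and its length would then be passed to a C function that takes
# an array of strings and an element count.
cmd = to_c_string_array(["osd", "dump"])
cmd_len = len(cmd)
```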
Dan Mick [Thu, 6 Jun 2013 01:11:51 +0000 (18:11 -0700)]
librados, rados.py: rados_create2: add clustername and future flags
rados.py also gets "conf_defaults" dict for things you might want to
default in your app differently before ceph.conf gets to them; currently
used for ceph CLI to be able to set log_to_stderr/err_to_stderr true,
among others.
Sage Weil [Wed, 5 Jun 2013 14:55:46 +0000 (07:55 -0700)]
mon: upgrade auth database on leader
If we are the leader, and the auth database has not yet been upgraded,
do so. The upgrade consists of translating old-style (pre-v0.64) caps
to new-style caps (e.g., 'allow profile bootstrap-osd'). This happens
once and the conversion takes the form of a normal paxos transaction.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Sat, 1 Jun 2013 04:23:45 +0000 (21:23 -0700)]
mon: fix preforker exit behavior
In 3c5706163b72245768958155d767abf561e6d96d we made exit() not actually
exit so that the leak checking would behave for a non-forking case.
That is only needed for the normal exit case; every other case expects
exit() to actually terminate and not continue execution.
Instead, make a signal_exit() method that signals the parent (if any)
and then lets you return. exit() goes back to its usual behavior,
fixing the many other calls in main().
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>