Sage Weil [Thu, 13 Jun 2013 23:39:30 +0000 (16:39 -0700)]
librados: wait for osdmap for commands that need it
In commit 7e1cf87b5158c870e2a118ed6d316be8cb9818ce we stopped waiting for
the osdmap on start because the Objecter will normally wait, but for some
commands we assume the osdmap is recent(ish).
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Thu, 13 Jun 2013 18:27:49 +0000 (11:27 -0700)]
mon/MonmapMonitor: remove unused label
mon/MonmapMonitor.cc: In member function 'bool MonmapMonitor::preprocess_command(MMonCommand*)':
mon/MonmapMonitor.cc:273:2: warning: label 'out' defined but not used [-Wunused-label]
Dan Mick [Thu, 13 Jun 2013 01:08:17 +0000 (18:08 -0700)]
ceph, mon/OSDMonitor: fix up osd crush commands for <osd.N> or <N>
The new parsing code had been trying to allow flexibility for the
'old form' commands (where id could be different from N in osd.N),
but also accept 'new form' commands. The new rule is that where
there's an OSD specified in the osd crush command, it is of type
CephOsdName, which can be an id *or* 'osd.<id>', but not both.
Pass CephOsdName as int64_t 'id' for convenience in mon code
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 12 Jun 2013 23:36:21 +0000 (16:36 -0700)]
mon/MonClient: send commands to a specific monitor
This implementation is limited: we direct our command by reopening
a session with the specific monitor. If there is more than one of these
queued we will fail to reach either.
Sage Weil [Thu, 13 Jun 2013 01:13:12 +0000 (18:13 -0700)]
osdc/Objecter: fix handling for osd_command dne/down cases
Generalize the map check machinery that the pool dne check uses to also
get the latest map for OSD down/dne checks. This is better semantics, but
more important fixes the more immediate bug of returning the error code
to the caller from the osd_command -> _submit_command (that is ignored by
pretty much any caller) and then never triggering the callback.
Fixes: #5331 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Wed, 12 Jun 2013 18:23:23 +0000 (11:23 -0700)]
mon: fix read of format_version out of leveldb
The get_version(string, string) is the wrong method; it combines the two
args into a key that is nested inside prefix (so it's prefix/a/b), but we
want perfix/format_version. Add a method to grab an int for this
particular combo and use that.
This fixes an infinite loop when we actually trigger this code.
Dan Mick [Wed, 12 Jun 2013 02:46:53 +0000 (19:46 -0700)]
ceph: make life easier on developers by handling in-tree runs
If <path-to-ceph> contains pybind and .libs:
- prepend <path-to-ceph>/pybind to PYTHONPATH
- append <path-to-ceph>/.libs to LD_LIBRARY_PATH if not already there
and exec self so it takes effect
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 11 Jun 2013 23:30:41 +0000 (16:30 -0700)]
mon: adjust trim defaults
User testing has shown that smaller values yield better results; see #4917.
Jim's testing has had good results with even more aggressive trimming, but I
would like to do more validation yet before changing defaults.
Sage Weil [Sun, 9 Jun 2013 00:38:07 +0000 (17:38 -0700)]
client: set issue_seq (not seq) in cap release
We regularly have been observing a stall where the MDS is blocked waiting
for a cap revocation (Ls, in our case) and never gets a reply. We finally
tracked down the sequence:
- mds issues cap seq 1 to client
- mds does revocation (seq 2)
- client replies
- much time goes by
- client trims inode from cache, sends release with seq == 2
- mds ignores release because its issue_seq is 1
- mds later tries to revoke other caps
- client discards message because it doesn't have the inode in cache
The problem is simply that we are using seq instead of issue_seq in the
cap release message. Note that the other release call site in
encode_inode_release() is correct. That one is much more commonly
triggered by short tests, as compared to this case where the inode needs to
get pushed out of the client cache.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
Loic Dachary [Fri, 7 Jun 2013 13:51:28 +0000 (15:51 +0200)]
unit tests for PGLog::merge_log
The tests covers 100% of the LOC of merge_log. It is broken down
in 7 cases to enumerate all the situations it must address. Each case
is isolated in a independant code block where the conditions are
reproduced. Where possible and sensible to read, a code block covers
as much lines as possible. For instance:
The log entry (1,3) deletes the object x9 but the olog entry (2,3)
modifies it and is authoritative : the log entry (1,3) is divergent.
is the only test case covering a dozen "if" statements and half a
dozen "while/for" loops. It covers all the lines but it would be
useful to create others scenarii in the future.
Each test is made of a comment describing the test case, the
definition of the data structures to create the desired conditons, a
sequence of EXPECT_* checking that they are met, a single call to
merge_log and another sequence of EXPECT_* ( ordered to be easy to
compare with the first sequence ) checking all the desired side
effects.
The TestPGLog.cc file was untabified to improve the display of ascii
art when it is output as part of a diff.
Sage Weil [Sun, 9 Jun 2013 04:38:18 +0000 (21:38 -0700)]
librados: fix pg command test
Stat a bunch of (non-existent) random objects in the pool so ensure the
pg exists on the OSD before we assert that we get a 0 from querying it.
Although it is somewhat tempting to make the pg commands block until the
pg exists, that defeats much of the value of the command as a diagnostic
tool as it could block indefinitely instead of informing the admin/dev
that "the pg isn't there yet".
Dan Mick [Fri, 7 Jun 2013 23:39:34 +0000 (16:39 -0700)]
ceph: old daemons output to outs and outbuf, combine
When talking to old daemons, if a command succeeds, there may be
output on outs, outbuf, or both; combine them if there's no error,
and clear outs so it's not treated as stderr fodder.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Fri, 7 Jun 2013 23:24:28 +0000 (16:24 -0700)]
ceph: handle old OSDs as command destinations, fix status part of -w
For osd tell or pg <pgid> commands, the CLI sends the command directly
to the OSD; if the OSDs are still old, the command needs to be sent
in 'plain' (non-JSON) form. Also, the 'ceph status' from -w needs to
handle failure/fallback-to-old-command.
Refactor the guts of json_command() into send_command(), and call it
from json_command() and where needed for old-style commands.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Fri, 7 Jun 2013 04:53:00 +0000 (21:53 -0700)]
rgw: handle deep uri resources
In case of deep uri resources (ones created beyond a single level
of hierarchy, e.g. auth/v1.0) we want to create a new empty
handlers for the path if no handlers exists. E.g., for
auth/v1.0 we need to have a handler for 'auth', otherwise
the default S3 handler will be used, which we don't want.
Yehuda Sadeh [Fri, 7 Jun 2013 04:47:21 +0000 (21:47 -0700)]
rgw: fix get_resource_mgr() to correctly identify resource
Fixes: #5262
The original test was not comparing the correct string, ended up
with the effect of just checking the substring of the uri to match
the resource.
Sage Weil [Thu, 6 Jun 2013 23:35:54 +0000 (16:35 -0700)]
osd: do not include logbl in scrub map
This is a potentially use object/file, usually prefixed by a zeroed region
on disk, that is not used by scrub at all. It dates back to f51348dc8bdd5071b7baaf3f0e4d2e0496618f08 (2008) and the original version of
scrub.
This *might* fix #4179. It is not a leak per se, but I observed 1GB
scrub messages going over the write. Maybe the allocations are causing
fragmentation, or the sub_op queues are growing.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Note that a cursory look at the recovery code makes me think this needs
a much more serious overhaul. In particular, I don't think we should
be triggering recovery when transitioning *from* a stable state, but
explicitly when we are flagged, or when gathering. We should probably
also hold a wrlock over the recovery period and remove the force_wrlock
kludge from the final size check. Opened ticket #5268.
Dan Mick [Fri, 7 Jun 2013 00:40:28 +0000 (17:40 -0700)]
ceph, librados, rados.py, librados tests: pass cmd as array
Using ceph to pass commands to the old monitor requires the
message to have words in a vector; this means that we need to pass
the command as an array to rados_mon_command. Really, all of the
rados_X_command functions should take an array for flexibility and
parallel structure, so change them all, and the Python bindings,
and the test programs that use them.