Sage Weil [Wed, 22 Aug 2012 04:12:33 +0000 (21:12 -0700)]
objecter: use ordered map<> for tracking tids to preserve order on resend
We are using a hash_map<> to map tids to Op*'s. In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order. These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.
Later, if we reset an OSD connection, we resend everything for that
session in session->ops order, which is incorrect.
Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map(). This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests. A simpler but slower fix would be to just use map<> instead.
This is one of many bugs contributing to #2947.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
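The reordering done in kick_requests() can be illustrated with a small C++ sketch. It is not the Objecter's actual code: it assumes a simplified Op that carries only its tid, uses std::unordered_map as a stand-in for hash_map<>, and the names OpTable and resend_order are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the Objecter's Op; only the tid matters here.
struct Op {
  uint64_t tid;
};

// Ops tracked in a hash table: fast lookup, but unspecified iteration order.
using OpTable = std::unordered_map<uint64_t, Op*>;

// Before resending, copy the ops into an ordered std::map keyed by tid so
// they go out in ascending tid order, whatever order the hash table yields.
std::vector<uint64_t> resend_order(const OpTable& ops) {
  std::map<uint64_t, Op*> ordered(ops.begin(), ops.end());
  std::vector<uint64_t> sent;
  for (const auto& entry : ordered) {
    Op* op = entry.second;
    sent.push_back(op->tid);  // a real implementation would resend *op here
  }
  return sent;
}
```

The temporary ordered map costs O(n log n) only on the (rare) resend path, which is why the hash table can stay in place for the common lookup path.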
Sage Weil [Tue, 21 Aug 2012 00:04:58 +0000 (17:04 -0700)]
mon: fix monitor cluster contraction race
If we contract to 1 monitor, we win_standalone_election() without bumping
the election epoch. Racing paxos updates can then reach us, are not
ignored as stale, and trigger an assert:
mon/Paxos.cc: In function 'void Paxos::handle_accept(MMonPaxos*)' thread 7f85eae05700 time 2012-08-20 16:01:00.843937
mon/Paxos.cc: 468: FAILED assert(state == STATE_UPDATING)
Fixes: #3003 Reported-by: John Wilkins <john.wilkins@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
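Why the missing epoch bump matters can be shown with a heavily simplified C++ sketch. It assumes, for illustration only, that a monitor accepts a paxos message only when the message carries the current election epoch; MiniMon and its methods are hypothetical and are not Ceph's Monitor or Paxos classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified monitor state.
struct MiniMon {
  uint64_t election_epoch = 10;

  // Stand-in for winning an election with no peers. Bumping the epoch is
  // what makes in-flight messages from the previous quorum look stale.
  void win_standalone_election(bool bump_epoch) {
    if (bump_epoch) ++election_epoch;
  }

  // Messages are tagged with the epoch they were sent under; anything from
  // a different epoch is dropped instead of being processed.
  bool accepts_paxos_msg(uint64_t msg_epoch) const {
    return msg_epoch == election_epoch;
  }
};
```

Without the bump, a message sent under epoch 10 by the old quorum still matches and gets processed; with it, the same message is ignored.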
Tommi Virtanen [Tue, 21 Aug 2012 00:06:09 +0000 (17:06 -0700)]
mkcephfs, init-ceph: Warn if hostname "localhost" is seen in ceph.conf.
Given a ceph.conf that looks like
[osd.42]
host = localhost
mkcephfs used to exit with an obscure error message:
cat: /tmp/mkcephfs.MCBIHvn4Ru/key.*: No such file or directory
"localhost" was never intended to be a valid hostname to use there.
Warn if we see it, and skip the entry. You should use the proper short
hostname of the box.
As init-ceph and mkcephfs share this library, this change affects the
sysvinit scripts too. The behavior *shouldn't* change there (localhost
entries were ignored earlier, too), but you may see this extra
warning, which is good.
Closes: #3001 Signed-off-by: Tommi Virtanen <tv@inktank.com>
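The real change lives in the shell library shared by mkcephfs and init-ceph; the C++ sketch below only mirrors its described logic (warn about and skip any entry whose host is "localhost"). The HostEntry representation, the filter_hosts name, and the warning wording are hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed form of ceph.conf entries: (section, host) pairs.
using HostEntry = std::pair<std::string, std::string>;

// Warn about and skip any section whose host is "localhost"; keep the rest.
std::vector<HostEntry> filter_hosts(const std::vector<HostEntry>& entries) {
  std::vector<HostEntry> kept;
  for (const auto& e : entries) {
    if (e.second == "localhost") {
      // Illustrative message text, not the scripts' actual output.
      std::cerr << "WARNING: [" << e.first << "] host = localhost is not "
                << "valid; use the short hostname of the box\n";
      continue;
    }
    kept.push_back(e);
  }
  return kept;
}
```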
Sage Weil [Mon, 20 Aug 2012 19:33:08 +0000 (12:33 -0700)]
osd: fix requeue order of dup ops
The waiting_for_ondisk (and ack) maps get dups of ops that are in progress.
If we have a peering change in which the role does not change, we will
requeue the in-progress ops but leave these in the waiting_for_ondisk
maps. The next time we examine that map, it no longer matches what we
expect, and we trigger an assert.
Fix this by requeuing these on any peering reset in on_change(). This
keeps the two queues in sync.
Fixes: #2956 Signed-off-by: Sage Weil <sage@inktank.com>
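One way to requeue the waiters on every peering reset is sketched below in C++. The structures are simplified and hypothetical (MiniPG, integer OpIds, waiters keyed by version), and putting the waiters back at the front of the queue in version order is an assumption about the intended ordering, not a description of the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <list>
#include <map>

using OpId = int;  // hypothetical stand-in for a queued client op

struct MiniPG {
  std::deque<OpId> op_queue;  // ops waiting to be (re)processed
  // Dups of in-progress ops, keyed by the version they are waiting on.
  std::map<uint64_t, std::list<OpId>> waiting_for_ondisk;

  // On any peering reset, push the waiters back onto the front of the op
  // queue (walking both containers in reverse so the result is in ascending
  // version order) and empty the map so the two cannot drift out of sync.
  void on_change() {
    for (auto it = waiting_for_ondisk.rbegin();
         it != waiting_for_ondisk.rend(); ++it) {
      for (auto op = it->second.rbegin(); op != it->second.rend(); ++op) {
        op_queue.push_front(*op);
      }
    }
    waiting_for_ondisk.clear();
  }
};
```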
Travis Rhoden [Mon, 20 Aug 2012 20:29:11 +0000 (13:29 -0700)]
init-ceph: use SSH in "service ceph status -a" to get version
When running "service ceph status -a", a version number was never
returned for remote hosts, only for the local host. This was because
the command to query the version number didn't use the do_cmd
function, which is responsible for running the command over SSH
when needed.
Modify the ceph init.d script to use do_cmd for querying the
Ceph version.
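The init script itself is shell; the C++ sketch below only mirrors the idea behind a do_cmd-style wrapper: run the command directly when the target is the local host, otherwise run it over SSH. The build_cmd name is hypothetical, and its naive quoting is for illustration only.

```cpp
#include <cassert>
#include <string>

// Return the command line to execute for `cmd` on `host`: unchanged when the
// host is local, wrapped in ssh otherwise (naive quoting, illustration only).
std::string build_cmd(const std::string& host, const std::string& local_host,
                      const std::string& cmd) {
  if (host == local_host) return cmd;
  return "ssh " + host + " \"" + cmd + "\"";
}
```

Routing the version query through the same wrapper as every other per-host command is what makes "status -a" report remote versions too.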
Travis Rhoden [Fri, 17 Aug 2012 20:45:09 +0000 (16:45 -0400)]
doc: mkcephfs man page, -c ceph.conf is not optional
The man page for mkcephfs and the output of mkcephfs --help
do not agree with each other. The man page says -c ceph.conf
is optional, while mkcephfs --help says it is required.
From testing, I believe it is required. Update the man page
to match.
Sage Weil [Fri, 17 Aug 2012 16:02:10 +0000 (09:02 -0700)]
mds: do not return null dentry lease on getattr
For example, /foo may exist but /foo/bar may not, and a client may try to mount /foo/bar. That
GETATTR request is on #1/foo/bar, but we cannot return a null dentry on bar
because the client is not prepared to handle it and will crash in
fill_trace().
Fixes: #2959 Reported-by: Yan Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 17 Aug 2012 19:10:05 +0000 (12:10 -0700)]
librbd: hide ENOENT on discard
AioZero, Truncate, and Remove are only used by discard and resize
operations where ENOENT can be safely ignored. If that changes in the
future, we'll need to move the enoent flag setting into discard explicitly.
Fixes: #2958 Signed-off-by: Sage Weil <sage@inktank.com>
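The ENOENT-hiding behavior can be sketched as a per-request flag checked on completion. ObjectRequest and hide_enoent are hypothetical names; per the commit, the flag would be set for the requests used by discard and resize (AioZero, Truncate, Remove).

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical object request with an "ignore missing object" flag.
struct ObjectRequest {
  bool hide_enoent;  // true for requests issued by discard/resize paths

  // A missing object means there was nothing to zero, truncate or remove,
  // so -ENOENT is reported as success; other errors pass through.
  int complete(int r) const {
    if (hide_enoent && r == -ENOENT) return 0;
    return r;
  }
};
```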
Sage Weil [Thu, 16 Aug 2012 23:55:32 +0000 (16:55 -0700)]
osd: avoid dereferencing pg info without lock
gen_prefix() is used for debug prefixes, but it traverses data structures
that are protected by the PG lock and can be modified by other threads
while we do not hold it. Only include them in the
prefix if the lock is held; otherwise print an abbreviated prefix that is
similarly greppable to the normal output.
Fixes: #2957 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
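A lock-aware prefix can be sketched as below. This is not Ceph's PG class: std::mutex cannot report its owner, so the sketch records the owning thread id itself, and the prefix format and the PrefixedPG name are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

class PrefixedPG {
 public:
  void lock() { m_.lock(); owner_.store(std::this_thread::get_id()); }
  void unlock() { owner_.store(std::thread::id()); m_.unlock(); }
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }

  // Only read lock-protected state while holding the lock; otherwise emit an
  // abbreviated prefix that starts with the same greppable "pg[<id>" text.
  std::string gen_prefix() const {
    if (is_locked_by_me()) return "pg[" + pgid_ + " " + state_ + "] ";
    return "pg[" + pgid_ + "] ";
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
  const std::string pgid_ = "1.2a";  // immutable, safe to read unlocked
  std::string state_ = "active";     // protected by m_
};
```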
Tommi Virtanen [Thu, 16 Aug 2012 23:35:30 +0000 (16:35 -0700)]
run-cli-tests: Check that virtualenv is found.
Commit 343cc792e847ca8901f6c08e41799a2fbbd2ca92 switched us from pip
-E to virtualenv, to keep up with the Python ecosystem, but left in
this old check for existence of "pip" as a command. We don't strictly
need that; what we need is a "virtualenv" command. pip will be
available inside the virtualenv, by the time we get around to running
it. Check for virtualenv instead.
Sage Weil [Thu, 16 Aug 2012 21:34:01 +0000 (14:34 -0700)]
SyntheticClient: fix warnings
client/SyntheticClient.cc: In member function 'int SyntheticClient::play_trace(Trace&, std::string&, bool)':
client/SyntheticClient.cc:1494:22: warning: ordered comparison of pointer with integer zero [-Wextra]
client/SyntheticClient.cc:1500:22: warning: ordered comparison of pointer with integer zero [-Wextra]
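The warning class above is illustrated with a hypothetical function; the actual expressions at SyntheticClient.cc:1494 and :1500 are not shown here. Ordering a pointer against 0 (e.g. `p > 0`) triggers GCC's "ordered comparison of pointer with integer zero"; the intent is usually a null check, which is better written as an equality test.

```cpp
#include <cassert>

// Hypothetical helper: true when a name string was supplied.
bool has_name(const char* p) {
  return p != nullptr;  // instead of the warned-about form: return p > 0;
}
```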
Sage Weil [Thu, 16 Aug 2012 00:48:06 +0000 (17:48 -0700)]
librbd: fix uninit var new_snap in ictx_refresh()
Valgrind picked this up:
==22755== Conditional jump or move depends on uninitialised value(s)
==22755== at 0x4EC2A11: librbd::ictx_refresh(librbd::ImageCtx*) (internal.cc:1384)
==22755== by 0x4EC10F7: librbd::ictx_check(librbd::ImageCtx*) (internal.cc:1212)
==22755== by 0x4EBD246: librbd::info(librbd::ImageCtx*, rbd_image_info_t&, unsigned long) (internal.cc:841)
==22755== by 0x4E9D71A: rbd_stat (librbd.cc:584)
==22755== by 0x4039A5: check_trunc_hack (fsx.c:477)
==22755== by 0x4060FA: main (fsx.c:1508)
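The bug class valgrind reports here is a flag assigned only on some paths and read afterwards. The sketch below is hypothetical and does not reproduce ictx_refresh()'s logic; it only shows the fix pattern of initializing the flag at its declaration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: report whether any snapshot id is newer than last_seen.
bool detect_new_snap(const std::vector<uint64_t>& snaps, uint64_t last_seen) {
  bool new_snap = false;  // without "= false", the return may read garbage
  for (uint64_t s : snaps) {
    if (s > last_seen) new_snap = true;
  }
  return new_snap;
}
```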
Sage Weil [Thu, 16 Aug 2012 21:27:35 +0000 (14:27 -0700)]
librbd: fix warning
librbd/internal.cc: In function 'int librbd::ictx_refresh(librbd::ImageCtx*)':
librbd/internal.cc:1334:59: warning: enumeral and non-enumeral type in conditional expression [enabled by default]
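This warning fires when one branch of `?:` is an enumerator and the other a plain integer. The enum and function below are hypothetical (the actual expression at internal.cc:1334 is not shown); converting the enumerator explicitly gives both branches the same type.

```cpp
#include <cassert>

enum SnapState { SNAP_NONE = 0, SNAP_PRESENT = 1 };  // hypothetical enum

// `present ? SNAP_PRESENT : -1` would mix enum and int in one conditional.
int snap_code(bool present) {
  return present ? static_cast<int>(SNAP_PRESENT) : -1;
}
```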
Cap the number of maps we delete on each pass through handle_osd_map. As
long as the target transaction size is larger than the number of maps we
get in each message, we'll be fine. Ensure we at least keep pace with
incoming maps in case those values' relative sizes have flipped.
Fixes: #2856 Signed-off-by: Sage Weil <sage@inktank.com>
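The cap described above can be sketched as a small function. The parameter names are hypothetical: `target_txn_size` stands for the per-pass budget and `maps_in_msg` for the number of maps the current message carried. Taking the max of the two is what keeps trimming at least as fast as maps arrive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of old maps to delete on this pass through handle_osd_map.
uint64_t maps_to_trim(uint64_t trimmable, uint64_t target_txn_size,
                      uint64_t maps_in_msg) {
  uint64_t cap = std::max(target_txn_size, maps_in_msg);
  return std::min(trimmable, cap);
}
```

For example, with 1000 trimmable maps, a budget of 30 and 50 maps in the message, the pass deletes 50, so the backlog does not grow.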