Sage Weil [Thu, 28 Feb 2013 20:46:00 +0000 (12:46 -0800)]
msgr: drop messages on cons with CLOSED Pipes
Back in commit 6339c5d43974f4b495f15d199e01a141e74235f5, we tried to make
this deal with a race between a faulting pipe and new messages being
queued. The sequence is
- fault starts on pipe
- fault drops pipe_lock to unregister the pipe
- user (objecter) queues new message on the con
- submit_message reopens a Pipe (due to this bug)
- the message managed to make it out over the wire
- fault finishes faulting, calls ms_reset
- user (objecter) closes the con
- user (objecter) resends everything
It appears as though the previous patch *meant* to drop *m on the floor in
this case, which is what this patch does. And that fixes the crash I am
hitting; see #4271.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
Josh Durgin [Tue, 26 Feb 2013 21:20:08 +0000 (13:20 -0800)]
librbd: fix rollback size
The duplicate calls to get_image_size() and get_snap_size() replaced
by 5806226cf0743bb44eaf7bc815897c6846d43233 uncovered this. The first
call was using the currently set snap_id instead of the snapshot being
rolled back to.
Fixes: #4272 Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Josh Durgin [Mon, 25 Feb 2013 22:55:34 +0000 (14:55 -0800)]
systest: fix race with pool deletion
The second test have pool deletion and object listing wait on the same
semaphore to connect and start. This led to errors sometimes when the
pool was deleted before it could be opened by the listing process. Add
another semaphore so the pool deletion happens only after the listing
has begun.
Josh Durgin [Mon, 25 Feb 2013 19:33:48 +0000 (11:33 -0800)]
librbd: drop snap_lock before invalidating cache
Writeback will take the snap_lock, so read everything we need under it
before invalidating the cache. This avoids a recursive lock when writeback
uses snap_lock while snap_rollback() was holding it.
Remove a not-very-useful debugging message that depended on snap_lock being held.
Sage Weil [Sun, 24 Feb 2013 00:36:36 +0000 (16:36 -0800)]
mds: reencode MDSMap in MMDSMap if MDSENC feature is not present
In some cases the MMDSMap message from mon -> client passes from leader ->
peon -> client, and the leader doesn't encode with the correct feature
bits. As with MMOSDMap, we reencode the nested MDSMap based on the
features if relevant bits are not present.
We forgot to include this with the mds encoding changes.
Dan Mick [Fri, 22 Feb 2013 05:41:25 +0000 (21:41 -0800)]
configuration parsing: give better error for missing =
A ceph.conf line with "key" and no "= value" currently shows
"unexpected character while parsing putative key value,
at char N line M". There's no reason it can't be clearer.
Fixes: #4229 Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 21 Feb 2013 17:22:00 +0000 (09:22 -0800)]
mds: use inode_t::layout for dir layout policy
This cherry-pick is going in the reverse direction of normal. That's
because this direction makes for the minimal change -- this patchset
is required to fix the loss of directory layouts we were previously
seeing, but fixing it requires changing the encoding versions. So we
wrote it on top of Bobtail and let it update the struct_v's as they existed
then. Note that we here change a few encoding versions in ways which are
NOT COMPATIBLE with previous development code (but not any releases). In
particular, development code introduced and this removes the
file_layout_policy_t, and some of the CInode and EMetaBlob encoding
struct_v values were used in development code to mean one thing, but
mean something different due to the Bobtail patch.
Remove the default_file_layout struct, which was just a ceph_file_layout,
and store it in the inode_t. Rip out all the annoying code that put this
on the heap.
To aid in this usage, add a clear_layout() function to inode_t.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Signed-off-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 36ed407e0f939a9bca57c3ffc0ee5608d50ab7ed)
Conflicts:
src/mds/CInode.cc
src/mds/CInode.h
src/mds/MDCache.cc
src/mds/Server.cc
src/mds/events/EMetaBlob.h
Cherry-pick- Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 05:53:37 +0000 (21:53 -0800)]
mds: parse ceph.*.layout vxattr key/value content
Use qi to parse a strictly formatted set of key/value pairs. Be picky
about whitespace. Any subset of recognized keys is allowed. Parse the
same set of keys as the ceph.*.layout.* vxattrs.
Jan Harkes [Thu, 21 Feb 2013 20:17:38 +0000 (15:17 -0500)]
Fix failing > 4MB range requests through radosgw S3 API.
When a range request is made for more than rgw_get_obj_max_req_size
bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and
all remaining chunks behave as if there is an error state and only
return a minimal header.
Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but
leave the 'ret' member variable untouched.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit c83a01d4e8dcd26eec24c020c5b79fcfa4ae44a3)
Sage Weil [Thu, 21 Feb 2013 18:30:08 +0000 (10:30 -0800)]
osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.
Fixes: #4217
Backport: bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 18 Feb 2013 06:35:50 +0000 (22:35 -0800)]
osd: requeue pg waiters at the front of the finished queue
We could have a sequence like:
- op1
- notify
- op2
in the finished queue. Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (and the end), op2 is handled, and
finally op1 is handled. That breaks ordering; see #2947.
Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 18 Feb 2013 04:49:52 +0000 (20:49 -0800)]
osd: pull requeued requests off one at a time
Pull items off the finished queue on at a time. In certain cases, an
event may result in new items betting added to the finished queue that
will be put at the *front* instead of the back. See latest incarnation
of #2947.
Note that this is a significant changed in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated. Beware!
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 19 Feb 2013 17:12:52 +0000 (09:12 -0800)]
osd: fix printf warning on pg_log_entry_t::get_key_name
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]
Sage Weil [Tue, 19 Feb 2013 04:36:56 +0000 (20:36 -0800)]
rbd: udevadm settle before unmap
udev runs blkid on device close, and other such nonsense that can
make unmap fail with EBUSY. Settle before we unmap to avoid this if
possible. See #4183.
Closes: #4186 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
James Page [Mon, 18 Feb 2013 16:24:54 +0000 (16:24 +0000)]
Strip any trailing whitespace from rbd showmapped
More recent versions of ceph append a bit of whitespace to the line
after the name of the /dev/rbdX device; this causes the monitor check
to fail as it can't find the device name due to the whitespace.
This fix excludes any characters after the /dev/rbdN match.
Loic Dachary [Sun, 17 Feb 2013 19:38:52 +0000 (20:38 +0100)]
unit tests for src/common/buffer.{cc,h}
Implement unit tests covering most lines of code ( > 92% ) and all
methods as show by the output of make check-coverage :
http://dachary.org/wp-uploads/2013/03/ceph-lcov/ .
The following static constructors are implemented by opaque classes
defined in buffer.cc ( buffer::raw_char, buffer::raw_posix_aligned
etc. ). Testing the implementation of these classes is done by
variations of the calls to the static constructors.
The raw_mmap_pages class cannot be tested because it is commented out in
raw_posix_aligned. The raw_hack_aligned class is only tested under Cygwin.
The raw_posix_aligned class is not tested under Cygwin.
The unittest_bufferlist.sh script calls unittest_bufferlist with the
CEPH_BUFFER_TRACK=true environment variable to enable the code
tracking the memory usage. It cannot be done within the bufferlist.cc
file itself because it relies on the initialization of a global
variable ( buffer_track_alloc ).
When raw_posix_aligned is called on DARWIN, the data is not aligned
on CEPH_PAGE_SIZE because it calls valloc(size) which is the equivalent of
memalign(sysconf(_SC_PAGESIZE),size) and not memalign(CEPH_PAGE_SIZE,size).
For this reason the alignment test is de-activated on DARWIN.
The tests are grouped in
TEST(BufferPtr, ... ) for buffer::ptr
TEST(BufferListIterator, ...) for buffer::list::iterator
TEST(BufferList, ...) for buffer::list
TEST(BufferHash, ...) for buffer::hash
and each method ( and all variations of the prototype ) are
included into a single TEST() function.
Although most aspects of the methods are tested, including exceptions
and border cases, inconsistencies are not highlighted . For
instance
buffer::list::iterator i;
i.advance(1);
would dereference a buffer::raw NULL pointer although
buffer::ptr p;
p.wasted()
asserts instead of dereferencing the buffer::raw NULL pointer. It
would be better to always assert in case a NULL pointer is about to be
used. But this is a minor inconsistency that is probably not worth a
test.
The following buffer::list methods
ssize_t read_fd(int fd, size_t len);
int write_fd(int fd) const;
are not fully tested because the border cases cannot be reliably
reproduced. Going thru a pointer indirection when calling the ::writev
or safe_read functions would allow the test to create mockups to synthetize
the conditions for border cases.
it returned zero because. cmp only compared up to the length of the
smallest buffer and returned if they are identical. The function is
modified to compare the length of the buffers instead of returning.
a >= ab failed, throwing an instance of 'ceph::buffer::end_of_buffer'
because it tried to access a[1]. All comparison operators should be
tested using a lexicographic sort like strcmp or memcmp (-1, 0, 1).
In the meantime, the missing test is added:
if (l.length() == p && r.length() > p) return false;
A set of unit tests demonstrating the problem and covering all comparison
operators are added to show that the proposed fix works as expected.
Danny Al-Gaaf [Tue, 12 Feb 2013 17:51:51 +0000 (18:51 +0100)]
cls/lock/cls_lock.cc: use !lockers.empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/cls/lock/cls_lock.cc:209]: (performance) Possible inefficient
checking for 'lockers' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 17:48:50 +0000 (18:48 +0100)]
src/client/SyntheticClient.cc: use !subdirs.empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/client/SyntheticClient.cc:2706]: (performance) Possible
inefficient checking for 'subdirs' emptiness
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 17:44:04 +0000 (18:44 +0100)]
client/Client.cc: use empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/client/Client.cc:3649]: (performance) Possible inefficient
checking for 'mds_sessions' emptiness.
[src/client/Client.cc:7489]: (performance) Possible inefficient
checking for 'osds' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:41:04 +0000 (17:41 +0100)]
mds/MDSMap.h: use up.empty() instead of up.size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/mds/MDSMap.h:448]: (performance) Possible inefficient
checking for 'up' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:25:00 +0000 (17:25 +0100)]
mds/CDentry.h: use projected.empty() instead of projected.size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/mds/CDentry.h:234]: (performance) Possible inefficient
checking for 'projected' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:20:19 +0000 (17:20 +0100)]
ceph_authtool.cc: use empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/ceph_authtool.cc:124]: (performance) Possible inefficient
checking for 'caps' emptiness.
[src/ceph_authtool.cc:237]: (performance) Possible inefficient
checking for 'caps' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Sage Weil [Thu, 14 Feb 2013 23:39:43 +0000 (15:39 -0800)]
osd/OSDCap: tweak unquoted_word parsing in osd caps
Newer versions of spirit (1.49.0-3.1ubuntu1.1 in quantal, in particular)
dislike the construct with alnum and replace the - and _ with '\0' in the
resulting string.
Samuel Just [Thu, 14 Feb 2013 22:03:56 +0000 (14:03 -0800)]
OSD: always activate_map in advance_pgs, only send messages if up
We should always handle_activate_map() after handle_advance_map() in
order to kick the pg into a valid peering state for processing requests
prior to dropping the lock.
Additionally, we would prefer to avoid sending irrelevant messages
during boot, so only send if we are up according to the current service
osdmap.
Fixes: #4064
Backport: bobtail Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>