Jan Harkes [Thu, 21 Feb 2013 20:17:38 +0000 (15:17 -0500)]
Fix failing > 4MB range requests through radosgw S3 API.
When a range request is made for more than rgw_get_obj_max_req_size
bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and
all remaining chunks behave as if there is an error state and only
return a minimal header.
Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but
leave the 'ret' member variable untouched.
Sage Weil [Thu, 21 Feb 2013 18:30:08 +0000 (10:30 -0800)]
osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.
Fixes: #4217
Backport: bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 21 Jan 2013 05:53:37 +0000 (21:53 -0800)]
mds: parse ceph.*.layout vxattr key/value content
Use qi to parse a strictly formatted set of key/value pairs. Be picky
about whitespace. Any subset of recognized keys is allowed. Parse the
same set of keys as the ceph.*.layout.* vxattrs.
Sage Weil [Wed, 20 Feb 2013 20:47:38 +0000 (12:47 -0800)]
mon: allow syslog level and facility for cluster log to be controlled
Allow user to control the minimum level to go to syslog for the client-
and server-side submission paths for the cluster log, along with the syslog
'facility'. See syslog(3) man page.
Also move the level checks into a LogEntry method.
Closes: #3704 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Luis <joao.luis@inktank.com>
Sage Weil [Mon, 18 Feb 2013 07:23:27 +0000 (23:23 -0800)]
osd: introduce HASHPSPOOL pool flag, feature to avoid overlapping pg placements
The existing code will overlay the placement of PGs from pools because
it simply adds the ps to the pool as the CRUSH input. That means that
the layout/placement for pg 0.10 == 1.9 == 2.8 == 3.7 == 4.6 == ...,
which is not optimal.
Instead, use hash(ps, poolid). The avoids the initial problem of
the sequence being adjacent to other pools. It also avoids the (small)
possibility that hash(poolid) will drop us somewhere in the output
number space where our sequence of outputs overlaps with some other
pool; instead, out output sequence will be a fully random (for a well-
behaved hash).
Use the multi-input hash functions used by CRUSH for this.
Default to the legacy behavior for now. We won't enable this until
deployed systems and kernel code catch up.
Fixes: #4128 Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 18 Feb 2013 06:35:50 +0000 (22:35 -0800)]
osd: requeue pg waiters at the front of the finished queue
We could have a sequence like:
- op1
- notify
- op2
in the finished queue. Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (and the end), op2 is handled, and
finally op1 is handled. That breaks ordering; see #2947.
Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 18 Feb 2013 04:49:52 +0000 (20:49 -0800)]
osd: pull requeued requests off one at a time
Pull items off the finished queue on at a time. In certain cases, an
event may result in new items betting added to the finished queue that
will be put at the *front* instead of the back. See latest incarnation
of #2947.
Note that this is a significant changed in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated. Beware!
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 19 Feb 2013 17:12:52 +0000 (09:12 -0800)]
osd: fix printf warning on pg_log_entry_t::get_key_name
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]
Sage Weil [Tue, 19 Feb 2013 04:36:56 +0000 (20:36 -0800)]
rbd: udevadm settle before unmap
udev runs blkid on device close, and other such nonsense that can
make unmap fail with EBUSY. Settle before we unmap to avoid this if
possible. See #4183.
Closes: #4186 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
James Page [Mon, 18 Feb 2013 16:24:54 +0000 (16:24 +0000)]
Strip any trailing whitespace from rbd showmapped
More recent versions of ceph append a bit of whitespace to the line
after the name of the /dev/rbdX device; this causes the monitor check
to fail as it can't find the device name due to the whitespace.
This fix excludes any characters after the /dev/rbdN match.
Loic Dachary [Sun, 17 Feb 2013 19:38:52 +0000 (20:38 +0100)]
unit tests for src/common/buffer.{cc,h}
Implement unit tests covering most lines of code ( > 92% ) and all
methods as show by the output of make check-coverage :
http://dachary.org/wp-uploads/2013/03/ceph-lcov/ .
The following static constructors are implemented by opaque classes
defined in buffer.cc ( buffer::raw_char, buffer::raw_posix_aligned
etc. ). Testing the implementation of these classes is done by
variations of the calls to the static constructors.
The raw_mmap_pages class cannot be tested because it is commented out in
raw_posix_aligned. The raw_hack_aligned class is only tested under Cygwin.
The raw_posix_aligned class is not tested under Cygwin.
The unittest_bufferlist.sh script calls unittest_bufferlist with the
CEPH_BUFFER_TRACK=true environment variable to enable the code
tracking the memory usage. It cannot be done within the bufferlist.cc
file itself because it relies on the initialization of a global
variable ( buffer_track_alloc ).
When raw_posix_aligned is called on DARWIN, the data is not aligned
on CEPH_PAGE_SIZE because it calls valloc(size) which is the equivalent of
memalign(sysconf(_SC_PAGESIZE),size) and not memalign(CEPH_PAGE_SIZE,size).
For this reason the alignment test is de-activated on DARWIN.
The tests are grouped in
TEST(BufferPtr, ... ) for buffer::ptr
TEST(BufferListIterator, ...) for buffer::list::iterator
TEST(BufferList, ...) for buffer::list
TEST(BufferHash, ...) for buffer::hash
and each method ( and all variations of the prototype ) are
included into a single TEST() function.
Although most aspects of the methods are tested, including exceptions
and border cases, inconsistencies are not highlighted . For
instance
buffer::list::iterator i;
i.advance(1);
would dereference a buffer::raw NULL pointer although
buffer::ptr p;
p.wasted()
asserts instead of dereferencing the buffer::raw NULL pointer. It
would be better to always assert in case a NULL pointer is about to be
used. But this is a minor inconsistency that is probably not worth a
test.
The following buffer::list methods
ssize_t read_fd(int fd, size_t len);
int write_fd(int fd) const;
are not fully tested because the border cases cannot be reliably
reproduced. Going thru a pointer indirection when calling the ::writev
or safe_read functions would allow the test to create mockups to synthetize
the conditions for border cases.
it returned zero because. cmp only compared up to the length of the
smallest buffer and returned if they are identical. The function is
modified to compare the length of the buffers instead of returning.
a >= ab failed, throwing an instance of 'ceph::buffer::end_of_buffer'
because it tried to access a[1]. All comparison operators should be
tested using a lexicographic sort like strcmp or memcmp (-1, 0, 1).
In the meantime, the missing test is added:
if (l.length() == p && r.length() > p) return false;
A set of unit tests demonstrating the problem and covering all comparison
operators are added to show that the proposed fix works as expected.
Danny Al-Gaaf [Tue, 12 Feb 2013 17:51:51 +0000 (18:51 +0100)]
cls/lock/cls_lock.cc: use !lockers.empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/cls/lock/cls_lock.cc:209]: (performance) Possible inefficient
checking for 'lockers' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 17:48:50 +0000 (18:48 +0100)]
src/client/SyntheticClient.cc: use !subdirs.empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/client/SyntheticClient.cc:2706]: (performance) Possible
inefficient checking for 'subdirs' emptiness
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 17:44:04 +0000 (18:44 +0100)]
client/Client.cc: use empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/client/Client.cc:3649]: (performance) Possible inefficient
checking for 'mds_sessions' emptiness.
[src/client/Client.cc:7489]: (performance) Possible inefficient
checking for 'osds' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:41:04 +0000 (17:41 +0100)]
mds/MDSMap.h: use up.empty() instead of up.size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/mds/MDSMap.h:448]: (performance) Possible inefficient
checking for 'up' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:25:00 +0000 (17:25 +0100)]
mds/CDentry.h: use projected.empty() instead of projected.size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/mds/CDentry.h:234]: (performance) Possible inefficient
checking for 'projected' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 12 Feb 2013 16:20:19 +0000 (17:20 +0100)]
ceph_authtool.cc: use empty() instead of size()
Use empty() since it should be prefered as it has, following the
standard, a constant time complexity regardless of the containter
type. The same is not guaranteed for size().
warning from cppchecker was:
[src/ceph_authtool.cc:124]: (performance) Possible inefficient
checking for 'caps' emptiness.
[src/ceph_authtool.cc:237]: (performance) Possible inefficient
checking for 'caps' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>