Loic Dachary [Wed, 8 Jan 2014 19:13:37 +0000 (20:13 +0100)]
erasure-code: ensure that coding chunks are page aligned
When coding chunks are allocated for jerasure, their address must be
aligned to page boundaries. The requirement is actually to be aligned on
a long long boundary but bufferlist do not allow for fine tuning of the
alignment.
If padding is necessary because the total size of the data to be encoded
is not a multiple of the alignment requirements as returned by
get_alignment(), the buffer is not only padded but also rebuilt using
rebuild_page_aligned() to preserve the page alignment that is expected
of the input buffer.
The overhead of rebuilding the whole input buffer when padding is
necessary could be reduced by only reallocating one buffer for the last
data chunk, therefore reducing the amount of data being copied. However,
this optimization is not going to be used if the caller takes care of
the padding, which is likely to be the case most of the time.
Andreas Peters [Wed, 18 Dec 2013 13:47:58 +0000 (14:47 +0100)]
EC-JERASURE: rewrite region-xor function using vector operations to get ~ x1.5 speedups for erasure code and guarantee proper 64-bit/128-bit buffer alignment
Loic Dachary [Tue, 7 Jan 2014 15:49:44 +0000 (16:49 +0100)]
common: fix large output in unittest_daemon_config
All tests in daemon_config use the global g_ceph_context
object. Introducing an expansion loop in it will impact all tests and
generate a very large output.
Remove the SubstitutionLoop test case which is also covered in
test/common/test_config.cc.
Yehuda Sadeh [Mon, 6 Jan 2014 20:53:58 +0000 (12:53 -0800)]
radosgw-admin: fix object policy read op
Fixes: #7083
This was broken when we fixed #6940. We use the same function to both
read the bucket policy and the object policy. However, each needed to be
treated differently. Restore old behavior for objects.
Loic Dachary [Sun, 5 Jan 2014 14:49:57 +0000 (15:49 +0100)]
common: unit tests for config::expand_meta
Part of the config.cc tests are in test/confutils.cc but they do not
cover meta variable expansion. Create unittest_config for config.{h,cc}
specific tests.
The test_md_config_t is made a friend of md_config_t to allow testing
private and protected methods.
test/cli/ceph-conf/show-config-value.t is used to check that the human
readable message message shows as expected when there is an expansion
loop.
Loic Dachary [Sun, 5 Jan 2014 14:38:55 +0000 (15:38 +0100)]
common: recursive implementation of config::expand_meta
Using a recursive implementation of variable expansions make it easier
to protect against loops and provide human readable messages when they
happen.
It also enables one variable to be substituted multiple times in the
same configuration option instead of just once because it is confused
with a variable expansion loop.
Noah Watkins [Sat, 4 Jan 2014 19:32:51 +0000 (11:32 -0800)]
onexit: add an on exit callback utility
Adds a class that executes registered callbacks in its destructor. Since
static duration objects have their destructors called when returning
from main or calling exit then this can be used as a replacement for
on_exit.
test: disable cross process sem tests on non-Linux
How to make this portable:
- MAP_ANONYMOUS -> MAP_ANON (OSX)
- sem_init (anonymous semaphore) needs to be replaced by named
semaphores using sem_open/sem_close. Use a memory address of the sem_t
variable to hack anonymous semaphore behavior.
- sem_getvalue isn't supported on OSX. it is used here to do
sem_wait/sem_post to bring a semaphore back to a specific value. to get
around this we may need to restructure the test so that the semaphore
can be destroyed and re-initialized rather than inspected as its
currently being done.
On OSX (and currently any platform missing the MSG_MORE
macro) the MSG_MORE optimization is disabled. The MSG_NOSIGNAL flag is
available on OSX but is called SO_NOSIGPIPE and must be set via
setsockopt.
Loic Dachary [Fri, 3 Jan 2014 21:52:55 +0000 (22:52 +0100)]
ceph-disk: fix false positive for gitbuilder
The output of test/ceph-disk.sh is very verbose which is good for
debugging errors. However it sometime contains strings that match
/error:/i which is picked by gitbuilder as a sign that the test fail,
even when the exit code is zero.
Remove from the output the three strings triggering false positive in
gitbuilder.
Sage Weil [Fri, 3 Jan 2014 20:51:15 +0000 (12:51 -0800)]
osdc/ObjectCacher: back off less during flush
In cce990efc8f2a58c8d0fa11c234ddf2242b1b856 we added a limit to avoid
holding the lock for too long. However, if we back off, we currently
wait for a full second, which is probably a bit much--we really just want
to give other threads a chance.
Backport: emperor Signed-off-by: Sage Weil <sage@inktank.com>
Loic Dachary [Wed, 1 Jan 2014 21:11:30 +0000 (22:11 +0100)]
ceph-disk: create the data directory if it does not exist
Instead of failing if the OSD data directory does not exist, create
it. Only do so if the data directory is not enforced to be a device via
the use of the --data-dev flag. The directory is not recursively created.
Loic Dachary [Mon, 30 Dec 2013 22:57:39 +0000 (23:57 +0100)]
ceph-disk: implement --mark-init=none
It is meant to be used when preparing and activating a directory that is
not to be used with init. No file is created to identify the init
system, no symbolic link is made to the directory in /var/lib/ceph
and the init scripts are not called.
Loic Dachary [Wed, 1 Jan 2014 21:07:57 +0000 (22:07 +0100)]
ceph-disk: fsid is a known configuration option
Use get_conf_with_default instead of get_conf because fsid is a known
ceph configuration option. It allows overriding via CEPH_ARGS which is
convenient for testing. Only options that are not found in config_opts.h
are fetch via get_conf.
Loic Dachary [Mon, 30 Dec 2013 22:07:27 +0000 (23:07 +0100)]
ceph-disk: which() uses PATH first
Instead of relying on a hardcoded set of if paths. Although this has the
potential of changing the location of the binary being used by ceph-disk
on an existing installation, it is currently only used for sgdisk. It
could be disruptive for someone using a modified version of sgdisk but
the odds of this happening are very low.
Loic Dachary [Mon, 30 Dec 2013 21:48:46 +0000 (22:48 +0100)]
ceph-disk: add --prepend-to-path to control execution
/usr/bin is hardcoded in front of some ceph programs which makes it
impossible to control where they are located via the PATH.
The hardcoded path cannot be removed altogether because it will most
likely lead to unexpected and difficult to diagnose problems for
existing installations where the PATH finds the program elsewhere.
The --prepend-to-path flag is added and defaults to /usr/bin : it prepends
to the PATH environment variable. The hardcoded path is removed
and the PATH will be used: since /usr/bin is searched first, the
legacy behavior will not change.
Working around missing integer types is pretty easy. For example, the
__u32 family are Linux-specific types, and using these in Ceph
internally is fine because we can typedef them.
Loic Dachary [Sun, 29 Dec 2013 12:21:02 +0000 (13:21 +0100)]
mon: tests for ceph-mon --mkfs
* auth none must not require a keyring
* --key can be used as an alternative to --keyring
* --mkfs is idempotent
* the --mon-data directory is created if it does not exist
* auth ceph requires --keyring
Loic Dachary [Wed, 1 Jan 2014 20:23:18 +0000 (21:23 +0100)]
common: CEPH_ARGS should trim whitespaces
CEPH_ARGS when parsed by env_to_vec did not trim trailing and leading
whitespaces: they would unexpectedly be parsed as empty arguments and
lead to confusing errors such as :
The parsing code is replaced with str_vec(). It also resolves the
problem of the hard limit to 1000 characters imposed by the static
buffer that would silently truncate any CEPH_ARGS content.
Loic Dachary [Mon, 30 Dec 2013 11:26:20 +0000 (12:26 +0100)]
ceph-disk: prepare --data-dir must not override files
ceph-disk does nothing when given a device that is already prepared. If
given a directory that already contains a successfully prepared OSD, it
will however override it.
Instead of overriding the files in the osd data directory, return
immediately if the magic file exists. Make it so the magic file is
created last to accurately reflect the success of the OSD preparation.
Loic Dachary [Sun, 29 Dec 2013 11:40:05 +0000 (12:40 +0100)]
mon: do not use the keyring if auth = none
The Monitor::is_keyring_required() predicate is defined to be used in
the mkfs code path and not require the keyring to be set unless it is
actually going to be used.
If authentication is turned on later on existing monitors, the keyring
files will have to be set by an external tool in the mon data directory
and the mon restarted.
Yan, Zheng [Wed, 1 Jan 2014 00:58:42 +0000 (08:58 +0800)]
mds: always store backtrace xattr in the default pool
when creating file in non-default pool, we need to store a backtrace
in the default pool in addition to the specified pool. Otherwise the
'lookup-by-ino' function will consider backtrace for file is missing.
Loic Dachary [Sun, 29 Dec 2013 12:14:14 +0000 (13:14 +0100)]
mon: make ceph-mon --mkfs idempotent
A mon is considered to exist if the mon-data directory exists and is not
empty. If ceph-mon --mkfs is run twice, it will display succeed the
second time around and display an informative message.