Sage Weil [Sat, 28 Dec 2013 20:23:22 +0000 (12:23 -0800)]
mds: require CEPH_FEATURE_OSD_TMAP2OMAP
Require that all OSDs support TMAP2OMAP before starting the MDS. This
avoids doing some work and then crashing with EOPNOTSUPP, and gives us
a more informative message in the logs.
Yan, Zheng [Tue, 24 Dec 2013 00:56:55 +0000 (08:56 +0800)]
mds: use OMAP to store dirfrags
MDS can fetch dirfrags from both TMAP and OMAP. When committing a
dirfrag that is stored in TMAP, MDS first uses OSD_OP_TMAP2OMAP
to convert corresponding TMAP to OMAP, then updates the resulting
OMAP.
Samuel Just [Fri, 10 Jan 2014 21:23:32 +0000 (13:23 -0800)]
os/DBObjectMap, FileStore: omap_clear should not remove xattrs
Previously, FileStore::_omap_clear() used ObjectMap::clear(), which
incorrectly also blasts any stored xattrs. Instead, add
ObjectMap::clear_keys_header() to handle this case efficiently.
Fixes: #7065
Fixes: #7135
Signed-off-by: Samuel Just <sam.just@inktank.com>
Loic Dachary [Fri, 10 Jan 2014 16:49:21 +0000 (17:49 +0100)]
organizationmap: match authors with organizations
Using the same format as .mailmap, match author names with the
organization sponsoring them, if any. It can be used from the command
line to display git log statistics with results aggregated by company
names.
The git-check-mailmap command that was introduced in git 1.8.4 can be
used to use .mailmap first and then .organizationmap using the
normalized author names. For instance:
Greg Farnum [Thu, 9 Jan 2014 22:03:12 +0000 (14:03 -0800)]
FileStore: detect XFS properly
We were only setting m_fs_type = FS_TYPE_XFS if
m_filestore_replica_fadvise was also set -- presumably
the bug fix accidentally blocked off too much of the code type. This
resulted in our xattr counts always being set too low: the store
is mounted (and thus does _detectfs) twice; once as part of the
not-as-conditional-as-it-looks convertfs in ceph_osd.cc, and once
as part of OSD::init().
Sage Weil [Thu, 9 Jan 2014 22:44:49 +0000 (14:44 -0800)]
mon: set next commit in mon command replies
The mon command acks include a version that is used by the client to
determine which version of the map they need to get or wait for in order
to see the effects of their command. Currently we are returning
get_last_committed() everywhere, but we are about to commit something (and
waiting for it), which will increase that value by one. As a result,
clients are always getting epoch/version-1 instead of epoch.
This manifested as a LibRadosTier.Promote test that failed because the
OSD had the OSDMap updates adding the tier and overlay but not the final
map change that set the cache-mode to writeback. I suspect this is also
the cause of spurious errors in the past where we've seen misdirected
request errors that made no sense.
Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao@inktank.com>
Yehuda Sadeh [Tue, 7 Jan 2014 02:32:42 +0000 (18:32 -0800)]
rgw: convert bucket info if needed
Fixes: #7110
In dumpling, the bucket info was separated into bucket entry point and
bucket instance objects. When setting bucket attrs we only ended up
updating the bucket instance object. However, pre-dumpling buckets still
keep everything at the entry-point object, so acl changes didn't affect
anything (because we never updated the entry point). This change just
converts the bucket info into the new format.
Ken Dreyer [Thu, 9 Jan 2014 15:55:28 +0000 (08:55 -0700)]
remove spurious executable permissions on files
Fedora's rpmlint complains that some of the source code files in the
tree happen to be executable. Remove the execute bits from these files
to resolve the rpmlint warning.
Signed-off-by: Ken Dreyer <ken.dreyer@inktank.com>
Loic Dachary [Wed, 8 Jan 2014 19:13:37 +0000 (20:13 +0100)]
erasure-code: ensure that coding chunks are page aligned
When coding chunks are allocated for jerasure, their address must be
aligned to page boundaries. The requirement is actually to be aligned on
a long long boundary but bufferlist does not allow for fine tuning of the
alignment.
If padding is necessary because the total size of the data to be encoded
is not a multiple of the alignment requirements as returned by
get_alignment(), the buffer is not only padded but also rebuilt using
rebuild_page_aligned() to preserve the page alignment that is expected
of the input buffer.
The overhead of rebuilding the whole input buffer when padding is
necessary could be reduced by only reallocating one buffer for the last
data chunk, therefore reducing the amount of data being copied. However,
this optimization is not going to be used if the caller takes care of
the padding, which is likely to be the case most of the time.
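The padding rule described above can be sketched as follows. This is a
minimal illustration in Python, not the jerasure plugin code: the names
padded_length and pad are hypothetical, and the alignment argument stands
in for whatever get_alignment() returns.

```python
def padded_length(data_length, alignment):
    """Return the smallest multiple of alignment that is >= data_length."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    remainder = data_length % alignment
    if remainder == 0:
        # Already a multiple of the alignment: no padding, no rebuild.
        return data_length
    return data_length + (alignment - remainder)


def pad(data, alignment):
    """Pad data (bytes) with zero bytes up to the next multiple of alignment."""
    return data + b"\0" * (padded_length(len(data), alignment) - len(data))
```

A caller that already supplies input whose length is a multiple of the
alignment takes the first branch, which corresponds to the case where no
rebuild of the input buffer is needed.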
Andreas Peters [Wed, 18 Dec 2013 13:47:58 +0000 (14:47 +0100)]
EC-JERASURE: rewrite region-xor function using vector operations to get ~ x1.5 speedups for erasure code and guarantee proper 64-bit/128-bit buffer alignment
Loic Dachary [Tue, 7 Jan 2014 15:49:44 +0000 (16:49 +0100)]
common: fix large output in unittest_daemon_config
All tests in daemon_config use the global g_ceph_context
object. Introducing an expansion loop in it will impact all tests and
generate a very large output.
Remove the SubstitutionLoop test case which is also covered in
test/common/test_config.cc.
Yehuda Sadeh [Mon, 6 Jan 2014 20:53:58 +0000 (12:53 -0800)]
radosgw-admin: fix object policy read op
Fixes: #7083
This was broken when we fixed #6940. We use the same function to both
read the bucket policy and the object policy. However, each needed to be
treated differently. Restore old behavior for objects.
Loic Dachary [Sun, 5 Jan 2014 14:49:57 +0000 (15:49 +0100)]
common: unit tests for config::expand_meta
Part of the config.cc tests are in test/confutils.cc but they do not
cover meta variable expansion. Create unittest_config for config.{h,cc}
specific tests.
The test_md_config_t is made a friend of md_config_t to allow testing
private and protected methods.
test/cli/ceph-conf/show-config-value.t is used to check that the human
readable message shows as expected when there is an expansion
loop.
Loic Dachary [Sun, 5 Jan 2014 14:38:55 +0000 (15:38 +0100)]
common: recursive implementation of config::expand_meta
Using a recursive implementation of variable expansion makes it easier
to protect against loops and provide human readable messages when they
happen.
It also enables one variable to be substituted multiple times in the
same configuration option instead of just once because it is confused
with a variable expansion loop.
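A recursive expansion with loop detection can be sketched as below. This
is an illustration in Python, not the config.cc implementation: the
function expand, the ExpansionLoop exception and the simplified $name
syntax are assumptions made for the example.

```python
import re

# Hypothetical, simplified variable syntax: $name
VAR = re.compile(r"\$(\w+)")


class ExpansionLoop(Exception):
    """Raised when a variable ends up referring to itself."""


def expand(name, options, stack=()):
    """Recursively expand $name references found in options[name].

    stack holds the names currently being expanded, so a loop is only
    reported when a name reappears in its own expansion chain, not when
    it is merely used several times in the same value.
    """
    if name in stack:
        chain = " -> ".join(stack + (name,))
        raise ExpansionLoop("variable expansion loop: " + chain)

    def substitute(match):
        ref = match.group(1)
        if ref not in options:
            return match.group(0)  # unknown names are left untouched
        return expand(ref, options, stack + (name,))

    return VAR.sub(substitute, options[name])
```

With options = {"a": "$b/$b", "b": "x"}, expand("a", options) substitutes
$b twice without reporting a loop, while {"a": "$b", "b": "$a"} raises
ExpansionLoop with the chain of names in the message.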
Sage Weil [Sun, 5 Jan 2014 06:43:26 +0000 (22:43 -0800)]
mon: only send messages to current OSDs
When choosing a random OSD to send a message to, verify not only that
the OSD id is up but that the session is for the same instance of that OSD
by checking that the address matches.
Fixes: #7093
Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
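The check described in this commit can be sketched as follows. This is a
Python illustration only: the data layout (a list of (osd_id, addr)
sessions and a dict mapping osd_id to (is_up, addr)) and the name
pick_osd_session are assumptions, not the monitor's actual structures.

```python
import random


def pick_osd_session(sessions, osdmap):
    """Pick a random session that belongs to the current instance of an OSD.

    sessions: list of (osd_id, addr) tuples for open sessions.
    osdmap:   dict mapping osd_id -> (is_up, addr) for the current map.
    Returns one matching (osd_id, addr) tuple, or None if there is none.
    """
    current = [
        (osd_id, addr)
        for osd_id, addr in sessions
        # The OSD must be up AND the session address must match the map,
        # otherwise the session belongs to an older instance of that OSD.
        if osd_id in osdmap and osdmap[osd_id][0] and osdmap[osd_id][1] == addr
    ]
    return random.choice(current) if current else None
```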
Sage Weil [Sun, 5 Jan 2014 06:40:43 +0000 (22:40 -0800)]
osd: ignore OSDMap messages while we are initializing
The mon may occasionally send OSDMap messages to random OSDs, but is not
very discriminating in that we may not have authenticated yet. Ignore any
messages if that is the case; we will request whatever we need during the
BOOTING state.
Fixes: #7093
Signed-off-by: Sage Weil <sage@inktank.com>
Noah Watkins [Sat, 4 Jan 2014 19:32:51 +0000 (11:32 -0800)]
onexit: add an on exit callback utility
Adds a class that executes registered callbacks in its destructor. Since
static duration objects have their destructors called when returning
from main or calling exit then this can be used as a replacement for
on_exit.
test: disable cross process sem tests on non-Linux
How to make this portable:
- MAP_ANONYMOUS -> MAP_ANON (OSX)
- sem_init (anonymous semaphore) needs to be replaced by named
semaphores using sem_open/sem_close. Use a memory address of the sem_t
variable to hack anonymous semaphore behavior.
- sem_getvalue isn't supported on OSX. It is used here to do
sem_wait/sem_post to bring a semaphore back to a specific value. To get
around this we may need to restructure the test so that the semaphore
can be destroyed and re-initialized rather than inspected as it is
currently being done.
On OSX (and currently any platform missing the MSG_MORE
macro) the MSG_MORE optimization is disabled. The MSG_NOSIGNAL flag is
available on OSX but is called SO_NOSIGPIPE and must be set via
setsockopt.
Loic Dachary [Fri, 3 Jan 2014 21:52:55 +0000 (22:52 +0100)]
ceph-disk: fix false positive for gitbuilder
The output of test/ceph-disk.sh is very verbose which is good for
debugging errors. However, it sometimes contains strings that match
/error:/i, which gitbuilder picks up as a sign that the test failed,
even when the exit code is zero.
Remove from the output the three strings triggering false positive in
gitbuilder.
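The false positive can be reproduced with a case-insensitive match on
"error:". The sketch below is in Python; flagged_lines is a hypothetical
helper, and the sample output lines in the usage note are invented for
illustration, not taken from test/ceph-disk.sh.

```python
import re

# Same pattern as /error:/i mentioned above.
ERROR_PATTERN = re.compile(r"error:", re.IGNORECASE)


def flagged_lines(output):
    """Return the lines of output that match the error pattern."""
    return [line for line in output.splitlines() if ERROR_PATTERN.search(line)]
```

For example, a harmless line such as "DEBUG: expected Error: no such
device" would be flagged even though the test exited with status zero,
whereas "DEBUG: all checks passed" would not.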
Sage Weil [Fri, 3 Jan 2014 20:51:15 +0000 (12:51 -0800)]
osdc/ObjectCacher: back off less during flush
In cce990efc8f2a58c8d0fa11c234ddf2242b1b856 we added a limit to avoid
holding the lock for too long. However, if we back off, we currently
wait for a full second, which is probably a bit much--we really just want
to give other threads a chance.
Backport: emperor
Signed-off-by: Sage Weil <sage@inktank.com>
Loic Dachary [Wed, 1 Jan 2014 21:11:30 +0000 (22:11 +0100)]
ceph-disk: create the data directory if it does not exist
Instead of failing if the OSD data directory does not exist, create
it. Only do so if the data directory is not enforced to be a device via
the use of the --data-dev flag. The directory is not recursively created.
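The behavior described above can be sketched as follows. This is a
Python illustration, not the ceph-disk code: ensure_data_dir and its
data_dev parameter are hypothetical names standing in for the actual
logic and the --data-dev flag.

```python
import os


def ensure_data_dir(path, data_dev=False):
    """Create the OSD data directory if it is missing.

    data_dev mirrors the idea of --data-dev: when the data path is
    required to be a device, a missing path is an error and nothing
    is created.
    """
    if os.path.exists(path):
        return
    if data_dev:
        raise ValueError("%s does not exist and must be a device" % path)
    # os.mkdir, not os.makedirs: the parent directory must already
    # exist, since the directory is not created recursively.
    os.mkdir(path)
```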
Loic Dachary [Mon, 30 Dec 2013 22:57:39 +0000 (23:57 +0100)]
ceph-disk: implement --mark-init=none
It is meant to be used when preparing and activating a directory that is
not to be used with init. No file is created to identify the init
system, no symbolic link is made to the directory in /var/lib/ceph
and the init scripts are not called.
Loic Dachary [Wed, 1 Jan 2014 21:07:57 +0000 (22:07 +0100)]
ceph-disk: fsid is a known configuration option
Use get_conf_with_default instead of get_conf because fsid is a known
ceph configuration option. It allows overriding via CEPH_ARGS which is
convenient for testing. Only options that are not found in config_opts.h
are fetched via get_conf.