Loic Dachary [Wed, 29 Jan 2014 13:52:22 +0000 (14:52 +0100)]
pybind: fix tests that do not fail as expected
A missing argument makes the test fail indeed, but the intended test is
to demonstrate something else ( either character validation or excess of
arguments etc. ). The result is {} instead of None which is what should
have been expected in the first place.
Ideally there would be a more verbose way to check for syntactic errors
to make such mistakes less probable.
Loic Dachary [Sun, 2 Feb 2014 09:05:59 +0000 (10:05 +0100)]
mon: compute the ruleset of erasure-coded pools
The default ruleset of an erasure coded pool may depend on the
parameters used to configure it. In the case of a pyramidal /
hierarchical plugin, the desired ruleset will, for instance, choose from
datacenters and then from racks and disperse local coding chunks among
them.
For this reason the default ruleset cannot be hardcoded in config_opts
as it is for replicated pools. Instead, the "crush_ruleset" property is
interpreted to be the name of an existing crush ruleset to be used.
If the corresponding ruleset is found in a pending crushmap, the
prepare_pool_crush_ruleset will return EAGAIN. The "osd pool create"
caller is modified to handle the EAGAIN error and reschedules the message.
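The EAGAIN handling described above can be sketched roughly as follows. This is an illustration, not the monitor code: the dictionary-based maps, the function signatures, the reschedule callback and the ENOENT fallback are all assumptions made for the example.

```python
import errno

def prepare_pool_crush_ruleset(name, committed, pending):
    """Look up a ruleset by name (hypothetical signature).

    committed and pending are plain dicts standing in for the committed
    and the pending crushmap.
    """
    if name in committed:
        return 0, committed[name]
    if name in pending:
        # Only known to the pending crushmap: the caller has to retry later.
        return -errno.EAGAIN, None
    # Assumption: an unknown ruleset name is reported as ENOENT.
    return -errno.ENOENT, None

def handle_pool_create(name, committed, pending, reschedule):
    """Caller side: re-queue the request when the lookup says EAGAIN."""
    err, ruleset = prepare_pool_crush_ruleset(name, committed, pending)
    if err == -errno.EAGAIN:
        reschedule()
    return err, ruleset
```

The point of returning EAGAIN rather than an error is that the request becomes valid once the pending crushmap is committed, so retrying is enough.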
Loic Dachary [Sun, 2 Feb 2014 08:59:52 +0000 (09:59 +0100)]
mon: erasure code plugin loader helper
The get_erasure_code helper loads the erasure code plugin found in the
erasure-code-plugin string of the properties argument. It is meant to be
used to query the plugin to determine the desired size of a pool, the
most suitable ruleset to use, etc.
Loic Dachary [Sun, 2 Feb 2014 08:56:13 +0000 (09:56 +0100)]
mon: pool create helper for crush ruleset
The crush ruleset of replicated pools is by default set to
osd_pool_default_crush_replicated_ruleset but it may vary depending on
the pool type. Create a helper to compute the crush ruleset.
Loic Dachary [Sun, 2 Feb 2014 08:51:50 +0000 (09:51 +0100)]
mon: pool creation helper for size
The size of replicated pools is by default set to
osd_pool_default_size but it may vary depending on the pool type. Create
a helper to compute the pool size.
Loic Dachary [Sat, 1 Feb 2014 09:21:00 +0000 (10:21 +0100)]
mon: no default ruleset except for replicated pools
Remove the hardcoded default ruleset for erasure coded pools and only
keep it for replicated pools. Move the logic up in the prepare_new_pool
method so that an error code can be returned before allocating the new
pending pool in case the ruleset is not initialized.
Loic Dachary [Sat, 1 Feb 2014 09:09:12 +0000 (10:09 +0100)]
mon: helper for pool properties parsing
Add the prepare_pool_properties to convert the properties vector into a
properties map suitable for either initializing the pg_pool_t member or
an erasure code plugin.
It is based on CrushWrapper::add_simple_ruleset, using a "default" root
and "host" failure domain by default. They can be overridden with
erasure-code parameters ( erasure-code-ruleset-root and
erasure-code-ruleset-failure-domain respectively ).
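A minimal sketch of the vector-to-map conversion, assuming each entry of the properties vector is a "key=value" string and that a bare key maps to an empty value. Both the entry format and the Python signature are assumptions for illustration; the real helper is C++.

```python
def prepare_pool_properties(properties):
    """Convert ["key=value", ...] into a dict (illustrative only)."""
    properties_map = {}
    for entry in properties:
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = entry.partition("=")
        properties_map[key] = value
    return properties_map
```

For example, ["erasure-code-ruleset-root=default", "erasure-code-ruleset-failure-domain=host"] becomes a two-entry map that can be handed to a plugin as is.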
Sage Weil [Tue, 4 Feb 2014 05:12:41 +0000 (21:12 -0800)]
client: fix warnings
client/Client.cc: In member function 'int Client::_read(Fh*, int64_t, uint64_t, ceph::bufferlist*)':
warning: client/Client.cc:5893:27: comparison between signed and unsigned integer expressions [-Wsign-compare]
client/Client.cc: In member function 'int Client::_write(Fh*, int64_t, uint64_t, const char*)':
warning: client/Client.cc:6235:30: comparison between signed and unsigned integer expressions [-Wsign-compare]
Sage Weil [Mon, 3 Feb 2014 21:19:14 +0000 (13:19 -0800)]
mon: fix 'mds set allow_new_snaps'
We had already added this as a flag (set/unset) when I generalized the
'mds set_max_mds' to be 'ceph mds set <var> <val>'. Switch the snaps
flag to be a key/value with true/false (similar to the hashpspool
pool flag) since there are fewer users and the var/val approach is more
general.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
Loic Dachary [Sun, 2 Feb 2014 00:09:52 +0000 (01:09 +0100)]
mon: do not force proposal when no osds
If there are no OSDs, there is no need to propose to paxos. It does not
hurt on a production cluster but it matters when running tests because
it effectively always ignores --paxos-propose-interval.
Sage Weil [Mon, 3 Feb 2014 16:54:14 +0000 (08:54 -0800)]
client: use 64-bit value in sync read eof logic
The file size can jump to a value that is very much larger than our current
position (for example, it could be a disk image file that gets a sparse
write at a large offset). Use a 64-bit value so that 'some' doesn't
overflow.
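The overflow can be reproduced outside of the client. The sketch below uses ctypes to mimic storing the distance to the end of the file in a 32-bit and in a 64-bit signed integer; the function names are hypothetical and this is not the Client.cc logic itself.

```python
import ctypes

def remaining_32(size, pos):
    # ctypes integer types do no overflow checking, so the value wraps
    # the way a 32-bit signed C variable would.
    return ctypes.c_int32(size - pos).value

def remaining_64(size, pos):
    return ctypes.c_int64(size - pos).value

# A sparse write at 3 GiB makes the file size jump far past position 0.
size, pos = 3 * 2**30, 0
print(remaining_32(size, pos))  # negative: the 32-bit value wrapped
print(remaining_64(size, pos))  # 3221225472
```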
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: John Spray <john.spray@inktank.com>
Somnath Roy [Wed, 22 Jan 2014 18:30:32 +0000 (10:30 -0800)]
Pipe, cephx: Message signing under config option
The config option was already present, but it was not checked at the
beginning of the function, so some work was still done before the
check. Now the config option guard is at the very beginning of the
function.
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com> Signed-off-by: Greg Farnum <greg@inktank.com>
Greg Farnum [Tue, 7 Jan 2014 18:05:57 +0000 (10:05 -0800)]
TrackedOp: optionally disable the actual tracking operations
To avoid op contention on global locks, optionally disable the
op tracking. Create an "osd_op_tracker" config to control it, then
in the OpTracker constructor set a bool. If it's set, the OpTracker
doesn't actually maintain its lists and avoids taking any locks. We
maintain the within-op tracking for now since it shouldn't contend,
but we can turn that off later on if we like.
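As a rough sketch of the idea (class layout and method names are hypothetical, not the TrackedOp API): a bool fixed at construction time lets every tracking call return before touching the shared lock or the list.

```python
import threading

class OpTracker:
    """Illustrative tracker whose bookkeeping can be switched off."""

    def __init__(self, tracking_enabled=True):
        # Decided once at construction, mirroring a config option.
        self.tracking_enabled = tracking_enabled
        self._lock = threading.Lock()
        self._ops_in_flight = []

    def register_inflight_op(self, op):
        if not self.tracking_enabled:
            return False  # no lock taken, no list maintained
        with self._lock:
            self._ops_in_flight.append(op)
        return True

    def unregister_inflight_op(self, op):
        if not self.tracking_enabled:
            return
        with self._lock:
            self._ops_in_flight.remove(op)

    def num_ops_in_flight(self):
        if not self.tracking_enabled:
            return 0
        with self._lock:
            return len(self._ops_in_flight)
```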
Greg Farnum [Tue, 7 Jan 2014 17:43:28 +0000 (09:43 -0800)]
Throttler: optionally disable use of perfcounters
These can be expensive enough that we want to disable them. We're already
mostly set up for it to be optional, so just plug in a config option and
move a timestamp read under the "if (logger)" guard to set it up!
- fix a couple of typos for repo configuration and service restart
- the ceph package must be installed on RPM distro since the init
script relies on ceph-conf
- Note on radosgw service name for RPM distro
David Zafman [Wed, 29 Jan 2014 03:18:32 +0000 (19:18 -0800)]
osd: Move the rest of scrubbing routines to the backend
Move enum scrub_error_type to osd_types.h
Move PG::_compare_scrub_objects to ReplicatedBackend::be_compare_scrub_objects
Move PG::_select_auth_object to ReplicatedBackend::be_select_auth_object
Move PG::_compare_scrubmaps to ReplicatedBackend::be_compare_scrubmaps
Signed-off-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Fri, 31 Jan 2014 15:19:10 +0000 (07:19 -0800)]
os/KeyValueStore: fix warning
./os/KeyValueStore.h: In member function ‘std::string KeyValueStore::strip_object_key(uint64_t)’:
warning: ./os/KeyValueStore.h:173:31: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
Sage Weil [Thu, 30 Jan 2014 23:13:05 +0000 (15:13 -0800)]
mon/OSDMonitor: encode full OSDMap with same feature bits as the Incremental
Each monitor is independently encoding the full OSDMap and storing it in
its local store. Sometimes this happens when we do not have a valid value
for quorum_features (for example, it can happen during sync).
Instead, use the feature bits the Incremental was encoded with for the full
OSDMap so that they always match.
Note that this is conveniently the *only* place in the mon where we encode
the full OSDMap, so we're capturing all paths. Yay!
Sage Weil [Thu, 30 Jan 2014 23:09:58 +0000 (15:09 -0800)]
OSDMap: note encoding features in Incremental encoding
The monitor will need to know what features the incremental was encoded
with so that it can encode the OSDMap using the same bits. Introduce a
member that is set during decode. During encode, encode the value passed
in by the caller.
Loic Dachary [Wed, 29 Jan 2014 10:00:08 +0000 (11:00 +0100)]
ceph-disk: support and test the absence of PATH
Although this is not exactly the context in which ceph-disk is run when
invoked by udev, it makes sure there is at least one sensible way of
using it when PATH is undefined.
Also make src/ceph.in not fail if PATH is not defined.
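One way to tolerate a missing PATH is to fall back to os.defpath when looking for executables. The helper below is a sketch under that assumption; its name and signature are illustrative and it is not a copy of what ceph-disk does.

```python
import os

def which(executable, environ=None):
    """Return the full path of executable, or None if it is not found."""
    environ = os.environ if environ is None else environ
    # os.defpath is the default search path Python uses when PATH is unset.
    search_path = environ.get("PATH", os.defpath)
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, executable)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling which("some-tool", environ={}) exercises the undefined-PATH case without modifying the environment of the running process.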
Haomai Wang [Thu, 30 Jan 2014 03:11:12 +0000 (19:11 -0800)]
FileStore: avoid leveldb check for xattr when possible
Maintain an internal xattr called "spill_out" that indicates whether we
(may) have xattrs stored in omap. If the attribute is set, it indicates
whether or not we should look in omap. If the attribute is not
present, then we do not know and will also need to check.
For new stores, this will avoid the overhead of consulting omap in the
general case until a particular object gets enough (or big) xattrs and
spills over.
For old stores, we will effectively fall back to the previous behavior
of always checking.
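The three cases (marker says spilled, marker says not spilled, marker absent) can be sketched as below. Only the "spill_out" name comes from this commit; the byte values used to encode the marker and the dict standing in for the inline xattrs are assumptions for illustration.

```python
SPILL_OUT_ATTR = "spill_out"   # attribute name from the commit message
XATTR_SPILLED = b"1"           # hypothetical encoding
XATTR_NOT_SPILLED = b"0"       # hypothetical encoding

def must_check_omap(inline_xattrs):
    """Decide whether omap has to be consulted for an object's xattrs."""
    marker = inline_xattrs.get(SPILL_OUT_ATTR)
    if marker is None:
        # Old store: no marker, so we do not know and must check.
        return True
    # Only an explicit "not spilled" marker lets us skip the omap lookup.
    return marker != XATTR_NOT_SPILLED
```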
Implements #7059
Signed-off-by: Haomai Wang <haomaiwang@gmail.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Ilya Dryomov [Wed, 29 Jan 2014 14:12:01 +0000 (16:12 +0200)]
rbd: don't forget to call close_image() if remove_child() fails
close_image() among other things unregisters a watcher that's been
registered by open_image(). Even though it'll time out in 30 or so
seconds, it's not nice now that we check for watchers before starting
the removal process.
Ilya Dryomov [Wed, 29 Jan 2014 14:12:01 +0000 (16:12 +0200)]
rbd: check for watchers before trimming an image on 'rbd rm'
Check for watchers before trimming image data to try to avoid getting
into the following situation:
- user does 'rbd rm' on a mapped image with an fs mounted from it
- 'rbd rm' trims (removes) all image data, only header is left
- 'rbd rm' tries to remove a header and fails because krbd has a
watcher registered on the header
- at this point image cannot be unmapped because of the mounted fs
- fs cannot be unmounted because all its data and metadata is gone
Unfortunately, this fix doesn't make it impossible to happen (the
required atomicity isn't there), but it's a big improvement over the
status quo.
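The ordering of the checks can be sketched as follows, with a stand-in image object instead of librbd. The FakeImage class and the EBUSY return value are assumptions for the example. As noted above, the check and the trim are not atomic, so a watcher registering between the two steps is still possible.

```python
import errno

class FakeImage:
    """Stand-in for an rbd image; not the librbd API."""

    def __init__(self, watchers):
        self.watchers = list(watchers)
        self.data_trimmed = False
        self.header_removed = False

def remove_image(image):
    # Refuse early, while the image data is still intact.
    if image.watchers:
        return -errno.EBUSY
    image.data_trimmed = True
    image.header_removed = True
    return 0
```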
Ilya Dryomov [Thu, 30 Jan 2014 11:39:15 +0000 (13:39 +0200)]
pybind: work around find_library() not searching LD_LIBRARY_PATH
Commit b28b64a0b6db ("pybind: use find_library for libcephfs and
librbd") switched us to find_library(), but this function doesn't seem
to respect LD_LIBRARY_PATH. There are numerous python tickets, dating
back several years, but alas. Work around it by using the soname as
a fallback. (rados.py has been fixed by commit e46d2ca067b5 ("fix the
bug ctypes.util.find_library to search for librados failed on
Centos6.4."))
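A minimal sketch of such a fallback, assuming the caller knows the soname to try. The helper names and the injectable finder parameter are illustrative and not taken from the bindings.

```python
from ctypes import CDLL
from ctypes.util import find_library

def library_name(name, soname, finder=find_library):
    """Prefer what find_library() reports, else fall back to the soname."""
    path = finder(name)
    return path if path is not None else soname

def load_library(name, soname):
    # CDLL hands the name to the dynamic loader, which does honour
    # LD_LIBRARY_PATH when given a bare soname.
    return CDLL(library_name(name, soname))
```

The finder parameter exists only so that the fallback can be exercised without the library being installed, for example library_name("rbd", "librbd.so.1", finder=lambda n: None).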