Sage Weil [Thu, 5 Oct 2017 20:26:16 +0000 (15:26 -0500)]
src/messages/MOSDMap: reencode OSDMap for older clients
We explicitly select which missing bits trigger a reencode. We
already had jewel and earlier covered, but kraken includes all of
the previously mentioned bits but not SERVER_LUMINOUS. This
prevents kraken clients from decoding luminous maps.
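A minimal sketch of the selection logic described above, assuming made-up bit values and names (the real feature bits are defined in Ceph's feature headers and are not reproduced here). It shows why a peer that has every older bit but lacks SERVER_LUMINOUS slips through unless that bit is in the trigger set:

```python
# Hypothetical bit positions, for illustration only.
SERVER_JEWEL = 1 << 0
SERVER_KRAKEN = 1 << 1
SERVER_LUMINOUS = 1 << 2

# Trigger set before the fix: jewel and earlier were covered.
OLD_TRIGGERS = SERVER_JEWEL | SERVER_KRAKEN
# Trigger set after the fix: a missing SERVER_LUMINOUS also triggers.
NEW_TRIGGERS = OLD_TRIGGERS | SERVER_LUMINOUS

KRAKEN_PEER = SERVER_JEWEL | SERVER_KRAKEN
LUMINOUS_PEER = KRAKEN_PEER | SERVER_LUMINOUS


def needs_reencode(peer_features: int, triggers: int) -> bool:
    """Return True if the peer is missing any bit in the trigger set."""
    return (peer_features & triggers) != triggers
```

With the old trigger set a kraken peer is sent the map as-is; with the new one it gets a reencoded map.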
Kefu Chai [Thu, 31 Aug 2017 10:15:28 +0000 (18:15 +0800)]
cmake: disable VTA on options.cc
to silence the following warning and to avoid compiling this file twice:
ceph/src/common/options.cc: In function ‘std::vector<Option> get_global_options()’:
ceph/src/common/options.cc:151:21: note: variable tracking
size limit exceeded with -fvar-tracking-assignments, retrying without
std::vector<Option> get_global_options() {
^~~~~~~~~~~~~~~~~~
osd: make the PG's SORTBITWISE assert a more generous shutdown
We want to stop working if we get activated while sortbitwise is not set
on the cluster, but we might have old maps where it wasn't if the flag
was changed recently. And doing it in the PG code was a bit silly anyway.
Instead check SORTBITWISE in the main OSDMap handling code prior to
prepublishing it. Let it go through if we aren't active at the time.
Add _interfaces option to constrain the choice of IPs in the network
list to those on interfaces matching the provided list of interface names.
The _interfaces options only work in concert with the _network options,
so you must also specify a list of networks if you want to use a specific
interface, e.g., by specifying a broad network like "::" or "0.0.0.0/0".
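A rough sketch of how the two lists could combine, assuming a simple membership match on interface names. The function name and the (interface, address) input shape are hypothetical; this is not the Ceph implementation:

```python
import ipaddress


def pick_address(candidates, networks, interfaces=None):
    """Pick the first candidate IP that is inside one of `networks` and,
    if `interfaces` is given, sits on one of the named interfaces.

    candidates: list of (interface_name, ip_string) pairs (hypothetical shape).
    Returns the chosen IP string, or None if nothing matches.
    """
    nets = [ipaddress.ip_network(n) for n in networks]
    for ifname, ip in candidates:
        # The interface list only narrows the choice; it never selects alone.
        if interfaces and ifname not in interfaces:
            continue
        addr = ipaddress.ip_address(ip)
        if any(addr.version == net.version and addr in net for net in nets):
            return ip
    return None
```

With an empty network list nothing is selected, which mirrors the note that an interface list has no effect without a network list.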
Greg Farnum [Tue, 3 Oct 2017 22:54:06 +0000 (15:54 -0700)]
msgr: add a mechanism for Solaris to avoid dying on SIGPIPE
This is fairly clean: we define an RAII object in Messenger.h on
Solaris, and "declare" it with a macro in the implementations. There's
no code duplication and on Linux it's just entirely compiled out.
Sage Weil [Tue, 3 Oct 2017 21:48:37 +0000 (16:48 -0500)]
os/bluestore: use normal Context for async deferred_try_submit
I'm not quite sure why the FunctionContext did not ever execute on the
finisher thread (perhaps the [&] captured some state on the stack that it
shouldn't have?). In any case, using a traditional Context here appears
to resolve the problem (of the async deferred_try_submit() never executing,
leading to a bluestore stall/deadlock).
Sage Weil [Fri, 29 Sep 2017 18:47:19 +0000 (13:47 -0500)]
os/bluestore: wake kv thread when blocking on deferred_bytes
We need to wake the kv thread whenever setting deferred_aggressive to
ensure that txns with deferred io that have committed but haven't submitted
their deferred writes get submitted. This aligns us with the other
users of deferred_aggressive (e.g., _osr_drain_all).
Sage Weil [Wed, 4 Oct 2017 13:25:38 +0000 (08:25 -0500)]
mgr/localpool: fix rule selection
The 'osd pool create' arg parsing is broken; the rule name for
'ceph osd pool create $name $numpgs replicated $rulename' is passed
via the erasure_code_profile param. Too many req=false options
without a way to disambiguate them.
Work around it by passing both 'rule' and 'erasure_code_profile'
keys, so that if/when the hack in OSDMonitor.cc is removed it will
still work. Blech.
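A sketch of the workaround, assuming the command is sent as a JSON dict. Only the 'rule' and 'erasure_code_profile' keys come from the message above; the helper name and the remaining keys are assumptions made for illustration:

```python
import json


def build_pool_create_cmd(pool_name, pg_num, rule_name):
    """Build an 'osd pool create' command that names the rule twice."""
    cmd = {
        "prefix": "osd pool create",  # assumed key name
        "pool": pool_name,            # assumed key name
        "pg_num": pg_num,             # assumed key name
        "pool_type": "replicated",    # assumed key name
        # Where the rule name belongs once arg parsing is fixed.
        "rule": rule_name,
        # Where the current parsing actually looks for it.
        "erasure_code_profile": rule_name,
    }
    return json.dumps(cmd)
```

Sending the same value under both keys keeps the call working whether or not the parsing hack is still present.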
mon/MgrMonitor: read cmd descs if empty on update_from_paxos()
If the MgrMonitor's `command_descs` is empty, the monitor will not send
the mgr commands to clients on `get_descriptions`. This, in turn, has
the clients sending the commands to the monitors, which will have no
idea how to handle them.
Therefore, make sure to read the `command_descs` from disk if the vector
is empty.
Fixes: http://tracker.ceph.com/issues/21300
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
(cherry picked from commit 3d06079bae0fbc096d6c3639807d9be3597e841a)
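The fallback can be pictured roughly as follows. The class, the dict standing in for the on-disk store, and the key name are all hypothetical; only `command_descs`, `update_from_paxos()` and `get_descriptions` are names from the message:

```python
class MgrMonitorSketch:
    """Toy stand-in for the monitor service, for illustration only."""

    def __init__(self, disk):
        self.disk = disk          # hypothetical dict standing in for the mon store
        self.command_descs = []   # in-memory copy; may start out empty

    def update_from_paxos(self):
        # Reload the descriptions only when the in-memory vector is empty.
        if not self.command_descs:
            self.command_descs = list(self.disk.get("mgr_command_descs", []))

    def get_descriptions(self):
        # What clients would receive; an empty list means clients send
        # mgr commands to the monitors instead.
        return list(self.command_descs)
```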
Ramana Raja [Wed, 13 Sep 2017 14:23:43 +0000 (19:53 +0530)]
pybind/ceph_volume_client: add get, put, and delete object interfaces
Wrap low-level rados APIs to allow ceph_volume_client to get, put, and
delete objects. The interfaces would allow OpenStack Manila's
cephfs driver to store config data in a shared storage to implement
highly available Manila deployments. Restrict write (put) and
read (get) object sizes to the 'osd_max_size' config setting.
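A toy model of the three interfaces with the size restriction, backed by an in-memory dict rather than rados. The class and method names and signatures are assumptions, not the ceph_volume_client API:

```python
class ObjectStoreSketch:
    """In-memory stand-in for the wrapped object interfaces (hypothetical)."""

    def __init__(self, max_size):
        self.max_size = max_size   # plays the role of the size config setting
        self._objects = {}

    def put_object(self, pool, name, data: bytes):
        # Refuse writes larger than the configured limit.
        if len(data) > self.max_size:
            raise ValueError("data exceeds max size of %d bytes" % self.max_size)
        self._objects[(pool, name)] = data

    def get_object(self, pool, name) -> bytes:
        # Read at most max_size bytes.
        return self._objects[(pool, name)][: self.max_size]

    def delete_object(self, pool, name):
        # Deleting a missing object is treated as a no-op in this sketch.
        self._objects.pop((pool, name), None)
```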
... class attribute of the 'CephFSVolumeClient' class. It was supposed
to record the earliest version of CephFSVolumeClient that the current
version is compatible with. It's not useful data to be stored as a
class attribute.
mon/MgrMonitor: populate on-disk cmd descs if empty on upgrade
During kraken, when we first introduced the mgrs, we wouldn't populate
the on-disk command descriptions on create_initial(). Therefore, if we
are upgrading from a cluster that never had a mgr, we may end up
crashing because we have no cmd descs to load from disk.
Fixes: http://tracker.ceph.com/issues/21300
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
Sage Weil [Thu, 10 Aug 2017 20:44:59 +0000 (16:44 -0400)]
os/bluestore: allocate entire write in one go
On the first pass through the writes, compress data and calculate a final
amount of space we need to allocate. On the second pass, assign the
extents to blobs and queue the writes.
This allows us to do a single allocation for all blobs, which will lead
to less fragmentation and a much better write pattern.
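The two-pass shape can be sketched like this, assuming zlib as the compressor, a 4096-byte allocation unit and a trivial bump allocator. All names are hypothetical, and real bluestore blobs and extents are far more involved:

```python
import zlib


class BumpAllocator:
    """Trivial allocator that hands out consecutive offsets."""

    def __init__(self):
        self.next_offset = 0
        self.calls = 0

    def allocate(self, length):
        offset = self.next_offset
        self.next_offset += length
        self.calls += 1
        return offset


def plan_writes(chunks, allocator, alloc_unit=4096):
    """Return a list of (offset, length, payload) using one allocation."""
    # Pass 1: compress each chunk and total the space required.
    prepared = []
    need = 0
    for data in chunks:
        compressed = zlib.compress(data)
        # Keep the compressed form only when it is actually smaller.
        payload = compressed if len(compressed) < len(data) else data
        # Round up to a whole number of allocation units.
        length = -(-len(payload) // alloc_unit) * alloc_unit
        prepared.append((payload, length))
        need += length

    # One allocation covering every chunk.
    offset = allocator.allocate(need)

    # Pass 2: carve the allocated range into per-chunk extents.
    writes = []
    for payload, length in prepared:
        writes.append((offset, length, payload))
        offset += length
    return writes
```

Because the allocator is called once, the extents for all chunks come out contiguous in this sketch.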
Ilya Dryomov [Thu, 17 Aug 2017 13:35:42 +0000 (15:35 +0200)]
qa/tasks/rbd.xfstests: take exclude list from yaml
Different filesystems (and further, different configurations of the
same filesystem) need different exclude lists. Hard coding the list in
a wrapper script is inflexible.
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: quit building xfstests on test nodes
xfstests is a pain to build on trusty, xenial and centos7 with a single
script. It is also very sensitive to dependencies, which again need to
be managed on all those distros -- different sets of supported commands
and switches, some versions have known bugs, etc.
Download a pre-built, statically linked tarball and use it instead.
The tarball was generated using xfstests-bld by Ted Ts'o, with a number
of tweaks by myself (mostly concerning the build environment).
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: drop *_MKFS_OPTIONS variables
AFAICT ./check doesn't query EXT4_MKFS_OPTIONS or BTRFS_MKFS_OPTIONS,
and we don't need anything special for xfs, so remove all of them to
avoid confusion.
Since the roles are mapped inside ceph-deploy, store the mapped roles
and use the new mapped role for upgrades at a later stage.
E.g., mon.a is mapped to mon.mira002 during install; store this mapping
and, during upgrade, map it back to the appropriate name to find the
hostname with that role.
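A small sketch of storing and reusing such a mapping, using the mon.a to mon.mira002 example from the message. The function names and the dict shapes are hypothetical:

```python
def map_roles(roles, hostname_by_role):
    """Build {original_role: mapped_role}, e.g. 'mon.a' -> 'mon.mira002'."""
    mapping = {}
    for role in roles:
        # Keep the daemon type ('mon', 'osd', ...) and swap the id for the host.
        kind, _, _ident = role.partition(".")
        mapping[role] = "%s.%s" % (kind, hostname_by_role[role])
    return mapping


def resolve_role(role, mapping):
    """During upgrade, look up the mapped name; fall back to the role itself."""
    return mapping.get(role, role)
```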
Sage Weil [Wed, 6 Sep 2017 19:34:50 +0000 (15:34 -0400)]
pybind/mgr/localpool: module to automagically create localized pools
By default, this will create a pool per rack, 3x replication, with a host
failure domain. Those parameters can be customized via mgr config-key
options.
Sage Weil [Wed, 20 Sep 2017 20:42:01 +0000 (16:42 -0400)]
mon/OSDMonitor: error out if setting ruleset-* ec profile property
We changed ruleset -> crush back in dc7a2aaf7a34b1e6af0c7b79dc44a69974c1da23.
If someone tries to use the old property, error out early, instead of
silently not doing the thing they thought they told us to do.
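The early check could look roughly like this. The function name, the return convention and the message text are assumptions; only the ruleset to crush rename comes from the message above:

```python
import errno


def validate_ec_profile(profile):
    """Return (0, '') if the profile is acceptable, else (-EINVAL, reason)."""
    for key in profile:
        if key.startswith("ruleset-"):
            # Suggest the renamed property instead of silently ignoring it.
            suggested = "crush-" + key[len("ruleset-"):]
            return (
                -errno.EINVAL,
                "property '%s' is no longer supported; try '%s' instead"
                % (key, suggested),
            )
    return 0, ""
```

Rejecting the old key up front means a profile is never created with a property that would be ignored.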
John Spray [Sat, 23 Sep 2017 12:48:36 +0000 (13:48 +0100)]
mon: show legacy health warning in `status` output
Previously you only got the text of this if you were
either looking at "health detail" or if you had
already set the preluminous_compat setting (in which
case you presumably were already aware so the message
isn't doing much).