mon/MgrMonitor: read cmd descs if empty on update_from_paxos()
If the MgrMonitor's `command_descs` is empty, the monitor will not send
the mgr commands to clients on `get_descriptions`. This, in turn, has
the clients sending the commands to the monitors, which will have no
idea how to handle them.
Therefore, make sure to read the `command_descs` from disk if the vector
is empty.
Fixes: http://tracker.ceph.com/issues/21300
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
(cherry picked from commit 3d06079bae0fbc096d6c3639807d9be3597e841a)
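A minimal sketch of the "read from disk if the in-memory vector is empty" idea, in Python rather than the monitor's C++. The class name, the dict standing in for the on-disk store, and the "command_descs" key are all hypothetical; only `command_descs`, `update_from_paxos()` and `get_descriptions` come from the message above.

```python
class MonitorSketch:
    """Toy stand-in for the monitor service; not the real MgrMonitor."""

    def __init__(self, store):
        self.store = store        # hypothetical dict standing in for the on-disk store
        self.command_descs = []   # in-memory copy, empty after a restart

    def update_from_paxos(self):
        # (loading of the map itself is omitted in this sketch)
        # If nothing is cached in memory, fall back to what is on disk.
        if not self.command_descs:
            self.command_descs = list(self.store.get("command_descs", []))

    def get_descriptions(self):
        # Clients only learn about mgr commands that are present here.
        return list(self.command_descs)
```

With an empty in-memory list, `get_descriptions()` returns nothing until `update_from_paxos()` has run once.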
mon/MgrMonitor: populate on-disk cmd descs if empty on upgrade
During kraken, when we first introduced the mgrs, we wouldn't populate
the on-disk command descriptions on create_initial(). Therefore, if we
are upgrading from a cluster that never had a mgr, we may end up
crashing because we have no cmd descs to load from disk.
Fixes: http://tracker.ceph.com/issues/21300
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
Sage Weil [Thu, 10 Aug 2017 20:44:59 +0000 (16:44 -0400)]
os/bluestore: allocate entire write in one go
On the first pass through the writes, compress data and calculate a final
amount of space we need to allocate. On the second pass, assign the
extents to blobs and queue the writes.
This allows us to do a single allocation for all blobs, which will lead
to less fragmentation and a much better write pattern.
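The two-pass scheme can be illustrated with a toy model in Python. This is only a sketch under assumptions: `Blob`, `BumpAllocator`, `write_all` and the 4096-byte allocation unit are hypothetical names and values, and the toy allocator always returns one contiguous range, which a real allocator need not do.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    data: bytes
    extents: list = field(default_factory=list)  # (offset, length) pairs


def round_up(n, unit):
    """Round n up to the next multiple of unit."""
    return (n + unit - 1) // unit * unit


class BumpAllocator:
    """Toy allocator: hands out one contiguous range per call."""

    def __init__(self):
        self.next_offset = 0
        self.calls = 0

    def allocate(self, length):
        self.calls += 1
        start = self.next_offset
        self.next_offset += length
        return start


def write_all(chunks, allocator, alloc_unit=4096, compress=None):
    # Pass 1: optionally compress each chunk and total the space needed.
    blobs = []
    need = 0
    for chunk in chunks:
        data = compress(chunk) if compress else chunk
        blobs.append(Blob(data))
        need += round_up(len(data), alloc_unit)

    # One allocation covering every blob in this write.
    base = allocator.allocate(need)

    # Pass 2: carve the range into per-blob extents and queue the writes.
    queued = []
    offset = base
    for blob in blobs:
        length = round_up(len(blob.data), alloc_unit)
        blob.extents.append((offset, length))
        queued.append((offset, blob.data))
        offset += length
    return blobs, queued
```

Because the allocator is called once per write rather than once per blob, the blobs of one write end up adjacent in this model.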
Ilya Dryomov [Thu, 17 Aug 2017 13:35:42 +0000 (15:35 +0200)]
qa/tasks/rbd.xfstests: take exclude list from yaml
Different filesystems (and further, different configurations of the
same filesystem) need different exclude lists. Hard coding the list in
a wrapper script is inflexible.
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: quit building xfstests on test nodes
xfstests is a pain to build on trusty, xenial and centos7 with a single
script. It is also very sensitive to dependencies, which again need to
be managed on all those distros -- different sets of supported commands
and switches, some versions have known bugs, etc.
Download a pre-built, statically linked tarball and use it instead.
The tarball was generated using xfstests-bld by Ted Ts'o, with a number
of tweaks by myself (mostly concerning the build environment).
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: drop *_MKFS_OPTIONS variables
AFAICT ./check doesn't query EXT4_MKFS_OPTIONS or BTRFS_MKFS_OPTIONS.
We don't need anything special for xfs, so remove all of them to avoid
confusion.
Since the roles are mapped inside ceph-deploy, store the mapped roles
and use the new mapped role for upgrades at a later stage.
E.g., mon.a is mapped to mon.mira002 during install; store this mapping
and, during upgrade, map it back to the appropriate name to find the
hostname with that role.
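A small Python sketch of the bookkeeping described above. `RoleMap`, `record` and `resolve` are hypothetical names used for illustration, not the task's actual helpers; only the mon.a to mon.mira002 example comes from the message.

```python
class RoleMap:
    """Remember how roles were renamed at install time."""

    def __init__(self):
        self._mapped = {}

    def record(self, original, mapped):
        # Called during install, e.g. record("mon.a", "mon.mira002").
        self._mapped[original] = mapped

    def resolve(self, original):
        # Called during upgrade; roles that were never mapped pass through.
        return self._mapped.get(original, original)


def find_host(role, role_map, hosts_by_role):
    """Look up the hostname under the role's mapped name."""
    return hosts_by_role.get(role_map.resolve(role))
```

For example, after `record("mon.a", "mon.mira002")`, a lookup for `mon.a` is answered using the entry stored under `mon.mira002`.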
Sage Weil [Wed, 6 Sep 2017 19:34:50 +0000 (15:34 -0400)]
pybind/mgr/localpool: module to automagically create localized pools
By default, this will create a pool per rack, 3x replication, with a host
failure domain. Those parameters can be customized via mgr config-key
options.
Sage Weil [Wed, 20 Sep 2017 20:42:01 +0000 (16:42 -0400)]
mon/OSDMonitor: error out if setting ruleset-* ec profile property
We changed ruleset -> crush back in dc7a2aaf7a34b1e6af0c7b79dc44a69974c1da23.
If someone tries to use the old property, error out early, instead of
silently not doing the thing they thought they told us to do.
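The early check could look roughly like the following Python sketch. The function name, the return convention and the error text are assumptions made for illustration; the monitor itself is C++ and its actual message may differ.

```python
import errno


def check_ec_profile(profile):
    """Reject legacy ruleset-* keys instead of silently ignoring them.

    Returns (0, "") on success or (-EINVAL, message) on the first
    offending key.
    """
    for key in profile:
        if key.startswith("ruleset-"):
            suggestion = "crush-" + key[len("ruleset-"):]
            return (-errno.EINVAL,
                    "property '%s' is no longer supported; try '%s' instead"
                    % (key, suggestion))
    return 0, ""
```

A profile such as `{"k": "2", "m": "1", "ruleset-failure-domain": "host"}` is rejected with a hint to use `crush-failure-domain`.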
John Spray [Sat, 23 Sep 2017 12:48:36 +0000 (13:48 +0100)]
mon: show legacy health warning in `status` output
Previously you only got the text of this if you were
either looking at "health detail" or if you had
already set the preluminous_compat setting (in which
case you presumably were already aware so the message
isn't doing much).
Patrick Donnelly [Fri, 22 Sep 2017 16:53:37 +0000 (09:53 -0700)]
Merge PR #17854 into luminous
* refs/remotes/upstream/pull/17854/head:
mds: avoid sending cap import message when inode is frozen
client: fix message order check in handle_cap_export()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
David Zafman [Thu, 7 Sep 2017 03:41:50 +0000 (20:41 -0700)]
test: Fix ceph-objectstore-tool test for standalone and latest code
vstart.sh now defaults to bluestore, so specify filestore
Set environment for run-standalone.sh and cmake build
Create td/cot_dir as test directory
Crush output format change
Change dir into test directory
Give a little time after pool creation
Check for core files as ceph-helpers.sh does
ceph: do link/rename semantic checks after srcdn is readable
For a hard link, the source inode must not be a directory. For a rename,
the types of the source and destination inodes must match. If srcdn is a
replica and we do these checks while it's not readable, it's possible
that the wrong source inode is used in these checks.
client: set client_try_dentry_invalidate to false by default
By default, ceph-fuse uses a side effect of 'dentry invalidation' to
trim the kernel dcache if it runs on a kernel < 3.18. The implementation
of the kernel function d_invalidate() changed in the 3.18 kernel, so the
method no longer works for upstream kernels >= 3.18.
The RHEL 3.10 kernel includes a backport of the patches that change the
implementation of d_invalidate(). So checking the kernel version to
decide whether the 'dentry invalidation' method works is unreliable.
Douglas Fuller [Tue, 12 Sep 2017 17:22:09 +0000 (13:22 -0400)]
qa/tasks/cephfs: Whitelist POOL_APP_NOT_ENABLED for test_misc
test_misc verifies that ceph fs new will not create a filesystem
on a pool that already contains objects. As part of the test, it
inserts a dummy object into a pool and then attempts to use it for
CephFS. This triggers POOL_APP_NOT_ENABLED. Setting the application
metadata for the pool (and having ceph fs new fail because of the
existing metadata) would then exercise a different failure case.
Jeff Layton [Fri, 25 Aug 2017 12:31:47 +0000 (08:31 -0400)]
client: add mountedness check inside client_lock
Currently we check for mountedness in the high level wrappers, but those
checks are lockless. It's possible to have a call that races with
ceph_unmount(). It could pass one of the is_mounted() checks in the
wrapper, and then block on the client_lock while the unmount is actually
running. Eventually it picks up and runs after the unmount returns, with
questionable results -- possibly even a crash in some cases.
For now, we can explain this away with a simple admonition that
applications should ensure that no calls are running when ceph_unmount
is called. In the future though, we may need to forcibly shut down the
mount when certain events occur (not returning a lease or delegation in
time, for instance).
Sprinkle in a bunch of "unmounting" checks after taking the client_lock,
and simply have the functions return errors (or sensible values in some
cases) when the Client is being downed. With that, we ensure that this
sort of race can't occur, even when the unmount is not being driven by
userland. Note too that in some places I've replaced assertions in the
code with error returns, as that's nicer behavior for libraries.
Note that this can't replace the ->is_mounted() checks in the lockless
wrappers as those are needed to determine whether the client pointer in
the ceph_mount_info is still valid. The admonition not to allow
ceph_unmount to race with other calls is therefore still necessary.
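The pattern of re-checking under the lock can be sketched in Python as follows. `ClientSketch`, `file_size` and the choice of ENOTCONN as the error are assumptions made for this illustration; only `client_lock` and the "unmounting" check are taken from the message above.

```python
import errno
import threading


class ClientSketch:
    """Toy client showing the 'check after taking the lock' pattern."""

    def __init__(self, files):
        self.client_lock = threading.Lock()
        self.unmounting = False
        self.files = dict(files)

    def file_size(self, path):
        with self.client_lock:
            # Re-check under the lock: an unmount may have started while
            # this call was waiting for client_lock.
            if self.unmounting:
                return -errno.ENOTCONN
            if path not in self.files:
                return -errno.ENOENT
            return len(self.files[path])

    def unmount(self):
        with self.client_lock:
            self.unmounting = True
            self.files.clear()
```

A call that loses the race with `unmount()` now gets an error return instead of operating on torn-down state. As noted above, this does not remove the need for the lockless wrapper checks, which guard the validity of the client pointer itself.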
jermudgeon [Mon, 28 Aug 2017 05:26:28 +0000 (21:26 -0800)]
mgr/prometheus: Fix for MDS metrics
MDS metrics come in these forms:
mds_mem_dir #Directories
mds_mem_dir+ #Directories opened
mds_mem_dir- #Directories closed
In this case, continuing the trend of replacing all illegal characters with '_' results in…
mds_mem_dir #Directories
mds_mem_dir_ #Directories opened
mds_mem_dir_ #Directories closed
which is palpably a bad idea.
Suggested replacement for '+' = '_plus' seems fine, and a perusal of all metrics indicates that only MDS metrics end in '-' or '+' at this time.
Replacing '-' with '_minus' is probably less good for the general case, if anyone has a better idea…
I suppose another alternative would be to change MDS metrics so they don't use 'illegal' characters, but this also seems cumbersome and would break more third parties.
Fixes: http://tracker.ceph.com/issues/20899
Signed-off-by: Jeremy H Austin <jhaustin@gmail.com>
(cherry picked from commit d719cd04b294e90ab9d440ba7d033826c069a2de)
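The renaming rule discussed in the MDS metrics message can be sketched in Python as below. The function name and the exact set of characters treated as legal are assumptions for illustration, not necessarily what the prometheus module does.

```python
import re


def sanitize_metric_name(name):
    """Map a counter name to a Prometheus-safe name without collisions.

    A trailing '+' or '-' gets a distinct suffix first, so that
    'dir+' and 'dir-' do not both collapse to 'dir_'.
    """
    if name.endswith('+'):
        name = name[:-1] + '_plus'
    elif name.endswith('-'):
        name = name[:-1] + '_minus'
    # Any remaining character outside [a-zA-Z0-9_:] becomes '_'.
    return re.sub(r'[^a-zA-Z0-9_:]', '_', name)
```

With this rule the three example counters map to `mds_mem_dir`, `mds_mem_dir_plus` and `mds_mem_dir_minus`.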
The UUID thing (a) relies on partition labels to work, which isn't
always true (and won't be true for ceph-volume going forward), and
(b) reportedly doesn't work anyway. The fd-based helper works
just fine (even for vstart).