Ilya Dryomov [Thu, 17 Aug 2017 13:35:42 +0000 (15:35 +0200)]
qa/tasks/rbd.xfstests: take exclude list from yaml
Different filesystems (and further, different configurations of the
same filesystem) need different exclude lists. Hard coding the list in
a wrapper script is inflexible.
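A minimal sketch of the idea, assuming the task config is a dict parsed from yaml and that the list lives under a hypothetical "exclude" key. The helper name and the test names are placeholders, not taken from the actual task code:

```python
def render_exclude_file(task_config):
    """Return exclude-file text, one test name per line."""
    tests = task_config.get("exclude", [])
    for name in tests:
        # Expect names shaped like "<group>/<number>".
        group, sep, number = name.partition("/")
        if not (group and sep and number.isdigit()):
            raise ValueError("bad test name: %r" % name)
    return "".join(name + "\n" for name in tests)


# Each filesystem (or configuration) carries its own list in yaml.
ext4_config = {"exclude": ["generic/042", "generic/392"]}  # placeholder names
xfs_config = {}  # nothing excluded
```

The wrapper script then only consumes the rendered file and no longer needs to know which filesystem it is testing.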
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: quit building xfstests on test nodes
xfstests is a pain to build on trusty, xenial and centos7 with a single
script. It is also very sensitive to dependencies, which again need to
be managed on all those distros -- different sets of supported commands
and switches, some versions have known bugs, etc.
Download a pre-built, statically linked tarball and use it instead.
The tarball was generated using xfstests-bld by Ted Ts'o, with a number
of tweaks by myself (mostly concerning the build environment).
Ilya Dryomov [Wed, 16 Aug 2017 09:47:19 +0000 (11:47 +0200)]
qa/run_xfstests.sh: drop *_MKFS_OPTIONS variables
AFAICT ./check doesn't query EXT4_MKFS_OPTIONS or BTRFS_MKFS_OPTIONS.
We don't need anything special for xfs, so remove all of them to avoid
confusion.
Since the roles are remapped inside ceph-deploy, store the mapping and
use the mapped role for upgrades at a later stage.
E.g. mon.a is mapped to mon.mira002 during install; store this mapping
and during upgrade map it back to the appropriate name to find the
hostname with that role.
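A small sketch of storing and reusing such a mapping. The class, the helper and the osd.0/mira003 entry are hypothetical and only illustrate the lookup; they are not the actual task code:

```python
class RoleMap:
    """Remember which original role was mapped to which new role."""

    def __init__(self):
        self._mapped = {}

    def record(self, original, mapped):
        self._mapped[original] = mapped

    def resolve(self, role):
        # Roles that were never remapped resolve to themselves.
        return self._mapped.get(role, role)


role_map = RoleMap()
role_map.record("mon.a", "mon.mira002")  # stored during install

# Hypothetical view of the cluster after install: mapped role -> hostname.
hosts_by_role = {"mon.mira002": "mira002", "osd.0": "mira003"}


def hostname_for(role):
    """Find the host for a role as written in the job yaml."""
    return hosts_by_role[role_map.resolve(role)]
```

During upgrade, hostname_for("mon.a") goes through the stored mapping, while roles that were left alone are looked up directly.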
Sage Weil [Wed, 6 Sep 2017 19:34:50 +0000 (15:34 -0400)]
pybind/mgr/localpool: module to automagically create localized pools
By default, this will create a pool per rack, 3x replication, with a host
failure domain. Those parameters can be customized via mgr config-key
options.
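As a rough sketch of the behaviour described above (not the module's code): given the rack names found in the CRUSH tree and the pools that already exist, decide which pools to create. The function name, the "by-rack-" prefix and the spec keys are assumptions made for illustration:

```python
def plan_local_pools(racks, existing_pools, prefix="by-rack-",
                     num_rep=3, failure_domain="host"):
    """Return specs for the per-rack pools that still need creating."""
    plans = []
    for rack in sorted(racks):
        name = prefix + rack
        if name in existing_pools:
            continue  # this rack already has its localized pool
        plans.append({
            "pool": name,
            "root": rack,                      # confine data to this rack
            "failure_domain": failure_domain,  # spread replicas across hosts
            "size": num_rep,                   # 3x replication by default
        })
    return plans


plans = plan_local_pools(["rack2", "rack1"], existing_pools={"by-rack-rack1"})
```

The keyword arguments stand in for the parameters that the commit says can be customized via config-key options.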
Sage Weil [Wed, 20 Sep 2017 20:42:01 +0000 (16:42 -0400)]
mon/OSDMonitor: error out if setting ruleset-* ec profile property
We changed ruleset -> crush back in dc7a2aaf7a34b1e6af0c7b79dc44a69974c1da23.
If someone tries to use the old property, error out early, instead of
silently not doing the thing they thought they told us to do.
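The early check can be sketched like this. The function name, the -EINVAL return code and the message wording are assumptions for illustration, not the monitor's actual code:

```python
import errno


def check_ec_profile(profile):
    """Return (0, "") if the profile is acceptable, else (-EINVAL, message)."""
    for key in sorted(profile):
        if key.startswith("ruleset-"):
            # Suggest the renamed property instead of silently ignoring it.
            new_key = "crush-" + key[len("ruleset-"):]
            msg = "property '%s' is no longer supported; try '%s' instead" % (
                key, new_key)
            return (-errno.EINVAL, msg)
    return (0, "")


ok = check_ec_profile({"k": "2", "m": "1", "crush-failure-domain": "host"})
bad = check_ec_profile({"k": "2", "m": "1", "ruleset-failure-domain": "host"})
```

Here ok is (0, "") and bad carries an error that names both the old and the new property.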
Patrick Donnelly [Fri, 22 Sep 2017 16:53:37 +0000 (09:53 -0700)]
Merge PR #17854 into luminous
* refs/remotes/upstream/pull/17854/head:
mds: avoid sending cap import message when inode is frozen
client: fix message order check in handle_cap_export()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
ceph: do link/rename semantic checks after srcdn is readable
For a hard link, the source inode must not be a directory. For a rename,
the types of the source and destination inodes must match. If srcdn is a
replica and we do these checks while it's not readable, it's possible that
the wrong source inode is used in these checks.
client: set client_try_dentry_invalidate to false by default
By default, ceph-fuse uses a side effect of 'dentry invalidation' to
trim the kernel dcache if it runs on a kernel < 3.18. The implementation
of the kernel function d_invalidate() changed in the 3.18 kernel, so the
method no longer works for upstream kernels >= 3.18.
The RHEL 3.10 kernel includes a backport of the patches that change the
implementation of d_invalidate(). So checking the kernel version to decide
whether the 'dentry invalidation' method works is unreliable.
Douglas Fuller [Tue, 12 Sep 2017 17:22:09 +0000 (13:22 -0400)]
qa/tasks/cephfs: Whitelist POOL_APP_NOT_ENABLED for test_misc
test_misc verifies that ceph fs new will not create a filesystem
on a pool that already contains objects. As part of the test, it
inserts a dummy object into a pool and then attempts to use it for
CephFS. This triggers POOL_APP_NOT_ENABLED. Setting the application
metadata for the pool (and having ceph fs new fail because of the
existing metadata) would then exercise a different failure case.
Jeff Layton [Fri, 25 Aug 2017 12:31:47 +0000 (08:31 -0400)]
client: add mountedness check inside client_lock
Currently we check for mountedness in the high level wrappers, but those
checks are lockless. It's possible to have a call that races with
ceph_unmount(). It could pass one of the is_mounted() checks in the
wrapper, and then block on the client_lock while the unmount is actually
running. Eventually it picks up and runs after the unmount returns, with
questionable results -- possibly even a crash in some cases.
For now, we can explain this away with a simple admonition that
applications should ensure that no calls are running when ceph_unmount
is called. In the future though, we may need to forcibly shut down the
mount when certain events occur (not returning a lease or delegation in
time, for instance).
Sprinkle in a bunch of "unmounting" checks after taking the client_lock,
and simply have the functions return errors (or sensible values in some
cases) when the Client is being downed. With that, we ensure that this
sort of race can't occur, even when the unmount is not being driven by
userland. Note too that in some places I've replaced assertions in the
code with error returns, as that's nicer behavior for libraries.
Note that this can't replace the ->is_mounted() checks in the lockless
wrappers as those are needed to determine whether the client pointer in
the ceph_mount_info is still valid. The admonition not to allow
ceph_unmount to race with other calls is therefore still necessary.
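The pattern can be illustrated with a toy model. This is a Python sketch, not the C++ Client code; the class layout, the stat_size() call and the choice of -ENOTCONN as the error are assumptions:

```python
import errno
import threading


class Client:
    def __init__(self):
        self.client_lock = threading.Lock()
        self.unmounting = False
        self.files = {"/a": b"data"}

    def stat_size(self, path):
        with self.client_lock:
            # Re-check under the lock: an unmount may have run while we
            # were blocked waiting for client_lock.
            if self.unmounting:
                return -errno.ENOTCONN
            data = self.files.get(path)
            if data is None:
                return -errno.ENOENT
            return len(data)

    def unmount(self):
        with self.client_lock:
            self.unmounting = True
            self.files.clear()
```

A call that loses the race now returns an error instead of operating on torn-down state. The lockless is_mounted() check in the wrappers is not modelled here.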
The UUID thing (a) relies on partition labels to work, which isn't
always true (and won't be true for ceph-volume going forward), and
(b) reportedly doesn't work anyway. The fd-based helper works
just fine (even for vstart).
Douglas Fuller [Wed, 12 Jul 2017 15:43:39 +0000 (10:43 -0500)]
qa/cephfs: support CephFS recovery pools
Add support for testing recovery of CephFS metadata into an alternate
RADOS pool, useful as a disaster recovery mechanism that avoids
modifying the metadata in-place.
Douglas Fuller [Wed, 12 Jul 2017 15:41:11 +0000 (10:41 -0500)]
qa/ceph_test_case: support CephFS recovery pools
Add support for testing recovery of CephFS metadata into an alternate
RADOS pool, useful as a disaster recovery mechanism that avoids
modifying the metadata in-place.
Yan, Zheng [Tue, 29 Aug 2017 03:35:56 +0000 (11:35 +0800)]
mds: avoid sending cap import message when inode is frozen
To export an inode to another mds, the mds needs to:
- Freeze the inode (stop issuing caps to clients)
- Flush client sessions (ensure clients have received all cap messages)
- Send cap export message
These steps guarantee that clients receive cap import/export messages
in the proper order (in the case that an inode gets exported several
times within a short time).
When the inode is frozen, the mds may have already flushed client
sessions. So the mds shouldn't send cap import messages.
Yan, Zheng [Mon, 28 Aug 2017 09:13:31 +0000 (17:13 +0800)]
client: fix message order check in handle_cap_export()
If the importer mds' cap already exists but the cap ID mismatches, the
client should have received the corresponding import message (the
imported caps got released later), because the cap ID does not change
as long as the client holds the caps.
mds: check ongoing scatter-gather process before capping log
When deactivating an mds, MDLog::trim() may start a scatter-gather
process on the mdsdir inode. Locker::scatter_writebehind() submits a
log entry. So the mds should make sure there is no scatter-gather
in progress before capping the log.
The fast dispatch refactor in 3cc48278bf0ee5c9535d04b60a661f988c50063b
eliminated the osdmap subscription in the ms_fast_dispatch path, which
meant ops could reach a PG without having the latest map. In a cluster
with few osdmap updates, where the monitor fails to send a new map to
an osd (it tries one random osd), this can result in indefinitely
blocked requests.
Fix this by adding an OSDService mechanism for scheduling a new osdmap
subscription request.
We need to prevent duplicates in the final result. For example, we
can currently take
[1,2,3] and apply [(1,2)] and get [2,2,3]
or
[1,2,3] and apply [(3,2)] and get [1,2,2]
The rest of the system is not prepared to handle duplicates in the
result set like this.
The reverted commit was intended to allow
[1,2,3] and [(1,2),(2,1)] to get [2,1,3]
to reorder primaries. First, this bidirectional swap is hard to implement
in a way that also prevents dups. For example,
[1,2,3] and [(1,4),(2,3),(3,4)] would give [4,3,4]
but if we just dropped the last step we'd have [4,3,3], which
is also invalid, etc. Simpler to just not handle bidirectional
swaps. In practice, they are not needed: if you just want to choose
a different primary then use primary_affinity, or pg_upmap
(not pg_upmap_items).
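One way to get the no-duplicates property, sketched in Python. It assumes the rule is simply "skip a (from, to) pair when 'to' is already in the result"; the real OSDMap code may differ in detail:

```python
def apply_upmap_items(raw, items):
    """Apply (from, to) pairs to a list of OSD ids without creating dups."""
    result = list(raw)
    for src, dst in items:
        if dst in result:
            continue  # mapping to an OSD already present would duplicate it
        for i, osd in enumerate(result):
            if osd == src:
                result[i] = dst
                break
    return result


apply_upmap_items([1, 2, 3], [(1, 2)])                  # [1, 2, 3]
apply_upmap_items([1, 2, 3], [(3, 2)])                  # [1, 2, 3]
apply_upmap_items([1, 2, 3], [(1, 4)])                  # [4, 2, 3]
apply_upmap_items([1, 2, 3], [(1, 2), (2, 1)])          # [1, 2, 3]
apply_upmap_items([1, 2, 3], [(1, 4), (2, 3), (3, 4)])  # [4, 2, 3]
```

Under this rule a bidirectional swap is a no-op rather than a reorder, which matches the decision not to support it.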