Sage Weil [Tue, 14 Aug 2018 13:58:31 +0000 (08:58 -0500)]
ceph_test_objectstore: queue split on parent pg sequencer
This matches the OSD.
It also ensures that we drain transactions that precede the split, which
will include any finish_write calls that might otherwise attach the
Buffer to the wrong cache shard: _txc_finish() calls finish_write and
passes the cache shard without the cache shard lock held, but may block
waiting for split_cache() which then changes the destination collection's
shard. Once it gets the lock and proceeds it would operate on the wrong
cache shard, leading to a failed assert later when the sharedblob is
trimmed.
Fixes: http://tracker.ceph.com/issues/24439
Signed-off-by: Sage Weil <sage@redhat.com>
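To illustrate the queueing idea, here is a minimal, self-contained sketch
(hypothetical types, not the real ObjectStore API): by queueing the split
behind the parent collection's sequencer, every transaction submitted
before the split, including its finish_write work, drains before
split_cache() runs.

    #include <deque>
    #include <functional>
    #include <utility>

    // Toy sequencer: a FIFO of pending operations for one collection.
    struct Sequencer {
      std::deque<std::function<void()>> ops;
      void queue(std::function<void()> op) { ops.push_back(std::move(op)); }
      void drain() {
        while (!ops.empty()) { ops.front()(); ops.pop_front(); }
      }
    };

    // The split is just another op on the parent's queue, so it cannot
    // run ahead of transactions queued on that collection before it.
    void queue_split(Sequencer& parent, std::function<void()> do_split) {
      parent.queue(std::move(do_split));
    }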
Sage Weil [Mon, 13 Aug 2018 17:10:31 +0000 (12:10 -0500)]
os/bluestore: fix split vs finish_write race
In _txc_finish(), we were looking up the right Cache for the collection
and then calling finish_write with that Cache and taking its lock. This
could race with a split_cache() such that, by the time we got the lock,
the collection was on a different cache. This would in turn lead to a
failed assertion later in _rm_buffer() when the sharedblob was trimmed.
Fixes: http://tracker.ceph.com/issues/24439
Signed-off-by: Sage Weil <sage@redhat.com>
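A self-contained sketch of the fix pattern (simplified types; BlueStore's
real code differs): re-read which cache the collection is on after taking
the lock, and retry if a concurrent split_cache() has moved it.

    #include <atomic>
    #include <mutex>

    struct Cache { std::mutex lock; };
    struct Collection {
      std::atomic<Cache*> cache;  // may be swapped by split_cache()
    };

    void finish_write(Collection* c) {
      for (;;) {
        Cache* cur = c->cache.load();        // snapshot before locking
        std::lock_guard<std::mutex> l(cur->lock);
        if (cur != c->cache.load())          // split_cache() raced us;
          continue;                          // drop the lock and retry
        // Safe: we hold the lock of the collection's *current* shard,
        // so buffer bookkeeping sees a consistent cache.
        break;
      }
    }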
Patrick Donnelly [Mon, 13 Aug 2018 01:29:34 +0000 (18:29 -0700)]
Merge PR #23444 into master
* refs/pull/23444/head:
cephfs-shell: avoid sys.argv modification
tools/cephfs-shell: added support for batch file processing and to execute commands from arguments.
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Generally the slow warnings we get are just over the threshold. These warnings
are related to deploying multiple Ceph daemons side-by-side. Let's see how we
do with two minutes.
Ignoring the warnings entirely is unsatisfactory: they serve as a useful
canary in the coal mine when you see warnings for ops exceeding some
unreasonably large amount of time.
Fixes: http://tracker.ceph.com/issues/26900
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Jos Collin [Wed, 1 Aug 2018 14:58:21 +0000 (20:28 +0530)]
cephfs: LeaseStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* Encode/decode in the new format.
* Dropped LeaseStat::encode().
* The old format is maintained for backward compatibility.
* Created Locker::encode_lease() for encoding and dropped the duplicates.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
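The same feature-gated pattern recurs for DirStat and InodeStat below. A
hedged, self-contained sketch of the shape of such an encoder (stand-in
types and field layout, not the actual Locker::encode_lease() signature):

    #include <cstdint>
    #include <vector>

    using buf = std::vector<uint8_t>;  // stand-in for ceph::bufferlist

    static void put_le32(buf& bl, uint32_t v) {
      for (int i = 0; i < 4; ++i) bl.push_back(uint8_t(v >> (8 * i)));
    }

    struct LeaseStatLike { uint32_t mask, duration_ms, seq; };

    // Encode in the new versioned format only when the client advertised
    // CEPHFS_FEATURE_REPLY_ENCODING; otherwise emit the legacy layout.
    void encode_lease(buf& bl, bool reply_encoding, const LeaseStatLike& ls) {
      if (reply_encoding) {
        bl.push_back(1);  // struct_v: future fields can be appended safely
        bl.push_back(1);  // struct_compat
      }
      put_le32(bl, ls.mask);         // field layout shared by both formats
      put_le32(bl, ls.duration_ms);  // (purely illustrative)
      put_le32(bl, ls.seq);
    }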
Jos Collin [Wed, 1 Aug 2018 04:18:48 +0000 (09:48 +0530)]
cephfs: DirStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* Encode/decode in the new format.
* Dropped DirStat::encode().
* The old format is maintained for backward compatibility.
* Created CDir::encode_dirstat() for encoding and dropped the duplicates.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
Patrick Donnelly [Sat, 11 Aug 2018 23:33:47 +0000 (16:33 -0700)]
Merge PR #23500 into master
* refs/pull/23500/head:
debian: mark python-ceph-argparse Arch = all
rpm: package cephfs-shell for fedora
tools/cephfs,deb: package cephfs-shell
cmake: install script and egg-info files of cephfs-shell
tools/cephfs: add setup.py for cephfs-shell
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 11 Aug 2018 23:10:28 +0000 (16:10 -0700)]
mds: mark beacons as high priority
The mons already mark beacon replies as high priority (via the default mon
message priority). We should expect the mons to handle our beacons at the
same priority, so the MDS doesn't wrongly get marked laggy.
Fixes: http://tracker.ceph.com/issues/26899
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
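Why priority matters can be shown with a small, self-contained model (the
numeric values merely mimic Ceph's default/high message priorities; this
is not the MDS code):

    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    struct Msg { int prio; std::string name; };
    struct ByPrio {
      bool operator()(const Msg& a, const Msg& b) const {
        return a.prio < b.prio;  // higher priority pops first
      }
    };

    int main() {
      std::priority_queue<Msg, std::vector<Msg>, ByPrio> q;
      q.push({127, "bulk op"});        // default-priority traffic
      q.push({196, "mds beacon"});     // high-priority beacon
      q.push({127, "another bulk op"});
      // The beacon pops first: it is never stuck behind bulk work, so the
      // receiver sees it promptly and the sender is not marked laggy.
      while (!q.empty()) {
        std::cout << q.top().name << "\n";
        q.pop();
      }
    }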
Yan, Zheng [Fri, 10 Aug 2018 09:34:01 +0000 (17:34 +0800)]
mds: use fast dispatch to handle MDSBeacon
Currently the MDS handles MDSBeacon messages through MDSRank::ms_dispatch(),
which locks mds_lock at the very beginning. This means that a long-running
task (such as processing finished contexts) can delay the MMDSBeacon. Using
fast dispatch for MDSBeacon avoids this issue.
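A hedged sketch of the dispatch split (hypothetical types; the real
MDSRank interface differs): beacons handled in fast dispatch never wait on
mds_lock, so slow work under the big lock cannot delay them.

    #include <mutex>

    struct MDSRankLike {
      std::mutex mds_lock;

      // Ordinary path: everything serializes behind the big lock, so a
      // long-running task holding it delays every queued message.
      void ms_dispatch(/* Message *m */) {
        std::lock_guard<std::mutex> l(mds_lock);
        // ... handle most message types under mds_lock ...
      }

      // Fast path: handle the beacon using only its own state, without
      // touching mds_lock, so it can never be queued behind slow work.
      void ms_fast_dispatch(/* MMDSBeacon *m */) {
        // ... record/forward the beacon under fine-grained state only ...
      }
    };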
Sage Weil [Fri, 10 Aug 2018 21:32:34 +0000 (16:32 -0500)]
Merge PR #22838 into master
* refs/pull/22838/head:
kv/RocksDBStore: Handle nullptr if clock cache is chosen.
kv/rocksdb_cache/BinnedLRUCache: Don't promote data to the high pri pool.
src/kv: Initial import of a custom RocksDB cache.
src/rocksdb: switch back to master branch.
Jos Collin [Fri, 13 Jul 2018 13:38:37 +0000 (19:08 +0530)]
cephfs: InodeStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* Encode/decode in the new format.
* The old format is maintained for backward compatibility.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
Bryan Stillwell [Wed, 8 Aug 2018 21:24:53 +0000 (15:24 -0600)]
doc: Fix a couple typos and improve diagram formatting
I found a couple misspelled words in the crush-map documentation, and also
tweaked the formatting of the CRUSH hierarchy diagram to center some of the
entries.
Kefu Chai [Wed, 8 Aug 2018 16:50:00 +0000 (00:50 +0800)]
tools/cephfs,deb: package cephfs-shell
change `#!/usr/bin/env python3` to `#!/usr/bin/python3` as per
https://www.debian.org/doc/packaging-manuals/python-policy/programs.html#interpreter-directive
Kefu Chai [Thu, 9 Aug 2018 06:07:10 +0000 (14:07 +0800)]
cmake: fix build WITH_SYSTEM_BOOST=ON
FindBoost.cmake from upstream CMake now finds Python libraries, as in
find_package(Boost 1.67 python36)
and it exports targets like Boost::python36. But we are still linking
against Boost::python, so to be compatible with FindBoost.cmake we need to
update BuildBoost.cmake and mgr/CMakeLists.txt accordingly; in other words,
to export Boost::python36 and to link against Boost::python36.
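A hedged CMake sketch of the compatibility pattern described above (the
target name ceph_mgr_module is illustrative, not the exact
mgr/CMakeLists.txt change):

    if(WITH_SYSTEM_BOOST)
      # upstream FindBoost.cmake exports the suffixed Boost::python36 target
      find_package(Boost 1.67 REQUIRED COMPONENTS python36)
    endif()
    # link the suffixed target in both the system and bundled builds, so
    # BuildBoost.cmake must export Boost::python36 as well
    target_link_libraries(ceph_mgr_module Boost::python36)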
Kefu Chai [Thu, 9 Aug 2018 02:17:32 +0000 (10:17 +0800)]
cmake: install script and egg-info files of cephfs-shell
egg-info offers requires.txt, which is parsed by dh_python3 to prepare
the dependencies for the cephfs-shell packaging. Also, the meta-info
in the .egg allows users to use eggs if they wish. See
https://wiki.debian.org/Python/FAQ#How_should_we_package_Python_eggs.3F
.
osd/OSD.cc: force updating heartbeat peers periodically
The cluster topology may change (e.g., when we add new racks, hosts, and
disks), and we want existing OSDs to then become aware of newly added OSDs,
guaranteeing that OSDs always try to heartbeat as widely as possible (e.g.,
across all racks, hosts, etc.).
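A minimal sketch of the periodic refresh (hypothetical names and interval;
not the actual OSD.cc change):

    #include <chrono>

    struct OSDLike {
      using clock = std::chrono::steady_clock;
      clock::time_point last_refresh = clock::now();
      std::chrono::seconds refresh_interval{60};  // assumed interval

      // Called from the regular tick: refresh the peer set even when the
      // local PGs did not change, so newly added OSDs become heartbeat
      // targets.
      void tick() {
        auto now = clock::now();
        if (now - last_refresh >= refresh_interval) {
          last_refresh = now;
          update_heartbeat_peers();
        }
      }

      void update_heartbeat_peers() {
        // recompute heartbeat peers from the current OSDMap topology
      }
    };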
We currently remove a rule without adjusting the **rules** array because
we don't recycle the rule_no, and hence there can be holes in the **rules**
array.
xie xingguo [Wed, 8 Aug 2018 09:52:29 +0000 (17:52 +0800)]
osd/OSD.cc: choose heartbeat peers by failure domain
By default, monitor requires at least two valid failure votes/reports from
different hosts to mark an OSD down, which turns out to be impossible sometimes
for a replicated-pool of size of 2 in those clusters made up of hosts
with contiguous labeled OSDs.
This patch instead does a breadth-first search based on the highest level
of failure domain cluster-wide, trying to make heartbeat peers cover all
failure domains whenever possible, which can hopefully help accelerate OSD
failure detection in the above case.
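A self-contained sketch of the selection idea (a toy tree, not the real
CRUSH code): gather the OSDs under each highest-level failure domain, then
round-robin across domains so every domain is covered before any single
domain contributes a second peer.

    #include <cstddef>
    #include <queue>
    #include <vector>

    struct Node {
      int osd_id = -1;              // >= 0 for OSD leaves only
      std::vector<Node*> children;  // e.g., root -> racks -> hosts -> OSDs
    };

    // Collect all OSD leaves under n in breadth-first order.
    static std::vector<int> leaves_bfs(Node* n) {
      std::vector<int> out;
      std::queue<Node*> q;
      q.push(n);
      while (!q.empty()) {
        Node* cur = q.front(); q.pop();
        if (cur->osd_id >= 0) out.push_back(cur->osd_id);
        for (Node* c : cur->children) q.push(c);
      }
      return out;
    }

    // Pick up to `want` peers, covering every top-level failure domain
    // before taking a second peer from any single domain.
    std::vector<int> pick_peers(Node* root, std::size_t want) {
      std::vector<std::vector<int>> domains;
      for (Node* d : root->children) domains.push_back(leaves_bfs(d));
      std::vector<int> peers;
      for (std::size_t i = 0; peers.size() < want; ++i) {
        bool took = false;
        for (auto& dom : domains) {
          if (i < dom.size() && peers.size() < want) {
            peers.push_back(dom[i]);
            took = true;
          }
        }
        if (!took) break;           // all domains exhausted
      }
      return peers;
    }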