Sage Weil [Tue, 14 Aug 2018 13:58:31 +0000 (08:58 -0500)]
ceph_test_objectstore: queue split on parent pg sequencer
This matches the OSD.
It also ensures that we drain transactions that precede the split, which
will include any finish_write calls that might otherwise attach the
Buffer to the wrong cache shard: _txc_finish() calls finish_write and
passes the cache shard without the cache shard lock held, but may block
waiting for split_cache() which then changes the destination collection's
shard. Once it gets the lock and proceeds it would operate on the wrong
cache shard, leading to a failed assert later when the sharedblob is
trimmed.
Fixes: http://tracker.ceph.com/issues/24439
Signed-off-by: Sage Weil <sage@redhat.com>
fix several bugs that may fail unittest_seastar_messenger:
1. keep holding the connection reference until closed.
2. zero h.reply before starting to use it.
3. set server-side features correctly.
4. add missing return to link write with flush.
5. remove an empty lambda operation.
Sage Weil [Mon, 13 Aug 2018 18:00:32 +0000 (13:00 -0500)]
Merge PR #23517 into master
* refs/pull/23517/head:
os/bluestore: uniform logging format for bluefs_extent_t.
os/bluestore_tool: handle fsck's returned status properly.
os/bluestore_tool: fix multiple extents handling in BlueFS::log_dump
Sage Weil [Mon, 13 Aug 2018 17:10:31 +0000 (12:10 -0500)]
os/bluestore: fix split vs finish_write race
In _txc_finish(), we were looking at the right Cache for the collection,
and then calling finish_write with that Cache and taking the lock. This
could race with a split_cache() such that after we got the lock the
collection was now on a different cache. This would in turn lead to a
failed assertion later on in _rm_buffer when the sharedblob was trimmed.
Fixes: http://tracker.ceph.com/issues/24439
Signed-off-by: Sage Weil <sage@redhat.com>
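
The message above describes the race but not the shape of the fix. One common remedy for this kind of "read a pointer, then lock what it points to" race is to re-check ownership once the lock is held and retry if it changed. The Python sketch below illustrates that pattern only; CacheShard, Collection, and the before_lock test hook are hypothetical stand-ins and are not taken from the BlueStore code.

```python
import threading


class CacheShard:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.buffers = []


class Collection:
    def __init__(self, shard):
        self.cache = shard  # may be reassigned by split_cache()


def split_cache(coll, dest):
    """Move coll to dest while holding both shard locks."""
    src = coll.cache
    if src is dest:
        return
    with src.lock, dest.lock:
        coll.cache = dest


def finish_write(coll, buf, before_lock=None):
    """Attach buf to whichever shard owns coll at the time the lock is held."""
    while True:
        shard = coll.cache            # unlocked read: may already be stale
        if before_lock:
            before_lock()             # hypothetical hook so a split can land here
            before_lock = None
        with shard.lock:
            if coll.cache is shard:   # still the owner, so it is safe to proceed
                shard.buffers.append(buf)
                return shard
        # The collection moved while we waited for the lock; retry.
```

Because split_cache() only reassigns the collection while holding the source shard's lock, the check inside finish_write cannot be invalidated between the comparison and the append.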
Patrick Donnelly [Mon, 13 Aug 2018 01:29:34 +0000 (18:29 -0700)]
Merge PR #23444 into master
* refs/pull/23444/head:
cephfs-shell: avoid sys.argv modification
tools/cephfs-shell: added support for batch file processing and to execute commands from arguments.
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 11 Aug 2018 17:40:03 +0000 (10:40 -0700)]
MDSMonitor: note beacons and cluster changes at low dbg level
These messages are essential for diagnosing the reason why the MDSMonitor is
kicking MDSs out of the MDSMap. They should also be rare enough that the extra
verbosity is not noticeable.
Fixes: http://tracker.ceph.com/issues/26898
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Generally the slow warnings we get are just over the threshold. These warnings
are related to deploying multiple Ceph daemons side-by-side. Let's see how we
do with two minutes.
Ignoring the warnings entirely is unsatisfactory as they serve as a useful
canary in the coal mine when you see warnings for ops > some unreasonably large
amount of time.
Fixes: http://tracker.ceph.com/issues/26900
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Thu, 9 Aug 2018 13:33:42 +0000 (08:33 -0500)]
osd: vary tick interval +/- 5% to avoid scrub livelocks
If you have two pgs that need to scrub on two OSDs, each the primary
for one pg and the replica for the other, you can end up in a livelock:
- both osds locally reserve a scrub slot
- both osds send a scrub schedule request
- both scrub requests are rejected
- both osds wait exactly 1 second
- repeat
Seems a bit unlikely, but I've seen test cases where it goes on for more
than an hour.
Fixes: http://tracker.ceph.com/issues/26890
Signed-off-by: Sage Weil <sage@redhat.com>
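
As a rough illustration of why a small random variation breaks the lockstep described above, the Python sketch below scales a base interval by a random factor within +/- 5% and counts how many retries it takes for two peers to drift apart. The function names, the 1 ms resolution, and the round limit are assumptions made for this example; this is not the OSD tick code.

```python
import random


def next_tick_interval(base_seconds=1.0, spread=0.05, rng=random):
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    return base_seconds * rng.uniform(1.0 - spread, 1.0 + spread)


def retries_until_desync(rng, spread, resolution=0.001, max_rounds=1000):
    """Count rounds until two peers' retry times differ by more than resolution."""
    a = b = 0.0
    for rounds in range(1, max_rounds + 1):
        a += next_tick_interval(1.0, spread, rng)
        b += next_tick_interval(1.0, spread, rng)
        if abs(a - b) > resolution:
            return rounds
    return None  # the peers never drifted apart: the lockstep case
```

With spread set to 0 both peers retry at identical times on every round, which is the livelock condition; with any nonzero spread their retry times separate after a few rounds.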
Jos Collin [Wed, 1 Aug 2018 14:58:21 +0000 (20:28 +0530)]
cephfs: LeaseStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* encode/decode in the new format.
* Dropped LeaseStat::encode().
* The old format is maintained for backward compatibility.
* Created Locker::encode_lease() for encoding and dropped the duplicates.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
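
The general idea of feature-gated versioning can be sketched as follows: when the peer advertises the feature bit, the payload is wrapped in a small envelope carrying a version, a compat version, and a length; otherwise the bare legacy layout is sent. This is a minimal Python sketch under stated assumptions: the bit value, the field names, and the field widths are illustrative, and this is not the actual Ceph wire format.

```python
import struct

REPLY_ENCODING = 1 << 0  # hypothetical bit value, for illustration only


def encode_lease(mask, duration_ms, seq, features):
    """Encode three lease fields, adding a version envelope if the peer supports it."""
    payload = struct.pack("<HII", mask, duration_ms, seq)
    if features & REPLY_ENCODING:
        # New format: version, oldest compatible version, payload length, payload.
        return struct.pack("<BBI", 1, 1, len(payload)) + payload
    return payload  # old format, kept for peers without the feature bit


def decode_lease(data, features):
    """Inverse of encode_lease; ignores bytes beyond the declared length."""
    if features & REPLY_ENCODING:
        version, compat, length = struct.unpack_from("<BBI", data, 0)
        if compat > 1:
            raise ValueError("encoding too new for this decoder")
        body = data[6:6 + length]
    else:
        body = data
    return struct.unpack_from("<HII", body, 0)
```

The length field is what lets a later version append fields without breaking older decoders, since they can skip whatever they do not understand.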
Jos Collin [Wed, 1 Aug 2018 04:18:48 +0000 (09:48 +0530)]
cephfs: DirStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* encode/decode in the new format.
* Dropped DirStat::encode().
* The old format is maintained for backward compatibility.
* Created CDir::encode_dirstat() for encoding and dropped the duplicates.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
Patrick Donnelly [Sat, 11 Aug 2018 23:33:47 +0000 (16:33 -0700)]
Merge PR #23500 into master
* refs/pull/23500/head:
debian: mark python-ceph-argparse Arch = all
rpm: package cephfs-shell for fedora
tools/cephfs,deb: package cephfs-shell
cmake: install script and egg-info files of cephfs-shell
tools/cephfs: add setup.py for cephfs-shell
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 11 Aug 2018 23:10:28 +0000 (16:10 -0700)]
mds: mark beacons as high priority
The mons already mark beacon replies as high priority (via default mon message
priority). We should expect that the mons handle our beacons at the same
priority so the MDS doesn't wrongly get marked laggy.
Fixes: http://tracker.ceph.com/issues/26899
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
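
To show what a higher message priority buys in general, here is a small Python sketch of a priority-ordered inbox: a high-priority message is dequeued ahead of default-priority messages that arrived earlier. The class name and the numeric priority values are illustrative assumptions; this is not the monitor's actual queueing code.

```python
import heapq
import itertools

PRIO_DEFAULT = 127  # illustrative values, not taken from the commit
PRIO_HIGH = 196


class PriorityInbox:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def push(self, msg, priority=PRIO_DEFAULT):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), msg))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

A beacon pushed with PRIO_HIGH after a backlog of default-priority messages is still the first one popped, which is the behaviour the commit relies on to avoid a spurious laggy mark.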
Yan, Zheng [Fri, 10 Aug 2018 09:34:01 +0000 (17:34 +0800)]
mds: use fast dispatch to handle MDSBeacon
Currently the MDS handles MDSBeacon messages through MDSRank::ms_dispatch().
MDSRank::ms_dispatch() locks mds_lock at the very beginning. This means
that a long-running task (such as processing finished contexts) can delay
MMDSBeacon. Using fast dispatch for MDSBeacon avoids this issue.
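
The difference between the two paths can be sketched in Python as below: messages that qualify for fast dispatch are handled directly in the delivering thread, while everything else is queued and processed later under the big lock. The Dispatcher class, its method names, and the message dictionaries are hypothetical; big_lock merely stands in for mds_lock, and this is not the Ceph messenger API.

```python
import queue
import threading


class Dispatcher:
    def __init__(self):
        self.big_lock = threading.Lock()  # stands in for mds_lock
        self.slow_queue = queue.Queue()
        self.beacons_seen = []

    def can_fast_dispatch(self, msg):
        return msg["type"] == "beacon"

    def deliver(self, msg):
        """Called by the messenger thread for every incoming message."""
        if self.can_fast_dispatch(msg):
            # Handled immediately, without big_lock. Real code would still
            # need its own fine-grained lock around shared beacon state.
            self.beacons_seen.append(msg["seq"])
        else:
            self.slow_queue.put(msg)  # handled later under big_lock

    def drain_slow(self):
        """Process queued messages while holding the big lock."""
        with self.big_lock:
            handled = []
            while not self.slow_queue.empty():
                handled.append(self.slow_queue.get())
            return handled
```

Even while another thread holds big_lock for a long-running task, deliver() still records the beacon right away; only the non-beacon message has to wait for drain_slow().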
Sage Weil [Fri, 10 Aug 2018 21:32:34 +0000 (16:32 -0500)]
Merge PR #22838 into master
* refs/pull/22838/head:
kv/RocksDBStore: Handle nullptr if clock cache is chosen.
kv/rocksdb_cache/BinnedLRUCache: Don't promote data to the high pri pool.
src/kv: Initial import of a custom RocksDB cache.
src/rocksdb: switch back to master branch.
Jos Collin [Fri, 13 Jul 2018 13:38:37 +0000 (19:08 +0530)]
cephfs: InodeStat versioning
* Use the new feature bit CEPHFS_FEATURE_REPLY_ENCODING for encoding.
* encode/decode in the new format.
* The old format is maintained for backward compatibility.
Fixes: http://tracker.ceph.com/issues/24444
Signed-off-by: Jos Collin <jcollin@redhat.com>
Igor Fedotov [Thu, 28 Jun 2018 14:05:59 +0000 (17:05 +0300)]
os/bluestore_tool: fix multiple extents handling in BlueFS::log_dump
Without the fix, the op crashed when trying to read from the second
extent of the log file, which was absent from the superblock but actually
existed. Regular replay retrieves it properly, which wasn't the case
for log_dump due to its 'noop' mode of operation.