osd/ECBackend: preserve requests for other objects when sending extra reads
When multiple objects are in flight for the same ReadOp, swap() on the
map<hobject_t, read_request_t> would remove requests for all objects.
We just want to replace the requests for the single object we're
dealing with in send_all_remaining_reads().
This prevents a crash when trying to look up rop.to_read[hoid] after
another object in the same ReadOp gets an EIO and tries to send more requests.
Test this by using osd-recovery-max-single-start to bundle multiple
reads into one ReadOp. Save and restore CEPH_ARGS so custom settings
are reset for each test.
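To make the intended behavior concrete, here is a minimal, self-contained
sketch; std::string keys and a small placeholder struct stand in for
hobject_t and read_request_t, and this is not the actual ECBackend code:

    #include <map>
    #include <string>
    #include <vector>

    struct read_request_t {
      std::vector<int> shards;  // shards we still need to read
    };

    int main() {
      // Analogous to rop.to_read: one entry per object in the ReadOp.
      std::map<std::string, read_request_t> to_read = {
        {"object_a", {{0, 1, 2}}},
        {"object_b", {{3, 4, 5}}},
      };

      // Follow-up requests computed for the one object that hit EIO.
      std::map<std::string, read_request_t> extra = {
        {"object_a", {{3, 4}}},
      };

      // Buggy: swapping the whole map also throws away object_b's request,
      // so a later lookup for object_b finds nothing.
      //   to_read.swap(extra);

      // Intended: replace only the entry for the object being re-read.
      for (const auto &p : extra)
        to_read[p.first] = p.second;
    }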
Fixes: http://tracker.ceph.com/issues/23195 (the 2nd crash there)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
osd/ECBackend: recover from EIO based on the minimum data necessary
Discount shards that already returned EIO, and use minimum_to_decode()
to request just what is necessary to recover or read the originally
requested extents of the object.
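A self-contained sketch of the idea, using plain std::set<int> shard ids;
this is not the ErasureCodeInterface API, whose signatures differ:

    #include <initializer_list>
    #include <iostream>
    #include <set>

    int main() {
      // Shards we could normally read from.
      std::set<int> available = {0, 1, 2, 3, 4};
      // Shards that already returned EIO are discounted up front.
      for (int bad : {2, 4})
        available.erase(bad);

      // ECBackend then hands the surviving shards, plus the extents the
      // client originally asked for, to the plugin's minimum_to_decode()
      // so only the data actually needed for the recovery or read is
      // requested; here we just print what survived the filter.
      for (int shard : available)
        std::cout << "candidate shard " << shard << "\n";
    }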
osd/ECBackend: only check required shards when finishing recovery reads
1235810c2ad08ccb7ef5946686eb2b85798f5bca allowed recovery to use
multiple passes of reads to handle EIO, but the end condition for
deciding whether we finished reading requires that the full object be
decodable (get_want_to_read_shards returns the shards needed for that).
Normally this is just a loss of efficiency, since when there is only
one object the subsequent read works and grabs all the data necessary.
The crash comes from having multiple objects in the same ReadOp; in
that case the sequence of events is:
- start recovery of two objects (osd_recovery_max_single_start > 1)
- read object a shard 3
- read object b shard 3
- fail minimum_to_decode because shard 3 can't reconstruct all of object a
- re-read all of object a, marking more reads in progress
- fail minimum_to_decode because shard 3 can't reconstruct all of object b
- skip re-reading object b because there are now reads in progress
- finish reading k shards of object a
- still fail minimum_to_decode for object b, so no extra data was read
- send_all_remaining_reads tries to look up object b in the ReadOp
- crash dereferencing to_read[object b], since this was cleared after handling the original object b read reply
This patch fixes the immediate inefficiency and the crash by checking
only the missing shards that were actually requested, rather than the
entire object, when finishing recovery reads.
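A minimal illustration of the corrected end condition (a hypothetical
helper, not the ECBackend code): the check only asks whether the shards
this pass requested have arrived, instead of whether the whole object is
decodable.

    #include <algorithm>
    #include <iostream>
    #include <set>

    // Hypothetical helper: have all of the shards requested by this
    // pass of recovery reads been received?
    static bool pass_complete(const std::set<int> &requested,
                              const std::set<int> &received) {
      return std::includes(received.begin(), received.end(),
                           requested.begin(), requested.end());
    }

    int main() {
      std::set<int> requested = {3};     // only shard 3 was asked for
      std::set<int> received  = {3};
      std::cout << std::boolalpha
                << pass_complete(requested, received) << "\n";  // true
    }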
* refs/pull/16779/head:
mds: cleanup MDCache::open_snaprealms()
mds: make sure snaptable version > 0
mds: don't consider CEPH_INO_LOST_AND_FOUND as base inode
mds: replace MAX() with std::max()
tools/cephfs: make cephfs-data-scan create snaprealm for base inodes
qa/cephfs: don't run TestSnapshots.test_kill_mdstable on kclient
qa/cephfs: adjust check of 'cephfs-table-tool all show snap' output
mds: don't warn unconnected snaprealms in cluster log
mds: update CInode/CDentry's first according to global snapshot seq
qa/cephfs: add tests for snapclient cache
qa/cephfs: add tests for snaptable transaction
mds: add asok command that dumps cached snap infos
qa/cephfs: add tests for multimds snapshot
client: don't mark snap directory complete when its dirstat is empty
qa/workunits/snaps: add snaprealm split test
mds: make sure mds has uptodate mdsmap before checking 'allows_snaps'
client: fix incorrect snaprealm when adding caps
qa/workunits/snaps: add hardlink snapshot test
mds: add incompat feature and bump protocol for snapshot changes
mds: detach inode with single hardlink from global snaprealm
mds: record hardlink snaps in inode's snaprealm
mds: attach inode with multiple hardlinks to dummy global snaprealm
mds: cleanup rename code
mds: ensure xlocker has uptodate lock state
mds: simplify SnapRealm::build_snap_{set,trace}
mds: record global last_created/last_destroyed in snaptable
mds: pop projected snaprealm before inode's parent changes
mds: keep isnap lock in sync state
mds: handle mksnap vs resolve_snapname race
mds: cleanup snaprealm past parents open check
mds: rollback snaprealms when rolling back slave request
mds: send updated snaprealms along with slave requests
mds: explicit notification for snap update
mds: send snap related messages centrally during mds recovery
mds: synchronize snaptable caches when mds recovers
mds: introduce MDCache::maybe_finish_slave_resolve()
mds: notify all mds about prepared snaptable update
mds: record snaps in old snaprealm when moving inode into new snaprealm
mds: cache snaptable in snapclient
mds: recover snaptable client when mds enters resolve state
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
--use-wheel was deprecated in favor of --only-binary in pip v7.0.0, and
--use-wheel was removed in a recent release of pip. But some packages
are source packages, so we cannot simply replace --use-wheel with
--only-binary. A simpler approach is to drop the --use-wheel option:
pip respects --find-links and will find the required packages in the
wheelhouse.
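For illustration only (the wheelhouse directory and requirements file are
placeholders), the change amounts to something like:

    # old: fails on pip releases that removed --use-wheel
    pip install --use-wheel --find-links=wheelhouse -r requirements.txt
    # new: --find-links alone still resolves wheels from the wheelhouse
    pip install --find-links=wheelhouse -r requirements.txt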
Patrick Donnelly [Sat, 31 Mar 2018 05:25:10 +0000 (22:25 -0700)]
Merge PR #20132 into master
* refs/pull/20132/head:
qa/cephfs: update TestDamage for open file table
mds: allow storing open file table in multiple omaps
mds: differentiate Anchor types to clarify purpose
mds: add perf counter for 'open ino' operation
mds: protect open file table against partial omap update
mds: add dirfrags whose child inodes have caps to open file table
mds: don't try prefetching destroyed inodes
mds: don't try opening inodes that haven't been created
mds: don't re-requeue open files to head of log
mds: use open file table to speed up mds recovery
mds: introduce open file table
mds: track how many clients/mds want caps for each inode
mds: cleanup MDCache::opening_inodes access
mds: cleanup CInode/CDir states definition
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Li Wang [Thu, 7 Dec 2017 14:03:45 +0000 (22:03 +0800)]
rbd-nbd: fix EBUSY when doing map
When doing rbd-nbd map, if the Ceph service is not available,
the code will wait on rados.connect() until the process is killed.
In that case the close_nbd logic is skipped and the NBD_CLEAR_SOCK ioctl
is never called. On the CentOS 7 kernel this leaves nbd->file not cleared,
which causes subsequent map requests to return EBUSY. This patch fixes it
by connecting to Ceph first, prior to calling the NBD_SET_SOCK ioctl.
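A simplified sketch of the resulting ordering (not the actual rbd-nbd
code; the client name and file descriptors are placeholders, and error
handling is trimmed):

    #include <sys/ioctl.h>
    #include <linux/nbd.h>
    #include <rados/librados.hpp>

    // Connect to the cluster before handing the socket to the kernel, so
    // that killing a process stuck in connect() cannot leave nbd->file
    // set with no NBD_CLEAR_SOCK to undo it.
    int map_sketch(int nbd_fd, int sock_fd) {
      librados::Rados rados;
      if (rados.init("admin") < 0)        // placeholder client name
        return -1;
      rados.conf_read_file(nullptr);
      if (rados.connect() < 0)            // blocks while the cluster is down
        return -1;

      // Only after the connection succeeds do we touch the nbd device.
      if (ioctl(nbd_fd, NBD_SET_SOCK, sock_fd) < 0)
        return -1;
      return 0;
    }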
Fixes: http://tracker.ceph.com/issues/23528
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
Mykola Golub [Thu, 29 Mar 2018 11:10:58 +0000 (14:10 +0300)]
qa/suites/rbd: set qemu task time_wait param
so that workloads qemu_dynamic_features.sh and qemu_rebuild_object_map.sh,
which check whether qemu has finished with a periodicity of 60 seconds,
have enough time to detect this before the rbd image is removed.
Mykola Golub [Thu, 29 Mar 2018 11:06:13 +0000 (14:06 +0300)]
qa/tasks/qemu: add a parameter to wait for workloads to detect qemu has finished
When a workload needs to detect that qemu has finished by running a
check with a periodicity of N seconds, it needs to set time_wait to
2 * N in order to avoid races at finish.
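As an illustration, a suite fragment might set the parameter like this;
the YAML layout and the workunit path are assumptions, while the
time_wait key and the 2 * 60 = 120 value come from the change itself:

    tasks:
    - qemu:
        client.0:
          # assumed path; the workload polls every 60 s, so wait 2 * 60
          test: qa/workunits/rbd/qemu_dynamic_features.sh
          time_wait: 120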
Kefu Chai [Thu, 29 Mar 2018 11:33:44 +0000 (19:33 +0800)]
msg/async/dpdk: remove xsky copyright and LGPL copying
It is not legitimate to relicense source code without the original
author's permission. Some of the source files are copied from the
seastar project with minor changes, so xsky cannot claim copyright over them.