Samuel Just [Thu, 27 Aug 2020 07:14:30 +0000 (00:14 -0700)]
crimson/os/seastore: add space accounting to segment_cleaner and wire in
Adds support for space accounting to SegmentCleaner and wires into
Journal, Cache, and tests.
SegmentCleaner has two tracking implementations, SpaceTrackerSimple
and SpaceTrackerDetailed. SpaceTrackerSimple simply keeps a count
of live bytes and is intended to be the normal implementation.
SpaceTrackerDetailed maintains a bitmap and is useful mainly
for debugging unit tests. It may be removed in the future.
This way, we can do a bulk scan of the store without building up
an unbounded amount of state in Transaction::read_set. Note that
such transactions will not be snapshot isolated.
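As a rough illustration of the two tracking strategies named in the SegmentCleaner commit above, the sketch below contrasts a per-segment live-byte counter with a per-block bitmap. The class names `SimpleTracker` and `DetailedTracker`, their methods, and the block-aligned offsets are assumptions made for illustration; they are not taken from the seastore source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one live-byte counter per segment (cheap, no detail).
class SimpleTracker {
  std::vector<int64_t> live_bytes_;
public:
  explicit SimpleTracker(std::size_t segments) : live_bytes_(segments, 0) {}
  void allocate(std::size_t seg, int64_t len) { live_bytes_[seg] += len; }
  void release(std::size_t seg, int64_t len) { live_bytes_[seg] -= len; }
  int64_t usage(std::size_t seg) const { return live_bytes_[seg]; }
};

// Hypothetical sketch: one bit per block, so a double allocation or a
// double release can be detected. Offsets and lengths are assumed to be
// multiples of the block size.
class DetailedTracker {
  std::size_t block_size_;
  std::vector<std::vector<bool>> bitmaps_;

  // Sets every block in [offset, offset + len) to `live`; returns false
  // if any block was already in that state.
  bool mark(std::size_t seg, std::size_t offset, std::size_t len, bool live) {
    assert(offset % block_size_ == 0 && len % block_size_ == 0);
    bool consistent = true;
    for (std::size_t b = offset / block_size_;
         b < (offset + len) / block_size_; ++b) {
      if (bitmaps_[seg][b] == live) consistent = false;
      bitmaps_[seg][b] = live;
    }
    return consistent;
  }

public:
  DetailedTracker(std::size_t segments, std::size_t segment_size,
                  std::size_t block_size)
    : block_size_(block_size),
      bitmaps_(segments, std::vector<bool>(segment_size / block_size, false)) {}

  bool allocate(std::size_t seg, std::size_t offset, std::size_t len) {
    return mark(seg, offset, len, true);
  }
  bool release(std::size_t seg, std::size_t offset, std::size_t len) {
    return mark(seg, offset, len, false);
  }
  // Live bytes in the segment, derived from the bitmap.
  int64_t usage(std::size_t seg) const {
    int64_t live_blocks = 0;
    for (bool bit : bitmaps_[seg]) live_blocks += bit ? 1 : 0;
    return live_blocks * static_cast<int64_t>(block_size_);
  }
};
```

The counter answers "how full is this segment" in constant space, while the bitmap costs more memory but can flag inconsistent accounting, which fits its stated role as a debugging aid.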
Samuel Just [Wed, 26 Aug 2020 21:49:34 +0000 (14:49 -0700)]
crimson/.../lba_manager/btree: hold a reference to parent until added to cache
Currently, we need to rely on the Transaction::read_set to ensure cache
residence of a leaf node until TransactionManager adds the logical
extent to the cache. The next patch, however, will introduce a lazy
flag for Transactions to enable snapshot-inconsistent scans
without populating the read_set, so we'll want this to work without
it.
Jason Dillaman [Fri, 16 Oct 2020 15:25:39 +0000 (11:25 -0400)]
journal: possible race condition between flush and append callback
When notifying the journal recorder of an overflow or if the object
close request has completed due to no more in-flight IO, it was
possible for a race between a flush request and the processing of
an append completion to attempt to kick off duplicate notifications.
Since the overflowed and closed callbacks are properly protected from
duplicates, use a counter instead of a boolean to track possible
in-flight handler callbacks.
Fixes: https://tracker.ceph.com/issues/47880
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
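A minimal sketch of why a counter tracks overlapping handler callbacks where a boolean cannot. The type names `BoolTracker` and `CountTracker` are hypothetical and do not reflect the journal recorder's actual members; locking is omitted for brevity.

```cpp
#include <cassert>

// Hypothetical boolean tracker: the first handler to finish clears the
// flag even if a second handler is still running.
struct BoolTracker {
  bool in_flight = false;
  void start() { in_flight = true; }
  void finish() { in_flight = false; }
  bool busy() const { return in_flight; }
};

// Hypothetical counter tracker: each start is matched by one finish, so
// the tracker stays busy until the last handler completes.
class CountTracker {
  unsigned in_flight_ = 0;
public:
  void start() { ++in_flight_; }
  void finish() {
    assert(in_flight_ > 0);  // a finish without a start is a logic error
    --in_flight_;
  }
  bool busy() const { return in_flight_ > 0; }
};
```

With two handlers started and one finished, the boolean version reports idle while the counter version still reports busy.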
Kefu Chai [Fri, 16 Oct 2020 14:07:50 +0000 (22:07 +0800)]
crimson/common: schedule action only if the future is not available
otherwise we could call do_until() recursively if we have other tasks
which need to preempt the reactor and the current future's state is actually
always available.
Kefu Chai [Fri, 16 Oct 2020 06:11:52 +0000 (14:11 +0800)]
crimson/common: do not take from a future twice
Before this change, our specialization of seastar::do_until()
accessed `f` after calling `f.get()`, which is not correct: `f.get()`
moves `f._state` away and detaches the associated promise, if any,
so we cannot call `f._then()` after calling `f.get()`.
`f._then()` schedules `f` by detaching the future from the promise and
attaching the scheduled task to the promise, but `future_base::detach_promise()`
does not check `_promise` before accessing it, hence the segfault.
After this change, the order of the checks is rearranged so that
`f.get()` is called at the end. We also use `f.get0()` to be more
explicit, as we are accessing the only element of the returned
value.
Adam C. Emerson [Thu, 15 Oct 2020 16:03:13 +0000 (12:03 -0400)]
Merge pull request #37660 from adamemerson/wip-datalog-fix
cls/fifo: Switch use CLS_ERR for errors
rgw/fifo: Fix a few missed return value assignments
rgw/fifo: Add some error logging
rgw/fifo: Catch two instances journaling a new part
rgw/fifo: Use unique_ptr and explicit release for callbacks
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Matthew Oliver [Mon, 10 Aug 2020 04:46:21 +0000 (04:46 +0000)]
pick_address: Warn and continue when you find at least 1 IPv4 or IPv6 address
Currently, if you specify a single public or cluster network but have both
`ms bind ipv4` and `ms bind ipv6` set, daemons crash when they can't find
both IPs from the same network:
unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
And rightly so: of course it can't find an IPv4 address in an IPv6
network.
This patch adds a new helper method, networks_address_family_coverage,
that takes the list of networks and returns a bitmap of the address families
supported.
We then check whether enough networks are defined; if not,
we warn and continue.
Also update the network-config-ref to mention having to define
addresses of both address families for cluster and/or public networks,
and add a warning that `ms bind ipv4` is enabled by default, which
is easy to miss, thereby enabling dual stack when you may
expect single-stack IPv6 only.
There is also a drive-by fix for a `note` that wasn't being displayed due
to missing RST syntax.
Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/46845
Fixes: https://tracker.ceph.com/issues/39711
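The coverage check described in the pick_address commit above can be sketched roughly as follows. This is an illustration only: the function `address_family_coverage`, the helper `covers`, and the flag values are hypothetical, and the address family is inferred naively from the presence of `:` in a comma-separated network string rather than by parsing the networks as the real helper presumably does.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical flag values, chosen only for this sketch.
constexpr unsigned FAMILY_IPV4 = 0x1;
constexpr unsigned FAMILY_IPV6 = 0x2;

// Returns a bitmap of the address families present in a comma-separated
// list of networks such as "10.0.0.0/24, 2001:db8:11d::/120".
unsigned address_family_coverage(const std::string& networks) {
  unsigned families = 0;
  std::istringstream in(networks);
  std::string net;
  while (std::getline(in, net, ',')) {
    if (net.find_first_not_of(" \t") == std::string::npos) {
      continue;  // skip empty entries
    }
    // Naive assumption: only IPv6 networks contain a colon.
    families |= (net.find(':') != std::string::npos) ? FAMILY_IPV6
                                                     : FAMILY_IPV4;
  }
  return families;
}

// True when every requested family is covered by at least one network.
bool covers(unsigned requested, unsigned available) {
  return (requested & available) == requested;
}
```

A caller that has both bind options enabled would request `FAMILY_IPV4 | FAMILY_IPV6` and, when `covers` returns false, log a warning and proceed with the families that are available instead of aborting.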
Neha Ojha [Tue, 13 Oct 2020 15:52:20 +0000 (15:52 +0000)]
qa/suites/crimson-rados: add .qa helper
Fixes:
OSError: /home/nojha/src/github.com_ceph_ceph_master/qa/suites/crimson-rados/basic/centos_latest.yaml
does not exist (abs /home/nojha/src/github.com_ceph_ceph_master/qa/suites/crimson-rados/basic/centos_latest.yaml)
Yan, Zheng [Fri, 7 Aug 2020 15:58:19 +0000 (23:58 +0800)]
mds: distribute dirfrags for ephemeral distributed directory
Distribute dirfrags instead of individual directory inodes inside the
ephemeral distributed directory. Distributing dirfrags limits the number of
subtrees created by the ephemeral distributed pin.
This patch also unifies the code that handles export pins and ephemeral pins.
Jason Dillaman [Mon, 5 Oct 2020 18:04:14 +0000 (14:04 -0400)]
librbd: support preprocessing source object data prior to deep-copy
Let object dispatch layers potentially mutate the data read from the
source image prior to issuing the actual deep-copy operations against
the destination image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The write-ops now only store write vs zero ops, and selecting the type of
zero operation is delayed until the actual op is sent. This will
make the state machine compatible with the copyup process hook.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 24 Sep 2020 19:15:23 +0000 (15:15 -0400)]
librbd: support preprocessing parent data prior to copyup
Let object dispatch layers potentially mutate the copyup data read
from the parent prior to issuing the actual copyup operation. This
can allow for a layer like the crypto layer to re-encrypt the parent
image data using the child image's encryption keys, for example.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 23 Sep 2020 19:57:20 +0000 (15:57 -0400)]
librbd: new hook for pre-processing copyup data
This will permit the crypto layer to properly encrypt and potentially
align the sparse copyup data prior to it being written. It passes
potentially multiple sets of data in one pass to permit the deep-copy
state machine to utilize the same API and allow the crypto layer to
potentially handle layered alignment issues.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 30 Sep 2020 01:24:23 +0000 (21:24 -0400)]
librbd: rename SnapshotExtent to SparseExtent
The processing of copyup needs to be able to denote data extents that
are potentially zeroed or included in the associated bufferlist. By
renaming the type, it can be re-used for this second purpose.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 29 Sep 2020 00:35:38 +0000 (20:35 -0400)]
librbd: copyup state machine should always issue a sparse-read
When reading from the parent, always keep the data in a sparse
extent-map format. The forthcoming copyup preprocessing hook will
want to pass a set of sparse image-extent data.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 29 Sep 2020 00:04:48 +0000 (20:04 -0400)]
librbd: switch remaining uses of ExtentMap to Extents
The neorados API already requires the vector-based approach vs
the map-based approach. Now the remaining sparse-read functionality
has been switched to use the consistent approach.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
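To make the map-based versus vector-based distinction in the ExtentMap commit above concrete, here is a small sketch. The aliases below are assumptions modeled on the names in the commit message (offset-to-length map, and a vector of offset/length pairs); the actual librbd definitions may differ, and `to_extents` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Assumed shapes of the two representations.
using ExtentMap = std::map<uint64_t, uint64_t>;              // offset -> length
using Extents = std::vector<std::pair<uint64_t, uint64_t>>;  // (offset, length)

// Converts the map form to the vector form. Because std::map iterates in
// key order, the resulting vector is sorted by offset.
Extents to_extents(const ExtentMap& extent_map) {
  return Extents(extent_map.begin(), extent_map.end());
}
```

Using one representation throughout avoids conversions like this at API boundaries.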
Jason Dillaman [Fri, 25 Sep 2020 14:40:32 +0000 (10:40 -0400)]
librbd: deep-copy should update object-map before writing to object
For the original use-case of RBD mirroring it was (maybe) more
acceptable to write to the object before updating the object map
because an interrupted sync will be retried. However, when using
the deep-copy object copy state machine as part of copyup, the
object-map is more likely to become out-of-sync with reality if
it's updated after the object is written.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>