Jason Dillaman [Fri, 16 Oct 2020 15:25:39 +0000 (11:25 -0400)]
journal: possible race condition between flush and append callback
When notifying the journal recorder of an overflow or if the object
close request has completed due to no more in-flight IO, it was
possible for a race between a flush request and the processing of
an append completion to attempt to kick off duplicate notifications.
Since the overflowed and closed callbacks are properly protected from
duplicates, use a counter instead of a boolean to track possible
in-flight handler callbacks.
Fixes: https://tracker.ceph.com/issues/47880
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
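A minimal sketch of the counter-versus-boolean idea from the journal commit above. The class and method names (`HandlerTracker`, `begin`, `finish`, `busy`) are hypothetical and are not taken from the Ceph source; the sketch only illustrates why a count survives two overlapping notifications where a single flag would be cleared by whichever callback finishes first.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration only; not the actual journal recorder code.
class HandlerTracker {
public:
  // Call before dispatching a handler callback (e.g. overflow or closed).
  void begin() { ++m_in_flight; }

  // Call when a handler callback returns.
  // Returns true only when no other callback is still in flight.
  bool finish() {
    unsigned previous = m_in_flight.fetch_sub(1);
    assert(previous > 0);  // finish() without a matching begin() is a bug
    return previous == 1;
  }

  bool busy() const { return m_in_flight.load() > 0; }

private:
  // A bool here would be reset by the first finish() even though a
  // second callback, started by the racing path, is still running.
  std::atomic<unsigned> m_in_flight{0};
};
```

With two overlapping `begin()` calls, the first `finish()` returns false and `busy()` stays true until the second `finish()`.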
Kefu Chai [Fri, 16 Oct 2020 14:07:50 +0000 (22:07 +0800)]
crimson/common: schedule action only if the future is not available
otherwise we could call do_until() recursively if we have other tasks
which need to preempt the reactor and the current future's state is
actually always available.
Kefu Chai [Fri, 16 Oct 2020 06:11:52 +0000 (14:11 +0800)]
crimson/common: do not take from a future twice
before this change, in our specialization of seastar::do_until(),
we accessed `f` after calling `f.get()`. this is not correct, as `f.get()`
actually moves `f._state` away and detaches the associated promise, if any,
so we cannot call `f._then()` anymore after calling `f.get()`.
`f._then()` schedules `f` by detaching the future from its promise and
attaching the scheduled task to the promise, but `future_base::detach_promise()`
does not check `_promise` before accessing it, hence the segfault.
after this change, the order of the checks is rearranged so that
`f.get()` is called last. `f.get0()` is also used to be more
explicit, as we are accessing the only element of the returned
value.
Adam C. Emerson [Thu, 15 Oct 2020 16:03:13 +0000 (12:03 -0400)]
Merge pull request #37660 from adamemerson/wip-datalog-fix
cls/fifo: Switch to using CLS_ERR for errors
rgw/fifo: Fix a few missed return value assignments
rgw/fifo: Add some error logging
rgw/fifo: Catch two instances journaling a new part
rgw/fifo: Use unique_ptr and explicit release for callbacks
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Yan, Zheng [Fri, 7 Aug 2020 15:58:19 +0000 (23:58 +0800)]
mds: distribute dirfrags for ephemeral distributed directory
Distribute dirfrags instead of distributing individual dir inodes
inside the ephemeral distributed dir. Distributing dirfrags can limit
the number of subtrees created by the ephemeral dist pin.
This patch also unifies the code that handles export pins and ephemeral pins.
Jason Dillaman [Mon, 5 Oct 2020 18:04:14 +0000 (14:04 -0400)]
librbd: support preprocessing source object data prior to deep-copy
Let object dispatch layers potentially mutate the data read from the
source image prior to issuing the actual deep-copy operations against
the destination image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The write-ops now only store write vs zero ops, and the choice of the
type of zero operation is deferred until the actual op is sent. This
will make the state machine compatible with the copyup process hook.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 24 Sep 2020 19:15:23 +0000 (15:15 -0400)]
librbd: support preprocessing parent data prior to copyup
Let object dispatch layers potentially mutate the copyup data read
from the parent prior to issuing the actual copyup operation. This
can allow for a layer like the crypto layer to re-encrypt the parent
image data using the child image's encryption keys, for example.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 23 Sep 2020 19:57:20 +0000 (15:57 -0400)]
librbd: new hook for pre-processing copyup data
This will permit the crypto layer to properly encrypt and potentially
align the sparse copyup data prior to it being written. It passes
potentially multiple sets of data in one pass to permit the deep-copy
state machine to utilize the same API and allow the crypto layer to
potentially handle layered alignment issues.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 30 Sep 2020 01:24:23 +0000 (21:24 -0400)]
librbd: rename SnapshotExtent to SparseExtent
The processing of copyup needs to be able to denote data extents that
are potentially zeroed or included in the associated bufferlist. By
renaming the type, it can be re-used for this second purpose.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 29 Sep 2020 00:35:38 +0000 (20:35 -0400)]
librbd: copyup state machine should always issue a sparse-read
When reading from the parent, always keep the data in a sparse
extent-map format. The forthcoming copyup preprocessing hook will
want to pass a set of sparse image-extent data.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 29 Sep 2020 00:04:48 +0000 (20:04 -0400)]
librbd: switch remaining uses of ExtentMap to Extents
The neorados API already requires the vector-based approach vs
the map-based approach. Now the remaining sparse-read functionality
has been switched to use the consistent approach.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
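A small sketch of the map-based versus vector-based representations mentioned in the ExtentMap commit above. The type aliases are an assumption made for illustration (an offset-to-length map and a vector of offset/length pairs), and `to_extents` is a hypothetical helper, not a librbd function.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Assumed shapes, for illustration only.
using ExtentMap = std::map<uint64_t, uint64_t>;                // offset -> length
using Extents = std::vector<std::pair<uint64_t, uint64_t>>;    // (offset, length)

// Hypothetical helper: flatten the map into a vector of extents.
// Map iteration is ordered by key, so the result is sorted by offset.
Extents to_extents(const ExtentMap& extent_map) {
  return Extents(extent_map.begin(), extent_map.end());
}
```

The vector form keeps the same offset ordering but allows a single contiguous container to be passed through APIs that expect it.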
Jason Dillaman [Fri, 25 Sep 2020 14:40:32 +0000 (10:40 -0400)]
librbd: deep-copy should update object-map before writing to object
For the original use-case of RBD mirroring it was (maybe) more
acceptable to write to the object before updating the object map
because an interrupted sync will be retried. However, when the
deep-copy object copy state machine is used as part of copyup, the
object-map is more likely to become out-of-sync with reality if it
is updated after the object is written.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
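A minimal sketch of the ordering argument in the object-map commit above, under stated assumptions: `FakeImage`, `copy_object`, and `map_covers_objects` are hypothetical names, and the in-memory containers stand in for the real object map and object store. Updating the map first means an interrupted write can leave the map claiming an object that does not exist, but never the reverse.

```cpp
#include <cstdint>
#include <map>
#include <set>

enum class ObjectState { NONEXISTENT, EXISTS };

// Hypothetical stand-in for a destination image.
struct FakeImage {
  std::map<uint64_t, ObjectState> object_map;  // what the map claims
  std::set<uint64_t> objects;                  // what is actually written
};

// Update the object map first, then write the object.
// Returns 0 on success or an arbitrary negative error code on failure.
int copy_object(FakeImage& dst, uint64_t object_no, bool simulate_write_failure) {
  dst.object_map[object_no] = ObjectState::EXISTS;  // step 1: map update
  if (simulate_write_failure) {
    return -5;  // the write never happened; the map is merely pessimistic
  }
  dst.objects.insert(object_no);                    // step 2: object write
  return 0;
}

// Invariant: every object that exists is marked EXISTS in the map.
bool map_covers_objects(const FakeImage& img) {
  for (uint64_t object_no : img.objects) {
    auto it = img.object_map.find(object_no);
    if (it == img.object_map.end() || it->second != ObjectState::EXISTS) {
      return false;
    }
  }
  return true;
}
```

If the two steps were swapped, a failure between them would leave a written object that the map reports as nonexistent, which breaks the invariant checked by `map_covers_objects`.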
Jason Dillaman [Tue, 13 Oct 2020 01:34:25 +0000 (21:34 -0400)]
librbd: update AioCompletion return value before evaluating pending count
If the pending count is decremented before the return value is updated,
there is a possibility of two ASIO threads concurrently decrementing the
pending count down from 2 -> 1 -> 0. In the second thread (the one that
performs the final decrement from 1 -> 0), it can finalize the completion
before the first thread has had a chance to update the return value.
Fixes: https://tracker.ceph.com/issues/47847
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
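A minimal sketch of the ordering described in the AioCompletion commit above: record the return value first, then decrement the pending count, so that whichever thread performs the final decrement observes every earlier result. The `Completion` struct and its members are hypothetical and much simpler than the real AioCompletion.

```cpp
#include <atomic>

// Hypothetical illustration only; not the librbd AioCompletion class.
struct Completion {
  std::atomic<int> rval{0};     // first error seen, or 0
  std::atomic<int> pending{0};  // outstanding sub-requests
  int final_result = 0;
  bool finalized = false;

  void complete_request(int r) {
    // Step 1: publish the result before touching the pending count.
    if (r < 0) {
      int expected = 0;
      rval.compare_exchange_strong(expected, r);  // keep the first error
    }
    // Step 2: drop the pending count; the thread that moves it 1 -> 0
    // finalizes, and by then every other thread has already done step 1.
    if (pending.fetch_sub(1) == 1) {
      final_result = rval.load();
      finalized = true;
    }
  }
};
```

If step 2 ran before step 1, the thread taking the count from 1 to 0 could finalize with `rval` still 0 while the other thread had not yet stored its error.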
mds: support sending empty perf metrics to ceph-manager
Right now, there are no per-mds metrics that are tracked and
sent by mds. However, such metrics will get added soon. So,
send empty performance metrics to ceph-manager for now.
Changcheng Liu [Wed, 23 Sep 2020 07:39:47 +0000 (15:39 +0800)]
mailmap: update Intel employee mail/org
1. "changcheng.liu@aliyun.com" still needs to be classified under Intel for now.
This reverts part of commit: df07e9f3
2. add "Yuan Lu <yuan.y.lu@intel.com>" to the mailmap
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>