Jason Dillaman [Tue, 14 Jul 2020 22:38:17 +0000 (18:38 -0400)]
librbd: utilize neorados to issue async blacklist request
The librados API does not currently offer an async 'mon_command'
API method. Instead of adding one just to support this effort,
re-use the neorados API to issue an asynchronous 'mon_command'
for blacklisting a client.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 14 Jul 2020 21:38:56 +0000 (17:38 -0400)]
librbd: managed_lock::BreakRequest needs a reference to AsioEngine
The current usage of the asio::ContextWQ to simulate an asynchronous blocklist
API call is resulting in deadlock in the rbd-mirror HA tests when multiple
blocklists are occurring concurrently. The next commit will switch to use the
neorados async MON command API (since librados doesn't offer one).
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 13 Jul 2020 17:45:44 +0000 (13:45 -0400)]
librbd: fix race condition with AIO completion callbacks
Now that librbd utilizes multiple threads for the IO path, a race
condition can occur between a client app waiting on a completion
to fire and the actual invocation of the completion.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
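A minimal sketch of the general ordering problem described above. It assumes a hypothetical Completion class, not librbd's AioCompletion. The callback runs first, and "done" is published and signalled under the lock afterwards, so a waiting client cannot return (and free the completion) while the IO thread is still using it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical completion type for illustration only (not librbd's API).
class Completion {
public:
  explicit Completion(std::function<void()> cb) : cb_(std::move(cb)) {}

  // Invoked by an IO thread once the operation finishes.
  void complete() {
    // Run the user callback before publishing completion.
    if (cb_) {
      cb_();
    }
    std::lock_guard<std::mutex> lock(mutex_);
    done_ = true;
    // Notify while holding the lock: wait() cannot return until the lock is
    // released, so cond_ is not touched after the waiter may free this object.
    cond_.notify_all();
  }

  // Invoked by the client; returns only after complete() has published.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return done_; });
  }

private:
  std::function<void()> cb_;
  std::mutex mutex_;
  std::condition_variable cond_;
  bool done_ = false;
};
```

If "done" were set before the callback ran, the client could observe completion while the IO thread was still inside the callback.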
Jason Dillaman [Fri, 10 Jul 2020 16:46:27 +0000 (12:46 -0400)]
librbd: allocate the asio strands directly on the heap
This will assist with potential race condition debugging since the
strand pointer will be invalidated by the time the strand has been
destructed and shut down.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 10 Jul 2020 15:24:08 +0000 (11:24 -0400)]
librbd: ensure all asio completions are complete at ImageCtx destruction
With multiple threads of execution possible, we need to ensure that
all completions have fired prior to the destruction of the AsioEngine.
We also need to ensure that the AsioEngine is destroyed outside the
context of its owned strands.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 9 Jul 2020 21:04:50 +0000 (17:04 -0400)]
librbd: switch the IO path to utilize the neorados API
IO operations to the cluster are now dispatched via the neorados
API which allows multiple threads to be utilized for processing
incoming and outgoing IO.
This also involves switching from a map for tracking sparse extents
to a vector of pairs since that matches the new API for sparse
read operations.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
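A sketch of the extent-tracking change, assuming sparse extents are (offset, length) pairs. The type aliases and helpers are hypothetical, and the actual neorados sparse-read signature is not reproduced here. Because std::map iterates in key order, converting it to a vector of pairs keeps the extents sorted by offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical aliases: offset -> length.
using ExtentMap = std::map<uint64_t, uint64_t>;
using ExtentVector = std::vector<std::pair<uint64_t, uint64_t>>;

// Convert the old map form to a vector of (offset, length) pairs;
// map iteration order keeps the result sorted by offset.
ExtentVector to_extent_vector(const ExtentMap& m) {
  return ExtentVector(m.begin(), m.end());
}

// Callers that only iterate need no other changes.
uint64_t total_length(const ExtentVector& extents) {
  uint64_t total = 0;
  for (const auto& extent : extents) {
    total += extent.second;
  }
  return total;
}
```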
Jason Dillaman [Tue, 16 Jun 2020 16:59:11 +0000 (12:59 -0400)]
librbd: switch external API callbacks to use dedicated asio strand
This ensures that the API callers will not receive concurrent
callbacks and allows internal AioCompletion users to be able to
use all available asio dispatch threads.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 7 Jul 2020 18:37:54 +0000 (14:37 -0400)]
librbd: integrate neorados into ImageCtx
Also create an up-to-date data_io_context that mimics the function
of ImageCtx::data_ctx. The data_io_context will eventually be passed
via the IO dispatch specs to replace the passing of the snapshot
id vectors.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 9 Jul 2020 15:58:31 +0000 (11:58 -0400)]
test/librados_test_stub: pass read snap id to read operation hooks
The neorados API does not require the creation of heavy IoCtx-like
objects with static read snap_ids pre-assigned. Therefore, we will
need to pass the read snap_id to all affected functions and adjust
all dependent unittests to expect a new parameter.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 8 Jul 2020 18:28:55 +0000 (14:28 -0400)]
neorados: support blkin trace passing on execute calls
librbd passes blkin traces from the user API down through to
Objecter and back. Add these missing hooks to the neorados API
since they weren't included in the initial revision.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 10 Jul 2020 14:16:42 +0000 (10:16 -0400)]
common/Timer: fixed invalid read from deleted object
The std::condition_variable will keep the provided reference and
repeatedly dereference it even after the lock was dropped and
re-acquired. This can lead to an invalid read if the associated
schedule entry has been removed while waiting.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
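A sketch of one way to avoid the invalid read. It assumes that the wait deadline comes from the earliest schedule entry, and it uses a hypothetical MiniTimer rather than common/Timer itself. The deadline is copied into a local before waiting, so wait_until never holds a reference into a map entry that another thread might erase while the lock is dropped.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical timer state: deadline -> event id.
struct MiniTimer {
  std::mutex lock;
  std::condition_variable cond;
  std::multimap<Clock::time_point, int> schedule;

  // Sleep until the earliest deadline (or a notification).
  // Returns false if nothing is scheduled.
  bool wait_for_next(std::unique_lock<std::mutex>& l) {
    if (schedule.empty()) {
      return false;
    }
    // Copy, don't reference: schedule.begin()->first could be erased by
    // another thread while the lock is released inside wait_until.
    const Clock::time_point when = schedule.begin()->first;
    cond.wait_until(l, when);
    return true;
  }
};
```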
When doing a full-system shutdown, monitors may go down before OSDs, in which
case the OSD shutdown hangs while waiting for the monc to successfully send the
mark-me-down message to the monitors.
This commit changes the (not quite) sentence "Once
you have a deployed a Ceph Storage Cluster, you may
begin operating your cluster." to "Once you have
deployed a Ceph Storage Cluster, you may begin
operating your cluster."
Jason Dillaman [Mon, 13 Jul 2020 20:11:06 +0000 (16:11 -0400)]
librbd: fix parent cache races and error handling
If the plugin fails to connect to the daemon at start-up it will
crash the process due to a resource deadlock exception being
thrown as the client is destroyed. Additionally, librbd will support
concurrent IO thread processing in the future so the client needs
to be protected by a lock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sebastian Wagner [Wed, 15 Jul 2020 12:42:54 +0000 (14:42 +0200)]
Merge pull request #35862 from adk3798/cephadm_45724
mgr/cephadm: check-host should not fail as hard using fqdn
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Jason Dillaman [Mon, 13 Jul 2020 20:08:51 +0000 (16:08 -0400)]
librbd: move ContextWQ::queue definition to header
The parent cache plugin uses the ContextWQ::queue method and therefore
requires its definition to properly dynamically link into the librbd
process. If future plugins require additional functions this can be
reconsidered by using interfaces, static libraries, or moving generic
functions to libcommon.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 14 Jul 2020 22:49:30 +0000 (18:49 -0400)]
crush/CrushWrapper: rebuild reverse maps after rebuilding crush map
The Objecter will crash when localized reads are enabled and two threads
attempt to rebuild the (invalidated) reverse maps concurrently. This
should address the issue for the Objecter use-case without the need to
add additional locking.
Fixes: https://tracker.ceph.com/issues/44311
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
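A sketch of the locking-free approach described above, using a hypothetical NameMap instead of CrushWrapper's real reverse maps. The writer rebuilds the derived reverse index eagerly whenever the forward map changes. Readers then never need to rebuild an invalidated index lazily, so two concurrent readers cannot race on that rebuild. This assumes writers do not run concurrently with readers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical map with a derived reverse index (not CrushWrapper's API).
class NameMap {
public:
  // Writer path: update the forward map and rebuild the reverse map eagerly.
  void set(int id, const std::string& name) {
    forward_[id] = name;
    rebuild_reverse();
  }

  // Reader path: const and never mutates, so concurrent readers are safe.
  int lookup_id(const std::string& name) const {
    auto it = reverse_.find(name);
    return it == reverse_.end() ? -1 : it->second;
  }

private:
  void rebuild_reverse() {
    reverse_.clear();
    for (const auto& kv : forward_) {
      reverse_[kv.second] = kv.first;
    }
  }

  std::map<int, std::string> forward_;
  std::map<std::string, int> reverse_;
};
```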
This commit breaks one of the long sentences into
three shorter sentences, and adds a parenthetical
comment walking the reader through a series of commands,
explaining what each command does and why only the last
of the commands ends up having a lasting effect on the Ceph
environment.
Or Ozeri [Tue, 14 Jul 2020 11:28:12 +0000 (14:28 +0300)]
osdc/Striper: add get_file_offset function
This commit adds a get_file_offset function that translates (object_no, object_off) -> file_offset.
This is useful for the encryption object dispatch layer in librbd, which must
comply with disk-encryption standards that require the file offset as input.
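A sketch of the (object_no, object_off) -> file_offset translation. It assumes the usual striping parameters (stripe_unit, stripe_count, object_size, with object_size a multiple of stripe_unit). The Layout struct and the file_offset_of name are hypothetical, and the actual Striper::get_file_offset signature is not reproduced here. The function inverts file-to-object striping. It finds the object set and the object's position within it, then the stripe and block, and finally adds the offset within the block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout struct mirroring common striping parameters.
struct Layout {
  uint64_t stripe_unit;   // bytes per block
  uint64_t stripe_count;  // objects per object set
  uint64_t object_size;   // bytes per object (multiple of stripe_unit)
};

uint64_t file_offset_of(const Layout& l, uint64_t object_no, uint64_t object_off) {
  const uint64_t stripes_per_object = l.object_size / l.stripe_unit;
  const uint64_t object_set = object_no / l.stripe_count;   // which object set
  const uint64_t stripe_pos = object_no % l.stripe_count;   // position within the set
  const uint64_t block_in_object = object_off / l.stripe_unit;
  const uint64_t off_in_block = object_off % l.stripe_unit;
  const uint64_t stripe_no = object_set * stripes_per_object + block_in_object;
  const uint64_t block_no = stripe_no * l.stripe_count + stripe_pos;
  return block_no * l.stripe_unit + off_in_block;
}
```

With the default layout (stripe_count 1, stripe_unit equal to object_size), this reduces to object_no * object_size + object_off.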
Patrick Donnelly [Tue, 14 Jul 2020 02:53:29 +0000 (19:53 -0700)]
Merge PR #35755 into master
* refs/pull/35755/head:
mgr/volumes: Deprecate protect/unprotect CLI calls for subvolume snapshots
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Victoria Martinez de la Cruz <vkmc@redhat.com>
Reviewed-by: Goutham Pacha Ravi <gouthamr@redhat.com>
Patrick Donnelly [Mon, 13 Jul 2020 18:17:44 +0000 (11:17 -0700)]
Merge PR #34246 into master
* refs/pull/34246/head:
mds: add request to batch_op before taking auth pins and locks
mds: move MDRequestImpl::batch_reqs into Batch_Getattr_Lookup
mds: track which map batch_op is in-use in MDRequest
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Mon, 13 Jul 2020 18:16:28 +0000 (11:16 -0700)]
Merge PR #34785 into master
* refs/pull/34785/head:
ceph-fuse: show fuse helper options for libfuse >= 3.0
ceph-fuse: add splice read/write support to reduce the memory copy
ceph-fuse: add connection args parsing support for libfuse > 3.0
ceph-fuse: switch to fuse_reply_iov to reduce the memory copy
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
split mempool allocation for bluestore_cache_other
During root cause analysis, bluestore_cache_other gives only a rather
crude estimate. It would be more helpful to have it split into
the following fields: