Andrew Schoen [Mon, 8 Oct 2018 13:57:07 +0000 (09:57 -0400)]
ceph-volume: filter devices used by journals/block.db
If, after filtering the data/block devices, only one device is left,
it cannot be used if it is an SSD that was previously used as a
journal or block.db.
Andrew Schoen [Thu, 27 Sep 2018 20:22:17 +0000 (15:22 -0500)]
ceph-volume: pick strategy for batch with only the unused devices
This will pick a strategy, filter out any devices already used by
ceph, and then pick a strategy again. If the strategy has changed, the
call should error; if the strategy is the same, proceed. If there are
no unused devices, the command is a noop.
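The control flow above is: pick, filter, pick again, compare. A minimal
sketch of that decision logic (ceph-volume itself is Python; the types
and the strategy-selection rule here are hypothetical illustrations, not
the tool's API):

    #include <algorithm>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for ceph-volume's device/strategy types.
    struct Device { std::string path; bool used_by_ceph; };

    static std::string pick_strategy(const std::vector<Device>& devs) {
        // Illustrative rule only: the real selection inspects device types.
        return devs.size() >= 2 ? "MixedType" : "SingleType";
    }

    static std::string plan_batch(std::vector<Device> devs) {
        const std::string initial = pick_strategy(devs);
        // Filter out any devices already used by ceph.
        devs.erase(std::remove_if(devs.begin(), devs.end(),
                                  [](const Device& d) { return d.used_by_ceph; }),
                   devs.end());
        if (devs.empty())
            return "noop";                 // no unused devices: nothing to do
        const std::string again = pick_strategy(devs);
        if (again != initial)              // strategy changed: error out
            throw std::runtime_error("strategy changed after filtering");
        return again;                      // same strategy: proceed
    }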
rgw: copy actual stats from the source shards during reshard
Currently we don't copy the actual_stats field during reshard, which makes
resharded buckets show a size_utilized of 0. This has the further problem
that a subsequent object removal subtracts the object size from that zero
size_utilized, wrapping around to a huge uint64_t value. Copy the
size_actual from the source object in both cls and in reshard_process.
This fixes newly resharded buckets; existing buckets will still have to go
through a bucket check --fix for their stats to be corrected.
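The huge values are plain unsigned wrap-around. A two-line illustration
(generic C++, not the rgw code):

    #include <cstdint>
    #include <iostream>

    int main() {
        uint64_t size_utilized = 0;          // stats lost during reshard
        uint64_t removed_size  = 4096;       // a subsequent object removal
        size_utilized -= removed_size;       // wraps to 2^64 - 4096
        std::cout << size_utilized << "\n";  // prints 18446744073709547520
    }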
* refs/pull/24403/head:
qa: add timeout to cleaning up workunit sandbox
qa: cleanup workunit dir for each unit
qa: add timeout to kclient umount
qa: do not cleanup sandbox on error
qa: use default timeout in fs workunits
qa: use sudo to cleanup workspace
qa: cleanup parallel execution of fsstress
qa/workunit: implement cleanup option
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Thu, 9 Aug 2018 13:33:42 +0000 (08:33 -0500)]
osd: vary tick interval +/- 5% to avoid scrub livelocks
If you have two pgs that need to scrub on two OSDs, each the primary
for one pg and the replica for the other, you can end up in a livelock:
- both osds locally reserve a scrub slot
- both osds send a scrub schedule request
- both scrub requests are rejected
- both osds wait exactly 1 second
- repeat
Seems a bit unlikely, but I've seen test cases where it goes on for more
than an hour.
Conflicts:
src/osd/OSD.cc
- luminous does not have src/include/random.h; use #include <random>
instead, seeding with whoami so each OSD gets a different series
of pseudo-random numbers
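A minimal sketch of the jittered tick the conflict note describes: seed a
per-OSD engine with whoami (so each OSD gets a different series) and scale
the base interval by a factor in [0.95, 1.05]. The names whoami and
base_interval stand in for the OSD's members:

    #include <chrono>
    #include <random>

    int whoami = 3;               // this OSD's id
    double base_interval = 1.0;   // seconds; cf. the 1-second retry above

    std::chrono::duration<double> next_tick() {
        static std::mt19937 rng(whoami);  // per-OSD seed => decorrelated ticks
        std::uniform_real_distribution<double> jitter(0.95, 1.05);  // +/- 5%
        return std::chrono::duration<double>(base_interval * jitter(rng));
    }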
Jason Dillaman [Mon, 24 Sep 2018 18:45:09 +0000 (14:45 -0400)]
librbd: do not invalidate object map if update races with copyup
The copyup state machine needs to iterate over all object maps to update
the existence state for the object. If a snapshot is being removed
concurrently, it's possible to invalidate the object map for the image.
Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5a1cb469879157297ab456261f9335d8b855684f)
Conflicts:
src/librbd/ObjectMap.cc: trivial resolution
src/librbd/ObjectMap.h: trivial resolution
src/librbd/deep_copy/ObjectCopyRequest.cc: moved to rbd-mirror image sync
src/librbd/io/CopyupRequest.cc: trivial resolution
src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc: moved to rbd-mirror image sync
src/test/librbd/test_mock_ObjectMap.cc: trivial resolution
Jason Dillaman [Fri, 14 Sep 2018 15:46:13 +0000 (11:46 -0400)]
librbd: do not invalidate object map when attempting to delete non-existent snapshot
If duplicate snapshot remove requests are received by the lock owner from
a peer client, the first request will remove the object map. If the second
request arrives while the first is in progress, it will again attempt to
remove the object map but will fail to load it, since it has already been
deleted. This incorrectly results in the next object map being flagged as
invalid.
Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0a31c55ea83d85da88c7586c9a8fa8d6ec6618a7)
Chang Liu [Mon, 5 Mar 2018 07:46:43 +0000 (15:46 +0800)]
rgw: url_encode key name and instance in es sync module
Some objects whose names contain spaces or other special characters
can't be synced to ES correctly. We need to url_encode the key name and
instance when we send an HTTP request to ES.
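Percent-encoding in a nutshell: any byte outside the unreserved set is
replaced with %XX. A generic sketch (not rgw's own url_encode helper):

    #include <cctype>
    #include <cstdio>
    #include <string>

    std::string url_encode(const std::string& in) {
        std::string out;
        for (unsigned char c : in) {
            if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
                out += static_cast<char>(c);    // unreserved: pass through
            } else {
                char buf[4];
                std::snprintf(buf, sizeof(buf), "%%%02X",
                              static_cast<unsigned>(c));
                out += buf;                     // everything else: %XX
            }
        }
        return out;
    }
    // url_encode("my key#1") == "my%20key%231"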
msg: ceph_abort() when there are enough accepter errors in msg server
In some extreme cases (we have met one in our production cluster), when the
Accepter thread breaks out, new clients can no longer connect to the osd.
Because the existing heartbeat connections are already established, other
osds cannot detect the failure and notify the monitor to mark the failed
osd down.
With this patch, when there are enough abnormal communication errors, we
just ceph_abort() so that the osd goes down quickly and other osds can
notify the monitor to mark the failed osd down.
Signed-off-by: penglaiyxy@gmail.com <penglaiyxy@gmail.com>
(cherry picked from commit 00e0ab407b2e9659d9121be1217e95c8117c411e)
Conflicts:
src/common/legacy_config_opts.h: Resolved for ms_max_accept_failures
src/common/options.cc: Resolved for ms_max_accept_failures
src/msg/async/AsyncMessenger.cc: Resolved in accept
src/msg/simple/Accepter.cc: Resolved in entry
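The shape of the fix, per the ms_max_accept_failures option named in the
conflict notes: count consecutive accept errors and abort once the
threshold is crossed. A hedged sketch (the names, the threshold value, and
the reset-on-success behavior are assumptions, not the actual Accepter
code):

    #include <cstdlib>
    #include <iostream>

    const int ms_max_accept_failures = 4;  // threshold; value illustrative
    int accept_errors = 0;

    void on_accept_result(bool ok) {
        if (ok) {
            accept_errors = 0;   // assume a success resets the count
            return;
        }
        if (++accept_errors > ms_max_accept_failures) {
            std::cerr << "too many consecutive accept failures\n";
            std::abort();        // stands in for ceph_abort(): die fast so
        }                        // peers report this osd to the monitor
    }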
Patrick Donnelly [Fri, 17 Aug 2018 22:03:56 +0000 (15:03 -0700)]
mds: use monotonic waits in Beacon
This guarantees that the sender thread cannot be disrupted by system clock
changes. This commit also simplifies the sender thread by manually managing the
thread and avoiding unnecessary context creation.
Fixes: http://tracker.ceph.com/issues/26962
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit a5fc29b95281c6ca58c9177c665c379846beb4b3)
Conflicts:
src/mds/Beacon.cc
- g_conf->foo instead of g_conf()->foo
- boost::string_view instead of std::string_view
- always specify template type std::unique_lock<std::mutex>
src/mds/Beacon.h
- time::min() instead of clock::zero()
- always specify template type std::unique_lock<std::mutex>
- std::chrono::seconds instead of "1s" in std::chrono_literals namespace
(which is a C++14ism)
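The essence of the change: wait on the monotonic clock, so a wall-clock
step cannot stretch or shorten the sender's sleep. A minimal sketch with
std::chrono::steady_clock (not the Beacon code itself; the
std::unique_lock<std::mutex> template type is spelled out, as in the
conflict note above):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool shutting_down = false;

    // Sleep for `interval` on the monotonic clock; system clock changes
    // are invisible to steady_clock, so the wait cannot be disrupted.
    void beacon_wait(std::chrono::milliseconds interval) {
        std::unique_lock<std::mutex> lock(m);
        auto deadline = std::chrono::steady_clock::now() + interval;
        cv.wait_until(lock, deadline, [] { return shutting_down; });
    }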
Conflicts:
src/librbd/ExclusiveLock.cc: trivial resolution
src/librbd/image/RemoveRequest.cc: trivial resolution
src/test/rbd_mirror/image_deleter/test_mock_SnapshotPurgeRequest.cc: DNE
src/tools/rbd_mirror/image_deleter/SnapshotPurgeRequest.cc: DNE
src/tools/rbd_mirror/image_deleter/SnapshotPurgeRequest.h: DNE
src/librbd/DeepCopyRequest.cc: (see below)
src/librbd/deep_copy/ObjectCopyRequest.cc: (see below)
src/librbd/deep_copy/ObjectCopyRequest.h: (see below)
src/librbd/deep_copy/SetHeadRequest.cc: (see below)
src/librbd/deep_copy/SetHeadRequest.h: (see below)
src/librbd/deep_copy/SnapshotCopyRequest.cc: (see below)
src/librbd/deep_copy/SnapshotCopyRequest.h: (see below)
src/librbd/deep_copy/SnapshotCreateRequest.cc: (see below)
src/librbd/deep_copy/SnapshotCreateRequest.h: (see below)
src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SetHeadRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SnapshotCopyRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SnapshotCreateRequest.cc: (see below)
src/test/librbd/test_mock_DeepCopyRequest.cc
- deep-copy related files were originally derived from rbd-mirror
equivalents. Similar modifications were made to the associated
rbd-mirror files.
Jason Dillaman [Thu, 6 Sep 2018 13:44:59 +0000 (09:44 -0400)]
librbd: watcher should internally track blacklisted state
Since it will periodically attempt to re-acquire the watch,
it will know when the RADOS client has been blacklisted and
when the blacklist has been removed.
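Mechanically this reads like a re-watch loop: a blacklisted error on a
periodic re-watch attempt sets the flag, and the first successful re-watch
clears it. A hedged sketch (hypothetical names and error value, not
librbd's Watcher class):

    const int EBLACKLISTED_ERR = 108;  // stand-in for Ceph's EBLACKLISTED

    struct Watcher {
        bool blacklisted = false;

        void handle_rewatch(int r) {   // called on each periodic attempt
            if (r == -EBLACKLISTED_ERR)
                blacklisted = true;    // RADOS client is blacklisted
            else if (r == 0)
                blacklisted = false;   // blacklist entry was removed
            // other errors: keep retrying
        }
    };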
Douglas Fuller [Fri, 29 Jun 2018 17:55:31 +0000 (13:55 -0400)]
mon/OSDMonitor: Warn if missing expected_num_objects
When creating a pool on filestore, warn if the user appears to be
creating a pool to store a large number of objects but omitted the
expected_num_objects parameter. Create the pool anyway.
Fixes: http://tracker.ceph.com/issues/24687
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
(cherry picked from commit 69fb2293c4d38012e7c4781aaa39a47596125bbb)
Douglas Fuller [Thu, 28 Jun 2018 15:21:38 +0000 (11:21 -0400)]
mon/OSDMonitor: Warn when expected_num_objects will have no effect
The expected_num_objects argument to ceph osd pool create is only
effective on filestore pools, and only when merging is disabled
(filestore_merge_threshold < 0). When the argument would have no effect,
warn and disallow pool creation.
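Taken together, the two commits above amount to roughly this decision
logic, sketched here as standalone code (the real checks live in
OSDMonitor and consider more context, e.g. whether the pool appears to be
intended for a large number of objects):

    #include <string>

    std::string check_pool_create(bool filestore, long expected_num_objects,
                                  long filestore_merge_threshold) {
        if (filestore && expected_num_objects == 0)
            return "warn: large filestore pool without expected_num_objects; "
                   "creating anyway";
        if (expected_num_objects > 0 && filestore_merge_threshold >= 0)
            return "error: expected_num_objects has no effect unless "
                   "filestore_merge_threshold < 0";
        return "ok";
    }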
Conflicts:
src/librbd/image/CreateRequest.cc
src/librbd/image/CreateRequest.h
- luminous uses m_ioctx where master has m_io_ctx
- luminous IoCtx does not have get_namespace/set_namespace
- in luminous the CreateRequest state machine is a little different from
the master (see the diagram in CreateRequest.h). In luminous the next
state after VALIDATE_DATA_POOL is CREATE_ID_OBJECT, so call
create_id_object() instead of add_image_to_directory().
Conflicts:
src/mon/MDSMonitor.cc
- luminous has "ignore:" instead of "done:" in
MDSMonitor::preprocess_offload_targets() and this part of the backport
was already done unintentionally in 7bbc0a7b1670d99e42149fd3a25c24600314ca94
The AsyncConnection keeps local (member variable) bufferlists of incoming
messages before they're placed into the Message's front/data/middle
buffers. Previously these were reset only when a new Message started being
received, which means that in steady state we store a full Message for
every Connection, even if it's inactive!
Instead, we obviously want to drop our local references to Message state
once it has been dispatched, so that it can go away.
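The fix pattern in miniature: move the staging buffers into the Message at
dispatch time instead of clearing them at the start of the next receive,
so an idle connection holds nothing. A generic sketch (std::string stands
in for ceph::bufferlist):

    #include <memory>
    #include <string>
    #include <utility>

    struct Message { std::string front, data, middle; };

    struct Connection {
        // Local staging buffers for the message currently being received.
        std::string front, data, middle;

        std::shared_ptr<Message> dispatch() {
            auto m = std::make_shared<Message>();
            // Move, don't copy: the Connection's members are left empty,
            // so the dispatched Message owns the only reference.
            m->front  = std::move(front);
            m->data   = std::move(data);
            m->middle = std::move(middle);
            return m;
        }
    };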