Sage Weil [Tue, 7 May 2019 17:48:16 +0000 (12:48 -0500)]
Merge PR #27929 into master
* refs/pull/27929/head:
os/bluestore: be verbose about objects that exist on rmcoll
osd/PrimaryLogPG: disallow ops on objects with an empty name
osd/PG: fix cleanup of pgmeta-like objects on PG deletion
Yuri Weinstein [Mon, 6 May 2019 15:55:27 +0000 (08:55 -0700)]
qa/test: reduce overall number of runs
We kill thousands of queued jobs every week, so why do we even schedule them?
Another point was that we already run numerous tests as part of PR testing on released versions anyway, so this duplicates effort.
Kefu Chai [Tue, 7 May 2019 12:57:17 +0000 (20:57 +0800)]
seastar: pick up changes for better performance
To be specific, a78fb44c96e2912c6f39b2151f94a0bb2b5796a6 helps
improve the performance of the future implementation -- with this change
a future can always reference its local state without checking and
dereferencing its `_promise`.
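A minimal illustrative sketch of the idea (the types and names here are hypothetical, not the actual seastar code): the future stores a ready value inline, so the fast path never needs to check or dereference a separate promise object.

    #include <cassert>
    #include <optional>
    #include <utility>

    template <typename T> class promise;   // only relevant while the value is pending

    template <typename T>
    class future {
      std::optional<T> _local_state;       // ready value stored inline in the future
      promise<T>* _promise = nullptr;      // never dereferenced once _local_state is set
    public:
      explicit future(T v) : _local_state(std::move(v)) {}
      bool available() const { return _local_state.has_value(); }
      T& get() { return *_local_state; }   // fast path: no _promise check or dereference
    };

    int main() {
      future<int> f(42);
      assert(f.available() && f.get() == 42);
    }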
Kefu Chai [Tue, 7 May 2019 07:06:42 +0000 (15:06 +0800)]
crimson/osd: shutdown services in the right order
We should stop the config service *after* the OSD is stopped, as the OSD
depends on a working, live config subsystem while stopping itself. For
instance, the destructor of AuthRegistry unregisters itself from the
ObserverMgr, which is in turn a member variable of ConfigProxy, so if
ConfigProxy is destroyed before we destroy mon::Client, we get a segfault
with the following backtrace:
ObserverMgr<ceph::md_config_obs_impl<ceph::common::ConfigProxy>
>::remove_observer(ceph::md_config_obs_impl<ceph::common::ConfigProxy>*)
at /var/ssd/ceph/build/../src/common/config_obs_mgr.h:78
AuthRegistry::~AuthRegistry() at
/var/ssd/ceph/build/../src/crimson/common/config_proxy.h:101
(inlined by) AuthRegistry::~AuthRegistry() at
/var/ssd/ceph/build/../src/auth/AuthRegistry.cc:28
ceph::mon::Client::~Client() at
/var/ssd/ceph/build/../src/crimson/mon/MonClient.h:44
ceph::mon::Client::~Client() at
/var/ssd/ceph/build/../src/crimson/mon/MonClient.h:44
OSD::~OSD() at /usr/include/c++/9/bits/unique_ptr.h:81
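A standalone illustration of the hazard (class names are simplified stand-ins for AuthRegistry/ObserverMgr, not the crimson code itself): observers unregister from their manager in their destructors, so the manager must be the last thing torn down.

    #include <cassert>
    #include <set>

    struct Observer;

    struct ObserverMgr {
      std::set<Observer*> observers;
      void add_observer(Observer* o)    { observers.insert(o); }
      void remove_observer(Observer* o) { observers.erase(o); }
    };

    struct Observer {
      ObserverMgr& mgr;
      explicit Observer(ObserverMgr& m) : mgr(m) { mgr.add_observer(this); }
      ~Observer() { mgr.remove_observer(this); }  // requires mgr to still be alive
    };

    int main() {
      ObserverMgr mgr;    // constructed first, so destroyed last: the safe order
      Observer obs(mgr);  // destroyed first, while mgr is still valid
      assert(mgr.observers.count(&obs) == 1);
    }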
vstart.sh: enable creating multiple OSDs backed by spdk backend
Currently vstart.sh only supports deploying one OSD backed by an NVMe SSD.
The following two cases will cause errors:
1. There are 2 or more NVMe SSDs from the same vendor on the machine.
2. Trying to deploy 2 or more OSDs when only 1 pci_id is available.
Add support for deploying multiple OSDs on a machine with
multiple NVMe SSDs.
Changcheng Liu [Mon, 6 May 2019 02:29:11 +0000 (10:29 +0800)]
vstart.sh: correct ceph-run path
ceph-run is in the same directory as vstart.sh, but vstart.sh is often
run from the build directory. Without specifying the right directory,
the ceph-run file can't be found.
Signed-off-by: Changcheng Liu <changcheng.liu@intel.com>
Sage Weil [Thu, 2 May 2019 16:39:31 +0000 (11:39 -0500)]
os/bluestore: be verbose about objects that exist on rmcoll
This is always a bug (OSD doesn't try to remove a collection unless it
thinks it is empty), and not seeing it at default debug levels makes it
hard to track down.
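A hedged sketch of the behavior being added (the container and log target are illustrative, not BlueStore's actual code): enumerate and print every leftover object before failing the collection removal, so the offending objects show up even at default debug levels.

    #include <cerrno>
    #include <iostream>
    #include <string>
    #include <vector>

    // Illustrative stand-in for the leftover-object scan done on rmcoll.
    int remove_collection(const std::string& cid,
                          const std::vector<std::string>& remaining_objects) {
      if (!remaining_objects.empty()) {
        for (const auto& oid : remaining_objects)
          std::cerr << "rmcoll " << cid << " is non-empty, contains: " << oid << "\n";
        return -ENOTEMPTY;
      }
      return 0;
    }

    int main() {
      return remove_collection("1.0_head", {"obj1", "obj2"}) == -ENOTEMPTY ? 0 : 1;
    }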
Sage Weil [Thu, 2 May 2019 16:28:14 +0000 (11:28 -0500)]
osd/PG: fix cleanup of pgmeta-like objects on PG deletion
If an object has an empty 'name' field, it "looks" like a pgmeta object,
and the PG cleanup code was skipping it. However, we were letting these
objects get created.
Fix by only skipping *our* pgmeta object. If there are other pgmeta-like
objects in the PG collection, clean them up.
Fixes: https://tracker.ceph.com/issues/38724
Signed-off-by: Sage Weil <sage@redhat.com>
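A minimal sketch of the corrected skip condition (names are simplified, not the actual PG deletion code): compare against this PG's own pgmeta oid rather than treating anything pgmeta-shaped, such as an empty-named object, as ours.

    #include <string>
    #include <vector>

    struct hobject_t {
      std::string name;
      bool operator==(const hobject_t& o) const { return name == o.name; }
    };

    // Illustrative deletion pass: only our own pgmeta object survives; any other
    // pgmeta-like object (e.g. one with an empty name) is removed with the rest.
    std::vector<hobject_t> objects_to_remove(const std::vector<hobject_t>& contents,
                                             const hobject_t& our_pgmeta) {
      std::vector<hobject_t> out;
      for (const auto& obj : contents)
        if (!(obj == our_pgmeta))   // skip *our* pgmeta object only
          out.push_back(obj);
      return out;
    }

    int main() {
      hobject_t our_pgmeta{"pgmeta_1.0"};  // hypothetical identity for this PG's pgmeta
      std::vector<hobject_t> contents = {{""}, {"foo"}, {"pgmeta_1.0"}};
      // The empty-named, pgmeta-looking object is now cleaned up along with "foo".
      return objects_to_remove(contents, our_pgmeta).size() == 2 ? 0 : 1;
    }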
Sage Weil [Thu, 2 May 2019 19:34:53 +0000 (14:34 -0500)]
osd: clean up osdmap sharing
- always use the Session::last_sent_epoch value, both for clients and osds
- get rid of the stl map<> of peer epochs
- consolidate all map sharing into a single maybe_share_map()
- optionally take a lower bound on the peer's epoch, for use when it is
available (e.g., when we are handling a message that specifies what
epoch the peer had when it sent the message)
- use const OSDMapRef& where possible
- drop osd->is_active() check, since we no longer have any dependency on
OSD[Service] state beyond our osdmap
The old callchain was convoluted, partly because it was needlessly
separated into several layers of helpers, and partly because the tracking
for clients and peer OSDs was totally different.
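A hedged sketch of the consolidated helper's shape (the signature and fields are illustrative, not the exact OSD code): one function keyed off Session::last_sent_epoch, with an optional lower bound taken from the incoming message when available.

    #include <algorithm>
    #include <cstdint>

    using epoch_t = uint32_t;

    struct Session {
      epoch_t last_sent_epoch = 0;   // used for both clients and peer OSDs
    };

    // Illustrative consolidation: share our map only if the peer is known to be
    // behind, and remember how far we brought it up to date.
    bool maybe_share_map(Session& session, epoch_t our_epoch,
                         epoch_t peer_epoch_lower_bound = 0) {
      epoch_t peer_epoch = std::max(session.last_sent_epoch, peer_epoch_lower_bound);
      if (peer_epoch >= our_epoch)
        return false;                       // peer already has a map at least as new
      // ... send the incremental maps in (peer_epoch, our_epoch] here ...
      session.last_sent_epoch = our_epoch;  // record what we just shared
      return true;
    }

    int main() {
      Session s;
      return (maybe_share_map(s, 10, 8) && s.last_sent_epoch == 10) ? 0 : 1;
    }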
Kefu Chai [Thu, 2 May 2019 15:48:56 +0000 (23:48 +0800)]
test/common/test_util: skip it if /etc/os-release does not exist
Some GNU/Linux distros do not ship this file, and we should not fail the
test on them.
inspired by
http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/patches/ceph-skip-collect-sys-info-test.patch?id=48f19e60c4677e392ee2c23f28098cfcaf9d1710
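A small sketch of the guard (assuming a gtest-style test, which Ceph's unit tests use; the test name here is illustrative): bail out early when the file is absent instead of failing.

    #include <sys/stat.h>
    #include <gtest/gtest.h>

    static bool file_exists(const char* path) {
      struct stat st;
      return ::stat(path, &st) == 0;
    }

    TEST(util, collect_sys_info) {
      if (!file_exists("/etc/os-release")) {
        GTEST_SKIP() << "/etc/os-release not present on this distro";
      }
      // ... the existing assertions on distro/distro_version would follow here ...
    }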
Jason Dillaman [Tue, 30 Apr 2019 16:56:08 +0000 (12:56 -0400)]
librbd: allow AioCompletion objects to be blocked
This will be used when user-provided memory is wrapped into a
ceph::buffer::raw pointer, to prevent the memory from being released
before its last internal reference is dropped.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
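A hedged sketch of the blocking concept (counter-based, with illustrative names; not the librbd implementation): delivery of the completion to the user is deferred while a block is held, e.g. until wrapped user memory is no longer referenced internally.

    #include <cassert>
    #include <functional>

    // Illustrative completion object: the user callback only fires once the
    // result is ready *and* no blocks remain outstanding.
    struct AioCompletion {
      std::function<void()> user_cb;
      int blockers = 0;
      bool done = false;

      void block()    { ++blockers; }
      void unblock()  { if (--blockers == 0 && done) user_cb(); }
      void complete() { done = true; if (blockers == 0) user_cb(); }
    };

    int main() {
      bool fired = false;
      AioCompletion c{[&] { fired = true; }};
      c.block();     // e.g. user memory still wrapped in a bufferlist
      c.complete();  // IO finished, but delivery is held back
      assert(!fired);
      c.unblock();   // last internal reference dropped
      assert(fired);
    }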
Jason Dillaman [Tue, 6 Mar 2018 19:23:47 +0000 (14:23 -0500)]
librbd: switch to lock-free queue for event poll IO interface
'perf' shows several percent of CPU being wasted on lock contention
in the event poll interface. The 'fio' RBD engine uses this poll
IO interface by default when available.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
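A hedged sketch of the direction (using boost::lockfree as one way to get a lock-free queue; the event type and queue size are illustrative, not the exact librbd change): completions are pushed without a mutex, and the poll path drains whatever is ready.

    #include <boost/lockfree/queue.hpp>
    #include <cassert>
    #include <cstdint>

    // Illustrative event record; boost::lockfree::queue requires a trivially
    // copyable/destructible type, so a plain struct of PODs is used.
    struct CompletionEvent {
      uint64_t id;
      int result;
    };

    int main() {
      boost::lockfree::queue<CompletionEvent> events(128);   // bounded, lock-free

      // Completion side (may run on any finisher thread): no mutex taken.
      events.push(CompletionEvent{1, 0});

      // Poll side (e.g. the event poll IO interface): drain whatever is ready.
      CompletionEvent ev;
      int drained = 0;
      while (events.pop(ev))
        ++drained;
      assert(drained == 1);
      return 0;
    }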
Jason Dillaman [Fri, 26 Apr 2019 18:24:15 +0000 (14:24 -0400)]
librbd: avoid using lock within AIO completion where possible
'perf' shows that several percent of CPU time is spent handling the
heavyweight locking semantics of AIO completion. With these changes,
the lock contention disappears.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
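A hedged sketch of the general technique (simplified names; not the librbd code itself): replace a mutex-protected pending-request count with an atomic, so the hot completion path stays lock-free.

    #include <atomic>
    #include <cassert>
    #include <functional>

    // Illustrative completion: each sub-request decrements an atomic counter and
    // the last one to finish invokes the callback, with no mutex on the hot path.
    struct Completion {
      std::atomic<int> pending;
      std::function<void()> on_finish;

      Completion(int n, std::function<void()> cb)
        : pending(n), on_finish(std::move(cb)) {}

      void complete_request() {
        if (pending.fetch_sub(1, std::memory_order_acq_rel) == 1)
          on_finish();   // last outstanding sub-request
      }
    };

    int main() {
      int done = 0;
      Completion c(3, [&] { ++done; });
      c.complete_request();
      c.complete_request();
      c.complete_request();
      assert(done == 1);
    }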
Jason Dillaman [Mon, 29 Apr 2019 14:13:21 +0000 (10:13 -0400)]
librbd: simplify IO flush handling through AsyncOperation
Allow ImageFlushRequest to directly execute a flush call through
AsyncOperation. This will allow the flush to be directly linked
to its preceding IOs.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
PeeringState: don't zero backfill target num_bytes on activation
834d3c19a774f1cc93903447d91d182776e12d18 preserves num_bytes
on backfill targets in order to estimate the space required to complete
backfill. However, from activation until backfill reservation,
info.stats.stats.sum.num_bytes is persisted to disk as 0, messing
up future intervals. Instead, preserve it in the info sent during
recovery and leave it alone in RequestBackfillPrio.
Additionally, it's possible for backfill to be preempted between
last_backfill=MAX being sent to the replica and Backfilled being
queued. In that case, the stats get zeroed on reservation
and the replica ends up with invalid stats.