Sage Weil [Sat, 28 Oct 2017 20:37:03 +0000 (15:37 -0500)]
include/interval_set: tolerate maps that invalidate iterator on change
These changes were picked out of the diff between the original
btree_interval_set.h and interval_set.h (sadly I had rolled them into the
initial commit, so it was tedious to identify them).
interval_set: optimize subset_of with sequential search
Optimize subset_of to use sequential search when it
performs better than the lower_bound method, i.e. for set
size ratios smaller than 10. This is analogous to the
intersection_of behavior since commit 825470fcf919.
The subset_of method can be used in some cases as a
less expensive alternative to the intersection_of
method, since subset_of can return early if any element
of the smaller set is not contained in the larger set,
while intersection_of has the added burden of storing
the intersecting elements.
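For illustration, a minimal sketch of the two strategies (not the actual
Ceph implementation), assuming a simplified representation where an
interval set is a std::map<T, T> from offset to length; all names here
are hypothetical:

#include <map>

// Sketch: check whether every interval of 'small' is contained in some
// interval of 'big', switching strategy on the size ratio (threshold 10,
// as in the commit text). Returns early on the first uncontained element.
template <typename T>
bool subset_of_sketch(const std::map<T, T>& small, const std::map<T, T>& big)
{
  if (small.empty())
    return true;
  if (small.size() * 10 < big.size()) {
    // Few probes into a much larger set: logarithmic lookups win.
    for (const auto& [start, len] : small) {
      auto it = big.upper_bound(start);  // first interval starting after 'start'
      if (it == big.begin())
        return false;                    // nothing starts at or before 'start'
      --it;                              // candidate containing interval
      if (start + len > it->first + it->second)
        return false;
    }
  } else {
    // Comparable sizes: walk both sorted sets sequentially in one pass.
    auto it = big.begin();
    for (const auto& [start, len] : small) {
      while (it != big.end() && it->first + it->second < start + len)
        ++it;                            // skip intervals ending too early
      if (it == big.end() || it->first > start)
        return false;
    }
  }
  return true;
}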
Zac Medico [Thu, 31 Aug 2017 03:59:32 +0000 (20:59 -0700)]
interval_set: optimize intersection_of
Iterate over all elements of the smaller set, and use find_inc to
locate elements from the larger set in logarithmic time. This greatly
improves performance when one set is much larger than the other:
[ASCII plot: performance ratio (y-axis, 0.8 to 2) versus set size ratio
(x-axis, 0 to 1). The ratio starts near 2 for the smallest set size
ratios, falls steadily, crosses 1 at a ratio of roughly 0.4, and levels
off just below 1 as the set size ratio approaches 1.]
The above plot compares the performance of the new intersection_size_asym
function to that of the existing intersection_of function. The performance
of intersection_size_asym gets worse as the set size ratio approaches 1.
For set size ratios where the performance ratio is greater than 1, the
performance of intersection_size_asym is superior. Therefore, this
patch only uses intersection_size_asym when the set size ratio is less
than or equal to 0.1 (the code uses the reciprocal, which is 10).
The plot was generated using benchmark results produced by the
following program:
#include <iostream>
#include <sys/timeb.h>

#include "include/interval_set.h"  // Ceph's interval_set

int main()
{
  const int interval_count = 100000;
  const int interval_distance = 4;
  const int interval_size = 2;
  const int sample_count = 8;
  const int max_offset = interval_count * interval_distance;
  interval_set<int> a, b, intersection;

  // Build the larger set: an interval of size 2 every 4 offsets.
  for (int i = 0; i < max_offset; i += interval_distance) {
    a.insert(i, interval_size);
  }

  // Sweep set size ratios from 1 down to 1/99.
  for (int m = 1; m < 100; m++) {
    float ratio = 1 / float(m);
    // The smaller set keeps every m-th interval of the larger set.
    for (int i = 0; i < max_offset; i += interval_distance * m) {
      b.insert(i, interval_size);
    }
    struct timeb start, end;
    int ms = 0;
    for (int i = 0; i < sample_count; i++) {
      ftime(&start);
      intersection.intersection_of(a, b);
      ftime(&end);
      ms += (int) (1000.0 * (end.time - start.time)
                   + (end.millitm - start.millitm));
      intersection.clear();
    }
    b.clear();
    std::cout << ratio << "\t" << ms << std::endl << std::flush;
  }
  return 0;
}
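For reference, a minimal sketch of the asymmetric strategy itself (not the
actual Ceph code, which uses find_inc on its internal map), again assuming
a simplified std::map<T, T> of offset to length:

#include <algorithm>
#include <map>

// Sketch: intersect by iterating over the smaller set and locating
// overlapping intervals of the larger set via logarithmic lookups.
// The caller would use this only when small.size() * 10 < big.size().
template <typename T>
void intersection_asym_sketch(const std::map<T, T>& small,
                              const std::map<T, T>& big,
                              std::map<T, T>& out)
{
  for (const auto& [start, len] : small) {
    const T end = start + len;
    auto it = big.upper_bound(start);  // first interval starting after 'start'
    if (it != big.begin())
      --it;                            // an earlier interval may still overlap
    for (; it != big.end() && it->first < end; ++it) {
      T lo = std::max(start, it->first);
      T hi = std::min(end, it->first + it->second);
      if (lo < hi)
        out[lo] = hi - lo;             // record the overlapping piece
    }
  }
}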
* refs/pull/21267/head:
discard the mdsload clear after prep_rebalance in case we want to export it for debugging
make sure that MDBalancer uses heartbeat info from the same epoch
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
Jianyu Li [Wed, 15 Nov 2017 07:43:44 +0000 (15:43 +0800)]
make sure that MDBalancer uses heartbeat info from the same epoch
Currently the MDS saves the heartbeat info from the others in mds_load; once mds_load.size() equals the number of MDSs, it considers that it has received all heartbeat info and starts the rebalance work. However, after prep_rebalance returns, it does not clear mds_load immediately, but waits until it receives the next round's heartbeat from mds0. If there are multiple MDSs (e.g. more than 2), there is a chance that, due to network delay, the first heartbeat of the next round that an MDS receives comes from an MDS other than mds0.
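A minimal sketch of the epoch guard, with hypothetical names standing in
for the actual MDBalancer members:

#include <cstddef>
#include <map>

// Sketch only: tag each heartbeat with the balancer round it belongs to
// and drop heartbeats from other rounds, so mds_load never mixes loads
// from different epochs. Names are illustrative, not Ceph's code.
struct HeartbeatSketch {
  int epoch;    // balancer round this load belongs to
  int from;     // sender mds rank
  double load;  // sender's reported load
};

struct BalancerSketch {
  int beat_epoch = 0;
  std::size_t mds_count = 3;
  std::map<int, double> mds_load;  // rank -> load for the current epoch

  void handle_heartbeat(const HeartbeatSketch& hb) {
    if (hb.from == 0 && hb.epoch > beat_epoch) {
      beat_epoch = hb.epoch;  // mds0 starts a new round:
      mds_load.clear();       // clear stale loads before recording
    }
    if (hb.epoch != beat_epoch)
      return;  // ignore heartbeats from a different round
    mds_load[hb.from] = hb.load;
    if (mds_load.size() == mds_count)
      prep_rebalance();
  }

  void prep_rebalance() { /* rebalance using loads from one epoch only */ }
};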
Sage Weil [Wed, 4 Apr 2018 02:24:07 +0000 (21:24 -0500)]
osd: do not release_reserved_pushes when requeuing
Back in 3cc48278bf0ee5c9535d04b60a661f988c50063b we refactored the sharded
wq and incorrectly included code that would release_reserved_pushes for
items that were queued and deferred and then woken and put back in the
queue. The reserved_pushes are for recovery ops that are in flight in the
queue, which includes the priority queue *and* the waiting_for_pg; the code
we replaced would release these only when dequeueing an item (or items) for
processing (or discard).
In master, this code is fixed as part of the peering fast dispatch and
OSDShard refactor.
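Purely as an illustration of the invariant (not the actual OSD code), a
sketch with hypothetical names:

#include <cstdint>
#include <deque>

// Sketch: a reservation must stay held while its item sits anywhere in
// the queue, and is released only when the item is finally dequeued for
// processing (or discarded), never when it is merely requeued.
struct ItemSketch {
  uint64_t reserved_pushes = 0;
};

struct ShardSketch {
  std::deque<ItemSketch> queue;
  uint64_t reservations_held = 0;

  void release_reserved_pushes(uint64_t n) { reservations_held -= n; }

  void requeue(const ItemSketch& item) {
    // The regression called release_reserved_pushes(item.reserved_pushes)
    // here, freeing the reservation while the op was still in flight.
    queue.push_back(item);  // correct: keep the reservation held
  }

  void dequeue_one() {
    ItemSketch item = queue.front();
    queue.pop_front();
    // ... process the item ...
    release_reserved_pushes(item.reserved_pushes);  // release only here
  }
};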
Li Wang [Thu, 7 Dec 2017 14:03:45 +0000 (22:03 +0800)]
rbd-nbd: fix EBUSY when doing map
When doing an rbd-nbd map, if the Ceph service is not available,
the code will wait on rados.connect() until the process is killed.
In that case, the close_nbd logic is skipped and the NBD_CLEAR_SOCK ioctl
is not called. On the CentOS 7 kernel, this leaves nbd->file not cleared,
which causes subsequent map requests to return EBUSY. This patch fixes it
by connecting to Ceph first, prior to calling the NBD_SET_SOCK ioctl.
Fixes: http://tracker.ceph.com/issues/23528
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
(cherry picked from commit ab77dcc0170c0d63795fe0d50427cda630bfd593)
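A minimal sketch of the reordering, using the librados C API and the
kernel nbd ioctls (error handling trimmed; 'sock' is assumed to come from
the socketpair feeding the rbd-nbd server loop):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>
#include <rados/librados.h>

// Sketch: connect to the cluster *before* touching the nbd device, so a
// hung connect can no longer leave nbd->file set without a matching
// NBD_CLEAR_SOCK. Not the actual rbd-nbd code.
int map_sketch(const char* nbd_path, int sock)
{
  rados_t cluster;
  if (rados_create(&cluster, NULL) < 0)
    return -1;
  rados_conf_read_file(cluster, NULL);

  // If the Ceph service is unavailable we block (or fail) here, before
  // the kernel device has been handed a socket at all.
  if (rados_connect(cluster) < 0)
    return -1;

  int nbd = open(nbd_path, O_RDWR);
  if (nbd < 0)
    return -1;

  // Only now attach the socket to the nbd device.
  if (ioctl(nbd, NBD_SET_SOCK, sock) < 0)
    return -1;
  return nbd;
}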
Jan Fajerski [Sat, 17 Feb 2018 11:07:46 +0000 (12:07 +0100)]
pybind/mgr/prometheus: add Metrics class to manage Metric instances
The central change of this commit is that per-daemon metrics are now
managed by first appending the metric (using Metrics.append) to a
staging area. The metrics for specific paths (metric names) are then
overwritten by the staged metrics (by calling Metrics.reset). This gets
rid of metrics from daemons that are no longer in the cluster: when
ceph no longer reports metrics for an OSD daemon (because it was
removed from the cluster), the prometheus module will no longer export
metrics for that daemon.
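The module itself is Python; purely to illustrate the stage-then-replace
pattern, a C++ sketch with hypothetical names:

#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch: per-daemon values are appended to a staging area, and reset()
// replaces the published values for a metric path with the staged ones,
// so daemons that reported nothing this round simply disappear.
struct MetricsSketch {
  // metric path (name) -> one value per daemon reporting it this round
  std::map<std::string, std::vector<double>> published, staged;

  void append(const std::string& path, double value) {
    staged[path].push_back(value);
  }

  void reset(const std::string& path) {
    published[path] = std::move(staged[path]);  // overwrite, don't merge
    staged[path].clear();
  }
};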
Christopher Blum [Fri, 23 Feb 2018 17:48:49 +0000 (18:48 +0100)]
pybind/mgr/prometheus: don't crash on OSDs without metadata
Fix an issue where the ceph_exporter crashes after a Ceph upgrade with a broken OSD: that OSD was never online with Luminous, so we have no metadata for it.
Boris Ranto [Fri, 16 Feb 2018 17:45:58 +0000 (18:45 +0100)]
mgr/prometheus: Fix pg_* counts
Currently, the pg_* counts are not computed properly. We split the
current state by the '+' sign but do not add the pg count to the already
found pg count; instead, we overwrite any existing pg count with the new
count. This patch fixes it by adding all the pg counts together for all
the states.
It also introduces a new pg_total metric that shows the total count of
PGs.
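A minimal sketch of the corrected aggregation (the module is Python; C++
here purely for illustration):

#include <map>
#include <sstream>
#include <string>

// Sketch: a PG state string such as "active+clean" is split on '+' and
// each sub-state's count is *added* to the running total rather than
// overwriting it, so "active+clean" and "active+remapped" both
// contribute to the count behind pg_active. Also tallies pg_total.
void count_pg_states(const std::map<std::string, int>& pgs_by_state,
                     std::map<std::string, int>& counts, int& pg_total)
{
  pg_total = 0;
  for (const auto& [state, num] : pgs_by_state) {
    pg_total += num;
    std::istringstream ss(state);
    std::string sub;
    while (std::getline(ss, sub, '+'))
      counts[sub] += num;  // the bug was 'counts[sub] = num'
  }
}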
common: FreeBSD wants the correct struct selection for ipv6
Let's see if this also works for Linux.
Fixes: http://tracker.ceph.com/issues/21813
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
(cherry picked from commit 3806d8ec7c145d0239c94fc5b601d460b36757a5)
Mykola Golub [Thu, 29 Mar 2018 11:10:58 +0000 (14:10 +0300)]
qa/suites/rbd: set qemu task time_wait param
so that the qemu_dynamic_features.sh and qemu_rebuild_object_map.sh
workloads, which check whether qemu has finished with a periodicity of
60 seconds, have enough time to detect this before the rbd image is
removed.
Mykola Golub [Thu, 29 Mar 2018 11:06:13 +0000 (14:06 +0300)]
qa/tasks/qemu: add a parameter to let workloads detect that qemu finished
When a workload needs to detect that qemu has finished by running a
check with a periodicity of N seconds, it needs to set time_wait to
2 * N in order to avoid races on finish.
--use-wheel was deprecated in favor of --only-binary in pip v7.0.0, and
--use-wheel was removed in a recent release of pip. But some packages
are source packages, so we cannot simply replace --use-wheel with
--only-binary. A simpler approach is to drop the --use-wheel option:
pip respects --find-links and will find the required packages in the
wheelhouse.
Conflicts:
src/ceph-detect-init/CMakeLists.txt
src/ceph-disk/CMakeLists.txt: trivial resolution
src/pybind/mgr/dashboard/CMakeLists.txt: dashboard2 is not
in luminous, so drop this change.