Sage Weil [Tue, 19 Jun 2018 18:51:09 +0000 (13:51 -0500)]
Merge PR #22596 into master
* refs/pull/22596/head:
os/bluestore: use vector instead of set for zombies
os/bluestore: reuse zombie OpSequencers by collection id
qa/suites/rados/objecstore/backends/objectstore: capture coredumps
os/bluestore: more debug output
os/bluestore: print cnode from _open_collections
os/bluestore: print cnode on fsck
qa/suites/rados/objecstore: preserve data dir for ceph_test_objecstore
Sage Weil [Mon, 18 Jun 2018 12:32:08 +0000 (07:32 -0500)]
os/bluestore: reuse zombie OpSequencers by collection id
We can get a sequence that deletes and then recreates a collection where
the transaction removing the collection is delayed (due to pending IO on
its sequencer) but the collection create is not (new sequencer).
Avoid any such reordering by recycling the old collection's sequencer if
the zombie_osr has not been reaped yet.
Fixes: http://tracker.ceph.com/issues/24550
Signed-off-by: Sage Weil <sage@redhat.com>
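The recycling idea described above can be sketched roughly as follows. This is an illustration in Python, not BlueStore's C++ code: the class and method names (Store, create_collection, remove_collection, reap) are hypothetical, and only the notion of a zombie sequencer looked up by collection id comes from the commit message.

```python
class OpSequencer:
    """Stand-in for a per-collection ordering queue (hypothetical)."""

    def __init__(self, cid):
        self.cid = cid
        self.pending = []  # transactions still waiting on IO


class Store:
    """Toy object store that tracks live and zombie sequencers."""

    def __init__(self):
        self.live = {}     # cid -> sequencer of an existing collection
        self.zombies = {}  # cid -> sequencer of a removed, unreaped collection

    def create_collection(self, cid):
        # Reuse the old sequencer if it has not been reaped yet, so the
        # create queues behind any still-pending removal of the same cid.
        osr = self.zombies.pop(cid, None) or OpSequencer(cid)
        self.live[cid] = osr
        return osr

    def remove_collection(self, cid):
        osr = self.live.pop(cid)
        osr.pending.append(("remove", cid))
        self.zombies[cid] = osr
        return osr

    def reap(self):
        # Forget zombies whose pending work has drained.
        for cid in [c for c, o in self.zombies.items() if not o.pending]:
            del self.zombies[cid]
```

Because the recreated collection shares the sequencer that still holds the pending removal, the two operations cannot be reordered relative to each other.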
Broken by 434589a3206aafe94de5a3b95b67eddb2cfc3bdb. The add_ceph_unittest
helper does more than just add this to the list of tests--it also adjusts
linking and build options.
Sage Weil [Fri, 15 Jun 2018 19:05:20 +0000 (14:05 -0500)]
mon: destroy-new -> purge-new
What we actually want is a purge, not a destroy. Destroy leaves the OSD
ID in use and allows it to be recreated. What ceph-volume wants is to
purge all trace of the failed OSD setup.
Volker Theile [Tue, 5 Jun 2018 10:03:16 +0000 (12:03 +0200)]
mgr/dashboard: Get user ID via RGW Admin Ops API.
The RGW API user ID (set via 'ceph dashboard set-rgw-api-user-id <xxx>') is optional, but the user ID is required internally in some situations. If it is not configured via the CLI, it is therefore requested via an RGW Admin Ops API call.
Patrick Donnelly [Fri, 15 Jun 2018 14:05:40 +0000 (07:05 -0700)]
Merge PR #22464 into master
* refs/pull/22464/head:
mds: print dir decay counters on hit
DecayCounter: removed unused velocity
DecayCounter: remove unnecessary delta member
mds: use monotonic time for DecayCounter
This commit has a few side-effects:
- Decaying the DecayCounter is more accurate: we no longer need to "skip" decaying
the counter if it's been less than a second since the last decay. The time
delta is now at the granularity of the monotonic clock.
- Any check of the DecayCounter results in updating the current value, even
const checks.
- DecayRate is now established when the DecayCounter is created. There's no longer
a need to pass the DecayRate whenever interacting with the counter.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
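The side effects listed above can be illustrated with a small sketch. It is written in Python and is not Ceph's DecayCounter: the half-life parameterisation, the injectable clock and the method names are assumptions made for the example.

```python
import math
import time


class DecayCounter:
    """Exponentially decaying counter (illustrative, not Ceph's class)."""

    def __init__(self, half_life, clock=time.monotonic):
        # The rate is fixed at construction, so callers never pass it again.
        self._k = math.log(0.5) / half_life
        self._clock = clock
        self._val = 0.0
        self._last = clock()

    def _decay(self):
        # Decay by the exact elapsed monotonic time; there is no
        # "skip if less than a second has passed" shortcut.
        now = self._clock()
        delta = now - self._last
        if delta > 0:
            self._val *= math.exp(delta * self._k)
            self._last = now

    def hit(self, amount=1.0):
        self._decay()
        self._val += amount
        return self._val

    def get(self):
        # Even a read updates the stored value first.
        self._decay()
        return self._val
```

With a half-life of 10 seconds, a counter hit with 8.0 reads back 4.0 ten seconds later, and a read half a second after that yields a slightly smaller value rather than the stale one.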
Erwan Velu [Fri, 15 Jun 2018 13:57:05 +0000 (15:57 +0200)]
ctest: Removing unittest_alloc_bench
unittest_alloc_bench is very CPU-intensive and can take up to 20 minutes to
run. According to the original author of this code, the test only
measures performance and contains no validation code.
Removing it saves a lot of time for people who run make check often, and
for the CI itself, without reducing test coverage.
This commit only removes the test from make check; the binary is still
compiled for those who want to run it manually.
Kefu Chai [Fri, 15 Jun 2018 05:54:31 +0000 (13:54 +0800)]
cmake: add WITH_GTEST_PARALLEL option
and remove the src/test/gtest-parallel submodule. gtest-parallel is only
useful for running tests, and not all end users are interested in
running tests, let alone running them in parallel. To avoid
including the gtest-parallel scripts in the dist tarball, it is better to
make it optional and an external project.
Erwan Velu [Thu, 14 Jun 2018 13:24:07 +0000 (15:24 +0200)]
src/test: Using gtest-parallel to speedup unittests
Unit tests are run sequentially and can take a long time.
This commit uses gtest-parallel on some of them that are
known to be very slow because of this sequential execution.
To enable parallel execution, just add the 'parallel' argument to the
add_ceph_unittest() call, as in:
-add_ceph_unittest(unittest_throttle)
+add_ceph_unittest(unittest_throttle parallel)
This commit impacts the following tests (times in seconds):
Test name                       Before   After
unittest_erasure_code_shec_all     212      43
unittest_throttle                   15       5
unittest_crush                       9       6
unittest_rbd_mirror                 79      21
Total                              315      75
This commit saves 240 seconds (4 minutes) per build.
Note that several other long tests exist, but they can't be parallelized
because their subtests have explicit ordering dependencies.
Those stay sequential.
Patrick Nawracay [Tue, 15 May 2018 07:47:19 +0000 (09:47 +0200)]
mgr/dashboard/backend: Enable get/set of cluster-wide OSD settings
Add ability to list, set and unset cluster-wide OSD flags.
Flags can be listed and changed through the `/api/osd/flags` API
resource. By using a GET request, the list is retrieved. By using a PUT
request, the flags are updated (all at once). Flags not contained in the
data of the PUT are removed, and additional ones are added. Note that the
PUT requests require a JSON body with the data contained as value of the
'flags' key like so:
{"flags": ["flag1", "flag2", ...]}
Fixes: http://tracker.ceph.com/issues/24056
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
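The replace-all behaviour of the PUT request can be sketched as below. This is a client-side illustration only: the helper names build_flags_request and diff_flags are hypothetical, nothing is sent over the network, and the flag names in the usage note are just examples.

```python
import json


def build_flags_request(desired_flags):
    """Return (path, body) for a PUT that replaces the whole flag set."""
    body = json.dumps({"flags": sorted(set(desired_flags))})
    return "/api/osd/flags", body


def diff_flags(current_flags, desired_flags):
    """Show which flags a replace-all PUT would add and which it would remove."""
    current, desired = set(current_flags), set(desired_flags)
    return {
        "added": sorted(desired - current),
        "removed": sorted(current - desired),
    }
```

For example, if the cluster currently has noout and nodown set and the PUT body lists noout and noscrub, diff_flags reports noscrub as added and nodown as removed, which is why a client should always send the complete desired list.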
Lenz Grimmer [Thu, 14 Jun 2018 13:56:39 +0000 (15:56 +0200)]
Merge pull request #22303 from ricardoasmarques/wip-help-menu
mgr/dashboard: Add help menu entry
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Patrick Nawracay [Fri, 18 May 2018 07:38:20 +0000 (09:38 +0200)]
mgr/dashboard: Add token authentication to Grafana proxy
Enables token authentication for the Grafana proxy as an additional option
to username/password authentication. The authentication method has to be
set, too.
Erwan Velu [Wed, 13 Jun 2018 12:48:35 +0000 (14:48 +0200)]
tests: Protecting rados bench against endless loop
If the cluster dies during the rados bench, the maximum running time is
no longer honored and all emitted aios stay pending.
rados bench never quits, and the global testing timeout (3600 sec: 1
hour) has to be reached to get a failure.
This situation is harmful for a background test or a CI run, as it locks
the whole job for too long waiting for an event that will never occur.
The ideal solution would be for 'rados bench' to report a failure
once the timeout is reached while aios are pending.
A possible workaround is to use the system command 'timeout'
when calling rados bench, and to fail if rados didn't complete in time.
To avoid side effects, this patch doubles the rados timeout. If rados
hasn't completed after twice the expected time, it has to fail to avoid
locking the whole testing job.
Please find below the way it worked on a real test case.
We can see no IO after t>2, but despite timeout=4 the bench continues.
Thanks to this patch, the bench is stopped at t=8 and returns 1.
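The doubling workaround can be sketched as follows. This is an illustration under assumptions, not the patch itself: the patch relies on the system 'timeout' command, whereas this Python sketch swaps in the timeout argument of subprocess.run, and run_with_deadline is a hypothetical name.

```python
import subprocess
import sys


def run_with_deadline(cmd, expected_seconds):
    """Run cmd, giving up after twice the expected duration.

    Returns the command's exit status, or 1 if the deadline was hit.
    """
    try:
        return subprocess.run(cmd, timeout=2 * expected_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return 1
```

A call such as run_with_deadline(["rados", "-p", pool, "bench", "4", "write"], 4), where pool is whatever pool the test uses, would give up at t=8 and report 1, matching the behaviour described above.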
Erwan Velu [Wed, 13 Jun 2018 12:25:04 +0000 (14:25 +0200)]
qa/standalone/ceph-helpers.sh: Defining custom timeout for wait_for_clean()
wait_for_clean() uses the default timeout of 300 sec (5 min).
wait_for_clean() tries to reach a clean status within that timeout
_or_ resets its counter if any progress was made between loops.
When the cluster is sane, recovery should take less
than 5 min, but if the cluster died, waiting 5 min for nothing is
inefficient.
This patch defines a custom timeout for wait_for_clean() so that it does
not wait much more than 1m30 (90 sec). If no progress is made in that
period, there is very little chance the cluster will reach a valid state
anyhow.
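The wait logic described above (time out only when no progress is made) can be sketched like this. The real helper is a shell function in ceph-helpers.sh; this Python version is only an illustration, and the get_status callback, its (is_clean, progress_marker) return value and the injectable clock are assumptions.

```python
import time


def wait_for_clean(get_status, timeout=90, delay=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until get_status() reports clean, or until `timeout`
    seconds pass without any progress.

    get_status() must return a tuple (is_clean, progress_marker).
    """
    deadline = clock() + timeout
    last_marker = None
    while True:
        is_clean, marker = get_status()
        if is_clean:
            return True
        if marker != last_marker:
            # Progress was made: restart the countdown.
            last_marker = marker
            deadline = clock() + timeout
        elif clock() >= deadline:
            return False
        sleep(delay)
```

A cluster that keeps making progress can therefore take longer than 90 seconds in total, while a dead cluster fails after roughly one timeout period.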
Kefu Chai [Thu, 14 Jun 2018 01:32:08 +0000 (09:32 +0800)]
cmake: update BuildSPDK for spdk-18.05
In spdk v18.05, libuuid is linked by libspdk_util.a, where
it is used by lib/util/uuid.c, and libspdk_vol.a uses the wrapper
function exposed by libspdk_util.a. Update the CMake script to
reflect the change.