Sage Weil [Wed, 19 Sep 2018 12:33:32 +0000 (07:33 -0500)]
Merge PR #24109 into master
* refs/pull/24109/head:
doc: update docs for device management
mgr: make devicehealth always-on
mgr/devicehealth: do not create metrics pool on get-device-metrics
mgr/devicehealth: converge OPTIONS and DEFAULTS
mgr/devicehealth: squelch health warnings for unused devices
mgr/devicehealth: show-health-metrics -> get-health-metrics
Sage Weil [Tue, 18 Sep 2018 12:10:21 +0000 (07:10 -0500)]
Merge PR #24057 into master
* refs/pull/24057/head:
src/common: add a unit test (bufferlist.sha1())
osd, src/common: return sha1 value if zero-length buffer.
src/common/buffer.cc: remove unnecessary copy in sha1()
Reviewed-by: Kefu Chai <kchai@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
Boris Ranto [Fri, 14 Sep 2018 10:03:23 +0000 (12:03 +0200)]
mgr/dashboard: Do not require cert for http
The ceph dashboard currently requires a SSL certificate even if it is
not running in the SSL mode since it is always querying for the
certificate file/key pair.
This patch fixes the behaviour by querying for the certificate file/key
only if it is running in the SSL mode.
Fixes: http://tracker.ceph.com/issues/36069 Signed-off-by: Boris Ranto <branto@redhat.com>
common: make the get_data() of buffer_raw interface final.
This is just to ensure the just dropped buffer::raw_pipe
was the solely user of this facility. After successful
validation, we can drop `virtual` on the method entirely.
This module is written by Rick Chen <rick.chen@prophetstor.com> and
provides both a built-in local predictor and a cloud mode that queries
a cloud service (provided by ProphetStor) to predict device failures.
Signed-off-by: Rick Chen <rick.chen@prophetstor.com> Signed-off-by: Sage Weil <sage@redhat.com>
mgr/dashboard: Refactoring of `DeletionModalComponent`
- Simpler variable names:
Examples:
- `actionDescription` and `itemDescription` instead of `metaType`
- `bodyTemplate` instead of `description`
- `validationPattern` instead of `pattern`
Some of these variable names have been generalized to ease the
unification/generalization of dialog components:
- `submitAction` instead of `deletionMethod`
- Removed unique `setUp` method.
Benefits:
- Creation of the component is done as intended by the developers of
the `ngx-boostrap` package and as expected by developers which use
the package. The `setUp` method does not have to be called anymore
on the `DeletionModalComponent` exclusively but instead the
component is instantiated as all other modals. Property assignment
on the instantiated object isn't handled by the `setUp` method
anymore but by the `modalService`.
- With the removal of the `setUp` method, some tests could be
removed as well.
- No need to pass the reference of the created modal to the modal
manually.
Preserved:
- The provided check within the `setUp` method, which checked if the
component had been correctly instantiated, has been moved to the
`ngOnInit` method of the component.
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
make sure we only build with the higher version of gperftools on
distros where both 2.4 and 2.6.1 are packaged. see
https://git.centos.org/summary/rpms!gperftools.git . at the time of
writing, gperftools 2.6.1 is packaged for CentOS/RHEL 7, if gperftools
(>= 2.4) is required by Ceph, and user already has this version
installed, when new Ceph packages are installed, the updated gperftools
2.6.1 version won't be installed as a dependency. when launching
Ceph compiled with tcmalloc enabled, we will have
symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm
so, by bumping up the required version of gperftools, the updated
gperftools will be installed.
see https://software.opensuse.org/package/gperftools, openSUSE/SLE offer
2.5. so they are safe at this moment.
Jason Dillaman [Fri, 14 Sep 2018 15:46:13 +0000 (11:46 -0400)]
librbd: do not invalidate object map when attempting to delete non-existent snapshot
If duplicate snapshot remove requests are received by the lock owner from a peer
client, the first request will remove the object map. If the second request
arrives while the first is in-progress, it will again attempt to remove the
object map but fail to load it since it's already been deleted. This incorrectly
results in the next object map being flagged as invalid.
Fixes: http://tracker.ceph.com/issues/24516 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This brings down the static size of the memory used by the logging infrastructure:
If we used 1024, we'd have 1088*10000 = 10880000 = 10MB in use by the ring
buffer and 2*1088*100 = 2*108800 = 2*106KB for the m_new and m_flush
vectors.
In my testing, 1024 covers most log entries.
Note, I've kept 4096 for the StackStringStream via MutableEntry as these are
already allocated on the heap and cached in a thread local vector. Generally
there should only be about a dozen of these allocated so it's worth keeping a
larger buffer.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 23 Aug 2018 17:40:20 +0000 (10:40 -0700)]
log: avoid heap allocations for most log entries
Each log Entry now exists on the stack and uses a large (4k) buffer for its log
stream. This Entry is std::move'd to the queues (std::vector and
boost::circular_buffer) in the Log, involving only memory copies in the general
case. There are two memory copies (std::move) for any given Entry, once in
Log::submit_entry and again in Log::_flush
In practice, this eliminates 100% of allocations outside of startup
allocations
I've run a simple experiment with the MDS that copies /usr/bin to CephFS. I got
measurements for the number of allocations from the heap profiler and the
profile of CPU usage in the MDS.
$ bin/unittest_log
[==========] Running 15 tests from 1 test case
[----------] Global test environment set-up
[----------] 15 tests from Log
[ RUN ] Log.Simple
[ OK ] Log.Simple (0 ms)
[ RUN ] Log.ReuseBad
[ OK ] Log.ReuseBad (1 ms)
[ RUN ] Log.ManyNoGather
[ OK ] Log.ManyNoGather (0 ms)
[ RUN ] Log.ManyGatherLog
[ OK ] Log.ManyGatherLog (12 ms)
[ RUN ] Log.ManyGatherLogStringAssign
[ OK ] Log.ManyGatherLogStringAssign (27 ms)
[ RUN ] Log.ManyGatherLogStringAssignWithReserve
[ OK ] Log.ManyGatherLogStringAssignWithReserve (27 ms)
[ RUN ] Log.ManyGatherLogPrebuf
[ OK ] Log.ManyGatherLogPrebuf (15 ms)
[ RUN ] Log.ManyGatherLogPrebufOverflow
[ OK ] Log.ManyGatherLogPrebufOverflow (15 ms)
[ RUN ] Log.ManyGather
[ OK ] Log.ManyGather (8 ms)
[ RUN ] Log.InternalSegv
[WARNING] /home/pdonnell/cephfs-shell/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads
[ OK ] Log.InternalSegv (8 ms)
[ RUN ] Log.LargeLog
[ OK ] Log.LargeLog (43 ms)
[ RUN ] Log.TimeSwitch
[ OK ] Log.TimeSwitch (1 ms)
[ RUN ] Log.TimeFormat
[ OK ] Log.TimeFormat (0 ms)
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1779 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (64 ms)
[----------] 15 tests from Log (2000 ms total)
[----------] Global test environment tear-down
[==========] 15 tests from 1 test case ran. (2000 ms total)
[ PASSED ] 15 tests
** After Patch **
The StackStreamBuf uses 4k for its default buffer. This appears to be more than
reasonable for preventing allocations for logging
$ bin/unittest_log
[==========] Running 13 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 13 tests from Log
[ RUN ] Log.Simple
[ OK ] Log.Simple (1 ms)
[ RUN ] Log.ReuseBad
[ OK ] Log.ReuseBad (0 ms)
[ RUN ] Log.ManyNoGather
[ OK ] Log.ManyNoGather (0 ms)
[ RUN ] Log.ManyGatherLog
[ OK ] Log.ManyGatherLog (83 ms)
[ RUN ] Log.ManyGatherLogStringAssign
[ OK ] Log.ManyGatherLogStringAssign (79 ms)
[ RUN ] Log.ManyGatherLogStackSpillover
[ OK ] Log.ManyGatherLogStackSpillover (81 ms)
[ RUN ] Log.ManyGather
[ OK ] Log.ManyGather (80 ms)
[ RUN ] Log.InternalSegv
[WARNING] /home/pdonnell/ceph/src/googletest/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads.
[ OK ] Log.InternalSegv (7 ms)
[ RUN ] Log.LargeLog
[ OK ] Log.LargeLog (55 ms)
[ RUN ] Log.TimeSwitch
[ OK ] Log.TimeSwitch (4 ms)
[ RUN ] Log.TimeFormat
[ OK ] Log.TimeFormat (1 ms)
[ RUN ] Log.Speed_gather
[ OK ] Log.Speed_gather (1441 ms)
[ RUN ] Log.Speed_nogather
[ OK ] Log.Speed_nogather (63 ms)
[----------] 13 tests from Log (1895 ms total)
[----------] Global test environment tear-down
[==========] 13 tests from 1 test case ran. (1895 ms total)
[ PASSED ] 13 tests.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
crimson/net: don't return null from Connection::read_message()
SocketConnection::read_message() now loops until it has a message with
valid sequence number. this means SocketMessenger::dispatch() doesn't
have to handle the null message case
The AsyncConnection keeps local (member variable) bufferlists of incoming
messages before they're placed into the Message's front/data/middle buffers.
Previously these were reset only when a new Message is being received, which
means in steady state we store a full Message for every Connection even if
it's inactive!
Instead we obviously want to drop our local references to Message state
once it's been dispatched, so that it can go away.
Sage Weil [Fri, 14 Sep 2018 14:00:18 +0000 (09:00 -0500)]
mgr/devicehealth: squelch health warnings for unused devices
We may also want to do the same for OSDs that are out and fully offloaded,
but that can wait until those OSDs go into a special "drive device to
failure" mode.
read_tags_until_next_message() will forward the ready future and create
a new promise for on_message, which assumes there is already a
read_message() holding the previous promise, but it is not true.
__project_pg_history__ does an inverse traverse of the series
of osdmaps passed in to get a pg's pg_history_t filled, which
can become super inefficient if the osdmap list to check is
very long.
E.g., in one of our clusters, we've observed it took approximate
10s for a PG to finish it's projecting:
```
2018-08-27 13:51:58.694823 7f1e1335a700 15 osd.9 823276 project_pg_history 34.6e9 from 821893 to 823276, start ec=380829/380829 l
is/c 820412/820412 les/c/f 820413/820413/0 821785/821785/821785
2018-08-27 13:52:08.634230 7f1e1335a700 15 osd.9 823276 project_pg_history 34.6e9 acting|up changed in 822265 from [57]/[57] 57/5
7 -> [58,57]/[58,57] 58/58
2018-08-27 13:52:08.634244 7f1e1335a700 15 osd.9 823276 project_pg_history 34.6e9 up changed in 822265 from [57] 57 -> [58,57] 58
2018-08-27 13:52:08.634248 7f1e1335a700 15 osd.9 823276 project_pg_history 34.6e9 primary changed in 822265
2018-08-27 13:52:08.634250 7f1e1335a700 15 osd.9 823276 project_pg_history end ec=380829/380829 lis/c 820412/820412 les/c/f 82041
3/820413/0 822265/822265/822265
```
Quote from Sage:
> let's just kill this off entirely, and make the handle_pg_query_nopg
reply unconditionally. Or, maybe, do a single sloppy check to see if
the primary has changed since the original epoch... if the osdmap
happens to be in cache... or not.
The querying end will discard the reply if it is out of date from it's
perspective, so it doesn't matter, and I suspect the overhead of doing
the check is larger than the overhead of sending a query reply that gets ignored.