Sage Weil [Thu, 7 Mar 2019 12:30:29 +0000 (06:30 -0600)]
Merge PR #26725 into master
* refs/pull/26725/head:
doc/releases/nautilus: ask users to opt in to telemetry
doc/mgr/telemtry: update docs
mgr/telemetry: drop config-set and config-show; add just show
mgr/telemetry: make 'telemetry show' readable by a human
mgr/telemetry: add 'telemetry on' and 'telemetry off' commands
mgr/telemetry: off by default
Reviewed-by: Wido den Hollander <wido@42on.com>
Reviewed-by: Dan Mick <dmick@redhat.com>
Sage Weil [Wed, 6 Mar 2019 23:40:48 +0000 (17:40 -0600)]
msg/async/ProtocolV1: fix locking around authorizer_buf
Fix two problems:
- we are accessing authorizer_buf without the connection lock, and
under the lock we are modifying it (in connect()).
- if we receive two connect_msg's with a different length, we won't
have a buffer that's large enough.
Fixes: http://tracker.ceph.com/issues/38524
Signed-off-by: Sage Weil <sage@redhat.com>
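The two problems above can be illustrated with a minimal sketch. This is not the ProtocolV1 code: the `Connection` class and `handle_connect_message` are hypothetical names, and Python's `threading.Lock` stands in for the connection lock. It only shows the general pattern of touching the shared buffer under the lock and sizing it for each incoming message.

```python
import threading


class Connection:
    """Hypothetical stand-in for a connection that owns an authorizer buffer."""

    def __init__(self):
        self.lock = threading.Lock()
        self.authorizer_buf = bytearray()

    def handle_connect_message(self, payload: bytes) -> bytes:
        # Take the connection lock before reading or writing the shared
        # buffer, so a concurrent connect() cannot modify it underneath us.
        with self.lock:
            # Size the buffer for this message instead of reusing whatever
            # length an earlier connect message happened to need.
            if len(self.authorizer_buf) != len(payload):
                self.authorizer_buf = bytearray(len(payload))
            self.authorizer_buf[:] = payload
            return bytes(self.authorizer_buf)
```

With this shape, a second connect message with a longer authorizer simply gets a larger buffer rather than overrunning the first one.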
Jeff Layton [Mon, 25 Feb 2019 14:21:08 +0000 (09:21 -0500)]
mgr/orchestrator: allow scaling the NFS server count up and down
Add a new 'ceph orchestrator nfs update' command that will take the
NFS clustername and a new count as arguments. That will get translated
to a StatelessServiceSpec and passed to update_stateless_service.
Also, add the necessary stubs to the test_orchestrator and the CLI
QA test.
Jeff Layton [Mon, 25 Feb 2019 14:27:02 +0000 (09:27 -0500)]
mgr/rook: allow scaling nfs count
Allow rook to handle scaling the NFS server count up and down in an NFS
cluster. We just manifest these changes as a change to the
spec.server.active field in the CRD.
Jeff Layton [Tue, 26 Feb 2019 19:53:45 +0000 (14:53 -0500)]
mgr/orchestrator: just keep a single count value in StatelessServiceSpec
We currently have min_size/max_size values in here, but we don't have
any orchestrators that can take advantage of two values. Let's just keep
a simple count for now, until we do.
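As a rough sketch of what a single-count spec looks like, assuming a plain dataclass: the class below is illustrative only and is not the actual StatelessServiceSpec from mgr/orchestrator, and the `scale` helper and the non-negative check are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StatelessServiceSpec:
    """Illustrative only; not the real class from mgr/orchestrator."""
    name: str       # e.g. the NFS cluster name
    count: int = 1  # the single desired number of service instances

    def __post_init__(self):
        # Assumed sanity check; the commit does not describe validation.
        if self.count < 0:
            raise ValueError("count must not be negative")


def scale(spec: StatelessServiceSpec, new_count: int) -> StatelessServiceSpec:
    # Hypothetical helper: return a new spec with only the count changed.
    return StatelessServiceSpec(name=spec.name, count=new_count)
```

A caller scaling a cluster named "mynfs" from one server to three would build `scale(StatelessServiceSpec("mynfs"), 3)` and hand the result to the orchestrator.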
xie xingguo [Wed, 6 Mar 2019 06:11:16 +0000 (14:11 +0800)]
osd/PrimaryLogPG: fix last_peering_reset checking on manifest flushing
```handle_manifest_flush``` is obviously using the wrong
**last_peering_reset** to check whether a new peering procedure
has been re-initialized by then.
Fix by using a different alias of the local copy of the
pg-wide **last_peering_reset** variable, which is less confusing and
error-prone.
xie xingguo [Tue, 5 Mar 2019 06:28:59 +0000 (14:28 +0800)]
mgr: 'osd df' by specified class or (crush) name
For large clusters, we use device classes to isolate storage pools.
The existing 'osd df' output turns out to be too noisy if, say, you
care about only a single storage pool whose osds may span all
hosts.
With this change you can now run 'osd df' by class (or by pool,
if you simply use classes to separate different pools), or by a specified
crush bucket name you are currently interested in, which is much more
convenient.
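The two kinds of filtering described above can be sketched on a toy tree. This is a simplified model, not the mgr implementation: `Node`, `filter_by_class` and `find_by_name` are hypothetical names, and leaves are assumed to be OSDs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    device_class: Optional[str] = None  # set on OSDs only
    children: List["Node"] = field(default_factory=list)


def filter_by_class(node: Node, device_class: str) -> Optional[Node]:
    """Keep OSDs of the given class plus the buckets that contain them."""
    if not node.children:  # leaf: treated as an OSD in this sketch
        return node if node.device_class == device_class else None
    kept = []
    for child in node.children:
        filtered = filter_by_class(child, device_class)
        if filtered is not None:
            kept.append(filtered)
    # Drop buckets that end up with no matching OSDs underneath them.
    return Node(node.name, None, kept) if kept else None


def find_by_name(node: Node, name: str) -> Optional[Node]:
    """Return the subtree rooted at the bucket (or OSD) with this name."""
    if node.name == name:
        return node
    for child in node.children:
        found = find_by_name(child, name)
        if found is not None:
            return found
    return None
```

Filtering by class keeps the enclosing buckets so the output still reads as a tree, while filtering by name returns only the subtree under the named bucket.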
Some examples:
```
$ bin/ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.05878 - 60 GiB 6.4 GiB 23 MiB 0 B 6 GiB 54 GiB 10.60 1.00 - root default
-3 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
4 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 58 up osd.4
5 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 60 up osd.5
-5 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph12
0 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 50 up osd.0
1 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 61 up osd.1
2 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 51 up osd.2
TOTAL 60 GiB 6.4 GiB 23 MiB 0 B 6 GiB 54 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
$ bin/ceph osd df tree class aaa
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.05878 - 20 GiB 2.1 GiB 7.8 MiB 0 B 2 GiB 18 GiB 10.60 1.00 - root default
-3 0.02939 - 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
-5 0.02939 - 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 - host ceph12
0 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 50 up osd.0
TOTAL 20 GiB 2.1 GiB 7.8 MiB 0 B 2 GiB 18 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
$ bin/ceph osd df tree name ceph11
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-3 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
4 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 58 up osd.4
5 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 60 up osd.5
TOTAL 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
```
Ilya Dryomov [Tue, 5 Mar 2019 22:07:27 +0000 (23:07 +0100)]
qa/suites/krbd/wac: bluestore snippet is placed incorrectly
Instead of generating three tests, each with bluestore-bitmap.yaml, it
generates four tests: one consisting of just bluestore-bitmap.yaml and
the other three without any trace of bluestore. This was introduced in
commit 711df71790fa ("qa: objectstore snippets for krbd").
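A simplified model of how a suite directory expands into tests shows how three-versus-four can arise. It assumes the usual convention that a directory marked with a `%` file is expanded as a cross product of its entries, while a plain directory contributes each entry as its own test. The exact layout of the wac suite is not given here, so the two trees and the `t1.yaml` to `t3.yaml` names below are hypothetical.

```python
from itertools import product


def build(node):
    """Expand a hypothetical suite tree into a list of tests.

    A test is a list of yaml snippet names. A node is either a snippet
    name (str) or a (kind, children) tuple, where kind is "product" for
    a directory with a '%' file and "union" for a plain directory.
    """
    if isinstance(node, str):
        return [[node]]
    kind, children = node
    expanded = [build(child) for child in children]
    if kind == "product":
        # One test per combination, merging the snippets of each choice.
        return [sum(combo, []) for combo in product(*expanded)]
    # Plain directory: every child contributes its own tests.
    return [test for tests in expanded for test in tests]


# Snippet sitting next to the tests in a plain directory.
misplaced = ("union", ["bluestore-bitmap.yaml", "t1.yaml", "t2.yaml", "t3.yaml"])

# Snippet as its own facet, combined with every test.
fixed = ("product", ["bluestore-bitmap.yaml", ("union", ["t1.yaml", "t2.yaml", "t3.yaml"])])
```

In this model `build(misplaced)` yields four tests, one of which is just the bluestore snippet, whereas `build(fixed)` yields three tests that each include it.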
If we are in the middle of replacing, we can not queue any further
write events into the old center because we may end up replacing
existing connection's center with a new one, and hence executing
the newly queued write events in the old thread.
See **transfer_existing** for a detailed description.
Also the patch does not make a lot of sense for the original issue
it tried to resolve, because **send_keepalive** is a pure noop if the
underlying connection is not ready, which is obviously true for the
case demonstrated in http://tracker.ceph.com/issues/38493.
Sage Weil [Mon, 4 Mar 2019 14:44:56 +0000 (08:44 -0600)]
Merge PR #26704 into master
* refs/pull/26704/head:
msg/async, v2: drop alloc_aligned_buffer().
msg/async, v2: introduce frame late abort facility.
Revert "msg/async, v2: move ceph_msg_header2 to last frame segment."
msg, msg/async, v2: introduce late message abort facility.
msg/async, v2: failure of msg decode doesn't block throtlles.
msg/async, v2: move ceph_msg_header2 to last frame segment.
fixup: use frame epilogue for crc32 integrity checking.
msg/async, v2: epilogue size is variable in secure mode.
msg/async, v2: drop support for the buggy rx_buffers mechanism.
Revert "msg/async, v2: add flags field to frame's epilogue."
msg/async, v2: add flags field to frame's epilogue.
msg/async, v2: drop onwire_segment_t as epilogue had derogated it.
msg, msg/async, v2: drop crc fields from ceph_msg_header2.
msg/async, v2: use frame epilogue for crc32 integrity checking.
msg/async, v2: clean the ProtocolV2::{front,middle,data} up.
msg/async, v2: clean the ProtocolV2::epilogue up.
msg/async, v2: move crypto processing to segment reader.
msg/async, v2: handle epilogue separately from payload/data.
msg/async, v2: dissect decryption from SignedEncryptedFrame.
msg/async, v2: unify WAIT frames with other payload frames.
msg/async, v2: implement epilogue handling in secure mode.
msg/async, v2: message frames are pre-dispatched now.
The libradosstriper::RadosStriperImpl::aio_read populates the target
outbl with a static buffer and relies on us reading into it. This was
actually not reliable in the past (it could fail if the rx_buffers
optimization failed due to a retransmit or something else) but nevertheless
libradosstriper requires it to work *at all*.
Resolve this by modifying Objecter to copy the result into any provided
buffer at the lowest layer. This should capture any other such user who
needed this behavior.
On the other hand, it will break any user who inadvertently reads into a
non-empty bufferlist. Given that any such user would already previously
have seen bad behavior due to the rx_buffers optimization, we expect
there to be no such instances.
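The copy-into-provided-buffer behavior can be sketched as follows. This is an analogy only: `deliver_read_result` is a hypothetical function, a `bytearray` stands in for a bufferlist, and truncating to the buffer's length is an assumption of the sketch rather than something the commit specifies.

```python
from typing import Optional


def deliver_read_result(result: bytes, outbuf: Optional[bytearray]) -> bytearray:
    """Hand a read result to the caller, honoring a caller-provided buffer."""
    if outbuf is None or len(outbuf) == 0:
        # No preallocated target: just return the received data.
        return bytearray(result)
    # The caller supplied a non-empty buffer: copy into it in place, so any
    # reference the caller already holds to that memory sees the data.
    n = min(len(result), len(outbuf))
    outbuf[:n] = result[:n]
    return outbuf
```

The same sketch shows the caveat: a caller that passes a non-empty buffer by accident has its existing contents overwritten.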
This test introduces a map gap. What *should* happen is that when there is
such a gap, we cannot import. Previously, the test didn't reliably produce
a map gap at all, and didn't check that import failed--it verified that it
passed.
Fix the test so that it reliably produces a gap *and* reports
min_last_epoch_clean to the mon so we can trim. Then verify we fail to
import, but can with --force. But remove the pg again, because if we
force an import with a map gap the osd will refuse to start.
Fixes: http://tracker.ceph.com/issues/38525
Signed-off-by: Sage Weil <sage@redhat.com>