qa/tasks/ceph_manager: use s/ByteIO/StringIO in stdout for ceph-objectstore-tool
wrt master, we have moved to using run_ceph_objectstore_tool which uses
StringIO for stdout and stderr, to make the changes compatible with
nautilus, replacing use of ByteIO with StringIO.
Rishabh Dave [Wed, 10 Jun 2020 09:55:53 +0000 (15:25 +0530)]
vstart_runner: set default values of stdout and stderr to None
Not doing so leads to tests run successfully with vstart_runner.py but
crash when triggered with teuthology since the default values of these
variables there is None.
Fixes: https://tracker.ceph.com/issues/45815 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Kefu Chai [Sun, 8 Mar 2020 06:00:53 +0000 (14:00 +0800)]
qa/tasks/ceph_manager: use StringIO for capturing COT output
there are couple factors we should consider when choosing between
BytesIO and StringIO:
- if the producer is producing binary
- if we are expecting binary
- if the layers in between them are doing the decoding/encoding
automatically.
in our case, the producer is either the ChannelFile instances returned
by paramiko.SSHClient or subprocess.CompletedProcess insances returned
by subprocess.run(). the former are file-like objects opened in "r" mode,
but their contents are decoded with utf-8 when reading if
ChannelFile.FLAG_BINARY is not specified. that's why we always try to
add this flag in orchestra/run.py when collecting the stdout and stderr
from paramiko.SSHClient after executing a command.
back in python2, this works just fine. as we don't differentiate bytes
from str by then.
but in python3, we have to make a decision. in the case of
ceph-objectstore-tool (COT for short), it does not produce binary and
we don't check its output with binary, so, if neither Remote.run() nor
LocalRemote.run() decodes/encodes for us, it's fine.
so it boils down to `copy_to_log()`:
i think we we should respect the consumer's expectation, and only decode
the output if a StringIO is passed in as stdout or stderr.
as we always log the output with logging we could either set
`ChannelFile.FLAG_BINARY` depending on the type of `capture` or not.
if it's not set, paramiko will return str (bytes) on python2, and str on
python3. if it's not set paramiko will return str (bytes) on python2,
and bytes on python3.
if there is non-ASCII in the output, logging will bail fail with
`UnicodeDecodeError` exception. and paramiko throws the same exception
when trying to decode for us if `ChannelFile.FLAG_BINARY` is not
specified.
so to ensure that we always have logging messages no matter if the
producer follows the rule of "use StringIO if you only emit text" or
not, we have to use `ChannelFile.FLAG_BINARY`, and force paramiko
to send us the bytes. but we still have the luxury to use StringIO
and do the decode when the caller asks for str explicitly. that'd save
the pain of using `str.decode()` or `six.ensure_str()` everywhere
even if we can assure that the program does not write binary.
Ilya Dryomov [Sat, 21 Nov 2020 10:01:40 +0000 (11:01 +0100)]
msg/async/ProtocolV2: allow rxbuf/txbuf get bigger in testing, again
With CEPHX_V2 authorizer challenges brought back in commit 4a82c72e3bdd, these need to be bumped again, as two authorizers
(without and then with the challenge) are transmitted and signed
instead of one (without the challenge). See commit 94953dd9398a
("msg/async/ProtocolV2: allow rxbuf/txbuf get bigger in testing")
for details.
Conflicts:
qa/workunits/cephtool/test.sh
- drop unrelated "# test elector" comment (elector test not backported)
- no "test_mon_priority_and_weight" function in nautilus
Conflicts:
PendingReleaseNotes - add release notes about this feature
qa/tasks/mgr/test_progress.py - replace helper functions that is neeeded
for dealing with more than 1 type of events
- remove `period` in wait_until_equal()
src/pybind/mgr/progress/module.py - remove code that deals with mypy in master
qa/suites/rados/singleton/all/pg-autoscaler-progress-off.yaml - remove log-ignorelist
mgr/dashboard: Display users current bucket quota usage Fixes: https://tracker.ceph.com/issues/45011 Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 4fabba0bb772d480dcddc83272c83e7714726fc1)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/osd/osd-list/osd-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/pool/pool-list/pool-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-bucket-list/rgw-bucket-list.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-bucket-list/rgw-bucket-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/components/usage-bar/usage-bar.component.ts
- Resolved conflicts due to variable name change and few other import conflicts.
The ceph-volume lvm batch --auto introduced by [1] breaks the backward
compatibility when using non rotational devices only (SSD and/or NVMe).
Those devices are reaffected as bluestore db or filestore journal
devices while we want them as data devices.
Kiefer Chang [Wed, 4 Nov 2020 03:00:21 +0000 (11:00 +0800)]
mgr/dashboard: disable cluster selection in NFS export editing form
We should not allow changing an export's cluster because an export ID
might live in one cluster but not in another one. Editing a non-existing
export in a cluster causes an error.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/nfs/nfs-form/nfs-form.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/nfs/nfs-list/nfs-list.component.spec.ts
- Some imports differ from master; remove SummaryService and CephReleaseNamePipe from providers
and no longer needed test case; also remove not working docsUrl (and related lines) from
nfs-form.component.ts
Kefu Chai [Thu, 25 Jun 2020 02:41:30 +0000 (10:41 +0800)]
mgr: avoid false alarm of MGR_MODULE_ERROR
mgr sends healthy report periodically, the report includes the
information whether the always-on modules are loaded or not. but the
modules are loaded with two steps:
1. load the options and command exposed by modules. the options and
commands are registered using static methods of the subclasss of
MgrModule.
2. create an instance of the subclass of MgrModule. this is performed
in background by a Finisher thread. upon finishing of the construction
of the instance, ActivePyModules::start_one() adds the module which
successfully creates the class to `modules`.
but there is chance that when mgr sends healthy report, the always-on
module is still creating its instance of MgrModule subclass, or that
task is still pending in the finisher thread. in that case, mgr would
add a false error message like
```
4 mgr modules have failed (MGR_MODULE_ERROR)
```
in the healthy report
in this change, the number of modules in pending state is tracked,
and mgr will not take the missing always-on modules into account unless
the number of pending modules is 0.
Dimitri Savineau [Mon, 26 Oct 2020 19:12:59 +0000 (15:12 -0400)]
ceph-volume: consume mount opt in simple activate
When running ceph-volume simple activate command on a Filestore OSD
then the data device is mounted without any specific options so the
one from the ceph configuration file are ignored.
When deploying Filestore with the lvm subcommand then everything is
fine because the filestore_activate method uses mount_osd which relies
on the mount options defined in the ceph configuration file (if any).
Ilya Dryomov [Fri, 16 Oct 2020 10:57:50 +0000 (12:57 +0200)]
mon/MonClient: bring back CEPHX_V2 authorizer challenges
Commit c58c5754dfd2 ("msg/async/ProtocolV1: use AuthServer and
AuthClient") introduced a backwards compatibility issue into msgr1.
To fix it, commit 321548010578 ("mon/MonClient: skip CEPHX_V2
challenge if client doesn't support it") set out to skip authorizer
challenges for peers that don't support CEPHX_V2. However, it
made it so that authorizer challenges are skipped for all peers in
both msgr1 and msgr2 cases, effectively disabling the protection
against replay attacks that was put in place in commit f80b848d3f83
("auth/cephx: add authorizer challenge", CVE-2018-1128).
This is because con->get_features() always returns 0 at that
point. In msgr1 case, the peer shares its features along with the
authorizer, but while they are available in connect_msg.features they
aren't assigned to con until ProtocolV1::open(). In msgr2 case, the
peer doesn't share its features until much later (in CLIENT_IDENT
frame, i.e. after the authentication phase). The result is that
!CEPHX_V2 branch is taken in all cases and replay attack protection
is lost.
Only clusters with cephx_service_require_version set to 2 on the
service daemons would not be silently downgraded. But, since the
default is 1 and there are no reports of looping on BADAUTHORIZER
faults, I'm pretty sure that no one has ever done that. Note that
cephx_require_version set to 2 would have no effect even though it
is supposed to be stronger than cephx_service_require_version
because MonClient::handle_auth_request() didn't check it.
To fix:
- for msgr1, check connect_msg.features (as was done before commit c58c5754dfd2) and challenge if CEPHX_V2 is supported. Together
with two preceding patches that resurrect proper cephx_* option
handling in msgr1, this covers both "I want old clients to work"
and "I wish to require better authentication" use cases.
- for msgr2, don't check anything and always challenge. CEPHX_V2
predates msgr2, anyone speaking msgr2 must support it.
Conflicts:
src/msg/async/ProtocolV1.cc [ commit c58c5754dfd2
("msg/async/ProtocolV1: use AuthServer and AuthClient") not
in nautilus. This means that only msgr2 is affected, so drop
ProtocolV1.cc hunk. As a result, skip_authorizer_challenge is
never set, but this is fine because msgr1 still uses old ms_*
auth methods and tests CEPHX_V2 appropriately. ]
This was added in commit 9bcbc2a3621f ("mon,msg: implement
cephx_*_require_version options") and inadvertently dropped in
commit e6f043f7d2dc ("msgr/async: huge refactoring of protocol V1").
As a result, service daemons don't enforce cephx_require_version
and cephx_cluster_require_version options and connections without
CEPH_FEATURE_CEPHX_V2 are allowed through.
(cephx_service_require_version enforcement was brought back a
year later in commit 321548010578 ("mon/MonClient: skip CEPHX_V2
challenge if client doesn't support it"), although the peer gets
TAG_BADAUTHORIZER instead of TAG_FEATURES.)
Resurrect the original behaviour: all cephx_*require_version
options are enforced and the peer gets TAG_FEATURES, signifying
that it is missing a required feature.
Ilya Dryomov [Fri, 16 Oct 2020 09:33:32 +0000 (11:33 +0200)]
msg/async/ProtocolV1: resurrect "include MGR as service when applying cephx settings"
This was added in commit 0ec7d6bbc4af ("msg/async,simple: include MGR
as service when applying cephx settings") and inadvertently dropped in
commit e6f043f7d2dc ("msgr/async: huge refactoring of protocol V1").
As a result, mgr daemons are miscategorized as clients when enforcing
cephx_*require_signatures options.