Sage Weil [Thu, 12 Mar 2020 17:29:14 +0000 (12:29 -0500)]
Merge PR #33064 into octopus
* refs/pull/33064/head:
cephadm: add version to `command_ls` output
cephadm: add type checking to `update_firewalld`
cephadm: allow prepare-host to start an enabled service
cephadm: add type checking for `check_host` and `prepare_host`
cephadm: generalize logic for checking and enabling units
cephadm: add 'CEPH_CONF' to the NFS ganesha container envs
cephadm: trim nfs.json sample
qa/workunits/cephadm/test_cephadm.sh: systemctl stop nfs-server
qa/workunits/cephadm/test_cephadm.sh: make pgs available
cephadm: add some log lines
cephadm: check port in use
cephadm: add/remove nfs ganesha grace
cephadm: update firewalld with nfs service
qa/workunits/cephadm/test_cephadm.sh: add nfs-ganesha test
cephadm: add ganesha.conf
cephadm: add NFSGanesha deployment type
cephadm: consolidate list of supported daemons
cephadm: use keyword instead of positional args
Michael Fritch [Wed, 4 Mar 2020 22:30:03 +0000 (15:30 -0700)]
cephadm: add type checking to `update_firewalld`
fixes mypy errors:
cephadm:1682: error: Incompatible types in assignment (expression has type "str", variable has type "int")
cephadm:1683: error: List item 3 has incompatible type "int"; expected "str"
cephadm:1686: error: List item 3 has incompatible type "int"; expected "str"
Found 3 errors in 1 file (checked 1 source file)
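These errors are the usual result of a port number held as an `int` being reused for its string form and then placed into a list of `str` arguments. A minimal sketch of keeping the two types apart, assuming a hypothetical helper `firewalld_port_args` (this is not the actual cephadm code, and the command is only built, never executed):

```python
from typing import List


def firewalld_port_args(port: int, proto: str = 'tcp') -> List[str]:
    """Build an argument list for opening one port (hypothetical helper)."""
    # Keep the integer port and its "port/proto" string in separate
    # variables so that a str is never assigned to an int-typed name.
    tcp_port = '%d/%s' % (port, proto)
    # Every list item is a str, which is what List[str] requires.
    return ['firewall-cmd', '--permanent', '--add-port', tcp_port]
```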
Lenz Grimmer [Thu, 12 Mar 2020 13:37:30 +0000 (14:37 +0100)]
mgr/dashboard: Updated octopus image on 404 page
Replaced the image of the Nautilus octopus with another octopus
in preparation for the "Octopus" release.
The image was taken from Museums Victoria
(https://collections.museumvictoria.com.au/species/8696) and is
licensed under the Creative Commons "Attribution 4.0 International"
(CC BY 4.0) license.
Deleted older, now obsolete images from the assets directory.
Sage Weil [Thu, 12 Mar 2020 03:39:32 +0000 (22:39 -0500)]
Merge PR #33817 into octopus
* refs/pull/33817/head:
mgr/dashboard: Adapt tests to new DriveGroupSpec
fixup mgr/test_orchestrator: validate drive group matches anything.
mgr/orch: CLI: No Tracebacks for ServiceSpecValidationError
mgr/test_orchestrator: validate drive group matches anything.
python-common: don't run flake8 on tests.
python-common: Add support for legacy serialization format for Drive Groups
doc: Move ServiceSpec to python-common
python-common: Add `host_pattern` to `PlacementSpec.from_string()`
cephadm: add host_pattern to supported scheduling
python-common: Joined ServiceSpec and DriveGroupSpec from_json()
python-common: Make DriveGroupSpec a sub type of ServiceSpec
pybind/mgr: Move ServiceSpec to python-common: Fix imports
python-common, orch: Move ServiceSpec to python-common: Fix imports
python-common, orch: Move ServiceSpec tests to python-common
python-common: Move ServiceSpec to python-common: fix linting
python-common, orch: Move ServiceSpec (+deps) to python-common
Sage Weil [Fri, 6 Mar 2020 23:43:33 +0000 (17:43 -0600)]
cephadm: update unit.* atomically
Some of these are run as bash scripts, which means that updating them
can lead to the running bash picking up at a weird position mid-script
when it goes to the next command. This produces weird errors like
bash[9321]: /var/lib/ceph/f1758250-639e-11ea-9a42-001a4aab830c/mon.c/unit.run: line 2: -to-stderr=true: command not found
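A common way to get this kind of atomic update is to write the new content to a temporary file in the same directory and then rename it over the target. A running bash keeps reading the old file through its open descriptor, while new invocations see the complete new file. A minimal sketch, assuming a hypothetical helper `write_atomically` (not necessarily how cephadm implements it):

```python
import os
import tempfile


def write_atomically(path: str, content: str, mode: int = 0o600) -> None:
    """Replace path so readers see either the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_path, mode)
        # os.replace() renames over an existing file in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temporary file on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```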
Jason Dillaman [Wed, 11 Mar 2020 19:11:10 +0000 (15:11 -0400)]
qa/workunits/rbd: wait for nbd map to close after unmap
The unmap action only sends a signal to the kernel to notify the
rbd-nbd daemon to disconnect. Therefore, it's possible that an
unmap followed by an immediate re-map to the same device might
fail since the unmap is still in-progress.
Fixes: https://tracker.ceph.com/issues/44567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
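Because the unmap is asynchronous, the caller has to poll until the device is really gone before re-mapping. A minimal sketch of such a wait loop, assuming a hypothetical `wait_for` helper; the predicate that checks whether the device is still mapped is left to the caller and is not shown (the actual workunit is a shell script):

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 30.0,
             interval: float = 1.0) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # Give up; the caller decides how to report the failure.
            return False
        time.sleep(interval)
```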
Ernesto Puerta [Wed, 4 Mar 2020 20:31:50 +0000 (21:31 +0100)]
mgr/dashboard: add feature toggle for NFS
- New NFS feature toggle added.
- Fixed regression which broke FeatureToggles in the main menu.
- Added extensive unit testing to NavigationComponent
- Added ng-mocks to improve test isolation and performance (~7x, 139s to
20s)
- Added ng-bullet package to improve unit testing performance (2x, 20s
to 9s)
- Added Rxjs Schedulers to implement NgZone.runOutsideAngular in a Rxjs
friendly way (based on
https://stackoverflow.com/questions/43121400/run-ngrx-effect-outside-of-angulars-zone-to-prevent-timeout-in-protractor)
Minor issues found from exhaustive unit testing:
- Missing permissions in Cluster menu:
- `permissions.log.read` and `permissions.prometheus.read`
- Missing classes:
- Block -> Images: `tc_submenuitem_block_images`
- Block -> iSCSI: `tc_submenuitem_block_iscsi`
- Typos:
- class `tc_submenuitem_hosts` assigned to OSD menu (instead of
`tc_submenuitem_osd`)
- `tc_menuitem_cephs` -> `tc_menuitem_cephfs`
Minor changes:
- Previously, Cluster Map -> CRUSH Map required both OSD and Host
permissions. Now it only requires OSD permissions (there are no
references to hosts in the CRUSH Map page). Nevertheless, all system roles
that set the OSD permission also set the Host permission, so no impact is expected.
- Previously, the Cluster -> Monitoring menu was hidden when neither Prometheus
nor Alertmanager was configured. Now it is always displayed and, when clicked,
shows the helper banner indicating that further configuration is
required. This change removes the dependency on the PrometheusService.
Sage Weil [Wed, 11 Mar 2020 13:55:51 +0000 (08:55 -0500)]
Merge PR #33830 into octopus
* refs/pull/33830/head:
qa/tasks/cephadm: no default mon|mgr|crash service specs
qa/suites/rados/cephadm/upgrade: upgrade start point that supports the no-spec option
cephadm: create initial mon and mgr service specs too
cephadm: no need to pregenerate a crash key for the bootstrap host
mgr/cephadm: do not complain when we don't have enough hosts
mgr/cephadm: remove orphan daemons
mgr/cephadm: report size=0 for fabricated ServiceDescription
mgr/cephadm: safety check to prevent removing all mon|mgr daemons
mgr/cephadm: prevent scaling mon|mgr below count=1
mgr/cephadm: do not remove daemons from remove_service
Sage Weil [Wed, 11 Mar 2020 12:12:11 +0000 (07:12 -0500)]
Merge PR #33620 into octopus
* refs/pull/33620/head:
mgr/dashboard: Crush rule modal
mgr/dashboard: Preserve rule selection on pool type change
mgr/dashboard: Crush rule is only sent during replicated pool creation
mgr/dashboard: Explicit returns in pool form
mgr/dashboard: Removes fork join in pool form
mgr/dashboard: Hide ECP actions during ec pool edit
mgr/dashboard: Pool form erasure/replicated boolean
mgr/dashboard: Change pool info API endpoint
mgr/dashboard: Moves ECP info endpoint to UI-API
The rados api tests are failing WatchNotify because the OSDs are so
heavily lagged, in large part due to the high debug levels of debug_ms=20
and debug_osd=25. Reduce them.
Also increase the heartbeat grace so slow valgrind-y osds don't get marked
down.
Kefu Chai [Wed, 11 Mar 2020 08:08:51 +0000 (16:08 +0800)]
cephadm: add "assert foo is not None" for mypy check
it's legitimate to pass file objects to fcntl(), but the `Popen.stdout` and
`Popen.stderr` properties are not necessarily file objects; they could be None.
mypy cannot deduce this statically, even though we can guarantee it at
runtime because we pass `subprocess.PIPE` to the constructor. so mypy
complains with:
```
cephadm:429: error: Argument 1 to "fcntl" has incompatible type "Optional[IO[Any]]"; expected "Union[int, HasFileno]"
cephadm:430: error: Argument 1 to "fcntl" has incompatible type "Optional[IO[Any]]"; expected "Union[int, HasFileno]"
cephadm:431: error: Argument 1 to "fcntl" has incompatible type "Optional[IO[Any]]"; expected "Union[int, HasFileno]"
cephadm:432: error: Argument 1 to "fcntl" has incompatible type "Optional[IO[Any]]"; expected "Union[int, HasFileno]"
cephadm:455: error: Item "None" of "Optional[IO[Any]]" has no attribute "fileno"
cephadm:465: error: Item "None" of "Optional[IO[Any]]" has no attribute "fileno"
cephadm:475: error: Item "None" of "Optional[IO[Any]]" has no attribute "fileno"
```
to silence this warning, insert `assert process.stdout is not None`
before accessing `process.stdout` to appease the strict optional
checking of mypy.
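A minimal sketch of the narrowing pattern described above, assuming a hypothetical helper `run_and_read` (the function name and its use are illustrative, not the cephadm code):

```python
import subprocess
from typing import List


def run_and_read(cmd: List[str]) -> str:
    """Run cmd and return everything it wrote to stdout."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # Popen.stdout is typed Optional[IO[Any]]. We passed subprocess.PIPE,
    # so it is not None at runtime; the assert tells mypy the same thing.
    assert process.stdout is not None
    data = process.stdout.read()
    process.stdout.close()
    process.wait()
    return data.decode('utf-8')
```

Without the assert, mypy in strict optional mode reports that item "None" of the optional type has no attribute `read`.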
Kiefer Chang [Mon, 23 Dec 2019 08:03:10 +0000 (16:03 +0800)]
mgr/dashboard: Add Orchestrator doc components
Create two components for redirecting to the Orchestrator documentation:
- cd-orchestrator-doc-panel: For displaying an information panel
- cd-orchestrator-doc-modal: For displaying a modal that contains the
information panel
Kiefer Chang [Wed, 11 Dec 2019 04:08:01 +0000 (12:08 +0800)]
mgr/dashboard: support removing OSD in OSDs page
Add backend code for deleting an OSD.
- `DELETE /api/osd/<svc_id>`: delete osd.svc_id. An error is returned if
any pre-checks fail.
- `DELETE /api/osd/<svc_id>?force=true`: with the `force` flag on, the
delete request is sent to the orchestrator even if pre-checks fail.
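The difference between the two calls can be sketched as follows. Everything here is hypothetical and for illustration only: `delete_osd`, `PreCheckError`, and the callable arguments are not the dashboard's actual API, and whether the real backend still runs the pre-checks when `force` is set is an assumption.

```python
from typing import Callable, List


class PreCheckError(Exception):
    """Raised when a pre-check fails and force is not set (hypothetical)."""


def delete_osd(svc_id: str,
               pre_checks: List[Callable[[str], bool]],
               remove: Callable[[str], None],
               force: bool = False) -> None:
    # Collect the names of all pre-checks that reject this OSD.
    failed = [check.__name__ for check in pre_checks if not check(svc_id)]
    if failed and not force:
        raise PreCheckError('osd.%s: failed pre-checks: %s'
                            % (svc_id, ', '.join(failed)))
    # With force=True the request is forwarded despite failed pre-checks.
    remove(svc_id)
```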
Jason Dillaman [Tue, 10 Mar 2020 17:31:34 +0000 (13:31 -0400)]
librbd: race condition in image watcher notification callback
If a refresh is in-progress when a header update notification is
received, the notification was previously incorrectly dropped.
This prevented rbd-mirror's snapshot-based mirroring replayer from
detecting updates in some cases.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
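The general shape of such a fix is to remember a notification that arrives during a refresh and act on it once the refresh completes. A simplified model, assuming a hypothetical `HeaderWatcher` class (librbd itself is C++ and its actual state handling is not shown here):

```python
class HeaderWatcher:
    """Toy model: never drop a header update that arrives mid-refresh."""

    def __init__(self) -> None:
        self.refresh_in_progress = False
        self.refresh_pending = False
        self.refresh_count = 0

    def handle_header_update(self) -> None:
        if self.refresh_in_progress:
            # Record the update instead of dropping it.
            self.refresh_pending = True
            return
        self._start_refresh()

    def _start_refresh(self) -> None:
        self.refresh_in_progress = True
        self.refresh_count += 1

    def finish_refresh(self) -> None:
        self.refresh_in_progress = False
        if self.refresh_pending:
            # A notification arrived while refreshing; refresh again.
            self.refresh_pending = False
            self._start_refresh()
```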
Jason Dillaman [Mon, 9 Mar 2020 22:32:03 +0000 (18:32 -0400)]
librbd: re-use mirror promote state machine when disabling
The promote state machine will handle removing the non-primary
feature bit and will ensure an interrupted disable operation
doesn't leave things in an inconsistent state.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 9 Mar 2020 21:04:27 +0000 (17:04 -0400)]
librbd: enable/disable implicit non-primary feature bit
When promoted to primary, disable the non-primary feature bit and
when demoted (or created non-primary), enable the non-primary feature
bit. This will prevent all non rbd-mirror RBD clients from modifying
the RBD image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 9 Mar 2020 20:49:07 +0000 (16:49 -0400)]
rbd-mirror: permit R/W operations against non-primary image
When the non-primary feature bit is enabled, mask-out the read-only
feature bit that will be set in the refresh image state machine if
the image has that feature bit set. This will ensure that only the
rbd-mirror daemon will be able to modify a non-primary image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 3 Mar 2020 20:17:52 +0000 (15:17 -0500)]
librbd: track reason why ImageCtx is read-only
This will be utilized by the RefreshRequest state machine to flag the image
as read-only if the new RBD_FEATURE_NON_PRIMARY feature is enabled. Also
allow that flag to be masked out by rbd-mirror daemon to permit IO and
operations against a non-primary image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
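Tracking the reason as a set of flags, rather than a single boolean, is what makes it possible to mask out one reason while honoring the others. A minimal sketch, assuming hypothetical flag names and values and a mask that lists the reasons to ignore (the real librbd definitions are in C++ and may differ):

```python
# Hypothetical flag values, for illustration only.
READ_ONLY_FLAG_USER = 1 << 0         # image was opened read-only by the user
READ_ONLY_FLAG_NON_PRIMARY = 1 << 1  # image is a non-primary mirror copy


def is_read_only(flags: int, ignore_mask: int = 0) -> bool:
    """Return True if any read-only reason remains after masking."""
    return (flags & ~ignore_mask) != 0
```

In this model an ordinary client passes no mask and sees a non-primary image as read-only, while a privileged client can ignore only the non-primary reason:

```python
flags = READ_ONLY_FLAG_NON_PRIMARY
print(is_read_only(flags))                                          # ordinary client
print(is_read_only(flags, ignore_mask=READ_ONLY_FLAG_NON_PRIMARY))  # masked out
```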
Jason Dillaman [Tue, 3 Mar 2020 20:01:35 +0000 (15:01 -0500)]
librbd: new RBD_FEATURE_NON_PRIMARY to prevent R/W IO
When a snapshot-based image is non-primary, we will need to use
this implicit feature to ensure that writes and maintenance
operations cannot be performed against the image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Tue, 10 Mar 2020 22:20:48 +0000 (17:20 -0500)]
Merge PR #33771 into octopus
* refs/pull/33771/head:
common/ceph_timer: Pass reference to waited time on stack
common/ceph_timer: Add test
common/ceph_timer: Use unique_function, allowing noncopyable events
common/ceph_timer: Couple cleanups
common/ceph_timer: Fix namespaces
common/ceph_timer: Add missing includes
common/ceph_timer.h: Don't indent contents of a namespace