Any unknown exception causes the module to be unloaded and become unresponsive.
Ideally, all exceptions raised during command-line interaction should be caught
and reported instead of crashing with a traceback.
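A minimal sketch of the idea, not the actual patch: a decorator wraps a command handler, catches any exception, and returns it as an error result. The names `report_exceptions` and `demo_command`, and the `(retval, stdout, stderr)` tuple shape, are assumptions made for illustration.

```python
import errno
import functools
from typing import Callable, Tuple

# Illustrative result shape: (retval, stdout, stderr).
CommandResult = Tuple[int, str, str]


def report_exceptions(handler: Callable[..., CommandResult]) -> Callable[..., CommandResult]:
    """Hypothetical decorator: turn any exception into an error result."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs) -> CommandResult:
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # Report the failure to the caller instead of letting it propagate.
            return -errno.EINVAL, "", f"{type(exc).__name__}: {exc}"
    return wrapper


@report_exceptions
def demo_command(value: str) -> CommandResult:
    # int() raises ValueError on bad input; the decorator reports it.
    return 0, str(int(value)), ""
```

With this in place, `demo_command("not-a-number")` returns a negative return code and an error string rather than raising.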
Afreen [Tue, 6 Feb 2024 09:43:58 +0000 (15:13 +0530)]
mgr/dashboard: fix error while accessing roles tab when policy attached
Fixes https://tracker.ceph.com/issues/64270
Issue:
======
Accessing the Object->Users->Roles tab causes a 500 Internal Server Error.
This is due to the "PermissionPolicies" attached to the role; the
backend was not handling this field for RGW roles.
Fix:
====
Added "PermissionPolicies" as a valid field in the backend and updated the
frontend to render the attached policy as formatted JSON.
Zac Dover [Wed, 7 Feb 2024 13:18:35 +0000 (23:18 +1000)]
doc/radosgw: add confval directives
Add confval directives to the documentation of "quota cache" options.
This addresses a request made by Anthony D'Atri in https://github.com/ceph/ceph/pull/55075/files#r1444006246.
Zac Dover [Sun, 4 Feb 2024 15:36:10 +0000 (01:36 +1000)]
doc/rados: update PG guidance
Update the "Creating a Pool" section of doc/rados/operations/pools.rst
so that the documentation no longer insists that the user change the
values of "osd_pool_default_pg_num" and "osd_pool_default_pgp_num".
See also: https://github.com/ceph/ceph/pull/55419
Tracker: https://tracker.ceph.com/issues/64259
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 5ad241442d2c141ba508faba61f39d70f3f09679)
This commit introduces a major refactor of the main
entrypoint.
- subclass threading.Thread:
- Introduce a new class `BaseThread()` that is a
`threading.Thread()` abstraction class in order
to monitor the different threads.
- `BaseSystem()` inherits from `BaseThread()`.
- Handle the `SIGTERM` signal in order to gracefully shut down
node-proxy (make threads exit gracefully, log out from RedFish API, etc.)
Additionally, this:
- drops the class `Logger()` from util.py which
was not adding value. It is now replaced with a simple `get_logger()`
function.
- changes the node-proxy API port from 8080 to 9456
(8080 being widely used for frontend apps...)
- changes the container entrypoint in order to use the
`ceph-node-proxy` binary from the packaging
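The thread abstraction and `SIGTERM` handling described above could look roughly like the sketch below. It is written under assumptions: only the class name `BaseThread` comes from the commit message, while `main()`, `shutdown()`, `stop_event`, `exc`, `Worker` and `make_sigterm_handler()` are hypothetical names.

```python
import signal
import threading
from typing import Iterable, Optional


class BaseThread(threading.Thread):
    """Thread wrapper that records failures and supports a cooperative stop."""

    def __init__(self) -> None:
        super().__init__()
        self.stop_event = threading.Event()
        self.exc: Optional[Exception] = None

    def run(self) -> None:
        try:
            self.main()
        except Exception as exc:
            # Keep the exception so a monitor can inspect it later.
            self.exc = exc

    def main(self) -> None:
        raise NotImplementedError

    def shutdown(self) -> None:
        self.stop_event.set()


class Worker(BaseThread):
    def main(self) -> None:
        # Loop until asked to stop; wait() doubles as an interruptible sleep.
        while not self.stop_event.wait(0.01):
            pass


def make_sigterm_handler(threads: Iterable[BaseThread]):
    """Build a signal handler that asks every thread to exit."""
    thread_list = list(threads)

    def handler(signum, frame) -> None:
        for thread in thread_list:
            thread.shutdown()
    return handler

# Usage (from the main thread only):
#   signal.signal(signal.SIGTERM, make_sigterm_handler([worker]))
```

Recording the exception on the thread object is one way to let a supervising loop notice that a thread died, which is the kind of monitoring the commit message alludes to.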
Zac Dover [Fri, 2 Feb 2024 01:53:45 +0000 (11:53 +1000)]
doc/rados: update config for autoscaler
Update doc/rados/configuration/pool-pg-config-ref.rst to account for the
behavior of autoscaler.
Previously, this file was last meaningfully altered in 2013, prior to
the invention of autoscaler. A recent confusion was brought to my
attention on the Ceph Slack whereby a user attempted to alter the
default values of a Quincy cluster, as suggested in this documentation.
That alteration caused Ceph to throw the error "Error ERANGE: 'pgp_num'
must be greater than 0 and lower or equal than 'pg_num', which in this
case is one" and a related "rgw_init_ioctx ERROR" reading in part
"Numerical result out of range". The user removed the
"osd_pool_default_pgp_num" configuration line from ceph.conf and the
cluster worked as expected. I presume that this is because the removal
of this configuration line allowed autoscaler to work as intended.
Fixes: https://tracker.ceph.com/issues/64259
Co-authored-by: David Orman <ormandj@corenode.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 4dc12092be584da44baca14e31ca33231164235f)
backport mgr/rook: always recreate kvm default network + fix groups refresh
Fixes: https://tracker.ceph.com/issues/64079
This change also includes:
- adding ~/.local/bin to path so behave binary can be found
- adding requirements.txt file for testing dependencies
- increasing timeout used to wait for tools deployment to 90s
- increasing timeout used to wait for kvm network to 20s
mgr/cephadm: add a new config option 'oob_default_addr'
This provides a default value (169.254.1.1), which is the default
address for the 'OS to iDrac pass-through' interface.
Given that node-proxy will reach the RedFish API through this interface,
users no longer need to pass that address when providing the host spec
at bootstrap time.
Afreen [Mon, 29 Jan 2024 10:12:10 +0000 (15:42 +0530)]
mgr/dashboard: Create subvol of same name in different group
Fixes https://tracker.ceph.com/issues/64112
Issue:
Currently, we are unable to create a subvolume of the same name in a different
subvolume group.
Fix:
We were validating only the filesystem name of the subvolume,
which prevented the creation of a subvolume of the same name.
Added more granularity by also including the subvolumegroup name.
Laura Flores [Mon, 29 Jan 2024 00:58:25 +0000 (00:58 +0000)]
mgr: pin pytest to version 7.4.4
On 2024-01-27, pytest updated to 8.0.0,
which broke run-tox-mgr.
https://docs.pytest.org/en/stable/changelog.html
==================================== ERRORS ====================================
_____________________ ERROR collecting alerts/__init__.py ______________________
alerts/__init__.py:2: in <module>
from .module import Alerts
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
______________________ ERROR collecting alerts/module.py _______________________
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
____________________ ERROR collecting balancer/__init__.py _____________________
balancer/__init__.py:2: in <module>
from .module import Module
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
_____________________ ERROR collecting balancer/module.py ______________________
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes any `/sys/dev/block/x:y/partition` contains '1' which is wrong.
This file actually contains the corresponding partition number.
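One way to express the corrected check is sketched below; it is not necessarily what the patch does. The helper name `is_partition` is hypothetical, and it treats any positive partition number in the `partition` file as meaning "this device is a partition".

```python
import os


def is_partition(sys_dev_block_path: str, item: str) -> bool:
    """Return True if <sys_dev_block_path>/<item>/partition holds a partition number."""
    path = os.path.join(sys_dev_block_path, item, 'partition')
    try:
        with open(path) as f:
            content = f.read().strip()
    except OSError:
        # Whole disks have no 'partition' file at all.
        return False
    # The file holds the partition number (1, 2, 3, ...), not a boolean flag.
    return content.isdigit() and int(content) > 0
```

Comparing against `"1"` only recognises the first partition of a disk, whereas this check accepts the second, third and later partitions as well.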
ceph-volume: use 'no workqueue' options with dmcrypt
Cloudflare engineers ran some tests and found that using
workqueues with encryption on flash devices degrades performance.
See [1] for details.
With this patch, ceph-volume calls cryptsetup with the
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not rotational.
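As a rough illustration of that behaviour, the sketch below builds a cryptsetup argument list and appends the two options only for non-rotational devices. The function name `dmcrypt_open_args` and the exact command layout are assumptions, not the code from the patch.

```python
from typing import List


def dmcrypt_open_args(device: str, mapping: str, rotational: bool) -> List[str]:
    """Build an illustrative 'cryptsetup open' command line."""
    cmd = ['cryptsetup', 'open', '--type', 'luks']
    if not rotational:
        # Bypass the dm-crypt workqueues on flash devices.
        cmd.extend(['--perf-no_read_workqueue', '--perf-no_write_workqueue'])
    cmd.extend([device, mapping])
    return cmd
```

The `rotational` flag is passed in by the caller here; on Linux it can typically be derived from the device's `queue/rotational` attribute in sysfs.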
The following messages are logged very frequently, although
they are not very useful in a normal situation:
```
2024-01-12 09:09:40,604 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:09:40,604 - reporter - INFO - no diff, not sending data to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - no diff, not sending data to the mgr.
...
```
This `sleep(5)` should be initiated *after* the lock is released.
Otherwise, it can cause trouble with the reporter loop, which may
never acquire the lock.
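The pattern can be sketched as follows, assuming a shared `threading.Lock` between an update loop and a reporter. The function `update_loop` and its parameters are hypothetical, and `stop.wait(interval)` stands in for the `sleep(5)` call so that the loop can also be stopped promptly.

```python
import threading
from typing import Callable


def update_loop(lock: threading.Lock,
                stop: threading.Event,
                interval: float,
                do_update: Callable[[], None]) -> None:
    """Hold the lock only while updating, then pause with the lock released."""
    while not stop.is_set():
        with lock:
            do_update()
        # The lock is released here, so other threads (for example a
        # reporter loop) get a window in which to acquire it.
        stop.wait(interval)
```

If the pause happened inside the `with lock:` block instead, the lock would be free only for the instant between two iterations, and a competing thread could be starved.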
The current implementation requires the inclusion of all the recent
modifications in the cephadm binary, which won't be backported.
Since we need the node-proxy code backported to reef, let's move the
code and make it a separate daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-authored-by: Adam King <adking@redhat.com>
(cherry picked from commit 7e6bc179ae7e0d633bd63086775002182c861d3f)
This renames the mgr's NodeProxyCache attribute from
`self.node_proxy` to `self.node_proxy_cache` and the
class `NodeProxy` in agent.py to `NodeProxyEndpoint`,
to make the names clearer and avoid confusion.
node-proxy: enhance debug log messages for locking operations
This commit updates the debug log messages in the BaseRedfishSystem
and Reporter classes. The adjustments made enhance the clarity and
precision of the messages by specifically identifying acquired
and released locks, detailing their context, thereby improving the
understanding of the control flow during locking operations
in these components.