    cmd = ['ceph-volume', '--fsid', fsid] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid

    s = get_ceph_conf(fsid=fsid)
    f = cephadm_fs.create_file('ceph.conf', contents=s)

    cmd = ['ceph-volume', '--fsid', fsid, '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid

    cmd = ['ceph-volume', '--fsid', '10000000-0000-0000-0000-0000deadbeef', '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        err = 'fsid does not match ceph.conf'
        with pytest.raises(cd.Error, match=err):
            cd.command_ceph_volume(ctx)
>       assert ctx.fsid == None
E       AssertionError: assert '10000000-0000-0000-0000-0000deadbeef' == None
E        +  where '10000000-0000-0000-0000-0000deadbeef' = <cephadm.CephadmContext object at 0x7f1c7121c1c0>.fsid
```
The clean_cgroup method assumes that ctx.fsid is set. While this is
true for the bootstrap command, it isn't set for the adopt or deploy commands
(and maybe others).
This causes the adopt command to fail:
Traceback (most recent call last):
  File "/sbin/cephadm", line 8301, in <module>
    main()
  File "/sbin/cephadm", line 8289, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1764, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5091, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5299, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 2884, in deploy_daemon_units
    clean_cgroup(ctx, unit_name)
  File "/sbin/cephadm", line 2724, in clean_cgroup
    if not ctx.fsid:
  File "/sbin/cephadm", line 155, in __getattr__
    return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'
Since deploy_daemon_units (which calls clean_cgroup) already has the
fsid value, we can pass it directly.
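The failure mode and the fix can be sketched in a few lines. `Context` is a hypothetical, simplified stand-in for `CephadmContext` (attributes come only from the parsed arguments), and the string returned by `clean_cgroup` is a placeholder for the real cgroup cleanup; only the call shape, with the fsid passed explicitly by the caller, mirrors the description above.

```python
from typing import Any, Dict


class Context:
    """Hypothetical stand-in for CephadmContext.

    Attributes come only from the parsed command-line arguments, so a
    command that never defines --fsid (e.g. adopt) has no 'fsid' at all.
    """

    def __init__(self, **args: Any) -> None:
        self._args: Dict[str, Any] = args

    def __getattr__(self, name: str) -> Any:
        # Called only after normal attribute lookup has failed.
        args = self.__dict__.get('_args', {})
        if name in args:
            return args[name]
        raise AttributeError(f"'Context' object has no attribute '{name}'")


def clean_cgroup_before(ctx: Context, unit_name: str) -> str:
    # Old shape: reads ctx.fsid, which raises for commands without --fsid.
    if not ctx.fsid:
        return ''
    return f'{ctx.fsid}/{unit_name}'


def clean_cgroup(ctx: Context, fsid: str, unit_name: str) -> str:
    # New shape: the caller supplies the fsid, so ctx.fsid is never read.
    # The returned string is a placeholder for the real cleanup work.
    return f'{fsid}/{unit_name}'


def deploy_daemon_units(ctx: Context, fsid: str, unit_name: str) -> str:
    # deploy_daemon_units already knows the fsid, so it just passes it on.
    return clean_cgroup(ctx, fsid, unit_name)
```

The adopt path then works even though its context carries no fsid attribute.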
mgr/cephadm: ingress: fix typo in spec.virtual_interface_networks reference
When using virtual_interface_networks to identify the interface on which to
place the virtual IP, it referenced spec.networks instead of
spec.virtual_interface_networks.
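A minimal sketch of the corrected lookup, assuming the host's networks are available as a mapping of subnet to `{interface: [ips]}`. `IngressSpec` here is a trimmed-down, hypothetical stand-in carrying only the two fields involved, and `pick_vip_interface` is an illustrative helper, not the mgr/cephadm function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IngressSpec:
    # Hypothetical, trimmed-down spec: only the two fields involved in the typo.
    networks: List[str] = field(default_factory=list)
    virtual_interface_networks: List[str] = field(default_factory=list)


def pick_vip_interface(spec: IngressSpec,
                       host_networks: Dict[str, Dict[str, List[str]]]) -> Optional[str]:
    """Return the first host interface on one of the VIP networks, if any.

    host_networks is assumed to map subnet -> {interface name -> [ips]}.
    """
    # The typo consulted spec.networks here instead of the VIP networks.
    for subnet in spec.virtual_interface_networks:
        ifaces = host_networks.get(subnet, {})
        if ifaces:
            return next(iter(ifaces))
    return None
```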
We don't need to run an extra command (mgr module ls) to obtain the mgr
modules list since we already have this information in the mgr_map.
This approach is already used for the monitoring stack and for configuring
the iSCSI integration within the dashboard (during creation) via the
config_dashboard method.
The mgr_map is mocked in the tests with the dashboard module enabled so we
don't need _mon_command_mock_mgr_module_ls anymore.
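The check itself reduces to a membership test on the mgr map. This sketch assumes, as in `ceph mgr dump` output, that `mgr_map['modules']` lists the enabled modules; `mgr_module_enabled` and `MOCK_MGR_MAP` are hypothetical names used for illustration.

```python
from typing import Any, Dict, List


def mgr_module_enabled(mgr_map: Dict[str, Any], module: str) -> bool:
    # Assumes mgr_map['modules'] holds the *enabled* module names, so no
    # separate `mgr module ls` round trip is needed.
    enabled: List[str] = mgr_map.get('modules', [])
    return module in enabled


# A test fixture can then mock the map directly, with dashboard enabled.
MOCK_MGR_MAP: Dict[str, Any] = {'modules': ['cephadm', 'dashboard', 'prometheus']}
```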
Sebastian Wagner [Tue, 20 Jul 2021 14:09:57 +0000 (16:09 +0200)]
cephadm: haproxy 2.4 defaults to a different container user.
Another alternative would be to investigate a different setup
leveraging `--sysctl net.ipv4.ip_unprivileged_port_start=0`,
but that would be a larger PR.
Fixes: https://tracker.ceph.com/issues/51355
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 250064bdcbe778b3cc245df843d14dd19cbb8772)
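The two options can be contrasted as extra container-run arguments for the haproxy container. `haproxy_container_args` is a hypothetical helper; the `--user=root` branch is an assumption about the simpler route, not something stated above, while `--sysctl` is a standard `podman run`/`docker run` option and `net.ipv4.ip_unprivileged_port_start` is a per-network-namespace kernel setting.

```python
from typing import List


def haproxy_container_args(use_unprivileged_ports: bool) -> List[str]:
    """Hypothetical helper returning extra container-run args for haproxy."""
    if use_unprivileged_ports:
        # Presumably lets haproxy 2.4's non-root default user bind ports
        # below 1024 by lowering the unprivileged port floor inside the
        # container's network namespace.
        return ['--sysctl', 'net.ipv4.ip_unprivileged_port_start=0']
    # Assumed simpler route: keep running the container as root.
    return ['--user=root']
```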
Michael Fritch [Fri, 9 Jul 2021 01:35:42 +0000 (19:35 -0600)]
cephadm: use CephadmContext rather than MagicMock
MagicMock hides attribute errors:
```
self = <cephadm.CephadmContext object at 0x7f1121e62370>, name = 'config_json'

    def __getattr__(self, name: str) -> Any:
        if '_conf' in self.__dict__ and hasattr(self._conf, name):
            return getattr(self._conf, name)
        elif '_args' in self.__dict__ and hasattr(self._args, name):
            return getattr(self._args, name)
        else:
>           return super().__getattribute__(name)
E           AttributeError: 'CephadmContext' object has no attribute 'config_json'
```
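The difference is easy to reproduce with the standard library: `MagicMock` fabricates any attribute on first access, so a missing option never raises. In this sketch `types.SimpleNamespace` stands in for a real `CephadmContext`, and `read_config_json` is a hypothetical piece of code under test.

```python
from types import SimpleNamespace
from unittest import mock


def read_config_json(ctx: object) -> object:
    # Code under test that expects the caller to have set ctx.config_json.
    return ctx.config_json  # type: ignore[attr-defined]


# MagicMock invents the attribute on access, hiding the missing option.
hidden = read_config_json(mock.MagicMock())

# A real object surfaces the mistake as an AttributeError.
try:
    read_config_json(SimpleNamespace(fsid='00000000-0000-0000-0000-0000deadbeef'))
    surfaced = False
except AttributeError:
    surfaced = True
```

If a mock is still wanted, `MagicMock(spec=[...])` restricts it to the listed attributes and raises for anything else.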
Javier Cacheiro [Mon, 12 Jul 2021 14:03:27 +0000 (16:03 +0200)]
Fetch the actual running SELinux status.
The HostFacts should return the **actual** SELinux mode in which the
kernel is running.
The actual mode can differ from the one in the configuration
if the server has not been rebooted or if the mode was changed
after boot using setenforce.
Instead of reading _selinux_path_list, we should look at the output of
sestatus or getenforce.
The _selinux_path_list attribute is no longer needed.
Fixes: https://tracker.ceph.com/issues/51632
Signed-off-by: Javier Cacheiro <javier.cacheiro.lopez@cesga.es>
(cherry picked from commit c3c79fc44c34825384c59cbe962b9153e6b522b0)
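A minimal sketch of querying the running mode, assuming either `getenforce` (which prints a single word: Enforcing, Permissive or Disabled) or `sestatus` is installed. `parse_sestatus` and `running_selinux_mode` are hypothetical helpers, not the HostFacts code; the key point is reading "Current mode" and ignoring "Mode from config file".

```python
import shutil
import subprocess
from typing import Dict, Optional


def parse_sestatus(output: str) -> str:
    """Return the *current* mode from `sestatus` output, not the configured one."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    if fields.get('SELinux status') == 'disabled':
        return 'disabled'
    # 'Mode from config file' is deliberately ignored: it can differ from the
    # running mode until a reboot or after `setenforce`.
    return fields.get('Current mode', 'unknown')


def running_selinux_mode() -> Optional[str]:
    """Query the running kernel's mode; None if neither tool is installed."""
    if shutil.which('getenforce'):
        out = subprocess.run(['getenforce'], capture_output=True,
                             text=True, check=False).stdout
        return out.strip().lower() or None
    if shutil.which('sestatus'):
        out = subprocess.run(['sestatus'], capture_output=True,
                             text=True, check=False).stdout
        return parse_sestatus(out)
    return None
```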
When the mgr dashboard module isn't enabled, the iSCSI service deletion
gets stuck and the cluster state goes to ERR.
The `ceph dashboard` commands aren't available when the mgr dashboard module
isn't enabled.
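The fix amounts to guarding the dashboard calls on the module being enabled. In this sketch `remove_iscsi_gateways` is a hypothetical post-removal hook, `mon_command` is an injected callable returning a return code, and the command dictionary is an assumed shape for `ceph dashboard iscsi-gateway-rm <name>`.

```python
from typing import Any, Callable, Dict, List


def remove_iscsi_gateways(mgr_map: Dict[str, Any],
                          gateways: List[str],
                          mon_command: Callable[[Dict[str, str]], int]) -> List[str]:
    """Hypothetical post-removal hook; returns the gateways it unregistered."""
    if 'dashboard' not in mgr_map.get('modules', []):
        # Without the dashboard module the `ceph dashboard ...` commands do
        # not exist, so skip the cleanup instead of failing the removal.
        return []
    removed: List[str] = []
    for name in gateways:
        # Assumed command shape for `ceph dashboard iscsi-gateway-rm <name>`.
        if mon_command({'prefix': 'dashboard iscsi-gateway-rm', 'name': name}) == 0:
            removed.append(name)
    return removed
```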
This PR improves the readability and format
of the troubleshooting.rst file. It also
changes the underline of one of the
sub-subsection headers so that it is made of tildes
(~) instead of carets (^), because that's
the RST standard.
- Rewrites the "Data Location" section of the Operations
docs
- Rewrites the "Health Checks" section of the Operations
docs
- Adds prompts to commands
- Adds console-output formatting to the places where it
is appropriate
- Adds several section headers where appropriate, to
signpost to the reader what is currently under discussion
The sysctl directory might not exist if no installed package drops
a custom sysctl file on the host.
Instead of assuming it exists, we create the directory if it doesn't exist.
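Creating the directory idempotently before writing the drop-in is a one-liner with `pathlib`. `install_sysctl_file` is a hypothetical helper; the directory, file name and setting used in the check below are illustrative only.

```python
from pathlib import Path
from typing import List


def install_sysctl_file(sysctl_dir: Path, name: str, settings: List[str]) -> Path:
    """Write a sysctl drop-in, creating the directory first if it is missing."""
    # mkdir(parents=True, exist_ok=True) creates any missing parents and is a
    # no-op when the directory already exists.
    sysctl_dir.mkdir(parents=True, exist_ok=True)
    conf = sysctl_dir / name
    conf.write_text('\n'.join(settings) + '\n')
    return conf
```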
Zac Dover [Tue, 29 Jun 2021 11:45:22 +0000 (21:45 +1000)]
doc/cephadm: improving "Starting the Upgrade"
This PR (slightly) improves the text in the section "Starting
the Upgrade" in the "Upgrading Ceph" chapter of the cephadm
documentation.
This is a very minor update that does little more than bring the sentences
into agreement with many other sentences that I've already written.
The goal is to give the reader an almost tabular sense of what to
expect when looking at our docs.
胡玮文 [Sun, 13 Jun 2021 06:23:56 +0000 (14:23 +0800)]
cephadm: workaround unit replace failure
This appears to be a bug in systemd: it fails to clean up cgroups when stopping
the unit. If we then start a new unit with the same name, the 'ExecStartPre' command
fails with status=219/CGROUP (only when the systemd unified cgroup hierarchy is
enabled), because cgroup v2 does not allow processes in non-leaf groups. This
should be fixed by systemd commit e08dabfec7304dfa0d59997dc4219ffaf22af717.
For now, we just remove these leftover cgroups before starting the new unit.
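The workaround can be sketched as a bottom-up removal of the stale unit cgroup. Assumptions: the unified hierarchy is mounted at `/sys/fs/cgroup` (overridable via `cgroup_root` for testing), a top-level `system.slice` indicates it, and systemd places the templated unit `ceph-<fsid>@...` in a slice whose name escapes `-` as `\x2d`. `clean_cgroup` here is an illustrative sketch, not the exact cephadm code.

```python
from pathlib import Path


def clean_cgroup(fsid: str, unit_name: str,
                 cgroup_root: Path = Path('/sys/fs/cgroup')) -> bool:
    """Remove a leftover cgroup for unit_name; return True if one was removed."""
    system_slice = cgroup_root / 'system.slice'
    if not system_slice.exists():
        # Assumption: no top-level system.slice means no unified (v2)
        # hierarchy, which is the only case that hits status=219/CGROUP.
        return False
    # systemd escapes '-' as '\x2d' in the slice name of a templated unit.
    slice_name = 'system-ceph\\x2d{}.slice'.format(fsid.replace('-', '\\x2d'))
    cg_path = system_slice / slice_name / f'{unit_name}.service'
    if not cg_path.exists():
        return False

    def trim(path: Path) -> None:
        # cgroup directories must be removed bottom-up: children first.
        for child in path.iterdir():
            if child.is_dir():
                trim(child)
        path.rmdir()

    trim(cg_path)
    return True
```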
Michael Fritch [Wed, 16 Jun 2021 20:20:38 +0000 (14:20 -0600)]
cephadm: fix regexp to strip `v1:` or `v2:` prefix from IPv6 addr
regexp was stripping the first hextet of the IPv6 address:
```
FAILED tests/test_cephadm.py::TestBootstrap::test_mon_addrv[[0000:0000:0000:0000:0000:FFFF:C0A8:0101:1234]-list_networks5-None] - cephadm.Error: Cannot infer CIDR network for mon IP `0000:0000:0000:0000:FFFF:C0A8:0101`: pass --skip-mon-network to configure it later
```
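One way to avoid eating part of the address is an anchored match on the literal prefix. Since a hextet consists only of hex digits, it can never start with `v`, so a bare IPv6 address is left untouched. `strip_addr_type` and `parse_addrv` are hypothetical helpers, not the cephadm functions.

```python
import re
from typing import List

# Anchored: only a literal leading 'v1:' or 'v2:' is removed.
_ADDR_TYPE_PREFIX = re.compile(r'^v[12]:')


def strip_addr_type(addr: str) -> str:
    return _ADDR_TYPE_PREFIX.sub('', addr, count=1)


def parse_addrv(addrv: str) -> List[str]:
    """Split an addrvec like '[v2:[::1]:3300,v1:[::1]:6789]' into plain addrs."""
    inner = addrv.strip()
    if inner.startswith('[') and inner.endswith(']'):
        inner = inner[1:-1]
    return [strip_addr_type(a) for a in inner.split(',')]
```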
(1) improves syntax and formatting of "Logging to stdout"
(2) improves syntax and formatting of "Logging to files"
(3) replaces all carets with tildes in 3rd-level section
headers in operations.rst (./build-doc complained
about inconsistency when I fed it tildes, but tildes,
not carets, are the RST standard according to
https://docutils.sourceforge.io/docs/user/rst/quickstart.html#sections,
so the carets had to go).
Sebastian Wagner [Tue, 15 Jun 2021 09:24:34 +0000 (11:24 +0200)]
python-common: fix mypy errors
Fixes:
```
py3 run-test: commands[2] | mypy --config-file=../mypy.ini -p ceph
ceph/deployment/service_spec.py: note: In member "yaml_representer" of class "ServiceSpec":
ceph/deployment/service_spec.py:659: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "_OrderedDictItemsView[str, Any]"; expected "Mapping[Any, Any]"
```
Sebastian Wagner [Tue, 15 Jun 2021 08:19:40 +0000 (10:19 +0200)]
mgr/orch: fix mypy errors
Fixes:
```
orchestrator/__init__.py:6: note: In module imported here:
orchestrator/_interface.py: note: In member "yaml_representer" of class "DaemonDescription":
orchestrator/_interface.py:1039: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "ItemsView[Any, Any]"; expected "Mapping[Any, Any]"
orchestrator/_interface.py: note: In member "yaml_representer" of class "ServiceDescription":
orchestrator/_interface.py:1178: error: Argument 1 to "represent_dict" of "SafeRepresenter" has incompatible type "ItemsView[Any, Any]"; expected "Mapping[Any, Any]"
orchestrator/_interface.py: note: At top level:
orchestrator/_interface.py:1181: error: Argument 2 to "add_representer" has incompatible type "Callable[[SafeDumper, DaemonDescription], Any]"; expected "Callable[[SafeDumper, ServiceDescription], Node]"
Found 3 errors in 1 file (checked 29 source files)
```
limit the number of concurrent RGWMetaSyncSingleEntryCRs that each mdlog
shard is allowed to spawn. use META_SYNC_SPAWN_WINDOW=20 to match data-
and bucket sync
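The spawn window is a bound on concurrent children per shard. RGW uses its own coroutine framework, so this asyncio version is only an analogy: `sync_shard_entries` is a hypothetical name, and the window of 20 is the value quoted above. When the window is full it waits for at least one child to finish before spawning the next.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, Set

META_SYNC_SPAWN_WINDOW = 20  # value quoted in the commit message


async def sync_shard_entries(entries: Iterable[Any],
                             sync_entry: Callable[[Any], Awaitable[Any]],
                             window: int = META_SYNC_SPAWN_WINDOW) -> List[Any]:
    """Spawn one child per entry, but never more than `window` at a time."""
    pending: Set["asyncio.Task[Any]"] = set()
    results: List[Any] = []
    for entry in entries:
        if len(pending) >= window:
            # Window is full: wait for at least one child before spawning more.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            results.extend(task.result() for task in done)
        pending.add(asyncio.create_task(sync_entry(entry)))
    if pending:
        done, _ = await asyncio.wait(pending)
        results.extend(task.result() for task in done)
    return results
```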
rgw: metadata sync treats all errors as 'transient'
collect_children() had a special case for EAGAIN that it treated as
a 'transient' error, which set can_adjust_marker = false to bail out
of RGWMetaSyncShardCR and retry from the previous marker
but the http client doesn't return EAGAIN - rgw_http_error_to_errno()
defaults to EIO - so this retry logic based on can_adjust_marker never
runs. on any other error, RGWMetaSyncSingleEntryCR would not call
marker_tracker->finish() to advance the sync status marker, and
RGWMetaSyncShardCR would continue on with full- or incremental sync
without ever attempting to retry the failed entries
a detailed comment in collect_children() describes a different strategy
for handling 'permanent' errors, but that was never fully elaborated.
i also don't think there's a reasonable way to differentiate between
transient and permanent errors, so this treats all errors as transient
to be retried
if an error really is permanent for a given metadata key, metadata sync
will get stuck there and require manual intervention
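The "treat every error as transient" rule can be modeled sequentially: the marker only advances past entries that synced cleanly, so a failed entry is retried on the next pass instead of being skipped. The real RGWMetaSyncShardCR runs entries concurrently behind a marker tracker; `sync_shard_pass` and its integer positions are a hypothetical simplification.

```python
from typing import Callable, List, Tuple


def sync_shard_pass(entries: List[Tuple[int, str]], marker: int,
                    sync_entry: Callable[[str], int]) -> Tuple[int, bool]:
    """One sequential pass over a shard's log entries.

    entries are (position, metadata key) pairs in log order; sync_entry
    returns 0 on success or a negative errno. Returns the new marker and
    whether the pass finished without errors.
    """
    for pos, key in entries:
        if pos <= marker:
            continue  # already synced in an earlier pass
        if sync_entry(key) != 0:
            # Any error is treated as transient: keep the marker where it
            # is so the next pass retries this entry.
            return marker, False
        marker = pos
    return marker, True
```

A key that fails permanently keeps the marker pinned at that position, which matches the "stuck, needs manual intervention" trade-off described above.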
luo rixin [Tue, 29 Dec 2020 06:39:21 +0000 (14:39 +0800)]
rgw/rgw_file: Fix the return value of read() and readlink()
Fixes: https://tracker.ceph.com/issues/49189
Signed-off-by: Dai zhiwei <daizhiwei3@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
(cherry picked from commit bfd83e8fa142873a0bdf09a4d1ad1b04127f5885)
test/rgw: fix use of poll() with timers in unittest_rgw_dmclock_scheduler
the AsyncScheduler uses an asio timer to dispatch work to its executor
with an optional delay. when no delay is requested, it waits on the
timer with an expiration time in the past (crimson::dmclock::TimeZero)
tests are failing here because poll() is returning without executing the
handlers of those expired timers
asio implements these timers with timerfd and epoll. debugging with
strace, i see that these timers armed with timerfd_settime() are not
always immediately ready according to epoll_wait():
rgw: read_obj_policy() consults iam_user_policies on ENOENT
when the head object doesn't exist, read_obj_policy() has to decide
whether to return ENOENT or EACCES
when there's a bucket policy, we check whether it has s3ListBucket
permissions. when there's an assumed role, we also need to check
against the role's policies in s->iam_user_policies
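The decision follows the usual S3 convention: a caller allowed to list the bucket may learn that the key is absent (404), anyone else gets access denied (403). This sketch heavily simplifies policy evaluation (a policy is just a set of allowed action names; explicit Deny, conditions, ACLs and ownership are ignored), and `missing_object_error` is a hypothetical helper, with `iam_user_policies` modeling the role's policies.

```python
import errno
from typing import Iterable, Optional, Set

Policy = Set[str]  # hypothetical: just the set of allowed actions


def allows_list(policy: Optional[Policy]) -> bool:
    return policy is not None and 's3:ListBucket' in policy


def missing_object_error(bucket_policy: Optional[Policy],
                         iam_user_policies: Iterable[Policy]) -> int:
    """Error for a request whose head object does not exist."""
    # Check the bucket policy first, then the assumed role's policies.
    if allows_list(bucket_policy) or any(allows_list(p) for p in iam_user_policies):
        return -errno.ENOENT
    return -errno.EACCES
```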
J. Eric Ivancich [Tue, 15 Jun 2021 19:20:33 +0000 (15:20 -0400)]
rgw: when deleted obj removed in versioned bucket, extra del-marker added
After the initial checks are complete, this reads the OLH earlier than
before to check the delete-marker flag and, under the bug's
conditions, returns -ENOENT rather than creating a spurious delete
marker.
J. Eric Ivancich [Wed, 14 Apr 2021 17:55:22 +0000 (13:55 -0400)]
rgw: during reshard lock contention, adjust logging
When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.
Adam C. Emerson [Fri, 16 Jul 2021 15:20:39 +0000 (11:20 -0400)]
rgw: radosgw-admin errors if marker not specified on data/mdlog trim
Check that a marker was specified, and fail with an error if we don't have one.
Also: In a world where we're parsing for generation, it doesn't really
make sense to have a 'no marker specified' as separate from a marker
that is just an empty string.
Also: Successful datalog trim returns zero, not
-ENODATA, and radosgw-admin should expect this.
Fixes: https://tracker.ceph.com/issues/51712
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 4cb6fcf7f4e9548a8a0dd017b49a6ea23bedffd6)
Conflicts:
src/rgw/rgw_admin.cc
Cherry-pick notes:
- static_cast for RadosStore not needed in Pacific
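The two behaviors (a missing and an empty marker are treated alike, and success is 0) fit in a small handler. `trim_log_command` is hypothetical; the positive-errno exit status follows the common `return EINVAL;` / `return -ret;` convention in command-line tools and is an assumption here.

```python
import errno
import sys
from typing import Callable, Optional


def trim_log_command(marker: Optional[str], trim: Callable[[str], int]) -> int:
    """Hypothetical datalog/mdlog trim handler; returns a process exit status."""
    if not marker:
        # No marker and an empty marker are treated alike: both are errors.
        print('ERROR: a marker must be specified', file=sys.stderr)
        return errno.EINVAL
    ret = trim(marker)
    if ret < 0:
        # A successful trim returns 0, so only 0 counts as success here.
        print(f'ERROR: trim failed: {errno.errorcode.get(-ret, ret)}', file=sys.stderr)
        return -ret
    return 0
```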
J. Eric Ivancich [Fri, 30 Apr 2021 20:07:54 +0000 (16:07 -0400)]
rgw: fix bucket object listing when marker matches prefix
When an initial marker that ends with a delimiter is provided, it
prevents listing of that "subdirectory" due to new logic at the cls
level that makes listing more efficient. The fix catches that situation.
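The intended semantics can be pinned down with a simplified, in-memory model of S3-style listing (not the cls implementation; `list_objects` is hypothetical). A common prefix at or before the marker was returned on an earlier page and is skipped, but when the marker equals the requested prefix, the keys under it are not a rolled-up common prefix and must still be listed.

```python
from typing import List, Tuple


def list_objects(keys: List[str], prefix: str = '', delimiter: str = '',
                 marker: str = '') -> Tuple[List[str], List[str]]:
    """Return (objects, common_prefixes) for keys strictly after marker."""
    objects: List[str] = []
    prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix) or key <= marker:
            continue
        if delimiter:
            # Only look for the delimiter *after* the requested prefix.
            idx = key.find(delimiter, len(prefix))
            if idx >= 0:
                common = key[:idx + len(delimiter)]
                # A common prefix <= marker was returned on an earlier page.
                if common > marker and common not in prefixes:
                    prefixes.append(common)
                continue
        objects.append(key)
    return objects, prefixes
```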