This commit caused a regression in the rados suite, as evidenced by:
- with the commit:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_15:11:39-rados-quincy-release-distro-default-smithi/
- with the commit reverted:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_17:02:02-rados-wip-lflores-testing-quincy-release-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/57546
Signed-off-by: Laura Flores <lflores@redhat.com>
Adam King [Thu, 1 Sep 2022 12:37:39 +0000 (08:37 -0400)]
mgr/cephadm: don't use "sudo" in commands if user is root
An earlier patch made our other commands use sudo only when the
user is not root, but this specific one, which just runs "true"
with a timeout to check if the host is online, was missed.
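The rule described above can be sketched as follows. This is only an illustration: build_remote_command is a hypothetical helper, not the cephadm module's actual API, and the timeout handling is left out.

```python
from typing import List


def build_remote_command(user: str, cmd: List[str]) -> List[str]:
    """Return cmd unchanged for root, otherwise prefix it with sudo."""
    # Hypothetical helper for illustration; not the cephadm module's API.
    if user == "root":
        return list(cmd)
    return ["sudo"] + list(cmd)


# The host-online check boils down to running "true" on the remote host.
print(build_remote_command("root", ["true"]))        # ['true']
print(build_remote_command("ceph-admin", ["true"]))  # ['sudo', 'true']
```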
Adam King [Thu, 1 Sep 2022 17:08:13 +0000 (13:08 -0400)]
mgr/cephadm: fix tuned profiles getting removed if name has dashes
Previously, due to the way it was just splitting the file name on
dashes to try and get the profile name, any profile name with dashes
was not getting properly matched and would therefore get marked
stray and removed. This new strategy instead tries to match the actual
expected file name.
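A minimal sketch of why splitting on dashes breaks, and of matching against the expected file name instead. The SUFFIX value and both function names are assumptions made up for this example; the real naming scheme may differ.

```python
from typing import Iterable, List

# Assumed file naming scheme, for illustration only.
SUFFIX = "-cephadm-tuned-profile.conf"


def profile_from_split(file_name: str) -> str:
    """Dash-splitting approach: keeps only the text before the first dash."""
    return file_name.split("-")[0]


def find_stray_files(file_names: Iterable[str],
                     profile_names: Iterable[str]) -> List[str]:
    """Matching approach: a file is stray only if no profile expects it."""
    expected = {name + SUFFIX for name in profile_names}
    return [f for f in file_names if f.endswith(SUFFIX) and f not in expected]


files = ["low-latency" + SUFFIX, "old" + SUFFIX]
print(profile_from_split(files[0]))             # low
print(find_stray_files(files, ["low-latency"]))  # ['old-cephadm-tuned-profile.conf']
```

With dash splitting, "low-latency" is read as "low", never matches a known profile, and the file is treated as stray.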
Adam King [Tue, 16 Aug 2022 14:43:39 +0000 (10:43 -0400)]
mgr/cephadm: make setting --cgroups=split configurable
Previously, we were just always setting this as long
as users were using podman with a high enough version,
but it seems a user has run into an issue where their
daemons are failing to deploy with
Error: could not find cgroup mount in "/proc/self/cgroup"
Zac Dover [Fri, 12 Aug 2022 21:53:21 +0000 (07:53 +1000)]
doc/rados: add prompts to pools.rst
This commit adds ".. prompt:: bash $"-style prompts to pools.rst.
This brings this file up to the standard established in 2020 when
Kefu added support for the ".. prompt::" directive.
This commit is a part of an initiative to modernize the presentation
of all BASH commands in the RADOS documentation.
The progress of this project can be tracked here:
https://tracker.ceph.com/issues/57108
Nizamudeen A [Wed, 24 Aug 2022 07:47:50 +0000 (13:17 +0530)]
mgr/dashboard: fix unable to create ingress unmanaged
The following snippet is the error from the backend:
```
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 698, in _from_json_impl
_cls.validate()
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 1058, in validate
'Cannot add ingress: No frontend_port specified')
ceph.deployment.hostspec.SpecValidationError: Cannot add ingress: No frontend_port specified
```
It looks like even if we set the unmanaged flag, we still need to provide
backend_service, frontend_port, monitor_port and virtual_ip, because the
backend validates those fields.
test/{librbd, rgw}: increase delay between and number of bind attempts
Commit aa7885f7cc41 ("test/{librbd, rgw}: retry when bind fail with
port 0") reduced the frequency of sporadic unit test failures caused
by EADDRINUSE a lot, but not entirely.
Currently, it yields a cumulative sleep of ~9 seconds. Let's increase
that to 1 minute.
Lucian Petrut [Fri, 26 Aug 2022 12:54:10 +0000 (12:54 +0000)]
include: fix IS_ERR on Windows
The "long" type is 32 bits wide on x64 Windows platforms, which means
it's not large enough to store a pointer. intptr_t or uintptr_t
should be used instead.
This change fixes include/err.h, using the right types. There was
a previous patch on this topic but unfortunately it didn't address
all the type casts.
This issue was brought up by the unittest_crush test, which recently
started to fail as the CrushWrapper methods use IS_ERR.
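The truncation problem can be shown with plain integer arithmetic. The sketch below assumes a Linux-kernel-style check (a pointer value counts as an error if it falls within the top MAX_ERRNO values of the word); is_err and the sample pointer value are illustrative, not the actual contents of include/err.h.

```python
MAX_ERRNO = 4095  # assumed, as in the Linux-kernel-style IS_ERR check


def is_err(ptr_value: int, word_bits: int) -> bool:
    """Emulate casting a pointer to an unsigned integer of word_bits bits."""
    mask = (1 << word_bits) - 1
    truncated = ptr_value & mask       # what survives the cast
    threshold = (-MAX_ERRNO) & mask    # (unsigned)-MAX_ERRNO at that width
    return truncated >= threshold


valid_ptr = 0x00000001_FFFFF800        # a made-up, valid 64-bit address
error_ptr = (-2) & ((1 << 64) - 1)     # -ENOENT encoded as a pointer value

print(is_err(valid_ptr, 64))  # False: pointer-sized cast keeps all bits
print(is_err(valid_ptr, 32))  # True: 32-bit cast drops the high bits
print(is_err(error_ptr, 64))  # True
```

The second result is the false positive: after truncation to 32 bits, a valid address can look like an error value.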
Kefu Chai [Fri, 5 Aug 2022 00:17:45 +0000 (08:17 +0800)]
dokan: cast variable to the expected type before comparison
to fix the FTBFS caused by the following error:
```
/home/jenkins-build/build/workspace/ceph-windows-pull-requests/ceph/build.deps/src/dokany/dokan/dokan.h:723:22: error: narrowing conversion of '-1' from 'int' to 'long unsigned int' [-Wnarrowing]
723 | #define DOKAN_ERROR -1
| ^
```
also, clean up the following warning:
```
/home/jenkins-build/build/workspace/ceph-windows-pull-requests/ceph/src/dokan/dbg.cc:142:62: warning: NULL used in arithmetic [-Wpointer-arith]
142 | o << "\n\tIsDirectory: " << (DokanFileInfo->IsDirectory != NULL);
|
```
test/encoding: verify that e.what() starts with expected str
boost changed how it prints boost::system::system_error in
boost 1.79: it appends the stringified error_category at the end of
exception::what(), and our buffer::malformed_input is a subclass
of boost::system::system_error.
so we cannot just compare the return value of what() with the
expected string. to be more future proof, let's check if it
starts with the expected string instead.
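The difference between the two checks, sketched in Python for brevity. The message text and the appended suffix are hypothetical; the exact format boost produces is not shown in this commit.

```python
def what_matches(actual: str, expected: str) -> bool:
    """Accept any what() text that begins with the expected message."""
    return actual.startswith(expected)


expected = "End of buffer"               # hypothetical message
old_style = "End of buffer"              # what() before the boost change
new_style = "End of buffer [buffer:2]"   # hypothetical appended category

print(new_style == expected)              # False: exact comparison breaks
print(what_matches(old_style, expected))  # True
print(what_matches(new_style, expected))  # True
```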
to avoid the conflicting declaration of NTSTATUS from bcrypt.h and our
own typedef. after switching to boost 1.79, we would hit the following
compile failure:
In file included from ../src/dokan/options.cc:14:
../src/dokan/ceph_dokan.h:16:15: error: conflicting declaration 'typedef DWORD NTSTATUS'
16 | typedef DWORD NTSTATUS;
| ^~~~~~~~
In file included from ../build.deps/mingw/boost/include/boost/asio/impl/connect_pipe.ipp:29,
from ../build.deps/mingw/boost/include/boost/asio/connect_pipe.hpp:79,
from ../build.deps/mingw/boost/include/boost/asio.hpp:64,
from ../src/include/win32/winsock_wrapper.h:20,
from <command-line>:
/usr/share/mingw-w64/include/bcrypt.h:27:16: note: previous declaration as 'typedef LONG NTSTATUS'
27 | typedef LONG NTSTATUS,*PNTSTATUS;
| ^~~~~~~~
With auto-deletion of trashed snapshots, it is relatively easy to lose
a race to "rbd flatten" as follows:
- when V2_GET_PARENT runs, the image is technically still a clone
- when V2_REFRESH_PARENT runs, the image is fully flattened and the
snapshot in the parent image is deleted
This results in a spurious ENOENT error, mainly when trying to open the
image (e.g. for "rbd info"). This race condition has always been there
but auto-deletion of trashed snapshots makes it much worse.
Retry ENOENT in V2_REFRESH_PARENT the same way as in V2_GET_SNAPSHOTS.
librbd: fix a bunch of issues with restarting RefreshRequest
Make RefreshRequest properly restartable, at least up to and including
the V2_REFRESH_PARENT step:
- clear m_migration_spec when skipping GET_MIGRATION_HEADER
- don't rely on potentially stale m_incomplete_update on retry
- reset m_legacy_parent when retrying more than just V2_GET_PARENT
- don't rely on potentially stale m_parent_md.overlap and
m_head_parent_overlap on retry
- clear m_metadata before fetching image metadata (but not before
fetching pool metadata)
- clear m_op_features when skipping V2_GET_OP_FEATURES
- clear m_group_spec on EOPNOTSUPP error in V2_GET_GROUP
- reset m_legacy_snapshot when retrying more than just V2_GET_SNAPSHOTS
- don't rely on potentially stale m_snap_parents on retry
Soumya Koduri [Thu, 26 May 2022 16:55:06 +0000 (22:25 +0530)]
rgw: Avoid dereferencing nullptr while configuring bucket sync policy
While configuring bucket sync policy, in "rgw_sync_bucket_entities::set_bucket()",
there could be a case wherein bucket doesn't contain any value but is still
dereferenced. This commit fixes that.
ceph-volume: add a retry in util.disk.remove_partition
This fixes a possible race condition when zapping a device.
Due to some udev events, that race condition makes the key
`ID_PART_ENTRY_NUMBER` show up too late.
The idea here is to retry multiple times before actually failing.
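A minimal sketch of the retry idea, assuming the udev properties are read through some callable. wait_for_partition_number, read_udev_props, and the attempt and delay values are hypothetical and do not reflect the actual code in util.disk.remove_partition.

```python
import time
from typing import Callable, Dict


def wait_for_partition_number(read_udev_props: Callable[[], Dict[str, str]],
                              attempts: int = 10,
                              delay: float = 1.0) -> str:
    """Poll the udev properties until ID_PART_ENTRY_NUMBER shows up."""
    for _ in range(attempts):
        number = read_udev_props().get("ID_PART_ENTRY_NUMBER")
        if number:
            return number
        # The key may not be there yet because udev is still settling.
        time.sleep(delay)
    raise RuntimeError(
        "ID_PART_ENTRY_NUMBER not found after %d attempts" % attempts)
```

Only after every attempt has come back without the key does the caller see a failure.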
test/{librbd, rgw}: retry when bind fail with port 0
there is a chance that the bind() call may fail if another test
happens to pick the free port chosen by the operating system. in this case,
we just retry up to 42 times.
in theory, this change does not fully address the race, but it should
help to alleviate the issue.
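The retry loop, sketched in Python rather than in the C++ of the actual tests. bind_with_retry and its parameters are hypothetical; only the limit of 42 attempts comes from the description above.

```python
import errno
import socket
import time
from typing import Callable


def bind_with_retry(make_socket: Callable[[], socket.socket],
                    attempts: int = 42,
                    delay: float = 0.0) -> socket.socket:
    """Bind a fresh socket to port 0, retrying on EADDRINUSE."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        sock = make_socket()
        try:
            sock.bind(("127.0.0.1", 0))
            return sock
        except OSError as e:
            sock.close()
            # Only EADDRINUSE is worth retrying; give up on the last attempt.
            if e.errno != errno.EADDRINUSE or attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

A caller would pass something like `lambda: socket.socket()` as make_socket. A non-zero delay spaces the attempts out, which gives a competing test time to release the port.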
John Mulligan [Fri, 19 Aug 2022 15:39:44 +0000 (11:39 -0400)]
pybind/mgr: improve behavior of ErrorResponse.wrap method
The wrap classmethod is intended to turn any exception into something
that can produce a manager response. All exceptions inheriting from
ErrorResponseBase can do that, so instead of losing all that state,
have wrap return those exceptions inheriting from ErrorResponseBase
as-is.
Additionally, only change the sign of errno values that aren't already
negative.
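The intended behavior, as a self-contained sketch. Only the names ErrorResponseBase, ErrorResponse and wrap come from this commit message; the constructor arguments, the status and return_value attributes, and format_response are assumptions made for the example.

```python
import errno
from typing import Tuple


class ErrorResponseBase(Exception):
    """Base for exceptions that can produce a manager response."""

    def format_response(self) -> Tuple[int, str, str]:
        raise NotImplementedError()


class ErrorResponse(ErrorResponseBase):
    def __init__(self, status: str, return_value: int = -errno.EINVAL) -> None:
        super().__init__(status)
        self.status = status
        self.return_value = return_value

    def format_response(self) -> Tuple[int, str, str]:
        return (self.return_value, "", self.status)

    @classmethod
    def wrap(cls, exc: Exception) -> ErrorResponseBase:
        # Exceptions that already know how to respond are returned as-is.
        if isinstance(exc, ErrorResponseBase):
            return exc
        err = getattr(exc, "errno", None)
        if not isinstance(err, int):
            err = errno.EINVAL
        # Flip the sign only when the value is not already negative.
        if err > 0:
            err = -err
        return cls(str(exc), return_value=err)
```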
John Mulligan [Mon, 15 Aug 2022 18:47:47 +0000 (14:47 -0400)]
pybind/mgr: fix incorrect construction of mgr CLI arguments
Instead of adding the extra_args correctly, the code was "adding" the
"--format" option with the wrong name: whatever the last argument name
was in the actual arguments of the decorated function.
Updated tests to verify that the format option is added to the "ceph
args" string properly.