Adam King [Thu, 16 Feb 2023 17:34:06 +0000 (12:34 -0500)]
qa/distros: pass --allowerasing --nobest when installing container-tools
One of the tests in the orch suite is running distro install
commands from multiple distros, causing it to first install
container-tools 3.0 and then later install container-tools,
which fails, causing the test to fail. This is something of a band-aid
fix to get the test to work. Whichever version of the package is
installed last will end up being the installed version (without
raising an error), which is what we want in the tests.
Adam King [Sun, 12 Feb 2023 20:28:10 +0000 (15:28 -0500)]
cephadm: set pids-limit unlimited for all ceph daemons
We actually had this set up before, but ran into issues.
Some teuthology test had failed in the fs suite, so it was
modified to only affect iscsi and rgw daemons (https://github.com/ceph/ceph/pull/45798)
and then the changes were reverted entirely (so no pids-limit
modifying code at all) in quincy and pacific because
the LRC ran into issues with the change related to the podman
version (https://github.com/ceph/ceph/pull/45932). This new patch
now addresses the podman versions, specifically that the patch
that makes -1 work for a pids-limit seems to have landed in
podman 3.4.1 based on https://github.com/containers/podman/pull/12040.
We'll need to make sure that this doesn't break anything in the
fs suites again as I don't remember the details of the first
issue, or why having it only set the pids-limit for iscsi and rgw fixes it.
Assuming that isn't a problem we should hopefully be able to unify
at least how reef and quincy handle this now that the podman version
issue is being addressed in this patch.
See the linked tracker issue for a discussion on why we're going at
this again and why I'm trying to do this for all ceph daemon types.
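
A minimal sketch of the version gate described above, not the actual
cephadm code. The function names are hypothetical, and the fallback value
for podman older than 3.4.1 is an assumption: the message only says that
-1 works from 3.4.1 onward.

```python
from typing import Tuple

# From the commit message: "-1" appears to be accepted from podman 3.4.1 on.
MIN_PODMAN_FOR_MINUS_ONE: Tuple[int, int, int] = (3, 4, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.4.1' into (3, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def podman_pids_limit_arg(version: str) -> str:
    """Return a flag intended to lift the pids limit (hypothetical helper)."""
    if parse_version(version) < MIN_PODMAN_FOR_MINUS_ONE:
        # Assumed fallback for older podman; not stated in the commit message.
        return "--pids-limit=0"
    return "--pids-limit=-1"
```

Comparing tuples of integers rather than strings keeps a version such as
3.10.0 ordered after 3.4.1.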
Teoman ONAY [Thu, 11 Nov 2021 15:05:49 +0000 (15:05 +0000)]
cephadm: remove containers pids-limit
The default pids-limit (docker 4096/podman 2048) prevents some
customizations from working (HTTP threads on RGW) and limits the number
of LUNs per iSCSI target.
Zac Dover [Sat, 21 Jan 2023 16:32:59 +0000 (02:32 +1000)]
doc/install: refine index.rst
Refine English sentences in doc/install/index.rst. Remove adverbial
phrases of time that refer to Nautilus-era features as "new", since that
was four years ago.
Zac Dover [Wed, 8 Mar 2023 01:52:12 +0000 (11:52 +1000)]
doc/install: update index.rst
Update index.rst by making minor grammar improvements. This file was
long overdue for a backport to Reef, Quincy, and Pacific, so this commit
was a good way to pass a human eyeball over the text before making those
backports.
Yixin Jin [Wed, 15 Feb 2023 17:08:19 +0000 (17:08 +0000)]
rgw: Fix segfault due to concurrent socket use at timeout
This commit fixes a potential segfault when the
rgw timeout handler works on the socket in one
thread while it is concurrently used by another.
The details of the fix are:
1. Instead of calling socket close(), which resets
descriptor_data in boost::asio socket and risks
segfault due to concurrent use of the socket,
the timeout handler now calls cancel() to abort
all pending ops followed by shutdown() to disable
the underlying transport. The eventual closure of
the socket will be done in the socket destructor.
2. Expose the actual boost::asio socket via get_socket()
from Connection so that the timeout handler can call
cancel() and shutdown() on it, although the socket data
member is already accessible. It allows future expansion
that wants to hide the socket even though it renders the
existing close() less useful.
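
The difference between shutting a socket down and closing it can be
illustrated with a rough Python analogy. This is not the boost::asio code
from the commit: Python sockets have no cancel(), and abort_on_timeout is
a hypothetical name. The sketch only shows that shutdown() disables the
transport while the descriptor stays valid until a later close().

```python
import socket


def abort_on_timeout(sock: socket.socket) -> None:
    """Hypothetical timeout handler: disable the transport, keep the descriptor."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # the socket was already disconnected


left, right = socket.socketpair()
abort_on_timeout(left)

# The descriptor is still allocated, so another user of `left` does not
# find its underlying state torn away; reads on the peer simply see EOF.
descriptor_still_valid = left.fileno() != -1
peer_sees_eof = right.recv(1) == b""

# The actual release of the descriptor happens later, in one place.
left.close()
right.close()
descriptor_released = left.fileno() == -1
```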
Casey Bodley [Thu, 5 Jan 2023 16:30:03 +0000 (11:30 -0500)]
rgw/beast: fix interaction between keepalive and 100-continue
if we reject a request with a "Expect: 100-continue" header before
sending a "100 Continue" response, the keepalive logic should not try to
read/discard request body before parsing the next request headers
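
The rule above can be sketched as a small decision function. The type and
function names are hypothetical and do not come from the rgw/beast
frontend; this is only an illustration of the condition the message
describes.

```python
from dataclasses import dataclass


@dataclass
class RequestState:
    expects_continue: bool  # request carried "Expect: 100-continue"
    sent_continue: bool     # server already replied "100 Continue"


def should_discard_body(state: RequestState) -> bool:
    """Decide whether keepalive should drain the body before the next request."""
    if state.expects_continue and not state.sent_continue:
        # The request was rejected before "100 Continue" was sent, so the
        # client was never told to send a body; do not try to read one.
        return False
    return True
```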
Zac Dover [Thu, 2 Mar 2023 18:04:30 +0000 (04:04 +1000)]
doc/radosgw: format admonitions
Break up the text of two similar admonitions into three paragraphs (in
each of the two instances). This makes the content of the admonition
much easier to read at a glance.
Adam King [Wed, 1 Mar 2023 21:10:41 +0000 (16:10 -0500)]
doc/cephadm: update cephadm compatability and stability page
This page is very out of date. This commit probably doesn't
cover everything there is to say about stability and compatibility
in cephadm, but it at least gets it noticeably closer to reality.
Zac Dover [Tue, 28 Feb 2023 02:55:08 +0000 (12:55 +1000)]
doc/radosgw: s/zone group/zonegroup/g et alia
s/zone group/zonegroup/ where simple greps failed to find instances of
"zone group" that were spread across two lines; break a paragraph into
two paragraphs so that each paragraph has a thematic idea of its own.
Zac Dover [Mon, 27 Feb 2023 08:40:14 +0000 (18:40 +1000)]
doc/rgw: remove "tertiary", link to procedure
Remove the term "tertiary zone" and replace it with "second secondary
zone" (because there is no such thing as a tertiary zone). Link to the
procedure for creating a secondary zone in a place where such a link is
helpful to the reader.
Rishabh Dave [Mon, 20 Feb 2023 07:51:25 +0000 (13:21 +0530)]
doc/cephfs: describe conf opt "client quota df" in quota doc
The ceph config file option (from the client section) "client quota df"
is mentioned in "CephFS Client Capabilities" document but not in the
"CephFS Quota" document. Adding information about this option to this
document too would make it easier for CephFS users to discover,
understand and use this option.
thomas [Fri, 24 Feb 2023 06:00:00 +0000 (01:00 -0500)]
doc/cephadm/host-management: add service spec link
The old "(below)" text is not accurate: the service spec definition is
no longer in the same file. This commit adds a ref link to the actual
service specification section.
Zac Dover [Fri, 24 Feb 2023 01:07:12 +0000 (11:07 +1000)]
doc/glossary: add AWS/OpenStack bucket info
Add links to AWS's documentation of buckets, in accordance with Casey
Bodley's suggestions here:
https://github.com/ceph/ceph/pull/50221#discussion_r1115900879
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit d16ed970e9ac76d6320f0ae26fa3ddd745dbd429)
Casey Bodley [Tue, 9 Nov 2021 02:24:52 +0000 (21:24 -0500)]
cls/rgw: index cancelation still cleans up remove_objs
when multipart uploads complete their final bucket index transaction,
they pass the list of part objects in 'remove_objs' for bulk removal -
the part objects, along with their bucket stats, get replaced by the
head object
but if CompleteMultipart races with another upload, the head object
write will fail with ECANCELED and the bucket index transaction gets
canceled with CLS_RGW_OP_CANCEL. these canceled uploads still need to
clean up their 'remove_objs', but cancelation was returning too early.
as a result, these bucket index entries get orphaned and leave the
bucket stats inconsistent
this commit reworks rgw_bucket_complete_op() so that CLS_RGW_OP_CANCEL
is handled the same way as OP_ADD and OP_DEL, so always runs the loop to
clean up 'remove_objs'
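
a toy model of the reworked control flow, assuming a bucket index that is
just a dict of entry name to size. the names Op and complete_op are
hypothetical, and this is not the cls/rgw code; it only shows the cleanup
loop running for every op type, including cancel.

```python
from enum import Enum
from typing import Dict, Iterable


class Op(Enum):
    ADD = 1
    DEL = 2
    CANCEL = 3


def complete_op(index: Dict[str, int], op: Op, name: str, size: int,
                remove_objs: Iterable[str]) -> None:
    """Apply one completion to the toy index, then always clean up remove_objs."""
    if op is Op.ADD:
        index[name] = size
    elif op is Op.DEL:
        index.pop(name, None)
    # Op.CANCEL leaves the head entry untouched, but must not return early.

    for obj in remove_objs:
        index.pop(obj, None)


def bucket_size(index: Dict[str, int]) -> int:
    """Stand-in for bucket stats: total size of all index entries."""
    return sum(index.values())
```

with an early return on cancel, the part entries would stay in the index
and bucket_size() would keep counting them.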
Zac Dover [Wed, 22 Feb 2023 03:36:40 +0000 (13:36 +1000)]
doc/rgw: clarify multisite.rst top matter
Improve the pragmatics of the top matter of multisite.rst. Organize the
text into sections, where doing so makes the nature of multi-site
configurations clearer.
N Balachandran [Thu, 16 Feb 2023 04:57:02 +0000 (10:27 +0530)]
rbd-mirror: fix syncing_percent calculation logic in get_replay_status()
When a snapshot sync is resumed and the get_replay_status function
is called before handle_copy_image_progress, the syncing_percent
value may be greater than 100 as the m_local_object_count is still
set to zero. This commit sets the syncing_percent to 0 in such cases.
Fixes: https://tracker.ceph.com/issues/58706
Signed-off-by: N Balachandran <nibalach@redhat.com>
(cherry picked from commit c7ae0f6eb6a8fb08859454d1b2e81d2dc7b0226f)
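
A minimal sketch of the guard described above. The function and parameter
names are hypothetical, and the percentage formula and the cap at 100 are
assumptions; only the "report 0 while the local object count is still
zero" behavior comes from the commit message.

```python
def syncing_percent(copied_objects: int, local_object_count: int) -> int:
    """Return sync progress as an integer percentage in the range 0..100."""
    if local_object_count <= 0:
        # The object count has not been populated yet (resumed sync before
        # the first progress callback), so report no progress.
        return 0
    # Assumed formula: share of objects copied so far, capped at 100.
    return min(100, 100 * copied_objects // local_object_count)
```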
Ilya Dryomov [Thu, 16 Feb 2023 11:53:02 +0000 (12:53 +0100)]
qa/workunits/rbd-nbd: work around "rbd feature disable" hang
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1]. Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.
Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries. This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.