Zac Dover [Sun, 26 Mar 2023 15:03:58 +0000 (01:03 +1000)]
doc/rados: clean up ops/bluestore-migration.rst
Clean up internal links, fix the numbering of a procedure, and implement
Anthony D'Atri's suggestions in
https://github.com/ceph/ceph/pull/50487 and
https://github.com/ceph/ceph/pull/50488.
Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (2 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (1 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Adam King [Wed, 15 Feb 2023 22:07:09 +0000 (17:07 -0500)]
mgr/cephadm: be aware of host's shortname and FQDN
The idea is to gather the shortname and FQDN as part
of gather-facts, and then if we ever try to check if a certain
host is in our internal inventory by hostname, we can check
these other known names. This should avoid issues where
we think a hostname specified by FQDN is not in our
inventory because we know the host by the shortname
or vice versa.
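
A minimal sketch of the lookup described above, not the actual cephadm code. The `HostNames` structure and the `find_host` helper are hypothetical names introduced for illustration; the only idea taken from the commit message is that a host is matched against every name known for it (the inventory name, the shortname, and the FQDN).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HostNames:
    """Names known for one inventory entry (hypothetical structure)."""
    hostname: str   # name the host was added to the inventory under
    shortname: str  # shortname collected as a host fact
    fqdn: str       # FQDN collected as a host fact


def find_host(inventory: List[HostNames], name: str) -> Optional[str]:
    """Return the inventory hostname that `name` refers to, or None."""
    for entry in inventory:
        known = {entry.hostname, entry.shortname, entry.fqdn}
        known.discard("")  # ignore facts that have not been gathered yet
        if name in known:
            return entry.hostname
    return None
```

With an entry stored under the shortname `node1`, a lookup by `node1.example.com` resolves to the same inventory entry instead of being reported as unknown.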
Mykola Golub [Tue, 28 Feb 2023 17:27:39 +0000 (19:27 +0200)]
mgr/cephadm: try to avoid pull when getting container image info
only if use_repo_digest is not set.
Commit ac88200 introduced the possibility of skipping the pull, but
doing so unconditionally broke a use case in which one could keep
a ceph image on a floating tag and upgrade to a new image
pushed to that tag. Since using a floating tag is possible only when
use_repo_digest is enabled (the default), skipping the pull only
when use_repo_digest is disabled no longer breaks it.
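
The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the cephadm implementation: `get_image_info`, `inspect_local`, and `pull_and_inspect` are hypothetical names, and the two callables stand in for whatever actually inspects a local image or pulls it.

```python
from typing import Callable, Optional


def get_image_info(
    image: str,
    use_repo_digest: bool,
    inspect_local: Callable[[str], Optional[dict]],
    pull_and_inspect: Callable[[str], dict],
) -> dict:
    """Return image info, skipping the pull only when it is safe to do so."""
    if not use_repo_digest:
        # No floating-tag upgrades are possible in this mode, so a local
        # copy of the image (if there is one) is good enough.
        info = inspect_local(image)
        if info is not None:
            return info
    # With use_repo_digest enabled the tag may have moved, so always pull.
    return pull_and_inspect(image)
```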
Adam King [Thu, 16 Feb 2023 17:34:06 +0000 (12:34 -0500)]
qa/distros: pass --allowerasing --nobest when installing container-tools
One of the tests in the orch suite is running distro install
commands from multiple distros, causing it to first install
container-tools 3.0 and then later install container-tools,
which fails, causing the test to fail. This is sort of a bandaid
fix to get the test to work. It will cause whichever version of the
package is installed last to end up installed (and will do so without
error), which is what we want in the tests.
Adam King [Sun, 12 Feb 2023 20:28:10 +0000 (15:28 -0500)]
cephadm: set pids-limit unlimited for all ceph daemons
We actually had this setup before, but ran into issues.
Some teuthology test had failed in the fs suite, so it was
modified to only affect iscsi and rgw daemons (https://github.com/ceph/ceph/pull/45798)
and then the changes were reverted entirely (so no pids-limit
modifying code at all) in quincy and pacific because
the LRC ran into issues with the change related to the podman
version (https://github.com/ceph/ceph/pull/45932). This new patch
now addresses the podman versions, specifically that the patch
that makes -1 work for a pids-limit seems to have landed in
podman 3.4.1 based on https://github.com/containers/podman/pull/12040.
We'll need to make sure that this doesn't break anything in the
fs suites again as I don't remember the details of the first
issue, or why having it only set the pids-limit for iscsi and rgw fixes it.
Assuming that isn't a problem we should hopefully be able to unify
at least how reef and quincy handle this now that the podman version
issue is being addressed in this patch.
See the linked tracker issue for a discussion on why we're going at
this again and why I'm trying to do this for all ceph daemon types.
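
A rough sketch of the version gate discussed above. The 3.4.1 threshold comes from the commit message; everything else is an assumption made for illustration, in particular the helper names and the choice of `--pids-limit=0` as the fallback for docker and for older podman. The version parser is deliberately naive and only handles plain dotted numbers.

```python
from typing import List, Tuple

# From the commit message: "-1" appears to work from podman 3.4.1 onward.
MIN_PODMAN_FOR_MINUS_ONE: Tuple[int, ...] = (3, 4, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '3.4.1' (no suffix handling)."""
    return tuple(int(part) for part in text.split("."))


def pids_limit_args(engine: str, version: str) -> List[str]:
    """Pick the container-engine argument that lifts the pids limit."""
    if engine == "podman" and parse_version(version) >= MIN_PODMAN_FOR_MINUS_ONE:
        return ["--pids-limit=-1"]
    # Assumed fallback for docker and for podman releases before 3.4.1.
    return ["--pids-limit=0"]
```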
Teoman ONAY [Thu, 11 Nov 2021 15:05:49 +0000 (15:05 +0000)]
cephadm: remove containers pids-limit
The default pids-limit (docker 4096/podman 2048) prevents some
customization from working (http threads on RGW) or limits the number
of luns per iscsi target.
Adam King [Sun, 15 Jan 2023 22:18:47 +0000 (17:18 -0500)]
mgr/cephadm: fix haproxy nfs backend server ip gathering
Fixes: https://tracker.ceph.com/issues/58465
Previously, if there were 2 nfs daemons of the same
rank, we could not check the rank generation, which
is intended to mark which one is the "real" one of that
rank in cases where we cannot remove the other one due
to its host being offline. The nfs of a given rank with
the highest rank_generation is the one we want haproxy
to use for its backend IP. Since we didn't actually
check this, which IP we got was effectively random: it
depended on the order in which we happened to iterate
over the nfs daemons of the same rank. If the nfs with the
lower rank_generation on an offline host happened
to come later in the iteration, we'd use that one
for the IP, which is incorrect.
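
The selection rule described above can be sketched as follows. This is a simplified illustration rather than the actual cephadm code: the `NfsDaemon` class and the `backend_ips` function are hypothetical, and only the rule itself (per rank, use the daemon with the highest rank_generation) is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class NfsDaemon:
    """Hypothetical stand-in for an nfs daemon description."""
    rank: int
    rank_generation: int
    ip: str


def backend_ips(daemons: Iterable[NfsDaemon]) -> Dict[int, str]:
    """Map each rank to the IP of its highest-generation daemon."""
    best: Dict[int, NfsDaemon] = {}
    for daemon in daemons:
        current = best.get(daemon.rank)
        # Compare generations explicitly so iteration order is irrelevant.
        if current is None or daemon.rank_generation > current.rank_generation:
            best[daemon.rank] = daemon
    return {rank: daemon.ip for rank, daemon in best.items()}
```

Because the comparison is explicit, a stale daemon left behind on an offline host yields the same result whether it is seen first or last.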
Adam King [Sun, 15 Jan 2023 21:30:53 +0000 (16:30 -0500)]
mgr/cephadm: don't attempt daemon actions for daemons on offline hosts
They'll just fail anyway, and it will waste time waiting
for the connection to time out. We have other places in
the serve loop that will check if the host is back
online.
Xiubo Li [Wed, 19 Oct 2022 08:44:04 +0000 (16:44 +0800)]
ceph_fuse: make it to force invalidating dentries when kernel >=3.18
The remount fails randomly for unknown reasons, and in certain
circumstances we can reproduce this very easily, which blocks our
testing. Make it possible to force the old method of invalidating the
dcache when the "client_try_dentry_invalidate" option is enabled,
even when the kernel version is >= 3.18.0.
Xiubo Li [Fri, 10 Mar 2023 05:46:27 +0000 (13:46 +0800)]
client: rename mds_max_retries_on_remount_failure to client_
mds_max_retries_on_remount_failure option is used by Client.cc only.
Fixes: https://tracker.ceph.com/issues/56532
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit b9edab80f048fee09b82cdd4ec58fa37bd937ded)
Conflicts:
- The options are still old style in pacific
Zac Dover [Sat, 21 Jan 2023 16:32:59 +0000 (02:32 +1000)]
doc/install: refine index.rst
Refine English sentences in doc/install/index.rst. Remove adverbial
phrases of time that refer to Nautilus-era features as "new", since that
was four years ago.
Zac Dover [Wed, 8 Mar 2023 01:52:12 +0000 (11:52 +1000)]
doc/install: update index.rst
Update index.rst by making minor grammar improvements. This file was
long overdue for a backport to Reef, Quincy, and Pacific, so this commit
was a good way to pass a human eyeball over the text before making those
backports.
Yixin Jin [Wed, 15 Feb 2023 17:08:19 +0000 (17:08 +0000)]
rgw: Fix segfault due to concurrent socket use at timeout
This commit fixes a potential segfault when the
rgw timeout handler works on the socket in one
thread while it is concurrently used by another.
The details of the fix are:
1. Instead of calling socket close(), which resets
descriptor_data in boost::asio socket and risks
segfault due to concurrent use of the socket,
the timeout handler now calls cancel() to abort
all pending ops followed by shutdown() to disable
the underlying transport. The eventual closure of
the socket will be done in the socket destructor.
2. Expose the actual boost::asio socket via get_socket()
from Connection so that the timeout handler can call
cancel() and shutdown() on it, although the socket data
member is already accessible. It allows future expansion
that wants to hide the socket even though it renders the
existing close() less useful.
Casey Bodley [Thu, 5 Jan 2023 16:30:03 +0000 (11:30 -0500)]
rgw/beast: fix interaction between keepalive and 100-continue
If we reject a request with an "Expect: 100-continue" header before
sending a "100 Continue" response, the keepalive logic should not try to
read/discard the request body before parsing the next request headers.
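
The rule above reduces to a small predicate. The sketch below is only an illustration of that rule in Python; the function and parameter names are hypothetical and do not correspond to the rgw/beast frontend code.

```python
def should_discard_request_body(expects_continue: bool,
                                sent_100_continue: bool) -> bool:
    """Decide whether keepalive should read/discard the rejected body.

    When the request carried "Expect: 100-continue" and it was rejected
    before "100 Continue" was sent, the body is not read or discarded;
    the next bytes are treated as the next request's headers.
    """
    if expects_continue and not sent_100_continue:
        return False
    return True
```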
Zac Dover [Thu, 2 Mar 2023 18:04:30 +0000 (04:04 +1000)]
doc/radosgw: format admonitions
Break up the text of two similar admonitions into three paragraphs (in
each of the two instances). This makes the content of the admonition
much easier to read at a glance.