Kefu Chai [Wed, 17 Jan 2024 15:36:12 +0000 (23:36 +0800)]
debian/ceph-common.postinst: set user directory using adduser
now that adduser allows us to set its home directory, we can do
this using adduser instead of using usermod. this change also
silences the warning from lintian
"maintainer-script-lacks-home-in-adduser". lintian complains if
`adduser --system` is called without passing `--home` option.
also, take this opportunity to s/-c/--comment/ in the command line
of `usermod`, for better readability.
Kefu Chai [Wed, 17 Jan 2024 15:09:02 +0000 (23:09 +0800)]
debian/control: add adduser to Depends of cephadm and ceph-common
in `debian/ceph-common.postinst` and `debian/cephadm.postinst`, we
use `adduser --system` to create the system user when configuring
the corresponding package.
before this change, the dependency is not listed in the runtime
`Depends` section of ceph-common and cephadm.
in this change, the dependency is added. this is also suggested
by Securing Debian Manual, see
https://www.debian.org/doc/manuals/securing-debian-manual/bpp-lower-privs.en.html
fs suite relies on these debugfs entries to gather mount information
(client-id, addr/inst) which are required by some tests. In fs suite,
the distro kernel gets overridden by the testing kernel and therefore
even if Ubuntu 20.04 is chosen as the distro, the testing kernel is
installed. However, with smoke suite, the distro kernel is used and
the missing patches cause certain essential information gathering to
fail early on (client-id, etc.), causing the test to not even start
execution. PR #54515 fixes a bug in the client-id fetching path but
isn't complete due to the missing patches - details here:
https://tracker.ceph.com/issues/63488#note-8
But it's essential to have the smoke tests running since those tests
have lately uncovered bugs in the MDS (w/ distro kernels). In order
to benefit from those tests, this change ignores failures when
gathering mount information (which isn't used by the fs relevant
smoke tests). The tests (in fs suite) that rely on this piece of
information would fail when run with the 20.04 distro kernel (but the
fs suite overrides it with the testing kernel).
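A minimal sketch of the "ignore failures when gathering mount information" behaviour described above. All names here (`MountInfoError`, `read_debugfs_mount_info`, `gather_mount_info`) are hypothetical and are not taken from the qa code; the sketch only assumes that a failed lookup can be turned into empty values so that tests which do not need them can still start.

```python
import logging

log = logging.getLogger(__name__)


class MountInfoError(Exception):
    """Hypothetical error: the debugfs entries are not available."""


def read_debugfs_mount_info():
    """Hypothetical reader. Here it always fails, as it would on a
    kernel that lacks the required debugfs entries."""
    raise MountInfoError("debugfs entries not found")


def gather_mount_info(reader=read_debugfs_mount_info):
    """Return (client_id, inst), or (None, None) if gathering fails."""
    try:
        return reader()
    except MountInfoError as e:
        # Do not abort the job; tests that need these values will
        # fail later on their own.
        log.warning("ignoring failure to gather mount info: %s", e)
        return None, None
```

Tests that actually depend on the client id would still need to check for `None` before using it.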
Venky Shankar [Mon, 27 Nov 2023 05:12:02 +0000 (10:42 +0530)]
qa: add centos_latest (9.stream) and ubuntu_20.04 yamls to supported-all-distro
A bug in Ceph MDS (MDS crash!) is seen with distros using a not-so-recent kernel
(5.4ish). This crash was first seen in quincy smoke run and the problematic
backport change was reverted. The smoke suite chooses a random distro for each
job, so to hit this bug, the appropriate distro needs to be (randomly) chosen.
This change points the smoke suite to run against all supported distros.
This affects suites that point to supported-all-distro (powercycle) since it
bloats up the number of jobs. E.g., currently, without --subset, powercycle:osd
INFO:teuthology.suite.run:0/336 jobs were filtered out.
vs
(with this change)
Unable to schedule 560 jobs, too many jobs, when maximum 500 jobs allowed.
For smoke suite
INFO:teuthology.suite.run:Scheduled 24 jobs in total.
vs
(with this change)
INFO:teuthology.suite.run:Scheduled 120 jobs in total.
Eventually, with PR #46882, the testing kernel will no longer override the
distro kernel in fs suite, so we should get good coverage then.
MClientRequest: handle owner_uid and owner_gid from ceph_mds_request_head_legacy
When a client is too old and uses struct ceph_mds_request_head_legacy we must
fill new owner_uid and owner_gid fields from an old client_uid and client_gid.
Fixes: https://github.com/ceph/ceph/pull/52575
Fixes: https://tracker.ceph.com/issues/63288
Fixes: commit 46cb244b9c839 ("ceph_fs.h: add separate owner_{u,g}id fields")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
(cherry picked from commit a70a70f589214d6e2a5b477a61005b13ba2fec46)
(cherry picked from commit 65257baa62eddac0cc3df9d2ca3a57e7fd2b25e2)
MClientRequest: handle ext_num_retry and ext_num_fwd from ceph_mds_request_head_legacy
When a client is too old and uses struct ceph_mds_request_head_legacy we must
fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd.
Fixes: https://github.com/ceph/ceph/pull/45669
Fixes: https://tracker.ceph.com/issues/63288
Fixes: commit cbd7e3040208 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
(cherry picked from commit 43f32a46aa9095b19525357ba7ca215e842b4f77)
(cherry picked from commit 312bb5b9f1ada9646205a78f0a0fcc73d2530d5c)
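The two MClientRequest changes above follow the same pattern: when only the legacy head is available, the newer fields are populated from their older counterparts. The sketch below illustrates that pattern in Python, not the actual C++ decoding code. The field names follow the commit messages; the `LegacyHead` and `Head` classes and the `upgrade_head` helper are hypothetical and do not reproduce the real struct layouts.

```python
from dataclasses import dataclass


@dataclass
class LegacyHead:
    """Hypothetical stand-in for ceph_mds_request_head_legacy."""
    client_uid: int
    client_gid: int
    num_retry: int
    num_fwd: int


@dataclass
class Head:
    """Hypothetical stand-in for the newer request head."""
    client_uid: int
    client_gid: int
    num_retry: int
    num_fwd: int
    owner_uid: int
    owner_gid: int
    ext_num_retry: int
    ext_num_fwd: int


def upgrade_head(legacy: LegacyHead) -> Head:
    """Fill the new fields from the old ones for a too-old client."""
    return Head(
        client_uid=legacy.client_uid,
        client_gid=legacy.client_gid,
        num_retry=legacy.num_retry,
        num_fwd=legacy.num_fwd,
        # New fields have no value on the wire for old clients, so
        # they are derived from the legacy fields.
        owner_uid=legacy.client_uid,
        owner_gid=legacy.client_gid,
        ext_num_retry=legacy.num_retry,
        ext_num_fwd=legacy.num_fwd,
    )
```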
'ceph-volume raw list' is broken for a specific use case (rook).
rook copies devices from /dev/ to /mnt for specific/internal needs.
when ceph-volume raw list is passed a device from /mnt then
ceph-volume ignores it and returns an empty dict.
That prevents rook from creating OSDs properly.
Zac Dover [Tue, 14 Nov 2023 13:40:42 +0000 (23:40 +1000)]
doc/glossary: add "Quorum" to glossary
Add the term "Quorum" to the glossary and link to the part of
architecture.rst concerning Monitors. The sticky header at the top of
the docs.ceph.com website gets in the way of the location linked to in
this commit, but fatigue and disgust prevent me from spending time today
trial-and-erroring my way through the hostile and ill-documented
wilderness of scroll-margin so that the link goes where it should.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c2f6a770bf0e12296c334d99ac86ff4732ec29b7)
Zac Dover [Mon, 13 Nov 2023 10:57:07 +0000 (20:57 +1000)]
doc/rados: format "initial troubleshooting"
Format the steps in the "Initial Troubleshooting" section of
doc/rados/troubleshooting/troubleshooting-mon.rst. A near-future PR (not
this one) will add context to this section and explain that the steps
described here are the first steps that you should undertake when you
determine that you have an unresponsive or down Monitor. This PR is
merely for formatting.
Zac Dover [Sun, 12 Nov 2023 10:21:41 +0000 (20:21 +1000)]
doc/config: edit "ceph-conf.rst"
Edit the first section of doc/rados/configuration/ceph-conf.rst.
Initially I just wanted to change "series" to "set", but once I got my
hands dirty I ended up simplifying some sentences.
Zac Dover [Sun, 12 Nov 2023 10:52:09 +0000 (20:52 +1000)]
doc/rados: parallelize t-mon headings
Give parallel structure to the questions in the Q&A section of the "The
Cluster Has Quorum But At Least One Monitor Is Down" subsection of the
"Most Common Monitor Issues" section of
doc/rados/troubleshooting/troubleshooting-mon.rst.
Aashish Sharma [Tue, 7 Nov 2023 13:27:24 +0000 (18:57 +0530)]
mgr/dashboard: fix rgw multi-site import form helper
Before: To obtain the token, generate it from your primary Ceph cluster. This token includes encoded information about the secondary cluster's endpoint, access key, and secret key.
Fix: To obtain the token, generate it from your primary Ceph cluster. This token includes encoded information about the primary cluster's endpoint, access key, and secret key.
Prashant D [Wed, 18 Oct 2023 20:07:47 +0000 (16:07 -0400)]
qa/smoke,orch,perf-basic: add POOL_APP_NOT_ENABLED to ignorelist
Some of the smoke, orch and perf-basic tests are failing due
to POOL_APP_NOT_ENABLED health check failure. Add
POOL_APP_NOT_ENABLED to ignorelist for these tests.
Casey Bodley [Tue, 24 Oct 2023 20:48:06 +0000 (16:48 -0400)]
rgw: fetch_remote_obj() uses uncompressed size for encrypted objects
use the original size from RGW_ATTR_COMPRESSION as the accounted size in
the bucket index for objects that were transferred in their
encrypted/compressed form
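As a rough illustration of the size selection described above (in Python, not the RGW C++ code): when an object carries compression metadata, the original size recorded there is used for accounting, otherwise the stored size is used. The function name and parameters are hypothetical.

```python
from typing import Optional


def accounted_size(stored_size: int,
                   compression_orig_size: Optional[int] = None) -> int:
    """Pick the size to account in the bucket index.

    compression_orig_size is the original (uncompressed) size taken
    from the object's compression attribute, or None when the object
    has no such attribute.
    """
    if compression_orig_size is not None:
        return compression_orig_size
    return stored_size
```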
Ville Ojamo [Fri, 3 Nov 2023 05:44:00 +0000 (12:44 +0700)]
doc/cephadm/services: remove excess rendered indentation in osd.rst
Start bash command blocks at the left margin, removing
excessive padding/indentation that would render the
block too much towards the right.
At the same time indent the source consistently:
- Two spaces for command blocks and output blocks.
- Four spaces for notes, code blocks.
There seems to be no uniform style for this, sometimes
commands are indented with three spaces but it would
seem two spaces is common. In the end it all renders
the same I guess.
Ramana Raja [Mon, 18 Sep 2023 02:52:56 +0000 (22:52 -0400)]
qa/suites/rbd: add test to check rbd_support module recovery
... on repeated blocklisting of its client.
There were issues with rbd_support module not being able to recover
from its RADOS client being repeatedly blocklisted. This occurred for
example in clusters with OSDs slow to process RBD requests while the
module's mirror_snapshot_scheduler was taking mirror snapshots by
requesting exclusive locks on the RBD images and workloads were running
on the snapshotted images via kernel clients.
There is no need for CreateSnapshotRequests.__del__() that calls
CreateSnapshotRequests.wait_for_pending().
MirrorSnapshotScheduleHandler.shutdown() already calls
CreateSnapshotRequests.wait_for_pending().
Ramana Raja [Thu, 26 Oct 2023 17:18:52 +0000 (13:18 -0400)]
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock
The MirrorSnapshotScheduleHandler's run thread issues asynchronous
create snapshot requests using a CreateSnapshotRequests instance. When
the thread invokes a CreateSnapshotRequests instance's get_ioctx(),
the instance's class variable lock is acquired. With the class
variable lock held, the garbage collection of a CreateSnapshotRequests
instance may race in the thread. The thread would then call
CreateSnapshotRequests __del__() that tries to acquire the class
variable lock that the thread already holds. Fix this
recursive deadlock by converting the CreateSnapshotRequests lock from
a class variable to an instance variable. There is no need to share
the lock across CreateSnapshotRequests instances.
Also convert MirrorSnapshotScheduleHandler, PerfHandler and
TrashPurgeScheduleHandler class variables to instance variables
that don't need to be shared across the instances.
Fixes: https://tracker.ceph.com/issues/62994
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-Authored-By: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4452bc22d1c6c8499cf55d6e39090adf7ae1dcbf)
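The class-variable versus instance-variable distinction in the commit above can be sketched as follows. This is not the rbd_support code: the classes and the `would_block_in_del` helper are hypothetical, and the helper uses a non-blocking acquire so that the sketch reports the problem instead of hanging the way a real `__del__()` would.

```python
import threading


class SharedLockRequests:
    """Before: one lock shared by every instance (class variable)."""
    lock = threading.Lock()

    def would_block_in_del(self):
        # Stand-in for __del__() taking the lock.
        got = self.lock.acquire(blocking=False)
        if got:
            self.lock.release()
        return not got


class PerInstanceLockRequests:
    """After: each instance owns its lock (instance variable)."""

    def __init__(self):
        self.lock = threading.Lock()

    def would_block_in_del(self):
        got = self.lock.acquire(blocking=False)
        if got:
            self.lock.release()
        return not got


def finalizer_blocks_while_other_holds(cls):
    """Hold one instance's lock, then simulate another instance being
    garbage-collected on the same thread."""
    holder, collected = cls(), cls()
    with holder.lock:
        return collected.would_block_in_del()
```

With the shared lock the simulated finalizer cannot take the lock while another instance holds it; with per-instance locks it can. A `threading.RLock` would also avoid the self-deadlock, but the commit instead removes the sharing.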
Zac Dover [Wed, 1 Nov 2023 01:53:59 +0000 (11:53 +1000)]
doc/cephadm: edit troubleshooting.rst (1 of x)
Edit doc/cephadm/troubleshooting.rst. This commit and the PR of which it
is a part were raised in response to
https://github.com/ceph/ceph/pull/53976. The limits of reStructuredText
are particularly visible here in every instance of a BASH for-loop and
in every instance of a command stretched over multiple lines.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 69472c26af5419faa9ed93c071ed5933d03fa67f)
Laura Flores [Thu, 28 Sep 2023 17:52:11 +0000 (17:52 +0000)]
osd: fix logic in check_pg_upmaps
The logic was changed in check_pg_upmaps
in a Reef refactor, which results in recommendations
made by the upmap balancer even when it says there are
no optimizations.