Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (2 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (1 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Adam Kupczyk [Fri, 10 Mar 2023 07:53:27 +0000 (08:53 +0100)]
os/bluestore: BlueFS: harmonize log read and write modes
The BlueFS log has always been written in non-buffered mode,
while reading it depends on the bluefs_buffered_io option.
This mismatch is strongly suspected to cause some weird problems.
Adam King [Sun, 15 Jan 2023 22:18:47 +0000 (17:18 -0500)]
mgr/cephadm: fix haproxy nfs backend server ip gathering
Fixes: https://tracker.ceph.com/issues/58465
Previously, if there were two nfs daemons of the same
rank, we did not check the rank_generation, which is
intended to mark which one is the "real" daemon of that
rank in cases where we cannot remove the other one because
its host is offline. The nfs daemon of a given rank with
the highest rank_generation is the one whose IP haproxy
should use for its backend. Since we didn't actually
check this, the IP we got was effectively random,
depending on the order in which we happened to iterate
over the nfs daemons of the same rank. If the nfs daemon
with the lower rank_generation on an offline host happened
to come later in the iteration, we'd use its IP, which is
incorrect.
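A minimal Python sketch of that selection (daemon fields and the helper are hypothetical, not the actual cephadm data model): for each rank, keep only the daemon with the highest rank_generation before taking its IP for the haproxy backend.

    from dataclasses import dataclass

    @dataclass
    class NFSDaemon:
        rank: int
        rank_generation: int
        ip: str

    def backend_ips(daemons):
        best = {}
        for d in daemons:
            cur = best.get(d.rank)
            # Keep the daemon with the highest rank_generation per rank, so
            # iteration order no longer decides which IP haproxy ends up with.
            if cur is None or d.rank_generation > cur.rank_generation:
                best[d.rank] = d
        return [best[r].ip for r in sorted(best)]

    # The rank-0 daemon with generation 2 wins regardless of ordering.
    print(backend_ips([NFSDaemon(0, 1, "10.0.0.1"), NFSDaemon(0, 2, "10.0.0.2")]))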
Adam King [Sun, 15 Jan 2023 21:30:53 +0000 (16:30 -0500)]
mgr/cephadm: don't attempt daemon actions for daemons on offline hosts
They'll just fail anyway, and attempting them wastes time
waiting for the connection to time out. Other places in
the serve loop will check whether the host is back
online.
Nizamudeen A [Thu, 9 Mar 2023 11:51:44 +0000 (17:21 +0530)]
mgr/dashboard: custom image for kcli bootstrap script
Stable branches like quincy pull from quay.io/ceph/ceph:v17 to
bootstrap the ceph cluster in test environments. This causes issues
because the branches change constantly but the image does not, so
use the quay.ceph.io repo instead to bring up the cluster in test environments.
Matan Breizman [Thu, 15 Dec 2022 17:05:15 +0000 (17:05 +0000)]
mon/OSDMonitor: Skip check_pg_num on pool size decrease
When changing the pool size we use check_pg_num so that the
`mon_max_pg_per_osd` value is not exceeded. This check should only be
applied when increasing the size, to avoid underflows.
(The same already applies when changing pg_num.)
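As a hedged illustration of that guard (Python pseudocode standing in for the C++ check in mon/OSDMonitor; names are assumptions):

    # Illustrative only: run the pg-per-osd capacity check only when the pool
    # size (replica count) increases; a decrease can only lower the projection.
    def on_pool_size_change(old_size: int, new_size: int, check_pg_num) -> None:
        if new_size > old_size:
            check_pg_num(extra_replicas=new_size - old_size)
        # On a decrease, skip the check: subtracting here in unsigned
        # arithmetic is what risked an underflow.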
Matan Breizman [Mon, 19 Dec 2022 09:58:06 +0000 (09:58 +0000)]
mon/OSDMonitor: Simplify check_pg_num()
* See: https://tracker.ceph.com/issues/47062.
Originally check_pg_num did not take into account the
osds under the root of the crush rule.
This behavior resulted in an inaccurate pg num per osd count.
* Avoid summing all of the projected pg nums and only later
subtracting the pg num if the pool already existed.
* With this change, we only count the projected pg nums that
belong to pools affected by the crush rule.
The same goes for the osd count: instead of dividing the projected
pg number by all of the osdmap's osds, divide only by
the osds used by the crush rule.
* Avoid differentiating between whether the mapping epoch
is later than the osdmap epoch or not. Always check the pg
num according to the crush rule.
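For illustration, a rough Python rendering of the reworked check described in the bullets above (the real code is C++; the data shapes and field names are assumptions): count projected PGs only for pools that share the crush rule, and divide by only the OSDs that the rule maps to.

    # Hypothetical sketch of a crush-rule-scoped pg-per-osd check.
    def projected_pg_per_osd(pools, crush_rule, osds_for_rule, new_pool_pgs):
        # Only pools placed by the same crush rule contribute to the projection.
        projected = new_pool_pgs
        for pool in pools.values():
            if pool["crush_rule"] == crush_rule:
                projected += pool["pg_num"] * pool["size"]
        # Divide by the OSDs the rule actually maps to, not every OSD in the map.
        return projected / max(len(osds_for_rule), 1)

    pools = {1: {"crush_rule": 0, "pg_num": 128, "size": 3}}
    print(projected_pg_per_osd(pools, crush_rule=0,
                               osds_for_rule={0, 1, 2}, new_pool_pgs=32 * 3))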
Anthony D'Atri [Thu, 1 Dec 2022 19:04:30 +0000 (14:04 -0500)]
src/mon: clarify pool creation failure due to max_pgs_per_osd error message
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
Note: This commit is cherry-picked as a dependency
for later commits in this backport.
(cherry picked from commit 88e8eeca7571fc314bc30a52cd17218fa9fac500)
Tongliang Deng [Fri, 31 Dec 2021 06:02:25 +0000 (14:02 +0800)]
mon/OSDMonitor: fix integer underflow of check_pg_num
Underflow of the `uint64_t projected` variable occurs when
the sum of the current acting pg num and the new pg num we
specified is less than the pg num calculated from pg info.
Signed-off-by: Tongliang Deng <dengtongliang@gmail.com>
Note: This commit is cherry-picked as a dependency
for later commits in this backport.
(cherry picked from commit bd9813f5e1a3addca1a57360d58b50b120e0e5f3)
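For illustration, the wrap-around described above looks like this when modelled with 64-bit unsigned arithmetic in Python (the values are made up):

    import ctypes

    # Subtracting a larger value from a smaller one in uint64_t wraps around
    # instead of going negative, so the "projected" pg count becomes huge.
    acting_plus_new = 100    # sum of current acting pg num and the new pg num
    pgs_from_pg_info = 150   # pg num calculated from pg info
    wrapped = ctypes.c_uint64(acting_plus_new - pgs_from_pg_info).value
    print(wrapped)           # 18446744073709551566 rather than -50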
jerryluo [Mon, 25 Jan 2021 16:10:57 +0000 (00:10 +0800)]
mon/OSDMonitor: Make the pg_num check more accurate
In the check_pg_num function, find the osds that correspond to the
current pool's crush rule and calculate whether the average value of
pg_num on those osds would exceed the value of 'mon_max_pg_per_osd'. Make
the pg_num check more accurate by counting all the pgs on the osds used
by the new pool.
Fixes: https://tracker.ceph.com/issues/47062
Signed-off-by: Jerry Luo <luojierui@chinatelecom.cn>
Note: This commit has been reverted and is cherry-picked as
dependency for other commits in this backport.
(cherry picked from commit c726ce9e5088b30d29e0db5c0ecc8c03fe41da1d)
Adam King [Thu, 16 Feb 2023 17:34:06 +0000 (12:34 -0500)]
qa/distros: pass --allowerasing --nobest when installing container-tools
One of the tests in the orch suite runs distro install
commands from multiple distros, causing it to first install
container-tools 3.0 and then later install container-tools,
which fails and causes the test to fail. This is something of a band-aid
fix to get the test to work. It causes whichever version of the
package is installed last to end up installed
(and to do so without error), which is what we want in the tests.
Adam King [Sun, 12 Feb 2023 20:28:10 +0000 (15:28 -0500)]
cephadm: set pids-limit unlimited for all ceph daemons
We actually had this set up before, but ran into issues.
Some teuthology test had failed in the fs suite, so it was
modified to only affect iscsi and rgw daemons (https://github.com/ceph/ceph/pull/45798)
and then the changes were reverted entirely (so no pids-limit-modifying
code at all) in quincy and pacific because
the LRC ran into issues with the change related to the podman
version (https://github.com/ceph/ceph/pull/45932). This new patch
addresses the podman versions: specifically, the patch
that makes -1 work for a pids-limit appears to have landed in
podman 3.4.1, based on https://github.com/containers/podman/pull/12040.
We'll need to make sure that this doesn't break anything in the
fs suites again, as I don't remember the details of the first
issue or why setting the pids-limit only for iscsi and rgw fixed it.
Assuming that isn't a problem, we should hopefully be able to unify
at least how reef and quincy handle this now that the podman version
issue is being addressed in this patch.
See the linked tracker issue for a discussion of why we're attempting
this again and why I'm trying to do this for all ceph daemon types.
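A rough Python sketch of the version gate being described (the helper and flag selection are assumptions based on this message, not the exact cephadm code):

    # Illustrative: choose the pids-limit flag based on the container engine.
    def pids_limit_args(engine, version):
        # The message above notes that --pids-limit=-1 ("unlimited") only works
        # on podman >= 3.4.1; assume 0 as the older spelling of "unlimited"
        # and that docker accepts -1.
        if engine == "podman" and version < (3, 4, 1):
            return ["--pids-limit=0"]
        return ["--pids-limit=-1"]

    print(pids_limit_args("podman", (3, 0, 1)))    # ['--pids-limit=0']
    print(pids_limit_args("docker", (20, 10, 0)))  # ['--pids-limit=-1']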
Teoman ONAY [Thu, 11 Nov 2021 15:05:49 +0000 (15:05 +0000)]
cephadm: remove containers pids-limit
The default pids-limit (docker 4096 / podman 2048) prevents some
customizations from working (e.g. http threads on RGW) or limits the number
of LUNs per iscsi target.
Redouane Kachach [Wed, 25 Jan 2023 09:14:59 +0000 (10:14 +0100)]
cephadm: using short hostname to create the initial mon and mgr
Fixes: https://tracker.ceph.com/issues/58466
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0b807eefb8dbccf1e25c846f8177ddb74c6f333d)
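For illustration, the short hostname is the fully qualified hostname truncated at the first dot; one way to compute it in Python (not the exact cephadm code):

    import socket

    # Illustrative: derive the short hostname used when naming the initial
    # mon/mgr daemons from the fully qualified hostname.
    short_hostname = socket.getfqdn().split(".")[0]
    print(short_hostname)  # e.g. "node1" for "node1.example.com"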
This patch introduces a per-osd crush_device_class definition in the
DriveGroup spec. The Device object is extended to support a
crush_device_class parameter, which is processed by ceph-volume when
drives are prepared in batch mode. According to the crush device
classes defined per osd, drives are collected and grouped in a dict that is
used to produce a set of ceph-volume commands that eventually apply (if
defined) the right device class. The test_drive_group unit tests are
also extended to make sure we're not breaking compatibility with the
default definition, and the new syntax is validated, raising an exception
if it's violated.
Fixes: https://tracker.ceph.com/issues/58184
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 6c6cb2f5130dbcf8e42cf03666173948411fc92b)
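A hedged sketch of the grouping step described above (Python; the dict shape and command layout are illustrative rather than the exact ceph-volume invocation cephadm generates):

    from collections import defaultdict

    # Illustrative: group devices by their per-osd crush_device_class and emit
    # one ceph-volume batch command per class (paths and flags simplified).
    def batch_commands(devices):
        by_class = defaultdict(list)
        for dev in devices:
            by_class[dev.get("crush_device_class")].append(dev["path"])
        cmds = []
        for device_class, paths in by_class.items():
            cmd = ["ceph-volume", "lvm", "batch", *paths]
            if device_class:
                cmd += ["--crush-device-class", device_class]
            cmds.append(" ".join(cmd))
        return cmds

    print(batch_commands([
        {"path": "/dev/sdb", "crush_device_class": "ssd"},
        {"path": "/dev/sdc", "crush_device_class": "hdd"},
        {"path": "/dev/sdd"},  # no class: fall back to ceph-volume's default
    ]))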
Zac Dover [Sat, 21 Jan 2023 16:32:59 +0000 (02:32 +1000)]
doc/install: refine index.rst
Refine English sentences in doc/install/index.rst. Remove adverbial
phrases of time that refer to Nautilus-era features as "new", since that
was four years ago.
Zac Dover [Wed, 8 Mar 2023 01:52:12 +0000 (11:52 +1000)]
doc/install: update index.rst
Update index.rst by making minor grammar improvements. This file was
long overdue for a backport to Reef, Quincy, and Pacific, so this commit
was a good way to pass a human eyeball over the text before making those
backports.
Mykola Golub [Tue, 28 Feb 2023 17:27:39 +0000 (19:27 +0200)]
mgr/cephadm: try to avoid pull when getting container image info
This is done only if use_repo_digest is not set.
Commit ac88200 introduced the possibility of skipping the pull, but
doing so unconditionally broke a use case in which one could keep a
ceph image on a floating tag and upgrade to a new image pushed to that
tag. Since using a floating tag is possible only when use_repo_digest
is enabled (the default), skipping the pull only when use_repo_digest
is disabled no longer breaks that use case.
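A small hypothetical sketch of the resulting condition (names are illustrative, not the actual cephadm code): keep pulling when use_repo_digest is enabled so a floating tag can still resolve to a newly pushed image, and skip the pull otherwise.

    # Illustrative only: decide whether to pull before reading image info.
    def should_pull(use_repo_digest, have_cached_info):
        if use_repo_digest:
            # Floating tags are only usable with use_repo_digest, so keep
            # pulling here to pick up a new image pushed behind the same tag.
            return True
        # With use_repo_digest disabled, skip the pull if we already have info.
        return not have_cached_info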
Prashant D [Tue, 30 Aug 2022 07:29:24 +0000 (03:29 -0400)]
mon/LogMonitor: Fix log last
The `ceph log last` command outputs all of the cluster
logs generated from logm entries at DBG level,
irrespective of their log level. We must output
cluster logs generated from logm entries according
to the log level specified in the `log last` command.
Fixes: https://tracker.ceph.com/issues/57340
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit 32e40328fbdece9f6c573c11305ee525823e53c6)
Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json
- Generate a new one
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-config-modal/rgw-config-modal.component.html
- Accept the current changes
Ilya Dryomov [Thu, 16 Feb 2023 11:53:02 +0000 (12:53 +0100)]
qa/workunits/rbd-nbd: work around "rbd feature disable" hang
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1]. Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.
Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries. This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.
Ilya Dryomov [Thu, 16 Feb 2023 11:51:04 +0000 (12:51 +0100)]
test/librbd: add LengthModifiedDiscardJournalAppendEnabled test
Currently nothing triggers the length_modified case in
ImageDiscardRequest::prune_object_extents() in isolation. It's only
triggered in DiscardGranularityJournalAppendEnabled test together with
the prune_required case and a bad refactoring could easily break the
length_modified logic again.
Josef Johansson [Mon, 2 Jan 2023 13:12:53 +0000 (14:12 +0100)]
librbd: Fix local rbd mirror journals growing forever
This commit fixes commit 7ca1bab90f3 by pushing properly aligned
discards back to m_image_extents whenever they are corrected.
If discards are misaligned (off 0, len 4608, gran=4096), they are
corrected properly, but only in object_extents and not in
m_image_extents.
When journal_append_event is triggered, it only appends from
m_image_extents and does not know about the alignment fixes. In
commit_io_events_extent it will log a message and return without
completing the io, since the larger misaligned area was sent to the journal.
This in turn breaks rbd journal mirroring, since the local client will wait
indefinitely for the commit to be completed, which never happens.
This does not affect rbd-mirror in any way, which may be confusing and
dangerous since it is only rbd-mirror that updates ceph health, and not
the local client.
Setting `rbd_skip_partial_discard = false` under client will restore the
pre-7ca1bab behaviour and thus not trigger the bug with journals growing.
This will set `rbd_discard_granularity_bytes = 0` internally. This
setting is only changed during startup of a client.
Fixes: 7ca1bab90f3db3aaaa4cdbfc1f18e9f5cfbf5568
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson <josef@oderland.se>
(cherry picked from commit 21a26a752843295ff946d1543c2f5f9fac764593)
Conflicts:
src/librbd/io/ImageRequest.cc [ commit b2c88820923e ("librbd: return area from extents_to_file()") not in quincy ]
src/test/librbd/io/test_mock_ImageRequest.cc [ commit b9a2384cdc43 ("librbd: propagate area down to file_to_extents()") not in quincy ]
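A simplified Python sketch of the alignment issue described in this entry (the real logic is C++ in ImageDiscardRequest::prune_object_extents; the numbers match the example above): the misaligned discard is rounded down to the granularity, and the corrected extent must also replace the original entry in the list the journal appends from.

    # Illustrative: round a discard's length down to the discard granularity
    # and propagate the corrected extent back to the list the journal reads.
    def prune_discard(off, length, granularity):
        aligned_end = ((off + length) // granularity) * granularity
        return off, max(aligned_end - off, 0)

    image_extents = [(0, 4608)]   # misaligned request, granularity 4096
    image_extents = [prune_discard(o, l, 4096) for o, l in image_extents]
    print(image_extents)          # [(0, 4096)] -- what the journal must record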
drive_group: fix limit filter in drive_selection.selector
When multiple osd service specs with the 'limit' filter are applied,
the current logic makes the second service spec
try to pick devices that are already used by the first service spec.
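A tiny hypothetical sketch of the intended behaviour (names are illustrative): when applying the 'limit' filter for a later spec, exclude devices already claimed by an earlier spec before counting toward the limit.

    # Illustrative: apply a 'limit' filter while skipping already-claimed devices.
    def select_with_limit(candidates, already_used, limit):
        free = [d for d in candidates if d not in already_used]
        return free[:limit]

    used_by_first_spec = {"/dev/sdb"}
    print(select_with_limit(["/dev/sdb", "/dev/sdc", "/dev/sdd"],
                            used_by_first_spec, limit=2))
    # ['/dev/sdc', '/dev/sdd']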
Adam King [Mon, 9 Jan 2023 19:50:12 +0000 (14:50 -0500)]
mgr/cephadm: fix extra container/entrypoint args with spaces
Fixes: https://tracker.ceph.com/issues/57338
Previously, specifying extra container args like
- "--cpus"
- "2"
would work fine, as the two args would be passed separately and
eventually placed in the final podman/docker run command
with a space between them. However, trying to do something like
- "--cpus 2"
instead would fail, as it would be translated to
--extra-container-args=--cpus 2
causing "2" to be considered its own arg, which cephadm
wouldn't know how to handle. Another way this can cause problems
is listed in the linked tracker. Either way, leaving the spaces
in the args was causing problems, and the simplest way to handle
it seems to be to just split the original arg on the spaces
into multiple args.
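A minimal sketch of that splitting (Python; the list name is illustrative):

    # Illustrative: split each extra container arg on whitespace so that
    # "--cpus 2" becomes ["--cpus", "2"] before it reaches podman/docker.
    extra_container_args = ["--cpus 2", "--memory", "4g"]
    split_args = [piece for arg in extra_container_args for piece in arg.split()]
    print(split_args)  # ['--cpus', '2', '--memory', '4g']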