Zac Dover [Sun, 26 Mar 2023 15:03:58 +0000 (01:03 +1000)]
doc/rados: clean up ops/bluestore-migration.rst
Clean up internal links, fix the numbering of a procedure, and implement
Anthony D'Atri's suggestions in
https://github.com/ceph/ceph/pull/50487 and
https://github.com/ceph/ceph/pull/50488.
msg/async: don't abort when public addrs mismatch bind addrs
Before commit 69b47c805fdd2ecd4f58547d58c9f019fc62d447 (PR #50153),
a mismatch (in the number or types of stored `entity_addr_t`) between
the public addrs and bind addrs vectors was ignored and the former
took precedence over everything else -- it was possible, for example,
to bind to both v1 and v2 addresses but expose only v2. Unfortunately,
that's exactly how Rook configures ceph-mon:
```
debug 2023-03-16T21:01:48.389+0000 7f99822bf8c0 0 starting mon.a rank 0 at public addrs v2:172.30.122.144:3300/0 at bind addrs [v2:10.129.2.21:3300/0,v1:10.129.2.21:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid acc14d1b-fb2b-4f01-8b61-6e7cb26e9200
```
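The tolerant behavior described above can be sketched as follows. This is a hypothetical Python sketch, not the actual C++ msg/async code; the function name, the `(type, host, port)` tuple shape, and the return convention are all illustrative assumptions.

```python
def reconcile_addrs(public_addrs, bind_addrs):
    """Return the effective advertised addrs plus a mismatch flag.

    Each addr is modeled as a (type, host, port) tuple, e.g.
    ("v2", "172.30.122.144", 3300). On a mismatch in number or
    types, warn instead of aborting: the public addrs take
    precedence, matching the pre-#50153 behavior Rook relies on.
    """
    mismatch = (len(public_addrs) != len(bind_addrs) or
                any(p[0] != b[0] for p, b in zip(public_addrs, bind_addrs)))
    # Public addrs win regardless; the flag lets the caller log a warning.
    return list(public_addrs), mismatch
```

With the Rook configuration from the log line above (one public v2 addr, two bind addrs), this yields a mismatch warning rather than an abort.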
Nizamudeen A [Wed, 22 Feb 2023 07:33:34 +0000 (13:03 +0530)]
mgr/dashboard: fix prometheus api error on landing page v3
When no Prometheus instance is configured in the cluster, the
dashboard raises an error while polling the Prometheus endpoint,
so a proper check needs to be added there.
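The guard amounts to short-circuiting the poll when no endpoint is configured. A minimal hypothetical sketch (the names and shapes here are illustrative, not the dashboard's actual API):

```python
def poll_prometheus(prometheus_url, fetch):
    """Poll Prometheus only if an endpoint is configured.

    If prometheus_url is empty/None, return an empty result instead
    of polling and surfacing an error on the landing page.
    """
    if not prometheus_url:
        return []  # nothing configured: no alerts to show, no error raised
    return fetch(prometheus_url)
```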
bryanmontalvan [Wed, 3 Aug 2022 01:39:05 +0000 (21:39 -0400)]
mgr/dashboard: dashboard-v3: status card
This commit is the bare-bones work of the status card. The only logic
written in this commit is the Cluster health status icon.
tracker: https://tracker.ceph.com/issues/58728
Signed-off-by: bryanmontalvan <bmontalv@redhat.com>
mgr/dashboard: introduce active alerts to status cards
This commit adds the following:
- A new dashboard component, which will exist in parallel with the
current landing page
- A route for this dashboard, `/dashboard_3`
- A bare-bones bootstrap grid with mock-up card components
Signed-off-by: bryanmontalvan <bmontalv@redhat.com>
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
mgr/dashboard: changes to first layout
CHANGES:
- Renamed dashboardcomponents
- Removed unnecessary styling
- Added unit tests
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Moved router.url logic inside html template
This commit removes the `this.router.url` logic from the
`workbench-layout.component.ts` file and moves it into the HTML
template section.
Signed-off-by: bryanmontalvan <bmontalv@redhat.com>
mgr/dashboard: syntax changes from bootstrap 4 to 5
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
mgr/dashboard: small fixes and improvements over all cards and layout
- all cards placed evenly with the same height
- increased font size on details card and adjusted margin
- changed capacity card legend to: "Used"
- adjusted cluster utilization card margins and increased graphs height
- improved status card toggle
- changed orchestror to orchestrator
- switched IPS/OPS graph colors
Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (2 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Zac Dover [Sun, 12 Mar 2023 01:17:03 +0000 (11:17 +1000)]
doc/rados: edit operations/bs-migration (1 of x)
Disambiguate and improve the English language in
doc/rados/operations/bluestore-migration.rst up to but not including the
section called "Whole Host Replacement".
Adam Kupczyk [Fri, 10 Mar 2023 07:53:27 +0000 (08:53 +0100)]
os/bluestore: BlueFS: harmonize log read and writes modes
The BlueFS log has always been written in non-buffered mode, but
reading it depends on the bluefs_buffered_io option.
This mismatch is strongly suspected to cause some weird problems.
Adam King [Sun, 15 Jan 2023 22:18:47 +0000 (17:18 -0500)]
mgr/cephadm: fix haproxy nfs backend server ip gathering
Fixes: https://tracker.ceph.com/issues/58465
Previously, if there were 2 nfs daemons of the same
rank, we could not check the rank generation, which
is intended to mark which one is the "real" one of that
rank in cases where we cannot remove the other one due
to its host being offline. The nfs daemon of a given rank with
the highest rank_generation is the one we want haproxy
to use for its backend IP. Since we didn't actually
check this, the IP we got was effectively random,
depending on the order in which we happened to iterate
over the nfs daemons of the same rank. If the nfs daemon
with the lower rank_generation on an offline host happened
to come later in the iteration, we'd use its IP,
which is incorrect.
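The selection logic described above can be sketched like this. A hypothetical Python sketch only; the daemon dict keys and function name are illustrative, not the actual mgr/cephadm code.

```python
def pick_backend_ips(nfs_daemons):
    """For each rank, choose the IP of the daemon with the highest
    rank_generation.

    Lower-generation daemons may linger on offline hosts that we
    cannot clean up, so the highest generation is the "real" one.
    """
    best = {}
    for d in nfs_daemons:
        cur = best.get(d["rank"])
        if cur is None or d["rank_generation"] > cur["rank_generation"]:
            best[d["rank"]] = d
    # Result is deterministic regardless of iteration order.
    return {rank: d["ip"] for rank, d in best.items()}
```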
Adam King [Sun, 15 Jan 2023 21:30:53 +0000 (16:30 -0500)]
mgr/cephadm: don't attempt daemon actions for daemons on offline hosts
They'll just fail anyway, and it will waste time waiting
for the connection to time out. We have other places in
the serve loop that will check if the host is back
online.
Nizamudeen A [Thu, 9 Mar 2023 11:51:44 +0000 (17:21 +0530)]
mgr/dashboard: custom image for kcli bootstrap script
Stable branches like quincy pull from quay.io/ceph/ceph:v17 to
bootstrap the ceph cluster in test environments. This causes issues
because the branches change constantly but the image does not, so
use the quay.ceph.io repo instead to bring up the cluster in the
test environment.
Matan Breizman [Thu, 15 Dec 2022 17:05:15 +0000 (17:05 +0000)]
mon/OSDMonitor: Skip check_pg_num on pool size decrease
When changing the pool size we use check_pg_num to avoid exceeding
the `mon_max_pg_per_osd` value. This check should only be applied
when increasing the size, to avoid underflows.
(The same already applies when changing pg_num.)
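The guard described above can be sketched as follows. This is a hypothetical Python sketch of the intent; the real check lives in mon/OSDMonitor.cc, and the function name and parameters here are illustrative.

```python
def maybe_check_pg_num(old_size, new_size, projected_pg_num, osd_count,
                       mon_max_pg_per_osd=250):
    """Apply the pg-per-osd limit only when the pool size grows.

    Decreasing the size can only lower the pg-per-osd count, and
    running the subtraction-based check there risks underflow.
    """
    if new_size <= old_size:
        return True  # size decrease (or no change): skip the check
    # Each replica of each pg lands on some osd, hence the multiply.
    return projected_pg_num * new_size / osd_count <= mon_max_pg_per_osd
```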
Matan Breizman [Mon, 19 Dec 2022 09:58:06 +0000 (09:58 +0000)]
mon/OSDMonitor: Simplify check_pg_num()
* See: https://tracker.ceph.com/issues/47062.
Originally check_pg_num did not take into account the root
osds selected by the crush rule.
This behavior resulted in an inaccurate pg num per osd count.
* Avoid summing all of the projected pg nums and only later
subtracting the pg num if the pool already existed.
* With this change, we only count the projected pg nums that
are part of the pools affected by the crush rule.
The same goes for the osd count: instead of dividing the
projected pg num by all of the osdmap's osds, divide only by
the osds used by the crush rule.
* Avoid differentiating between whether the mapping epoch
is later than the osdmap epoch or not. Always check the pg
num according to the crush rule.
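The per-crush-rule accounting above can be illustrated with a hypothetical Python sketch (pool dicts, rule ids, and the function name are illustrative; the real code is C++ in mon/OSDMonitor.cc):

```python
def projected_pg_per_osd(pools, rule_id, rule_osds, new_pool_pg_num):
    """Projected pg-per-osd for a new pool under a given crush rule.

    Count only the pg nums of pools governed by the same crush rule,
    and divide by the osds that rule actually uses, not by every osd
    in the osdmap.
    """
    projected = new_pool_pg_num + sum(
        p["pg_num"] for p in pools if p["crush_rule"] == rule_id)
    return projected / len(rule_osds)
```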
Anthony D'Atri [Thu, 1 Dec 2022 19:04:30 +0000 (14:04 -0500)]
src/mon: clarify pool creation failure due to max_pgs_per_osd error message
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
Note: This commit is cherry-picked as a dependency
for later commits in this backport.
(cherry picked from commit 88e8eeca7571fc314bc30a52cd17218fa9fac500)
Tongliang Deng [Fri, 31 Dec 2021 06:02:25 +0000 (14:02 +0800)]
mon/OSDMonitor: fix integer underflow of check_pg_num
Underflow of the `uint64_t projected` variable occurs when
the sum of the current acting pg num and the newly specified
pg num is less than the pg num calculated from pg info.
Signed-off-by: Tongliang Deng <dengtongliang@gmail.com>
Note: This commit is cherry-picked as a dependency
for later commits in this backport.
(cherry picked from commit bd9813f5e1a3addca1a57360d58b50b120e0e5f3)
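The underflow described above is easy to reproduce numerically. A hypothetical Python sketch that mimics C++ `uint64_t` wraparound with a mod 2^64 (the function name is illustrative):

```python
def projected_pg_num_buggy(cur_acting_pg_num, new_pg_num, pool_pg_num):
    """Mimic the buggy unsigned arithmetic.

    When cur_acting + new is less than pool_pg_num, the C++
    subtraction on uint64_t wraps around to a huge value, making
    check_pg_num spuriously reject the change.
    """
    return (cur_acting_pg_num + new_pg_num - pool_pg_num) % 2**64
```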
jerryluo [Mon, 25 Jan 2021 16:10:57 +0000 (00:10 +0800)]
mon/OSDMonitor: Make the pg_num check more accurate
In the check_pg_num function, find the osds corresponding to the
current pool's crush rule and calculate whether the average pg_num
on those osds would exceed the value of 'mon_max_pg_per_osd'. Make
the pg_num check more accurate by counting all the pgs on the osds
used by the new pool.
Fixes: https://tracker.ceph.com/issues/47062
Signed-off-by: Jerry Luo <luojierui@chinatelecom.cn>
Note: This commit has been reverted and is cherry-picked as
dependency for other commits in this backport.
(cherry picked from commit c726ce9e5088b30d29e0db5c0ecc8c03fe41da1d)
Adam King [Thu, 16 Feb 2023 17:34:06 +0000 (12:34 -0500)]
qa/distros: pass --allowerasing --nobest when installing container-tools
One of the tests in the orch suite runs distro install
commands from multiple distros, causing it to first install
container-tools 3.0 and then later install container-tools,
which fails, causing the test to fail. This is sort of a bandaid
fix to get the test to work. It will cause whichever version of
the package was requested last to end up installed (and will do
so without error), which is what we want in the tests.
Adam King [Sun, 12 Feb 2023 20:28:10 +0000 (15:28 -0500)]
cephadm: set pids-limit unlimited for all ceph daemons
We actually had this set up before, but ran into issues.
A teuthology test had failed in the fs suite, so the change was
modified to only affect iscsi and rgw daemons (https://github.com/ceph/ceph/pull/45798),
and then the changes were reverted entirely (leaving no pids-limit
modifying code at all) in quincy and pacific because
the LRC ran into issues with the change related to the podman
version (https://github.com/ceph/ceph/pull/45932). This new patch
addresses the podman versions: the patch that makes -1 work as a
pids-limit appears to have landed in
podman 3.4.1, based on https://github.com/containers/podman/pull/12040.
We'll need to make sure that this doesn't break anything in the
fs suites again, as I don't remember the details of the first
issue or why setting the pids-limit only for iscsi and rgw fixed it.
Assuming that isn't a problem, we should hopefully be able to unify
at least how reef and quincy handle this, now that the podman version
issue is being addressed in this patch.
See the linked tracker issue for a discussion of why we're taking
another run at this and why I'm trying to do it for all ceph daemon types.
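The version gating described above can be sketched as follows. A hypothetical Python sketch under the stated assumption that podman accepts `--pids-limit=-1` from 3.4.1 on, while older podman uses 0 to mean unlimited; the function name and argument shapes are illustrative, not cephadm's actual code.

```python
def pids_limit_arg(engine, version):
    """Return the container-run flag for an unlimited pids limit.

    engine:  "podman" or "docker"
    version: engine version as a tuple, e.g. (3, 4, 1)
    """
    if engine == "podman" and version < (3, 4, 1):
        # older podman rejects -1; 0 means unlimited there
        return "--pids-limit=0"
    return "--pids-limit=-1"
```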
Teoman ONAY [Thu, 11 Nov 2021 15:05:49 +0000 (15:05 +0000)]
cephadm: remove containers pids-limit
The default pids-limit (docker 4096 / podman 2048) prevents some
customizations from working (e.g. HTTP threads on RGW) or limits
the number of LUNs per iscsi target.
Redouane Kachach [Wed, 25 Jan 2023 09:14:59 +0000 (10:14 +0100)]
cephadm: use short hostname to create the initial mon and mgr
Fixes: https://tracker.ceph.com/issues/58466
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0b807eefb8dbccf1e25c846f8177ddb74c6f333d)
This patch introduces a per-OSD crush_device_class definition in the
DriveGroup spec. The Device object is extended to support a
crush_device_class parameter which is processed by ceph-volume when
drives are prepared in batch mode. According to the per-OSD crush
device classes, drives are collected and grouped in a dict that is
used to produce a set of ceph-volume commands that eventually apply
(if defined) the right device class. The test_drive_group unit tests
are also extended to make sure we're not breaking compatibility with
the default definition, and the new syntax is validated, raising an
exception if it's violated.
Fixes: https://tracker.ceph.com/issues/58184
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 6c6cb2f5130dbcf8e42cf03666173948411fc92b)
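The grouping step described above can be sketched like this. A hypothetical Python sketch only; the device dict keys and helper names are illustrative, not the actual DriveGroup code.

```python
def group_by_device_class(devices, default_class=None):
    """Group drive paths by their per-OSD crush_device_class so each
    group can become a single ceph-volume batch invocation."""
    groups = {}
    for dev in devices:
        cls = dev.get("crush_device_class", default_class)
        groups.setdefault(cls, []).append(dev["path"])
    return groups

def ceph_volume_commands(groups):
    """Build one ceph-volume lvm batch command per device-class group."""
    cmds = []
    for cls, paths in groups.items():
        cmd = ["ceph-volume", "lvm", "batch", *paths]
        if cls:  # apply the class only when one was defined
            cmd += ["--crush-device-class", cls]
        cmds.append(cmd)
    return cmds
```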