Ilya Dryomov [Thu, 9 Jun 2022 11:42:01 +0000 (13:42 +0200)]
cmake: pass -Wno-error when building PMDK
It's hitting pacific with a nuisance -Werror=array-parameter= const
char * vs const char[37] mismatch. Follow commit 91a616b26e83 ("cmake:
pass RTE_DEVEL_BUILD=n when building dpdk") and just disable -Werror.
This was attempted in commit 69a7ed4eab36 ("run-make-check: enable
WITH_RBD_RWL when WITH_PMEM is true") but never completed. We soon
bumped the requirement on libpmem, so WITH_SYSTEM_PMDK=ON wouldn't
have worked anyway.
Enable the RWL mode conditionally based on WITH_RBD_RWL variable.
Enable the SSD mode unconditionally as it has no special dependencies
and can be built on any architecture.
test/encoding/check-generated.sh: show diff if binary reencode check fails
Take bf0b161115aa ("test/encoding/check-generated.sh: show diff if cmp
fails") a bit further. Suggesting "cmp $tmp1 $tmp2" isn't very helpful
since cmp would report just the mismatch offset.
librbd/cache/pwl: WriteLogCacheEntry constructor must initialize flags
Initializing the individual bit field members leaves the remaining two
bits uninitialized and that garbage state gets persisted.
In general, using bit fields in a structure where the layout actually
matters is not desirable. Even with a few single bits, such as here,
their order, strictly speaking, is not guaranteed:
An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains, a bit-field
that immediately follows another bit-field in a structure shall be
packed into adjacent bits of the same unit. If insufficient space
remains, whether a bit-field that does not fit is put into the next
unit or overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined.
The alignment of the addressable storage unit is unspecified.
cephadm: fix osd adoption with custom cluster name
When adopting Ceph OSD containers from a Ceph cluster with a custom name, it fails
because the name isn't propagated in unit.run.
The idea here is to change the lvm metadata and enforce 'ceph.cluster_name=ceph'
given that cephadm doesn't support custom names anyway.
Fixes: https://tracker.ceph.com/issues/55654 Signed-off-by: Adam King <adking@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e720a658d6a1582c0497bdf709ef4bd26bb5bb73)
ceph-mixin: refactor the structure of _config and utils
Before this refactor we couln't override the config externally. Now the
_config is correctly propagated and not only taken from the
config.libsonnet file.
ceph-mixin: fix linting issue and add cluster template support
Fix most of the issues reported by dashboards-linter:
- Add matcher/template for job (and also cluster)
- use $__rate_interval everywhere
Also this change all the irate functions to rate as most of irate where
not actually used correctly. While using irate on graph for instance you
can easily miss some of the metrics values as irate only take the two
last values and the query steps can be quite large if you want a graph
for a few hours/a day or more.
Fixes: https://tracker.ceph.com/issues/55003 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
ceph-mixin: add config with matchers and tags
Zac Dover [Wed, 1 Jun 2022 14:44:02 +0000 (00:44 +1000)]
doc: (squash) adding pacific.rst to toctree
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc/start: update "memory" in hardware-recs.rst
This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some opened but never closed parentheses.
This adds the security/ directory from the main branch.
This is done so that all references in the pacific.rst
file find destinations. This means that Sphinx will re-
cognize the document as coherent and that Sphinx will
permit it to build.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) add security/index.rst to toctree
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: remove :confvals: from bluestore-config-ref
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) linking to s3-feature-table
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) repair refs to cephfs-top
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fix link to snap-schedule
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fix link to ceph-dokan
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fixing active-releases link
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: testing security/index toctree link
afd8be7eac5e996c3bd07656601a4534053e2516 broke it.
It has dropped`block_wal` and `block_db` from
`ceph_volume.devices.raw.activate.activate_bluestore` but
`activate.main.Activate.main` still passes those arguments when
calling `RAWActivate([]).activate()`
Note that we're making this change because (1) it allows db/wal and (2)
because there are no known users of 'raw activate'. The only known user
is via 'ceph-volume activate' and we've fixed that caller in this commit.
Zac Dover [Wed, 1 Jun 2022 14:54:40 +0000 (00:54 +1000)]
doc/rados: update bluestore-config-ref.rst
This PR updates bluestore-config-ref.rst so that
other PRs that refer to material in it can be
backported.
In order to ensure the coherence of this document,
all :confval: declarations have been removed. The
module that interprets those is called ceph_confval
and is available only in Quincy.
mgr/dashboard: introduce memory and cpu usage for daemons
Fixes: https://tracker.ceph.com/issues/55218 Signed-off-by: Avan Thakkar <athakkar@redhat.com> Co-authored-by: Aashish Sharma <aasharma@redhat.com>
Introducing 2 new columns in Cluster->Host->Daemons table for Memory and CPU usage.
Conflicts:
src/pybind/mgr/cephadm/module.py
- _process_ls_output() doesn't exist in pacific as agent isn't yet backported. So similar changes
needs to be done in serve.py instead.
In `master` the milestone step exits and causes remaining tasks not to be run. I previously tried with the `continue-on-error` flag, but it didn't work, so let's try putting that steps at the end.
Prashant D [Thu, 17 Mar 2022 14:29:40 +0000 (14:29 +0000)]
mgr, mgr/prometheus: Fix regression with prometheus metrics
The ceph dameons on host are inheriting ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should represent it's own
ceph version based on the ceph binary is use for that daemon.
Consider a situation where partial upgrade is done on host, some daemons
which are restarted should have ceph version tag as upgraded version
and rest should have older ceph version but presently all inherites
host version. In containerized environment, all daemons are
using ceph version of last daemon registered as a service on the host.
Fixes: https://tracker.ceph.com/issues/54611 Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit aeca2e41ef560cf51c1ad935cfb6470e782aa8d5)
Prashant D [Mon, 28 Mar 2022 13:02:08 +0000 (14:02 +0100)]
mgr, mon: Keep upto date metadata with mgr for MONs
The mgr updates mon metadata through handle_mon_map which
gets triggered when MONs were removed/added from/to cluster or
the active mgr is restarted or mgr failsover.
We could have handled metadata update through MMgrOpen or
early MMgrReport messages but these are sent before monitor
electin completes and lead monitor updates pending metadata
in monstore. Instead of relying on fetching mon metadata using
'ceph mon metadata <id>' command, explicitly send metadata
update request with mon metadata to mgr.
Fixes: https://tracker.ceph.com/issues/55088 Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit 1a065043b964f8c014ebb5bc890a243c398ff07c)
Laura Flores [Mon, 16 May 2022 22:59:42 +0000 (17:59 -0500)]
qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfill` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076 Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 40062676c2ceed49b9fa147127ffa83ba6118e2a)
Tim Serong [Wed, 30 Mar 2022 05:25:30 +0000 (16:25 +1100)]
ceph.spec.in: remove build directory in %clean, not %install
Removing the build directory at the end of %install is too soon,
and means we get rid of a bunch of stuff needed to correctly
create debuginfo/debugsource packages, which happens automatically
right after %install. So, let's put it where it really belongs, in
the %clean section.
Fixes: aa18cb12003e3526c8e8f23dc2335a483fbfa68e Fixes: https://tracker.ceph.com/issues/55079 Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit 94ad178bdcbae56a8eafc65a3a276e25d7a51a5e)
Tim Serong [Mon, 28 Mar 2022 04:12:10 +0000 (15:12 +1100)]
ceph.spec.in: remove build directory at end of %install
By the time we get to the end of the %install section, all the
built binaries have been installed in the build root, so we can
delete the build directory from the source tree. This frees up
about 17GB of disk space on build hosts, which is helpful in
case other processes later in the RPM build need more disk space.
Fixes: https://tracker.ceph.com/issues/55079 Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit aa18cb12003e3526c8e8f23dc2335a483fbfa68e)
Conflicts:
ceph.spec.in
- pacific uses "build", not "%{_vpath_builddir}"
Adam King [Fri, 1 Apr 2022 12:20:28 +0000 (08:20 -0400)]
mgr/cephadm: make UpgradeState from_json a bit safer
This way, for downgrades to whatever versions
this lands in onward, having added new parameters to
UpgradeState shouldn't break anything. Can't do much
about downgrades to older versions from this one
but this should help in the future.
Adam King [Mon, 28 Mar 2022 16:10:15 +0000 (12:10 -0400)]
mgr/cephadm: split _do_upgrade into sub functions
This function was around 500 lines and difficult to work
with. Splitting it into sub functions should hopefully make
it a bit easier to understand and make changes to.
Adam Kupczyk [Tue, 2 Nov 2021 15:57:32 +0000 (16:57 +0100)]
os/bluestore/bluefs: Fix data corruption in truncate()
It is possible to create condition in which a BlueFS contains file that is corrupted.
It can happen when BlueFS replay log is on device A and we just wrote to device B and truncated file.
Scenario:
1) write to file h1 on SLOW device
2) flush h1 (initiate transfer, but no fdatasync yet)
3) truncate h1
4) write to file h2 on DB
5) fsync h2 (forces replay log to be written, after fdatasync to DB)
6) poweroff
Fixes: https://tracker.ceph.com/issues/53129 Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 49b7b44b3b5c94ee401562e603999e2b3bd8f9a2)
Volker Theile [Tue, 10 May 2022 13:25:54 +0000 (15:25 +0200)]
cephadm: prometheus: The generatorURL in alerts is only using hostname
Prometheus is currently using only the hostname in the 'generatorURL' of an alert which causes issues when clicking on the URL in the Ceph Dashboard or somewhere else, because in most cases the hostname of the node that is running the Prometheus container is not resolvable.
To fix that the command line argument '--web.external-url' must be appended in the systemd unit file of the Prometheus container, e.g. '--web.external-url http://foo.bar:9095' whereas a FQDN hostname is used.
Volker Theile [Mon, 9 May 2022 13:31:15 +0000 (15:31 +0200)]
mgr/dashboard: Creating and editing Prometheus AlertManager silences is buggy
When creating a new monitoring silence the form is pre-filled with the wrong alert data. It is always used the alert data from the very first object in the list of the API response but not the specified alert identified by the 'fingerprint' property.
The same problem applies to editing silences. The selected silence is not edited, it's always the first one in the list returned API response but not that with the specified 'id' property.
The main problem of the origin implementation is that the Prometheus Alertmanager API endpoints /api/v1/[alerts/silences] do not support querying. To fix that, filtering is done in the frontend.
Zac Dover [Wed, 18 May 2022 10:36:53 +0000 (20:36 +1000)]
doc/start: s/3/three/ in intro.rst
I'm changing "3" to "three" for two reasons:
1. It's correct.
2. This allows me to test backports into Octopus, Pacific, and Quincy.
I am particularly interested to see what happens when I attempt
the backport into Octopus, because backports into Octopus have
failed. This will provide me with another unit of data.
Adam King [Thu, 18 Nov 2021 20:22:39 +0000 (15:22 -0500)]
mgr/cephadm: re-use old ip when re-adding hosts if necessary
When a host is re-added without an explicit ip we can default to the old
ip we had stored for the host rather than either keeping the loopback
address or throwing an exception. We only want to actually error when
the only options left are error or use a resolved loopback address
Redouane Kachach [Tue, 17 May 2022 15:26:39 +0000 (17:26 +0200)]
mgr/cephadm: stripping out / from the end of the url Fixes: https://tracker.ceph.com/issues/55638 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 17032f6be22e9efc3e199d7e35091025bfaae965)