cmake: pass -Wno-error when building PMDK

It's hitting pacific with a nuisance -Werror=array-parameter= const
char * vs const char[37] mismatch. Follow commit 91a616b26e83 ("cmake:
pass RTE_DEVEL_BUILD=n when building dpdk") and just disable -Werror.

Fixes: https://tracker.ceph.com/issues/55977
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 03d27945a646124e8d8a915771ac94ed7b684366)

run-make-check.sh: enable RBD persistent caches

This was attempted in commit 69a7ed4eab36 ("run-make-check: enable
WITH_RBD_RWL when WITH_PMEM is true") but never completed. We soon
bumped the requirement on libpmem, so WITH_SYSTEM_PMDK=ON wouldn't
have worked anyway.

Enable the RWL mode conditionally based on WITH_RBD_RWL variable.
Enable the SSD mode unconditionally as it has no special dependencies
and can be built on any architecture.

Fixes: https://tracker.ceph.com/issues/55285
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0f1634a21f5da2250915d8ac05a6f179d4e76d03)

Conflicts:
run-make-check.sh [ commit 69a7ed4eab36 ("run-make-check:
enable WITH_RBD_RWL when WITH_PMEM is true") not in pacific ]

test/encoding/check-generated.sh: show diff if binary reencode check fails

Take bf0b161115aa ("test/encoding/check-generated.sh: show diff if cmp
fails") a bit further. Suggesting "cmp $tmp1 $tmp2" isn't very helpful
since cmp would report just the mismatch offset.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 59d928a06c028bf381307001d2b68fa8545d8fc4)

librbd/cache/pwl: WriteLogCacheEntry constructor must initialize flags

Initializing the individual bit field members leaves the remaining two
bits uninitialized and that garbage state gets persisted.

In general, using bit fields in a structure where the layout actually
matters is not desirable.  Even with a few single bits, such as here,
their order, strictly speaking, is not guaranteed:

    An implementation may allocate any addressable storage unit large
    enough to hold a bit-field. If enough space remains, a bit-field
    that immediately follows another bit-field in a structure shall be
    packed into adjacent bits of the same unit. If insufficient space
    remains, whether a bit-field that does not fit is put into the next
    unit or overlaps adjacent units is implementation-defined. The
    order of allocation of bit-fields within a unit (high-order to
    low-order or low-order to high-order) is implementation-defined.
    The alignment of the addressable storage unit is unspecified.
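
The effect can be reproduced outside of C++. What follows is a minimal sketch using Python's ctypes, assuming a hypothetical one-byte structure with six single-bit flags (the field names f0..f5 are illustrative, not the actual WriteLogCacheEntry members) and 0xff standing in for uninitialized memory:

```python
import ctypes


class Flags(ctypes.Structure):
    # Six hypothetical 1-bit flags packed into one byte; two bits stay unnamed.
    _fields_ = [(name, ctypes.c_uint8, 1)
                for name in ("f0", "f1", "f2", "f3", "f4", "f5")]


def clear_named_bits(raw: bytes) -> int:
    """Clear each bit field individually and return the resulting byte."""
    flags = Flags.from_buffer_copy(raw)
    for name, _ctype, _width in Flags._fields_:
        setattr(flags, name, 0)
    return bytes(flags)[0]


def clear_whole_byte(raw: bytes) -> int:
    """Zero the whole storage unit, as initializing the flags byte would."""
    flags = Flags.from_buffer_copy(raw)
    ctypes.memset(ctypes.addressof(flags), 0, ctypes.sizeof(flags))
    return bytes(flags)[0]


garbage = b"\xff"  # stand-in for uninitialized memory
leftover = clear_named_bits(garbage)  # two unnamed bits are still set
zeroed = clear_whole_byte(garbage)    # every bit is cleared
```

Which two bits survive in `leftover` depends on the platform's bit-field allocation order, which is the portability concern raised above.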

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 91d270b210a908ea2f3578dd7db3263383da95a8)

librbd/cache/pwl: initialize generate_test_instances() objects

... to prevent check-generated.sh failures such as:

**** librbd::cache::pwl::WriteLogPoolRoot test 1 dump_json check failed ****
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 dump_json > /tmp/typ-cAoWrqlHC
   ceph-dencoder type librbd::cache::pwl::WriteLogPoolRoot select_test 1 encode decode dump_json > /tmp/typ-ES5yHpfGL
5c5
<     "flushed_sync_gen": 0,
---
>     "flushed_sync_gen": 255,

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2c131f57d63454de39210375ce75a282df6fe365)

Conflicts:
src/librbd/cache/pwl/Types.cc [ commit 6eb14774fec0 ("librbd:
  build without "using namespace std"") not in pacific ]
src/librbd/cache/pwl/ssd/Types.h [ ditto ]

librbd/cache/pwl: fix -Wunused-lambda-capture warnings

Reported by clang on "make check" and "make check arm64" builds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 753aa038fdbc26a2ee0978f54d3f7dcfa052e833)

Merge pull request #46556 from adk3798/pacific-fqdn-autotune

pacific: mgr/cephadm: use host shortname for osd memory autotuning

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #45698 from tserong/wip-55097-pacific

pacific: ceph.spec.in: remove build directory at end of %install

Reviewed-by: Kefu Chai <tchaikov@gmail.com>

Merge pull request #46545 from adk3798/pacific-raw-osd-fixup

pacific: mgr/cephadm: don't redeploy osds seen in raw list if cephadm knows them

Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>

Merge pull request #46552 from guits/bkp-pacific-46481

pacific: backport of cephadm: fix osd adoption with custom cluster name

Reviewed-by: Adam King <adking@redhat.com>

Merge pull request #46555 from adk3798/pacific-allow-crush-class

pacific: python-common: allow crush device class to be set from osd service spec

Reviewed-by: Cory Snyder <csnyder@iland.com>

Merge pull request #46429 from pdvian/wip-55309-pacific

pacific: mgr, mgr/prometheus: Fix regression with prometheus metrics

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #46427 from pdvian/wip-55308-pacific

pacific: mgr, mon: Keep upto date metadata with mgr for MONs

Reviewed-by: Laura Flores <lflores@redhat.com>

mgr/cephadm: use hostname from crush map for osd memory autotuning

Fixes: https://tracker.ceph.com/issues/55841
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 50f28aa56edd348c3816335bef3bbfaf5133ae54)

python-common: allow crush device class to be set from osd service spec

Adds crush_device_class parameter to DriveGroupSpec so that device class
can be set via service specs with cephadm.

Fixes: https://tracker.ceph.com/issues/55813
Signed-off-by: Cory Snyder <csnyder@iland.com>
(cherry picked from commit c2f314ab8c187b54f12c04ec26034d451bd82273)

Conflicts:
src/python-common/ceph/deployment/drive_group.py

Merge pull request #46215 from rzarzynski/wip-tests-bl-fix-rebuild-pacific

pacific: test/bufferlist: ensure rebuild_aligned_size_and_memory() always rebuilds.

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #46120 from myoungwon/backport-50806

pacific: osd: fix wrong input when calling recover_object()

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #45870 from idryomov/wip-stretch-last-force-resend-pacific

pacific: mon/OSDMonitor: properly set last_force_op_resend in stretch mode

Reviewed-by: Greg Farnum <gfarnum@redhat.com>

Merge pull request #45171 from pponnuvel/wip-53391-pacific

pacific: Fix data corruption in bluefs truncate()

Reviewed-by: Igor Fedotov <ifedotov@suse.com>

Merge pull request #46252 from rzarzynski/wip-45529-pacific

pacific: osd/PGLog.cc: Trim duplicates by number of entries

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>

Merge pull request #46549 from MrFreezeex/wip-55652-pacific

pacific: ceph-mixin: backport of recent cleanups

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

cephadm: fix osd adoption with custom cluster name

When adopting Ceph OSD containers from a Ceph cluster with a custom name, it fails
because the name isn't propagated in unit.run.
The idea here is to change the lvm metadata and enforce 'ceph.cluster_name=ceph'
given that cephadm doesn't support custom names anyway.

Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e720a658d6a1582c0497bdf709ef4bd26bb5bb73)

Merge pull request #46541 from idryomov/wip-rbd-codeowners-pacific

pacific: CODEOWNERS: add RBD team

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

ceph-mixin: remove timepicker override in every dashboards

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 37add644d1e65b4d6375a63dcbef742420a5a4c3)

ceph-mixin: rationalize local helper functions to utils

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 5db37300fde5c6cc2ec9f3ead34ea1b93126f5bf)

ceph-mixin: fix typos

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 0b7cc6bc998fec47a29abfa94224cacee2dd598d)

ceph-mixin: fix test with rate and label changes

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit c8f086c182b87f1a813cb37fd58ad1e753a6b0bf)

ceph-mixin: don't add cluster matcher if showcluster is disabled

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 3b6356c8724ee2b299743d20ff5df0401181228b)

ceph-mixin: refactor the structure of _config and utils

Before this refactor we couldn't override the config externally. Now the
_config is correctly propagated and not only taken from the
config.libsonnet file.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit fd4f484d220d98ba684878c87488cd74c502b4ff)

ceph-mixin: fix makefile dashboards dependency

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 4595e9af23205e82a5232ecc3da408054b30d520)

ceph-mixin: fix linting issue and add cluster template support

Fix most of the issues reported by dashboards-linter:
- Add matcher/template for job (and also cluster)
- use $__rate_interval everywhere

This change also converts all irate functions to rate, as most uses of
irate were not actually correct. When using irate on a graph, for
instance, you can easily miss some of the metric values, as irate only
takes the last two values and the query step can be quite large if you
want a graph covering a few hours, a day, or more.
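
The difference can be illustrated with a simplified sketch. The two functions below are hypothetical stand-ins, not Prometheus code: the real rate() also extrapolates to the window boundaries and handles counter resets, which is omitted here. The sample data is made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# A counter scraped every 15 s: a burst of 600 early on, then idle.
window = [(0, 0), (15, 600), (30, 600), (45, 600), (60, 600)]
avg = rate(window)       # 600 / 60 s
instant = irate(window)  # 0 / 15 s, the burst is invisible
```

With a one-minute query step, the irate-style result for this window hides the burst entirely, while the rate-style result still reflects it.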

Fixes: https://tracker.ceph.com/issues/55003
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
ceph-mixin: add config with matchers and tags

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit faeea8d165342245929ea26441ee0cbb8957e3a7)

ceph-mixin: rewrite promql queries to multiline

Fixes: https://tracker.ceph.com/issues/55005
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 1452311a9bb01baa85786345e119d719b5838307)

mgr/cephadm: don't redeploy osds seen in raw list if cephadm knows them

This is already done when checking the lvm list
results and should also be done when checking the raw
list, but it is missing due to a backporting mistake.

Technically a partial backport of #44228 that
was not included in #44627 because raw osd support
was not in pacific then.

Signed-off-by: Adam King <adking@redhat.com>

CODEOWNERS: add RBD team

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 00a44f1c6b3c5270f1c9d75cf6dcac3f0d470fa9)

Merge pull request #46520 from neha-ojha/wip-46415-pacific

pacific: .github/CODEOWNERS: tag core devs on core PRs

Reviewed-by: Laura Flores <lflores@redhat.com>

Merge pull request #46484 from zdover23/wip-doc-2022-06-02-backport-46430-pacific-2nd-attempt

.github/CODEOWNERS: tag core devs on core PRs

Start with everything that is present under core in .github/labeler.yml.

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 8303c6b911154ee936adb46e7c3491b174d22df8)

doc: (squash) adding :orphan: to security/index

Signed-off-by: Zac Dover <zac.dover@gmail.com>

doc: (squash) fix s3-select-feature-table link

Signed-off-by: Zac Dover <zac.dover@gmail.com>

doc: (squash) removing extraneous toc entry

Signed-off-by: Zac Dover <zac.dover@gmail.com>

Merge branch 'pacific' into wip-doc-2022-06-02-backport-46430-pacific-2nd-attempt

doc: (squash) adding pacific.rst to toctree

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc/start: update "memory" in hardware-recs.rst

This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some opened but never closed parentheses.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
(cherry picked from commit 429bbdea65188df6708832efee188e0a40e1cde2)
(cherry picked from commit e63b048a98a33e82e35d76c2d67ed8de184fed57)
Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) adding security/ dir

This adds the security/ directory from the main branch.
This is done so that all references in the pacific.rst
file find destinations. This means that Sphinx will re-
cognize the document as coherent and that Sphinx will
permit it to build.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) add security/index.rst to toctree

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: remove :confvals: from bluestore-config-ref

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) linking to s3-feature-table

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) repair refs to cephfs-top

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fix link to snap-schedule

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fix link to ceph-dokan

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: (squash) fixing active-releases link

Signed-off-by: Zac Dover <zac.dover@gmail.com>
doc: testing security/index toctree link

Signed-off-by: Zac Dover <zac.dover@gmail.com>

Merge pull request #46511 from adk3798/pacific-fix-activate

pacific: ceph-volume: fix activate

Merge pull request #46485 from zdover23/wip-doc-2022-06-02-backport-rocksdb-sharding-to-pacific

doc/rados: update bluestore-config-ref.rst

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

ceph-volume: fix generic activate

afd8be7eac5e996c3bd07656601a4534053e2516 broke it.
It dropped `block_wal` and `block_db` from
`ceph_volume.devices.raw.activate.activate_bluestore`, but
`activate.main.Activate.main` still passes those arguments when
calling `RAWActivate([]).activate()`.

Fixes: https://tracker.ceph.com/issues/54441
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3337b62e859673cba908bf8e12c7f3f23fddf2c2)

mgr/cephadm: add some debug output for serve loop

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit f10f94478f58db96653ffa4f74e99b40b529c663)

Conflicts:
src/pybind/mgr/cephadm/serve.py

ceph-volume: adjust arguments for 'ceph-volume raw activate'

Take a list of devices, so that we can selectively activate a raw osd
with db/wal.

Remove the argument type kludge introduced in 2c228a9a409176c0f1679f176443fd3ead219c7a
since it is no longer needed.

Note that we're making this change because (1) it allows db/wal and (2)
because there are no known users of 'raw activate'. The only known user
is via 'ceph-volume activate' and we've fixed that caller in this commit.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit afd8be7eac5e996c3bd07656601a4534053e2516)

Conflicts:
src/ceph-volume/ceph_volume/devices/raw/list.py

ceph-volume: add raw support for db/wal for list and activate

Currently 'prepare' doesn't support db/wal, but we want it in list and
activate because 'ceph-volume activate ...' tries raw before lvm.

Note that I'm not sure we really want to accept --block.db and --block.wal
here at all.

Fixes: 3d7ceec684b0ac5b83fae4c397b134236fac485e
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit f0a0c70d5c8c150abd0590ea23be83c7e53f9a10)

Merge pull request #46504 from rhcs-dashboard/wip-55832-pacific

pacific: qa: fix teuthology master branch ref

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>

qa: fix teuthology master branch ref

Fixes: https://tracker.ceph.com/issues/55826
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit e91773df68c286266a2855e69bf542b4c73379d9)

Conflicts:
qa/tox.ini
- accept only master-main rename

Merge pull request #46490 from ceph/pacific-nobranch

pacific: qa: remove .teuthology_branch file

qa: remove .teuthology_branch file

This was originally added to help support the py2 -> py3 conversion.
That's long since complete so we should be able to just remove this file
now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 81430de9b70be16a439bf2445f3345b83035a861)

Merge pull request #46459 from rhcs-dashboard/wip-55601-pacific

pacific: mgr/dashboard: introduce memory and cpu usage for daemons

Reviewed-by: Sarthak Gupta <sarthak.dev.0702@gmail.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: nSedrickm <NOT@FOUND>
Reviewed-by: sunilangadi2 <NOT@FOUND>

doc/rados: update bluestore-config-ref.rst

This PR updates bluestore-config-ref.rst so that
other PRs that refer to material in it can be
backported.

In order to ensure the coherence of this document,
all :confval: declarations have been removed. The
module that interprets those is called ceph_confval
and is available only in Quincy.

Signed-off-by: Zac Dover <zac.dover@gmail.com>

Merge pull request #46461 from rhcs-dashboard/wip-55116-pacific

pacific: mgr/dashboard: don't log 3xx as errors

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: nSedrickm <NOT@FOUND>

mgr/dashboard: fix linting errors and add test

Fixes: https://tracker.ceph.com/issues/55218
Signed-off-by: Aashish Sharma <aasharma@redhat.com>

Merge pull request #46448 from ceph/fix-triage-pacific

pacific: .github: continue on error and reorder milestone step

Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

mgr/dashboard:  don't log 3xx as errors

Let's avoid printing these ugly/misleading/redundant messages:

```
0 [dashboard DEBUG controllers.home] frontend language from headers: ['en-us']
0 [dashboard DEBUG controllers.home] found directory for language 'en-us'
0 [dashboard DEBUG controllers.home] serving static content: /home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/build/src/pybind/mgr/dashboard/frontend/dist/en-US/styles.css
0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cherrypy/_cpdispatch.py", line 60, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/jenkins-build/build/workspace/ceph-dashboard-pull-requests/src/pybind/mgr/dashboard/controllers/home.py", line 134, in __call__
    return serve_file(full_path)
  File "/usr/lib/python3/dist-packages/cherrypy/lib/static.py", line 70, in serve_file
    cptools.validate_since()
  File "/usr/lib/python3/dist-packages/cherrypy/lib/cptools.py", line 117, in validate_since
    raise cherrypy.HTTPRedirect([], 304)
cherrypy._cperror.HTTPRedirect: ([], 304)
```
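
The idea can be sketched as follows. This is a hypothetical, self-contained illustration rather than the dashboard's actual handler: `HTTPRedirect` is a stand-in for `cherrypy.HTTPRedirect`, and `handle_with_logging` is an invented name.

```python
import logging

logger = logging.getLogger("exception")


class HTTPRedirect(Exception):
    """Hypothetical stand-in for cherrypy.HTTPRedirect."""

    def __init__(self, urls, status):
        super().__init__(urls, status)
        self.status = status


def handle_with_logging(handler, *args, **kwargs):
    """Call handler; let 3xx redirects pass through without an error log."""
    try:
        return handler(*args, **kwargs)
    except HTTPRedirect:
        # A redirect (e.g. 304 Not Modified) is normal control flow.
        raise
    except Exception:
        # Anything else is a real failure and is logged with its traceback.
        logger.exception("Internal Server Error")
        raise
```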

Fixes: https://tracker.ceph.com/issues/54991
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 406e54d8c41bbc94b7285077d3055766629a2313)

mgr/dashboard: introduce memory and cpu usage for daemons

Fixes: https://tracker.ceph.com/issues/55218
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Co-authored-by: Aashish Sharma <aasharma@redhat.com>
Introducing 2 new columns in Cluster->Host->Daemons table for Memory and CPU usage.

(cherry picked from commit 263940502bdd9858c97923f394cd3d918e86e921)

Conflicts:
src/pybind/mgr/cephadm/module.py
- _process_ls_output() doesn't exist in pacific as the agent isn't backported yet, so similar changes
need to be made in serve.py instead.

Merge pull request #46204 from rhcs-dashboard/wip-55570-pacific

pacific: mgr/dashboard: fix ssl cert validation for ingress service creation

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

.github/pr-triage: reorder milestone step

In `master` the milestone step exits and causes remaining tasks not to be run. I previously tried with the `continue-on-error` flag, but it didn't work, so let's try putting that step at the end.

Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit d8c0229b90cc20e89f7037a72af8b5d41b6b0861)

.github: continue on error

Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit b6791ee09a49398cdef39faae5f2e72f43061d64)

mgr, mgr/prometheus: Fix regression with prometheus metrics

The ceph daemons on a host are inheriting the ceph version from the host.
This leads to a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should report its own
ceph version, based on the ceph binary in use for that daemon.

Consider a situation where a partial upgrade is done on a host: the daemons
that were restarted should have the upgraded ceph version tag
and the rest should have the older ceph version, but presently all inherit
the host version. In a containerized environment, all daemons are
using the ceph version of the last daemon registered as a service on the host.

Fixes: https://tracker.ceph.com/issues/54611
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit aeca2e41ef560cf51c1ad935cfb6470e782aa8d5)

mgr, mon: Keep upto date metadata with mgr for MONs

The mgr updates mon metadata through handle_mon_map, which
gets triggered when MONs are removed from or added to the cluster,
the active mgr is restarted, or the mgr fails over.
We could have handled the metadata update through MMgrOpen or
early MMgrReport messages, but these are sent before the monitor
election completes and the lead monitor updates pending metadata
in the monstore. Instead of relying on fetching mon metadata using
the 'ceph mon metadata <id>' command, explicitly send a metadata
update request with mon metadata to the mgr.

Fixes: https://tracker.ceph.com/issues/55088
Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit 1a065043b964f8c014ebb5bc890a243c398ff07c)

Merge pull request #46391 from ljflores/wip-55745-pacific

Merge PR #46336 into pacific

* refs/pull/46336/head:
16.2.9
mgr/ActivePyModules.cc: fix cases where GIL is held while attempting to lock mutex

Merge pull request #46277 from votdev/wip-55642-pacific

pacific: mgr/dashboard: Creating and editing Prometheus AlertManager silences is buggy

Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>

Merge pull request #46379 from rhcs-dashboard/wip-55738-pacific

pacific: mgr/dashboard: form field validation icons overlap with other icons

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: nSedrickm <NOT@FOUND>

Merge pull request #46343 from rhcs-dashboard/wip-55718-pacific

pacific: mgr/dashboard: customizable log-in page text/banner

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #46228 from rhcs-dashboard/wip-55415-pacific

pacific: mgr/dashboard: fix wrong pg status processing

Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #46322 from rhcs-dashboard/wip-55690-pacific

pacific: mgr/dashboard: unselect rows in datatables

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow

All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.

The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.

The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfill` setting is causing some tests to time out in recovery.

WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/

WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/

I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/

Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 40062676c2ceed49b9fa147127ffa83ba6118e2a)

Merge pull request #46359 from adk3798/pacific-staggered-upgrade

pacific: mgr/cephadm: staggered upgrade

Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #45964 from adk3798/pacific-raw-osd

pacific: mgr/cephadm: Raw OSD Support

Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph.spec.in: remove build directory in %clean, not %install

Removing the build directory at the end of %install is too soon,
and means we get rid of a bunch of stuff needed to correctly
create debuginfo/debugsource packages, which happens automatically
right after %install. So, let's put it where it really belongs, in
the %clean section.

Fixes: aa18cb12003e3526c8e8f23dc2335a483fbfa68e
Fixes: https://tracker.ceph.com/issues/55079
Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit 94ad178bdcbae56a8eafc65a3a276e25d7a51a5e)

ceph.spec.in: remove build directory at end of %install

By the time we get to the end of the %install section, all the
built binaries have been installed in the build root, so we can
delete the build directory from the source tree. This frees up
about 17GB of disk space on build hosts, which is helpful in
case other processes later in the RPM build need more disk space.

Fixes: https://tracker.ceph.com/issues/55079
Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit aa18cb12003e3526c8e8f23dc2335a483fbfa68e)
Conflicts:
ceph.spec.in
- pacific uses "build", not "%{_vpath_builddir}"

mgr/dashboard: form field validation icons overlap with other icons

Signed-off-by: Sarthak0702 <sarthak.dev.0702@gmail.com>
(cherry picked from commit 0bd2d023026af737b1894f74a545f039a6ec2428)

Merge pull request #46352 from mgfritch/backport-46218-pacific

pacific: cephadm: prometheus: The generatorURL in alerts is only using hostname

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Volker Theile <vtheile@suse.com>

doc/cephadm: staggered upgrade docs

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 6a68def64eb720ef0eeace7c0d19c48cb1f6e5bb)

mgr/cephadm: unit test for staggered upgrade param validation

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 0a46fcb423133e662254ec1aad3704bcaf5e101b)

Conflicts:
src/pybind/mgr/cephadm/tests/test_upgrade.py

qa/suites/orch/cephadm: staggered upgrade test

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 791e1d20b363c5960e11263312293383e2748a9d)

mgr/cephadm: make use of new upgrade control parameters

Fixes: https://tracker.ceph.com/issues/54135
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit c1f3497b43bff6f7640161807dce01dc089ce405)

Conflicts:
src/pybind/mgr/cephadm/upgrade.py

mgr/cephadm: make UpgradeState from_json a bit safer

This way, for downgrades to whatever versions
this lands in onward, having added new parameters to
UpgradeState shouldn't break anything. Can't do much
about downgrades to older versions from this one
but this should help in the future.
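
One way to get this kind of tolerance is to drop unknown keys before calling the constructor. The class below is a hypothetical, trimmed-down stand-in (its name and fields are assumptions for illustration); the actual change to UpgradeState may be implemented differently.

```python
import inspect
from typing import Any, Dict, Optional


class UpgradeStateSketch:
    """Hypothetical, trimmed-down stand-in for cephadm's UpgradeState."""

    def __init__(self, target_name: str, progress_id: str,
                 error: Optional[str] = None) -> None:
        self.target_name = target_name
        self.progress_id = progress_id
        self.error = error

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "UpgradeStateSketch":
        # Keep only the keys this version's constructor understands, so
        # fields written by a newer version are ignored instead of raising.
        known = set(inspect.signature(cls.__init__).parameters) - {"self"}
        return cls(**{k: v for k, v in data.items() if k in known})


# "daemon_types" plays the role of a parameter added by a later version.
state = UpgradeStateSketch.from_json({
    "target_name": "example-image",
    "progress_id": "abc",
    "daemon_types": ["mgr", "mon"],
})
```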

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit aeaa0b5fd87068a31bfa61dd088c49affce42419)

mgr/cephadm: add new args and validation for staggered upgrade

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit e6b0fe0e4859f83ca69d14d89f9e47f0ea74e770)

Conflicts:
src/pybind/mgr/orchestrator/module.py

mgr/cephadm: split _do_upgrade into sub functions

This function was around 500 lines and difficult to work
with. Splitting it into sub functions should hopefully make
it a bit easier to understand and make changes to.

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 7b83c51fe63ae006b15dcf509c08a722f104788e)

Conflicts:
src/pybind/mgr/cephadm/upgrade.py

os/bluestore/bluefs: Fix data corruption in truncate()

It is possible to create a condition in which BlueFS contains a corrupted file.
This can happen when the BlueFS replay log is on device A and we have just written to device B and truncated the file.

Scenario:
1) write to file h1 on SLOW device
2) flush h1 (initiate transfer, but no fdatasync yet)
3) truncate h1
4) write to file h2 on DB
5) fsync h2 (forces replay log to be written, after fdatasync to DB)
6) poweroff

Fixes: https://tracker.ceph.com/issues/53129
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 49b7b44b3b5c94ee401562e603999e2b3bd8f9a2)

os/objectstore/test: Add test for data corruption in file truncation

Test for https://tracker.ceph.com/issues/53129

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 1f7771d4a77ebb271b939b3893d7607d964796f0)

cephadm: prometheus: The generatorURL in alerts is only using hostname

Prometheus is currently using only the hostname in the 'generatorURL' of an alert which causes issues when clicking on the URL in the Ceph Dashboard or somewhere else, because in most cases the hostname of the node that is running the Prometheus container is not resolvable.

To fix that, the command line argument '--web.external-url' must be appended in the systemd unit file of the Prometheus container, e.g. '--web.external-url http://foo.bar:9095', where an FQDN is used as the hostname.
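
As a rough sketch of how such an argument could be assembled: the helper name below is hypothetical and not part of cephadm, and the default port is simply the one from the example above. The FQDN could come from something like `socket.getfqdn()`.

```python
from typing import List


def prometheus_external_url_args(fqdn: str, port: int = 9095) -> List[str]:
    """Build the extra Prometheus arguments from a fully qualified host name."""
    return ["--web.external-url", f"http://{fqdn}:{port}"]


args = prometheus_external_url_args("foo.bar")
```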

Fixes: https://tracker.ceph.com/issues/55595
Signed-off-by: Volker Theile <vtheile@suse.com>
(cherry picked from commit 4281dc1bbc466dd061781a984b34bb0eafaf482f)

doc: 16.2.9 Release notes

Signed-off-by: David Galloway <dgallowa@redhat.com>
(cherry picked from commit d7c5dc0dd3a5b4df0fe080d0f1af59af84e60328)
Signed-off-by: Zac Dover <zac.dover@gmail.com>

Merge pull request #46327 from adk3798/pacific-batch-may1

pacific: cephadm batch backport May

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #46309 from adk3798/pacific-public-network-bootstrap

pacific: cephadm: improve network handling during bootstrap

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #44769 from guits/wip-54009-pacific

pacific: ceph-volume: zap osds in rollback_osd()

Reviewed-by: Teoman ONAY <tonay@redhat.com>

mgr/dashboard: customizable log-in page text/banner

Fixes: https://tracker.ceph.com/issues/55231
Signed-off-by: Sarthak0702 <sarthak.dev.0702@gmail.com>
(cherry picked from commit 9f8bcd764e6d488d488e6ba1c05c2972329827b7)

mgr/dashboard: Creating and editing Prometheus AlertManager silences is buggy

When creating a new monitoring silence, the form is pre-filled with the wrong alert data. The alert data always comes from the very first object in the list of the API response, not from the alert identified by the 'fingerprint' property.

The same problem applies to editing silences. The selected silence is not the one edited; it is always the first one in the list returned by the API, not the one with the specified 'id' property.

The main problem of the original implementation is that the Prometheus Alertmanager API endpoints /api/v1/[alerts/silences] do not support querying. To fix that, filtering is done in the frontend.
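
The client-side filtering amounts to a lookup by key over the full list returned by the API. The dashboard frontend itself is TypeScript; the sketch below expresses the same idea in Python, with a hypothetical helper name and made-up sample data.

```python
from typing import Any, Dict, List, Optional


def find_by_key(items: List[Dict[str, Any]], key: str,
                value: str) -> Optional[Dict[str, Any]]:
    """Return the first item whose `key` equals `value`, or None."""
    return next((item for item in items if item.get(key) == value), None)


alerts = [
    {"fingerprint": "aaa", "name": "first"},
    {"fingerprint": "bbb", "name": "second"},
]
selected = find_by_key(alerts, "fingerprint", "bbb")
```

Picking `alerts[0]` unconditionally, as described above for the old behaviour, would have returned the "first" entry here.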

Fixes: https://tracker.ceph.com/issues/55578
Signed-off-by: Volker Theile <vtheile@suse.com>
(cherry picked from commit 658486b566f0f9cac2fc0225c4cd78702f943d40)

Merge pull request #46326 from zdover23/wip-pr-46315-backport-to-pacific

pacific: doc/start: s/3/three/ in intro.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

doc/start: s/3/three/ in intro.rst

I'm changing "3" to "three" for two reasons:

1. It's correct.
2. This allows me to test backports into Octopus, Pacific, and Quincy.
   I am particularly interested to see what happens when I attempt
   the backport into Octopus, because backports into Octopus have
   failed. This will provide me with another unit of data.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
(cherry picked from commit 28efcec2d65e85ff2fa54e62b5b134e63ace853b)

16.2.9

mgr/cephadm: unit test for re-adding host and receiving loopback address

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit abfbbd383cadfa3e2862d939444e0e9218b3cb3b)

mgr/cephadm: re-use old ip when re-adding hosts if necessary

When a host is re-added without an explicit ip we can default to the old
ip we had stored for the host rather than either keeping the loopback
address or throwing an exception. We only want to actually error when
the only options left are to error or to use a resolved loopback address.
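
The decision order described above might look roughly like this. It is a sketch under assumptions: the function name and parameters are hypothetical, all inputs are assumed to be IP address strings, and the real logic in cephadm may differ in detail.

```python
import ipaddress
from typing import Optional


def pick_host_addr(explicit: Optional[str], resolved: str,
                   stored: Optional[str]) -> str:
    """Choose the address to record for a host that is being re-added."""
    if explicit:
        return explicit
    if not ipaddress.ip_address(resolved).is_loopback:
        return resolved
    if stored and not ipaddress.ip_address(stored).is_loopback:
        # Fall back to the address previously stored for this host.
        return stored
    # Only a resolved loopback address is left, so give up.
    raise ValueError("host resolves to a loopback address; pass an explicit IP")


addr = pick_host_addr(None, "127.0.0.1", "10.0.0.5")
```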

Fixes: https://tracker.ceph.com/issues/53438
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 7e8d8317bef1b35cddd99950e503f57710002e80)

Conflicts:
src/pybind/mgr/cephadm/module.py

mgr/cephadm: stripping out / from the end of the url

Fixes: https://tracker.ceph.com/issues/55638
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 17032f6be22e9efc3e199d7e35091025bfaae965)