librbd/cache/pwl/ssd: persist correct write_data_pos

WriteLogCacheEntry gets appended to persist_log_entries before
write_data_pos is updated with the actual media offset. Because
push_back() makes a copy, the updated write_data_pos value never
makes it to media, making recovery impossible.
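
A minimal sketch of the failure mode, in Python for illustration only.
copy.copy() stands in for the copy that C++ push_back() makes; the
trimmed-down WriteLogCacheEntry and both helper functions are
hypothetical, and the real patch may restructure the code differently.

```python
import copy
from dataclasses import dataclass


@dataclass
class WriteLogCacheEntry:
    # Hypothetical, trimmed-down stand-in for the on-media entry.
    image_offset_bytes: int
    write_data_pos: int = 0


def append_then_update(entry, persist_log_entries, media_offset):
    # Buggy order: the container takes its copy first, so the later
    # update only touches the caller's object, never the persisted one.
    persist_log_entries.append(copy.copy(entry))
    entry.write_data_pos = media_offset


def update_then_append(entry, persist_log_entries, media_offset):
    # Safe order: set the media offset before the copy is taken.
    entry.write_data_pos = media_offset
    persist_log_entries.append(copy.copy(entry))


buggy, fixed = [], []
append_then_update(WriteLogCacheEntry(0), buggy, 8192)
update_then_append(WriteLogCacheEntry(0), fixed, 8192)
print(buggy[0].write_data_pos, fixed[0].write_data_pos)  # prints: 0 8192
```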

Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fe757401ada7bfd6784b6f9ca5556e1459df7a69)

librbd/cache/pwl/ssd: set m_bytes_allocated_cap on recovery

Currently it's set only when a new cache is formatted.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit cb9b3afd87a28e96d3688e1c73900d8043fac6cc)

librbd/cache/pwl/ssd: actually use first_{valid,free}_entry on recovery

first_valid_entry and first_free_entry pointers are read from media
but not actually used: both m_first_valid_entry and m_first_free_entry
get assigned 0 (or garbage). next_log_pos gets the same value as well
meaning that not only no recovery is attempted but the cache also gets
corrupted because DATA_RING_BUFFER_OFFSET is not applied.

Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ef020b85fb16c1730fc08eadfd1b51d3c4cd019a)

librbd/cache/pwl/ssd: don't count log entries

In ssd mode log entries are variable size. Attempting to count and
impose watermarks on the number of log entries is bogus because the
total number of entries it would take to fill the cache to capacity
is also variable and can't be precisely estimated.

had conflicts, but no new changes
Fixes: https://tracker.ceph.com/issues/50669
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ea65553b4a9ee1349c6da8452d861afe579e99e9)

librbd/cache/pwl: fix AbstractWriteLog::check_allocation() signature

All parameters are integers and none of them are (in-)out, so don't
take them by reference. Additionally num_lanes, num_log_entries and
num_unpublished_reserves don't need to be 64-bit as their respective
fields in AbstractWriteLog are 32-bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 74ecc4b76a10c53be928807b5be077f080d34724)

librbd/cache/pwl: rename m_log_pool_config_size to m_log_pool_size

trivial fix: no new changes: https://www.diffchecker.com/9btXJhCC
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 829ef952d2e408fe3676b38e7ecd26cbb04571a5)

librbd/cache/pwl: get rid of AbstractWriteLog::m_log_pool_actual_size

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 820bbecfb130ad99483f0d468f1b1c9612e54935)

librbd/cache/pwl/ssd: get rid of WriteLog::pool_size

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6cd36592f631051d2ae49b7d92f2a30cd12c9c41)

librbd/cache/pwl/ssd/WriteLog: don't crash on split log entries

write_log_entries() will split a log entry at the end of the log; the
remainder is written to the beginning at DATA_RING_BUFFER_OFFSET. On
the read side aio_read_data_block() doesn't handle this case and just
crashes. Unless the workload in use is <= 4K, the image is rendered
unusable sooner or later.
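
A rough Python sketch of the wrap-around that the read side has to
mirror. The buffer size, the tiny DATA_RING_BUFFER_OFFSET value and the
ring_write()/ring_read() helpers are assumptions made for the example;
they are not the librbd functions.

```python
DATA_RING_BUFFER_OFFSET = 8  # assumed small value, for the demo only


def ring_write(media, pos, data):
    """Write data at pos; whatever does not fit wraps to the ring start."""
    first = min(len(data), len(media) - pos)
    media[pos:pos + first] = data[:first]
    rest = len(data) - first
    if rest:
        start = DATA_RING_BUFFER_OFFSET
        media[start:start + rest] = data[first:]


def ring_read(media, pos, length):
    """Read length bytes at pos, following the same split as ring_write()."""
    first = min(length, len(media) - pos)
    head = bytes(media[pos:pos + first])
    start = DATA_RING_BUFFER_OFFSET
    tail = bytes(media[start:start + (length - first)])
    return head + tail


media = bytearray(32)
ring_write(media, 28, b"abcdefgh")  # 4 bytes at the end, 4 wrapped
roundtrip = ring_read(media, 28, 8)
```

A reader that only ever does media[pos:pos + length] gets back a short
buffer for the split entry, which is the case the read path missed.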

trivial fix: formatting changes
Fixes: https://tracker.ceph.com/issues/50589
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 155ee28d98b832e5414aa594094c86cb6bfee45e)

librbd/cache/pwl: include head and tail pointers in STATS

While at it, reduce the number of calls to operator<< and drop
the trailing comma.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2a974fd0c1b5439be15559133077efe132b6628c)

librbd/cache/pwl: bump "Waiting for allocation" and "Retiring" dout level

Bump "Waiting for allocation" to 5.

"Retiring" is at 20 for rwl and 1 for ssd. Bump the latter to 20 as
well.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 626a995cf6e949ba47517eac62f2b97715e3fb06)

librbd/cache/pwl: use m_bytes_allocated_cap for both rwl and ssd

Follow rwl mode and use AbstractWriteLog::m_bytes_allocated_cap
instead of m_log_pool_ring_buffer_size specific to ssd. This fixes
"bytes available" calculation in STATS output.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 27dd7f85aefecea83a424036ef84116aae5d1857)

librbd/cache/pwl/ssd/WriteLog: decrement m_bytes_allocated when retiring

Currently if ssd cache is filled to capacity, all future I/O hangs
indefinitely because even though the cache eventually becomes clean
and retires enough entries to get back under RETIRE_HIGH_WATER, this
isn't communicated to AbstractWriteLog::check_allocation().
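
The accounting problem can be sketched in Python as follows. The
CacheAccounting class and its decrement flag are hypothetical and only
model the idea; check_allocation() here is a simplified stand-in for
AbstractWriteLog::check_allocation().

```python
class CacheAccounting:
    """Toy model of byte accounting against a fixed cap."""

    def __init__(self, bytes_allocated_cap):
        self.bytes_allocated_cap = bytes_allocated_cap
        self.bytes_allocated = 0

    def check_allocation(self, nbytes):
        # True if nbytes more would still fit under the cap.
        return self.bytes_allocated + nbytes <= self.bytes_allocated_cap

    def allocate(self, nbytes):
        if not self.check_allocation(nbytes):
            return False  # the caller would wait for allocation here
        self.bytes_allocated += nbytes
        return True

    def retire(self, nbytes, decrement=True):
        # decrement=False models the bug: space is freed on media but
        # the counter that check_allocation() consults never goes down.
        if decrement:
            self.bytes_allocated -= nbytes


buggy = CacheAccounting(4096)
buggy.allocate(4096)
buggy.retire(4096, decrement=False)
stuck = not buggy.allocate(512)  # still refused after retiring

fixed = CacheAccounting(4096)
fixed.allocate(4096)
fixed.retire(4096)
resumed = fixed.allocate(512)    # accepted once the counter drops
```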

trivial fix: indentation https://www.diffchecker.com/9Vg9hgdl
Fixes: https://tracker.ceph.com/issues/50560
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e2bbf4167fc8816d329ceae5c07c9e61599d9a17)

librbd/cache/pwl/ssd/WriteLog: fix free()/delete mismatch

Trivial fix: space mismatch (no new change) https://www.diffchecker.com/WCSkqu2R

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 5b89c47ec5139a8d07be09c9f0021d90fe4a663b)

Merge pull request #43748 from tchaikov/pacific-doc-build

pacific: admin/doc-requirements.txt: pin Sphinx at 3.5.4

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>

admin/doc-requirements.txt: pin Sphinx at 3.5.4

* pin Sphinx at 3.5.4
* pin docutils at 0.18

at least the combination of these two versions
is known to compile.

to address the bug reported at
https://sourceforge.net/p/docutils/bugs/431/

the backtrace looks like:

/home/jenkins-build/build/workspace/ceph-pr-docs/build-doc/virtualenv/lib/python3.8/site-packages/sphinx/util/docutils.py:285:
RemovedInSphinx30Warning: function based directive support is now
deprecated. Use class based directive instead.
  warnings.warn('function based directive support is now deprecated. '

Exception occurred:
  File
"/home/jenkins-build/build/workspace/ceph-pr-docs/build-doc/virtualenv/lib/python3.8/site-packages/docutils/writers/html5_polyglot/__init__.py",
line 445, in section_title_tags
    if (ids and self.settings.section_self_link
AttributeError: 'Values' object has no attribute 'section_self_link'

please note this change is not cherry-picked from
master, because master already bumped Sphinx to 3.5.4
in 4968baa2523bd2a5ca6be147b26bc28906a864c9.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>

Merge pull request #43543 from rhcs-dashboard/wip-52870-pacific

pacific: mgr/dashboard: clean-up controllers and API backward versioning compatibility

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>

Merge pull request #43417 from trociny/wip-51646-pacific

pacific: osd/OSD: mkfs need wait for transcation completely finish

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #43562 from lxbsz/vino_fix

Pacific: test/libcephfs: put inodes after lookup

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #43559 from batrick/i52654-pacific

pacific: pybind/mgr/cephadm: set allow_standby_replay during CephFS upgrade

Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>

Merge pull request #43475 from lxbsz/tracker_52876

pacific: test: shutdown the mounter after test finishes

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #43644 from aaSharma14/wip-52965-pacific

pacific: mgr/dashboard: monitoring: grafonnet refactoring for radosgw dashboards

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #43619 from smithfarm/wip-53005-pacific

pacific: rgw/tracing: unify SO version numbers within librgw2 package

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #43512 from neha-ojha/wip-52770-pacific

pacific: os/bluestore: list obj which equals to pend

Reviewed-by: Igor Fedotov <ifedotov@suse.com>

Merge pull request #43513 from neha-ojha/wip-52620-pacific

pacific: osd: fix partial recovery become whole object recovery after restart osd

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #43511 from neha-ojha/wip-52843-pacific

pacific: msg/async/ProtocolV2: Set the recv_stamp at the beginning of receiving a message

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>

Merge pull request #43445 from k0ste/wip-52848-pacific

pacific: mgr: Add check to prevent mgr from crashing

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

Merge pull request #43437 from trociny/wip-52831-pacific

pacific: osd: re-cache peer_bytes on every peering state activate

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43421 from callithea/wip-52289-pacific

pacific: qa/tasks/mgr: skip test_diskprediction_local on python>=3.8

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43353 from kamoltat/wip-ksirivad-backport-pacific-37544

pacific: mgr/progress: optimize global recovery && introduce 5 seconds interval

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

mgr/dashboard: monitoring: grafonnet refactoring for hosts dashboards

This PR intends to refactor hosts dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit f7714de294dd7376a9a8ae5131aa429322b459c3)

Conflicts:
monitoring/grafana/dashboards/jsonnet/grafana_dashboards.jsonnet (merging all the jsonnet dashboards in one PR)

Merge pull request #43646 from rhcs-dashboard/wip-53026-pacific

pacific: mgr/dashboard: pin a version for autopep8 and pyfakefs

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

mgr/dashboard: pin a version for autopep8 and pyfakefs

Fixes: https://tracker.ceph.com/issues/53024
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 946dab4f608ec47e0a3cfefdf8e7d1afda69117f)

mgr/dashboard: monitoring: grafonnet refactoring for cephfs dashboards

This PR intends to refactor cephfs dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit ed954b0e6ce24fbae66f78f7e4f90416b9ed7749)

mgr/dashboard: monitoring: grafonnet refactoring for osds dashboards

This PR intends to refactor osds dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit e490e2f3abe707a2e891171f3c230d44e282c601)

mgr/dashboard: monitoring: grafonnet refactoring for pools dashboards

This PR intends to refactor pools dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8c48821c21f7a6b248de10ff6750a63bab1e4948)

mgr/dashboard: monitoring: grafonnet refactoring for rbd dashboards

This PR intends to refactor rbd dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit e737aaa000a31e2f37ca90eb813f031a42edef3b)

mgr/dashboard: monitoring: grafonnet refactoring for radosgw dashboards

This PR intends to refactor radosgw dashboards using grafonnet

Fixes: https://tracker.ceph.com/issues/52777
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit eb01954cd999430417555628e0099f645d371746)

rgw/tracing: unify SO version numbers within librgw2 package

The librgw2 package contains several SO files. Two of those - librgw_op_tp.so
and librgw_rados_tp.so - had a different version number than the main librgw.

This was a violation of the openSUSE Shared Library Packaging Policy [1] but it
also seems like a "violation" of common sense.

[1] https://en.opensuse.org/openSUSE:Shared_library_packaging_policy#Package_naming

Fixes: https://tracker.ceph.com/issues/52979
Signed-off-by: Nathan Cutler <ncutler@suse.com>
(cherry picked from commit 172d6e01d5079f445044da9fe0823ceb353bdc86)

Merge pull request #43548 from rzarzynski/pacific-50483

pacific: msgr/async: fix unsafe access in unregister_conn()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43610 from rhcs-dashboard/wip-pr_triage_dashboard-pacific

.github: add dashboard PRs to Dashboard project

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #43440 from rhcs-dashboard/wip-52835-pacific

pacific: qa/mgr/dashboard/test_pool: don't check HEALTH_OK

Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

.github/pr-triage: rename GH token

Repo projects use GITHUB_TOKEN instead of MY_GITHUB_TOKEN:
https://github.com/srggrs/assign-one-project-github-action/blob/master/entrypoint.sh#L19

Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 2220646c2085f6967e61d21ff19145666f5a1285)

.github: add dashboard PRs to Dashboard project

This action automatically adds PRs with 'dashboard' label to the
'Dashboard' project (https://github.com/ceph/ceph/projects/6).

Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit ed55c527f10237c0ab48038639a971e85f8e1377)

Merge pull request #43200 from batrick/i52639

pacific: MDSMonitor: handle damaged state from standby-replay

Reviewed-by: Venky Shankar <vshankar@redhat.com>

qa/tasks/backfill_toofull: make test work when compression on

The osd backfill reservation does not take compression into account so
we need to operate with "uncompressed" bytes when calculating nearfull
ratio.

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 429ac06cbb44b8a8263beb0d0780a01cedb517ba)

Merge pull request #43267 from cfsnyder/wip-52588-pacific

pacific: ceph-volume: fix lvm activate --all --no-systemd

Merge pull request #43523 from rhcs-dashboard/wip-52911-pacific

pacific: mgr/dashboard: replace "Ceph-cluster" Client connections with active-standby MGRs

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #43541 from rhcs-dashboard/wip-52931-pacific

pacific: mgr/dashboard: Fix orchestrator/01-hosts.e2e-spec.ts failure

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #43240 from callithea/wip-52292-pacific

pacific: mgr/dashboard: visual tests: Add more ignore regions for dashboard component

Reviewed-by: aaryanporwal <NOT@FOUND>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

mgr/dashboard: replace string version with class

* APIVersion:
  * Moved to a separate file
  * Added doctests
  * Added sentinel values:
    * DEFAULT = 1.0
    * EXPERIMENTAL = 0.1
    * NONE = 0.0
  * Added to_mime_type() helper method
* Controllers.__init__:
  * Added type hints
  * Replaced string versions with APIVersions
* Feedback controller:
  * Replaced with EXPERIMENTAL (probably it should be NONE)
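
A minimal Python sketch of what such a version class could look like.
Only the sentinel values and the to_mime_type() name come from the list
above; the field names, the module-level constants and the exact MIME
string layout are assumptions for illustration.

```python
from typing import NamedTuple


class APIVersion(NamedTuple):
    """Major/minor API version that can render itself as a MIME type."""

    major: int
    minor: int

    def __str__(self):
        return f"{self.major}.{self.minor}"

    def to_mime_type(self, subtype="json"):
        # The vendor MIME layout below is assumed for this sketch.
        return f"application/vnd.ceph.api.v{str(self)}+{subtype}"


# Sentinel values named in the list above.
DEFAULT = APIVersion(1, 0)
EXPERIMENTAL = APIVersion(0, 1)
NONE = APIVersion(0, 0)

mime = DEFAULT.to_mime_type()
```

Compared with bare strings, a tuple-based class gives ordering and
equality for free, so EXPERIMENTAL < DEFAULT holds without parsing.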

Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/controllers/__init__.py
   - Remove the current changes and keep the incoming new changes
src/pybind/mgr/dashboard/controllers/crush_rule.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/docs.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/feedback.py
   - Deleted the file since feedback module isn't backported to pacific
src/pybind/mgr/dashboard/controllers/host.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/openapi.yaml
   - Generated a new openapi yaml file
src/pybind/mgr/dashboard/tests/__init__.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_docs.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_host.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_tools.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/tests/test_versioning.py
   - Changes related to the versioning like importing the APIVersion
src/pybind/mgr/dashboard/controllers/crush_rule.py
   - Removed the MethodMap decorator that bumps the version of the
     endpoint to 2.0, because the changes that required that version
     bump were not backported to pacific

test: shutdown the mounter after test finishes

This was missed when resolving the conflicts in the previous backport
commit (5772641cb9bde083).

Fixes: https://tracker.ceph.com/issues/52876
Signed-off-by: Xiubo Li <xiubli@redhat.com>

test/libcephfs: put inodes after lookup

Otherwise, the client umount will hang due to inability to trim the
inodes looked up using the low-level interface. This results in slow-op
warnings and an eviction:

2021-09-11T17:23:31.097+0000 7f99c3522700 0 log_channel(cluster) log [WRN] : evicting unresponsive client smithi176 (9756), after 303.924 seconds
2021-09-11T17:23:31.097+0000 7f99c3522700 10 mds.0.server autoclosing stale session client.9756 172.21.15.176:0/3891214934 last renewed caps 303.924s ago

From: /ceph/teuthology-archive/yuriw-2021-09-11_16:21:09-smoke-pacific-distro-basic-smithi/6385038/remote/smithi175/log/ceph-mds.b.log.gz

Fixes: https://tracker.ceph.com/issues/52572
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit c0252063b94d811dc7863058999856ac5614d1eb)

Conflicts:
src/test/libcephfs/test.cc

qa: add test for cephfs upgrade sequence

This also checks max_mds>1 and allow_standby_replay are restored to
previous values.

Future work can add tests for multiple file systems (or volumes).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b1420e5771927f5c659e0e5edbc5714035f3df09)

qa: add tasks to check mds upgrade state

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 5a7382214fe4dbd4b79773c6e732512ade22793a)

qa: add note about where caps are generated

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit dbe5573ed4781cb4b214e701c77be7bc2cddabf3)

qa: move CephManager cluster instantiation to subtask

This needs to be available for the cephfs_setup task so administration
mounts can run ceph commands, potentially through `cephadm shell`.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 7812cfb6744fc3bce50e26aa7dd6a4e47a43bb23)

pybind/mgr/cephadm: disable allow_standby_replay during CephFS upgrade

Following procedure in [1].

Also: harden checks for active. Ensure "up" and "in" are both [0]. There
should be no standby-replay daemon.

[1] https://docs.ceph.com/en/pacific/cephfs/upgrading/

Fixes: https://tracker.ceph.com/issues/52654
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit bca21f01ce3bb32e0951f0fe15da88a81750a191)

pybind/mgr/cephadm: always do mds upgrade sequence

Minor versions also require this sequence.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 4affb5c7029f6b83d640aa7b7206d9cf61e75f1d)

mgr/dashboard: make modified API endpoints backward compatible

Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introduce the APIVersion class to handle versioning for API endpoints and
make them backward compatible.

mgr/dashboard: clean-up controllers

Fixes: https://tracker.ceph.com/issues/52589
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/CMakeLists.txt
- Added some tests in the CephTest section

msgr/async: fix unsafe access in unregister_conn()

We were looking at anon_conns and accepting_conns without holding
the lock (deleted_lock is not sufficient).

Drop this test, and move the decrements:

- inc when we add to conns or anon_conns (no changes there)
- dec when we remove from deleted_conns (several different paths!)

Fixes: https://tracker.ceph.com/issues/49237
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit d51d80b3234e17690061f65dc7e1515f4244a5a3)
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

mgr/dashboard: Fix orchestrator/01-hosts.e2e-spec.ts failure

The test is failing on deleting a host because the agent daemon is
present on that host. It's not possible to simply delete a host. We need
to drain it first and then delete it.

Fixes: https://tracker.ceph.com/issues/52764
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit db5cfb15e55dadf7bd5c381f53a4ea548fcea152)

mgr/dashboard: replace Client connections with active-stdby mgrs

Fixes: https://tracker.ceph.com/issues/52121
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit d388c5e958ddf5447c78db50ca2061bb443d2227)

osd: fix partial recovery become whole object recovery after restart osd

support SERVER_OCTOPUS feature for pg_missing_item::encode()

Fixes: https://tracker.ceph.com/issues/52583
Signed-off-by: Jianwei Zhang <jianwei1216@qq.com>
(cherry picked from commit dcdb188b6f577551fb377ba34145419f81322b03)

os/bluestore: list obj which equals to pend

otherwise we could have failures like

scrub : stat mismatch, got 3/4 objects, 1/2 clones, 3/4 dirty, 3/4 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 49/56 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes."

where the numbers of scrubbed objects, clones, dirty and omap objects are
always less than the corresponding totals, if the PG contains
object(s) whose hash happens to be 0xffffffff.

in this change, if the calculated hash of the upper bound is greater
than the maximum possible number represented by uint32_t, in addition to
setting the hash of the upper bound hobj to 0xffffffff, we also set the
nspace of hobj of the upper bound to "\xff", so that the upper bound
is greater than an hobj whose hash happens to be 0xffffffff. please note,
the nspace of "\xff" is not an ascii string, so it's not likely to be
less than a real-world nspace of an hobj.

with this new *greater* upper bound, we are able to include the previously
missing hobj when listing the objects in a PG. so the scrub won't be
annoyed when the number of objects does not match.
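
the effect of the new bound can be sketched in Python. the sort key is
reduced to a (hash, nspace, name) tuple here, which is a simplifying
assumption; real hobject ordering has more fields. list_range() is a
hypothetical helper, not a bluestore function.

```python
MAX_HASH = 0xFFFFFFFF


def list_range(objects, lower, upper):
    """Return the keys in [lower, upper), as a ranged listing would."""
    return [o for o in sorted(objects) if lower <= o < upper]


objects = [
    (0x10, b"", b"a"),
    (MAX_HASH, b"", b"b"),     # hash happens to be 0xffffffff
    (MAX_HASH, b"ns1", b"c"),  # same hash, ascii nspace
]
lower = (0, b"", b"")

# old bound: only the hash is clamped, so nothing with MAX_HASH sorts below it
old_upper = (MAX_HASH, b"", b"")
# new bound: the nspace is also set to "\xff", above any ascii nspace
new_upper = (MAX_HASH, b"\xff", b"")

seen_before = list_range(objects, lower, old_upper)
seen_after = list_range(objects, lower, new_upper)
print(len(seen_before), len(seen_after))  # prints: 1 3
```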

Fixes: https://tracker.ceph.com/issues/52705
Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit ffab13bcd9006c1f961a24b8016df9d1fe06ba1d)

os/bluestore: use scope_guard to log latency

simpler this way, and avoids using `goto`.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 715a83822ebc1a3d102d1ec13323b69db0600719)

msg/async/ProtocolV2: replace ltt_recv_stamp with recv_stamp

Fixes: https://tracker.ceph.com/issues/52739
Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
(cherry picked from commit 1b1a91c31ba6078caff045c499b8737e0068460f)

msg/async/ProtocolV2: Set the recv_stamp at the beginning of receiving a message instead of after receiving.

Fixes: https://tracker.ceph.com/issues/52739
Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
(cherry picked from commit 5ca30f396bface2a8e95a0efb1b97f8c1b64de1c)

Merge pull request #43368 from tchaikov/pacific-pr-39602

pacific: mgr/influx: use "N/A" for unknown hostname

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #43351 from rhcs-dashboard/wip-52772-pacific

pacific: qa/mgr/dashboard: add extra wait to test

Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #43347 from rhcs-dashboard/wip-52763-pacific

pacific: mgr/dashboard: Move force maintenance test to the workflow test suite

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #43167 from ktdreyer/pacific-52610-cmake-thread-libs-init

pacific: cmake: link Threads::Threads instead of CMAKE_THREAD_LIBS_INIT

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #43199 from vshankar/wip-52627

pacific: mgr/mirroring: remove unnecessary fs_name arg from daemon status command

Reviewed-by: Xiubo Li <xiubli@redhat.com>

Merge pull request #43198 from vshankar/wip-52444

pacific: cephfs-mirror: shutdown ClusterWatcher on termination

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>

Merge pull request #43148 from lxbsz/fair_mutex

pacific: mds: switch mds_lock to fair mutex to fix the slow performance issue

Reviewed-by: Jeff Layton <jlayton@redhat.com>

MetricCollector.h: Add check to prevent mgr from crashing

Fixes: https://tracker.ceph.com/issues/52801
Signed-off-by: Aswin Toni <aswin.toni@cern.ch>
(cherry picked from commit 9a05872fdd499575961ee1a8d188d19054841eb8)

qa/mgr/dashboard/test_pool: don't check HEALTH_OK

Fixes: https://tracker.ceph.com/issues/48845
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 2283cb068b82033b14587c7bac6a28440221dcd8)

qa/suites/rados: add backfill_toofull test

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 76743e005866664795e9240460734b31108824e2)

qa/tasks/ceph_manager: fix assertion

The osd may be 0.
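
The assertion itself is not quoted here, so the following Python sketch
only shows the general pitfall under that assumption: a truthiness check
rejects OSD id 0, while an explicit None check does not. Both function
names are hypothetical.

```python
def osd_is_set_truthy(osd):
    # Pitfall: 0 is a valid OSD id but is falsy in Python.
    return bool(osd)


def osd_is_set(osd):
    # Explicit check that only rejects a missing value.
    return osd is not None


results = [(osd_is_set_truthy(o), osd_is_set(o)) for o in (0, 3, None)]
```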

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit e0a926a2c18d76225fd4d4051bc19b9a1917b932)

osd: re-cache peer_bytes on every peering state activate

peer_bytes is used for backfill reservation request and may be
reset if backfill is interrupted, and we want it set back before
continuing backfill and re-sending the reservation request.

Fixes: https://tracker.ceph.com/issues/52448
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit bdfdf96d2f6c3cf7e5595ae5b8238fd4c0b3c6bc)

Merge pull request #43348 from cfsnyder/wip-52350-pacific

pacific: rgw: fix sts memory leak

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #42643 from cfsnyder/wip-51803-pacific

pacific: rgw/notifications: send correct size in case of delete marker creation

Reviewed-by: Casey Bodley <cbodley@redhat.com>

qa/tasks/mgr: skip test_diskprediction_local on python>=3.8

query the python version before trying to test diskprediction_local

Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 39b2b5edc008900d531be95ece1ce75a1e036914)

mgr/selftest: add a command for querying python version

so the test driver can skip certain tests based on the version of the python
runtime on the test node
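
a minimal sketch of the version gate, assuming the queried version is
available as a (major, minor, ...) tuple. the helper name is
hypothetical and the selftest command's actual output format is not
shown here.

```python
import sys


def should_skip_diskprediction_local(version_info=sys.version_info):
    """Return True when the runtime is python 3.8 or newer."""
    return tuple(version_info[:2]) >= (3, 8)


skip_on_36 = should_skip_diskprediction_local((3, 6, 8))
skip_on_38 = should_skip_diskprediction_local((3, 8, 10))
```

comparing tuples avoids the string-comparison trap where "3.10" sorts
before "3.8".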

Fixes: https://tracker.ceph.com/issues/50196
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 91bc0e54ab816fca12a08817c261bbbf65606726)

osd/OSD: mkfs need wait for transcation completely finish

When running ceph-osd mkfs, the block data is sometimes written
incompletely by the time the ceph-osd process exits. Wait for the
transaction to complete before exiting.

Signed-off-by: Chen Fan <fan.chen@easystack.cn>
(cherry picked from commit 0ffadad3a83b3ca634d7d58a80c84d1d8761e2ea)

Merge pull request #43235 from MrFreezeex/wip-51839-pacific

pacific: ceph.spec: selinux scripts respect CEPH_AUTO_RESTART_ON_UPGRADE

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Dan van der Ster <daniel.vanderster@cern.ch>

Merge pull request #43264 from cfsnyder/wip-52332-pacific

pacific: cmake: s/Python_EXECUTABLE/Python3_EXECUTABLE/

Reviewed-by: Michael Fritch <mfritch@suse.com>

Merge pull request #43099 from cfsnyder/wip-51952-pacific

pacific: osd: fix to recover adjacent clone when set_chunk is called

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #43306 from myoungwon/pacific-backport-52322

pacific: osd: fix to allow inc manifest leaked

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

mgr/influx: use "N/A" for unknown hostname

in theory, there is a chance that get_metadata() returns None, so let's use
"N/A" in this case.
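
a minimal sketch of the fallback. hostname_or_na() is a hypothetical
helper, and treating a missing "hostname" key the same as a None result
is an extra assumption beyond what is stated above.

```python
def hostname_or_na(metadata):
    """Return the hostname from daemon metadata, or "N/A" if unknown."""
    if metadata is None:
        return "N/A"
    return metadata.get("hostname", "N/A")


known = hostname_or_na({"hostname": "node1"})
unknown = hostname_or_na(None)
```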

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e457ca50011f70cf01a62323998af233a484f338)

mgr/progress: optimize global recovery module

Instead of fetching `pg_stats` from the python
part of the manager module, we filter out the pgs
that are in active + clean state in ActivePyModules.cc,
then pass these pgs along with `reported_epoch` and
the `total_num_pgs` of the cluster to the global recovery
module.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit fa92db1b37e5633e89fc39a4653c39973bf23867)

mgr/test_progress.py: Delay recover in test_progress

Change some of the tests in teuthology to make
them more deterministic, using:

`ceph osd set norecover` and
`ceph osd set nobackfill` when marking osds in
or out. This delays the recovery and makes
sure the test cases get the chance to check
that events are actually popping up in
the progress module.

Took out test_osd_cannot_recover from
tasks/mgr/test_progress.py because it is no longer
a relevant test case: recovery gets
triggered regardless of whether the pg is unmoved.

Ignore `OSDMAP_FLAGS` in teuthology:
using norecover and nobackfill
to delay the recovery process
raises a health warning that would otherwise fail the
teuthology test.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 5f33f2f6e0609b452db47b341aaf6d5889917563)

pybind/mgr/progress: introduce 5 second sleep interval

Currently the progress module checks pg stats
and osdmap whenever it is notified by the cluster.
This is expensive in large clusters
with many pools and osds, so
change it to check both pg stats and osdmap
only every 5 seconds.
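
A minimal sketch of interval-driven checking, assuming a plain worker
loop built on threading.Event. PeriodicChecker is a hypothetical class;
the progress module's real loop is not reproduced here.

```python
import threading


class PeriodicChecker:
    """Run check() once per interval until shutdown() is called."""

    def __init__(self, check, interval=5.0):
        self._check = check
        self._interval = interval
        self._stop = threading.Event()

    def serve(self):
        # Event.wait() returns False on timeout and True once the event
        # is set, so the body runs once per interval and the loop exits
        # promptly on shutdown().
        while not self._stop.wait(self._interval):
            self._check()

    def shutdown(self):
        self._stop.set()
```

Waiting on an event rather than calling time.sleep() lets shutdown()
interrupt the wait instead of blocking for up to a full interval.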

In the function _osd_in_out() we now calculate
`is_relocated` as old_osds != new_osds, so that
it does not matter whether the difference between
the osd sets is positive or negative.
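
A small Python sketch of why an inequality test is direction-agnostic.
The one-sided variant is a hypothetical contrast, not the code that was
replaced, and the OSD collections are assumed to be comparable as sets.

```python
def is_relocated(old_osds, new_osds):
    # True whenever the two sets differ, whether OSDs were added or removed.
    return set(old_osds) != set(new_osds)


def is_relocated_one_sided(old_osds, new_osds):
    # Hypothetical contrast: only notices OSDs that were added.
    return len(set(new_osds) - set(old_osds)) > 0


removed_detected = is_relocated({1, 2, 3}, {1, 2})
removed_missed = not is_relocated_one_sided({1, 2, 3}, {1, 2})
```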

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 4504749b81f9cb11d92d5f280565aff3f243adf3)

pybind/mgr/progress/test_progress.py: fix type of reported_epoch

because reported_epoch is an int, not a string

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit a8f3a0eb83653ce6b50aaccd43bdc456e6394484)

pybind/mgr/progress/module.py: no need to cast reported_epoch and _start_epoch

reported_epoch is an int, see 22128e3de697f3fdf66faf3fe3b701a3a599968f
and _start_epoch is also an int, see type annotations in
2af2afa5e9191115bb6f0b36194830ffb91938bf

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit da268faed8e7a3eacb68b1c92855dc3a43225961)
Signed-off-by: Kamoltat <ksirivad@redhat.com>

Merge pull request #43238 from rhcs-dashboard/wip-52685-pacific

pacific: mgr/dashboard: Fix failing config dashboard e2e check

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

qa/mgr/dashboard: add extra wait to test

Fixes: https://tracker.ceph.com/issues/49344
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 9ff778cdaa1ef40fcfa04f221a1da786a0e19655)

rgw: fix sts memory leak

fix https://tracker.ceph.com/issues/52290

Signed-off-by: yuliyang_yewu <yuliyang_yewu@cmss.chinamobile.com>
(cherry picked from commit ef921bcdaa78d33ed0611a60ec58826d8e6ccb45)

rgw/notifications: send correct size in case of delete marker creation

Fixes: https://tracker.ceph.com/issues/51681
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
(cherry picked from commit d81e27faa1033c5290cfd0b4cf27cdaf98d34bc4)

Conflicts:
src/rgw/rgw_op.cc
src/test/rgw/bucket_notification/test_bn.py

Cherry-pick notes:
- src/test/rgw/bucket_notification/test_bn.py changes manually applied to src/test/rgw/rgw_multi/tests_ps.py for Pacific
- conflicts in rgw_op.cc due to rename of RGWObject to Object after Pacific