Josh Durgin [Fri, 7 Jan 2022 18:37:13 +0000 (13:37 -0500)]
mon/OSDMonitor: avoid null dereference if stats are not available
It is not yet confirmed whether this was the issue in the bug referenced
below; however, it is a necessary defensive check for the
'osd pool get-quota' command.
All other uses of get_pool_stats() already handle this case.
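Below is a minimal sketch of the defensive pattern described above, with hypothetical type and function names rather than the actual OSDMonitor code: look the pool's stats up and bail out cleanly when none are available instead of dereferencing an absent entry.
```
#include <cstdint>
#include <iostream>
#include <map>

// Hypothetical stand-ins for the types consulted by 'osd pool get-quota'.
struct PoolStats {
  uint64_t num_objects = 0;
  uint64_t num_bytes = 0;
};

// Returns nullptr when no stats have been reported for the pool yet.
const PoolStats* get_pool_stats(const std::map<int64_t, PoolStats>& all,
                                int64_t pool_id) {
  auto it = all.find(pool_id);
  return it == all.end() ? nullptr : &it->second;
}

void print_quota(const std::map<int64_t, PoolStats>& all, int64_t pool_id) {
  const PoolStats* stats = get_pool_stats(all, pool_id);
  if (!stats) {            // defensive check: stats may simply not exist yet
    std::cout << "no stats available for pool " << pool_id << "\n";
    return;
  }
  std::cout << "objects=" << stats->num_objects
            << " bytes=" << stats->num_bytes << "\n";
}

int main() {
  std::map<int64_t, PoolStats> stats = {{1, {42, 4096}}};
  print_quota(stats, 1);   // stats present
  print_quota(stats, 2);   // stats absent: handled, no null dereference
}
```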
Alfonso Martínez [Tue, 23 Nov 2021 14:17:54 +0000 (15:17 +0100)]
mgr/dashboard: upgrade Cypress to the latest stable version
- Remove unneeded dependency that was causing UI performance issues: zone.js
- Ignore 'ResizeObserver loop limit exceeded' error.
- run-frontend-e2e-tests.sh refactoring: create rgw dashboard user through
'ceph dashboard set-rgw-credentials' and use it on rgw buckets' tests.
Fixes: https://tracker.ceph.com/issues/53357
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 3e4e29590aa1742fc3b44d21389325a13cca8199)
Conflicts:
  src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.e2e-spec.ts
    Rejected the current changes
  src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.po.ts
    Rejected the current changes
  src/pybind/mgr/dashboard/frontend/cypress/integration/ui/navigation.po.ts
    Deleted this file since it's not in Octopus
  src/pybind/mgr/dashboard/frontend/package-lock.json
    Generated a new file
  src/pybind/mgr/dashboard/frontend/package.json
    Kept zone.js and changed the Cypress version to 9.0.0
  src/pybind/mgr/dashboard/run-frontend-e2e-tests.sh
    Accepted the current change
This is redundant and makes nsenter emit messages like the following:
```
Failed to find sysfs mount point
dev/block/11:0/holders/: opendir failed: Not a directory
dev/block/252:0/holders/: opendir failed: Not a directory
dev/block/253:0/holders/: opendir failed: Not a directory
dev/block/252:1/holders/: opendir failed: Not a directory
dev/block/253:1/holders/: opendir failed: Not a directory
dev/block/252:2/holders/: opendir failed: Not a directory
dev/block/253:2/holders/: opendir failed: Not a directory
dev/block/252:3/holders/: opendir failed: Not a directory
dev/block/253:3/holders/: opendir failed: Not a directory
dev/block/252:16/holders/: opendir failed: Not a directory
dev/block/252:32/holders/: opendir failed: Not a directory
dev/block/252:48/holders/: opendir failed: Not a directory
dev/block/252:64/holders/: opendir failed: Not a directory
```
ceph-volume should run pv/vg/lv commands in the host namespace rather than
running them inside the container in order to avoid lvm metadata corruption.
Jeff Layton [Wed, 10 Nov 2021 18:10:50 +0000 (13:10 -0500)]
qa: account for split of the kclient "metrics" debugfs file
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
Fixes: https://tracker.ceph.com/issues/53214
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit e9f2bff8cd7df1c81ff8bbfa2530f470d9c6af2c)
Neha Ojha [Mon, 9 Aug 2021 14:35:01 +0000 (14:35 +0000)]
qa/suites/rados/perf/ceph.yaml: remove rgw
This is no longer required because we removed the cosbench workloads in fd350fd0150a2d4072f055658c20314a435a19ba. It also protects us against
failures like the following, or against any other changes that break the rgw task:
```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
wait_for_radosgw(url, remote)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
assert exit_status == 0
AssertionError
```
Yaarit Hatuka [Wed, 25 Aug 2021 02:12:08 +0000 (02:12 +0000)]
rpm, debian: move smartmontools and nvme-cli to ceph-base
We wish to be able to scrape SMART and NVMe metrics from OSD and MON
nodes. For this we require / recommend smartmontools and nvme-cli
dependencies for both the ceph-osd and ceph-mon packages. However, the
sudoers file (which is required for invoking `smartctl` by user 'ceph')
was installed only in the ceph-osd package. Since different packages
cannot own the same file, and because we want to be able to scrape from
every daemon, we move the dependencies and the sudoers installation to
ceph-base. For generalization, we rename:
sudoers.d/ceph-osd-smartctl -> sudoers.d/ceph-smartctl
Igor Fedotov [Thu, 27 May 2021 12:49:05 +0000 (15:49 +0300)]
common/PriorityCache: low perf counters priorities for submodules.
Having too many perf counters with nickname priorities >= PRIO_INTERESTING spoils daemonperf output and results in a missing "osd" section there, presumably due to too many columns.
Fixes: https://tracker.ceph.com/issues/51002
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 35238d41360a22e22fae7d8ceddf3a2a047e5464)
yanqiang-ux [Mon, 7 Jun 2021 07:54:44 +0000 (15:54 +0800)]
osd: set r only if succeed in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
It may cause errors in CRC repair or in a retried read because of the bogus data length. In my case, we use FillInVerifyExtent for EC reads;
when we hit -EIO we try CRC repair, which needs to read data from the other shards according to the data length.
I hit an assert in ECBackend.cc (loc: line 2288, ceph_assert(range.first != range.second)), though the master branch does not seem to support EC CRC repair.
In short, reusing the readop may cause unpredictable errors.
Fixes: https://tracker.ceph.com/issues/51115
Signed-off-by: yanqiang-ux <yanqiang_ux@163.com>
(cherry picked from commit 127745161fbcdee06b2dfa8464270c3934bcd06a)
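A rough sketch of the pattern (hypothetical types, not the real ECBackend code): the length field is only populated when the read succeeded, so a negative errno such as -EIO can never be reused later as a data length by CRC repair or a retried read.
```
#include <cerrno>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the extent bookkeeping filled in after a read.
struct VerifyExtent {
  int32_t* r = nullptr;   // where the read length is recorded
};

// before: *ve.r = ret;  unconditionally, so -EIO could be reused as a length
// after:  only record ret as a length when the read actually succeeded
void fill_in_verify_extent(VerifyExtent& ve, int ret) {
  if (ret >= 0) {
    *ve.r = ret;          // success: ret really is the number of bytes read
  }
  // on error the caller sees the errno separately and must not treat it
  // as a data length when retrying or attempting CRC repair
}

int main() {
  int32_t len = 0;
  VerifyExtent ve{&len};
  fill_in_verify_extent(ve, 4096);   // len becomes 4096
  fill_in_verify_extent(ve, -EIO);   // len stays 4096, not -5
  std::cout << "recorded length: " << len << "\n";
}
```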
J. Eric Ivancich [Wed, 28 Jul 2021 17:52:29 +0000 (13:52 -0400)]
rgw: user stats showing 0 value for "size_utilized" and "size_kb_utilized" fields
When accumulating user stats, the "utilized" fields are not looked
at. Updates RGWStorageStats::dump so it only outputs the "utilized"
data if they're updated.
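A hedged sketch of the idea with a hypothetical, simplified struct (not the actual RGWStorageStats interface): the "utilized" values are only emitted when they were really accumulated, so stale zeros are not reported.
```
#include <cstdint>
#include <iostream>

// Hypothetical, simplified stats with a flag saying whether the
// "utilized" values were actually accumulated.
struct StorageStats {
  uint64_t size = 0;
  uint64_t size_utilized = 0;
  bool dump_utilized = false;   // only true when the values were updated

  void dump(std::ostream& out) const {
    out << "{\"size\":" << size;
    if (dump_utilized) {        // skip the fields instead of printing 0
      out << ",\"size_utilized\":" << size_utilized
          << ",\"size_kb_utilized\":" << (size_utilized + 1023) / 1024;
    }
    out << "}\n";
  }
};

int main() {
  StorageStats accumulated{4096, 0, false};  // utilized never looked at
  accumulated.dump(std::cout);               // no misleading 0 fields
  StorageStats full{4096, 4096, true};
  full.dump(std::cout);
}
```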
Kajetan Janiak [Wed, 18 Nov 2020 10:42:07 +0000 (11:42 +0100)]
rgw: disable prefetch in rgw_file
Each call to rgw_read (rgw_file.cc) invokes three calls to RGWRados::get_obj_state with s->prefetch_data=true. This results in significant read amplification. If the length argument in the rgw_read call is smaller than rgw_max_chunk_size, the amplification is threefold.
Duncan Bellamy [Sat, 8 May 2021 10:52:35 +0000 (11:52 +0100)]
mds: PurgeQueue.cc fix for 32bit compilation
files_high_water is defined as uint64_t, but when compiling on 32-bit these max() calls
fail because gcc 10 does not consider both arguments to be uint64_t, even though they are.
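A sketch of the failure mode and the usual remedy (not the exact PurgeQueue code): on a 32-bit target the two operands of std::max deduce to different types, so template argument deduction fails; forcing the template argument makes the intent explicit.
```
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  uint64_t files_high_water = 0;   // 64-bit even on 32-bit targets
  size_t in_flight = 128;          // only 32-bit on a 32-bit target

  // On 32-bit, `std::max(files_high_water, in_flight)` would not compile:
  // the arguments deduce to different types (uint64_t vs. size_t, which is
  // typically unsigned int there). Forcing the template argument makes both
  // operands uint64_t explicitly.
  files_high_water = std::max<uint64_t>(files_high_water, in_flight);

  std::cout << files_high_water << "\n";
}
```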
Adam Kupczyk [Sat, 13 Nov 2021 10:28:18 +0000 (11:28 +0100)]
os/bluestore: Fix omap upgrade to per-pg scheme
This fixes a regression introduced by an earlier fix to the omap upgrade: https://github.com/ceph/ceph/pull/43687
The problem was that we always skipped the first omap entry.
This worked fine for objects that have an omap header key.
For objects without a header key we skipped the first actual omap key.
Fixes: https://tracker.ceph.com/issues/53260
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 65a3f374aa1c57c5bb9401e57dab98a643b4360a)
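A simplified model of the iteration bug (hypothetical key layout, not the real BlueStore upgrade code): the first entry must only be skipped when it really is the omap header key, not unconditionally.
```
#include <iostream>
#include <string>
#include <vector>

// Hypothetical flat key layout: an optional header key ("-") followed by
// the actual omap keys.
bool is_header_key(const std::string& k) { return k == "-"; }

// The buggy version skipped the first entry unconditionally; this one only
// skips it when it is the header, so objects without a header key keep
// their first real omap key during the upgrade.
std::vector<std::string> keys_to_upgrade(const std::vector<std::string>& keys) {
  std::vector<std::string> out;
  for (size_t i = 0; i < keys.size(); ++i) {
    if (i == 0 && is_header_key(keys[i]))
      continue;                      // skip only a genuine header
    out.push_back(keys[i]);
  }
  return out;
}

int main() {
  for (const auto& k : keys_to_upgrade({"-", "a", "b"}))  // header present
    std::cout << k << " ";
  std::cout << "\n";
  for (const auto& k : keys_to_upgrade({"a", "b"}))       // no header
    std::cout << k << " ";
  std::cout << "\n";
}
```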
These functions should return `None` if 0 or more than 1 item is returned.
Their current names are confusing and can lead to thinking that we just
want the first item returned, even though more than 1 item may be
returned; let's rename them to `get_single_pv()`, `get_single_vg()` and
`get_single_lv()`.
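The intended semantics, shown as a small language-neutral sketch in C++ (the real helpers are Python code in ceph-volume): a value is returned only when exactly one item matches, otherwise nothing.
```
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Return the item only when exactly one matches; zero or multiple matches
// yield nothing, mirroring the get_single_pv()/get_single_vg()/get_single_lv()
// contract described above.
std::optional<std::string> get_single(const std::vector<std::string>& matches) {
  if (matches.size() != 1)
    return std::nullopt;
  return matches.front();
}

int main() {
  std::cout << get_single({"vg0"}).value_or("none") << "\n";        // "vg0"
  std::cout << get_single({}).value_or("none") << "\n";             // "none"
  std::cout << get_single({"vg0", "vg1"}).value_or("none") << "\n"; // "none"
}
```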
Marcus Watts [Fri, 17 Sep 2021 09:28:53 +0000 (05:28 -0400)]
Fix vault token file access.
Put the vault token file in a location that ceph can read.
Make it readable only by ceph.
On rhel8 (and indeed, any vanilla rhel machine), $HOME is liable to be
mode 700. This means the ceph user can't read things in that user's
directory. This causes radosgw to emit the confusing message "ERROR:
Vault token file ... not found" even though the teuthology log will
plainly show it was created and made readable by ceph.
Fixes: http://tracker.ceph.com/issues/51539
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 454cc8a18c4c3851de5976d3e36e42644dbb1a70)
Conflicts:
qa/tasks/rgw.py
Cherry-pick notes:
- Conflict due to ctx.rgw.vault_role not set in Octopus test
Mark Kogan [Mon, 15 Nov 2021 15:50:49 +0000 (15:50 +0000)]
rgw/beast: stream timer with duration 0 disables timeout
fixes all S3 operations failing with:
`2021-11-15T15:46:05.992+0000 7ffee17fa700 20 failed to read header: Bad file descriptor`
when `--rgw_frontends="beast port=8000 request_timeout_ms=0"`
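A minimal sketch of the intended behavior using a plain Boost.Asio timer (the frontend's actual timer wiring differs): a configured duration of zero means the timer is never armed at all, rather than being armed with a zero deadline that fires immediately and aborts the header read.
```
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

namespace asio = boost::asio;

int main() {
  asio::io_context ioc;
  asio::steady_timer timer(ioc);

  const std::chrono::milliseconds request_timeout{0};  // request_timeout_ms=0

  if (request_timeout.count() > 0) {
    // only arm the timeout when a positive duration was configured
    timer.expires_after(request_timeout);
    timer.async_wait([](const boost::system::error_code& ec) {
      if (!ec)
        std::cout << "request timed out\n";
    });
  } else {
    std::cout << "timeout disabled; connection may idle indefinitely\n";
  }

  ioc.run();
}
```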
Casey Bodley [Thu, 11 Nov 2021 17:01:06 +0000 (12:01 -0500)]
rgw/beast: reference count Connections for timeout_handler
resolves a use-after-free in the timeout_handler, where a timeout fires
and schedules the timeout_handler for execution, but the coroutine exits
and destroys the socket before asio executes the timeout_handler
timeout_handler now holds a reference on the Connection to extend its
lifetime
now that the Connection is allocated on the heap, we can include the
parse_buffer in this memory instead of allocating it separately
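A condensed sketch of the lifetime pattern (generic Asio code, not the frontend's real Connection class): the connection lives on the heap and the timeout handler captures a shared_ptr, so the socket it touches cannot be destroyed before the handler runs.
```
#include <array>
#include <boost/asio.hpp>
#include <chrono>
#include <memory>

namespace asio = boost::asio;

// Heap-allocated connection; pending handlers keep it alive via shared_ptr.
struct Connection : std::enable_shared_from_this<Connection> {
  asio::ip::tcp::socket socket;
  asio::steady_timer timeout;
  // the parse buffer can live in the same allocation now that the
  // connection itself is heap-allocated
  std::array<char, 8192> parse_buffer{};

  explicit Connection(asio::io_context& ioc) : socket(ioc), timeout(ioc) {}

  void arm_timeout(std::chrono::milliseconds d) {
    timeout.expires_after(d);
    // capturing shared_from_this() extends the Connection's lifetime until
    // the handler has run, even if the request coroutine already finished
    timeout.async_wait([self = shared_from_this()](const auto& ec) {
      if (!ec) {
        boost::system::error_code ignored;
        self->socket.close(ignored);  // safe: *self is still alive here
      }
    });
  }
};

int main() {
  asio::io_context ioc;
  auto conn = std::make_shared<Connection>(ioc);
  conn->arm_timeout(std::chrono::milliseconds(10));
  conn.reset();  // the pending timeout handler still holds a reference
  ioc.run();
}
```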
Casey Bodley [Sat, 30 Oct 2021 23:47:02 +0000 (19:47 -0400)]
rgw/beast: replace beast::tcp_stream with manual timeouts
remove the beast::tcp_stream wrapper from the socket, and track timeouts
manually with a timeout_timer. this timer uses ceph's coarse_mono_clock
which is cheaper to sample than std::chrono::steady_clock
Casey Bodley [Mon, 1 Nov 2021 17:14:16 +0000 (13:14 -0400)]
spawn: use explicit strand executor
the default spawn::yield_context uses the polymorphic boost::asio::executor
to support any executor type
rgw's beast frontend always uses the same executor type for these
coroutines, so we can use that type directly to avoid the overhead of
type erasure and virtual function calls
Cherry-pick notes:
- src/rgw/rgw_d3n_cacherequest.h doesn't exist in Octopus
- src/rgw/rgw_sync_checkpoint.cc doesn't exist in Octopus
- conflicts due to rename of structs after Octopus
- conflicts due to macro for conditional inclusion of beast context in Octopus
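A brief illustration of the executor point made in the spawn commit above (generic Asio, assuming a recent Boost where the erased type is spelled any_io_executor): binding handlers to the concrete strand type avoids the indirection that the polymorphic executor adds.
```
#include <boost/asio.hpp>

namespace asio = boost::asio;

int main() {
  asio::io_context ioc;

  // polymorphic executor: the concrete type is erased, adding a layer of
  // indirection per handler in exchange for flexibility
  asio::any_io_executor erased = asio::make_strand(ioc);

  // concrete strand type: no type erasure, since every coroutine in the
  // frontend uses the same executor type anyway
  using strand_type = asio::strand<asio::io_context::executor_type>;
  strand_type concrete = asio::make_strand(ioc);

  asio::post(concrete, [] { /* runs on the strand */ });
  asio::post(erased,   [] { /* same, via the erased executor */ });
  ioc.run();
}
```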
The current check only allows requesting an OSD id that exists but is
marked as 'destroyed'.
With this small fix, we can now use `--osd-id` with an id that doesn't
exist at all.
ceph-volume: fix bug with miscalculation of required db/wal slot size for VGs with multiple PVs
Previous logic for calculating db/wal slot sizes made the assumption that there would only be
a single PV backing each db/wal VG. This wasn't the case for OSDs deployed prior to v15.2.8,
since ceph-volume previously deployed multiple SSDs in the same VG. This fix removes the
assumption and does the correct calculation in either case.
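A toy calculation illustrating the fix, with made-up sizes (the real logic is Python code in ceph-volume): the per-OSD slot size has to be derived from the whole VG, i.e. the sum of all backing PVs, not from a single PV.
```
#include <cstdint>
#include <iostream>
#include <vector>

// Size of each db/wal slot given the VG's backing PVs and the number of
// slots (OSDs) to carve out of it.
uint64_t slot_size(const std::vector<uint64_t>& pv_sizes, unsigned slots) {
  uint64_t vg_size = 0;
  for (uint64_t pv : pv_sizes)
    vg_size += pv;                 // the VG spans *all* PVs, not just one
  return vg_size / slots;
}

int main() {
  const uint64_t GiB = 1ull << 30;
  // two 480 GiB SSDs in one db VG, 12 OSDs sharing it (pre-v15.2.8 layout)
  std::vector<uint64_t> pvs = {480 * GiB, 480 * GiB};

  std::cout << "wrong (single-PV assumption): "
            << (pvs[0] / 12) / GiB << " GiB per slot\n";
  std::cout << "right (sum of PVs):           "
            << slot_size(pvs, 12) / GiB << " GiB per slot\n";
}
```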
The marker was not working correctly as segments of the bucket index
were listed to shut down any incomplete multipart uploads. This fixes
the marker, so it's maintained properly across iterations.
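A generic sketch of the marker pattern being fixed (hypothetical listing API, not the actual RGW code): each iteration must resume from the last key returned by the previous one, otherwise the same segment of the index is listed again and again.
```
#include <iostream>
#include <string>
#include <vector>

// Hypothetical paged listing: returns up to max_entries keys greater than
// `marker` from a sorted index.
std::vector<std::string> list_after(const std::vector<std::string>& index,
                                    const std::string& marker,
                                    size_t max_entries) {
  std::vector<std::string> page;
  for (const auto& key : index) {
    if (key > marker && page.size() < max_entries)
      page.push_back(key);
  }
  return page;
}

int main() {
  std::vector<std::string> index = {"a", "b", "c", "d", "e"};
  std::string marker;                       // start from the beginning
  for (;;) {
    auto page = list_after(index, marker, 2);
    if (page.empty())
      break;
    for (const auto& key : page)
      std::cout << key << " ";
    marker = page.back();                   // carry the marker forward
  }
  std::cout << "\n";
}
```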
Igor Fedotov [Thu, 15 Jul 2021 12:10:14 +0000 (15:10 +0300)]
os/bluestore: fix improper offset calculation when repairing.
While repairing misreferenced blobs BlueStore could improperly calculate
an offset within the blob being fixed. This could happen when a single
physical extent had been replaced by multiple ones - the following
pextent (if any in the current blob) would then be handled with an improper offset within the blob. The offset calculation accounted only for the last of the new pextents rather than for each of them.
Fixes: https://tracker.ceph.com/issues/51682
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit ca4b6675fc3fd2f4cadad58044c97c5bb23d5938)
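A small numeric sketch of the accounting error (made-up extent sizes, not BlueStore's repair code): once one physical extent is replaced by several smaller ones, the offset of the next pextent within the blob must advance by the sum of all replacements, not only by the last one.
```
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // A blob originally backed by two physical extents; repair replaces the
  // first pextent with three smaller ones of these lengths.
  std::vector<uint64_t> replacements = {0x4000, 0x4000, 0x8000};

  // buggy accounting: only the last replacement advances the offset
  uint64_t wrong_offset = replacements.back();

  // correct accounting: every replacement contributes to the offset of the
  // pextent that follows it within the blob
  uint64_t right_offset = 0;
  for (uint64_t len : replacements)
    right_offset += len;

  std::cout << std::hex
            << "offset of next pextent, buggy:   0x" << wrong_offset << "\n"
            << "offset of next pextent, correct: 0x" << right_offset << "\n";
}
```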
Igor Fedotov [Thu, 15 Jul 2021 11:16:39 +0000 (14:16 +0300)]
test/objectstore/bluestore_types: add map_bl test case
Along with the basic bluestore_blob_t::map_any functionality
verification this UT shows how invalid offset might appear in
https://tracker.ceph.com/issues/51682
Venky Shankar [Fri, 1 Oct 2021 08:55:40 +0000 (04:55 -0400)]
mds: skip journaling blocklisted clients when in `replay` state
When a standby MDS is transitioning to active, it passes through
`replay` state. When the MDS is in this state, there are no journal
segments available for recording journal updates. If the MDS receives
an OSDMap update in this state, journaling blocklisted clients causes
a crash since no journal segments are available. This is a bit hard
to reproduce as it requires correct timing of an OSDMap update along
with various other factors.
Note that, when the MDS reaches `reconnect` state, it will journal
the blocklisted clients anyway.
This partially fixes tracker: https://tracker.ceph.com/issues/51589
which mentions a similar crash but in `reconnect` state. However,
that crash was seen in nautilus.
A couple of minor changes include removing hardcoded function names
and carving out reusable parts into a separate function.
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in the
highly-fragmented system, iterating the whole tree could hurt the
performance of fast storage system a lot.
the idea comes from openzfs's metaslab allocator.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 5a26875049d13130ffe5954428da0e1b9750359f)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/common/options/global.yaml.in:
- Moved new option into src/common/options.cc
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in the
highly-fragmented system, iterating the whole tree could hurt the
performance of fast storage system a lot.
the idea comes from openzfs's metaslab allocator.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 40f05b971f5a8064cf9819f80fc3bbf21d5206da)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/common/options/global.yaml.in
- Moved new option into src/common/options.cc
Kefu Chai [Wed, 2 Jun 2021 08:31:18 +0000 (16:31 +0800)]
os/bluestore/AvlAllocator: use cbit for counting the order of alignment
no need to calculate the alignment first, cbits() would suffice. as it
counts the first set bit and the following 0's in a number. the result
is identical to the cbit(alignment of that number).
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 573cbb796e8ba2f433caa308925735101a8161a6)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
before this change AvlAllocator::_block_picker() is used by both the
best-fit mode and first-fit mode. but since we cannot achieve the
locality by searching in the area pointed to by the cursor in best-fit mode,
we just pass a dummy cursor to AvlAllocator::_block_picker() when
searching in the best-fit mode.
but since the range_size_tree is already sorted by the size of ranges,
if _block_picker() fails to find one by the size, we should just give
up right away, and instead try again using a smaller size.
after this change, instead of sharing AvlAllocator::_block_picker()
across both the first-fit mode and the best-fit mode, this method
is specialize to two different variants: one for first-fit, and the
simplified one for best-fit.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 4837166f9e7a659742d4184f021ad12260247888)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Adam Kupczyk [Wed, 19 May 2021 10:49:37 +0000 (12:49 +0200)]
os/bluestore: Improve _block_picker function
Make _block_picker function scan (*cursor, end) + (begin, *cursor) instead of (*cursor, end) + (begin, end).
The portion (*cursor, end) of the second run could never yield any results.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit c732060d3e3ef96c6da06c9dde3ed8c064a50965)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
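A simplified model of the new scan order (generic code, not the AVL tree itself): the search starts at the cursor, runs to the end, then wraps around over the part before the cursor only, since rescanning (*cursor, end) a second time can never find anything new.
```
#include <cstdint>
#include <iostream>
#include <vector>

// Free ranges as (offset, length) pairs, ordered by offset.
struct Range { uint64_t offset, length; };

// First-fit pick starting at *cursor: scan [*cursor, end) then wrap around
// and scan [begin, *cursor) only, instead of [begin, end) again.
const Range* block_picker(const std::vector<Range>& ranges,
                          size_t* cursor, uint64_t want) {
  const size_t n = ranges.size();
  for (size_t step = 0; step < n; ++step) {
    size_t i = (*cursor + step) % n;      // wraps past the end exactly once
    if (ranges[i].length >= want) {
      *cursor = i;
      return &ranges[i];
    }
  }
  return nullptr;                         // nothing large enough anywhere
}

int main() {
  std::vector<Range> ranges = {{0, 4096}, {8192, 65536}, {262144, 16384}};
  size_t cursor = 2;                      // resume near the last allocation
  if (const Range* r = block_picker(ranges, &cursor, 32768))
    std::cout << "picked offset " << r->offset << "\n";  // wraps to index 1
}
```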
Sage Weil [Thu, 18 Mar 2021 16:45:48 +0000 (11:45 -0500)]
mon/MgrStatMonitor: ignore MMgrReport from non-active mgr
If it's not the active mgr, we should ignore it.
Since the mgr instance is best identified by the gid, add that to the
message. (We can't use the source_addrs for the message since that is
the MgrStandby monc addr, not the active mgr addrs in the MgrMap.)
This fixes a problem where a just-demoted mgr report gets processed and a
new mgr gets a ServiceMap with an epoch >= its pending map. (At least,
that is my theory!)
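A hedged sketch of the check (hypothetical message and map fields, not the actual MgrStatMonitor code): the gid carried in the report is compared against the active mgr's gid from the MgrMap and mismatches are dropped.
```
#include <cstdint>
#include <iostream>

// Hypothetical stand-ins: the report now carries the sender's gid, and the
// MgrMap knows which gid is currently active.
struct MgrReport { uint64_t gid; };
struct MgrMap    { uint64_t active_gid; };

bool should_process(const MgrReport& report, const MgrMap& map) {
  // a report from a just-demoted (or standby) mgr is simply ignored
  return report.gid == map.active_gid;
}

int main() {
  MgrMap map{4242};
  std::cout << should_process(MgrReport{4242}, map) << "\n";  // 1: active
  std::cout << should_process(MgrReport{4100}, map) << "\n";  // 0: demoted
}
```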