Kefu Chai [Tue, 17 Aug 2021 07:53:51 +0000 (15:53 +0800)]
mgr/dashboard/api: set a UTF-8 locale when running pip
ansible-core started to include files whose filenames are encoded in
non-ASCII characters, so we have to use a more capable encoding for the
locale in order to install this package. otherwise we'd get the following
error:
Collecting ansible-core<2.12,>=2.11.3
Using cached ansible-core-2.11.4.tar.gz (6.8 MB)
ERROR: Exception:
Traceback (most recent call last):
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
status = self.run(options, args)
...
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
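The fix boils down to exporting a UTF-8 locale for the pip invocation. A minimal Python sketch of the idea (the helper and paths here are illustrative, not the dashboard build code):
```
import os
import subprocess

def pip_install(venv_python, package):
    # Force a UTF-8 capable locale so pip can unpack sdists that contain
    # non-ASCII filenames (e.g. recent ansible-core tarballs).
    env = dict(os.environ, LC_ALL="C.UTF-8", LANG="C.UTF-8")
    subprocess.check_call([venv_python, "-m", "pip", "install", package],
                          env=env)

# illustrative usage:
# pip_install("/tmp/venv/bin/python", "ansible-core<2.12,>=2.11.3")
```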
Mykola Golub [Fri, 20 Mar 2020 16:19:25 +0000 (16:19 +0000)]
ceph-erasure-code-tool: new tool to encode/decode files
E.g. it may be useful as a last resort when recovering an object from
a damaged PG: extract the encoded object chunks from the PG shards
with ceph-objectstore-tool and then decode with ceph-erasure-code-tool.
It also has functionality similar to what ceph_erasure_code test provides.
Kefu Chai [Thu, 10 Jun 2021 12:19:09 +0000 (20:19 +0800)]
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum, but
there is a small window of mon_tick_interval before the monitors are
able to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if cephx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of the ceph CLI, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
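A simplified Python sketch of such a wait loop (hypothetical, not the actual teuthology code; check_quorum stands in for whatever call queries the monitors):
```
import errno
import time

def wait_for_quorum(check_quorum, timeout=300, interval=3):
    # check_quorum() is a hypothetical callable that queries the monitors
    # and raises OSError on failure.
    deadline = time.time() + timeout
    while True:
        try:
            return check_quorum()
        except OSError as e:
            # EACCES can be returned transiently right after the monitors
            # form a quorum but before their rotating service keys have been
            # refreshed, so treat it as retryable rather than fatal.
            if e.errno != errno.EACCES or time.time() > deadline:
                raise
        time.sleep(interval)
```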
ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuilt paxos transaction won't be overwritten by the ones
created before recovery completes.
when the quorum is recovering, the leader collects the paxos
transactions from the peons. if the quorum accepts the proposal for setting
the fingerprint, the peon updates the monitor with a paxos
transaction whose "last_committed" is newer than the one created using
update_paxos() in ceph_monstore_tool.cc; the latter "last_committed" is
always 0.
so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.
The snapshot replayer needs the remote's mirror peer uuid to find its
snapshots in the remote image. It is obtained by listing the remote's
mirror peers, but RemotePoolPoller::handle_mirror_peer_list() skips
tx-only (MIRROR_PEER_DIRECTION_TX) peers. In effect only rx-tx
(MIRROR_PEER_DIRECTION_RX_TX) peers are considered for matching,
and the snapshot replayer always fails with a "failed to retrieve mirror
peer uuid from remote pool" error.
Instead, skip rx-only (MIRROR_PEER_DIRECTION_RX) peers as we are
definitely not interested in anything having to do with mirroring
_to_ the remote cluster.
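In Python-style pseudocode the change amounts to flipping which direction is skipped (the peer layout and the constant values are assumptions, mirroring librbd's rbd_mirror_peer_direction_t):
```
# direction constants as in librbd's rbd_mirror_peer_direction_t (assumed values)
MIRROR_PEER_DIRECTION_RX = 0
MIRROR_PEER_DIRECTION_TX = 1
MIRROR_PEER_DIRECTION_RX_TX = 2

def find_remote_mirror_peer_uuid(peers):
    # Previously tx-only peers were skipped, which dropped exactly the peers
    # whose uuid the snapshot replayer needs; now rx-only peers are skipped
    # instead, since mirroring *to* the remote cluster is irrelevant here.
    for peer in peers:
        if peer["direction"] == MIRROR_PEER_DIRECTION_RX:
            continue
        return peer["uuid"]
    return None

print(find_remote_mirror_peer_uuid(
    [{"uuid": "a1b2", "direction": MIRROR_PEER_DIRECTION_TX}]))  # a1b2
```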
auth,mon: don't log "unable to find a keyring" error when key is given
This error is logged even if --key or --keyring is specified and
confuses users because the command actually does its job and exits
with success. This primarily affects the "rbd mirror pool peer bootstrap
import" command and the rbd-mirror and cephfs-mirror daemons, which connect
to the remote cluster with just mon_host and key:
$ rbd mirror pool peer bootstrap import mypool tokenfile
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Local cluster commands are affected too:
$ rados --no-config-file --mon-host $MON_HOST --key $KEY lspools
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
device_health_metrics
rbd
This was introduced in commit 98a2e5c59daa ("rados: translate errno to
str in CLI").
Cherry-pick notes:
- Options defined in src/common/options.cc in Octopus vs src/common/options/rgw.yaml.in
- RGWQuotaCache::get_stats does not take optional_yield or DoutPrefixProvider arguments in Octopus
J. Eric Ivancich [Tue, 15 Jun 2021 19:20:33 +0000 (15:20 -0400)]
rgw: when deleted obj removed in versioned bucket, extra del-marker added
After initial checks are complete, this will read the OLH earlier than
previously to check the delete-marker flag and under the bug's
conditions will return -ENOENT rather than create a spurious delete
marker.
Cherry-pick notes:
- RGWRados::apply_olh_log does not take DoutPrefixProvider in Octopus
- change to use some namespace-qualified names in cls_rgw_types
Jeegn Chen [Wed, 25 Nov 2020 09:15:25 +0000 (17:15 +0800)]
rgw: avoid infinite loop when deleting a bucket
When deleting a bucket with an incomplete multipart upload that
has about 2000 parts uploaded, we noticed an infinite loop, which
prevented s3cmd from ever deleting the bucket.
Upon investigation, when the bucket index was sharded (for example 128
shards), the original logic in
RGWRados::cls_bucket_list_unordered() did not calculate
the bucket shard ID correctly when the index key of a data
part was taken as the marker.
The issue is not necessarily reproduced each time; it depends
on the key of the object. To reproduce it in a 128-shard bucket,
we use 334 as the key for the incomplete multipart upload,
which is located in Shard 127 (known by experiment). In this
setup, the original logic usually derives a shard ID smaller
than 127 (since 127 is the largest one) from the marker, and
thus a cycle is formed, which results in an infinite loop.
PS: Sometimes the bucket shard ID calculation may incorrectly go forward
instead of backward. In that case, the check logic may skip some shards
that may still hold regular keys. In such scenarios, some non-empty buckets
may be deleted by accident.
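A toy Python illustration of the cycle (not the actual RGW code; the shard numbers and the derivation are made up): if the shard derived from the marker is smaller than the shard the marker was actually read from, the scan resumes behind itself and never reaches the end.
```
NUM_SHARDS = 128
MARKER_SHARD = 127          # shard where the multipart index key really lives

def shard_from_marker(marker_key):
    # hypothetical stand-in for the buggy derivation, which can land on a
    # shard smaller than MARKER_SHARD for certain multipart index keys
    return 42

def resume_shard(marker_key):
    derived = shard_from_marker(marker_key)
    # buggy: resume from the derived shard, jumping backwards from 127 to 42
    # and re-listing entries that were already returned -> infinite loop.
    # fix: resume from the shard the marker was actually read from.
    return derived

print(resume_shard("334"))  # 42 instead of 127
```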
Dimitri Savineau [Tue, 24 Aug 2021 21:17:45 +0000 (17:17 -0400)]
ceph-volume: fix lvm activate --all --no-systemd
When using a system without systemd, the `lvm activate --all --no-systemd`
subcommand still calls systemd.
We already allow users to activate a single OSD without systemd, so there's
no reason not to do the same with --all (because activate_all calls activate).
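A minimal sketch of why fixing activate() covers --all too (a hypothetical simplification, not the real ceph-volume code paths):
```
import argparse

def systemd_enable_and_start(osd_id, osd_fsid):
    # stand-in for the systemctl calls ceph-volume would normally make
    print(f"systemctl enable/start ceph-osd for {osd_id} {osd_fsid}")

def activate(args, osd_id, osd_fsid):
    # ... discover, prime and mount the OSD (omitted) ...
    if not args.no_systemd:
        systemd_enable_and_start(osd_id, osd_fsid)

def activate_all(args, discovered_osds):
    # `lvm activate --all` is just a loop over activate(), so honouring
    # args.no_systemd in activate() makes --all usable without systemd too.
    for osd_id, osd_fsid in discovered_osds:
        activate(args, osd_id, osd_fsid)

# illustrative usage:
activate_all(argparse.Namespace(no_systemd=True), [("0", "abc-123")])
```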
In /etc/sysconfig/ceph we allow operators to define whether ceph daemons
should be restarted on upgrade: CEPH_AUTO_RESTART_ON_UPGRADE.
But the post selinux scripts stop ceph.target regardless of whether this
is set to `no`, leading operators to add various hacks to prevent
these unexpected or inconvenient daemon restarts. By now, if users
are using rpms directly, they are likely orchestrating their own
daemon restarts and so should not rely on the rpm itself to do this.
Fixes: https://tracker.ceph.com/issues/21672
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 092a6e3e83e9ef8e37cb6f1033c345dcb5224cfc)
Igor Fedotov [Tue, 31 Aug 2021 12:54:23 +0000 (15:54 +0300)]
os/bluestore: fix bluefs migrate command
After migrating the DB volume to a slow device, RocksDB still needs to be
provided with the slow.db path to properly access the relevant files under
the db.slow subfolder. Without that specification it tries to access them
under 'db', which results in a "not found" error.
Fixes: https://tracker.ceph.com/issues/40434
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 90852d9b6f0da7967121200c9a1c56bed1929d2d)
When running the `lvm migrate` subcommand without any args, the
ceph-volume command fails with a stack trace:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
return f(*a, **kw)
File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 151, in main
terminal.dispatch(self.mapper, subcommand_args)
File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
terminal.dispatch(self.mapper, self.argv)
File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/migrate.py", line 520, in main
self.migrate_osd()
File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/migrate.py", line 403, in migrate_osd
if self.args.osd_id:
AttributeError: 'Migrate' object has no attribute 'args'
That's because we return early from the parse_argv function but then
continue to execute the migrate_osd function. We should instead exit from
the main function.
This updates the argument parsing to use the same code as the new-db and
new-wal classes.
Now the parsing is done in the make_parser function while the argv check is
done in the main function, allowing the program to exit and display the
help message when no arguments are provided.
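A rough sketch of that pattern (hypothetical and heavily simplified, not the actual ceph-volume classes):
```
import argparse

class Migrate(object):
    help = 'Migrate BlueFS data from one LV to another'

    def __init__(self, argv):
        self.argv = argv

    def make_parser(self, prog, sub_command_help):
        # build the parser unconditionally; no early return happens here
        parser = argparse.ArgumentParser(
            prog=prog,
            formatter_class=argparse.RawDescriptionHelpFormatter,
            description=sub_command_help)
        parser.add_argument('--osd-id', dest='osd_id')
        parser.add_argument('--osd-fsid', dest='osd_fsid')
        return parser

    def main(self):
        parser = self.make_parser('ceph-volume lvm migrate', self.help)
        if len(self.argv) == 0:
            # exit from main() so migrate_osd() is never reached without args
            parser.print_help()
            return
        self.args = parser.parse_args(self.argv)
        self.migrate_osd()

    def migrate_osd(self):
        print('migrating OSD', self.args.osd_id)

Migrate([]).main()   # prints the help text instead of raising AttributeError
```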
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
Like other ceph-volume commands (lvm activate/batch/zap or raw activate),
we also need to be able to use the --no-systemd flag.
This is a regression introduced by 9212420: when the host is using a
logical partition, lsblk reports that partition as a child of the
physical device.
That logical partition is prefixed by the `└─` characters.
This leads the `raw list` subcommand to show the lsblk error on stderr:
```
$ ceph-volume raw list
{}
stderr: lsblk: `-/dev/sda1: not a block device
```
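One way to picture the problem (a hedged sketch, not the actual ceph-volume fix): the device column of lsblk output has to be cleaned of the tree-drawing prefix before it can be used as a path.
```
def clean_lsblk_name(name):
    # lsblk prints children of a device with a tree prefix such as
    # "└─sda1" or "`-sda1"; strip it so the name can be used as /dev/<name>.
    return name.lstrip('`|-└─├ ')

for raw in ['sda', '└─sda1', '`-sda1']:
    print('/dev/' + clean_lsblk_name(raw))   # /dev/sda, /dev/sda1, /dev/sda1
```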
Igor Fedotov [Wed, 18 Aug 2021 10:39:02 +0000 (13:39 +0300)]
os/bluestore: accept undecodable multi-block bluefs transactions on log replay
We should proceed with OSD startup when detecting an undecodable bluefs
transaction spanning multiple disk blocks during log replay.
The rationale is that such a transaction might appear after an unexpected
power down - simply because not every disk block made it to disk. Hence we
can consider this a normal log replay stop condition.
libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
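The same pitfall can be demonstrated with Python's fnmatch module, which implements the same glob rules as fnmatch(3); wrapping each metacharacter in brackets makes it literal (an illustration only, not the libudev-facing code):
```
import fnmatch

def escape_glob(value):
    # make '*', '?' and '[' match themselves instead of acting as wildcards
    return ''.join('[%s]' % ch if ch in '*?[' else ch for ch in value)

serial = 'disk-1*a'          # attribute value that happens to contain a glob char
print(fnmatch.fnmatch('disk-1zzza', serial))               # True: unwanted match
print(fnmatch.fnmatch('disk-1zzza', escape_glob(serial)))  # False
print(fnmatch.fnmatch('disk-1*a', escape_glob(serial)))    # True: exact value still matches
```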
Yin Congmin [Wed, 30 Jun 2021 08:56:23 +0000 (16:56 +0800)]
common/buffer: fix SIGABRT in rebuild_aligned_size_and_memory
There is a bl that satisfies two conditions:
1) the sum of all ptrs' lengths except the last ptr is aligned to 4K;
2) the length of the last ptr is 0.
Such a bl causes stack corruption when calling
bufferlist::rebuild_aligned_size_and_memory().
Handle this special scenario in rebuild_aligned_size_and_memory() to
fix the bug, and add a special test case to reproduce this scenario.
pybind/rbd: explain why "primary" isn't exposed in mirror_image_status_list()
"primary" is part of mirror image info (rbd_mirror_image_info_t) and
is exposed in mirror_image_get_status(). mirror_image_status_list(),
even though it is often thought of as an equivalent of repeated calls
to mirror_image_get_status(), doesn't actually fetch the mirror image
info.
pybind/rbd: actually append site_status dict to remote_statuses
Using the += operator is wrong -- only the site_status keys get appended
(and repeatedly at that if there is more than one remote site,
as the keys are added one by one).
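The difference in plain Python (unrelated to rbd itself):
```
site_status = {'mirror_uuid': 'abc', 'state': 4, 'up': True}

remote_statuses = []
remote_statuses += site_status        # wrong: extends the list with the dict's keys
print(remote_statuses)                # ['mirror_uuid', 'state', 'up']

remote_statuses = []
remote_statuses.append(site_status)   # right: appends the dict itself
print(remote_statuses)                # [{'mirror_uuid': 'abc', 'state': 4, 'up': True}]
```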
Will Smith [Fri, 23 Jul 2021 19:18:12 +0000 (15:18 -0400)]
rbd: Fix mirror_image_get_status in rbd python bindings
When retrieving the status of a mirrored image from the Python rbd
library, a TypeError is raised.
*To Reproduce:*
Set up two Ceph clusters for block storage and configure image
mirroring between their pools. Create at least one image with mirroring
enabled, then run the following script on either cluster (once the image
exists everywhere):
```
import rados
import rbd

with rados.Rados(conffile=CONF_PATH) as cluster:
    with cluster.open_ioctx(POOL_NAME) as ioctx:
        with rbd.Image(ioctx, IMAGE_LABEL) as image:
            image.mirror_image_get_status()
```
This will result in the following stack trace:
```
Traceback (most recent call last):
File "repo-bug.py", line 10, in <module>
image.mirror_image_get_status()
File "rbd.pyx", line 3363, in rbd.requires_not_closed.wrapper
File "rbd.pyx", line 5209, in rbd.Image.mirror_image_get_status
TypeError: list indices must be integers or slices, not str
```
Conflicts:
src/pybind/mgr/dashboard/controllers/rgw.py
- the rgw-list endpoint doesn't have the daemon_name parameter in octopus, so
the changes were adapted accordingly.
mon/OSDMonitor: account for PG merging in epoch_by_pg accounting
After a pool has merged PGs, the epoch_by_pg accounting will refer
to osdmap epochs of PGs that no longer exist. We'll never again get
OSD beacons for these PGs, so the min epoch in epoch_by_pg will not
advance until the mon leader has restarted. The effect of this is
that osdmaps are not trimmed after a pool has undergone PG merging,
until the mon leader restarts. To fix, we unconditionally resize
epoch_by_pg to the pg_num of the pool during each beacon report.
Fixes: https://tracker.ceph.com/issues/48212
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit cf5ea22cc0b10560c9fa3fbd5d93431f874d38b9)
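A rough Python sketch of the effect of that resize (hypothetical numbers; the real code operates on OSDMonitor's per-pool epoch_by_pg vectors):
```
def resize_epoch_by_pg(epoch_by_pg, pg_num):
    # Drop the entries for PGs that no longer exist after a merge, so min()
    # is computed over live PGs only and osdmap trimming can advance again.
    del epoch_by_pg[pg_num:]
    epoch_by_pg.extend([0] * (pg_num - len(epoch_by_pg)))  # pad if pg_num grew
    return epoch_by_pg

# e.g. a pool merged from 8 PGs down to 4: the stale tail pins min() at 90
epochs = [120, 118, 119, 121, 90, 90, 90, 90]
print(min(resize_epoch_by_pg(epochs, 4)))   # 118
```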