Kefu Chai [Tue, 17 Aug 2021 07:53:51 +0000 (15:53 +0800)]
mgr/dashboard/api: set a UTF-8 locale when running pip
ansible-core started to include files whose filenames are encoded in
non-ASCII characters, so we have to use a more capable encoding for the
locale in order to install this package. otherwise we'd get the following
error:
Collecting ansible-core<2.12,>=2.11.3
Using cached ansible-core-2.11.4.tar.gz (6.8 MB)
ERROR: Exception:
Traceback (most recent call last):
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
status = self.run(options, args)
...
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
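The fix boils down to exporting a UTF-8 locale for the pip invocation. A minimal Python sketch of the idea (the helper and paths here are illustrative, not the dashboard build code):
```
import os
import subprocess

def pip_install(venv_python, package):
    # Force a UTF-8 capable locale so pip can unpack sdists that contain
    # non-ASCII filenames (e.g. recent ansible-core tarballs).
    env = dict(os.environ, LC_ALL="C.UTF-8", LANG="C.UTF-8")
    subprocess.check_call([venv_python, "-m", "pip", "install", package],
                          env=env)

# illustrative usage:
# pip_install("/tmp/venv/bin/python", "ansible-core<2.12,>=2.11.3")
```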
Mykola Golub [Fri, 20 Mar 2020 16:19:25 +0000 (16:19 +0000)]
ceph-erasure-code-tool: new tool to encode/decode files
E.g. it may be useful as a last resort when recovering an object from
a damaged PG: extract the encoded object chunks from the PG shards
with ceph-objectstore-tool and then decode with ceph-erasure-code-tool.
It also has functionality similar to what ceph_erasure_code test provides.
Kefu Chai [Thu, 10 Jun 2021 12:19:09 +0000 (20:19 +0800)]
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum, but
there is a small window of mon_tick_interval before the monitors are
able to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if cephx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of the ceph CLI, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
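A simplified Python sketch of such a wait loop (hypothetical, not the actual teuthology code; check_quorum stands in for whatever call queries the monitors):
```
import errno
import time

def wait_for_quorum(check_quorum, timeout=300, interval=3):
    # check_quorum() is a hypothetical callable that queries the monitors
    # and raises OSError on failure.
    deadline = time.time() + timeout
    while True:
        try:
            return check_quorum()
        except OSError as e:
            # EACCES can be returned transiently right after the monitors
            # form a quorum but before their rotating service keys have been
            # refreshed, so treat it as retryable rather than fatal.
            if e.errno != errno.EACCES or time.time() > deadline:
                raise
        time.sleep(interval)
```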
ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuilt paxos transaction won't be overwritten by the ones
created before recovery completes.
when the quorum is recovering, the leader collects the paxos
transactions from the peons. if the quorum accepts the proposal for setting
the fingerprint, the peon updates the monitor with a paxos
transaction whose "last_committed" is newer than the one created using
update_paxos() in ceph_monstore_tool.cc; the latter "last_committed" is
always 0.
so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.
The snapshot replayer needs the remote's mirror peer uuid to find its
snapshots in the remote image. It is obtained by listing the remote's
mirror peers, but RemotePoolPoller::handle_mirror_peer_list() skips
tx-only (MIRROR_PEER_DIRECTION_TX) peers. In effect only rx-tx
(MIRROR_PEER_DIRECTION_RX_TX) peers are considered for matching,
and the snapshot replayer always fails with a "failed to retrieve mirror
peer uuid from remote pool" error.
Instead, skip rx-only (MIRROR_PEER_DIRECTION_RX) peers as we are
definitely not interested in anything having to do with mirroring
_to_ the remote cluster.
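In Python-style pseudocode the change amounts to flipping which direction is skipped (the peer layout and the constant values are assumptions, mirroring librbd's rbd_mirror_peer_direction_t):
```
# direction constants as in librbd's rbd_mirror_peer_direction_t (assumed values)
MIRROR_PEER_DIRECTION_RX = 0
MIRROR_PEER_DIRECTION_TX = 1
MIRROR_PEER_DIRECTION_RX_TX = 2

def find_remote_mirror_peer_uuid(peers):
    # Previously tx-only peers were skipped, which dropped exactly the peers
    # whose uuid the snapshot replayer needs; now rx-only peers are skipped
    # instead, since mirroring *to* the remote cluster is irrelevant here.
    for peer in peers:
        if peer["direction"] == MIRROR_PEER_DIRECTION_RX:
            continue
        return peer["uuid"]
    return None

print(find_remote_mirror_peer_uuid(
    [{"uuid": "a1b2", "direction": MIRROR_PEER_DIRECTION_TX}]))  # a1b2
```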
auth,mon: don't log "unable to find a keyring" error when key is given
This error is logged even if --key or --keyring is specified and
confuses users because the command actually does its job and exits
with success. This primarily affects the "rbd mirror pool peer bootstrap
import" command and the rbd-mirror and cephfs-mirror daemons, which connect
to the remote cluster with just mon_host and key:
$ rbd mirror pool peer bootstrap import mypool tokenfile
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Local cluster commands are affected too:
$ rados --no-config-file --mon-host $MON_HOST --key $KEY lspools
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
... -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
device_health_metrics
rbd
This was introduced in commit 98a2e5c59daa ("rados: translate errno to
str in CLI").
Cherry-pick notes:
- Options defined in src/common/options.cc in Octopus vs src/common/options/rgw.yaml.in
- RGWQuotaCache::get_stats does not take optional_yield or DoutPrefixProvider arguments in Octopus
J. Eric Ivancich [Tue, 15 Jun 2021 19:20:33 +0000 (15:20 -0400)]
rgw: when deleted obj removed in versioned bucket, extra del-marker added
After initial checks are complete, this will read the OLH earlier than
previously to check the delete-marker flag and under the bug's
conditions will return -ENOENT rather than create a spurious delete
marker.
Cherry-pick notes:
- RGWRados::apply_olh_log does not take DoutPrefixProvider in Octopus
- change to use some namespace-qualified names in cls_rgw_types
Jeegn Chen [Wed, 25 Nov 2020 09:15:25 +0000 (17:15 +0800)]
rgw: avoid infinite loop when deleting a bucket
When deleting a bucket with an incomplete multipart upload that
has about 2000 parts uploaded, we noticed an infinite loop, which
prevented s3cmd from ever deleting the bucket.
Upon investigation, when the bucket index was sharded (for example 128
shards), the original logic in
RGWRados::cls_bucket_list_unordered() did not calculate
the bucket shard ID correctly when the index key of a data
part was taken as the marker.
The issue is not necessarily reproduced each time; it depends
on the key of the object. To reproduce it in a 128-shard bucket,
we use 334 as the key for the incomplete multipart upload,
which is located in Shard 127 (known by experiment). In this
setup, the original logic usually derives a shard ID smaller
than 127 (since 127 is the largest one) from the marker, and
thus a cycle is formed, which results in an infinite loop.
PS: Sometimes the bucket shard ID calculation may incorrectly go forward
instead of backward. In that case, the check logic may skip some shards
that may still hold regular keys. In such scenarios, some non-empty buckets
may be deleted by accident.
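A toy Python illustration of the cycle (not the actual RGW code; the shard numbers and the derivation are made up): if the shard derived from the marker is smaller than the shard the marker was actually read from, the scan resumes behind itself and never reaches the end.
```
NUM_SHARDS = 128
MARKER_SHARD = 127          # shard where the multipart index key really lives

def shard_from_marker(marker_key):
    # hypothetical stand-in for the buggy derivation, which can land on a
    # shard smaller than MARKER_SHARD for certain multipart index keys
    return 42

def resume_shard(marker_key):
    derived = shard_from_marker(marker_key)
    # buggy: resume from the derived shard, jumping backwards from 127 to 42
    # and re-listing entries that were already returned -> infinite loop.
    # fix: resume from the shard the marker was actually read from.
    return derived

print(resume_shard("334"))  # 42 instead of 127
```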
Dimitri Savineau [Tue, 24 Aug 2021 21:17:45 +0000 (17:17 -0400)]
ceph-volume: fix lvm activate --all --no-systemd
When using a system without systemd, the `lvm activate --all --no-systemd`
subcommand still calls systemd.
We already allow users to activate a single OSD without systemd, so there's
no reason not to do the same with --all (because activate_all calls activate).
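A minimal sketch of why fixing activate() covers --all too (a hypothetical simplification, not the real ceph-volume code paths):
```
import argparse

def systemd_enable_and_start(osd_id, osd_fsid):
    # stand-in for the systemctl calls ceph-volume would normally make
    print(f"systemctl enable/start ceph-osd for {osd_id} {osd_fsid}")

def activate(args, osd_id, osd_fsid):
    # ... discover, prime and mount the OSD (omitted) ...
    if not args.no_systemd:
        systemd_enable_and_start(osd_id, osd_fsid)

def activate_all(args, discovered_osds):
    # `lvm activate --all` is just a loop over activate(), so honouring
    # args.no_systemd in activate() makes --all usable without systemd too.
    for osd_id, osd_fsid in discovered_osds:
        activate(args, osd_id, osd_fsid)

# illustrative usage:
activate_all(argparse.Namespace(no_systemd=True), [("0", "abc-123")])
```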
In /etc/sysconfig/ceph we allow operators to define whether ceph daemons
should be restarted on upgrade: CEPH_AUTO_RESTART_ON_UPGRADE.
But the post selinux scripts stop ceph.target regardless of whether this
is set to `no`, leading operators to add various hacks to prevent
these unexpected or inconvenient daemon restarts. By now, if users
are using rpms directly, they are likely orchestrating their own
daemon restarts and so should not rely on the rpm itself to do this.
Fixes: https://tracker.ceph.com/issues/21672
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 092a6e3e83e9ef8e37cb6f1033c345dcb5224cfc)
Igor Fedotov [Tue, 31 Aug 2021 12:54:23 +0000 (15:54 +0300)]
os/bluestore: fix bluefs migrate command
After migrating the DB volume to a slow device, RocksDB still needs to be
provided with the slow.db path to properly access the relevant files under
the db.slow subfolder. Without that specification it tries to access them
under 'db', which results in a "not found" error.
Fixes: https://tracker.ceph.com/issues/40434
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 90852d9b6f0da7967121200c9a1c56bed1929d2d)
When running the `lvm migrate` subcommand without any args, the
ceph-volume command fails with a stack trace:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
return f(*a, **kw)
File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 151, in main
terminal.dispatch(self.mapper, subcommand_args)
File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
terminal.dispatch(self.mapper, self.argv)
File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
instance.main()
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/migrate.py", line 520, in main
self.migrate_osd()
File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
return func(*a, **kw)
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/migrate.py", line 403, in migrate_osd
if self.args.osd_id:
AttributeError: 'Migrate' object has no attribute 'args'
That's because we return early from the parse_argv function but then
continue to execute the migrate_osd function. We should instead exit from
the main function.
This updates the argument parsing to use the same code as the new-db and
new-wal classes.
Now the parsing is done in the make_parser function while the argv check is
done in the main function, allowing the program to exit and display the
help message when no arguments are provided.
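A rough sketch of that pattern (hypothetical and heavily simplified, not the actual ceph-volume classes):
```
import argparse

class Migrate(object):
    help = 'Migrate BlueFS data from one LV to another'

    def __init__(self, argv):
        self.argv = argv

    def make_parser(self, prog, sub_command_help):
        # build the parser unconditionally; no early return happens here
        parser = argparse.ArgumentParser(
            prog=prog,
            formatter_class=argparse.RawDescriptionHelpFormatter,
            description=sub_command_help)
        parser.add_argument('--osd-id', dest='osd_id')
        parser.add_argument('--osd-fsid', dest='osd_fsid')
        return parser

    def main(self):
        parser = self.make_parser('ceph-volume lvm migrate', self.help)
        if len(self.argv) == 0:
            # exit from main() so migrate_osd() is never reached without args
            parser.print_help()
            return
        self.args = parser.parse_args(self.argv)
        self.migrate_osd()

    def migrate_osd(self):
        print('migrating OSD', self.args.osd_id)

Migrate([]).main()   # prints the help text instead of raising AttributeError
```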
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
Like other ceph-volume commands (lvm activate/batch/zap or raw activate),
we also need to be able to use the --no-systemd flag.
This is a regression introduced by 9212420: when the host is using a
logical partition, lsblk reports that partition as a child of the
physical device.
That logical partition is prefixed by the `└─` characters.
This leads the `raw list` subcommand to show the lsblk error on stderr:
```
$ ceph-volume raw list
{}
stderr: lsblk: `-/dev/sda1: not a block device
```
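One way to picture the problem (a hedged sketch, not the actual ceph-volume fix): the device column of lsblk output has to be cleaned of the tree-drawing prefix before it can be used as a path.
```
def clean_lsblk_name(name):
    # lsblk prints children of a device with a tree prefix such as
    # "└─sda1" or "`-sda1"; strip it so the name can be used as /dev/<name>.
    return name.lstrip('`|-└─├ ')

for raw in ['sda', '└─sda1', '`-sda1']:
    print('/dev/' + clean_lsblk_name(raw))   # /dev/sda, /dev/sda1, /dev/sda1
```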
Igor Fedotov [Wed, 18 Aug 2021 10:39:02 +0000 (13:39 +0300)]
os/bluestore: accept undecodable multi-block bluefs transactions on log replay
We should proceed with OSD startup when detecting an undecodable bluefs
transaction spanning multiple disk blocks during log replay.
The rationale is that such a transaction might appear after an unexpected
power down - simply because not every disk block made it to disk. Hence we
can consider this a normal log replay stop condition.
libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
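The same pitfall can be demonstrated with Python's fnmatch module, which implements the same glob rules as fnmatch(3); wrapping each metacharacter in brackets makes it literal (an illustration only, not the libudev-facing code):
```
import fnmatch

def escape_glob(value):
    # make '*', '?' and '[' match themselves instead of acting as wildcards
    return ''.join('[%s]' % ch if ch in '*?[' else ch for ch in value)

serial = 'disk-1*a'          # attribute value that happens to contain a glob char
print(fnmatch.fnmatch('disk-1zzza', serial))               # True: unwanted match
print(fnmatch.fnmatch('disk-1zzza', escape_glob(serial)))  # False
print(fnmatch.fnmatch('disk-1*a', escape_glob(serial)))    # True: exact value still matches
```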
Yin Congmin [Wed, 30 Jun 2021 08:56:23 +0000 (16:56 +0800)]
common/buffer: fix SIGABRT in rebuild_aligned_size_and_memory
There is a bl that satisfies two conditions:
1) the sum of all ptrs' lengths except the last ptr is aligned to 4K;
2) the length of the last ptr is 0.
Such a bl causes stack corruption when calling
bufferlist::rebuild_aligned_size_and_memory().
Handle this special scenario in rebuild_aligned_size_and_memory() to
fix the bug, and add a special test case to reproduce this scenario.
pybind/rbd: explain why "primary" isn't exposed in mirror_image_status_list()
"primary" is part of mirror image info (rbd_mirror_image_info_t) and
is exposed in mirror_image_get_status(). mirror_image_status_list(),
even though it is often thought of as an equivalent of repeated calls
to mirror_image_get_status(), doesn't actually fetch the mirror image
info.
pybind/rbd: actually append site_status dict to remote_statuses
Using the += operator is wrong -- only the site_status keys get appended
(and repeatedly at that if there is more than one remote site,
as the keys are added one by one).
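The difference in plain Python (unrelated to rbd itself):
```
site_status = {'mirror_uuid': 'abc', 'state': 4, 'up': True}

remote_statuses = []
remote_statuses += site_status        # wrong: extends the list with the dict's keys
print(remote_statuses)                # ['mirror_uuid', 'state', 'up']

remote_statuses = []
remote_statuses.append(site_status)   # right: appends the dict itself
print(remote_statuses)                # [{'mirror_uuid': 'abc', 'state': 4, 'up': True}]
```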
Will Smith [Fri, 23 Jul 2021 19:18:12 +0000 (15:18 -0400)]
rbd: Fix mirror_image_get_status in rbd python bindings
When retrieving the status of a mirrored image from the Python rbd
library, a TypeError is raised.
*To Reproduce:*
Set up two Ceph clusters for block storage and configure image
mirroring between their pools. Create at least one image with mirroring
enabled, then run the following script on either cluster (once the image
exists everywhere):
```
import rados
import rbd

with rados.Rados(conffile=CONF_PATH) as cluster:
    with cluster.open_ioctx(POOL_NAME) as ioctx:
        with rbd.Image(ioctx, IMAGE_LABEL) as image:
            image.mirror_image_get_status()
```
This will result in the following stack trace:
```
Traceback (most recent call last):
File "repo-bug.py", line 10, in <module>
image.mirror_image_get_status()
File "rbd.pyx", line 3363, in rbd.requires_not_closed.wrapper
File "rbd.pyx", line 5209, in rbd.Image.mirror_image_get_status
TypeError: list indices must be integers or slices, not str
```
Conflicts:
src/pybind/mgr/dashboard/controllers/rgw.py
- the rgw-list endpoint doesn't have the daemon_name parameter in octopus, so
the changes were adapted accordingly.
mon/OSDMonitor: account for PG merging in epoch_by_pg accounting
After a pool has merged PGs, the epoch_by_pg accounting will refer
to osdmap epochs of PGs that no longer exist. We'll never again get
OSD beacons for these PGs, so the min epoch in epoch_by_pg will not
advance until the mon leader has restarted. The effect of this is
that osdmaps are not trimmed after a pool has undergone PG merging,
until the mon leader restarts. To fix, we unconditionally resize
epoch_by_pg to the pg_num of the pool during each beacon report.
Fixes: https://tracker.ceph.com/issues/48212
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit cf5ea22cc0b10560c9fa3fbd5d93431f874d38b9)
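A rough Python sketch of the effect of that resize (hypothetical numbers; the real code operates on OSDMonitor's per-pool epoch_by_pg vectors):
```
def resize_epoch_by_pg(epoch_by_pg, pg_num):
    # Drop the entries for PGs that no longer exist after a merge, so min()
    # is computed over live PGs only and osdmap trimming can advance again.
    del epoch_by_pg[pg_num:]
    epoch_by_pg.extend([0] * (pg_num - len(epoch_by_pg)))  # pad if pg_num grew
    return epoch_by_pg

# e.g. a pool merged from 8 PGs down to 4: the stale tail pins min() at 90
epochs = [120, 118, 119, 121, 90, 90, 90, 90]
print(min(resize_epoch_by_pg(epochs, 4)))   # 118
```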