Alexander Sushko [Fri, 27 Nov 2020 11:04:13 +0000 (14:04 +0300)]
pybind/mgr/prometheus/module.py: defaultdict for num_by_state
num_by_state[state] += count in the get_pg_status method raises KeyError
if the pg state is not in the PG_STATES list. PG_STATES should be synced with
osd_types.cc:pg_state_string(), but sometimes it is not. Once the
KeyError is raised, mgr metrics are not available at all.
Fixes: https://tracker.ceph.com/issues/46142
Signed-off-by: Alexander Sushko <alexandrsushko@gmail.com>
(cherry picked from commit 3f7ee9cbd335e4b8686688b79ec6110d73a7390e)
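The failure mode and the fix described in this commit can be illustrated with a minimal sketch. The state list, the input shape, and the function name `count_by_state` are assumptions made for illustration; this is not the module's actual code.

```python
from collections import defaultdict

# Hypothetical, deliberately incomplete list of known PG states.
PG_STATES = ["active", "clean", "peering"]


def count_by_state(pgs_by_state, use_defaultdict=True):
    """Sum PG counts per individual state.

    pgs_by_state maps a '+'-joined state string (e.g. 'active+clean')
    to the number of PGs in that combined state (assumed input shape).
    """
    if use_defaultdict:
        # Unknown states start at 0 instead of raising KeyError.
        num_by_state = defaultdict(int)
    else:
        # Pre-seeded plain dict: a state outside PG_STATES raises KeyError.
        num_by_state = {state: 0 for state in PG_STATES}

    for combined, count in pgs_by_state.items():
        for state in combined.split("+"):
            num_by_state[state] += count
    return dict(num_by_state)


# 'laggy' is not in the hypothetical PG_STATES list above.
sample = {"active+clean": 10, "active+laggy": 2}
result = count_by_state(sample)
```

With the plain dict variant (`use_defaultdict=False`) the same input raises KeyError on the unlisted state, which is the situation the commit message describes.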
Kotresh HR [Tue, 1 Dec 2020 10:44:17 +0000 (16:14 +0530)]
tasks/cephfs/test_volume_client: Add tests for authorize/deauthorize
1. Add testcase for authorizing auth_id which is not added by
ceph_volume_client
2. Add testcase to test 'allow_existing_id' option
3. Add testcase for deauthorizing auth_id which has had its caps
updated out of band
Optionally allow authorizing auth-ids not created by ceph_volume_client
via the option 'allow_existing_id'. This can help existing deployers
of manila to disallow/allow authorization of pre-created auth IDs
via a manila driver config that sets 'allow_existing_id' to False/True.
Kotresh HR [Thu, 26 Nov 2020 09:18:16 +0000 (14:48 +0530)]
pybind/ceph_volume_client: Preserve existing caps while authorize/deauthorize auth-id
Authorize/Deauthorize used to overwrite the caps of auth-id which would
end up deleting existing caps. This patch fixes the same by retaining
the existing caps by appending or deleting the new caps as needed.
This patch disallows ceph_volume_client from authorizing an auth_id
that was not created by ceph_volume_client. Such auth_ids could have been
created by other means for other use cases and should not be modified
by ceph_volume_client.
Fixes: https://tracker.ceph.com/issues/48555
Signed-off-by: Ramana Raja <rraja@redhat.com>
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 3a85d2d04028a323952a31d18cdbefb710be2e2b)
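The idea of retaining existing caps by appending or deleting only the relevant cap can be sketched as follows. The helpers `add_cap` and `remove_cap`, the comma-separated cap format, and the example paths are hypothetical; they are not ceph_volume_client's API.

```python
def _split_caps(caps):
    """Split a comma-separated cap string into trimmed, non-empty tokens."""
    return [c.strip() for c in caps.split(",") if c.strip()]


def add_cap(existing, want):
    """Return `existing` with `want` appended, keeping all other caps."""
    tokens = _split_caps(existing)
    if want not in tokens:
        tokens.append(want)
    return ",".join(tokens)


def remove_cap(existing, unwanted):
    """Return `existing` without `unwanted`, keeping all other caps."""
    return ",".join(c for c in _split_caps(existing) if c != unwanted)


# An auth ID that already holds a cap set out of band (hypothetical values).
before = "allow r"
volume_cap = "allow rw path=/volumes/grp/vol1"

after_authorize = add_cap(before, volume_cap)
after_deauthorize = remove_cap(after_authorize, volume_cap)
```

Overwriting instead of merging would have replaced `before` with `volume_cap` alone on authorize, which is the cap loss the commit message refers to.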
Nizamudeen A [Tue, 8 Dec 2020 14:35:28 +0000 (20:05 +0530)]
mgr/dashboard: Adding the alert bad certificate error to the ssl providers error
upstream tracked in https://github.com/cherrypy/cheroot/pull/348
Fixes: https://tracker.ceph.com/issues/48490
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 4cbe89f4db8ed13b2be46f2563c9d9618b0cf52b)
Jan Fajerski [Wed, 9 Dec 2020 16:01:39 +0000 (17:01 +0100)]
Merge PR #38205 into octopus
* refs/pull/38205/head:
ceph-volume: pass *-slots arguments to LV creation
use extent count for slots conversion instead of free count
ceph-volume: available_lvm: vg space takes precedence
Stephan Müller [Wed, 23 Sep 2020 09:16:44 +0000 (11:16 +0200)]
mgr/dashboard: Add clay plugin support
The erasure code plugin "clay" is now supported by the dashboard. Now a
clay based profile can be created in the ec profile creation modal
dialog which can be found in the pool form.
The defaults of the plugin are calculated or preselected and shown in the
dashboard. Some fields are therefore mandatory in the form even though they are
optional on the cli, but because they are set automatically the user doesn't
have to fill them in and sees the defaults instantly before creating the profile.
(This is the same behavior that is used for all other supported
plugins.)
Fixes: https://tracker.ceph.com/issues/44433
Signed-off-by: Stephan Müller <smueller@suse.com>
(cherry picked from commit b3fd05bbc568cb775d25032ce87ea8dbb5106b3a)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/api/erasure-code-profile.service.ts
Fixed conflicts because https://github.com/ceph/ceph/pull/34696 has not
been backported to octopus.
Simon Gao [Sun, 9 Aug 2020 07:38:30 +0000 (15:38 +0800)]
mds : move start_files_to_recover() to recovery_done
The requests in the waiting_for_replay queue may modify the state of the filelock,
resulting in the wrong lock state when repairing files (start_files_to_recover).
Fixes: https://tracker.ceph.com/issues/46906
Signed-off-by: Simon Gao <simon29rock@gmail.com>
(cherry picked from commit fafb5b4f84e12ba00a68550ffb73fc9bcde867a0)
Xiubo Li [Mon, 23 Nov 2020 12:55:01 +0000 (20:55 +0800)]
common: do not dup the options when reexpanding
The old code stores every option whose value contains `$pid`
in the may_reexpand_meta map. When reexpanding later, the reexpand
code duplicates them with a higher priority (CONF_OVERRIDE).
This is a problem if a default value contains `$pid` and is
stored in the may_reexpand_meta map, and the code then sets a new,
different value, which may contain no `$pid`, from the CLI or a config file.
The reexpand will always override it with the default value.
This change does not duplicate the options with CONF_OVERRIDE priority
when reexpanding; it just refreshes them and calls the observers. Also,
finalize_reexpand_meta() will always be called after the fork() is
done in child processes.
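The priority problem described above can be modelled with a toy sketch. Everything here (the `ToyConfig` class, the priority constants, the option name and paths) is a hypothetical simplification for illustration and does not reflect the real config code.

```python
# Toy priority levels; higher numbers win. Names are illustrative only.
DEFAULT, FILE, CLI, OVERRIDE = range(4)


class ToyConfig:
    def __init__(self):
        self.values = {}        # option name -> {priority: raw value}
        self.may_reexpand = {}  # option name -> raw value containing $pid

    def set(self, name, raw, prio):
        self.values.setdefault(name, {})[prio] = raw
        if "$pid" in raw:
            self.may_reexpand[name] = raw

    def get(self, name, pid):
        levels = self.values[name]
        raw = levels[max(levels)]  # highest priority wins
        return raw.replace("$pid", str(pid))

    def reexpand_dup(self):
        # Old behaviour as described: copy remembered values to OVERRIDE.
        for name, raw in self.may_reexpand.items():
            self.values[name][OVERRIDE] = raw

    def reexpand_refresh(self, observers):
        # New behaviour as described: leave stored values alone, notify only.
        for name in self.may_reexpand:
            for notify in observers:
                notify(name)


def build():
    conf = ToyConfig()
    conf.set("admin_socket", "/var/run/ceph/$pid.asok", DEFAULT)
    conf.set("admin_socket", "/tmp/fixed.asok", CLI)  # no $pid here
    return conf


old = build()
old.reexpand_dup()
old_value = old.get("admin_socket", pid=4242)

new = build()
notified = []
new.reexpand_refresh([notified.append])
new_value = new.get("admin_socket", pid=4242)
```

In this model `old_value` falls back to the expanded default even though a CLI value was set, while `new_value` keeps the CLI value and the observers are still notified.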
Xiubo Li [Fri, 13 Nov 2020 08:08:31 +0000 (16:08 +0800)]
global: reexpand the conf meta in all the child processes
For tools or daemons whose config options need to expand '$pid',
the value is always expanded with the parent process's PID.
We need to reexpand these options in child processes right after
the fork is done.
Dimitri Savineau [Mon, 26 Oct 2020 19:12:59 +0000 (15:12 -0400)]
ceph-volume: consume mount opt in simple activate
When running the ceph-volume simple activate command on a Filestore OSD,
the data device is mounted without any specific options, so the
ones from the ceph configuration file are ignored.
When deploying Filestore with the lvm subcommand then everything is
fine because the filestore_activate method uses mount_osd which relies
on the mount options defined in the ceph configuration file (if any).
Kevin Meijer [Sat, 14 Nov 2020 18:44:07 +0000 (19:44 +0100)]
mgr/dashboard: Disable sso without python3-saml
Removed the requirement for the python3-saml package when disabling SSO for the dashboard. This is currently relevant because the official container that runs Ceph mgr does not have this package installed.
So when upgrading from an older, non-containerized version, you would be stuck with a non-functional dashboard.
This pull request changes that and allows the ceph dashboard sso disable command to run without the library, so that SSO can always be disabled again.
Fixes: https://tracker.ceph.com/issues/48237
Signed-off-by: Kevin Meijer <admin@kevinmeijer.nl>
(cherry picked from commit 0c18437d2c786ef1ade8b89e42dbf4b0e163aafe)
When the block changes, systemd-udevd will open the block,
read some information and close it. Then a failure occurs here.
So we need to try again here.
Igor Fedotov [Mon, 14 Sep 2020 20:28:42 +0000 (23:28 +0300)]
os/bluestore: provide a different name for fallback allocator
Originally the primary Hybrid allocator provided its own name when creating a
secondary fallback allocator. This resulted in duplicate admin socket
command registrations for both allocators. The registration return code was
ignored and hence nobody was aware of the issue. Nautilus might suffer
from the issue though, since it asserts on command deregistration failure,
and a duplicate name causes such a failure for the second
unregister_command() call.
Fixes: https://tracker.ceph.com/issues/47443
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit b0866b60461b06e6563cad47d0ad3ce9302114f5)
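Why a shared name leads to a failing second unregister can be shown with a toy registry. `ToyAdminSocket`, its return codes, and the command names are assumptions for illustration only; they do not mirror the real admin socket interface.

```python
import errno


class ToyAdminSocket:
    """Minimal command registry keyed by name (illustrative only)."""

    def __init__(self):
        self.commands = {}

    def register_command(self, name, handler):
        if name in self.commands:
            return -errno.EEXIST  # duplicate name: registration rejected
        self.commands[name] = handler
        return 0

    def unregister_command(self, name):
        if name not in self.commands:
            return -errno.ENOENT  # nothing registered under this name
        del self.commands[name]
        return 0


def lifecycle(primary_name, fallback_name):
    """Register and then unregister both allocators; collect return codes."""
    sock = ToyAdminSocket()
    return [
        sock.register_command(primary_name, "primary handler"),
        sock.register_command(fallback_name, "fallback handler"),
        sock.unregister_command(primary_name),
        sock.unregister_command(fallback_name),
    ]


same_name = lifecycle("block", "block")
distinct = lifecycle("block", "block-fallback")
```

In `same_name` the second registration and the second unregistration both return an error; if the caller ignores the first and asserts on the second, the process aborts at teardown. With distinct names all four calls return 0.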
Casey Bodley [Mon, 23 Nov 2020 23:06:26 +0000 (18:06 -0500)]
rgw: temporarily disable calls to defer_gc() in RGWGetObj
cls_rgw_gc_queue_update_entry() is known to cause data loss when called
on objects that have not actually been scheduled for garbage collection.
RGWGetObj is the only caller, and uses defer_gc() when reads are taking
a long time compared to rgw_gc_obj_min_wait. If an object has since been
deleted and submitted for garbage collection, this allows RGWGetObj to
defer that gc until the entire read completes.
By disabling these calls to defer_gc(), very long reads (longer than 1hr,
with default configuration) may fail if the object gets deleted, and a
retry will result in a 404 Not Found error as expected.
J. Eric Ivancich [Sat, 21 Nov 2020 16:10:35 +0000 (11:10 -0500)]
rgw: during GC defer, prevent new GC enqueue
With the new queue-based GC code, when a GC defer operation is
performed, it adds an "urgent" record to prevent GC from removing
objects that are still being read. It does not check whether the
objects are on the GC queue or not and that's OK for the urgent
record.
The code *also* adds a new GC entry to the queue to cause GC to occur
at a later time. This would be incorrect if there was no GC entry to
begin with, however. In such a case this would cause GC to delete tail
objects when no user-initiated remove has happened. In other words a
READ could cause a DELETE of tail objects and therefore data loss.
This fix prevents such a new GC entry from being enqueued, thus
preventing the data loss in this rare case. There is a new risk that
tail object orphans will be created, but as an immediate fix to prevent
data loss, this is appropriate and it is a rare event. A follow-on PR
that will handle these cases is likely.
This PR adds a level 0 log entry as a way to potentially confirm this
case is being triggered in real-world cases. In time, this log entry
should be deleted.