Kotresh HR [Tue, 1 Dec 2020 10:44:17 +0000 (16:14 +0530)]
tasks/cephfs/test_volume_client: Add tests for authorize/deauthorize
1. Add testcase for authorizing auth_id which is not added by
ceph_volume_client
2. Add testcase to test 'allow_existing_id' option
3. Add testcase for deauthorizing auth_id which has had its caps
updated out of band
Optionally allow authorizing auth-ids not created by ceph_volume_client
via the option 'allow_existing_id'. This can help existing deployers
of manila to disallow/allow authorization of pre-created auth IDs
via a manila driver config that sets 'allow_existing_id' to False/True.
Kotresh HR [Thu, 26 Nov 2020 09:18:16 +0000 (14:48 +0530)]
pybind/ceph_volume_client: Preserve existing caps while authorize/deauthorize auth-id
Authorize/Deauthorize used to overwrite the caps of the auth-id, which
would end up deleting existing caps. This patch fixes that by retaining
the existing caps and appending or removing only the caps in question.
This patch also disallows ceph_volume_client from authorizing an auth_id
which was not created by ceph_volume_client. Such auth_ids could have
been created by other means for other use cases and should not be
modified by ceph_volume_client.
Fixes: https://tracker.ceph.com/issues/48555 Signed-off-by: Ramana Raja <rraja@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 3a85d2d04028a323952a31d18cdbefb710be2e2b)
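The cap-preserving behaviour described above can be illustrated with a minimal sketch. This is not the ceph_volume_client code: the helper names `add_cap` and `remove_cap` are hypothetical, the cap strings are example values, and the sketch assumes grants are separated by commas and compared as exact strings.

```python
def _split_caps(cap_str):
    """Split a comma-separated cap string into stripped, non-empty grants."""
    return [c.strip() for c in cap_str.split(",") if c.strip()]


def add_cap(existing, want):
    """Return `existing` with `want` appended, unless it is already present.

    Hypothetical helper: existing grants are kept, never overwritten.
    """
    grants = _split_caps(existing)
    if want not in grants:
        grants.append(want)
    return ",".join(grants)


def remove_cap(existing, unwanted):
    """Return `existing` without `unwanted`; all other grants are kept."""
    return ",".join(g for g in _split_caps(existing) if g != unwanted)


# Example values only; real caps depend on the volume and pool layout.
pre_existing = "allow r path=/other"
volume_cap = "allow rw path=/volumes/grp/vol"

after_authorize = add_cap(pre_existing, volume_cap)
after_deauthorize = remove_cap(after_authorize, volume_cap)
```

In this sketch, authorizing twice is idempotent and deauthorizing leaves the pre-existing grant untouched, which is the property the commit message describes.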
Jan Fajerski [Wed, 9 Dec 2020 16:01:39 +0000 (17:01 +0100)]
Merge PR #38205 into octopus
* refs/pull/38205/head:
ceph-volume: pass *-slots arguments to LV creation
use extent count for slots conversion instead of free count
ceph-volume: available_lvm: vg space takes precedence
Dimitri Savineau [Mon, 26 Oct 2020 19:12:59 +0000 (15:12 -0400)]
ceph-volume: consume mount opt in simple activate
When running the ceph-volume simple activate command on a Filestore OSD,
the data device is mounted without any specific options, so the
ones from the ceph configuration file are ignored.
When deploying Filestore with the lvm subcommand then everything is
fine because the filestore_activate method uses mount_osd which relies
on the mount options defined in the ceph configuration file (if any).
Kevin Meijer [Sat, 14 Nov 2020 18:44:07 +0000 (19:44 +0100)]
mgr/dashboard: Disable sso without python3-saml
Removed the requirement for the python3-saml package when disabling SSO for the dashboard. This is currently relevant because the official container that runs Ceph mgr does not have this package installed.
So when upgrading from an older, non-containerized version, you would be stuck with a non-functional dashboard.
This pull request changes that and allows the ceph dashboard sso disable command to run without the library, so that SSO can always be disabled again.
Fixes: https://tracker.ceph.com/issues/48237 Signed-off-by: Kevin Meijer <admin@kevinmeijer.nl>
(cherry picked from commit 0c18437d2c786ef1ade8b89e42dbf4b0e163aafe)
Casey Bodley [Mon, 23 Nov 2020 23:06:26 +0000 (18:06 -0500)]
rgw: temporarily disable calls to defer_gc() in RGWGetObj
cls_rgw_gc_queue_update_entry() is known to cause data loss when called
on objects that have not actually been scheduled for garbage collection
RGWGetObj is the only caller, and uses defer_gc() when reads are taking
a long time compared to rgw_gc_obj_min_wait. if an object has since been
deleted and submitted for garbage collection, this allows RGWGetObj to
defer that gc until the entire read completes
by disabling these calls to defer_gc(), very long reads (longer than 1hr,
with default configuration) may fail if the object gets deleted, and a
retry will result in a 404 Not Found error as expected
J. Eric Ivancich [Sat, 21 Nov 2020 16:10:35 +0000 (11:10 -0500)]
rgw: during GC defer, prevent new GC enqueue
With the new queue-based GC code, when a GC defer operation is
performed, it adds an "urgent" record to prevent GC from removing
objects that are still being read. It does not check whether the
objects are on the GC queue or not and that's OK for the urgent
record.
The code *also* adds a new GC entry to the queue to cause GC to occur
at a later time. This would be incorrect if there was no GC entry to
begin with, however. In such a case this would cause GC to delete tail
objects when no user-initiated remove has happened. In other words a
READ could cause a DELETE of tail objects and therefore data loss.
This fix prevents such a new GC entry from being enqueued, thus
preventing the data loss in this rare case. There is a new risk that
tail object orphans will be created, but as an immediate fix to prevent
data loss, this is appropriate and it is a rare event. A follow-on PR
that will handle these cases is likely.
This PR adds a level 0 log entry as a way to potentially confirm this
case is being triggered in real-world cases. In time, this log entry
should be deleted.
Jan Fajerski [Wed, 18 Nov 2020 08:37:48 +0000 (09:37 +0100)]
ceph-volume inventory: make libstoragemgmt data retrieval optional
Default to not retrieving libstoragemgmt data since it seems this can
cause serious issues on older hardware. Safest way is to only retrieve
lsm data when the user opts in.
Fixes: https://tracker.ceph.com/issues/48270 Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit b29a54d21e314db7a9d681cf5cc089dcfcbf6dc0)
Jan Fajerski [Wed, 4 Mar 2020 10:39:40 +0000 (11:39 +0100)]
ceph-volume: available_lvm: vg space takes precedence
This changes available_lvm to check for generic reasons only if no VGs
were found. A VG can contain a (mounted) lv, which triggers the
ro/locked test, despite the VG having space available.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/iscsi-target-list/iscsi-target-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-namespace-list/rbd-namespace-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-snapshot-list/rbd-snapshot-actions.model.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/hosts/hosts.component.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/mgr-modules/mgr-module-list/mgr-module-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/pool/pool-list/pool-list.component.ts
- `$localize` calls are not available in Angular 8. They are replaced with i18n.
- Optional chaining syntax is not supported in TypeScript 3.5.3. Statements with optional chaining are rewritten.
Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json
src/pybind/mgr/dashboard/frontend/package.json
- The master branch has different package dependencies.
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/osd/osd-details/osd-details.component.spec.ts
- Imports are refactored: https://github.com/ceph/ceph/pull/37918.
src/pybind/mgr/dashboard/frontend/src/app/ceph/shared/ceph-shared.module.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/shared/smart-list/smart-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/shared/smart-list/smart-list.component.spec.ts
- We migrated from ngx-bootstrap to ng-bootstrap.
src/pybind/mgr/dashboard/frontend/src/app/ceph/shared/smart-list/smart-list.component.ts
- The I18n service is replaced with the `$localize` function.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard/health-pie/health-pie.component.scss
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard/health/health.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard/health/health.component.ts
src/pybind/mgr/dashboard/frontend/src/styles/defaults/_bootstrap-defaults.scss
Discarded all changes except the relevant code part. The rest was successfully backported by b2360b1a6101b5cc61c236047ce7c757fd02c93d.
Fix configuration created by cephadm to prevent any "many-to-many
matching not allowed: matching labels must be unique on one side"
issues. The mgr/prometheus exporter exports suitable instance labels
itself, which can be taken over when `honor_labels` in Prometheus is set
to `true`.
Fixes: https://tracker.ceph.com/issues/47997 Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit ea8a3aca02f2adc1e68a055ab95ced207da1561a)
Volker Theile [Fri, 30 Oct 2020 08:22:30 +0000 (09:22 +0100)]
mgr/cephadm: Allow customizing mgr/cephadm/lsmcli_blink_lights_cmd per host
* Rename key name from 'lsmcli_blink_lights_cmd' to 'blink_device_light_cmd'
* Refactor the TemplateMgr::render() method to follow the common Ceph convention for naming store/module option keys. The old implementation required a key like 'mgr/cephadm/services_nfs_ganesha.conf' instead of 'mgr/cephadm/services/nfs/ganesha.conf' or 'mgr/cephadm/mgr0_blink_device_light_cmd' instead of 'mgr/cephadm/mgr0/blink_device_light_cmd'.
Varsha Rao [Wed, 28 Oct 2020 13:37:35 +0000 (19:07 +0530)]
doc/mgr/orchestrator: Update about "{mds, rgw} add" status in rook
"mds add" and "rgw add" are no longer supported in rook. Their implementation
was removed by commits 56cfeb6 and 0580297. Instead, "apply mds" and "apply rgw"
are preferred.
diwilli [Wed, 28 Oct 2020 17:43:05 +0000 (17:43 +0000)]
cephadm: Set listen-addresses on alertmanager container
This explicitly passes web.listen-address and cluster.listen-address to the alertmanager container, allowing the use of public IP addresses.
Fixes: https://tracker.ceph.com/issues/48031 Signed-off-by: Dan Williams <dw@adventsol.co.uk>
(cherry picked from commit 29730a4bc168913d5dad6d9e487d2dc58a0e3c86)
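As a rough illustration of passing both listen addresses explicitly, the sketch below builds the two Alertmanager flags for a given IP. The function name `alertmanager_listen_args` is hypothetical and not taken from cephadm; the ports are Alertmanager's usual defaults (9093 for the web endpoint, 9094 for clustering), and the IPv6 bracket handling is an assumption added for completeness.

```python
def alertmanager_listen_args(ip, web_port=9093, cluster_port=9094):
    """Build explicit listen-address flags for an Alertmanager process.

    Hypothetical helper for illustration only; cephadm's actual
    implementation may assemble these arguments differently.
    """
    # IPv6 literals must be bracketed when combined with a port.
    host = f"[{ip}]" if ":" in ip else ip
    return [
        f"--web.listen-address={host}:{web_port}",
        f"--cluster.listen-address={host}:{cluster_port}",
    ]


# 203.0.113.10 is a documentation-range address used as a stand-in
# for a public IP.
args = alertmanager_listen_args("203.0.113.10")
```

The resulting list can be appended to the container's command line, so the daemon binds to the requested address instead of relying on its defaults.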
Tim Serong [Mon, 5 Oct 2020 09:14:42 +0000 (20:14 +1100)]
cephadm: allow uid/gid == 0 in copy_tree, copy_files, move_files
If the uid or gid passed to copy_tree(), copy_files() or
move_files() is 0 (the root user), the current check for
`if not uid or not gid` does the wrong thing, i.e. it
thinks the uid and/or gid aren't set, then calls out to
extract_uid_gid(), which fails when run against
prometheus/grafana/alertmanager containers.
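The pitfall described above comes from Python treating 0 as falsy. A minimal sketch of the difference follows; the function names are hypothetical and only model the check, not the real copy_tree(), copy_files() or move_files() code.

```python
def needs_lookup_buggy(uid, gid):
    """Mimics `if not uid or not gid`: 0 is falsy, so root looks unset."""
    return not uid or not gid


def needs_lookup_fixed(uid, gid):
    """Only treat the ids as unset when they are actually None."""
    return uid is None or gid is None


# With uid == gid == 0 (root), the buggy check wrongly requests a lookup
# (the extract_uid_gid() call in the commit message), the fixed one does not.
root_buggy = needs_lookup_buggy(0, 0)
root_fixed = needs_lookup_fixed(0, 0)
```

Comparing against None keeps 0 a valid, explicitly supplied value while still triggering the fallback when no id was passed.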