Tim Serong [Wed, 5 Aug 2020 06:34:20 +0000 (16:34 +1000)]
cephadm: don't add `ceph-volume lvm activate` for adopted simple OSDs
This changes the logic in deploy_daemon_units() to add either `chown` calls
for simple (ceph-disk style) OSDs, or to add `ceph-volume lvm activate` calls
for LVM OSDs, rather than always adding both. When I was working on
https://github.com/ceph/ceph/pull/34703, I'd originally added an "osd_simple"
flag to figure out what type of OSD was being adopted/deployed, but passing
that around was kinda ugly, so was removed from that PR. This time around
I'm checking if /etc/ceph/osd/$OSD_ID-$OSD_FSID.json.adopted-by-cephadm
exists, which seems pretty safe IMO. My only concern with this method is:
what happens if someone adopts a simple OSD, then later wants to migrate it
to LVM. Presumably that's a destroy and recreate, keeping the same OSD ID?
If that's true, then the JSON file probably still exists, so the subsequent
create will do the wrong thing, i.e. will add `chown` calls, not `ceph-volume
lvm activate` calls. Any/all feedback appreciated...
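The marker-file check described above could be sketched roughly as follows. This is only an illustration, not the code in deploy_daemon_units(); the function name `osd_activation_kind` and the `osd_dir` parameter are assumptions made for this example.

```python
import os


def osd_activation_kind(osd_id, osd_fsid, osd_dir="/etc/ceph/osd"):
    """Return "simple" if the adoption marker file exists, else "lvm".

    Hypothetical helper: the name and the osd_dir parameter are not cephadm's.
    """
    marker = os.path.join(
        osd_dir, "%s-%s.json.adopted-by-cephadm" % (osd_id, osd_fsid)
    )
    # An adopted simple (ceph-disk style) OSD would get `chown` calls;
    # anything else would get `ceph-volume lvm activate` calls.
    return "simple" if os.path.exists(marker) else "lvm"
```

In this sketch a stale marker file left behind after a destroy and recreate would still yield "simple", which is the concern raised in the commit message.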
Jason Dillaman [Thu, 7 May 2020 20:25:50 +0000 (16:25 -0400)]
librbd: generic helper for tracking in-flight IO and managing flush requests
Layers that might queue some, but not all, IOs need to track every
in-flight IO to ensure that a flush cannot complete while an
older IO is still stuck in a queue.
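The idea can be illustrated with a small single-threaded sketch. It is not the librbd helper itself; the class and method names are hypothetical, and real code would also need locking and asynchronous completions.

```python
class InFlightTracker:
    """Toy in-flight IO tracker; class and method names are hypothetical."""

    def __init__(self):
        self._next_tid = 0       # monotonically increasing IO id
        self._in_flight = set()  # ids of IOs started but not yet finished
        self._flushes = []       # (barrier_tid, callback) pairs still waiting

    def start_io(self):
        tid = self._next_tid
        self._next_tid += 1
        self._in_flight.add(tid)
        return tid

    def finish_io(self, tid):
        self._in_flight.discard(tid)
        self._run_ready_flushes()

    def flush(self, on_complete):
        # The flush waits for every IO whose id is below the barrier.
        self._flushes.append((self._next_tid, on_complete))
        self._run_ready_flushes()

    def _run_ready_flushes(self):
        oldest = min(self._in_flight, default=None)
        ready, waiting = [], []
        for barrier, callback in self._flushes:
            if oldest is None or oldest >= barrier:
                ready.append(callback)
            else:
                waiting.append((barrier, callback))
        # Update state before invoking callbacks so they may call back in.
        self._flushes = waiting
        for callback in ready:
            callback()
```

A flush issued after IOs 0 and 1 completes only once both have finished, even if newer IOs are still in flight.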
Jason Dillaman [Tue, 14 Jul 2020 22:49:30 +0000 (18:49 -0400)]
crush/CrushWrapper: rebuild reverse maps after rebuilding crush map
The Objecter will crash when localized reads are enabled and two threads
attempt to rebuild the (invalidated) reverse maps concurrently. This
should address the issue for the Objecter use-case without the need to
add additional locking.
Fixes: https://tracker.ceph.com/issues/44311
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 8b866794f5b3674c5e3ad9adceb5e3230d55a0e0)
Switch the default path for the immutable object cache from "/tmp" to
"/tmp/ceph_immutable_object_cache" to prevent the deletion of all
files in the "/tmp" directory and sub-directories for unconfigured
daemons.
Jason Dillaman [Fri, 17 Jul 2020 14:25:20 +0000 (10:25 -0400)]
immutable-object-cache: fix error handling during start up
Previously the daemon would crash if it couldn't remove all the
files from the specified cache directory. It would also crash if
it could not access its specified domain socket file.
Fixes: https://tracker.ceph.com/issues/45169
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit f6dcb6c3dbd45574e1a7ca3f5e8f1f52a0d7ae1e)
Jason Dillaman [Wed, 22 Jul 2020 15:25:56 +0000 (11:25 -0400)]
librbd: flush all queued object IO from simple scheduler
Normally IO is tracked via the AioCompletion's async_op but the
scheduler will "complete" writes while the IO might be still
executing. Therefore, prior to shutting down this dispatch layer
we need to wait for all IO to complete.
Fixes: https://tracker.ceph.com/issues/46668
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 75ff8fd14dccaa7d2f11ba8a561ad3c0f410585c)
Matt Benjamin [Thu, 30 Apr 2020 22:59:11 +0000 (18:59 -0400)]
rgw: introduce safe user-reset-stats
Defines cls_user_reset_stats2, a value-returning cls operation
that sets new stats via progressive calls with an accumulator,
avoiding risk of excessive call runtime.
Fixes: https://tracker.ceph.com/issues/41080
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 25a82ed3795ecf6395e3d16fcbb4c29e478aa065)
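The "progressive calls with an accumulator" pattern might look roughly like this. It is a generic sketch, not cls_user_reset_stats2: the function names, the `(name, size)` entry shape, and the `max_entries` bound are all assumptions for illustration.

```python
def reset_stats_step(entries, marker, acc, max_entries=1000):
    """Process one bounded batch; return (new_marker, acc, truncated).

    Hypothetical stand-in for a value-returning operation: `entries` is a
    list of (name, size) pairs sorted by name, and `marker` is the last
    name already processed (None on the first call).
    """
    batch = [e for e in entries if marker is None or e[0] > marker]
    for name, size in batch[:max_entries]:
        acc = {"count": acc["count"] + 1, "bytes": acc["bytes"] + size}
        marker = name
    # More work remains if the batch was larger than the per-call bound.
    return marker, acc, len(batch) > max_entries


def reset_stats(entries, max_entries=1000):
    """Drive the bounded step until nothing is left (max_entries >= 1)."""
    marker, acc, truncated = None, {"count": 0, "bytes": 0}, True
    while truncated:
        marker, acc, truncated = reset_stats_step(
            entries, marker, acc, max_entries
        )
    return acc
```

Each call does a bounded amount of work and hands the running totals back to the caller, so no single call has to walk every entry.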
root [Fri, 26 Jun 2020 10:44:45 +0000 (12:44 +0200)]
rgw: fix double slash (//) killing the gateway
When a bucket is initialized as a static website, a curl request on the bucket with a double slash kills the gateway.
The problem is in the URL handling of the subdirectory, which tries to remove the last slash of any URL, so when only / is given as a sub-directory, this results in an empty string.
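The failure mode can be shown with a tiny sketch. This is not the RGW code; `strip_trailing_slash` is a hypothetical helper, and the guard shown is just one possible way to avoid producing an empty string.

```python
def strip_trailing_slash(subdir):
    """Drop one trailing slash, but never reduce "/" to an empty string.

    Hypothetical helper for illustration only.
    """
    if len(subdir) > 1 and subdir.endswith("/"):
        return subdir[:-1]
    return subdir
```

Without the length check, an input of "/" would become "", which is the case the commit message describes.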
Casey Bodley [Fri, 29 May 2020 16:31:16 +0000 (12:31 -0400)]
rgw: fix shutdown crash in RGWAsyncReadMDLogEntries
RGWAsyncReadMDLogEntries must not store pointers into coroutine memory,
because that memory is not guaranteed to outlive our call. Store these by
value instead, and have RGWReadMDLogEntriesCR::request_complete() copy/move
them back on completion.
When rgw_bucket_unlink_instance removes the last instance of a name, it
also clears the value of rgw_bucket_olh_entry.key. However, bucket index
resharding uses this key when choosing its shard placement, so an empty
key causes all of these olh entries to be misplaced in shard 0. After
reshard, all of the olh recovery/cleanup logic would be sent to the
correct shard, and these misplaced olh entries would never be cleaned
up.
Preserving the key's name on last unlink allows the olh entry to be
resharded correctly and cleaned up normally.
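Why an empty key ends up in shard 0 can be illustrated with a hash-modulo placement sketch. The hash here (`zlib.crc32`) is a stand-in chosen for this example and is not RGW's hash function; `shard_for_key` is a hypothetical name.

```python
import zlib


def shard_for_key(name, num_shards):
    """Pick a shard by hashing the key name (stand-in hash, not RGW's)."""
    return zlib.crc32(name.encode("utf-8")) % num_shards
```

With this stand-in, an empty name hashes to 0 and therefore always lands in shard 0, regardless of the shard an entry with its name preserved would have been assigned.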
* extract get_ragweed_branch() out of download() task, for better
readability.
* use a loop for retry when the first clone fails
* drop the `raise ValueError()` clause as it never happens. we could use
an assert() here, but i don't think it is necessary anyway.
* use sh() instead of run() for better readability.
* always set ragweed_repo. before this change this variable is
unbound if `force-branch` is set.
Soumya Koduri [Tue, 16 Jun 2020 12:40:08 +0000 (18:10 +0530)]
rgw: Empty reqs_change_state queue before unregistered_reqs
In RGWHTTPManager::manage_pending_request(), before unregistering
or unlinking the http requests, empty the reqs_change_state list
to avoid use after free.
ofriedma [Wed, 20 May 2020 16:07:03 +0000 (19:07 +0300)]
rgw: fix nginx-rgw docs
Signed-off-by: Or Friedmann <ofriedma@redhat.com>
Signed-off-by: Mark Kogan <mkogan@redhat.com>
(cherry picked from commit d73b879ac169c46f2dfeba0f4ca7f3a8af272a53)
Nginx (not cached):
real 0m24.714s
user 0m8.692s
sys 0m10.360s
Nginx (cached):
real 0m21.070s
user 0m9.140s
sys 0m10.316s
RGW:
real 0m21.859s
user 0m8.850s
sys 0m10.386s
The results show that for objects larger than 512K the cache improves performance by a factor of two or more.
For small objects, the overhead of sending the auth request makes the cache less efficient.
The result for cached objects in the 10MB test can be explained by the network limit of 25 Gb/s (it could reach more).
In Gdal (an image decoder/encoder over s3 using range requests) the results did not differ much, because Gdal encodes/decodes on a single CPU.
Gdal was chosen because it makes it possible to check the smart cache of nginx.
https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/
Jason Dillaman [Tue, 28 Jul 2020 01:14:18 +0000 (21:14 -0400)]
librbd: update hidden global config when setting pool config override
The new "dev"-level global config setting will be updated when any
pool-level config override is updated. librbd clients will detect
the new global-level config update and trigger a refresh. This avoids
the need for potentially tens of thousands of librbd clients
registering a watch on the pool metadata object or periodically polling
the pool metadata object for updates.
Fixes: https://tracker.ceph.com/issues/46694
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit f45df9fe786e8057c491c082e840483759d67e9e)
Conflicts:
src/common/options.cc
- "rbd_quiesce_notification_attempts", "rbd_default_snapshot_quiesce_mode", and
"rbd_plugins" options have not been backported to Octopus, yet
Jason Dillaman [Mon, 27 Jul 2020 19:31:09 +0000 (15:31 -0400)]
librbd: initial config watcher implementation
The config watcher will initially observe all "rbd_" configuration
updates received from the MON that have not been locally overridden
at the pool and/or image level.
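The filtering rule described above could be expressed as a short sketch. It is not the librbd config watcher; the function name `observed_updates` and the dict/set representation of config state are assumptions for illustration.

```python
def observed_updates(mon_updates, local_overrides):
    """Keep "rbd_" updates from the MON that are not overridden locally.

    Hypothetical helper: `mon_updates` maps option names to new values and
    `local_overrides` is the set of names overridden at pool or image level.
    """
    return {
        key: value
        for key, value in mon_updates.items()
        if key.startswith("rbd_") and key not in local_overrides
    }
```

For example, an update to an "rbd_" option that an image has overridden locally would be ignored, while other "rbd_" updates would be applied.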
rgw: policy: reuse eval_principal to evaluate the policy principal
This also accounts for the other edge case, when no Principal or a NotPrincipal
is supplied, which is already handled in the eval_principal function. The error
is also reraised as Effect::Pass, in line with the previous output, though an
Effect::Deny would also work here.
The commit adds 2 different parts to show the
Telemetry activation notification in the dashboard:
1. The Telemetry activation notification component
itself. It contains the definition of the
notification panel.
2. The Telemetry notification service. The service
is needed to be able to show/hide the
notification from:
* the component itself (e.g. when clicking the
button)
* the Telemetry configuration component (when
enabling/disabling Telemetry)
* the navigation component (to set the css-
classes accordingly)
Fixes: https://tracker.ceph.com/issues/45464
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit f7e457952ba5780b4b7fc3a1e4290c02d4e5d70d)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/telemetry/telemetry.component.ts
A form loading directive has been introduced in master: https://github.com/ceph/ceph/pull/34746
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.spec.ts
The test configuration has been improved in master: https://github.com/ceph/ceph/pull/34965
src/pybind/mgr/dashboard/frontend/src/app/shared/components/telemetry-notification/telemetry-notification.component.html
src/pybind/mgr/dashboard/frontend/src/app/shared/components/telemetry-notification/telemetry-notification.component.spec.ts
The alert component has been migrated from ngx-bootstrap to ng-bootstrap in master: https://github.com/ceph/ceph/pull/35297
TestBed.get has been replaced by TestBed.inject in master: https://github.com/ceph/ceph/pull/34934
src/pybind/mgr/dashboard/frontend/src/app/shared/services/telemetry-notification.service.spec.ts
TestBed.get has been replaced by TestBed.inject in master: https://github.com/ceph/ceph/pull/34934
The backport contains one commit less (39a26ae4b2a1b154c88a414c4abee1c04808effa
is missing) than the original pull request because the migration from alert to
ngb-alert (https://github.com/ceph/ceph/pull/35297) has not been backported.
Tatjana Dehler [Tue, 26 May 2020 13:41:02 +0000 (15:41 +0200)]
mgr/dashboard: reset pwd notification value
Reset the password notification value when the
user logs out. Otherwise, another user who logs in
and should not see the password expiration
notification (because their password is not going to
expire soon) will see a blank bar instead.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/components/pwd-expiration-notification/pwd-expiration-notification.component.html
The alert component has been migrated from ngx-bootstrap to ng-bootstrap in master: https://github.com/ceph/ceph/pull/35297
The API call is a task, and the response status is determined by whether
the call completes within a pre-defined duration (2 seconds) or not.
We should also accept the status returned when the call takes longer.
Jan Fajerski [Thu, 30 Jul 2020 09:46:00 +0000 (11:46 +0200)]
ceph-volume: dependency on python-ceph-common
Since e5b585d15de8b07e0a179344d4187582a5c069f2 ceph-volume depends on
python-ceph-common. This commit introduces this dependency for the
ceph-osd rpm (which includes ceph-volume) and installs the dependency
for tox runs.
Fixes: https://tracker.ceph.com/issues/46772
Fixes: e5b585d15de8b07e0a179344d4187582a5c069f2
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit cb432fe41d4ea8cb71aa592e0727d2da1978121f)