Ernesto Puerta [Mon, 11 May 2020 18:33:25 +0000 (20:33 +0200)]
mgr/dashboard: work with RBD images v1
Add support for RBD Image Format v1:
- This format lacks an ID field, which the dashboard requires. Instead,
the RBD image `block_name_prefix` is used as a unique ID (together with
pool id and namespace).
- Additionally, `image_format` is now exposed.
- In the front-end side:
- Copy action on a v1 image will cause the image to be copied to v2
format.
- List doesn't allow Move to Trash on v1 images,
- Details section now shows `image_format` for images,
- Edit Form disables flags not supported for v1 (`deep-flatten`,
`layering`, `exclusive-lock`).
- Protect does not work on v1 images or v2 images created from v1
ones.
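Since v1 images carry no ID of their own, the dashboard derives one from the
three fields above. A minimal Python sketch of that idea (the exact encoding
used by the dashboard is an assumption here):

    # hypothetical helper; the dashboard's exact encoding may differ
    def unique_image_id(pool_id, namespace, block_name_prefix):
        # pool id + namespace + block_name_prefix together identify a v1 image
        return '{}/{}/{}'.format(pool_id, namespace, block_name_prefix)

    print(unique_image_id(2, '', 'rb.0.1029.74b0dc51'))  # -> 2//rb.0.1029.74b0dc51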
Avan Thakkar [Tue, 25 Feb 2020 08:49:10 +0000 (14:19 +0530)]
mgr/dashboard: add popover list of managers in landing page
Fixes: https://tracker.ceph.com/issues/42979
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit cdfeb1d196c7d47340baae2be5910b90c889e778)
Conflicts:
src/pybind/mgr/dashboard/controllers/health.py
- removed a few lines, as those lines were removed in the master branch too
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard/health/health.component.html
- added the missing braces
mgr/dashboard: Prometheus query error in the metrics of Pools, OSDs and RBD images
Fixes: https://tracker.ceph.com/issues/45068
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 47b515c09496da8fc326300bab6618250466effe)
This commit adds dmcrypt support to `ceph-volume raw` mode.
Note about `ceph-volume raw list` change:
Given the `lsblk -J` (JSON output) option isn't available on all OSes, I came
up with adding the '--inverse' option to the existing command, which allows us
to get the mapper devices list in that command's output. Not listing root
devices containing partitions shouldn't have side effects since we are in the
`ceph-volume raw` context.
example:
running `lsblk --paths --nodeps --output=NAME --noheadings` doesn't allow us
to get the mapper devices list, because the output shows only the underlying
raw devices.
Adding `--inverse` is a trick to get around this issue; the counterpart is
that we can't list root devices if they contain at least one partition, but
this shouldn't be an issue in the `ceph-volume raw` context, given we only
deal with raw devices.
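A minimal sketch of such a listing, assuming a plain subprocess call (this is
not the actual ceph-volume code):

    import subprocess

    def list_mapper_devices():
        # with --inverse, mapper devices become top-level, so --nodeps
        # returns them while root devices holding partitions drop out
        out = subprocess.check_output(
            ['lsblk', '--paths', '--nodeps', '--inverse',
             '--output=NAME', '--noheadings'])
        return out.decode().split()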
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.
Reset the grace heartbeat period if there have been no failures for an
interval exceeding the set threshold value (48 hrs). The
mon_osd_laggy_halflife value is leveraged to calculate the threshold.
A couple of helper functions do the following:
- get_grace_interval_threshold():
Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int):
Checks if grace interval threshold is exceeded based on the last
down stamp.
- set_default_laggy_params(int):
Resets the laggy_probability and laggy_interval in the
new_xinfo structure maintained within pending_inc to be applied
eventually as part of update from paxos.
The threshold value is checked and the laggy parameters are reset at the
following point:
- encode_pending() - if an existing OSD reports a failure after an
interval exceeding the failure threshold period.
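A Python sketch of the helpers above (the real code is C++ in OSDMonitor;
deriving the threshold as 48x mon_osd_laggy_halflife is an assumption based
on the 48-hour figure and the halflife's one-hour default):

    # sketch only; the 48x factor is an assumption, see above
    def get_grace_interval_threshold(laggy_halflife_secs):
        return 48 * laggy_halflife_secs

    def grace_interval_threshold_exceeded(now, last_down_stamp, laggy_halflife_secs):
        # true when the osd has seen no failure since its last down stamp
        # for longer than the threshold
        return now - last_down_stamp > get_grace_interval_threshold(laggy_halflife_secs)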
Casey Bodley [Tue, 14 Jan 2020 14:42:52 +0000 (09:42 -0500)]
qa/rgw: remove test against hadoop v2.8.5
the hadoop branch rel/release-2.8.5 fails to build with:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:37 min
[INFO] Finished at: 2020-01-14T13:09:02Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-parallel-tests-dirs) on project hadoop-aws: An Ant BuildException has occured: Unable to create javax script engine for javascript
Casey Bodley [Tue, 26 May 2020 19:03:03 +0000 (15:03 -0400)]
rgw: sanitize newlines in s3 CORSConfiguration's ExposeHeader
the values in the <ExposeHeader> element are sent back to clients in an
Access-Control-Expose-Headers response header. if the values are allowed
to have newlines in them, they can be used to inject arbitrary response
headers.
this issue only affects s3, which gets these values from an xml document.
in swift, they're given in the request header
X-Container-Meta-Access-Control-Expose-Headers, so the value itself
cannot contain newlines.
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Reported-by: Adam Mohammed <amohammed@linode.com>
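The sanitization idea, illustrated in Python (rgw itself is C++; this is only
a sketch of the stripping, not the actual rgw code):

    def sanitize_expose_header(value):
        # strip CR/LF so the value cannot terminate the header and start
        # a new, attacker-controlled one
        return value.replace('\r', '').replace('\n', '')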
Nathan Cutler [Wed, 24 Jun 2020 19:08:40 +0000 (21:08 +0200)]
doc: PendingReleaseNotes: clean slate for 14.2.11
All of these Pending Release Notes have been included in the official
14.2.10 Release Notes, so keeping them in this file any longer would be
counterproductive.
Alfonso Martínez [Thu, 23 Jan 2020 10:16:27 +0000 (11:16 +0100)]
ceph.spec.in: fix 'make check' deps for centos8
When running 'FOR_MAKE_CHECK=1 ./install-deps.sh' on CentOS 8,
these dependencies were not being installed.
Missing dependencies are provided by
https://copr.fedorainfracloud.org/coprs/ktdreyer/ceph-el8/
Kefu Chai [Tue, 24 Dec 2019 05:17:55 +0000 (13:17 +0800)]
ceph.spec.in: re-enable "make check" deps for el8
this change partially reverts e92cb7a0: these packages are now
available in AppStream, BaseOS or PowerTools in el8, so in this change
they are re-enabled.
qa/test_exports: fix TestExports failure under new python3 compatibility changes
self.mount_a.client_remote.sh() returns a 'str' object rather than a StringIO object, so p.stdout.getvalue() produces an error. This commit fixes that, and also fixes a str/bytes mismatch: bytes and str were the same object in Python 2, but this is not the case in Python 3.
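The shape of the change, sketched (names follow the commit text; the actual
test code differs):

    # Python 2 era: run() filled a StringIO, read via p.stdout.getvalue()
    # Python 3 / current API: sh() already returns the output as a str
    def read_mount_output(mount):
        out = mount.client_remote.sh(['ls', '/mnt'])  # already a str
        return out.strip()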
mgr/volumes: Create subvolume with isolated rados namespace
1. Add --namespace-isolated option to 'subvolume create' command
to create subvolume in a separate RADOS namespace
2. Add "pool_namespace" field to 'subvolume info' command
which displays the rados namespace if set else empty string
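Example usage (volume/subvolume names are illustrative; the namespace value
shown is an assumption):

    import json, subprocess

    subprocess.check_call(['ceph', 'fs', 'subvolume', 'create',
                           'cephfs', 'sub0', '--namespace-isolated'])
    info = json.loads(subprocess.check_output(
        ['ceph', 'fs', 'subvolume', 'info', 'cephfs', 'sub0']))
    print(info['pool_namespace'])  # e.g. 'fsvolumens_sub0'; '' if not isolated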
Jan Fajerski [Tue, 31 Mar 2020 14:07:45 +0000 (16:07 +0200)]
ceph-volume: add and delete lvm tags in a single lvchange call.
Otherwise we can end up in racy situations when a concurrent c-v call
sees only one tag but expects all tags to be present. Say if the
ceph.type tag is present, c-v expects ceph.osd_id to be present. By
setting/deleting tags in bulk, we use lvchange (and LVM's internal
locking) as a sync mechanism.
Fixes: https://tracker.ceph.com/issues/44852
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 20ecc309371e53fda5d6a5b6cf6de6110dbe5497)
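A sketch of the bulk update: one lvchange invocation carries every
--addtag/--deltag (not the actual ceph-volume code):

    import subprocess

    def update_lv_tags(lv_path, add_tags, del_tags):
        cmd = ['lvchange']
        for tag in del_tags:          # e.g. 'ceph.osd_id=0'
            cmd.extend(['--deltag', tag])
        for tag in add_tags:          # e.g. 'ceph.osd_id=1'
            cmd.extend(['--addtag', tag])
        cmd.append(lv_path)
        # a single call, so LVM's internal locking keeps the tag set consistent
        subprocess.check_call(cmd)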
Stephan Müller [Fri, 24 Jan 2020 15:36:48 +0000 (16:36 +0100)]
mgr/dashboard: Prevent dashboard breakdown on bad pool selection
The problem was that if a pool was created that outsizes the max
available OSDs, the pool gets stuck in the "creating+incomplete" PG
state. If this pool is then selected to get its details, the method the
dashboard calls to get the pool's RBD configuration will get stuck, and
therefore the dashboard gets stuck.
This is the related underlying bug:
https://tracker.ceph.com/issues/43771
ATM this is only a workaround; it won't fix the underlying problem, it
will just ensure that the dashboard won't call the method if the pool's
PG state is in the mentioned state (see the sketch after the conflict
notes below).
Fixes: https://tracker.ceph.com/issues/43765
Signed-off-by: Stephan Müller <smueller@suse.com>
(cherry picked from commit e174b91d6b7670ed575577ddff18edc354be69fb)
Conflicts:
src/pybind/mgr/dashboard/services/rbd.py
- Import for ceph_service was missing
- Filters out 'unknown' state instead of 'incomplete' state, as pools
in nautilus wait for PGs to be there.
src/pybind/mgr/dashboard/services/ceph_service.py
- Import conflict
src/pybind/mgr/dashboard/tests/test_rbd_service.py
- Test file wasn't there before
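The shape of the workaround, sketched (the guard's exact placement in the
dashboard code differs; fetch_rbd_config is a hypothetical stand-in):

    def pool_rbd_configuration(pool):
        # skip pools whose PGs are not ready; calling in would hang
        if 'incomplete' in pool['pg_status'] or 'unknown' in pool['pg_status']:
            return []
        return fetch_rbd_config(pool['pool_name'])  # hypothetical helper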
simon gao [Tue, 10 Sep 2019 09:57:25 +0000 (05:57 -0400)]
mds: add config to require forward to auth MDS
If mds_forward_all_requests_to_auth is set to true, an MDS is forbidden from loading non-auth inodes, and the auth MDS will not send replica MDS info through the function set_trace_dist, so the client will only send requests to an inode's auth MDS.
(cherry picked from commit 7d42df0)
Signed-off-by: simon gao <simon29rock@gmail.com>
Conflicts:
src/mds/MDCache.h
- There were conflicts involving master-only definitions between the added
lines. The master-only code was removed and the PR's changes were kept.
bool forward_all_requests_to_auth was moved to the appropriate position;
the surrounding code block in the "theirs" section was already repeated
elsewhere in the file.
Xiubo Li [Mon, 1 Jun 2020 01:57:24 +0000 (21:57 -0400)]
qa/tasks/cephfs/test_scrub.py: use umount_wait to avoid ceph-fuse stuck
If the ceph-fuse client needs to flush the caps and waits synchronously,
the umount() will just return successfully; the netns container will then
be destroyed and the network will not be reachable, but the ceph-fuse
daemon is still stuck waiting for the flush-caps ack.
This causes the ceph-fuse daemon to get stuck forever, and if the
mds daemons get restarted, they will try to reconnect the clients,
but the stuck ceph-fuse daemon won't reply, because it is
not reachable any more.
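In the test, the fix amounts to (sketch):

    # teardown sketch: block until ceph-fuse has really exited before the
    # netns goes away, instead of a plain umount()
    def teardown(mount):
        mount.umount_wait()   # was: mount.umount()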
mds: preserve ESlaveUpdate::OP_PREPARE logevent before doing commit
Fixes: https://tracker.ceph.com/issues/45024
Signed-off-by: songxinying <songxinying@sensetime.com>
(cherry picked from commit 4940ab62e0d19ce36e53bcc67b2a2161c47f6c6d)
Conflicts:
src/mds/MDCache.cc
- use MMDSResolve::create() in nautilus, instead of make_message<MMDSResolve>()
src/mds/MDCache.h
src/mds/Mutation.h
- in nautilus, these two files are structured differently from master (large
chunks of the master code are missing in nautilus, ordering of code is
different also)
src/mds/Server.cc
- use nautilus equivalent instead of "make_message<MMDSSlaveRequest>"
Conflicts:
qa/tasks/cephfs/cephfs_test_case.py
- RuntimeError call has different number of arguments in nautilus, but
this difference is not relevant to this backport
Jeff Layton [Fri, 17 Apr 2020 13:55:41 +0000 (09:55 -0400)]
client: add a new inode release request callback
trim_caps() walks the list of caps on the session, releases
non-auth caps, and attempts to trim dentries until the cache
size is under the max_caps value requested by the MDS.
This is fine for FUSE, but doesn't really match the use-case of
nfs-ganesha. Ganesha typically looks up inodes by inode number, not
by dentry. It's quite possible that after a restart, we may have a
ton of outstanding inodes with no dentries associated with them.
Ganesha holds a reference to each inode, so libcephfs can't release
them, and we don't have a way to request that ganesha do so.
Add a new ino_release_callback and finisher. The intent is to allow
libcephfs to "upcall" to the application and request that it release
references to a specific inode.
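The application side of the upcall, sketched in Python (the real callback is
registered through the libcephfs C API; names here are hypothetical):

    inode_refs = {}  # ino -> reference held on behalf of the application

    def ino_release_cb(ino):
        # libcephfs asks us to drop our reference to this inode
        ref = inode_refs.pop(ino, None)
        if ref is not None:
            ref.release()  # hypothetical release of the held handle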
Jeff Layton [Tue, 28 Apr 2020 18:00:13 +0000 (14:00 -0400)]
test: add a new program for testing ino_release_cb
Create a bunch of files and get their inode numbers. Remount, look them
all up by inode number and hold references. Stop looking up inodes as
soon as we get a callback from libcephfs. If we got the callback, return
success. Fail otherwise.
Since this has the same cluster setup as the other client_trim_caps
testcase, we can piggyback onto that task.
Jeff Layton [Tue, 21 Apr 2020 12:50:54 +0000 (08:50 -0400)]
client: only override umask_cb with non-NULL values
Client::init sets this, but if we later call ll_register_callbacks again
with a new set of function pointers that has umask_cb set to nullptr,
it'll override the value in the cmount.
Only reset umask_cb if the one in args is not nullptr.
Conflicts:
doc/cephfs/administration.rst
- nautilus has "filesystems" where master has "file systems"
- a difference that is not relevant to this backport
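The umask_cb guard above, rendered in Python (the actual fix is in the C++
client; names are illustrative):

    def merge_callbacks(client, args):
        # keep the existing umask_cb unless the caller supplied a new one
        if args.umask_cb is not None:
            client.umask_cb = args.umask_cb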
Jason Dillaman [Thu, 28 May 2020 20:38:40 +0000 (16:38 -0400)]
librbd: Watcher should not attempt to re-watch after detecting blacklisting
Currently, the Watcher state machine will spin as fast as it can sending
re-watch requests to the OSD and then retrying after it fails with the
EBLACKLISTED error. Treat a blacklisting similarly to how removal of the
object is treated: stop attempting to re-watch.
Fixes: https://tracker.ceph.com/issues/45715
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 6be1d49c35be4c937664939947a52f33696b0d8f)
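The retry decision, sketched in Python (librbd is C++; EBLACKLISTED's numeric
value is Ceph-specific and assumed here):

    import errno

    EBLACKLISTED = 108  # Ceph-specific alias of ESHUTDOWN; value assumed

    def should_rewatch(err):
        # stop on blacklisting just as on object removal
        return err not in (errno.ENOENT, EBLACKLISTED)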