Or Ozeri [Tue, 9 Mar 2021 20:14:49 +0000 (22:14 +0200)]
librbd: crypto format api semantics change
This commit alters the semantics of the encryption format api
to also load the encryption after format completes.
Additionally, several other small changes in librbd crypto are included,
in preparation for supporting clone formatting.
Jason Dillaman [Wed, 3 Mar 2021 19:38:35 +0000 (14:38 -0500)]
qa/objectstore: reduce debug log levels for bluestore
This will help speed up teuthology jobs for non-RADOS suites
where previously tests were IO bound due to excessive logging
and the artifact collection was slowed due to very large OSD
logs.
Jason Dillaman [Wed, 3 Mar 2021 19:26:38 +0000 (14:26 -0500)]
qa/suites: move RADOS tests to use new debug log objectstores
This will retain the debug log settings for all RADOS suites
that were previously symlinked to the 'objectstore'
directory. The next commit will reduce the debug log level
for the original 'objectstore' directory for the remainder
of tests.
Kefu Chai [Sat, 6 Mar 2021 16:32:42 +0000 (00:32 +0800)]
.github: correct the regex in milestone workflow
also use pull_request_target event so the action is run in the
context of the base of the pull request. this helps us to overcome
the "Resource not accessible by integration" issue where the action
is run in the context of the pull request.
Sage Weil [Mon, 8 Mar 2021 15:42:59 +0000 (09:42 -0600)]
Merge PR #39856 into pacific
* refs/pull/39856/head:
qa/distro/ubuntu_20.04_podman: Avoid getting asked
qa/suites/rados/cephadm: drop centos/rhel cephadm tests for the moment
qa/suites/rados/cephadm/thrash: rename 3-tasks.yaml/ -> 3-tasks/
qa/suites/rados/cephadm: adjust distros
qa/suites/upgrade: use kubic; test all distros
qa/suites/rados/cephadm/upgrade: use kubic on centos
qa: new kubic distro files; use kubic podman for centos/rhel
qa/suites/rados/cephadm: Add 20.04 podman:testing
Sage Weil [Wed, 3 Mar 2021 14:14:29 +0000 (08:14 -0600)]
qa: new kubic distro files; use kubic podman for centos/rhel
The current centos/rhel version of podman (2.2.1) is broken.
- create new qa/distros/podman/* files that install kubic podman
- include centos/rhel variants
- adjust cephadm jobs to use new yaml files
- remove old qa/distros/all/*_podman.yaml files
Sage Weil [Thu, 4 Mar 2021 21:08:22 +0000 (15:08 -0600)]
Merge PR #39737 into pacific
* refs/pull/39737/head:
mgr/DaemonServer: osd ok-to-stop: return json when there are unknown PGs
doc/man/8/ceph: document --max option
src/test/osd/safe-to-destroy: adjust test
ceph: print command output to stdout even on error
mgr/DaemonServer: include details in 'osd ok-to-stop' output
mgr: add --max <n> to 'osd ok-to-stop' command
mgr: relax osd ok-to-stop condition on degraded pgs
Sage Weil [Thu, 4 Mar 2021 18:49:37 +0000 (12:49 -0600)]
Merge PR #39736 into pacific
* refs/pull/39736/head:
crush/CrushWrapper: rebuild shadow tree on 'osd crush reweight-subtree'
crush/CrushWrapper: update shadow trees on update_item()
Sage Weil [Thu, 4 Mar 2021 13:35:24 +0000 (08:35 -0500)]
mgr/DaemonServer: osd ok-to-stop: return json when there are unknown PGs
In 791952cc01201010f298033003ba52374cc0159f we switched to returning JSON
on both success and failure to describe which PGs are affected or are blocking
the ability to stop/restart OSDs. Do the same for the case where
some PG states are unknown (i.e., just after a mgr restart) so that
the cephadm upgrade process can unconditionally expect a JSON result.
Sage Weil [Thu, 18 Feb 2021 14:27:49 +0000 (08:27 -0600)]
mgr/devicehealth: extract+store wear level from metrics scraping
When we scrape and store health metrics for a device, extract the wear
level from the JSON. If present, also store it in the config-key
per-device metadata.
Sage Weil [Mon, 8 Feb 2021 18:53:24 +0000 (12:53 -0600)]
common/blkdev: collect non-SMART data too
Call smartctl with -x instead of -a:
  -a, --all
      Prints all SMART information about the disk, or TapeAlert information
      about the tape drive or changer. For ATA devices this is equivalent to
      '-H -i -c -A -l error -l selftest -l selective'
      and for SCSI, this is equivalent to
      '-H -i -A -l error -l selftest'.
      For NVMe, this is equivalent to
      '-H -i -c -A -l error'.
      Note that for ATA disks this does not enable the non-SMART
      options and the SMART options which require support for 48-bit
      ATA commands.

  vs

  -x, --xall
      Prints all SMART and non-SMART information about the device. For
      ATA devices this is equivalent to
      '-H -i -g all -g wcreorder -c -A -f brief -l xerror,error -l xselftest,selftest -l selective -l directory -l scttemp -l scterc -l devstat -l defects -l sataphy'.
      and for SCSI, this is equivalent to
      '-H -i -g all -A -l error -l selftest -l background -l sasphy'.
      For NVMe, this is equivalent to
      '-H -i -c -A -l error'.
Sage Weil [Fri, 26 Feb 2021 16:42:52 +0000 (11:42 -0500)]
mon/ConfigMonitor: make config changes via KVMonitor's pending set
We need to ensure that changes we make to the kv store (config/...)
are proposed via KVMonitor so that they are properly versioned there
and shared with subscribers (notably, the mgr).
Fixes: bb7ebc41532aeb23cff2241ab07b3f01c2f57ddd
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit dab72abd0ae8a3038f73dbe0983b2eaef3937ef6)
myoungwon oh [Tue, 16 Feb 2021 05:42:44 +0000 (14:42 +0900)]
osd, test: wait if the snapshot is deleting
After calling selfmanaged_snap_remove, we don't know
when snapshot trimming has finished.
So we make the OSD return EBUSY if the snapshot is in removed_snap_queue,
and the unit test then waits for the trimming to complete.
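The waiting side of this change can be pictured as a retry loop that keeps reissuing an operation while it reports EBUSY. This is a minimal sketch, not the actual unit test: `wait_while_busy`, the attempt limit and the delay are hypothetical, and it assumes the operation returns a negative errno, as Ceph return codes conventionally do.

```python
import errno
import time
from typing import Callable


def wait_while_busy(op: Callable[[], int], max_attempts: int = 50,
                    delay_s: float = 0.01) -> int:
    """Retry op() while it reports -EBUSY (snapshot still being trimmed).

    Returns the first result that is not -EBUSY; raises TimeoutError if the
    operation is still busy after max_attempts calls.
    """
    for _ in range(max_attempts):
        ret = op()
        if ret != -errno.EBUSY:
            return ret
        time.sleep(delay_s)
    raise TimeoutError(f"still busy after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical stand-in for the OSD op: busy twice, then succeeds.
    results = iter([-errno.EBUSY, -errno.EBUSY, 0])
    print(wait_while_busy(lambda: next(results)))
```

Bounding the number of attempts keeps a test from hanging forever if trimming never completes.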
myoungwon oh [Mon, 18 Jan 2021 03:16:32 +0000 (12:16 +0900)]
src/test: fix to avoid fail notification when testing manifest refcount
Because the manifest snap refcounting design tolerates false positives,
a message that decrements the refcount can go missing.
This commit checks whether the manifest object's state is correct
when such a mismatch happens, to avoid aborting the unit test.
J. Eric Ivancich [Fri, 29 Jan 2021 17:03:50 +0000 (12:03 -0500)]
rgw: add rgw-gap-list-comparator tool
The rgw-gap-list tool can produce a number of false positives when the
cluster is being used during its run. One technique to minimize the
number of false positives is to run the tool twice and look for the
objects that appear in both lists. The rgw-gap-list-comparator tool is
designed to do this comparison.
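The comparison amounts to intersecting the two result lists. A minimal sketch, assuming each run's output is one entry per line. The function name `common_entries` is hypothetical, and this is not the tool's actual implementation:

```python
from typing import Iterable, List


def common_entries(first_run: Iterable[str], second_run: Iterable[str]) -> List[str]:
    """Return entries reported by both runs, sorted for stable output.

    Blank lines are ignored and surrounding whitespace is stripped.
    """
    first = {line.strip() for line in first_run if line.strip()}
    second = {line.strip() for line in second_run if line.strip()}
    return sorted(first & second)


if __name__ == "__main__":
    run1 = ["obj_a\n", "obj_b\n", "obj_c\n"]
    run2 = ["obj_c\n", "obj_a\n", "obj_d\n"]
    for name in common_entries(run1, run2):
        print(name)  # prints obj_a, then obj_c
```

An object that only looked missing because of concurrent writes during one run is unlikely to appear in both lists, which is why the intersection has fewer false positives.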
J. Eric Ivancich [Thu, 17 Dec 2020 23:21:36 +0000 (18:21 -0500)]
rgw: add rgw-gap-list tool
Due to a prior bug (pr: 38228) tail rados objects of some RGW objects
could have been incorrectly deleted. This tool is designed to look for
such cases. It essentially does the opposite of rgw-orphan-list,
looking for rados objects that RGW expects to be there, but which are
not to be found.
IMPORTANT: This is very experimental at this point in time, and any
"results" produced should be verified by other means.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: Michael Kidd <linuxkidd@gmail.com>
(cherry picked from commit 07b42195fbbcd27e330cb1daa35e77e0952f8a3c)
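The relationship between rgw-gap-list and rgw-orphan-list can be sketched as two set differences between the rados objects RGW expects and the ones actually present in the pool. This sketch only illustrates the concept. The function names are hypothetical and the real tools gather and compare their data differently.

```python
from typing import Iterable, List


def gap_list(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Rados objects that RGW expects but that are missing from the pool."""
    return sorted(set(expected) - set(present))


def orphan_list(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Rados objects in the pool that RGW does not reference (the opposite check)."""
    return sorted(set(present) - set(expected))
```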
A new command-line option "--rgw-obj-fs" is added to
radosgw-admin. When used with the "bucket radoslist" subcommand, it
outputs lines containing a rados object, bucket name, and object name,
separated by the specified field separator. Without this command-line
option, only the rados object is output, which is the previous
behavior.
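A consumer of this output might split each line into its three fields, as in the sketch below. It assumes that the separator does not occur in the rados object or bucket name (a tab is used here purely as an example) and that any extra separators belong to the object name. `parse_radoslist_line` and `RadosListEntry` are hypothetical names, not part of radosgw-admin.

```python
from typing import NamedTuple, Optional


class RadosListEntry(NamedTuple):
    rados_object: str
    bucket: Optional[str]
    object_name: Optional[str]


def parse_radoslist_line(line: str, sep: Optional[str] = None) -> RadosListEntry:
    """Parse one line of `radosgw-admin bucket radoslist` output.

    With sep=None (option not given) the line holds only the rados object.
    With a separator, the line is split into at most three fields; any further
    occurrences of sep stay in the last field (the object name). A line with
    fewer than three fields raises ValueError.
    """
    line = line.rstrip("\n")
    if sep is None:
        return RadosListEntry(line, None, None)
    rados_object, bucket, object_name = line.split(sep, 2)
    return RadosListEntry(rados_object, bucket, object_name)
```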
Adam Kupczyk [Sat, 30 Jan 2021 11:57:05 +0000 (12:57 +0100)]
os/bluestore: Add option to check BlueFS reads
Add option "bluefs_check_for_zeros" to check whether any zero-filled pages were read.
If so, reread the data. BlueStore is known to occasionally get such pages.
See "bluestore_retry_disk_reads".
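The check can be pictured as scanning the buffer page by page and rereading if any page is entirely zero. This is a minimal sketch, not BlueFS code. It assumes a 4096-byte page and a caller-supplied `read_fn`, and both helper names are hypothetical. Only full pages are examined, and the number of rereads is bounded because legitimately zero data would otherwise trigger endless retries.

```python
from typing import Callable

PAGE_SIZE = 4096  # assumed page size for this sketch


def has_zero_page(buf: bytes, page_size: int = PAGE_SIZE) -> bool:
    """True if any full page-sized chunk of buf consists only of zero bytes."""
    zero_page = bytes(page_size)
    for off in range(0, len(buf) - page_size + 1, page_size):
        if buf[off:off + page_size] == zero_page:
            return True
    return False


def read_checked(read_fn: Callable[[], bytes], max_rereads: int = 1) -> bytes:
    """Read, then reread up to max_rereads times while a zero-filled page is seen."""
    data = read_fn()
    for _ in range(max_rereads):
        if not has_zero_page(data):
            break
        data = read_fn()
    return data
```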
if we are creating an osd which has the same id as a previously
removed 'in' osd, we should not mark this newly created osd as 'in'
This isn't actually a good idea, however. If we are creating (or reusing)
a new OSD id, the OSD that starts up will have no data. So no matter what
there will be a data migration from the before state to the final state.
If we mark the osd OUT when the osd id is allocated but before the OSD
starts up, we'll create an intermediate state where PGs are mapped to the id
(by virtue of the CRUSH weight) and then remapped away (due to out), during
which a bunch of PGs will repeer and maybe data will move.
Instead, we have two cases:
1) If we are reusing a DESTROYED osd id, we should leave the in/out
state the way it was. This way we still go straight from the before
state to the after state (the osd will mark itself in when it starts up).
2) If we are allocating a new id in do_osd_create(), we want the OSD
to be IN, so there is no middle state. Unfortunately, we have to work
around apply_incremental() being obnoxious here: its sloppy implementation
will implicitly set EXISTS by virtue of new_osd_weight (the mark IN part)
before applying the osd_state XOR, so be careful! (This behavior is
mirrored by the Linux kernel implementation too, thankfully.)
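The ordering pitfall in case 2 can be reproduced with a toy model. A new weight sets EXISTS first, and the state XOR is applied afterwards, so an EXISTS bit in the XOR mask toggles the flag straight back off. This is a hypothetical simplification of the behavior described above, not Ceph's apply_incremental(). The flag value is illustrative.

```python
from typing import Optional

# Illustrative state bit for this sketch, loosely modeled on the OSD EXISTS flag.
EXISTS = 0x1


def apply_incremental(state: int, new_weight: Optional[float], xor_mask: int) -> int:
    """Toy model of the ordering described above (hypothetical helper).

    A new weight implicitly sets EXISTS first; the XOR mask is applied after,
    so an EXISTS bit in the mask toggles the freshly set flag back off.
    """
    if new_weight is not None:
        state |= EXISTS
    return state ^ xor_mask


if __name__ == "__main__":
    new_osd = 0  # brand-new id: no state bits set yet
    # Pitfall: marking IN (weight) and also XOR-ing in EXISTS cancels out.
    print(apply_incremental(new_osd, 1.0, EXISTS) & EXISTS)  # 0
    # Safe: rely on the weight to set EXISTS and leave it out of the mask.
    print(apply_incremental(new_osd, 1.0, 0) & EXISTS)       # 1
```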
Kefu Chai [Wed, 10 Feb 2021 08:30:49 +0000 (16:30 +0800)]
mgr/rbd_support: bail out when snapshot mirroring is not enabled
before this change, we continued on and tried to get the mirror info of
the specified image after calling close_image(), even if snapshot mirroring
is not enabled.
after this change, we bail out after calling close_image(). this
behavior is consistent with other places where we handle errors.