qa/*/osd-scrub-repair.sh: don't fail if PG is in active+clean+wait
a0b453ad335671bd92f165115d6ee984d2412448 added the wait state, which can
make PGs stay in active+clean+wait for a while instead of going into
active+clean directly. As far as TEST_auto_repair_bluestore_failed is
concerned, we only care about the repair state being cleared.
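The relaxed condition can be sketched as a small predicate. This is a hypothetical Python helper. The real test is a shell script, and its exact check is not shown here. The helper accepts extra flags such as `wait` and only insists that the PG is active+clean with no `repair` flag left.

```python
def repair_state_cleared(pg_state: str) -> bool:
    """Hypothetical check: PG is active+clean and the repair flag is gone.

    Extra flags such as 'wait' are tolerated instead of being treated as failure.
    """
    flags = set(pg_state.split("+"))
    return {"active", "clean"} <= flags and "repair" not in flags


for state in ("active+clean", "active+clean+wait", "active+clean+repair"):
    print(state, repair_state_cleared(state))
```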
Jason Dillaman [Wed, 22 Apr 2020 17:51:58 +0000 (13:51 -0400)]
rbd-mirror: skip snapshot image sync if mirror snapshot is marked clean
This is currently only utilized for the case where a newly created image
has mirroring enabled at time of creation, but it could be expanded in the
future if we track writes.
Fixes: https://tracker.ceph.com/issues/44596
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit fbede28d1a7713248765cf91136bb19b3b3fdac2)
Jason Dillaman [Wed, 22 Apr 2020 15:57:45 +0000 (11:57 -0400)]
librbd: EnableRequest now accepts a boolean to indicate a clean image
If the image is clean, it's treated as if it was newly created and
therefore clean since snapshot id 0. The CreateRequest and
CloneRequest state machines pass true for this bool if mirroring
is being enabled during creation.
Jason Dillaman [Mon, 20 Apr 2020 22:16:40 +0000 (18:16 -0400)]
librbd: mirror enable state machine might need to open image
If attempting to create a snapshot-based mirroring primary snapshot,
the image needs to first be opened. If we weren't supplied an image,
open the image, create the snapshot, and close the image again.
Jason Dillaman [Mon, 20 Apr 2020 19:23:53 +0000 (15:23 -0400)]
librbd: pass bit-flags to image::CreateRequest
The current boolean for skipping mirror enablement can then be changed to
a tri-state that can also force-enable mirroring (in addition to the current
auto-enable when the pool is in pool mode).
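Here is a minimal sketch of how bit-flags can express that tri-state. The flag names and the resolution order are hypothetical and are not librbd's actual constants. The "skip" flag wins, "force" enables mirroring regardless of the pool mode, and with no flags set the code falls back to the pool-mode auto-enable.

```python
from enum import IntFlag


class CreateFlags(IntFlag):
    """Hypothetical create flags; names are illustrative, not librbd's."""
    NONE = 0
    SKIP_MIRROR_ENABLE = 1 << 0   # never enable mirroring on create
    FORCE_MIRROR_ENABLE = 1 << 1  # enable mirroring regardless of pool mode


def should_enable_mirror(flags: CreateFlags, pool_mode_enabled: bool) -> bool:
    """Resolve the tri-state: skip, force, or fall back to pool-mode auto-enable."""
    if flags & CreateFlags.SKIP_MIRROR_ENABLE:
        return False
    if flags & CreateFlags.FORCE_MIRROR_ENABLE:
        return True
    return pool_mode_enabled
```

Flags compose with `|`, so further creation options can be added later without changing the request's signature again.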
Or Friedmann [Sun, 8 Mar 2020 13:34:48 +0000 (15:34 +0200)]
rgw: Disable prefetch of entire head object when GET request with range header
Disable prefetching of the entire head object for GET requests that carry a Range header.
Currently, RGW fetches the whole head object even when the client asks for only a small byte range.
For example, if the client asks for bytes=0-1, RGW still fetches bytes 0-4194304.
Fixes: https://tracker.ceph.com/issues/44508
Signed-off-by: Or Friedmann <ofriedma@redhat.com>
(cherry picked from commit 2be5af0006169cb54547034aa98b7eacb8751d59)
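The difference in read size can be sketched in Python. This is an illustration, not RGW code. It assumes a single `bytes=first-last` or `bytes=first-` range and takes the 4194304-byte head prefetch size from the example above. With a Range header, only the requested span is read.

```python
import re
from typing import Optional, Tuple

HEAD_PREFETCH = 4194304  # prefetch size quoted in the commit message


def parse_byte_range(header: str, object_size: int) -> Tuple[int, int]:
    """Parse 'bytes=first-last' (inclusive) or 'bytes=first-' into (offset, length)."""
    m = re.fullmatch(r"bytes=(\d+)-(\d*)", header.strip())
    if m is None:
        raise ValueError(f"unsupported range: {header!r}")
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else object_size - 1
    last = min(last, object_size - 1)
    if first > last:
        raise ValueError(f"unsatisfiable range: {header!r}")
    return first, last - first + 1


def bytes_to_read(range_header: Optional[str], object_size: int) -> int:
    """Read only the requested span when a Range header is present."""
    if range_header is None:
        return min(HEAD_PREFETCH, object_size)
    return parse_byte_range(range_header, object_size)[1]
```

For `bytes=0-1` this reads 2 bytes instead of the full 4 MiB head chunk.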
Introduced in 4d42b4c5a0ed ("common/TextTable: default to 2 spaces
separating columns") and 41f003518a07 ("common/TextTable: only pad
between columns").
Jason Dillaman [Fri, 17 Apr 2020 15:17:05 +0000 (11:17 -0400)]
rbd-mirror: track in-flight start/stop/restart in instance replayer
Shutdown waits for in-flight ops to complete, but the
start/stop/restart operations were previously not tracked. This
could cause a race, and a crash, between an image replayer
operation and the instance replayer shutting down.
Fixes: https://tracker.ceph.com/issues/45072
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 31140a940ea1909c4b5d68ef4593cb582a527354)
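The tracking pattern can be sketched with a counter guarded by a condition variable. This is a Python illustration of the idea, not the C++ InstanceReplayer code. Every start/stop/restart registers itself before running. Shutdown refuses new ops and then waits until the count drops to zero.

```python
import threading


class InFlightTracker:
    """Count in-flight operations so shutdown can wait for all of them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._shutting_down = False

    def start_op(self) -> bool:
        """Register an op; refuse new ops once shutdown has begun."""
        with self._cond:
            if self._shutting_down:
                return False
            self._in_flight += 1
            return True

    def finish_op(self) -> None:
        """Mark an op as done and wake any waiting shutdown."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def shut_down(self) -> None:
        """Block until every tracked op (e.g. start/stop/restart) has finished."""
        with self._cond:
            self._shutting_down = True
            self._cond.wait_for(lambda: self._in_flight == 0)
```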
Jason Dillaman [Mon, 6 Apr 2020 20:21:35 +0000 (16:21 -0400)]
rbd-mirror: propagate full snap-seq mapping in non-primary snapshots
Previously only newly created user snapshots were included in the
non-primary snapshot snap-seq mapping table. However, we need to
retain a full history of the mapping table if we want to be able to
prune non-primary snapshots.
Failovers are a special case since we won't have a valid snap seq mapping
so it will need to be rebuilt. Luckily, both sides should be read-only
in the previous state so we can use the snapshot names to find matches.
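Rebuilding the mapping after a failover can be sketched as a name join. This is a hypothetical helper that assumes each side's snapshots are available as `{snap_id: snap_name}`. Remote snapshot ids map to local ids wherever the names match, and unmatched snapshots are left out.

```python
from typing import Dict


def rebuild_snap_seq_map(remote_snaps: Dict[int, str],
                         local_snaps: Dict[int, str]) -> Dict[int, int]:
    """Map remote snap ids to local snap ids by matching snapshot names."""
    local_by_name = {name: snap_id for snap_id, name in local_snaps.items()}
    return {
        remote_id: local_by_name[name]
        for remote_id, name in remote_snaps.items()
        if name in local_by_name  # skip snapshots with no same-named peer
    }
```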
Jason Dillaman [Tue, 7 Apr 2020 23:12:03 +0000 (19:12 -0400)]
rbd-mirror: ignore non-primary read-only state for remote images
Snapshot-based mirroring may need to delete a demotion snapshot
during the unlink process. Previously, these snapshots were left
in place and the read-only error was ignored.
Jason Dillaman [Thu, 9 Apr 2020 03:06:05 +0000 (23:06 -0400)]
librbd: fixed race condition on demotion of snapshot-based mirrored image
A pending refresh could occur after setting the non-primary feature flag but
before the creation of the demotion snapshot. This would prevent the snapshot
from being created and would leave the image in a half-primary state.
The mirror image status for replaying snapshot-based images now includes
bytes per second and per snapshot, in addition to an estimated number of
seconds until the image is fully synced.
The mirror image status for replaying journal-based images now includes
bytes and entries per second in addition to an estimated number of seconds
until the image is fully synced.
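The "estimated number of seconds until fully synced" can be sketched as remaining work divided by the measured rate. The exact formula used by rbd-mirror is not given here, so this rounding-up version is an assumption. The same arithmetic applies to entries per second for journal-based images.

```python
import math
from typing import Optional


def estimate_seconds_until_synced(remaining_bytes: int,
                                  bytes_per_second: float) -> Optional[int]:
    """Estimate sync ETA as remaining / rate, rounded up; None if no progress."""
    if remaining_bytes <= 0:
        return 0
    if bytes_per_second <= 0:
        return None  # no measurable throughput yet, so no estimate
    return math.ceil(remaining_bytes / bytes_per_second)
```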
Jason Dillaman [Wed, 1 Apr 2020 19:26:39 +0000 (15:26 -0400)]
rbd-mirror: switch to json_spirit formatter for journal image status
The free-form journal replay status description is now JSON-encoded. The
"master"/"mirror" designators have been changed to "primary"/"non_primary"
to better align with RBD terminology.
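A JSON-encoded description with the renamed designators might be produced as below. The field names are illustrative, not an authoritative copy of rbd-mirror's output. The point is that the former "master"/"mirror" keys become "primary"/"non_primary".

```python
import json


def format_replay_status(primary_position: dict, non_primary_position: dict,
                         entries_behind_primary: int) -> str:
    """Encode a journal replay description as JSON (field names are illustrative)."""
    return json.dumps({
        "primary_position": primary_position,          # was "master"
        "non_primary_position": non_primary_position,  # was "mirror"
        "entries_behind_primary": entries_behind_primary,
    }, sort_keys=True)


print(format_replay_status({"entry_tid": 3}, {"entry_tid": 3}, 0))
```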
Jason Dillaman [Thu, 2 Apr 2020 18:50:37 +0000 (14:50 -0400)]
rbd-mirror: periodically poll image replayer status
When metrics are incorporated, there might not be a forced status update
if no new data is available to replicate. However, we will want the metrics
to decrease over time.
Yan Jun [Fri, 27 Mar 2020 01:49:05 +0000 (09:49 +0800)]
osd/PrimaryLogPG: fix SPARSE_READ stat
Commit 22960192 used readv to reimplement SPARSE_READ; however, it
still used total_read to accumulate the total bytes read from
bluestore, and that counter is always zero in the code.
Fix by dropping the redundant local total_read counter.
By definition, objects_readv_sync should return the number of bytes
it has read, so use that instead.
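The fix amounts to counting what the reads actually return. This Python analogue uses an in-memory object rather than BlueStore, and `sparse_read` is a hypothetical helper. The byte total is derived from the data returned for each extent, not from a separate counter that is never updated.

```python
import io
from typing import List, Tuple


def sparse_read(obj: io.BytesIO, extents: List[Tuple[int, int]]) -> Tuple[bytes, int]:
    """Read each (offset, length) extent and sum the bytes actually returned."""
    chunks = []
    total = 0
    for offset, length in extents:
        obj.seek(offset)
        data = obj.read(length)
        chunks.append(data)
        total += len(data)  # count what was really read (may be short at EOF)
    return b"".join(chunks), total
```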
xie xingguo [Fri, 13 Mar 2020 00:45:52 +0000 (08:45 +0800)]
qa/osd-recovery: pass osd_pg_log_trim_min = 0 to exercise short pg logs
We set osd_min_pg_log_entries to 2 (good) but not osd_pg_log_trim_min,
which defaults to 100. Thus, even in those tests we are only rarely vulnerable.
Reset osd_min_pg_log_entries to 0 to make sure we really
keep a minimal pg log in hand.
xie xingguo [Thu, 12 Mar 2020 23:59:07 +0000 (07:59 +0800)]
qa/short_pg_log: pass osd_pg_log_trim_min = 0 to exercise short pg logs
We set osd_min_pg_log_entries to 2 (good) but not osd_pg_log_trim_min,
which defaults to 100. Thus, even in those tests we are only rarely vulnerable.
Reset osd_min_pg_log_entries to 0 to make sure we really
keep a minimal pg log in hand.
xie xingguo [Thu, 12 Mar 2020 10:01:45 +0000 (18:01 +0800)]
osd/PeeringState: do not trim pg log past last_update_ondisk
Trimming past last_update_ondisk would be really bad. For example,
a new interval change would cancel and redo a previous op, and if
we had trimmed past last_update_ondisk, objects could become
inconsistent, because log merging would not necessarily be able
to find all divergent entries later (we would have lost track of the
unfinished op that should really be reverted).
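The rule reduces to clamping the trim target. This is a Python sketch, not PeeringState code. It assumes versions compare as (epoch, version) pairs, the way eversion_t does. The requested trim point is never allowed to exceed last_update_ondisk.

```python
from typing import NamedTuple


class EVersion(NamedTuple):
    """(epoch, version) pair; tuple ordering compares epoch, then version."""
    epoch: int
    version: int


def clamp_trim_to(requested: EVersion, last_update_ondisk: EVersion) -> EVersion:
    """Never trim the pg log past last_update_ondisk."""
    return min(requested, last_update_ondisk)
```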
include/denc: replace bufferlist::copy with iterator version
This version was only compiled as part of ceph-object-corpus
generation, when ENCODE_DUMP_PATH is defined, so it was missed
when bufferlist::copy() was removed.
Igor Fedotov [Mon, 3 Feb 2020 15:50:50 +0000 (18:50 +0300)]
os/bluestore: do not use 'unused' bitmap if it makes no sense.
The processing logic that relies on the 'unused' bitmap only makes sense
for bluestore setups where the min alloc size differs from the device block
size. Skip it when that is not the case.
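The gating condition can be written as a one-line predicate. This is a hypothetical Python helper, not BlueStore's C++. The comment gives one reading of the rationale: when an allocation unit is exactly one device block, there is no partially written unit left to track.

```python
def use_unused_bitmap(min_alloc_size: int, block_size: int) -> bool:
    """Return True only when the 'unused' bitmap logic can be useful.

    If min_alloc_size equals the device block size, each allocation unit
    is a single block, so the bitmap adds bookkeeping without benefit.
    """
    return min_alloc_size != block_size
```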
Nathan Cutler [Fri, 6 Mar 2020 09:09:27 +0000 (10:09 +0100)]
rpm: drop "is_opensuse" conditional in SUSE-specific bcond block
Until now, "ocf" and "libradosstriper" were disabled on SLE, but not
openSUSE.
Leaving them enabled for openSUSE makes it appear as if these features
are expected to do something useful on SUSE.
Dropping the "is_opensuse" conditional has the desirable side effect of
streamlining the SUSE bcond block, and in the spirit of "and that's not
all", we take the opportunity to put the bconds in alphabetical order
for comforting cosmetic effect.
This is fixed directly in octopus because it is needed
to get the dashboard backend API tests running
on top of the previous commits of this PR.
There is a fix in master, but it
uses "from io import StringIO", whereas
octopus still needs to be python2 compatible.
So this import is done with "six" (which expects str in python2)
instead of "io" (which expects unicode
in python2).
This import line is not cherry-picked from master
for the reasons above.
Sebastian Wagner [Wed, 22 Apr 2020 16:26:13 +0000 (18:26 +0200)]
cephadm: improve warn message
make it more fancy!
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 02c46b6d811765904ee3042a0240e910d70eb7e3)
Sebastian Wagner [Wed, 22 Apr 2020 09:51:49 +0000 (11:51 +0200)]
cephadm: Acquire lock, if fsid != None
Fixes:
```
Traceback (most recent call last):
  File "./cephadm", line 4494, in <module>
    r = args.func()
  File "./cephadm", line 1077, in _infer_fsid
    return func()
  File "./cephadm", line 1103, in _infer_image
    return func()
  File "./cephadm", line 2813, in command_ceph_volume
    l = FileLock(args.fsid)
  File "./cephadm", line 560, in __init__
    self._lock_file = os.path.join(LOCK_DIR, name + '.lock')
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
```
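The guard can be sketched as follows. `FileLock` here is a minimal stand-in that only reproduces the path construction from the traceback, and `LOCK_DIR` is a temporary-directory placeholder rather than cephadm's real value. The lock is taken only when an fsid is known, so `None + '.lock'` is never evaluated.

```python
import os
import tempfile
from typing import Optional

LOCK_DIR = tempfile.gettempdir()  # placeholder; cephadm defines its own LOCK_DIR


class FileLock:
    """Minimal stand-in for cephadm's FileLock: builds the lock path from a name."""

    def __init__(self, name: str):
        self._lock_file = os.path.join(LOCK_DIR, name + '.lock')


def maybe_lock(fsid: Optional[str]) -> Optional[FileLock]:
    """Only take the per-cluster lock when an fsid is known."""
    if fsid is None:
        return None  # FileLock(None) would raise the TypeError shown above
    return FileLock(fsid)
```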