Zhiqiang Wang [Thu, 20 Aug 2015 01:48:07 +0000 (09:48 +0800)]
doc: fix the format of peering.rst
Fix an incorrect number in the ordered list and some indentation issues.
Make the ordered lists use '1' or 'a' for the first item, and '#' for
the remaining items.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
Sage Weil [Sun, 9 Aug 2015 14:46:10 +0000 (10:46 -0400)]
osd/PGLog: dirty_to is inclusive
There are only two callers of mark_dirty_to who do not pass max,
and they are both in the merge_log extending tail path. In that
case, we want to include the last version specified in the log
writeout. Fix the tail extending code to always specify the
last entry added, inclusive.
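A minimal sketch of the inclusive bound, assuming plain integers stand in for log versions. The helpers `entries_to_write` and `extend_tail` are hypothetical names for illustration, not the PGLog API.

```python
def entries_to_write(versions, dirty_to):
    """Select the entries covered by dirty_to, treating the bound as inclusive."""
    return [v for v in versions if v <= dirty_to]


def extend_tail(log_versions, older_versions):
    """Prepend older entries to the log and return (new_log, dirty_to).

    dirty_to names the last entry that was added, so an inclusive
    comparison writes out every added entry and nothing else.
    """
    added = sorted(older_versions)
    new_log = added + list(log_versions)
    dirty_to = added[-1]
    return new_log, dirty_to


log, dirty_to = extend_tail([5, 6, 7], [2, 3, 4])
written = entries_to_write(log, dirty_to)
```

With an exclusive comparison (`v < dirty_to`) the same call would skip entry 4, which is the kind of off-by-one the inclusive convention avoids.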
Fixes: #12652
Signed-off-by: Sage Weil <sage@redhat.com>
Boris Ranto [Tue, 11 Aug 2015 07:13:01 +0000 (09:13 +0200)]
selinux: Relabel files if and only if the policy version changed
Currently, the ceph files are relabelled every time the package is
rebuilt. Fix this by checking the policy versions and relabelling the
files only if the policy actually changed (a different policy version
was detected).
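The check can be sketched as a version comparison. This is an illustration only: the function name `needs_relabel` is hypothetical, and treating a missing previous version as "changed" is an assumption.

```python
def needs_relabel(installed_version, packaged_version):
    """Return True only when the policy version differs.

    installed_version is None when no policy was installed before
    (assumed here to require a relabel).
    """
    if installed_version is None:
        return True
    return installed_version != packaged_version
```

A rebuild that ships the same policy version then leaves the file labels alone.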
Samuel Just [Mon, 27 Jul 2015 20:12:25 +0000 (13:12 -0700)]
ReplicatedPG: enforce write ordering on rollback
Previously, rollback ops could be reordered w.r.t. other writes due to waiting
on degraded snaps other than head. To fix that, we'll introduce a new
map tracking objects blocked on degraded snaps. A particular object can
only be blocked on one snap at a time (subsequent writes won't get far
enough to add another entry).
It might have been possible to use the blocked_by machinery for this, but
it requires that the object have an extant obc, which we may not
have for a missing object. Also, that machinery exists primarily to
support clone_range, which I hope to remove soon.
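A rough sketch of such a map, assuming string object names and integer snap ids. The class `BlockedOnDegradedSnap` and its methods are hypothetical and do not mirror the ReplicatedPG members.

```python
class BlockedOnDegradedSnap:
    """Track which head objects are waiting on a degraded snap."""

    def __init__(self):
        # head object name -> snap id it is blocked on (at most one per object)
        self._blocked = {}

    def block(self, head, snap):
        # A second entry for the same object should never be requested,
        # because later writes are held back before they get this far.
        if head in self._blocked:
            raise ValueError("object is already blocked on a snap")
        self._blocked[head] = snap

    def is_blocked(self, head):
        return head in self._blocked

    def snap_recovered(self, head, snap):
        """Unblock head if it was waiting on snap; report whether it was."""
        if self._blocked.get(head) == snap:
            del self._blocked[head]
            return True
        return False
```

Because the map is keyed by object name rather than by an object context, it can describe an object that is currently missing.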
Zhiqiang Wang [Wed, 10 Jun 2015 06:21:36 +0000 (14:21 +0800)]
osd: copy the reqids even if the object is deleted during promotion
If the object is deleted on the base tier, and the reqids are not copied
during promotion, this again leads to the 'ops not idempotent' problem.
For the copy-get op, this fix copies the reqids even if the object doesn't
exist.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Zhiqiang Wang [Tue, 2 Jun 2015 08:36:56 +0000 (16:36 +0800)]
osd: purge the object from the cache when proxying and not promoting the op
When proxying a write/cache op, if we decide not to promote the
object, we need to purge it from the object_contexts cache. Otherwise, it
causes problems for later ops on this object.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Zhiqiang Wang [Tue, 2 Jun 2015 08:20:35 +0000 (16:20 +0800)]
osd: set the blocked_by relationship when rolling back to a degraded object
Consider the following scenario:
- A rollback op comes in, and is enqueued.
- Several other ops on the same object come in, and are enqueued.
- The rollback op dispatches and finds that the object it rolls back to is
degraded, so the op is pushed back onto a list to wait for the degraded
object to recover.
- The later ops are handled and replied to the client.
- The degraded object recovers. The rollback op is enqueued again and finally
replied to the client.
This breaks the op order. We need to set the blocked_by relationship so that
the later ops stay queued until the degraded object recovers.
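The scenario can be replayed with a toy queue. This is a simplified model under stated assumptions: one object, one degraded rollback target, and a hypothetical `completion_order` helper. It is not the OSD's dispatch code.

```python
from collections import deque


def completion_order(ops, block_later_ops):
    """Return the order in which ops are answered.

    ops is a list of (name, waits_for_degraded) tuples in arrival order.
    An op that needs the degraded object waits for recovery. When
    block_later_ops is True, every op after it waits behind it as well.
    """
    completed = []
    waiting = deque()
    blocked = False
    for name, waits_for_degraded in ops:
        if waits_for_degraded:
            waiting.append(name)
            blocked = block_later_ops
        elif blocked:
            waiting.append(name)
        else:
            completed.append(name)
    # The degraded object recovers: waiting ops are requeued in arrival order.
    completed.extend(waiting)
    return completed


ops = [("rollback", True), ("write1", False), ("write2", False)]
without_blocking = completion_order(ops, block_later_ops=False)
with_blocking = completion_order(ops, block_later_ops=True)
```

Without blocking, the two writes are answered before the rollback. With blocking, the arrival order is preserved.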
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Zhiqiang Wang [Wed, 27 May 2015 06:02:33 +0000 (14:02 +0800)]
osd: explicitly set the reqid when proxying the write op
This is needed in the following scenario:
- Client sends 3 writes and a read on the same object to base tier
- Set up cache tiering
- Client retries ops and sends the 3 writes and 1 read to the cache tier
- The 3 writes finished on the base tier, say with versions v1, v2 and v3
- Cache tier proxies the 1st write and starts to promote the object for the 2nd
write; the 2nd and 3rd writes and the read are blocked
- The proxied 1st write finishes on the base tier with version v4 and returns
to the cache tier. But the cache tier fails to send the reply due to socket
failure injection
- Client retries the writes and the read again; the writes are identified as
dup ops
- The promotion finishes; it copies the pg_log entries from the base tier and
puts them in the cache tier's pg_log. This includes the 3 writes on the base tier
and the proxied write
- The writes dispatch after the promotion and are identified as completed
dup ops. The cache tier replies to these write ops with the versions from the
base tier (v1, v2 and v3)
- Finally, the read dispatches; it reads the version of the proxied write
(v4) and replies to the client
- Client complains that 'racing read got wrong version'
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Zhiqiang Wang [Thu, 18 Dec 2014 05:31:04 +0000 (13:31 +0800)]
osd/ReplicatedPG: promote on 2nd write
If min_write_recency_for_promote is
- 0: Promote when there is a write.
- 1: Check if the object is in the current hit set. Promote if yes.
- else: Check if the object is in the current and other in-memory hit sets.
Promote if yes.
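The three cases above can be sketched as a single decision function. Hit sets are modelled as plain Python sets, the function name is hypothetical, and the "else" case assumes that membership in any one of the hit sets is enough.

```python
def should_promote_on_write(min_write_recency_for_promote, oid,
                            current_hit_set, other_hit_sets):
    """Decide whether a write to oid triggers promotion."""
    if min_write_recency_for_promote == 0:
        # Promote on any write.
        return True
    if min_write_recency_for_promote == 1:
        # Promote only if the object was seen in the current hit set.
        return oid in current_hit_set
    # Otherwise also consult the other in-memory hit sets
    # (assumption: a hit in any of them is sufficient).
    return oid in current_hit_set or any(oid in hs for hs in other_hit_sets)
```

With a value of 1, the first write to a cold object is not promoted, while a second write in the same hit set period is, assuming the first access was recorded in the current hit set.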
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Conflicts:
src/osd/ReplicatedPG.cc
Loic Dachary [Tue, 18 Aug 2015 12:43:15 +0000 (14:43 +0200)]
ceph-disk: only call restorecon when available
9db80da12803d42bb676d67f37442c0c54d83448 added an unconditional call to
restorecon after mounting the filesystem. It fails when restorecon is
not available and must be made conditional.
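A minimal sketch of a conditional call, assuming the standard library's `shutil.which` is an acceptable availability check. The helper `maybe_restorecon` is hypothetical and is not the ceph-disk implementation.

```python
import shutil
import subprocess


def maybe_restorecon(path, which=shutil.which, run=subprocess.check_call):
    """Run restorecon recursively on path only if the binary exists.

    Returns True when restorecon was invoked, False when it was skipped.
    which and run are injectable so the logic can be exercised without
    touching the system.
    """
    exe = which("restorecon")
    if exe is None:
        # restorecon is not installed: skip instead of failing.
        return False
    run([exe, "-R", path])
    return True
```

On a host without SELinux tools the function returns False and nothing is executed.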
Sage Weil [Mon, 10 Aug 2015 14:14:03 +0000 (10:14 -0400)]
osdc/Objecter: restart listing this PG if sort order changes
If the cluster sort order changes mid-way through our listing, our
cursor within this pg is meaningless and we need to restart at the
beginning of the PG.
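The restart rule can be sketched as follows. All names here are hypothetical: `PG_START` stands for "the beginning of this PG", and sort orders are compared as opaque values.

```python
PG_START = ""  # hypothetical marker for the beginning of the PG


def resume_cursor(cursor, order_at_last_reply, order_now):
    """Pick the cursor for the next listing request within one PG.

    A cursor is only meaningful under the sort order it was produced
    with, so a change of order discards it.
    """
    if order_now != order_at_last_reply:
        return PG_START
    return cursor
```

For example, `resume_cursor("obj42", "A", "B")` returns `PG_START`, while an unchanged order keeps `"obj42"`.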
xinxin shu [Thu, 13 Aug 2015 03:57:58 +0000 (11:57 +0800)]
fix print error of rados bench
Total time run: 12.279167
Total writes made: 92
Write size: 4194304
Bandwidth (MB/sec): 30
Stddev Bandwidth: 23.4
Max bandwidth (MB/sec): 64
Min bandwidth (MB/sec): 2
Average IOPS: 7
Stddev IOPS: 6
Max IOPS: 32767
Min IOPS: -1537890352
Average Latency: 2.12
Stddev Latency: 1.35
Max latency: 6.05
Min latency: 0.501