Greg Farnum [Wed, 13 Jan 2016 21:17:53 +0000 (13:17 -0800)]
fsx: checkout old version until it compiles properly on miras
I sent a patch to xfstests upstream at
http://article.gmane.org/gmane.comp.file-systems.fstests/1665, but
until that's fixed we need a version that works in our test lab.
Kefu Chai [Thu, 28 Jan 2016 10:09:53 +0000 (02:09 -0800)]
mon: compact full epochs also
Compacting the range ${prefix}.${start}..${prefix}.${end} does not
necessarily compact the range ${prefix}."full_"${start}..
${prefix}."full_"${end}. So when more and more epochs get trimmed
without a full range compaction, the monitor store can grow
very large.
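Why the plain range misses the full epochs can be seen with a small sketch. It assumes the store orders keys bytewise and that keys follow the ${prefix}.${epoch} notation used in the message; the "osdmap" prefix, the epoch numbers, and the helper names are hypothetical.

```python
def in_range(key, first, last):
    """Return True if key falls inside the closed bytewise range [first, last]."""
    return first <= key <= last


def compaction_ranges(prefix, start, end):
    """Return both ranges to compact, in the notation of the commit message."""
    return [
        (f"{prefix}.{start}", f"{prefix}.{end}"),
        (f"{prefix}.full_{start}", f"{prefix}.full_{end}"),
    ]


plain, full = compaction_ranges("osdmap", 100, 200)
# "f" sorts after every digit, so a full_ key lies outside the plain range
# and is only covered by the second, explicit range.
covered_by_plain = in_range("osdmap.full_150", *plain)
covered_by_full = in_range("osdmap.full_150", *full)
```

Under these assumptions covered_by_plain is False and covered_by_full is True, which is why the second range has to be compacted explicitly.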
ReplicatedPG::prepare_transaction(): check if the pool is full before
updating the cached ObjectContext to avoid the discrepancy between
the cached and the actual object size (and other metadata).
While at it improve the check itself: consider cluster full flag,
not just the pool full flag, also consider object count changes too,
not just bytes.
Conflicts:
src/osd/ReplicatedPG.cc
code section was moved to ReplicatedPG::maybe_promote
in master.
Signed-off-by: Robert LeBlanc <robert.leblanc@endurance.com>
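The ordering described for ReplicatedPG::prepare_transaction() (check for full first, update the cached state only afterwards) can be sketched roughly as follows. This is not the Ceph code: the function names, the flat argument list, and the choice of ENOSPC as the error are assumptions made for illustration.

```python
import errno


def check_full(cluster_full, pool_full, delta_bytes, delta_objects):
    """Return 0 if the write may proceed, or a negative errno otherwise.

    Both full flags are consulted, and growth in either bytes or object
    count counts as a write that needs space.
    """
    grows = delta_bytes > 0 or delta_objects > 0
    if (cluster_full or pool_full) and grows:
        return -errno.ENOSPC  # assumed error code
    return 0


def prepare_transaction(cached_size, delta_bytes, delta_objects,
                        cluster_full, pool_full):
    """Hypothetical stand-in: return (result, new cached size)."""
    rc = check_full(cluster_full, pool_full, delta_bytes, delta_objects)
    if rc < 0:
        # Rejected before touching the cache, so cached and actual sizes agree.
        return rc, cached_size
    return 0, cached_size + delta_bytes
```

A rejected write leaves cached_size unchanged, which is the discrepancy the message says the reordering avoids.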
Sage Weil [Wed, 25 Nov 2015 19:39:08 +0000 (14:39 -0500)]
osd/ReplicatedPG: fix promotion recency logic
Recency is defined as how many of the last N hitsets an object
must appear in in order to be promoted. The previous logic did
nothing of the sort... it checked for the object in any one of
the last N hitsets, which led to way too many promotions and killed
any chance of the cache performing properly.
While we are here, we can simplify the code to drop the max_in_*
fields (no longer necessary).
Note that we may still want a notion of 'temperature' that does
tolerate the object missing in one of the recent hitsets.. but
that would be different than recency, and should probably be
modeled after the eviction temperature model.
Backport: infernalis, hammer
Reported-by: Nick Fisk <nick@fisk.me.uk>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 180c8743addc5ae2f1db9c58cd2996ca6e7ac18b)
Conflicts:
src/osd/ReplicatedPG.cc
code section was moved to ReplicatedPG::maybe_promote
in master.
Signed-off-by: Robert LeBlanc <robert.leblanc@endurance.com>
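The difference between the previous and the fixed recency check can be illustrated with a minimal sketch. It assumes hitsets are modelled as Python sets ordered oldest first; the function names, and the rule that fewer than N available hitsets means no promotion, are assumptions and not the ReplicatedPG code.

```python
def last_n(hitsets, recency):
    """Return the most recent `recency` hitsets (empty list if recency is 0)."""
    return hitsets[max(0, len(hitsets) - recency):]


def in_any_recent(obj, hitsets, recency):
    """Previous behaviour as described: a hit in any one recent hitset suffices."""
    return any(obj in hs for hs in last_n(hitsets, recency))


def in_all_recent(obj, hitsets, recency):
    """Fixed behaviour: the object must appear in every one of the last N hitsets."""
    recent = last_n(hitsets, recency)
    # Assumption: with fewer than N hitsets available, do not promote.
    return len(recent) == recency and all(obj in hs for hs in recent)


hitsets = [{"a"}, {"a", "b"}, {"a"}]  # oldest first
old_b = in_any_recent("b", hitsets, 3)
new_b = in_all_recent("b", hitsets, 3)
new_a = in_all_recent("a", hitsets, 3)
```

Object "b" appears in only one of the three hitsets, so the old check would promote it while the fixed check does not; object "a" appears in all three and is promoted either way.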
man/rados.8: also added the rendered man.8 man page, as we don't
put the generated man pages in master anymore, but
they are still in hammer's source repo.
Douglas Fuller [Fri, 22 Jan 2016 19:18:40 +0000 (11:18 -0800)]
rbd: remove canceled tasks from timer thread
When canceling scheduled tasks using the timer thread, TaskFinisher::cancel
does not call SafeTimer::cancel_event, so events fire anyway. Add this call.
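The shape of the fix can be sketched with a toy model. FakeTimer stands in for the timer thread and fires events only when run_due() is called; both classes below are hypothetical Python stand-ins that borrow the names from the message, not the librbd implementation.

```python
class FakeTimer:
    """Minimal stand-in for a timer thread; events fire when run_due() is called."""

    def __init__(self):
        self._events = {}
        self._next_id = 0

    def add_event(self, callback):
        self._next_id += 1
        self._events[self._next_id] = callback
        return self._next_id

    def cancel_event(self, event_id):
        """Remove a pending event; return True if it was still pending."""
        return self._events.pop(event_id, None) is not None

    def run_due(self):
        events, self._events = self._events, {}
        for callback in events.values():
            callback()


class TaskFinisher:
    def __init__(self, timer):
        self._timer = timer
        self._task_events = {}

    def schedule(self, task, callback):
        self._task_events[task] = self._timer.add_event(callback)

    def cancel(self, task):
        event_id = self._task_events.pop(task, None)
        if event_id is None:
            return False
        # The step the fix adds: also remove the event from the timer,
        # otherwise it fires even though the task was canceled.
        return self._timer.cancel_event(event_id)


fired = []
timer = FakeTimer()
finisher = TaskFinisher(timer)
finisher.schedule("t1", lambda: fired.append("t1"))
finisher.schedule("t2", lambda: fired.append("t2"))
finisher.cancel("t1")
timer.run_due()
```

After run_due(), only "t2" has fired; without the cancel_event call in cancel(), "t1" would have fired as well.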
This command repeatedly adds the latest pgmap to the monstore in order
to inflate it. The command helps with testing monstore-related
performance issues of the monitor.
Kefu Chai [Fri, 19 Jun 2015 14:57:57 +0000 (22:57 +0800)]
tools/ceph-monstore-tools: add rewrite command
"rewrite" command will
- add a new osdmap version to update the current osdmap held by OSDMonitor
- add a new paxos version; as a proposal, it will
  * rewrite all osdmap epochs from the specified epoch to the last_committed
    one with the specified crush map.
  * add the new osdmap which is added just now
so the leader monitor can trigger a recovery process to apply the transaction
to all monitors in quorum, and hence bring them back to normal after being
injected with a faulty crushmap.
Ken Dreyer [Mon, 18 Jan 2016 15:24:46 +0000 (08:24 -0700)]
osd: disable filestore_xfs_extsize by default
This option involves a tradeoff: When disabled, fragmentation is worse,
but large sequential writes are faster. When enabled, large sequential
writes are slower, but fragmentation is reduced.
Loic Dachary [Fri, 29 Jan 2016 03:36:05 +0000 (10:36 +0700)]
Merge pull request #7316 from ceph/wip-deb-lttng-hammer
deb: strip tracepoint libraries from Wheezy/Precise builds
All other "modern" Debian-based OSes have a functional LTTng-UST. Since only hammer needs to build on these older distros, this fix only affects the deb building process for those two releases (since autoconf detects that LTTng is broken).
Kefu Chai [Tue, 5 May 2015 07:07:33 +0000 (15:07 +0800)]
configure.ac: no use to add "+" before ac_ext=c
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 899dd23766c5ae6fef39bf24ef0692127d17deb3)
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
Herve Rousseau [Fri, 6 Nov 2015 08:52:28 +0000 (09:52 +0100)]
rgw: fix reload on non Debian systems.
When using reload on non-Debian systems, /bin/sh's kill is used to send the HUP signal to the radosgw process.
This version of kill doesn't understand -SIGHUP as a valid signal; using -HUP does work.
xiexingguo [Tue, 22 Dec 2015 09:05:06 +0000 (17:05 +0800)]
ReplicatedPG: fix sparse-read result code checking logic
Move the result code check ahead of the trailing-hole verification; otherwise
the real result of the non-hole read may be overwritten and thus confuse the caller.
but so happens to introduce a line that wasn't in the original patch. We
imagine it was meant to make the 's->osd_epoch' assignment work without
checking the session, as per the original patch, but the backporter must
have forgotten to also backport the assertion on the not-null session.
The unfortunate introduction of the check for a not-null session
triggered this regression.
The regression itself is due to enforcing that a session exists for the
osd we are sending the incrementals to. However, if we come via the
OSDMonitor::process_failures() path, that may very well not be the case,
as we are handling potentially-old MOSDFailure messages that may no
longer have an associated session. By enforcing the not-null session, we
don't check whether we have the requested versions (i.e., if
our_earliest_version <= requested_version), and thus we end up on the
path that assumes that we DO HAVE all the necessary versions -- when we
may not, thus finally asserting because we are reading blank
incremental versions.
Fixes: #14236
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
Sage Weil [Mon, 14 Dec 2015 18:00:27 +0000 (13:00 -0500)]
osd/ReplicatedPG: do not set local_mtime on non-tiered pool
If a pool isn't tiered, don't bother with setting local_mtime. The only
users are the tiering agent (which isn't needed if there is not tiering)
and scrub for deciding if an object should get its digest recorded (we can
use mtime instead).
xiexingguo [Mon, 2 Nov 2015 13:46:11 +0000 (21:46 +0800)]
Objecter: remove redundant result-check of _calc_target in _map_session.
The result-code check is currently redundant since _calc_target never returns a negative value.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 5a6117e667024f51e65847f73f7589467b6cb762)
xiexingguo [Thu, 29 Oct 2015 09:32:50 +0000 (17:32 +0800)]
Objecter: potential null pointer access when do pool_snap_list.
We should check that the pool exists first.
Fixes: #13639
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 865541605b6c32f03e188ec33d079b44be42fa4a)
Chengyuan Li [Fri, 20 Nov 2015 05:29:39 +0000 (22:29 -0700)]
mon/PGMonitor: MAX AVAIL is 0 if some OSDs' weight is 0
In get_rule_avail(), p->second may be used as a divisor even when it
is 0. The quotient is then infinity, which becomes a negative value
when converted to an integer.
So we should check the value of p->second before the calculation.
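A minimal sketch of the guard, assuming the calculation projects each OSD's free space by its weight and takes the minimum. The function name, the dict-based inputs, and the return value of 0 when no OSD has a positive weight are assumptions, not the PGMonitor code. Note that Python raises ZeroDivisionError on a zero divisor instead of producing infinity, so the unguarded failure mode looks different here than in C++.

```python
def rule_avail(weights, avail_bytes):
    """Return the smallest avail/weight projection over OSDs with weight > 0.

    weights: osd id -> normalized weight (plays the role of p->second)
    avail_bytes: osd id -> free bytes on that OSD
    """
    min_proj = None
    for osd, weight in weights.items():
        if weight <= 0:
            # The check the message asks for: never divide by a zero weight.
            continue
        proj = avail_bytes[osd] / weight
        if min_proj is None or proj < min_proj:
            min_proj = proj
    return int(min_proj) if min_proj is not None else 0


max_avail = rule_avail({0: 0.5, 1: 0.0, 2: 0.5}, {0: 100, 1: 50, 2: 300})
```

With the zero-weight OSD skipped, the result is determined by the remaining OSDs only.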
Piotr Dałek [Mon, 6 Jul 2015 07:56:11 +0000 (09:56 +0200)]
tools: fix race condition in seq/rand bench
Under certain conditions (like bench seq/rand -b 1024 -t 128) it is
possible that aio_read reads data into destination buffers before or
during memcmp execution, resulting in "[..] is not correct!" errors
even if actual objects are perfectly fine.
Also, moved the latency calculation around, so it is no longer affected
by memcmp.
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
Conflicts:
src/common/obj_bencher.cc
Piotr Dałek [Wed, 20 May 2015 10:41:22 +0000 (12:41 +0200)]
tools: add --no-verify option to rados bench
When doing seq and rand read benchmarks using rados bench, a quite large
portion of cpu time is consumed by doing object verification. This patch
adds an option to disable this verification when it's not needed, in turn
giving better cluster utilization. Scores for "rados -p storage bench 600 rand"
without --no-verify:
Total time run: 600.228901
Total reads made: 144982
Read size: 4194304
Bandwidth (MB/sec): 966
Average IOPS: 241
Stddev IOPS: 38
Max IOPS: 909522486
Min IOPS: 0
Average Latency: 0.0662
Max latency: 1.51
Min latency: 0.004
real 10m1.173s
user 5m41.162s
sys 11m42.961s
Same command, but with --no-verify:
Total time run: 600.161379
Total reads made: 174142
Read size: 4194304
Bandwidth (MB/sec): 1.16e+03
Average IOPS: 290
Stddev IOPS: 20
Max IOPS: 909522486
Min IOPS: 0
Average Latency: 0.0551
Max latency: 1.12
Min latency: 0.00343
real 10m1.172s
user 4m13.792s
sys 13m38.556s
Note the decreased latencies, increased bandwidth and more reads performed.
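The size of the difference can be worked out from the two runs above. The sketch below only does arithmetic on the reported numbers; the dictionary keys and the pct_change helper are names chosen here for illustration.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (with verification, with --no-verify), copied from the two runs above
runs = {
    "reads": (144982, 174142),
    "bandwidth_mb_s": (966, 1160),
    "avg_latency_s": (0.0662, 0.0551),
    "max_latency_s": (1.51, 1.12),
}

changes = {name: pct_change(b, a) for name, (b, a) in runs.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

This works out to roughly 20% more reads and bandwidth and about 17% lower average latency with --no-verify.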