git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
10 years agoPG: mark_log_for_rewrite on resurrection
Samuel Just [Tue, 26 Aug 2014 23:53:02 +0000 (16:53 -0700)]
PG: mark_log_for_rewrite on resurrection

Fixes: #8777
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 8346e10755027e982f26bab4642334fd91cc31aa)

10 years agoReplicatedPG:start_flush send a second delete
Samuel Just [Mon, 8 Sep 2014 03:13:41 +0000 (20:13 -0700)]
ReplicatedPG:start_flush send a second delete

Suppose we start with the following in the cache pool:

30:[29,21,20,15,10,4]:[22(21), 15(15,10), 4(4)]+head

The object doesn't exist at 29 or 20.

First, we flush 4 leaving the backing pool with:

3:[]+head

Then, we begin to flush 15 with a delete with snapc 4:[4] leaving the
backing pool with:

4:[4]:[4(4)]

Then, we finish flushing 15 with snapc 9:[4], leaving the backing
pool with:

9:[4]:[4(4)]+head

Next, snaps 10 and 15 are removed causing clone 10 to be removed leaving
the cache with:

30:[29,21,20,4]:[22(21),4(4)]+head

We next begin to flush 22 by sending a delete with snapc 4(4) since
prev_snapc is 4 <---------- here is the bug

The backing pool ignores this request since 4 < 9 (ORDERSNAP) leaving it
with:

9:[4]:[4(4)]

Then, we complete flushing 22 with snapc 19:[4] leaving the backing pool
with:

19:[4]:[4(4)]+head

Then, we begin to flush head by deleting with snapc 22:[21,20,4] leaving
the backing pool with:

22:[21,20,4]:[22(21,20), 4(4)]

Finally, we flush head leaving the backing pool with:

30:[29,21,20,4]:[22(21*,20*),4(4)]+head

When we go to flush clone 22, all we know is that 22 is dirty, has snaps
[21], and 4 is clean. As part of flushing 22, we need to do two things:
1) Ensure that the current head is cloned as cloneid 4 with snaps [4] by
sending a delete at snapc 4:[4].
2) Flush the data at snap sequence < 21 by sending a copyfrom with snapc
20:[20,4].

Unfortunately, it is possible that 1, 1&2, or 1 and part of the flush
process for some other now non-existent clone have already been
performed.  Because of that, between 1) and 2), we need to send
a second delete ensuring that the object does not exist at 20.

Fixes: #9054
Backport: firefly
Related: 66c7439ea0888777b5cfc08bcb0fbd7bfd8653c3
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 4843fd510b33a71999cdf9c2cfa2b4c318fa80fd)

10 years agoReplicatedPG::start_flush: remove superfluous loop
Samuel Just [Mon, 11 Aug 2014 19:59:16 +0000 (12:59 -0700)]
ReplicatedPG::start_flush: remove superfluous loop

Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 66c7439ea0888777b5cfc08bcb0fbd7bfd8653c3)

10 years agoMerge remote-tracking branch 'origin/wip-9339' into wip-sam-testing-firefly
Samuel Just [Sun, 21 Sep 2014 17:03:53 +0000 (10:03 -0700)]
Merge remote-tracking branch 'origin/wip-9339' into wip-sam-testing-firefly

10 years agoPGLog::claim_log_and_clear_rollback_info: fix rollback_info_trimmed_to
Samuel Just [Mon, 15 Sep 2014 22:44:11 +0000 (15:44 -0700)]
PGLog::claim_log_and_clear_rollback_info: fix rollback_info_trimmed_to

We have been setting it to the old head value.  This is usually
harmless since the new head will virtually always be ahead of the
old head for claim_log_and_clear_rollback_info, but can cause trouble
in some edge cases.

Fixes: #9481
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 0769310ccd4e0dceebd8ea601e8eb5c0928e0603)

10 years agoMerge remote-tracking branches 'origin/wip-9497' and 'origin/wip-9482' into wip-log...
Samuel Just [Thu, 18 Sep 2014 16:46:38 +0000 (09:46 -0700)]
Merge remote-tracking branches 'origin/wip-9497' and 'origin/wip-9482' into wip-log-crash-firefly

10 years agoPG::find_best_info: let history.last_epoch_started provide a lower bound 2519/head
Samuel Just [Mon, 15 Sep 2014 23:53:21 +0000 (16:53 -0700)]
PG::find_best_info: let history.last_epoch_started provide a lower bound

If we find an info.history.last_epoch_started above any
info.last_epoch_started, we must be missing updates and
min_last_update_acceptable should provisionally be max().

Fixes: #9482
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoPG::choose_acting: let the pg go down if acting is smaller than min_size 2520/head
Samuel Just [Wed, 17 Sep 2014 03:36:51 +0000 (20:36 -0700)]
PG::choose_acting: let the pg go down if acting is smaller than min_size

Even if the backfill peer would bring us up to min_size, we can't go
active since build_prior will not consider the interval maybe_went_rw.

Fixes: #9497
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agolibrbd: fix crash using clone of flattened image
Josh Durgin [Thu, 24 Jul 2014 22:29:40 +0000 (15:29 -0700)]
librbd: fix crash using clone of flattened image

The crash occurs due to ImageCtx->parent->parent being uninitialized,
since the initial open_parent() -> open_image(parent) ->
ictx_refresh(parent) occurs before ImageCtx->parent->snap_id is set,
so refresh_parent() is not called to open an ImageCtx for the parent
of the parent. This leaves the ImageCtx->parent->parent NULL, but the
rest of ImageCtx->parent updated to point at the correct parent snapshot.

Setting the parent->snap_id earlier has some unintended side effects
currently, so for now just call refresh_parent() during
open_parent(). This is the easily backportable version of the
fix. Further patches can clean up this whole initialization process.

Fixes: #8845
Backport: firefly, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 2545e80d274b23b6715f4d8b1f4c6b96182996fb)

10 years agoinit-radosgw.sysv: Support systemd for starting the gateway
JuanJose 'JJ' Galvez [Mon, 15 Sep 2014 03:38:20 +0000 (20:38 -0700)]
init-radosgw.sysv: Support systemd for starting the gateway

When using RHEL7 the radosgw daemon needs to start under systemd.

Check for systemd running on PID 1. If it is then start
the daemon using: systemd-run -r <cmd>. pidof returns null
because it is executed too quickly; after adding one second of sleep, the
script reports startup correctly.

Signed-off-by: JuanJose 'JJ' Galvez <jgalvez@redhat.com>
(cherry picked from commit ddd52e87b25a6861d3b758a40d8b3693a751dc4d)

10 years agoMerge pull request #2479 from ceph/wip-9444
Sage Weil [Sat, 13 Sep 2014 00:31:03 +0000 (17:31 -0700)]
Merge pull request #2479 from ceph/wip-9444

mds: fix root and mdsdir inodes' rsubdirs

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agomds: fix root and mdsdir inodes' rsubdirs 2479/head
Yan, Zheng [Fri, 2 May 2014 15:08:41 +0000 (23:08 +0800)]
mds: fix root and mdsdir inodes' rsubdirs

An inode's rstat accounts for the inode itself.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
(cherry picked from commit da17394941386dab88ddbfed4af2c8cb6b5eb72f)

10 years agoFileStore: report l_os_j_lat as commit latency
Samuel Just [Tue, 9 Sep 2014 21:03:50 +0000 (14:03 -0700)]
FileStore: report l_os_j_lat as commit latency

l_os_commit_lat is actually the commit cycle latency.

Fixes: #9269
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit d165238b4ee7e925e06ca22890c1e9dac101a7da)

10 years agoObjecter::_recalc_linger_op: resend for any acting set change
Samuel Just [Tue, 9 Sep 2014 19:58:07 +0000 (12:58 -0700)]
Objecter::_recalc_linger_op: resend for any acting set change

Fixes: #9220
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 1349383ac416673cb6df2438729fd2182876a7d1)

Conflicts:

src/osdc/Objecter.cc
src/osdc/Objecter.h

10 years agoosdc/Objecter: revoke rx_buffer on op_cancel
Sage Weil [Mon, 8 Sep 2014 20:44:57 +0000 (13:44 -0700)]
osdc/Objecter: revoke rx_buffer on op_cancel

If we cancel a read, revoke the rx buffers to avoid a use-after-free and/or
other undefined badness by using user buffers that may no longer be
present.

Fixes: #9362
Backport: firefly, dumpling
Reported-by: Matthias Kiefer <matthias.kiefer@1und1.de>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2305b2897acba38384358c33ca3bbfcae6f1c74e)

(adjusted for op->con instead of s->con)

10 years agoceph_test_rados_api_io: add read timeout test
Sage Weil [Mon, 8 Sep 2014 20:45:52 +0000 (13:45 -0700)]
ceph_test_rados_api_io: add read timeout test

Verify we don't receive data after a timeout.

Based on reproducer for #9362 written by
Matthias Kiefer <matthias.kiefer@1und1.de>.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f295c1fee4afb9447cdf46f05a44234274d23b6c)

10 years agoceph_test_rados_api_*: expose nspace
Sage Weil [Mon, 8 Sep 2014 20:42:43 +0000 (13:42 -0700)]
ceph_test_rados_api_*: expose nspace

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 977d289055d69ab8a7baaf7ef68c013019225833)

10 years agoRevert "PG: mark_log_for_rewrite on resurrection"
Samuel Just [Tue, 9 Sep 2014 19:40:51 +0000 (12:40 -0700)]
Revert "PG: mark_log_for_rewrite on resurrection"

Actually, we don't want to backport this one without the fix
for #9293.

This reverts commit 7ddf0a252bb887553b29fd93e58d01cac38835e6.

10 years agoReplicatedPG: create max hitset size 2437/head
Samuel Just [Wed, 3 Sep 2014 22:49:47 +0000 (15:49 -0700)]
ReplicatedPG: create max hitset size

Otherwise, hit_set_create could create an unbounded size hitset
object.

Fixes: #9339
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoPG::can_discard_op: do discard old subopreplies
Samuel Just [Wed, 27 Aug 2014 23:21:41 +0000 (16:21 -0700)]
PG::can_discard_op: do discard old subopreplies

Otherwise, a sub_op_reply from a previous interval can stick around
until we either one day go active again and get rid of it or delete the
pg which is holding it on its waiting_for_active list.  While it sticks
around futilely waiting for the pg to once more go active, it will cause
harmless slow request warnings.

Fixes: #9259
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit ae3d87348ca4e2dde809c9593b0d54ce0469f7a0)

10 years agoPG: mark_log_for_rewrite on resurrection
Samuel Just [Tue, 26 Aug 2014 23:53:02 +0000 (16:53 -0700)]
PG: mark_log_for_rewrite on resurrection

Fixes: #8777
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 8346e10755027e982f26bab4642334fd91cc31aa)

10 years agodebian: only B-R yasm on amd64
Thorsten Glaser [Mon, 8 Sep 2014 19:49:50 +0000 (12:49 -0700)]
debian: only B-R yasm on amd64

Make yasm dependency amd64 only, it isn't used elsewhere
but breaks x32 (which is mis-detected as amd64)

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 9ab46dc5b49219aa6194861c393c938f23001c52)

10 years agoosd: fix osd_tp shutdown
Sage Weil [Wed, 27 Aug 2014 00:43:10 +0000 (17:43 -0700)]
osd: fix osd_tp shutdown

We need to clear the queue, not just drain the currently executing jobs.

Fixes: #9218
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c2f21c04207b9a2a65e514994a775632b36d6874)

Conflicts:

src/osd/OSD.cc

10 years agoosd/PG: fix crash from second backfill reservation rejection
Sage Weil [Wed, 27 Aug 2014 13:19:12 +0000 (06:19 -0700)]
osd/PG: fix crash from second backfill reservation rejection

If we get more than one reservation rejection we should ignore them; when
we got the first we already sent out cancellations.  More importantly, we
should not crash.

Fixes: #8863
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2b13de16c522754e30a0a55fb9d072082dac455e)

10 years agomon/Paxos: don't spam log with is_readable at dout level 1
Sage Weil [Mon, 8 Sep 2014 13:58:45 +0000 (06:58 -0700)]
mon/Paxos: don't spam log with is_readable at dout level 1

Backport: firefly, dumpling
Reported-by: Aanchal Agrawal <Aanchal.Agrawal@sandisk.com>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 62ca27d0b119b597ebad40dde64c4d86599e466d)

10 years agodoc: add note on soft JS dependency for navigating docs
Alfredo Deza [Thu, 4 Sep 2014 17:58:14 +0000 (13:58 -0400)]
doc: add note on soft JS dependency for navigating docs

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
(cherry picked from commit 657be818375bea2d8b5998ea1e5505eedc2f294d)

10 years agodoc: fix missing bracket
Alfredo Deza [Thu, 4 Sep 2014 01:21:45 +0000 (21:21 -0400)]
doc: fix missing bracket

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
(cherry picked from commit 69638dfaeb0dcd96dac4b5f5c00ed08042432487)

10 years agodoc: attempt to get the ayni JS into all head tags
Alfredo Deza [Thu, 4 Sep 2014 00:47:54 +0000 (20:47 -0400)]
doc: attempt to get the ayni JS into all head tags

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
(cherry picked from commit 35663fa55ac1579a3b0c8b67028a3a8dfea87b48)

10 years agoFix FTBFS on alpha due to incorrect check on BLKGETSIZE
Dmitry Smirnov [Sat, 23 Aug 2014 12:41:30 +0000 (22:41 +1000)]
Fix FTBFS on alpha due to incorrect check on BLKGETSIZE

Ceph FTBFS on Alpha with:

~~~~
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -D__CEPH__ -D_FILE_OFFSET_BITS=64 -D_REENTRANT -D_THREAD_SAFE -D__STDC_FORMAT_MACROS -D_GNU_SOURCE -DCEPH_LIBDIR=\"/usr/lib/alpha-linux-gnu\" -DCEPH_PKGLIBDIR=\"/usr/lib/alpha-linux-gnu/ceph\" -DGTEST_HAS_TR1_TUPLE=0 -D_FORTIFY_SOURCE=2 -I/usr/include/nss -I/usr/include/nspr -Wall -Wtype-limits -Wignored-qualifiers -Winit-self -Wpointer-arith -Werror=format-security -fno-strict-aliasing -fsigned-char -rdynamic -ftemplate-depth-1024 -Wnon-virtual-dtor -Wno-invalid-offsetof -Wstrict-null-sentinel -g -O2 -Wformat -Werror=format-security -c common/blkdev.cc  -fPIC -DPIC -o common/.libs/blkdev.o
In file included from /usr/include/alpha-linux-gnu/asm/ioctls.h:4:0,
                 from /usr/include/alpha-linux-gnu/bits/ioctls.h:23,
                 from /usr/include/alpha-linux-gnu/sys/ioctl.h:26,
                 from common/blkdev.cc:3:
common/blkdev.cc:13:7: error: missing binary operator before token "int"
 #elif BLKGETSIZE
       ^
~~~~

This error occurs because the value of BLKGETSIZE is tested in a
c-preprocessor conditional compilation test whereas the test should
be for existence.
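
The difference between testing a macro's value and testing for its existence can be sketched as follows. `FAKE_BLKGETSIZE` is a hypothetical stand-in whose expansion contains a cast; the real BLKGETSIZE definition is platform-specific and is not reproduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for an ioctl request macro whose expansion
// contains a C cast. This is an assumption for illustration only.
#define FAKE_BLKGETSIZE ((unsigned int)0x1260)

// Writing `#if FAKE_BLKGETSIZE` would make the preprocessor evaluate
// `((unsigned int)0x1260)` as an integer expression, which fails because
// casts are not valid in #if/#elif. Testing for existence avoids that.
#if defined(FAKE_BLKGETSIZE)
constexpr bool have_blkgetsize = true;
#else
constexpr bool have_blkgetsize = false;
#endif

// Reports whether the macro was visible at compile time.
inline bool blkgetsize_available() { return have_blkgetsize; }
```

With `defined(...)` the macro is never expanded inside the conditional, so its expansion can contain anything.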

From: Michael Cree <mcree@orcon.net.nz>
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756892
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
(cherry picked from commit 6ad8e61a428cfc9fc60ccdb9bce812e1f49822ac)
Reviewed-by: Greg Farnum <greg@inktank.com>
10 years agoMerge pull request #2356 from dachary/wip-9273-mon-preload-erasure-code-firefly
Sage Weil [Sat, 30 Aug 2014 00:31:29 +0000 (17:31 -0700)]
Merge pull request #2356 from dachary/wip-9273-mon-preload-erasure-code-firefly

erasure-code: preload the default plugins in the mon (firefly)

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoosd: OSDMap: ordered blacklist on non-classic encode function
Joao Eduardo Luis [Fri, 29 Aug 2014 19:21:25 +0000 (20:21 +0100)]
osd: OSDMap: ordered blacklist on non-classic encode function

Fixes: #9211
Backport: firefly

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 81102044f417bd99ca570d9234b1df5195e9a8c9)

10 years agoosd/OSDMap: encode blacklist in deterministic order
Sage Weil [Tue, 26 Aug 2014 15:16:29 +0000 (08:16 -0700)]
osd/OSDMap: encode blacklist in deterministic order

When we use an unordered_map the encoding order is non-deterministic,
which is problematic for OSDMap.  Construct an ordered map<> on encode
and use that.  This lets us keep the hash table for lookups in the general
case.
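
A minimal sketch of the idea, assuming a hypothetical `encode_sorted` helper and a simple "key=value;" text encoding (neither is the actual OSDMap code): the hash table is copied into an ordered map only at encode time.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical encoder: serializes entries as "key=value;" in key order.
// Copying into std::map gives a sorted, reproducible iteration order,
// while the unordered_map stays available for fast lookups.
inline std::string encode_sorted(
    const std::unordered_map<std::string, int>& table) {
  std::map<std::string, int> ordered(table.begin(), table.end());
  std::string out;
  for (const auto& kv : ordered) {
    out += kv.first + "=" + std::to_string(kv.second) + ";";
  }
  return out;
}
```

Two tables with the same contents then encode to the same bytes regardless of insertion order or hash seed.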

Fixes: #9211
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 4672e50922b75d642056020b9745a3a5844424d3)

10 years agoerasure-code: preload the default plugins in the mon 2356/head
Loic Dachary [Fri, 29 Aug 2014 16:13:08 +0000 (18:13 +0200)]
erasure-code: preload the default plugins in the mon

The commit 164f1a1959a863848319585fa752250c7b261381 preloads the
jerasure plugin in the OSD. They must also be preloaded in the mon for
the same reasons.

http://tracker.ceph.com/issues/9273 Fixes: #9273

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agomds: fix FP error in ROUND_UP_TO
John Spray [Tue, 26 Aug 2014 16:36:16 +0000 (17:36 +0100)]
mds: fix FP error in ROUND_UP_TO

Explicitly handle case where denominator is 0, instead of
passing into ROUND_UP_TO.
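
As an illustration only, a caller-side guard might look like the sketch below. The function names are hypothetical, and treating a zero factor as "no rounding" is an assumption; the commit does not state what the MDS does in that case.

```cpp
#include <cassert>
#include <cstdint>

// Generic round-up. n % d is undefined when d == 0 and typically
// raises SIGFPE, so callers must not pass a zero factor.
inline uint64_t round_up_to(uint64_t n, uint64_t d) {
  uint64_t rem = n % d;
  return rem ? n + d - rem : n;
}

// Hypothetical guard: handle d == 0 explicitly instead of passing it on.
// Returning n unchanged is an assumed policy for this sketch.
inline uint64_t round_up_or_passthrough(uint64_t n, uint64_t d) {
  return d == 0 ? n : round_up_to(n, d);
}
```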

Regression from 9449520b121fc6ce0c64948386d4ff77f46f4f5f

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit bf3e4835dabc057982def1b5c9a6499c04ac5312)

10 years agomon: generate cluster_fingerprint if null
Sage Weil [Thu, 21 Aug 2014 18:14:39 +0000 (11:14 -0700)]
mon: generate cluster_fingerprint if null

This triggers after an upgrade of a legacy cluster that has no fingerprint.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b245d600163f6337af15aedd1fea68f4e2a668a8)

10 years agomon: add a cluster fingerprint
Sage Weil [Wed, 20 Aug 2014 15:59:46 +0000 (08:59 -0700)]
mon: add a cluster fingerprint

Generate it on cluster creation with the initial monmap.  Include it in
the report.  Provide no way for this uuid to be fed in to the cluster
(intentionally or not) so that it can be assumed to be a truly unique
identifier for the cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 675b0042eff0ad5e1453838410210b1206c39004)

10 years agoMerge pull request #2244 from dachary/wip-9044-use-ruleset-firefly
Sage Weil [Tue, 26 Aug 2014 20:13:08 +0000 (13:13 -0700)]
Merge pull request #2244 from dachary/wip-9044-use-ruleset-firefly

erasure-code: OSDMonitor::crush_ruleset_create_erasure needs ruleset (firefly)

10 years agoReplicatedPG::cancel_copy: clear cop->obc
Samuel Just [Tue, 12 Aug 2014 23:41:38 +0000 (16:41 -0700)]
ReplicatedPG::cancel_copy: clear cop->obc

Otherwise, an objecter callback might still be hanging
onto this reference until after the flush.

Fixes: #8894
Introduced: 589b639af7c8834a1e6293d58d77a9c440107bc3
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 5040413054e923d6d5a2b4928162dba140d980e0)

10 years agoMerge pull request #2203 from ceph/wip-scrub-firefly
Samuel Just [Tue, 26 Aug 2014 17:30:14 +0000 (10:30 -0700)]
Merge pull request #2203 from ceph/wip-scrub-firefly

backport scrub throttling to firefly

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoos/FileStore: fix mount/remount force_sync race
Sage Weil [Sat, 16 Aug 2014 19:42:33 +0000 (12:42 -0700)]
os/FileStore: fix mount/remount force_sync race

Consider:

 - mount
 - sync_entry is doing some work
 - umount
   - set force_sync = true
   - set done = true
 - sync_entry exits (due to done)
   - ..but does not set force_sync = false
 - mount
 - journal replay starts
 - sync_entry sees force_sync and does a commit while op_seq == 0
 ...crash...

Fixes: #9144
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit dd11042f969b94f7a461d02e1475794031c79f61)

Conflicts:
src/os/FileStore.cc

10 years agoAdd random_cache.hpp to Makefile.am
Haomai Wang [Thu, 10 Jul 2014 02:32:17 +0000 (10:32 +0800)]
Add random_cache.hpp to Makefile.am

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
(cherry picked from commit a3e5c6d632119febd2150944a6f2cbce33cfda3a)

10 years agoos/KeyValueStore, MemStore: fix warning
Sage Weil [Tue, 26 Aug 2014 13:42:12 +0000 (06:42 -0700)]
os/KeyValueStore, MemStore: fix warning

os/MemStore.cc: In member function 'void MemStore::_do_transaction(ObjectStore::Transaction&)':
os/MemStore.cc:956:18: warning: unused variable 'expected_object_size' [-Wunused-variable]
os/MemStore.cc:957:18: warning: unused variable 'expected_write_size' [-Wunused-variable]
os/KeyValueStore.cc: In member function 'unsigned int KeyValueStore::_do_transaction(ObjectStore::Transaction&, KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)':
os/KeyValueStore.cc:1426:18: warning: unused variable 'expected_object_size' [-Wunused-variable]
os/KeyValueStore.cc:1427:18: warning: unused variable 'expected_write_size' [-Wunused-variable]

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: automatically scrub PGs with invalid stats
Sage Weil [Tue, 29 Apr 2014 18:23:58 +0000 (11:23 -0700)]
osd: automatically scrub PGs with invalid stats

If a PG has recently split and has invalid stats, scrub it now, even if
it has scrubbed recently.  This helps the stats become valid again soon.

Fixes: #8147
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 68b440d66539e820c9ce86a6942c3188be4ee1ec)

10 years agoMerge pull request #2328 from dachary/wip-9209-round-up-to-firefly
Sage Weil [Tue, 26 Aug 2014 13:38:34 +0000 (06:38 -0700)]
Merge pull request #2328 from dachary/wip-9209-round-up-to-firefly

common: ROUND_UP_TO accepts any rounding factor (firefly)

10 years agoMerge pull request #2326 from yuyuyu101/wip-kvstore-firefly
Sage Weil [Tue, 26 Aug 2014 13:09:17 +0000 (06:09 -0700)]
Merge pull request #2326 from yuyuyu101/wip-kvstore-firefly

Backport from master to Firefly(KeyValueStore)

10 years agocommon: ROUND_UP_TO accepts any rounding factor 2328/head
Loic Dachary [Mon, 25 Aug 2014 15:05:04 +0000 (17:05 +0200)]
common: ROUND_UP_TO accepts any rounding factor

The ROUND_UP_TO function was limited to rounding factors that are powers
of two. This saves a modulo but it is not used where it would make a
difference. The implementation is changed so it is generic.
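
The trade-off can be sketched with two functions. The mask-based variant is an assumed example of a power-of-two-only implementation, not necessarily the exact macro that was replaced; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mask-based variant: only correct when d is a power of two.
inline uint64_t round_up_pow2(uint64_t n, uint64_t d) {
  return (n + d - 1) & ~(d - 1);
}

// Generic variant: correct for any non-zero d, at the cost of a modulo.
inline uint64_t round_up_generic(uint64_t n, uint64_t d) {
  uint64_t rem = n % d;
  return rem ? n + d - rem : n;
}
```

For d = 8 both agree, but for d = 6 the mask-based form returns a value that is not a multiple of 6, which is why the generic form is needed for arbitrary factors.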

http://tracker.ceph.com/issues/9209 Fixes: #9209

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 9449520b121fc6ce0c64948386d4ff77f46f4f5f)

10 years agoRemove exclusive lock on GenericObjectMap 2326/head
Haomai Wang [Thu, 20 Mar 2014 06:09:49 +0000 (14:09 +0800)]
Remove exclusive lock on GenericObjectMap

Most GenericObjectMap interfaces now take a header as argument rather than the
combination of coll_t and ghobject_t, so the caller is responsible for
keeping the header exclusive.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
10 years agocommon/RandomCache: Fix inconsistence between contents and count
Haomai Wang [Wed, 23 Jul 2014 03:26:18 +0000 (11:26 +0800)]
common/RandomCache: Fix inconsistence between contents and count

The add/clear methods may leave count inconsistent with the real size of
contents.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
10 years agoAdd random cache and replace SharedLRU in KeyValueStore
Haomai Wang [Tue, 26 Aug 2014 04:41:28 +0000 (04:41 +0000)]
Add random cache and replace SharedLRU in KeyValueStore

SharedLRU performs poorly in KeyValueStore with a large header cache size,
so a performance-optimized RandomCache could improve it.

RandomCache records the lookup frequency of each key. When evicting an element,
it randomly samples several elements, compares their frequencies, and evicts the least frequently used
one.
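
The eviction policy described above can be sketched roughly as follows. `SampledCache` and all of its members are hypothetical names, and the linear walk used to pick a random entry is a simplification; this is not the RandomCache implementation from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical frequency-sampling cache for illustration.
class SampledCache {
 public:
  SampledCache(std::size_t capacity, std::size_t samples, unsigned seed = 0)
      : capacity_(capacity), samples_(samples ? samples : 1), rng_(seed) {}

  // Insert or overwrite a key, evicting one entry first if the cache is full.
  void put(const std::string& key, int value) {
    auto it = entries_.find(key);
    if (it != entries_.end()) {
      it->second.value = value;
      return;
    }
    if (entries_.size() >= capacity_) evict_one();
    entries_[key] = Entry{value, 0u};
  }

  // Look up a key and bump its frequency counter on a hit.
  bool get(const std::string& key, int* out) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    ++it->second.hits;
    *out = it->second.value;
    return true;
  }

  bool contains(const std::string& key) const {
    return entries_.count(key) != 0;
  }
  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    int value;
    unsigned hits;
  };

  // Compare a few randomly chosen entries and drop the least-hit one.
  void evict_one() {
    if (entries_.empty()) return;
    auto victim = entries_.end();
    for (std::size_t i = 0; i < samples_; ++i) {
      std::uniform_int_distribution<std::size_t> pick(0, entries_.size() - 1);
      auto cand = std::next(entries_.begin(), pick(rng_));
      if (victim == entries_.end() || cand->second.hits < victim->second.hits)
        victim = cand;
    }
    entries_.erase(victim);
  }

  std::size_t capacity_;
  std::size_t samples_;
  std::mt19937 rng_;
  std::unordered_map<std::string, Entry> entries_;
};
```

Sampling only a few entries keeps eviction cheap and avoids maintaining a global ordering, at the price of sometimes evicting an entry that is not the globally least-used one.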

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Conflicts:

src/common/config_opts.h
src/os/KeyValueStore.cc

10 years agoAdd Header cache to KeyValueStore
Haomai Wang [Tue, 26 Aug 2014 04:40:16 +0000 (04:40 +0000)]
Add Header cache to KeyValueStore

Recent performance statistics show that header lookup has become the main time
consumer for read/write operations. Header lookup and decode/encode logic
typically account for about 50% of the time.

Add a header cache using the SharedLRU structure, which maintains the header
cache and gives the caller a pointer to the real header. It also avoids the
overhead of excessive header copy operations.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Conflicts:

src/os/KeyValueStore.cc
src/os/KeyValueStore.h

10 years agoFix write operation on a deleted object in the same transaction
Haomai Wang [Wed, 26 Feb 2014 09:46:07 +0000 (17:46 +0800)]
Fix write operation on a deleted object in the same transaction

If the following ops occur:
touch obj
delete obj
write obj

KeyValueStore will fail at "write" operation.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
10 years agoRemove SequencerPosition from KeyValueStore
Haomai Wang [Tue, 26 Aug 2014 04:35:57 +0000 (04:35 +0000)]
Remove SequencerPosition from KeyValueStore

KeyValueStore now expects the kv backend to ensure consistency, so there is no use
in KeyValueStore storing a SequencerPosition.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Conflicts:

src/os/KeyValueStore.cc
src/os/KeyValueStore.h

10 years agoFix keyvaluestore fiemap bug
Haomai Wang [Wed, 4 Jun 2014 04:58:07 +0000 (12:58 +0800)]
Fix keyvaluestore fiemap bug

The result of fiemap is wrong and the offset obtained from
"StripObjectMap::file_to_extents" needs to be multiplied by the sequence number.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
10 years agomon: fix occasional message leak after session reset
Sage Weil [Thu, 21 Aug 2014 20:05:35 +0000 (13:05 -0700)]
mon: fix occasional message leak after session reset

Consider:

 - we get a message, put it on a wait list
 - the client session resets
 - we go back to process the message later and discard
   - _ms_dispatch returns false, but nobody drops the msg ref

Since we call _ms_dispatch() a lot internally, we need to always return
true when we are an internal caller.

Fixes: #9176
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 19df386b2d36d716be2e6d02de0386fac9e7bc1f)

10 years agoMerge pull request #2298 from dachary/wip-9153-jerasure-upgrade-firefly
Sage Weil [Thu, 21 Aug 2014 17:14:18 +0000 (10:14 -0700)]
Merge pull request #2298 from dachary/wip-9153-jerasure-upgrade-firefly

erasure-code: preload the jerasure plugin variant (sse4,sse3,generic)

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoerasure-code: preload the jerasure plugin variant (sse4,sse3,generic) 2298/head
Loic Dachary [Thu, 21 Aug 2014 12:41:55 +0000 (14:41 +0200)]
erasure-code: preload the jerasure plugin variant (sse4,sse3,generic)

Preloading the jerasure plugin dlopens the plugin that is in
charge of selecting the variant optimized for the
CPU (sse4,sse3,generic). The variant plugin itself is not loaded because
it does not happen at load() but when the factory() method is called.

The JerasurePlugin::preload method is modified to call the factory()
method to load jerasure_sse4 or jerasure_sse3 or jerasure_generic as a
side effect.

Indirectly loading another plugin in the factory() method is error prone
and should be moved to the load() method instead. This change should be
done in a separate commit.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoFix set_alloc_hint op cause KeyValueStore crash problem
Haomai Wang [Tue, 20 May 2014 06:32:18 +0000 (14:32 +0800)]
Fix set_alloc_hint op cause KeyValueStore crash problem

KeyValueStore doesn't support the set_alloc_hint op yet, but the implementation of
_do_transaction still needs to decode its arguments. Otherwise, the
arguments will be interpreted as the next op.

Fix the same problem for MemStore.
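
Why an unsupported op must still consume its arguments can be shown with a toy decoder. The opcode values, the flat integer encoding, and `count_ops` are all hypothetical; only the argument names come from the compiler warnings quoted elsewhere in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; each op is followed by its arguments in the stream.
enum Op : uint64_t { OP_TOUCH = 1, OP_SET_ALLOC_HINT = 2 };

// Returns the number of ops decoded, or -1 on an unknown opcode or a
// truncated stream.
inline int count_ops(const std::vector<uint64_t>& stream) {
  int ops = 0;
  std::size_t pos = 0;
  while (pos < stream.size()) {
    uint64_t op = stream[pos++];
    switch (op) {
      case OP_TOUCH:
        if (pos + 1 > stream.size()) return -1;
        pos += 1;  // object id
        break;
      case OP_SET_ALLOC_HINT:
        // Not implemented by this store, but its three arguments (object id,
        // expected_object_size, expected_write_size) must still be skipped;
        // otherwise they would be misread as the next opcode.
        if (pos + 3 > stream.size()) return -1;
        pos += 3;
        break;
      default:
        return -1;
    }
    ++ops;
  }
  return ops;
}
```

If the `pos += 3` line were missing, the object id following the hint op would be read as an opcode and decoding would fail or, worse, execute the wrong op.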

Fix #8381

Reported-by: Xinxin Shu <xinxin.shu5040@gmail.com>
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
(cherry picked from commit c08adbc98ff5f380ecd215f8bd9cf3cab214913c)

10 years agoMerge pull request #2286 from dachary/wip-9153-jerasure-upgrade-firefly
Sage Weil [Wed, 20 Aug 2014 17:10:08 +0000 (10:10 -0700)]
Merge pull request #2286 from dachary/wip-9153-jerasure-upgrade-firefly

erasure-code: preload the jerasure plugin (firefly)

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoerasure-code: preload the jerasure plugin 2286/head
Loic Dachary [Mon, 18 Aug 2014 23:30:15 +0000 (01:30 +0200)]
erasure-code: preload the jerasure plugin

Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:

* ceph-osd-v1 is running but did not load jerasure

* ceph-osd-v2 is being installed but takes time : the files
  are installed before ceph-osd is restarted

* ceph-osd-v1 is required to handle an erasure coded placement group and
  loads jerasure (the v2 version which is not API compatible)

* ceph-osd-v1 calls the v2 jerasure plugin and does not reference the
  expected part of the code and crashes

Although this problem shows in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately
after installing and running an OSD. Once it is backported to firefly,
it will not even happen in teuthology tests because the upgrade from
firefly to master will use the firefly version including this fix.

While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure
that load other plugins depending on the CPU features, or even plugins
such as isa which only work on specific CPU.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Backport: firefly
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 9b802701f78288ba4f706c65b853415c69002d27)

Conflicts:
src/test/erasure-code/test-erasure-code.sh
src/common/config_opts.h

10 years agoWork around an apparent binding bug (GCC 4.8).
Matt Benjamin [Thu, 29 May 2014 14:34:20 +0000 (10:34 -0400)]
Work around an apparent binding bug (GCC 4.8).

A reference to h->seq passed to std::pair ostensibly could not bind
because the header structure is packed.  At first this looked like
a more general unaligned access problem, but the only location the
compiler rejects is a false positive.

Signed-off-by: Matt Benjamin <matt@linuxbox.com>
(cherry picked from commit c930a1f119069a424af28a618b0abff4947c221f)

10 years agoqa/workunits/rbd/qemu-iotests: touch common.env
Sage Weil [Mon, 18 Aug 2014 03:54:28 +0000 (20:54 -0700)]
qa/workunits/rbd/qemu-iotests: touch common.env

This seems to be necessary on trusty.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 055be68cf8e1b84287ab3631a02e89a9f3ae6cca)

10 years agounittest_strtol: fix compilation warning
Joao Eduardo Luis [Fri, 23 May 2014 15:52:08 +0000 (16:52 +0100)]
unittest_strtol: fix compilation warning

Was fixed in master by a4923f5bc373d530d1ffdf6c58a4d88139daedd2

Signed-off-by: Sage Weil <sage@redhat.com>
10 years ago Fix EINVAL err when use "ceph tell osd.* bench"
huangjun [Tue, 17 Jun 2014 05:12:58 +0000 (13:12 +0800)]
  Fix EINVAL err when use "ceph tell osd.* bench"

Signed-off-by: huangjun <hjwsm1989@gmail.com>
(cherry picked from commit 7dc93a9651f602d9c46311524fc6b54c2f1ac595)

10 years agoqa/workunits/cephtool/test.sh: fix get erasure_code_profile test
Ma Jianpeng [Thu, 17 Jul 2014 00:48:34 +0000 (17:48 -0700)]
qa/workunits/cephtool/test.sh: fix get erasure_code_profile test

Manual backport of 4d6899c7560e990650959b442980a7249f0ba4c1

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon: OSDMonitor: add 'osd pool get-quota' command
Joao Eduardo Luis [Fri, 27 Jun 2014 20:41:18 +0000 (21:41 +0100)]
mon: OSDMonitor: add 'osd pool get-quota' command

Enables us to obtain current quotas for a given pool.

Fixes: #8523
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 714a9bb5a058b2553f3be3e4cfb7e7f30150e75a)

10 years agomon: name instead of id in "has tiers" message
John Spray [Tue, 3 Jun 2014 09:12:41 +0000 (10:12 +0100)]
mon: name instead of id in "has tiers" message

Instead of "Pool foo has tiers 1 2" print
"Pool foo has tiers bar baz".

Signed-off-by: John Spray <jspray@redhat.com>
(cherry picked from commit 97772c2f53f726bd71710d0d3e34159d2679390a)

10 years agocommon/config.cc: allow integer values to be parsed as SI units
Joao Eduardo Luis [Fri, 23 May 2014 16:01:38 +0000 (17:01 +0100)]
common/config.cc: allow integer values to be parsed as SI units

We are allowing this for all and any integer values; that is, OPT_INT,
OPT_LONGLONG, OPT_U32 and OPT_U64.

It is up to the user to choose appropriate units.  For instance, the user
should not use 'E(xabyte)' when setting a signed int, and should use their
best judgment when setting options that, for instance, ought to receive
seconds.

Fixes: #8265
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 5500437e064cd6b4b45d63ee9396193df87f4d44)

10 years ago test/strtol.cc: Test 'strict_strtosi()'
Joao Eduardo Luis [Fri, 23 May 2014 15:52:08 +0000 (16:52 +0100)]
test/strtol.cc: Test 'strict_strtosi()'

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 40587d4792fd55db72d33870aae8b6a806c9baaf)

10 years ago common/strtol.cc: strict_strtosi() converts str with SI units to uint64_t
Joao Eduardo Luis [Fri, 23 May 2014 15:51:37 +0000 (16:51 +0100)]
common/strtol.cc: strict_strtosi() converts str with SI units to uint64_t

Accepts values with a suffix (B, K, M, G, T, P, E) and returns the
appropriate byte value.

E.g., 10B = 10, while 10K = 10240.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 67dc5751ba9a4e527ff12ea65000d1ba45d956f6)
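
The suffix handling described above can be sketched roughly as follows. This is an illustration only, not the strict_strtosi() implementation: the name parse_si_sketch, the bool/out-parameter signature, and the overflow check are assumptions. Each suffix step is treated as a factor of 1024, matching the "10K = 10240" example in the commit message.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical sketch (not Ceph's strict_strtosi): parse a non-negative
// integer with an optional B/K/M/G/T/P/E suffix into a byte count.
// Returns false on malformed input or on overflow of uint64_t.
bool parse_si_sketch(const std::string& in, uint64_t* out) {
  if (in.empty()) return false;
  std::string digits = in;
  int shift = 0;  // power of two applied by the suffix
  switch (digits.back()) {
    case 'B': shift = 0;  digits.pop_back(); break;
    case 'K': shift = 10; digits.pop_back(); break;
    case 'M': shift = 20; digits.pop_back(); break;
    case 'G': shift = 30; digits.pop_back(); break;
    case 'T': shift = 40; digits.pop_back(); break;
    case 'P': shift = 50; digits.pop_back(); break;
    case 'E': shift = 60; digits.pop_back(); break;
    default: break;  // no suffix: plain integer
  }
  if (digits.empty()) return false;
  for (char c : digits)
    if (c < '0' || c > '9') return false;  // strict: digits only
  errno = 0;
  unsigned long long v = std::strtoull(digits.c_str(), nullptr, 10);
  if (errno == ERANGE) return false;
  if (v > (UINT64_MAX >> shift)) return false;  // would overflow 64 bits
  *out = static_cast<uint64_t>(v) << shift;
  return true;
}
```

Under these assumptions "10B" parses to 10 and "10K" to 10240, while "16E" is rejected because it does not fit in a uint64_t. The range check matters for the config change above, since a value that fits in 64 bits may still be too large for a signed int option.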

10 years ago ceph-disk: linter cleanup
Alfredo Deza [Wed, 13 Aug 2014 19:50:20 +0000 (15:50 -0400)]
ceph-disk: linter cleanup

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
(cherry picked from commit d74ed9d53fab95f27a9ad8e9f5dab7192993f6a3)

10 years ago ceph-disk: warn about falling back to sgdisk (once)
Sage Weil [Wed, 13 Aug 2014 19:00:50 +0000 (12:00 -0700)]
ceph-disk: warn about falling back to sgdisk (once)

This way the user knows something funny might be up if dmcrypt is in use.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 6f7798e37e098de38fbc73f86c4c6ee705abbe38)

10 years ago ceph-disk: only fall back to sgdisk for 'list' if blkid seems old
Sage Weil [Wed, 13 Aug 2014 18:40:34 +0000 (11:40 -0700)]
ceph-disk: only fall back to sgdisk for 'list' if blkid seems old

If blkid doesn't show us any ID_PART_ENTRY_* fields but we know it is
a GPT partition, *then* fall back.  Otherwise, don't bother.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b1651afb34d9d2c324db3bf5f54ac9ce001c6af9)

10 years ago ceph-disk: add get_partition_base() helper
Sage Weil [Wed, 13 Aug 2014 18:39:47 +0000 (11:39 -0700)]
ceph-disk: add get_partition_base() helper

Return the base device (disk) for a partition device.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b75e8a340c49cbc067baa19790b994a5f904bb4f)

10 years ago ceph-disk: display information about dmcrypted data and journal volumes
Sage Weil [Wed, 13 Aug 2014 00:26:07 +0000 (17:26 -0700)]
ceph-disk: display information about dmcrypted data and journal volumes

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c7a1ceba441fa99a82e19ed2cd3c6782a5d77636)

10 years ago ceph-disk: move fs mount probe into a helper
Sage Weil [Wed, 13 Aug 2014 00:25:42 +0000 (17:25 -0700)]
ceph-disk: move fs mount probe into a helper

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f80ed26d2403ba12e80da6459fc45c22584f72de)

10 years ago ceph-disk: use partition type UUIDs, and blkid
Sage Weil [Wed, 13 Aug 2014 00:25:10 +0000 (17:25 -0700)]
ceph-disk: use partition type UUIDs, and blkid

Use blkid to give us the GPT partition type.  This lets us distinguish
between dmcrypt and non-dmcrypt partitions.  If blkid doesn't give us what
we want, fake it and try with sgdisk.  This isn't perfect (the fallback
can't tell dmcrypt from non-dmcrypt), but such is life, and we are better
off than before.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 6c77f5f2f994c881232d76ce9c69af80d10772bd)

10 years ago ceph-disk: fix log syntax error
Sage Weil [Tue, 12 Aug 2014 20:53:16 +0000 (13:53 -0700)]
ceph-disk: fix log syntax error

  File "/usr/sbin/ceph-disk", line 303, in command_check_call
    LOG.info('Running command: %s' % ' '.join(arguments))
TypeError: sequence item 2: expected string, NoneType found

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 1088d6cd11b476cd67ed30e07edd363c4057a003)

10 years ago Revert "Fix for bug #6700"
Sage Weil [Mon, 11 Aug 2014 22:58:15 +0000 (15:58 -0700)]
Revert "Fix for bug #6700"

This reverts commit 673394702b725ff3f26d13b54d909208daa56d89.

This appears to break things when the journal and data disk are *not* the same.
And I can't seem to reproduce the original failure...

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2edf01ffa4a7425af2691b4e94bc5fd0bfab1e5b)

10 years ago ceph-disk: fix verify_no_in_use check
Sage Weil [Mon, 11 Aug 2014 22:57:52 +0000 (15:57 -0700)]
ceph-disk: fix verify_no_in_use check

We only need to verify that partitions aren't in use when we want to
consume the whole device (osd data), not when we want to create an
additional partition for ourselves (osd journal).

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d6e6ba198efc4b3afff0c70af53497a70c6b3f19)

10 years ago better error reporting on incompatible device requirements
Alfredo Deza [Thu, 22 May 2014 21:04:28 +0000 (17:04 -0400)]
better error reporting on incompatible device requirements

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit 1ac3a503a15ddf7f7c1a33310a468fac10a1b7b6)

10 years ago ceph-disk: fix list for encrypted or corrupt volume
Stuart Longland [Tue, 6 May 2014 21:06:36 +0000 (14:06 -0700)]
ceph-disk: fix list for encrypted or corrupt volume

Continue gracefully if an fs type is not detected, either because it is
encrypted or because it is corrupted.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 09beebe3f1fd1b179547743648049b891cb8bc56)

10 years ago support dmcrypt partitions when activating
Alfredo Deza [Fri, 13 Jun 2014 13:37:33 +0000 (09:37 -0400)]
support dmcrypt partitions when activating

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
(cherry picked from commit ef8a1281512c4ee70a3764b28891da691a183804)

10 years ago init-ceph: don't use bashism
Sage Weil [Fri, 15 Aug 2014 23:41:43 +0000 (16:41 -0700)]
init-ceph: don't use bashism

       -z STRING
              the length of STRING is zero

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 0d6d1aa7e0c5e0b5f99c9b548a1f890c511b4299)

10 years ago osd: fix feature requirement for mons
Sage Weil [Fri, 15 Aug 2014 21:28:57 +0000 (14:28 -0700)]
osd: fix feature requirement for mons

These features should be set on the client_messenger, not
cluster_messenger.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ae0b9f17760eda9a7e09a6babac50bfe8ebb4b36)

10 years ago unittest_osdmap: test EC rule and pool features
Sage Weil [Fri, 15 Aug 2014 20:54:11 +0000 (13:54 -0700)]
unittest_osdmap: test EC rule and pool features

TODO: tiering feature bits.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2f0e2951d773b6acce781b4b991d6d8e817ee2f9)

10 years ago unittest_osdmap: create an ec pool in test osdmap
Sage Weil [Fri, 15 Aug 2014 21:04:05 +0000 (14:04 -0700)]
unittest_osdmap: create an ec pool in test osdmap

This is part of 7294e8c4df6df9d0898f82bb6e0839ed98149310.

Signed-off-by: Sage Weil <sage@redhat.com>

10 years ago osd: only require crush features for rules that are actually used
Sage Weil [Fri, 15 Aug 2014 15:55:10 +0000 (08:55 -0700)]
osd: only require crush features for rules that are actually used

Often there will be a CRUSH rule present for erasure coding that uses the
new CRUSH steps or indep mode.  If these rules are not referenced by any
pool, we do not need clients to support the mapping behavior.  This is safe
because the encoding has not changed, only the expected CRUSH output.

Fixes: #8963
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 16dadb86e02108e11a970252411855d84ab0a4a2)
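
The idea of requiring features only for referenced rules can be sketched as below. Everything here is illustrative: the RuleSketch and PoolSketch types, the placeholder feature bits, and the function name are assumptions, not the OSDMap or CrushWrapper API.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Placeholder feature bits for this sketch only (not Ceph's real values).
constexpr uint64_t FEATURE_CRUSH_V2_SKETCH = 1ull << 0;
constexpr uint64_t FEATURE_CRUSH_V3_SKETCH = 1ull << 1;

// Hypothetical stand-ins for a CRUSH rule and a pool.
struct RuleSketch { bool uses_v2; bool uses_v3; };
struct PoolSketch { int rule_id; };

// Collect feature bits only from rules that some pool actually references;
// rules present in the map but unused by any pool contribute nothing.
uint64_t required_crush_features(const std::map<int, RuleSketch>& rules,
                                 const std::vector<PoolSketch>& pools) {
  uint64_t features = 0;
  for (const auto& pool : pools) {
    auto it = rules.find(pool.rule_id);
    if (it == rules.end()) continue;  // pool points at a missing rule
    if (it->second.uses_v2) features |= FEATURE_CRUSH_V2_SKETCH;
    if (it->second.uses_v3) features |= FEATURE_CRUSH_V3_SKETCH;
  }
  return features;
}
```

In this sketch an erasure-coding rule that uses newer steps adds no requirement until a pool is created that references it.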

10 years ago crush: add is_v[23]_rule(ruleid) methods
Sage Weil [Fri, 15 Aug 2014 15:52:37 +0000 (08:52 -0700)]
crush: add is_v[23]_rule(ruleid) methods

Add methods to check if a *specific* rule uses v2 or v3 features.  Refactor
the existing checks to use these.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 1d95486780a54c85a5c88936a4da4bdc3576a7b8)

10 years ago PGLog: fix clear() to avoid the IndexLog::zero() asserts
Samuel Just [Mon, 30 Jun 2014 20:40:07 +0000 (13:40 -0700)]
PGLog: fix clear() to avoid the IndexLog::zero() asserts

Introduced in:
  c5b8d8105d965da852c79add607b69d5ae79a4d4
  ac11ca40b4f4525cbe9b1778b1c5d9472ecb9efa
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 959f2b25910360b930183fbf469ce984a48542dd)

10 years ago osd: allow io priority to be set for the disk_tp 2203/head
Sage Weil [Thu, 19 Jun 2014 19:34:36 +0000 (12:34 -0700)]
osd: allow io priority to be set for the disk_tp

The disk_tp covers scrubbing, pg deletion, and snap trimming.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d9073f486527ca13cdb2774745c4c63c218333ad)

10 years ago common/WorkQueue: allow io priority to be set for wq
Sage Weil [Wed, 18 Jun 2014 18:02:09 +0000 (11:02 -0700)]
common/WorkQueue: allow io priority to be set for wq

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit dd6badcb5eedfec6748b3e6ca4d46e3b266038f6)

Conflicts:
	src/common/WorkQueue.cc

10 years ago common/Thread: allow io priority to be set for a Thread
Sage Weil [Wed, 18 Jun 2014 18:01:42 +0000 (11:01 -0700)]
common/Thread: allow io priority to be set for a Thread

Ideally, set this before starting the thread.  If you set it after, we
could potentially race with create() itself.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1b8741022c5a2ebae38905215dadee696433e931)

10 years ago common/io_priority: wrap ioprio_set() and gettid()
Sage Weil [Wed, 18 Jun 2014 18:01:09 +0000 (11:01 -0700)]
common/io_priority: wrap ioprio_set() and gettid()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a2b49110ef65efd526c3430ad03c988ca9dde768)

10 years ago osd: introduce simple sleep during scrub
Sage Weil [Tue, 17 Jun 2014 17:47:24 +0000 (10:47 -0700)]
osd: introduce simple sleep during scrub

This option is similar to osd_snap_trim_sleep: simply inject an optional
sleep in the thread that is doing scrub work.  This is a very kludgey and
coarse knob for limiting the impact of scrub on the cluster, but can help
until we have a more robust and elegant solution.

Only sleep if we are in the NEW_CHUNK state to avoid delaying processing of
an in-progress chunk.  In this state nothing is blocked on anything.
Conveniently, chunky_scrub() requeues itself for each new chunk.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c4e8451cc5b4ec5ed07e09c08fb13221e31a7ac6)
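
A rough sketch of the "sleep only in NEW_CHUNK" rule from the commit above. Only the NEW_CHUNK state name comes from the commit message. The other enum values, the function name, and the sleep_seconds parameter (standing in for the new config option) are hypothetical.

```cpp
#include <chrono>
#include <thread>

// NEW_CHUNK is named in the commit message; the other states are
// hypothetical placeholders for "a chunk is already in progress".
enum class ScrubStateSketch { NEW_CHUNK, CHUNK_IN_PROGRESS, FINISH };

// Sleep only when a new chunk is about to start, so an in-progress chunk
// (which may be blocking other work) is never delayed. Returns whether
// the thread actually slept. A non-positive value disables the sleep.
bool maybe_scrub_sleep(ScrubStateSketch state, double sleep_seconds) {
  if (state != ScrubStateSketch::NEW_CHUNK || sleep_seconds <= 0.0)
    return false;
  std::this_thread::sleep_for(std::chrono::duration<double>(sleep_seconds));
  return true;
}
```

Because the scrub routine requeues itself for each new chunk, a check like this at the top of the routine would throttle scrubbing once per chunk without holding anything up mid-chunk.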

10 years ago osd: add sanity check/warning on a few key configs
Sage Weil [Sat, 14 Jun 2014 17:30:50 +0000 (10:30 -0700)]
osd: add sanity check/warning on a few key configs

Warn when certain config values are set to bad values.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f3ec7d0b23fdee39a34bda7595cd2a79c08daf8a)

10 years ago osd: prevent pgs from getting too far ahead of the min pg epoch
Sage Weil [Fri, 2 May 2014 00:24:48 +0000 (17:24 -0700)]
osd: prevent pgs from getting too far ahead of the min pg epoch

Bound the range of PG epochs between the slowest and fastest pg
(epoch-wise) with 'osd map max advance'.  This value should be set to
something less than 'osd map cache size' so that the maps we are
processing will be in memory as many PGs advance forward in time in
loose synchrony.

This is part of the solution to #7576.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cf25bdf6b0090379903981fe8cee5ea75efd7ba0)
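
The bound described above amounts to clamping how far any one PG may advance relative to the slowest PG. A minimal sketch, assuming a hypothetical helper advance_limit() and a 32-bit epoch type (neither is taken from the OSD code):

```cpp
#include <algorithm>
#include <cstdint>

using epoch_sketch_t = uint32_t;  // assumption: epochs as 32-bit counters

// Hypothetical helper: the highest epoch a PG may advance to right now.
// min_pg_epoch is the epoch of the slowest PG, osd_epoch is the newest map
// the OSD has, and max_advance plays the role of 'osd map max advance'.
epoch_sketch_t advance_limit(epoch_sketch_t min_pg_epoch,
                             epoch_sketch_t osd_epoch,
                             epoch_sketch_t max_advance) {
  return std::min(osd_epoch, min_pg_epoch + max_advance);
}
```

With the slowest PG at epoch 100 and a max advance of 150, no PG moves past epoch 250 even if the OSD already holds map 500, so the window of maps being processed stays within max_advance epochs. Keeping that window smaller than 'osd map cache size' is what keeps the maps in memory.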

10 years ago osd: fix pg epoch floor tracking
Sage Weil [Fri, 8 Aug 2014 00:42:06 +0000 (17:42 -0700)]
osd: fix pg epoch floor tracking

If you call erase() on a multiset with a value, it deletes all instances of
that value; we only want to delete one of them.  Fix this by passing an
iterator.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a52a855f6c92b03dd84cd0cc1759084f070a98c2)
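
The difference between the two std::multiset::erase() overloads can be shown with a small sketch. The helper name erase_one and the epoch type are assumptions made for illustration, not the OSD code.

```cpp
#include <cstdint>
#include <set>

using pg_epoch_sketch_t = uint32_t;  // assumption: epochs as 32-bit counters

// Remove exactly one instance of `e`. s.erase(e) would remove every equal
// element, which corrupts a per-PG count when several PGs share an epoch.
// Erasing via the iterator returned by find() removes a single element.
inline void erase_one(std::multiset<pg_epoch_sketch_t>& s,
                      pg_epoch_sketch_t e) {
  auto it = s.find(e);
  if (it != s.end())
    s.erase(it);
}
```

Starting from {5, 5, 7}, erase_one(s, 5) leaves {5, 7}, so *s.begin() still reports 5 as the minimum epoch; s.erase(5) would leave {7} and make the floor jump ahead of a PG that is still at epoch 5.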

10 years ago osd: track per-pg epochs, min
Sage Weil [Wed, 2 Apr 2014 21:29:08 +0000 (14:29 -0700)]
osd: track per-pg epochs, min

Add some simple tracking so that we can quickly determine what the min
pg osdmap epoch is.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 81e4c47722255ac3d46f701a80e104cc390e766c)

10 years ago mon: fix divide by zero when pg_num adjusted and no osds
Sage Weil [Wed, 13 Aug 2014 20:31:10 +0000 (13:31 -0700)]
mon: fix divide by zero when pg_num adjusted and no osds

Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 239401db7b51541a57c59a261b89e0f05347c32d)

10 years ago ceph_test_rados_api_tier: fix cache cleanup (ec too)
Sage Weil [Sun, 10 Aug 2014 19:48:29 +0000 (12:48 -0700)]
ceph_test_rados_api_tier: fix cache cleanup (ec too)

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d7fb7bf5f2059f411633751e376c2270e6040fba)