Greg Farnum [Thu, 23 Oct 2014 00:16:31 +0000 (17:16 -0700)]
client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid
m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.
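A minimal sketch of the idea, with a simplified Inode and a hypothetical helper name; the point is only that the 64-bit tid gets truncated to 16 bits before the comparison:

    #include <cstdint>

    // Simplified stand-in for the real Inode; only the narrow tid matters here.
    struct Inode {
      uint16_t flushing_cap_tid = 0;
    };

    // Hypothetical helper: truncate the 64-bit message tid to the width of
    // flushing_cap_tid so both sides of the comparison wrap identically.
    bool flush_tid_matches(const Inode &in, uint64_t client_tid) {
      return static_cast<uint16_t>(client_tid) == in.flushing_cap_tid;
    }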
Sage Weil [Thu, 8 May 2014 21:19:22 +0000 (14:19 -0700)]
osd/ReplicatedPG: carry CopyOpRef in copy_from completion
There is a race with copy_from cancellation. The internal Objecter
completion decodes a bunch of data and copies it into pointers provided
when the op is queued. When we cancel, we need to ensure that we can cope
until control passes back to our provided completion.
Once we *do* get into the (ReplicatedPG) callbacks, we will bail out
because the tid in the CopyOp or FlushOp no longer matches.
Fix this by carrying a ref to keep the copy-from targets alive, and
clearing out the tids that we cancel.
Note that previously, the trigger for this was that the tid changes when
we handle a redirect, which made the op_cancel() call fail. With the
coming Objecter changes, this will no longer be the case. However, there
are also locking and threading changes that will make cancellation racy,
so we will not be able to rely on it always preventing the callback.
Either way, this will avoid the problem.
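As a rough illustration (generic std:: types, not the actual ReplicatedPG/CopyOp code), keeping the targets alive amounts to capturing a shared reference in the completion and clearing the tid on cancel:

    #include <cstdint>
    #include <functional>
    #include <memory>

    // Hypothetical stand-ins for CopyOp and its smart-pointer typedef.
    struct CopyOp {
      uint64_t objecter_tid = 0;   // tid of the in-flight Objecter op
      // ...the buffers the Objecter completion decodes into live here...
    };
    using CopyOpRef = std::shared_ptr<CopyOp>;

    // The completion captures a CopyOpRef, so the decode targets stay valid
    // even if the op is cancelled before the callback runs.
    std::function<void(int)> make_copyfrom_completion(CopyOpRef cop, uint64_t tid) {
      return [cop, tid](int result) {
        (void)result;
        if (cop->objecter_tid != tid)
          return;            // cancelled or restarted; bail out
        // ...normal completion path...
      };
    }

    void cancel_copy(CopyOpRef cop) {
      cop->objecter_tid = 0;  // make any still-queued completion a no-op
    }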
qa/workunits/cephtool/test.sh: fix thrash (ultimate)
Keep the osd thrash test to ensure it is a valid command but make it a
no-op by giving it a zero argument (meaning thrash 0 OSD maps).
Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because its
side effects are random and it is unnecessarily difficult to ensure they
have finished.
Then, we begin to flush 15 with a delete with snapc 4:[4] leaving the
backing pool with:
4:[4]:[4(4)]
Then, we finish flushing 15 with snapc 9:[4], leaving the backing
pool with:
9:[4]:[4(4)]+head
Next, snaps 10 and 15 are removed causing clone 10 to be removed leaving
the cache with:
30:[29,21,20,4]:[22(21),4(4)]+head
We next begin to flush 22 by sending a delete with snapc 4(4) since
prev_snapc is 4 <---------- here is the bug
The backing pool ignores this request since 4 < 9 (ORDERSNAP) leaving it
with:
9:[4]:[4(4)]
Then, we complete flushing 22 with snapc 19:[4] leaving the backing pool
with:
19:[4]:[4(4)]+head
Then, we begin to flush head by deleting with snapc 22:[21,20,4] leaving
the backing pool with:
22:[21,20,4]:[22(21,20),4(4)]
Finally, we flush head leaving the backing pool with:
30:[29,21,20,4]:[22(21*,20*),4(4)]+head
When we go to flush clone 22, all we know is that 22 is dirty, has snaps
[21], and 4 is clean. As part of flushing 22, we need to do two things:
1) Ensure that the current head is cloned as cloneid 4 with snaps [4] by
sending a delete at snapc 4:[4].
2) Flush the data at snap sequence < 21 by sending a copyfrom with snapc
20:[20,4].
Unfortunately, it is possible that 1, 1&2, or 1 and part of the flush
process for some other now non-existent clone have already been
performed. Because of that, between 1) and 2), we need to send
a second delete ensuring that the object does not exist at 20.
We have been setting it to the old head value. This is usually
harmless since the new head will virtually always be ahead of the
old head for claim_log_and_clear_rollback_info, but can cause trouble
in some edge cases.
Samuel Just [Mon, 15 Sep 2014 23:53:21 +0000 (16:53 -0700)]
PG::find_best_info: let history.last_epoch_started provide a lower bound
If we find an info.history.last_epoch_started above any
info.last_epoch_started, we must be missing updates and
min_last_update_acceptable should provisionally be max().
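A hedged sketch of that bound check, with simplified stand-ins for pg_info_t and eversion_t (field names and the surrounding logic are assumptions):

    #include <algorithm>
    #include <cstdint>
    #include <limits>
    #include <vector>

    // Simplified stand-ins for pg_info_t fields.
    struct PeerInfo {
      uint32_t last_epoch_started = 0;          // from the info itself
      uint32_t history_last_epoch_started = 0;  // from info.history
    };

    // If history advertises an epoch that no info has actually started,
    // we must be missing updates, so provisionally demand the maximum
    // last_update (i.e. accept nothing yet).
    uint64_t min_last_update_acceptable(const std::vector<PeerInfo> &infos) {
      uint32_t max_les = 0, max_history_les = 0;
      for (const auto &i : infos) {
        max_les = std::max(max_les, i.last_epoch_started);
        max_history_les = std::max(max_history_les, i.history_last_epoch_started);
      }
      if (max_history_les > max_les)
        return std::numeric_limits<uint64_t>::max();
      return 0;  // placeholder for the normal computation
    }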
Fixes: #9482
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
The crash occurs due to ImageCtx->parent->parent being uninitialized,
since the initial open_parent() -> open_image(parent) ->
ictx_refresh(parent) occurs before ImageCtx->parent->snap_id is set,
so refresh_parent() is not called to open an ImageCtx for the parent
of the parent. This leaves the ImageCtx->parent->parent NULL, but the
rest of ImageCtx->parent updated to point at the correct parent snapshot.
Setting the parent->snap_id earlier has some unintended side effects
currently, so for now just call refresh_parent() during
open_parent(). This is the easily backportable version of the
fix. Further patches can clean up this whole initialization process.
init-radosgw.sysv: Support systemd for starting the gateway
When using RHEL7, the radosgw daemon needs to start under systemd.
Check whether systemd is running as PID 1. If it is, start
the daemon using: systemd-run -r <cmd>. pidof returns null
because it is executed too quickly, so add one second of sleep
so that the script reports the startup correctly.
Sage Weil [Mon, 8 Sep 2014 20:44:57 +0000 (13:44 -0700)]
osdc/Objecter: revoke rx_buffer on op_cancel
If we cancel a read, revoke the rx buffers to avoid a use-after-free and/or
other undefined badness by using user buffers that may no longer be
present.
Fixes: #9362
Backport: firefly, dumpling
Reported-by: Matthias Kiefer <matthias.kiefer@1und1.de>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2305b2897acba38384358c33ca3bbfcae6f1c74e)
Samuel Just [Wed, 27 Aug 2014 23:21:41 +0000 (16:21 -0700)]
PG::can_discard_op: do discard old subopreplies
Otherwise, a sub_op_reply from a previous interval can stick around
until we either one day go active again and get rid of it or delete the
pg which is holding it on its waiting_for_active list. While it sticks
around futilely waiting for the pg to once more go active, it will cause
harmless slow request warnings.
Sage Weil [Wed, 27 Aug 2014 13:19:12 +0000 (06:19 -0700)]
osd/PG: fix crash from second backfill reservation rejection
If we get more than one reservation rejection we should ignore them; when
we got the first we already sent out cancellations. More importantly, we
should not crash.
osd: OSDMap: ordered blacklist on non-classic encode function
Fixes: #9211
Backport: firefly
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 81102044f417bd99ca570d9234b1df5195e9a8c9)
Sage Weil [Tue, 26 Aug 2014 15:16:29 +0000 (08:16 -0700)]
osd/OSDMap: encode blacklist in deterministic order
When we use an unordered_map the encoding order is non-deterministic,
which is problematic for OSDMap. Construct an ordered map<> on encode
and use that. This lets us keep the hash table for lookups in the general
case.
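A sketch of the approach with toy types (not the real OSDMap encoder): copy the hash table into an ordered std::map just for encoding, so the byte stream never depends on hash iteration order:

    #include <cstdint>
    #include <map>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Toy encoder: appends "addr=expire;" records to a byte buffer.
    void encode_blacklist(const std::unordered_map<std::string, uint32_t> &blacklist,
                          std::vector<char> &out) {
      // Keep the unordered_map for fast runtime lookups, but build a sorted
      // copy just for encoding so every OSD emits entries in the same order.
      std::map<std::string, uint32_t> ordered(blacklist.begin(), blacklist.end());
      for (const auto &[addr, expire] : ordered) {
        std::string rec = addr + "=" + std::to_string(expire) + ";";
        out.insert(out.end(), rec.begin(), rec.end());
      }
    }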
Sage Weil [Wed, 20 Aug 2014 15:59:46 +0000 (08:59 -0700)]
mon: add a cluster fingerprint
Generate it on cluster creation with the initial monmap. Include it in
the report. Provide no way for this uuid to be fed in to the cluster
(intentionally or not) so that it can be assumed to be a truly unique
identifier for the cluster.
Sage Weil [Sat, 16 Aug 2014 19:42:33 +0000 (12:42 -0700)]
os/FileStore: fix mount/remount force_sync race
Consider:
- mount
- sync_entry is doing some work
- umount
  - set force_sync = true
  - set done = true
- sync_entry exits (due to done)
  - ..but does not set force_sync = false
- mount
- journal replay starts
- sync_entry sees force_sync and does a commit while op_seq == 0
  ...crash...
Loic Dachary [Mon, 25 Aug 2014 15:05:04 +0000 (17:05 +0200)]
common: ROUND_UP_TO accepts any rounding factor
The ROUND_UP_TO function was limited to rounding factors that are powers
of two. That restriction saves a modulo, but the function is not used anywhere
the savings would make a difference. The implementation is changed to be generic.
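For illustration, a generic rounding helper along these lines (written as a function here rather than the original macro) handles any positive factor at the cost of a modulo:

    #include <cstdint>

    // Round n up to the next multiple of d (d > 0). Unlike a
    // power-of-two-only version, this works for any factor, e.g. 3 or 1000.
    inline uint64_t round_up_to(uint64_t n, uint64_t d) {
      uint64_t rem = n % d;
      return rem ? n + (d - rem) : n;
    }

    // Example: round_up_to(10, 3) == 12; round_up_to(12, 3) == 12.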
Haomai Wang [Thu, 20 Mar 2014 06:09:49 +0000 (14:09 +0800)]
Remove exclusive lock on GenericObjectMap
Most GenericObjectMap interfaces now take a header as an argument rather than
the combination of coll_t and ghobject_t, so the caller is responsible for
maintaining exclusive access to the header.
Haomai Wang [Tue, 26 Aug 2014 04:41:28 +0000 (04:41 +0000)]
Add random cache and replace SharedLRU in KeyValueStore
SharedLRU performs poorly in KeyValueStore with a large header cache size,
so a performance-optimized RandomCache can improve it.
RandomCache records the lookup frequency of each key. When evicting an
element, it randomly samples several elements, compares their frequencies,
and evicts the least frequently used one.
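A rough sketch of that eviction policy with generic containers (not the actual RandomCache code): sample a few random entries, compare their recorded frequencies, and drop the least-used of the sample:

    #include <cstdint>
    #include <iterator>
    #include <random>
    #include <string>
    #include <unordered_map>

    // Toy cache entry carrying a value and its lookup frequency.
    struct Entry { std::string value; uint64_t freq = 0; };

    void evict_one(std::unordered_map<std::string, Entry> &cache,
                   std::mt19937 &rng, int samples = 3) {
      if (cache.empty())
        return;
      std::uniform_int_distribution<size_t> pick(0, cache.size() - 1);
      auto victim = cache.end();
      for (int i = 0; i < samples; ++i) {
        // Walk to a random entry (linear here for brevity; a real cache
        // would index its entries to make sampling cheap).
        auto it = std::next(cache.begin(), pick(rng));
        if (victim == cache.end() || it->second.freq < victim->second.freq)
          victim = it;
      }
      cache.erase(victim);
    }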
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Conflicts:
Haomai Wang [Tue, 26 Aug 2014 04:40:16 +0000 (04:40 +0000)]
Add Header cache to KeyValueStore
Recent performance statistics show that header lookup has become the main
cost of read/write operations: roughly 50% of the time is spent on header
lookup and decode/encode logic.
Add a header cache based on the SharedLRU structure, which maintains the
cached headers and hands callers a pointer to the real header. This also
avoids the overhead of excessive header copies.
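Illustrative only (the real cache is Ceph's SharedLRU, not this bare map): the benefit is that lookups hand back a shared pointer to the cached header instead of decoding and copying it on every operation:

    #include <memory>
    #include <string>
    #include <unordered_map>

    struct Header { /* decoded per-object metadata */ };

    class HeaderCache {
      std::unordered_map<std::string, std::shared_ptr<Header>> cache_;  // eviction omitted
    public:
      // Return the cached header, or decode it once and cache the result;
      // callers share a pointer to the same header instead of copying it.
      std::shared_ptr<Header> lookup_or_load(const std::string &key) {
        auto it = cache_.find(key);
        if (it != cache_.end())
          return it->second;                  // no decode, no copy
        auto h = std::make_shared<Header>();  // stand-in for decode-from-disk
        cache_[key] = h;
        return h;
      }
    };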
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Conflicts:
Sage Weil [Thu, 21 Aug 2014 20:05:35 +0000 (13:05 -0700)]
mon: fix occasional message leak after session reset
Consider:
- we get a message, put it on a wait list
- the client session resets
- we go back to process the message later and discard
- _ms_dispatch returns false, but nobody drops the msg ref
Since we call _ms_dispatch() a lot internally, we need to always return
true when we are an internal caller.
Loic Dachary [Thu, 21 Aug 2014 12:41:55 +0000 (14:41 +0200)]
erasure-code: preload the jerasure plugin variant (sse4,sse3,generic)
Preloading the jerasure plugin dlopens the plugin that is in
charge of selecting the variant optimized for the
CPU (sse4, sse3, generic). The variant plugin itself is not loaded at that
point because that does not happen at load() but when the factory() method is called.
The JerasurePlugin::preload method is modified to call the factory()
method to load jerasure_sse4 or jerasure_sse3 or jerasure_generic as a
side effect.
Indirectly loading another plugin in the factory() method is error prone
and should be moved to the load() method instead. This change should be
done in a separate commit.
Haomai Wang [Tue, 20 May 2014 06:32:18 +0000 (14:32 +0800)]
Fix set_alloc_hint op cause KeyValueStore crash problem
KeyValueStore does not currently support the set_alloc_hint op, but the
implementation of _do_transaction still needs to decode its arguments.
Otherwise, the arguments will be regarded as the next op.
Fix the same problem for MemStore.
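The shape of the fix, sketched with a placeholder argument stream rather than the real ObjectStore::Transaction iterator (names and the op-code value are assumptions): even though the op is ignored, its arguments must still be consumed, otherwise they are misread as the next op:

    #include <cstdint>
    #include <vector>

    // Minimal stand-in for the transaction argument stream.
    struct TransIter {
      std::vector<uint64_t> words;
      size_t pos = 0;
      uint64_t get_u64() { return words.at(pos++); }
    };

    enum { OP_SETALLOCHINT = 39 };  // placeholder op code

    void do_one_op(int op, TransIter &it) {
      switch (op) {
      case OP_SETALLOCHINT: {
        // The backend ignores the hint, but the encoded arguments must
        // still be consumed so the iterator stays aligned on the next op.
        uint64_t expected_object_size = it.get_u64();
        uint64_t expected_write_size  = it.get_u64();
        (void)expected_object_size;
        (void)expected_write_size;
        break;  // treated as a no-op
      }
      default:
        // ...ops the backend actually implements...
        break;
      }
    }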
Fix #8381
Reported-by: Xinxin Shu <xinxin.shu5040@gmail.com>
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
(cherry picked from commit c08adbc98ff5f380ecd215f8bd9cf3cab214913c)
Loic Dachary [Mon, 18 Aug 2014 23:30:15 +0000 (01:30 +0200)]
erasure-code: preload the jerasure plugin
Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:
* ceph-osd-v1 is running but did not load jerasure
* ceph-osd-v2 is being installed, but installation takes time: the files
are installed before ceph-osd is restarted
* ceph-osd-v1 is required to handle an erasure coded placement group and
loads jerasure (the v2 version which is not API compatible)
* ceph-osd-v1 calls the v2 jerasure plugin and does not reference the
expected part of the code and crashes
Although this problem shows in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately
after installing and running an OSD. Once it is backported to firefly,
it will not even happen in teuthology tests because the upgrade from
firefly to master will use the firefly version including this fix.
While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure
that load other plugins depending on the CPU features, or even plugins
such as isa, which only work on specific CPUs.
Matt Benjamin [Thu, 29 May 2014 14:34:20 +0000 (10:34 -0400)]
Work around an apparent binding bug (GCC 4.8).
A reference to h->seq passed to std::pair ostensibly could not bind
because the header structure is packed. At first this looked like
a more general unaligned access problem, but the only location the
compiler rejects is a false positive.
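The usual shape of such a workaround, shown with an illustrative packed struct rather than the actual header type: copy the packed member into a properly aligned local before anything binds a reference to it:

    #include <cstdint>
    #include <map>

    #pragma pack(push, 1)
    struct PackedHeader {
      uint8_t  flags;
      uint64_t seq;   // misaligned because the struct is packed
    };
    #pragma pack(pop)

    void record(std::map<uint64_t, int> &index, const PackedHeader *h) {
      // Binding a reference straight to h->seq can be rejected (or warned
      // about) because the member may be unaligned; copy it to a local with
      // normal alignment and bind to that instead.
      uint64_t seq = h->seq;
      index.emplace(seq, 0);
    }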
common/config.cc: allow integer values to be parsed as SI units
We are allowing this for all and any integer values; that is, OPT_INT,
OPT_LONGLONG, OPT_U32 and OPT_U64.
It's on the user to use appropriate units. For instance, the user should
not use 'E(xabyte)' when setting a signed int, and use his best judgment
when setting options that, for instance, ought to receive seconds.
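A toy parser to show the behavior being allowed (the real logic lives in the config option parsing; the suffix semantics below are an assumption for illustration):

    #include <cstdint>
    #include <stdexcept>
    #include <string>

    // Parse "128", "128K", "4M", "2G", ... into a plain count.
    // Suffixes are treated as powers of 1024 here for illustration.
    uint64_t parse_si(const std::string &s) {
      size_t idx = 0;
      uint64_t n = std::stoull(s, &idx);
      if (idx == s.size())
        return n;
      switch (s[idx]) {
      case 'K': case 'k': return n << 10;
      case 'M': case 'm': return n << 20;
      case 'G': case 'g': return n << 30;
      case 'T': case 't': return n << 40;
      case 'E': case 'e': return n << 60;  // exabytes overflow small option types
      default: throw std::invalid_argument("unknown SI suffix: " + s);
      }
    }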
Sage Weil [Wed, 13 Aug 2014 00:25:10 +0000 (17:25 -0700)]
ceph-disk: use partition type UUIDs, and blkid
Use blkid to give us the GPT partition type. This lets us distinguish
between dmcrypt and non-dmcrypt partitions. Fake it if blkid doesn't
give us what we want and try with sgdisk. This isn't perfect (it can't
tell between dmcrypt and not dmcrypt), but such is life, and we are better
off than before.