osd/ReplicatedPG: check agent_mode if agent is enabled but hit_sets aren't

It is probably not a good idea to try to run the tiering agent without a
hit_set to inform its actions, but it is technically possible. For
example, one could simply blindly evict when we reach the full point.
However, this doesn't work because the agent mode is guarded by a hit_set
check, even though agent_setup() is not. Fix that.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 5d1c76f641310f5f65600f70ae76945b2aa472d7)

common/LogClient: fix sending dup log items

We need to skip even the most recently sent item in order to get to the
ones we haven't sent yet.

Fixes: #9080
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 057c6808be5bc61c3f1ac2b956c1522f18411245)
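The skip described above can be sketched as follows. This is a minimal illustration, assuming each queued entry carries a monotonically increasing sequence number and the client remembers the last one it sent; `LogEntrySketch` and `unsent` are hypothetical names, not the LogClient API.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical queued log entry; only the sequence number matters here.
struct LogEntrySketch {
  std::uint64_t seq;
};

// Returns the sequence numbers that still need to be sent.
// Entries up to AND including last_sent are skipped; comparing with
// '>=' instead of '>' would resend the most recently sent item.
std::vector<std::uint64_t> unsent(const std::deque<LogEntrySketch>& queue,
                                  std::uint64_t last_sent) {
  std::vector<std::uint64_t> out;
  for (const auto& e : queue) {
    if (e.seq > last_sent)
      out.push_back(e.seq);
  }
  return out;
}
```

With entries 1 through 4 queued and 2 as the last sent sequence number, only 3 and 4 are returned.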

RadosClient: Fixing potential lock leaks.

In lookup_pool and pool_delete, a lock is taken
before invoking wait_for_osdmap, but is not
released for the failure case of the call. Fixing the same.

Fixes: #9022
Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
(cherry picked from commit f1aad8bcfc53f982130dbb3243660c3c546c3523)
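A minimal sketch of the leak pattern and one way to avoid it, assuming a scoped guard over std::mutex rather than Ceph's own Mutex class. `ClientSketch` and its members are hypothetical stand-ins for illustration; the actual patch may simply add the missing unlock on the failure path.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a client object; not the real RadosClient.
struct ClientSketch {
  std::mutex lock;
  bool have_osdmap = false;
  std::map<std::string, long> pools;

  // Returns 0 on success or a negative errno (assumed convention).
  int wait_for_osdmap() { return have_osdmap ? 0 : -ENOTCONN; }

  long lookup_pool(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock);  // released on every return path
    int r = wait_for_osdmap();
    if (r < 0)
      return r;  // the early return cannot leave the lock held
    auto it = pools.find(name);
    return it == pools.end() ? -ENOENT : it->second;
  }
};
```

Because the guard's destructor runs on the failure return as well, the lock is free again after a failed lookup.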

librbd: fix error path cleanup for opening an image

If the image doesn't exist and caching is enabled, the ObjectCacher
was not being shutdown, and the ImageCtx was leaked. The IoCtx could
later be closed while the ObjectCacher was still running, resulting in
a segfault. Simply use the usual cleanup path in open_image(), which
works fine here.

Fixes: #8912
Backport: dumpling, firefly
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 3dfa72d5b9a1f54934dc8289592556d30430959d)

Merge remote-tracking branch 'gh/firefly-next' into firefly

Add rbdcache max dirty object option

Librbd calculates the maximum number of dirty objects from rbd_cache_max_size,
which does not suit every case. If the user sets image order 24, the calculated
value is too small in practice. This increases the overhead of the trim call,
which is made on every read/write op.

Make this a tunable option; by default the value is still calculated.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
(cherry picked from commit 3c7229a2fea98b30627878c86b1410c8eef2b5d7)

librbd/internal.cc: check earlier for null pointer

Fix potential null pointer deref: move the check for 'order != NULL'
to the beginning of the function to a) prevent a deref in the ldout() call
and b) leave the function as early as possible if the check fails.

[src/librbd/internal.cc:843] -> [src/librbd/internal.cc:865]: (warning)
Possible null pointer dereference: order - otherwise it is redundant
to check it against null.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 3ee3e66a9520a5fcafa7d8c632586642f7bdbd29)

librbd: add an interface to invalidate cached data

This is useful for qemu to guarantee live migration with caching is
safe, by invalidating the cache on the destination before starting it.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 5d340d26dd70192eb0e4f3f240e3433fb9a24154)

librbd: check return code and error out if invalidate_cache fails

This will only happen when shrinking or rolling back an image is done
while other I/O is in flight to the same ImageCtx. This is unsafe, so
return an error before performing the resize or rollback.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit e08b8b66c77be3a3d7f79d91c20b1619571149ee)

os/FileStore: dump open fds before asserting

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 4e8de1792731cf30f2744ab0659d036adc0565a3)

ceph_test_rados_api_tier: do fewer writes in HitSetWrite

We don't need to do quite so many writes. It can be slow when we are
thrashing and aren't doing anything in parallel.

Fixes: #8932
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c5f766bb16c0ab3c3554e73791ad0b74077ad35c)

Merge remote-tracking branch 'gh/firefly-next' into firefly

ceph_test_rados_api_tier: fix [EC] HitSet{Read,Write,Trim} tests

The hit_set_ fields can only be set on tier pools as of
f131dfbaedf6f451572e7aa3a83f653912122953.

Fixes: #8823
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e17e9d857722ee478abda10adb32e15b11fff2ff)

Merge branch 'wip-8701-firefly' into firefly-next

ceph_test_objectstore: clean up on finish of MoveRename

Otherwise, we leave collections around, and the next test fails.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d4faf747b73e70dff9cb5c98ee6aaa4ecec215fc)

os/LFNIndex: use FDCloser for fsync_dir

This prevents an fd leak when maybe_inject_failure() throws an exception.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 3ec9a42b470422b1fe72b6294d82d9efcaca7f53)

os/LFNIndex: only consider alt xattr if nlink > 1

If we are doing a lookup and the main xattr fails, we'll check if there is an
alt xattr. If it exists, but the nlink on the inode is only 1, we will
kill the xattr. This cleans up the mess left over by an incomplete
lfn_unlink operation.

This resolves the problem with an lfn_link to a second long name that
hashes to the same short_name: we will ignore the old name the moment the
old link goes away.

Fixes: #8701
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 6fb3260d59faab1e20ebf1e44f850f85f6b8342a)

os/LFNIndex: remove alt xattr after unlink

After we unlink, if the nlink on the inode is still non-zero, remove the
alt xattr. We can *only* do this after the rename or unlink operation
because we don't want to leave a file system link in place without the
matching xattr; hence the fsync_dir() call.

Note that this might leak an alt xattr if we happen to fail after the
rename/unlink but before the removexattr is committed. We'll fix that
next.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ec36f0a130d67df6cbeefcc9c2d83eb703b6b28c)

os/LFNIndex: FDCloser helper

Add a helper to close fd's when we leave scope. This is important when
injecting failures by throwing exceptions.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a320c260a9e088ab0a4ea3d5298c06a2d077de37)
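A scope-bound closer of this kind can be sketched as below. This is a minimal illustration using POSIX open/close; the `FDCloser` shown here is an assumed shape, and `maybe_fail` and `sync_dir_sketch` are hypothetical helpers, not the LFNIndex code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdexcept>
#include <unistd.h>

// Closes the wrapped descriptor when the object goes out of scope.
struct FDCloser {
  int fd;
  explicit FDCloser(int f) : fd(f) {}
  FDCloser(const FDCloser&) = delete;
  FDCloser& operator=(const FDCloser&) = delete;
  ~FDCloser() {
    if (fd >= 0)
      ::close(fd);
  }
};

// Hypothetical failure injection: throws when asked to.
void maybe_fail(bool fail) {
  if (fail)
    throw std::runtime_error("injected failure");
}

// Opens a directory, optionally fails, then fsyncs it. The descriptor
// that was used is reported through used_fd so a caller can inspect it.
bool sync_dir_sketch(const char* path, bool fail, int* used_fd) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0)
    return false;
  FDCloser closer(fd);
  *used_fd = fd;
  maybe_fail(fail);  // if this throws, ~FDCloser still closes fd
  return ::fsync(fd) == 0;
}
```

Without the closer, the exception thrown by the injected failure would unwind past a manual close() call and leak the descriptor.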

os/LFNIndex: handle long object names with multiple links (i.e., rename)

When we rename an object (collection_move_rename) to a different name, and
the name is long, we run into problems because the lfn xattr can only track
a single long name linking to the inode.  For example, suppose we have

foobar -> foo_123_0 (attr: foobar) where foobar hashes to 123.

At first, collection_add could only link a file to another file in a
different collection with the same name. Allowing collection_move_rename
to rename the file, however, means that we have to convert:

col1/foobar -> foo_123_0 (attr: foobar)

to

col1/foobaz -> foo_234_0 (attr: foobaz)

This is a problem because if we link, reset xattr, unlink we end up with

col1/foobar -> foo_123_0 (attr: foobaz)

if we restart after we reset the attr.  This will cause the initial foobar
lookup to fail since the attr doesn't match, and the file won't be able to be
looked up.

Fix this by allowing *two* (long) names to link to the same inode.  If we
lfn_link a second (different) name, move the previous name to the "alt"
xattr and set the new name.  (This works because link is always followed
by unlink.)  On lookup, check either xattr.

Don't even bother to remove the alt xattr on unlink.  This works as long
as the old name and new name don't hash to the same shortname and end up
in the same LFN chain.  (Don't worry, we'll fix that next.)

Fixes part of #8701
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b2cdfce6461b81f4926602a8c63b54aa92684e6c)
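The two-name scheme can be modeled in a few lines. This is a toy sketch that represents the xattrs as plain string fields, under the assumption that an empty string means "unset"; `InodeSketch`, `link_name`, and `matches` are hypothetical names and do not touch the file system.

```cpp
#include <cassert>
#include <string>

// Toy model of an inode's long-name xattrs.
struct InodeSketch {
  std::string lfn;      // models the primary long-name xattr
  std::string lfn_alt;  // models the "alt" xattr; empty if unset
};

// Linking a second, different name moves the previous name to the alt
// slot and records the new name as primary.
void link_name(InodeSketch& ino, const std::string& name) {
  if (!ino.lfn.empty() && ino.lfn != name)
    ino.lfn_alt = ino.lfn;
  ino.lfn = name;
}

// A lookup succeeds if either xattr matches the requested long name.
bool matches(const InodeSketch& ino, const std::string& name) {
  return ino.lfn == name || (!ino.lfn_alt.empty() && ino.lfn_alt == name);
}
```

After linking foobar and then foobaz, both names resolve to the inode, so a restart between the link and the unlink no longer strands the old name.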

ceph_test_objectstore: fix warning

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit cf98805c09a38cce78ac08317899dc4152ae55a5)

store_test: add long name collection_move_rename tests

Currently fails.

Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 6aa48a485e03ca100f3d9ebec77cc06f99756cd7)

Conflicts:
src/test/objectstore/store_test.cc

Revert "enforce rados put aligment"

This reverts commit 7a58da53ebfcaaf385c21403b654d1d2f1508e1a.

This was already backported in dece65064d949b5afcc359cd408615883b5e002a.

Fixes: #8996
Signed-off-by: Sage Weil <sage@redhat.com>

rgw: fix crash in swift CORS preflight request

Fixes: #8586
This fixes error handling, in accordance with commit 6af5a537 that fixed
the same issue for the S3 case.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 18ea2a869791b4894f93fdafde140285f2e4fb65)

rgw: fix decoding + characters in URL

Fixes: #8702
Backport: firefly

Only decode + characters to spaces if we're in a query argument. The
+ => ' ' translation is not correct for file/directory names.

Resolves http://tracker.ceph.com/issues/8702

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Brian Rak <dn@devicenull.org>
(cherry picked from commit 4a63396ba1611ed36cccc8c6d0f5e6e3e13d83ee)
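The rule can be illustrated with a small decoder. This is a sketch under the assumption that the caller knows whether it is decoding a query argument; `url_decode_sketch` and its `in_query` flag are hypothetical and do not reproduce rgw's actual function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hex_value(char c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

// Percent-decodes src. '+' becomes ' ' only when in_query is true;
// in a path (file/directory name) a '+' is kept literally.
std::string url_decode_sketch(const std::string& src, bool in_query) {
  std::string out;
  for (std::string::size_type i = 0; i < src.size(); ++i) {
    char c = src[i];
    if (c == '+' && in_query) {
      out += ' ';
    } else if (c == '%' && i + 2 < src.size() &&
               hex_value(src[i + 1]) >= 0 && hex_value(src[i + 2]) >= 0) {
      out += static_cast<char>(hex_value(src[i + 1]) * 16 +
                               hex_value(src[i + 2]));
      i += 2;  // consume the two hex digits
    } else {
      out += c;  // ordinary character or malformed escape, kept as-is
    }
  }
  return out;
}
```

An object named `a+b` therefore keeps its plus sign when it appears in the path, while `q=a+b` in the query string still decodes to `a b`.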

rgw: call processor->handle_data() again if needed

Fixes: #8937
Following the fix to #8928 we end up accumulating pending data that
needs to be written. Beforehand it was working fine because we were
feeding it with the exact amount of bytes we were writing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0553890e79b43414cc0ef97ceb694c1cb5f06bbb)

Conflicts:
src/rgw/rgw_rados.h

rgw: object write should not exceed part size

Fixes: #8928
This can happen if the stripe size is not a multiple of the chunk size.

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 14cad5ece7d1de9d93e72acca6d4c3b4a9cfcfa2)

rgw: align object chunk size with pool alignment

Fixes: #8442
Backport: firefly
Data pools might have strict write alignment requirements. Use pool
alignment info when setting the max_chunk_size for the write.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit fc83e197ab85355e385c13f2a64957cad7481298)
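As a worked example of the alignment arithmetic, the sketch below rounds a configured chunk size up to the next multiple of the pool's required alignment. Rounding up (rather than down) and treating an alignment of 0 as "no requirement" are assumptions made for illustration; `aligned_chunk_size` is a hypothetical helper, not the rgw function.

```cpp
#include <cassert>
#include <cstdint>

// Rounds chunk_size up to a multiple of alignment.
// An alignment of 0 is taken to mean the pool has no requirement.
std::uint64_t aligned_chunk_size(std::uint64_t chunk_size,
                                 std::uint64_t alignment) {
  if (alignment == 0)
    return chunk_size;
  std::uint64_t rem = chunk_size % alignment;
  return rem == 0 ? chunk_size : chunk_size + (alignment - rem);
}
```

For a 500000-byte chunk size and a 4096-byte alignment this yields 503808 (123 * 4096), so every write of a full chunk ends on an aligned boundary.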

Conflicts:
src/rgw/rgw_rados.cc

cls_rgw: fix object name of objects removed on object creation

Fixes: #8972
Backport: firefly, dumpling

Reported-by: Patrycja Szabłowska <szablowska.patrycja@gmail.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0f8929a68aed9bc3e50cf15765143a9c55826cd2)

Merge remote-tracking branch 'origin/wip-8438' into firefly-next

Backport of c5b8d8105d965da852c79add607b69d5ae79a4d4

Merge remote-tracking branch 'origin/wip-7999' into firefly-next

Backport of 830940bf242a73403ec1882a489e31f7694b7f7e

unittest_crush_wrapper: fix build

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f36cffc986c973014c89aa37ca73740b2fc194ca)

Merge pull request #2178 from dachary/wip-erasure-code-profile-default-firefly

erasure-code: create default profile if necessary (firefly)

mon: s/%%/%/

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d700076a42a5a5ebe769a8311fd3b52bf2e98cd2)

atomic: fix read() on i386, clean up types

Among other things, fixes #8969

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 96863128e6668257f435c6962263caae0d7d10dd)

include/atomic: make 32-bit atomic64_t unsigned

This fixes

In file included from test/perf_counters.cc:19:0:
./common/perf_counters.h: In member function ‘std::pair PerfCounters::perf_counter_data_any_d::read_avg() const’:
warning: ./common/perf_counters.h:156:36: comparison between signed and unsigned integer expressions [-Wsign-compare]
} while (avgcount2.read() != count);
^

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2081c992bbe3a83d711f465634d19c011d28ea3e)

Define AO_REQUIRE_CAS (fixes FTBFS on 'hppa')

to fix FTBFS due to undeclared atomic functions.

As reported

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=748571

by John David Anglin <dave.anglin@bell.net>

~~~~
./include/atomic.h: In member function 'size_t ceph::atomic_t::inc()':
./include/atomic.h:42:36: error: 'AO_fetch_and_add1' was not declared in this scope
       return AO_fetch_and_add1(&val) + 1;
                                    ^
./include/atomic.h: In member function 'size_t ceph::atomic_t::dec()':
./include/atomic.h:45:42: error: 'AO_fetch_and_sub1_write' was not declared in this scope
       return AO_fetch_and_sub1_write(&val) - 1;
                                          ^
./include/atomic.h: In member function 'void ceph::atomic_t::add(size_t)':
./include/atomic.h:48:36: error: 'AO_fetch_and_add' was not declared in this scope
       AO_fetch_and_add(&val, add_me);
                                    ^
./include/atomic.h: In member function 'void ceph::atomic_t::sub(int)':
./include/atomic.h:52:48: error: 'AO_fetch_and_add_write' was not declared in this scope
       AO_fetch_and_add_write(&val, (AO_t)negsub);
                                                ^
./include/atomic.h: In member function 'size_t ceph::atomic_t::dec()':
./include/atomic.h:46:5: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
make[5]: *** [cls/user/cls_user_client.o] Error 1
~~~~

Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
(cherry picked from commit 74218f3d6ca8ca9943ff9d08b7926e38fb13b329)

atomic_t: add atomic64_t

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit bf3ba6001c7b4cf37edfe6551d3ef298ebcbf421)

test/cli-integration/rbd: fix trailing space

Newer versions of json.tool remove the trailing ' ' after the comma. Add
it back in with sed so that the .t works on both old and new versions, and
so that we don't have to remove the trailing spaces from all of the test
cases.

Backport: firefly
Fixes: #8920
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 605064dc685aa25cc7d58ec18b6449a3ce476d01)

Conflicts:
src/test/cli-integration/rbd/defaults.t

tests: don't depend on 'data' pool in rbd test

Since we removed the default 'data' and 'metadata' pools,
tests which need a pool should create it themselves.

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit a7a631d1e284f151e305f770cef2042a1b9f86c0)

PGMonitor: fix bug in calculating pool avail space

Currently, for pools with different rules, "ceph df" cannot report the
correct available space for each of them. For a detailed assessment
of the bug, please refer to bug report #8943.

This patch fixes the bug and makes ceph df work correctly.

Fixes Bug #8943

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
(cherry picked from commit 04d0526718ccfc220b4fe0c9046ac58899d9dafc)

mon: set min_size to data chunk count for erasure pools

Make the min_size value meaningful for erasure pools.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e06c58c9b8f585d2fe7c97d010aa0aa61c09d609)

mon: include 'max avail' in df output

Include an estimate of the maximum writeable space for each pool. Note
that this value is a conservative estimate for that pool based on the
most-full OSD. It is also potentially misleading as it is the available
space if *all* new data were written to this pool; one cannot (generally)
add up the available space for all pools.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7a9652b58ea70f9a484a135bde20d872616c5947)

mon: right justify df values

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2f63a309df4b7086725949bc0a532595cf927edf)

mon: Fix % escaping (\% should be %%)

Clang's -Wpedantic points this out.

Signed-off-by: John Spray <john.spray@inktank.com>
(cherry picked from commit f0231ef364d531eb60351598c4a0f5fa6efad23c)

Conflicts:
src/mon/DataHealthService.cc

crush: add get_rule_weight_map

Calculate a weight map of OSDs for a given rule.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 297f6169feecd20e121d102e1b63a505c8b3e74a)

Fix the PG listing issue which could miss objects for EC pool (where there is object shard and generation).
Backport: firefly
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
(cherry picked from commit 228760ce3a7109f50fc0f8e3c4a5697a423cb08f)

osd/ReplicatedPG: requeue cache full waiters if no longer writeback

If the cache is full, we block some requests, and then we change the
cache_mode to something else (say, forward), the full waiters don't get
requeued until the cache becomes un-full. In the meantime, however, later
requests will get processed and redirected, breaking the op ordering.

Fix this by requeueing any full waiters if we see that the cache_mode is
not writeback.

Fixes: #8931
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 8fb761b660c268e2264d375a4db2f659a5c3a107)

osd/ReplicatedPG: fix cache full -> not full requeueing when !active

We only want to do this if is_active(). Otherwise, the normal
requeueing code will do its thing, taking care to get the queue orders
correct.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 36aaab9eee7ed41a46a4ac27376d630a29de5eb9)

qa/workunits/cephtool/test_daemon.sh: verify ceph -c works with daemon

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit aa9ae1f270293778aa937e7f7e4bcaee3099b9b2)

qa/workunits/cephtool/test_daemon.sh: typo

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 22d20f39b7355966554319d5a1aa888967607569)

qa/workunits/cephtool/test_daemon.sh: allow local ceph command

(cherry picked from commit 97a8d5a9fdbd3a25cc922c242ee57da58c57d0bc)

ceph.in: Pass global args to ceph-conf for proper lookup

Fixes: #8944
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 6d89a99648630f81b85ad115fe7662dba6b08a55)

qa/workunits/cephtool/test.sh: test osd pool get erasure_code_profile

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ce9f12d7a2202948532fed9da4d763ed03f6b8fa)

Conflicts:
qa/workunits/cephtool/test.sh

mon: OSDMonitor: add "osd pool get <pool> erasure_code_profile" command

Enable us to obtain the erasure-code-profile for a given erasure-pool.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e8ebcb79a462de29bcbabe40ac855634753bb2be)

osd/ReplicatedPG: observe INCOMPLETE_CLONES in is_present_clone()

We cannot assume that just because cache_mode is NONE that we will have
all clones present; check for the absence of the INCOMPLETE_CLONES flag
here too.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 63abf11390bb9b8dd604aae2b3e90596f9ab65ac)

osd/ReplicatedPG: observed INCOMPLETE_CLONES when doing clone subsets

During recovery, we can clone subsets if we know that all clones will be
present. We skip this on caching pools because they may not be; do the
same when INCOMPLETE_CLONES is set.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 41364711a66c89ce2e94435fe0d54eeda6092614)

osd/ReplicatedPG: do not complain about missing clones when INCOMPLETE_CLONES is set

When scrubbing, do not complain about missing clones when we are in a
caching mode *or* when the INCOMPLETE_CLONES flag is set. Both are
indicators that we may be missing clones and that that is okay.

Fixes: #8882
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 956f28721dd98c5fb9eb410f4fe9e320b3f3eed3)

osd/osd_types: add pg_pool_t FLAG_COMPLETE_CLONES

Set a flag on the pg_pool_t when we change cache_mode NONE. This
is because object promotion may promote heads without all of the clones,
and when we switch the cache_mode back those objects may remain. Do
this on any cache_mode change (to or from NONE) to capture legacy
pools that were set up before this flag existed.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 54bf055c5dadc55acf5731e08712d529b180ffc5)

qa/workunits: cephtool: adjust pool name where missing as it has changed

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 50e93c2138978f7f7c2fbafacc1611c8705a8eab)

qa/workunits: cephtool: cleanup after pool creation

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 6cd345732b15e84de17d743e06bc4d85569b79d4)

qa/workunits: cephtool: pool needs to be a tier to be used as such

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 704b0a33f2071eabeb8c5b000a6805ef6d498961)

qa/workunits: cephtool: test erroneous 'tier remove'

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 49db6767152092d503ccf8ead6f7cb069e152a22)

qa/workunits: cephtool: test get/set on both tier and non-tier pools

Make sure gets and sets of tiering-specific variables succeed on tier
pools and fail on non-tier pools.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 9fea033f30aec44a3273c623ec6c93eb1d7dd26b)

qa/workunits: cephtool: split get/set on tier pools from get/set tests

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit df5944955d96c041e65964a13b802028e9700904)

qa/workunits: cephtool: test for 'osd pool {get,set}-quota'

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit b927c0de7d5c7a78bf3c133be52cbc1d769974bb)

mon: OSDMonitor: 'osd pool' - if we can set it, we must be able to get it

Add support to get the values for the following variables:
- target_max_objects
- target_max_bytes
- cache_target_dirty_ratio
- cache_target_full_ratio
- cache_min_flush_age
- cache_min_evict_age

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit ddc04c83ff6842ca0b2f804b46099ea729b9fb6b)

qa: support running under non privileged user

If the test is run against a cluster started with vstart.sh (which is
the case for make check), the --asok-does-not-need-root disables the use
of sudo and allows the test to run without requiring privileged user
permissions.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 522174b066044e51a7019bd6cad81117e83c394e)

qa/workunits/cephtool/test.sh: sudo ceph daemon

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit bcc09f93761d46202742ca85cce498a352edd494)

qa/workunits: cephtool: fix 'osd bench' test

Commit 7dc93a9651f602d9c46311524fc6b54c2f1ac595 fixed an incorrect
behavior with the OSD's 'osd bench' value hard-caps. The test wasn't
appropriately modified unfortunately.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 48e38ac6323f4a0e06b0dedd37ecd10dc339b1e3)

qa/workunits: cephtool: only run heap profiler test if tcmalloc enabled

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 4b0809a13eb363a15e52a6a57372a0a31a64cef8)

qa/workunits: cephtool: set +e for the tcmalloc tests

Avoids failing the tests when tcmalloc is not present

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 5c4616e100297ba8639919aca7a9cb59e4bda54a)

qa/workunits: cephtool: delete unnecessary function calls

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 67255435151627314cc2fc38732d4fb5efddc3cc)

qa/workunits: cephtool: disable bash debug when not running tests

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 946bd0dad4b027326b03c13405782b99ef0f91b2)

qa/workunits: cephtool: allow running individual tests

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 5d26575ef2d31d745ec4aa69ca1501cd76e5e8db)

qa/workunits: cephtool: cleanup state after erasure-code-profile test

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit f4184086d0d647e064e34308f678ef889e13c373)

qa/workunits: cephtool: add/remove comments

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 780424df3a107c7da57fc28d64f9e7a4bb47f8e8)

qa/workunits: cephtool: split into properly indented functions

The test was a big sequence of commands being run and it has been growing
organically for a while, even though it has maintained a sense of
locality with regard to the portions being tested.

This patch intends to split the commands into functions, allowing for a
better semantic context and easier expansion. On the other hand, this
will also allow us to implement mechanisms to run specific portions of
the test instead of always having to run the whole thing just to test a
couple of lines down at the bottom (or have to creatively edit the test).

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 3d14a96f4b2b7094d05ead1dec7547d165857e31)

Conflicts:
qa/workunits/cephtool/test.sh

qa/workunits: cephtool: move test line to where it's more appropriate

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 04658b7b2b5f98ae81ffb3f77303745e6d46eb81)

qa/workunits: cephtool: split into functions

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit db6cc133ba4bb38b3c11eb835fd3983dc2f6b00e)

Conflicts:
qa/workunits/cephtool/test.sh

mon: test that pools used in tiers cannot be removed

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 39a4b78177cb9896ff35ab05bcf8774bfc934f3a)

qa/workunits/cephtool: test setting options using SI units

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 38405d3554dfb0caf2b0a2c010b95a61bdb99349)

Conflicts:
qa/workunits/cephtool/test.sh

mon: OSDMonitor: be scary about inconsistent pool tier ids

We may not crash your cluster, but you'll know that this is not something
that should have happened. Big letters make it obvious. We'd make them
red too if we bothered to look for the ANSI code.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 8e5a8daf98052954f3880d2d3516841b5062466b)

osd: pg_pool_t: clear tunables on clear_tier()

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 64bdf6c92bc44adad5a49b9dc4f674789cee80b0)

mon: OSDMonitor: limit tier-specific pool set/get on non-tier pools

Fixes: #8696
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit f131dfbaedf6f451572e7aa3a83f653912122953)

mon/OSDMonitor: improve no-op cache_mode set check

If we have a pending pool value but the cache_mode hasn't changed, this is
still a no-op (and we don't need to block).

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 67d13d76f5692fa20649ea877f254c34094c11f6)

mon: OSDMonitor: disallow nonsensical cache-mode transitions

Fixes: #8155
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit fd970bbc95d89bf66c9551feca17ac0afbf4d1e2)

mon: OSDMonitor: return immediately if 'osd tier cache-mode' is a no-op

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit d01aa5bff30441eec1ffaa3e59a21187f8478475)

osd/ReplicatedPG: debug obc locks

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 356af4bf46d6387e2f1a59646548f9a77e49e5f8)

osd/ReplicatedPG: greedily take write_lock for copyfrom finish, snapdir

In the cases where we are taking a write lock and are careful
enough that we know we should succeed (i.e., we assert(got)),
use the get_write_greedy() variant that skips the checks for
waiters (be they ops or backfill) that are normally necessary
to avoid starvation. We don't care about starvation here
because our op is already in-progress and can't easily be
aborted, and new ops won't start because they do make those
checks.

Fixes: #8889
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 6fe27823b8459271bf0c0e807493bb7cf1e4559b)

osd: allow greedy get_write() for ObjectContext locks

There are several lockers that need to take a write lock
because an operation is already in progress, and that
know it is safe to do so. In particular, they need to skip
the starvation checks (op waiters, backfill waiting).

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 09626501d7a0ff964027fd7a534465b76bad23cb)
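The difference between the normal and greedy paths can be sketched with a simplified lock state. This is a toy model that assumes a single-threaded caller and a state that counts holders; `RWStateSketch` and its fields are hypothetical and much simpler than the real ObjectContext locking.

```cpp
#include <cassert>

// Simplified model of a per-object read/write lock state.
struct RWStateSketch {
  enum State { NONE, READ, WRITE };
  State state = NONE;
  int count = 0;                  // current holders of the lock
  int waiters = 0;                // ops queued behind the lock
  bool backfill_waiting = false;  // backfill wants the object

  // Tries to take the write lock. A greedy caller skips the starvation
  // checks because its operation is already in progress.
  bool get_write(bool greedy = false) {
    if (!greedy && (waiters > 0 || backfill_waiting))
      return false;  // defer to waiters so they are not starved
    if (state == READ)
      return false;  // readers still hold the lock
    state = WRITE;
    ++count;
    return true;
  }

  bool get_write_greedy() { return get_write(true); }

  void put_write() {
    assert(state == WRITE && count > 0);
    if (--count == 0)
      state = NONE;
  }
};
```

New ops still go through the non-greedy path and back off when anything is waiting, which is why the greedy variant does not reintroduce starvation in this model.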

qa/workunits/rest/test.py: make osd create test idempotent

Avoid the possibility that we create multiple OSDs due to retries by passing in
the optional uuid arg. (A stray osd id will make the osd tell tests a
few lines down fail.)

Fixes: #8728
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit bb3e1c92b6682ed39968dc5085b69c117f43cbb0)

enforce rados put aligment

Signed-off-by: Lluis Pamies-Juarez <lluis.pamies-juarez@hgst.com>
(cherry picked from commit 304b08a23a3db57010078046955a786fe3589ef8)
(cherry picked from commit dece65064d949b5afcc359cd408615883b5e002a)

use llrintl when converting double to micro

This avoids rounding error (noticeable on i386).

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 80911736bd61b6b88eac0974d24f21c15c5385a4)
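A minimal sketch of the conversion, assuming "micro" means integer millionths of the input value. `to_micro` is a hypothetical helper; only the use of llrintl comes from the commit. A plain integer cast truncates toward zero, so a product that lands just below a whole number loses one unit, whereas llrintl rounds to the nearest integer under the default rounding mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a fractional value to integer millionths, rounding to the
// nearest integer instead of truncating.
std::int64_t to_micro(double v) {
  long double scaled = static_cast<long double>(v) * 1000000.0L;
  return static_cast<std::int64_t>(std::llrintl(scaled));
}
```

For example, 0.29 is not exactly representable in binary floating point, yet `to_micro(0.29)` still yields 290000.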

Conflicts:
src/mon/OSDMonitor.cc

msg/SimpleMessenger: drop local_connection priv link on shutdown

This breaks ref cycles between the local_connection and session, and lets
us drop the explicit set_priv() calls in OSD::shutdown().

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 63c1711a9e237c14d137131f56751970ad1567b1)

erasure-code: create default profile if necessary

After an upgrade to firefly, the existing Ceph clusters do not have the
default erasure code profile. Although it may be created with

ceph osd erasure-code-profile set default

it was not included in the release notes and is confusing for the
administrator.

The *osd pool create* and *osd crush rule create-erasure* commands are
modified to implicitly create the default erasure code profile if it is
not found.

In order to avoid code duplication, the default erasure code profile
code creation that happens when a new firefly ceph cluster is created is
encapsulated in the OSDMap::get_erasure_code_profile_default method.

Conversely, handling the pending change in OSDMonitor is not
encapsulated in a function but duplicated instead. If it was a function
the caller would need a switch to distinguish between the case when goto
wait is needed, or goto reply or proceed because nothing needs to be
done. It is unclear if having a function would lead to smaller or more
maintainable code.

http://tracker.ceph.com/issues/8601 Fixes: #8601

Backport: firefly
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 4e1405e7720eda71a872c991045ac8ead6f3e7d8)

common: s/stringstream/ostream/ in str_map

There is no need to specialize more than ostream: it only makes it
impossible to use cerr or cout as a parameter to str_map.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 6aa45b133956b974a992b372496b90c908d94f12)
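The benefit of the wider parameter type can be shown with a small function. This is a sketch with a hypothetical `dump_str_map` helper, not the actual str_map signature: because it takes std::ostream&, the same call works with a string stream, std::cout, or std::cerr.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: writes "key=value" pairs, one per line, to any
// output stream. Taking std::stringstream& here would reject std::cerr.
void dump_str_map(const std::map<std::string, std::string>& m,
                  std::ostream& out) {
  for (const auto& kv : m)
    out << kv.first << "=" << kv.second << "\n";
}
```

A caller can pass `std::cerr` directly for diagnostics, or a `std::ostringstream` when the text needs to be captured.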

0.80.5

osd: cancel agent_timer events on shutdown

We need to cancel all agent timer events on shutdown. This also needs to
happen early so that any in-progress events will execute before we start
flushing and cleaning up PGs.

Backport: firefly
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c0dc245b662f1f9c640d7dd15fdf4cf26e729782)

Conflicts:
src/osd/OSD.cc

osd: s/applying repop/canceling repop/

The 'applying' language dates back to when we would wait for acks from
replicas before applying writes locally. We don't do any of that any more;
now, this loop just cancels the repops with remove_repop() and some other
cleanup.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ef40737eee4389faa7792661a0f9d15b3d0440f2)