Greg Farnum [Fri, 6 Feb 2015 05:12:17 +0000 (21:12 -0800)]
fsync-tester: print info about PATH and locations of lsof lookup
We're seeing the lsof invocation fail (as not found) in testing and nobody can
identify why. Since attempting to reproduce the issue has not worked, this
patch gathers data directly from the environment where the failure actually occurs.
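A minimal sketch of the kind of diagnostics this implies, with the exact wording and lookup locations assumed for illustration rather than taken from the patch:
    echo "PATH=$PATH"
    type -a lsof || echo "lsof not found in PATH"
    ls -l /usr/sbin/lsof /usr/bin/lsof 2>/dev/null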
Sage Weil [Tue, 16 Dec 2014 01:04:32 +0000 (17:04 -0800)]
osd: handle no-op write with snapshot case
If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:
0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
The fix is to not recreate the snapdir if it already exists.
Josh Durgin [Wed, 14 Jan 2015 23:01:38 +0000 (15:01 -0800)]
qa: ignore duplicates in rados ls
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.
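One way to strip duplicate names from a listing before comparing it, shown as a sketch rather than the test's actual code (the pool name is illustrative):
    rados -p rbd ls | sort -u > objects.txt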
Yehuda Sadeh [Sat, 13 Dec 2014 01:07:30 +0000 (17:07 -0800)]
rgw: use s->bucket_attrs instead of trying to read obj attrs
Fixes: #10307
Backport: firefly, giant
This is needed since we can't really read the bucket attrs by trying to
read the bucket entry point attrs. We already have the bucket attrs
anyway, so use those.
Yehuda Sadeh [Wed, 5 Nov 2014 21:40:55 +0000 (13:40 -0800)]
rgw: remove swift user manifest (DLO) hash calculation
Fixes: #9973
Backport: firefly, giant
Previously we were iterating through the parts, creating a hash of the
parts' etags (as S3 does for multipart uploads). However, Swift just
calculates the etag for the empty manifest object.
Lei Dong [Mon, 27 Oct 2014 02:29:48 +0000 (10:29 +0800)]
fix: cannot disable max_size quota
Currently, if we enable quota and set max_size = -1, it doesn't
mean max_size is unlimited as expected. Instead, objects of any size
are rejected with "QuotaExceeded".
The root cause is that rgw_rounded_kb, the function which converts max_size
to max_size_kb, returns 0 for -1 because it takes an unsigned int
but we pass an int to it. A simple fix is to check max_size before
it's rounded to max_size_kb.
Test case:
1. Enable and set quota:
   radosgw-admin quota enable --uid={user_id} --quota-scope=user
   radosgw-admin quota set --quota-scope=user --uid={user_id} \
       --max-objects=100 --max-size=-1
2. Upload any object with non-zero length.
Without the fix the upload returns 403 with "QuotaExceeded"; with the fix it returns 200.
Fixes: #5595
Backport: dumpling, firefly
We need to update the bucket index when updating object attrs; otherwise
we're missing meta changes that need to be registered. It also
solves the issue of the bucket index not knowing about object ACL changes,
although that one still requires some more work.
Yehuda Sadeh [Wed, 5 Nov 2014 21:28:02 +0000 (13:28 -0800)]
rgw: send back ETag on S3 object copy
Fixes: #9479
Backport: firefly, giant
We didn't send the etag back correctly. The original code assumed the etag
resided in the attrs, but attrs only contained request attrs.
Yehuda Sadeh [Wed, 7 Jan 2015 21:56:14 +0000 (13:56 -0800)]
rgw: index swift keys appropriately
Fixes: #10471
Backport: firefly, giant
We need to index the swift keys by the full uid:subuser when decoding
the json representation, to keep it in line with how we store it when
creating it through other mechanisms.
Yehuda Sadeh [Thu, 20 Nov 2014 18:36:05 +0000 (10:36 -0800)]
rgw-admin: create subuser if needed when creating user
Fixes: #10103
Backport: firefly, giant
This turned up after fixing #9973. Earlier we also didn't create the
subuser in this case, but we didn't actually read the subuser info during
authentication. Now we do that as required, so we end up failing the
authentication. This only applies to subusers created using
'user create', rather than the 'subuser create' command.
Sage Weil [Thu, 8 Jan 2015 19:17:03 +0000 (11:17 -0800)]
osd: requeue PG when we skip handling a peering event
If we don't handle the event, we need to put the PG back into the peering
queue or else the event won't get processed until the next event is
queued, at which point we'll be processing events with a delay.
The queue_null is not necessary (and is a waste of effort) because the
event is still in pg->peering_queue and the PG is queued.
Loic Dachary [Fri, 9 Jan 2015 00:32:17 +0000 (01:32 +0100)]
Merge pull request #3127 from ktdreyer/firefly-no-epoch
Revert "ceph.spec.: add epoch"
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
Loic Dachary [Fri, 9 Jan 2015 00:28:11 +0000 (01:28 +0100)]
Merge pull request #3220 from ceph/wip-mon-backports.firefly
mon: backports for #9987 against firefly
Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
Sage Weil [Tue, 23 Dec 2014 23:49:26 +0000 (15:49 -0800)]
osdc/Objecter: handle reply race with pool deletion
We need to handle this scenario:
- send request in epoch X
- osd replies
- pool is deleted in epoch X+1
- client gets map X+1, sends a map check
- client handles reply
-> asserts that no map checks are in flight
This isn't the best solution. We could infer that a map check isn't needed
since the pool existed earlier and doesn't now. But this is firefly and
the fix is no more expensive than the old assert.
Fixes: #10372
Signed-off-by: Sage Weil <sage@redhat.com>
Danny Al-Gaaf [Tue, 24 Jun 2014 17:54:17 +0000 (19:54 +0200)]
test/ceph-disk.sh: fix for SUSE
On SUSE, 'which' always returns the full path of (shell) commands,
not e.g. './ceph-conf' as on Debian. Also add a check for the full
path returned by which.
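A rough sketch of the kind of check this implies, accepting either form returned by which (the command name is illustrative, not the test's actual code):
    found=$(which ceph-conf)
    case "$found" in
        ./ceph-conf|*/ceph-conf) echo "found: $found" ;;
        *) echo "unexpected which output: $found" >&2 ; exit 1 ;;
    esac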
Loic Dachary [Fri, 13 Jun 2014 12:41:39 +0000 (14:41 +0200)]
tests: prevent kill race condition
When trying to kill a daemon, keep its pid in a variable instead of
retrieving it from the pidfile multiple times. It prevents the following
race condition:
* try to kill ceph-mon
* ceph-mon is in the process of dying and removed its pidfile
* the next attempt to kill ceph-mon fails because the pidfile is not found
* another ceph-mon is spawned and fails to bind the port
because the previous ceph-mon is still holding it
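A minimal sketch of the pattern described above, with the pidfile path as an illustrative example:
    # read the pid once; do not go back to the pidfile, it may disappear
    pid=$(cat out/mon.a.pid)
    while kill "$pid" 2>/dev/null ; do
        sleep 1
    done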
Danny Al-Gaaf [Tue, 24 Jun 2014 22:31:48 +0000 (00:31 +0200)]
osd/OSD.cc: parse lsb release data via lsb_release
Use the lsb_release tool to be portable, since parsing /etc/lsb-release
directly is not the same between distributions. The old code failed to
parse LSB information for e.g. SUSE products.
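For reference, the tool exposes the same data on every LSB-compliant distribution, e.g.:
    lsb_release -si    # distributor ID, e.g. "SUSE LINUX" or "Ubuntu"
    lsb_release -sr    # release number
    lsb_release -sc    # codename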
Sage Weil [Mon, 24 Nov 2014 02:50:51 +0000 (18:50 -0800)]
crush/CrushWrapper: fix create_or_move_item when name exists but item does not
We were using item_exists(), which simply checks if we have a name defined
for the item. Instead, use _search_item_exists(), which looks for an
instance of the item somewhere in the hierarchy. This matches what
get_item_weightf() is doing, which ensures we get a non-negative weight
that converts properly to floating point.
Backport: giant, firefly
Fixes: #9998
Reported-by: Pawel Sadowski <ceph@sadziu.pl>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 9902383c690dca9ed5ba667800413daa8332157e)
Sage Weil [Sat, 22 Nov 2014 01:47:56 +0000 (17:47 -0800)]
crush/builder: prevent bucket weight underflow on item removal
It is possible to set a bucket weight that is not the sum of the item
weights if you manually modify/build the CRUSH map. Protect against any
underflow on the bucket weight when removing items.
Dan Mick [Wed, 10 Dec 2014 21:19:16 +0000 (13:19 -0800)]
rados.py: remove Rados.__del__(); it just causes problems
Recent versions of Python contain a change to thread shutdown that
causes ceph to hang on exit; see http://bugs.python.org/issue21963.
As it turns out, this is relatively easy to avoid by not spawning
threads on exit, as Rados.__del__() will certainly do by calling
shutdown(); I suspect, but haven't proven, that the problem is
that shutdown() tries to start() a threading.Thread() that never
makes it all the way back to signal start().
Also add a PendingReleaseNote and extra doc comments to clarify.
Ken Dreyer [Tue, 9 Dec 2014 21:52:19 +0000 (14:52 -0700)]
Revert "ceph.spec.: add epoch"
If ICE ships 0.80.8, then it will be newer than what RHCEPH ships
(0.80.7), and users won't be able to seamlessly upgrade via Yum.
We have three options:
A) Revert the "Epoch: 1" change on the Firefly branch.
B) Revert the "Epoch: 1" change in the ICE packages.
C) Bump the Epoch to "2" in Red Hat's packages.
This commit does Option A.
Option B may or may not be feasible - it would require a "downstream"
change in ICE, and we haven't done that sort of thing before.
Due to the RHEL release schedule, Option C is not available to us at
this point.
Loic Dachary [Fri, 14 Nov 2014 00:16:10 +0000 (01:16 +0100)]
common: do not omit shard when ghobject NO_GEN is set
Do not silence the display of shard_id when generation is NO_GEN.
The JSON representation of erasure coded objects used by ceph_objectstore_tool
needs the shard_id to find the file containing the chunk.
Minimal testing is added to ceph_objectstore_tool.py
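For example, the object listing that exercises this output might be produced along these lines (paths are illustrative; consult the tool's help for the exact options on your version):
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-0 \
        --journal-path /var/lib/ceph/osd/ceph-0/journal --op list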
Samuel Just [Tue, 23 Sep 2014 22:52:08 +0000 (15:52 -0700)]
ReplicatedPG: don't move on to the next snap immediately
If we have a bunch of trimmed snaps for which we have no
objects, we'll spin for a long time. Instead, requeue.
Fixes: #9487
Backport: dumpling, firefly, giant
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit c17ac03a50da523f250eb6394c89cc7e93cb4659)
Sage Weil [Tue, 23 Sep 2014 23:21:33 +0000 (16:21 -0700)]
osd: initialize purged_snap on backfill start; restart backfill if change
If we backfill a PG to a new OSD, we currently neglect to initialize
purged_snaps. As a result, the first time the snaptrimmer runs it has to
churn through every deleted snap for all time, and to make matters worse
does so in one go with the PG lock held. This leads to badness on any
cluster with a significant number of removed snaps that experiences
backfill.
Resolve this by initializing purged_snaps when we finish backfill. The
backfill itself will clear out any stray snaps and ensure the object set
is in sync with purged_snaps. Note that purged_snaps on the primary
that is driving backfill will not change during this period as the
snaptrimmer is not scheduled unless the PG is clean (which it won't be
during backfill).
If we happen to interrupt backfill, go clean with other OSDs,
purge snaps, and then let this OSD rejoin, we will either restart
backfill (non-contiguous log) or the log will include the result of
the snap trim (the events that remove the trimmed snap).
Jason Dillaman [Tue, 18 Nov 2014 02:49:26 +0000 (21:49 -0500)]
librbd: protect list_children from invalid child pool IoCtxs
While listing child images, don't ignore error codes returned
from librados when creating an IoCtx. This will prevent seg
faults from occurring when an invalid IoCtx is used.
Fixes: #10123
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0d350b6817d7905908a4e432cd359ca1d36bab50)
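The affected code path is the one behind listing the children of a protected snapshot, e.g. (pool, image and snapshot names are illustrative):
    rbd children rbd/parent-image@base-snap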
Loic Dachary [Thu, 9 Oct 2014 16:52:17 +0000 (18:52 +0200)]
ceph-disk: run partprobe after zap
Not running partprobe after zapping a device can lead to the following:
* ceph-disk prepare /dev/loop2
* links are created in /dev/disk/by-partuuid
* ceph-disk zap /dev/loop2
* links are not removed from /dev/disk/by-partuuid
* ceph-disk prepare /dev/loop2
* some links are not created in /dev/disk/by-partuuid
This is assuming there is a bug in the way udev events are handled by
the operating system.
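A manual illustration of the sequence now performed, using the same loop device as above:
    ceph-disk zap /dev/loop2
    partprobe /dev/loop2             # force the kernel/udev to re-read the partition table
    ls -l /dev/disk/by-partuuid     # stale links should now be gone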
Loic Dachary [Fri, 10 Oct 2014 08:23:34 +0000 (10:23 +0200)]
ceph-disk: encapsulate partprobe / partx calls
Add the update_partition function to reduce code duplication.
The action is made an argument: although it is always -a for now, it will
be -d when deleting a partition.
Use the update_partition function in prepare_journal_dev.
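ceph-disk itself is written in Python; purely as a sketch of the idea, a helper that takes the partx action as an argument could look like this (fallback order and device name are assumptions):
    update_partition() {
        action=$1   # -a to add/update partition entries, -d to delete them
        dev=$2
        partprobe "$dev" || partx "$action" "$dev"
    }
    update_partition -a /dev/loop2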
Introduce ceph_erasure_code_non_regression to check and compare how an
erasure code plugin encodes and decodes content with a given set of
parameters. For instance, running the tool with --create creates an
encoded object and stores it into a directory
along with the chunks, one chunk per file. The directory name is derived
from the parameters. The content of the object is a random pattern of 31
bytes repeated to fill the object size specified with --stripe-width.
The check function (--check) reads the object back from the file,
encodes it and compares the result with the content of the chunks read
from the files. It also attempts to recover from one or two erasures.
Chunks encoded by a given version of Ceph are expected to be encoded
exactly in the same way by all Ceph versions going forward.
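A hypothetical pair of invocations built only from the options named above; any other options the tool requires (such as plugin selection) are omitted here, and the stripe width is an arbitrary example:
    ceph_erasure_code_non_regression --create --stripe-width 3145728
    ceph_erasure_code_non_regression --check --stripe-width 3145728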
Sage Weil [Mon, 20 Oct 2014 20:55:33 +0000 (13:55 -0700)]
osd: discard rank > 0 ops on erasure pools
Erasure pools do not support read from replica, so we should drop
any rank > 0 requests.
This fixes a bug where an erasure pool maps to [1,2,3], temporarily maps
to [-1,2,3], sends a request to osd.2, and then remaps back to [1,2,3].
Because the 0 shard never appears on osd.2, the request sits in the
waiting_for_pg map indefinitely and causes slow request warnings.
This problem does not come up on replicated pools because all instances of
the PG are created equal.
Fix by only considering role == 0 for erasure pools as a correct mapping.
Fixes: #9835
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 13 Nov 2014 01:04:35 +0000 (17:04 -0800)]
osd/OSDMap: add osd_is_valid_op_target()
Helper to check whether an osd is a given op target for a pg. This
assumes that for EC we always send ops to the primary, while for
replicated we may target any replica.
Josh Durgin [Wed, 12 Nov 2014 02:16:02 +0000 (18:16 -0800)]
qa: allow small allocation diffs for exported rbds
The local filesystem may behave slightly differently. This isn't
foolproof, but seems to be reliable enough on rhel7 rootfs, where
exact comparison was failing.
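A sketch of such a tolerant comparison (file names and the 5% bound are illustrative, not the test's actual values):
    orig=$(stat -c %b original.img)     # blocks allocated for the source image
    copy=$(stat -c %b exported.img)     # blocks allocated for the exported copy
    delta=$(( orig > copy ? orig - copy : copy - orig ))
    [ $(( delta * 100 / orig )) -le 5 ] || { echo "allocation mismatch" >&2; exit 1; }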
Sage Weil [Sun, 25 May 2014 15:38:38 +0000 (08:38 -0700)]
osd: fix map advance limit to handle map gaps
The recent change in cf25bdf6b0090379903981fe8cee5ea75efd7ba0 would stop
advancing after some number of epochs, but did not take into consideration
the possibility that there are missing maps. In that case, it is impossible
to advance past the gap.
Fix this by increasing the max epoch as we go so that we can always get
beyond the gap.
John Spray [Fri, 7 Nov 2014 11:34:43 +0000 (11:34 +0000)]
tools: fix MDS journal import
Previously it only worked on fresh filesystems which
hadn't been trimmed yet, and resulted in an invalid
trimmed_pos when expire_pos wasn't on an object
boundary.
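The import path in question is the one exercised by, for example (the backup file name is illustrative):
    cephfs-journal-tool journal export backup.bin
    cephfs-journal-tool journal import backup.bin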
Sage Weil [Mon, 15 Sep 2014 22:29:08 +0000 (15:29 -0700)]
ceph-disk: mount xfs with inode64 by default
We did this forever ago with mkcephfs, but ceph-disk didn't. Note that for
modern XFS this option is obsolete, but for older kernels it was not the
default.