Loic Dachary [Mon, 30 Dec 2013 11:26:20 +0000 (12:26 +0100)]
ceph-disk: prepare --data-dir must not override files
ceph-disk does nothing when given a device that is already prepared. Given
a directory that already contains a successfully prepared OSD, however, it
will overwrite the existing files.
Instead of overwriting the files in the osd data directory, return
immediately if the magic file exists. Create the magic file last so that it
accurately reflects the success of the OSD preparation.
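As a rough illustration of that ordering (ceph-disk itself is a Python tool; this C++ sketch only shows the sentinel-file pattern, and the file name and contents are assumptions):

    #include <filesystem>
    #include <fstream>

    namespace fs = std::filesystem;

    // Return early if the data dir was already prepared; write the sentinel
    // ("magic") file only after everything else has succeeded.
    bool prepare_data_dir(const fs::path& osd_data) {
        const fs::path magic = osd_data / "magic";   // hypothetical sentinel name
        if (fs::exists(magic))
            return true;             // already prepared: leave existing files alone
        // ... write fsid, keyring, and the other preparation files here ...
        std::ofstream(magic) << "prepared\n";        // created last = marks success
        return true;
    }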
Loic Dachary [Mon, 10 Feb 2014 22:42:38 +0000 (23:42 +0100)]
common: admin socket fallback to json-pretty format
If the format argument to a command sent to the admin socket is not
among the supported formats (json, json-pretty, xml, xml-pretty), the
new_formatter function will return null and the AdminSocketHook::call
function must fall back to a sensible default.
The CephContextHook::call and HelpHook::call failed to do that, and a
malformed format argument would cause the mon to crash. A check is added
to each of them so that they fall back to json-pretty if the format is not
recognized.
To further protect AdminSocketHook::call implementations from similar
problems the format argument is checked immediately after accepting the
command in AdminSocket::do_accept and replaced with json-pretty if it is
not known.
A test case is added for both CephContextHook::call and HelpHook::call
to demonstrate the problem exists and is fixed by the patch.
Three other instances of unsafe calls to new_formatter were found and
a fallback to json-pretty was added. All other calls have been audited
and appear to be safe.
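A minimal sketch of that fallback, assuming a factory that returns null for unknown format names (the types below are stand-ins, not the actual Ceph Formatter API):

    #include <memory>
    #include <string>

    struct Formatter { virtual ~Formatter() = default; };
    struct JSONPrettyFormatter : Formatter {};   // stand-in for the real formatter

    // Toy factory: returns nullptr for anything but the supported names.
    std::unique_ptr<Formatter> new_formatter(const std::string& format) {
        if (format == "json" || format == "json-pretty" ||
            format == "xml"  || format == "xml-pretty")
            return std::make_unique<JSONPrettyFormatter>();
        return nullptr;
    }

    // The pattern added by the patch: never hand a null formatter to a hook.
    std::unique_ptr<Formatter> formatter_or_default(const std::string& format) {
        auto f = new_formatter(format);
        if (!f)
            f = std::make_unique<JSONPrettyFormatter>();  // fall back to json-pretty
        return f;
    }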
Josh Durgin [Thu, 6 Feb 2014 01:22:14 +0000 (17:22 -0800)]
msg/Pipe: add option to restrict delay injection to specific msg type
This makes it possible to test timeouts reliably by delaying certain
messages effectively forever, but still being able to e.g. connect and
authenticate to the monitors.
Josh Durgin [Tue, 4 Feb 2014 01:59:21 +0000 (17:59 -0800)]
Objecter: implement mon and osd operation timeouts
This captures almost all operations from librados other than mon_commands().
Get the values for the timeouts from the Objecter constructor, so only
librados uses them.
Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.
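A rough sketch of the cancellation pattern, assuming a timer facility that runs a callback after a delay; the names below are illustrative, not the real Objecter/SafeTimer interfaces:

    #include <cerrno>
    #include <chrono>
    #include <cstdint>
    #include <functional>
    #include <map>
    #include <mutex>
    #include <thread>

    struct FakeTimer {                       // stand-in for the existing timer thread
        void add_event_after(std::chrono::seconds d, std::function<void()> cb) {
            std::thread([d, cb = std::move(cb)] {
                std::this_thread::sleep_for(d);
                cb();
            }).detach();
        }
    };

    struct MiniObjecter {
        std::mutex lock;
        std::map<uint64_t, std::function<void(int)>> inflight;  // tid -> completion
        FakeTimer timer;
        std::chrono::seconds osd_timeout{0};     // 0 means "no timeout" (non-librados)

        void submit(uint64_t tid, std::function<void(int)> onfinish) {
            std::lock_guard<std::mutex> l(lock);
            inflight[tid] = std::move(onfinish);
            if (osd_timeout.count() > 0)         // only schedule when a timeout is set
                timer.add_event_after(osd_timeout,
                                      [this, tid] { cancel_op(tid, -ETIMEDOUT); });
        }

        void cancel_op(uint64_t tid, int err) {  // mirrors the *_op_cancel() idea
            std::function<void(int)> cb;
            {
                std::lock_guard<std::mutex> l(lock);
                auto it = inflight.find(tid);
                if (it == inflight.end())
                    return;                      // op already finished normally
                cb = std::move(it->second);
                inflight.erase(it);
            }
            cb(err);                             // complete the op with -ETIMEDOUT
        }
    };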
Josh Durgin [Mon, 3 Feb 2014 20:53:15 +0000 (12:53 -0800)]
librados: add timeout to wait_for_osdmap()
This is used by several pool operations independent of the objecter,
including rados_ioctx_create() to look up the pool id in the first
osdmap.
Unfortunately we can't just rely on WaitInterval returning ETIMEDOUT,
since it may also get interrupted by a signal, so we can't avoid
keeping track of time explicitly here.
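A minimal sketch of the explicit-deadline loop implied here: a timed wait can return early (signal or spurious wakeup), so keep waiting until the condition holds or the deadline passes (names are illustrative):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    // Returns true if have_osdmap() became true before the timeout, false on
    // a genuine timeout (the caller would then return -ETIMEDOUT).
    template <class Pred>
    bool wait_for_condition(std::mutex& m, std::condition_variable& cv,
                            std::chrono::seconds timeout, Pred have_osdmap) {
        std::unique_lock<std::mutex> l(m);
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (!have_osdmap()) {
            if (cv.wait_until(l, deadline) == std::cv_status::timeout)
                return have_osdmap();   // re-check once more, then give up
        }
        return true;
    }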
Sage Weil [Mon, 3 Feb 2014 16:54:14 +0000 (08:54 -0800)]
client: use 64-bit value in sync read eof logic
The file size can jump to a value that is very much larger than our current
position (for example, it could be a disk image file that gets a sparse
write at a large offset). Use a 64-bit value so that 'some' doesn't
overflow.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: John Spray <john.spray@inktank.com>
(cherry picked from commit 7ff2b541c24d1c81c3bcfbcb347694c2097993d7)
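A tiny illustration of the 64-bit change above (types are illustrative): with a narrower intermediate, a sparse file whose size jumps far past the read position can wrap around, while a 64-bit value cannot:

    #include <algorithm>
    #include <cstdint>

    // Bytes actually readable from pos, clamped to the request size.
    uint64_t bytes_readable(uint64_t file_size, uint64_t pos, uint64_t want) {
        if (pos >= file_size)
            return 0;                        // at or past EOF
        uint64_t some = file_size - pos;     // 64-bit: safe even for huge sparse files
        return std::min(some, want);
    }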
Yehuda Sadeh [Thu, 23 Jan 2014 21:48:28 +0000 (13:48 -0800)]
rgw: fix listing of multipart upload parts
Fixes: #7169
There are two issues here. The first is that we may return more entries than
we should (as specified by max_parts). The second is that the
NextPartNumberMarker is set incorrectly. Both of these issues mainly
affect uploads with > 1000 parts, although they can be triggered with
fewer parts than that.
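A hedged sketch of the corrected listing behaviour, assuming parts are kept sorted by part number (the container and field names are placeholders):

    #include <map>
    #include <string>
    #include <vector>

    struct PartInfo { int num; std::string etag; };

    std::vector<PartInfo> list_parts(const std::map<int, PartInfo>& parts,
                                     int marker, int max_parts,
                                     int* next_marker, bool* truncated) {
        std::vector<PartInfo> out;
        auto it = parts.upper_bound(marker);               // start after the marker
        for (; it != parts.end() && (int)out.size() < max_parts; ++it)
            out.push_back(it->second);
        *truncated = (it != parts.end());
        // next marker comes from the last entry actually returned, not a guess
        *next_marker = out.empty() ? marker : out.back().num;
        return out;
    }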
Fixes: #6829
Backport: dumpling, emperor
We didn't initialize this member variable, which could cause the 'system'
flag to be inadvertently reset when modifying the info of a user that has
this flag set.
Robin H. Johnson [Sun, 15 Dec 2013 20:26:19 +0000 (12:26 -0800)]
rgw: Fix CORS allow-headers validation
This fix is needed because Ceph presently validates CORS headers in a
case-sensitive manner. Keeps a local cache of lowercased allowed headers
to avoid converting the allowed headers to lowercase each time.
CORS 6.2.6: If any of the header field-names is not a ASCII
case-insensitive match for any of the values in list of headers do not
set any additional headers and terminate this set of steps.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 31b60bfd9347a386ff12b4e4f1812d664bcfff01)
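A sketch of the case-insensitive match with a lowercased cache, as described above (the containers are illustrative, not the rgw data structures):

    #include <algorithm>
    #include <cctype>
    #include <set>
    #include <string>
    #include <vector>

    static std::string lowercase(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        return s;
    }

    struct CORSRuleHeaders {
        std::vector<std::string> allowed;        // as configured by the user
        std::set<std::string> allowed_lc;        // lazily built lowercase cache

        bool is_header_allowed(const std::string& requested) {
            if (allowed_lc.empty())
                for (const auto& h : allowed)
                    allowed_lc.insert(lowercase(h));
            return allowed_lc.count(lowercase(requested)) > 0;
        }
    };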
Robin H. Johnson [Sun, 15 Dec 2013 19:40:31 +0000 (11:40 -0800)]
rgw: Clarify naming of case-change functions
It is not clear that the lowercase_http_attr & uppercase_http_attr
functions replace dashes with underscores. Rename them to match the
pattern established by the camelcase_dash_http_attr function in
preparation for more case-change functions as needed by later fixes.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 6a7edab2673423c53c6a422a10cb65fe07f9b235)
Robin H. Johnson [Sun, 15 Dec 2013 19:27:49 +0000 (11:27 -0800)]
rgw: Look at correct header about headers for CORS
The CORS standard dictates that preflight requests are made with the
Access-Control-Request-Headers header containing the headers of the
author request. The Access-Control-Allow-Headers header is sent in the
response.
The present code looks for Access-Control-Allow-Headers in the request, so
fix it to look at Access-Control-Request-Headers instead.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 2abacd9678ae04cefac457882ba718a454948915)
Yehuda Sadeh [Fri, 6 Dec 2013 19:07:09 +0000 (11:07 -0800)]
rgw: fix reading bucket policy in RGWBucket::get_policy()
Fixes: #6940
Backport: dumpling, emperor
We changed the way we keep the bucket policy, and we shouldn't try to
access the bucket object directly. This had changed when we added the
bucket instance object around dumpling.
Reported-by: Gao, Wei M <wei.m.gao@intel.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 7a9a088d82d04f6105d72f6347673724ac16c9f8)
Yehuda Sadeh [Thu, 16 Jan 2014 19:45:27 +0000 (11:45 -0800)]
rgw: handle racing object puts when object doesn't exist
If the object didn't exist before and now we have multiple puts coming
in concurrently, we need to make sure that we behave correctly. Only one
needs to win; the others can fail silently. We do that by setting the
exclusive flag on the object creation and handling the resulting error
correctly.
Note that we still want to return -EEXIST in some cases (when the
exclusive flag is passed to put_obj_meta(), e.g., on bucket creation).
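A sketch of the "only one racing creator needs to win" handling, under the assumption of a store primitive that supports exclusive create and returns -EEXIST when the object already exists (everything here is a toy stand-in):

    #include <cerrno>
    #include <set>
    #include <string>

    // Toy stand-in for an exclusive-create primitive.
    static std::set<std::string> g_objects;
    int store_create(const std::string& name, bool exclusive) {
        if (exclusive && g_objects.count(name))
            return -EEXIST;
        g_objects.insert(name);
        return 0;
    }

    int put_object(const std::string& name, bool caller_wants_exclusive) {
        int r = store_create(name, /*exclusive=*/true);
        if (r == -EEXIST && !caller_wants_exclusive)
            r = 0;       // another racing put won; that is fine for a plain PUT
        return r;        // bucket creation, for example, still sees -EEXIST
    }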
Yehuda Sadeh [Thu, 16 Jan 2014 19:33:49 +0000 (11:33 -0800)]
rgw: don't return -ENOENT in put_obj_meta()
Fixes: #7168
An object put may race with the same object's delete. In this case just
ignore the error, the same behavior as if the object had been created and
then removed.
Robin H. Johnson [Sun, 19 Jan 2014 02:01:20 +0000 (18:01 -0800)]
rgw: Use correct secret key for POST authn
The POST authentication by signature validation looked up a user based
on the access key, then used the first secret key for the user. If the
access key used was not the first access key, then the expected
signature would be wrong, and the POST would be rejected.
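A minimal sketch of the fix, assuming the user record can hold several access/secret key pairs (types and names here are placeholders):

    #include <map>
    #include <optional>
    #include <string>

    struct UserInfo {
        std::map<std::string, std::string> access_keys;  // access key id -> secret
    };

    // Look up the secret belonging to the access key actually used in the
    // request, rather than blindly taking the user's first secret key.
    std::optional<std::string> secret_for(const UserInfo& user,
                                          const std::string& access_key_id) {
        auto it = user.access_keys.find(access_key_id);
        if (it == user.access_keys.end())
            return std::nullopt;        // unknown access key -> reject the POST
        return it->second;
    }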
There's a window in-between receiving an MOSDPGTemp message from an OSD
and actually handling it that may lead to the pool the pg temps refer to
no longer existing. This may happen if the MOSDPGTemp message is queued
pending dispatching due to an on-going proposal (maybe even the pool
removal).
This patch fixes this behavior in four steps (sketched below):
1. Check if the pool exists in the osdmap upon preprocessing.
   - If the pool does not exist in the osdmap, it must have been removed
     prior to handling the message, but after the osd sent it.
   - It is safe to ignore the pg update.
2. If all pg updates in the message have been ignored, ignore the whole
   message. Otherwise, let prepare handle the rest.
3. Recheck if the pool exists in the osdmap upon prepare.
   - We may have ignored this pg back in preprocess, but other pgs in the
     message may have led the message to be passed on to prepare; ignore
     the pg update once more.
4. Check if the pool is pending removal and ignore the pg update if so.
We delegate checking the pending value to prepare_pgtemp() because in this
case we should only ignore the update IFF the pending value is in fact
committed. Otherwise we should retry the message. prepare_pgtemp() is
the appropriate place to do so.
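The sketch referenced above; all types and names are placeholders for the monitor's real interfaces, and it only illustrates the preprocess/prepare split:

    #include <cstdint>
    #include <iterator>
    #include <map>
    #include <set>

    struct PGTempMsg { std::map<int64_t, std::set<int>> pg_temp_by_pool; };

    struct MiniOSDMap {
        std::set<int64_t> pools;                              // committed pools
        bool have_pool(int64_t id) const { return pools.count(id) != 0; }
    };

    // Preprocess: drop updates for pools that no longer exist; if every update
    // is dropped, the whole message can be ignored (return false).
    bool preprocess_pgtemp(const MiniOSDMap& osdmap, PGTempMsg& m) {
        for (auto it = m.pg_temp_by_pool.begin(); it != m.pg_temp_by_pool.end(); )
            it = osdmap.have_pool(it->first) ? std::next(it)
                                             : m.pg_temp_by_pool.erase(it);
        return !m.pg_temp_by_pool.empty();
    }

    // Prepare: re-check against the committed map and against pools whose
    // removal is already committed in the pending map.
    bool prepare_pgtemp_update(const MiniOSDMap& osdmap,
                               const std::set<int64_t>& pending_removed_pools,
                               int64_t pool) {
        return osdmap.have_pool(pool) && pending_removed_pools.count(pool) == 0;
    }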
Sage Weil [Tue, 28 Jan 2014 18:26:12 +0000 (10:26 -0800)]
buffer: make 0-length splice() a no-op
This was causing a problem in the Striper, but fixing it here will avoid
corner cases all over the tree. Note that we have to bail out before
the end-of-buffer check to avoid hitting that check when the bufferlist is
also empty.
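A tiny sketch of the ordering that matters here (an illustrative container, not Ceph's bufferlist): the zero-length check must come before the end-of-buffer check so that splicing zero bytes out of an empty buffer is a legal no-op:

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    void splice(std::vector<char>& buf, size_t off, size_t len) {
        if (len == 0)
            return;                           // no-op, even when buf is empty
        if (off + len > buf.size())
            throw std::out_of_range("splice past end of buffer");
        buf.erase(buf.begin() + off, buf.begin() + off + len);
    }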
Derek Yarnell [Mon, 27 Jan 2014 19:27:51 +0000 (12:27 -0700)]
packaging: apply udev hack rule to RHEL
In the RPM spec file there is a test to deploy the uuid hack udev rules
for older udev operating systems. This includes CentOS and RHEL, but the
check currently covers only CentOS, causing RHEL clients to get a bogus
osd rules file.
Adjust the conditional to apply to RHEL as well as CentOS. (The %{rhel}
macro is defined in both platforms' redhat-rpm-config package.)
Yehuda Sadeh [Tue, 7 Jan 2014 02:32:42 +0000 (18:32 -0800)]
rgw: convert bucket info if needed
Fixes: #7110
In dumpling, the bucket info was separated into bucket entry point and
bucket instance objects. When setting bucket attrs we only ended up
updating the bucket instance object. However, pre-dumpling buckets still
keep everything at the entry-point object, so acl changes didn't affect
anything (because we never updated the entry point). This change just
converts the bucket info into the new format.
Sage Weil [Sun, 5 Jan 2014 06:40:43 +0000 (22:40 -0800)]
osd: ignore OSDMap messages while we are initializing
The mon may occasionally send OSDMap messages to random OSDs, but it is not
very discriminating, in that we may not have authenticated yet. Ignore any
such messages if that is the case; we will request whatever we need during
the BOOTING state.
Sage Weil [Sun, 5 Jan 2014 06:43:26 +0000 (22:43 -0800)]
mon: only send messages to current OSDs
When choosing a random OSD to send a message to, verify not only that
the OSD id is up but that the session is for the same instance of that OSD
by checking that the address matches.
Sage Weil [Mon, 26 Aug 2013 20:58:47 +0000 (13:58 -0700)]
osd: discriminate based on connection messenger, not peer type
Replace the ->get_source().is_osd() checks and instead see if the connection
is the cluster_messenger, so that we do not confuse ourselves when we get
legitimate requests from other OSDs on our public interface.
NOTE: backporting this because a mixed cluster may send OSD requests
via the client interface, even though dumpling doesn't do this.
Loic Dachary [Sun, 15 Dec 2013 15:27:02 +0000 (16:27 +0100)]
mon: set ceph osd (down|out|in|rm) error code on failure
Instead of always returning true, the error code is set if at least one
operation fails.
EINVAL if the OSD id is invalid (osd.foobar for instance).
EBUSY if trying to remove an OSD that is up.
When used with the ceph command line, it looks like this:
ceph -c ceph.conf osd rm osd.0
Error EBUSY: osd.0 is still up; must be down before removal.
kill PID_OF_osd.0
ceph -c ceph.conf osd down osd.0
marked down osd.0.
ceph -c ceph.conf osd rm osd.0 osd.1
Error EBUSY: removed osd.0, osd.1 is still up; must be down before removal.
Josh Durgin [Fri, 27 Dec 2013 01:38:52 +0000 (17:38 -0800)]
librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not yet run the rest of its
complete() method, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
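A sketch of the ordering change (field and type names are hypothetical): touch the image-owned perfcounters first, then call the user's callback, after which the image may be closed at any time:

    #include <atomic>
    #include <cstdint>
    #include <functional>

    struct PerfCounters { std::atomic<uint64_t> ops{0}; };  // owned by the open image

    struct MiniAioCompletion {
        PerfCounters* perf = nullptr;
        std::function<void()> user_callback;

        void complete() {
            if (perf)
                perf->ops.fetch_add(1);     // last touch of image-owned state
            if (user_callback)
                user_callback();            // the user may now close the image safely
            // remaining cleanup only touches this completion object itself
        }
    };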
Josh Durgin [Sat, 7 Dec 2013 00:03:20 +0000 (16:03 -0800)]
objecter: don't take extra throttle budget for resent ops
These ops have already taken their budget in the original op_submit().
It will be returned via put_op_budget() when they complete.
If there were many localized reads of missing objects from replicas,
or cache pool redirects, this would cause the objecter to use up all
of its op throttle budget and hang.
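A minimal sketch of that budget rule (the throttle and op types are stand-ins): budget is taken once at submit time and returned once at completion, so a resend must not take it again:

    #include <cstdint>

    struct Throttle {
        int64_t available = 0;
        bool get(int64_t n) {
            if (available < n) return false;
            available -= n;
            return true;
        }
        void put(int64_t n) { available += n; }
    };

    struct MiniOp { int64_t budget = -1; };      // -1 means "no budget taken yet"

    bool send_op(Throttle& t, MiniOp& op, int64_t cost, bool is_resend) {
        if (!is_resend && op.budget < 0) {       // only the original submit pays
            if (!t.get(cost))
                return false;                    // caller blocks/queues instead
            op.budget = cost;
        }
        // ... transmit the op ...
        return true;
    }

    void finish_op(Throttle& t, MiniOp& op) {
        if (op.budget >= 0) {
            t.put(op.budget);                    // budget returned exactly once
            op.budget = -1;
        }
    }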
Josh Durgin [Fri, 6 Dec 2013 01:34:38 +0000 (17:34 -0800)]
osd: drop writes when full instead of returning an error
There's a race between the client and osd with a newly marked full
osdmap. If the client gets the new map first, it blocks writes and
everything works as expected, with no errors from the osd.
If the osd gets the map first, however, it will respond to any writes
with -ENOSPC. Clients will pass this up the stack, and not retry these
writes later. -ENOSPC isn't handled well by all clients. RBD, for
example, may pass it on to qemu or kernel rbd which will both
interpret it as EIO. Filesystems on top of rbd will not behave well
when they receive EIOs like this, especially if the cluster oscillates
between full and not full, so that some writes succeed while others fail.
To fix this, never return ENOSPC from the osd because of a map marked
full, and rely on the client to retry all writes when the map is no
longer marked full.
Old clients talking to osds with this fix will hang instead of
propagating an error, but only if they run into this race
condition. ceph-fuse and rbd with caching enabled are not affected,
since the ObjectCacher will retry writes that return errors.
Yehuda Sadeh [Thu, 7 Nov 2013 00:15:47 +0000 (16:15 -0800)]
objecter: set op->paused in recalc_op_target(), resend if not paused
When going through scan_requests() in handle_osd_map() we need to make
sure that if an op should no longer be paused, we update op->paused
accordingly and return NEED_RESEND. Otherwise we're going to miss the
reset of the full flag.
Also in handle_osd_map(), make sure that an op shouldn't be paused before
sending it. There's a lot of cleanup we should probably be doing around
that area to make the code much tighter.
Josh Durgin [Wed, 6 Nov 2013 02:46:37 +0000 (10:46 +0800)]
objecter: don't resend paused ops
Paused ops are meant to block on the client side until a new map that
unpauses them is received. If we send paused writes when the FULL flag
is set, we'll get -ENOSPC from the osds, which is not what Objecter
users expect. This may cause rbd without caching to produce an I/O
error instead of waiting for the cluster to have capacity.
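A sketch that combines the two commits above (fields are illustrative): a paused write is kept but not sent, and only goes out once a new map clears the pause condition:

    #include <cstdint>
    #include <map>

    struct MiniOp { bool is_write = false; bool paused = false; };

    struct MiniClient {
        bool map_full = false;
        std::map<uint64_t, MiniOp> ops;

        bool should_pause(const MiniOp& op) const { return op.is_write && map_full; }

        void send(uint64_t /*tid*/) { /* wire I/O elided */ }

        void handle_new_map(bool new_full) {       // called for each new osdmap epoch
            map_full = new_full;
            for (auto& [tid, op] : ops) {
                const bool pause = should_pause(op);
                if (op.paused && !pause) {
                    op.paused = false;             // record it on the op itself...
                    send(tid);                     // ...and resend (NEED_RESEND)
                } else if (!op.paused && pause) {
                    op.paused = true;              // hold it; never send while paused
                }
            }
        }
    };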
Yehuda Sadeh [Wed, 18 Dec 2013 21:10:21 +0000 (13:10 -0800)]
rgw: don't return data within the librados cb
Fixes: #7030
The callback is running within a single Finisher thread, thus we
shouldn't block there. Append read data to a list and flush it within
the iterate context.
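A sketch of that pattern (names are placeholders): the librados callback only appends the chunk under a lock, and the iterate path, running outside the finisher thread, is what actually hands data to the client:

    #include <list>
    #include <mutex>
    #include <string>
    #include <utility>

    struct ReadState {
        std::mutex lock;
        std::list<std::string> pending;              // chunks appended by callbacks

        void on_read_complete(std::string chunk) {   // runs in the finisher thread
            std::lock_guard<std::mutex> l(lock);
            pending.push_back(std::move(chunk));     // never block here
        }

        template <class SendFn>
        void flush(SendFn send_to_client) {          // runs in the iterate context
            std::list<std::string> out;
            {
                std::lock_guard<std::mutex> l(lock);
                out.swap(pending);
            }
            for (auto& c : out)
                send_to_client(c);                   // blocking client I/O is fine here
        }
    };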
- Added config option to allow S3 to use Keystone auth
- Implemented JSONDecoder for KeystoneToken
- RGW_Auth_S3::authorize now uses rgw_store_user_info on keystone auth
- Minor fix in get_canon_resource; dout is now after the assignment
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
(cherry picked from commit a200e184b15a03a4ca382e94caf01efb41cb9db7)
Yehuda Sadeh [Mon, 21 Oct 2013 21:17:12 +0000 (14:17 -0700)]
rgw: turn swift COPY into PUT
Fixes: #6606
The swift COPY operation is unique in a sense that it's a write
operation that has its destination not set by the URI target, but by a
different HTTP header. This is problematic as there are some hidden
assumptions in the code that the specified bucket/object in the URI is
the operation target. E.g., certain initialization functions, quota,
etc. Instead of creating a specialized code everywhere for this case
just turn it into a regular copy operation, that is, a PUT with
a specified copy source.
Sage Weil [Wed, 4 Dec 2013 05:51:26 +0000 (21:51 -0800)]
osd/OSDMonitor: accept 'osd pool set ...' value as string
Newer monitors take this as a CephString. Accept that so that if we are
mid-upgrade and get a forwarded message using the alternate schema from
a future mon we will handle it properly.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Josh Durgin [Mon, 25 Nov 2013 21:43:43 +0000 (13:43 -0800)]
init, upstart: prevent daemons being started by both
There can be only one init system starting a daemon. If there is a
host entry in ceph.conf for a daemon, sysvinit would try to start it
even if the daemon's directory did not include a sysvinit file. This
preserves backwards compatibility with older installs using sysvinit,
but if an upstart file is present in the daemon's directory, upstart
will try to start them, regardless of host entries in ceph.conf.
If there's an upstart file in a daemon's directory and a host entry
for that daemon in ceph.conf, both sysvinit and upstart would attempt
to manage it.
Fix this by only starting daemons if the marker file for the other
init system is not present. This maintains backwards compatibility
with older installs using neither sysvinit nor upstart marker files,
and does not break any valid configurations. The only configuration
that would break is one with both sysvinit and upstart files present
for the same daemon.
Backport: emperor, dumpling
Reported-by: Tim Spriggs <tims@uahirise.org>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 5e34beb61b3f5a1ed4afd8ee2fe976de40f95ace)
Samuel Just [Mon, 4 Nov 2013 05:02:36 +0000 (21:02 -0800)]
OSD: allow project_pg_history to handle a missing map
If we get a peering message for an old map we don't have, we
can throw it out: the sending OSD will learn about the newer
maps and update itself accordingly, and we don't have the
information to know if the message is valid. This situation
can only happen if the sender was down for a long enough time
to create a map gap and its PGs have not yet advanced from
their boot-up maps to the current ones, so we can rely on the
sender to catch up and resend.
Fixes: #6712
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit cd0d612e1abdf5c87082eeeccd4ca09dd14fd737)
Yehuda Sadeh [Mon, 19 Aug 2013 23:56:27 +0000 (16:56 -0700)]
rgw: bucket meta remove don't overwrite entry point first
Fixes: #6056
When removing a bucket metadata entry we first unlink the bucket
and then we remove the bucket entrypoint object. Originally
when unlinking the bucket we first overwrote the bucket entrypoint
entry marking it as 'unlinked'. However, this is not really needed
as we're just about to remove it. The original version triggered
a bug, as we needed to propagate the new header version first (which
we didn't do, so the subsequent bucket removal failed).
Josh Durgin [Mon, 18 Nov 2013 22:39:12 +0000 (14:39 -0800)]
osd: fix bench block size
The command was declared to take 'size' in dumpling, but was trying to
read 'bsize' instead, so it always used the default of 4MiB. Change
the bench command to read 'size', so it matches what existing clients
are sending.
David Zafman [Wed, 25 Sep 2013 16:19:16 +0000 (09:19 -0700)]
os, osd, tools: Add backportable compatibility checking for sharded objects
OSD:
  New CEPH_OSD_FEATURE_INCOMPAT_SHARDS
FileStore:
  New CEPH_FS_FEATURE_INCOMPAT_SHARDS
  Add FSSuperblock with feature CompatSet in it
  Store sharded_objects state using CompatSet
  Add set_allow_sharded_objects() and get_allow_sharded_objects() to FileStore/ObjectStore
  Add read_superblock()/write_superblock() internal filestore functions
ceph_filestore_dump:
  Add OSDsuperblock to export format
  Use CompatSet from OSD code itself in filestore-dump tool
  Always check compatibility of OSD features with on-disk features
  On import verify compatibility of on-disk features with export data
  Bump super_ver due to export format change
Backport: dumpling, cuttlefish
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit c6b83180f9f769de27ca7890f5f8ec507ee743ca)
Excluded from cherry-pick:
Didn't add set_allow_sharded_objects() and get_allow_sharded_objects() to FileStore/ObjectStore
Didn't add code to check for incomplete transition to sharded objects in ceph-filestore-dump
rgw: when failing read from client, return correct error
Fixes: #6214
When getting a failed read from the client while putting an object,
we returned the wrong value (always 0), which in the chunked-upload
case ended up assuming that the write was done successfully.
Yehuda Sadeh [Mon, 26 Aug 2013 18:16:08 +0000 (11:16 -0700)]
rgw: quiet down warning message
Fixes: #6123
We don't want to know about failing to read region map info
if it's not found, only if it failed with some other error. In
any case it's just a warning.
Josh Durgin [Thu, 24 Oct 2013 15:42:48 +0000 (08:42 -0700)]
rgw: escape bucket and object names in StreamReadRequests
This fixes copy operations for objects that contain unsafe characters,
like a newline, which would return a 403 otherwise, since the GET to
the source rgw would be unable to verify the signature on a partially
valid bucket name.
Josh Durgin [Thu, 24 Oct 2013 15:37:25 +0000 (08:37 -0700)]
rgw: move url escaping to a common place
This is useful outside of the s3 interface. Rename url_escape() to
url_encode() for consistency with the existing common url_decode()
function. This is in preparation for the next commit, which needs
to escape url-unsafe characters in another place.
Josh Durgin [Thu, 24 Oct 2013 15:34:24 +0000 (08:34 -0700)]
rgw: update metadata log list to match data log list
Send the last marker and whether the log is truncated, in the same format
as the data log list, so clients don't need extra complexity to handle
the difference. Keep bucket index logs the same, since they
contain the marker already, and are not used in exactly the same way
metadata and data logs are.
Josh Durgin [Thu, 24 Oct 2013 15:26:19 +0000 (08:26 -0700)]
rgw: include marker and truncated flag in data log list api
Consumers of this api need to know their position in the log. It's
readily available when fetching the log, so return it. Without the
marker in this call, a client could not easily or efficiently figure
out its position in the log, since it would require getting the global
last marker in the log, and then reading all the log entries.
This would be slow for large logs, and would be subject to races that
would cause potentially very expensive duplicate work.
Returning this atomically while fetching the log entries simplifies
all of this.
Josh Durgin [Thu, 24 Oct 2013 15:18:19 +0000 (08:18 -0700)]
cls_log: always return final marker from log_list
There's no reason to restrict returning the marker to the case where
less than the whole log is returned, since there's already a truncated
flag to tell the client what happened.
Giving the client the last marker makes it easy to consume when the
log entries do not contain their own marker. If the last marker is not
returned, the client cannot get the last marker without racing with
updates to the log.
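A sketch of the listing contract described in the log-related commits above (illustrative types): always return the position of the last entry handed out together with the truncated flag, so the caller can resume without a second, racy query:

    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    struct LogEntry { std::string data; };

    struct ListResult {
        std::vector<LogEntry> entries;
        std::string last_marker;     // position of the last entry returned
        bool truncated = false;
    };

    ListResult list_log(const std::map<std::string, LogEntry>& log,
                        const std::string& marker, size_t max_entries) {
        ListResult res;
        auto it = marker.empty() ? log.begin() : log.upper_bound(marker);
        for (; it != log.end() && res.entries.size() < max_entries; ++it) {
            res.entries.push_back(it->second);
            res.last_marker = it->first;     // returned even when not truncated
        }
        res.truncated = (it != log.end());
        return res;
    }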
Yan, Zheng [Thu, 10 Oct 2013 02:35:48 +0000 (10:35 +0800)]
mds: fix infinite loop of MDCache::populate_mydir().
Make MDCache::populate_mydir() only fetch bare-bones stray dirs.
After all stray dirs are populated, call MDCache::scan_stray_dir(),
which fetches incomplete stray dirs.
Fixes: #4405
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 007f06ec174d4ee5cfb578c8b3f1c96b2bb0c238)
Yehuda Sadeh [Tue, 15 Oct 2013 17:20:48 +0000 (10:20 -0700)]
rgw: fix authenticated users acl group check
Fixes: #6553
Backport: bobtail, cuttlefish, dumpling
The authenticated users group ACL bit was not working correctly: the check
that tests whether a user is anonymous was wrong.
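A minimal sketch of the intended test (identifiers are placeholders): the authenticated-users ACL group should match any caller that is not the anonymous user:

    #include <string>

    static const std::string ANON_USER = "anonymous";   // placeholder id

    bool is_anonymous(const std::string& user_id) { return user_id == ANON_USER; }

    // Membership in the "authenticated users" ACL group.
    bool in_authenticated_users_group(const std::string& user_id) {
        return !is_anonymous(user_id);
    }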