
PG: don't query unfound on empty pgs

When the replica responds, it responds with a notify
rather than a log, which the primary then ignores since
it is already in the peer_info map. Rather than fix that
we'll simply not send queries to peers we already know to
have no unfound objects.

Fixes: #6910
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 838b6c8387087543ce50837277f7f6b52ae87d00)

Merge pull request #1313 from ceph/dumpling-osd-subscribe

Dumpling backport: clean up osd subscriptions

Merge pull request #1485 from ceph/wip-7212.dumpling

backport 7212 fixes to dumpling

ReplicatedPG: don't skip missing if sentries is empty on pgls

Formerly, if sentries is empty, we skip missing. In general,
we need to continue adding items from missing until we get
to next (returned from collection_list_partial) to avoid
missing any objects.
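Below is a minimal sketch of the merge this commit describes. It makes several assumptions: objects are plain sortable strings, `missing` is already sorted and limited to objects at or after the current listing position, it does not overlap `sentries`, and `next_obj` is None once the listing is complete. `merge_listing` is a hypothetical name, not the ReplicatedPG code:

```python
def merge_listing(sentries, missing, next_obj):
    """Merge on-disk entries with missing objects that sort before next_obj.

    Hypothetical sketch: sentries and missing are sorted lists of names,
    next_obj is the position returned by the partial listing (None = end).
    """
    result = []
    mi = 0
    for obj in sentries:
        # Emit missing objects that sort before the current on-disk entry.
        while mi < len(missing) and missing[mi] < obj:
            result.append(missing[mi])
            mi += 1
        result.append(obj)
    # Even when sentries is empty, keep adding missing objects until we
    # reach next_obj; skipping this step is what lost objects before.
    while mi < len(missing) and (next_obj is None or missing[mi] < next_obj):
        result.append(missing[mi])
        mi += 1
    return result
```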

Fixes: #6633
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit c7a30b881151e08b37339bb025789921e7115288)

mon/Elector: bootstrap on timeout

Currently if an election times out we call a new
election.  If we have never joined a quorum, bootstrap
instead. This is heavier weight, but captures the case
where, during bootstrap:

- a and b have learned each other's addresses
- everybody calls an election
- a and b form a quorum
- c loops trying to call an election, but is ignored
   because a and b don't see its address in the monmap
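The timeout decision can be modeled as follows. This is a hypothetical sketch, not the Elector API: it only records whether the monitor would call a new election or bootstrap, using a flag assumed to track whether it has ever been part of a quorum.

```python
class Elector:
    """Hypothetical model of the monitor's election-timeout handling."""

    def __init__(self):
        self.ever_joined_quorum = False
        self.actions = []  # record of what the monitor decided to do

    def win_or_join_quorum(self):
        self.ever_joined_quorum = True

    def election_timeout(self):
        if self.ever_joined_quorum:
            # Normal case: just try another election round.
            self.actions.append("call_election")
        else:
            # Never been in a quorum: take the heavier bootstrap path
            # instead of looping on elections that peers may ignore.
            self.actions.append("bootstrap")
```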

See logs:
  ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-02-14_13:50:04-ceph-deploy-wip-7212-sage-b-testing-basic-plana/83194

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a4bcb1f8129a4ece97bd3419abf1ff45d260ad8e)
(cherry picked from commit 143ec0281aa8b640617a3fe19a430248ce3b514c)

mon: tell MonmapMonitor first about winning an election

It is important in the bootstrap case that the very first paxos round
also codify the contents of the monmap itself in order to avoid any manner
of confusing scenarios where subsequent elections are called and people
try to recover and modify paxos without agreeing on who the quorum
participants are.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ad7f5dd481a7f45dfe6b50d27ad45abc40950510)
(cherry picked from commit e073a062d56099b5fb4311be2a418f7570e1ffd9)

mon: only learn peer addresses when monmap == 0

It is only safe to dynamically update the address for a peer mon in our
monmap if we are in the midst of the initial quorum formation (i.e.,
monmap.epoch == 0). If it is a later epoch, we have formed our initial
quorum and any and all monmap changes need to be agreed upon by the quorum
and committed via paxos.
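A minimal sketch of the guard, assuming a simplified monmap with just an epoch and a name-to-address dict. `MonMap` and `maybe_learn_peer_addr` are hypothetical stand-ins for the monitor's own types:

```python
from dataclasses import dataclass, field


@dataclass
class MonMap:
    epoch: int
    addrs: dict = field(default_factory=dict)  # mon name -> address


def maybe_learn_peer_addr(monmap, name, addr):
    """Record a peer's address locally only during initial quorum formation."""
    if monmap.epoch != 0:
        # The initial quorum has formed; address changes must go through paxos.
        return False
    monmap.addrs[name] = addr
    return True
```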

Fixes: #7212
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7bd2104acfeff0c9aa5e648d82ed372f901f767f)
(cherry picked from commit 1996fd89fb3165a63449b135e05841579695aabd)

ceph.in: do not allow using 'tell' with interactive mode

This avoids a lot of hassle when deciding which target each command should
be sent to in interactive mode, and even more so if multiple targets are
specified.

Instead, 'tell' commands should be issued in non-interactive mode.

Backport: dumpling,emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit e39c213c1d230271d23b74086664c2082caecdb9)

RGWListBucketMultiparts: init max_uploads/default_max with 0

CID 717377 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "max_uploads" is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member "default_max" is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit b23a141d54ffb39958aba9da7f87544674fa0e50)

ceph_test_rados: wait for commit, not ack

First, this is what we wanted in the first place.

Second, if we wait for ACK, we may look at a user_version value that is
not stable.

Fixes: #7705
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f2124c5846f1e9cb44e66eb2e957b8c7df3e19f4)

Conflicts:

src/test/osd/RadosModel.h

test-upgrade-firefly: skip watch-notify system test

This also fails on mixed version clusters due to watch on a
non-existent object returning ENOENT in firefly and 0 in dumpling.

Reviewed-by: Sage Weil <sage.weil@inktank.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

qa/workunit/rados/test-upgrade-firefly: skip watch-notify test

A watch on a non-existent object now returns ENOENT in firefly; skip this
test as it will fail on a hybrid or upgraded cluster.

Signed-off-by: Sage Weil <sage@inktank.com>

Merge pull request #1411 from ceph/wip-7076-dumpling

dumpling backport of watchers check for rbd_remove()

rgw: off-by-one in rgw_trim_whitespace()

Fixes: #7543
Backport: dumpling

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Ray Lv <raylv@yahoo-inc.com>
(cherry picked from commit 195d53a7fc695ed954c85022fef6d2a18f68fe20)

rbd: check for watchers before trimming an image on 'rbd rm'

Check for watchers before trimming image data to try to avoid getting
into the following situation:

  - user does 'rbd rm' on a mapped image with an fs mounted from it
  - 'rbd rm' trims (removes) all image data, only header is left
  - 'rbd rm' tries to remove a header and fails because krbd has a
    watcher registered on the header
  - at this point image cannot be unmapped because of the mounted fs
  - fs cannot be unmounted because all its data and metadata is gone

Unfortunately, this fix doesn't make it impossible to happen (the
required atomicity isn't there), but it's a big improvement over the
status quo.

Fixes: http://tracker.ceph.com/issues/7076
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
(cherry picked from commit 0a553cfa81b06e75585ab3c39927e307ec0f4cb6)

Merge pull request #1407 from dachary/wip-7188-dumpling

common: ping existing admin socket before unlink (dumpling)

Reviewed-by: Sage Weil <sage@inktank.com>

common: ping existing admin socket before unlink

When a daemon initializes, it tries to create an admin socket and unlinks
any pre-existing file, regardless of whether it is in use. If such a file
is in use, the existing daemon loses its admin socket.

The AdminSocketClient::ping function is implemented to probe an existing
socket, using the "0" message. The AdminSocket::bind_and_listen function is
modified to call ping() when it finds an existing file. It unlinks the
file only if the ping fails.
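A rough Python analogue of the ping-before-unlink logic, using a plain AF_UNIX connect() as the liveness probe. This is a simplifying assumption: the real ping sends the "0" request and expects a reply, while this sketch only checks that something accepts the connection. Both function names are hypothetical:

```python
import os
import socket


def socket_is_live(path, timeout=1.0):
    """Return True if something is accepting connections on the Unix socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # Connection refused (stale socket, regular file) or timed out.
        return False
    finally:
        s.close()


def prepare_socket_path(path):
    """Remove a stale socket file, but refuse to steal a live one."""
    if os.path.exists(path):
        if socket_is_live(path):
            raise RuntimeError("%s is in use by another daemon" % path)
        os.unlink(path)
```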

http://tracker.ceph.com/issues/7188 fixes: #7188

Backport: emperor, dumpling
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 45600789f1ca399dddc5870254e5db883fb29b38)

Merge pull request #1366 from ceph/wip-6820.dumpling

mon: OSDMonitor: don't crash if formatter is invalid during osd crush dump

Merge pull request #1377 from ceph/wip-7584

qa/workunit/rados/test-upgrade-firefly.sh

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

qa/workunit/rados/test-upgrade-firefly.sh

Skip the tests that don't pass when run against firefly OSDs.

Fixes: #7584
Signed-off-by: Sage Weil <sage@inktank.com>

Merge pull request #1357 from ceph/wip-dumpling-removewq

OSD: ping tphandle during pg removal

Reviewed-by: Greg Farnum <greg@inktank.com>

mon: OSDMonitor: don't crash if formatter is invalid during osd crush dump

Code would assume a formatter would always be defined. If a 'plain'
formatter or even an invalid formatter were to be supplied, the monitor
would crash and burn in poor style.

Fixes: #6820
Backport: emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 49d2fb71422fe4edfe5795c001104fb5bc8c98c3)

OSD: ping tphandle during pg removal

Fixes: #6528
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c658258d9e2f590054a30c0dee14a579a51bda8c)

Conflicts:
src/osd/OSD.cc

Merge pull request #1316 from ceph/dumpling-6922

Dumpling: Prevent extreme PG split multipliers

Reviewed-by: Samuel Just <sam.just@inktank.com>

Merge pull request #1315 from ceph/dumpling-hashpspool

mon: OSDMonitor: allow (un)setting 'hashpspool' flag via 'osd pool set'

Reviewed-by: Samuel Just <sam.just@inktank.com>

Merge pull request #1314 from ceph/dumpling-osd-pgstatsack

Dumpling osd pgstatsack

Reviewed-by: Samuel Just <sam.just@inktank.com>

mon: OSDMonitor: allow (un)setting 'hashpspool' flag via 'osd pool set'

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1c2886964a0c005545abab0cf8feae7e06ac02a8)

Conflicts:

src/mon/MonCommands.h
src/mon/OSDMonitor.cc

mon: ceph hashpspool false clears the flag
instead of toggling it.
Signed-off-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Christophe Courtaut <christophe.courtaut@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 589e2fa485b94244c79079f249428d4d545fca18)

Replace some of the infrastructure required by this command that
was not present in Dumpling with single-use code.
Signed-off-by: Greg Farnum <greg@inktank.com>

OSDMonitor: use a different approach to prevent extreme multipliers on PG splits

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d8ccd73968fbd0753ca08916ebf1062cdb4d5ac1)

Conflicts:

src/mon/OSDMonitor.cc

OSDMonitor: prevent extreme multipliers on PG splits

Fixes: #6922
Backport: emperor

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f57dad6461171c903e8b5255eaed300374b00e74)

Conflicts:

src/mon/OSDMonitor.cc

osd: fix off-by-one in boot subscription

If we have osdmap N, we want to onetime subscribe
starting at N+1. Among other things, it means we
hear when the NOUP flag is cleared.

This appears to have broken somewhere around
3c76b81f2f96b790b72f2088164ed8e9d5efbba1.

Fixes: #7511
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
(cherry picked from commit 70d23b9a0ad9af5ca35a627a7f93c7e610e17549)
Reviewed-by: Greg Farnum <greg@inktank.com>

OSD: use the osdmap_subscribe helper

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3c76b81f2f96b790b72f2088164ed8e9d5efbba1)

OSD: create a helper for handling OSDMap subscriptions, and clean them up

We've had some trouble with not clearing out subscription requests and
overloading the monitors (though only because of other bugs). Write a
helper for handling subscription requests that we can use to centralize
safety logic. Clear out the subscription whenever we get a map that covers
it; if there are more maps available than we received, we will issue another
subscription request based on "m->newest_map" at the end of handle_osd_map().

Notice that the helper will no longer request old maps which we already have,
and that unless forced it will not dispatch multiple subscribe requests
to a single monitor.
Skipping old maps is safe:
1) we only trim old maps when the monitor tells us to,
2) we do not send messages to our peers until we have updated our maps
from the monitor.
That means only old and broken OSDs will send us messages based on maps
in our past, and we can (and should) ignore any directives from them anyway.
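A simplified model of such a helper follows. It assumes a single outstanding subscription tracked by its start epoch rather than per-monitor state, and `OSDMapSubscriber` and its methods are hypothetical, not the OSD's actual interface. It clamps requests to start after the newest map we hold (N+1), suppresses duplicate requests unless forced, and clears the subscription once received maps cover it:

```python
class OSDMapSubscriber:
    """Hypothetical sketch of a centralized OSDMap subscription helper."""

    def __init__(self, have_epoch):
        self.have_epoch = have_epoch  # newest map we already hold
        self.pending_start = None     # start epoch of outstanding request
        self.sent = []                # subscription requests issued

    def subscribe(self, epoch, force=False):
        # Never ask for maps we already have: start at have_epoch + 1 at least.
        epoch = max(epoch, self.have_epoch + 1)
        # Unless forced, don't re-send if an outstanding request covers it.
        if not force and self.pending_start is not None \
                and self.pending_start <= epoch:
            return False
        self.pending_start = epoch
        self.sent.append(epoch)
        return True

    def handle_maps(self, last_epoch):
        self.have_epoch = max(self.have_epoch, last_epoch)
        # Clear the outstanding subscription once received maps cover it.
        if self.pending_start is not None and last_epoch >= self.pending_start:
            self.pending_start = None
```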

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 6db3ae851d1c936de045390d18b1c6ae95f2a209)

Conflicts:

src/osd/OSD.h

monc: new sub_want_increment() function to make handling subscriptions easier

Provide a subscription-modifying function which will not decrement
the start version.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5b9c187caf6f7847aaa4a1003d200158dd32bf63)

OSD: disable the PGStatsAck timeout when we are reconnecting to a monitor

Previously, the timeout counter started as soon as we issued the reopen,
but if the reconnect process itself took a while, we might time out and
issue another reopen just as we get to the point where it's possible to
get work done. Since the mon client has its own reconnect timeouts (that is,
the OSD doesn't need to trigger those), we instead disable our timeouts
while the reconnect is happening, and then turn them back on again starting
from when we get the reconnect callback.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 64cedf6fa3ee309cc96554286bfb805e4ca89439)

Conflicts:

src/osd/OSD.cc

monc: backoff the timeout period when reconnecting

If the monitors are systematically slowing down, we don't want to spam
them with reconnect attempts every three seconds. Instead, every time
we issue a reconnect, multiply our timeout period by a configurable factor;
when we complete the connection, reduce that multiplier by 50%. This should
let us respond to monitor load.
Of course, we don't want to do that during initial startup when a couple of
monitors are down, so don't apply the backoff until we've successfully
connected to a monitor at least once.
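The backoff policy can be sketched as below. The 3-second base comes from the text above; the backoff factor of 2 and the floor of 1 on the multiplier are illustrative assumptions, and `HuntBackoff` is a hypothetical name rather than a MonClient class:

```python
class HuntBackoff:
    """Hypothetical sketch of reconnect-timeout backoff for a monitor client."""

    def __init__(self, base_interval=3.0, backoff=2.0):
        self.base_interval = base_interval
        self.backoff = backoff      # the configurable multiplier step
        self.multiplier = 1.0
        self.had_session = False    # have we ever connected successfully?

    def interval(self):
        return self.base_interval * self.multiplier

    def on_reconnect_attempt(self):
        # No backoff during initial startup, where a couple of monitors
        # may simply be down.
        if self.had_session:
            self.multiplier *= self.backoff

    def on_connected(self):
        self.had_session = True
        # Reduce the multiplier by 50%, but never below 1 (assumed floor).
        self.multiplier = max(1.0, self.multiplier / 2)
```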

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 794c86fd289bd62a35ed14368fa096c46736e9a2)

monc: set "hunting" to true when we reopen the mon session

If we don't have a connection to a monitor, we want to retry to another
monitor regardless of whether it's the first time or not.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 60da8abe0ebf17ce818d6fcc6391401878123bb7)

monc: let users specify a callback when they reopen their monitor session

Then the callback is triggered when a new session is established, and the
daemon can do whatever it likes. There are no guarantees about how long it
might take to trigger, though. In particular we call the provided callback
while not holding our own lock in order to avoid deadlock. This could lead
to some funny ordering from the user's perspective if they call
reopen_session() again before getting the callback, but there's no way around
that, so they just have to use it appropriately.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1a8c43474bf36bfcf2a94bf9b7e756a2a99f33fd)

rgw: multi object delete should be idempotent

Fixes: #7346
When doing a multi object delete, if an object does not exist then we
should return a success code for that object.
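A minimal sketch of the idempotent behavior, assuming a dict-backed store and a hypothetical `remove_object()` call that returns 0 or -ENOENT in the usual errno style:

```python
import errno


def remove_object(store, key):
    """Hypothetical backend call: 0 on success, -ENOENT if absent."""
    if key not in store:
        return -errno.ENOENT
    del store[key]
    return 0


def multi_delete(store, keys):
    results = {}
    for key in keys:
        r = remove_object(store, key)
        if r == -errno.ENOENT:
            r = 0  # already gone: report success for this object
        results[key] = r
    return results
```

Repeating the same request therefore yields the same per-object results.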

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 8ca3d95bf633ea9616852cec74f02285a03071d5)

Conflicts:
src/rgw/rgw_op.cc

v0.67.7

Signed-off-by: Ken Dreyer <ken.dreyer@inktank.com>

radosgw-admin: fix object policy read op

Fixes: #7083
This was broken when we fixed #6940. We use the same function to both
read the bucket policy and the object policy. However, each needed to be
treated differently. Restore old behavior for objects.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit b1976dd00f5b29c01791272f63a18250319f2edb)

Merge pull request #1243 from dachary/wip-ceph-disk-dumpling

ceph-disk: unit tests (dumpling)

Reviewed-by: Sage Weil <sage@inktank.com>

add autotools-generated files to .gitignore

When running "make check", Automake generates test-suite.log, along with
various *.log and *.trs files in the tree. Add these files to
.gitignore.

(It looks like this feature arrived in Automake 1.13.)

Signed-off-by: Ken Dreyer <ken.dreyer@inktank.com>
(cherry picked from commit bb8b7503b03fac5830fb71b9723963fdc803ca90)

ceph-disk: unit tests

src/test/ceph-disk.sh replaces src/test/cli/ceph-disk/data-dir.t

Signed-off-by: Loic Dachary <loic@dachary.org>

ceph-disk: cannot run unit tests

The unit tests cannot run because ceph-disk relies on hardcoded paths. The
corresponding test will be added back when ceph-disk can run from sources.

Fixes: #7085
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 2ba6930d62263a39f150ab43bf8cd860b9245188)

Revert "librbd: remove limit on number of objects in the cache"

This reverts commit 367cf1bbf86233eb20ff2304e7d6caab77b84fcc.

Removing the limit on objects means we leak memory, since Objects without
any buffers can exist in the cache.

os/FileStore: fix ENOENT error code for getattrs()

In commit dc0dfb9e01d593afdd430ca776cf4da2c2240a20 the omap xattrs code
moved up a block and r was no longer local to the block. Translate
ENOENT -> 0 to compensate.

Fix the same error in _rmattrs().

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 6da4b91c07878e07f23eee563cf1d2422f348c2f)

test/filestore/run_seed_to.sh: avoid obsolete --filestore-xattr-use-omap

This option no longer exists.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1d4f501a015727a7ff4b2f9b20dc91f2bbd9707b)

release build 67.6

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>

Merge pull request #1232 from ceph/dumpling-7334

backport ceph-disk improvements to dumpling

http://pulpito.ceph.com/ubuntu-2014-02-12_16:52:33-ceph-deploy-dumpling-7334-testing-basic-plana/

common,os: Remove filestore_xattr_use_omap option

Now we operate just like when this was set to true

Fixes: #6143
Signed-off-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit dc0dfb9e01d593afdd430ca776cf4da2c2240a20)

add support for absence of PATH

Note that this commit actually splits out the changes from
Loic Dachary that touch ceph-disk only (ad515bf). That changeset
also touches other files, which causes conflicts that cannot be resolved
when backporting it to dumpling.

Signed-off-by: Alfredo Deza <alfredo@deza.pe>

ceph-disk: make initial journal files 0 bytes

The ceph-osd will resize journal files up and properly fallocate() them
so that the blocks are preallocated and (hopefully) contiguous. We
don't need to do it here too, and getting fallocate() to work from
python is a pain in the butt.

Fixes: #5981
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a786ad773cd33880075f1deb3691528d1afd03ec)

alert the user about error messages from partx

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit 9bcc42a3e6b08521694b5c0228b2c6ed7b3d312e)

use partx for red hat or centos instead of partprobe

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit 42900ff9da9f5adcac239a84ebf4d2e407c29699)

ceph-disk: run the right executables from udev

When run by the udev rules, PATH is not defined. Thus,
ceph-disk-activate relies on its which() function to locate the
correct executable. The which() function used os.defpath if none was
set, and this worked for anything using it.

ad6b4b4b08b6ef7ae8086f2be3a9ef521adaa88c added a new default value to
PATH, so only /usr/bin was checked by callers that did not use
which(). This resulted in the mount command not being found when
ceph-disk-activate was run by udev, and thus osds failing to start
after being prepared by ceph-deploy.

Make ceph-disk consistently use the existing helpers (command() and
command_check_call()) that use which(), so lack of PATH does not
matter. Simplify _check_output() to use command(),
another wrapper around subprocess.Popen.
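A sketch of the pattern described here, falling back to os.defpath when PATH is absent and resolving the executable before calling subprocess.Popen. These `which()` and `command()` functions are simplified illustrations, not ceph-disk's actual implementations:

```python
import os
import subprocess


def which(executable):
    """Find an executable even when PATH is unset (as under udev)."""
    path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, executable)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def command(arguments):
    """Resolve arguments[0] to an absolute path, then start it."""
    exe = which(arguments[0])
    if exe is None:
        raise RuntimeError("cannot find %s" % arguments[0])
    return subprocess.Popen([exe] + list(arguments[1:]),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```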

Fixes: #7258
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit d7b0c7faafd37e4ae8a1680edfa60c22b419cbd8)

ceph-disk: implement --sysconfdir as /etc/ceph

Replace hardcoded /etc/ceph with the SYSCONFDIR global variable and
implement the --sysconfdir option to override the default value.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit a71025d33621257b6fd6632516cfed2849ff1637)

ceph-disk: implement --statedir as /var/lib/ceph

Replace hardcoded /var/lib/ceph with the STATEDIR global variable and
implement the --statedir option to override the default value.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit ca713f48ae7a1fece2869f1a1c97d23ab33fb441)

ceph-disk: add copyright notice

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 51ee3c04777aaf6b9609dde9bc318b5c66c70787)

ceph-disk: create the data directory if it does not exist

Instead of failing if the OSD data directory does not exist, create
it. Only do so if the data directory is not enforced to be a device via
the use of the --data-dev flag. The directory is not recursively created.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 306b099ab093bfac466d68fe1cb87367bc01e577)

ceph-disk: run ceph-osd when --mark-init=none

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 0fcc081858fae4febbb6a613a93cfbbcedd5a320)

ceph-disk: implement --mark-init=none

It is meant to be used when preparing and activating a directory that is
not to be used with init. No file is created to identify the init
system, no symbolic link is made to the directory in /var/lib/ceph
and the init scripts are not called.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit e773b68f4c89ac56b425c710d7dcdc3d74a92926)

ceph-disk: fsid is a known configuration option

Use get_conf_with_default instead of get_conf because fsid is a known
ceph configuration option. This allows overriding it via CEPH_ARGS, which is
convenient for testing. Only options that are not found in config_opts.h
are fetched via get_conf.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit b65eb377f5e93ea85644e4c0939365fd7ac36072)

ceph-disk: use CalledProcessError.returncode

CalledProcessError has no errno data member

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 97f516a1ddfb2d014e1f7e762c4155e4b9bcb90b)

ceph-disk: display the command output on OSD creation failure

The string form of a CalledProcessError instance does not include the
output data member. Add it to the Error exception for debugging purposes.
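Combined with the returncode fix above, the error path looks roughly like this. `Error` and `run_and_report` are hypothetical stand-ins; the point is reading `returncode` (CalledProcessError has no `errno`) and appending `output`, which `str()` of the exception omits:

```python
import subprocess


class Error(Exception):
    """Hypothetical stand-in for ceph-disk's Error exception."""


def run_and_report(args):
    try:
        return subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        # Use e.returncode (there is no e.errno) and include e.output,
        # which str(e) leaves out.
        raise Error("command %s failed with status %d: %s"
                    % (args, e.returncode, e.output.decode(errors="replace")))
```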

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit d09af0fa50f322c9e59765f3badd497f5ca184d4)

ceph-disk: which() uses PATH first

Search PATH first instead of relying on a hardcoded set of paths. Although
this has the potential of changing the location of the binary being used by
ceph-disk on an existing installation, it is currently only used for sgdisk.
It could be disruptive for someone using a modified version of sgdisk, but
the odds of this happening are very low.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 2b935bbf60bafb6dd488c0eb30f156fce1b9d197)

ceph-disk: add --prepend-to-path to control execution

/usr/bin is hardcoded in front of some ceph programs which makes it
impossible to control where they are located via the PATH.

The hardcoded path cannot be removed altogether because it will most
likely lead to unexpected and difficult to diagnose problems for
existing installations where the PATH finds the program elsewhere.

The --prepend-to-path flag is added and defaults to /usr/bin; its value is
prepended to the PATH environment variable. The hardcoded path is removed
and the PATH is used instead: since /usr/bin is searched first, the
legacy behavior does not change.
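A sketch of the option's effect using argparse. The flag name and default come from the text above; treating an empty value as "leave PATH alone" is an assumption, and `setup_path` is a hypothetical helper:

```python
import argparse
import os


def setup_path(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--prepend-to-path", default="/usr/bin",
                        help="directory prepended to PATH (default /usr/bin)")
    args = parser.parse_args(argv)
    if args.prepend_to_path:
        current = os.environ.get("PATH", os.defpath)
        # /usr/bin ends up first, so legacy lookups are unchanged by default.
        os.environ["PATH"] = args.prepend_to_path + os.pathsep + current
    return os.environ.get("PATH", os.defpath)
```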

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit ad6b4b4b08b6ef7ae8086f2be3a9ef521adaa88c)

ceph-disk: make exception handling terse in main_activate_journal

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 908348b8047e8577ecf9133f2683f91423694416)

ceph-disk: do not hide main_activate() exceptions

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 55ca7bb2da73f1be1293710a635cfea42abd7682)

ceph-disk: fix activate() indent

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 324804a81c37ff89f2488e2ba106033c0e6e119e)

ceph-disk: remove noop try:

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit de0050596b5f56863c3486c1cd5e7ffea62e3d00)

ceph-disk: fix Error() messages formatting

Mainly using % instead of ,

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit b82ccfbfa786cd5436b48ec38276c5a48028ce1d)

ceph-disk: prepare --data-dir must not override files

ceph-disk does nothing when given a device that is already prepared. If
given a directory that already contains a successfully prepared OSD,
however, it overwrites it.

Instead of overwriting the files in the osd data directory, return
immediately if the magic file exists. Make sure the magic file is
created last so that it accurately reflects the success of the OSD
preparation.
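A minimal sketch of the guard and the write ordering. The marker file name and contents, and the `prepare_dir` function, are hypothetical placeholders rather than ceph-disk's real ones:

```python
import os

MAGIC = "magic"  # hypothetical marker file name


def prepare_dir(path, files):
    """Write OSD metadata files into path unless it is already prepared."""
    magic = os.path.join(path, MAGIC)
    if os.path.exists(magic):
        return False  # already prepared: leave existing files alone
    for name, content in files.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(content)
    # Write the marker last so it only exists after everything else succeeded.
    with open(magic, "w") as f:
        f.write("prepared\n")
    return True
```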

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 7dfe550ce18623cde4ae43a2416e31ef81381ab9)

ceph-disk: zap needs at least one device

If given no argument, ceph-disk zap should display the usage instead of
silently doing nothing. Silence can be confused with "I zapped all the
disks".
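One way to get that behavior with argparse is `nargs="+"`, which makes the parser print the usage and exit with status 2 when no device is given. The parser layout below is a hypothetical reduction, not ceph-disk's full command line:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="ceph-disk")
    sub = parser.add_subparsers(dest="command")
    zap = sub.add_parser("zap", help="zap a device's partition table")
    # nargs="+" requires at least one device; with none, argparse prints
    # the usage and exits instead of silently doing nothing.
    zap.add_argument("dev", nargs="+", metavar="DEV")
    return parser
```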

http://tracker.ceph.com/issues/6981 fixes #6981

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 07888ef3fd4440332c8287d0faa9f23a32cf141c)

use the new get_command helper in check_call

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit 897dfc113fe3b86f3dda53172933bfd4f8089869)

use the absolute path for executables if found

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit a9334a1c8c6681305e76b361377864d0dd1e3d34)

remove trailing semicolon

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit 43561f791607f5fd6f03d5421e1f30a29fb4194e)

replace sgdisk subprocess calls with a helper

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
(cherry picked from commit e19e38012bc4579054f63865e682c8c3a7829c7b)

Call --mbrtogpt on journal run of sgdisk should the drive require a GPT table.

Signed-off-by: Jonathan Davies <jonathan.davies@canonical.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 35011e0b01d65e4c001876882d597451f2028345)

ceph-disk: blacklist /dev/fd0

blkid -s TYPE /dev/fd0 has been verified to hang forever on a
H8DMR-82 supermicro motherboard running

3.8.0-33-generic #48~precise1-Ubuntu SMP Thu Oct 24 16:28:06 UTC 2013
x86_64

It is unlikely that ceph will ever be used on floppy disks, so they
can be blacklisted.

http://tracker.ceph.com/issues/6827 fixes: #6827

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 65701978715237ff5a4c68573c0696fd9d438e4f)

Make fsid comparison case-insensitive

get_fsid and find_cluster_by_uuid are modified so ceph-disk activate and
ceph-disk activate-all will work if the fsid uses uppercase characters.

Signed-off-by: Harry Harrington <git-harry@live.co.uk>
(cherry picked from commit 22f8325dbfce7ef2e97bf015c0f8bba53e75dfe9)

librbd: remove limit on number of objects in the cache

The number of objects is not a significant indicator of when data
should be written out for rbd. Use the highest possible value for
number of objects and just rely on the dirty data limits to trigger
flushing. When the number of objects is low, and many start being
flushed before they accumulate many requests, it hurts average request
size and performance for many concurrent sequential writes.

Fixes: #7385
Backport: emperor, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 0559d31db29ea83bdb6cec72b830d16b44e3cd35)

ObjectCacher: use uint64_t for target and max values

All the options are uint64_t, but the ObjectCacher was converting them
to int64_t. There's never any reason for these to be negative, so
change the type.

Adjust a few conditionals so that they only convert known-positive
signed values to uint64_t before comparing with the target and max
values. Leave the actual stats accounting as loff_t for now, since
bugs in accounting will have bad effects if negative values wrap
around.

Backport: emperor, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit db034acf546a72739ff6543241543f3bd651f3ae)

ObjectCacher: remove max_bytes and max_ob arguments to trim()

These are never passed, so replace them with the defaults.

Backport: emperor, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit bf8cf2d6d21a204a099347f3dcd5b48100b8c445)

Merge pull request #1210 from dachary/dumpling

common: admin socket fallback to json-pretty format (dumpling)

Reviewed-by: Sage Weil <sage@inktank.com>

common: admin socket fallback to json-pretty format

If the format argument to a command sent to the admin socket is not
among the supported formats (json, json-pretty, xml, xml-pretty), the
new_formatter function will return null and the AdminSocketHook::call
function must fall back to a sensible default.

The CephContextHook::call and HelpHook::call functions failed to do that,
and a malformed format argument would cause the mon to crash. A check is
added to each of them so that they fall back to json-pretty if the format
is not recognized.

To further protect AdminSocketHook::call implementations from similar
problems the format argument is checked immediately after accepting the
command in AdminSocket::do_accept and replaced with json-pretty if it is
not known.
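The two layers of protection can be sketched as follows: normalize the format once on accept, and still have each hook fall back when the formatter factory returns nothing. `new_formatter` here is a hypothetical Python stand-in that only supports the JSON formats; the XML variants are omitted for brevity and would also fall back in this sketch:

```python
import json

SUPPORTED = {"json", "json-pretty", "xml", "xml-pretty"}


def normalize_format(fmt):
    """Replace an unknown format with json-pretty when a command is accepted."""
    return fmt if fmt in SUPPORTED else "json-pretty"


def new_formatter(fmt):
    """Hypothetical formatter factory: returns None for unsupported formats."""
    if fmt == "json":
        return lambda obj: json.dumps(obj)
    if fmt == "json-pretty":
        return lambda obj: json.dumps(obj, indent=4)
    return None  # xml variants omitted in this sketch


def help_hook(fmt, commands):
    f = new_formatter(fmt)
    if f is None:
        f = new_formatter("json-pretty")  # fall back instead of crashing
    return f(commands)
```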

A test case is added for both CephContextHook::call and HelpHook::call
to demonstrate the problem exists and is fixed by the patch.

Three other instances of unsafe calls to new_formatter were found and
a fallback to json-pretty was added. All other calls have been audited
and appear to be safe.

http://tracker.ceph.com/issues/7378 fixes #7378

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 165e76d4d03ffcc490fd3c2ba60fb37372990d0a)

qa: add script for testing rados client timeout options

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 9e62beb80b6c92a97ec36c0db5ea39e417661b35)

rados: check return values for commands that can now fail

A few places were not checking the return values of commands, since
they could not fail before timeouts were added.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 79c1874346ff55e2dc74ef860db16ce70242fd00)

librados: check and return on error so timeouts work

Some functions could not previously return errors, but they had an
int return value, which can now receive ETIMEDOUT.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 8e9459e897b1bc2f97d52ee07701fd22069efcf3)

msg/Pipe: add option to restrict delay injection to specific msg type

This makes it possible to test timeouts reliably by delaying certain
messages effectively forever, but still being able to e.g. connect and
authenticate to the monitors.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit d389e617c1019e44848330bf9570138ac7b0e5d4)

MonClient: add a timeout on commands for librados

Just use the conf option directly, since librados is the only caller.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 671a76d64bc50e4f15f4c2804d99887e22dcdb69)

Objecter: implement mon and osd operation timeouts

This captures almost all operations from librados other than mon_commands().

Get the values for the timeouts from the Objecter constructor, so only
librados uses them.

Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.

Fixes: #6507
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 3e1f7bbb4217d322f4e0ece16e676cd30ee42a20)

Conflicts:
src/osd/OSD.cc
src/osd/ReplicatedPG.cc
src/osdc/Objecter.cc
src/osdc/Objecter.h

librados: add timeout to wait_for_osdmap()

This is used by several pool operations independent of the objecter,
including rados_ioctx_create() to look up the pool id in the first
osdmap.

Unfortunately we can't just rely on WaitInterval returning ETIMEDOUT,
since it may also get interrupted by a signal, so we can't avoid
keeping track of time explicitly here.
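A Python analogue of waiting with an explicit deadline. `Condition.wait()` can return before the timeout expires, so the remaining time is recomputed from a monotonic deadline on every loop iteration rather than trusting a single timed wait. `MapWaiter` is a hypothetical class, not RadosClient:

```python
import errno
import threading
import time


class MapWaiter:
    def __init__(self):
        self.cond = threading.Condition()
        self.have_map = False

    def set_map(self):
        with self.cond:
            self.have_map = True
            self.cond.notify_all()

    def wait_for_osdmap(self, timeout):
        """Return 0 once a map is available, or -ETIMEDOUT after timeout."""
        deadline = time.monotonic() + timeout
        with self.cond:
            while not self.have_map:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return -errno.ETIMEDOUT
                # wait() may return early, so loop and recompute the time left.
                self.cond.wait(remaining)
            return 0
```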

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 1829d2c9fd13f2cbae4e192c9feb553047dad42c)

Conflicts:
src/librados/RadosClient.cc

conf: add options for librados timeouts

These will be implemented in subsequent patches.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 0dcceff1378d85ca6d81d102d201890b8a71af6b)

test_striper: fix warning

Signed-off-by: Sage Weil <sage@inktank.com>

crushtool: add cli test for off-by-one tries vs retries bug

See bug #7370. This passes on dumpling and breaks prior to the #7370 fix.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ed32c4002fb5cb1dd546331651eaf7de1a017471)

client: use 64-bit value in sync read eof logic

The file size can jump to a value that is very much larger than our current
position (for example, it could be a disk image file that gets a sparse
write at a large offset). Use a 64-bit value so that 'some' doesn't
overflow.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: John Spray <john.spray@inktank.com>
(cherry picked from commit 7ff2b541c24d1c81c3bcfbcb347694c2097993d7)

osd: do not send peering messages during init

Do not send any peering messages while we are still working our way
through init().

Fixes: #7093
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 35da8f9d80e0c6c33fb6c6e00f0bf38f1eb87d0e)
Signed-off-by: Greg Farnum <greg@inktank.com>

OSDMap: fix deepish_copy_from

Start with a shallow copy!

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d0f13f54146694a197535795da15b8832ef4b56f)

Conflicts:

src/osd/OSDMap.h

rgw: fix listing of multipart upload parts

Fixes: #7169
There are two issues here. The first is that we may return more entries
than we should (as specified by max_parts). The second is that the
NextPartNumberMarker is set incorrectly. Both of these issues mainly
affect uploads with more than 1000 parts, although they can be triggered
with fewer parts than that.
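A sketch of paging that respects both points, assuming S3 ListParts semantics in which the request's marker means "list parts after this number" and NextPartNumberMarker is the last part number returned. `list_parts` is a hypothetical function, not the radosgw code:

```python
def list_parts(part_numbers, marker, max_parts):
    """Return (parts, next_marker, truncated) for a ListParts-style request."""
    remaining = sorted(p for p in part_numbers if p > marker)
    page = remaining[:max_parts]  # never return more than max_parts
    truncated = len(remaining) > max_parts
    # The next marker is the last part number actually returned.
    next_marker = page[-1] if page else marker
    return page, next_marker, truncated
```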

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

rgw: initialize RGWUserAdminOpState::system_specified

Fixes: #6829
Backport: dumpling, emperor
We didn't initialize this member variable, so when modifying user info
that has this flag set, the 'system' flag might be inadvertently
reset.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 561e7b0b287e65e90b80699e45a52ae44e94684f)