ceph.git
11 years agomds: fixes for coverity scan 973/head
Yan, Zheng [Wed, 18 Dec 2013 01:33:13 +0000 (09:33 +0800)]
mds: fixes for coverity scan

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #960 from ceph/wip-6990
David Zafman [Tue, 17 Dec 2013 19:46:48 +0000 (11:46 -0800)]
Merge pull request #960 from ceph/wip-6990

Add backward compatible acting set until all OSDs updated

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoAdd backward compatible acting set until all OSDs updated 960/head
David Zafman [Tue, 17 Dec 2013 06:08:07 +0000 (22:08 -0800)]
Add backward compatible acting set until all OSDs updated

Add configuration variable to override compatible acting set handling.
Later we'll check in the osdmap that all OSDs have been updated to use new acting sets.

Fixes: #6990
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agoMerge pull request #953 from dachary/wip-qa-suite
Sage Weil [Tue, 17 Dec 2013 18:46:49 +0000 (10:46 -0800)]
Merge pull request #953 from dachary/wip-qa-suite

use qa/workunits/cephtool/test.sh as a unittest

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomds: drop unused find_ino_dir
Alexandre Oliva [Tue, 17 Dec 2013 11:00:00 +0000 (09:00 -0200)]
mds: drop unused find_ino_dir

Remove all traces of find_ino_dir, it is no longer used.

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
11 years agoFix typo in #undef in ceph-dencoder
Alexandre Oliva [Tue, 17 Dec 2013 10:55:27 +0000 (08:55 -0200)]
Fix typo in #undef in ceph-dencoder

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoqa: add ../qa/workunits/cephtool/test.sh to unittests 953/head
Loic Dachary [Tue, 17 Dec 2013 16:53:02 +0000 (17:53 +0100)]
qa: add ../qa/workunits/cephtool/test.sh to unittests

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #957 from ceph/wip-rbd-coverity
Sage Weil [Tue, 17 Dec 2013 16:51:32 +0000 (08:51 -0800)]
Merge pull request #957 from ceph/wip-rbd-coverity

rbd: make coverity happy

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoqa: vstart wrapper helper for unittests
Loic Dachary [Mon, 16 Dec 2013 16:13:27 +0000 (17:13 +0100)]
qa: vstart wrapper helper for unittests

Primarily useful to run scripts from qa/workunits as part of make check.

vstart_wrapper.sh starts a vstart.sh cluster, runs the command given as
an argument, and tears down the cluster when it completes.

The vstart_wrapped_tests.sh script contains the list of scripts that
need vstart_wrapper.sh to run. It would not be necessary if automake
allowed passing arguments to test scripts. It also adds markers to the
output to make searching easier, because the output can be very verbose.

This wrapper is kept simple and will probably evolve into something more
sophisticated depending on the scripts being added to
vstart_wrapped_tests.sh. There are numerous options, ranging from
parsing the yaml from ceph-qa-suite to figure out the cluster
configuration, to converting the same yaml into a puppet manifest that is
applied locally, or even driving OpenStack instances to avoid messing
with the local machine. But this would probably be overkill at this
point.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agorbd: make coverity happy 957/head
Ilya Dryomov [Tue, 17 Dec 2013 15:42:30 +0000 (17:42 +0200)]
rbd: make coverity happy

A recent coverity run found two "defects" in rbd.cc:

** CID 1138367:  Time of check time of use  (TOCTOU)
/rbd.cc: 2024 in do_kernel_rm(const char *)()

2019   const char *fname = "/sys/bus/rbd/remove_single_major";
2020   if (stat(fname, &sbuf)) {
2021     fname = "/sys/bus/rbd/remove";
2022   }
2023
2024   int fd = open(fname, O_WRONLY);
2025   if (fd < 0) {

** CID 1138368:  Time of check time of use  (TOCTOU)
/rbd.cc: 1735 in do_kernel_add(const char *, const char *, const char *)()

same as above, s/remove/add

There is nothing racy going on here, and this is not an instance of
TOCTOU, but instead of silencing coverity with annotations, redo
this with two open() calls.
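
The two-open() approach mentioned above might look roughly like the sketch below. It is an illustration only, not the code from rbd.cc: the helper name open_first_available and its "-errno on failure" return convention are assumptions made for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: try `preferred` first and fall back to `fallback`
// only when the preferred path does not exist. Returns an open file
// descriptor, or -errno if neither open() succeeds.
int open_first_available(const char *preferred, const char *fallback) {
  assert(preferred && fallback);
  int fd = open(preferred, O_WRONLY);
  if (fd < 0 && errno == ENOENT)
    fd = open(fallback, O_WRONLY);
  return fd < 0 ? -errno : fd;
}
```

With this shape there is no separate stat() whose answer could go stale before open() runs, which is the pattern the checker reports.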

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agovstart/stop: use pkill instead of killall
Loic Dachary [Mon, 16 Dec 2013 15:27:34 +0000 (16:27 +0100)]
vstart/stop: use pkill instead of killall

killall fails to kill all OSDs when called as a one-liner. Replace it with a
loop using pkill that retries until there are no more processes to kill by
the required name.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoqa: recursively remove .gcno and .gcda
Loic Dachary [Mon, 16 Dec 2013 13:36:26 +0000 (14:36 +0100)]
qa: recursively remove .gcno and .gcda

Instead of removing them only in the current directory. Leftovers
prevent running make check-coverage properly because lcov fails
when stumbling on old .gcno files with

lcov -d . -c -i -o check-coverage_base_full.lcov
Processing os/BtrfsFileStoreBackend.gcno
geninfo: ERROR: ceph/src/os/BtrfsFileStoreBackend.gcno: reached
         unexpected end of file

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoceph_test_rados_api_tier: fix HitSetTrim vs split, too
Sage Weil [Tue, 17 Dec 2013 01:09:13 +0000 (17:09 -0800)]
ceph_test_rados_api_tier: fix HitSetTrim vs split, too

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #904 from ceph/wip-mds-cluster2
Sage Weil [Tue, 17 Dec 2013 01:03:27 +0000 (17:03 -0800)]
Merge pull request #904 from ceph/wip-mds-cluster2

Wip mds cluster2

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: fix HitSetRead test race with split
Sage Weil [Tue, 17 Dec 2013 00:52:35 +0000 (16:52 -0800)]
ceph_test_rados_api_tier: fix HitSetRead test race with split

Recalculate the hash on each iteration in case we are racing with split.

Fixes: #7013
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #954 from ceph/wip-7009
Sage Weil [Tue, 17 Dec 2013 00:31:39 +0000 (16:31 -0800)]
Merge pull request #954 from ceph/wip-7009

mon: move supported_commands fields, methods into Monitor, and fix leak

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agomon: move supported_commands fields, methods into Monitor, and fix leak 954/head
Sage Weil [Tue, 17 Dec 2013 00:09:44 +0000 (16:09 -0800)]
mon: move supported_commands fields, methods into Monitor, and fix leak

We were leaking the static leader_supported_mon_commands.  Move this into
the class so that we can clean up in the destructor.

Rename get_command_descriptions -> format_command_descriptions.

Fixes: #7009
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #951 from ceph/wip-linux-version
Sage Weil [Mon, 16 Dec 2013 17:27:43 +0000 (09:27 -0800)]
Merge pull request #951 from ceph/wip-linux-version

common: introduce get_linux_version()

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoFileJournal: use pclose() to close a popen() stream 951/head
Ilya Dryomov [Mon, 16 Dec 2013 16:57:22 +0000 (18:57 +0200)]
FileJournal: use pclose() to close a popen() stream

In FileJournal::_check_disk_write_cache(), use pclose() instead of
fclose() to close a stream, created by popen().
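
As a general illustration of the popen()/pclose() pairing (not the FileJournal code itself), a minimal sketch could look like this. The function name read_command_output is hypothetical.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>

// Hypothetical helper: run `cmd` through the shell, collect its stdout,
// and close the stream with pclose(), which also waits for the child.
// *exit_status receives the value returned by pclose(), or -1 if
// popen() itself failed.
std::string read_command_output(const char *cmd, int *exit_status) {
  assert(cmd && exit_status);
  std::string out;
  FILE *fp = popen(cmd, "r");
  if (!fp) {
    *exit_status = -1;
    return out;
  }
  char buf[256];
  while (fgets(buf, sizeof(buf), fp))
    out += buf;
  *exit_status = pclose(fp);  // not fclose(): the stream came from popen()
  return out;
}
```

Using fclose() on such a stream is undefined by POSIX and does not reap the child process, which is why pclose() is the matching call.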

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoFileJournal: switch to get_linux_version()
Ilya Dryomov [Mon, 16 Dec 2013 16:57:22 +0000 (18:57 +0200)]
FileJournal: switch to get_linux_version()

For the purposes of FileJournal::_check_disk_write_cache(), use
get_linux_version(), which is based on uname(2), instead of parsing the
contents of /proc/version.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agocommon: introduce get_linux_version()
Ilya Dryomov [Mon, 16 Dec 2013 16:57:21 +0000 (18:57 +0200)]
common: introduce get_linux_version()

get_linux_version() returns the version of the currently running kernel,
encoded as an int, and is contained in common/linux_version.[ch].
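
A sketch of how a uname(2)-based version lookup could be written is shown below. The function names and the (major << 16) + (minor << 8) + patch encoding are assumptions borrowed from the kernel's KERNEL_VERSION convention; the commit message only states that the result is encoded as an int.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/utsname.h>

// Assumed encoding, in the style of the kernel's KERNEL_VERSION macro.
constexpr int encode_version(int a, int b, int c) {
  return (a << 16) + (b << 8) + c;
}

// Parse a release string such as "3.10.0-957.el7". Returns 0 if fewer
// than two numeric components can be read.
int parse_release(const char *release) {
  assert(release);
  int a = 0, b = 0, c = 0;
  if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
    return 0;
  return encode_version(a, b, c);
}

// Hypothetical stand-in for a uname(2)-based lookup; 0 means "unknown".
int current_linux_version() {
  struct utsname ub;
  if (uname(&ub) < 0)
    return 0;
  return parse_release(ub.release);
}
```

Callers can then compare against a threshold, for example current_linux_version() >= encode_version(2, 6, 33), without any string parsing at the call site.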

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoconfigure: break up AC_CHECK_HEADERS into one header-file per line
Ilya Dryomov [Mon, 16 Dec 2013 16:57:21 +0000 (18:57 +0200)]
configure: break up AC_CHECK_HEADERS into one header-file per line

Break up AC_CHECK_HEADERS macro into one header-file per line so it's
easier to read and make changes.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agomds: fix stale session handling for multiple mds 904/head
Yan, Zheng [Sun, 15 Dec 2013 02:11:16 +0000 (10:11 +0800)]
mds: fix stale session handling for multiple mds

Don't add new caps to stale session when importing inodes. Don't
touch session when importing caps because it confuses the stale
session detection.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: properly set dirty flag when journalling import
Yan, Zheng [Thu, 12 Dec 2013 04:00:59 +0000 (12:00 +0800)]
mds: properly set dirty flag when journalling import

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: properly update mdsdir's authority during recovery
Yan, Zheng [Thu, 12 Dec 2013 02:37:27 +0000 (10:37 +0800)]
mds: properly update mdsdir's authority during recovery

dirfrag of mdsdir doesn't inherit its parent inode's authority.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: finish opening sessions even if import aborted
Yan, Zheng [Thu, 12 Dec 2013 02:18:10 +0000 (10:18 +0800)]
mds: finish opening sessions even if import aborted

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: fix discover path race
Yan, Zheng [Sat, 7 Dec 2013 23:33:19 +0000 (07:33 +0800)]
mds: fix discover path race

When C_MDC_RetryDiscoverPath is executed, we may have already become
the auth mds of base.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #947 from dachary/wip-6824
Sage Weil [Mon, 16 Dec 2013 05:16:48 +0000 (21:16 -0800)]
Merge pull request #947 from dachary/wip-6824

mon: set ceph osd (down|out|in|rm) error code on failure

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomds: fix bug in MDCache::open_ino_finish
Yan, Zheng [Sun, 8 Dec 2013 00:01:54 +0000 (08:01 +0800)]
mds: fix bug in MDCache::open_ino_finish

It's wrong to erase open_ino_info_t after finishing contexts, because
MDCache::open_ino() can be called again when finishing contexts.
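
The ordering issue can be illustrated with a small, self-contained sketch. The OpenTable type below is hypothetical and far simpler than MDCache; it only shows why the bookkeeping entry should be erased before the waiters run when a waiter may re-enter open().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical table of pending opens, keyed by inode number.
struct OpenTable {
  using Callback = std::function<void(int)>;
  std::map<int, std::vector<Callback>> pending;

  void open(int ino, Callback cb) {
    assert(cb);  // an empty callback would throw when invoked
    pending[ino].push_back(std::move(cb));
  }

  void finish(int ino, int result) {
    auto it = pending.find(ino);
    if (it == pending.end())
      return;
    // Take the waiters and erase the entry first: a waiter may call
    // open(ino, ...) again, and that new entry must survive.
    std::vector<Callback> waiters = std::move(it->second);
    pending.erase(it);
    for (auto &cb : waiters)
      cb(result);
  }
};
```

If the erase happened after the loop instead, it would also discard whatever a re-entrant open() had just registered.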

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: add CEPH_FEATURE_EXPORT_PEER and bump the protocol version
Yan, Zheng [Fri, 6 Dec 2013 08:33:39 +0000 (16:33 +0800)]
mds: add CEPH_FEATURE_EXPORT_PEER and bump the protocol version

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoclient: handle session flush message
Yan, Zheng [Fri, 6 Dec 2013 02:24:34 +0000 (10:24 +0800)]
client: handle session flush message

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: simplify how to export non-auth caps
Yan, Zheng [Tue, 26 Nov 2013 10:32:18 +0000 (18:32 +0800)]
mds: simplify how to export non-auth caps

Introduce a new flag in the cap import message. If the client finds the flag
set, it releases the exporter's caps (by sending a release to the exporter).
This saves the cap export message and a "mds to mds" message.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: send cap import messages to clients after importing subtree succeeds
Yan, Zheng [Tue, 26 Nov 2013 09:19:04 +0000 (17:19 +0800)]
mds: send cap import messages to clients after importing subtree succeeds

When importing a subtree, the importer sends cap import messages to clients
before the import subtree operation is considered successful. If the
exporter crashes before the EExport event is journalled, the importer needs to
re-export client caps. This confuses clients, and makes them lose track of
auth caps.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: re-send cap exports in resolve message.
Yan, Zheng [Tue, 26 Nov 2013 07:10:29 +0000 (15:10 +0800)]
mds: re-send cap exports in resolve message.

For a rename operation that changes an inode's authority, if the master
mds of the operation crashed, the inode's original auth mds sends export
messages to clients when it receives the master mds' resolve ack
message. The client can't rely on the export message to add caps for
the master mds and then reconnect the caps when the master mds enters
the reconnect stage, because the client may receive the export message
after receiving the mdsmap that claims the master mds is in the
reconnect stage.

The fix is to include cap exports in the resolve message, so the master
mds can send import messages to clients when it enters the rejoin stage.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: include counterpart's information in cap import/export messages
Yan, Zheng [Tue, 26 Nov 2013 03:02:49 +0000 (11:02 +0800)]
mds: include counterpart's information in cap import/export messages

When exporting inodes with client caps, the importer sends cap import
messages to clients and the exporter sends cap export messages to clients.
A client can receive these two messages in any order. If a client first
receives the cap import message, it adds the imported caps, but the caps
from the exporter are still considered valid. This can compromise
consistency. If the MDS crashes while importing caps, clients may only
receive cap export messages and never receive cap import messages.
These clients don't know which MDS is the cap importer, so they can't
send cap reconnect when the MDS recovers.

We can handle the above issues by including the counterpart's information
in cap import/export messages. If a client first receives the cap import
message, it adds the imported caps, then removes the exporter's
caps. If a client first receives the cap export message, it removes the
exported caps, then adds caps for the importer.
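
The order-independence described above can be modelled with a toy sketch. ClientCaps is a hypothetical type that tracks only which MDS ranks hold caps for a single inode; real cap handling also involves sequence numbers and cap bits that are omitted here.

```cpp
#include <cassert>
#include <set>

// Hypothetical client-side view: the set of MDS ranks that currently
// hold caps for one inode.
struct ClientCaps {
  std::set<int> holders;

  // Import message from `importer`, naming its counterpart `exporter`.
  void handle_import(int importer, int exporter) {
    assert(importer != exporter);
    holders.insert(importer);
    holders.erase(exporter);
  }

  // Export message from `exporter`, naming its counterpart `importer`.
  void handle_export(int exporter, int importer) {
    assert(importer != exporter);
    holders.erase(exporter);
    holders.insert(importer);
  }
};
```

Because each message names both sides, either arrival order leaves the client with caps held only by the importer, and a client that sees only the export message still learns who the importer is.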

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: send info of imported caps back to the exporter (rename)
Yan, Zheng [Tue, 26 Nov 2013 02:31:07 +0000 (10:31 +0800)]
mds: send info of imported caps back to the exporter (rename)

Use the MMDSSlaveRequest::OP_FINISH slave request to send information
about rename-imported caps back to the exporter. This is preparation
for including the counterpart's information in cap import/export messages.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: send info of imported caps back to the exporter (cache rejoin)
Yan, Zheng [Tue, 26 Nov 2013 02:17:30 +0000 (10:17 +0800)]
mds: send info of imported caps back to the exporter (cache rejoin)

Use cache rejoin ack message to send information of rejoin imported
caps back to the exporter. Also move the code that exports reconnect
caps to MDCache::handle_cache_rejoin_ack()

This is preparation for including counterpart's information in cap
import/export message.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: send info of imported caps back to the exporter (export dir)
Yan, Zheng [Tue, 26 Nov 2013 01:49:21 +0000 (09:49 +0800)]
mds: send info of imported caps back to the exporter (export dir)

Introduce a new class Capability::Import and use it to send information
of imported caps back to the exporter. This is preparation for including
counterpart's information in cap import/export message.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: flush session messages before exporting caps
Yan, Zheng [Fri, 25 Oct 2013 08:30:49 +0000 (16:30 +0800)]
mds: flush session messages before exporting caps

The following sequence of events can happen when exporting inodes:

- client sends open file request to mds.0
- mds.0 handles the request and sends inode stat back to the client
- mds.0 exports the inode to mds.1
- mds.1 sends cap import message to the client
- mds.0 sends cap export message to the client
- client receives the cap import message from mds.1, but the client
  still doesn't have corresponding inode in the cache. So the client
  releases the imported caps.
- client receives the open file reply from mds.0
- client receives the cap export message from mds.0.

After the end of these events, the client doesn't have any cap for
the opened file.

To fix the message ordering issue, this patch introduces a new session
operation FLUSHMSG. Before exporting caps, we send a FLUSHMSG session
message to the client and wait for the acknowledgment. When receiving the
FLUSHMSG_ACK message from the client, we are sure that the client has
received all messages sent previously.
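
The reasoning behind FLUSHMSG relies on the session delivering messages in order. A toy model of that guarantee is sketched below; the Session type and its string-valued messages are hypothetical and stand in for the real messenger.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical ordered mds -> client session.
struct Session {
  std::deque<std::string> in_flight;   // sent but not yet delivered
  std::vector<std::string> delivered;  // processed by the client

  void send(const std::string &m) {
    assert(!m.empty());
    in_flight.push_back(m);
  }

  // The client processes messages in order. Returns true when it reaches
  // FLUSHMSG, which is the point where it would reply with FLUSHMSG_ACK.
  bool deliver_until_flush() {
    while (!in_flight.empty()) {
      std::string m = in_flight.front();
      in_flight.pop_front();
      if (m == "FLUSHMSG")
        return true;
      delivered.push_back(m);
    }
    return false;
  }
};
```

Since the queue is first-in first-out, an acknowledgment of FLUSHMSG implies that every message queued before it (the open reply in the scenario above) has already been processed.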

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: increase cap sequence when sharing max size
Yan, Zheng [Mon, 18 Nov 2013 09:59:06 +0000 (17:59 +0800)]
mds: increase cap sequence when sharing max size

For case:
 - client voluntarily releases some caps through cap update message
 - mds shares the new max by sending cap grant message
 - mds receives the cap update message

If the mds doesn't increase the cap sequence when sharing the max size,
it can't determine whether the cap update message was sent before or after
the client received the cap grant message that updates the max size.
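
A minimal sketch of the idea, under the assumption that each cap update echoes the last sequence number the client had seen, is given below. CapIssuer and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mds-side record for one client cap.
struct CapIssuer {
  uint32_t seq = 0;  // sequence number of the last cap message sent

  // Sharing a new max size bumps the sequence, so the grant can be told
  // apart from whatever the client had seen before.
  uint32_t send_max_size_grant() { return ++seq; }

  // `seq_in_update` is the last sequence the client had seen when it
  // sent its cap update (assumed to be echoed in the message).
  bool update_sent_after_grant(uint32_t seq_in_update) const {
    assert(seq_in_update <= seq);  // the client cannot be ahead of the mds
    return seq_in_update == seq;
  }
};
```

Without the bump, an update sent before the grant and one sent after it would carry the same sequence number and be indistinguishable.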

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: include inode version in auth mds' lock messages
Yan, Zheng [Mon, 18 Nov 2013 03:06:43 +0000 (11:06 +0800)]
mds: include inode version in auth mds' lock messages

Encode the inode version in the auth mds' lock messages, so that the
version of replica inodes gets updated. This is important because the
client uses the inode version in the mds reply to check whether the cached
inode is already up-to-date. It skips updating the inode if it thinks the
inode is already up-to-date.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: avoid allocating MDRequest::More when cleanup request
Yan, Zheng [Sun, 17 Nov 2013 10:32:23 +0000 (18:32 +0800)]
mds: avoid allocating MDRequest::More when cleanup request

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: waiting for slave request to finish
Yan, Zheng [Sun, 17 Nov 2013 09:03:29 +0000 (17:03 +0800)]
mds: waiting for slave request to finish

If the MDS receives a client request but finds there is an existing
slave request, it's possible that another MDS forwarded the request
to us, but the MMDSSlaveRequest::OP_FINISH message arrives after
the client request.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: check lock state before eval_gather
Yan, Zheng [Sat, 16 Nov 2013 00:37:58 +0000 (08:37 +0800)]
mds: check lock state before eval_gather

Locker::eval_gather() can dispatch requests, which may change other
locks' states.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: don't request CEPH_CAP_PIN from auth mds
Yan, Zheng [Fri, 15 Nov 2013 02:21:49 +0000 (10:21 +0800)]
mds: don't request CEPH_CAP_PIN from auth mds

avoid triggering assert(in->get_loner() >= 0 && in->mds_caps_wanted.empty())
in Locker::file_xsyn()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: fix sending resolve message
Yan, Zheng [Fri, 15 Nov 2013 01:30:39 +0000 (09:30 +0800)]
mds: fix sending resolve message

We need to send the resolve message when the mds is in the reconnect state.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: keep dentry lock in sync state
Yan, Zheng [Tue, 12 Nov 2013 08:12:25 +0000 (16:12 +0800)]
mds: keep dentry lock in sync state

Unlike locks of other types, a dentry lock in an unreadable state can
block path traversal, so it should be in the sync state as much as
possible.

This patch makes Locker::try_eval() change the dentry lock's state to
sync even when the dentry is freezing. It also makes the migrator check
imported dentries' lock states and change the locks' states to sync if
necessary.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: avoid leaving bare-bone dirfrags in the cache
Yan, Zheng [Tue, 12 Nov 2013 07:38:21 +0000 (15:38 +0800)]
mds: avoid leaving bare-bone dirfrags in the cache

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: re-issue caps after importing inode
Yan, Zheng [Sat, 12 Oct 2013 01:18:20 +0000 (09:18 +0800)]
mds: re-issue caps after importing inode

After importing inode, the issued caps can be less than the caps
client wants. So always re-issue caps after importing inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: avoid issuing caps when inode is frozen
Yan, Zheng [Sat, 9 Nov 2013 03:07:27 +0000 (11:07 +0800)]
mds: avoid issuing caps when inode is frozen

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: fix rename notify
Yan, Zheng [Fri, 8 Nov 2013 10:42:33 +0000 (18:42 +0800)]
mds: fix rename notify

commit 1d86f77edf (mds: fix cross-authorty rename race) introduced
rename notify, but it put the code inside the wrong bracket.

This patch also fixes a rename notify related bug in
MDCache::handle_mds_failure()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: re-send discover if want_xlocked becomes true
Yan, Zheng [Fri, 8 Nov 2013 09:45:12 +0000 (17:45 +0800)]
mds: re-send discover if want_xlocked becomes true

If want_xlocked becomes true, we can not rely on previously sent discover
because it's likely the previous discover is blocked on the xlocked dentry.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: fix empty directory check
Yan, Zheng [Thu, 7 Nov 2013 09:07:51 +0000 (17:07 +0800)]
mds: fix empty directory check

Since commit 310032ee81 (fix mds scatter_writebehind starvation), rdlocking
a scatter lock does not always propagate dirty fragstats to the corresponding
inode. So Server::_dir_is_nonempty() needs to check each dirfrag's stat
instead of checking the inode's dirstat.
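
The per-dirfrag check can be pictured with the following sketch. FragStat and dir_is_nonempty are hypothetical simplifications; the real check in Server::_dir_is_nonempty() works on the MDS's own dirfrag structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-dirfrag statistics.
struct FragStat {
  long nfiles = 0;
  long nsubdirs = 0;
};

// A directory is non-empty if any of its fragments reports entries.
// The aggregated inode-level count is deliberately not consulted,
// because it may lag behind the fragments.
bool dir_is_nonempty(const std::vector<FragStat> &frags) {
  for (const FragStat &f : frags) {
    assert(f.nfiles >= 0 && f.nsubdirs >= 0);
    if (f.nfiles + f.nsubdirs > 0)
      return true;
  }
  return false;
}
```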

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: merge delayed cache expire
Yan, Zheng [Wed, 6 Nov 2013 02:58:00 +0000 (10:58 +0800)]
mds: merge delayed cache expire

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: process delayed expire if exporting dir cancelled in warning state
Yan, Zheng [Wed, 6 Nov 2013 02:31:06 +0000 (10:31 +0800)]
mds: process delayed expire if exporting dir cancelled in warning state

We may add a delayed expire when the exporting dir is in the warning state.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: handle cache rejoin corner case
Yan, Zheng [Wed, 6 Nov 2013 01:42:43 +0000 (09:42 +0800)]
mds: handle cache rejoin corner case

A recovering MDS may receive a strong cache rejoin from a survivor;
then the survivor restarts and the recovering MDS receives a weak cache
rejoin from the same MDS. Before processing the weak cache rejoin,
we should scour replicas added by the obsolete strong cache rejoin.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: unify nonce type
Yan, Zheng [Wed, 6 Nov 2013 01:28:51 +0000 (09:28 +0800)]
mds: unify nonce type

MDSCacheObject::replica_nonce is defined as __s16, but nonce type
in MDSCacheObject::replica_map is int. This mismatch may confuse
MDCache::handle_cache_expire().

This patch unifies the nonce type as uint32.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: rework stale import/export message detection
Yan, Zheng [Thu, 24 Oct 2013 09:10:59 +0000 (17:10 +0800)]
mds: rework stale import/export message detection

The current code uses the import state to detect obsolete import/export
messages. It does not work for this case: cancel a subtree export, export
the same subtree again, and then the messages for the first export get
dispatched.

This patch introduces a "transaction ID" for subtree exports. Each subtree
export has a unique TID, which is recorded in all import/export related
messages. By comparing the TID, we can reliably detect stale messages.
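
The TID comparison can be sketched as follows. ExportTracker and its members are hypothetical names used only to show the mechanism: every export attempt gets a fresh ID, and a message is stale when its ID does not match the attempt currently in progress.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical tracker for in-progress subtree exports.
struct ExportTracker {
  uint64_t last_tid = 0;
  std::map<int, uint64_t> active;  // subtree id -> TID of current export

  uint64_t start_export(int subtree) {
    assert(active.count(subtree) == 0);  // one export per subtree at a time
    return active[subtree] = ++last_tid;
  }

  void cancel_export(int subtree) { active.erase(subtree); }

  // A message is stale if no export is in progress for the subtree or
  // if it carries the TID of an earlier attempt.
  bool is_stale(int subtree, uint64_t msg_tid) const {
    auto it = active.find(subtree);
    return it == active.end() || it->second != msg_tid;
  }
};
```

This covers the cancel-then-retry case from the commit message: the retry uses a new TID, so late messages from the first attempt no longer match.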

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: put import/export related states together
Yan, Zheng [Thu, 24 Oct 2013 08:05:56 +0000 (16:05 +0800)]
mds: put import/export related states together

The current code uses several STL maps to record import/export related
states. A map lookup is required for each state access, which is not
efficient. It's better to put import/export related states together.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: freeze tree deadlock detection.
Yan, Zheng [Wed, 23 Oct 2013 01:15:58 +0000 (09:15 +0800)]
mds: freeze tree deadlock detection.

There are two situations that result in a freeze tree deadlock.

 - mds.0 authpins an item in subtree A
 - mds.0 sends request to mds.1 to authpin an item in subtree B
 - mds.0 freezes subtree A
 - mds.1 authpins an item in subtree B
 - mds.1 sends request to mds.0 to authpin an item in subtree A
 - mds.1 freezes subtree B
 - mds.1 receives the remote authpin request from mds.0
   (wait because subtree B is freezing)
 - mds.0 receives the remote authpin request from mds.1
   (wait because subtree A is freezing)

 - client request authpins items in subtree B
 - freeze subtree B
 - import subtree A which is parent of subtree B
   (authpins parent inode of subtree B, see CDir::set_dir_auth())
 - freeze subtree A
 - client request tries authpinning items in subtree A
   (wait because subtree A is freezing)

Enforcing an authpinning order can avoid the deadlock, but it's very
expensive. The deadlock is rare, so I think deadlock detection is
more suitable for the case.

This patch introduces freeze tree deadlock detection. We record the
start time of freezing the tree. If we fail to freeze the tree within a
given duration, we cancel the process of freezing the tree.
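
The timeout-based detection can be sketched like this. FreezeAttempt, should_cancel_freeze and the use of std::chrono are assumptions for illustration; the commit message does not say how the duration is configured or measured.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one freeze attempt.
struct FreezeAttempt {
  Clock::time_point started;  // when freezing began
  bool frozen = false;        // set once the tree is fully frozen
};

// Treat a freeze that has not completed within `timeout` as a suspected
// deadlock that should be cancelled.
bool should_cancel_freeze(const FreezeAttempt &f, Clock::time_point now,
                          std::chrono::seconds timeout) {
  assert(now >= f.started);
  return !f.frozen && (now - f.started) >= timeout;
}
```

The trade-off matches the commit message: no ordering is enforced on authpins, and the rare deadlock is broken after the fact by giving up on the freeze.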

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge remote-tracking branch 'gh/wip-hitset'
Sage Weil [Mon, 16 Dec 2013 00:57:23 +0000 (16:57 -0800)]
Merge remote-tracking branch 'gh/wip-hitset'

Reviewed-by: Greg Farnum <greg@inktank.com>
Conflicts:
src/common/config_opts.h
src/osd/ReplicatedPG.cc
src/osdc/Objecter.cc
src/vstart.sh

11 years agoRevert "common/Formatter: add newline to flushed output if m_pretty"
Sage Weil [Mon, 16 Dec 2013 00:23:09 +0000 (16:23 -0800)]
Revert "common/Formatter: add newline to flushed output if m_pretty"

This reverts commit d6146b0d915f1420b5e76f7037f656460c314461.

As Yehuda points out, this does not properly handle cases where we flush
the same output stream multiple times.

11 years agoRevert "common: fix perf_counters unittests for trailing newline in m_pretty"
Sage Weil [Mon, 16 Dec 2013 00:22:59 +0000 (16:22 -0800)]
Revert "common: fix perf_counters unittests for trailing newline in m_pretty"

This reverts commit ba5572397c0e48378b0a0e556db1b2c02756617e.

11 years agoqa: test for error when ceph osd rm is EBUSY 947/head
Loic Dachary [Sun, 15 Dec 2013 21:59:51 +0000 (22:59 +0100)]
qa: test for error when ceph osd rm is EBUSY

http://tracker.ceph.com/issues/6824 fixes #6824

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoqa: make cephtool test immune to pool size
Loic Dachary [Sun, 15 Dec 2013 20:41:45 +0000 (21:41 +0100)]
qa: make cephtool test immune to pool size

Instead of assuming the pool size is 2, query it and increment it to
test pool set data size. This allows running the test from vstart.sh
without knowing in advance what the required pool size is:

    rm -fr dev out ;  mkdir -p dev ; \
     MON=1 OSD=3 ./vstart.sh -n -X -l mon osd

    LC_ALL=C PATH=:$PATH CEPH_CONF=ceph.conf \
      ../qa/workunits/cephtool/test.sh

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoqa: add function name and line number to cephtool output
Loic Dachary [Sun, 15 Dec 2013 20:41:00 +0000 (21:41 +0100)]
qa: add function name and line number to cephtool output

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoqa: silence cephtool tests cleanup
Loic Dachary [Sun, 15 Dec 2013 20:34:37 +0000 (21:34 +0100)]
qa: silence cephtool tests cleanup

The file removal installed to be triggered when the script stops must
not fail if the file does not exist.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: set ceph osd (down|out|in|rm) error code on failure
Loic Dachary [Sun, 15 Dec 2013 15:27:02 +0000 (16:27 +0100)]
mon: set ceph osd (down|out|in|rm) error code on failure

Instead of always returning true, the error code is set if at least one
operation fails.

EINVAL if the OSD id is invalid (osd.foobar for instance).
EBUSY if trying to remove an OSD that is up.

When used with the ceph command line, it looks like this:

    ceph -c ceph.conf osd rm osd.0
    Error EBUSY: osd.0 is still up; must be down before removal.
    kill PID_OF_osd.0
    ceph -c ceph.conf osd down osd.0
    marked down osd.0.
    ceph -c ceph.conf osd rm osd.0 osd.1
    Error EBUSY: removed osd.0, osd.1 is still up; must be down before removal.
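
The error-code behaviour shown in the transcript above can be modelled with a short sketch. The functions parse_osd_id and remove_osds are hypothetical and only imitate the described semantics: keep processing the remaining ids, but report an error if at least one operation failed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Hypothetical parser: accepts "osd.N" or "N"; returns -1 if invalid.
long parse_osd_id(const std::string &s) {
  std::string num = s.compare(0, 4, "osd.") == 0 ? s.substr(4) : s;
  if (num.empty())
    return -1;
  char *end = nullptr;
  long id = strtol(num.c_str(), &end, 10);
  return (*end == '\0' && id >= 0) ? id : -1;
}

// Try to remove every named OSD. Returns 0 on full success, -EINVAL if
// the most recent failure was an invalid id, or -EBUSY if it was an OSD
// that is still up. Successfully removed ids are appended to *removed.
int remove_osds(const std::vector<std::string> &names,
                const std::set<long> &up_osds, std::vector<long> *removed) {
  assert(removed);
  int err = 0;
  for (const std::string &name : names) {
    long id = parse_osd_id(name);
    if (id < 0) {
      err = -EINVAL;
      continue;
    }
    if (up_osds.count(id)) {
      err = -EBUSY;
      continue;
    }
    removed->push_back(id);
  }
  return err;
}
```

With osd.1 still up, removing osd.0 and osd.1 in one call removes osd.0 and still returns an EBUSY error, mirroring the last command in the transcript.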

http://tracker.ceph.com/issues/6824 fixes #6824

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #716 from ceph/wip-formatter-newlines
Sage Weil [Sun, 15 Dec 2013 18:24:03 +0000 (10:24 -0800)]
Merge pull request #716 from ceph/wip-formatter-newlines

common/Formatter: add newline to flushed output if m_pretty

11 years agoMerge pull request #943 from dachary/wip-formatter-newlines 716/head
Sage Weil [Sun, 15 Dec 2013 18:23:33 +0000 (10:23 -0800)]
Merge pull request #943 from dachary/wip-formatter-newlines

common: fix perf_counters unittests for trailing newline in m_pretty

11 years agoMerge pull request #942 from sstock/master
Sage Weil [Sun, 15 Dec 2013 18:18:49 +0000 (10:18 -0800)]
Merge pull request #942 from sstock/master

Add -n option to mount.ceph, feature 7006

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoAdd -n option to mount.ceph. Required by autofs when /etc/mtab is a link to /proc... 942/head
Steve Stock [Sat, 14 Dec 2013 21:44:06 +0000 (16:44 -0500)]
Add -n option to mount.ceph.  Required by autofs when /etc/mtab is a link to /proc/mounts (e.g. Debian Wheezy), otherwise automounting a ceph file system fails.  Also useful when /etc is read-only.  feature 7006

Signed-off-by: Steve Stock <steve@technolope.org>
11 years agoMerge pull request #937 from christian-marie/master
Sage Weil [Sun, 15 Dec 2013 16:41:16 +0000 (08:41 -0800)]
Merge pull request #937 from christian-marie/master

Document librados's rados_write's behaviour in regard to return value.

11 years agoMerge pull request #924 from dachary/wip-erasure-doc
Sage Weil [Sun, 15 Dec 2013 16:40:52 +0000 (08:40 -0800)]
Merge pull request #924 from dachary/wip-erasure-doc

doc: update erasure code development doc

11 years agoMerge pull request #946 from dachary/wip-80-column
Sage Weil [Sun, 15 Dec 2013 16:40:32 +0000 (08:40 -0800)]
Merge pull request #946 from dachary/wip-80-column

osd: format test_osd_types.cc to 80 columns

11 years agoMerge pull request #945 from dachary/wip-6981
Sage Weil [Sun, 15 Dec 2013 16:40:16 +0000 (08:40 -0800)]
Merge pull request #945 from dachary/wip-6981

ceph-disk: zap needs at least one device

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #944 from dachary/wip-6679
Sage Weil [Sun, 15 Dec 2013 16:39:55 +0000 (08:39 -0800)]
Merge pull request #944 from dachary/wip-6679

common: fix rare race condition in Throttle unit tests

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #948 from dachary/wip-6736-1
Sage Weil [Sun, 15 Dec 2013 16:32:41 +0000 (08:32 -0800)]
Merge pull request #948 from dachary/wip-6736-1

mon: typo s/degrated/degraded/

Backport: emperor, dumpling

11 years agomon: typo s/degrated/degraded/ 948/head
Loic Dachary [Sun, 15 Dec 2013 16:15:46 +0000 (17:15 +0100)]
mon: typo s/degrated/degraded/

http://tracker.ceph.com/issues/6736 refs #6736

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosd: format test_osd_types.cc to 80 columns 946/head
Loic Dachary [Sun, 15 Dec 2013 15:23:53 +0000 (16:23 +0100)]
osd: format test_osd_types.cc to 80 columns

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoceph-disk: zap needs at least one device 945/head
Loic Dachary [Sun, 15 Dec 2013 14:34:17 +0000 (15:34 +0100)]
ceph-disk: zap needs at least one device

If given no argument, ceph-disk zap should display the usage instead of
silently doing nothing. Silence can be confused with "I zapped all the
disks".

http://tracker.ceph.com/issues/6981 fixes #6981

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocommon: fix rare race condition in Throttle unit tests 944/head
Loic Dachary [Sun, 15 Dec 2013 13:31:27 +0000 (14:31 +0100)]
common: fix rare race condition in Throttle unit tests

The thread created to test Throttle race conditions updates a value (
throttle.get_current() ) that is tested by the main gtest thread but is
not protected by a lock. Instead of adding a lock, the main thread tests
the value after pthread_join() on the child thread.

http://tracker.ceph.com/issues/6679 fixes #6679

Signed-off-by: Loic Dachary <loic@dachary.org>
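
The fix described above reads the shared value only after joining the child thread, instead of adding a lock. A minimal sketch of that pattern follows, written in Python rather than the C++/pthread code of the actual unit test; the `Counter` class and the function name are hypothetical stand-ins for the Throttle under test.

```python
import threading


class Counter:
    """Hypothetical stand-in for the object under test; it has no lock."""

    def __init__(self):
        self.current = 0

    def take(self, amount):
        # Unprotected update performed by the child thread.
        self.current += amount


def run_child_then_check():
    counter = Counter()
    child = threading.Thread(target=counter.take, args=(5,))
    child.start()
    # Reading counter.current here would race with the child thread.
    child.join()  # plays the role of pthread_join()
    # After join() the child has finished, so the read is safe.
    return counter.current
```

Because the only read happens after `join()`, no lock is needed on the value for the purposes of the test.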
11 years agocommon: format Throttle test to 80 columns
Loic Dachary [Sun, 15 Dec 2013 13:30:38 +0000 (14:30 +0100)]
common: format Throttle test to 80 columns

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocommon: fix perf_counters unittests for trailing newline in m_pretty 943/head
Loic Dachary [Sun, 15 Dec 2013 12:24:14 +0000 (13:24 +0100)]
common: fix perf_counters unittests for trailing newline in m_pretty

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #929 from kazhang/add-pkg-config
Loic Dachary [Sun, 15 Dec 2013 11:26:21 +0000 (03:26 -0800)]
Merge pull request #929 from kazhang/add-pkg-config

add apt-get install pkg-config for ubuntu server

Reviewed-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agodoc: Added additional comments on placement targets and default placement.
John Wilkins [Sat, 14 Dec 2013 00:09:35 +0000 (16:09 -0800)]
doc: Added additional comments on placement targets and default placement.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Updates to federated config.
John Wilkins [Sat, 14 Dec 2013 00:08:37 +0000 (16:08 -0800)]
doc: Updates to federated config.

Reverted Emperor versionadded to Dumpling as it gets backported.
Added default index and bucket pools to pool creation
Added default default_placement setting
Added placement_pools key val pair examples.
Added comments for re-running the procedure for the secondary region.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agotest_ipaddr: add another unit test
Sage Weil [Sat, 14 Dec 2013 00:02:22 +0000 (16:02 -0800)]
test_ipaddr: add another unit test

Was checking something for kbader.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: drop unused hit_set_start_stats
Sage Weil [Sat, 14 Dec 2013 00:02:02 +0000 (16:02 -0800)]
osd/ReplicatedPG: drop unused hit_set_start_stats

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: maintain stats for the hit_set_* objects
Sage Weil [Sat, 14 Dec 2013 00:01:48 +0000 (16:01 -0800)]
osd/ReplicatedPG: maintain stats for the hit_set_* objects

We also make hit_set.current_info reflect only the on-disk 'current', not
anything that is not persisted.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects
Sage Weil [Fri, 13 Dec 2013 22:54:16 +0000 (14:54 -0800)]
osd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects

These are first-class user-visible rados objects and need these attrs.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agovstart.sh: --hitset <pool> <type>
Sage Weil [Fri, 13 Dec 2013 22:50:34 +0000 (14:50 -0800)]
vstart.sh: --hitset <pool> <type>

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/wip-objecter-full-2'
Sage Weil [Fri, 13 Dec 2013 18:49:10 +0000 (10:49 -0800)]
Merge remote-tracking branch 'gh/wip-objecter-full-2'

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #936 from ceph/wip-rbd-single-major
Josh Durgin [Fri, 13 Dec 2013 18:40:11 +0000 (10:40 -0800)]
Merge pull request #936 from ceph/wip-rbd-single-major

rbd: support for single-major device number allocation scheme

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #932 from ceph/wip-6979
Sage Weil [Fri, 13 Dec 2013 18:03:43 +0000 (10:03 -0800)]
Merge pull request #932 from ceph/wip-6979

replace sgdisk subprocess calls with a helper

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 13 Dec 2013 17:58:10 +0000 (09:58 -0800)]
Merge remote-tracking branch 'gh/next'

11 years agotest/libcephfs: release resources before umount
Yan, Zheng [Tue, 10 Dec 2013 23:38:18 +0000 (07:38 +0800)]
test/libcephfs: release resources before umount

Fixes: #6742
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agouse the new get_command helper in check_call 932/head
Alfredo Deza [Fri, 13 Dec 2013 17:06:25 +0000 (12:06 -0500)]
use the new get_command helper in check_call

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agorbd: modprobe with single_major=Y on newer kernels 936/head
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: modprobe with single_major=Y on newer kernels

On kernels that support it, and if 'rbd map' is given a chance to
modprobe, turn on single-major device number allocation scheme.  For
users who for some reason don't want it, the workaround is to insert
the rbd module manually before executing the first 'rbd map' command.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agorbd: add support for single-major device number allocation scheme
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: add support for single-major device number allocation scheme

With the preparatory commits ("rbd: match against wholedisk device
numbers on unmap" and "rbd: match against both major and minor on unmap
on kernels >= 3.14") in, this amounts to choosing to work with new rbd
bus interfaces (/sys/bus/rbd/{add,remove}_single_major) if they are
available, instead of the old ones (/sys/bus/rbd/{add,remove}).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
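
The selection described above (prefer the single-major bus files when present, otherwise fall back to the old ones) can be sketched as follows. This is an assumption-laden illustration in Python, not the rbd tool's C++ implementation: the function name and the `bus_dir` parameter are invented for the example, and only the file names come from the commit message.

```python
import os


def rbd_bus_files(bus_dir="/sys/bus/rbd"):
    """Return the (add, remove) bus file paths to use.

    bus_dir is a parameter only so the sketch can be tried against a
    temporary directory instead of the real sysfs tree.
    """
    add_single = os.path.join(bus_dir, "add_single_major")
    remove_single = os.path.join(bus_dir, "remove_single_major")
    # Use the new interfaces only if the kernel exposes both of them.
    if os.path.exists(add_single) and os.path.exists(remove_single):
        return add_single, remove_single
    # Otherwise fall back to the old interfaces.
    return os.path.join(bus_dir, "add"), os.path.join(bus_dir, "remove")
```

On a kernel without single-major support the function returns the old `add` and `remove` paths, so callers need no separate code path.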