git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agomds: handle cache rejoin corner case
Yan, Zheng [Wed, 6 Nov 2013 01:42:43 +0000 (09:42 +0800)]
mds: handle cache rejoin corner case

A recovering MDS may receive a strong cache rejoin from a survivor;
the survivor then restarts, and the recovering MDS receives a weak cache
rejoin from the same MDS. Before processing the weak cache rejoin,
we should scour replicas added by the obsolete strong cache rejoin.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: unify nonce type
Yan, Zheng [Wed, 6 Nov 2013 01:28:51 +0000 (09:28 +0800)]
mds: unify nonce type

MDSCacheObject::replica_nonce is defined as __s16, but nonce type
in MDSCacheObject::replica_map is int. This mismatch may confuse
MDCache::handle_cache_expire().

This patch unifies the nonce type as uint32.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
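
A small sketch can show why a mixed-width nonce is risky. It is not Ceph code: the signed 16-bit and int widths come from the commit message above, while the helper name and the sample values are assumptions chosen for illustration.

```python
import ctypes


def store_as_s16(nonce):
    """Mimic storing a nonce in a signed 16-bit field (like __s16)."""
    return ctypes.c_int16(nonce).value


small = 7
large = 40000  # fits in an int, but not in a signed 16-bit field

# A small nonce survives the round trip through the narrow field.
same_small = store_as_s16(small) == small
# A large nonce wraps around, so it no longer equals the int kept elsewhere.
same_large = store_as_s16(large) == large
```

With a single unsigned 32-bit type on both sides, the two copies of a nonce compare equal for every value that type can hold.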
11 years agomds: rework stale import/export message detection
Yan, Zheng [Thu, 24 Oct 2013 09:10:59 +0000 (17:10 +0800)]
mds: rework stale import/export message detection

The current code uses import state to detect obsolete import/export messages.
It does not work in this case: cancel a subtree export, export the same
subtree again, and then the messages for the first export get dispatched.

This patch introduces a "transaction ID" (TID) for subtree exports. Each subtree
export has a unique TID, which is recorded in all import/export related
messages. By comparing TIDs, we can reliably detect stale messages.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
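
The TID idea from the commit above can be sketched as follows. This is an illustration only, not the MDS Migrator code: the class, its method names, and the subtree path are hypothetical.

```python
import itertools


class Exporter:
    """Hypothetical exporter that tags each subtree export with a fresh TID."""

    def __init__(self):
        self._next_tid = itertools.count(1)
        self.active = {}  # subtree name -> TID of the export in progress

    def start_export(self, subtree):
        tid = next(self._next_tid)
        self.active[subtree] = tid
        return tid

    def cancel_export(self, subtree):
        self.active.pop(subtree, None)

    def is_stale(self, subtree, msg_tid):
        # A message is stale unless it carries the TID of the current export.
        return self.active.get(subtree) != msg_tid


exporter = Exporter()
first = exporter.start_export("/a")   # first export attempt
exporter.cancel_export("/a")          # cancelled before it completes
second = exporter.start_export("/a")  # same subtree, exported again
```

A late message that still carries `first` is rejected, even though the subtree is once again in an "exporting" state, which is the case state-based detection could not distinguish.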
11 years agomds: put import/export related states together
Yan, Zheng [Thu, 24 Oct 2013 08:05:56 +0000 (16:05 +0800)]
mds: put import/export related states together

The current code uses several STL maps to record import/export related
states. Each state access requires a map lookup, which is inefficient.
It's better to put import/export related states together.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: freeze tree deadlock detection.
Yan, Zheng [Wed, 23 Oct 2013 01:15:58 +0000 (09:15 +0800)]
mds: freeze tree deadlock detection.

There are two situations that result in a freeze tree deadlock.

 - mds.0 authpins an item in subtree A
 - mds.0 sends request to mds.1 to authpin an item in subtree B
 - mds.0 freezes subtree A
 - mds.1 authpins an item in subtree B
 - mds.1 sends request to mds.0 to authpin an item in subtree A
 - mds.1 freezes subtree B
 - mds.1 receives the remote authpin request from mds.0
   (wait because subtree B is freezing)
 - mds.0 receives the remote authpin request from mds.1
   (wait because subtree A is freezing)

 - client request authpins items in subtree B
 - freeze subtree B
 - import subtree A which is parent of subtree B
   (authpins parent inode of subtree B, see CDir::set_dir_auth())
 - freeze subtree A
 - client request tries authpinning items in subtree A
   (wait because subtree A is freezing)

Enforcing an authpinning order can avoid the deadlock, but it's very
expensive. The deadlock is rare, so I think deadlock detection is
more suitable for this case.

This patch introduces freeze tree deadlock detection. We record the
start time of freezing a tree. If we fail to freeze the tree within a
given duration, we cancel the freezing process.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
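
The timeout-based detection described above might look roughly like this. It is a minimal sketch under stated assumptions: the class name, the `timeout` setting, and the clock values are hypothetical, and the real MDS logic is more involved.

```python
class FreezeAttempt:
    """Hypothetical record of one freeze-tree attempt."""

    def __init__(self, started_at, timeout):
        self.started_at = started_at  # time at which freezing began
        self.timeout = timeout        # allowed duration (an assumed setting)
        self.cancelled = False

    def check(self, now, frozen):
        # Cancel if the tree is still not frozen after the allowed duration.
        if not frozen and now - self.started_at > self.timeout:
            self.cancelled = True
        return self.cancelled


attempt = FreezeAttempt(started_at=100.0, timeout=30.0)
```

Calling `attempt.check(now, frozen)` periodically is enough here: no lock ordering is enforced, and a stuck freeze is simply abandoned so that the waiting authpin requests can proceed.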
11 years agoRevert "common/Formatter: add newline to flushed output if m_pretty"
Sage Weil [Mon, 16 Dec 2013 00:23:09 +0000 (16:23 -0800)]
Revert "common/Formatter: add newline to flushed output if m_pretty"

This reverts commit d6146b0d915f1420b5e76f7037f656460c314461.

As Yehuda points out, this does not properly handle cases where we flush
the same output stream multiple times.

11 years agoRevert "common: fix perf_counters unittests for trailing newline in m_pretty"
Sage Weil [Mon, 16 Dec 2013 00:22:59 +0000 (16:22 -0800)]
Revert "common: fix perf_counters unittests for trailing newline in m_pretty"

This reverts commit ba5572397c0e48378b0a0e556db1b2c02756617e.

11 years agoMerge pull request #716 from ceph/wip-formatter-newlines
Sage Weil [Sun, 15 Dec 2013 18:24:03 +0000 (10:24 -0800)]
Merge pull request #716 from ceph/wip-formatter-newlines

common/Formatter: add newline to flushed output if m_pretty

11 years agoMerge pull request #943 from dachary/wip-formatter-newlines 716/head
Sage Weil [Sun, 15 Dec 2013 18:23:33 +0000 (10:23 -0800)]
Merge pull request #943 from dachary/wip-formatter-newlines

common: fix perf_counters unittests for trailing newline in m_pretty

11 years agoMerge pull request #942 from sstock/master
Sage Weil [Sun, 15 Dec 2013 18:18:49 +0000 (10:18 -0800)]
Merge pull request #942 from sstock/master

Add -n option to mount.ceph, feature 7006

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoAdd -n option to mount.ceph. Required by autofs when /etc/mtab is a link to /proc... 942/head
Steve Stock [Sat, 14 Dec 2013 21:44:06 +0000 (16:44 -0500)]
Add -n option to mount.ceph.  Required by autofs when /etc/mtab is a link to /proc/mounts (e.g. Debian Wheezy), otherwise automounting a ceph file system fails.  Also useful when /etc is read-only.  feature 7006

Signed-off-by: Steve Stock <steve@technolope.org>
11 years agoMerge pull request #937 from christian-marie/master
Sage Weil [Sun, 15 Dec 2013 16:41:16 +0000 (08:41 -0800)]
Merge pull request #937 from christian-marie/master

Document librados's rados_write's behaviour in reguards to return value.

11 years agoMerge pull request #924 from dachary/wip-erasure-doc
Sage Weil [Sun, 15 Dec 2013 16:40:52 +0000 (08:40 -0800)]
Merge pull request #924 from dachary/wip-erasure-doc

doc: update erasure code development doc

11 years agoMerge pull request #946 from dachary/wip-80-column
Sage Weil [Sun, 15 Dec 2013 16:40:32 +0000 (08:40 -0800)]
Merge pull request #946 from dachary/wip-80-column

osd: format test_osd_types.cc to 80 columns

11 years agoMerge pull request #945 from dachary/wip-6981
Sage Weil [Sun, 15 Dec 2013 16:40:16 +0000 (08:40 -0800)]
Merge pull request #945 from dachary/wip-6981

ceph-disk: zap needs at least one device

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #944 from dachary/wip-6679
Sage Weil [Sun, 15 Dec 2013 16:39:55 +0000 (08:39 -0800)]
Merge pull request #944 from dachary/wip-6679

common: fix rare race condition in Throttle unit tests

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #948 from dachary/wip-6736-1
Sage Weil [Sun, 15 Dec 2013 16:32:41 +0000 (08:32 -0800)]
Merge pull request #948 from dachary/wip-6736-1

mon: typo s/degrated/degraded/

Backport: emperor, dumpling

11 years agomon: typo s/degrated/degraded/ 948/head
Loic Dachary [Sun, 15 Dec 2013 16:15:46 +0000 (17:15 +0100)]
mon: typo s/degrated/degraded/

http://tracker.ceph.com/issues/6736 refs #6736

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosd: format test_osd_types.cc to 80 columns 946/head
Loic Dachary [Sun, 15 Dec 2013 15:23:53 +0000 (16:23 +0100)]
osd: format test_osd_types.cc to 80 columns

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoceph-disk: zap needs at least one device 945/head
Loic Dachary [Sun, 15 Dec 2013 14:34:17 +0000 (15:34 +0100)]
ceph-disk: zap needs at least one device

If given no argument, ceph-disk zap should display the usage instead of
silently doing nothing. Silence can be confused with "I zapped all the
disks".

http://tracker.ceph.com/issues/6981 fixes #6981

Signed-off-by: Loic Dachary <loic@dachary.org>
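
One way to get the behaviour described above is to let the argument parser require at least one device. The sketch below uses Python's standard `argparse`; the program name and argument names are assumptions, and it is not claimed to be the exact change made in ceph-disk.

```python
import argparse


def make_zap_parser():
    parser = argparse.ArgumentParser(prog="zap-sketch")
    # nargs="+" makes argparse print the usage and exit when no device is given.
    parser.add_argument("dev", metavar="DEV", nargs="+",
                        help="path to a block device")
    return parser


args = make_zap_parser().parse_args(["/dev/sdb", "/dev/sdc"])
```

With an empty argument list, `parse_args([])` raises `SystemExit` after printing the usage message, so silence can no longer be mistaken for success.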
11 years agocommon: fix rare race condition in Throttle unit tests 944/head
Loic Dachary [Sun, 15 Dec 2013 13:31:27 +0000 (14:31 +0100)]
common: fix rare race condition in Throttle unit tests

The thread created to test Throttle race conditions updates a value
(throttle.get_current()) that is tested by the main gtest thread but is
not protected by a lock. Instead of adding a lock, the main thread tests
the value after pthread_join() on the child thread.

http://tracker.ceph.com/issues/6679 fixes #6679

Signed-off-by: Loic Dachary <loic@dachary.org>
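
The "read only after join" pattern from the commit above can be illustrated with a short sketch. It uses Python threads rather than pthreads and gtest, and the `Counter` class is a stand-in, not the real Throttle.

```python
import threading


class Counter:
    """Stand-in for the shared value; not the real Throttle class."""

    def __init__(self):
        self.current = 0


def worker(counter, amount):
    counter.current += amount  # written by the child thread, without a lock


counter = Counter()
child = threading.Thread(target=worker, args=(counter, 5))
child.start()
child.join()                # wait for the child before reading
observed = counter.current  # safe: the only writer has finished
```

Joining first removes the race without adding a lock, because nothing else writes the value once the child has exited.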
11 years agocommon: format Throttle test to 80 columns
Loic Dachary [Sun, 15 Dec 2013 13:30:38 +0000 (14:30 +0100)]
common: format Throttle test to 80 columns

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocommon: fix perf_counters unittests for trailing newline in m_pretty 943/head
Loic Dachary [Sun, 15 Dec 2013 12:24:14 +0000 (13:24 +0100)]
common: fix perf_counters unittests for trailing newline in m_pretty

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #929 from kazhang/add-pkg-config
Loic Dachary [Sun, 15 Dec 2013 11:26:21 +0000 (03:26 -0800)]
Merge pull request #929 from kazhang/add-pkg-config

add apt-get install pkg-config for ubuntu server

Reviewed-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agodoc: Added additional comments on placement targets and default placement.
John Wilkins [Sat, 14 Dec 2013 00:09:35 +0000 (16:09 -0800)]
doc: Added additional comments on placement targets and default placement.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Updates to federated config.
John Wilkins [Sat, 14 Dec 2013 00:08:37 +0000 (16:08 -0800)]
doc: Updates to federated config.

Reverted Emperor versionadded to Dumpling as it gets backported.
Added default index and bucket pools to pool creation.
Added default default_placement setting.
Added placement_pools key/value pair examples.
Added comments for re-running the procedure for the secondary region.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge remote-tracking branch 'gh/wip-objecter-full-2'
Sage Weil [Fri, 13 Dec 2013 18:49:10 +0000 (10:49 -0800)]
Merge remote-tracking branch 'gh/wip-objecter-full-2'

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #936 from ceph/wip-rbd-single-major
Josh Durgin [Fri, 13 Dec 2013 18:40:11 +0000 (10:40 -0800)]
Merge pull request #936 from ceph/wip-rbd-single-major

rbd: support for single-major device number allocation scheme

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #932 from ceph/wip-6979
Sage Weil [Fri, 13 Dec 2013 18:03:43 +0000 (10:03 -0800)]
Merge pull request #932 from ceph/wip-6979

replace sgdisk subprocess calls with a helper

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 13 Dec 2013 17:58:10 +0000 (09:58 -0800)]
Merge remote-tracking branch 'gh/next'

11 years agotest/libcephfs: release resources before umount
Yan, Zheng [Tue, 10 Dec 2013 23:38:18 +0000 (07:38 +0800)]
test/libcephfs: release resources before umount

Fixes: #6742
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agouse the new get_command helper in check_call 932/head
Alfredo Deza [Fri, 13 Dec 2013 17:06:25 +0000 (12:06 -0500)]
use the new get_command helper in check_call

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agorbd: modprobe with single_major=Y on newer kernels 936/head
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: modprobe with single_major=Y on newer kernels

On kernels that support it, and if 'rbd map' is given a chance to
modprobe, turn on single-major device number allocation scheme.  For
users who for some reason don't want it, the workaround is to insert
the rbd module manually before executing the first 'rbd map' command.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agorbd: add support for single-major device number allocation scheme
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: add support for single-major device number allocation scheme

With the preparatory commits ("rbd: match against wholedisk device
numbers on unmap" and "rbd: match against both major and minor on unmap
on kernels >= 3.14") in, this amounts to choosing to work with the new rbd
bus interfaces (/sys/bus/rbd/{add,remove}_single_major) if they are
available, instead of the old ones (/sys/bus/rbd/{add,remove}).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
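
The choice between the two sets of bus files can be sketched as a simple existence check. The sysfs paths are the ones named in the commit message; the function name and the injectable `exists` parameter are assumptions added so that the sketch can be exercised without an rbd-capable kernel.

```python
import os


def pick_rbd_bus_files(exists=os.path.exists):
    """Return (add, remove) paths, preferring the single-major interfaces."""
    new = ("/sys/bus/rbd/add_single_major",
           "/sys/bus/rbd/remove_single_major")
    old = ("/sys/bus/rbd/add", "/sys/bus/rbd/remove")
    # Use the new interfaces only if both of them are present.
    return new if all(exists(path) for path in new) else old
```

On a kernel without the new files, the sketch falls back to the old pair, which mirrors the "if they are available" wording above.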
11 years agorbd: match against both major and minor on unmap on newer kernels
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: match against both major and minor on unmap on newer kernels

As described in commit "rbd: match against wholedisk device numbers on
unmap", currently we only match against major numbers.  In preparation
for support for single-major device number allocation scheme, start
matching against minor numbers also, which newer kernels provide in
a /sys/bus/rbd/devices/<id>/minor sysfs attribute.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agorbd: match against whole disks on unmap
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: match against whole disks on unmap

Currently 'rbd unmap' translates a user-provided block device into an
rbd id by matching the major number of the specified device against
/sys/bus/rbd/devices/<id>/major for each rbd mapping and declaring
success on the first match.  This works for both entire disks
and partitions, because under the current device number allocation
scheme, each mapping means a new major number.

In preparation for support for single-major device number allocation
scheme, which would require matching both major and minor numbers, make
sure to always match against entire disk device numbers, by converting
the specified device major:minor pair into a wholedisk major:minor pair.
To achieve that, use the libblkid library, which accomplishes this goal
by walking stable sysfs structures.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agorbd: switch to strict_strtol for major parsing
Ilya Dryomov [Fri, 13 Dec 2013 15:40:52 +0000 (17:40 +0200)]
rbd: switch to strict_strtol for major parsing

Use common/strict_strtol, which actually parses integers in a proper
way, instead of atoi for parsing /sys/bus/rbd/devices/<id>/major.  This
is important because the kernel can apparently write things like
"(none)" into that file, and strict parsing is more bulletproof in general.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
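
The difference between lenient and strict parsing can be shown with a small sketch. Neither function is the real `atoi` or `strict_strtol`; both are hypothetical Python stand-ins that imitate the behaviour the commit message describes.

```python
import re


def lenient_atoi(text):
    """Imitate atoi: take leading digits, return 0 when there are none."""
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


def strict_int(text):
    """Return (value, error); reject anything that is not a whole integer."""
    try:
        return int(text.strip(), 10), None
    except ValueError:
        return None, "not an integer: %r" % text
```

The lenient version turns "(none)" into 0, which looks like a valid major number, while the strict version reports an error that the caller can act on.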
11 years agoDocument librados's rados_write's behaviour in reguards to return value. 937/head
Christian Marie [Fri, 13 Dec 2013 02:24:31 +0000 (13:24 +1100)]
Document librados's rados_write's behaviour in reguards to return value.

11 years agoMerge pull request #934 from cernceph/wip-rgw-ulimit
Sage Weil [Thu, 12 Dec 2013 17:42:21 +0000 (09:42 -0800)]
Merge pull request #934 from cernceph/wip-rgw-ulimit

radosgw: increase nofiles ulimit on sysvinit machines

11 years agoMerge pull request #935 from ceph/wip-vstart-memstore
Sage Weil [Thu, 12 Dec 2013 17:41:40 +0000 (09:41 -0800)]
Merge pull request #935 from ceph/wip-vstart-memstore

vstart.sh: add --memstore option

11 years agovstart.sh: add --memstore option 935/head
Yehuda Sadeh [Thu, 12 Dec 2013 17:31:53 +0000 (09:31 -0800)]
vstart.sh: add --memstore option

for setting memstore backed osds

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years agouse the absolute path for executables if found
Alfredo Deza [Thu, 12 Dec 2013 16:16:38 +0000 (11:16 -0500)]
use the absolute path for executables if found

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agoremove trailing semicolon
Alfredo Deza [Thu, 12 Dec 2013 15:26:05 +0000 (10:26 -0500)]
remove trailing semicolon

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agoradosgw: increase nofiles ulimit on sysvinit machines 934/head
Dan van der Ster [Thu, 12 Dec 2013 13:53:13 +0000 (14:53 +0100)]
radosgw: increase nofiles ulimit on sysvinit machines

Clusters with many OSDs require a higher nofiles ulimit than the RHEL default. Increase it.

Tested-by: Dan van der Ster <daniel.vanderster@cern.ch>
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
11 years agodoc/release-notes: sort
Sage Weil [Thu, 12 Dec 2013 00:13:51 +0000 (16:13 -0800)]
doc/release-notes: sort

meh

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/release-notes: fix indentation; sigh
Sage Weil [Thu, 12 Dec 2013 00:11:00 +0000 (16:11 -0800)]
doc/release-notes: fix indentation; sigh

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/release-notes: v0.73
Sage Weil [Wed, 11 Dec 2013 23:59:45 +0000 (15:59 -0800)]
doc/release-notes: v0.73

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoPendingReleaseNotes: note CRUSH and hashpspool default changes
Sage Weil [Wed, 11 Dec 2013 23:39:37 +0000 (15:39 -0800)]
PendingReleaseNotes: note CRUSH and hashpspool default changes

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #930 from ceph/wip-hashpspool
Sage Weil [Wed, 11 Dec 2013 23:37:46 +0000 (15:37 -0800)]
Merge pull request #930 from ceph/wip-hashpspool

enable hashpspool by default

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoRevert "Partial revert "mon: osd pool set syntax relaxed, modify unit tests""
Greg Farnum [Wed, 11 Dec 2013 22:17:25 +0000 (14:17 -0800)]
Revert "Partial revert "mon: osd pool set syntax relaxed, modify unit tests""

This reverts commit e80ab94bf44e102fcd87d16dc11e38ca4c0eeadb.

We accept non-CephInt arguments again, now that we've got the monitors
handling differing APIs intelligently.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon/OSDMonitor: take 'osd pool set ...' value as a string again
Sage Weil [Wed, 4 Dec 2013 05:39:03 +0000 (21:39 -0800)]
mon/OSDMonitor: take 'osd pool set ...' value as a string again

We ran into problems before when we made this a string because a mixed
cluster of mons might forward a client request with the wrong schema.
To make this work, we make the new code understand both the new and
old schema, and also backport a change to emperor and dumpling to
handle the new schema.

For the previous attempt to do this, see:
 337195f04653eed8e8f153a5b074f3bd48408998
 2fe0d0d97af95c22db80800f5b9da51f672d9407

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
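
Accepting both schemas, as described above, can be sketched as a small normalisation step. The command dictionary and its "val" key are hypothetical, and the real monitor code works on its own command map type rather than a Python dict.

```python
def read_pool_value(cmd):
    """Accept the value as a string (new schema) or an int (old schema)."""
    val = cmd["val"]
    if isinstance(val, int):
        # Old schema: the value arrived as an integer, so normalise it.
        return str(val)
    return val
```

Whichever schema the forwarding monitor used, the rest of the handler then sees a string.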
11 years agoMerge pull request #925 from ceph/wip-mon-api
Gregory Farnum [Wed, 11 Dec 2013 21:27:03 +0000 (13:27 -0800)]
Merge pull request #925 from ceph/wip-mon-api

Merge in changes to unify the API presented by the monitors and handle changes gracefully.

(Upgrade tests) Tested-by: Tamil Muthamizhan <tamil.muthamizhan@inktank.com>

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agoreplace sgdisk subprocess calls with a helper
Alfredo Deza [Wed, 11 Dec 2013 20:41:45 +0000 (15:41 -0500)]
replace sgdisk subprocess calls with a helper

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agoosd: enable HASHPSPOOL by default 930/head
Sage Weil [Wed, 11 Dec 2013 19:19:37 +0000 (11:19 -0800)]
osd: enable HASHPSPOOL by default

Much like the CRUSH tunables, this first appears in kernel v3.9.

Unlike the CRUSH tunables, it does not appear in Ceph until v0.64
(post cuttlefish, pre dumpling).

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon: if we're the leader, don't validate command matching 925/head
Greg Farnum [Tue, 10 Dec 2013 19:33:51 +0000 (11:33 -0800)]
mon: if we're the leader, don't validate command matching

Classic-format commands never match our leader command set!

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agomon: by default, warn if some members of the quorum are "classic"
Greg Farnum [Tue, 10 Dec 2013 18:56:33 +0000 (10:56 -0800)]
mon: by default, warn if some members of the quorum are "classic"

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoadd apt-get install pkg-config for ubuntu server 929/head
Kai Zhang [Wed, 11 Dec 2013 00:25:48 +0000 (16:25 -0800)]
add apt-get install pkg-config for ubuntu server

Signed-off-by: Kai Zhang <kaizh.pub@gmail.com>
11 years agoMemStore: update for the new ObjectStore interface
Greg Farnum [Tue, 10 Dec 2013 23:51:39 +0000 (15:51 -0800)]
MemStore: update for the new ObjectStore interface

68fdcfa1cc249af859400a2ce4590fefbb2f525b changed the ObjectStore
interface in the 'next' branch, which was merged into master by
e5a02c33e23e4fbdc7bf0f16a5bbff61f4e37186. Unfortunately the
MemStore (added via the master branch) was not corrected for this
interface change.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years agoMerge branch 'next'
Gary Lowell [Tue, 10 Dec 2013 21:00:14 +0000 (21:00 +0000)]
Merge branch 'next'

11 years agoMerge pull request #927 from dachary/wip-crush-test
Gregory Farnum [Tue, 10 Dec 2013 20:25:07 +0000 (12:25 -0800)]
Merge pull request #927 from dachary/wip-crush-test

crush: remove crushtool test leftover

11 years agocrush: remove crushtool test leftover 927/head
Loic Dachary [Tue, 10 Dec 2013 19:35:34 +0000 (20:35 +0100)]
crush: remove crushtool test leftover

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #920 from dachary/wip-man
Sage Weil [Tue, 10 Dec 2013 19:10:41 +0000 (11:10 -0800)]
Merge pull request #920 from dachary/wip-man

man: Ceph is also an object store

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoElector: use monitor's encoded command sets instead of our own
Greg Farnum [Tue, 10 Dec 2013 18:23:03 +0000 (10:23 -0800)]
Elector: use monitor's encoded command sets instead of our own

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #865 from ceph/wip-doc-build-cluster
scuttlemonkey [Tue, 10 Dec 2013 18:14:59 +0000 (10:14 -0800)]
Merge pull request #865 from ceph/wip-doc-build-cluster

Wip doc build cluster

11 years agoMonitor: encode and expose mon command sets
Greg Farnum [Tue, 10 Dec 2013 18:06:36 +0000 (10:06 -0800)]
Monitor: encode and expose mon command sets

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoman: update man/ from doc/man/8 920/head
Loic Dachary [Sat, 7 Dec 2013 21:07:38 +0000 (22:07 +0100)]
man: update man/ from doc/man/8

As explained in admin/manpage-howto.txt

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoman: Ceph is also an object store
Loic Dachary [Sat, 7 Dec 2013 20:52:16 +0000 (21:52 +0100)]
man: Ceph is also an object store

Replace

   Ceph distributed file system

with

   Ceph distributed storage system

to help reduce the idea that Ceph is just a file system.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #923 from dachary/wip-crush-test
Sage Weil [Tue, 10 Dec 2013 17:06:31 +0000 (09:06 -0800)]
Merge pull request #923 from dachary/wip-crush-test

CrushTester patches and documentation

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoos/MemStore: do on_apply_sync callback synchronously
Sage Weil [Tue, 10 Dec 2013 16:56:35 +0000 (08:56 -0800)]
os/MemStore: do on_apply_sync callback synchronously

We can easily deadlock if we put this in the Finisher thread behind other
work; do it synchronously!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years agov0.73 v0.73
Gary Lowell [Tue, 10 Dec 2013 04:55:36 +0000 (04:55 +0000)]
v0.73

11 years agoElector: keep a list of classic mons instead of each mon's commands
Greg Farnum [Mon, 9 Dec 2013 23:30:57 +0000 (15:30 -0800)]
Elector: keep a list of classic mons instead of each mon's commands

We aren't actually using the sets, so don't bother keeping them.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agocrush: implement --show-bad-mappings for indep 923/head
Loic Dachary [Mon, 9 Dec 2013 13:35:00 +0000 (14:35 +0100)]
crush: implement --show-bad-mappings for indep

Support the presence of ITEM_NONE device numbers in the indep mapping as
proof of a bad mapping. Implement the associated unit tests.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: add unitest for crushtool --show-bad-mappings
Loic Dachary [Mon, 9 Dec 2013 13:08:14 +0000 (14:08 +0100)]
crush: add unitest for crushtool --show-bad-mappings

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: remove scary message string
Loic Dachary [Sun, 8 Dec 2013 21:39:18 +0000 (22:39 +0100)]
crush: remove scary message string

The string is no longer used and can be removed.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: document the --test mode of operations
Loic Dachary [Sun, 8 Dec 2013 21:03:33 +0000 (22:03 +0100)]
crush: document the --test mode of operations

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMonitor: Elector: share the classic command set if we have a classic mon
Greg Farnum [Mon, 9 Dec 2013 16:44:05 +0000 (08:44 -0800)]
Monitor: Elector: share the classic command set if we have a classic mon

The leader now checks to see if any monitors did not provide their
command set, and if so, shares the list of "classic" commands instead
of his own set. This will prevent users from seeing different commands
(depending on whether they connect to an old or new mon) while
performing upgrades, and will make it really obvious if they forgot
to upgrade one of the monitors!

Signed-off-by: Greg Farnum <greg@inktank.com>
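
The leader's decision described above can be sketched as follows. The function and parameter names are hypothetical, and the command sets are reduced to plain lists of strings for illustration.

```python
def commands_to_share(leader_cmds, classic_cmds, peer_cmds):
    """Pick the command set that the leader should disseminate.

    peer_cmds maps a monitor name to its command list, or to None if
    that monitor did not provide one (a "classic" monitor).
    """
    if any(cmds is None for cmds in peer_cmds.values()):
        return classic_cmds
    return leader_cmds
```

A single monitor that sent no command set is enough to make the whole quorum advertise the classic set, which is what makes a forgotten upgrade obvious.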
11 years agoElector: share local command set when deferring
Greg Farnum [Mon, 9 Dec 2013 16:41:54 +0000 (08:41 -0800)]
Elector: share local command set when deferring

We're about to use this at a basic level, to identify when we have
"classic" monitors in-quorum, but could also do something more
sophisticated like a set intersection on the commands.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonitor: import MonCommands.h from original Dumpling and expose it
Greg Farnum [Mon, 9 Dec 2013 06:17:39 +0000 (22:17 -0800)]
Monitor: import MonCommands.h from original Dumpling and expose it

If the Elector doesn't receive a set of commands from the elected leader, it
assumes the monitor is "classic" and uses the Dumpling command set as
the leader set.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonitor: validate incoming commands against the leader's set too
Greg Farnum [Sat, 7 Dec 2013 03:08:13 +0000 (19:08 -0800)]
Monitor: validate incoming commands against the leader's set too

Then check against our own, and forward if we don't recognize it
or for some reason don't match.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonitor: disseminate leader's command set instead of our own
Greg Farnum [Fri, 6 Dec 2013 22:55:13 +0000 (14:55 -0800)]
Monitor: disseminate leader's command set instead of our own

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoElector: transmit local api on election win, accept leader's on loss
Greg Farnum [Fri, 6 Dec 2013 22:08:48 +0000 (14:08 -0800)]
Elector: transmit local api on election win, accept leader's on loss

If we're the leader, just point to our local set. Disseminating these
will let peons advertise the full command set supported by the leader.
INCOMPLETE: does not yet handle winning Electors who do not send a command set.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agomessages: make room for passing supported monitor commands in MMonElection
Greg Farnum [Fri, 6 Dec 2013 21:13:03 +0000 (13:13 -0800)]
messages: make room for passing supported monitor commands in MMonElection

We're going to use this space to let the leader tell everybody what
commands it supports.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonitor: pull command mapping out of _allowed_command()
Greg Farnum [Sat, 7 Dec 2013 00:09:36 +0000 (16:09 -0800)]
Monitor: pull command mapping out of _allowed_command()

We want to be able to validate commands against both the leader and
local command sets, so make that functionality generic.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #918 from ceph/port/misc
Sage Weil [Mon, 9 Dec 2013 19:16:49 +0000 (11:16 -0800)]
Merge pull request #918 from ceph/port/misc

Misc portability patches

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #922 from dachary/wip-crush-choose-tries
Sage Weil [Mon, 9 Dec 2013 16:28:43 +0000 (08:28 -0800)]
Merge pull request #922 from dachary/wip-crush-choose-tries

crush: fix map->choose_tries boundary test

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agodoc: update erasure code development doc 924/head
Loic Dachary [Mon, 9 Dec 2013 14:17:54 +0000 (15:17 +0100)]
doc: update erasure code development doc

With a link to the tracker issue implementing the new indep mode.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: --show-utilization* implies --show-statistics
Loic Dachary [Sun, 8 Dec 2013 18:45:28 +0000 (19:45 +0100)]
crush: --show-utilization* implies --show-statistics

--show-utilization* outputs only if --show-statistics is set, which is
confusing. Instead of failing, set --show-statistics to avoid the
confusion.

Signed-off-by: Loic Dachary <loic@dachary.org>
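
The "implies" behaviour can be sketched as a flag normalisation step. The dictionary keys below are hypothetical stand-ins for the crushtool options covered by `--show-utilization*`, and the real tool sets these flags in C++ code.

```python
def normalize_flags(flags):
    """Turn on statistics whenever a utilization flag is set."""
    flags = dict(flags)  # work on a copy, leave the caller's dict alone
    if flags.get("show_utilization") or flags.get("show_utilization_all"):
        flags["show_statistics"] = True
    return flags
```

Setting the dependent flag automatically avoids the case where a utilization option silently prints nothing.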
11 years agoMonitor: add a separate leader_supported_commands
Greg Farnum [Fri, 6 Dec 2013 21:55:38 +0000 (13:55 -0800)]
Monitor: add a separate leader_supported_commands

This isn't used yet, but will be shortly.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonitor: expose local monitor commands to other compilation units
Greg Farnum [Fri, 6 Dec 2013 21:48:42 +0000 (13:48 -0800)]
Monitor: expose local monitor commands to other compilation units

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonCommand: add operator== and operator!=
Greg Farnum [Sat, 7 Dec 2013 02:19:32 +0000 (18:19 -0800)]
MonCommand: add operator== and operator!=

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMonCommand: support encode/decode
Greg Farnum [Fri, 6 Dec 2013 21:51:51 +0000 (13:51 -0800)]
MonCommand: support encode/decode

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoencoding: fix [encode|decode]_array_nohead
Greg Farnum [Sat, 7 Dec 2013 02:19:13 +0000 (18:19 -0800)]
encoding: fix [encode|decode]_array_nohead

We want to actually encode each element and keep it, rather than
writing each one at the position after the array end!

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agocrush: add CrushTester accessors
Loic Dachary [Sun, 8 Dec 2013 18:39:16 +0000 (19:39 +0100)]
crush: add CrushTester accessors

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: output --show-bad-mappings on err
Loic Dachary [Sun, 8 Dec 2013 16:57:25 +0000 (17:57 +0100)]
crush: output --show-bad-mappings on err

Use err instead of stdout so that the output displays well when used in
conjunction with --show-statistics.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: fix map->choose_tries boundary test 922/head
Loic Dachary [Sun, 8 Dec 2013 13:38:59 +0000 (14:38 +0100)]
crush: fix map->choose_tries boundary test

CrushWrapper::start_choose_profile allocates map->choose_tries with
choose_total_tries elements. When crush_choose_firstn sets a value, it
tests against map->choose_local_tries which could lead to memory
corruption if map->choose_total_tries is smaller than
map->choose_local_tries.

Another undesirable but non-fatal side effect is that the output of crushtool
--show-choose-tries will be truncated to choose_local_tries, which is
set to a lower value than choose_total_tries by the default tunables.

Signed-off-by: Loic Dachary <loic@dachary.org>
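
The boundary test can be illustrated with a sketch in which the histogram is only updated when the index fits the allocated size. The function name and the sample tunable value are assumptions; in Python an out-of-range write would raise IndexError instead of corrupting memory, so the sketch shows the check rather than the failure.

```python
def record_try(histogram, ftotal):
    """Count one attempt, but only if the index fits the allocated histogram."""
    if ftotal < len(histogram):
        histogram[ftotal] += 1
        return True
    return False


choose_total_tries = 5  # assumed sample value, not a real default
histogram = [0] * choose_total_tries  # allocated with choose_total_tries slots
```

Checking against the size that was actually allocated keeps every write inside the array and lets the histogram cover all attempts up to choose_total_tries.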
11 years agoMerge pull request #869 from ceph/wip-crush
Sage Weil [Sun, 8 Dec 2013 04:59:22 +0000 (20:59 -0800)]
Merge pull request #869 from ceph/wip-crush

crush changes for erasure coding

Reviewed-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agolibrbd: remove unused private variable 918/head
Noah Watkins [Sat, 7 Dec 2013 17:58:43 +0000 (09:58 -0800)]
librbd: remove unused private variable

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agoTrackedOp: remove unused private variable
Noah Watkins [Sat, 7 Dec 2013 17:54:53 +0000 (09:54 -0800)]
TrackedOp: remove unused private variable

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agolibrbd: rename howmany to avoid conflict
Noah Watkins [Sat, 7 Dec 2013 17:59:13 +0000 (09:59 -0800)]
librbd: rename howmany to avoid conflict

A howmany macro exists on some platforms in standard headers, but there
really isn't any sort of standard that I've found. We just avoid the
conflict entirely this way.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agoMerge pull request #917 from ceph/port/compat
Sage Weil [Sat, 7 Dec 2013 22:01:14 +0000 (14:01 -0800)]
Merge pull request #917 from ceph/port/compat

compat: define replacement TEMP_FAILURE_RETRY

Reviewed-by: Sage Weil <sage@inktank.com>