git.apps.os.sepia.ceph.com Git - ceph.git/log
9 years ago - os/newstore: always create db.wal
Sage Weil [Mon, 16 Nov 2015 21:02:48 +0000 (16:02 -0500)]
os/newstore: always create db.wal

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: create db dir
Sage Weil [Mon, 16 Nov 2015 20:33:18 +0000 (15:33 -0500)]
os/newstore: create db dir

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: consume a raw block device
Sage Weil [Mon, 16 Nov 2015 18:35:37 +0000 (13:35 -0500)]
os/newstore: consume a raw block device

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: make collection_list tolerate sloppy start position
Sage Weil [Tue, 24 Nov 2015 19:12:05 +0000 (14:12 -0500)]
os/newstore: make collection_list tolerate sloppy start position

Because of this change (#6076), the hobject_t now contains the pool id, so
a ghobject_t built from such an hobject_t is no longer equal to ghobject_t().

In newstore, this causes an assertion failure:
FAILED assert(k >= start_key && k < end_key)

The fix makes newstore compatible with that change by creating the
ghobject_t with the pool id and shard id.
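
The failing assert is a half-open range check on keys. Below is a minimal sketch of tolerating a start position that sorts before the collection's first key, assuming keys are plain sortable byte strings. The names (`collection_list`, `coll_start`, `coll_end`) are hypothetical, and clamping the start is one plausible reading of "tolerate sloppy start position", not newstore's actual code.

```python
import bisect

def collection_list(sorted_keys, coll_start, coll_end, start, max_entries):
    """Return up to max_entries keys k with coll_start <= k < coll_end,
    beginning at `start`.

    Instead of asserting that `start` already lies inside
    [coll_start, coll_end), clamp a "sloppy" start (for example a default
    object id that sorts before every real key) to the collection start.
    """
    lo = max(start, coll_start)
    i = bisect.bisect_left(sorted_keys, lo)
    out = []
    while i < len(sorted_keys) and len(out) < max_entries:
        k = sorted_keys[i]
        if k >= coll_end:
            break
        # The invariant the original assert enforced now always holds.
        assert coll_start <= k < coll_end
        out.append(k)
        i += 1
    return out
```

For example, with keys `[b"a0", b"b1", b"b2", b"b3", b"c0"]` and a collection range of `[b"b", b"c")`, an empty start key lists `b1`, `b2` and `b3` instead of failing the assert.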

Fixes: #13801
Reported-by: Zhi Zhang <zhangz.david@outlook.com>
Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: make key names more efficient
Sage Weil [Wed, 21 Oct 2015 20:45:11 +0000 (16:45 -0400)]
os/newstore: make key names more efficient

- pack u32 and u64 in binary (instead of in hex)
- avoid duplicating the object name while still making things
  sort by (key, name): use < when key < name, = when key == name,
  and > when key > name, as a prefix.  In the = case (which is
  nearly always the case), include the name just once.

Note that this breaks on-disk compatibility.
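
A sketch of why binary packing works for sorted keys. The commit says only "binary"; the big-endian byte order used here is an assumption, chosen because it is the order in which a bytewise key comparison matches numeric comparison. The old hex form is assumed to be fixed-width and zero-padded. All function names are hypothetical.

```python
import struct

def encode_u64_hex(v):
    """Fixed-width hex form (assumed old style): 16 bytes per u64."""
    return b"%016x" % v

def encode_u64_bin(v):
    """Binary form: 8 bytes, big-endian so byte order matches numeric order."""
    return struct.pack(">Q", v)

def encode_u32_bin(v):
    """Binary form for a u32: 4 bytes, big-endian."""
    return struct.pack(">I", v)
```

Both forms sort correctly, but the binary one is half the size. Little-endian packing would be just as compact but would break ordering: `struct.pack("<Q", 256)` sorts before `struct.pack("<Q", 1)`.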

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: fix collection_list vs max entries
Sage Weil [Fri, 16 Oct 2015 16:41:50 +0000 (12:41 -0400)]
os/newstore: fix collection_list vs max entries

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: do not set/change frag_size if there are overlays
Sage Weil [Wed, 14 Oct 2015 12:41:39 +0000 (08:41 -0400)]
os/newstore: do not set/change frag_size if there are overlays

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: define a fid_backpointer_t type
Sage Weil [Tue, 6 Oct 2015 23:05:42 +0000 (19:05 -0400)]
os/newstore: define a fid_backpointer_t type

Signed-off-by: Sage Weil <sage@redhat.com>
fix wal_op_t

9 years ago - os/newstore: add newstore types to ceph-dencoder
Sage Weil [Tue, 6 Oct 2015 23:01:17 +0000 (19:01 -0400)]
os/newstore: add newstore types to ceph-dencoder

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: set alloc hint on new frags
Sage Weil [Tue, 6 Oct 2015 01:42:09 +0000 (21:42 -0400)]
os/newstore: set alloc hint on new frags

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: dump onode contents
Sage Weil [Tue, 6 Oct 2015 12:55:27 +0000 (08:55 -0400)]
os/newstore: dump onode contents

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: fixed fragment size
Sage Weil [Mon, 21 Sep 2015 01:56:50 +0000 (21:56 -0400)]
os/newstore: fixed fragment size

Instead of a single, variable-length fragment for each object,
set a fixed size (newstore_min_frag_size = 1 MB) and stripe the
object over these.  The last fragment will be smaller
than 1 MB if the object is not a multiple of 1 MB.
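
A sketch of the offset arithmetic that fixed-size striping implies. It assumes fragments are numbered from 0 and that fragment i covers bytes [i * frag_size, (i + 1) * frag_size) of the object. The function names are hypothetical, and the real newstore code may map offsets differently.

```python
FRAG_SIZE = 1 << 20  # newstore_min_frag_size = 1 MB, per the commit message

def frag_count(object_size, frag_size=FRAG_SIZE):
    """Number of fragments; the last one is short if size % frag_size != 0."""
    return (object_size + frag_size - 1) // frag_size

def frag_extents(offset, length, frag_size=FRAG_SIZE):
    """Split an object byte range into (frag_index, offset_in_frag, len) pieces."""
    out = []
    end = offset + length
    while offset < end:
        idx = offset // frag_size
        off_in = offset % frag_size
        n = min(frag_size - off_in, end - offset)
        out.append((idx, off_in, n))
        offset += n
    return out
```

Under these assumptions, a 20-byte overwrite straddling the 1 MB boundary touches only fragments 0 and 1, which is what makes replacing individual fragments on overwrite possible.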

On write, this is basically free: we can write and fsync 4 inodes
created together just as cheaply as one.  On overwrite, it allows us
to replace individual fragments and avoid write-ahead in many cases.

On read it is a bit slower because of inode lookups and disk
seeks.  In the common case (big object written sequentially) we
hope that fs prefetching will hide most of it (e.g., all inodes
will be loaded together in the same metadata btree node, and the
files' data is written sequentially on disk).

Allowing for a single large fragment in the case of a sequentially
written large object may save us a little, but it complicates
the code significantly.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - os/newstore: recycle rocksdb log files
Sage Weil [Mon, 9 Nov 2015 22:14:45 +0000 (17:14 -0500)]
os/newstore: recycle rocksdb log files

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - rocksdb: latest master
Sage Weil [Mon, 9 Nov 2015 22:13:57 +0000 (17:13 -0500)]
rocksdb: latest master

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6649 from majianpeng/filesstore-lfnunlink
Sage Weil [Fri, 1 Jan 2016 14:49:52 +0000 (09:49 -0500)]
Merge pull request #6649 from majianpeng/filesstore-lfnunlink

osd: FileStore:: optimize lfn_unlink

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #7017 from efirs/ef_atomic_ceph_tid
Sage Weil [Fri, 1 Jan 2016 14:48:10 +0000 (09:48 -0500)]
Merge pull request #7017 from efirs/ef_atomic_ceph_tid

osd: use atomic to generate ceph_tid

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #7077 from XinzeChi/wip-fix-wip-perf
Sage Weil [Fri, 1 Jan 2016 14:47:35 +0000 (09:47 -0500)]
Merge pull request #7077 from XinzeChi/wip-fix-wip-perf

osd: fix wip (l_osd_op_wip) perf counter and remove repop_map

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #6893 from kylinstorage/wip-osd_command
Sage Weil [Thu, 31 Dec 2015 14:33:51 +0000 (09:33 -0500)]
Merge pull request #6893 from kylinstorage/wip-osd_command

librados: add c++ style osd/pg command interface

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #5630 from wonzhq/evict-after-flush
Sage Weil [Thu, 31 Dec 2015 14:33:26 +0000 (09:33 -0500)]
Merge pull request #5630 from wonzhq/evict-after-flush

osd: try evicting after flushing is done

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6639 from xiexingguo/xxg-wip-13822
Sage Weil [Thu, 31 Dec 2015 14:33:05 +0000 (09:33 -0500)]
Merge pull request #6639 from xiexingguo/xxg-wip-13822

librados: potential null pointer access in list_(n)objects

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #6702 from liewegas/wip-fix-recency
Sage Weil [Thu, 31 Dec 2015 14:32:28 +0000 (09:32 -0500)]
Merge pull request #6702 from liewegas/wip-fix-recency

osd/ReplicatedPG: fix promotion recency logic

Reviewed-by: Samuel Just <sjust@redhat.com>
9 years ago - Merge pull request #6824 from Sandy4999/wip-crushtool-build
Sage Weil [Thu, 31 Dec 2015 14:32:01 +0000 (09:32 -0500)]
Merge pull request #6824 from Sandy4999/wip-crushtool-build

crushtool: set type 0 name "device" for --build option

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6962 from liewegas/wip-buffer-lastp
Sage Weil [Thu, 31 Dec 2015 14:30:55 +0000 (09:30 -0500)]
Merge pull request #6962 from liewegas/wip-buffer-lastp

buffer: fix internal iterator invalidation on rebuild, get_contiguous

Reviewed-by: Samuel Just <sjust@redhat.com>
9 years ago - Merge pull request #6970 from aiicore/drop_removal_pg_type
Sage Weil [Thu, 31 Dec 2015 14:30:12 +0000 (09:30 -0500)]
Merge pull request #6970 from aiicore/drop_removal_pg_type

osd: drop deprecated removal pg type

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - osd: remove repop_map in osd (7077/head)
Xinze Chi [Tue, 29 Dec 2015 14:48:55 +0000 (22:48 +0800)]
osd: remove repop_map in osd

If I am not misreading the code, repop_map is unused.

Signed-off-by: Xinze Chi <xinze@xsky.com>
9 years ago - osd: fix wip (l_osd_op_wip) perf counter
Xinze Chi [Tue, 29 Dec 2015 14:00:31 +0000 (22:00 +0800)]
osd: fix wip (l_osd_op_wip) perf counter

l_osd_op_wip is an OSD-wide counter, so it should be the sum over all PGs in the OSD.
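
The commit does not show the code change, so this is only a sketch of the invariant it states: an OSD-wide gauge that every PG adjusts by increments and decrements, so that it always equals the sum of the per-PG in-flight counts. `OsdWipCounter` and `PG` are hypothetical Python stand-ins, not Ceph's perf counter API.

```python
import threading

class OsdWipCounter:
    """Hypothetical OSD-wide gauge standing in for l_osd_op_wip."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def inc(self):
        with self._lock:
            self.value += 1

    def dec(self):
        with self._lock:
            self.value -= 1

class PG:
    """Each PG adjusts the shared gauge rather than overwriting it with its
    own in-flight count, so the gauge equals the sum over all PGs."""
    def __init__(self, counter):
        self.counter = counter
        self.in_flight = 0

    def start_op(self):
        self.in_flight += 1
        self.counter.inc()

    def finish_op(self):
        self.in_flight -= 1
        self.counter.dec()
```

If each PG instead stored its own count into the shared gauge, the gauge would reflect only whichever PG wrote last.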

Signed-off-by: Xinze Chi <xinze@xsky.com>
9 years ago - Merge pull request #6987 from H3C/wip-addr-bugfix
Kefu Chai [Mon, 28 Dec 2015 08:40:38 +0000 (16:40 +0800)]
Merge pull request #6987 from H3C/wip-addr-bugfix

common/address_help.cc: fix the leak in entity_addr_from_url()

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge remote-tracking branch 'origin/jewel' (6763/head)
Josh Durgin [Thu, 24 Dec 2015 00:32:00 +0000 (16:32 -0800)]
Merge remote-tracking branch 'origin/jewel'

9 years ago - common/address_help.cc: fix the leak in entity_addr_from_url() (6987/head)
qiankunzheng [Wed, 23 Dec 2015 22:29:26 +0000 (17:29 -0500)]
common/address_help.cc: fix the leak in entity_addr_from_url()

Fixes: #14132
Signed-off-by: Qiankun Zheng <zheng.qiankun@h3c.com>
9 years ago - Merge pull request #7026 from xdonghai/master
Josh Durgin [Wed, 23 Dec 2015 22:28:24 +0000 (14:28 -0800)]
Merge pull request #7026 from xdonghai/master

rbd: must specify both stripe-unit and stripe-count when specifying the striping v2 feature

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #6998 from xiexingguo/xxg-wip-clsrbd
Josh Durgin [Wed, 23 Dec 2015 22:15:32 +0000 (14:15 -0800)]
Merge pull request #6998 from xiexingguo/xxg-wip-clsrbd

stringify outputted error code and fix unmatched parentheses.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #6983 from xiexingguo/xxg-wip-14126
Josh Durgin [Wed, 23 Dec 2015 22:12:43 +0000 (14:12 -0800)]
Merge pull request #6983 from xiexingguo/xxg-wip-14126

librbd: small fixes for error messages and readahead counter

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #6298 from guangyy/wip-13441
Sage Weil [Wed, 23 Dec 2015 22:05:54 +0000 (17:05 -0500)]
Merge pull request #6298 from guangyy/wip-13441

osd: make list_missing query missing_loc.needs_recovery_map

Reviewed-by: Samuel Just <sjust@redhat.com>
9 years ago - Merge pull request #6572 from liewegas/wip-crush-chooseleaf-stable
Sage Weil [Wed, 23 Dec 2015 22:05:23 +0000 (17:05 -0500)]
Merge pull request #6572 from liewegas/wip-crush-chooseleaf-stable

crush: add chooseleaf_stable tunable

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - Merge pull request #7043 from dillaman/wip-14170-jewel
Josh Durgin [Wed, 23 Dec 2015 21:30:13 +0000 (13:30 -0800)]
Merge pull request #7043 from dillaman/wip-14170-jewel

librbd: do not ignore self-managed snapshot release result

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - librbd: do not ignore self-managed snapshot release result (7043/head)
Jason Dillaman [Wed, 23 Dec 2015 18:57:44 +0000 (13:57 -0500)]
librbd: do not ignore self-managed snapshot release result

Fixes: #14170
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - Merge pull request #7042 from dillaman/wip-14164-jewel
Josh Durgin [Wed, 23 Dec 2015 18:52:16 +0000 (10:52 -0800)]
Merge pull request #7042 from dillaman/wip-14164-jewel

librbd: properly handle replay of snap remove RPC message

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #7041 from dillaman/wip-14165
Josh Durgin [Wed, 23 Dec 2015 18:51:17 +0000 (10:51 -0800)]
Merge pull request #7041 from dillaman/wip-14165

qa/workunits: merge_diff shouldn't attempt to use striping

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #7040 from dillaman/wip-14092-jewel
Josh Durgin [Wed, 23 Dec 2015 18:48:43 +0000 (10:48 -0800)]
Merge pull request #7040 from dillaman/wip-14092-jewel

librbd: ensure librados callbacks are flushed prior to destroying

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #7036 from xiexingguo/xxg-wip-kvstore
David Zafman [Wed, 23 Dec 2015 18:47:01 +0000 (10:47 -0800)]
Merge pull request #7036 from xiexingguo/xxg-wip-kvstore

KeyValueStore: fix return code of mkfs

Reviewed-by: David Zafman <dzafman@redhat.com>
9 years ago - Merge pull request #7020 from chenyehua11692/master
Josh Durgin [Wed, 23 Dec 2015 18:45:27 +0000 (10:45 -0800)]
Merge pull request #7020 from chenyehua11692/master

doc: add "--allow-shrink" when decreasing the size of the rbd block device, to distinguish it from the increasing option

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #7035 from dillaman/wip-14122-jewel
Josh Durgin [Wed, 23 Dec 2015 18:43:29 +0000 (10:43 -0800)]
Merge pull request #7035 from dillaman/wip-14122-jewel

librbd: clear error when older OSD doesn't support image flags

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - librbd: properly handle replay of snap remove RPC message (7042/head)
Jason Dillaman [Wed, 23 Dec 2015 18:26:39 +0000 (13:26 -0500)]
librbd: properly handle replay of snap remove RPC message

Fixes: #14164
Backport: infernalis
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - qa/workunits: merge_diff shouldn't attempt to use striping v2 (7041/head)
Jason Dillaman [Wed, 23 Dec 2015 17:54:47 +0000 (12:54 -0500)]
qa/workunits: merge_diff shouldn't attempt to use striping v2

Fixes: #14165
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - librbd: ensure librados callbacks are flushed prior to destroying image (7040/head)
Jason Dillaman [Wed, 23 Dec 2015 17:06:50 +0000 (12:06 -0500)]
librbd: ensure librados callbacks are flushed prior to destroying image

Fixes: #14092
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - KeyValueStore: fix return code of mkfs (7036/head)
xiexingguo [Wed, 23 Dec 2015 14:49:59 +0000 (22:49 +0800)]
KeyValueStore: fix return code of mkfs

It should return a negative error code instead.
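
Ceph's C and C++ code conventionally returns 0 on success and a negative errno value on failure, so callers test `r < 0`. Below is a Python sketch of that convention. `mkfs` here is a hypothetical stand-in that only checks whether a directory exists; it is not the KeyValueStore implementation.

```python
import errno
import os

def mkfs(path):
    """Hypothetical stand-in for a store's mkfs(): 0 on success, -errno on failure."""
    if not os.path.isdir(path):
        # Negative, so callers that test `r < 0` see the failure and can
        # recover the errno with -r.
        return -errno.ENOENT
    return 0

def describe(r):
    """Turn a Ceph-style return code into a message."""
    return "success" if r >= 0 else os.strerror(-r)
```

Returning a positive errno instead would make a `r < 0` check treat the failure as success.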

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - librbd: clear error when older OSD doesn't support image flags (7035/head)
Jason Dillaman [Wed, 23 Dec 2015 14:41:32 +0000 (09:41 -0500)]
librbd: clear error when older OSD doesn't support image flags

Fixes: #14122
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - Merge pull request #6996 from H3C/wip-mds-f11322
John Spray [Wed, 23 Dec 2015 13:49:14 +0000 (13:49 +0000)]
Merge pull request #6996 from H3C/wip-mds-f11322

mds: we should wait for the messenger when MDSDaemon suicides

Reviewed-by: John Spray <john.spray@redhat.com>
9 years ago - Merge pull request #7015 from chipitsine/master
Kefu Chai [Wed, 23 Dec 2015 13:39:36 +0000 (21:39 +0800)]
Merge pull request #7015 from chipitsine/master

ceph-fuse: fix double free of args

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #7004 from jazeltq/master
Sage Weil [Wed, 23 Dec 2015 13:36:26 +0000 (08:36 -0500)]
Merge pull request #7004 from jazeltq/master

doc: fix typo

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6870 from mslovy/wip-pgmeta-object
Sage Weil [Wed, 23 Dec 2015 13:28:50 +0000 (08:28 -0500)]
Merge pull request #6870 from mslovy/wip-pgmeta-object

osd: os: skip checking pg_meta object existence in FileStore

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6744 from majianpeng/scrub-fix
Sage Weil [Wed, 23 Dec 2015 13:27:35 +0000 (08:27 -0500)]
Merge pull request #6744 from majianpeng/scrub-fix

osd: release related resources when scrub is interrupted

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #6961 from liewegas/wip-lockdep-max
Sage Weil [Wed, 23 Dec 2015 13:26:43 +0000 (08:26 -0500)]
Merge pull request #6961 from liewegas/wip-lockdep-max

common/lockdep: increase max lock names

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - mds: we should wait for the messenger when MDSDaemon suicides (6996/head)
Wei Feng [Mon, 21 Dec 2015 06:35:54 +0000 (01:35 -0500)]
mds: we should wait for the messenger when MDSDaemon suicides

Signed-off-by: Wei Feng <feng.wei@h3c.com>
9 years ago - Merge pull request #7025 from tchaikov/wip-ceph-detect-init-py3
Loic Dachary [Wed, 23 Dec 2015 10:59:58 +0000 (11:59 +0100)]
Merge pull request #7025 from tchaikov/wip-ceph-detect-init-py3

ceph-detect-init: fix py3 test

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - ceph-detect-init: fix py3 test (7025/head)
Kefu Chai [Wed, 23 Dec 2015 06:36:39 +0000 (14:36 +0800)]
ceph-detect-init: fix py3 test

* In Python 3, None cannot be compared with a str, and we should
  always set codename to a non-empty str, so update
  TestCephDetectInit.test_debian so that it always sets a codename,
  and assert(codename and distroname) in choose_init().
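
A short illustration of the Python 2 versus Python 3 difference behind this fix, plus a hypothetical helper that mirrors the assert the commit adds to choose_init(). `check_release` is illustrative only; the real ceph-detect-init code is not reproduced here.

```python
def orders_none_against_str():
    """True on Python 2 (None sorts before any str); False on Python 3,
    where ordering None against a str raises TypeError."""
    try:
        _ = None < "jessie"
        return True
    except TypeError:
        return False

def check_release(distroname, codename):
    """Hypothetical helper: fail early, with a clear message, if either
    value is missing or empty, instead of hitting a TypeError later."""
    assert codename and distroname, "distroname and codename must be non-empty"
    return distroname, codename
```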

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago - Merge pull request #7023 from tchaikov/wip-doc-cachemode
Loic Dachary [Wed, 23 Dec 2015 08:12:49 +0000 (09:12 +0100)]
Merge pull request #7023 from tchaikov/wip-doc-cachemode

doc: document "readforward" and "readproxy" cache mode

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - rbd: must specify both stripe-unit and stripe-count when specifying striping features (7026/head)
xudonghai [Wed, 23 Dec 2015 07:51:48 +0000 (15:51 +0800)]
rbd: must specify both stripe-unit and stripe-count when specifying striping features

When creating an rbd image with striping features, both stripe-unit and stripe-count must be specified as well; if not, refuse to execute and print a message.
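
The rule above amounts to a simple argument check. Below is a sketch with a hypothetical function name, where feature names are modeled as plain strings. It enforces only what the commit states (striping requires both values) and makes no claim about how rbd treats stripe options given without the feature.

```python
def striping_args_ok(features, stripe_unit=None, stripe_count=None):
    """Hypothetical check: if the striping feature is requested, both
    --stripe-unit and --stripe-count must be provided."""
    if "striping" in features:
        return stripe_unit is not None and stripe_count is not None
    return True
```

A caller would print an error and exit when this returns False, rather than creating the image with one value left unset.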

Signed-off-by: Donghai Xu <xu.donghai@h3c.com>

9 years ago - doc: document "readforward" and "readproxy" cache mode (7023/head)
Kefu Chai [Wed, 23 Dec 2015 05:43:15 +0000 (13:43 +0800)]
doc: document "readforward" and "readproxy" cache mode

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago - doc: add "--allow-shrink" when decreasing the size of the rbd block device, to distinguish... (7020/head)
Yehua [Wed, 23 Dec 2015 05:08:44 +0000 (13:08 +0800)]
doc: add "--allow-shrink" when decreasing the size of the rbd block device, to distinguish it from the increasing option

In the original file, increasing and decreasing the size of the rbd block device share the same command:
    "rbd resize --size 2048 foo".
However, this is not correct, because "--allow-shrink" must be added when decreasing the size of the rbd block device.
It is therefore necessary to distinguish between the two cases as follows:
    "rbd resize --size 2048 foo (to increase)"
    "rbd resize --size 2048 foo --allow-shrink (to decrease)"

Signed-off-by: Yehua <chen.yehua@h3c.com>
9 years ago - ceph doc: fix slip of the pen (7004/head)
litianqing [Tue, 22 Dec 2015 02:32:05 +0000 (10:32 +0800)]
ceph doc: fix slip of the pen

mon client hung interval -> mon client hunt interval

Signed-off-by: tianqing <tianqing@unitedstack.com>
9 years ago - ceph-fuse: fix double free of args (7015/head)
Ilya Shipitsin [Tue, 22 Dec 2015 09:02:36 +0000 (14:02 +0500)]
ceph-fuse: fix double free of args

Checking src/ceph_fuse.cc...
[src/ceph_fuse.cc:55]: (error) Memory pointed to by 'argv' is freed twice.
[src/ceph_fuse.cc:55]: (error) Deallocating a deallocated pointer: argv

Signed-off-by: Ilya Shipitsin <chipitsine@gmail.com>
9 years ago - Merge pull request #6988 from xiexingguo/xxg-wip-14134
David Zafman [Tue, 22 Dec 2015 18:02:37 +0000 (10:02 -0800)]
Merge pull request #6988 from xiexingguo/xxg-wip-14134

FileJournal: fix return code of create method

Reviewed-by: David Zafman <dzafman@redhat.com>
9 years ago - Merge pull request #7005 from YankunLi/patch-4
Orit Wasserman [Tue, 22 Dec 2015 15:26:13 +0000 (16:26 +0100)]
Merge pull request #7005 from YankunLi/patch-4

delete default zone

9 years ago - Merge pull request #7006 from YankunLi/patch-5
Orit Wasserman [Tue, 22 Dec 2015 15:18:52 +0000 (16:18 +0100)]
Merge pull request #7006 from YankunLi/patch-5

correct radosgw-admin command

9 years ago - Merge pull request #6997 from zhouyuan/evict_check_range
Sage Weil [Tue, 22 Dec 2015 14:01:17 +0000 (09:01 -0500)]
Merge pull request #6997 from zhouyuan/evict_check_range

osd: cache tier: add config option for eviction check list size

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6990 from linuxbox2/master-fixes
Kefu Chai [Tue, 22 Dec 2015 13:55:07 +0000 (21:55 +0800)]
Merge pull request #6990 from linuxbox2/master-fixes

Fixes some small issues

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - correct radosgw-admin command (7006/head)
YankunLi [Tue, 22 Dec 2015 05:56:50 +0000 (13:56 +0800)]
correct radosgw-admin command

The command to disable a user quota should be 'radosgw-admin quota disable --quota-scope=<user | bucket> --uid=<uid>'.

9 years ago - delete default zone (7005/head)
YankunLi [Tue, 22 Dec 2015 05:46:13 +0000 (13:46 +0800)]
delete default zone

If the default zone exists, delete it from both the east and west pools.

9 years ago - Merge remote-tracking branch 'origin/jewel'
Josh Durgin [Tue, 22 Dec 2015 00:59:51 +0000 (16:59 -0800)]
Merge remote-tracking branch 'origin/jewel'

9 years ago - Merge pull request #6926 from dachary/wip-14080-ceph-disk-udevadm
Sage Weil [Mon, 21 Dec 2015 17:58:32 +0000 (12:58 -0500)]
Merge pull request #6926 from dachary/wip-14080-ceph-disk-udevadm

ceph-disk: fix failures when preparing disks with udev > 214

On CentOS 7.1 and other operating systems with a version of udev greater than or equal to 214,
running ceph-disk prepare triggered unexpected removal and addition of partitions on
the disk being prepared. This caused problems ranging from the OSD not being activated
to failures because /dev/sdb1 did not exist although it should have.

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #7002 from dillaman/wip-14092-jewel
Josh Durgin [Mon, 21 Dec 2015 15:31:57 +0000 (07:31 -0800)]
Merge pull request #7002 from dillaman/wip-14092-jewel

tests: flush op work queue prior to destroying MockImageCtx

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago - Merge pull request #6986 from xiexingguo/xxg-wip-14129
Jason Dillaman [Mon, 21 Dec 2015 15:29:33 +0000 (10:29 -0500)]
Merge pull request #6986 from xiexingguo/xxg-wip-14129

librbd: fix snap_exists API return code overflow

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - test: update test cases with the new snap_exists API (6986/head)
xiexingguo [Sat, 19 Dec 2015 06:44:23 +0000 (14:44 +0800)]
test: update test cases with the new snap_exists API

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - Merge pull request #6938 from wuxiangwei/wip-wxw-rbdfuserename
Jason Dillaman [Mon, 21 Dec 2015 14:55:08 +0000 (09:55 -0500)]
Merge pull request #6938 from wuxiangwei/wip-wxw-rbdfuserename

rbd-fuse: implement mv operation

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - ECBackend: fix unmatched parentheses (6998/head)
xiexingguo [Mon, 21 Dec 2015 10:43:19 +0000 (18:43 +0800)]
ECBackend: fix unmatched parentheses

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - rbd: stringify outputted error code
xiexingguo [Mon, 21 Dec 2015 07:59:51 +0000 (15:59 +0800)]
rbd: stringify outputted error code

It is more human-readable.

Signed-off-by: xie.xingguo <xie.xingguo@zte.com.cn>
9 years ago - tests: flush op work queue prior to destroying MockImageCtx (7002/head)
Jason Dillaman [Mon, 21 Dec 2015 14:03:15 +0000 (09:03 -0500)]
tests: flush op work queue prior to destroying MockImageCtx

Fixes: #14092
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - ceph-disk: protect deactivate with activate lock (6926/head)
Loic Dachary [Fri, 18 Dec 2015 23:53:03 +0000 (00:53 +0100)]
ceph-disk: protect deactivate with activate lock

When ceph-disk prepares the disk, it triggers udev events, and each of
them runs ceph-disk activate. If systemctl stop ceph-osd@2 happens while
ceph-disk activate runs are still in flight, the systemctl stop may be
cancelled by the systemctl enable issued by one of the pending ceph-disk
activate runs.

This only matters in a test environment where disks are destroyed
shortly after they are activated.
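
A minimal sketch of serializing deactivate against activate with an exclusive lock on a shared lock file. The lock path, `activate`, `deactivate`, and the injected start/stop callables are hypothetical, and `fcntl.flock` is used for illustration; ceph-disk's own lock helper may be implemented differently.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path; a real tool would use a fixed, shared location.
ACTIVATE_LOCK = os.path.join(tempfile.gettempdir(), "example-activate.lock")

@contextmanager
def file_lock(path):
    """Hold an exclusive flock(2) on `path` for the duration of the block.

    All processes that take the same lock file are serialized, so a
    deactivate cannot interleave with an activate that is still running.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the current holder releases
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

def activate(dev, start_osd):
    """Hypothetical activate step, run under the lock."""
    with file_lock(ACTIVATE_LOCK):
        start_osd(dev)

def deactivate(osd_id, stop_osd):
    """Take the same lock before stopping the OSD."""
    with file_lock(ACTIVATE_LOCK):
        stop_osd(osd_id)
```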

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: use blkid instead of sgdisk -i
Loic Dachary [Fri, 18 Dec 2015 16:03:21 +0000 (17:03 +0100)]
ceph-disk: use blkid instead of sgdisk -i

sgdisk -i 1 /dev/vdb opens /dev/vdb in write mode, which indirectly
triggers a BLKRRPART ioctl from udev (version 214 and later) when
the device is closed (see below for the udev release note). The
implementation of this ioctl by the kernel (even old kernels) removes
all partitions and adds them again (similar to what partprobe does
explicitly).

The side effects of partitions disappearing while ceph-disk is running
are devastating.

sgdisk is replaced by blkid which only opens the device in read mode and
will not trigger this unexpected behavior.

The problem does not show on Ubuntu 14.04 because it is running udev <
214 but shows on CentOS 7 which is running udev > 214.
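
A sketch of reading partition metadata through blkid instead of `sgdisk -i`. It assumes `blkid -o udev -p DEV` prints `KEY=value` lines such as `ID_PART_ENTRY_TYPE=...` for a GPT partition. The helper names are hypothetical, and parsing is kept separate from the subprocess call so that it can be exercised without a disk.

```python
import subprocess

def parse_blkid_udev(output):
    """Parse `KEY=value` lines from `blkid -o udev` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def partition_type(dev):
    """Return the partition type GUID of `dev`, or None if blkid reports none.

    blkid opens the device read-only, so closing it does not make
    udev >= 214 rescan the partition table, unlike a write-mode open.
    """
    out = subprocess.check_output(["blkid", "-o", "udev", "-p", dev],
                                  universal_newlines=True)
    return parse_blkid_udev(out).get("ID_PART_ENTRY_TYPE")
```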

git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:

        * As an experimental feature, udev now tries to lock the
          disk device node (flock(LOCK_SH|LOCK_NB)) while it
          executes events for the disk or any of its partitions.
          Applications like partitioning programs can lock the
          disk device node (flock(LOCK_EX)) and claim temporary
          device ownership that way; udev will entirely skip all event
          handling for this disk and its partitions. If the disk
          was opened for writing, the close will trigger a partition
          table rescan in udev's "watch" facility, and if needed
          synthesize "change" events for the disk and all its partitions.
          This is now unconditionally enabled, and if it turns out to
          cause major problems, we might turn it on only for specific
          devices, or might need to disable it entirely. Device Mapper
          devices are excluded from this logic.
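
The locking protocol in this release note can be illustrated with flock(2). Below is a sketch with hypothetical function names: `udev_would_skip` mimics udev's non-blocking shared-lock probe, and `claim_device` takes the exclusive lock that a partitioning program would hold. The device is opened read-only in both, because, per the note, only a close after a write-mode open triggers the rescan.

```python
import fcntl
import os

def udev_would_skip(dev_path):
    """Mimic udev's probe: try flock(LOCK_SH|LOCK_NB) on the disk node.
    If another open file holds LOCK_EX, the probe fails and udev skips
    event handling for the disk and its partitions."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    finally:
        os.close(fd)  # closing also drops the shared lock if it was taken
    return False

def claim_device(dev_path):
    """Claim temporary ownership the way a partitioning program might:
    take LOCK_EX and keep the returned fd open while working."""
    fd = os.open(dev_path, os.O_RDONLY)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd
```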

http://tracker.ceph.com/issues/14094 Fixes: #14094

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: dereference symlinks in destroy and zap
Loic Dachary [Wed, 16 Dec 2015 14:57:03 +0000 (15:57 +0100)]
ceph-disk: dereference symlinks in destroy and zap

The behavior of partprobe or sgdisk may be subtly different if given a
symbolic link to a device instead of an actual device. The debug output
is also more confusing when the symlink shows instead of the device it
points to.

Always dereference the symlink before running destroy and zap.
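
A sketch of dereferencing the device path before handing it to a tool. `os.path.realpath` resolves every symlink in the path (for example a `/dev/disk/by-id/...` link). The `zap` helper and its injected runner are hypothetical, and the `sgdisk --zap-all --` invocation is shown for illustration only.

```python
import os

def resolve_device(dev):
    """Follow symlinks such as /dev/disk/by-id/... to the real device node."""
    return os.path.realpath(dev)

def zap(dev, run):
    """Hypothetical zap step: dereference first, then pass the real path to
    the command runner so that tools and debug output see the actual device."""
    real = resolve_device(dev)
    run(["sgdisk", "--zap-all", "--", real])
    return real
```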

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: increase partprobe / udevadm settle timeouts
Loic Dachary [Wed, 16 Dec 2015 11:33:25 +0000 (12:33 +0100)]
ceph-disk: increase partprobe / udevadm settle timeouts

The default of 120 seconds may be exceeded when the disk is very slow,
which can happen in cloud environments. Increase it to 600 seconds
instead.

The partprobe command may fail for the same reason, but it does not have
a timeout parameter. Instead, try a few times before failing.

The udevadm settle calls guarding partprobe are not strictly necessary,
because partprobe already does the same. However, partprobe does not
provide a way to control the timeout. Running one udevadm settle after
another is a no-op most of the time and does not add any delay. It
matters when the udevadm settle run by partprobe fails with a timeout,
because partprobe silently ignores the failure.
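
A sketch of the approach described above: explicit `udevadm settle --timeout=600` calls around partprobe, with partprobe retried because it has no timeout option of its own. The commit says only "a few times", so the retry count and delay here are assumptions, and the function names are hypothetical. The command runner is injectable for testing.

```python
import subprocess
import time

SETTLE_TIMEOUT = 600  # seconds; the new value from the commit (was 120)

def run_with_retries(cmd, run=subprocess.check_call, attempts=5, delay=1.0):
    """Retry a command that has no timeout option of its own (e.g. partprobe)."""
    for i in range(attempts):
        try:
            return run(cmd)
        except subprocess.CalledProcessError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

def update_partitions(dev, run=subprocess.check_call, attempts=5, delay=1.0):
    # The explicit settle calls surface a timeout that partprobe's own
    # settle would silently ignore.
    settle = ["udevadm", "settle", "--timeout=%d" % SETTLE_TIMEOUT]
    run(settle)
    run_with_retries(["partprobe", dev], run, attempts, delay)
    run(settle)
```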

http://tracker.ceph.com/issues/14080 Fixes: #14080

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - tests: ceph-disk workunit increase verbosity
Loic Dachary [Wed, 16 Dec 2015 11:36:47 +0000 (12:36 +0100)]
tests: ceph-disk workunit increase verbosity

So that reading the teuthology log is enough in most cases to figure out
the cause of the error.

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: fix typo
Loic Dachary [Wed, 16 Dec 2015 11:31:03 +0000 (12:31 +0100)]
ceph-disk: fix typo

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: log parted output
Loic Dachary [Wed, 16 Dec 2015 11:30:20 +0000 (12:30 +0100)]
ceph-disk: log parted output

Should parted output fail to parse, it is useful to get the full output
when running in verbose mode.

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - ceph-disk: do not discard stderr
Loic Dachary [Wed, 16 Dec 2015 11:29:17 +0000 (12:29 +0100)]
ceph-disk: do not discard stderr

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago - cache-tier: allow configuring the eviction check max size (6997/head)
Yuan Zhou [Mon, 21 Dec 2015 07:30:44 +0000 (15:30 +0800)]
cache-tier: allow configuring the eviction check max size

This patch adds an option for the eviction check size in the cache tier.
On a busy setup, it is better to check a larger number of objects so
that eviction is faster.

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
9 years ago - Dispatcher.h: include assert.h (6990/head)
Matt Benjamin [Sun, 20 Dec 2015 18:32:13 +0000 (13:32 -0500)]
Dispatcher.h: include assert.h

The inline ms_fast_dispatch implementation calls assert, which
puts assert.h in the interface.

The fact that many Dispatcher descendants don't implement
ms_fast_dispatch prevents making the method pure virtual, which
suggests that there may be a need for a FastDispatcher interface
that inherits Dispatcher and introduces ms_fast_dispatch and
related methods.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
9 years ago - xio: avoid conversion warning w/xio_queue_depth
Matt Benjamin [Sun, 20 Dec 2015 18:14:41 +0000 (13:14 -0500)]
xio: avoid conversion warning w/xio_queue_depth

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
9 years ago - xio: remove static declspec on buffer::create_msg
Matt Benjamin [Sun, 20 Dec 2015 17:38:14 +0000 (12:38 -0500)]
xio: remove static declspec on buffer::create_msg

Likely an update missed from PR #6686.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
9 years ago - FileJournal: fix return code of create method (6988/head)
xiexingguo [Sun, 20 Dec 2015 13:41:46 +0000 (21:41 +0800)]
FileJournal: fix return code of create method

It should return a negative error code instead.

Fixes: #14134
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - librbd: fix wrongly reported error code (6983/head)
xiexingguo [Sun, 20 Dec 2015 12:44:03 +0000 (20:44 +0800)]
librbd: fix wrongly reported error code

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - Test: add tests for corrupted list/nlist process (6639/head)
xiexingguo [Sun, 20 Dec 2015 09:50:26 +0000 (17:50 +0800)]
Test: add tests for corrupted list/nlist process

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - Objecter: potential null pointer access in list_(n)objects.
xiexingguo [Wed, 18 Nov 2015 09:57:17 +0000 (17:57 +0800)]
Objecter: potential null pointer access in list_(n)objects.

In list_objects and list_nobjects, we may access a null pointer returned by the osdmap->get_pg_pool() call.
Fixes: #13822
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - rbd-fuse: discard space restriction for mv operation (6938/head)
wuxiangwei [Sun, 20 Dec 2015 09:14:25 +0000 (04:14 -0500)]
rbd-fuse: discard space restriction for mv operation

Discard the space restriction on the destination image name of
the mv operation.

Signed-off-by: Xiangwei Wu <wuxiangwei@h3c.com>
9 years ago - librbd: string standard error number
xiexingguo [Sat, 19 Dec 2015 04:21:00 +0000 (12:21 +0800)]
librbd: string standard error number

It's more human-readable.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - librbd: fix wrong tip message
xiexingguo [Sat, 19 Dec 2015 03:23:19 +0000 (11:23 +0800)]
librbd: fix wrong tip message

An equal size is acceptable for copy.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - tools: replace snap_exists with a new safer version
xiexingguo [Sat, 19 Dec 2015 06:53:47 +0000 (14:53 +0800)]
tools: replace snap_exists with a new safer version

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - librbd: fix snap_exists API overflow issue
xiexingguo [Sun, 20 Dec 2015 08:19:59 +0000 (16:19 +0800)]
librbd: fix snap_exists API overflow issue

The original API may overflow and is therefore not safe.

Fixes: #14129
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
9 years ago - librbd: fix readahead counter update logic
xiexingguo [Sat, 19 Dec 2015 02:29:59 +0000 (10:29 +0800)]
librbd: fix readahead counter update logic

Fixes: #14127
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>