git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
9 years agoMerge pull request #8010 from dillaman/wip-15032-infernalis
Josh Durgin [Thu, 10 Mar 2016 02:57:32 +0000 (18:57 -0800)]
Merge pull request #8010 from dillaman/wip-15032-infernalis

librbd: possible QEMU deadlock after creating image snapshots

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agolibrbd: complete cache reads on cache's dedicated thread 8010/head
Jason Dillaman [Wed, 9 Mar 2016 23:00:04 +0000 (18:00 -0500)]
librbd: complete cache reads on cache's dedicated thread

If a snapshot is created out-of-band, the next IO will result in the
cache being flushed.  If pending writeback data performs a copy-on-write,
the read from the parent will be blocked.

Fixes: #15032
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years agotest: reproducer for writeback CoW deadlock
Jason Dillaman [Wed, 9 Mar 2016 22:31:06 +0000 (17:31 -0500)]
test: reproducer for writeback CoW deadlock

Refs: #14988

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 16b6efdd24b25ba1f6bc658681afa3d0878eb397)

9 years agoMerge remote-tracking branch 'gh/infernalis' into infernalis
Alfredo Deza [Fri, 26 Feb 2016 13:23:58 +0000 (08:23 -0500)]
Merge remote-tracking branch 'gh/infernalis' into infernalis

9 years agoMerge pull request #7423 from Abhishekvrshny/wip-14324-infernalis
Loic Dachary [Thu, 25 Feb 2016 04:08:07 +0000 (11:08 +0700)]
Merge pull request #7423 from Abhishekvrshny/wip-14324-infernalis

infernalis: rgw: radosgw-admin bucket check --fix not work

Reviewed-by: Yehuda Sadeh <ysadehwe@redhat.com>
9 years agoMerge pull request #7424 from Abhishekvrshny/wip-13887-infernalis
Loic Dachary [Thu, 25 Feb 2016 04:07:44 +0000 (11:07 +0700)]
Merge pull request #7424 from Abhishekvrshny/wip-13887-infernalis

infernalis: rgw: orphans finish segfaults

Reviewed-by: Yehuda Sadeh <ysadehwe@redhat.com>
9 years ago9.2.1 v9.2.1
Jenkins Build Slave User [Wed, 24 Feb 2016 22:07:26 +0000 (22:07 +0000)]
9.2.1

9 years agoMerge pull request #7484 from dillaman/wip-14610-infernalis
Loic Dachary [Thu, 11 Feb 2016 15:32:53 +0000 (22:32 +0700)]
Merge pull request #7484 from dillaman/wip-14610-infernalis

librbd: flattening an rbd image with active IO can lead to hang

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #7406 from dillaman/wip-14542-infernalis
Loic Dachary [Thu, 11 Feb 2016 15:32:41 +0000 (22:32 +0700)]
Merge pull request #7406 from dillaman/wip-14542-infernalis

librbd: ImageWatcher shouldn't block the notification thread

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #6981 from dillaman/wip-14062-infernalis
Loic Dachary [Thu, 11 Feb 2016 15:32:25 +0000 (22:32 +0700)]
Merge pull request #6981 from dillaman/wip-14062-infernalis

librbd: fix merge-diff for >2GB diff-files

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agolibrbd: ensure librados callbacks are flushed prior to destroying image 7484/head
Jason Dillaman [Wed, 23 Dec 2015 17:06:50 +0000 (12:06 -0500)]
librbd: ensure librados callbacks are flushed prior to destroying image

Fixes: #14092
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 98157ab3274bd960e4487e34f5a83e9c921a6ac8)

9 years agolibrbd: simplify IO flush handling
Jason Dillaman [Fri, 31 Jul 2015 02:31:55 +0000 (22:31 -0400)]
librbd: simplify IO flush handling

Add a new convenience method to ImageCtx for handling flush
requests, and clean up flush handling when dealing with the cache.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(based on commit ee7c6f73992d3b09c6b401fbb782b2151f2399c7)

9 years agoWorkQueue: PointerWQ drain no longer waits for other queues
Jason Dillaman [Fri, 14 Aug 2015 17:28:13 +0000 (13:28 -0400)]
WorkQueue: PointerWQ drain no longer waits for other queues

If another (independent) queue was processing, drain could
block waiting.  Instead, allow drain to exit quickly if
no items are being processed and the queue is empty for
the current WQ.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit b118d7df1e34387b6e5649a5b205cf061598d0d4)
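
The drain behaviour described above can be pictured with a small Python model. This is only a sketch: the class and method names are hypothetical and are not Ceph's C++ WorkQueue API. The point it illustrates is that drain looks only at its own queue's pending and in-flight items, so an unrelated busy queue cannot delay it.

```python
import threading
from collections import deque


class PointerQueue:
    """Toy model of one work queue that shares a thread pool with others.

    Hypothetical names; this is not Ceph's C++ PointerWQ.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()
        self._processing = 0  # items from THIS queue currently being handled

    def queue(self, item):
        with self._cond:
            self._items.append(item)

    def process_one(self, handler):
        """Called by a pool worker: handle one item if any is queued."""
        with self._cond:
            if not self._items:
                return False
            item = self._items.popleft()
            self._processing += 1
        try:
            handler(item)
        finally:
            with self._cond:
                self._processing -= 1
                self._cond.notify_all()
        return True

    def drain(self, timeout=None):
        """Wait only on this queue's own state, never on other queues."""
        with self._cond:
            return self._cond.wait_for(
                lambda: not self._items and self._processing == 0, timeout
            )
```

With two such queues, draining the idle one returns immediately even while the other still has work queued.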

9 years agotest: new librbd flatten test case
Jason Dillaman [Tue, 2 Feb 2016 15:54:53 +0000 (10:54 -0500)]
test: new librbd flatten test case

AIO operations after a flatten operation were previously
hanging during the close of the parent image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5b3a4d2cbca51e5c6795ba7d1189920c7d9af806)

9 years agolibrbd: ImageWatcher shouldn't block the notification thread 7406/head
Jason Dillaman [Thu, 28 Jan 2016 19:38:20 +0000 (14:38 -0500)]
librbd: ImageWatcher shouldn't block the notification thread

Blocking the notification thread will also result in librados async
callbacks becoming blocked (since they use the same thread).

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 6f94bde44500cc4592ac9a842cbb150b8cabf96b)

Conflicts:
    src/librbd/ImageWatcher.[cc|h]: fewer RPC messages and synchronous
                                    snapshot actions
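
The effect of blocking a shared callback thread can be reproduced with a toy Python model. This is a sketch under a stated assumption: a single-worker executor stands in for the one thread that delivers both notifications and AIO completion callbacks. None of the names below come from librbd or librados.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker stands in for the single thread that delivers both
# notifications and AIO completion callbacks (an assumption of this model).
callback_thread = ThreadPoolExecutor(max_workers=1)

release = threading.Event()        # lets the blocked handler finish
callback_ran = threading.Event()   # set when the AIO callback runs


def blocking_notify_handler():
    # A handler that waits for something else, as a blocking watcher would.
    release.wait(timeout=5)


def aio_callback():
    callback_ran.set()


callback_thread.submit(blocking_notify_handler)
callback_thread.submit(aio_callback)

# While the handler blocks, the queued callback cannot be delivered.
stalled = not callback_ran.wait(timeout=0.2)

release.set()                      # unblock the handler
delivered = callback_ran.wait(timeout=5)
callback_thread.shutdown()
```

If the handler were itself waiting for the queued callback, neither could make progress, which is the deadlock shape the commit avoids.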

9 years agolibrados_test_stub: watch/notify now behaves similar to librados
Jason Dillaman [Thu, 28 Jan 2016 19:35:54 +0000 (14:35 -0500)]
librados_test_stub: watch/notify now behaves similar to librados

Notifications are executed via the same librados AIO callback
thread, so it's now possible to catch deadlock.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0a3822f1559ba3fe3def6a65883b9c6c7c5a33fe)

9 years agotests: simulate writeback flush during snap create
Jason Dillaman [Thu, 28 Jan 2016 17:40:18 +0000 (12:40 -0500)]
tests: simulate writeback flush during snap create

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5e564ea9f869b987f3ada2465edfbe5edf9f6435)

9 years agolibrbd: fix merge-diff for >2GB diff-files 6981/head
Jason Dillaman [Fri, 18 Dec 2015 20:22:13 +0000 (15:22 -0500)]
librbd: fix merge-diff for >2GB diff-files

Fixes: #14062
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(derived from commit 68125dd01349edf93cfa1af5028c2d438b5ae089)

9 years agoMerge pull request #6629 from Abhishekvrshny/wip-13733-infernalis
Loic Dachary [Thu, 11 Feb 2016 06:59:30 +0000 (13:59 +0700)]
Merge pull request #6629 from Abhishekvrshny/wip-13733-infernalis

rbd: misdirected op in rbd balance-reads test

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #7431 from Abhishekvrshny/wip-14067-infernalis
Loic Dachary [Wed, 10 Feb 2016 06:00:23 +0000 (13:00 +0700)]
Merge pull request #7431 from Abhishekvrshny/wip-14067-infernalis

infernalis : Ceph file system is not freeing space

Reviewed-by: Yan, Zheng <zyan@redhat.com>
9 years agoMerge pull request #7429 from Abhishekvrshny/wip-14490-infernalis
Loic Dachary [Wed, 10 Feb 2016 05:59:30 +0000 (12:59 +0700)]
Merge pull request #7429 from Abhishekvrshny/wip-14490-infernalis

infernalis: fsx failed to compile

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6853 from Abhishekvrshny/wip-13889-infernalis
Loic Dachary [Wed, 10 Feb 2016 05:57:48 +0000 (12:57 +0700)]
Merge pull request #6853 from Abhishekvrshny/wip-13889-infernalis

infernalis: Segmentation fault accessing file using fuse mount

Reviewed-by: Yan, Zheng <zyan@redhat.com>
9 years agoMerge pull request #6752 from ukernel/infernalis-11482
Loic Dachary [Wed, 10 Feb 2016 05:57:09 +0000 (12:57 +0700)]
Merge pull request #6752 from ukernel/infernalis-11482

mds: fix client capabilities during reconnect (client.XXXX isn't responding to mclientcaps warning)

Reviewed-by: Yan, Zheng <zyan@redhat.com>
9 years agoMerge pull request #6628 from Abhishekvrshny/wip-13792-infernalis
Loic Dachary [Wed, 10 Feb 2016 05:53:34 +0000 (12:53 +0700)]
Merge pull request #6628 from Abhishekvrshny/wip-13792-infernalis

rbd-replay-* moved from ceph-test-dbg to ceph-common-dbg as well

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #7079 from Abhishekvrshny/wip-14199-infernalis
Loic Dachary [Tue, 9 Feb 2016 04:58:19 +0000 (11:58 +0700)]
Merge pull request #7079 from Abhishekvrshny/wip-14199-infernalis

infernalis: [ FAILED ] TestLibRBD.SnapRemoveViaLockOwner

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #7080 from Abhishekvrshny/wip-14142-infernalis
Loic Dachary [Tue, 9 Feb 2016 04:58:01 +0000 (11:58 +0700)]
Merge pull request #7080 from Abhishekvrshny/wip-14142-infernalis

infernalis: Verify self-managed snapshot functionality on image create

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #7428 from Abhishekvrshny/wip-14321-infernalis
Loic Dachary [Tue, 9 Feb 2016 04:57:37 +0000 (11:57 +0700)]
Merge pull request #7428 from Abhishekvrshny/wip-14321-infernalis

infernalis: cls_rbd: object_map_save should enable checksums

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agoMerge pull request #7427 from Abhishekvrshny/wip-14465-infernalis
Loic Dachary [Mon, 8 Feb 2016 15:23:18 +0000 (22:23 +0700)]
Merge pull request #7427 from Abhishekvrshny/wip-14465-infernalis

infernalis: rbd-replay does not check for EOF and goes to endless loop

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
9 years agoMerge pull request #7426 from Abhishekvrshny/wip-14552-infernalis
Loic Dachary [Mon, 8 Feb 2016 15:23:01 +0000 (22:23 +0700)]
Merge pull request #7426 from Abhishekvrshny/wip-14552-infernalis

infernalis: rbd: TaskFinisher::cancel should remove event from SafeTimer

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
9 years agoMerge pull request #6397 from SUSE/wip-13615-infernalis
Sage Weil [Mon, 8 Feb 2016 13:49:41 +0000 (08:49 -0500)]
Merge pull request #6397 from SUSE/wip-13615-infernalis

OSD::build_past_intervals_parallel() shall reset primary and up_primary when begin a new past_interval.

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6840 from SUSE/wip-13791-infernalis
Sage Weil [Mon, 8 Feb 2016 13:49:17 +0000 (08:49 -0500)]
Merge pull request #6840 from SUSE/wip-13791-infernalis

Objecter: potential null pointer access when do pool_snap_list.

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6851 from Abhishekvrshny/wip-14018-infernalis
Sage Weil [Mon, 8 Feb 2016 13:48:49 +0000 (08:48 -0500)]
Merge pull request #6851 from Abhishekvrshny/wip-14018-infernalis

infernalis: osd/PG.cc: 288: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6849 from Abhishekvrshny/wip-13979-infernalis
Sage Weil [Mon, 8 Feb 2016 13:48:25 +0000 (08:48 -0500)]
Merge pull request #6849 from Abhishekvrshny/wip-13979-infernalis

osd: call on_new_interval on newly split child PG

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6907 from Abhishekvrshny/wip-13929-infernalis
Sage Weil [Mon, 8 Feb 2016 13:48:03 +0000 (08:48 -0500)]
Merge pull request #6907 from Abhishekvrshny/wip-13929-infernalis

infernalis: Ceph Pools' MAX AVAIL is 0 if some OSDs' weight is 0

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #7421 from Abhishekvrshny/wip-14494-infernalis
Sage Weil [Mon, 8 Feb 2016 13:47:36 +0000 (08:47 -0500)]
Merge pull request #7421 from Abhishekvrshny/wip-14494-infernalis

infernalis: pgs stuck inconsistent after infernalis upgrade

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6627 from Abhishekvrshny/wip-13771-infernalis
Sage Weil [Mon, 8 Feb 2016 13:46:25 +0000 (08:46 -0500)]
Merge pull request #6627 from Abhishekvrshny/wip-13771-infernalis

Objecter: pool op callback may hang forever.

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #7543 from SUSE/wip-14676-infernalis
Loic Dachary [Mon, 8 Feb 2016 11:18:07 +0000 (18:18 +0700)]
Merge pull request #7543 from SUSE/wip-14676-infernalis

infernalis: rgw: radosgw-admin --help doesn't show the orphans find command

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6993 from badone/wip-13993-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:22:28 +0000 (11:22 +0700)]
Merge pull request #6993 from badone/wip-13993-infernalis

log: Log.cc: Assign LOG_DEBUG priority to syslog calls

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6882 from dachary/wip-13988-reuse-osd-id-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:21:11 +0000 (11:21 +0700)]
Merge pull request #6882 from dachary/wip-13988-reuse-osd-id-infernalis

tests: verify it is possible to reuse an OSD id

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6852 from Abhishekvrshny/wip-14013-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:18:35 +0000 (11:18 +0700)]
Merge pull request #6852 from Abhishekvrshny/wip-14013-infernalis

infernalis: systemd/ceph-disk@.service assumes /bin/flock

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6846 from Abhishekvrshny/wip-13638-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:16:23 +0000 (11:16 +0700)]
Merge pull request #6846 from Abhishekvrshny/wip-13638-infernalis

FileStore: potential memory leak if getattrs fails.

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6836 from SUSE/wip-13891-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:14:18 +0000 (11:14 +0700)]
Merge pull request #6836 from SUSE/wip-13891-infernalis

infernalis: auth/cephx: large amounts of log are produced by osd

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6833 from SUSE/wip-13935-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:12:47 +0000 (11:12 +0700)]
Merge pull request #6833 from SUSE/wip-13935-infernalis

infernalis: Ceph daemon failed to start, because the service name was already used.

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6694 from xiexingguo/xxg-wip-13869
Loic Dachary [Mon, 8 Feb 2016 04:12:00 +0000 (11:12 +0700)]
Merge pull request #6694 from xiexingguo/xxg-wip-13869

osd: fix race condition during send_failures

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years agoMerge pull request #6626 from Abhishekvrshny/wip-13655-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:09:30 +0000 (11:09 +0700)]
Merge pull request #6626 from Abhishekvrshny/wip-13655-infernalis

crush: crash if we see CRUSH_ITEM_NONE in early rule step

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agoMerge pull request #6449 from dachary/wip-13671-infernalis
Loic Dachary [Mon, 8 Feb 2016 04:06:41 +0000 (11:06 +0700)]
Merge pull request #6449 from dachary/wip-13671-infernalis

tests: testprofile must be removed before it is re-created

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agorgw-admin: document orphans commands in usage 7543/head
Yehuda Sadeh [Tue, 2 Feb 2016 00:33:55 +0000 (16:33 -0800)]
rgw-admin: document orphans commands in usage

Fixes: #14516
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 105a76bf542e05b739d5a03ca8ae55432350f107)

9 years agoMerge pull request #6880 from dachary/wip-14044-infernalis
Sage Weil [Thu, 4 Feb 2016 21:23:51 +0000 (16:23 -0500)]
Merge pull request #6880 from dachary/wip-14044-infernalis

infernalis: ceph-disk list fails on /dev/cciss!c0d0

9 years agoMerge pull request #6392 from SUSE/wip-13589-infernalis
Sage Weil [Fri, 29 Jan 2016 14:05:14 +0000 (09:05 -0500)]
Merge pull request #6392 from SUSE/wip-13589-infernalis

mon: should not set isvalid = true when cephx_verify_authorizer retur…

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agoMerge pull request #6500 from SUSE/wip-13678-infernalis
Sage Weil [Fri, 29 Jan 2016 13:55:45 +0000 (08:55 -0500)]
Merge pull request #6500 from SUSE/wip-13678-infernalis

systemd: no rbdmap systemd unit file

9 years agomds: properly set STATE_STRAY/STATE_ORPHAN for stray dentry/inode 7431/head
Yan, Zheng [Thu, 12 Nov 2015 13:57:27 +0000 (21:57 +0800)]
mds: properly set STATE_STRAY/STATE_ORPHAN for stray dentry/inode

Fixes: #13777
Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 460c74a0b872336a7279f0b40b17ed672b6e15a1)

9 years agomon: don't require OSD W for MRemoveSnaps
John Spray [Mon, 16 Nov 2015 10:57:56 +0000 (10:57 +0000)]
mon: don't require OSD W for MRemoveSnaps

Use ability to execute "osd pool rmsnap" command
as a signal that the client should be permitted
to send MRemoveSnaps too.

Note that we don't also require the W ability,
unlike Monitor::_allowed_command -- this is slightly
more permissive handling, but anyone crafting caps
that explicitly permit "osd pool rmsnap" needs to
know what they are doing.

Fixes: #13777
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 0b474c52abd3d528c041544f73b1d27d7d1b1320)

9 years agofsx: checkout old version until it compiles properly on miras 7429/head
Greg Farnum [Wed, 13 Jan 2016 21:17:53 +0000 (13:17 -0800)]
fsx: checkout old version until it compiles properly on miras

I sent a patch to xfstests upstream at
http://article.gmane.org/gmane.comp.file-systems.fstests/1665, but
until that's fixed we need a version that works in our test lab.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 7d52372ae74878ebd001036ff0a7aad525eb15b6)

9 years agocls_rbd: enable object map checksums for object_map_save 7428/head
Douglas Fuller [Thu, 7 Jan 2016 19:01:19 +0000 (11:01 -0800)]
cls_rbd: enable object map checksums for object_map_save

object_map_save disables CRCs when an object map footer isn't provided.
Unconditionally re-enable object map CRCs before re-encoding the new object
map.

Fixes: #14280
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
(cherry picked from commit d5c02f3ed26edec095d45d7a7f26ff26d1b5aacc)

9 years agorbd-replay: handle EOF gracefully 7427/head
Mykola Golub [Thu, 21 Jan 2016 11:45:42 +0000 (13:45 +0200)]
rbd-replay: handle EOF gracefully

Fixes: #14452
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
(cherry picked from commit c59b84c3e2c9bbda68219e4d2288a889dd9ca6cb)

9 years agorbd: remove canceled tasks from timer thread 7426/head
Douglas Fuller [Fri, 22 Jan 2016 19:18:40 +0000 (11:18 -0800)]
rbd: remove canceled tasks from timer thread

When canceling scheduled tasks using the timer thread, TaskFinisher::cancel
does not call SafeTimer::cancel_event, so events fire anyway. Add this call.

Fixes: #14476
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
(cherry picked from commit 2aa0f318c862dbe3027d74d345671506605778eb)
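
The bug pattern is easy to reproduce in miniature. The following Python sketch uses a hypothetical `TaskFinisher` class built on `threading.Timer`; it is not the librbd class, and `add_event_after` is a name chosen for illustration. It shows why cancel has to reach the timer itself rather than only dropping the bookkeeping entry.

```python
import threading


class TaskFinisher:
    """Toy scheduler keyed by task id (hypothetical, not the librbd class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._timers = {}

    def add_event_after(self, task, seconds, callback):
        timer = threading.Timer(seconds, callback)
        with self._lock:
            self._timers[task] = timer
        timer.start()

    def cancel(self, task):
        with self._lock:
            timer = self._timers.pop(task, None)
        if timer is not None:
            # Dropping the bookkeeping entry alone is not enough: the
            # timer itself has to be cancelled or the callback still fires.
            timer.cancel()
```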

9 years agoFixing NULL pointer dereference 7424/head
Igor Fedotov [Thu, 19 Nov 2015 10:38:40 +0000 (13:38 +0300)]
Fixing NULL pointer dereference

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
(cherry picked from commit 93d3dfe0441be50a6990d458ee0ee3289af39b20)

9 years agorgw: radosgw-admin bucket check --fix not work 7423/head
Weijun Duan [Mon, 4 Jan 2016 01:12:04 +0000 (20:12 -0500)]
rgw: radosgw-admin bucket check --fix not work

Fixed:#14215

Signed-off-by: Weijun Duan <duanweijun@h3c.com>
(cherry picked from commit a17f4e27d608ef29cf499fe76246929ec7962783)

9 years agoosd/PG: For performance start scrub scan at pool to skip temp objects 7421/head
David Zafman [Thu, 24 Sep 2015 15:38:41 +0000 (11:38 -0400)]
osd/PG: For performance start scrub scan at pool to skip temp objects

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 05d79faa512210b0f0a91640d18db33b887a6e73)

9 years agoosd/OSD: clear_temp_objects() include removal of Hammer temp objects
David Zafman [Fri, 18 Dec 2015 17:08:19 +0000 (09:08 -0800)]
osd/OSD: clear_temp_objects() include removal of Hammer temp objects

Fixes: #13862
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 10b4a0825d9917b6fdd0d6450640238b78ba05d4)

9 years agoosd: Improve log message which isn't about a particular shard
David Zafman [Fri, 18 Dec 2015 02:04:08 +0000 (18:04 -0800)]
osd: Improve log message which isn't about a particular shard

Remove redundant dout()

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit e85907fcc582922925609f595f68c597a88c39dc)

9 years agoMerge pull request #7225 from dillaman/wip-13810-infernalis
Josh Durgin [Thu, 14 Jan 2016 01:15:41 +0000 (17:15 -0800)]
Merge pull request #7225 from dillaman/wip-13810-infernalis

tests: notification slave needs to wait for master

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agotests: notification slave needs to wait for master 7225/head
Jason Dillaman [Wed, 13 Jan 2016 17:44:01 +0000 (12:44 -0500)]
tests: notification slave needs to wait for master

If the slave instance starts before the master, race
conditions are possible.

Fixes: #13810
Backport: infernalis, hammer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3992d6fe67bbf82322cedc1582406caaf6d4de60)

9 years agotests: verify it is possible to reuse an OSD id 6882/head
Loic Dachary [Thu, 10 Dec 2015 14:20:32 +0000 (15:20 +0100)]
tests: verify it is possible to reuse an OSD id

When an OSD id is removed via ceph osd rm, it will be reused by the next
ceph osd create command. Verify that an OSD reusing such an id
successfully comes up.

http://tracker.ceph.com/issues/13988 Refs: #13988

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 7324615bdb829f77928fa10d4e988c6422945937)

9 years agoceph-disk: list accepts absolute dev names 6880/head
Loic Dachary [Tue, 5 Jan 2016 16:33:45 +0000 (17:33 +0100)]
ceph-disk: list accepts absolute dev names

The ceph-disk list subcommand now accepts /dev/sda as well as sda.
The filtering is done on the full list of devices instead of restricting
the number of devices explored. Always obtaining the full list of
devices makes things simpler when trying to match a dmcrypted device to
the corresponding raw device.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 591d581c84cfd72d7c655ac88b0911a318b96e95)

Conflicts:
src/ceph-disk: as part of the implementation of deactivate /
destroy in master, the prototype of list_device was changed
        to take a list of paths instead of the all arguments (args).
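
The filtering approach described in this commit can be sketched in a few lines of Python. The function names and the dict shape used for a device entry are made up for this example and do not mirror the actual ceph-disk code.

```python
def normalize(name):
    """Accept either 'sda' or '/dev/sda' and return the bare name."""
    prefix = '/dev/'
    return name[len(prefix):] if name.startswith(prefix) else name


def filter_devices(all_devices, requested):
    """Filter the full device list down to the requested names.

    `all_devices` is a list of dicts with a 'name' key (a made-up shape
    for this sketch). An empty request means "list everything".
    """
    if not requested:
        return list(all_devices)
    wanted = {normalize(r) for r in requested}
    return [d for d in all_devices if d['name'] in wanted]
```

Because the full list is always built first, a later step that needs to relate one device to another (for instance a dmcrypt mapping to its raw device) still sees every entry.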

9 years agoceph-disk: display OSD details when listing dmcrypt devices
Loic Dachary [Tue, 5 Jan 2016 13:25:51 +0000 (14:25 +0100)]
ceph-disk: display OSD details when listing dmcrypt devices

The details about a device that is mapped via dmcrypt are directly
available. Do not try to fetch them from the device entry describing the
devicemapper entry.

http://tracker.ceph.com/issues/14230 Fixes: #14230

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 7aab4ed6f108ddc7bc90300f1999a38f30da3a57)

Conflicts:
src/ceph-disk: an incorrect attempt was made to fix the same
                       problem. It was not backported and does not
                       need to be. It is entirely contained in the
                       code block removed and is the reason for the
                       conflict.

9 years agotests: limit ceph-disk unit tests to test dir
Loic Dachary [Wed, 9 Dec 2015 15:52:10 +0000 (16:52 +0100)]
tests: limit ceph-disk unit tests to test dir

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 499c80db606fe3926a8a603e03fdba6967d66003)

9 years agoceph-disk: factorize duplicated dmcrypt mapping
Loic Dachary [Tue, 5 Jan 2016 16:38:59 +0000 (17:38 +0100)]
ceph-disk: factorize duplicated dmcrypt mapping

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 35a0c94c4cd3a57cfc382c64eaa9cfb9306dd2e6)

9 years agoceph-disk: fix regression in cciss devices names
Loic Dachary [Tue, 5 Jan 2016 16:42:11 +0000 (17:42 +0100)]
ceph-disk: fix regression in cciss devices names

The cciss driver has device paths such as /dev/cciss/c0d1 with a
matching /sys/block/cciss!c0d1. The general case is that whenever a
device name is found in /sys/block, the / is replaced by the !.

When refactoring the ceph-disk list subcommand, this conversion was
overlooked in a few places. All explicit concatenations of /dev with a
device name are replaced with a call to get_dev_name, which does the same
but also converts every ! to /.

http://tracker.ceph.com/issues/13970 Fixes: #13970

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit a2fd3a535e66b3a2b694cda9c6add33383ccfa4a)

Conflicts:
src/ceph-disk : trivial resolution
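
The name conversion at the heart of this fix is small enough to show directly. The two helpers below are a Python sketch with hypothetical names; they illustrate the / and ! mapping between /dev paths and /sys/block entries and are not copied from ceph-disk.

```python
def sys_block_to_dev_path(name):
    """'cciss!c0d1' (as found in /sys/block) -> '/dev/cciss/c0d1'."""
    return '/dev/' + name.replace('!', '/')


def dev_path_to_sys_block(path):
    """'/dev/cciss/c0d1' -> 'cciss!c0d1'."""
    prefix = '/dev/'
    if not path.startswith(prefix):
        raise ValueError('not a /dev path: %r' % path)
    return path[len(prefix):].replace('/', '!')
```

Plain names such as sda contain neither character, so they pass through both helpers unchanged apart from the /dev/ prefix.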

9 years agoMerge pull request #7001 from dachary/wip-14145-infernalis
Loic Dachary [Thu, 7 Jan 2016 14:06:32 +0000 (15:06 +0100)]
Merge pull request #7001 from dachary/wip-14145-infernalis

infernalis: ceph-disk: use blkid instead of sgdisk -i

On CentOS 7.1 and other operating systems with a version of udev greater than or equal to 214,
running ceph-disk prepare triggered unexpected removal and addition of partitions on
the disk being prepared. That created problems ranging from the OSD not being activated
to failures because /dev/sdb1 does not exist although it should.

Reviewed-by: Sage Weil <sage@redhat.com>
9 years agotests: ceph-disk cryptsetup close must try harder 7001/head
Loic Dachary [Wed, 6 Jan 2016 22:36:57 +0000 (23:36 +0100)]
tests: ceph-disk cryptsetup close must try harder

Similar to how it's done in dmcrypt_unmap in master (
132e56615805cba0395898cf165b32b88600d633 ), the infernalis test helpers
that were deprecated by the addition of the deactivate / destroy
ceph-disk subcommands must try cryptsetup close a few times in some
contexts.

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years agoceph-disk: protect deactivate with activate lock
Loic Dachary [Fri, 18 Dec 2015 23:53:03 +0000 (00:53 +0100)]
ceph-disk: protect deactivate with activate lock

When ceph-disk prepares the disk, it triggers udev events and each of
them runs ceph-disk activate. If systemctl stop ceph-osd@2 happens while
ceph-disk activate calls are still in flight, the systemctl stop may be
cancelled by the systemctl enable issued by one of the pending ceph-disk
activate calls.

This only matters in a test environment where disks are destroyed
shortly after they are activated.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 6395bf856b4d4511f0758174ef915ebcafbe3777)

Conflicts:

        src/ceph-disk: ceph-disk deactivate does not exist in ceph-disk
            on infernalis. But the same feature is implemented in
            ceph-test-disk.py for test purposes and has the same
            problem. The patch is adapted to ceph-test-disk.py.

9 years agoceph-disk: retry cryptsetup remove
Loic Dachary [Wed, 6 Jan 2016 10:15:19 +0000 (11:15 +0100)]
ceph-disk: retry cryptsetup remove

Retry a cryptsetup remove ten times. After the ceph-osd terminates, the
device is released asynchronously and an attempt to cryptsetup remove
may fail because the device is considered busy. Although cryptsetup makes
a few attempts before giving up, the number of attempts / the duration of the
attempts cannot be controlled with a cryptsetup option. The workaround
is to increase this by trying a few times.

If cryptsetup remove fails for a reason that is unrelated to timeout,
the error will be repeated a few times. There is no undesirable side
effect. It will not hide a problem.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 132e56615805cba0395898cf165b32b88600d633)
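
The retry idea can be written as a generic helper. This is a minimal Python sketch, not the ceph-disk implementation: the function name, the delay and the injected `sleep` parameter are assumptions made for illustration, while the count of ten attempts follows the commit message.

```python
import time


def retry(action, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` calls have failed.

    The last exception is re-raised, so a persistent failure is still
    reported after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
```

A caller would wrap the external command in a callable, for example one that runs cryptsetup remove through `subprocess.check_call`. As the commit notes, an error unrelated to the device being busy is simply repeated and still surfaces at the end.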

9 years agoceph-disk: use blkid instead of sgdisk -i
Loic Dachary [Fri, 18 Dec 2015 16:03:21 +0000 (17:03 +0100)]
ceph-disk: use blkid instead of sgdisk -i

sgdisk -i 1 /dev/vdb opens /dev/vdb in write mode which indirectly
triggers a BLKRRPART ioctl from udev (starting with version 214) when
the device is closed (see below for the udev release note). The
implementation of this ioctl by the kernel (even old kernels) removes
all partitions and adds them again (similar to what partprobe does
explicitly).

The side effects of partitions disappearing while ceph-disk is running
are devastating.

sgdisk is replaced by blkid which only opens the device in read mode and
will not trigger this unexpected behavior.

The problem does not show on Ubuntu 14.04 because it is running udev <
214 but shows on CentOS 7 which is running udev > 214.

git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:

        * As an experimental feature, udev now tries to lock the
          disk device node (flock(LOCK_SH|LOCK_NB)) while it
          executes events for the disk or any of its partitions.
          Applications like partitioning programs can lock the
          disk device node (flock(LOCK_EX)) and claim temporary
          device ownership that way; udev will entirely skip all event
          handling for this disk and its partitions. If the disk
          was opened for writing, the close will trigger a partition
          table rescan in udev's "watch" facility, and if needed
          synthesize "change" events for the disk and all its partitions.
          This is now unconditionally enabled, and if it turns out to
          cause major problems, we might turn it on only for specific
          devices, or might need to disable it entirely. Device Mapper
          devices are excluded from this logic.

http://tracker.ceph.com/issues/14080 Fixes: #14080

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 9dce05a8cdfc564c5162885bbb67a04ad7b95c5a)
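
The locking protocol quoted from the udev release note can be tried out without a real disk. In the Python sketch below a temporary file stands in for the device node, and the two file descriptors play the roles of a partitioning tool and udev. It assumes a Linux system where `flock` locks belong to the open file description, so two separate opens in one process conflict with each other.

```python
import fcntl
import os
import tempfile

# A temporary file stands in for the disk device node in this sketch.
fd_tool, path = tempfile.mkstemp()
fd_udev = os.open(path, os.O_RDONLY)

# The partitioning tool claims temporary ownership of the "device".
fcntl.flock(fd_tool, fcntl.LOCK_EX)

# udev's side: a shared, non-blocking attempt fails while the tool holds
# the exclusive lock, so event handling would be skipped.
try:
    fcntl.flock(fd_udev, fcntl.LOCK_SH | fcntl.LOCK_NB)
    udev_skips_events = False
except BlockingIOError:
    udev_skips_events = True

# After the tool releases the lock, the shared lock succeeds.
fcntl.flock(fd_tool, fcntl.LOCK_UN)
fcntl.flock(fd_udev, fcntl.LOCK_SH | fcntl.LOCK_NB)
udev_can_process = True

os.close(fd_udev)
os.close(fd_tool)
os.remove(path)
```

This only models the lock handshake. The partition rescan triggered when a device opened for writing is closed is a separate mechanism, and it is the one this commit sidesteps by using a read-only tool.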

9 years agoceph-disk: dereference symlinks in destroy and zap
Loic Dachary [Wed, 16 Dec 2015 14:57:03 +0000 (15:57 +0100)]
ceph-disk: dereference symlinks in destroy and zap

The behavior of partprobe or sgdisk may be subtly different if given a
symbolic link to a device instead of an actual device. The debug output
is also more confusing when the symlink shows instead of the device it
points to.

Always dereference the symlink before running destroy and zap.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit fe71647bc9bd0f9ddc6d470ee7bee1e6b0983e2b)

Conflicts:
        src/ceph-disk
          trivial, because destroy is not implemented
          in infernalis
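
Dereferencing a symlink before acting on a device is a one-liner in Python. The helper name below is hypothetical; `os.path.realpath` is the standard library call that follows the link chain.

```python
import os


def resolve_device(path):
    """Return the real path behind `path`, following any symlinks."""
    return os.path.realpath(path)
```

Passing the resolved path to the external tools and to the debug output means both see the actual device rather than the link that pointed to it.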

9 years agoceph-disk: increase partprobe / udevadm settle timeouts
Loic Dachary [Wed, 16 Dec 2015 11:33:25 +0000 (12:33 +0100)]
ceph-disk: increase partprobe / udevadm settle timeouts

The default of 120 seconds may be exceeded when the disk is very slow
which can happen in cloud environments. Increase it to 600 seconds
instead.

The partprobe command may fail for the same reason but it does not have
a timeout parameter. Instead, try a few times before failing.

The udevadm settle calls guarding partprobe are not strictly necessary
because partprobe already does the same. However, partprobe does not
provide a way to control the timeout. Running one udevadm settle after
another is a noop most of the time and adds no delay. It matters
when the udevadm settle run by partprobe fails with a timeout, because
partprobe silently ignores the failure.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 730b5d62d3cda7de4076bafa6e9e35f1eb8e2190)
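
The sequence described above (settle, partprobe with retries, settle again) might look like the following Python sketch. The function name, the injectable `run` parameter and the number of partprobe attempts are assumptions for illustration, since the commit only says "a few times". The 600 second timeout is the value named in the message.

```python
import subprocess


def update_partition_table(dev, run=subprocess.check_call,
                           settle_timeout=600, partprobe_attempts=5):
    """Settle udev, re-read the partition table with retries, settle again.

    `partprobe_attempts` is an arbitrary value for this sketch; the
    commit only says "a few times".
    """
    settle = ['udevadm', 'settle', '--timeout=%d' % settle_timeout]
    run(settle)
    for attempt in range(1, partprobe_attempts + 1):
        try:
            run(['partprobe', dev])
            break
        except subprocess.CalledProcessError:
            if attempt == partprobe_attempts:
                raise
    run(settle)
```

Because the explicit settle calls go through `run`, a timeout there raises an error instead of being swallowed the way partprobe's internal settle failure is.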

9 years agotests: ceph-disk workunit increase verbosity
Loic Dachary [Wed, 16 Dec 2015 11:36:47 +0000 (12:36 +0100)]
tests: ceph-disk workunit increase verbosity

So that reading the teuthology log is enough in most cases to figure out
the cause of the error.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit fd7fe8c4977658f66651dad5efb0d816ae71b38b)

Conflicts:
qa/workunits/ceph-disk/ceph-disk-test.py:
          trivial, because destroy/deactivate are not implemented
          in infernalis. The existing destroy_osd function
          has to be modified so the id returned by sh() does
          not have a trailing newline.

9 years agoceph-disk: log parted output
Loic Dachary [Wed, 16 Dec 2015 11:30:20 +0000 (12:30 +0100)]
ceph-disk: log parted output

Should parted output fail to parse, it is useful to get the full output
when running in verbose mode.

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit f5d36b9ac299e9f6d52cc32d540cc1c3342de6e7)

9 years agoceph-disk: do not discard stderr
Loic Dachary [Wed, 16 Dec 2015 11:29:17 +0000 (12:29 +0100)]
ceph-disk: do not discard stderr

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 5fa35ba10e10b56262757afc43929ab8ee4164f2)

Conflicts:
        src/ceph-disk
          trivial, because destroy/deactivate
          are not implemented in infernalis

9 years agotests: new integration test for validating new RBD pools 7080/head
Jason Dillaman [Mon, 14 Dec 2015 22:49:55 +0000 (17:49 -0500)]
tests: new integration test for validating new RBD pools

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 00cfe4efacd664032f700afe9701d41bacf8700a)

9 years agolibrbd: optionally validate RBD pool configuration (snapshot support)
Jason Dillaman [Mon, 14 Dec 2015 22:41:49 +0000 (17:41 -0500)]
librbd: optionally validate RBD pool configuration (snapshot support)

Fixes: #13633
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 1fea4dadc60e13518e9ee55d136fbc4e9d3a621e)

9 years agolibrbd: properly handle replay of snap remove RPC message 7079/head
Jason Dillaman [Wed, 23 Dec 2015 18:26:39 +0000 (13:26 -0500)]
librbd: properly handle replay of snap remove RPC message

Fixes: #14164
Backport: infernalis
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit bc309d9d7612f005a3d50ecf099ddf9b706a1bf6)

9 years agoMerge pull request #7038 from dillaman/wip-14121-infernalis
Josh Durgin [Wed, 23 Dec 2015 18:47:30 +0000 (10:47 -0800)]
Merge pull request #7038 from dillaman/wip-14121-infernalis

tests: rebuild exclusive lock test should acquire exclusive lock

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years agotests: rebuild exclusive lock test should acquire exclusive lock 7038/head
Jason Dillaman [Wed, 23 Dec 2015 15:31:07 +0000 (10:31 -0500)]
tests: rebuild exclusive lock test should acquire exclusive lock

Starting with Jewel, the object map will not be loaded until the
exclusive lock is acquired since it might be updated by the
lock owner.

Fixes: #14121
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years agolog: Log.cc: Assign LOG_DEBUG priority to syslog calls 6993/head
Brad Hubbard [Mon, 7 Dec 2015 01:31:28 +0000 (11:31 +1000)]
log: Log.cc: Assign LOG_DEBUG priority to syslog calls

Fixes: #13993
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit 8e93f3f45db681f82633ca695a7dc4e7bd030584)

9 years agomon/PGMonitor: MAX AVAIL is 0 if some OSDs' weight is 0 6907/head
Chengyuan Li [Fri, 20 Nov 2015 05:29:39 +0000 (22:29 -0700)]
mon/PGMonitor: MAX AVAIL is 0 if some OSDs' weight is 0

In get_rule_avail(), p->second may be used as a divisor even when it
is 0. The quotient is then infinity, which becomes a negative value
when it is converted to an integer.
So we should check the value of p->second before the calculation.

It fixes BUG #13840.

Signed-off-by: Chengyuan Li <chengyli@ebay.com>
(cherry picked from commit 18713e60edd1fe16ab571f7c83e6de026db483ca)
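
The guard described in this message can be illustrated with a small sketch. It is not the PGMonitor code: the function name rule_max_avail and both dictionaries are hypothetical stand-ins, with the weight playing the role of p->second. In C++ the floating-point division yields infinity and the bad value only appears after the integer conversion; Python would raise ZeroDivisionError instead, but the fix has the same shape: skip the entry when the weight is 0.

```python
def rule_max_avail(avail_bytes, weights):
    """Smallest projected capacity over OSDs whose weight is non-zero.

    avail_bytes: {osd_id: free bytes}        (hypothetical input)
    weights:     {osd_id: share of the rule} (stands in for p->second)
    """
    best = None
    for osd, weight in weights.items():
        if weight <= 0:
            # the check the commit adds: never divide by a zero weight
            continue
        projected = avail_bytes[osd] / weight
        if best is None or projected < best:
            best = projected
    # no usable OSD at all: report 0 rather than a bogus number
    return int(best) if best is not None else 0
```

With this check an OSD of weight 0 no longer affects the result, so the remaining OSDs determine MAX AVAIL.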

9 years agoMerge pull request #6395 from SUSE/wip-13593-infernalis
Abhishek Varshney [Wed, 9 Dec 2015 05:52:26 +0000 (11:22 +0530)]
Merge pull request #6395 from SUSE/wip-13593-infernalis

Ceph-fuse won't start correctly when the option log_max_new in ceph.conf is set to zero

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years agoMerge pull request #6828 from dachary/wip-ceph-disk-augeas
Loic Dachary [Tue, 8 Dec 2015 23:06:33 +0000 (00:06 +0100)]
Merge pull request #6828 from dachary/wip-ceph-disk-augeas

tests: ceph-disk workunit uses configobj

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years agotests: ceph-disk workunit uses the ceph task 6828/head
Loic Dachary [Wed, 21 Oct 2015 23:48:31 +0000 (01:48 +0200)]
tests: ceph-disk workunit uses the ceph task

The ceph-disk workunit deploys keys that are not deployed by default by
the ceph teuthology task.

The OSDs created by the ceph task are removed from the default
bucket (via osd rm) so they do not interfere with the tests.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
(cherry picked from commit 163de5b0f8f46695ab41b3f2288e9b5c1feaedab)

9 years agotests: ceph-disk workunit uses configobj
Loic Dachary [Wed, 21 Oct 2015 22:21:49 +0000 (00:21 +0200)]
tests: ceph-disk workunit uses configobj

Instead of using augtool to modify the configuration file, use
configobj. It is also used by the install teuthology task. The .ini
lens (puppet lens really) is unable to read ini files created by
configobj.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
(cherry picked from commit f4906a124cc194dccd855679a04a5c7ffc125a44)

9 years agoclient: use null snapc to check pool permission 6853/head
Yan, Zheng [Mon, 9 Nov 2015 03:37:02 +0000 (11:37 +0800)]
client: use null snapc to check pool permission

snap inodes' ->snaprealm can be NULL, so dereferencing it in
check_pool_perm() can cause a segmentation fault. The pool permission
check does not write any data, so it's safe to use a null snapc.

Fixes: #13714
Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit fad3772fb7731272d47cbfd9e81f22f5df3701a2)

9 years agoMerge pull request #6845 from dachary/wip-14019-infernalis
Loic Dachary [Tue, 8 Dec 2015 08:34:39 +0000 (09:34 +0100)]
Merge pull request #6845 from dachary/wip-14019-infernalis

infernalis: libunwind package missing on CentOS 7

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
9 years agobuild/ops: systemd ceph-disk unit must not assume /bin/flock 6852/head
Loic Dachary [Fri, 4 Dec 2015 20:11:09 +0000 (21:11 +0100)]
build/ops: systemd ceph-disk unit must not assume /bin/flock

The flock command may be installed elsewhere, depending on the
system. Let the PATH search figure that out.

http://tracker.ceph.com/issues/13975 Fixes: #13975

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit c8f7d44c935bd097db7d131b785bdab78a7a650c)

9 years agoosd: Test osd_find_best_info_ignore_history_les config in another assert 6851/head
David Zafman [Thu, 3 Dec 2015 22:52:24 +0000 (14:52 -0800)]
osd: Test osd_find_best_info_ignore_history_les config in another assert

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 02a9a41f151a3d968bf8066749658659dc6e3ac4)

9 years agoosd: call on_new_interval on newly split child PG 6849/head
Sage Weil [Wed, 2 Dec 2015 19:50:28 +0000 (14:50 -0500)]
osd: call on_new_interval on newly split child PG

We must call on_new_interval() on any interval change *and* on the
creation of the PG.  Currently we call it from PG::init() and
PG::start_peering_interval().  However, PG::split_into() did not
do so for the child PG, which meant that the new child feature
bits were not properly initialized and the bitwise/nibblewise
debug bit was not correctly set.  That, in turn, could lead to
various misbehaviors, the most obvious of which is scrub errors
due to the sort order mismatch.

Fixes: #13962
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit fb120d7b2da5715e7f7d1baa65bfa70d2e5d807a)

9 years agoFileStore: potential memory leak if _fgetattrs fails 6846/head
xiexingguo [Mon, 26 Oct 2015 10:38:01 +0000 (18:38 +0800)]
FileStore: potential memory leak if _fgetattrs fails

Memory leak happens if _fgetattrs encounters some error and simply returns.
Fixes: #13597
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit ace7dd096b58a88e25ce16f011aed09269f2a2b4)

9 years agobuild/ops: enable CR in CentOS 7 6845/head
Loic Dachary [Tue, 8 Dec 2015 07:02:56 +0000 (08:02 +0100)]
build/ops: enable CR in CentOS 7

To get libunwind from the CR repositories until CentOS 7.2.1511 is released.

http://tracker.ceph.com/issues/13997 Fixes: #13997

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 247ee6084b58861da601d349bdba739b252d96de)

9 years agoObjecter: remove redundant result-check of _calc_target in _map_session. 6840/head
xiexingguo [Mon, 2 Nov 2015 13:46:11 +0000 (21:46 +0800)]
Objecter: remove redundant result-check of _calc_target in _map_session.

Result-code check is currently redundant since _calc_target never returns a negative value.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 5a6117e667024f51e65847f73f7589467b6cb762)

9 years agoObjecter: potential null pointer access when do pool_snap_list.
xiexingguo [Thu, 29 Oct 2015 09:32:50 +0000 (17:32 +0800)]
Objecter: potential null pointer access when do pool_snap_list.

Potential null pointer access in pool_snap_list. Check that the pool exists first.
Fixes: #13639
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 865541605b6c32f03e188ec33d079b44be42fa4a)

9 years agoauth/cephx: large amounts of log are produced by osd 6836/head
qiankunzheng [Thu, 5 Nov 2015 12:29:49 +0000 (07:29 -0500)]
auth/cephx: large amounts of log are produced by osd
If the auth of an osd is deleted while the osd is running, the osd will produce large amounts of log.

Fixes: #13610
Signed-off-by: Qiankun Zheng <zheng.qiankun@h3c.com>
(cherry picked from commit 102f0b19326836e3b0754b4d32da89eb2bc0b03c)