git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
10 years agoerasure-code isa-l: remove duplicated lines (fix warning) 2963/head
Dan Mick [Tue, 18 Nov 2014 23:21:30 +0000 (15:21 -0800)]
erasure-code isa-l: remove duplicated lines (fix warning)

06a245a added a section def to assembly files; I added it twice to
this file.  There's no damage, but a compiler warning (on machines with
yasm installed)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 10f6ef185a9d09e396e94036ec90bfe8a0738ce9)

10 years agoAdd annotation to all assembly files to turn off stack-execute bit
Dan Mick [Sat, 15 Nov 2014 01:59:57 +0000 (17:59 -0800)]
Add annotation to all assembly files to turn off stack-execute bit

See discussion in http://tracker.ceph.com/issues/10114

Building with these changes allows output from readelf like this:

 $ readelf -lW src/.libs/librados.so.2 | grep GNU_STACK
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000
0x000000 RW  0x8

(note the absence of 'X' in 'RW')

Fixes: #10114
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 06a245a9845c0c126fb3106b41b2fd2bc4bc4df3)

10 years agoosd/OSD: use OSDMap helper to determine if we are correct op target
Sage Weil [Thu, 13 Nov 2014 01:11:10 +0000 (17:11 -0800)]
osd/OSD: use OSDMap helper to determine if we are correct op target

Use the new helper.  This fixes our behavior for EC pools where targeting
a different shard is not correct, while for replicated pools it may be. In
the EC case, it leaves the op hanging indefinitely in the OpTracker because
the pgid exists but as a different shard.

Fixes: #9835
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 9e05ba086a36ae9a04b347153b685c2b8adac2c3)

10 years agoosd/OSDMap: add osd_is_valid_op_target()
Sage Weil [Thu, 13 Nov 2014 01:04:35 +0000 (17:04 -0800)]
osd/OSDMap: add osd_is_valid_op_target()

Helper to check whether an osd is a given op target for a pg.  This
assumes that for EC we always send ops to the primary, while for
replicated we may target any replica.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 89c02637914ac7332e9dbdbfefc2049b2b6c127d)
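
A minimal sketch of the rule this helper encodes, not the actual OSDMap code. The PgMapping struct and the is_valid_op_target name are hypothetical stand-ins for whatever the real map lookup provides; only the policy (EC ops go to the primary, replicated ops may go to any replica) comes from the message above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the per-PG data an OSD map lookup would return.
struct PgMapping {
  bool erasure_coded;       // true for EC pools, false for replicated pools
  int primary;              // id of the primary OSD
  std::vector<int> acting;  // acting set, primary included
};

// Returns true if `osd` may legitimately receive a client op for this PG.
inline bool is_valid_op_target(const PgMapping& pg, int osd) {
  if (osd == pg.primary)
    return true;   // the primary is a valid target for both pool types
  if (pg.erasure_coded)
    return false;  // EC: a non-primary shard must not accept the op
  // Replicated: any member of the acting set is acceptable.
  return std::find(pg.acting.begin(), pg.acting.end(), osd) != pg.acting.end();
}
```

With an acting set of {3, 5, 7} and primary 3, osd 5 would be accepted for a replicated pool and rejected for an EC pool.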

10 years agoqa: allow small allocation diffs for exported rbds
Josh Durgin [Wed, 12 Nov 2014 02:16:02 +0000 (18:16 -0800)]
qa: allow small allocation diffs for exported rbds

The local filesystem may behave slightly differently. This isn't
foolproof, but seems to be reliable enough on rhel7 rootfs, where
exact comparison was failing.

Fixes: #10002
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
(cherry picked from commit e94d3c11edb9c9cbcf108463fdff8404df79be33)

10 years agocommon: Add cctid meta variable
Adam Crume [Thu, 18 Sep 2014 23:57:27 +0000 (16:57 -0700)]
common: Add cctid meta variable

Fixes: #6228
Signed-off-by: Adam Crume <adamcrume@gmail.com>
(cherry picked from commit bb45621cb117131707a85154292a3b3cdd1c662a)

10 years agoMerge pull request #2804 from ceph/wip-9301-giant
Sage Weil [Tue, 11 Nov 2014 16:28:19 +0000 (08:28 -0800)]
Merge pull request #2804 from ceph/wip-9301-giant

mon: backport paxos off-by-one bug (9301) to giant

10 years agoMerge pull request #2887 from ceph/wip-9977-backport
Gregory Farnum [Tue, 11 Nov 2014 06:41:19 +0000 (22:41 -0800)]
Merge pull request #2887 from ceph/wip-9977-backport

tools: skip up to expire_pos in journal-tool

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
10 years agoclient: trim unused inodes before reconnecting to recovering MDS
Yan, Zheng [Thu, 11 Sep 2014 01:36:44 +0000 (09:36 +0800)]
client: trim unused inodes before reconnecting to recovering MDS

So the recovering MDS does not need to fetch these unused inodes during
cache rejoin. This may reduce MDS recovery time.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 2bd7ceeff53ad0f49d5825b6e7f378683616dffb)

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
10 years agoclient: allow xattr caps in inject_release_failure
John Spray [Mon, 27 Oct 2014 12:02:17 +0000 (12:02 +0000)]
client: allow xattr caps in inject_release_failure

Because some test environments generate spurious
rmxattr operations, allow the client to release
'X' caps.  Allows xattr operations to proceed
while still preventing client releasing other caps.

Fixes: #9800
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5691c68a0a44eb2cdf0afb3f39a540f5d42a5c0c)

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
10 years agotools: skip up to expire_pos in journal-tool 2887/head
John Spray [Mon, 3 Nov 2014 19:19:45 +0000 (19:19 +0000)]
tools: skip up to expire_pos in journal-tool

Previously worked for journals starting from an
object boundary (i.e. freshly created filesystems)

Fixes: #9977
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 65c33503c83ff8d88781c5c3ae81d88d84c8b3e4)

Conflicts:
src/tools/cephfs/JournalScanner.cc

10 years agoMerge pull request #2876 from ceph/giant-readdir-fix
Gregory Farnum [Sat, 8 Nov 2014 00:26:54 +0000 (16:26 -0800)]
Merge pull request #2876 from ceph/giant-readdir-fix

Giant readdir fix

10 years agoMerge pull request #2879 from ceph/wip-10025-giant
Gregory Farnum [Fri, 7 Nov 2014 22:10:40 +0000 (14:10 -0800)]
Merge pull request #2879 from ceph/wip-10025-giant

#10025/giant -- tools: fix MDS journal import

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
10 years agotools: fix MDS journal import 2879/head
John Spray [Fri, 7 Nov 2014 11:34:43 +0000 (11:34 +0000)]
tools: fix MDS journal import

Previously it only worked on fresh filesystems which
hadn't been trimmed yet, and resulted in an invalid
trimmed_pos when expire_pos wasn't on an object
boundary.

Fixes: #10025
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit fb29e71f9a97c12354045ad2e128156e503be696)

10 years agoclient: fix I_COMPLETE_ORDERED checking 2876/head
Yan, Zheng [Mon, 27 Oct 2014 20:57:16 +0000 (13:57 -0700)]
client: fix I_COMPLETE_ORDERED checking

Current code marks a directory inode as complete and ordered when readdir
finishes, but it does not check if the directory was modified in the middle
of readdir. This is wrong: a directory inode should not be marked as ordered
if it was modified during readdir.

The fix is to introduce a new counter in the inode data struct; we increase
the counter each time the directory is modified. When readdir finishes, we
check the counter to decide if the directory should be marked as ordered.

Fixes: #9894
Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit a4caed8a53d011b214ab516090676641f7c4699d)
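
The counter approach can be sketched roughly as follows. DirInode, dir_mod_count and the three helper functions are hypothetical names chosen for illustration; the real client code is structured differently and this only shows the snapshot-and-compare idea.

```cpp
#include <cstdint>

// Hypothetical, simplified directory inode; field names are illustrative.
struct DirInode {
  std::uint64_t dir_mod_count = 0;    // bumped on every directory modification
  bool complete_and_ordered = false;  // may readdir be served from cache in order?
};

// Call whenever the directory's contents change (create, unlink, rename).
inline void note_dir_modified(DirInode& dir) {
  ++dir.dir_mod_count;
}

// Snapshot the counter when a readdir pass starts.
inline std::uint64_t readdir_begin(const DirInode& dir) {
  return dir.dir_mod_count;
}

// At the end of the pass, mark the directory ordered only if nothing changed.
inline void readdir_finish(DirInode& dir, std::uint64_t count_at_start) {
  if (dir.dir_mod_count == count_at_start)
    dir.complete_and_ordered = true;
}
```

A counter is used rather than a boolean "dirty" flag so that overlapping readdir passes can each compare against their own snapshot.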

10 years agoclient: preserve ordering of readdir result in cache
Yan, Zheng [Tue, 9 Sep 2014 09:34:46 +0000 (17:34 +0800)]
client: preserve ordering of readdir result in cache

Preserve ordering of readdir result in a list, so that the result of cached
readdir is consistent with uncached readdir.

As a side effect, this commit also removes the code that removes stale dentries.
This is OK because stale dentries do not have a valid lease, so they will be
filtered out by the shared gen check in Client::_readdir_cache_cb().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 346c06c1647658768e927a47768a0bc74de17b53)

10 years agoclient: introduce a new flag indicating if dentries in directory are sorted
Yan, Zheng [Tue, 9 Sep 2014 06:06:06 +0000 (14:06 +0800)]
client: introduce a new flag indicating if dentries in directory are sorted

When creating a file, Client::insert_dentry_inode() sets the dentry's offset
based on the directory's max offset. The offset does not reflect the real
position of the dentry in the directory. A later readdir reply from the MDS may
change the dentry's position/offset. This inconsistency can cause missing/duplicate
entries in readdir result if readdir is partly satisfied by dcache_readdir().

The fix is to introduce a new flag indicating if dentries in directory are
sorted. We use _readdir_cache_cb() to handle readdir only when the flag is
set, clear the flag after creating/deleting/renaming file.

Fixes: #9178
Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 600af25493947871c38214aa370e2544a7fea399)

10 years agoqa: use sudo even more when rsyncing /usr
Greg Farnum [Fri, 7 Nov 2014 01:48:01 +0000 (17:48 -0800)]
qa: use sudo even more when rsyncing /usr

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 3aa7797741f9cff06053a2f31550fe6929039692)

10 years agoMerge pull request #2858 from ceph/wip-9909
Loic Dachary [Wed, 5 Nov 2014 07:51:18 +0000 (08:51 +0100)]
Merge pull request #2858 from ceph/wip-9909

tools: rados put /dev/null should write() and not create()

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
10 years agotools: rados put /dev/null should write() and not create() 2858/head
Loic Dachary [Thu, 2 Oct 2014 07:23:55 +0000 (09:23 +0200)]
tools: rados put /dev/null should write() and not create()

In the rados.cc special case that handles putting an empty object, use
write_full() instead of create().

A special case was introduced in 6843a0b81f10125842c90bc63eccc4fd873b58f2
to create() an object if the rados put file is empty. Prior to that fix,
an attempt to rados put an empty file was a noop. The problem with that
fix is that it is not idempotent: rados put of an empty file twice would
fail the second time, whereas rados put of a file with one byte would
succeed as expected.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 50e80407f3c2f74d77ba876d01e7313c3544ea4d)

10 years agorgw: set length for keystone token validation request
Yehuda Sadeh [Thu, 9 Oct 2014 17:20:27 +0000 (10:20 -0700)]
rgw: set length for keystone token validation request

Fixes: #7796
Backport: giant, firefly
Need to set content length to this request, as the server might not
handle a chunked request (even though we don't send anything).

Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 3dd4ccad7fe97fc16a3ee4130549b48600bc485c)

10 years agoMerge pull request #2846 from dachary/wip-9752-past-intervals-giant
Sage Weil [Fri, 31 Oct 2014 15:35:42 +0000 (08:35 -0700)]
Merge pull request #2846 from dachary/wip-9752-past-intervals-giant

osd: past_interval display bug on acting

10 years agoosd: past_interval display bug on acting 2846/head
Loic Dachary [Thu, 30 Oct 2014 23:49:21 +0000 (00:49 +0100)]
osd: past_interval display bug on acting

The acting array was incorrectly including the primary and up_primary.

http://tracker.ceph.com/issues/9752 Fixes: #9752

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit c5f8d6eded52da451fdd1d807bd4700221e4c41c)

10 years agoMerge pull request #2841 from ceph/giant-9869
Yan, Zheng [Fri, 31 Oct 2014 00:01:12 +0000 (17:01 -0700)]
Merge pull request #2841 from ceph/giant-9869

Backport "client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid"

10 years agoclient: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid 2841/head
Greg Farnum [Thu, 23 Oct 2014 00:16:31 +0000 (17:16 -0700)]
client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid

m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Fixes: #9869
Backport: giant, firefly, dumpling

Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit a5184cf46a6e867287e24aeb731634828467cd98)
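
A small sketch of why the direction of the cast matters. The tid_matches function is a hypothetical illustration, not the client code: comparing without a cast promotes the 16-bit value to 64 bits, so the two can never match once the 64-bit tid exceeds 65535.

```cpp
#include <cstdint>

// Compare a 64-bit message tid against a 16-bit stored tid by truncating
// the wide value, so both sides wrap around at the same point.
inline bool tid_matches(std::uint64_t msg_tid, std::uint16_t flushing_tid) {
  return static_cast<std::uint16_t>(msg_tid) == flushing_tid;
}
```

For example, a message tid of 0x10005 truncates to 5 and matches a stored value of 5, while a direct comparison of 0x10005 against 5 would report a mismatch.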

10 years agoMerge pull request #2838 from ceph/wip-9945-giant
Sage Weil [Thu, 30 Oct 2014 17:05:22 +0000 (10:05 -0700)]
Merge pull request #2838 from ceph/wip-9945-giant

messages: fix COMPAT_VERSION on MClientSession

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agomessages: fix COMPAT_VERSION on MClientSession 2838/head
John Spray [Thu, 30 Oct 2014 16:43:21 +0000 (16:43 +0000)]
messages: fix COMPAT_VERSION on MClientSession

This was incorrectly incremented to 2 by omission
of an explicit COMPAT_VERSION value.

Fixes: #9945
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 1eb9bcb1d36014293efc687b4331be8c4d208d8e)

10 years ago0.87 v0.87
Jenkins [Wed, 29 Oct 2014 18:03:55 +0000 (11:03 -0700)]
0.87

10 years agoMerge remote-tracking branch 'origin/wip-9806-giant' into giant
Josh Durgin [Tue, 28 Oct 2014 20:08:05 +0000 (13:08 -0700)]
Merge remote-tracking branch 'origin/wip-9806-giant' into giant

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2630 from ceph/wip-9545
Samuel Just [Mon, 27 Oct 2014 20:20:16 +0000 (13:20 -0700)]
Merge pull request #2630 from ceph/wip-9545

os/FileStore: do not loop in sync_entry on shutdown

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agomon: re-bootstrap if we get probed by a mon that is way ahead 2804/head
Sage Weil [Thu, 18 Sep 2014 21:23:36 +0000 (14:23 -0700)]
mon: re-bootstrap if we get probed by a mon that is way ahead

During bootstrap we verify that our paxos commits overlap with the other
mons we will form a quorum with.  If they do not, we do a sync.

However, it is possible we pass those checks, then fail to join a quorum
before the quorum moves ahead in time such that we no longer overlap.
Currently nothing kicks us back into a probing state to discover we need
to sync... we will just keep trying to call or join an election instead.

Fix this by jumping back to bootstrap if we get a probe that is ahead of
us.  Only do this from non probe or sync states as these will be common;
it is only the active and electing states that matter (and probably just
electing!).

Fixes: #9301
Backport: giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c421b55e8e15ef04ca8aeb47f7d090375eaa8573)

10 years agomon/Paxos: fix off-by-one in last_ vs first_committed check
Sage Weil [Thu, 18 Sep 2014 21:11:24 +0000 (14:11 -0700)]
mon/Paxos: fix off-by-one in last_ vs first_committed check

peon last_committed + 1 == leader first_committed is okay.  Note that the
other check (where I clean up whitespace) gets this correct.

Fixes: #9301 (partly)
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit d81cd7f86695185dce31df76c33c9a02123f0e4a)
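
One way to read the boundary condition, as a hedged sketch: the function name and the exact form of the comparison are assumptions, and only the statement that "last_committed + 1 == first_committed is okay" comes from the message above.

```cpp
#include <cstdint>

using version_t = std::uint64_t;

// True when the peon's history touches or overlaps the leader's retained
// commits, i.e. the peon's next needed version is still available.
inline bool commits_contiguous(version_t peon_last_committed,
                               version_t leader_first_committed) {
  // The next version the peon needs is last_committed + 1; it is available
  // as long as the leader's first retained commit is not beyond it.
  return peon_last_committed + 1 >= leader_first_committed;
}
```

Under this reading, a peon at version 9 facing a leader whose first commit is 10 is fine, while a peon at version 8 has a gap and would need a sync.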

10 years agoMerge pull request #2800 from ceph/wip-enoent-race
João Eduardo Luís [Sun, 26 Oct 2014 18:58:50 +0000 (18:58 +0000)]
Merge pull request #2800 from ceph/wip-enoent-race

os/LevelDBStore, RocksDBStore: fix race handling for get store size

Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
10 years agoos/LevelDBStore, RocksDBStore: fix race handling for get store size 2800/head
Sage Weil [Sat, 25 Oct 2014 04:23:19 +0000 (21:23 -0700)]
os/LevelDBStore, RocksDBStore: fix race handling for get store size

If we get ENOENT, skip this file, instead of adding in undefined stat
values.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
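
A rough POSIX sketch of the pattern, assuming a plain list of paths. The total_size function is hypothetical and not the LevelDBStore or RocksDBStore code; it only shows that the stat buffer is never read after a failed call.

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include <sys/stat.h>

// Sum the sizes of the given files, skipping any that cannot be stat'ed.
inline std::uint64_t total_size(const std::vector<std::string>& paths) {
  std::uint64_t total = 0;
  for (const auto& path : paths) {
    struct stat st;
    if (::stat(path.c_str(), &st) < 0) {
      // ENOENT is the expected race: the file was removed between listing
      // and stat. This sketch skips other errors too, because `st` holds
      // undefined values whenever stat() fails.
      continue;
    }
    total += static_cast<std::uint64_t>(st.st_size);
  }
  return total;
}
```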
10 years agoMerge pull request #2799 from athanatos/wip-9480
Sage Weil [Fri, 24 Oct 2014 20:20:43 +0000 (13:20 -0700)]
Merge pull request #2799 from athanatos/wip-9480

Wip 9480

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2798 from athanatos/wip-9875
Sage Weil [Fri, 24 Oct 2014 19:55:05 +0000 (12:55 -0700)]
Merge pull request #2798 from athanatos/wip-9875

ReplicatedPG: writeout hit_set object with correct prior_version

Reviewed-by: Sage Weil <sage@redhat.com>
10 years ago.gitmodules: ignoring changes in rocksdb submodule
Federico Gimenez [Fri, 24 Oct 2014 06:46:50 +0000 (08:46 +0200)]
.gitmodules: ignoring changes in rocksdb submodule

Signed-off-by: Federico Gimenez <fgimenez@coit.es>
(cherry picked from commit 60eaeca4ddccc79b29b17ad433c6569cb2a89500)

10 years agoMerge pull request #2797 from ceph/wip-rbd-revert
Josh Durgin [Fri, 24 Oct 2014 18:12:29 +0000 (11:12 -0700)]
Merge pull request #2797 from ceph/wip-rbd-revert

rbd/objectcacher: revert recent changes for giant

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
10 years agoRevert "Enforce cache size on read requests" 2797/head
Sage Weil [Fri, 24 Oct 2014 18:06:16 +0000 (11:06 -0700)]
Revert "Enforce cache size on read requests"

This reverts commit 4fc9fffc494abedac0a9b1ce44706343f18466f1.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoRevert "rbd: ObjectCacher reads can hang when reading sparse files"
Sage Weil [Fri, 24 Oct 2014 18:06:08 +0000 (11:06 -0700)]
Revert "rbd: ObjectCacher reads can hang when reading sparse files"

This reverts commit cdb7675a21c9107e3596c90c2b1598def3c6899f.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoRevert "Fix read performance regression in ObjectCacher"
Sage Weil [Fri, 24 Oct 2014 18:05:53 +0000 (11:05 -0700)]
Revert "Fix read performance regression in ObjectCacher"

This reverts commit 65be257e9295619b960b49f6aa80ecdf8ea4d16a.

Too late for giant.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2795 from ceph/wip-9873
David Zafman [Fri, 24 Oct 2014 17:49:32 +0000 (10:49 -0700)]
Merge pull request #2795 from ceph/wip-9873

objecter: fix tick_event shutdown race (9873)

Reviewed-by: David Zafman <dzafman@redhat.com>
10 years agoosdc/Objecter: fix tick_event handling in shutdown vs tick race 2795/head
Sage Weil [Fri, 24 Oct 2014 16:32:20 +0000 (09:32 -0700)]
osdc/Objecter: fix tick_event handling in shutdown vs tick race

If we fail to cancel the tick_event, we rely on tick() itself to clear
tick_event.  I'm not quite sure how we got this wrong in the previous
commit, but this boils down to two cases:

1) shutdown() successfully cancels the event and clears tick_event.  tick()
   never runs.  tick_event == NULL when we finish.
2) shutdown() fails to cancel the event because it has already started.  In
   this case tick itself is blocking (or about to block) waiting on the
   rlock.  When it does run it will clear tick_event itself, then see
   initialized == 0 and exit without rescheduling.

Fixes: #9873
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agocommon/Timer: recheck stopping before sleep if we dropped the lock
Sage Weil [Fri, 24 Oct 2014 16:20:41 +0000 (09:20 -0700)]
common/Timer: recheck stopping before sleep if we dropped the lock

If we have safe_callbacks==false, the stopping flag may have changed while
we were doing our callback. Recheck it and exit to avoid a deadlock on
shutdown.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2787 from ceph/fix-fstat-mode
Sage Weil [Fri, 24 Oct 2014 00:57:02 +0000 (17:57 -0700)]
Merge pull request #2787 from ceph/fix-fstat-mode

java: fill in stat structure correctly

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agojava: fill in stat structure correctly 2787/head
Noah Watkins [Thu, 23 Oct 2014 20:22:52 +0000 (13:22 -0700)]
java: fill in stat structure correctly

Added stat filling helper function but only stat and lstat were updated.
This patch makes fstat use it. Crucially the fstat wasn't updating the
mode flags.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
10 years agoObjecter: resend linger ops on any interval change
Josh Durgin [Mon, 20 Oct 2014 20:29:13 +0000 (13:29 -0700)]
Objecter: resend linger ops on any interval change

Watch/notify ops need to be resent after a pg split occurs, as well as
a few other circumstances that the existing objecter checks did not
catch.

Refactor the check the OSD uses for this to add a version taking the
more basic types instead of the whole OSD map, and stash the needed
info when an op is sent.

Fixes: #9806
Backport: giant, firefly, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
10 years agoMerge pull request #2785 from athanatos/wip-9821
Sage Weil [Thu, 23 Oct 2014 20:45:26 +0000 (13:45 -0700)]
Merge pull request #2785 from athanatos/wip-9821

PG:: reset_interval_flush and in set_last_peering_reset

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2766 from dachary/wip-9408-buffer-alignment-giant
Samuel Just [Thu, 23 Oct 2014 16:52:00 +0000 (09:52 -0700)]
Merge pull request #2766 from dachary/wip-9408-buffer-alignment-giant

erasure-code: buffer alignment (giant)

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoReplicatedPG: writeout hit_set object with correct prior_version 2798/head
Samuel Just [Thu, 23 Oct 2014 16:11:28 +0000 (09:11 -0700)]
ReplicatedPG: writeout hit_set object with correct prior_version

Fixes: #9875
Backport: giant, firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoqa: use sudo when rsyncing /usr so we can read everything
Greg Farnum [Tue, 21 Oct 2014 17:55:06 +0000 (10:55 -0700)]
qa: use sudo when rsyncing /usr so we can read everything

Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit fa07c04231db2d130de54647957ffab4a7a53733)

10 years agoFDCache: purge hoid on clear 2799/head
Samuel Just [Wed, 22 Oct 2014 19:43:55 +0000 (12:43 -0700)]
FDCache: purge hoid on clear

We no longer require that a lock on the FD be held for the duration of an
operation, only while accessing the actual index.  We cannot, therefore, assume
that a racing read during lfn_unlink (backfill or scrub) does not still have a
reference to the fd.  We want to remove the fd from the cache to prevent
subsequent operations from finding it while allowing such a racing read to
complete with its existing fd.

Fixes: #9480
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoshared_cache: add purge and tests
Samuel Just [Wed, 22 Oct 2014 19:41:25 +0000 (12:41 -0700)]
shared_cache: add purge and tests

purge detaches the lru shared_ptr currently associated with
the key from the lru even if there are still references.

Signed-off-by: Samuel Just <sam.just@inktank.com>
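
The intended purge semantics can be illustrated with a toy cache. ToyCache is a hypothetical, much-simplified stand-in for shared_cache (no LRU ordering, no locking, no weak references); it only shows that purging drops the cache's entry while existing holders keep a valid pointer.

```cpp
#include <map>
#include <memory>
#include <utility>

// Toy keyed cache of shared_ptr values; an assumption-laden sketch only.
template <typename K, typename V>
class ToyCache {
  std::map<K, std::shared_ptr<V>> entries_;

public:
  // Insert `value` unless the key already exists; return the cached pointer.
  std::shared_ptr<V> add(const K& key, V value) {
    auto it = entries_.find(key);
    if (it != entries_.end())
      return it->second;  // keep the existing entry untouched
    auto ptr = std::make_shared<V>(std::move(value));
    entries_.emplace(key, ptr);
    return ptr;
  }

  // Return the cached pointer, or nullptr if the key is absent.
  std::shared_ptr<V> lookup(const K& key) const {
    auto it = entries_.find(key);
    return it == entries_.end() ? nullptr : it->second;
  }

  // Detach the key from the cache even if callers still hold references.
  void purge(const K& key) { entries_.erase(key); }
};
```

After purge, a later lookup misses, but a caller that obtained the pointer earlier can finish using it.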
10 years agoshared_cache::add: do not delete value if existed
Samuel Just [Wed, 22 Oct 2014 19:40:14 +0000 (12:40 -0700)]
shared_cache::add: do not delete value if existed

The method contract specifies that we do not want to delete
value if we are not inserting it, so do not initialize val
at the top of the function to take over value.  No current
users appear to trip over this problem (FDCache and
map_cache).

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2777 from ceph/wip-9859
Sage Weil [Wed, 22 Oct 2014 18:36:07 +0000 (11:36 -0700)]
Merge pull request #2777 from ceph/wip-9859

mon: Monitor: MMonGetMap doesn't require caps

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agomon: Monitor: MMonGetMap doesn't require caps 2777/head
Joao Eduardo Luis [Wed, 22 Oct 2014 18:30:08 +0000 (19:30 +0100)]
mon: Monitor: MMonGetMap doesn't require caps

We are dropping the requirement for MON_CAP_R for MMonGetMap.

Reason is simple enough: clients may need to contact the monitors and
obtain the latest monmap before authenticating.  This happens, for
instance, when a client calls MonClient::get_monmap_privately().  The
osd uses this function during mkfs, prior to initializing a keyring or
even so much as existing.

Fixes: #9859
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
10 years agoPG:: reset_interval_flush and in set_last_peering_reset 2785/head
Samuel Just [Mon, 20 Oct 2014 21:10:58 +0000 (14:10 -0700)]
PG:: reset_interval_flush and in set_last_peering_reset

If we have a change in the prior set, but not in the up/acting set, we go back
through Reset in order to reset peering state.  Previously, we would reset
last_peering_reset in the Reset constructor.  This did not, however, reset the
flush_interval, which caused the eventual flush event to be ignored and the
peering messages to not be sent.

Instead, we will always reset_interval_flush if we are actually changing the
last_peering_reset value.

Fixes: #9821
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoobjecter: Unlock in shutdown before waiting for timer thread
David Zafman [Tue, 21 Oct 2014 07:52:37 +0000 (00:52 -0700)]
objecter: Unlock in shutdown before waiting for timer thread

Fixes: #9845
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
10 years agoerasure-code: use ErasureCode::SIMD_ALIGN in ceph_erasure_code_benchmark 2766/head
Loic Dachary [Mon, 13 Oct 2014 14:43:20 +0000 (16:43 +0200)]
erasure-code: use ErasureCode::SIMD_ALIGN in ceph_erasure_code_benchmark

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoerasure-code: add ErasureCode::encode unit test
Loic Dachary [Mon, 13 Oct 2014 12:48:27 +0000 (14:48 +0200)]
erasure-code: add ErasureCode::encode unit test

Re-create and describe the situation that is fixed by
91a7e18f60bbc9acab3045baaa1b6505474ec4a9 which reworks the buffer
preparation function provided by ErasureCode::encode.

http://tracker.ceph.com/issues/9408 Refs: #9408

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoerasure-code: expose ErasureCode::SIMD_ALIGN as a const
Loic Dachary [Mon, 13 Oct 2014 12:46:22 +0000 (14:46 +0200)]
erasure-code: expose ErasureCode::SIMD_ALIGN as a const

For test purposes and it will also be useful for plugins that must
ensure the chunk size is a multiple of SIMD_ALIGN.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoceph_erasure_code_benchmark: use 32-byte aligned input
Janne Grunau [Mon, 29 Sep 2014 12:34:32 +0000 (14:34 +0200)]
ceph_erasure_code_benchmark: use 32-byte aligned input

The benchmark is supposed to measure the encoding/decoding speed and
not the overhead of buffer realignments.

Signed-off-by: Janne Grunau <j@jannau.net>
10 years agoerasure code: use 32-byte aligned buffers
Janne Grunau [Mon, 29 Sep 2014 12:34:31 +0000 (14:34 +0200)]
erasure code: use 32-byte aligned buffers

Requiring page aligned buffers and realigning the input if necessary
creates measurable overhead. ceph_erasure_code_benchmark is between
10-20% faster depending on the workload.

Also prevents a misaligned buffer when bufferlist::c_str(bufferlist)
has to allocate a new buffer to provide a contiguous one. See bug #9408

Signed-off-by: Janne Grunau <j@jannau.net>
10 years agoerasure code: use a function for the chunk mapping index
Janne Grunau [Mon, 29 Sep 2014 12:34:30 +0000 (14:34 +0200)]
erasure code: use a function for the chunk mapping index

10 years agocommon: add an aligned buffer with less alignment than a page
Loic Dachary [Mon, 13 Oct 2014 14:32:18 +0000 (16:32 +0200)]
common: add an aligned buffer with less alignment than a page

SIMD optimized erasure code computation needs aligned memory. Buffers
aligned to a page boundary are not needed though. The buffers used
for the erasure code computation are typically smaller than a page.

The typical alignment requirements of SIMD operations are 16 bytes for
SSE2 and NEON and 32 bytes for AVX/AVX2.

Add new prototypes with an align argument, similar to the one enforcing
page alignment. The implementation is exactly the same, except for the
align parameter. The page alignment methods are then implemented as calls
to the more generic methods.

The align parameter is an unsigned (same type as CEPH_PAGE_SIZE). The
CEPH_PAGE_MASK value ( ~(CEPH_PAGE_SIZE - 1) ) was only used as
~CEPH_PAGE_MASK, i.e. equivalent of (CEPH_PAGE_SIZE - 1) once the double
~~ is reduced. These occurrences are replaced with (align - 1). The type
of CEPH_PAGE_MASK is an unsigned long, which is probably because it was
~(CEPH_PAGE_SIZE). When using (align - 1) as a mask for both
CEPH_PAGE_SIZE and SIMD alignment there is no need to use an unsigned
long because there is no risk of overflowing the unsigned value.

The CYGWIN specific code is also modified but not tested.

Unit tests are added for the new methods.

Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
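
The (align - 1) mask arithmetic described above can be sketched as below. The function names are hypothetical and this is not the buffer.cc code; it assumes align is a power of two, as page sizes and SIMD alignments (16, 32) are.

```cpp
#include <cstdint>

// Round `len` up to the next multiple of `align` (align must be a power of two).
inline unsigned round_up_to(unsigned len, unsigned align) {
  return (len + align - 1) & ~(align - 1);
}

// True if the pointer's address is a multiple of `align`.
inline bool is_aligned(const void* ptr, unsigned align) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (align - 1)) == 0;
}
```

The same two helpers serve both page alignment and SIMD alignment, since only the value passed as align differs.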
10 years agocommon: remove dead code in buffer.cc
Loic Dachary [Mon, 13 Oct 2014 14:29:10 +0000 (16:29 +0200)]
common: remove dead code in buffer.cc

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoFix read performance regression in ObjectCacher
Adam Crume [Wed, 8 Oct 2014 00:45:53 +0000 (17:45 -0700)]
Fix read performance regression in ObjectCacher

The regression was introduced in commit
4fc9fffc494abedac0a9b1ce44706343f18466f1.  The problem is that the cache
thinks it's full (when it's not), so it defers the read.  This change
frees up cache space if necessary and only defers the read if enough
space cannot be freed.

Fixes: 9513
Signed-off-by: Adam Crume <adamcrume@gmail.com>
(cherry picked from commit 82175ec94acc89dc75da0154f86187fb2e4dbf5e)

10 years agoMerge pull request #2758 from ceph/wip-9820
Sage Weil [Mon, 20 Oct 2014 17:46:48 +0000 (10:46 -0700)]
Merge pull request #2758 from ceph/wip-9820

qa/workunits: cephtool: don't remove self's key on auth tests

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits: cephtool: don't remove self's key on auth tests 2758/head
Joao Eduardo Luis [Mon, 20 Oct 2014 17:00:15 +0000 (18:00 +0100)]
qa/workunits: cephtool: don't remove self's key on auth tests

Suites run with CEPH_TEST_CLI_DUP_COMMAND=1, which will send a duplicate
command for every command issued with the 'ceph' tool.  Behavior is to
get a reply from the command and then send a duplicate, looking for the
same outcome (guaranteeing idempotency of the operations).  However, it
so happens that if you remove the entity's own key from the keyring and
you happen to be unlucky enough so that the client's connection gets
failed (we also run tests with connection failure injections), the
'ceph' tool won't be able to reconnect to the cluster to send the
duplicate command (as its entity no longer exists in the cluster's
keyring).

We rewrite the test instead of resorting to ugly hacks to work around
this behavior, simply having a new 'role-definer' added by the existing
'role-definer' (which we weren't testing anyway, so bonus points for
that) and then have one removing the other (to test the procedure) and
finally using 'client.admin' to remove the last 'role-definer'.

Fixes: #9820
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
10 years agoMerge pull request #2708 from ceph/wip-9718
Samuel Just [Fri, 17 Oct 2014 17:38:43 +0000 (10:38 -0700)]
Merge pull request #2708 from ceph/wip-9718

osd/osd_types: consider CRUSH_ITEM_NONE in check_new_interval() min_size

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2711 from guangyy/wip-9614-followup
Samuel Just [Fri, 17 Oct 2014 17:38:30 +0000 (10:38 -0700)]
Merge pull request #2711 from guangyy/wip-9614-followup

Follow-up fix for 9614

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2740 from ceph/giant-unknown-locktype
Sage Weil [Fri, 17 Oct 2014 15:21:16 +0000 (08:21 -0700)]
Merge pull request #2740 from ceph/giant-unknown-locktype

mds: reply -EOPNOTSUPP for unknown lock type

10 years agomds: reply -EOPNOTSUPP for unknown lock type 2740/head
Yan, Zheng [Tue, 14 Oct 2014 14:02:41 +0000 (22:02 +0800)]
mds: reply -EOPNOTSUPP for unknown lock type

Signed-off-by: Yan, Zheng <zyan@redhat.com>
(cherry picked from commit 675392335c53ff7879031fb9184e4f35bcc90fe2)

10 years agoqa/workunits/rbd/import_export.sh: be case insensitive
Sage Weil [Wed, 15 Oct 2014 19:26:00 +0000 (12:26 -0700)]
qa/workunits/rbd/import_export.sh: be case insensitive

Stop tripping over this change (from dumpling).

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2702 from ceph/wip-9706
Yehuda Sadeh [Wed, 15 Oct 2014 15:58:16 +0000 (08:58 -0700)]
Merge pull request #2702 from ceph/wip-9706

objecter: fix session locking, use after frees (#9706)

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agoMerge pull request #2719 from ceph/wip-inotable-init
Gregory Farnum [Tue, 14 Oct 2014 20:38:50 +0000 (13:38 -0700)]
Merge pull request #2719 from ceph/wip-inotable-init

mds: fix inotable initialization/reset

Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
10 years ago mds: fix inotable initialization/reset 2719/head
Henry C Chang [Tue, 14 Oct 2014 02:06:04 +0000 (10:06 +0800)]
mds: fix inotable initialization/reset

interval_set::insert takes arguments start and len, not end.

Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com>
(cherry picked from commit c95bb5943450be95e4302e35b3e2df68a6fc34bd)
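
The start/len versus start/end mix-up described above can be illustrated with a minimal sketch. This is not Ceph's interval_set; the IntervalSet class and the insert_range helper are hypothetical names used only to show why passing an end value where a length is expected covers the wrong range.

```python
class IntervalSet:
    """Toy stand-in for an interval set (hypothetical, not Ceph's interval_set)."""

    def __init__(self):
        self.intervals = {}  # start -> length

    def insert(self, start, length):
        # The second argument is a length: the covered range is [start, start + length).
        self.intervals[start] = length

    def contains(self, value):
        return any(s <= value < s + n for s, n in self.intervals.items())


def insert_range(iset, first, last):
    """Insert the inclusive range [first, last] by converting the end into a length."""
    iset.insert(first, last - first + 1)
```

Calling insert(100, 200) with the intent of covering 100 through 200 actually covers 100 through 299, whereas insert_range(iset, 100, 200) covers exactly the intended values.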

10 years ago Merge pull request #2707 from ceph/wip-9731
Sage Weil [Mon, 13 Oct 2014 18:17:17 +0000 (11:17 -0700)]
Merge pull request #2707 from ceph/wip-9731

PGLog::IndexedLog::trim(): rollback_info_trimmed_to_riter may be log.ren...

Reviewed-by: Sage Weil <sage@redhat.com>
10 years ago rpm: 95-ceph-osd-alt.rules is not needed for centos7 / rhel7
Loic Dachary [Sat, 11 Oct 2014 16:20:36 +0000 (18:20 +0200)]
rpm: 95-ceph-osd-alt.rules is not needed for centos7 / rhel7

Using || instead of && caused it to always be installed. That was
already fixed in EPEL.

http://tracker.ceph.com/issues/9747
Fixes: #9747

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 5ff4a850a0d809b3f25988c6cceb82c35095ef84)

10 years ago The fix for issue 9614 was not completed, as a result, for those erasure coded PGs... 2711/head
Guang Yang [Mon, 13 Oct 2014 04:18:45 +0000 (04:18 +0000)]
The fix for issue 9614 was not completed; as a result, erasure coded PGs with one OSD down were wrongly marked as active+clean+degraded. This patch makes sure the clean flag is not set for such PGs.
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
10 years ago osd/osd_types: consider CRUSH_ITEM_NONE in check_new_interval() min_size check 2708/head
Sage Weil [Sun, 12 Oct 2014 17:05:51 +0000 (10:05 -0700)]
osd/osd_types: consider CRUSH_ITEM_NONE in check_new_interval() min_size check

Fixes: #9718
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
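
One reading of the subject line above is that a min_size check based on the raw length of an OSD list over-counts when some slots hold the CRUSH_ITEM_NONE placeholder. The sketch below illustrates that idea only; count_present and meets_min_size are hypothetical helpers, and the real check_new_interval() logic is not reproduced here.

```python
# Placeholder meaning "no OSD mapped to this slot"; the numeric value is an
# assumption used for illustration.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def count_present(osds):
    """Count only the slots that actually name an OSD."""
    return sum(1 for osd in osds if osd != CRUSH_ITEM_NONE)


def meets_min_size(osds, min_size):
    """Compare the number of real OSDs, not the list length, against min_size."""
    return count_present(osds) >= min_size
```

For example, the list [3, CRUSH_ITEM_NONE, CRUSH_ITEM_NONE] has length 3 but only one real OSD, so it does not satisfy a min_size of 2.
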
10 years ago osdc/Objecter: fix use-after-frees in close_session, shutdown 2702/head
Sage Weil [Fri, 10 Oct 2014 23:48:14 +0000 (16:48 -0700)]
osdc/Objecter: fix use-after-frees in close_session, shutdown

For linger ops, _session_linger_op_remove invalidates our iterator; add
it to the list first.  Same goes for the others.

Signed-off-by: Sage Weil <sage@redhat.com>
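
The pattern described above (collect the items into a list first, then remove them) can be sketched as follows. This is a Python analogy rather than the Objecter code: the Session class and its remove_linger_op method are hypothetical, and Python raises an error on mutation during iteration where C++ would leave a dangling iterator.

```python
class Session:
    """Hypothetical session holding linger ops keyed by id."""

    def __init__(self, linger_ops):
        self.linger_ops = dict(linger_ops)

    def remove_linger_op(self, op_id):
        # Mutates the same container a caller may be traversing.
        del self.linger_ops[op_id]


def close_session(session):
    """Remove every linger op without mutating the container mid-traversal."""
    # Snapshot the ids first so the removals below cannot disturb the loop.
    to_remove = list(session.linger_ops)
    for op_id in to_remove:
        session.remove_linger_op(op_id)
    return to_remove
```

Iterating over session.linger_ops directly while calling remove_linger_op would fail in Python with a RuntimeError; the snapshot avoids that.
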
10 years ago osdc/Objecter: fix tick() session locking
Sage Weil [Fri, 10 Oct 2014 23:36:40 +0000 (16:36 -0700)]
osdc/Objecter: fix tick() session locking

We need to take the session read lock before traversing the ops lists.

Fixes: #9706
Signed-off-by: Sage Weil <sage@redhat.com>
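
A minimal sketch of the locking rule stated above: take the per-session lock before walking that session's ops. Everything here is an assumption made for illustration. The Session class, the tick signature, and the timeout logic are hypothetical, and a plain threading.Lock stands in for the read lock the commit mentions.

```python
import threading


class Session:
    """Hypothetical session with its own lock and a map of op id -> send time."""

    def __init__(self):
        self.lock = threading.Lock()  # stand-in for a read/write lock
        self.ops = {}


def tick(sessions, now, timeout):
    """Return ids of ops outstanding longer than timeout, holding each session's lock."""
    laggy = []
    for session in sessions:
        with session.lock:  # acquire before traversing the ops
            for op_id, sent_at in session.ops.items():
                if now - sent_at > timeout:
                    laggy.append(op_id)
    return laggy
```

The with block releases the lock as soon as the traversal of that session finishes, so no session lock is held across sessions.
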
10 years ago PGLog::IndexedLog::trim(): rollback_info_trimmed_to_riter may be log.rend() 2707/head
Samuel Just [Fri, 10 Oct 2014 20:53:29 +0000 (13:53 -0700)]
PGLog::IndexedLog::trim(): rollback_info_trimmed_to_riter may be log.rend()

Fixes: #9731
Backport: giant, firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years ago Merge pull request #2684 from ceph/wip-9696
Samuel Just [Fri, 10 Oct 2014 17:32:15 +0000 (10:32 -0700)]
Merge pull request #2684 from ceph/wip-9696

PG::choose_acting: in mixed cluster case, acting may include backfill

Reviewed-by: Sage Weil <sage@redhat.com>
10 years ago Merge pull request #2693 from ceph/giant-unused-variable
Gregory Farnum [Fri, 10 Oct 2014 13:59:22 +0000 (06:59 -0700)]
Merge pull request #2693 from ceph/giant-unused-variable

Giant unused variable

10 years ago mds: Locker: remove unused variable 2693/head
Yan, Zheng [Fri, 10 Oct 2014 13:36:39 +0000 (21:36 +0800)]
mds: Locker: remove unused variable

Signed-off-by: Yan, Zheng <zyan@redhat.com>
10 years ago Merge pull request #2679 from ceph/giant-locker-null
Yan, Zheng [Fri, 10 Oct 2014 01:37:28 +0000 (09:37 +0800)]
Merge pull request #2679 from ceph/giant-locker-null

mds: Locker: fix a NULL deref in _update_cap_fields

10 years ago PG::choose_acting: in mixed cluster case, acting may include backfill 2684/head
Samuel Just [Thu, 9 Oct 2014 23:21:18 +0000 (16:21 -0700)]
PG::choose_acting: in mixed cluster case, acting may include backfill

Fixes: #9696
Backport: firefly, giant
Introduced: 92cfd370395385ca5537b5bc72220934c9f09026
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years ago mds: Locker: fix a NULL deref in _update_cap_fields 2679/head
Greg Farnum [Thu, 9 Oct 2014 22:12:19 +0000 (15:12 -0700)]
mds: Locker: fix a NULL deref in _update_cap_fields

The MClientCaps* is allowed to be NULL, so we can't deref it unless
the dirty param is non-zero. So don't do the ahead-of-time lookup;
just call it explicitly in the if block.

Signed-off-by: Greg Farnum <greg@inktank.com>
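
The fix described above moves a lookup on a possibly-NULL message into the branch where the message is known to exist. A minimal sketch of that shape, using None in place of NULL; ClientCaps, get_mtime, and update_cap_fields are hypothetical names, not the MDS code.

```python
class ClientCaps:
    """Hypothetical message carrying an mtime."""

    def __init__(self, mtime):
        self.mtime = mtime

    def get_mtime(self):
        return self.mtime


def update_cap_fields(inode, dirty, caps_msg):
    """Apply fields from caps_msg, which may be None unless dirty is non-zero."""
    # No ahead-of-time lookup here: caps_msg.get_mtime() on None would fail.
    if dirty:
        inode["mtime"] = caps_msg.get_mtime()  # dereference only inside the guard
    return inode
```

With dirty set to 0 the function never touches caps_msg, so passing None is safe.
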
10 years ago Merge pull request #2616 from guangyy/wip-giant-9614
Samuel Just [Wed, 8 Oct 2014 18:08:35 +0000 (11:08 -0700)]
Merge pull request #2616 from guangyy/wip-giant-9614

PG::actingset should be used when checking the number of acting OSDs for a given PG.

Backport: firefly
Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years ago Merge pull request #2663 from ceph/wip-9496
Samuel Just [Wed, 8 Oct 2014 18:07:06 +0000 (11:07 -0700)]
Merge pull request #2663 from ceph/wip-9496

mon: PGMonitor: populate scrub timestamps with 'now' on pg creation

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years ago Merge pull request #2650 from ceph/wip-9128
Samuel Just [Wed, 8 Oct 2014 18:04:42 +0000 (11:04 -0700)]
Merge pull request #2650 from ceph/wip-9128

Add reset_tp_timeout in long loop in add_source_info for suicide timeout

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years ago Merge pull request #2543 from ceph/wip-9419
Samuel Just [Wed, 8 Oct 2014 18:01:26 +0000 (11:01 -0700)]
Merge pull request #2543 from ceph/wip-9419

Wip 9419

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years ago Merge pull request #2661 from dachary/wip-9677-ioprio-class-giant
Loic Dachary [Wed, 8 Oct 2014 06:44:08 +0000 (08:44 +0200)]
Merge pull request #2661 from dachary/wip-9677-ioprio-class-giant

common: ceph_ioprio_string_to_class always returns -EINVAL

10 years ago mon: PGMonitor: populate scrub timestamps with 'now' on pg creation 2663/head
Joao Eduardo Luis [Tue, 7 Oct 2014 23:13:49 +0000 (00:13 +0100)]
mon: PGMonitor: populate scrub timestamps with 'now' on pg creation

Fixes: #9496
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
10 years ago mon: PGMonitor: prettify access to pg_stats_t in register_pg
Joao Eduardo Luis [Tue, 7 Oct 2014 23:12:29 +0000 (00:12 +0100)]
mon: PGMonitor: prettify access to pg_stats_t in register_pg

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
10 years ago Merge pull request #2660 from athanatos/wip-9203
Sage Weil [Tue, 7 Oct 2014 21:08:39 +0000 (14:08 -0700)]
Merge pull request #2660 from athanatos/wip-9203

test/osd/Object: don't generate length of 0

Reviewed-by: Sage Weil <sage@redhat.com>
10 years ago Merge pull request #2659 from athanatos/wip-9113
Sage Weil [Tue, 7 Oct 2014 21:06:38 +0000 (14:06 -0700)]
Merge pull request #2659 from athanatos/wip-9113

Wip 9113

Reviewed-by: Sage Weil <sage@redhat.com>
10 years ago Merge pull request #2658 from athanatos/wip-9625
Sage Weil [Tue, 7 Oct 2014 21:03:55 +0000 (14:03 -0700)]
Merge pull request #2658 from athanatos/wip-9625

PG: release backfill reservations if a backfill peer rejects

Reviewed-by: Sage Weil <sage@redhat.com>