git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
10 years agolttng: Add rmw_flags to tracepoint in PG::queue_op
Adam Crume [Wed, 18 Jun 2014 17:52:56 +0000 (10:52 -0700)]
lttng: Add rmw_flags to tracepoint in PG::queue_op

Signed-off-by: Adam Crume <adamcrume@gmail.com>
10 years agolttng: Trace OpRequest
Adam Crume [Sat, 14 Jun 2014 00:17:22 +0000 (17:17 -0700)]
lttng: Trace OpRequest

Signed-off-by: Adam Crume <adamcrume@gmail.com>
10 years agotracing: automake-ify tracepoint generation
Noah Watkins [Fri, 20 Jun 2014 23:49:28 +0000 (16:49 -0700)]
tracing: automake-ify tracepoint generation

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
10 years agolttng: Check for lttng/tracepoint.h when configuring
Adam Crume [Thu, 12 Jun 2014 23:27:19 +0000 (16:27 -0700)]
lttng: Check for lttng/tracepoint.h when configuring

10 years agolttng: add pg and osd tracepoints
Noah Watkins [Sat, 7 Jun 2014 16:37:39 +0000 (09:37 -0700)]
lttng: add pg and osd tracepoints

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
10 years agolttng: trace mutex::unlock
Noah Watkins [Sat, 31 May 2014 22:59:27 +0000 (15:59 -0700)]
lttng: trace mutex::unlock

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
10 years agotracing: bootstrap lttng-ust with mutex events
Noah Watkins [Fri, 30 May 2014 21:13:12 +0000 (14:13 -0700)]
tracing: bootstrap lttng-ust with mutex events

See src/tracing/README.md

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
10 years agoMerge pull request #2295 from dachary/wip-9153-jerasure-upgrade
Sage Weil [Wed, 20 Aug 2014 22:09:42 +0000 (15:09 -0700)]
Merge pull request #2295 from dachary/wip-9153-jerasure-upgrade

erasure-code: do not preload the isa plugin

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoerasure-code: do not preload the isa plugin 2295/head
Loic Dachary [Wed, 20 Aug 2014 21:10:49 +0000 (23:10 +0200)]
erasure-code: do not preload the isa plugin

Because it's not built for all architectures and distributions.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoMerge pull request #2219 from somnathr/wip-optracker-lock
Sage Weil [Wed, 20 Aug 2014 20:08:39 +0000 (13:08 -0700)]
Merge pull request #2219 from somnathr/wip-optracker-lock

TrackedOp: Removed redundant lock in OpTracker::_mark_event()

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoTrackedOp:_dump_op_descriptor is renamed to _dump_op_descriptor_unlocked 2219/head
Pavan Rallabhandi [Wed, 20 Aug 2014 08:31:57 +0000 (14:01 +0530)]
TrackedOp:_dump_op_descriptor is renamed to _dump_op_descriptor_unlocked

Callers do not need to hold the lock before calling _dump_op_descriptor(),
so it is renamed to _dump_op_descriptor_unlocked() to reflect this.

Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
10 years agoTrackedOp: Removed redundant lock in OpTracker::_mark_event()
Pavan Rallabhandi [Tue, 5 Aug 2014 11:51:35 +0000 (17:21 +0530)]
TrackedOp: Removed redundant lock in OpTracker::_mark_event()

ops_in_flight_lock seems redundant in OpTracker::_mark_event(),
and this lock is highly contended. Removing it
gives a significant performance boost.

Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
10 years agoMerge pull request #2282 from dachary/wip-9153-jerasure-upgrade
Sage Weil [Wed, 20 Aug 2014 17:08:39 +0000 (10:08 -0700)]
Merge pull request #2282 from dachary/wip-9153-jerasure-upgrade

erasure-code: preload the jerasure plugin

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agodoc/start/quick-ceph-deploy: missing {ceph-node} from mon create-initial
Dan Mick [Wed, 20 Aug 2014 04:23:46 +0000 (21:23 -0700)]
doc/start/quick-ceph-deploy: missing {ceph-node} from mon create-initial

Signed-off-by: Dan Mick <dan.mick@inktank.com>
10 years agoMerge pull request #2283 from somnathr/wip-sd-9145
Sage Weil [Wed, 20 Aug 2014 03:56:06 +0000 (20:56 -0700)]
Merge pull request #2283 from somnathr/wip-sd-9145

CollectionIndex: Collection name is added to the access_lock name

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoCollectionIndex: Collection name is added to the access_lock name 2283/head
Somnath Roy [Mon, 18 Aug 2014 23:59:36 +0000 (16:59 -0700)]
CollectionIndex: Collection name is added to the access_lock name

The CollectionIndex constructor is changed to accept the coll_t
so that the collection name can be used to form the access_lock (RWLock)
name. This is needed because otherwise lockdep will report a recursive
lock error and assert: lockdep needs unique lock names for each Index object.

Fixes: #9145
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
10 years agoerasure-code: preload the jerasure plugin 2282/head
Loic Dachary [Mon, 18 Aug 2014 23:30:15 +0000 (01:30 +0200)]
erasure-code: preload the jerasure plugin

Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:

* ceph-osd-v1 is running but did not load jerasure

* ceph-osd-v2 is being installed but takes time: the files
  are installed before ceph-osd is restarted

* ceph-osd-v1 is required to handle an erasure coded placement group and
  loads jerasure (the v2 version which is not API compatible)

* ceph-osd-v1 calls the v2 jerasure plugin and does not reference the
  expected part of the code and crashes

Although this problem shows in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately
after installing and running an OSD. Once it is backported to firefly,
it will not even happen in teuthology tests because the upgrade from
firefly to master will use the firefly version including this fix.

While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure
that load other plugins depending on the CPU features, or even plugins
such as isa, which only work on specific CPUs.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Backport: firefly
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
10 years agoMerge pull request #2043 from guangyy/wip-pg-splitting
Samuel Just [Tue, 19 Aug 2014 22:45:31 +0000 (15:45 -0700)]
Merge pull request #2043 from guangyy/wip-pg-splitting

Support 'expected_num_objects' parameter when creating pool for pg folder splitting

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agomon: fix signed/unsigned warnings
Sage Weil [Tue, 19 Aug 2014 21:33:54 +0000 (14:33 -0700)]
mon: fix signed/unsigned warnings

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2287 from ceph/wip-reweight-tunables
Gregory Farnum [Tue, 19 Aug 2014 20:06:08 +0000 (13:06 -0700)]
Merge pull request #2287 from ceph/wip-reweight-tunables

mon: make reweight-by-* sanity limits configurable

Reviewed-by: Greg Farnum <greg@inktank.com>
10 years agoMerge pull request #2279 from ceph/wip-hadoop
Gregory Farnum [Tue, 19 Aug 2014 18:47:07 +0000 (11:47 -0700)]
Merge pull request #2279 from ceph/wip-hadoop

fix and reorg hadoop workunits

Reviewed-by: Greg Farnum <greg@inktank.com>
10 years agomon: make reweight-by-* sanity limits configurable 2287/head
Sage Weil [Tue, 19 Aug 2014 18:32:07 +0000 (11:32 -0700)]
mon: make reweight-by-* sanity limits configurable

Also drop the somewhat redundant osd_sum.kb check; the main thing we care
about here is

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2199 from ceph/wip-reweight
Sage Weil [Tue, 19 Aug 2014 17:40:42 +0000 (10:40 -0700)]
Merge pull request #2199 from ceph/wip-reweight

mon: allow reweighting of osds by pg (instead of bytes used)

Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
10 years agomon/OSDMonitor: respect CRUSH weights for reweight-by-pg 2199/head
Sage Weil [Tue, 12 Aug 2014 03:54:38 +0000 (20:54 -0700)]
mon/OSDMonitor: respect CRUSH weights for reweight-by-pg

Do not assume that all OSDs are weighted equally for reweight-by-pg.

Note that reweight-by-utilization already reweights based on the size of
the OSD volume; we presume that this is already reflected by the CRUSH
weights.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon/OSDMonitor: reweight-by-pg for pool(s)
Sage Weil [Wed, 6 Aug 2014 15:51:18 +0000 (08:51 -0700)]
mon/OSDMonitor: reweight-by-pg for pool(s)

Allow reweight-by-pg to look at a specific set of pools.  If the list
is omitted, use PGs from all pools.  This allows you to focus on a
specific pool (the one that will dominate data usage).  Otherwise things
may not be quite right because other pools may have PGs that contain
much less data.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon/OSDMonitor: adjust weights up, when possible
Sage Weil [Wed, 6 Aug 2014 15:35:07 +0000 (08:35 -0700)]
mon/OSDMonitor: adjust weights up, when possible

Note when OSDs are underloaded, as well.  If that is the case, adjust the
OSD reweight value up, if possible.  (It won't always be possible since
weights are capped at 1.)

Note that we set the underload threshold to the average, as we want to
aggressively adjust weights up (back to 1.0) whenever possible.  This gets
us a more efficient mapping calculation and reduces the amount of "noise"
in the weights.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/cephtool/test.sh: test reweight-by-pg
Sage Weil [Tue, 19 Aug 2014 03:57:28 +0000 (20:57 -0700)]
qa/workunits/cephtool/test.sh: test reweight-by-pg

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon/OSDMonitor: reweight-by-pg
Sage Weil [Mon, 4 Aug 2014 22:40:35 +0000 (15:40 -0700)]
mon/OSDMonitor: reweight-by-pg

This is just like reweight-by-utilization, but looks purely at the PG to
OSD mapping, not at the number of bytes used on the target disks.  This
allows the reweighting to be done before any data is written into the
cluster, when no data will need to migrate as a result of the reweight.

Signed-off-by: Sage Weil <sage@redhat.com>
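
The reweight-by-pg commits above describe reweighting from the PG to OSD mapping alone, with weights scaled down on overloaded OSDs and back up (capped at 1.0) on underloaded ones. A minimal sketch of that idea follows. It is not the OSDMonitor code: the function name, the `oload` percentage parameter, and the scaling formula are assumptions made for illustration, and CRUSH weights are ignored for brevity.

```python
from collections import Counter


def reweight_by_pg(pg_to_osds, weights, oload=120):
    """Return new reweight values derived from PG counts alone.

    pg_to_osds: dict mapping a PG id to the list of OSD ids it maps to.
    weights:    dict mapping an OSD id to its current reweight in (0, 1].
    oload:      overload threshold as a percentage of the average (assumed).
    All names are hypothetical; this is an illustration only.
    """
    counts = Counter(osd for osds in pg_to_osds.values() for osd in osds)
    average = sum(counts.values()) / len(weights)
    overload = average * oload / 100.0

    new_weights = {}
    for osd, weight in weights.items():
        pgs = counts.get(osd, 0)
        if pgs > overload:
            # Overloaded: scale the weight down toward the average.
            new_weights[osd] = weight * average / pgs
        elif 0 < pgs < average and weight < 1.0:
            # Underloaded: scale back up, capped at 1.0.
            new_weights[osd] = min(1.0, weight * average / pgs)
    return new_weights


if __name__ == "__main__":
    mapping = {"1.0": [0, 1], "1.1": [0, 1], "1.2": [0, 2], "1.3": [0, 1]}
    print(reweight_by_pg(mapping, {0: 1.0, 1: 1.0, 2: 0.5}))
```

Because only the mapping is consulted, the same calculation can run on an empty cluster, which is the point the commit message makes about avoiding data migration.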
10 years agoAdd tests for the collection hint OP: 1) Store Test 2) Idempotent Test. 2043/head
Guang Yang [Wed, 9 Jul 2014 07:45:58 +0000 (07:45 +0000)]
Add tests for the collection hint OP: 1) Store Test 2) Idempotent Test.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
10 years agoImplement the collection hint transaction, add a new transaction type as expected...
Guang Yang [Mon, 7 Jul 2014 11:32:23 +0000 (11:32 +0000)]
Implement the collection hint transaction, add a new transaction type as expected number of objects.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
10 years agoAdd a new transaction OP (collection hint) to ObjectStore.
Guang Yang [Mon, 7 Jul 2014 07:37:02 +0000 (07:37 +0000)]
Add a new transaction OP (collection hint) to ObjectStore.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
10 years agoAdd a new monitor command to let user specify the expected number of objects during...
Guang Yang [Mon, 30 Jun 2014 07:22:17 +0000 (07:22 +0000)]
Add a new monitor command to let user specify the expected number of objects during pool creation.

Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
10 years agoAdd a new field 'expected_num_objects' to pg_pool_t which denotes the expected number...
Guang Yang [Mon, 30 Jun 2014 05:42:49 +0000 (05:42 +0000)]
Add a new field 'expected_num_objects' to pg_pool_t which denotes the expected number of objects on this pool.

Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
10 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Tue, 19 Aug 2014 04:10:32 +0000 (21:10 -0700)]
Merge remote-tracking branch 'gh/next'

10 years agodoc: Removed quick guide and wireshark from top-level IA.
John Wilkins [Mon, 18 Aug 2014 21:29:09 +0000 (14:29 -0700)]
doc: Removed quick guide and wireshark from top-level IA.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
10 years agodoc: Move wireshark documentation to dev.
John Wilkins [Mon, 18 Aug 2014 21:28:38 +0000 (14:28 -0700)]
doc: Move wireshark documentation to dev.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
10 years agodoc/release-notes: v0.84
Sage Weil [Mon, 18 Aug 2014 18:57:59 +0000 (11:57 -0700)]
doc/release-notes: v0.84

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2280 from ceph/wip-fs-docs
Sage Weil [Mon, 18 Aug 2014 17:04:41 +0000 (10:04 -0700)]
Merge pull request #2280 from ceph/wip-fs-docs

doc: add notes on using "ceph fs new"

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agodoc: add notes on using "ceph fs new" 2280/head
john [Mon, 18 Aug 2014 15:57:25 +0000 (16:57 +0100)]
doc: add notes on using "ceph fs new"

Signed-off-by: John Spray <john.spray@redhat.com>
10 years ago0.84 v0.84
Jenkins [Mon, 18 Aug 2014 16:02:20 +0000 (09:02 -0700)]
0.84

10 years agoqa/workunits/rbd/qemu-iotests: touch common.env
Sage Weil [Mon, 18 Aug 2014 03:54:28 +0000 (20:54 -0700)]
qa/workunits/rbd/qemu-iotests: touch common.env

This seems to be necessary on trusty.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 055be68cf8e1b84287ab3631a02e89a9f3ae6cca)

10 years agoqa/workunits/hadoop: move all hadoop tests into a hadoop/ dir 2279/head
Sage Weil [Mon, 18 Aug 2014 15:39:14 +0000 (08:39 -0700)]
qa/workunits/hadoop: move all hadoop tests into a hadoop/ dir

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/hadoop-wordcount: fix/use -rmr command
Sage Weil [Mon, 18 Aug 2014 15:38:10 +0000 (08:38 -0700)]
qa/workunits/hadoop-wordcount: fix/use -rmr command

-rm -r -f ... doesn't seem to work; use -rmr instead.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/hadoop-wordcount: use -x
Sage Weil [Mon, 18 Aug 2014 15:37:38 +0000 (08:37 -0700)]
qa/workunits/hadoop-wordcount: use -x

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/rbd/qemu-iotests: touch common.env
Sage Weil [Mon, 18 Aug 2014 03:54:28 +0000 (20:54 -0700)]
qa/workunits/rbd/qemu-iotests: touch common.env

This seems to be necessary on trusty.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2010 from ceph/wip-misplaced
Sage Weil [Mon, 18 Aug 2014 03:49:05 +0000 (20:49 -0700)]
Merge pull request #2010 from ceph/wip-misplaced

osd: track misplaced objects separately from degraded objects

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoqa/workunits/rest/test.py: use rbd instead of data pool for size tests
Sage Weil [Sun, 17 Aug 2014 04:56:00 +0000 (21:56 -0700)]
qa/workunits/rest/test.py: use rbd instead of data pool for size tests

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/rest/test.py: do snap test on our data2/3 pool
Sage Weil [Sun, 17 Aug 2014 04:22:48 +0000 (21:22 -0700)]
qa/workunits/rest/test.py: do snap test on our data2/3 pool

This way it works when a 'data' pool doesn't already exist.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/rest/test.py: fix rd_kb -> rd_bytes
Sage Weil [Sun, 17 Aug 2014 04:13:21 +0000 (21:13 -0700)]
qa/workunits/rest/test.py: fix rd_kb -> rd_bytes

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2272 from ceph/wip-8621
Sage Weil [Sun, 17 Aug 2014 05:04:13 +0000 (22:04 -0700)]
Merge pull request #2272 from ceph/wip-8621

Wip 8621

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoosd: fix theoretical use-after-free of OSDMap
Sage Weil [Sat, 16 Aug 2014 21:51:31 +0000 (14:51 -0700)]
osd: fix theoretical use-after-free of OSDMap

In practice, the map will remain pinned for a while, but this
will make coverity happy.

*** CID 1231685:  Use after free  (USE_AFTER_FREE)
/osd/OSD.cc: 6223 in OSD::handle_osd_map(MOSDMap *)()
6217
6218           if (o->test_flag(CEPH_OSDMAP_FULL))
6219            last_marked_full = e;
6220           pinned_maps.push_back(add_map(o));
6221
6222           bufferlist fbl;
>>>     CID 1231685:  Use after free  (USE_AFTER_FREE)
>>>     Calling "encode" dereferences freed pointer "o".
6223           o->encode(fbl);
6224
6225           hobject_t fulloid = get_osdmap_pobject_name(e);
6226           t.write(coll_t::META_COLL, fulloid, 0, fbl.length(), fbl);
6227           pin_map_bl(e, fbl);
6228           continue;

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2259 from ceph/wip-9039
Sage Weil [Sat, 16 Aug 2014 20:41:41 +0000 (13:41 -0700)]
Merge pull request #2259 from ceph/wip-9039

Wip 9039

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agovstart.sh: make filestore fd cache size smaller 2010/head
Sage Weil [Wed, 2 Jul 2014 16:27:52 +0000 (09:27 -0700)]
vstart.sh: make filestore fd cache size smaller

I hit the fd limit on a vstart cluster with the default 128; reduce this
to 16.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon: track stuck undersized
Sage Weil [Wed, 2 Jul 2014 16:10:23 +0000 (09:10 -0700)]
mon: track stuck undersized

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon: track pgs that get stuck degraded
Sage Weil [Tue, 1 Jul 2014 00:18:24 +0000 (17:18 -0700)]
mon: track pgs that get stuck degraded

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: track last_fullsized in pg_stat_t
Sage Weil [Wed, 2 Jul 2014 16:13:09 +0000 (09:13 -0700)]
osd: track last_fullsized in pg_stat_t

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: track last_undegraded pg stat
Sage Weil [Tue, 1 Jul 2014 00:18:05 +0000 (17:18 -0700)]
osd: track last_undegraded pg stat

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd/osd_types: add last_undegraded, last_undersized to pg_stat_t
Sage Weil [Tue, 1 Jul 2014 00:17:51 +0000 (17:17 -0700)]
osd/osd_types: add last_undegraded, last_undersized to pg_stat_t

Keep track of the last time the PG was known to not be degraded or
undersized.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd/PG: track PG_STATE_UNDERSIZED separately from DEGRADED
Sage Weil [Thu, 3 Jul 2014 03:28:07 +0000 (20:28 -0700)]
osd/PG: track PG_STATE_UNDERSIZED separately from DEGRADED

DEGRADED means there are objects without complete redundancy; also check
for needs_recovery().

UNDERSIZED means acting set is too small.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: add PG_STATE_UNDERSIZED
Sage Weil [Wed, 2 Jul 2014 01:08:33 +0000 (18:08 -0700)]
osd: add PG_STATE_UNDERSIZED

This is a distinct concept from degraded.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd/PG: account for misplaced separately from degraded
Sage Weil [Sat, 21 Jun 2014 00:58:23 +0000 (17:58 -0700)]
osd/PG: account for misplaced separately from degraded

A degraded object does not have enough replicas or shards, while a
misplaced object is not stored in the correct place.  Account for them
separately.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agolibrados: approximate legacy 'degraded' value
Sage Weil [Sat, 21 Jun 2014 01:09:12 +0000 (18:09 -0700)]
librados: approximate legacy 'degraded' value

The librados API returns a degraded count and no misplaced count.  Sum them
to approximate the old behavior.

Signed-off-by: Sage Weil <sage@redhat.com>
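
The distinction between degraded and misplaced objects, and the legacy sum described above, can be illustrated with a small sketch. The function names and the per-object counting rule are assumptions for illustration; the actual accounting in the OSD is more involved.

```python
def classify(size, up, holders):
    """Return (degraded, misplaced) copy counts for one object.

    size:    number of replicas or shards the pool asks for.
    up:      set of OSD ids where the copies should live.
    holders: set of OSD ids that currently hold a copy.
    Hypothetical helper, not a librados or OSD function.
    """
    # Degraded: fewer copies exist than the pool requires.
    degraded = max(0, size - len(holders))
    # Misplaced: a copy exists, but on an OSD outside the up set.
    misplaced = len([osd for osd in holders if osd not in up])
    return degraded, misplaced


def legacy_degraded(degraded, misplaced):
    """Approximate the old single 'degraded' counter by summing both."""
    return degraded + misplaced


if __name__ == "__main__":
    print(classify(3, {0, 1, 2}, {0, 1}))     # one copy missing
    print(classify(3, {0, 1, 2}, {0, 1, 5}))  # one copy in the wrong place
```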
10 years agomon: warn about misplaced objects, just like degraded
Sage Weil [Sat, 21 Jun 2014 00:56:25 +0000 (17:56 -0700)]
mon: warn about misplaced objects, just like degraded

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: num_objects_misplaced
Sage Weil [Sat, 14 Jun 2014 16:21:52 +0000 (09:21 -0700)]
osd: num_objects_misplaced

Signed-off-by: Sage Weil <sage@inktank.com>
10 years agoMerge pull request #2217 from ceph/wip-problem-osds
Sage Weil [Sat, 16 Aug 2014 20:15:10 +0000 (13:15 -0700)]
Merge pull request #2217 from ceph/wip-problem-osds

mon: 'ceph osd blocked-by' for histogram of peers OSDs are waiting for

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoqa/workunits/rest/test.py: fix 'df' test to use total_used_bytes
Sage Weil [Sat, 16 Aug 2014 20:06:02 +0000 (13:06 -0700)]
qa/workunits/rest/test.py: fix 'df' test to use total_used_bytes

This changed back in ee2dbdb0f5e54fe6f9c5999c032063b084424c4c

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoRevert "os/FileJournal: Update the journal header when closing journal"
Sage Weil [Sat, 16 Aug 2014 19:56:39 +0000 (12:56 -0700)]
Revert "os/FileJournal: Update the journal header when closing journal"

This reverts commit 4eb18dd487da4cb621dcbecfc475fc0871b356ac.

This may be responsible for #9073.  Until that is resolved, revert.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2271 from ceph/wip-9053
Sage Weil [Sat, 16 Aug 2014 16:18:19 +0000 (09:18 -0700)]
Merge pull request #2271 from ceph/wip-9053

paxos: fix problem with disjoint quorum members

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
10 years agorgw: update civetweb submodule 2272/head
Yehuda Sadeh [Fri, 15 Aug 2014 20:28:35 +0000 (13:28 -0700)]
rgw: update civetweb submodule

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agoMerge pull request #2270 from ceph/wip-init-ceph
Alfredo Deza [Fri, 15 Aug 2014 23:42:59 +0000 (19:42 -0400)]
Merge pull request #2270 from ceph/wip-init-ceph

init-ceph: don't use bashism

Reviewed-by: Alfredo Deza <adeza@redhat.com>
10 years agoinit-ceph: don't use bashism 2270/head
Sage Weil [Fri, 15 Aug 2014 23:41:43 +0000 (16:41 -0700)]
init-ceph: don't use bashism

       -z STRING
              the length of STRING is zero

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2247 from ceph/wip-ceph-disk
Alfredo Deza [Fri, 15 Aug 2014 23:40:15 +0000 (19:40 -0400)]
Merge pull request #2247 from ceph/wip-ceph-disk

ceph-disk: fix various dmcrypt bugs

Reviewed-by: Alfredo Deza <adeza@redhat.com>
10 years agoMerge pull request #2269 from ceph/wip-osd-mon-feature
Loic Dachary [Fri, 15 Aug 2014 22:19:59 +0000 (00:19 +0200)]
Merge pull request #2269 from ceph/wip-osd-mon-feature

osd: fix mon feature requirement

Reviewed-by: Loic Dachary <loic@dachary.org>
10 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 15 Aug 2014 22:01:23 +0000 (15:01 -0700)]
Merge remote-tracking branch 'gh/next'

10 years agoFix -Wno-format and -Werror=format-security options clash
Boris Ranto [Fri, 15 Aug 2014 17:34:27 +0000 (19:34 +0200)]
Fix -Wno-format and -Werror=format-security options clash

This causes build failures in the latest Fedora builds.
ceph_test_librbd_fsx adds the -Wno-format cflag, but the default AM_CFLAGS
already contain -Werror=format-security. Previous releases tolerated this,
but the latest Fedora rawhide no longer does. ceph_test_librbd_fsx builds
fine without -Wno-format on x86_64, so there is likely no need for the flag
anymore.

Signed-off-by: Boris Ranto <branto@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoosd: fix feature requirement for mons 2269/head
Sage Weil [Fri, 15 Aug 2014 21:28:57 +0000 (14:28 -0700)]
osd: fix feature requirement for mons

These features should be set on the client_messenger, not
cluster_messenger.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2268 from ceph/wip-9119
Sage Weil [Fri, 15 Aug 2014 21:11:10 +0000 (14:11 -0700)]
Merge pull request #2268 from ceph/wip-9119

Wip 9119

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoReplicatedPG::maybe_handle_cache: do not forward RWORDERED reads 2268/head
Samuel Just [Thu, 14 Aug 2014 18:13:31 +0000 (11:13 -0700)]
ReplicatedPG::maybe_handle_cache: do not forward RWORDERED reads

Even with READFORWARD, we can't forward RWORDERED reads.

Fixes: #9119
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoReplicatedPG::cancel_copy: clear cop->obc
Samuel Just [Tue, 12 Aug 2014 23:41:38 +0000 (16:41 -0700)]
ReplicatedPG::cancel_copy: clear cop->obc

Otherwise, an objecter callback might still be hanging
onto this reference until after the flush.

Fixes: #8894
Introduced: 589b639af7c8834a1e6293d58d77a9c440107bc3
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2264 from ceph/wip-crush-features
Sage Weil [Fri, 15 Aug 2014 20:55:36 +0000 (13:55 -0700)]
Merge pull request #2264 from ceph/wip-crush-features

do not require crush features for rules that aren't being used

Reviewed-by: Loic Dachary <loic@dachary.org>
10 years agounittest_osdmap: test EC rule and pool features 2264/head
Sage Weil [Fri, 15 Aug 2014 20:54:11 +0000 (13:54 -0700)]
unittest_osdmap: test EC rule and pool features

TODO: tiering feature bits.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2266 from kevincox/removewirehsark
Sage Weil [Fri, 15 Aug 2014 20:41:15 +0000 (13:41 -0700)]
Merge pull request #2266 from kevincox/removewirehsark

Remove Old Wireshark Dissectors

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2070 from somnathr/wip-sd-filestore-optimization
Samuel Just [Fri, 15 Aug 2014 20:37:54 +0000 (13:37 -0700)]
Merge pull request #2070 from somnathr/wip-sd-filestore-optimization

Wip sd filestore optimization

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoRemove Old Wireshark Dissectors 2266/head
Kevin Cox [Fri, 15 Aug 2014 19:27:13 +0000 (15:27 -0400)]
Remove Old Wireshark Dissectors

Remove the two old Wireshark plugins.  They do not build and are
superseded by the dissector which is inside Wireshark.

Signed-Off-By: Kevin Cox <kevincox@kevincox.ca>
10 years agoosd: only require crush features for rules that are actually used
Sage Weil [Fri, 15 Aug 2014 15:55:10 +0000 (08:55 -0700)]
osd: only require crush features for rules that are actually used

Often there will be a CRUSH rule present for erasure coding that uses the
new CRUSH steps or indep mode.  If these rules are not referenced by any
pool, we do not need clients to support the mapping behavior.  This is true
because the encoding has not changed; only the expected CRUSH output.

Fixes: #8963
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
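
The rule described above, requiring CRUSH features only for rules that some pool references, can be sketched as follows. The data layout and the feature labels "v2" and "v3" are assumptions for illustration and do not mirror the OSDMap code.

```python
def required_features(rule_features, pools):
    """Return the feature set clients must support.

    rule_features: dict mapping a rule id to the set of features it uses.
    pools:         dict mapping a pool name to the rule id it references.
    Both structures are hypothetical stand-ins for the real maps.
    """
    needed = set()
    # Only rules referenced by at least one pool contribute features.
    for rule in set(pools.values()):
        needed |= rule_features.get(rule, set())
    return needed


if __name__ == "__main__":
    rules = {0: set(), 1: {"v2"}, 2: {"v2", "v3"}}
    # Rule 2 exists but no pool uses it, so "v3" is not required.
    print(required_features(rules, {"rbd": 0, "ecpool": 1}))
```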
10 years agocrush: add is_v[23]_rule(ruleid) methods
Sage Weil [Fri, 15 Aug 2014 15:52:37 +0000 (08:52 -0700)]
crush: add is_v[23]_rule(ruleid) methods

Add methods to check if a *specific* rule uses v2 or v3 features.  Refactor
the existing checks to use these.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2213 from dachary/wip-9025-chunk-remapping
Loic Dachary [Fri, 15 Aug 2014 10:43:03 +0000 (12:43 +0200)]
Merge pull request #2213 from dachary/wip-9025-chunk-remapping

erasure-code: chunk remapping

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agomon/Paxos: share state and verify contiguity early in collect phase 2271/head
Sage Weil [Wed, 13 Aug 2014 23:17:02 +0000 (16:17 -0700)]
mon/Paxos: share state and verify contiguity early in collect phase

We verify peons are contiguous and share new paxos states to catch peons
up at the end of the round.  Do this each time we (potentially) get new
states via a collect message.  This will allow peons to be pulled forward
and remain contiguous when they otherwise would not have been able to.
For example, if

  mon.0 (leader)  20..30
  mon.1 (peon)    15..25
  mon.2 (peon)    28..40

If we got mon.1 first and then mon.2 second, we would store the new txns
and then boot mon.1 out at the end because 15..25 is not contiguous with
28..40.  However, with this change, we share 26..30 to mon.1 when we get
the collect, and then 31..40 when we get mon.2's collect, pulling them
both into the final quorum.

It also breaks the 'catch-up' work into smaller pieces, which ought to
smooth out latency a bit.

Signed-off-by: Sage Weil <sage@redhat.com>
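
The mon.0/mon.1/mon.2 example above can be replayed with a small simulation. This is a sketch only: it tracks just the last committed version of each monitor, ignores first_committed and trimming, and the function name and tuple layout are assumptions rather than Paxos code.

```python
def shares_during_collect(leader_last, collects):
    """Return the (peon, first, last) ranges the leader shares.

    leader_last: the leader's last committed version before the round.
    collects:    list of (peon, peon_last) replies in arrival order.
    Hypothetical model of the behavior described in the commit message.
    """
    peon_last = {}
    shared = []
    for name, last in collects:
        peon_last[name] = last
        # Assimilate any newer states carried by this reply.
        leader_last = max(leader_last, last)
        # Catch up every peon heard from so far, not only the sender.
        for peon, known in list(peon_last.items()):
            if known < leader_last:
                shared.append((peon, known + 1, leader_last))
                peon_last[peon] = leader_last
    return shared


if __name__ == "__main__":
    # Leader at 30; mon.1 replies with 25, then mon.2 replies with 40.
    print(shares_during_collect(30, [("mon.1", 25), ("mon.2", 40)]))
```

With these inputs the leader shares 26..30 to mon.1 on the first reply and 31..40 on the second, matching the ranges named in the commit message.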
10 years agomon/Paxos: verify all new peons are still contiguous at end of round
Sage Weil [Thu, 14 Aug 2014 23:55:58 +0000 (16:55 -0700)]
mon/Paxos: verify all new peons are still contiguous at end of round

During the collect phase we verify that each peon has versions that overlap
or are contiguous with ours (and can therefore be caught up with some
series of transactions).  However, we *also* assimilate any new states we
get from those peers, and that may move our own first_committed forward
in time.  This means that an early responder might have originally been
contiguous, but a later one moved us forward, and when the round finished
they were not contiguous any more.  This leads to a crash on the peon
when they get our first begin message.

For example:

 - we have 10..20
 - first peon has 5..15
   - ok!
 - second peon has 18..30
   - we apply this state
 - we are now 18..30
 - we finish the round
   - send commit to first peon (empty.. we aren't contiguous)
   - send no commit to second peon (we match)
 - we send a begin for state 31
   - first peon crashes (its lc is still 15)

Prevent this by checking at the end of the round if we are still
contiguous.  If not, bootstrap.  This is similar to the check we do above,
but reversed, to make sure *we* aren't too far ahead of *them*.

Fixes: #9053
Signed-off-by: Sage Weil <sage@redhat.com>
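
The end-of-round check described above amounts to comparing each peon's last committed version with the leader's first committed version. A minimal sketch, assuming a peon can be caught up only if its last committed version plus one reaches the leader's first committed version; the function name and dict layout are hypothetical:

```python
def non_contiguous_peons(first_committed, peon_last_committed):
    """Return the peons the leader can no longer catch up.

    first_committed:     the leader's first committed version at end of round.
    peon_last_committed: dict mapping a peon name to its last committed version.
    A non-empty result would mean the leader should bootstrap.
    """
    return [
        name
        for name, lc in peon_last_committed.items()
        # A gap exists when the peon's next version precedes our first one.
        if lc + 1 < first_committed
    ]


if __name__ == "__main__":
    # After applying the second peon's state the leader holds 18..30.
    print(non_contiguous_peons(18, {"first": 15, "second": 30}))
    # Before that, at 10..20, both peons were fine.
    print(non_contiguous_peons(10, {"first": 15, "second": 30}))
```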
10 years agoerasure-code: remap chunks if not sequential 2213/head
Loic Dachary [Tue, 3 Jun 2014 17:27:26 +0000 (19:27 +0200)]
erasure-code: remap chunks if not sequential

If the remap vector is not empty, use it to figure out the sequence of
data chunks.

http://tracker.ceph.com/issues/9025 Fixes: #9025

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: parse function for the mapping parameter
Loic Dachary [Tue, 3 Jun 2014 20:20:29 +0000 (22:20 +0200)]
erasure-code: parse function for the mapping parameter

Each D letter is a data chunk. For instance:

    _DDD_DDD

is going to parse into:

   [ 1, 2, 3, 5, 6, 7 ]

the 0 and 4 positions are not used by chunks and do not show in the
mapping. Implement ErasureCode::parse to support a reasonable default
for the mapping parameter.

Signed-off-by: Loic Dachary <loic@dachary.org>
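
The mapping example above can be reproduced with a short sketch. It is not ErasureCode::parse itself; the function name is hypothetical and only the "D" convention from the commit message is assumed.

```python
def parse_mapping(mapping):
    """Return the positions of the data chunks in a mapping string.

    Every 'D' marks a data chunk; any other character marks a position
    that is not used for data and is left out of the result.
    """
    return [index for index, letter in enumerate(mapping) if letter == "D"]


if __name__ == "__main__":
    print(parse_mapping("_DDD_DDD"))
```

For the string "_DDD_DDD" this yields the positions 1, 2, 3, 5, 6 and 7, leaving out 0 and 4 as the commit message states.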
10 years agoerasure-code: ErasureCodeInterface::get_chunk_mapping()
Loic Dachary [Tue, 3 Jun 2014 15:45:47 +0000 (17:45 +0200)]
erasure-code: ErasureCodeInterface::get_chunk_mapping()

Add support for erasure code plugins that do not sequentially map the
chunks encoded to the corresponding index. This is mostly transparent to
the caller, except when it comes to retrieving the data chunks when
reading. For this purpose there needs to be a remapping function so the
caller has a way to figure out which chunks actually contain the data
and reorder them.

Signed-off-by: Loic Dachary <loic@dachary.org>
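
On the read path, a caller can use the chunk mapping to pick out and order the data chunks. The sketch below assumes the mapping is a list of chunk indexes in data order and falls back to sequential indexes when it is empty; the function name and argument layout are hypothetical, not the ErasureCodeInterface API.

```python
def read_data(chunks, k, chunk_mapping=()):
    """Concatenate the k data chunks in their logical order.

    chunks:        dict mapping a chunk index to its bytes.
    k:             number of data chunks.
    chunk_mapping: optional list of indexes holding data, in order.
    """
    # An empty mapping means the data chunks are simply 0..k-1.
    order = list(chunk_mapping) if chunk_mapping else list(range(k))
    return b"".join(chunks[index] for index in order[:k])


if __name__ == "__main__":
    stored = {0: b"P0", 1: b"aa", 2: b"bb", 3: b"cc",
              4: b"P1", 5: b"dd", 6: b"ee", 7: b"ff"}
    print(read_data(stored, 6, [1, 2, 3, 5, 6, 7]))
```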
10 years agorgw: update civetweb submodule
Yehuda Sadeh [Fri, 1 Aug 2014 23:34:16 +0000 (16:34 -0700)]
rgw: update civetweb submodule

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agorgw: don't allow negative / invalid content length
Yehuda Sadeh [Fri, 1 Aug 2014 23:15:36 +0000 (16:15 -0700)]
rgw: don't allow negative / invalid content length

Certain frontends (e.g., civetweb) don't filter such requests.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agorgw: log civetweb messages
Yehuda Sadeh [Fri, 1 Aug 2014 21:09:48 +0000 (14:09 -0700)]
rgw: log civetweb messages

Handle the civetweb log_message callback, divert messages into our debug
log.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agorgw: disable civetweb url decoding
Yehuda Sadeh [Thu, 31 Jul 2014 04:32:48 +0000 (21:32 -0700)]
rgw: disable civetweb url decoding

Fixes: #8621
We want to have the raw request uri, as we do the decoding ourselves.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Thu, 14 Aug 2014 23:02:22 +0000 (16:02 -0700)]
Merge remote-tracking branch 'gh/next'

10 years agoFileStore: Introduced a RLock instead of WLock 2070/head
Somnath Roy [Thu, 31 Jul 2014 22:03:53 +0000 (15:03 -0700)]
FileStore: Introduced a RLock instead of WLock

While calling index->collection_version, there is no need to
hold WLock at the index level. RLock should be sufficient.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
10 years agoFileStore: No need to hold Index lock during omap calls
Somnath Roy [Thu, 31 Jul 2014 21:56:42 +0000 (14:56 -0700)]
FileStore: No need to hold Index lock during omap calls

The Index lock is held during all the omap calls which is
not necessary.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
10 years agoFileStore: FDCache lookup is rearranged
Somnath Roy [Mon, 30 Jun 2014 08:54:36 +0000 (01:54 -0700)]
FileStore: FDCache lookup is rearranged

In lfn_open() there is no point in building the Index if the
cache lookup is successful and the caller is not asking for the Index.

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>