git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
10 years agoOSD: move waiting_for_pg into the session structures 2226/head
Samuel Just [Mon, 4 Aug 2014 22:30:41 +0000 (15:30 -0700)]
OSD: move waiting_for_pg into the session structures

Each message belongs to a session.  Further, no ordering is implied
between messages which arrived on different sessions.  Breaking the
global waiting_for_pg structure into a per-session structure lets
us avoid the problem of taking a write lock on a global structure
(pg_map_lock) in get_pg_or_queue_for_pg at the cost of some
complexity in updating each session's waiting_for_pg structure when
we receive a new map (due to pg splits) or when we locally create
a pg.
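
A minimal sketch of the idea, written in Python rather than the OSD's
C++ and using hypothetical names (Session, queue_for_pg,
wake_pg_waiters are illustrative, not Ceph's actual API): each session
owns its own pgid-to-messages map, so queueing touches only that
session, while waking waiters has to visit every session.

```python
from collections import defaultdict, deque

class Session:
    """Hypothetical stand-in for an OSD session; not Ceph's actual class."""
    def __init__(self, name):
        self.name = name
        # pgid -> messages waiting for that pg, in arrival order
        self.waiting_for_pg = defaultdict(deque)

    def queue_for_pg(self, pgid, msg):
        # Only this session's structure is touched; no global structure is locked.
        self.waiting_for_pg[pgid].append(msg)

def wake_pg_waiters(sessions, pgid):
    """Called when a pg appears: collect waiters session by session."""
    ready = []
    for s in sessions:
        ready.extend(s.waiting_for_pg.pop(pgid, ()))
    return ready

a, b = Session("a"), Session("b")
a.queue_for_pg("1.0", "a-op1")
b.queue_for_pg("1.0", "b-op1")
a.queue_for_pg("1.0", "a-op2")
woken = wake_pg_waiters([a, b], "1.0")
```

Order is preserved within a session only, which matches the statement
that no ordering is implied across sessions.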

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoOSD::shutdown: actually drop sessions waiting on map
Samuel Just [Mon, 4 Aug 2014 22:31:06 +0000 (15:31 -0700)]
OSD::shutdown: actually drop sessions waiting on map

There might be messages for which we still don't have the
map.  Dispatching the waiting messages won't actually help.

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoOSD: clear_session_waiting_on_map in ms_handle_reset
Samuel Just [Tue, 29 Jul 2014 22:54:37 +0000 (15:54 -0700)]
OSD: clear_session_waiting_on_map in ms_handle_reset

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoOSD: rename session_waiting_for_map_lock to session_waiting_lock
Samuel Just [Tue, 29 Jul 2014 22:33:30 +0000 (15:33 -0700)]
OSD: rename session_waiting_for_map_lock to session_waiting_lock

This lock will also protect the waiting_for_pg structures in each
session.

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoOSD: wake_pg_waiters outside of the pgmap write_lock, pg_lock
Samuel Just [Tue, 5 Aug 2014 19:57:43 +0000 (12:57 -0700)]
OSD: wake_pg_waiters outside of the pgmap write_lock, pg_lock

Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoOSD: fix wake_pg_waiters revert error in _open_lock_pg
Samuel Just [Tue, 5 Aug 2014 20:00:01 +0000 (13:00 -0700)]
OSD: fix wake_pg_waiters revert error in _open_lock_pg

231fe1b685bfbd3db9c81709ca39a29d696b13ad erroneously reintroduced
this call to wake_pg_waiters.  All _create_lock_pg callers handle
calling wake_pg_waiters after the pg lock has been dropped.

Fixes: #8691
Signed-off-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge remote-tracking branch 'gh/wip-filestore-bigxattr'
Sage Weil [Thu, 7 Aug 2014 18:21:06 +0000 (11:21 -0700)]
Merge remote-tracking branch 'gh/wip-filestore-bigxattr'

10 years agopowerdns: Update README with better markdown
Wido den Hollander [Wed, 6 Aug 2014 15:14:04 +0000 (17:14 +0200)]
powerdns: Update README with better markdown

10 years agoMerge pull request #2220 from somnathr/wip-lock-leak-fix
Sage Weil [Thu, 7 Aug 2014 01:42:18 +0000 (18:42 -0700)]
Merge pull request #2220 from somnathr/wip-lock-leak-fix

RadosClient: Fixing potential lock leaks.

Backport: firefly
Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoRadosClient: Fixing potential lock leaks. 2220/head
Pavan Rallabhandi [Wed, 6 Aug 2014 09:40:14 +0000 (15:10 +0530)]
RadosClient: Fixing potential lock leaks.

In lookup_pool and pool_delete, a lock is taken
before invoking wait_for_osdmap, but is not
released if the call fails. Release it on the failure path.
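
The pattern can be sketched as follows. This is an illustration in
Python, not the RadosClient code: wait_for_osdmap here is a
hypothetical stub, and the two lookup_pool_* functions only model the
leaky and the fixed control flow.

```python
import threading

lock = threading.Lock()

def wait_for_osdmap(ok):
    # Hypothetical stub: returns 0 on success, a negative value on failure.
    return 0 if ok else -1

def lookup_pool_leaky(ok):
    lock.acquire()
    r = wait_for_osdmap(ok)
    if r < 0:
        return r          # bug: returns with the lock still held
    lock.release()
    return 0

def lookup_pool_fixed(ok):
    with lock:            # released on every path, including the early return
        r = wait_for_osdmap(ok)
        if r < 0:
            return r
        return 0
```

After lookup_pool_leaky(False) the lock stays held and the next caller
blocks; lookup_pool_fixed(False) leaves it free.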

Fixes: #9022
Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
10 years agoos/FileStore: force any new xattr into omap on E2BIG
Sage Weil [Thu, 7 Aug 2014 00:28:45 +0000 (17:28 -0700)]
os/FileStore: force any new xattr into omap on E2BIG

If we have a huge xattr (or many little ones), the _fgetattrs() for the
inline_set will fail with E2BIG.  The conditions later where we decide
whether to clean up the old xattr will then also fail.  We *will* put
the xattr in omap, but the non-omap version isn't cleaned up.

Fix this by setting a flag if we get E2BIG that the inline_set is known
to be incomplete.  In that case, take the conservative step of assuming
the xattr might be present and chain_fremovexattr().  Ignore any error
because it might not be there.
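
A rough model of that flow, in Python with dictionaries standing in
for the on-disk xattrs and omap. The helper names (fgetattrs,
move_to_omap) and the size limit are assumptions made for the sketch;
they are not the FileStore functions.

```python
import errno

def fgetattrs(xattrs, limit):
    """Hypothetical listing call: fails with E2BIG when the total is too large."""
    if sum(len(v) for v in xattrs.values()) > limit:
        raise OSError(errno.E2BIG, "xattr list too big")
    return set(xattrs)

def move_to_omap(xattrs, omap, name, value, limit):
    incomplete = False
    try:
        inline_set = fgetattrs(xattrs, limit)
    except OSError as e:
        if e.errno != errno.E2BIG:
            raise
        # The listing failed, so the inline set is known to be incomplete.
        inline_set, incomplete = set(), True
    omap[name] = value
    if name in inline_set or incomplete:
        # Conservative removal; a missing xattr is not treated as an error.
        xattrs.pop(name, None)

xattrs = {"user.big": b"x" * 100}
omap = {}
move_to_omap(xattrs, omap, "user.big", b"y", limit=64)
```

Without the incomplete flag, the stale inline copy of user.big would
survive because the listing never reported it.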

This is clearly harmless in the general case because it won't be there.
If it is, we will hopefully remove enough xattrs that the E2BIG
condition will go away (usually by removing some really big chained
xattr).

See original bug #7779.  With this in place, we can repair objects in
the broken state if we know the rados attr(s) that are responsible.
Usually that is user.rgw.manifest, and a rados get + set of the attr
will repair things.

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2218 from ceph/wip-rados-xattr
Yehuda Sadeh [Thu, 7 Aug 2014 00:11:44 +0000 (17:11 -0700)]
Merge pull request #2218 from ceph/wip-rados-xattr

rados: fix get/setxattr commands up

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agorados: use STD{IN,OUT}_FILENO for magic values 2218/head
Sage Weil [Thu, 7 Aug 2014 00:01:29 +0000 (17:01 -0700)]
rados: use STD{IN,OUT}_FILENO for magic values

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/rados/test_rados_tool: add a few xattr tests
Sage Weil [Thu, 7 Aug 2014 00:00:57 +0000 (17:00 -0700)]
qa/workunits/rados/test_rados_tool: add a few xattr tests

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agorados: optionally read setxattr value from stdin
Sage Weil [Wed, 6 Aug 2014 22:09:22 +0000 (15:09 -0700)]
rados: optionally read setxattr value from stdin

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agorados: don't add \n to getxattr
Sage Weil [Wed, 6 Aug 2014 22:07:17 +0000 (15:07 -0700)]
rados: don't add \n to getxattr

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoqa/workunits/cephtool/test.sh: fix 'ceph df ...' tests
Sage Weil [Wed, 6 Aug 2014 20:16:49 +0000 (13:16 -0700)]
qa/workunits/cephtool/test.sh: fix 'ceph df ...' tests

Broken by ee2dbdb0f5e54fe6f9c5999c032063b084424c4c and friends.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoPendingReleaseNotes: make note about rbd cache default change
Sage Weil [Wed, 6 Aug 2014 18:37:22 +0000 (11:37 -0700)]
PendingReleaseNotes: make note about rbd cache default change

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2123 from ceph/wip-rbd-flush
Josh Durgin [Wed, 6 Aug 2014 18:28:56 +0000 (11:28 -0700)]
Merge pull request #2123 from ceph/wip-rbd-flush

librbd: enable rbd cache by default; writethrough until flush

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
10 years agoMerge pull request #2205 from ceph/wip-librbd-snap-meta
Sage Weil [Wed, 6 Aug 2014 18:08:43 +0000 (11:08 -0700)]
Merge pull request #2205 from ceph/wip-librbd-snap-meta

librbd: fix crash with a chain of flattened images

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agorocksdb: fix i386 build
Sage Weil [Wed, 6 Aug 2014 14:52:31 +0000 (07:52 -0700)]
rocksdb: fix i386 build

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge remote-tracking branch 'gh/wip-test-ceph-disk'
Sage Weil [Wed, 6 Aug 2014 17:41:36 +0000 (10:41 -0700)]
Merge remote-tracking branch 'gh/wip-test-ceph-disk'

10 years agoMerge pull request #2215 from ceph/wip-kb
Sage Weil [Wed, 6 Aug 2014 17:33:36 +0000 (10:33 -0700)]
Merge pull request #2215 from ceph/wip-kb

mon: clean up _kb fields in json and perf counter output

Reviewed-by: John Spray <john.spray@redhat.com>
10 years agomon/PGMonitor: remove {rd,wr}_kb from pool stat dumps 2215/head
Sage Weil [Wed, 6 Aug 2014 16:31:59 +0000 (09:31 -0700)]
mon/PGMonitor: remove {rd,wr}_kb from pool stat dumps

These fields are replaced with corresponding *_bytes fields.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon: remove *_kb perf counters
Sage Weil [Wed, 6 Aug 2014 17:33:02 +0000 (10:33 -0700)]
mon: remove *_kb perf counters

This is an incompatible change.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon/PGMonitor: add _bytes perf counters
Sage Weil [Wed, 6 Aug 2014 16:18:27 +0000 (09:18 -0700)]
mon/PGMonitor: add _bytes perf counters

Leave the _kb ones in place for now.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agomon/PGMonitor: add _bytes fields for all usage dumps
Sage Weil [Wed, 6 Aug 2014 16:14:40 +0000 (09:14 -0700)]
mon/PGMonitor: add _bytes fields for all usage dumps

Leave the _kb ones in place for now.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2165 from dachary/wip-mailmap
Loic Dachary [Wed, 6 Aug 2014 15:23:17 +0000 (17:23 +0200)]
Merge pull request #2165 from dachary/wip-mailmap

mailmap updates

10 years agoREADME.md: word wrap
Sage Weil [Wed, 6 Aug 2014 15:16:21 +0000 (08:16 -0700)]
README.md: word wrap

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoREADME: symlink from README.md
Sage Weil [Wed, 6 Aug 2014 15:15:35 +0000 (08:15 -0700)]
README: symlink from README.md

It looks better as markdown than rendered as text via the markdown tool,
so just symlink it.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2202 from xinxinsh/enable-rocksdb-log-level
Sage Weil [Wed, 6 Aug 2014 14:46:04 +0000 (07:46 -0700)]
Merge pull request #2202 from xinxinsh/enable-rocksdb-log-level

Enable rocksdb log level

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agotest/osd/osd-test-helpers: mkdir -p for ceph-disk
Sage Weil [Tue, 5 Aug 2014 23:48:30 +0000 (16:48 -0700)]
test/osd/osd-test-helpers: mkdir -p for ceph-disk

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2208 from lpabon/osd_dev_doc 2221/head
Sage Weil [Tue, 5 Aug 2014 22:25:40 +0000 (15:25 -0700)]
Merge pull request #2208 from lpabon/osd_dev_doc

Developer quick start guide

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agotest/ceph-disk.sh: mkdir -p
Sage Weil [Tue, 5 Aug 2014 22:11:18 +0000 (15:11 -0700)]
test/ceph-disk.sh: mkdir -p

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge remote-tracking branch 'upstream/next' into wip-sam-testing
Samuel Just [Tue, 5 Aug 2014 20:49:50 +0000 (13:49 -0700)]
Merge remote-tracking branch 'upstream/next' into wip-sam-testing

Conflicts:
src/osd/OSD.cc

10 years agoRenamed README to README.md to render in markdown 2208/head
Luis Pabón [Tue, 5 Aug 2014 18:51:16 +0000 (14:51 -0400)]
Renamed README to README.md to render in markdown

Signed-off-by: Luis Pabón <lpabon@redhat.com>
10 years agoDeveloper quick start guide
Luis Pabón [Tue, 5 Aug 2014 18:48:38 +0000 (14:48 -0400)]
Developer quick start guide

Signed-off-by: Luis Pabón <lpabon@redhat.com>
10 years agoenable info_log_level config option for rocksdb 2202/head
xinxin shu [Mon, 4 Aug 2014 22:53:36 +0000 (06:53 +0800)]
enable info_log_level config option for rocksdb

Signed-off-by: xinxin shu <xinxin.shu@intel.com>
10 years agoMerge pull request #2206 from ceph/wip-8875
John Wilkins [Tue, 5 Aug 2014 17:21:58 +0000 (10:21 -0700)]
Merge pull request #2206 from ceph/wip-8875

doc: be a bit more explicit about 'ceph-deploy new' in quickstart

Reviewed-by: John Wilkins <john.wilkins@inktank.com>
10 years agobe a bit more explicit about 'ceph-deploy new' in quickstart 2206/head
Alfredo Deza [Tue, 5 Aug 2014 16:51:33 +0000 (12:51 -0400)]
be a bit more explicit about 'ceph-deploy new' in quickstart

Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
10 years agoMerge branch 'master' of github.com:ceph/ceph
Sage Weil [Tue, 5 Aug 2014 16:15:34 +0000 (09:15 -0700)]
Merge branch 'master' of github.com:ceph/ceph

10 years agoMerge remote-tracking branch 'gh/wip-8880'
Sage Weil [Tue, 5 Aug 2014 16:15:12 +0000 (09:15 -0700)]
Merge remote-tracking branch 'gh/wip-8880'

Conflicts:
src/osd/OSD.cc

10 years agoMerge pull request #2204 from osynge/wip-dont-mkdir-by-mistake2
Alfredo Deza [Tue, 5 Aug 2014 15:50:57 +0000 (11:50 -0400)]
Merge pull request #2204 from osynge/wip-dont-mkdir-by-mistake2

Do not make directories by mistake.

Reviewed-by: Alfredo Deza <adeza@redhat.com>
10 years agoMerge pull request #1883 from ceph/wip-msgr
Sage Weil [Tue, 5 Aug 2014 15:50:05 +0000 (08:50 -0700)]
Merge pull request #1883 from ceph/wip-msgr

messenger refactoring for xio

Reviewed-by: Greg Farnum <greg@inktank.com>
10 years agoDo not make directories by mistake. 2204/head
Owen Synge [Tue, 5 Aug 2014 15:28:16 +0000 (17:28 +0200)]
Do not make directories by mistake.

Rationale: I found I had created a series of OSD directories under "/dev/" when disks I thought existed did not exist.
Warning: This change will be noticed by end users and may affect deployment infrastructures.

Signed-off-by: Owen Synge <osynge@suse.com>
10 years agoMerge pull request #2200 from theanalyst/typo
Sage Weil [Tue, 5 Aug 2014 14:57:51 +0000 (07:57 -0700)]
Merge pull request #2200 from theanalyst/typo

doc: typo s/loose/lose

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agopowerdns: Define an application variable when not invoked from Shell
Wido den Hollander [Tue, 5 Aug 2014 14:10:45 +0000 (16:10 +0200)]
powerdns: Define an application variable when not invoked from Shell

This allows it to be run directly using mod_wsgi behind Apache.

10 years agodoc: typo s/loose/lose 2200/head
Abhishek Lekshmanan [Tue, 5 Aug 2014 05:05:03 +0000 (10:35 +0530)]
doc: typo s/loose/lose

Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
10 years agoMerge pull request #1875 from dachary/wip-8437
Sage Weil [Tue, 5 Aug 2014 00:41:53 +0000 (17:41 -0700)]
Merge pull request #1875 from dachary/wip-8437

erasure-code: benchmarking jerasure

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoadd annotation for rocksdb config option
xinxin shu [Mon, 4 Aug 2014 22:24:44 +0000 (06:24 +0800)]
add annotation for rocksdb config option

Signed-off-by: xinxin shu <xinxin.shu@intel.com>
10 years agoMerge pull request #2198 from ceph/wip-8998
Samuel Just [Mon, 4 Aug 2014 22:12:25 +0000 (15:12 -0700)]
Merge pull request #2198 from ceph/wip-8998

fix OSD SEGV in heartbeat()

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoosd: simplify dout_prefix macros 2198/head
Sage Weil [Mon, 4 Aug 2014 22:01:15 +0000 (15:01 -0700)]
osd: simplify dout_prefix macros

Use a get_osdmap_epoch() helper that is a bit lighter weight (by avoiding
copying around an OSDMapRef).

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: reorder OSDService methods under proper dout_prefix macro
Sage Weil [Mon, 4 Aug 2014 21:57:28 +0000 (14:57 -0700)]
osd: reorder OSDService methods under proper dout_prefix macro

The dout_prefix for OSDService uses get_osdmap() to grab a shared_ptr for
the epoch printout.  The OSD one does not, and is not safe to run in all
thread contexts.

In particular, update_osd_stat() is run by the heartbeat thread and can
race with the shared_ptr itself being updated with a new map.

Ironically, if this were simply an OSDMap*, there would be no race since
the pointer is a single word and updates atomically.

Fix this, and any similar issues, by moving the OSDService methods up in
OSD.cc so that they use the safe dout macro.

Fixes: #8998
Backport: firefly (in a minimal form, I think!)
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Mon, 4 Aug 2014 20:56:24 +0000 (13:56 -0700)]
Merge remote-tracking branch 'gh/next'

10 years agodoc/release-notes: make note about init-radosgw change
Sage Weil [Mon, 4 Aug 2014 20:48:06 +0000 (13:48 -0700)]
doc/release-notes: make note about init-radosgw change

This changed back in 524aee6f95f9c397b7c8508934f3c0577f9df1dd but
was not mentioned in the release notes.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agodoc: Added 'x' to monitor cap.
John Wilkins [Mon, 4 Aug 2014 18:47:58 +0000 (11:47 -0700)]
doc: Added 'x' to monitor cap.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
10 years agoMerge pull request #2166 from majianpeng/bug-fix
Samuel Just [Mon, 4 Aug 2014 17:33:16 +0000 (10:33 -0700)]
Merge pull request #2166 from majianpeng/bug-fix

os/FileJournal: when dumping the journal, use the correct seq to avoid misjudging the journal as corrupt.

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2184 from majianpeng/fix2
Samuel Just [Mon, 4 Aug 2014 17:32:07 +0000 (10:32 -0700)]
Merge pull request #2184 from majianpeng/fix2

ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2194 from majianpeng/fix1
Samuel Just [Mon, 4 Aug 2014 17:31:18 +0000 (10:31 -0700)]
Merge pull request #2194 from majianpeng/fix1

osd/ECBackend: clean up assert(r==0) in continue_recovery_op.

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoMerge pull request #2192 from ceph/wip-8891
Samuel Just [Mon, 4 Aug 2014 17:30:25 +0000 (10:30 -0700)]
Merge pull request #2192 from ceph/wip-8891

msg/SimpleMessenger: drop msgr lock when joining a Pipe

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agocls_rgw: fix object name of objects removed on object creation
Yehuda Sadeh [Wed, 30 Jul 2014 18:53:16 +0000 (11:53 -0700)]
cls_rgw: fix object name of objects removed on object creation

Fixes: #8972
Backport: firefly, dumpling

Reported-by: Patrycja Szabłowska <szablowska.patrycja@gmail.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0f8929a68aed9bc3e50cf15765143a9c55826cd2)

10 years agorgw: need to pass need_to_wait for throttle_data()
Yehuda Sadeh [Sat, 2 Aug 2014 20:01:05 +0000 (13:01 -0700)]
rgw: need to pass need_to_wait for throttle_data()

need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.

CID 1229541:    (PW.PARAM_SET_BUT_NOT_USED)

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit e93818df33286a2a7f73b593dc20da412db4e0a6)

10 years agorgw: call processor->handle_data() again if needed
Yehuda Sadeh [Sat, 26 Jul 2014 03:33:52 +0000 (20:33 -0700)]
rgw: call processor->handle_data() again if needed

Fixes: #8937
Following the fix to #8928 we end up accumulating pending data that
needs to be written. Beforehand it was working fine because we were
feeding it with the exact amount of bytes we were writing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0553890e79b43414cc0ef97ceb694c1cb5f06bbb)

10 years agoMerge pull request #2191 from ceph/wip-rgw-need-to-wait
Sage Weil [Mon, 4 Aug 2014 16:51:43 +0000 (09:51 -0700)]
Merge pull request #2191 from ceph/wip-rgw-need-to-wait

rgw: need to pass need_to_wait for throttle_data()

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2195 from apeters1971/wip-ec-isa-fast-xor
Loic Dachary [Mon, 4 Aug 2014 16:41:05 +0000 (18:41 +0200)]
Merge pull request #2195 from apeters1971/wip-ec-isa-fast-xor

EC-ISA: provide a 10% faster simple parity operation for (k, m=1)

Reviewed-by: Loic Dachary <loic@dachary.org>
10 years agoMerge pull request #2193 from ceph/wip-ceph-conf
Loic Dachary [Mon, 4 Aug 2014 16:28:49 +0000 (18:28 +0200)]
Merge pull request #2193 from ceph/wip-ceph-conf

ceph-conf: flush log on exit

Reviewed-by: Loic Dachary <loic@dachary.org>
10 years agoEC-ISA: provide a 10% faster simple parity operation for (k, m=1). Add simple parity... 2195/head
Andreas-Joachim Peters [Mon, 4 Aug 2014 13:03:32 +0000 (15:03 +0200)]
EC-ISA: provide a 10% faster simple parity operation for (k, m=1). Add simple parity unit test for k=4,m=1

10 years agoosd/ECBackend: clean up assert(r==0) in continue_recovery_op. 2194/head
Ma Jianpeng [Mon, 4 Aug 2014 10:00:28 +0000 (18:00 +0800)]
osd/ECBackend: clean up assert(r==0) in continue_recovery_op.

After the commit(d9106ce5e4437ab02), the assert(r==0) is no longer
necessary.

10 years agoerasure-code: HTML display of benchmark results 1875/head
Loic Dachary [Fri, 30 May 2014 13:24:25 +0000 (15:24 +0200)]
erasure-code: HTML display of benchmark results

The ceph_erasure_code_benchmark output is converted into a JSON series
suitable to display in HTML with the http://www.flotcharts.org/
library. A self contained copy of the HTML,JS,CSS files is included for
durability and can be used from the source tree with:

    CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark  \
    PLUGIN_DIRECTORY=src/.libs \
        qa/workunits/erasure-code/bench.sh fplot jerasure |
        tee qa/workunits/erasure-code/bench.js

and display with:

    firefox qa/workunits/erasure-code/bench.html

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoCOPYING: Cloudwatt copyright is inline
Loic Dachary [Tue, 27 May 2014 19:45:19 +0000 (21:45 +0200)]
COPYING: Cloudwatt copyright is inline

Remove partial list of contributions since Cloudwatt copyright has been
placed in the copyright notices of the files where works covered by
copyright have been included.

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: rework benchmark suite
Loic Dachary [Tue, 27 May 2014 17:25:22 +0000 (19:25 +0200)]
erasure-code: rework benchmark suite

Expand the default suite to enumerate all cases that are relevant to the
current code base so that it is easier to consume. Namely it means

 * iterating over object sizes of 4KB (what is used by default) and
   1MB (what was previously benchmarked)
 * grouping results in series that would make sense to plot to get the
   behavior of a given technique for a series of K/M values and all
   possible erasures.

Instead of specifying the iterations to run, set the size of the total
data set to be exercised and compute the iterations by dividing it by
the object size. Since the object size varies, it is impractical to
preset the number of iterations and get meaningful results.
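
As a worked example of that arithmetic (the 1 GiB total is an assumed
value chosen for illustration, not one taken from the benchmark
script):

```python
def iterations(total_size, object_size):
    """Number of benchmark iterations for a fixed total data set size."""
    return total_size // object_size

TOTAL = 1024 * 1024 * 1024            # assumed 1 GiB data set, for illustration
for size in (4 * 1024, 1024 * 1024):  # the 4KB and 1MB object sizes
    print(size, iterations(TOTAL, size))
```

The same total yields 262144 iterations at 4KB and 1024 at 1MB, which
is why a single preset iteration count cannot suit both sizes.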

The PARAMETERS environment variable is added to enable the caller to
inject --parameter jerasure-variant=generic, for instance.

The packet size is calculated based on the other parameters. The
options are limited when packets are small (4KB) and it would not make a
real difference to give control over it. The packet size is capped to
a maximum of 3100 bytes which is roughly what has been found to be an
optimal value for large packets (1MB).

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: properly indent ErasureCodePluginSelectJerasure.cc
Loic Dachary [Fri, 30 May 2014 12:33:59 +0000 (14:33 +0200)]
erasure-code: properly indent ErasureCodePluginSelectJerasure.cc

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: control jerasure plugin variant selection
Loic Dachary [Fri, 30 May 2014 12:33:15 +0000 (14:33 +0200)]
erasure-code: control jerasure plugin variant selection

The jerasure-variant parameter is interpreted as the name of the plugin
variant to be loaded regardless of the available CPU features. The
values can be sse3, sse4, generic. It is undocumented and meant for
benchmarking purposes, primarily to force the generic plugin to be
loaded when the sse4 would be chosen.
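
The selection rule can be sketched like this. It is a simplified
model in Python: the real plugin probes the CPU itself, whereas here
cpu_features is just a set of strings, and select_variant is a
hypothetical name.

```python
VARIANTS = ("sse3", "sse4", "generic")

def select_variant(cpu_features, forced=None):
    """Pick a plugin variant; `forced` models the jerasure-variant parameter."""
    if forced is not None:
        if forced not in VARIANTS:
            raise ValueError("unknown variant: %s" % forced)
        return forced                  # loaded regardless of CPU features
    if "sse4" in cpu_features:
        return "sse4"
    if "sse3" in cpu_features:
        return "sse3"
    return "generic"
```

For benchmarking, select_variant({"sse3", "sse4"}, forced="generic")
returns "generic" even though sse4 would otherwise be chosen.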

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: reduce jerasure verbosity
Loic Dachary [Tue, 27 May 2014 16:34:10 +0000 (18:34 +0200)]
erasure-code: reduce jerasure verbosity

Only output a message about adjusting the buffer size when it is
adjusted, not when the size does not need adjustment.

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: implement alignment on chunk sizes 1890/head
Loic Dachary [Tue, 27 May 2014 16:40:45 +0000 (18:40 +0200)]
erasure-code: implement alignment on chunk sizes

jerasure expects chunk sizes that are aligned on the largest possible
vector size that could be used by SSE instructions, when available (
LARGEST_VECTOR_WORDSIZE == 16 bytes ).

For techniques derived from Cauchy, encoding and decoding is done by
subdividing the chunk into packets of packetsize bytes. The operations
are done w * packetsize bytes at a time. It follows that each chunk must
have a size that is a multiple of w * packetsize bytes.

For techniques derived from Vandermonde, it is enough for a chunk to be
a multiple of w * LARGEST_VECTOR_WORDSIZE.
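
The two constraints can be written down directly. This is a sketch in
Python that follows the wording above, not the ErasureCodeJerasure
code; chunk_alignment and padded_chunk_size are hypothetical helpers.

```python
LARGEST_VECTOR_WORDSIZE = 16  # bytes, as stated above

def chunk_alignment(technique, w, packetsize=None):
    """Alignment a chunk size must be a multiple of, per technique family."""
    if technique == "cauchy":
        return w * packetsize
    if technique == "vandermonde":
        return w * LARGEST_VECTOR_WORDSIZE
    raise ValueError(technique)

def padded_chunk_size(size, alignment):
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment
```

With w=8, a Vandermonde chunk of 1000 bytes pads to 1024, while a
Cauchy chunk with packetsize=2048 pads to 16384, which illustrates how
padding grows with the packet size when chunks are small.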

ErasureCodeJerasure::get_alignment returns a size alignment constraint;
the object size has to be a multiple of it. The resulting object size
then has to match the chunk constraints described above although they
have no relationship with K. For Cauchy, it leads to excessive padding,
making it impossible to set sensible parameters when the object size is
small.

When the per_chunk_alignement data member is true, the semantics of
ErasureCodeJerasure::get_alignment change: the returned size alignment
constraint applies to the chunk size, which has to be a multiple of it.
The ErasureCodeJerasure::get_chunk_size method is modified to use the
new semantics when appropriate.

The jerasure-per-chunk-alignement parameter is parsed to set
per_chunk_alignement for the Vandermonde and Cauchy techniques.

The memory address of a chunk is implicitly aligned to a page boundary
because it is allocated with buffer::create_page_aligned.

http://tracker.ceph.com/issues/8475 Fixes: #8475

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoerasure-code: cauchy techniques allow w 8,16,32
Loic Dachary [Tue, 27 May 2014 16:36:09 +0000 (18:36 +0200)]
erasure-code: cauchy techniques allow w 8,16,32

Enforce the restriction at initialization time, the same way it is done
for Reed Solomon. Choosing a w value different from 8,16,32 will lead to
memory corruption that cannot easily be traced to the cause.
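
A check of this kind might look like the following Python sketch
(check_w is a hypothetical name; the plugin itself is C++ and reports
errors differently):

```python
ALLOWED_W = (8, 16, 32)

def check_w(w):
    """Reject unsupported word sizes up front instead of failing later."""
    if w not in ALLOWED_W:
        raise ValueError("w=%d must be one of %s" % (w, ALLOWED_W))
    return w
```

Failing at initialization turns a hard-to-trace corruption into an
immediate, explicit error.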

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: sort entries 2165/head
Loic Dachary [Mon, 4 Aug 2014 06:52:03 +0000 (08:52 +0200)]
mailmap: sort entries

to help avoid duplicates (found one)

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Tommi Virtanen is not with Red Hat
Loic Dachary [Mon, 4 Aug 2014 06:50:22 +0000 (08:50 +0200)]
mailmap: Tommi Virtanen is not with Red Hat

The entry was added by s/inktank/redhat/ and did not acknowledge that
Tommi Virtanen left Inktank before it was acquired.

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: João Eduardo Luís name normalization
Loic Dachary [Thu, 31 Jul 2014 12:22:45 +0000 (18:07 +0545)]
mailmap: João Eduardo Luís name normalization

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Sebastien Ponce name normalization
Loic Dachary [Wed, 30 Jul 2014 13:02:59 +0000 (19:02 +0600)]
mailmap: Sebastien Ponce name normalization

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Brian Rak affiliation
Loic Dachary [Wed, 30 Jul 2014 03:54:03 +0000 (09:54 +0600)]
mailmap: Brian Rak affiliation

and name normalization

Reviewed-by: Brian Rak <dn@devicenull.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: George Ryall affiliation
Loic Dachary [Wed, 30 Jul 2014 03:52:24 +0000 (09:52 +0600)]
mailmap: George Ryall affiliation

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Stephen Jahl affiliation
Loic Dachary [Wed, 30 Jul 2014 03:52:04 +0000 (09:52 +0600)]
mailmap: Stephen Jahl affiliation

Reviewed-by: Stephen Jahl <stephenjahl@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Adam Crume affiliation
Loic Dachary [Wed, 30 Jul 2014 03:51:34 +0000 (09:51 +0600)]
mailmap: Adam Crume affiliation

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Accela Zhao affiliation
Loic Dachary [Wed, 30 Jul 2014 06:03:22 +0000 (12:03 +0600)]
mailmap: Accela Zhao affiliation

Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Kevin Cox affiliation
Loic Dachary [Wed, 30 Jul 2014 03:51:11 +0000 (09:51 +0600)]
mailmap: Kevin Cox affiliation

Reviewed-by: Kevin Cox <kevincox@kevincox.ca>
Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agomailmap: Ma Jianpeng affiliation
Loic Dachary [Wed, 30 Jul 2014 03:50:45 +0000 (09:50 +0600)]
mailmap: Ma Jianpeng affiliation

and name normalization

Reviewed-by: Ma Jianpeng <jianpeng.ma@intel.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
10 years agoceph-conf: flush log on exit 2193/head
Sage Weil [Mon, 4 Aug 2014 04:00:37 +0000 (21:00 -0700)]
ceph-conf: flush log on exit

This makes it deterministic whether we output

2014-08-03 20:59:45.482614 4036c80 -1 did not load config file, using default settings.

or not, and will make the unit tests stop intermittently failing.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state. 2184/head
Ma Jianpeng [Wed, 30 Jul 2014 03:03:17 +0000 (11:03 +0800)]
ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state.

We cannot guarantee that conf->osd_recovery_max_chunk doesn't change while
recovering an erasure-coded object.
If it changes between RecoveryOp::READING and RecoveryOp::WRITING, it can cause this bug:

2014-07-30 10:12:09.599220 7f7ff26c0700 -1 osd/ECBackend.cc: In function
'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)' thread 7f7ff26c0700 time 2014-07-30 10:12:09.596837
osd/ECBackend.cc: 529: FAILED assert(pop.data.length() ==
sinfo.aligned_logical_offset_to_chunk_offset(
after_progress.data_recovered_to -
op.recovery_progress.data_recovered_to))

 ceph version 0.83-383-g3cfda57
(3cfda577b15039cb5c678b79bef3e561df826ed1)
 1: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,RecoveryMessages*)+0x1a50) [0x928070]
 2: (ECBackend::handle_recovery_read_complete(hobject_t const&,
boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t,
ceph::buffer::list, std::less<pg_shard_t>,
std::allocator<std::pair<pg_shard_t const, ceph::buffer::list> > >,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type>&, boost::optional<std::map<std::string,
ceph::buffer::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::list> > > >,
RecoveryMessages*)+0x90c) [0x92952c]
 3: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x121) [0x938481]
 4: (GenContext<std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x9) [0x929d69]
 5: (ECBackend::complete_read_op(ECBackend::ReadOp&,RecoveryMessages*)+0x63) [0x91c6e3]
 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,RecoveryMessages*)+0x96d) [0x920b4d]
 7: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x17e)[0x92884e]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,ThreadPool::TPHandle&)+0x23b) [0x7b34db]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x428)
[0x638d58]
 10: (OSD::ShardedOpWQ::_process(unsigned int,ceph::heartbeat_handle_d*)+0x346) [0x6392f6]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce)[0xa5caae]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa5ed00]
 13: (()+0x8182) [0x7f800b5d3182]
 14: (clone()+0x6d) [0x7f800997430d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

So we only call get_recovery_chunk_size() in RecoveryOp::READING and
record the result in RecoveryOp::extent_requested.
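
The fix amounts to sampling the configured value once and carrying it
on the op. A simplified Python model follows; the Conf and RecoveryOp
classes here are stand-ins, and the real get_recovery_chunk_size()
also rounds to the stripe width, which this sketch ignores.

```python
class Conf:
    osd_recovery_max_chunk = 8 << 20  # may be changed at runtime

class RecoveryOp:
    def __init__(self):
        self.state = "READING"
        self.extent_requested = None

def start_read(op, conf):
    # Sample the configured size once and remember it on the op.
    op.extent_requested = conf.osd_recovery_max_chunk
    op.state = "WRITING"
    return op.extent_requested

def expected_write_len(op):
    # Use the recorded value, not the current configuration.
    return op.extent_requested

conf, op = Conf(), RecoveryOp()
requested = start_read(op, conf)
conf.osd_recovery_max_chunk = 4 << 20   # config changes mid-recovery
```

Because the WRITING step reads op.extent_requested, the length it
expects still matches what was requested during READING.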

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
10 years agomsg/SimpleMessenger: drop msgr lock when joining a Pipe 2192/head
Sage Weil [Mon, 4 Aug 2014 01:26:34 +0000 (18:26 -0700)]
msg/SimpleMessenger: drop msgr lock when joining a Pipe

Avoid this deadlock:

- a fault
- delay thread entry gets a fast dispatch message
 - drops delay_lock
 - calls into fast_dispatch
- reaper tries to reap the pipe
 - pipe->join()
  - delay_thread->join()
   - blocks waiting for delay_thread to exit
- delay thread / fast dispatch blocks on msgr->lock trying to mark_down

The solution is to drop the msgr lock while joining the thread.  This will
allow the join() to complete.  Adjust the reaper thread to recheck the
exit condition since the lock may have been dropped.  The other two callers
do not care.
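
The shape of the fix, modelled with Python threads. All names are
illustrative: msgr_lock stands for the messenger lock,
delay_thread_entry for the thread that needs it, and reap for the
reaper's join path.

```python
import threading

msgr_lock = threading.Lock()
done = []

def delay_thread_entry():
    # Models fast dispatch needing the messenger lock (e.g. to mark_down).
    with msgr_lock:
        done.append("mark_down")

def reap(pipe_thread):
    """Called with msgr_lock held, as the reaper would be."""
    msgr_lock.release()          # drop the lock so the thread can finish
    try:
        pipe_thread.join()
    finally:
        msgr_lock.acquire()      # retake it; callers must recheck their state

t = threading.Thread(target=delay_thread_entry)
msgr_lock.acquire()
t.start()                        # the thread blocks until the lock is dropped
reap(t)
result = list(done)
msgr_lock.release()
```

Calling pipe_thread.join() without the release would hang: the joiner
holds the lock the thread is waiting for.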

Fixes: #8891
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoos/MemStore: fix lock leak
Sage Weil [Sun, 3 Aug 2014 18:23:33 +0000 (11:23 -0700)]
os/MemStore: fix lock leak

CID 1228868 (#2-1 of 2): Missing unlock (LOCK)
12. missing_unlock: Returning without unlocking oc->lock.L.

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agorgw: need to pass need_to_wait for throttle_data() 2191/head
Yehuda Sadeh [Sat, 2 Aug 2014 20:01:05 +0000 (13:01 -0700)]
rgw: need to pass need_to_wait for throttle_data()

need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.

CID 1229541:    (PW.PARAM_SET_BUT_NOT_USED)

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
10 years agodoc/release-notes: fix syntax error
Sage Weil [Sat, 2 Aug 2014 04:19:26 +0000 (21:19 -0700)]
doc/release-notes: fix syntax error

Attempt 2...

ERROR: /srv/autobuild-ceph/gitbuilder.git/build/doc/release-notes.rst:22: Unknown target name: "leveldb".

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2188 from wonzhq/obj-mtime
Sage Weil [Sat, 2 Aug 2014 02:27:01 +0000 (19:27 -0700)]
Merge pull request #2188 from wonzhq/obj-mtime

osd: add local_mtime to struct object_info_t

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoos/KeyValueStore: clean up operator<< for KVSuperBlock
Sage Weil [Sat, 2 Aug 2014 02:24:26 +0000 (19:24 -0700)]
os/KeyValueStore: clean up operator<< for KVSuperBlock

Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2174 from yuyuyu101/kvstore-superblock
Sage Weil [Sat, 2 Aug 2014 02:23:35 +0000 (19:23 -0700)]
Merge pull request #2174 from yuyuyu101/kvstore-superblock

Kvstore superblock

Reviewed-by: Sage Weil <sage@redhat.com>
10 years agoMerge pull request #2169 from ceph/wip-double-pc
Sage Weil [Sat, 2 Aug 2014 01:01:43 +0000 (18:01 -0700)]
Merge pull request #2169 from ceph/wip-double-pc

mon: s/%%/%/

Realized where these came from; it was an accident.

10 years agoMerge branch 'wip-cache-second'
Sage Weil [Fri, 1 Aug 2014 22:37:33 +0000 (15:37 -0700)]
Merge branch 'wip-cache-second'

Reviewed-by: Samuel Just <sam.just@inktank.com>
10 years agoceph_test_rados_api_tier: test promote-on-second-read behavior
Signed-off-by: Zhiqiang Wang [Thu, 31 Jul 2014 22:49:44 +0000 (15:49 -0700)]
ceph_test_rados_api_tier: test promote-on-second-read behavior

Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com>
Signed-off-by: Sage Weil <sage@redhat.com>
10 years agoosd: promotion on 2nd read for cache tiering
Zhiqiang Wang [Mon, 28 Jul 2014 06:06:06 +0000 (14:06 +0800)]
osd: promotion on 2nd read for cache tiering

Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com>