git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agodoc: update ceph.conf examples about btrfs default
Sage Weil [Fri, 21 Dec 2012 22:04:30 +0000 (14:04 -0800)]
doc: update ceph.conf examples about btrfs default

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: Moved path to individual OSD entries.
John Wilkins [Fri, 21 Dec 2012 18:15:38 +0000 (10:15 -0800)]
doc: Moved path to individual OSD entries.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 21 Dec 2012 01:43:51 +0000 (17:43 -0800)]
Merge remote-tracking branch 'gh/next'

12 years agoMerge remote-tracking branch 'upstream/wip_notify' into next
Samuel Just [Fri, 21 Dec 2012 00:23:23 +0000 (16:23 -0800)]
Merge remote-tracking branch 'upstream/wip_notify' into next

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agocephtool: mention ceph osd ls, fix ceph osd tell N bench
Dan Mick [Thu, 20 Dec 2012 23:31:21 +0000 (15:31 -0800)]
cephtool: mention ceph osd ls, fix ceph osd tell N bench

Add ceph osd ls to help; make help for ceph osd tell N bench look
more like injectargs, which says <osd-id or *> to make it clear you
can benchmark all osds simultaneously

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agorgw: remove noisy log message
Yehuda Sadeh [Thu, 20 Dec 2012 23:32:59 +0000 (15:32 -0800)]
rgw: remove noisy log message

No need for that log message.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: fix daemonize initialization
Yehuda Sadeh [Thu, 20 Dec 2012 23:21:48 +0000 (15:21 -0800)]
rgw: fix daemonize initialization

Just call the common daemonize function. Otherwise we end up
not initializing stdout / stderr correctly.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agolog: fix flush/signal race
Sage Weil [Thu, 20 Dec 2012 21:48:06 +0000 (13:48 -0800)]
log: fix flush/signal race

We need to signal the cond in the same interval where we hold the lock
*and* modify the queue.  Otherwise, we can have a race like:

 queue has 1 item, max is 1.
 A: enter submit_entry, signal cond, wait on condition
 B: enter submit_entry, signal cond, wait on condition
 C: flush wakes up, flushes 1 previous item
 A: retakes lock, enqueues something, exits
 B: retakes lock, condition fails, waits
  -> C is never woken up as there are 2 items waiting

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
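
The rule stated in "log: fix flush/signal race" above (signal the condition in the same interval where the lock is held and the queue is modified) can be sketched as follows. This is a minimal illustration in Python, not Ceph's C++ logging code; BoundedQueue, submit_entry and flush are hypothetical names chosen to echo the commit message.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical bounded queue; not Ceph's actual Log class."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = deque()
        self.lock = threading.Lock()
        # Both conditions share one lock, so waiting and signalling are
        # always serialized with queue modifications.
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def submit_entry(self, entry):
        with self.lock:
            # Wait for room; the lock is released while waiting.
            while len(self.items) >= self.max_items:
                self.not_full.wait()
            self.items.append(entry)
            # Signal in the same critical section that modified the queue,
            # so the flusher cannot miss the wakeup.
            self.not_empty.notify()

    def flush(self):
        with self.lock:
            while not self.items:
                self.not_empty.wait()
            drained = list(self.items)
            self.items.clear()
            # Room was made under the lock; wake every blocked submitter.
            self.not_full.notify_all()
            return drained
```

Because the enqueue and the notify happen under one lock acquisition, a submitter that was blocked on a full queue always wakes the flusher after it finally enqueues.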
12 years agoReplicatedPG::remove_notify : don't leak the notify object
Samuel Just [Thu, 20 Dec 2012 21:29:09 +0000 (13:29 -0800)]
ReplicatedPG::remove_notify : don't leak the notify object

Following remove_notify, there are no other references to
notif, delete it.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD,ReplicatedPG: do not track notifies on the session
Samuel Just [Thu, 20 Dec 2012 21:23:27 +0000 (13:23 -0800)]
OSD,ReplicatedPG: do not track notifies on the session

handle_notify_timeout and remove_notify currently do not clean up this
state leaving dangling Notification*.  Further, we only use this mapping
in unwatch in order to determine which notifies to update. We can
accomplish the same thing by iterating through the obc->notifs mapping
since all notifications relevant for a given watch would have been for
the same obc as the watch.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agodoc: Added package and repo links for Apache and FastCGI. Added SSL enable too.
John Wilkins [Thu, 20 Dec 2012 20:59:58 +0000 (12:59 -0800)]
doc: Added package and repo links for Apache and FastCGI. Added SSL enable too.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Fixed restructuredText usage.
John Wilkins [Thu, 20 Dec 2012 20:59:22 +0000 (12:59 -0800)]
doc: Fixed restructuredText usage.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Removed foo. Apparently myimage was added and foo not removed.
John Wilkins [Thu, 20 Dec 2012 19:39:41 +0000 (11:39 -0800)]
doc: Removed foo. Apparently myimage was added and foo not removed.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'next'
Sage Weil [Thu, 20 Dec 2012 19:07:10 +0000 (11:07 -0800)]
Merge branch 'next'

12 years agoMerge remote-tracking branch 'gh/wip-cephtool' into next
Sage Weil [Thu, 20 Dec 2012 19:04:29 +0000 (11:04 -0800)]
Merge remote-tracking branch 'gh/wip-cephtool' into next

12 years agoMerge branch 'wip-build-fixes' into next
Sage Weil [Thu, 20 Dec 2012 18:49:34 +0000 (10:49 -0800)]
Merge branch 'wip-build-fixes' into next

12 years agorgw: configurable exit timeout
Yehuda Sadeh [Tue, 18 Dec 2012 21:53:09 +0000 (13:53 -0800)]
rgw: configurable exit timeout

Fixes: #3638
rgw exit timeout secs : number of seconds to wait for process
to exit cleanly before forcing exit. If set to 0, it'll wait
indefinitely.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: don't try to assign content type if not found
Yehuda Sadeh [Wed, 19 Dec 2012 18:21:57 +0000 (10:21 -0800)]
rgw: don't try to assign content type if not found

Fixes: #3648
Cannot assign a NULL pointer into stl string. This is only
relevant to swift, when uploading an object without specifying
content type, and when the suffix cannot be determined.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-crushtool' into next
Sage Weil [Thu, 20 Dec 2012 16:53:19 +0000 (08:53 -0800)]
Merge remote-tracking branch 'gh/wip-crushtool' into next

Reviewed-by: Caleb Miles <caleb.miles@inktank.com>
12 years agorgw: don't initialize keystone if not set up
Yehuda Sadeh [Thu, 20 Dec 2012 00:59:43 +0000 (16:59 -0800)]
rgw: don't initialize keystone if not set up

Fixes: #3653
No need to initialize keystone, including the keystone
revocation thread, which was verbose if keystone was
not set up. This removes some useless errors from the
log.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: remove useless configurable, fix swift auth error handling
Yehuda Sadeh [Wed, 19 Dec 2012 22:34:53 +0000 (14:34 -0800)]
rgw: remove useless configurable, fix swift auth error handling

Fixes: #3649
No need to have an extra configurable to use keystone. Use keystone
whenever keystone url has been specified. Also, fix a bad error
handling that turned a failure to authenticate into successfully
authenticating a bad user.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoMerge remote-tracking branch 'upstream/wip_pg_temp' into next
Samuel Just [Thu, 20 Dec 2012 00:50:11 +0000 (16:50 -0800)]
Merge remote-tracking branch 'upstream/wip_pg_temp' into next

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
12 years agodoc: Modified the demo configuration file for Bobtail.
John Wilkins [Wed, 19 Dec 2012 22:22:43 +0000 (14:22 -0800)]
doc: Modified the demo configuration file for Bobtail.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added Gateway Quick Start.
John Wilkins [Wed, 19 Dec 2012 22:02:19 +0000 (14:02 -0800)]
doc: Added Gateway Quick Start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added Gateway Quick Start configuration file.
John Wilkins [Wed, 19 Dec 2012 22:02:02 +0000 (14:02 -0800)]
doc: Added Gateway Quick Start configuration file.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoUpdated Getting Started index to include Gateway Quick Start.
John Wilkins [Wed, 19 Dec 2012 22:01:30 +0000 (14:01 -0800)]
Updated Getting Started index to include Gateway Quick Start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added REST Gateway link to 5-minute Quick Start.
John Wilkins [Wed, 19 Dec 2012 22:00:55 +0000 (14:00 -0800)]
doc: Added REST Gateway link to 5-minute Quick Start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated the 5-minute Quick Start for Bobtail.
John Wilkins [Wed, 19 Dec 2012 21:52:20 +0000 (13:52 -0800)]
doc: Updated the 5-minute Quick Start for Bobtail.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated Block Device Quick Start for Bobtail.
John Wilkins [Wed, 19 Dec 2012 21:47:11 +0000 (13:47 -0800)]
doc: Updated Block Device Quick Start for Bobtail.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated CephFS Quick Start for Bobtail.
John Wilkins [Wed, 19 Dec 2012 21:46:28 +0000 (13:46 -0800)]
doc: Updated CephFS Quick Start for Bobtail.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added authentication and mkcephfs settings for Bobtail.
John Wilkins [Wed, 19 Dec 2012 21:45:34 +0000 (13:45 -0800)]
doc: Added authentication and mkcephfs settings for Bobtail.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added javascript code block tag.
John Wilkins [Wed, 19 Dec 2012 21:36:17 +0000 (13:36 -0800)]
doc: Added javascript code block tag.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoOSDMonitor: remove temp pg mappings with no up pgs
Samuel Just [Wed, 19 Dec 2012 18:33:40 +0000 (10:33 -0800)]
OSDMonitor: remove temp pg mappings with no up pgs

Otherwise, the pg won't be validly mapped until one of the temp
pgs comes back up.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSDMap: make apply_incremental take a const argument
Samuel Just [Wed, 19 Dec 2012 18:32:52 +0000 (10:32 -0800)]
OSDMap: make apply_incremental take a const argument

This requires us to copy bufferlists in two cases since bufferlist
does not have a const iterator at this time.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agocephtool: add qa workunit
Sage Weil [Wed, 19 Dec 2012 16:37:42 +0000 (08:37 -0800)]
cephtool: add qa workunit

A few basic sanity checks, including a tell on a down osd.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph.spec.in: Improve finding location of jni.h for sles11.
Gary Lowell [Wed, 19 Dec 2012 05:00:15 +0000 (21:00 -0800)]
ceph.spec.in:  Improve finding location of jni.h for sles11.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years agoosd: implement 'version' tell command
Sage Weil [Wed, 19 Dec 2012 04:08:42 +0000 (20:08 -0800)]
osd: implement 'version' tell command

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph.spec.in: Add packages for libcephfs-jni and libcephfs-java
Gary Lowell [Wed, 19 Dec 2012 03:40:32 +0000 (19:40 -0800)]
ceph.spec.in:  Add packages for libcephfs-jni and libcephfs-java

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years agoceph: report error string to stderr, not stdout
Sage Weil [Wed, 19 Dec 2012 03:21:24 +0000 (19:21 -0800)]
ceph: report error string to stderr, not stdout

If we return an error, send the message to stderr.  This makes things
more easily scriptable because error messages won't take the place of
expected output.

Signed-off-by: Sage Weil <sage@inktank.com>
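
A small sketch of why "ceph: report error string to stderr, not stdout" above helps scripting. The report helper and its parameters are hypothetical and written in Python for illustration; they are not the ceph tool's code.

```python
import sys


def report(result, err, out=sys.stdout, errout=sys.stderr):
    """Write command output to `out` and any error text to `errout`.

    Hypothetical helper: `result` is the command output and `err` is an
    error string (empty on success). Returns a shell-style exit code.
    """
    if err:
        # Errors never land on stdout, so a caller capturing stdout only
        # ever sees real output.
        print(err, file=errout)
        return 1
    print(result, file=out)
    return 0
```

A script that captures stdout can then treat whatever it receives as expected output and rely on the exit code to detect failure.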
12 years agoceph: fix error reporting when tell target is invalid or down
Sage Weil [Wed, 19 Dec 2012 03:20:06 +0000 (19:20 -0800)]
ceph: fix error reporting when tell target is invalid or down

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: 'ceph osd ls'
Sage Weil [Wed, 19 Dec 2012 03:11:49 +0000 (19:11 -0800)]
mon: 'ceph osd ls'

List osd ids that exist.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoOSDMap::dump: tag pg_temp mappings with pgid
Samuel Just [Wed, 19 Dec 2012 00:50:24 +0000 (16:50 -0800)]
OSDMap::dump: tag pg_temp mappings with pgid

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: configurable exit timeout
Yehuda Sadeh [Tue, 18 Dec 2012 21:53:09 +0000 (13:53 -0800)]
rgw: configurable exit timeout

Fixes: #3638
rgw exit timeout secs : number of seconds to wait for process
to exit cleanly before forcing exit. If set to 0, it'll wait
indefinitely.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agocrushtool: nicer error message on extra args
Sage Weil [Mon, 17 Dec 2012 22:44:35 +0000 (14:44 -0800)]
crushtool: nicer error message on extra args

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocrushtool: only dump usage on -h|--help
Sage Weil [Mon, 17 Dec 2012 19:21:55 +0000 (11:21 -0800)]
crushtool: only dump usage on -h|--help

Instead, output a useful error message.

Fix error code to be a success.

Add test for the output usage.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/testing' into next
Sage Weil [Tue, 18 Dec 2012 00:51:20 +0000 (16:51 -0800)]
Merge remote-tracking branch 'gh/testing' into next

12 years agoceph.spec.in: Update pre-reqs for ceph-fuse package.
Gary Lowell [Tue, 18 Dec 2012 00:38:19 +0000 (16:38 -0800)]
ceph.spec.in:  Update pre-reqs for ceph-fuse package.

12 years agoRevert "objecter: don't use new tid when retrying notifies"
Sage Weil [Tue, 18 Dec 2012 00:29:19 +0000 (16:29 -0800)]
Revert "objecter: don't use new tid when retrying notifies"

This reverts commit c3107009f66bc06b5e14c465142e14120f9a4412.

This appears to be causing problems in the objecter by corrupting
the stack.  Until that is resolved, let's revert.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: OSDMonitor: add option 'mon_max_pool_pg_num' and limit 'pg_num' accordingly
Joao Eduardo Luis [Mon, 17 Dec 2012 18:58:16 +0000 (18:58 +0000)]
mon: OSDMonitor: add option 'mon_max_pool_pg_num' and limit 'pg_num' accordingly

Instead of having a hardcoded default, use a configurable one. It is
limited to 65536 until future testing guarantees there are no side effects
of increasing it past this value, but by being adjustable the user still
has the freedom to specify whatever maximum value they want.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
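
The configurable limit from the 'mon_max_pool_pg_num' commit above might be checked along these lines. This is a sketch in Python under stated assumptions: check_pg_num is a hypothetical helper, and the choice of a strict "greater than" comparison is an assumption, since the commit message only names the default of 65536.

```python
MON_MAX_POOL_PG_NUM = 65536  # default value named in the commit message


def check_pg_num(pg_num, max_pg_num=MON_MAX_POOL_PG_NUM):
    """Return (ok, message) for a requested pg_num.

    Hypothetical helper; the comparison operator is an assumption.
    """
    if pg_num > max_pg_num:
        return False, "pg_num %d exceeds limit %d" % (pg_num, max_pg_num)
    return True, ""
```

Passing a different max_pg_num models an administrator overriding the default.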
12 years agoosd: debug EMSGSIZE / OSD_WRITETOOBIG
Sage Weil [Mon, 17 Dec 2012 17:20:07 +0000 (09:20 -0800)]
osd: debug EMSGSIZE / OSD_WRITETOOBIG

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocrushtool: add --set-chooseleaf-descend-once to help
Sage Weil [Mon, 17 Dec 2012 19:14:44 +0000 (11:14 -0800)]
crushtool: add --set-chooseleaf-descend-once to help

We forgot to update this in 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: fix typo in config file
Josh Durgin [Mon, 17 Dec 2012 15:57:34 +0000 (07:57 -0800)]
doc: fix typo in config file

The option is host, not hostname

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoMakefiles: Two new packages needed in the debian build dependencies.
Gary Lowell [Sun, 16 Dec 2012 06:38:58 +0000 (22:38 -0800)]
Makefiles:  Two new packages needed in the debian build dependencies.

The ceph test programs that are now being built by default require the junit
and libboost-program-options packages.  These have been added to the build
dependencies in the debian control file.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years agoRefactor rule file to separate arch/indep builds.
James Page [Wed, 12 Dec 2012 22:06:16 +0000 (22:06 +0000)]
Refactor rule file to separate arch/indep builds.

Prior to the ceph fs java bindings, all packages were
architecture dependent so the packaging rules file
worked OK; this fixes up the binary-indep/arch targets
to split the builds of architecture dependent and
independent files.

Signed-off-by: James Page <james.page@ubuntu.com>
12 years agoosdc/Objecter: prevent pool dne check from invalidating scan_requests iterator
Sage Weil [Sun, 16 Dec 2012 01:45:25 +0000 (17:45 -0800)]
osdc/Objecter: prevent pool dne check from invalidating scan_requests iterator

We iterate over ops and, if the pool dne and other conditions are true,
we will immediately return ENOENT and cancel an op.  Increment the
iterator at the top of the loop to avoid invalidating it.

We also need to switch to a map<>, because hash_map<> mutations may
invalidate any/all iterators.

Fixes: #3613
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Sat, 15 Dec 2012 01:08:35 +0000 (17:08 -0800)]
Merge remote-tracking branch 'gh/next'

12 years agoqa: add a workunit for fsync-tester
Greg Farnum [Fri, 14 Dec 2012 22:34:35 +0000 (14:34 -0800)]
qa: add a workunit for fsync-tester

It turns out that our suites don't exercise fsync, at least not very much
(I couldn't find it in all the places I looked for it). This tester
was written by Ted Ts'o and updated by Chris Mason; I just made it
work on a smaller dataset (256MB) because 8GB against a small cluster takes
more time than we want to wait.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agotest: remove underscores from cephfs test names
Noah Watkins [Wed, 28 Nov 2012 19:49:53 +0000 (11:49 -0800)]
test: remove underscores from cephfs test names

Google Test documentation strongly suggests avoiding underscores from
unit test names to avoid accidental conflicts with their macro naming
scheme.

http://code.google.com/p/googletest/wiki/FAQ#Why_should_not_test_case_names_and_test_names_contain_underscore

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agoMerge branch 'wip_watch' into next
Sage Weil [Fri, 14 Dec 2012 22:32:44 +0000 (14:32 -0800)]
Merge branch 'wip_watch' into next

12 years agolockdep: Decrease lockdep backtrace skip by 1
Sam Lang [Fri, 14 Dec 2012 03:22:37 +0000 (17:22 -1000)]
lockdep:  Decrease lockdep backtrace skip by 1

Skipping the top 4 (it starts at 0) calls in the
backtrace actually skips the call that does the lock.
Skip 3 instead.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomkcephfs: fix == -> =
Sage Weil [Fri, 14 Dec 2012 22:16:31 +0000 (14:16 -0800)]
mkcephfs: fix == -> =

Another bashism.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomap-unmap.sh: use udevadm settle for synchronization
Alex Elder [Fri, 14 Dec 2012 21:58:39 +0000 (15:58 -0600)]
map-unmap.sh: use udevadm settle for synchronization

This script was heuristically using short sleep commands in order to
give udev activity time to complete.

There's a command "udevadm settle" which actually looks at the udev
queue and waits until its processing is done.  Much, much better.

This rearranges the get_id function a bit too, breaking it into one
function that gets the id and another that loops back and tries
again after a short delay in the event the get_id fails.

Signed-off-by: Alex Elder <elder@inktank.com>
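
The split described in "map-unmap.sh: use udevadm settle for synchronization" above (one function that gets the id, another that retries after a short delay) could look like this. The real script is shell; this is a Python sketch, and get_id_with_retry, its parameters and the retry counts are hypothetical.

```python
import time


def get_id_with_retry(get_id, attempts=10, delay=0.1, sleep=time.sleep):
    """Call get_id() until it returns a value or the attempts run out.

    `get_id` is a hypothetical callable returning the device id, or None
    when the id is not available yet.
    """
    for attempt in range(attempts):
        dev_id = get_id()
        if dev_id is not None:
            return dev_id
        # Pause only between attempts, not after the final failure.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Keeping the lookup separate from the retry loop means the delay policy can change without touching the code that reads the id.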
12 years agoMerge branch 'wip-upstart' into next
Sage Weil [Fri, 14 Dec 2012 21:51:19 +0000 (13:51 -0800)]
Merge branch 'wip-upstart' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoceph-disk-activate: mark dir as upstart-managed
Sage Weil [Fri, 14 Dec 2012 21:49:14 +0000 (13:49 -0800)]
ceph-disk-activate: mark dir as upstart-managed

Mark the directory so that upstart will manage the daemon.  Eventually,
this should be generalized to allow ceph-disk-* usage with other init
systems.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoupstart: make starter jobs consistent
Sage Weil [Fri, 14 Dec 2012 21:40:58 +0000 (13:40 -0800)]
upstart: make starter jobs consistent

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoupstart: only start when 'upstart' file exists in daemon dir
Sage Weil [Fri, 14 Dec 2012 21:40:25 +0000 (13:40 -0800)]
upstart: only start when 'upstart' file exists in daemon dir

We need to distinguish between daemons managed by upstart and sysvinit
(and, eventually, systemd).  Only start daemons when 'upstart' is present.

Note that sysvinit will only start daemons when the 'host = ...' line is
in ceph.conf, so there is a similar "opt-in".

Signed-off-by: Sage Weil <sage@inktank.com>
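
The opt-in check from "upstart: only start when 'upstart' file exists in daemon dir" above amounts to testing for a marker file. A sketch in Python; managed_by_upstart is a hypothetical name, and the real check lives in the upstart job files.

```python
import os


def managed_by_upstart(daemon_dir):
    """Return True if the daemon directory carries an 'upstart' marker file."""
    return os.path.exists(os.path.join(daemon_dir, "upstart"))
```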
12 years agoReplicatedPG: use default priority for Backfill messages
Samuel Just [Fri, 14 Dec 2012 20:46:43 +0000 (12:46 -0800)]
ReplicatedPG: use default priority for Backfill messages

Backfill messages modify the stats on the replica and therefore
must be sent with the same priority as sub_op_modify to ensure
ordering.  Using recovery_op_priority caused the following
sequence:

1) Primary(1) sends MOSDPGBackfill FINISH with updated stats (v1)
2) Primary(1) sends SubOp modify for new client op with stats (v2)
3) Replica(2) receives SubOp with stats (v2)
4) Replica(2) receives MOSDPGBackfill FINISH with stats (v1)
5) Replica(2) responds and Primary(1) resets pgtemp making
    Replica(2) Primary(2)
6) PG stats on Primary(2) several ops old.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
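
The reordering in steps 1) through 4) of the Backfill commit above can be reproduced with a toy priority queue. This Python sketch is not Ceph's messenger; delivery_order and the message names are hypothetical, and the priority numbers used with it are illustrative only.

```python
import heapq
import itertools


def delivery_order(messages):
    """Deliver (priority, name) messages highest priority first.

    Messages that share a priority are delivered in send order.
    """
    counter = itertools.count()
    heap = []
    for priority, name in messages:
        # Negate the priority to get a max-heap; the counter preserves
        # send order among messages with equal priority.
        heapq.heappush(heap, (-priority, next(counter), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With a lower priority on the backfill message, a later sub-op overtakes it; with equal priorities, send order is preserved, which is the ordering guarantee the commit relies on.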
12 years agoReplicatedPG: do not use priority from client op
Samuel Just [Fri, 14 Dec 2012 20:43:08 +0000 (12:43 -0800)]
ReplicatedPG: do not use priority from client op

There are internal ordering requirements which may be sensitive
to assigned priority.  We don't want a mix of priorities from
old clients with priorities from new clients causing trouble.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-3610' into next
Sam Lang [Fri, 14 Dec 2012 19:00:24 +0000 (09:00 -1000)]
Merge branch 'wip-3610' into next

12 years agoFix comment in sample.ceph.conf
Greg Farnum [Fri, 14 Dec 2012 17:53:30 +0000 (09:53 -0800)]
Fix comment in sample.ceph.conf

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agocrush-map.rst: add info about multiple crush hierarchies
Samuel Just [Mon, 10 Dec 2012 22:17:56 +0000 (14:17 -0800)]
crush-map.rst: add info about multiple crush hierarchies

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoclient: Add config option to inject sleep for tick
Sam Lang [Fri, 14 Dec 2012 03:23:27 +0000 (17:23 -1000)]
client: Add config option to inject sleep for tick

Testing the tick delay with a fork/suspend is causing
corruption in the lockdep code.  This approach uses
a config option to sleep the tick thread for a number
of seconds, avoiding the entire fork/suspend mess.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agorbd.py: check for new librbd methods before use
Josh Durgin [Tue, 11 Dec 2012 06:34:05 +0000 (22:34 -0800)]
rbd.py: check for new librbd methods before use

This way attempting to use format 2 images works when you upgrade the
python bindings before librbd, and attempting to use functions
that librbd does not have results in more understandable errors.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
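
The idea in "rbd.py: check for new librbd methods before use" above is to look a function up before calling it and fail with a readable error if it is absent. A minimal Python sketch; require_function and the exception class defined here are illustrative and are not taken from rbd.py.

```python
class FunctionNotSupported(Exception):
    """Raised when the loaded library lacks a requested function."""


def require_function(lib, name):
    """Return lib.<name>, or raise a readable error if it is missing.

    `lib` stands in for a ctypes library handle; any object works here.
    """
    # getattr with a default also covers handles that raise AttributeError
    # for unknown symbols.
    func = getattr(lib, name, None)
    if func is None:
        raise FunctionNotSupported(
            "installed library does not provide %s" % name)
    return func
```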
12 years agoosd: up != acting okay on mkpg
Sage Weil [Fri, 14 Dec 2012 00:26:43 +0000 (16:26 -0800)]
osd: up != acting okay on mkpg

This can happen when:

 - mon sends create pg
 - it gets created
 - osd remaps the pg to a different osd
     but osd does not update pg status to the mon
 - mkpg resent to the new osd

or something along those lines.  It seems unusual, but in the end who
really cares why the mon doesn't know about the pg creation yet.

Note that this check was added in the initial commit where acting/up was
added; there is no specific condition of concern we are protecting against.

Instead, ignore the message.  We'll get a query soon anyway.

This 'fixes' #3614.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon: OSDMonitor: don't allow creation of pools with > 65535 pgs
Joao Eduardo Luis [Thu, 13 Dec 2012 23:34:23 +0000 (23:34 +0000)]
mon: OSDMonitor: don't allow creation of pools with > 65535 pgs

There are some limitations to the number of possible pg's per pool, and
by allowing the 'osd pool create' command to succeed, we were making room
for some anomalous behavior.

Fixes: #3617
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorbd: handle images disappearing while in ls -l
Dan Mick [Thu, 13 Dec 2012 22:06:17 +0000 (14:06 -0800)]
rbd: handle images disappearing while in ls -l

rbd.list() returns a list of names, but nothing stops them from
going away before rbd.open(); check for ENOENT and ignore if that
happens; warn on other errors

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
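
The list-then-open race handled in "rbd: handle images disappearing while in ls -l" above can be sketched as follows. This is Python with hypothetical callables: open_image is assumed to raise OSError with errno set, which may differ from the exception types the real rbd bindings use.

```python
import errno


def describe_images(names, open_image, warn):
    """Collect info for each image, skipping ones removed after listing.

    `open_image(name)` and `warn(msg)` are hypothetical callables.
    """
    rows = []
    for name in names:
        try:
            rows.append((name, open_image(name)))
        except OSError as e:
            if e.errno == errno.ENOENT:
                continue  # image vanished between list and open
            warn("error opening %s: %s" % (name, e))
    return rows
```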
12 years agorgw_op: enforce minimum part size in multi-part uploads
caleb miles [Tue, 4 Dec 2012 21:36:17 +0000 (16:36 -0500)]
rgw_op: enforce minimum part size in multi-part uploads

Signed-off-by: caleb miles <caleb.miles@inktank.com>
12 years agomds: document EXCL -> (MIX or SYNC) transition decision
Sage Weil [Thu, 13 Dec 2012 20:47:42 +0000 (12:47 -0800)]
mds: document EXCL -> (MIX or SYNC) transition decision

Previously (in 26f6a8e48ae575f17c850e28e969d55bceefbc0f), for reasons that
are somewhat obscured by passage of time, we did

+      if ((other_wanted & (CEPH_CAP_GRD|CEPH_CAP_GWR)) ||

But then we noticed that the loner may want to RD/WR and we are losing the
loner status for some other reason.  So just recently in
b48dfeba3f99451815a5e2a538bea15cd87220d2 we changed it to

+      if (((other_wanted|loner_wanted) & (CEPH_CAP_GRD|CEPH_CAP_GWR)) ||

Then we noticed that a non-loner wanting to read and a loner wanting to
read (i.e., no writers!) would lead to MIX, even when we want SYNC.
So in 07b36992da35e8b54acf76af6c893a0d86f048fb we changed to

+      if (((other_wanted|loner_wanted) & CEPH_CAP_GWR) ||

This appears to be correct.  The possible choices (wrt caps wanted):

loner  other   want
R      R       SYNC
R      R|W     MIX
R      W       MIX
R|W    R       MIX
R|W    R|W     MIX
R|W    W       MIX
W      R       MIX
W      R|W     MIX
W      W       MIX

Which means any writer -> we want MIX.  We only want SYNC when there is
nobody who wants to write.  Because you can't write in SYNC.  Which in
retrospect seems obvious.

Signed-off-by: Sage Weil <sage@inktank.com>
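
The table in "mds: document EXCL -> (MIX or SYNC) transition decision" above reduces to one bit test. A Python sketch: R and W are stand-in bit values for CEPH_CAP_GRD and CEPH_CAP_GWR (the actual constants are not given in the log), and excl_target is a hypothetical name.

```python
R, W = 1, 2  # stand-ins for CEPH_CAP_GRD and CEPH_CAP_GWR


def excl_target(loner_wanted, other_wanted):
    """Return 'MIX' if either party wants to write, otherwise 'SYNC'."""
    return "MIX" if (loner_wanted | other_wanted) & W else "SYNC"
```

Only the row where both the loner and the other client want R alone yields SYNC; every other row of the table contains a writer and yields MIX.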
12 years agoOSD: put connection in disconnect_session_watches as well as the session
Samuel Just [Thu, 13 Dec 2012 18:52:28 +0000 (10:52 -0800)]
OSD: put connection in disconnect_session_watches as well as the session

obc->watchers now has a ref to the connection as well.  This piece of
disconnect_session_watchers essentially parallels remove_watcher and
should generally do the same thing.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoOSD: disconnect_session_watches obc might not be valid after we relock
Samuel Just [Thu, 13 Dec 2012 18:50:52 +0000 (10:50 -0800)]
OSD: disconnect_session_watches obc might not be valid after we relock

If disconnect_session_watches races with watch removal, the session
might no longer have a valid obc ref.  In that case, move on to
the next obc.

Note, there is no danger of any obcs being *added* to the session
since the session/connection at this point is dead.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoclarify/correct some of sample.ceph.conf
Greg Farnum [Tue, 11 Dec 2012 23:13:44 +0000 (15:13 -0800)]
clarify/correct some of sample.ceph.conf

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoMerge remote branch 'origin/next'
Josh Durgin [Thu, 13 Dec 2012 16:30:22 +0000 (08:30 -0800)]
Merge remote branch 'origin/next'

12 years agoqa: echo commands run by rbd map-unmap workunit
Josh Durgin [Thu, 13 Dec 2012 16:29:10 +0000 (08:29 -0800)]
qa: echo commands run by rbd map-unmap workunit

It's hard to figure out what failed without this.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoauth: guard decode_decrypt with try block
Sage Weil [Thu, 13 Dec 2012 06:01:03 +0000 (22:01 -0800)]
auth: guard decode_decrypt with try block

This will catch buffer decoding errors (maybe the block is empty) and
return an error string.

May fix (or possibly paper over) #3459.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
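
The pattern from "auth: guard decode_decrypt with try block" above, turning a decoding exception into an error string, looks roughly like this. This is a Python sketch using the standard struct module; decode_u32 is a hypothetical stand-in and does no decryption.

```python
import struct


def decode_u32(blob):
    """Decode a little-endian u32, returning (value, error_string)."""
    try:
        (value,) = struct.unpack("<I", blob)
    except struct.error as e:
        # An empty or short buffer ends up here instead of propagating.
        return None, "decode failed: %s" % e
    return value, ""
```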
12 years agomount.fuse.ceph: strip out noauto option
Sage Weil [Thu, 13 Dec 2012 05:14:13 +0000 (21:14 -0800)]
mount.fuse.ceph: strip out noauto option

mount -a uses this, but also passes it to mount.fuse.ceph, and libceph
complains:

fuse: unknown option `noauto'

Signed-off-by: Sage Weil <sage@inktank.com>
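
Stripping an option such as noauto before handing the list on, as in "mount.fuse.ceph: strip out noauto option" above, can be sketched like this. The real helper is a shell script; strip_options is a hypothetical Python equivalent.

```python
def strip_options(opts, unwanted=("noauto",)):
    """Remove unwanted entries from a comma-separated mount option string."""
    kept = [o for o in opts.split(",") if o and o not in unwanted]
    return ",".join(kept)
```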
12 years agomount.fuse.ceph: add ceph-fuse mount helper
Sage Weil [Wed, 12 Dec 2012 16:01:49 +0000 (08:01 -0800)]
mount.fuse.ceph: add ceph-fuse mount helper

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago/etc/init.d/ceph: fs_type assignment syntax error
Dan Mick [Thu, 13 Dec 2012 03:38:35 +0000 (19:38 -0800)]
/etc/init.d/ceph: fs_type assignment syntax error

This handles the remainder of 3581; it's a lot like the problem in
mkcephfs, but it isn't mkcephfs.

Fixes: #3581
Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agofilestore: Don't keep checking for syncfs if found
Sam Lang [Thu, 13 Dec 2012 00:28:12 +0000 (14:28 -1000)]
filestore: Don't keep checking for syncfs if found

Valgrind outputs a warning for unrecognized system calls,
and does so for the syscall(__SYS_syncfs,...) and
syscall(__NR_syncfs, ...) calls.  This patch avoids making
those calls (and the warning, when run in valgrind) if the
syncfs libc call is available.

INFO:teuthology.task.ceph.osd.1.err:--10568-- WARNING: unhandled syscall: 306
INFO:teuthology.task.ceph.osd.1.err:--10568-- You may be able to write your own handler.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
INFO:teuthology.task.ceph.osd.1.err:--10568-- Nevertheless we consider this a bug.  Please report
INFO:teuthology.task.ceph.osd.1.err:--10568-- it at http://valgrind.org/support/bug_reports.html.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agov0.55.1 v0.55.1
Gary Lowell [Thu, 13 Dec 2012 00:24:34 +0000 (16:24 -0800)]
v0.55.1

12 years agoOSD: pg might be removed during disconnect_session_watches
Samuel Just [Wed, 12 Dec 2012 23:09:25 +0000 (15:09 -0800)]
OSD: pg might be removed during disconnect_session_watches

We don't hold the osd_lock between the session->watches traversal
and the obc checks.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoPG,ReplicatedPG: handle_watch_timeout must not write during scrub/degraded
Samuel Just [Wed, 12 Dec 2012 06:22:31 +0000 (22:22 -0800)]
PG,ReplicatedPG: handle_watch_timeout must not write during scrub/degraded

Currently, handle_watch_timeout will gladly write to an object while
that object is degraded or is being scrubbed.  Now, we queue a
callback to be called on scrub completion or finish_degraded_object
to recall handle_watch_timeout.  The callback mechanism assumes that
the registered callbacks assume they will be called with the pg
lock -- and no other locks -- already held.

The callback will release the obc and pg refs unconditionally.  Thus,
we need to replace the unconnected_watchers pointer with NULL to
ensure that unregister_unconnected_watcher fails to cancel the
event and does not release the resources a second time.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: remove_notify, put session after con
Samuel Just [Wed, 12 Dec 2012 22:51:24 +0000 (14:51 -0800)]
ReplicatedPG: remove_notify, put session after con

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: only put if we cancel evt in unregister_unconnected_watcher
Samuel Just [Wed, 12 Dec 2012 22:26:59 +0000 (14:26 -0800)]
ReplicatedPG: only put if we cancel evt in unregister_unconnected_watcher

If we fail to cancel the callback, the callback will fire and
release those resources.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: watchers must grab Connection ref as well
Samuel Just [Wed, 12 Dec 2012 22:06:51 +0000 (14:06 -0800)]
ReplicatedPG: watchers must grab Connection ref as well

Session refs are not really valid on their own, the
corresponding Connection must remain live for at least
as long as the Session.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agodoc: Updated per comments in the mailing list.
John Wilkins [Wed, 12 Dec 2012 22:38:22 +0000 (14:38 -0800)]
doc: Updated per comments in the mailing list.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodocs: better documentation of new rgw feature
Yehuda Sadeh [Wed, 12 Dec 2012 21:49:55 +0000 (13:49 -0800)]
docs: better documentation of new rgw feature

Document rgw_extended_http_attrs config option.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: configurable list of object attributes
Yehuda Sadeh [Fri, 30 Nov 2012 07:07:26 +0000 (23:07 -0800)]
rgw: configurable list of object attributes

Fixes: #3535
New object attributes are now configurable. A list
can be specified via the 'rgw extended http attrs'
config param.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: option to provide alternative s3 put obj success code
Yehuda Sadeh [Fri, 30 Nov 2012 00:48:46 +0000 (16:48 -0800)]
rgw: option to provide alternative s3 put obj success code

Fixes: #3529
Added a new option: rgw_s3_success_create_obj_status.
Expected values are 0, 200, 201, 204. A value of 0
will skip the special handling altogether. Any value
other than the specified will default to 200.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
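
The value rules for rgw_s3_success_create_obj_status in the commit above can be written as a small mapping. A Python sketch that mirrors the commit message rather than the rgw source; put_obj_success_status is a hypothetical name, and returning None to mean "skip the special handling" is an assumption of this sketch.

```python
def put_obj_success_status(configured):
    """Map the configured value to the status to send, or None to skip."""
    if configured == 0:
        return None  # 0: skip the special handling altogether
    if configured in (200, 201, 204):
        return configured
    return 200  # any other value defaults to 200
```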
12 years agodoc: document swift compatibility
Yehuda Sadeh [Wed, 12 Dec 2012 00:44:46 +0000 (16:44 -0800)]
doc: document swift compatibility

Add a table that specifies swift features compatibility

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agodocs: add rgw POST object as supported feature
Yehuda Sadeh [Wed, 12 Dec 2012 00:09:42 +0000 (16:09 -0800)]
docs: add rgw POST object as supported feature

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>