git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agowireshark: fix indention
Danny Al-Gaaf [Thu, 24 Jan 2013 17:21:21 +0000 (18:21 +0100)]
wireshark: fix indention

Fix indention.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agowireshark: fix guint64 print format handling
Danny Al-Gaaf [Thu, 24 Jan 2013 17:21:20 +0000 (18:21 +0100)]
wireshark: fix guint64 print format handling

Use G_GUINT64_FORMAT to handle print format of guint64 correctly.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoPendingReleaseNotes: pool removal cli changes
Sage Weil [Thu, 24 Jan 2013 02:50:57 +0000 (18:50 -0800)]
PendingReleaseNotes: pool removal cli changes

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-rm-pool'
Sage Weil [Thu, 24 Jan 2013 02:49:05 +0000 (18:49 -0800)]
Merge remote-tracking branch 'gh/wip-rm-pool'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-3832-oc-flushrange'
Sage Weil [Thu, 24 Jan 2013 02:47:25 +0000 (18:47 -0800)]
Merge remote-tracking branch 'gh/wip-3832-oc-flushrange'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-osd-hb'
Sage Weil [Thu, 24 Jan 2013 02:40:49 +0000 (18:40 -0800)]
Merge branch 'wip-osd-hb'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge remote-tracking branch 'upstream/wip_push_after_complete'
Samuel Just [Thu, 24 Jan 2013 00:55:33 +0000 (16:55 -0800)]
Merge remote-tracking branch 'upstream/wip_push_after_complete'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoReplicatedPG: handle omap > max_recovery_chunk
Samuel Just [Wed, 23 Jan 2013 20:49:04 +0000 (12:49 -0800)]
ReplicatedPG: handle omap > max_recovery_chunk

span_of fails if len == 0.

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: correctly handle omap key larger than max chunk
Samuel Just [Wed, 23 Jan 2013 20:18:31 +0000 (12:18 -0800)]
ReplicatedPG: correctly handle omap key larger than max chunk

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: start scanning omap at omap_recovered_to
Samuel Just [Wed, 23 Jan 2013 20:15:10 +0000 (12:15 -0800)]
ReplicatedPG: start scanning omap at omap_recovered_to

Previously, we started scanning omap after omap_recovered_to.
This is a problem since the break in the loop implies that
omap_recovered_to is the first key not recovered.

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
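The off-by-one described in this commit can be illustrated with a sorted map. If omap_recovered_to names the first key that has not been recovered, the scan has to resume at that key (inclusive), not after it. The following is a minimal sketch under that reading; a std::map stands in for the omap, and keys_from is a hypothetical helper, not the ReplicatedPG code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect the keys still to be recovered, starting AT
// recovered_to (inclusive), because recovered_to is the first key that has
// not been recovered yet.
std::vector<std::string> keys_from(const std::map<std::string, std::string>& omap,
                                   const std::string& recovered_to) {
  std::vector<std::string> out;
  // lower_bound yields the first key >= recovered_to. upper_bound would
  // yield the first key > recovered_to and silently skip that key.
  for (auto it = omap.lower_bound(recovered_to); it != omap.end(); ++it) {
    out.push_back(it->first);
  }
  return out;
}
```

With keys a, b, c and recovered_to set to b, the sketch resumes at b; an exclusive start would resume at c and lose b.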
12 years agoReplicatedPG: don't finish_recovery_op until the transaction completes
Samuel Just [Wed, 23 Jan 2013 19:50:13 +0000 (11:50 -0800)]
ReplicatedPG: don't finish_recovery_op until the transaction completes

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: ack push only after transaction has completed
Samuel Just [Wed, 23 Jan 2013 19:35:47 +0000 (11:35 -0800)]
ReplicatedPG: ack push only after transaction has completed

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoObjectStore: add queue_transactions with oncomplete
Samuel Just [Wed, 23 Jan 2013 19:13:28 +0000 (11:13 -0800)]
ObjectStore: add queue_transactions with oncomplete

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agorados: safety interlock on 'rmpool' command
Sage Weil [Wed, 23 Jan 2013 16:49:06 +0000 (08:49 -0800)]
rados: safety interlock on 'rmpool' command

This is a very easy way for a user to do a lot of damage with no way back.
Make sure they mean it.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: implement safety interlock for deleting pools
Sage Weil [Wed, 23 Jan 2013 16:40:13 +0000 (08:40 -0800)]
mon: implement safety interlock for deleting pools

This is a very easy way for users to accidentally do a *lot* of damage.
Make it an annoying manual process to actually do this.

Signed-off-by: Sage Weil <sage@inktank.com>
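The commit message does not spell out how the interlock works. One plausible shape, sketched here purely as an assumption, is to require the pool name twice plus an explicit confirmation string, so that a single mistyped command cannot delete a pool. The function name and the flag text below are illustrative only.

```cpp
#include <string>

// Hypothetical interlock: the caller must repeat the pool name and pass an
// explicit confirmation string before the delete is allowed to proceed.
// Both the signature and the flag text are assumptions made for this sketch.
bool pool_delete_confirmed(const std::string& pool,
                           const std::string& pool_again,
                           const std::string& confirm_flag) {
  static const std::string required = "--yes-i-really-really-mean-it";
  return !pool.empty() && pool == pool_again && confirm_flag == required;
}
```

The point of the design is that each extra argument is cheap for someone who means it and unlikely to be typed by accident.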
12 years agocommon/HeartbeatMap: inject unhealthy heartbeat for N seconds
Sage Weil [Wed, 23 Jan 2013 05:18:45 +0000 (21:18 -0800)]
common/HeartbeatMap: inject unhealthy heartbeat for N seconds

This lets us test code that is triggered by an unhealthy heartbeat in a
generic way.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos/FileStore: add stall injection into filestore op queue
Sage Weil [Wed, 23 Jan 2013 02:08:22 +0000 (18:08 -0800)]
os/FileStore: add stall injection into filestore op queue

Allow admin to artificially induce a stall in the op queue.  Forces the
thread(s) to sleep for N seconds.  We pause for 1 second increments and
recheck the value so that a previously stalled thread can be unwedged by
reinjecting a lower value (or 0).  To stall indefinitely, just inject a
very large number.

Signed-off-by: Sage Weil <sage@inktank.com>
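The pause-and-recheck behaviour described above can be sketched as a loop that sleeps one second at a time and rereads the injected value on every pass. This is a sketch only: run_injected_stall and its parameters are hypothetical, and the sleep is passed in as a callback so the loop can be exercised without real delays.

```cpp
#include <atomic>
#include <functional>

// Hypothetical stall loop: sleep in 1-second steps while the injected value
// is positive, rechecking it on each step so that an operator can lower it
// (or set it to 0) to release a stalled thread early.
// Returns the number of one-second sleeps performed.
int run_injected_stall(std::atomic<int>& stall_seconds,
                       const std::function<void()>& sleep_one_second) {
  int slept = 0;
  while (stall_seconds.load() > 0) {
    sleep_one_second();
    ++slept;
    // Count down by one, unless the value was reset to <= 0 meanwhile.
    int cur = stall_seconds.load();
    while (cur > 0 && !stall_seconds.compare_exchange_weak(cur, cur - 1)) {
    }
  }
  return slept;
}
```

Because the value is reread after every step, storing 0 from another thread releases the sleeper within about one second, which matches the "unwedged by reinjecting a lower value" behaviour in the message.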
12 years agoosd: do not join cluster if not healthy
Sage Weil [Wed, 23 Jan 2013 02:03:10 +0000 (18:03 -0800)]
osd: do not join cluster if not healthy

If our internal heartbeats are failing, do not send a boot message and try
to join the cluster.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: hold lock while calling start_boot on startup
Sage Weil [Wed, 23 Jan 2013 02:01:07 +0000 (18:01 -0800)]
osd: hold lock while calling start_boot on startup

This probably doesn't strictly matter because start_boot doesn't need the
lock (currently) and few other threads should be running, but it is
better to be consistent.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: do not reply to ping if internal heartbeat is not healthy
Sage Weil [Wed, 23 Jan 2013 01:56:32 +0000 (17:56 -0800)]
osd: do not reply to ping if internal heartbeat is not healthy

If we find that our internal threads are stalled, do not reply to ping
requests.  If we do this long enough, peers will mark us down.  If we are
only transiently unhealthy, we will reply to the next ping and they will
be satisfied.  If we are unhealthy and marked down, and eventually recover,
we will mark ourselves back up.

Signed-off-by: Sage Weil <sage@inktank.com>
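The ping gating described above can be sketched as a health check over per-thread timestamps. HeartbeatSketch and should_reply_to_ping are hypothetical names for this illustration and do not reproduce the HeartbeatMap interface; the 15-second grace used in the usage note comes from the op thread default mentioned elsewhere in this log.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an internal heartbeat map: each worker thread
// records when it last made progress and how long a silence is tolerated.
struct HeartbeatSketch {
  struct Entry {
    Clock::time_point last_touch;
    std::chrono::seconds grace;
  };
  std::map<std::string, Entry> workers;

  void touch(const std::string& name, Clock::time_point now,
             std::chrono::seconds grace) {
    workers[name] = Entry{now, grace};
  }

  // Healthy only if every worker has checked in within its grace period.
  bool is_healthy(Clock::time_point now) const {
    for (const auto& kv : workers) {
      if (now - kv.second.last_touch > kv.second.grace) return false;
    }
    return true;
  }
};

// Stay silent while unhealthy; peers eventually mark the daemon down.
bool should_reply_to_ping(const HeartbeatSketch& hb, Clock::time_point now) {
  return hb.is_healthy(now);
}
```

A worker touched at time 0 with a 15-second grace is still answered for at 10 seconds and ignored at 16 seconds; once the worker touches again, replies resume, which is the transient case the message describes.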
12 years agoosd: reduce op thread heartbeat default 30 -> 15 seconds
Sage Weil [Wed, 23 Jan 2013 01:53:40 +0000 (17:53 -0800)]
osd: reduce op thread heartbeat default 30 -> 15 seconds

If the thread stalls for 15 seconds, let our internal heartbeat fail.
This will let us internally respond more quickly to a stalled or failing
disk.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #35 from cholcombe973/master
Yehuda Sadeh [Wed, 23 Jan 2013 00:54:39 +0000 (16:54 -0800)]
Merge pull request #35 from cholcombe973/master

Making the usage details a little better.

12 years agoMerge remote-tracking branch 'gh/wip-3833-b'
Sage Weil [Wed, 23 Jan 2013 00:13:14 +0000 (16:13 -0800)]
Merge remote-tracking branch 'gh/wip-3833-b'

Conflicts:
src/osd/OSD.cc
src/osd/OSD.h

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoUpdate src/rgw/rgw_admin.cc 35/head
cholcombe973 [Wed, 23 Jan 2013 00:07:27 +0000 (19:07 -0500)]
Update src/rgw/rgw_admin.cc

Improved the usage message.

12 years agoMerge branch 'wip-3651'
David Zafman [Tue, 22 Jan 2013 23:58:44 +0000 (15:58 -0800)]
Merge branch 'wip-3651'

12 years agoosd: debug support for omap deep-scrub
David Zafman [Tue, 15 Jan 2013 00:37:09 +0000 (16:37 -0800)]
osd: debug support for omap deep-scrub

Deep-scrub test support through admin socket

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: Add digest of omap for deep-scrub
David Zafman [Wed, 9 Jan 2013 03:24:13 +0000 (19:24 -0800)]
osd: Add digest of omap for deep-scrub

Add ScrubMap encode/decode v4 message with omap digest
Compute digest of header and key/value.  Use bufferlist
to reflect structure and compute as we go, clearing
bufferlist to reduce memory usage.

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: Add missing unregister_command() in OSD::shutdown()
David Zafman [Fri, 18 Jan 2013 17:31:00 +0000 (09:31 -0800)]
osd: Add missing unregister_command() in OSD::shutdown()

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoconfig: helper to identify internal fields we should be quiet about
Sage Weil [Tue, 22 Jan 2013 22:59:30 +0000 (14:59 -0800)]
config: helper to identify internal fields we should be quiet about

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocommon/Throttle: fix modeline, whitespace
Sage Weil [Tue, 22 Jan 2013 22:56:36 +0000 (14:56 -0800)]
common/Throttle: fix modeline, whitespace

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: Modified usage for upgrade.
John Wilkins [Tue, 22 Jan 2013 22:55:19 +0000 (14:55 -0800)]
doc: Modified usage for upgrade.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoosd: improve sub_op flag points
Sage Weil [Tue, 22 Jan 2013 05:02:01 +0000 (21:02 -0800)]
osd: improve sub_op flag points

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: refactor ReplicatedPG::do_sub_op
Sage Weil [Tue, 22 Jan 2013 04:55:20 +0000 (20:55 -0800)]
osd: refactor ReplicatedPG::do_sub_op

PULL is the only case where we don't wait for active.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: make last state for slow requests more informative
Sage Weil [Tue, 22 Jan 2013 00:36:36 +0000 (16:36 -0800)]
osd: make last state for slow requests more informative

Report on the last event string, and pass in important context for the
op event list, including:

 - which peers were sent sub ops and we are waiting for
 - which pg queue we are delayed by

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: dump op priority queue state via admin socket
Sage Weil [Mon, 21 Jan 2013 23:59:07 +0000 (15:59 -0800)]
osd: dump op priority queue state via admin socket

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: simplify asok to single callback
Sage Weil [Mon, 21 Jan 2013 23:50:33 +0000 (15:50 -0800)]
osd: simplify asok to single callback

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocommon/PrioritizedQueue: dump state to Formatter
Sage Weil [Mon, 21 Jan 2013 23:58:57 +0000 (15:58 -0800)]
common/PrioritizedQueue: dump state to Formatter

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocommon/PrioritizedQueue: add min cost, max tokens per bucket
Sage Weil [Mon, 21 Jan 2013 23:29:28 +0000 (15:29 -0800)]
common/PrioritizedQueue: add min cost, max tokens per bucket

Two problems.

First, we need to cap the tokens per bucket.  Otherwise, a stream of
items at one priority over time will indefinitely inflate the tokens
available at another priority.  The cap should represent how "bursty"
we allow a given bucket to be.  Start with 4MB for now.

Second, set a floor on the item cost.  Otherwise, we can have an
infinite queue of 0 cost items that starve other queues.  More
realistically, we need to balance the overhead of processing small items
with the cost of large items.  I.e., a 4 KB item is not 1/1000th as
expensive as a 4MB item.

Signed-off-by: Sage Weil <sage@inktank.com>
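The two fixes above, a cap on tokens per bucket and a floor on item cost, can be sketched for a single priority level as follows. BucketSketch is a hypothetical type for illustration; it is not the PrioritizedQueue implementation. The 4 MB cap comes from the message, while the floor value used in the usage note is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical token bucket for one priority level.
struct BucketSketch {
  std::uint64_t tokens = 0;
  std::uint64_t max_tokens;  // cap: how "bursty" the bucket may be
  std::uint64_t min_cost;    // floor applied to every item's cost

  BucketSketch(std::uint64_t max_t, std::uint64_t min_c)
      : max_tokens(max_t), min_cost(min_c) {}

  // Credit tokens without ever exceeding the cap (tokens <= max_tokens is
  // an invariant, so the subtraction below cannot underflow).
  void put_tokens(std::uint64_t n) {
    tokens = (n >= max_tokens - tokens) ? max_tokens : tokens + n;
  }

  // Cost charged for an item: never below the floor, so zero-cost items
  // still consume tokens.
  std::uint64_t effective_cost(std::uint64_t cost) const {
    return std::max(cost, min_cost);
  }

  // Try to dequeue an item; fails if the bucket holds too few tokens.
  bool take(std::uint64_t cost) {
    const std::uint64_t c = effective_cost(cost);
    if (c > tokens) return false;
    tokens -= c;
    return true;
  }
};
```

With a 4 MB cap and an assumed 64 KB floor, crediting 10 MB leaves the bucket at 4 MB, and a 0-cost item is still charged 64 KB, so an endless run of tiny items can no longer run for free.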
12 years agocommon/PrioritizedQueue: buckets -> tokens
Sage Weil [Mon, 21 Jan 2013 22:52:54 +0000 (14:52 -0800)]
common/PrioritizedQueue: buckets -> tokens

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agonote puller's max chunk in pull requests
Sage Weil [Mon, 21 Jan 2013 22:31:00 +0000 (14:31 -0800)]
note puller's max chunk in pull requests

this lets us calculate a cost value

12 years agoosd: add OpRequest flag point when commit is sent
Sage Weil [Mon, 21 Jan 2013 22:14:25 +0000 (14:14 -0800)]
osd: add OpRequest flag point when commit is sent

With writeahead journaling in particular, we can get requests that
stay in the queue for a long time even after the commit is sent to the
client while we are waiting for the transaction to apply to the fs.
Instead of showing up as 'waiting for subops', make it clear that the
client has gotten its reply and it is local state that is slow.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: set PULL subop cost to size of requested data
Sage Weil [Mon, 21 Jan 2013 21:57:59 +0000 (13:57 -0800)]
osd: set PULL subop cost to size of requested data

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: use Message::get_cost() function for queueing
Sage Weil [Mon, 21 Jan 2013 21:57:38 +0000 (13:57 -0800)]
osd: use Message::get_cost() function for queueing

The data payload is a decent proxy for cost in most cases, but not all.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: debug msg prio, cost, latency
Sage Weil [Mon, 21 Jan 2013 21:25:21 +0000 (13:25 -0800)]
osd: debug msg prio, cost, latency

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agofilestore: filestore_queue_max_ops 500 -> 50
Sage Weil [Tue, 22 Jan 2013 05:05:00 +0000 (21:05 -0800)]
filestore: filestore_queue_max_ops 500 -> 50

Having a deep queue limits the effectiveness of the priority queues
above by adding additional latency.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: target transaction size 300 -> 30
Sage Weil [Tue, 22 Jan 2013 04:00:26 +0000 (20:00 -0800)]
osd: target transaction size 300 -> 30

Small transactions make pg removal nicer to the op queue.  It also slows
down PG deletion a bit, which may exacerbate the PG resurrection case
until #3884 is addressed.

At least one user reported this fixed an osd that kept failing due to
an internal heartbeat failure.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agofilestore: disable extra committing queue allowance
Sage Weil [Sun, 20 Jan 2013 01:33:25 +0000 (17:33 -0800)]
filestore: disable extra committing queue allowance

The motivation here is if there is a problem draining the op queue
during a sync.  For XFS and ext4, this isn't generally a problem: you
can continue to make writes while a syncfs(2) is in progress.  There
are currently some possible implementation issues with btrfs, but we
have not demonstrated them recently.

Meanwhile, this can cause queue length spikes that screw up latency.
During a commit, we allow too much into the queue (say, recovery
operations).  After the sync finishes, we have to drain it out before
we can queue new work (say, a higher priority client request).  Having
a deep queue below the point where priorities order work limits the
value of the priority queue.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos/FileStore: allow filestore_queue_max_{ops,bytes} to be adjusted at runtime
Sage Weil [Tue, 22 Jan 2013 03:55:26 +0000 (19:55 -0800)]
os/FileStore: allow filestore_queue_max_{ops,bytes} to be adjusted at runtime

The 'committing' ones too.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: make osd_max_backfills dynamically adjustable
Sage Weil [Sun, 20 Jan 2013 06:06:27 +0000 (22:06 -0800)]
osd: make osd_max_backfills dynamically adjustable

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: make OSD a config observer
Sage Weil [Sun, 20 Jan 2013 02:28:35 +0000 (18:28 -0800)]
osd: make OSD a config observer

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoqa/workunit: Add iozone test script for sync
Sam Lang [Sat, 19 Jan 2013 00:57:37 +0000 (18:57 -0600)]
qa/workunit: Add iozone test script for sync

The iozone-sync.sh script runs iozone testing
various sync flags, O_SYNC, O_DSYNC, O_RSYNC.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoobjectcacher: Remove commit_set, use flush_set
Sam Lang [Fri, 18 Jan 2013 20:59:12 +0000 (14:59 -0600)]
objectcacher: Remove commit_set, use flush_set

commit_set() and flush_set() are identical in functionality,
so use flush_set everywhere and remove commit_set from
the code.

Also fixes a bug in flush_set where the finisher context was
getting freed twice if no objects needed to be flushed.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotesting: add workunit to run hadoop internal tests.
Joe Buck [Fri, 4 Jan 2013 05:35:32 +0000 (21:35 -0800)]
testing: add workunit to run hadoop internal tests.

This workunit runs the internal tests for our local branch of hadoop-common.
Requires ant be installed on the host running the test.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
12 years agoMerge branch 'wip-config'
Sage Weil [Tue, 22 Jan 2013 18:25:37 +0000 (10:25 -0800)]
Merge branch 'wip-config'

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoconfig: report on log level changes
Sage Weil [Mon, 21 Jan 2013 17:24:58 +0000 (09:24 -0800)]
config: report on log level changes

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoconfig: clean up output
Sage Weil [Tue, 22 Jan 2013 18:24:37 +0000 (10:24 -0800)]
config: clean up output

Report a simple list of key='value', without extra verbosity.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoconfig: don't make noise about 'internal_safe_to_start_threads'
Sage Weil [Mon, 21 Jan 2013 16:45:10 +0000 (08:45 -0800)]
config: don't make noise about 'internal_safe_to_start_threads'

This is set on start, and subsequently gets into the changed set.
Once any other config value is injected, it is the first thing reported
by the logs, but is confusing and useless to the user.  Hide it.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Mon, 21 Jan 2013 16:22:36 +0000 (08:22 -0800)]
Merge remote-tracking branch 'gh/next'

12 years agomds: fix default_file_layout constructor
Greg Farnum [Tue, 15 Jan 2013 21:54:18 +0000 (13:54 -0800)]
mds: fix default_file_layout constructor

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds: fix byte_range_t ctor
Greg Farnum [Tue, 15 Jan 2013 21:29:40 +0000 (13:29 -0800)]
mds: fix byte_range_t ctor

I do not think we saw any bugs from this, but anything that involved
capability issues on restart or migrate might have been caused by
this.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoosd: calculate initial PG mapping from PG's osdmap
Sage Weil [Mon, 21 Jan 2013 00:11:10 +0000 (16:11 -0800)]
osd: calculate initial PG mapping from PG's osdmap

The initial values of up/acting need to be based on the PG's osdmap, not
the OSD's latest.  This can cause various confusion in
pg_interval_t::check_new_interval() when calling OSDMap methods due to the
up/acting OSDs not existing yet (for example).

Fixes: #3879
Reported-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Tested-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoworkunits/cephtool: add tests for ceph osd pool set/get
Dan Mick [Sat, 19 Jan 2013 06:35:32 +0000 (22:35 -0800)]
workunits/cephtool: add tests for ceph osd pool set/get

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Sat, 19 Jan 2013 04:57:40 +0000 (20:57 -0800)]
Merge remote-tracking branch 'gh/next'

12 years agoClarify journal size based on filestore max sync 33/head
Travis Rhoden [Sat, 19 Jan 2013 03:26:07 +0000 (22:26 -0500)]
Clarify journal size based on filestore max sync

The docs had the recommended journal size based on the option
"filestore min sync interval" when it should have been
"filestore max sync interval".

While in there, fix a couple of typos -- multiple when it should
be multiply, and a missing word.  Change "Should at least twice"
to "Should be at least twice..."

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
12 years agoceph: reject negative weights at ceph osd <n> reweight
Dan Mick [Sat, 19 Jan 2013 02:30:03 +0000 (18:30 -0800)]
ceph: reject negative weights at ceph osd <n> reweight

Check the integer (fixed-point) value to avoid any worries
about floating-point rounding.  Add tests for reweight < 0.

Fixes: #3872
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
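Checking the fixed-point value rather than the floating-point input can be sketched as below. The 16.16 encoding (0x10000 representing 1.0) is an assumption for this sketch, and both function names are hypothetical.

```cpp
#include <cstdint>

// Assumption for this sketch: weights are stored as 16.16 fixed point,
// so 0x10000 represents 1.0.
constexpr std::int64_t kWeightOne = 0x10000;

std::int64_t to_fixed_weight(double w) {
  return static_cast<std::int64_t>(w * kWeightOne);
}

// Validate the converted integer, so the decision is made on exactly the
// value that would be stored rather than on a floating-point comparison.
bool reweight_is_valid(double w) {
  return to_fixed_weight(w) >= 0;
}
```

Validating after conversion means the accept/reject decision and the stored weight can never disagree because of rounding.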
12 years agoworkunit/cephtool: Use '! cmd' when expecting failure
Dan Mick [Sat, 19 Jan 2013 02:28:44 +0000 (18:28 -0800)]
workunit/cephtool: Use '! cmd' when expecting failure

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agoOSD: do deep_scrub for repair
Samuel Just [Fri, 18 Jan 2013 22:35:51 +0000 (14:35 -0800)]
OSD: do deep_scrub for repair

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agoMerge branch 'wip-pg-removal'
Sage Weil [Fri, 18 Jan 2013 23:45:03 +0000 (15:45 -0800)]
Merge branch 'wip-pg-removal'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: set pg removal transactions based on configurable
Sage Weil [Fri, 18 Jan 2013 23:23:22 +0000 (15:23 -0800)]
osd: set pg removal transactions based on configurable

Use the osd_target_transaction_size knob, and gracefully tolerate bogus
values (e.g., <= 0).

Signed-off-by: Sage Weil <sage@inktank.com>
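Tolerating a bogus knob value can be as small as falling back to a sane default. The helper below is hypothetical, and the fallback of 30 is an assumption borrowed from the "target transaction size 300 -> 30" change in this log; the actual handling in the OSD may differ.

```cpp
// Hypothetical helper: use the configured transaction size when it is
// positive, otherwise fall back to a default instead of misbehaving.
int effective_transaction_size(int configured, int fallback = 30) {
  return configured > 0 ? configured : fallback;
}
```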
12 years agoosd: make pg removal thread more friendly
Sage Weil [Fri, 18 Jan 2013 23:30:06 +0000 (15:30 -0800)]
osd: make pg removal thread more friendly

For a large PG these are saturating the filestore and journal queues.  Do
them synchronously to make them more friendly.  They don't need to be fast.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos: move apply_transactions() sync wrapper into ObjectStore
Sage Weil [Fri, 18 Jan 2013 23:27:24 +0000 (15:27 -0800)]
os: move apply_transactions() sync wrapper into ObjectStore

This has nothing to do with the backend implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos: add apply_transaction() variant that takes a sequencer
Sage Weil [Fri, 18 Jan 2013 23:28:24 +0000 (15:28 -0800)]
os: add apply_transaction() variant that takes a sequencer

Also, move the convenience wrappers into the interface and funnel through
a single implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoclient: Respect O_SYNC, O_DSYNC, and O_RSYNC
Sam Lang [Thu, 17 Jan 2013 20:03:51 +0000 (14:03 -0600)]
client: Respect O_SYNC, O_DSYNC, and O_RSYNC

If the file is opened with O_SYNC, O_DSYNC, or O_RSYNC, we need to
flush cached data (and metadata for O_SYNC) on a write.
For O_RSYNC, we need to flush dirty data on a read.
This patch adds a file_flush() call to the objectCacher
to allow a specific range to be flushed from the cache, and
in the O_SYNC,O_DSYNC case for write and O_RSYNC case for read,
calls that function waiting for the flush to complete.  The patch
also adds a flags field directly to the file handle struct, and
replaces the append boolean with the use of the flags field directly.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
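Keeping the open flags on the file handle lets each read or write decide what to flush with a simple mask test. The helpers below are a sketch with hypothetical names, not the client code; the O_RSYNC guard is there because not every platform defines that flag.

```cpp
#include <fcntl.h>

// Hypothetical helpers: decide from the open(2) flags stored on a file
// handle whether a write or a read must flush cached data before returning.
bool must_flush_on_write(int flags) {
  return (flags & (O_SYNC | O_DSYNC)) != 0;
}

bool must_flush_on_read(int flags) {
#ifdef O_RSYNC
  return (flags & O_RSYNC) != 0;
#else
  (void)flags;  // platform has no O_RSYNC; nothing to honor
  return false;
#endif
}

// With the flags kept on the handle, a separate "append" boolean is
// redundant: it can be derived on demand.
bool is_append(int flags) {
  return (flags & O_APPEND) != 0;
}
```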
12 years agoMerge remote-tracking branch 'gh/wip-client-pool-api'
Sage Weil [Fri, 18 Jan 2013 21:31:15 +0000 (13:31 -0800)]
Merge remote-tracking branch 'gh/wip-client-pool-api'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoqa: remove xfstest 068 from qemu testing
Josh Durgin [Fri, 18 Jan 2013 20:20:57 +0000 (12:20 -0800)]
qa: remove xfstest 068 from qemu testing

This tests fsfreeze, which sometimes hangs in xfs in linux 3.2

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoceph: allow osd pool get to get everything you can set
Dan Mick [Fri, 18 Jan 2013 20:20:34 +0000 (12:20 -0800)]
ceph: allow osd pool get to get everything you can set

osd pool get was missing size, min_size, crash_replay_interval,
and crush_ruleset; they're all easily added.

Fixes: #3869
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoos/FileStore: only flush inline if write is sufficiently large
Sage Weil [Fri, 18 Jan 2013 20:14:48 +0000 (12:14 -0800)]
os/FileStore: only flush inline if write is sufficiently large

Honor filestore_flush_min in the inline flush case.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoos/FileStore: fix compile when sync_file_range is missing;
Sage Weil [Fri, 18 Jan 2013 20:14:40 +0000 (12:14 -0800)]
os/FileStore: fix compile when sync_file_range is missing;

If sync_file_range is not present, we always close inline, and flush
via fdatasync(2).

Fixes compile on ancient platforms like RHEL5.8.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agodoc/rados/operations/crush: need kernel v3.6 for first round of tunables
Sage Weil [Fri, 18 Jan 2013 19:05:03 +0000 (11:05 -0800)]
doc/rados/operations/crush: need kernel v3.6 for first round of tunables

Reported-by: rl219 in #ceph on irc.oftc.net
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agojava: support get pool id/replication interface
Noah Watkins [Wed, 16 Jan 2013 20:27:16 +0000 (12:27 -0800)]
java: support get pool id/replication interface

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agolibcephfs: add pool id/size lookup interface
Noah Watkins [Wed, 16 Jan 2013 19:21:39 +0000 (11:21 -0800)]
libcephfs: add pool id/size lookup interface

Adds new interfaces ceph_get_pool_id() and ceph_get_pool_replication()
to libcephfs.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agodoc: Added link to rotation section.
John Wilkins [Fri, 18 Jan 2013 08:25:28 +0000 (00:25 -0800)]
doc: Added link to rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added hyperlink to log rotation section.
John Wilkins [Fri, 18 Jan 2013 08:25:08 +0000 (00:25 -0800)]
doc: Added hyperlink to log rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added section on log rotation.
John Wilkins [Fri, 18 Jan 2013 08:24:22 +0000 (00:24 -0800)]
doc: Added section on log rotation.

fixes: #3776

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'master' of https://github.com/ceph/ceph
John Wilkins [Fri, 18 Jan 2013 07:33:06 +0000 (23:33 -0800)]
Merge branch 'master' of https://github.com/ceph/ceph

12 years agodoc: Modified index to include mon-osd-interaction.
John Wilkins [Fri, 18 Jan 2013 07:32:26 +0000 (23:32 -0800)]
doc: Modified index to include mon-osd-interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added a section describing mon/osd interaction.
John Wilkins [Fri, 18 Jan 2013 07:31:47 +0000 (23:31 -0800)]
doc: Added a section describing mon/osd interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agobuild: Add perl installation dependency to rpm and debian packages.
Gary Lowell [Fri, 18 Jan 2013 06:43:07 +0000 (22:43 -0800)]
build:  Add perl installation dependency to rpm and debian packages.

There was already a dependency on python in the debian control file,
a similar dependency was added to the rpm spec file.  perl is needed
for the logrotate script, so a dependency on perl was added to
both. Bug 3768.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years agodoc: Added an admonishment for SSD write latency.
John Wilkins [Fri, 18 Jan 2013 06:13:12 +0000 (22:13 -0800)]
doc: Added an admonishment for SSD write latency.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated OSD configuration reference with backfill config options.
John Wilkins [Fri, 18 Jan 2013 05:27:46 +0000 (21:27 -0800)]
doc: Updated OSD configuration reference with backfill config options.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'wip-mds'
Sage Weil [Fri, 18 Jan 2013 05:05:05 +0000 (21:05 -0800)]
Merge branch 'wip-mds'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agorbd: fix bench-write infinite loop
Josh Durgin [Wed, 26 Dec 2012 22:24:22 +0000 (14:24 -0800)]
rbd: fix bench-write infinite loop

I/O was continuously submitted as long as there were few enough ops in
flight. If the number of 'threads' was high, or caching was turned on,
there would never be that many ops in flight, so the loop would continue
indefinitely. Instead, submit at most io_threads ops per offset.

Fixes: #3413
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
12 years agoMerge branch 'wip-cephx'
Sage Weil [Fri, 18 Jan 2013 01:01:49 +0000 (17:01 -0800)]
Merge branch 'wip-cephx'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agocrushtool: warn usefully about missing output spec
Dan Mick [Thu, 17 Jan 2013 19:32:03 +0000 (11:32 -0800)]
crushtool: warn usefully about missing output spec

When running with --test, you must request output to CSV files or
specific types of output to --show-X; make the error message
clarify what the tool wants.

Fixes: #3827
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agocrushtool: consolidate_whitespace() should eat everything except \n
Dan Mick [Thu, 17 Jan 2013 19:18:46 +0000 (11:18 -0800)]
crushtool: consolidate_whitespace() should eat everything except \n

CRUSH map source with \r (like a DOS text file) failed to compile
with the usual nonuseful message; turns out that eating \r along with
' ' and '\t' etc. solves that problem.

Fixes: #3834
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
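The behaviour described above, treating \r like ' ' and '\t' while leaving '\n' alone, can be sketched as follows. This is an illustration with a hypothetical function name, not the crushtool source; in particular, collapsing each run to a single space and dropping whitespace right before a newline are choices made for the sketch.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch: collapse every run of whitespace other than '\n'
// into a single space, and keep newlines so line structure survives.
std::string consolidate_whitespace_sketch(const std::string& in) {
  std::string out;
  bool pending_space = false;
  for (unsigned char c : in) {
    if (std::isspace(c) && c != '\n') {
      pending_space = true;  // swallow ' ', '\t', '\r', '\v', '\f'
      continue;
    }
    // Emit one space for the swallowed run, except right before a newline.
    if (pending_space && c != '\n') out.push_back(' ');
    pending_space = false;
    out.push_back(static_cast<char>(c));
  }
  return out;
}
```

A DOS-style line such as "device 0\tosd.0\r\n" comes out as "device 0 osd.0\n", so the stray carriage return never reaches the parser.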
12 years agodoc/rados/operations/authentication: update for cephx sig requirement options
Sage Weil [Thu, 17 Jan 2013 23:12:59 +0000 (15:12 -0800)]
doc/rados/operations/authentication: update for cephx sig requirement options

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: enforce 'cephx require signatures' during negotiation
Sage Weil [Fri, 28 Dec 2012 00:18:19 +0000 (16:18 -0800)]
mon: enforce 'cephx require signatures' during negotiation

If we are negotiating which auth protocol to use, and the client does not
support the MSG_AUTH feature, and the server has 'cephx require signatures'
set to true, then remove cephx from the list of allowed protocols.

Also print something in the mon log so that we know wtf is going on.

Signed-off-by: Sage Weil <sage@inktank.com>
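The negotiation rule above can be sketched as a filter over the protocol list. The feature bit value, the protocol names other than cephx, and the function name are assumptions for this sketch; only the rule itself comes from the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bit; the real value is not given in this log.
constexpr std::uint64_t kFeatureMsgAuth = 1ull << 0;

// If signatures are required and the client cannot sign messages, cephx is
// removed from the protocols the server is willing to negotiate.
std::vector<std::string> allowed_auth_protocols(
    std::vector<std::string> supported,
    std::uint64_t client_features,
    bool cephx_require_signatures) {
  const bool client_can_sign = (client_features & kFeatureMsgAuth) != 0;
  if (cephx_require_signatures && !client_can_sign) {
    supported.erase(
        std::remove(supported.begin(), supported.end(), std::string("cephx")),
        supported.end());
  }
  return supported;
}
```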
12 years agomsg/Pipe: require MSG_AUTH feature on server if option is enabled
Sage Weil [Fri, 28 Dec 2012 00:03:20 +0000 (16:03 -0800)]
msg/Pipe: require MSG_AUTH feature on server if option is enabled

If we

  negotiate cephx AND
  are a server AND
  cephx require signatures = true

then require the MSG_AUTH feature bit.  Put this in the Policy struct for
this connection so that the existing feature bit checks and error reporting
are used, and the peer knows what feature it is missing.

Signed-off-by: Sage Weil <sage@inktank.com>
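The three-way condition above maps naturally onto a required-features mask. In the sketch below, the feature bit value and both function names are hypothetical; the point is that adding the bit to the required set lets the ordinary feature check report exactly what the peer is missing.

```cpp
#include <cstdint>

// Hypothetical feature bit; the real value is not given in this log.
constexpr std::uint64_t kFeatureMsgAuth = 1ull << 0;

// Add MSG_AUTH to the required set only when all three conditions hold.
std::uint64_t required_peer_features(std::uint64_t base_required,
                                     bool negotiated_cephx,
                                     bool is_server,
                                     bool cephx_require_signatures) {
  if (negotiated_cephx && is_server && cephx_require_signatures) {
    return base_required | kFeatureMsgAuth;
  }
  return base_required;
}

// Generic feature check: reports which required bits the peer lacks.
bool peer_is_acceptable(std::uint64_t peer_features, std::uint64_t required,
                        std::uint64_t* missing) {
  *missing = required & ~peer_features;
  return *missing == 0;
}
```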
12 years agocephx: control signatures for service vs cluster
Sage Weil [Fri, 28 Dec 2012 01:42:52 +0000 (17:42 -0800)]
cephx: control signatures for service vs cluster

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosdmap: make replica separate in default crush map configurable
Sage Weil [Thu, 17 Jan 2013 23:01:35 +0000 (15:01 -0800)]
osdmap: make replica separate in default crush map configurable

Add 'osd crush chooseleaf type' option to control what the default
CRUSH rule separates replicas across.  Default to 1 (host), and set it
to 0 in vstart.sh.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>