git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
Samuel Just [Wed, 23 Jan 2013 19:50:13 +0000 (11:50 -0800)]
ReplicatedPG: don't finish_recovery_op until the transaction completes

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 23 Jan 2013 19:35:47 +0000 (11:35 -0800)]
ReplicatedPG: ack push only after transaction has completed

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 23 Jan 2013 19:13:28 +0000 (11:13 -0800)]
ObjectStore: add queue_transactions with oncomplete

Signed-off-by: Samuel Just <sam.just@inktank.com>
Yehuda Sadeh [Wed, 23 Jan 2013 00:54:39 +0000 (16:54 -0800)]
Merge pull request #35 from cholcombe973/master

Making the usage details a little better.

Sage Weil [Wed, 23 Jan 2013 00:13:14 +0000 (16:13 -0800)]
Merge remote-tracking branch 'gh/wip-3833-b'

Conflicts:
src/osd/OSD.cc
src/osd/OSD.h

Reviewed-by: Samuel Just <sam.just@inktank.com>
cholcombe973 [Wed, 23 Jan 2013 00:07:27 +0000 (19:07 -0500)]
Update src/rgw/rgw_admin.cc

Improved the usage message.

David Zafman [Tue, 22 Jan 2013 23:58:44 +0000 (15:58 -0800)]
Merge branch 'wip-3651'

David Zafman [Tue, 15 Jan 2013 00:37:09 +0000 (16:37 -0800)]
osd: debug support for omap deep-scrub

Deep-scrub test support through admin socket

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
David Zafman [Wed, 9 Jan 2013 03:24:13 +0000 (19:24 -0800)]
osd: Add digest of omap for deep-scrub

Add ScrubMap encode/decode v4 message with omap digest.
Compute digest of header and key/value.  Use bufferlist
to reflect structure and compute as we go, clearing
bufferlist to reduce memory usage.

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
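
As a rough illustration of the digest described above (header first, then each key/value, clearing the scratch buffer after every step), here is a hypothetical C++ sketch. It is not Ceph's code: a plain bitwise CRC-32 stands in for the crc32c that Ceph computes over bufferlists, a std::string stands in for the bufferlist, and `crc32_update`, `encode_field` and `omap_digest` are invented names.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Bitwise CRC-32 (IEEE polynomial, reflected). Chaining calls gives the same
// result as one call over the concatenated data. Stand-in for crc32c.
uint32_t crc32_update(uint32_t crc, const std::string& data) {
  crc = ~crc;
  for (unsigned char c : data) {
    crc ^= c;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Append a length-prefixed field (host byte order, a simplification) so the
// encoding reflects structure: ("ab","c") and ("a","bc") encode differently.
void encode_field(std::string& buf, const std::string& s) {
  const uint32_t len = static_cast<uint32_t>(s.size());
  buf.append(reinterpret_cast<const char*>(&len), sizeof(len));
  buf.append(s);
}

// Digest the header, then each key/value pair, clearing the scratch buffer
// after every step so memory stays bounded by the largest single entry.
uint32_t omap_digest(const std::string& header,
                     const std::map<std::string, std::string>& kv) {
  std::string buf;
  encode_field(buf, header);
  uint32_t crc = crc32_update(0, buf);
  buf.clear();
  for (const auto& [key, value] : kv) {
    encode_field(buf, key);
    encode_field(buf, value);
    crc = crc32_update(crc, buf);
    buf.clear();
  }
  return crc;
}
```

Because the CRC is chained, the result is the same as hashing the whole encoded omap at once, but only one entry is ever held in memory.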
David Zafman [Fri, 18 Jan 2013 17:31:00 +0000 (09:31 -0800)]
osd: Add missing unregister_command() in OSD::shutdown()

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 22 Jan 2013 22:59:30 +0000 (14:59 -0800)]
config: helper to identify internal fields we should be quiet about

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 22:56:36 +0000 (14:56 -0800)]
common/Throttle: fix modeline, whitespace

Signed-off-by: Sage Weil <sage@inktank.com>
John Wilkins [Tue, 22 Jan 2013 22:55:19 +0000 (14:55 -0800)]
doc: Modified usage for upgrade.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Sage Weil [Tue, 22 Jan 2013 05:02:01 +0000 (21:02 -0800)]
osd: improve sub_op flag points

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 04:55:20 +0000 (20:55 -0800)]
osd: refactor ReplicatedPG::do_sub_op

PULL is the only case where we don't wait for active.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 00:36:36 +0000 (16:36 -0800)]
osd: make last state for slow requests more informative

Report on the last event string, and pass in important context for the
op event list, including:

 - which peers were sent sub ops and we are waiting for
 - which pg queue we are delayed by

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 23:59:07 +0000 (15:59 -0800)]
osd: dump op priority queue state via admin socket

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 23:50:33 +0000 (15:50 -0800)]
osd: simplify asok to single callback

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 23:58:57 +0000 (15:58 -0800)]
common/PrioritizedQueue: dump state to Formatter

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 23:29:28 +0000 (15:29 -0800)]
common/PrioritizedQueue: add min cost, max tokens per bucket

Two problems.

First, we need to cap the tokens per bucket.  Otherwise, a stream of
items at one priority over time will indefinitely inflate the tokens
available at another priority.  The cap should represent how "bursty"
we allow a given bucket to be.  Start with 4MB for now.

Second, set a floor on the item cost.  Otherwise, we can have an
infinite queue of 0 cost items that starve other queues.  More
realistically, we need to balance the overhead of processing small items
with the cost of large items.  I.e., a 4 KB item is not 1/1000th as
expensive as a 4MB item.

Signed-off-by: Sage Weil <sage@inktank.com>
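
A minimal sketch of both fixes, assuming a simplified token-bucket queue rather than Ceph's actual PrioritizedQueue: each dequeue hands its cost out as tokens in proportion to priority, capped at `max_tokens`, and every enqueued cost is floored at `min_cost`. The class `TokenQueue` and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <iterator>
#include <map>

// Hypothetical, simplified token-based priority queue (not Ceph's class).
class TokenQueue {
 public:
  TokenQueue(int64_t min_cost, int64_t max_tokens)
      : min_cost_(min_cost), max_tokens_(max_tokens) {}

  void enqueue(unsigned prio, int64_t cost) {
    // Floor the cost so a stream of 0-cost items cannot run for free.
    buckets_[prio].costs.push_back(std::max(cost, min_cost_));
  }

  // Returns the priority that was served, or -1 if nothing is queued.
  int dequeue() {
    auto pick = buckets_.end();
    // Prefer the highest priority whose tokens cover its head item...
    for (auto it = buckets_.rbegin(); it != buckets_.rend(); ++it) {
      if (!it->second.costs.empty() &&
          it->second.tokens >= it->second.costs.front()) {
        pick = std::prev(it.base());
        break;
      }
    }
    // ...otherwise fall back to the highest non-empty priority.
    if (pick == buckets_.end()) {
      for (auto it = buckets_.rbegin(); it != buckets_.rend(); ++it) {
        if (!it->second.costs.empty()) {
          pick = std::prev(it.base());
          break;
        }
      }
    }
    if (pick == buckets_.end()) return -1;
    Bucket& b = pick->second;
    const int64_t cost = b.costs.front();
    b.costs.pop_front();
    b.tokens = std::max<int64_t>(0, b.tokens - cost);
    distribute(cost);
    return static_cast<int>(pick->first);
  }

  int64_t tokens(unsigned prio) const {
    auto it = buckets_.find(prio);
    return it == buckets_.end() ? 0 : it->second.tokens;
  }

 private:
  struct Bucket {
    int64_t tokens = 0;
    std::deque<int64_t> costs;  // FIFO of floored item costs
  };

  // Hand out `cost` tokens in proportion to priority, capping each bucket
  // at max_tokens_ so an idle priority cannot bank unlimited credit.
  void distribute(int64_t cost) {
    int64_t total = 0;
    for (const auto& entry : buckets_) total += entry.first;
    if (total == 0) return;
    for (auto& entry : buckets_) {
      Bucket& b = entry.second;
      b.tokens = std::min(
          max_tokens_,
          b.tokens + cost * static_cast<int64_t>(entry.first) / total);
    }
  }

  int64_t min_cost_;
  int64_t max_tokens_;
  std::map<unsigned, Bucket> buckets_;
};
```

With min_cost 1000 and max_tokens 4000, draining a stream of 0-cost priority-1 items leaves the idle priority-5 bucket pinned at 4000 tokens; without the std::min it would gain roughly 833 tokens per dequeue indefinitely.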
Sage Weil [Mon, 21 Jan 2013 22:52:54 +0000 (14:52 -0800)]
common/PrioritizedQueue: buckets -> tokens

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 22:31:00 +0000 (14:31 -0800)]
note puller's max chunk in pull requests

this lets us calculate a cost value

Sage Weil [Mon, 21 Jan 2013 22:14:25 +0000 (14:14 -0800)]
osd: add OpRequest flag point when commit is sent

With writeahead journaling in particular, we can get requests that
stay in the queue for a long time even after the commit is sent to the
client while we are waiting for the transaction to apply to the fs.
Instead of showing up as 'waiting for subops', make it clear that the
client has gotten its reply and it is local state that is slow.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 21:57:59 +0000 (13:57 -0800)]
osd: set PULL subop cost to size of requested data

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 21:57:38 +0000 (13:57 -0800)]
osd: use Message::get_cost() function for queueing

The data payload is a decent proxy for cost in most cases, but not all.

Signed-off-by: Sage Weil <sage@inktank.com>
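
The idea behind the get_cost() change, together with the PULL subop cost commit above, can be sketched as a virtual hook whose default is the payload size. The types `Msg` and `PullSubOp`, their fields and `queue_cost` are hypothetical simplifications, not Ceph's Message or sub-op classes.

```cpp
#include <cstdint>

// Hypothetical, simplified message type.
struct Msg {
  uint64_t data_len = 0;  // size of the data payload
  virtual ~Msg() = default;
  // Default: the data payload is a decent proxy for cost.
  virtual uint64_t get_cost() const { return data_len; }
};

// A PULL carries almost no payload but asks the peer to send up to
// `requested_bytes` of data, so report that as its cost instead.
struct PullSubOp : Msg {
  uint64_t requested_bytes = 0;
  uint64_t get_cost() const override { return requested_bytes; }
};

// The queueing layer only needs the virtual hook.
uint64_t queue_cost(const Msg& m) { return m.get_cost(); }
```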
Sage Weil [Mon, 21 Jan 2013 21:25:21 +0000 (13:25 -0800)]
osd: debug msg prio, cost, latency

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 05:05:00 +0000 (21:05 -0800)]
filestore: filestore_queue_max_ops 500 -> 50

Having a deep queue limits the effectiveness of the priority queues
above by adding additional latency.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 04:00:26 +0000 (20:00 -0800)]
osd: target transaction size 300 -> 30

Small transactions make pg removal nicer to the op queue.  It also slows
down PG deletion a bit, which may exacerbate the PG resurrection case
until #3884 is addressed.

At least one user reported this fixed an osd that kept failing due to
an internal heartbeat failure.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 20 Jan 2013 01:33:25 +0000 (17:33 -0800)]
filestore: disable extra committing queue allowance

The motivation here is if there is a problem draining the op queue
during a sync.  For XFS and ext4, this isn't generally a problem: you
can continue to make writes while a syncfs(2) is in progress.  There
are currently some possible implementation issues with btrfs, but we
have not demonstrated them recently.

Meanwhile, this can cause queue length spikes that screw up latency.
During a commit, we allow too much into the queue (say, recovery
operations).  After the sync finishes, we have to drain it out before
we can queue new work (say, a higher priority client request).  Having
a deep queue below the point where priorities order work limits the
value of the priority queue.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 03:55:26 +0000 (19:55 -0800)]
os/FileStore: allow filestore_queue_max_{ops,bytes} to be adjusted at runtime

The 'committing' ones too.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 20 Jan 2013 06:06:27 +0000 (22:06 -0800)]
osd: make osd_max_backfills dynamically adjustable

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 20 Jan 2013 02:28:35 +0000 (18:28 -0800)]
osd: make OSD a config observer

Signed-off-by: Sage Weil <sage@inktank.com>
Joe Buck [Fri, 4 Jan 2013 05:35:32 +0000 (21:35 -0800)]
testing: add workunit to run hadoop internal tests.

This workunit runs the internal tests for our local branch of hadoop-common.
Requires ant be installed on the host running the test.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Sage Weil [Tue, 22 Jan 2013 18:25:37 +0000 (10:25 -0800)]
Merge branch 'wip-config'

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Mon, 21 Jan 2013 17:24:58 +0000 (09:24 -0800)]
config: report on log level changes

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Jan 2013 18:24:37 +0000 (10:24 -0800)]
config: clean up output

Report a simple list of key='value', without extra verbosity.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 21 Jan 2013 16:45:10 +0000 (08:45 -0800)]
config: don't make noise about 'internal_safe_to_start_threads'

This is set on start, and subsequently gets into the changed set.
Once any other config value is injected, it is the first thing reported
by the logs, but is confusing and useless to the user.  Hide it.

Signed-off-by: Sage Weil <sage@inktank.com>
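
The quieter config output from these commits might look roughly like the sketch below. It assumes, hypothetically, that internal options are recognized by an `internal_` name prefix and that pairs are space-separated; `is_internal_option` and `report_changes` are invented names, and the real config code may identify internal fields differently.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical rule: options named "internal_*" (such as
// internal_safe_to_start_threads) are implementation details.
bool is_internal_option(const std::string& key) {
  return key.rfind("internal_", 0) == 0;
}

// Render changed options as a simple list of key='value' pairs,
// skipping internal ones.
std::string report_changes(const std::map<std::string, std::string>& changed) {
  std::ostringstream os;
  const char* sep = "";
  for (const auto& [key, value] : changed) {
    if (is_internal_option(key)) continue;  // stay quiet about internals
    os << sep << key << "='" << value << "'";
    sep = " ";
  }
  return os.str();
}
```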
Sage Weil [Mon, 21 Jan 2013 16:22:36 +0000 (08:22 -0800)]
Merge remote-tracking branch 'gh/next'

Greg Farnum [Tue, 15 Jan 2013 21:54:18 +0000 (13:54 -0800)]
mds: fix default_file_layout constructor

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Tue, 15 Jan 2013 21:29:40 +0000 (13:29 -0800)]
mds: fix byte_range_t ctor

I do not think we saw any bugs from this, but anything that involved
capability issues on restart or migrate might have been caused by
this.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Mon, 21 Jan 2013 00:11:10 +0000 (16:11 -0800)]
osd: calculate initial PG mapping from PG's osdmap

The initial values of up/acting need to be based on the PG's osdmap, not
the OSD's latest.  This can cause various confusion in
pg_interval_t::check_new_interval() when calling OSDMap methods due to the
up/acting OSDs not existing yet (for example).

Fixes: #3879
Reported-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Tested-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Dan Mick [Sat, 19 Jan 2013 06:35:32 +0000 (22:35 -0800)]
workunits/cephtool: add tests for ceph osd pool set/get

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Sat, 19 Jan 2013 04:57:40 +0000 (20:57 -0800)]
Merge remote-tracking branch 'gh/next'

Travis Rhoden [Sat, 19 Jan 2013 03:26:07 +0000 (22:26 -0500)]
Clarify journal size based on filestore max sync

The docs had the recommended journal size based on the option
"filestore min sync interval" when it should have been
"filestore max sync interval".

While in there, fix a couple of typos -- multiple when it should
be multiply, and a missing word.  Change "Should at least twice"
to "Should be at least twice..."

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
Dan Mick [Sat, 19 Jan 2013 02:30:03 +0000 (18:30 -0800)]
ceph: reject negative weights at ceph osd <n> reweight

Check the integer (fixed-point) value to avoid any worries
about floating-point rounding.  Add tests for reweight < 0.

Fixes: #3872
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
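
A sketch of why checking the fixed-point value avoids rounding worries, assuming reweight values are stored as 16.16 fixed point with 0x10000 meaning 1.0. `to_fixed_weight` and `kWeightOne` are hypothetical names, and any upper-bound check is left out.

```cpp
#include <cstdint>
#include <optional>

// Assumption: 16.16 fixed point, 0x10000 == 1.0 ("fully in").
constexpr int64_t kWeightOne = 0x10000;

// Convert first, then validate the integer. A tiny negative float such as
// -1e-9 truncates to 0 and is accepted; -0.5 becomes -32768 and is rejected.
std::optional<int64_t> to_fixed_weight(double w) {
  const int64_t fixed = static_cast<int64_t>(w * kWeightOne);
  if (fixed < 0) return std::nullopt;
  return fixed;
}
```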
Dan Mick [Sat, 19 Jan 2013 02:28:44 +0000 (18:28 -0800)]
workunit/cephtool: Use '! cmd' when expecting failure

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Samuel Just [Fri, 18 Jan 2013 22:35:51 +0000 (14:35 -0800)]
OSD: do deep_scrub for repair

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Fri, 18 Jan 2013 23:45:03 +0000 (15:45 -0800)]
Merge branch 'wip-pg-removal'

Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Fri, 18 Jan 2013 23:23:22 +0000 (15:23 -0800)]
osd: set pg removal transactions based on configurable

Use the osd_target_transaction_size knob, and gracefully tolerate bogus
values (e.g., <= 0).

Signed-off-by: Sage Weil <sage@inktank.com>
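
A hypothetical helper showing how removal work could be split into transactions sized by a target such as osd_target_transaction_size while tolerating bogus values. The fallback of one op per transaction for values <= 0 is an assumption, not necessarily what the OSD does, and `batch_removals` is an invented name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split object removals into transactions of at most `target` ops.
// Assumption: target <= 0 falls back to one op per transaction.
std::vector<std::vector<std::string>> batch_removals(
    const std::vector<std::string>& objects, int target) {
  const std::size_t per_txn =
      target > 0 ? static_cast<std::size_t>(target) : 1;
  std::vector<std::vector<std::string>> txns;
  for (std::size_t i = 0; i < objects.size(); i += per_txn) {
    const std::size_t end = std::min(objects.size(), i + per_txn);
    txns.emplace_back(objects.begin() + i, objects.begin() + end);
  }
  return txns;
}
```

Lowering the target (as in the 300 -> 30 change above) yields more, smaller transactions, which interleave better with client ops at the cost of slower PG deletion.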
Sage Weil [Fri, 18 Jan 2013 23:30:06 +0000 (15:30 -0800)]
osd: make pg removal thread more friendly

For a large PG these are saturating the filestore and journal queues.  Do
them synchronously to make them more friendly.  They don't need to be fast.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 18 Jan 2013 23:27:24 +0000 (15:27 -0800)]
os: move apply_transactions() sync wrapper into ObjectStore

This has nothing to do with the backend implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 18 Jan 2013 23:28:24 +0000 (15:28 -0800)]
os: add apply_transaction() variant that takes a sequencer

Also, move the convenience wrappers into the interface and funnel through
a single implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 18 Jan 2013 21:31:15 +0000 (13:31 -0800)]
Merge remote-tracking branch 'gh/wip-client-pool-api'

Reviewed-by: Sage Weil <sage@inktank.com>
Josh Durgin [Fri, 18 Jan 2013 20:20:57 +0000 (12:20 -0800)]
qa: remove xfstest 068 from qemu testing

This tests fsfreeze, which sometimes hangs in XFS on Linux 3.2.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Dan Mick [Fri, 18 Jan 2013 20:20:34 +0000 (12:20 -0800)]
ceph: allow osd pool get to get everything you can set

osd pool get was missing size, min_size, crash_replay_interval,
and crush_ruleset; they're all easily added.

Fixes: #3869
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
Sage Weil [Fri, 18 Jan 2013 20:14:48 +0000 (12:14 -0800)]
os/FileStore: only flush inline if write is sufficiently large

Honor filestore_flush_min in the inline flush case.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Fri, 18 Jan 2013 20:14:40 +0000 (12:14 -0800)]
os/FileStore: fix compile when sync_file_range is missing

If sync_file_range is not present, we always close inline, and flush
via fdatasync(2).

Fixes compile on ancient platforms like RHEL5.8.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Fri, 18 Jan 2013 19:05:03 +0000 (11:05 -0800)]
doc/rados/operations/crush: need kernel v3.6 for first round of tunables

Reported-by: rl219 in #ceph on irc.oftc.net
Signed-off-by: Sage Weil <sage@inktank.com>
Noah Watkins [Wed, 16 Jan 2013 20:27:16 +0000 (12:27 -0800)]
java: support get pool id/replication interface

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 16 Jan 2013 19:21:39 +0000 (11:21 -0800)]
libcephfs: add pool id/size lookup interface

Adds new interfaces ceph_get_pool_id() and ceph_get_pool_replication()
to libcephfs.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
John Wilkins [Fri, 18 Jan 2013 08:25:28 +0000 (00:25 -0800)]
doc: Added link to rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Fri, 18 Jan 2013 08:25:08 +0000 (00:25 -0800)]
doc: Added hyperlink to log rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Fri, 18 Jan 2013 08:24:22 +0000 (00:24 -0800)]
doc: Added section on log rotation.

fixes: #3776

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Fri, 18 Jan 2013 07:33:06 +0000 (23:33 -0800)]
Merge branch 'master' of https://github.com/ceph/ceph

John Wilkins [Fri, 18 Jan 2013 07:32:26 +0000 (23:32 -0800)]
doc: Modified index to include mon-osd-interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Fri, 18 Jan 2013 07:31:47 +0000 (23:31 -0800)]
doc: Added a section describing mon/osd interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Gary Lowell [Fri, 18 Jan 2013 06:43:07 +0000 (22:43 -0800)]
build:  Add perl installation dependency to rpm and debian packages.

There was already a dependency on python in the debian control file;
a similar dependency was added to the rpm spec file.  perl is needed
for the logrotate script, so a dependency on perl was added to
both. Bug 3768.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
John Wilkins [Fri, 18 Jan 2013 06:13:12 +0000 (22:13 -0800)]
doc: Added an admonishment for SSD write latency.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Fri, 18 Jan 2013 05:27:46 +0000 (21:27 -0800)]
doc: Updated OSD configuration reference with backfill config options.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Sage Weil [Fri, 18 Jan 2013 05:05:05 +0000 (21:05 -0800)]
Merge branch 'wip-mds'

Reviewed-by: Sage Weil <sage@inktank.com>
Josh Durgin [Wed, 26 Dec 2012 22:24:22 +0000 (14:24 -0800)]
rbd: fix bench-write infinite loop

I/O was continuously submitted as long as there were few enough ops in
flight. If the number of 'threads' was high, or caching was turned on,
there would never be that many ops in flight, so the loop would continue
indefinitely. Instead, submit at most io_threads ops per offset.

Fixes: #3413
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
Sage Weil [Fri, 18 Jan 2013 01:01:49 +0000 (17:01 -0800)]
Merge branch 'wip-cephx'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Dan Mick [Thu, 17 Jan 2013 19:32:03 +0000 (11:32 -0800)]
crushtool: warn usefully about missing output spec

When running with --test, you must request output to CSV files or
specific types of output to --show-X; make the error message
clarify what the tool wants.

Fixes: #3827
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Thu, 17 Jan 2013 19:18:46 +0000 (11:18 -0800)]
crushtool: consolidate_whitespace() should eat everything except \n

CRUSH map source with \r (like a DOS text file) failed to compile
with the usual nonuseful message; turns out that eating \r along with
' ' and '\t' etc. solves that problem.

Fixes: #3834
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
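
A hypothetical re-implementation of the idea, not crushtool's actual function: every whitespace character except '\n' is treated as a separator and each run is collapsed to one space, so the '\r' from DOS line endings no longer reaches the parser. `collapse_whitespace` is an invented name.

```cpp
#include <cctype>
#include <string>

// Collapse each run of whitespace other than '\n' (space, \t, \r, \v, \f)
// into a single space; newlines are left intact.
std::string collapse_whitespace(const std::string& in) {
  std::string out;
  bool in_run = false;
  for (char c : in) {
    if (c != '\n' && std::isspace(static_cast<unsigned char>(c))) {
      if (!in_run) out.push_back(' ');
      in_run = true;
    } else {
      out.push_back(c);
      in_run = false;
    }
  }
  return out;
}
```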
Sage Weil [Thu, 17 Jan 2013 23:12:59 +0000 (15:12 -0800)]
doc/rados/operations/authentication: update for cephx sig requirement options

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Dec 2012 00:18:19 +0000 (16:18 -0800)]
mon: enforce 'cephx require signatures' during negotiation

If we are negotiating which auth protocol to use, and the client does not
support the MSG_AUTH feature, and the server has 'cephx require signatures'
set to true, then remove cephx from the list of allowed protocols.

Also print something in the mon log so that we know wtf is going on.

Signed-off-by: Sage Weil <sage@inktank.com>
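
The negotiation rule reduces to filtering the candidate list. In this sketch `AuthProto` and `negotiable_protocols` are hypothetical; the monitor's real code works with numeric protocol ids and also logs the decision.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical protocol identifiers.
enum class AuthProto { None, Cephx };

// Drop cephx from the candidates when signatures are required but the
// client cannot sign messages (it lacks the MSG_AUTH feature).
std::vector<AuthProto> negotiable_protocols(std::vector<AuthProto> allowed,
                                            bool client_has_msg_auth,
                                            bool cephx_require_signatures) {
  if (cephx_require_signatures && !client_has_msg_auth) {
    allowed.erase(
        std::remove(allowed.begin(), allowed.end(), AuthProto::Cephx),
        allowed.end());
  }
  return allowed;
}
```

If cephx was the only allowed protocol, the filtered list is empty and negotiation fails, which is the intended outcome for a client that cannot sign.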
Sage Weil [Fri, 28 Dec 2012 00:03:20 +0000 (16:03 -0800)]
msg/Pipe: require MSG_AUTH feature on server if option is enabled

If we

  negotiate cephx AND
  are a server AND
  cephx require signatures = true

then require the MSG_AUTH feature bit.  Put this in the Policy struct for
this connection so that the existing feature bit checks and error reporting
are used, and the peer knows what feature it is missing.

Signed-off-by: Sage Weil <sage@inktank.com>
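
A sketch of folding the requirement into a per-connection policy so that the ordinary feature check reports it. `ConnPolicy`, the helper names and the bit position of `kFeatureMsgAuth` are hypothetical stand-ins for Ceph's Policy struct and CEPH_FEATURE_* constants.

```cpp
#include <cstdint>

// Hypothetical bit position for the MSG_AUTH feature.
constexpr uint64_t kFeatureMsgAuth = 1ull << 23;

struct ConnPolicy {
  uint64_t features_required = 0;
};

// Require MSG_AUTH only when we negotiated cephx, are the server, and
// signatures are mandatory.
void require_msg_auth_if_needed(ConnPolicy& policy, bool negotiated_cephx,
                                bool is_server, bool require_signatures) {
  if (negotiated_cephx && is_server && require_signatures)
    policy.features_required |= kFeatureMsgAuth;
}

// The generic check: required bits the peer lacks are reported back, so
// the peer learns exactly which feature it is missing.
uint64_t missing_features(const ConnPolicy& policy, uint64_t peer_features) {
  return policy.features_required & ~peer_features;
}
```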
Sage Weil [Fri, 28 Dec 2012 01:42:52 +0000 (17:42 -0800)]
cephx: control signatures for service vs cluster

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 17 Jan 2013 23:01:35 +0000 (15:01 -0800)]
osdmap: make replica separation in default crush map configurable

Add 'osd crush chooseleaf type' option to control what the default
CRUSH rule separates replicas across.  Default to 1 (host), and set it
to 0 in vstart.sh.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Joao Eduardo Luis [Thu, 17 Jan 2013 18:11:23 +0000 (18:11 +0000)]
mon: Monitor: drop messages from old timecheck epochs

We were asserting when the message's timecheck epoch (which is mapped to
the election epoch) was older than the current epoch.  However, if a
monitor is lagged just enough to not even notice an election happened,
then it might eventually reply to old timechecks, which would make
the leader assert.  Instead, we just drop the message, while warning we
did so.

Fixes: #3835
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
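
The fix replaces an assertion with a drop-and-warn check. A minimal sketch, with a hypothetical function name and log text; it only covers the older-epoch case described in the commit.

```cpp
#include <cstdint>
#include <iostream>

// Returns true if the timecheck message should be processed. A message
// tagged with an older epoch comes from a monitor that missed an election:
// drop it with a warning instead of asserting.
bool accept_timecheck(uint64_t msg_epoch, uint64_t current_epoch) {
  if (msg_epoch < current_epoch) {
    std::clog << "dropping timecheck message from old epoch " << msg_epoch
              << " (current " << current_epoch << ")\n";
    return false;
  }
  return true;
}
```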
Sage Weil [Thu, 17 Jan 2013 05:19:18 +0000 (21:19 -0800)]
osdmaptool: more cli test fixes

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 17 Jan 2013 05:10:26 +0000 (21:10 -0800)]
osdmaptool: fix cli test

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 16 Jan 2013 21:14:00 +0000 (13:14 -0800)]
osd: leave osd_lock locked in shutdown()

No callers expect the lock to be dropped.

Fixes: #3816
Signed-off-by: Sage Weil <sage@inktank.com>
Kyle Bader [Thu, 17 Jan 2013 02:04:22 +0000 (18:04 -0800)]
radosgw: increase nofile ulimit in upstart

The default ulimit for open file descriptors per process is 1024,
far too few for radosgw if you have lots of OSDs and configure
radosgw for a decent number of threads.

Signed-off-by: Kyle Bader <kyle.bader@dreamhost.com>
Sage Weil [Wed, 16 Jan 2013 22:09:53 +0000 (14:09 -0800)]
ceph: adjust crush tunables via 'ceph osd crush tunables <profile>'

Make it easy to adjust crush tunables.  Create profiles:

 legacy: the legacy values
 argonaut: the argonaut defaults, and what is supported... legacy! (*)
 bobtail: best that bobtail supports
 optimal: the current optimal values
 default: the current default values

* In actuality, argonaut supports some of the tunables, but it doesn't
  say so via the feature bits.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
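
For reference, a sketch mapping some of these profile names to tunable values. The numbers come from the CRUSH tunables documentation rather than from this log, `tunables_profile` is a hypothetical helper (the real interface is the `ceph osd crush tunables <profile>` command above), and "optimal"/"default" are left out because their values change between releases.

```cpp
#include <optional>
#include <string>

struct CrushTunables {
  int choose_local_tries;
  int choose_local_fallback_tries;
  int choose_total_tries;
  int chooseleaf_descend_once;
};

std::optional<CrushTunables> tunables_profile(const std::string& name) {
  // argonaut only advertises legacy support via its feature bits.
  if (name == "legacy" || name == "argonaut")
    return CrushTunables{2, 5, 19, 0};
  if (name == "bobtail")
    return CrushTunables{0, 0, 50, 1};
  return std::nullopt;  // unknown (or omitted) profile
}
```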
Samuel Just [Wed, 16 Jan 2013 22:21:47 +0000 (14:21 -0800)]
osdmaptool: allow user to specify pool for test-map-object

Fixes: #3820
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
Samuel Just [Wed, 16 Jan 2013 23:52:53 +0000 (15:52 -0800)]
Merge branch 'wip_snap_scrub'

Reviewed-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Wed, 16 Jan 2013 23:01:47 +0000 (15:01 -0800)]
rgw: copy object should not copy source acls

Fixes: #3802
Backport: argonaut, bobtail

When using the S3 api and x-amz-metadata-directive is
set to COPY we used to copy complete metadata of source
object. However, this shouldn't include the source ACLs.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Samuel Just [Mon, 14 Jan 2013 20:52:04 +0000 (12:52 -0800)]
ReplicatedPG: ignore snap link info in scrub if nlinks==0

nlinks==0 implies that the replica did not send snap link information.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Fri, 11 Jan 2013 20:25:22 +0000 (12:25 -0800)]
osd/PG: fix osd id in error message on snap collection errors

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 10 Jan 2013 06:34:12 +0000 (22:34 -0800)]
osd/ReplicatedPG: validate ino when scrubbing snap collections

Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Thu, 10 Jan 2013 00:41:40 +0000 (16:41 -0800)]
ReplicatedPG: compare nlinks to snapcolls

nlinks gives us the number of hardlinks to the object.
nlinks should be 1 + snapcolls.size().  This will allow
us to detect links which remain in an erroneous snap
collection.

Signed-off-by: Samuel Just <sam.just@inktank.com>
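
Combining this check with the rule from the "ignore snap link info in scrub if nlinks==0" commit gives roughly the following. `ScrubObj` is a hypothetical stand-in for ScrubMap::object, and the collection names used with it are made up.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the ScrubMap::object link fields.
struct ScrubObj {
  unsigned nlinks = 0;              // hard links to the object's file
  std::set<std::string> snapcolls;  // snap collections it should appear in
};

// One link for the head collection plus one per snap collection.
// nlinks == 0 means the replica sent no snap link info: nothing to check.
bool snap_links_consistent(const ScrubObj& o) {
  if (o.nlinks == 0) return true;
  return o.nlinks == 1 + o.snapcolls.size();
}
```

An extra link (nlinks larger than expected) is exactly the case of a link left behind in an erroneous snap collection.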
Samuel Just [Thu, 10 Jan 2013 23:35:10 +0000 (15:35 -0800)]
ReplicatedPG/PG: check snap collections during _scan_list

During _scan_list, check the snap collections corresponding to the
object_info attr on the object.  Report inconsistencies during
scrub_finalize.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 9 Jan 2013 19:53:52 +0000 (11:53 -0800)]
osd_types: add nlink and snapcolls fields to ScrubMap::object

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 4 Jan 2013 04:16:50 +0000 (20:16 -0800)]
PG: move auth replica selection to helper in scrub

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Sat, 12 Jan 2013 00:43:14 +0000 (16:43 -0800)]
ReplicatedPG: correctly handle new snap collections on replica

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 11 Jan 2013 23:00:02 +0000 (15:00 -0800)]
ReplicatedPG: make_snap_collection when moving snap link in snap_trimmer

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
David Zafman [Wed, 16 Jan 2013 20:41:16 +0000 (12:41 -0800)]
rados.cc: fix rmomapkey usage: val not needed

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <samuel.just@inktank.com>
Samuel Just [Wed, 16 Jan 2013 05:27:23 +0000 (21:27 -0800)]
librados.hpp: fix omap_get_vals and omap_get_keys comments

We list keys greater than start_after.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
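
A minimal in-memory model of the corrected comment's semantics (keys strictly greater than start_after), assuming a std::map stands in for the object's omap. `vals_after` is hypothetical and is not the librados API.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Return up to max_return entries whose keys sort strictly after
// start_after.
std::map<std::string, std::string> vals_after(
    const std::map<std::string, std::string>& omap,
    const std::string& start_after, std::size_t max_return) {
  std::map<std::string, std::string> out;
  // upper_bound yields the first key > start_after, excluding it on purpose.
  for (auto it = omap.upper_bound(start_after);
       it != omap.end() && out.size() < max_return; ++it)
    out.insert(*it);
  return out;
}
```

To page through a large omap, pass the last key of the previous page as start_after; because the bound is exclusive, no entry is returned twice.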
Samuel Just [Wed, 16 Jan 2013 05:26:22 +0000 (21:26 -0800)]
rados.cc: use omap_get_vals_by_keys in getomapval

Fixes: #3811
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>