git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agoClarify journal size based on filestore max sync 33/head
Travis Rhoden [Sat, 19 Jan 2013 03:26:07 +0000 (22:26 -0500)]
Clarify journal size based on filestore max sync

The docs had the recommended journal size based on the option
"filestore min sync interval" when it should have been
"filestore max sync interval".

While in there, fix a couple of typos -- multiple when it should
be multiply, and a missing word.  Change "Should at least twice"
to "Should be at least twice..."

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
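
As a worked example of the sizing rule this commit corrects: the journal should be at least twice the product of the expected throughput and "filestore max sync interval". The function name and the sample numbers below are illustrative assumptions, not values taken from the commit.

```python
def min_journal_size_mb(expected_throughput_mb_s, filestore_max_sync_interval_s):
    """Return the smallest recommended journal size in MB.

    Rule of thumb from the docs fix above: multiply the expected
    throughput by the max sync interval, then double it.
    """
    return 2 * expected_throughput_mb_s * filestore_max_sync_interval_s


# Hypothetical disk sustaining 100 MB/s with a 5 second max sync interval.
print(min_journal_size_mb(100, 5))  # 1000
```

Using the min sync interval here instead would badly undersize the journal, which is why the option name matters.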
12 years agoOSD: do deep_scrub for repair
Samuel Just [Fri, 18 Jan 2013 22:35:51 +0000 (14:35 -0800)]
OSD: do deep_scrub for repair

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agoMerge branch 'wip-pg-removal'
Sage Weil [Fri, 18 Jan 2013 23:45:03 +0000 (15:45 -0800)]
Merge branch 'wip-pg-removal'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: set pg removal transactions based on configurable
Sage Weil [Fri, 18 Jan 2013 23:23:22 +0000 (15:23 -0800)]
osd: set pg removal transactions based on configurable

Use the osd_target_transaction_size knob, and gracefully tolerate bogus
values (e.g., <= 0).

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: make pg removal thread more friendly
Sage Weil [Fri, 18 Jan 2013 23:30:06 +0000 (15:30 -0800)]
osd: make pg removal thread more friendly

For a large PG these are saturating the filestore and journal queues.  Do
them synchronously to make them more friendly.  They don't need to be fast.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos: move apply_transactions() sync wrapper into ObjectStore
Sage Weil [Fri, 18 Jan 2013 23:27:24 +0000 (15:27 -0800)]
os: move apply_transactions() sync wrapper into ObjectStore

This has nothing to do with the backend implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoos: add apply_transaction() variant that takes a sequencer
Sage Weil [Fri, 18 Jan 2013 23:28:24 +0000 (15:28 -0800)]
os: add apply_transaction() variant that takes a sequencer

Also, move the convenience wrappers into the interface and funnel through
a single implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-client-pool-api'
Sage Weil [Fri, 18 Jan 2013 21:31:15 +0000 (13:31 -0800)]
Merge remote-tracking branch 'gh/wip-client-pool-api'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoqa: remove xfstest 068 from qemu testing
Josh Durgin [Fri, 18 Jan 2013 20:20:57 +0000 (12:20 -0800)]
qa: remove xfstest 068 from qemu testing

This tests fsfreeze, which sometimes hangs in xfs in linux 3.2

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoceph: allow osd pool get to get everything you can set
Dan Mick [Fri, 18 Jan 2013 20:20:34 +0000 (12:20 -0800)]
ceph: allow osd pool get to get everything you can set

osd pool get was missing size, min_size, crash_replay_interval,
and crush_ruleset; they're all easily added.

Fixes: #3869
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoos/FileStore: only flush inline if write is sufficiently large
Sage Weil [Fri, 18 Jan 2013 20:14:48 +0000 (12:14 -0800)]
os/FileStore: only flush inline if write is sufficiently large

Honor filestore_flush_min in the inline flush case.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoos/FileStore: fix compile when sync_file_range is missing;
Sage Weil [Fri, 18 Jan 2013 20:14:40 +0000 (12:14 -0800)]
os/FileStore: fix compile when sync_file_range is missing;

If sync_file_range is not present, we always close inline, and flush
via fdatasync(2).

Fixes compile on ancient platforms like RHEL5.8.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agodoc/rados/operations/crush: need kernel v3.6 for first round of tunables
Sage Weil [Fri, 18 Jan 2013 19:05:03 +0000 (11:05 -0800)]
doc/rados/operations/crush: need kernel v3.6 for first round of tunables

Reported-by: rl219 in #ceph on irc.oftc.net
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agojava: support get pool id/replication interface
Noah Watkins [Wed, 16 Jan 2013 20:27:16 +0000 (12:27 -0800)]
java: support get pool id/replication interface

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agolibcephfs: add pool id/size lookup interface
Noah Watkins [Wed, 16 Jan 2013 19:21:39 +0000 (11:21 -0800)]
libcephfs: add pool id/size lookup interface

Adds new interfaces ceph_get_pool_id() and ceph_get_pool_replication()
to libcephfs.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agodoc: Added link to rotation section.
John Wilkins [Fri, 18 Jan 2013 08:25:28 +0000 (00:25 -0800)]
doc: Added link to rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added hyperlink to log rotation section.
John Wilkins [Fri, 18 Jan 2013 08:25:08 +0000 (00:25 -0800)]
doc: Added hyperlink to log rotation section.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added section on log rotation.
John Wilkins [Fri, 18 Jan 2013 08:24:22 +0000 (00:24 -0800)]
doc: Added section on log rotation.

fixes: #3776

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'master' of https://github.com/ceph/ceph
John Wilkins [Fri, 18 Jan 2013 07:33:06 +0000 (23:33 -0800)]
Merge branch 'master' of https://github.com/ceph/ceph

12 years agodoc: Modified index to include mon-osd-interaction.
John Wilkins [Fri, 18 Jan 2013 07:32:26 +0000 (23:32 -0800)]
doc: Modified index to include mon-osd-interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Added a section describing mon/osd interaction.
John Wilkins [Fri, 18 Jan 2013 07:31:47 +0000 (23:31 -0800)]
doc: Added a section describing mon/osd interaction.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agobuild: Add perl installation dependency to rpm and debian packages.
Gary Lowell [Fri, 18 Jan 2013 06:43:07 +0000 (22:43 -0800)]
build:  Add perl installation dependency to rpm and debian packages.

There was already a dependency on python in the debian control file;
a similar dependency was added to the rpm spec file.  perl is needed
for the logrotate script, so a dependency on perl was added to
both. Bug 3768.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years agodoc: Added an admonishment for SSD write latency.
John Wilkins [Fri, 18 Jan 2013 06:13:12 +0000 (22:13 -0800)]
doc: Added an admonishment for SSD write latency.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated OSD configuration reference with backfill config options.
John Wilkins [Fri, 18 Jan 2013 05:27:46 +0000 (21:27 -0800)]
doc: Updated OSD configuration reference with backfill config options.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge branch 'wip-mds'
Sage Weil [Fri, 18 Jan 2013 05:05:05 +0000 (21:05 -0800)]
Merge branch 'wip-mds'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agorbd: fix bench-write infinite loop
Josh Durgin [Wed, 26 Dec 2012 22:24:22 +0000 (14:24 -0800)]
rbd: fix bench-write infinite loop

I/O was continuously submitted as long as there were few enough ops in
flight. If the number of 'threads' was high, or caching was turned on,
there would never be that many ops in flight, so the loop would continue
indefinitely. Instead, submit at most io_threads ops per offset.

Fixes: #3413
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
12 years agoMerge branch 'wip-cephx'
Sage Weil [Fri, 18 Jan 2013 01:01:49 +0000 (17:01 -0800)]
Merge branch 'wip-cephx'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agocrushtool: warn usefully about missing output spec
Dan Mick [Thu, 17 Jan 2013 19:32:03 +0000 (11:32 -0800)]
crushtool: warn usefully about missing output spec

When running with --test, you must request output to CSV files or
specific types of output to --show-X; make the error message
clarify what the tool wants.

Fixes: #3827
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agocrushtool: consolidate_whitespace() should eat everything except \n
Dan Mick [Thu, 17 Jan 2013 19:18:46 +0000 (11:18 -0800)]
crushtool: consolidate_whitespace() should eat everything except \n

CRUSH map source with \r (like a DOS text file) failed to compile
with the usual nonuseful message; turns out that eating \r along with
' ' and '\t' etc. solves that problem.

Fixes: #3834
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
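
The whitespace rule described in this commit can be sketched as follows. This is a Python illustration of the idea only, not the crushtool C++ code: every whitespace character except "\n" (so "\r" is included) is folded into a single space, and newlines are preserved. Dropping whitespace that sits directly before a newline is an assumption of this sketch.

```python
def consolidate_whitespace(text):
    """Collapse runs of non-newline whitespace into one space."""
    out = []
    pending_space = False
    for ch in text:
        if ch.isspace() and ch != "\n":
            # Eat ' ', '\t', '\r', etc.; remember that we saw a gap.
            pending_space = True
            continue
        if pending_space:
            # Emit one space for the gap, unless a newline follows it.
            if ch != "\n":
                out.append(" ")
            pending_space = False
        out.append(ch)
    return "".join(out)


# A DOS-style line ending no longer leaks a '\r' into the token stream.
print(repr(consolidate_whitespace("step take\troot\r\n")))
```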
12 years agodoc/rados/operations/authentication: update for cephx sig requirement options
Sage Weil [Thu, 17 Jan 2013 23:12:59 +0000 (15:12 -0800)]
doc/rados/operations/authentication: update for cephx sig requirement options

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: enforce 'cephx require signatures' during negotiation
Sage Weil [Fri, 28 Dec 2012 00:18:19 +0000 (16:18 -0800)]
mon: enforce 'cephx require signatures' during negotiation

If we are negotiating which auth protocol to use, and the client does not
support the MSG_AUTH feature, and the server has 'cephx require signatures'
set to true, then remove cephx from the list of allowed protocols.

Also print something in the mon log so that we know wtf is going on.

Signed-off-by: Sage Weil <sage@inktank.com>
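
The negotiation rule in this commit can be sketched as a small filter. The function and parameter names are hypothetical, and the protocol names are plain strings chosen for illustration; the monitor code itself is C++ and is not reproduced here.

```python
def allowed_auth_protocols(supported, client_has_msg_auth, require_signatures):
    """Return the protocols the server may offer to this client.

    If signatures are required and the client lacks the MSG_AUTH
    feature, cephx is removed from the list of allowed protocols.
    """
    if require_signatures and not client_has_msg_auth:
        return [p for p in supported if p != "cephx"]
    return list(supported)


# An old client without MSG_AUTH talking to a server that requires signatures.
print(allowed_auth_protocols(["cephx", "none"], False, True))  # ['none']
```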
12 years agomsg/Pipe: require MSG_AUTH feature on server if option is enabled
Sage Weil [Fri, 28 Dec 2012 00:03:20 +0000 (16:03 -0800)]
msg/Pipe: require MSG_AUTH feature on server if option is enabled

If we

  negotiate cephx AND
  are a server AND
  cephx require signatures = true

then require the MSG_AUTH feature bit.  Put this in the Policy struct for
this connection so that the existing feature bit checks and error reporting
are used, and the peer knows what feature it is missing.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agocephx: control signatures for service vs cluster
Sage Weil [Fri, 28 Dec 2012 01:42:52 +0000 (17:42 -0800)]
cephx: control signatures for service vs cluster

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosdmap: make replica separate in default crush map configurable
Sage Weil [Thu, 17 Jan 2013 23:01:35 +0000 (15:01 -0800)]
osdmap: make replica separate in default crush map configurable

Add 'osd crush chooseleaf type' option to control what the default
CRUSH rule separates replicas across.  Default to 1 (host), and set it
to 0 in vstart.sh.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: Monitor: drop messages from old timecheck epochs
Joao Eduardo Luis [Thu, 17 Jan 2013 18:11:23 +0000 (18:11 +0000)]
mon: Monitor: drop messages from old timecheck epochs

We were asserting when the message's timecheck epoch (which is mapped to
the election epoch) was older than the current epoch.  However, if a
monitor is lagged just enough to not even notice an election happened,
then it might eventually answer to old timechecks, which would make
the leader assert.  Instead, we just drop the message, while warning we
did so.

Fixes: #3835
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoosdmaptool: more fix cli test
Sage Weil [Thu, 17 Jan 2013 05:19:18 +0000 (21:19 -0800)]
osdmaptool: more fix cli test

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosdmaptool: fix cli test
Sage Weil [Thu, 17 Jan 2013 05:10:26 +0000 (21:10 -0800)]
osdmaptool: fix cli test

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: leave osd_lock locked in shutdown()
Sage Weil [Wed, 16 Jan 2013 21:14:00 +0000 (13:14 -0800)]
osd: leave osd_lock locked in shutdown()

No callers expect the lock to be dropped.

Fixes: #3816
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoradosgw: increase nofile ulimit in upstart
Kyle Bader [Thu, 17 Jan 2013 02:04:22 +0000 (18:04 -0800)]
radosgw: increase nofile ulimit in upstart

The default ulimit for open file descriptors per process is 1024,
far too few for radosgw if you have lots of OSDs and configure
radosgw for a decent number of threads.

Signed-off-by: Kyle Bader <kyle.bader@dreamhost.com>
12 years agoceph: adjust crush tunables via 'ceph osd crush tunables <profile>'
Sage Weil [Wed, 16 Jan 2013 22:09:53 +0000 (14:09 -0800)]
ceph: adjust crush tunables via 'ceph osd crush tunables <profile>'

Make it easy to adjust crush tunables.  Create profiles:

 legacy: the legacy values
 argonaut: the argonaut defaults, and what is supported.. legacy! (*)
 bobtail: best that bobtail supports
 optimal: the current optimal values
 default: the current default values

* In actuality, argonaut supports some of the tunables, but it doesn't
  say so via the feature bits.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agoosdmaptool: allow user to specify pool for test-map-object
Samuel Just [Wed, 16 Jan 2013 22:21:47 +0000 (14:21 -0800)]
osdmaptool: allow user to specify pool for test-map-object

Fixes: #3820
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
12 years agoMerge branch 'wip_snap_scrub'
Samuel Just [Wed, 16 Jan 2013 23:52:53 +0000 (15:52 -0800)]
Merge branch 'wip_snap_scrub'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoReplicatedPG: ignore snap link info in scrub if nlinks==0
Samuel Just [Mon, 14 Jan 2013 20:52:04 +0000 (12:52 -0800)]
ReplicatedPG: ignore snap link info in scrub if nlinks==0

nlinks==0 implies that the replica did not send snap link information.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd/PG: fix osd id in error message on snap collection errors
Sage Weil [Fri, 11 Jan 2013 20:25:22 +0000 (12:25 -0800)]
osd/PG: fix osd id in error message on snap collection errors

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/ReplicatedPG: validate ino when scrubbing snap collections
Sage Weil [Thu, 10 Jan 2013 06:34:12 +0000 (22:34 -0800)]
osd/ReplicatedPG: validate ino when scrubbing snap collections

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoReplicatedPG: compare nlinks to snapcolls
Samuel Just [Thu, 10 Jan 2013 00:41:40 +0000 (16:41 -0800)]
ReplicatedPG: compare nlinks to snapcolls

nlinks gives us the number of hardlinks to the object.
nlinks should be 1 + snapcolls.size().  This will allow
us to detect links which remain in an erroneous snap
collection.

Signed-off-by: Samuel Just <sam.just@inktank.com>
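
The invariant stated in this commit, nlinks == 1 + snapcolls.size(), can be checked with a one-line helper. The function name and the sample snap collection names are hypothetical; this is only a sketch of the arithmetic, not the scrub code.

```python
def extra_links(nlinks, snapcolls):
    """Return how many hard links are unaccounted for.

    One link belongs to the object itself and one to each snap
    collection it is recorded in, so 0 means the counts agree.
    """
    return nlinks - (1 + len(snapcolls))


print(extra_links(3, ["snap_1", "snap_2"]))  # 0
print(extra_links(4, ["snap_1", "snap_2"]))  # 1
```

A positive result suggests a link left behind in a snap collection that the object no longer belongs to.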
12 years agoReplicatedPG/PG: check snap collections during _scan_list
Samuel Just [Thu, 10 Jan 2013 23:35:10 +0000 (15:35 -0800)]
ReplicatedPG/PG: check snap collections during _scan_list

During _scan_list check the snapcollections corresponding to the
object_info attr on the object.  Report inconsistencies during
scrub_finalize.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd_types: add nlink and snapcolls fields to ScrubMap::object
Samuel Just [Wed, 9 Jan 2013 19:53:52 +0000 (11:53 -0800)]
osd_types: add nlink and snapcolls fields to ScrubMap::object

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoPG: move auth replica selection to helper in scrub
Samuel Just [Fri, 4 Jan 2013 04:16:50 +0000 (20:16 -0800)]
PG: move auth replica selection to helper in scrub

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoReplicatedPG: correctly handle new snap collections on replica
Samuel Just [Sat, 12 Jan 2013 00:43:14 +0000 (16:43 -0800)]
ReplicatedPG: correctly handle new snap collections on replica

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoReplicatedPG: make_snap_collection when moving snap link in snap_trimmer
Samuel Just [Fri, 11 Jan 2013 23:00:02 +0000 (15:00 -0800)]
ReplicatedPG: make_snap_collection when moving snap link in snap_trimmer

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agorados.cc: fix rmomapkey usage: val not needed
David Zafman [Wed, 16 Jan 2013 20:41:16 +0000 (12:41 -0800)]
rados.cc: fix rmomapkey usage: val not needed

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <samuel.just@inktank.com>
12 years agolibrados.hpp: fix omap_get_vals and omap_get_keys comments
Samuel Just [Wed, 16 Jan 2013 05:27:23 +0000 (21:27 -0800)]
librados.hpp: fix omap_get_vals and omap_get_keys comments

We list keys greater than start_after.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agorados.cc: use omap_get_vals_by_keys in getomapval
Samuel Just [Wed, 16 Jan 2013 05:26:22 +0000 (21:26 -0800)]
rados.cc: use omap_get_vals_by_keys in getomapval

Fixes: #3811
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agorados.cc: fix listomapvals usage: key,val are not needed
Samuel Just [Wed, 16 Jan 2013 05:24:50 +0000 (21:24 -0800)]
rados.cc: fix listomapvals usage: key,val are not needed

Fixes: #3812
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agoMerge branch 'wip-rbd-formatted-output'
Josh Durgin [Wed, 16 Jan 2013 21:29:22 +0000 (13:29 -0800)]
Merge branch 'wip-rbd-formatted-output'

Reviewed-by: Dan Mick <dan.mick@inktank.com>
Conflicts:
src/rbd.cc
src/test/cli/rbd/help.t

12 years agoMerge branch 'master' into wip-scrub
Sage Weil [Wed, 16 Jan 2013 21:17:41 +0000 (13:17 -0800)]
Merge branch 'master' into wip-scrub

12 years agoosd: better error message for request on pool that dne
Sage Weil [Mon, 7 Jan 2013 06:49:48 +0000 (22:49 -0800)]
osd: better error message for request on pool that dne

If the request is sent when the pool didn't even exist, say so.  This
would have made #3734 a bit easier to track down.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: drop newlines from event descriptions
Sage Weil [Fri, 4 Jan 2013 21:00:56 +0000 (13:00 -0800)]
osd: drop newlines from event descriptions

These produce extra newlines in the log.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agotest: add cram integration test for formatted output
Josh Durgin [Fri, 4 Jan 2013 20:15:09 +0000 (12:15 -0800)]
test: add cram integration test for formatted output

This can be used with the new teuthology cram task.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: always output result for formatted output
Josh Durgin [Thu, 3 Jan 2013 20:05:52 +0000 (12:05 -0800)]
rbd: always output result for formatted output

When there's nothing, return an empty array.
This way scripts don't have to special case this.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: regenerate man page and cli test
Josh Durgin [Thu, 3 Jan 2013 00:15:18 +0000 (16:15 -0800)]
rbd: regenerate man page and cli test

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoXMLFormatter: fix pretty printing
Josh Durgin [Thu, 27 Dec 2012 18:50:53 +0000 (10:50 -0800)]
XMLFormatter: fix pretty printing

It used the wrong indentation level and did not add a newline after
closing a section. dump_stream() did not indent at all.

Simplify a little and remove the parameter from print_spaces(). If we just
remove the element from m_sections before calling print_spaces() in
close_section(), the number of elements in m_sections is always the
indentation level.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
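
The indentation rule from this commit can be illustrated with a toy formatter. This is a Python sketch with hypothetical names (TinyXMLFormatter, sections), not the XMLFormatter class itself, and it does no XML escaping. The point is that popping the section before printing the closing tag makes the stack depth equal to the indentation level everywhere.

```python
class TinyXMLFormatter:
    def __init__(self):
        self.sections = []  # stack of currently open section names
        self.lines = []

    def _indent(self):
        # Indentation level is simply the number of open sections.
        return "  " * len(self.sections)

    def open_section(self, name):
        self.lines.append(f"{self._indent()}<{name}>")
        self.sections.append(name)

    def dump_string(self, name, value):
        self.lines.append(f"{self._indent()}<{name}>{value}</{name}>")

    def close_section(self):
        # Pop first so the closing tag lines up with its opening tag.
        name = self.sections.pop()
        self.lines.append(f"{self._indent()}</{name}>")

    def output(self):
        # One element per line, with a trailing newline after the last one.
        return "\n".join(self.lines) + "\n"


f = TinyXMLFormatter()
f.open_section("image")
f.dump_string("name", "foo")
f.close_section()
print(f.output(), end="")
```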
12 years agorbd: add --pretty-format option
Josh Durgin [Wed, 2 Jan 2013 18:52:27 +0000 (10:52 -0800)]
rbd: add --pretty-format option

This is the same option the rados and radosgw-admin tool use for more
human-readable json/xml.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: move Formatter construction to main
Josh Durgin [Thu, 27 Dec 2012 22:43:32 +0000 (14:43 -0800)]
rbd: move Formatter construction to main

Each method that uses a formatter is doing the same thing.
Simplify by constructing and handling errors only once.
Also use a scoped_ptr for easy clean up.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: fix long lines
Josh Durgin [Fri, 28 Dec 2012 02:02:39 +0000 (18:02 -0800)]
rbd: fix long lines

Several lines >80 characters have crept in recently.
The older ones generally don't have very useful history,
so I'm not worried about obscuring the history any more.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: support plain/json/xml output formatting
Stratos Psomadakis [Thu, 27 Dec 2012 00:14:39 +0000 (16:14 -0800)]
rbd: support plain/json/xml output formatting

This patch renames the --format option to --image-format, for
specifying the RBD image format, and uses --format to specify the
output formatting (to be consistent with the other ceph tools). To
avoid breaking backwards compatibility with existing scripts, rbd will
still accept --format [1|2] for the image format, but will print a
warning message, noting its use is deprecated.

The rbd subcommands that support the new --format option are : ls, info, snap
list, children, showmapped, lock list.

Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
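
The backwards-compatible handling of --format described above might look roughly like this. The function name, the returned tuple shape, and the warning text are all hypothetical; only the accepted values (1, 2, plain, json, xml) come from the commit message.

```python
def interpret_format(value):
    """Classify a --format argument as (kind, value, warning)."""
    if value in ("1", "2"):
        # Legacy use: still accepted as the image format, with a warning.
        warning = "--format for the image format is deprecated; use --image-format"
        return ("image-format", int(value), warning)
    if value in ("plain", "json", "xml"):
        return ("output-format", value, None)
    raise ValueError(f"unknown format: {value}")


print(interpret_format("2"))
print(interpret_format("json"))
```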
12 years agomon: note scrub errors in health summary
Sage Weil [Tue, 15 Jan 2013 02:23:52 +0000 (18:23 -0800)]
mon: note scrub errors in health summary

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix rescrub after repair
Sage Weil [Tue, 15 Jan 2013 02:31:06 +0000 (18:31 -0800)]
osd: fix rescrub after repair

We were rescrubbing if INCONSISTENT is set, but that is now persistent.
Add a new scrub_after_recovery flag that is reset on each peering interval
and set that when repair encounters errors.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-rpm-update'
Gary Lowell [Wed, 16 Jan 2013 19:17:11 +0000 (11:17 -0800)]
Merge branch 'wip-rpm-update'

Merges workaround for odd AS_IF behaviour in configure.ac.

12 years agoconfigure.ac: fix problem with --enable-cephfs-java
Danny Al-Gaaf [Wed, 16 Jan 2013 12:40:17 +0000 (13:40 +0100)]
configure.ac: fix problem with --enable-cephfs-java

The AS_IF used to cover java related checks via --enable-cephfs-java
didn't work correctly. Use a plain 'if/fi' instead to make sure this
section is only executed if --enable-cephfs-java is used.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds: fix usage typo for ceph-mds
Sam Lang [Wed, 16 Jan 2013 15:43:58 +0000 (09:43 -0600)]
mds: fix usage typo for ceph-mds

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomds: use #defines for bits per cap
Sage Weil [Wed, 16 Jan 2013 06:43:42 +0000 (22:43 -0800)]
mds: use #defines for bits per cap

Hard-coding 0xff in SimpleLock.h is too far away from where we add new cap
bits.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-rpm-update'
Gary Lowell [Tue, 15 Jan 2013 20:39:03 +0000 (12:39 -0800)]
Merge branch 'wip-rpm-update'

Clean-up the handling of ceph java bindings in the rpm specfile and
configure.ac.

12 years agoosd: note must_scrub* flags in PG operator<<
Sage Weil [Tue, 15 Jan 2013 02:22:02 +0000 (18:22 -0800)]
osd: note must_scrub* flags in PG operator<<

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: base INCONSISTENT pg state on persistent scrub errors
Sage Weil [Tue, 15 Jan 2013 02:21:46 +0000 (18:21 -0800)]
osd: base INCONSISTENT pg state on persistent scrub errors

This makes the state persistent across PG peering and OSD restarts.

This has the side-effect that, on recovery, we rescrub any PGs marked
inconsistent.  This is new behavior!

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix scrub scheduling for 0.0
Sage Weil [Tue, 15 Jan 2013 02:20:29 +0000 (18:20 -0800)]
osd: fix scrub scheduling for 0.0

The initial value for pair<utime_t,pg_t> can match pg 0.0, preventing it
from being manually scrubbed.  Fix!

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: note last_clean_scrub_stamp, last_scrub_errors
Sage Weil [Mon, 14 Jan 2013 07:03:01 +0000 (23:03 -0800)]
osd: note last_clean_scrub_stamp, last_scrub_errors

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: add num_scrub_errors to object_stat_t
Sage Weil [Mon, 14 Jan 2013 06:59:39 +0000 (22:59 -0800)]
osd: add num_scrub_errors to object_stat_t

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: add last_clean_scrub_stamp to pg_stat_t, pg_history_t
Sage Weil [Mon, 14 Jan 2013 06:43:35 +0000 (22:43 -0800)]
osd: add last_clean_scrub_stamp to pg_stat_t, pg_history_t

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix object_stat_sum_t dump signedness
Sage Weil [Mon, 14 Jan 2013 06:56:14 +0000 (22:56 -0800)]
osd: fix object_stat_sum_t dump signedness

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: change scrub min/max thresholds
Sage Weil [Mon, 14 Jan 2013 06:04:58 +0000 (22:04 -0800)]
osd: change scrub min/max thresholds

The previous 'osd scrub min interval' was mostly meaningless and useless.
Meanwhile, the 'osd scrub max interval' would only trigger a scrub if the
load was sufficiently low; if it was high, the PG might *never* scrub.

Instead, make the 'min' what the max used to be.  If it has been more than
this many seconds, and the load is low, scrub.  And add an additional
condition that if it has been more than the max threshold, scrub the PG
no matter what--regardless of the load.

Note that this does not change the default scrub interval for less-loaded
clusters, but it *does* change the meaning of existing config options.

Fixes: #3786
Signed-off-by: Sage Weil <sage@inktank.com>
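
The new meaning of the two thresholds can be written as a small decision function. The names are hypothetical and the intervals in the example are sample values in seconds, not necessarily the shipped defaults.

```python
def should_scrub(seconds_since_scrub, load_is_low, min_interval, max_interval):
    """Decide whether a PG should be scrubbed now."""
    if seconds_since_scrub > max_interval:
        # Overdue: scrub no matter what, regardless of load.
        return True
    if seconds_since_scrub > min_interval and load_is_low:
        # Past the min threshold: scrub only when the load permits.
        return True
    return False


DAY = 24 * 60 * 60
print(should_scrub(2 * DAY, True, DAY, 7 * DAY))   # True
print(should_scrub(2 * DAY, False, DAY, 7 * DAY))  # False
print(should_scrub(8 * DAY, False, DAY, 7 * DAY))  # True
```

The third case is the behavior change: a busy cluster can no longer postpone a scrub indefinitely.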
12 years agoosd/PG: remove useless osd_scrub_min_interval check
Sage Weil [Mon, 14 Jan 2013 04:27:59 +0000 (20:27 -0800)]
osd/PG: remove useless osd_scrub_min_interval check

This was already a no-op: we don't call PG::scrub_sched() unless it has
been osd_scrub_max_interval seconds since we last scrubbed.  Unless we
explicitly requested it, in which case we don't want this check anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: move scrub schedule random backoff to separate helper
Sage Weil [Mon, 14 Jan 2013 04:25:39 +0000 (20:25 -0800)]
osd: move scrub schedule random backoff to separate helper

Separate this from the load check, which will soon vary depending on the
PG.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/PG: trigger scrub via scrub schedule, must_ flags
Sage Weil [Sat, 12 Jan 2013 17:18:38 +0000 (09:18 -0800)]
osd/PG: trigger scrub via scrub schedule, must_ flags

When a scrub is requested, flag it and move it to the front of the
scrub schedule instead of immediately queuing it.  This avoids
bypassing the scrub reservation framework, which can lead to a heavier
impact on performance.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/PG: introduce flags to indicate explicitly requested scrubs
Sage Weil [Sat, 12 Jan 2013 17:15:16 +0000 (09:15 -0800)]
osd/PG: introduce flags to indicate explicitly requested scrubs

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/PG: move scrub schedule registration into a helper
Sage Weil [Sat, 12 Jan 2013 17:14:01 +0000 (09:14 -0800)]
osd/PG: move scrub schedule registration into a helper

Simplifies callers, and will let us easily modify the decision of when
to schedule the PG for scrub.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoac_prog_javah.m4: Use AC_CANONICAL_TARGET instead of AC_CANONICAL_SYSTEM.
Gary Lowell [Mon, 14 Jan 2013 22:11:54 +0000 (14:11 -0800)]
ac_prog_javah.m4:  Use AC_CANONICAL_TARGET instead of AC_CANONICAL_SYSTEM.

12 years agoMerge branch 'wip-java-sync'
Noah Watkins [Mon, 14 Jan 2013 21:23:29 +0000 (13:23 -0800)]
Merge branch 'wip-java-sync'

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Joe Buck <jbbuck@gmail.com>
12 years agojava: add fine grained synchronization
Noah Watkins [Fri, 11 Jan 2013 23:56:10 +0000 (15:56 -0800)]
java: add fine grained synchronization

Adds r/w lock to protect against some races.

1. Mutual exclusion for mount/unmount prevents races between the two in
libcephfs, which isn't safe (access to ceph_mount_info state).

2. An extremely narrow race between unmount and ceph_* calls in
libcephfs. ThreadA calls ceph_xxx, is_mounted test passes, then ThreadB
calls unmount and destroys the client. ThreadA resumes with a bad client
pointer.

3. Race between unmount and ceph_* calls in JNI. In JNI we hold the
CephContext reference across ceph_* calls. If the ceph mount were to be
released while a thread was returning from a ceph_* call then an attempt
to write to the log (e.g. the return value) would reference bad context.
Since ceph_release is only called by finalize() then no thread can be in
JNI.  So this is actually safe.

Using r/w here provides a trade-off between allowing concurrency into
libcephfs, and not having to constantly update the Java bindings. The
only assumption is that unmount/mount race with the rest of the
interface.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
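
The locking scheme described above (mount/unmount exclusive, every other call shared) can be sketched as follows. This is a Python stand-in with a hand-rolled reader/writer lock, not the Java bindings; the class names, the stat method, and its return value are hypothetical.

```python
import threading


class ReadWriteLock:
    """Minimal reader/writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Mount:
    def __init__(self):
        self._lock = ReadWriteLock()
        self._mounted = False

    def _set_mounted(self, state):
        # mount/unmount take the write lock, so they exclude all other calls.
        self._lock.acquire_write()
        try:
            self._mounted = state
        finally:
            self._lock.release_write()

    def mount(self):
        self._set_mounted(True)

    def unmount(self):
        self._set_mounted(False)

    def stat(self, path):
        # Ordinary calls share the read lock, so the mounted check and
        # the call itself cannot be interleaved with an unmount.
        self._lock.acquire_read()
        try:
            if not self._mounted:
                raise RuntimeError("not mounted")
            return f"stat {path}"
        finally:
            self._lock.release_read()
```

Holding the read lock across the whole call is what closes the narrow race in case 2: unmount cannot run between the mounted test and the use of the client.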
12 years agojava: remove all intrinsic locks
Noah Watkins [Fri, 11 Jan 2013 23:42:50 +0000 (15:42 -0800)]
java: remove all intrinsic locks

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agojava: remove unnecessary synchronization
Noah Watkins [Fri, 11 Jan 2013 23:23:57 +0000 (15:23 -0800)]
java: remove unnecessary synchronization

The body of ceph_unmount is a call to a synchronized method.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agojava: remove create/release synchronization
Noah Watkins [Fri, 11 Jan 2013 23:19:13 +0000 (15:19 -0800)]
java: remove create/release synchronization

The constructor calls create, and finalize() calls release. Since each
of these can only happen once (enforced by Java), there is no fear of a
race condition.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years agoRevert "osdmap: spread replicas across hosts with default crush map"
Sage Weil [Mon, 14 Jan 2013 15:37:59 +0000 (07:37 -0800)]
Revert "osdmap: spread replicas across hosts with default crush map"

This reverts commit 7ea5d84fa3d0ed3db61eea7eb9fa8dbee53244b6.

This breaks teuthology and vstart both in its current state.

12 years agomon: OSDMonitor: don't output to stdout in plain text if json is specified
Joao Eduardo Luis [Thu, 10 Jan 2013 18:54:12 +0000 (18:54 +0000)]
mon: OSDMonitor: don't output to stdout in plain text if json is specified

Fixes: #3748
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoosdmap: spread replicas across hosts with default crush map
Sage Weil [Sat, 12 Jan 2013 01:23:22 +0000 (17:23 -0800)]
osdmap: spread replicas across hosts with default crush map

This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating.  Better to
err on the side of doing the right thing for more people.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: OSDMonitor: only share osdmap with up OSDs
Joao Eduardo Luis [Sat, 12 Jan 2013 01:06:36 +0000 (01:06 +0000)]
mon: OSDMonitor: only share osdmap with up OSDs

Try to share the map with a randomly picked OSD; if the picked OSD is
not 'up', then try to find the nearest 'up' OSD in the map by doing a
backward and a forward linear search on the map -- this would be O(n) in
the worst case scenario, as we only do a single iteration starting on the
picked position, incrementing and decrementing two different iterators
until we find an appropriate OSD or we exhaust the map.

Fixes: #3629
Backport: bobtail

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
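
The two-direction search described in this commit can be sketched as below. It is a Python illustration over a plain list of up/down flags rather than the monitor's map iterators; the function name is hypothetical, and preferring the backward candidate on a tie is an assumption of this sketch.

```python
def nearest_up_osd(is_up, start):
    """Return the index of the closest 'up' entry to start, or None.

    One cursor walks backward and one walks forward from the picked
    position; each entry is visited at most once per cursor, so the
    search is O(n) in the worst case. start must be a valid index.
    """
    if not is_up:
        return None
    back, fwd = start, start
    while back >= 0 or fwd < len(is_up):
        if back >= 0:
            if is_up[back]:
                return back
            back -= 1
        if fwd < len(is_up):
            if is_up[fwd]:
                return fwd
            fwd += 1
    return None  # map exhausted without finding an 'up' OSD


print(nearest_up_osd([False, False, False, True], 1))  # 3
```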
12 years agorbd: Fix tabs
Dan Mick [Sat, 12 Jan 2013 00:18:10 +0000 (16:18 -0800)]
rbd: Fix tabs

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agodoc: Updates to CRUSH paper.
John Wilkins [Fri, 11 Jan 2013 23:56:02 +0000 (15:56 -0800)]
doc: Updates to CRUSH paper.

fixes: 3329, 3707, 3711, 3389

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agorbd: make 'add' modprobe rbd so it has a chance of success
Dan Mick [Fri, 11 Jan 2013 02:46:13 +0000 (18:46 -0800)]
rbd: make 'add' modprobe rbd so it has a chance of success

Check for existence of /sys/bus/rbd first to avoid unnecessary calls

Fixes: #3784
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>