Sage Weil [Mon, 21 Jan 2013 16:45:10 +0000 (08:45 -0800)]
config: don't make noise about 'internal_safe_to_start_threads'
This is set on start, and subsequently gets into the changed set.
Once any other config value is injected, it is the first thing reported
by the logs, but is confusing and useless to the user. Hide it.
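A minimal sketch of the hiding behavior, assuming the changed set is a plain collection of option names. HIDDEN_KEYS and reportable_changes() are hypothetical names, not the actual Ceph code:

```python
# Hypothetical illustration; the real change lives in Ceph's C++ config code.
HIDDEN_KEYS = {"internal_safe_to_start_threads"}


def reportable_changes(changed):
    """Return the changed option names, sorted, minus internal-only keys."""
    return sorted(k for k in changed if k not in HIDDEN_KEYS)


changes = reportable_changes({"internal_safe_to_start_threads", "debug_osd"})
```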
Sage Weil [Mon, 21 Jan 2013 00:11:10 +0000 (16:11 -0800)]
osd: calculate initial PG mapping from PG's osdmap
The initial values of up/acting need to be based on the PG's osdmap, not
the OSD's latest. This can cause various confusion in
pg_interval_t::check_new_interval() when calling OSDMap methods due to the
up/acting OSDs not existing yet (for example).
Fixes: #3879
Reported-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Tested-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Travis Rhoden [Sat, 19 Jan 2013 03:26:07 +0000 (22:26 -0500)]
Clarify journal size based on filestore max sync
The docs had the recommended journal size based on the option
"filestore min sync interval" when it should have been
"filestore max sync interval".
While in there, fix a couple of typos -- multiple when it should
be multiply, and a missing word. Change "Should at least twice"
to "Should be at least twice..."
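A rough sketch of the sizing rule, assuming the guideline is "at least twice the product of the expected throughput and filestore max sync interval". The function name, parameter names, and example values are illustrative assumptions:

```python
def min_journal_size_mb(expected_throughput_mb_s, max_sync_interval_s):
    """Assumed rule: journal >= 2 * throughput * filestore max sync interval."""
    return 2 * expected_throughput_mb_s * max_sync_interval_s


# Example values only: 100 MB/s expected throughput, 5 s max sync interval.
size_mb = min_journal_size_mb(100, 5)
```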
Gary Lowell [Fri, 18 Jan 2013 06:43:07 +0000 (22:43 -0800)]
build: Add perl installation dependency to rpm and debian packages.
There was already a dependency on python in the debian control file;
a similar dependency was added to the rpm spec file. perl is needed
for the logrotate script, so a dependency on perl was added to
both. Bug 3768.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Josh Durgin [Wed, 26 Dec 2012 22:24:22 +0000 (14:24 -0800)]
rbd: fix bench-write infinite loop
I/O was continuously submitted as long as there were few enough ops in
flight. If the number of 'threads' was high, or caching was turned on,
there would never be that many ops in flight, so the loop would continue
indefinitely. Instead, submit at most io_threads ops per offset.
Fixes: #3413
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
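The bounded loop can be sketched as follows. FakeBench and submit_at_offset() are hypothetical stand-ins for the rbd bench code; writes complete instantly here to mimic the caching case in which the in-flight count never reaches the limit:

```python
class FakeBench:
    """Stand-in for the benchmark state (not the rbd implementation)."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.submitted = 0

    def start_write(self):
        # Refuse new ops only when too many are in flight.
        if self.in_flight >= self.max_in_flight:
            return False
        # The op completes immediately, so in_flight never grows.
        self.submitted += 1
        return True


def submit_at_offset(bench, io_threads):
    """Submit at most io_threads ops; the bound guarantees termination."""
    i = 0
    while i < io_threads and bench.start_write():
        i += 1
    return i


bench = FakeBench(max_in_flight=16)
count = submit_at_offset(bench, io_threads=16)
```

Without the `i < io_threads` bound, the loop above would spin forever, because start_write() never returns False in this scenario.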
Dan Mick [Thu, 17 Jan 2013 19:32:03 +0000 (11:32 -0800)]
crushtool: warn usefully about missing output spec
When running with --test, you must request output to CSV files or
request specific types of output with --show-X; make the error message
clarify what the tool wants.
Fixes: #3827
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Thu, 17 Jan 2013 19:18:46 +0000 (11:18 -0800)]
crushtool: consolidate_whitespace() should eat everything except \n
CRUSH map source with \r (like a DOS text file) failed to compile
with the usual nonuseful message; turns out that eating \r along with
' ' and '\t' etc. solves that problem.
Fixes: #3834
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
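A Python sketch of the idea, assuming each run of whitespace other than a newline collapses to a single space. The real consolidate_whitespace() is C++ in crushtool and may differ in detail:

```python
import re


def consolidate_whitespace(text):
    """Collapse each run of non-newline whitespace into one space."""
    # [^\S\n] matches any whitespace character except a newline, so a
    # carriage return is eaten along with spaces and tabs.
    return re.sub(r"[^\S\n]+", " ", text)


out = consolidate_whitespace("device 0\tosd.0\r\nhost  a\r\n")
```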
Sage Weil [Fri, 28 Dec 2012 00:18:19 +0000 (16:18 -0800)]
mon: enforce 'cephx require signatures' during negotiation
If we are negotiating which auth protocol to use, and the client does not
support the MSG_AUTH feature, and the server has 'cephx require signatures'
set to true, then remove cephx from the list of allowed protocols.
Also print something in the mon log so that we know wtf is going on.
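The negotiation rule can be sketched as below. The function and parameter names are hypothetical; only the condition itself comes from the description above:

```python
def allowed_auth_protocols(supported, client_has_msg_auth, require_signatures):
    """Drop cephx when signatures are required but the client lacks MSG_AUTH."""
    if require_signatures and not client_has_msg_auth:
        return [p for p in supported if p != "cephx"]
    return list(supported)


protocols = allowed_auth_protocols(
    ["cephx", "none"], client_has_msg_auth=False, require_signatures=True
)
```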
Sage Weil [Fri, 28 Dec 2012 00:03:20 +0000 (16:03 -0800)]
msg/Pipe: require MSG_AUTH feature on server if option is enabled
If we
negotiate cephx AND
are a server AND
cephx require signatures = true
then require the MSG_AUTH feature bit. Put this in the Policy struct for
this connection so that the existing feature bit checks and error reporting
are used, and the peer knows what feature it is missing.
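A sketch of the three-way condition and of how a generic feature check then reports the missing bit. The MSG_AUTH value and the function names are placeholders, not Ceph's actual definitions:

```python
MSG_AUTH = 1 << 0  # placeholder value, not Ceph's real feature bit


def required_features(base_required, negotiated_cephx, is_server,
                      require_signatures):
    """Add MSG_AUTH to the required set only when all three conditions hold."""
    if negotiated_cephx and is_server and require_signatures:
        return base_required | MSG_AUTH
    return base_required


def missing_features(required, peer_features):
    """Generic check: the bits we require that the peer does not offer."""
    return required & ~peer_features


required = required_features(0, True, True, True)
missing = missing_features(required, peer_features=0)
```

Because the requirement is folded into the ordinary required-features mask, no separate error path is needed for MSG_AUTH.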
Sage Weil [Thu, 17 Jan 2013 23:01:35 +0000 (15:01 -0800)]
osdmap: make replica separate in default crush map configurable
Add 'osd crush chooseleaf type' option to control what the default
CRUSH rule separates replicas across. Default to 1 (host), and set it
to 0 in vstart.sh.
Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
mon: Monitor: drop messages from old timecheck epochs
We were asserting when the message's timecheck epoch (which is mapped to
the election epoch) was older than the current epoch. However, if a
monitor is lagged just enough to not even notice an election happened,
then it might eventually answer to old timechecks, which would make
the leader assert. Instead, we just drop the message, while warning we
did so.
Fixes: #3835
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
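The drop-instead-of-assert behavior can be sketched as follows. handle_timecheck() and the logger name are hypothetical; handling of epochs newer than the current one is outside what this sketch covers:

```python
import logging

log = logging.getLogger("mon")


def handle_timecheck(msg_epoch, current_epoch):
    """Return True if the message should be processed, False if dropped."""
    if msg_epoch < current_epoch:
        # Previously this case tripped an assert on the leader.
        log.warning("dropping timecheck from old epoch %d (current %d)",
                    msg_epoch, current_epoch)
        return False
    return True


stale = handle_timecheck(msg_epoch=4, current_epoch=6)
fresh = handle_timecheck(msg_epoch=6, current_epoch=6)
```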
Kyle Bader [Thu, 17 Jan 2013 02:04:22 +0000 (18:04 -0800)]
radosgw: increase nofile ulimit in upstart
The default ulimit for open file descriptors per process is 1024,
far too few for radosgw if you have lots of OSDs and configure
radosgw for a decent number of threads.
Sage Weil [Wed, 16 Jan 2013 22:09:53 +0000 (14:09 -0800)]
ceph: adjust crush tunables via 'ceph osd crush tunables <profile>'
Make it easy to adjust crush tunables. Create profiles:
legacy: the legacy values
argonaut: the argonaut defaults, and what is supported.. legacy! (*)
bobtail: best that bobtail supports
optimal: the current optimal values
default: the current default values
* In actuality, argonaut supports some of the tunables, but it doesn't
say so via the feature bits.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Yehuda Sadeh [Wed, 16 Jan 2013 23:01:47 +0000 (15:01 -0800)]
rgw: copy object should not copy source acls
Fixes: #3802
Backport: argonaut, bobtail
When using the S3 api and x-amz-metadata-directive is
set to COPY we used to copy complete metadata of source
object. However, this shouldn't include the source ACLs.
Samuel Just [Thu, 10 Jan 2013 00:41:40 +0000 (16:41 -0800)]
ReplicatedPG: compare nlinks to snapcolls
nlinks gives us the number of hardlinks to the object.
nlinks should be 1 + snapcolls.size(). This will allow
us to detect links which remain in an erroneous snap
collection.
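The invariant is small enough to state directly. links_consistent() is a hypothetical helper, and the snap collection names are made up for the example:

```python
def links_consistent(nlinks, snapcolls):
    """An object should have one link per snap collection plus one more."""
    return nlinks == 1 + len(snapcolls)


ok = links_consistent(3, {"snap_1", "snap_2"})
stray = links_consistent(4, {"snap_1", "snap_2"})  # an extra link remains
```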
Josh Durgin [Thu, 27 Dec 2012 18:50:53 +0000 (10:50 -0800)]
XMLFormatter: fix pretty printing
It used the wrong indentation level and did not add a newline after
closing a section. dump_stream() did not indent at all.
Simplify a little and remove the parameter from print_spaces(). If we just
remove the element from m_sections before calling print_spaces() in
close_section(), the number of elements in m_sections is always the
indentation level.
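A toy formatter showing why popping the section before printing spaces keeps the depth right. TinyXMLFormatter is hypothetical, uses an assumed two spaces per level, and does no escaping:

```python
class TinyXMLFormatter:
    """Toy pretty-printer; indentation depth is len(self.sections)."""

    INDENT = "  "  # assumed width, not taken from the Ceph source

    def __init__(self):
        self.sections = []
        self.out = []

    def _print_spaces(self):
        # No parameter needed: the open-section count is the depth.
        self.out.append(self.INDENT * len(self.sections))

    def open_section(self, name):
        self._print_spaces()
        self.out.append("<%s>\n" % name)
        self.sections.append(name)

    def dump_string(self, name, value):
        self._print_spaces()
        self.out.append("<%s>%s</%s>\n" % (name, value, name))

    def close_section(self):
        # Pop first so the closing tag lines up with its opening tag.
        name = self.sections.pop()
        self._print_spaces()
        self.out.append("</%s>\n" % name)

    def getvalue(self):
        return "".join(self.out)


f = TinyXMLFormatter()
f.open_section("a")
f.open_section("b")
f.dump_string("c", "1")
f.close_section()
f.close_section()
xml = f.getvalue()
```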
Josh Durgin [Thu, 27 Dec 2012 22:43:32 +0000 (14:43 -0800)]
rbd: move Formatter construction to main
Each method that uses a formatter is doing the same thing.
Simplify by constructing and handling errors only once.
Also use a scoped_ptr for easy clean up.
Josh Durgin [Fri, 28 Dec 2012 02:02:39 +0000 (18:02 -0800)]
rbd: fix long lines
Several lines longer than 80 characters have crept in recently.
The older ones generally don't have very useful history,
so I'm not worried about obscuring the history any more.
This patch renames the --format option to --image-format, for
specifying the RBD image format, and uses --format to specify the
output formatting (to be consistent with the other ceph tools). To
avoid breaking backwards compatibility with existing scripts, rbd will
still accept --format [1|2] for the image format, but will print a
warning message, noting its use is deprecated.
The rbd subcommands that support the new --format option are: ls, info,
snap list, children, showmapped, lock list.
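The backwards-compatible handling of --format might look like the sketch below. interpret_format() and the warning text are hypothetical, and "json" is only an example output format value:

```python
import sys


def interpret_format(value, warn=sys.stderr):
    """Return (image_format, output_format) for a --format argument."""
    if value in ("1", "2"):
        # Legacy usage: still accepted as the image format, with a warning.
        print("warning: --format for the image format is deprecated; "
              "use --image-format", file=warn)
        return int(value), None
    return None, value


legacy = interpret_format("2")
modern = interpret_format("json")
```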
Sage Weil [Tue, 15 Jan 2013 02:31:06 +0000 (18:31 -0800)]
osd: fix rescrub after repair
We were rescrubbing if INCONSISTENT is set, but that is now persistent.
Add a new scrub_after_recovery flag that is reset on each peering interval
and set that when repair encounters errors.
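A sketch of the flag's lifecycle, using a hypothetical PGState class rather than the real PG code:

```python
class PGState:
    """Tracks whether a rescrub is wanted once recovery completes."""

    def __init__(self):
        self.scrub_after_recovery = False

    def on_new_peering_interval(self):
        # Unlike the persistent INCONSISTENT state, this flag is reset
        # on each peering interval.
        self.scrub_after_recovery = False

    def on_repair_finished(self, errors_found):
        if errors_found:
            self.scrub_after_recovery = True


pg = PGState()
pg.on_repair_finished(errors_found=True)
wants_rescrub = pg.scrub_after_recovery
pg.on_new_peering_interval()
after_peering = pg.scrub_after_recovery
```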
Danny Al-Gaaf [Wed, 16 Jan 2013 12:40:17 +0000 (13:40 +0100)]
configure.ac: fix problem with --enable-cephfs-java
The AS_IF used to cover java related checks via --enable-cephfs-java
didn't work correctly. Use a plain 'if/fi' instead to make sure this
section is only executed if --enable-cephfs-java is used.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Sage Weil [Mon, 14 Jan 2013 06:04:58 +0000 (22:04 -0800)]
osd: change scrub min/max thresholds
The previous 'osd scrub min interval' was mostly meaningless and useless.
Meanwhile, the 'osd scrub max interval' would only trigger a scrub if the
load was sufficiently low; if it was high, the PG might *never* scrub.
Instead, make the 'min' what the max used to be. If it has been more than
this many seconds, and the load is low, scrub. And add an additional
condition that if it has been more than the max threshold, scrub the PG
no matter what--regardless of the load.
Note that this does not change the default scrub interval for less-loaded
clusters, but it *does* change the meaning of existing config options.
Fixes: #3786
Signed-off-by: Sage Weil <sage@inktank.com>
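The new meaning of the two thresholds can be sketched as a single decision function. should_scrub() and its parameters are hypothetical names; the interval values in the example are arbitrary:

```python
def should_scrub(seconds_since_scrub, load_is_low, min_interval, max_interval):
    """Decide whether to scrub a PG under the new min/max semantics."""
    if seconds_since_scrub > max_interval:
        return True  # overdue: scrub regardless of load
    if seconds_since_scrub > min_interval and load_is_low:
        return True  # past the min threshold and the load permits it
    return False


busy_but_overdue = should_scrub(1000, False, min_interval=100, max_interval=700)
busy_not_overdue = should_scrub(500, False, min_interval=100, max_interval=700)
idle_past_min = should_scrub(500, True, min_interval=100, max_interval=700)
```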
This was already a no-op: we don't call PG::scrub_sched() unless it has
been osd_scrub_max_interval seconds since we last scrubbed. Unless we
explicitly requested it, in which case we don't want this check anyway.