Josh Durgin [Thu, 27 Dec 2012 18:50:53 +0000 (10:50 -0800)]
XMLFormatter: fix pretty printing
It used the wrong indentation level and did not add a newline after
closing a section. dump_stream() did not indent at all.
Simplify a little and remove the parameter from print_spaces(). If we just
remove the element from m_sections before calling print_spaces() in
close_section(), the number of elements in m_sections is always the
indentation level.
Josh Durgin [Thu, 27 Dec 2012 22:43:32 +0000 (14:43 -0800)]
rbd: move Formatter construction to main
Each method that uses a formatter is doing the same thing.
Simplify by constructing and handling errors only once.
Also use a scoped_ptr for easy clean up.
Josh Durgin [Fri, 28 Dec 2012 02:02:39 +0000 (18:02 -0800)]
rbd: fix long lines
Several >80 characters have crept in recently.
The older ones generally don't have very useful history,
so I'm not worried about obscuring the history any more.
This patch renames the --format option to --image-format, for
specifying the RBD image format, and uses --format to specify the
output formatting (to be consistent with the other ceph tools). To
avoid breaking backwards compatibility with existing scripts, rbd will
still accept --format [1|2] for the image format, but will print a
warning message, noting its use is deprecated.
The rbd subcommands that support the new --format option are : ls, info, snap
list, children, showmapped, lock list.
Sage Weil [Tue, 15 Jan 2013 02:31:06 +0000 (18:31 -0800)]
osd: fix rescrub after repair
We were rescrubbing if INCONSISTENT is set, but that is now persistent.
Add a new scrub_after_recovery flag that is reset on each peering interval
and set that when repair encounters errors.
Danny Al-Gaaf [Wed, 16 Jan 2013 12:40:17 +0000 (13:40 +0100)]
configure.ac: fix problem with --enable-cephfs-java
The AS_IF used to cover java related checks via --enable-cephfs-java
didn't work correctly. Use a plain 'if/fi' instead to make sure this
section is only executed if --enable-cephfs-java is used.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Sage Weil [Mon, 14 Jan 2013 06:04:58 +0000 (22:04 -0800)]
osd: change scrub min/max thresholds
The previous 'osd scrub min interval' was mostly meaningless and useless.
Meanwhile, the 'osd scrub max interval' would only trigger a scrub if the
load was sufficiently low; if it was high, the PG might *never* scrub.
Instead, make the 'min' what the max used to be. If it has been more than
this many seconds, and the load is low, scrub. And add an additional
condition that if it has been more than the max threshold, scrub the PG
no matter what--regardless of the load.
Note that this does not change the default scrub interval for less-loaded
clusters, but it *does* change the meaning of existing config options.
Fixes: #3786 Signed-off-by: Sage Weil <sage@inktank.com>
This was already a no-op: we don't call PG::scrub_sched() unless it has
been osd_scrub_max_interval seconds since we last scrubbed. Unless we
explicitly requested in, in which case we don't want this check anyway.
Sage Weil [Sat, 12 Jan 2013 17:18:38 +0000 (09:18 -0800)]
osd/PG: trigger scrub via scrub schedule, must_ flags
When a scrub is requested, flag it and move it to the front of the
scrub schedule instead of immediately queuing it. This avoids
bypassing the scrub reservation framework, which can lead to a heavier
impact on performance.
Noah Watkins [Fri, 11 Jan 2013 23:56:10 +0000 (15:56 -0800)]
java: add fine grained synchronization
Adds r/w lock to protect against some races.
1. Mutual exclusion for mount/unmount prevents races between the two in
libcephfs, which isn't safe (access to ceph_mount_info state).
2. An extremely narrow race between unmount and ceph_* calls in
libcephfs. ThreadA calls ceph_xxx, is_mounted test passes, then ThreadB
calls unmount and destroys the client. ThreadA resumes with a bad client
pointer.
3. Race between unmount and ceph_* calls in JNI. In JNI we hold the
CephContext reference across ceph_* calls. If the ceph mount were to be
released while a thread was returning from a ceph_* call then an attempt
to write to the log (e.g. the return value) would reference bad context.
Since ceph_release is only called by finalize() then no thread can be in
JNI. So this is actually safe.
Using r/w here provides trade-off between allowing concurrency into
libcephfs, and not having to constantly update the Java bindings. The
only assumption is that unmount/mount race with the rest of the
interface.
Noah Watkins [Fri, 11 Jan 2013 23:19:13 +0000 (15:19 -0800)]
java: remove create/release synchronization
The constructor calls create, and finalize() calls release. Since each
of these can only happen once (enforced by Java), there is no fear of a
race condition.
Sage Weil [Sat, 12 Jan 2013 01:23:22 +0000 (17:23 -0800)]
osdmap: spread replicas across hosts with default crush map
This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating. Better to
err on the side of doing the right thing for more people.
Fixes: #3785 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
Try to share the map with a randomly picked OSD; if the picked monitor is
not 'up', then try to find the nearest 'up' OSD in the map by doing a
backward and a forward linear search on the map -- this would be O(n) in
the worst case scenario, as we only do a single iteration starting on the
picked position, incrementing and decrementing two different iterators
until we find an appropriate OSD or we exhaust the map.
Fixes: #3629
Backport: bobtail
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Fri, 11 Jan 2013 02:44:44 +0000 (18:44 -0800)]
rbd: call udevadm settle on map/unmap
When we map/unmap devices, udev gets called to manage device nodes;
this will allow the command to wait for those manipulations to complete,
particularly for test runs, so that the device tree is stable by the
time the command exits.
--no-settle is also provided to avoid this behavior if desired (say,
for a series of 'map' commands, perhaps the user wants to wait for
settling only on the last of the series).
Fixes: #3635 Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
Samuel Just [Thu, 10 Jan 2013 19:06:02 +0000 (11:06 -0800)]
config_opts.h: default osd_recovery_delay_start to 0
This setting was intended to prevent recovery from overwhelming peering traffic
by delaying the recovery_wq until osd_recovery_delay_start seconds after pgs
stop being added to it. This should be less necessary now that recovery
messages are sent with strictly lower priority then peering messages.
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Gregory Farnum <greg@inktank.com>
Samuel Just [Thu, 10 Jan 2013 03:17:23 +0000 (19:17 -0800)]
ReplicatedPG: fix snapdir trimming
The previous logic was both complicated and not correct. Consequently,
we have been tending to drop snapcollection links in some cases. This
has resulted in clones incorrectly not being trimmed. This patch
replaces the logic with something less efficient but hopefully a bit
clearer.
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Danny Al-Gaaf [Wed, 9 Jan 2013 21:35:16 +0000 (22:35 +0100)]
configure.ac: check for org.junit.rules.ExternalResource
Check for org.junit.rules.ExternalResource if build with
--enable-cephfs-java and --with-debug. Checking for junit4
isn't enough since junit4 has this class not before 4.7.
Added some m4 files to get some JAVA related macros. Changed
autogen.sh to work with local m4 files/macros.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 4 Jan 2013 17:51:31 +0000 (18:51 +0100)]
configure.ac: change junit4 handling
Change handling of --with-debug and junit4. Add a new conditional HAVE_JUNIT4
to be able to build ceph-test package also if junit4 isn't available. In
this case simply don't build libcephfs-test.jar, but the rest of the tools.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Dan Mick [Tue, 8 Jan 2013 19:21:22 +0000 (11:21 -0800)]
librbd: Allow get_lock_info to fail
If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older; we need to treat both as
acceptable failures for RBD images. rados lock list will still fail.
Fixes #3744.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Noah Watkins [Mon, 7 Jan 2013 23:04:33 +0000 (15:04 -0800)]
libcephfs: clarify interface return value
Document that ceph_get_stripe_unit_granularity may return an error code
(e.g. -ENOTCONN). The interface requires a mount, but currently we
return a compile-time constant. Other error codes may be possible in the
future.