Sage Weil [Thu, 14 Feb 2013 23:39:43 +0000 (15:39 -0800)]
osd/OSDCap: tweak unquoted_word parsing in osd caps
Newer versions of spirit (1.49.0-3.1ubuntu1.1 in quantal, in particular)
dislike the construct with alnum and replace the - and _ with '\0' in the
resulting string.
Samuel Just [Thu, 14 Feb 2013 22:03:56 +0000 (14:03 -0800)]
OSD: always activate_map in advance_pgs, only send messages if up
We should always handle_activate_map() after handle_advance_map() in
order to kick the pg into a valid peering state for processing requests
prior to dropping the lock.
Additionally, we would prefer to avoid sending irrelevant messages
during boot, so only send if we are up according to the current service
osdmap.
Fixes: #4064
Backport: bobtail Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 9 Feb 2013 05:36:13 +0000 (21:36 -0800)]
java: make CephMountTest use user.* xattr names
Changes to the xattr code in Ceph require
a few tweaks to existing test cases.
Specifically, there is now a ceph.file.layout
xattr by default and user defined xattrs
are prepended with "user."
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joe Buck <jbbuck@gmail.com> Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
Sam Lang [Tue, 12 Feb 2013 22:33:58 +0000 (16:33 -0600)]
mds: Setting pool on a file requires latest osdmap
If we create a file, then create a pool, then try to
set the pool on the file with the vxattr, no mds operation
triggers a request for the latest osdmap, so we get back
EINVAL. So if we fail to find the pool initially, fall back
to requesting a new osdmap and trying the vxattr again. We
don't wait for a new map if we don't get anything newer than
what we already have.
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 12 Feb 2013 22:11:09 +0000 (14:11 -0800)]
osd: only share maps on hb connection of OSD_HBMSGS feature is set
Back in 1bc419a7affb056540ba8f9b332b6ff9380b37af we started sharing maps
with dead osds via the heartbeat connection, but old code will crash on an
unexpected message. Only do this if the OSD_HBMSGS feature is present.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 11 Feb 2013 14:23:54 +0000 (06:23 -0800)]
osd: update snap collections for sub_op_modify log records conditionaly
The only remaining caller is sub_op_modify(). If we do have a non-empty
op transaction, we want to do this update, regardless of what we think
last_backfill is (our notion may be not completely in sync with the
primary). In particular, our last_backfill may be the same object but
a different snapid, but the primary disagrees and is pushing an op
transaction through.
Instead, update the collections if we have a non-empty transaction.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 11 Feb 2013 00:59:48 +0000 (16:59 -0800)]
osd: unconditionally encode snaps buffer
Previously we would only encode the updated snaps vector for CLONE ops.
This doesn't work for MODIFY ops generated by the snap trimmer, which
may also adjust the clone collections. It is also possible that other
operations may need to populate this field in the future (e.g.,
LOST_REVERT may, although it currently does not).
Fixes: #4071, and possibly #4051. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Yehuda Sadeh [Fri, 8 Feb 2013 21:14:49 +0000 (13:14 -0800)]
rgw: change json formatting for swift list container
Fixes: #4048
There is some difference in the way swift formats the
xml output and the json output for list container. In
xml the entity is named 'name' and in json it is named
'subdir'.
Sage Weil [Sat, 9 Feb 2013 08:05:33 +0000 (00:05 -0800)]
osd: fix load_pgs collection handling
On a _TEMP pg, is_pg() would succeed, which meant we weren't actually
hitting the cleanup checks. Instead, restructure this loop as positive
checks and handle each type of collection we understand.
This fixes _TEMP cleanup.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Sat, 9 Feb 2013 08:04:29 +0000 (00:04 -0800)]
osd: fix load_pgs handling of pg dirs without a head
If there is a pgid that passes coll_t::is_pg() but there is no head, we
will populate the pgs map but then fail later when we try to do
read_state. This is a side-effect of 55f8579.
Take explicit note of _head collections we see, and then warn when we
find stray snap collections.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Fri, 8 Feb 2013 06:06:14 +0000 (22:06 -0800)]
mon: handle -EAGAIN in completion contexts
We can get ECANCELED, EAGAIN, or success out of the completion contexts,
but in the EAGAIN case (meaning there was an election) we were sending
a success to the client. This resulted in client hangs and all-around
confusion when the monitor cluster was thrashing.
Backport: bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Luis <joao.luis@inktank.com>
Samuel Just [Thu, 7 Feb 2013 19:53:28 +0000 (11:53 -0800)]
ReplicatedPG: check store for temp collection in have_temp_coll
We may not have "created" the temp collection since OSD restart
before removing the PG. have_temp_coll must also look at the
OSD store. Currently, the only user is pg removal, so the
extra work is acceptable.
Backport: bobtail Signed-off-by: Samuel Just <sam.just@inktank.com>
Yehuda Sadeh [Thu, 7 Feb 2013 00:43:48 +0000 (16:43 -0800)]
rgw: bucket recreation should not clobber bucket info
Fixes: #4039
User's list of buckets is getting modified even if bucket already
exists. This fix removes the newly created directory object, and
makes sure that user info's data points at the correct bucket.
Sage Weil [Thu, 7 Feb 2013 18:21:49 +0000 (10:21 -0800)]
osd: flush peering queue (consume maps) prior to boot
If the osd itself is behind on many maps during boot, it will get more and
(as part of that) flush the peering wq to ensure the pgs consume them.
However, it is possible for OSD to have latest/recnet maps, but pgs to be
behind, and to jump directly to boot and join. The OSD is then laggy and
unresponsive because the peering wq is way behind.
To avoid this, call consume_map() (kick the peering wq) at the end of
init and flush it to ensure we are *internally* all caught up before we
consider joining the cluster.
I'm pretty sure this is the root cause of #3905 and possibly #3995.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Yehuda Sadeh [Thu, 13 Dec 2012 23:52:34 +0000 (15:52 -0800)]
rgw: key indexes are only link to user info
Instead of keeping multiple copies of the user info,
we just treat the key index as a pointer to the actual
user info (indexed by uid). This helps with two issues:
first, it scales better as we don't need to update the
entire set of keys whenever we make any change. Second,
it helps with the uid index atomicity.
One point to keep in mind is that both the links and the
info can be cached, so effect on performance is minimal.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: caleb miles <caleb.miles@inktank.com>
Dan Mick [Thu, 31 Jan 2013 01:33:09 +0000 (17:33 -0800)]
Validate format strings for CLS_ERR/CLS_LOG
cls_log needed __attribute__((format(printf..)) to allow the compiler
to crosscheck format strings and arguments. After adding that, there
needed to be a bunch of fixups for %ll, and a few changes for missing
arguments, etc. uncovered by the checking.
Fixes: #3970 Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Alex Elder [Thu, 31 Jan 2013 12:47:59 +0000 (06:47 -0600)]
qa: update the rbd/concurrent.sh workunit
A few changes, now that a few rbd problems have been fixed.
First, the more substantive changes:
- Generate a source file, and compare what's read back from rbd
devices with the content of that file.
- Write to the rbd device such that the written data spans
an (assumed 4 MB) rbd object boundary, as well as starting
and ending on non-page-aligned offsets.
- Perform multiple reads on rbd devices: entirely within a range
before any written data; beginning before but ending within
written data; the exact written data (and validating what's
read); beginning within written data but ending after it;
reading after written data but within a written rbd object;
and reading from an unwritten rbd object.
- Have the sleep between iterations provide a non-integer value
to avoid zero (or quantized) delays.
Also, some a little less substantive (but possibly informative):
- Don't run with "set -x". It produces a ton of noise that is
not useful for this test. This is an exerciser, looking
really for system crashes during concurrent activity, and
knowing which commands were (concurrently) active isn't going
to help much in diagnosis.
- Create two more directories, used to track the degree of
concurrency (more or less) and the highest rbd id consumed.
Files whose names are numbers are touched in each, and the
highest at the end is the highest during the run. This gets
around issues passing environment info from sub-shells to the
top-level shell. As a bonus, it offers a better chance of
avoiding problems due to concurrent update.
- NAMESDIR is renamed NAMES_DIR, and it (and the others) is
set up in the setup() function.
- Increase the concurrency and iteration counts.
- Move the default definitions before the ceph secrets stuff
Danny Al-Gaaf [Wed, 30 Jan 2013 17:52:24 +0000 (18:52 +0100)]
PGMap: fix -Wsign-compare warning
Fix -Wsign-compare compiler warning:
mon/PGMap.cc: In member function 'void PGMap::apply_incremental
(CephContext*, const PGMap::Incremental&)':
mon/PGMap.cc:247:30: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>