Yehuda Sadeh [Wed, 14 Nov 2012 19:30:34 +0000 (11:30 -0800)]
rgw: relax date format check
Don't try to parse anything beyond the GMT or UTC marker. Some clients
use special date formatting. If we end up misparsing the date, it
will fail in authorization, so we don't need to be too
restrictive.
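A minimal sketch of the relaxed check, in Python for illustration only (rgw itself is C++). The helper name `parse_relaxed_date` and the base format string are assumptions, not taken from the patch; the only idea carried over is that everything after the GMT/UTC marker is ignored.

```python
from datetime import datetime

# Assumed base format for illustration; the real code may accept other layouts.
_BASE_FORMAT = "%a, %d %b %Y %H:%M:%S"


def parse_relaxed_date(value):
    """Parse a date header, ignoring anything after the GMT or UTC marker.

    Returns a naive datetime, or None if the string cannot be parsed.
    """
    for zone in ("GMT", "UTC"):
        idx = value.find(zone)
        if idx != -1:
            # Keep only the part before the zone marker.
            head = value[:idx].strip()
            break
    else:
        return None  # no zone marker found at all
    try:
        return datetime.strptime(head, _BASE_FORMAT)
    except ValueError:
        return None
```

A trailing suffix such as `+00:00` after `GMT` is simply dropped; a wrong result would still be rejected later by the signature check, as the message notes.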
Sage Weil [Mon, 12 Nov 2012 15:06:25 +0000 (07:06 -0800)]
osd: defer boot until we have rotating keys
Make sure we have our rotating keys before we start booting. This
ensures we can open connections with peers *before* we add ourselves to
the osdmap. This behavior marks instances of #3292, although it is
not clear whether it is responsible for the actual crash.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Samuel Just [Mon, 5 Nov 2012 23:40:43 +0000 (15:40 -0800)]
PG: persist divergent_priors in ondisklog
Consider the following logs:
a) 10'10(5'7) foo
12'11(4'3) bar
b) 10'10(5'7) foo
13'11(4'4) baz
When the osd with log a merges the primary's log b, bar is deleted and
added to the missing set with need=4'3 and have=0'0. If
the osd then dies after deleting bar, but before recovering
bar, PG::read_state() on start up will fail to re-add bar
to the missing set, and bar will be incorrect on that osd.
Now, (4'3, bar) will be added to the divergent_priors mapping
to be scanned during read_state along with the log.
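The scenario above can be modelled roughly as follows. This is a toy Python sketch, not the PG code: versions are `(epoch, seq)` tuples, the function name `rebuild_missing` and its arguments are hypothetical, and the on-disk state is reduced to a dict.

```python
def rebuild_missing(log, on_disk, divergent_priors=None):
    """Rebuild the missing set at startup (illustrative model only).

    log              -- list of (version, prior_version, oid) entries
    on_disk          -- dict mapping oid to the version stored locally
    divergent_priors -- dict mapping prior_version to oid, persisted
                        alongside the log
    Returns a dict mapping oid to the version that must be recovered.
    """
    # Newest log entry per object.
    latest = {}
    for version, _prior, oid in log:
        latest[oid] = version

    # Objects named in the log whose local copy is not the newest version.
    missing = {oid: v for oid, v in latest.items() if on_disk.get(oid) != v}

    # Objects that no longer appear in the log at all: without this scan
    # they would be silently forgotten after a crash.
    for prior, oid in (divergent_priors or {}).items():
        if oid not in latest and on_disk.get(oid) != prior:
            missing[oid] = prior
    return missing


# Log b after the merge; bar was deleted locally and baz not yet recovered.
merged_log = [((10, 10), (5, 7), "foo"), ((13, 11), (4, 4), "baz")]
local_objects = {"foo": (10, 10)}

without_priors = rebuild_missing(merged_log, local_objects)
with_priors = rebuild_missing(merged_log, local_objects, {(4, 3): "bar"})
```

In this model `without_priors` has no entry for bar, while `with_priors` records that bar is needed at 4'3.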
Samuel Just [Mon, 5 Nov 2012 19:33:13 +0000 (11:33 -0800)]
PG::merge_old_entry: fix case for divergent prior_version
Previously, we asserted that a log entry with a divergent
prior_version must be a clone. Consider the following
case:
6'11(6'2) m foo
7'12(6'3) m bar
7'13(7'12) m bar
If this is merged with:
6'11(6'2) m foo
8'12(6'4) m baz
we will hit the assert.
Merging a divergent entry with prior_version after current
tail, but not in the log implies that prior_version was a
divergent entry which we have already merged. The missing
set and filestore collection must therefore have already
been adjusted.
Samuel Just [Mon, 1 Oct 2012 23:11:40 +0000 (16:11 -0700)]
OSD: use PrioritizedQueue for OpWQ
The OpWQ PrioritizedQueue replaces OSD::op_queue, PG::op_queue,
and PG::qlock. The synchronization is now done as part of the
usual WorkQueue synchronization pattern.
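As a rough illustration of the idea (one queue ordered by priority, guarded by a single lock, instead of separate per-OSD and per-PG queues), here is a small Python sketch. The class `PriorityWorkQueue` is hypothetical and does not attempt to reproduce Ceph's PrioritizedQueue.

```python
import heapq
import itertools
import threading


class PriorityWorkQueue:
    """Toy work queue: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._lock = threading.Lock()   # the only lock callers depend on
        self._heap = []
        self._seq = itertools.count()   # tie-breaker that preserves FIFO order

    def enqueue(self, priority, item):
        with self._lock:
            # Negate the priority because heapq is a min-heap.
            heapq.heappush(self._heap, (-priority, next(self._seq), item))

    def dequeue(self):
        """Return the next item, or None if the queue is empty."""
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

Keeping the ordering and the locking inside one object is what lets the surrounding code drop its own queue lock.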
Sage Weil [Mon, 12 Nov 2012 23:40:08 +0000 (15:40 -0800)]
osdc/ObjectCacher: only return ENOENT if ObjectSet is flagged
The fs client can't handle ENOENT from the cache, but librbd wants it.
Also, the fs client will send down multiple ObjectExtents per io, but that
is incompatible with the ENOENT behavior.
Indicate which behavior we want via the ObjectSet, and update librbd to
explicitly ask for it. This fixes the fs client, which is currently
broken (it returns ENOENT on read).
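A minimal sketch of the per-ObjectSet switch, in Python for illustration. The flag name `return_enoent`, the function `cached_read`, and the zero-fill fallback for callers that do not ask for ENOENT are all assumptions made for this example.

```python
import errno
from dataclasses import dataclass


@dataclass
class ObjectSet:
    # Hypothetical flag: only callers that set it ever see ENOENT.
    return_enoent: bool = False


def cached_read(object_set, store, oid, length):
    """Read up to `length` bytes of `oid` from `store` (a dict of bytes).

    Returns (result, data), where result is a byte count or a negative errno.
    """
    data = store.get(oid)
    if data is None:
        if object_set.return_enoent:
            return -errno.ENOENT, b""
        # Assumed behavior for other callers: a missing object reads as zeros.
        return length, bytes(length)
    chunk = data[:length]
    return len(chunk), chunk
```

A librbd-like caller would construct `ObjectSet(return_enoent=True)`, while an fs-client-like caller would leave the default and never see the error.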
Josh Durgin [Mon, 12 Nov 2012 21:59:36 +0000 (13:59 -0800)]
librbd: fix create existence checking
cda9e516b8bb09b8846814cc8d4ee2879a53b2d5 made us return 0 when the
image already existed, causing copy to erroneously ignore an existing
image. Separate the case where we know the image exists from being
unable to tell whether it exists because of e.g. an authentication
problem.
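The three-way distinction can be sketched like this, in Python for illustration. `create_image` and the `stat_image` callback are hypothetical names, and returning -EEXIST for a known-existing image is an assumption about the intended result.

```python
import errno


def create_image(name, stat_image):
    """Decide whether creation may proceed.

    stat_image(name) is assumed to return 0 if the image exists, -ENOENT if
    it does not, and another negative errno if the check itself failed.
    """
    r = stat_image(name)
    if r == 0:
        return -errno.EEXIST   # we know the image exists
    if r != -errno.ENOENT:
        return r               # we cannot tell; propagate the failure
    # Only here is it safe to go ahead and create the image.
    return 0
```

A copy operation built on top of this can then treat -EEXIST and, say, -EACCES differently instead of seeing success in both cases.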
Sage Weil [Mon, 12 Nov 2012 16:56:45 +0000 (08:56 -0800)]
debug: adjust default debug levels
Trim out most noise, keep things that are interesting.
Notably, we are logging each message sent and received, and we are logging
the filestore operations when they get queued. Those may still benefit
from being turned off in high IOPS environments.
Sage Weil [Sat, 10 Nov 2012 10:35:04 +0000 (02:35 -0800)]
client: simplify/fix symlink loop check
Checking whether we have already visited a symlink isn't correct; for
example, the below is valid, and we visit /a/b twice.
/a/b -> c
/a/c/d -> /a/b
In order to do this "correctly", I think we would need to track the pairs
of paths and symlinks we are resolving. But, reading the man pages,
ELOOP is actually just defined as traversing more than MAXSYMLINKS syms.
(It appears to be 20 on my machine.)
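A sketch of the counting approach in Python, for illustration only. The `resolve` function and the dict-based symlink table are hypothetical; the limit of 20 is the value quoted above, not a portable constant.

```python
import errno

MAXSYMLINKS = 20  # value quoted in the message; platform dependent


def resolve(path, symlinks, max_symlinks=MAXSYMLINKS):
    """Resolve an absolute path against a {link path: target} table.

    Raises OSError(ELOOP) after following more than max_symlinks links,
    regardless of whether any single link was seen more than once.
    """
    pending = [c for c in path.split("/") if c]
    resolved = []
    followed = 0
    while pending:
        comp = pending.pop(0)
        if comp == ".":
            continue
        if comp == "..":
            if resolved:
                resolved.pop()
            continue
        candidate = "/" + "/".join(resolved + [comp])
        target = symlinks.get(candidate)
        if target is None:
            resolved.append(comp)   # ordinary component
            continue
        followed += 1
        if followed > max_symlinks:
            raise OSError(errno.ELOOP, "too many levels of symbolic links")
        if target.startswith("/"):
            resolved = []           # absolute target restarts from the root
        # Splice the target's components in front of what is left to walk.
        pending = [c for c in target.split("/") if c] + pending
    return "/" + "/".join(resolved)


links = {"/a/b": "c", "/a/c/d": "/a/b"}
valid = resolve("/a/b/d/x", links)
```

With the two links from the message, `/a/b/d/x` passes through `/a/b` twice and still resolves, whereas a link that points at itself runs into the limit.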
Josh Durgin [Thu, 8 Nov 2012 02:19:07 +0000 (18:19 -0800)]
OSDMonitor: remove max_devices and max_osd interdependency
Higher max_osd than max_devices doesn't hurt anything (and is the
normal way to add more osds). With max_devices higher than max_osd, the
extra devices are filtered out of crush results since e541c0f8d871172ec61962372efca943308e5fe,
so they don't matter either.
Sage Weil [Fri, 9 Nov 2012 13:28:12 +0000 (05:28 -0800)]
mds: re-try_set_loner() after doing evals in eval(CInode*, int mask)
Consider a case where current loner is A and wanted loner is B.
At the top of the function we try to set the loner, but that may fail
because we haven't processed the gathered caps yet for the previous
loner. In the body we do that and potentially drop the old loner, but we
do not try_set_loner() again on the desired loner.
Try after our drop. If it succeeds, loop through the evals one more time
so that we can issue caps appropriately.
This fixes a hang induced by a simple loop like:
while true ; do echo asdf >> mnt.a/foo ; tail mnt.b/foo ; done &
while true ; do ls mnt.a mnt.b ; done
Gary Lowell [Fri, 9 Nov 2012 21:28:13 +0000 (13:28 -0800)]
ceph.spec.in: Build debuginfo subpackage.
This is a partial fix for bug 3471. Enable building of debuginfo package.
Some distributions enable this automatically by installing additional rpm
macros; on others it needs to be explicitly added to the spec file.
Some users have been running into problems adding new monitors while
following these steps. Some of these problems are due to the meaning of
'{path}' being a bit ambiguous. This patch removes said ambiguity by
replacing '{path}' with '{tmp}', supposed to be a temporary directory
containing the files necessary to add the monitor (monmap and keyring).
Fixes: #3438 #3463
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 9 Nov 2012 14:48:10 +0000 (06:48 -0800)]
client: do not gratuitously drop FILE_CACHE ref in _read()
The get_caps() had a confusing out-arg called "got" that is really what
caps we *have*; it only takes a ref on the *need* cap. We should only
put that one explicitly (CEPH_CAP_FILE_RD). The _write() method already
does this properly, but _read() did not.
Fixes: #3470
Signed-off-by: Sage Weil <sage@inktank.com>
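The ref-counting rule for _read() can be illustrated with a toy Python model. The cap bit values, the `Inode` class, and the `read` helper are all hypothetical; only the rule itself (a ref is taken on *need*, so only *need* is put) comes from the message.

```python
from collections import Counter

# Illustrative bit values, not Ceph's real cap encoding.
CAP_FILE_RD = 0x1
CAP_FILE_CACHE = 0x2


class Inode:
    def __init__(self, issued):
        self.issued = issued
        self.refs = Counter()

    def get_caps(self, need, want):
        """Take a ref on `need` only; return the caps we actually hold."""
        if self.issued & need != need:
            raise RuntimeError("needed caps not issued")
        self.refs[need] += 1
        return self.issued & (need | want)

    def put_cap_ref(self, cap):
        if self.refs[cap] <= 0:
            raise RuntimeError("dropping a ref that was never taken")
        self.refs[cap] -= 1


def read(inode):
    have = inode.get_caps(CAP_FILE_RD, CAP_FILE_CACHE)
    use_cache = bool(have & CAP_FILE_CACHE)
    # ... the read itself would happen here ...
    inode.put_cap_ref(CAP_FILE_RD)   # put only what get_caps() took
    return use_cache
```

In this model, also calling `put_cap_ref(CAP_FILE_CACHE)` in `read` would raise, which mirrors the gratuitous drop the patch removes.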