Sage Weil [Fri, 15 Apr 2011 20:53:54 +0000 (13:53 -0700)]
mds: fix export cancel during IMPORT_PREPPING
If we are in PREPPING, we need to drop the stickydirs() on the inodes, and
not the pins on the dirfrags. Do this in the helper so we can keep the
call chains simple.
Also deal with the case where we get a cancel in PREPPED state.
Sage Weil [Thu, 14 Apr 2011 01:36:33 +0000 (18:36 -0700)]
mds: cancel exports in PREPPING state on any failure
The prepping nodes may need to discover bounds from the failed node and
may hang indefinitely. Meanwhile, we won't send out mds_resolve messages
until in-progress migrations complete. Deadlock.
In certain cases the importing node can manufacture the replica. If it
doesn't realize that right off, though, it will get hung up trying to
discover from the wrong node, get referred to the failed node, and block
waiting for recovery. The replica forging is a bit suspect anyway, so
let's avoid the whole thing if we can!
Sage Weil [Fri, 15 Apr 2011 17:02:46 +0000 (10:02 -0700)]
mds: don't skip inodes in journal that may be trimmed during replay
During replay we trim non-auth inodes on EExport or EImportFinish abort.
Subtree trimming may be delayed, too.
Skip parents if the diri is in the same blob, or if it is journaled in the
current segment *and* it is in a subtree that is unambiguously auth. We can't
easily be more precise than that because the actual event we care about on
replay is EExport, but the migrator doesn't twiddle auth bits to false until
later.
Also, reset last_journaled on import.
This fixes replay bugs like:
2011-04-13 18:15:18.064029 7f65588ef710 mds1.journal EImportStart.replay 10000000015 bounds []
2011-04-13 18:15:18.064034 7f65588ef710 mds1.journal EMetaBlob.replay 2 dirlumps by unknown0
2011-04-13 18:15:18.064040 7f65588ef710 mds1.journal EMetaBlob.replay dir 10000000010
2011-04-13 18:15:18.064046 7f65588ef710 mds1.journal EMetaBlob.replay missing dir ino 10000000010
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)', in thread '0x7f65588ef710'
mds/journal.cc: 407: FAILED assert(0)
ceph version 0.25-683-g653580a (commit:653580ae84c471c34872f14a0308c78af71f7243)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa53d26]
2: (EMetaBlob::replay(MDS*, LogSegment*)+0x7eb) [0x7a737d]
Fixes: #994
Signed-off-by: Sage Weil <sage@newdream.net>
Stop accepting old-style section names of the form $type$id. Instead,
we want section names of the form $type.$id. So [osd0] will no longer
be a valid section name; instead, use [osd.0].
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
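The naming rule above can be illustrated with a small sketch. This is not the Ceph config parser. The list of daemon types, the restriction of old-style ids to digits, and the helper names `classify_section` and `to_new_style` are all assumptions made for the example.

```python
import re

# Assumed set of daemon types, for illustration only.
KNOWN_TYPES = ("mon", "mds", "osd", "client")
_TYPES = "|".join(KNOWN_TYPES)

# New style: $type.$id, e.g. "osd.0".
NEW_STYLE = re.compile(rf"^({_TYPES})\.(\S+)$")
# Old style: $type$id, e.g. "osd0". Only numeric ids are recognized here
# (an assumption of this sketch).
OLD_STYLE = re.compile(rf"^({_TYPES})(\d+)$")


def classify_section(name):
    """Return "new", "old", or "other" for a section name."""
    if NEW_STYLE.match(name):
        return "new"
    if OLD_STYLE.match(name):
        return "old"
    return "other"


def to_new_style(name):
    """Rewrite an old-style name such as "osd0" as "osd.0"."""
    m = OLD_STYLE.match(name)
    return f"{m.group(1)}.{m.group(2)}" if m else name
```

A migration script could run `to_new_style` over the section headers of an existing config file. Names that are not old-style, such as a bare `osd` or `global`, pass through unchanged.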
Rados Gateway: get rid of RGWOp::err. We already have req_state::err and
that represents the same thing.
Standardize nomenclature for errors. 'errno' is our internal
representation of the error. 'code' is what is returned by S3.
'message' is the message at the end. Improve rgw_err.
dump_errno shouldn't modify req_state, but just dump the error.
A new function set_req_state_err sets the error based on an 'errno'.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
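The split of responsibilities described above might look roughly like the following. It is a sketch only: the field names, the lookup table, and the choice of which internal errno maps to which S3 code are assumptions, not the contents of rgw_err.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Err:
    http_status: int = 200  # assumed field, not named in the log
    code: str = ""          # what is returned by S3
    message: str = ""


@dataclass
class ReqState:
    err: Err = field(default_factory=Err)


# Assumed mapping from internal errno to (HTTP status, S3 code).
ERR_TABLE = {
    errno.ENOENT: (404, "NoSuchKey"),
    errno.EACCES: (403, "AccessDenied"),
}


def set_req_state_err(s, err_no, message=""):
    """Set the request error from an internal errno (sign is ignored)."""
    status, code = ERR_TABLE.get(abs(err_no), (500, "InternalError"))
    s.err = Err(status, code, message)


def dump_errno(s):
    """Format the stored error without modifying the request state."""
    return f"{s.err.http_status} {s.err.code}"
```

Keeping `dump_errno` read-only means it can be called any number of times, for logging or for the response, without changing what the client eventually sees.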
Reading a config file into any md_config_t structure except g_conf used
to be impossible. This is because the config_option code used to
contain explicit references to g_conf. Those have been removed, so now
any md_config_t should be able to read a configuration file.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 13 Apr 2011 03:57:11 +0000 (20:57 -0700)]
mds: fix resolve
This was broken by a01fba175b646f6, which changed an ambiguous import
from CDIR_AUTH_UNKNOWN to <whoami,whoami> without updating
disambiguate_imports accordingly. As a result, different nodes reached
inconsistent conclusions about subtree ownership.
This updates disambiguate_imports to match that EImportStart::replay
change.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 12 Apr 2011 22:32:17 +0000 (15:32 -0700)]
mds: fix _freeze_dir assert for refragment case
is_freezeable_dir() is true at freeze time, but not necessarily for the
rest of the lifetime of the freeze. We may split later on and call
_freeze_dir on the new fragments, so this assertion isn't necessarily
true then.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Since the object store is ultimately based on ext3, ext4, or btrfs, and
object names ultimately get translated into file names, we need to
impose a corresponding limit on the length of ceph object names.
Otherwise, the "writeback" thread in the FileStore gets ENAMETOOLONG,
and the transaction does not succeed, even though we journalled it.
Perhaps we will extend or eliminate MAX_CEPH_OBJECT_NAME_LEN at some
point by using prehashing or some other technique. Until then, we need
to be sure to check for this.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
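A minimal sketch of such a check follows. The limit used here is a placeholder, since the log does not state the real value of MAX_CEPH_OBJECT_NAME_LEN, and `check_object_name` is a hypothetical helper. The point is to reject the name up front, before anything is journalled.

```python
import errno

# Placeholder value. The real MAX_CEPH_OBJECT_NAME_LEN is not given in the log.
MAX_OBJECT_NAME_LEN = 100


def check_object_name(name, max_len=MAX_OBJECT_NAME_LEN):
    """Return 0 if the name fits, or -ENAMETOOLONG if it is too long.

    The length is measured in bytes, since file name limits on the
    underlying file system are byte limits.
    """
    if len(name.encode("utf-8")) > max_len:
        return -errno.ENAMETOOLONG
    return 0
```

Failing early turns a late writeback error into an ordinary error returned to the caller.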
Sage Weil [Tue, 12 Apr 2011 21:13:56 +0000 (14:13 -0700)]
mon: simplify mds follow checks
Instead of assigning followers in the last_beacon laggy check loop, do it
at the end, the same way we let standby nodes take over.
This also fixes a bug where a non-standby node (say, up:replay) that used
to be up:standby-replay and has standby_for_rank set gets reset back to
up:standby-replay.
Fixes: #1001
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 12 Apr 2011 18:07:54 +0000 (11:07 -0700)]
mds: fix create_mydir_hierarchy to save dir
Mark the dentries dirty so they get saved to disk (they're not journaled!).
This fixes rstat problems on startup, where populate_mydir was recreating
the entries and munging rstats accordingly.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
librbd: check for out of bounds I/O in all forms of read/write
This was found by qemu-io tests, which tried to read and write past
the end of an rbd image. The test hung waiting for a completion that
was never scheduled, since it did not check the return value of
rbd_aio_write.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
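Both halves of this problem, the bounds check and the caller that must look at the return value, can be sketched as follows. This is a toy model, not librbd: `check_io` and `aio_write` are hypothetical stand-ins, and returning -EINVAL for an out-of-bounds request is an assumption.

```python
import errno


def check_io(image_size, offset, length):
    """Return 0 if [offset, offset + length) lies inside the image."""
    if offset < 0 or length < 0:
        return -errno.EINVAL
    if offset > image_size or length > image_size - offset:
        return -errno.EINVAL
    return 0


def aio_write(image_size, offset, data, on_complete):
    """Toy asynchronous write. The callback fires only if the I/O was accepted."""
    r = check_io(image_size, offset, len(data))
    if r < 0:
        return r  # rejected: no completion will ever be scheduled
    on_complete(len(data))  # a real implementation would complete later
    return 0
```

A caller that waits on the completion without first checking the return value of `aio_write` would wait forever on a rejected request, which matches the hang described above.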
CDir: check_rstats will now print out dir stats whenever there's a bug.
Previously it only printed out dir stats at high debug levels, which
meant you could get output of the dentries without seeing what the
totals were supposed to be!
Sage Weil [Thu, 7 Apr 2011 21:57:41 +0000 (14:57 -0700)]
osd: more futzing with stat
We can get here when the object doesn't exist if the client specifies
may read and may write (in this case, Filer::probe). Look at the exists
bit in the object context. If the object is supposed to exist, we should
never get ENOENT from stat.
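The logic described here can be sketched as below. It is not the OSD code: `stat_object`, its arguments, and the (result, size) return shape are assumptions made for illustration.

```python
import errno


def stat_object(obc_exists, store_stat):
    """Consult the exists bit before trusting the result of stat.

    obc_exists: the exists bit from the object context.
    store_stat: hypothetical callable returning (result, size).
    """
    if not obc_exists:
        # The object legitimately does not exist, so skip the stat call.
        return -errno.ENOENT, None
    r, size = store_stat()
    # The object is supposed to exist, so ENOENT here would indicate a bug.
    assert r != -errno.ENOENT, "object context says exists, stat says ENOENT"
    return r, size
```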