John Spray [Tue, 6 May 2014 12:18:03 +0000 (13:18 +0100)]
mds: add atomic log rewrite on format change
Two main pieces to this:
* A new JournalPointer object that stores two journal
inodes so that we can do a double-buffered update,
followed by an atomic swap.
* An extended recovery process in MDLog that dereferences
the JournalPointer and conditionally rewrites the
journal to accommodate format updates.
The JournalPointer indirection should also be useful for
making cephfs-journal-tool do updates more safely.
Signed-off-by: John Spray <john.spray@inktank.com>
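The double-buffered update described above can be sketched roughly as follows. This is a minimal Python illustration, not the MDS code: the `JournalPointer` class, its `front`/`back` fields, and the `rewrite_journal` helper are hypothetical names, and an in-memory dict stands in for the object store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class JournalPointer:
    """Hypothetical stand-in: names the live journal and a scratch journal."""
    front: int                  # inode of the journal currently in use
    back: Optional[int] = None  # inode of a rewrite in progress, if any


def rewrite_journal(store: Dict[int, List[str]], ptr: JournalPointer,
                    new_ino: int,
                    convert: Callable[[str], str]) -> JournalPointer:
    """Rewrite the front journal into new_ino, then swap the pointer."""
    # 1. Record the scratch journal first, so that an interrupted rewrite
    #    can be detected and cleaned up on the next recovery.
    ptr.back = new_ino
    # 2. Build the new journal; the front journal is left untouched.
    store[new_ino] = [convert(entry) for entry in store[ptr.front]]
    # 3. Swap: a single pointer update makes the new journal live.
    old_ino = ptr.front
    ptr.front, ptr.back = new_ino, None
    # 4. Only now is it safe to discard the old journal.
    del store[old_ino]
    return ptr
```

The point of the indirection is that readers only ever follow `front`, so a crash before step 3 leaves the old journal fully usable.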
John Spray [Thu, 1 May 2014 11:54:14 +0000 (12:54 +0100)]
osdc: Clean up journalstream readable check
Fix redundant (and subtly incorrect) calculation of
the number of bytes needed. It worked because waiting
for a few more bytes before reading the entry size
of an old-format entry was harmless.
Signed-off-by: John Spray <john.spray@inktank.com>
John Spray [Tue, 25 Mar 2014 13:31:24 +0000 (13:31 +0000)]
mds: Fix Dumper::undump (missing lock)
Two problems were causing undump to fail:
* Objecter lock was not being taken around the .write()
and .write_full() calls, causing an assertion failure.
* Once that is fixed, it is necessary to use a separate,
local lock to protect the completion condition for
write operations.
Signed-off-by: John Spray <john.spray@inktank.com>
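The locking pattern from the two points above might look like this in outline. It is a hedged Python sketch, not the Dumper code: `FakeObjecter` and `write_and_wait` are hypothetical names, and a background thread stands in for asynchronous I/O completion.

```python
import threading


class FakeObjecter:
    """Hypothetical stand-in for a client whose calls require its lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}

    def write_full(self, oid, payload, on_done):
        # Mirrors the assertion described above: caller must hold the lock.
        assert self.lock.locked(), "client lock must be held"
        self.data[oid] = payload
        # Complete asynchronously, as a real I/O path would.
        threading.Thread(target=on_done).start()


def write_and_wait(objecter, oid, payload, timeout=5.0):
    done = False
    # Local lock/condition that protects only the completion flag.
    cond = threading.Condition(threading.Lock())

    def on_done():
        nonlocal done
        with cond:
            done = True
            cond.notify_all()

    with objecter.lock:  # held only while submitting the write
        objecter.write_full(oid, payload, on_done)

    with cond:  # wait without holding the client lock
        return cond.wait_for(lambda: done, timeout=timeout)
```

Keeping the completion condition under its own lock means the waiter never blocks the client lock while the write is in flight.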
John Spray [Fri, 2 May 2014 15:09:29 +0000 (16:09 +0100)]
mds: Refactor CInode encoding into CInodeStore
CInode itself combined the on-disk format and
encode/decode logic with lots of other complex
behaviours. This separates the simple parts
out so that they can be used by other tools that
are interested in looking at inodes outside of
a running MDS.
There is a small overhead because CInodeStore
can't decode a SnapRealm inline, so it keeps
a temporary copy of the encoded bufferlist.
Signed-off-by: John Spray <john.spray@inktank.com>
John Spray [Tue, 25 Mar 2014 13:31:03 +0000 (13:31 +0000)]
mds: Add get_metablob to LogEvent
Previously the only way to get at the payload
of things like EUpdate and EOpen was to replay() them
(required a full running MDS) or to use downcasting
(yuck).
Signed-off-by: John Spray <john.spray@inktank.com>
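As a rough illustration of why an accessor on the base class helps, the following Python sketch contrasts it with downcasting. The class bodies are hypothetical stand-ins (`ENoPayload` in particular is invented for the example); only the names `LogEvent`, `EUpdate`, and `get_metablob` come from the commit message.

```python
from typing import List, Optional


class EMetaBlob:
    """Hypothetical stand-in for the metadata payload of an event."""

    def __init__(self, description: str):
        self.description = description


class LogEvent:
    def get_metablob(self) -> Optional[EMetaBlob]:
        # Default for event types that carry no metablob.
        return None


class EUpdate(LogEvent):
    def __init__(self, metablob: EMetaBlob):
        self.metablob = metablob

    def get_metablob(self) -> Optional[EMetaBlob]:
        return self.metablob


class ENoPayload(LogEvent):
    """Invented event type that has no metablob."""


def describe(events: List[LogEvent]) -> List[str]:
    # No type checks and no replay are needed to reach the payload.
    out = []
    for event in events:
        blob = event.get_metablob()
        if blob is not None:
            out.append(blob.description)
    return out
```

An offline tool can then walk a list of decoded events and inspect payloads without knowing each concrete event type.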
John Spray [Tue, 25 Mar 2014 13:30:50 +0000 (13:30 +0000)]
osdc: Revise Journaler format
* Separate journal encoding/envelope format
code (JournalStream) from I/O code (Journaler)
* Add new sentinel and start_ptr fields to
prefix and suffix of log events.
* Add journal encoding version to journal header
Signed-off-by: John Spray <john.spray@inktank.com>
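One way to picture an envelope with a sentinel prefix and a start_ptr suffix is the Python sketch below. The exact layout is an assumption made for illustration (a 64-bit sentinel, a 32-bit payload length, the payload, then a 64-bit start_ptr), the sentinel value is a placeholder, and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

SENTINEL = 0x0123456789ABCDEF   # placeholder value, not Ceph's
PREFIX = struct.Struct("<QI")   # sentinel (u64), payload length (u32)
SUFFIX = struct.Struct("<Q")    # start_ptr (u64): offset where the entry began


def write_entry(payload: bytes, start_ptr: int) -> bytes:
    """Wrap a payload in the assumed envelope."""
    return PREFIX.pack(SENTINEL, len(payload)) + payload + SUFFIX.pack(start_ptr)


def bytes_needed(buf: bytes) -> int:
    """Return how many more bytes are required before an entry can be read."""
    if len(buf) < PREFIX.size:
        return PREFIX.size - len(buf)
    _, length = PREFIX.unpack_from(buf)
    total = PREFIX.size + length + SUFFIX.size
    return max(0, total - len(buf))


def read_entry(buf: bytes) -> Optional[Tuple[bytes, int]]:
    """Return (payload, start_ptr), or None if the entry is incomplete."""
    if bytes_needed(buf) > 0:
        return None
    sentinel, length = PREFIX.unpack_from(buf)
    if sentinel != SENTINEL:
        raise ValueError("bad sentinel: stream is corrupt or misaligned")
    payload = buf[PREFIX.size:PREFIX.size + length]
    (start_ptr,) = SUFFIX.unpack_from(buf, PREFIX.size + length)
    return payload, start_ptr
```

Keeping the envelope code free of I/O, as the first bullet describes, is what lets a readable check like `bytes_needed` be exercised on plain byte strings.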
John Spray [Tue, 29 Apr 2014 11:18:09 +0000 (12:18 +0100)]
objecter: Don't warn on multiple admin sockets
Suppress messages about failure to register admin sockets
if they are EEXIST, because this is a case that can occur
naturally if multiple objecter/librados clients are instantiated
within the same process.
Signed-off-by: John Spray <john.spray@inktank.com>
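The suppression amounts to treating EEXIST as an expected outcome. A minimal Python sketch of that idea follows; the `AdminSocket` class and `register_quietly` function are hypothetical and do not reflect the objecter's actual interface.

```python
import errno
import logging

log = logging.getLogger("objecter-sketch")


class AdminSocket:
    """Hypothetical registry that rejects duplicate command names."""

    def __init__(self):
        self.commands = {}

    def register_command(self, name, handler):
        if name in self.commands:
            return -errno.EEXIST
        self.commands[name] = handler
        return 0


def register_quietly(sock, name, handler):
    ret = sock.register_command(name, handler)
    if ret < 0 and ret != -errno.EEXIST:
        # Only unexpected failures are worth reporting.
        log.warning("error registering admin socket command %s: %d", name, ret)
    return ret
```

The return value is still passed through, so a caller that cares about duplicates can check for it explicitly.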
Ilya Dryomov [Fri, 16 May 2014 15:03:13 +0000 (19:03 +0400)]
OSDMonitor: set next commit in mon primary-affinity reply
Commit 8c5c55c8b47e ("mon: set next commit in mon command replies")
fixed MMonCommand replies to include the right version, but the
primary-affinity handler was authored before that. Fix it.
Dmitry Smirnov [Fri, 16 May 2014 10:26:38 +0000 (20:26 +1000)]
sample.ceph.conf: minor update
* Moved filestore settings above the [osd.*] declarations; otherwise
(if uncommented) those settings might be applied only to the last
OSD, which is not very obvious.
* Added a few options.
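The ordering pitfall is easy to reproduce with any INI-style parser. The sketch below uses Python's `configparser` purely as an illustration; it is not Ceph's configuration parser, the host names are made up, and the option values are arbitrary.

```python
import configparser

# A setting placed after the last [osd.N] header belongs to that section.
MISPLACED = """
[osd]
osd_journal_size = 1024

[osd.0]
host = alpha

[osd.1]
host = beta
filestore_max_sync_interval = 5
"""

# Moving it above the per-daemon sections puts it in [osd].
FIXED = """
[osd]
osd_journal_size = 1024
filestore_max_sync_interval = 5

[osd.0]
host = alpha

[osd.1]
host = beta
"""


def sections_with(text, option):
    """Return the sections in which the option is literally defined."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [s for s in parser.sections() if parser.has_option(s, option)]
```

With `MISPLACED`, the option is defined only under `osd.1`; with `FIXED`, it sits in the shared `[osd]` section.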
John Spray [Tue, 13 May 2014 16:32:03 +0000 (17:32 +0100)]
doc: update instructions for RPM distros
Fix RPM building instructions: this has been broken since
libs3 was included inline in the ceph repo as a submodule.
"rpmbuild -tb" was concatenating the ceph.spec and
libs3.spec files, resulting in something that didn't work.
Also, the instructions suggested downloading a .tar.gz file
whereas the specfile requires a .tar.bz2 file.
Also, add a convenient yum command line for getting the compile
dependencies on Fedora 20.
Signed-off-by: John Spray <john.spray@inktank.com>
Dmitry Smirnov [Mon, 12 May 2014 04:08:44 +0000 (14:08 +1000)]
prioritise use of `javac` executable (gcj provides it through alternatives).
On Debian this fixes FTBFS when gcj-jdk and openjdk-7-jdk are installed at
the same time, because the build system will use the default `javac`
executable provided by the current JDK through `update-alternatives`
instead of blindly calling GCJ when it is present.
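The lookup order can be illustrated with a short Python sketch. The real change lives in the build system, so this only shows the "prefer `javac`, fall back to `gcj`" idea; `pick_java_compiler` and its parameters are hypothetical.

```python
import shutil
from typing import Callable, Optional, Sequence


def pick_java_compiler(
        candidates: Sequence[str] = ("javac", "gcj"),
        which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

Because `javac` is tried first, whichever JDK the system's alternatives mechanism selects wins over GCJ whenever both are installed.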
Revert commit 40d56a97 (mds: optimize EMetaBlob::fullbit, remotebit,
nullbit encoding). This optimization creates small segments in the
bufferlist that results from encoding EMetaBlob. Perf shows that a lot
of CPU time is spent allocating list nodes for the bufferlist.
Yan, Zheng [Mon, 12 May 2014 02:24:51 +0000 (10:24 +0800)]
mds: properly clear new flag for stale client cap
CInode::encode_inodestat() should clear the 'new' flag of the client
cap even when the session is stale, because the 'new' flag prevents
Locker::issue_caps() from sending a cap message to the client.
Danny Al-Gaaf [Mon, 12 May 2014 00:33:44 +0000 (02:33 +0200)]
BtrfsFileStoreBackend.cc: fix ::unlinkat() result handling
Check the return value of the ::unlinkat() call instead of 'fd'.
Fix for:
[src/os/BtrfsFileStoreBackend.cc:72] -> [src/os/BtrfsFileStoreBackend.cc:74]:
(warning) Opposite conditions in nested 'if' blocks lead to a dead code block.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 12 May 2014 00:01:10 +0000 (02:01 +0200)]
rgw_user.cc: cleanup RGWAccessKeyPool::check_op()
Remove dead assignment and unused variable 'secret_key'. Check
op_state.get_access_key() directly for emptiness without extra
variable. Fix comment above check for access key.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Sun, 11 May 2014 23:26:56 +0000 (01:26 +0200)]
MDBalancer.cc: remove code unused since 2009
Remove some long-unused code and variables (commented out
since 2009).
Fix for:
[src/mds/MDBalancer.cc:757]: (style) Variable 'total_sent' is
assigned a value that is never used.
[src/mds/MDBalancer.cc:665]: (style) Variable 'total_goal' is
assigned a value that is never used.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>