mon: OSDMonitor: split 'osd pool set' out of 'prepare_command'
We should start doing this across the whole 'prepare_command' function.
Makes it prettier to the reader, and easier to add new code.
Change the command to send a string instead of an int to allow us to have
non-integer pool parameters that can be modified. Support input JSON with
both int and string values so that we work with all flavors of client.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
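To illustrate accepting both int and string values in the input JSON, here is a minimal Python sketch rather than the monitor's actual C++ code. The helper name `pool_set_value_as_string` and the field name `val` are assumptions made for this example only.

```python
import json


def pool_set_value_as_string(cmd_json: str) -> str:
    """Return the value of a pool-set style command as a string.

    Hypothetical helper: the 'val' field name and the JSON layout are
    assumptions for illustration, not the real command schema.
    """
    cmd = json.loads(cmd_json)
    val = cmd["val"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(val, bool) or not isinstance(val, (int, str)):
        raise ValueError(f"unsupported value type: {type(val).__name__}")
    # Older clients send an int, newer ones a string; normalize to a string.
    return str(val)
```

Normalizing to a string at the boundary means the rest of the handler only has to deal with one type, whichever flavor of client sent the command.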
Yan, Zheng [Thu, 10 Oct 2013 02:35:48 +0000 (10:35 +0800)]
mds: fix infinite loop of MDCache::populate_mydir().
Make MDCache::populate_mydir() fetch only bare-bones stray dirs.
After all stray dirs are populated, call MDCache::scan_stray_dir(),
which fetches the incomplete stray dirs.
Sandon Van Ness [Tue, 8 Oct 2013 18:58:57 +0000 (11:58 -0700)]
Go back to $PWD in fsstress.sh if compiling from source.
Although fsstress was called with a static path, the directory it wrote
to was the current directory. The script did a cd into the source
directory created in /tmp and later removed that directory, so fsstress
could not write its files: its working directory no longer existed.
This change gets the current path first and cd's back into it after
it is done compiling fsstress.
Issue #6479.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
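The pattern described above (remember the current directory, build in a temporary directory, return before it is removed) can be sketched in Python. This is an illustration only, not the fsstress.sh change itself; the name `build_in_temp_dir` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def build_in_temp_dir():
    """Work inside a throwaway directory, then return to the caller's directory."""
    original_dir = os.getcwd()  # remember where we started
    with tempfile.TemporaryDirectory() as build_dir:
        os.chdir(build_dir)
        try:
            yield build_dir
        finally:
            # Go back before the temporary directory is removed, so later
            # relative writes do not target a non-existent directory.
            os.chdir(original_dir)
```

Restoring the directory in a `finally` clause means the caller ends up in its original directory even if the build step raises.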
Greg Farnum [Mon, 7 Oct 2013 20:11:21 +0000 (13:11 -0700)]
ReplicatedPG: copy: use aggregate return code instead of individual Op return
It appears that the OSD is not filling in the individual return codes, and they
should be equivalent for all purposes we care about here (the only Op we are
doing is the copy-get, and if it fails we are getting its failure code).
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 7 Oct 2013 12:22:20 +0000 (05:22 -0700)]
os/FileStore: fix ENOENT error code for getattrs()
In commit dc0dfb9e01d593afdd430ca776cf4da2c2240a20 the omap xattrs code
moved up a block and r was no longer local to the block. Translate
ENOENT -> 0 to compensate.
Fix the same error in _rmattrs().
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Fix mon double-free when dropping unhandled messages, and allow "get monmap" messages to go through without authenticating for MonClient::get_monmap_privately()
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Thu, 3 Oct 2013 23:30:29 +0000 (16:30 -0700)]
common/bloom_filter: drop raw_table_size_ member
We were storing table_size_ and raw_table_size_, where one is the size in
bits and the other is the size in bytes. This is silly. Store only the
size in bytes.
Also, bytes are always 8 bits, so use bit shifts and drop some of that
silliness too.
Move the member declarations to the top of the class so you read them
before the methods.
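As a rough illustration of keeping only the byte size and deriving everything else with bit shifts, here is a small Python sketch. It is not the bloom_filter code; the class and method names are hypothetical.

```python
class BitTable:
    """Bit table that stores only its size in bytes (illustrative names)."""

    def __init__(self, size_in_bytes: int) -> None:
        # Only the byte-sized table is kept; the bit count is derived.
        self.table = bytearray(size_in_bytes)

    def size_in_bits(self) -> int:
        # A byte is 8 bits, so multiplying by 8 is a left shift by 3.
        return len(self.table) << 3

    def set_bit(self, bit_index: int) -> None:
        # bit_index >> 3 selects the byte, bit_index & 7 the bit within it.
        self.table[bit_index >> 3] |= 1 << (bit_index & 7)

    def test_bit(self, bit_index: int) -> bool:
        return bool(self.table[bit_index >> 3] & (1 << (bit_index & 7)))
```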
David Zafman [Mon, 30 Sep 2013 22:53:35 +0000 (15:53 -0700)]
common, os: Perform xattr handling based on detected fs type
In FileStore::_detect_fs() store discovered filesystem type in m_fs_type
Add per-filesystem filestore_max_inline_xattr_size_* variants
Add per-filesystem filestore_max_inline_xattrs_* variants
New function set_xattr_limits_via_conf()
Set m_filestore_max_inline_xattr_size based on override or fs type
Set m_filestore_max_inline_xattrs based on override or fs type
Handle conf change of any relevant value by calling set_xattr_limits_via_conf()
Change filestore_max_inline_xattr_size to override if non-zero
Change filestore_max_inline_xattrs to override if non-zero
Fixes: #6143
Signed-off-by: David Zafman <david.zafman@inktank.com>
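The "override if non-zero, otherwise use the per-filesystem value" rule can be sketched as follows. This is a simplified Python illustration; the function name, the fallback argument, and the numbers in the usage note are placeholders, not Ceph's real defaults.

```python
from typing import Dict


def effective_xattr_limit(override: int, fs_type: str,
                          per_fs_limits: Dict[str, int], fallback: int) -> int:
    """Pick the limit to use for the detected filesystem type.

    Hypothetical helper: a non-zero override wins; otherwise the value
    configured for the detected filesystem is used, with a fallback for
    filesystem types that have no entry.
    """
    if override != 0:
        return override
    return per_fs_limits.get(fs_type, fallback)
```

For example, with placeholder limits `{"xfs": 100, "btrfs": 200}`, an override of 0 on xfs yields 100, while an override of 512 yields 512 regardless of the filesystem. The same selection would be repeated whenever a relevant config value changes.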
Sage Weil [Fri, 4 Oct 2013 04:27:36 +0000 (21:27 -0700)]
osd/ReplicatedPG: fix null deref on rollback_to whiteout check
Bring this whole if/else chain up one level so that we can capture both
ENOENT and whiteout in the same case. (And don't dereference the
pointer when we know it is NULL.)
Fixes: #6474
Signed-off-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 3 Oct 2013 00:12:06 +0000 (17:12 -0700)]
TrackedOp: specify queue sizes and warnings on a per-tracker basis
If we have multiple trackers in a daemon, we want to be able to configure
them separately. Plus, users already know how to control op sizes in the
OSD, so changing the config options (as we did in a8bbb81b7b7b6420ea08bc4e99a39adc6c3c397a)
is not really appropriate. Instead, provide setters which can be called
at construction time (or on any other change) and use them in the OSD with
the configurables we had previously. Add an observer so you can continue
to change them at run-time.
mon: Monitor: drop client msg if no session exists and msg is not MAuth
If we are not a monitor and we don't have a session yet, we must first
authenticate with the cluster. Therefore, the first message to the
monitor must be an MAuth. If not, we assume it's a stray message and
just drop it.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
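The drop rule described above can be expressed as a small predicate. This Python sketch is illustrative only; the `Message` type, its fields, and `should_drop` are hypothetical names, not the Monitor's real interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    type_name: str              # e.g. "MAuth" (illustrative field)
    from_monitor: bool = False  # True if the sender is another monitor


def should_drop(msg: Message, session: Optional[object]) -> bool:
    """Return True if the message should be treated as stray and dropped."""
    if msg.from_monitor or session is not None:
        return False  # monitors and established sessions are not affected
    # A sender without a session must start by authenticating.
    return msg.type_name != "MAuth"
```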
mon: MonmapMonitor: make 'ceph mon add' idempotent
MonMap changes lead to bootstraps. Callbacks waiting for a proposal to
finish can have several fates, depending on what happens: finished, rerun
or aborted.
In the case of a bootstrap right after a monmap change, callbacks are
rerun. Considering we queued the message that led to the monmap change
on this queue, if instead of finishing it we end up rerunning it, we will
end up trying to perform the same modification twice -- the last one will
try to modify an already existing state and we will return just that:
whatever you're attempting to do has already been done.
This patch makes 'ceph mon add' completely idempotent. If one tries to
add an already existing monitor (i.e., same name, same ip:port), one
simply gets a 'monitor foo added', with return 0, no matter how many
times one runs the command.
Fixes: #5896
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
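A minimal Python sketch of the idempotent behavior follows. It models the monmap as a plain dict from name to ip:port, which is an assumption for illustration; `mon_add` is a hypothetical function, and the handling of a conflicting name or address (returning -EEXIST) is also assumed rather than taken from the patch.

```python
import errno
from typing import Dict, Tuple


def mon_add(monmap: Dict[str, str], name: str, addr: str) -> Tuple[int, str]:
    """Add a monitor; repeating an identical add succeeds again."""
    existing = monmap.get(name)
    if existing == addr:
        # Same name and same ip:port: already done, report success anyway.
        return 0, f"monitor {name} added"
    if existing is not None or addr in monmap.values():
        # Assumed behavior for a name or address clash with another entry.
        return -errno.EEXIST, f"monitor {name} or {addr} already exists"
    monmap[name] = addr
    return 0, f"monitor {name} added"
```

With this shape, a callback that is rerun after a bootstrap sees the same result as the first run instead of an error about already existing state.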
Note that this can happen if we fail to reconnect to an MDS during its
reconnect interval. If that happens, we probably have inodes in our
cache with no caps and things are generally not going to work very well.
This is but one step in improving the situation.
Separate out the two methods since they share little/no behavior.
majianpeng [Thu, 1 Aug 2013 03:19:02 +0000 (11:19 +0800)]
ceph: Update FUSE_USE_VERSION from 26 to 30.
Compilation failed with this error:
>In file included from /usr/local/include/fuse/fuse.h:19:0,
> from client/fuse_ll.cc:17:
>/usr/local/include/fuse/fuse_common.h:474:4: error: #error only API
>version 30 or greater is supported
Update FUSE_USE_VERSION from 26 to 30.
Yan, Zheng [Fri, 9 Aug 2013 05:43:54 +0000 (13:43 +0800)]
client: trim deleted inode
The previous patch makes the MDS send a notification to clients when an
inode is deleted. On receiving such a notification, we invalidate any
dentry that links to the deleted inode. If there is no other reference to
the inode, the inode gets trimmed.
For the cephfs fuse client, we use fuse_lowlevel_notify_inval_entry() or
fuse_lowlevel_notify_delete() to notify the kernel to trim the deleted
inode. (This is not completely reliable because we play unlink/link
tricks when handling MDS replies; it is difficult to keep the user-space
cache and the kernel dcache in sync.)