Yehuda Sadeh [Thu, 3 Oct 2013 18:49:33 +0000 (11:49 -0700)]
librados: drop reference to completion in container destructor
Move the PoolAsyncCompletionImpl reference drop from
C_PoolAsync_Safe::finish() to ~C_PoolAsyncSafe(), as finish() is only
called when the async request is actually sent.
Sandon Van Ness [Tue, 8 Oct 2013 18:58:57 +0000 (11:58 -0700)]
Go back to $PWD in fsstress.sh if compiling from source.
Although fsstress was being called with a static path the directory
it was writing to was in the current directory so doing a cd to the
source directory that is made in /tmp and then removing it later
caused it to be unable to write the files in a non-existent dir.
This change gets the current path first and cd's back into it after
it is done compiling fsstress.
Issue #6479.
Signed-off-by: Sandon Van Ness <sandon@inktank.com> Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
Greg Farnum [Mon, 7 Oct 2013 20:11:21 +0000 (13:11 -0700)]
ReplicatedPG: copy: use aggregate return code instead of individual Op return
It appears that the OSD is not filling in the individual return codes, and they
should be equivalent for all purposes we care about here (the only Op we are
doing is the copy-get, and if it fails we are getting its failure code).
Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Farnum <greg@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 7 Oct 2013 12:22:20 +0000 (05:22 -0700)]
os/FileStore: fix ENOENT error code for getattrs()
In commit dc0dfb9e01d593afdd430ca776cf4da2c2240a20 the omap xattrs code
moved up a block and r was no longer local to the block. Translate
ENOENT -> 0 to compensate.
Fix the same error in _rmattrs().
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
fix mon double-free when dropping unhandled messages, and allow "get monmap" messages to go through without authenticating for MonCliente::get_monmap_privately()
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
David Zafman [Mon, 30 Sep 2013 22:53:35 +0000 (15:53 -0700)]
common, os: Perform xattr handling based on detected fs type
In FileStore::_detect_fs() store discovered filesystem type in m_fs_type
Add per-filesystem filestore_max_inline_xattr_size_* variants
Add per-filesystem filestore_max_inline_xattrs_* variants
New function set_xattr_limits_via_conf()
Set m_filestore_max_inline_xattr_size based on override or fs type
Set m_filestore_max_inline_xattrs based on override or fs type
Handle conf change of any relevant value by calling set_xattr_limits_via_conf()
Change filestore_max_inline_xattr_size to override if non-zero
Change filestore_max_inline_xattrs to override if non-zero
Fixes: #6143 Signed-off-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Fri, 4 Oct 2013 04:27:36 +0000 (21:27 -0700)]
osd/ReplicatedPG: fix null deref on rollback_to whiteout check
Bring this whole if/else chain up one level so that we can capture both
ENOENT and whiteout in the same case. (And don't dereference the
pointer when we know it is NULL.)
Fixes: #6474 Signed-off-by: Sage Weil <sage@inktank.com>
mon: Monitor: drop client msg if no session exists and msg is not MAuth
If we are not a monitor and we don't have a session yet, we must first
authenticate with the cluster. Therefore, the first message to the
monitor must be an MAuth. If not, we assume it's a stray message and
just drop it.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: MonmapMonitor: make 'ceph mon add' idempotent
MonMap changes lead to bootstraps. Callbacks waiting for a proposal to
finish can have several fates, depending on what happens: finished, rerun
or aborted.
In the case of a bootstrap right after a monmap change, callbacks are
rerun. Considering we queued the message that lead to the monmap change
on this queue, if we instead of finishing it end up reruning it, we will
end up trying to perform the same modification twice -- the last one will
try to modify an already existing state and we will return just that:
whatever you're attempting to do has already been done.
This patch makes 'ceph mon add' completely idempotent. If one tries to
add an already existing monitor (i.e., same name, same ip:port), one
simply gets a 'monitor foo added', with return 0, no matter how many
times one runs the command.
Fixes: #5896 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Note that this can happen if we fail to reconnect do an MDS during its
reconnect interval. If that happens, we probably have inodes in our
cache with no caps and things are generally not going to work very well.
This is but one step in improving the situation.
Separate out the two methods since they share little/no behavior.
majianpeng [Thu, 1 Aug 2013 03:19:02 +0000 (11:19 +0800)]
ceph: Update FUSE_USE_VERSION from 26 to 30.
When compiling, it met this error:
>In file included from /usr/local/include/fuse/fuse.h:19:0,
> from client/fuse_ll.cc:17:
>/usr/local/include/fuse/fuse_common.h:474:4: error: #error only API
>version 30 or greater is supported
Update FUSE_USE_VERSION from 26 to 30.
Yan, Zheng [Fri, 9 Aug 2013 05:43:54 +0000 (13:43 +0800)]
client: trim deleted inode
Previous patch makes MDS send notification to clients when an inode
is deleted. When receiving a such notification, we invalidate any
dentry link to the deleted inode. If there is no other reference to
the inode, the inode gets trimmed.
For cephfs fuse client, we use fuse_lowlevel_notify_inval_entry() or
fuse_lowlevel_notify_delete() to notify the kernel to trim the deleted
inode. (this is not completely reliable because we play unlink/link
tricks when handle MDS replies. it's difficult to keep the user space
cache and kernel dcache in sync)
Sage Weil [Fri, 20 Sep 2013 00:57:14 +0000 (17:57 -0700)]
common/bloom_filter: unit tests
Fun facts:
- fpp = false positive probability
- fpp is a function of insert count only
- at .1% fpp, we pay about 2 bytes per insert
- at 1-2% fpp, we pay about 1 byte per insert
- at 15% fpp, we pay about .5 bytes per insert
David Zafman [Fri, 27 Sep 2013 00:42:13 +0000 (17:42 -0700)]
common, os, osd: Use common functions for safe file reading and writing
Add new safe_read_file() and safe_write_file() to update files atomically
Used instead of original OSD::read_meta(), OSD::write_meta() they are based on
Used by read_superblock() and write_superblock()
Used by write_version_stamp() and version_stamp_is_valid()
Fixes: #6422 Signed-off-by: David Zafman <david.zafman@inktank.com>
* Update to the current state of the ghobject implementaiton and the fact
that they encode the shard_t Although the pool also contains the shard
id, it is less relevant to understand the implementation.
* Update with the erasure code plugin infrastructure and the example
plugin now in master.
* Move jerasure to a separate page to be expanded and link it from the
toc
* Kill the partial read and writes notes as it will probably not be
implemented in the near future. Kill some of the notes because they
are no longer relevant.
Greg Farnum [Tue, 1 Oct 2013 23:41:22 +0000 (16:41 -0700)]
ReplicatedPG: copy: use CopyCallback instead of CopyOp in OpContext
In order to make this happen, we make the switch to generate the complete
transaction in the generic copy code and save it into the Callback. Then
in finish_copy() we just take that transaction and prepend it to the existing
transaction.
With that change, and by making use of the existing CopyCallback data,
we no longer need to access the CopyOp from the OpContext, so we can remove it.
Hurray, the pipelines are now independent!