David Zafman [Wed, 18 Sep 2013 01:14:16 +0000 (18:14 -0700)]
osd: Cleanup init()/read_superblock()
Fix error handling in init()
Cleanup read_superblock() by moving unrelated code into init()
Move init() feature upgrade right after compatibility checking
Remove redundant whoami check
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Fri, 20 Sep 2013 01:54:36 +0000 (18:54 -0700)]
common, os, osd, test, tools: FileStore must work with ghobjects rather than hobjects
Add ghobject_t to hobject.h header
Add constants NO_SHARD/NO_GEN and change gen_t/shard_t
Convert other headers from hobject_t to ghobject_t
Mostly straight hobject_t to ghobject_t for src/os cc files
Fix tools and tests and enable ceph-dencoder
Add filename generation and parsing including unittest addition
Get ceph-filestore-dump to build
Add gen/shard to DBObjectMap::ghobject_key() and update test case
Add CEPH_FS_FEATURE_INCOMPAT_SHARDS new FileStore feature
Add CEPH_OSD_FEATURE_INCOMPAT_SHARDS new osd feature
Fixes: #5862
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Wed, 25 Sep 2013 16:19:16 +0000 (09:19 -0700)]
os, osd, tools: Add backportable compatibility checking for sharded objects
OSD
New CEPH_OSD_FEATURE_INCOMPAT_SHARDS
FileStore
New CEPH_FS_FEATURE_INCOMPAT_SHARDS
Add FSSuperblock with feature CompatSet in it
Store sharded_objects state using CompatSet
Add set_allow_sharded_objects() and get_allow_sharded_objects() to FileStore/ObjectStore
Add read_superblock()/write_superblock() internal filestore functions
ceph_filestore_dump
Add OSDsuperblock to export format
Use CompatSet from OSD code itself in filestore-dump tool
Always check compatibility of OSD features with on-disk features
On import verify compatibility of on-disk features with export data
Bump super_ver due to export format change
Backport: dumpling, cuttlefish
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Tue, 17 Sep 2013 20:39:57 +0000 (13:39 -0700)]
include: Bug fixes for CompatSet
FeatureSet insert/remove
Use 64-bit arithmetic to allow features past 31
Allow feature 63 by fixing assert in insert
CompatSet::unsupported() bugs
Ignore feature 0 which became illegal
Use 64-bit arithmetic when computing mask
Use id in insert() and to get correct feature name
Use the right map to get name for diff.ro_compat
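The 64-bit fixes listed above can be illustrated with a minimal sketch. The names `feature_bit` and `FeatureMask` are hypothetical and this is not the actual CompatSet code; it only shows why the shift has to be done on a 64-bit value to reach feature ids past 31, and how feature 0 can be rejected while feature 63 stays legal.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the real CompatSet API): compute the mask bit
// for a feature id using 64-bit arithmetic. Shifting a plain int literal
// (1 << id) would be undefined for id >= 32 on common platforms.
inline std::uint64_t feature_bit(std::uint64_t id) {
  // Assumption for this sketch: id 0 is illegal, ids 1..63 are valid.
  assert(id > 0 && id < 64);
  return static_cast<std::uint64_t>(1) << id;
}

// Hypothetical stand-in for a feature mask with insert/remove/contains.
struct FeatureMask {
  std::uint64_t mask = 0;

  void insert(std::uint64_t id) { mask |= feature_bit(id); }
  void remove(std::uint64_t id) { mask &= ~feature_bit(id); }
  bool contains(std::uint64_t id) const {
    return (mask & feature_bit(id)) != 0;
  }
};
```

With this shape, inserting feature 40 or 63 sets the expected high bit instead of silently overflowing a 32-bit shift.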
Sage Weil [Sat, 21 Sep 2013 04:06:09 +0000 (21:06 -0700)]
ceph_test_rados: fix COPY_FROM completion
Fix the copy_from operation to not remove the objects from the in_use list
until after the entire operation is complete. In particular, the racing
read was completing and removing the dest oid from the in-use list before
the copy-from completed. This keeps the model in sync with what the OSD
is actually doing.
If another new read started up, it would grab the previous value from the
model and expect to see that, but would instead see the updated value.
Fixes: #6176
Backport: dumpling
We take different code paths in copy_obj, so make sure we close the handle
when we exit the function. Move the call to finish_get_obj() out of
copy_obj_data(), since we don't create the handle there; that should
make the code less confusing and less prone to errors.
Also, note that RGWRados::get_obj() also calls finish_get_obj(). For
everything to work in concert we need to pass a pointer to the handle
and not the handle itself. Therefore we needed to also change the call
to copy_obj_data().
perfglue/heap_profiler.cc: expect args as first element on cmd vector
We used to pass 'heap' as the first element of the cmd vector when
handling commands. We haven't been doing so for a while now, so we
needed to fix this.
Not expecting 'heap' also makes sense, considering that what we need to
know when we reach this function is which command we should handle, and
we should not care what the caller calls us while handling its own business.
Fixes: #6361
Backport: dumpling
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
* TestErasureCodePluginExample.cc is renamed to TestErasureCodePlugin.cc
because it's not limited to the example which is really used to
support tests rather than being tested.
* Buggy plugins are added to exhibit failures and enable the unit tests
to check that they are handled as expected
ErasureCodePluginFailToInitialize : the entry point returns != 0
ErasureCodePluginFailToRegister : the plugin registry is not updated
ErasureCodePluginMissingEntryPoint : the shared library has no entry
point
* It would be difficult to prove that the mutex protecting against
multiple loads actually does what it is expected to, because of the
lack of thread introspection functions such as "tell me if this
thread is waiting on this mutex". A simpler approach is chosen: create
a thread that blocks forever when loading (that's what the delay in
the example plugin is for) and then check that the lock has indeed
been acquired. Since this mutex is merely about making sure that only
one thread at a time runs this sequence of code, it's probably enough.
The bool loading data member of ErasureCodePluginRegistry is
set to true when a plugin is being loaded, to provide an observable side
effect for test purposes.
* Andreas-Joachim Peters suggests reducing copies to the minimum. When
possible, the output arguments will just point to the input
argument. This must be documented, as any side effect on the input
argument may modify the output argument
* Fix typos
* Fix may/could/must/should to better reflect what's mandatory and
what's not.
* Reword the explanation of minimum_to_decode_with_cost to not suggest
an implementation. This will need to be revisited anyway, when the
semantics of the cost are defined.
osdc/ObjectCacher: finish contexts after dropping object reference
The context to finish can be class C_Client_PutInode, which may drop
inode's last reference. So we should first drop object's reference,
then finish contexts.
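The ordering described above can be sketched as follows. `ObjectSketch` and `complete_read` are hypothetical names, not the ObjectCacher code; the sketch only shows the pattern of detaching the pending callbacks, dropping the object reference, and only then running callbacks that may release further references.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical reference-counted object (stand-in for a cached object).
struct ObjectSketch {
  int refs = 1;
  void get() { ++refs; }
  void put() {
    assert(refs > 0);
    --refs;
  }
};

// Hypothetical completion path: drop our object reference first, then
// finish the contexts, because a context may drop its owner's last
// reference while it runs.
void complete_read(ObjectSketch& ob,
                   std::vector<std::function<void()>>& pending) {
  std::vector<std::function<void()>> to_finish;
  to_finish.swap(pending);  // detach the callbacks from shared state
  ob.put();                 // 1. drop the object's reference
  for (auto& fin : to_finish) {
    fin();                  // 2. only now run the contexts
  }
}
```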
Sage Weil [Wed, 11 Sep 2013 22:09:59 +0000 (15:09 -0700)]
osd: block requests on object during COPY_FROM
Block any request on an object (read or write) during the COPY_FROM
operation.
This could potentially be broken down into read vs write operations without
much difficulty, but blocking any op indiscriminately is sufficient for
now, so let's keep it simple.
Sage Weil [Wed, 11 Sep 2013 22:10:47 +0000 (15:10 -0700)]
osd: add infrastructure to block io on an obc
Add an is_blocked() method for the obc, and add infrastructure to block
any operations if it returns true. Clean up on_change(), and add a helper
to kick an obc when whatever condition leading to it being blocked is no
longer true.
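A minimal sketch of the blocking pattern described here, assuming a simple queue of deferred operations. `ObjectContextSketch`, `submit_op`, and `kick` are hypothetical names and do not mirror the real obc or ReplicatedPG interfaces.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical object context: a blocked flag plus the ops waiting on it.
struct ObjectContextSketch {
  bool blocked = false;
  std::deque<std::function<void()>> waiting;

  bool is_blocked() const { return blocked; }
};

// Run the op immediately, or queue it while the object is blocked.
void submit_op(ObjectContextSketch& obc, std::function<void()> op) {
  if (obc.is_blocked()) {
    obc.waiting.push_back(std::move(op));
    return;
  }
  op();
}

// Helper to call once the blocking condition no longer holds: requeue
// (here: simply run) everything that was waiting, in arrival order.
void kick(ObjectContextSketch& obc) {
  assert(!obc.is_blocked());
  std::deque<std::function<void()>> ready;
  ready.swap(obc.waiting);
  for (auto& op : ready) {
    op();
  }
}
```

In this sketch a COPY_FROM would set `blocked` when it starts, clear it when it completes, and then call `kick()` so that reads and writes that arrived in the meantime proceed.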
Sage Weil [Thu, 5 Sep 2013 00:09:52 +0000 (17:09 -0700)]
osd/ReplicatedPG: stage object chunks to replicas during COPY_FROM
As we get each chunk of data during the COPY_FROM operation, write it out
to a temporary object on the replicas. When we get all the pieces, move
it into place.
On btrfs, kb_used + kb_avail can be much smaller than total kb, and
what really matters to avoid filling up the disk is how much space is
available, not how much we've used. Thus, compute the ratio we use to
determine full or nearfull from kb_avail rather than from kb_used.
Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Sage Weil <sage@inktank.com>
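The difference between the two ratios can be shown with a small sketch. Both function names are hypothetical and the exact expression used in the OSD may differ; the sketch only contrasts a ratio derived from kb_used with one derived from kb_avail when kb_used + kb_avail is smaller than the total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fullness ratio derived from the space still available.
inline double ratio_from_avail(std::uint64_t kb_total, std::uint64_t kb_avail) {
  assert(kb_total > 0 && kb_avail <= kb_total);
  return static_cast<double>(kb_total - kb_avail) /
         static_cast<double>(kb_total);
}

// Hypothetical helper: the older style of ratio, derived from the space used.
inline double ratio_from_used(std::uint64_t kb_total, std::uint64_t kb_used) {
  assert(kb_total > 0 && kb_used <= kb_total);
  return static_cast<double>(kb_used) / static_cast<double>(kb_total);
}
```

For example, with a total of 1000 kb, 300 kb used, and only 500 kb available, the used-based ratio stays well below the avail-based ratio of one half, so only the latter reflects how close the disk is to filling up.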
Joe Buck [Sat, 14 Sep 2013 00:41:31 +0000 (17:41 -0700)]
Removing extraneous code
The ExternalResource code was unnecessary and caused
issues on CentOS. Removing it.
Update Makefile.am to reflect the fact that
an anonymous class was removed and its
$1.class file is no longer generated.
The in-tree Hadoop shim was a combination of a libcephfs wrapper and the
bits to support Hadoop. It has been replaced by src/java, which
implements generic libcephfs wrappers, and by the external Hadoop shim
(see docs).
Fixes: #6175
Backport: dumpling
The buffer we get from the remote gateway might
not be null-terminated. The JSON parser needs the
buffer to be null-terminated even though we provide
a buffer length, because it calls strlen().
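One way to satisfy such a parser is to copy the length-delimited data and append a terminator before handing it over. This is a minimal sketch under that assumption; `make_null_terminated` is a hypothetical helper and not the function used in the gateway code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `len` bytes and append '\0' so that code which
// calls strlen() on the result never reads past the end of the data.
std::vector<char> make_null_terminated(const char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::vector<char> out;
  out.reserve(len + 1);
  if (len > 0) {
    out.insert(out.end(), data, data + len);
  }
  out.push_back('\0');
  return out;
}
```

The returned vector owns the copy, so `out.data()` can be passed to a parser that expects a C string while the original buffer stays untouched.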
David Zafman [Wed, 11 Sep 2013 23:56:21 +0000 (16:56 -0700)]
osd/ReplicatedPG.cc: Verify that recovery is truly complete
Backportable change to ensure that recovery is indeed complete even if
no new ops were started or are running. Prevents some
error condition or unforeseen code path from crashing an osd.
Backport: dumpling, cuttlefish
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>