David Zafman [Fri, 27 Sep 2013 00:42:13 +0000 (17:42 -0700)]
common, os, osd: Use common functions for safe file reading and writing
Add new safe_read_file() and safe_write_file() to update files atomically
Used instead of original OSD::read_meta(), OSD::write_meta() they are based on
Used by read_superblock() and write_superblock()
Used by write_version_stamp() and version_stamp_is_valid()
Fixes: #6422 Signed-off-by: David Zafman <david.zafman@inktank.com>
* Update to the current state of the ghobject implementaiton and the fact
that they encode the shard_t Although the pool also contains the shard
id, it is less relevant to understand the implementation.
* Update with the erasure code plugin infrastructure and the example
plugin now in master.
* Move jerasure to a separate page to be expanded and link it from the
toc
* Kill the partial read and writes notes as it will probably not be
implemented in the near future. Kill some of the notes because they
are no longer relevant.
Sage Weil [Tue, 1 Oct 2013 21:21:40 +0000 (14:21 -0700)]
osd: remove magical tmap -> omap conversion
This is incomplete and unfortunately unusable in its current state:
- it would only set USES_TMAP for old encoded object_info_t and tmapput,
but would NOT set it for tmapup
- a config option turned that off by default.
That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set. And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.
Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).
Sage Weil [Wed, 2 Oct 2013 00:04:44 +0000 (17:04 -0700)]
osd: add ISDIRTY, UNDIRTY rados operations
ISDIRTY will query whether the dirty flag is set on an object. UNDIRTY
will explicitly clear it. Note that a user doing so will likely run amok
with the caching code.
Sage Weil [Tue, 1 Oct 2013 19:12:55 +0000 (12:12 -0700)]
osd/ReplicatedPG: update all find_object_context() users to handle whiteouts
In each case, we treat the whiteout as if we got an ENOENT.
We do not change the semantics of bool exists to avoid breaking lots of
potentially fragile code. We are only interested in changing the
user-visible behavior of the object, not the way it is internally stored
or managed.
This will likely be refined as we grow acutal users for whiteoutes in the
pool caching code.
Sage Weil [Tue, 1 Oct 2013 16:28:29 +0000 (09:28 -0700)]
osdc/ObjectCacher: limit writeback IOs generated while holding lock
While analyzing a log from Mike Dawson I saw a long stall while librbd's
objectcacher was starting lots (many hundreds) of IOs. Limit the amount of
time we spend doing this at a time to allow IO replies to be processed so
that the cache remains responsive.
I'm not sure this warrants a tunable (which we would need to add for both
libcephfs and librbd).
Yehuda Sadeh [Mon, 26 Aug 2013 18:16:08 +0000 (11:16 -0700)]
rgw: quiet down warning message
Fixes: #6123
We don't want to know about failing to read region map info
if it's not found, only if failed on some other error. In
any case it's just a warning.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Fri, 27 Sep 2013 05:24:37 +0000 (22:24 -0700)]
ceph_argparse.py: clean up error reporting when required param missing
Treat "need 1, got 0" as a special case, and change the message to
"missing required parameter <x>". Also, when failing for that reason,
print the command concise description and its helptext.
Fixes: #6384 Signed-off-by: Dan Mick <dan.mick@inktank.com>
Fixes: #6444
Backport: dumpling
If pool creation fails (e.g., due to -EEXIST) then we leak the
completion object. Earlier we couldn't just drop the reference, as
librados have already removed the internal completion object. This fix
drop the completion reference even if got an error, which is now
possible.
librados: pool async create / delete does not delete completion handle
Backport: dumpling
The pool async delete / create function used to delete the internal
completion object. However, caller still holds the allocated completion
object, which it can't drop a reference to (as it'd try to deallocate
the already freed internal object). This fix removes the internal object
deletion, a following commit will fix a related leak (#6444) by having
the application (radosgw) drop the reference even if got an error.
Objecter: add "honor_cache_redirects" flag covering cache settings
When set to false, we do not redirect based on the cache_pool data
in the OSDMap. We'll use this so the OSDs can actually fetch data
into the cache pools on promotion! Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 5 Sep 2013 04:29:11 +0000 (21:29 -0700)]
common/crc32c_intel_fast: avoid reading partial trailing word
The optimized intel code reads in word-sized chunks, knowing that the
allocator will only hand out memory in word-sized increments. This makes
valgrind unhappy. Whitelisting doesn't work because for some reason there
is no caller context (probably because of some interaction with yasm?).
Instead, just use the baseline code for the last few bytes. This should
not be significant.
Dan Mick [Fri, 27 Sep 2013 01:00:31 +0000 (18:00 -0700)]
ceph_argparse.py, cephtool/test.sh: fix blacklist with no nonce
It's legal to give a CephEntityAddr to osd blacklist with no nonce,
so allow it in the valid() method; also add validation of any nonce
given that it's a long >= 0.
Also fix comment on CephEntityAddr type description in MonCommands.h,
and add tests for invalid nonces (while fixing the existing tests to remove
the () around expect_false args).
Fixes: #6425 Signed-off-by: Dan Mick <dan.mick@inktank.com>
We now put CEPH_ARGS in the actual args we parse in python, which are passed
to rados piecemeal later. This lets you put things like --id ... in there
that need to be parsed before librados is initialized.
David Zafman [Wed, 18 Sep 2013 01:14:16 +0000 (18:14 -0700)]
osd: Cleanup init()/read_superblock()
Fix error handling in init()
Cleanup read_superblock() by moving unrelated code into init()
Move init() feature upgrade right after compatibility checking
Remove redundant whoami check
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Fri, 20 Sep 2013 01:54:36 +0000 (18:54 -0700)]
common, os, osd, test, tools: FileStore must work with ghobjects rather than hobjects
Add ghobject_t to hboject.h header
Add constants NO_SHARD/NO_GEN and change gen_t/shard_t
Convert other headers from hobject_t to ghobject_t
Mostly straight hobject_t to ghobject_t for src/os cc files
Fix tools and tests and enable ceph-dencoder
Add filename generation and parsing including unittest addition
Get ceph-filestore-dump to build
Add gen/shard to DBObjectMap::ghobject_key() and update test case
Add CEPH_FS_FEATURE_INCOMPAT_SHARDS new FileStore feature
Add CEPH_OSD_FEATURE_INCOMPAT_SHARDS new osd feature
Fixes: #5862 Signed-off-by: David Zafman <david.zafman@inktank.com>