Sage Weil [Tue, 1 Oct 2013 21:21:40 +0000 (14:21 -0700)]
osd: remove magical tmap -> omap conversion
This is incomplete and unfortunately unusable in its current state:
- it would only set USES_TMAP for old encoded object_info_t and tmapput,
but would NOT set it for tmapup
- a config option turned that off by default.
That means that the mds conversion from tmap -> omap won't be able to use
this because any existing cluster has tmap objects without the USES_TMAP
flag set. And we don't want to unconditionally try a tmap->omap conversion
on omap operations because there are lots of existing librados users out
there that will be negatively impacted by this.
Instead, the MDS will need to handle this conversion on the client side by
reading either tmap or omap objects and explicitly rewriting the content
with omap (while truncating the tmap data away).
Sage Weil [Wed, 2 Oct 2013 00:04:44 +0000 (17:04 -0700)]
osd: add ISDIRTY, UNDIRTY rados operations
ISDIRTY will query whether the dirty flag is set on an object. UNDIRTY
will explicitly clear it. Note that a user doing so will likely run afoul
of the caching code.
Sage Weil [Tue, 1 Oct 2013 19:12:55 +0000 (12:12 -0700)]
osd/ReplicatedPG: update all find_object_context() users to handle whiteouts
In each case, we treat the whiteout as if we got an ENOENT.
We do not change the semantics of bool exists to avoid breaking lots of
potentially fragile code. We are only interested in changing the
user-visible behavior of the object, not the way it is internally stored
or managed.
This will likely be refined as we grow actual users for whiteouts in the
pool caching code.
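A minimal sketch of the behavior described above, written in Python rather than the OSD's C++. `ObjectContext`, `find_object_context`, and `user_visible_lookup` are simplified stand-ins invented for illustration, not the ReplicatedPG code: the internal `exists` flag keeps its meaning, and only the caller-side lookup maps a whiteout to ENOENT.

```python
import errno
from dataclasses import dataclass


@dataclass
class ObjectContext:
    """Toy stand-in for an object context (hypothetical, not the OSD type)."""
    exists: bool = True      # internal flag; its semantics are unchanged
    whiteout: bool = False   # object is stored, but hidden from users


def find_object_context(store, name):
    """Internal lookup: returns (0, ctx) or (-ENOENT, None)."""
    ctx = store.get(name)
    if ctx is None or not ctx.exists:
        return -errno.ENOENT, None
    return 0, ctx


def user_visible_lookup(store, name):
    """Caller-side handling: a whiteout is reported as if it were -ENOENT."""
    r, ctx = find_object_context(store, name)
    if r < 0:
        return r, None
    if ctx.whiteout:
        return -errno.ENOENT, None
    return 0, ctx


store = {"a": ObjectContext(), "b": ObjectContext(whiteout=True)}
```

Here `user_visible_lookup(store, "b")` reports ENOENT even though the internal lookup still finds the object.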
Sage Weil [Tue, 1 Oct 2013 16:28:29 +0000 (09:28 -0700)]
osdc/ObjectCacher: limit writeback IOs generated while holding lock
While analyzing a log from Mike Dawson I saw a long stall while librbd's
objectcacher was starting lots (many hundreds) of IOs. Limit the amount of
time we spend doing this at a time to allow IO replies to be processed so
that the cache remains responsive.
I'm not sure this warrants a tunable (which we would need to add for both
libcephfs and librbd).
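The idea can be sketched as a flush loop with a time budget. This is an illustration in Python, not the ObjectCacher code; `flush_some`, its parameters, and the 0.05 second default are assumptions made for the example.

```python
import threading
import time
from collections import deque


def flush_some(lock, dirty, start_io, budget_sec=0.05, clock=time.monotonic):
    """Start writeback IOs from `dirty` for at most `budget_sec` per call.

    Returns the number of IOs started. The caller is expected to call
    again later if `dirty` is not empty, once replies have been handled.
    """
    started = 0
    with lock:
        deadline = clock() + budget_sec
        while dirty:
            start_io(dirty.popleft())
            started += 1
            if clock() >= deadline:
                break  # release the lock so IO replies can be processed
    return started
```

Passing the clock as a parameter is only a convenience for testing; a real implementation would read the time directly.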
Yehuda Sadeh [Mon, 26 Aug 2013 18:16:08 +0000 (11:16 -0700)]
rgw: quiet down warning message
Fixes: #6123
We don't want to know about failing to read region map info
if it's not found, only if the read failed with some other error. In
any case it's just a warning.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Dan Mick [Fri, 27 Sep 2013 05:24:37 +0000 (22:24 -0700)]
ceph_argparse.py: clean up error reporting when required param missing
Treat "need 1, got 0" as a special case, and change the message to
"missing required parameter <x>". Also, when failing for that reason,
print the command's concise description and its helptext.
Fixes: #6384
Signed-off-by: Dan Mick <dan.mick@inktank.com>
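A rough sketch of the special case described above. The names `check_arg_count` and `describe_failure` are hypothetical, `ArgumentError` is redefined locally so the example is self-contained, and the wording of the general "need N, got M" message is assumed; only the "missing required parameter" text comes from the commit message.

```python
class ArgumentError(Exception):
    """Local stand-in for the validation error type."""


def check_arg_count(name, need, got):
    """Raise ArgumentError if fewer than `need` values were supplied."""
    if got >= need:
        return
    if need == 1 and got == 0:
        # Special case: a friendlier message than "need 1, got 0".
        raise ArgumentError("missing required parameter {0}".format(name))
    raise ArgumentError("{0}: need {1}, got {2}".format(name, need, got))


def describe_failure(err, concise, helptext):
    """Text shown to the user: the error, then the command summary and help."""
    return "\n".join([str(err), concise, helptext])
```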
Fixes: #6444
Backport: dumpling
If pool creation fails (e.g., due to -EEXIST) then we leak the
completion object. Earlier we couldn't just drop the reference, as
librados had already removed the internal completion object. This fix
drops the completion reference even if we got an error, which is now
possible.
librados: pool async create / delete does not delete completion handle
Backport: dumpling
The pool async delete / create function used to delete the internal
completion object. However, the caller still holds the allocated completion
object, which it can't drop a reference to (as it'd try to deallocate
the already freed internal object). This fix removes the internal object
deletion; a following commit will fix a related leak (#6444) by having
the application (radosgw) drop the reference even if it got an error.
Objecter: add "honor_cache_redirects" flag covering cache settings
When set to false, we do not redirect based on the cache_pool data
in the OSDMap. We'll use this so the OSDs can actually fetch data
into the cache pools on promotion!
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 5 Sep 2013 04:29:11 +0000 (21:29 -0700)]
common/crc32c_intel_fast: avoid reading partial trailing word
The optimized intel code reads in word-sized chunks, knowing that the
allocator will only hand out memory in word-sized increments. This makes
valgrind unhappy. Whitelisting doesn't work because for some reason there
is no caller context (probably because of some interaction with yasm?).
Instead, just use the baseline code for the last few bytes. This should
not be significant.
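The split between a word-sized path and a byte-wise tail can be sketched as follows. This is a Python illustration, not the Intel assembly: `crc32c_words` merely stands in for the optimized routine, the 8-byte word size is assumed, and the functions only update a running CRC without applying any initial or final XOR.

```python
WORD = 8  # assumed word size in bytes


def _make_table():
    """Build the byte-wise lookup table for the reflected CRC32C polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_baseline(crc, data):
    """Byte-at-a-time CRC32C update."""
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc


def crc32c_words(crc, data):
    """Stand-in for the optimized path: consumes whole words only."""
    assert len(data) % WORD == 0
    for off in range(0, len(data), WORD):
        word = int.from_bytes(data[off:off + WORD], "little")
        for _ in range(WORD):
            crc = _TABLE[(crc ^ word) & 0xFF] ^ (crc >> 8)
            word >>= 8
    return crc


def crc32c(crc, data):
    """Whole words take the word path; trailing bytes use the baseline."""
    whole = len(data) - (len(data) % WORD)
    crc = crc32c_words(crc, data[:whole])
    return crc32c_baseline(crc, data[whole:])
```

With this split, the word path never reads past the end of the buffer, and at most `WORD - 1` bytes per call go through the slower routine.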
Dan Mick [Fri, 27 Sep 2013 01:00:31 +0000 (18:00 -0700)]
ceph_argparse.py, cephtool/test.sh: fix blacklist with no nonce
It's legal to give a CephEntityAddr to osd blacklist with no nonce,
so allow it in the valid() method; also validate that any nonce given
is a long >= 0.
Also fix comment on CephEntityAddr type description in MonCommands.h,
and add tests for invalid nonces (while fixing the existing tests to remove
the () around expect_false args).
Fixes: #6425
Signed-off-by: Dan Mick <dan.mick@inktank.com>
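A small sketch of the nonce rule described above. `valid_entity_addr` is a hypothetical helper, not the `valid()` method from ceph_argparse.py; it assumes the nonce, when present, follows a `/` at the end of the address, and it does not validate the IP or port part at all.

```python
def valid_entity_addr(s):
    """Return (addr, nonce); nonce is None when no '/nonce' suffix is given.

    Raises ValueError if a nonce is present but is not an integer >= 0.
    """
    addr, sep, nonce_str = s.partition("/")
    if not addr:
        raise ValueError("empty address in {0!r}".format(s))
    if not sep:
        return addr, None  # an address with no nonce is legal
    try:
        nonce = int(nonce_str)
    except ValueError:
        raise ValueError("nonce {0!r} is not an integer".format(nonce_str))
    if nonce < 0:
        raise ValueError("nonce {0} must be >= 0".format(nonce))
    return addr, nonce
```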
We now put CEPH_ARGS in the actual args we parse in python, which are passed
to rados piecemeal later. This lets you put things like --id ... in there
that need to be parsed before librados is initialized.
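One way this could look, as a sketch only: `effective_args` and `parse_early` are hypothetical names, and both the use of `shlex.split` and the choice to place the CEPH_ARGS words before the command-line arguments are assumptions, not details taken from the commit.

```python
import argparse
import os
import shlex


def effective_args(argv, environ=os.environ):
    """Prepend the words from CEPH_ARGS to the command-line arguments."""
    extra = shlex.split(environ.get("CEPH_ARGS", ""))
    return extra + list(argv)


def parse_early(argv, environ=os.environ):
    """Pick out --id before anything else; return (id, remaining args)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--id", default=None)
    known, rest = parser.parse_known_args(effective_args(argv, environ))
    return known.id, rest
```

With this ordering, an `--id` given on the command line overrides one from the environment, because argparse keeps the last value it sees.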
David Zafman [Wed, 18 Sep 2013 01:14:16 +0000 (18:14 -0700)]
osd: Cleanup init()/read_superblock()
Fix error handling in init()
Cleanup read_superblock() by moving unrelated code into init()
Move init() feature upgrade right after compatibility checking
Remove redundant whoami check
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Fri, 20 Sep 2013 01:54:36 +0000 (18:54 -0700)]
common, os, osd, test, tools: FileStore must work with ghobjects rather than hobjects
Add ghobject_t to hobject.h header
Add constants NO_SHARD/NO_GEN and change gen_t/shard_t
Convert other headers from hobject_t to ghobject_t
Mostly straight hobject_t to ghobject_t for src/os cc files
Fix tools and tests and enable ceph-dencoder
Add filename generation and parsing including unittest addition
Get ceph-filestore-dump to build
Add gen/shard to DBObjectMap::ghobject_key() and update test case
Add CEPH_FS_FEATURE_INCOMPAT_SHARDS new FileStore feature
Add CEPH_OSD_FEATURE_INCOMPAT_SHARDS new osd feature
Fixes: #5862
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Wed, 25 Sep 2013 16:19:16 +0000 (09:19 -0700)]
os, osd, tools: Add backportable compatibility checking for sharded objects
OSD
  New CEPH_OSD_FEATURE_INCOMPAT_SHARDS
FileStore
  New CEPH_FS_FEATURE_INCOMPAT_SHARDS
  Add FSSuperblock with feature CompatSet in it
  Store sharded_objects state using CompatSet
  Add set_allow_sharded_objects() and get_allow_sharded_objects() to FileStore/ObjectStore
  Add read_superblock()/write_superblock() internal filestore functions
ceph_filestore_dump
  Add OSDsuperblock to export format
  Use CompatSet from OSD code itself in filestore-dump tool
  Always check compatibility of OSD features with on-disk features
  On import verify compatibility of on-disk features with export data
  Bump super_ver due to export format change
Backport: dumpling, cuttlefish
Signed-off-by: David Zafman <david.zafman@inktank.com>
David Zafman [Tue, 17 Sep 2013 20:39:57 +0000 (13:39 -0700)]
include: Bug fixes for CompatSet
FeatureSet insert/remove
  Use 64-bit arithmetic to allow features past 31
  Allow feature 63 by fixing assert in insert
CompatSet::unsupported() bugs
  Ignore feature 0 which became illegal
  Use 64-bit arithmetic when computing mask
  Use id in insert() and to get correct feature name
  Use the right map to get name for diff.ro_compat
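The fixes listed above can be illustrated with a toy feature set. This is a Python sketch, not the CompatSet C++ code: the class layout and names are assumptions, and because Python integers do not overflow, the 64-bit width is imitated with an explicit mask.

```python
MAX_FEATURE_ID = 63
MASK64 = (1 << 64) - 1


class FeatureSet:
    """Toy feature set: a 64-bit mask plus an id -> name map."""

    def __init__(self):
        self.mask = 0
        self.names = {}

    def insert(self, fid, name):
        # Feature 0 is illegal; ids up to and including 63 are allowed.
        if not 1 <= fid <= MAX_FEATURE_ID:
            raise ValueError("feature id {0} out of range".format(fid))
        self.mask |= (1 << fid) & MASK64
        self.names[fid] = name

    def remove(self, fid):
        self.mask &= ~(1 << fid) & MASK64
        self.names.pop(fid, None)


def unsupported(ours, theirs):
    """Features present in `theirs` that `ours` lacks."""
    missing = theirs.mask & ~ours.mask & MASK64
    diff = FeatureSet()
    for fid in range(1, MAX_FEATURE_ID + 1):  # id 0 is skipped
        if missing & (1 << fid):
            # The name is looked up by id in the set that has the feature.
            diff.insert(fid, theirs.names[fid])
    return diff
```

In C++ the equivalent shift has to be done on a 64-bit operand; shifting a plain `int` by 32 or more is what breaks features past 31.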