Haomai Wang [Thu, 20 Mar 2014 08:20:39 +0000 (16:20 +0800)]
Add random cache and replace SharedLRU in KeyValueStore
SharedLRU performs poorly in KeyValueStore with a large header cache size,
so a performance-optimized RandomCache could improve it.
RandomCache records the lookup frequency of each key. When evicting an
element, it randomly picks several elements, compares their frequencies,
and evicts the least frequently used one.
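The eviction policy described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the Ceph implementation: the class layout, the `sample_size` parameter, and the method names are assumptions made for the example.

```python
import random


class RandomCache:
    """Illustrative cache: counts lookups per key, evicts by random sampling."""

    def __init__(self, max_size, sample_size=5, rng=None):
        self.max_size = max_size
        self.sample_size = sample_size      # hypothetical tuning knob
        self.rng = rng or random.Random()
        self.entries = {}                   # key -> [value, lookup_count]

    def lookup(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry[1] += 1                       # record the lookup frequency
        return entry[0]

    def add(self, key, value):
        if key not in self.entries and len(self.entries) >= self.max_size:
            self._evict_one()
        count = self.entries[key][1] if key in self.entries else 0
        self.entries[key] = [value, count]

    def _evict_one(self):
        # Sample a few keys at random and drop the least frequently used one.
        k = min(self.sample_size, len(self.entries))
        candidates = self.rng.sample(list(self.entries), k)
        victim = min(candidates, key=lambda c: self.entries[c][1])
        del self.entries[victim]
```

Sampling only a few candidates keeps eviction cheap compared with maintaining a fully ordered structure, at the cost of occasionally evicting an entry that is not the globally least used one.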
Haomai Wang [Thu, 20 Mar 2014 06:09:49 +0000 (14:09 +0800)]
Remove exclusive lock on GenericObjectMap
Most GenericObjectMap interfaces now take a header as argument instead of the
combination of coll_t and ghobject_t, so the caller is responsible for
maintaining exclusive access to the header.
Haomai Wang [Thu, 20 Mar 2014 06:04:45 +0000 (14:04 +0800)]
Add Header cache to KeyValueStore
Recent performance statistics show that header lookup has become the main time
consumer for read/write operations. In most cases about 50% of the time is
spent on header lookup and decode/encode logic.
Add a header cache based on the SharedLRU structure, which maintains the
cached headers and gives the caller a pointer to the real header. This also
avoids the overhead of excessive header copies.
Loic Dachary [Thu, 19 Jun 2014 07:54:32 +0000 (09:54 +0200)]
autotools: avoid check_SCRIPTS duplication
The check_SCRIPTS content must be added to EXTRA_DIST, otherwise it will
not be included by make dist and it won't be possible to run make check
successfully.
One solution would be to add $(check_SCRIPTS) to EXTRA_DIST to avoid
duplication and help with long-term maintenance. However, $(srcdir) is
not supported in the content of the check_SCRIPTS variable.
A GNU Make variable substitution (patsubst) is used to prepend $(srcdir)
to each script, only when used in the EXTRA_DIST variable.
Sage Weil [Tue, 17 Jun 2014 17:47:24 +0000 (10:47 -0700)]
osd: introduce simple sleep during scrub
This option is similar to osd_snap_trim_sleep: simply inject an optional
sleep in the thread that is doing scrub work. This is a very kludgey and
coarse knob for limiting the impact of scrub on the cluster, but can help
until we have a more robust and elegant solution.
Only sleep if we are in the NEW_CHUNK state to avoid delaying processing of
an in-progress chunk. In this state nothing is blocked on anything.
Conveniently, chunky_scrub() requeues itself for each new chunk.
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
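The sleep condition described in the scrub commit above can be sketched as follows. This is a Python illustration under stated assumptions: `scrub_step`, the `scrub_sleep` parameter, and the `"IN_PROGRESS"` placeholder state are hypothetical names, and only NEW_CHUNK comes from the commit message.

```python
import time

NEW_CHUNK = "NEW_CHUNK"


def scrub_step(state, scrub_sleep, sleep=time.sleep):
    """One pass of a hypothetical scrub worker; returns True if it slept.

    The sleep is injected only when a new chunk is about to start, so an
    in-progress chunk is never delayed.
    """
    if scrub_sleep > 0 and state == NEW_CHUNK:
        sleep(scrub_sleep)
        return True
    return False
```

Passing the `sleep` callable as a parameter is a choice made for this sketch so the behavior can be checked without waiting.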
common: Enforces the methods lru_pin() and lru_unpin()
If lru_pin() or lru_unpin() is called twice on the same object, the counter is
incremented or decremented incorrectly: it counts more or fewer pinned objects
than there are, which corrupts the balancing (lru_adjust()).
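One way to guard against the double-count is to make pinning idempotent, as in the sketch below. This is an assumption about the general approach, not the actual patch: the commit may enforce the invariant differently (for example with an assertion), and the class and attribute names here are illustrative.

```python
class LRUObject:
    def __init__(self):
        self.pinned = False


class LRU:
    def __init__(self):
        self.num_pinned = 0

    def pin(self, obj):
        # Only count the transition from unpinned to pinned.
        if not obj.pinned:
            obj.pinned = True
            self.num_pinned += 1

    def unpin(self, obj):
        # Only count the transition from pinned to unpinned.
        if obj.pinned:
            obj.pinned = False
            self.num_pinned -= 1
```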
common: Fixes issue with lru_clear() + add new test
The method lru_clear() must set the attribute lru_num to zero
after lru_top, lru_bot and lru_mid are reset, since lru_num
is the total number of elements found in all of them.
The test also ensures the correct behavior of lru_adjust():
lru_touch() calls lru_adjust() every time to balance lru_top
and lru_bot according to lru_midpoint.
Loic Dachary [Fri, 13 Jun 2014 12:41:39 +0000 (14:41 +0200)]
tests: prevent kill race condition
When trying to kill a daemon, keep its pid in a variable instead of
retrieving it from the pidfile multiple times. It prevents the following
race condition:
* try to kill ceph-mon
* ceph-mon is in the process of dying and has removed its pidfile
* the attempt to kill ceph-mon fails because the pidfile is not found
* another ceph-mon is spawned and fails to bind the port
because the previous ceph-mon is still holding it
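The fix amounts to reading the pidfile once and reusing the pid for every later signal. A minimal Python sketch of that idea follows. The actual test helpers are shell scripts, so the function names, the injectable `kill` callable, and the retry delays are all assumptions made for illustration.

```python
import os
import signal
import time


def read_pid(pidfile):
    with open(pidfile) as f:
        return int(f.read().strip())


def kill_daemon(pidfile, kill=os.kill, delays=(0.1, 0.2, 0.5, 1.0)):
    """Signal a daemon until it is gone; returns True once it has exited."""
    try:
        pid = read_pid(pidfile)          # read the pidfile exactly once
    except FileNotFoundError:
        return True                      # no pidfile: nothing to kill
    for delay in delays:
        try:
            kill(pid, signal.SIGTERM)    # reuse the cached pid every time
        except ProcessLookupError:
            return True                  # the process has really exited
        time.sleep(delay)
    return False                         # still alive after all attempts
```

Because the pid is held in a variable, the loop keeps working after the dying daemon removes its pidfile, and the caller only proceeds once the process is actually gone.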
Loic Dachary [Wed, 11 Jun 2014 20:50:43 +0000 (22:50 +0200)]
erasure-code: consistent argument parsing for profiles
Remove the = from the goodchars of the erasure_code_profile argument of
osd pool create so that it is consistent with the goodchars of osd
erasure-code-profile set / rm.
Sage Weil [Fri, 6 Jun 2014 20:31:29 +0000 (13:31 -0700)]
osd/OSDMap: do not require ERASURE_CODE feature of clients
Just because an EC pool exists in the cluster does not mean that the client
has to support the feature:
1) The way client IO is initiated is no different for EC pools than for
replicated pools.
2) People may add an EC pool to an existing cluster with old clients and
locking those old clients out is very rude when they are not using the
new pool.
3) The only direct client user of EC pools right now is rgw, and the new
versions already need to support various other features like CRUSH_V2
in order to work. These features are present in new kernels.
Fixes: #8556
Backport: firefly
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 12 Jun 2014 23:44:53 +0000 (16:44 -0700)]
osd/OSDMap: make get_features() take an entity type
Make the helper that returns what features are required of the OSDMap take
an entity type argument, as the required features may vary between
components in the cluster.
Backport: firefly
Signed-off-by: Sage Weil <sage@inktank.com>
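Read together with the ERASURE_CODE change above, a feature helper keyed on entity type might look like the following Python sketch. The bit value, the entity-type constants, and the function signature are hypothetical and do not reflect the real OSDMap interface.

```python
FEATURE_ERASURE_CODE = 1 << 0   # hypothetical bit position

ENTITY_CLIENT = "client"        # hypothetical entity-type constants
ENTITY_OSD = "osd"


def get_features(entity_type, has_ec_pool):
    """Return the feature mask required of a peer of the given type."""
    features = 0
    # Clients are not locked out merely because an EC pool exists.
    if has_ec_pool and entity_type != ENTITY_CLIENT:
        features |= FEATURE_ERASURE_CODE
    return features
```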
Yehuda Sadeh [Wed, 11 Jun 2014 23:50:41 +0000 (16:50 -0700)]
rgw: set a default data extra pool name
Fixes: #8585
Have a default name for the data extra pool, otherwise it would be empty
which means that it'd default to the data pool name (which is a problem
with ec backends).
Greg Farnum [Tue, 20 May 2014 18:07:45 +0000 (11:07 -0700)]
FileStore: remove user_only options from getattrs through the ObjectStore stack
This sort of awareness belongs at a higher level in the stack -- as
evidenced by nobody using the option at this level. Remove it from the
implementations and the interface.
Greg Farnum [Tue, 20 May 2014 20:04:02 +0000 (13:04 -0700)]
FileStore: do not use user_only in collection_getattrs
There's no particular reason why any of the callers of collection_getattrs
want to avoid looking at Ceph's internal xattrs.
It looks like this flag (set in 1862ddd88548fd4609f4fa9715dbad42a84d3775) was
set this way by mistake.
And finally, we don't actually set xattrs on collections anymore, anyway.
Yehuda Sadeh [Wed, 11 Jun 2014 06:06:12 +0000 (23:06 -0700)]
rgw: chain to multiple cache entries in one call
This ensures that chained cache entries that depend on more than one raw
cache entry (bucket info cache depends on both the bucket entry point
and on the bucket info object), are chained and created atomically.
Somnath Roy [Wed, 11 Jun 2014 01:10:30 +0000 (18:10 -0700)]
PG: Added a const spg_t member to the PG class
The const spg_t member is instantiated in the constructor,
and get_pgid() can now reference it to return an spg_t instance
without needing pg_info (and thus without acquiring pg_lock).
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
Somnath Roy [Tue, 10 Jun 2014 23:02:52 +0000 (16:02 -0700)]
ShardedTP: The config option changed
The config option for the sharded threadpool is changed from
osd_op_num_sharded_pool_threads to osd_op_num_threads_per_shard.
Together with osd_op_num_shards, this makes configuring the number of
op threads for the OSD much more user friendly.
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
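As a rough illustration of how the two options in the ShardedTP commit combine, the sketch below assumes the total number of op threads is the product of the two values. The commit message does not state this, and the numbers used in the tests are arbitrary examples rather than defaults.

```python
def total_op_threads(osd_op_num_shards, osd_op_num_threads_per_shard):
    """Assumed relationship: each shard runs its own fixed set of threads."""
    if osd_op_num_shards < 1 or osd_op_num_threads_per_shard < 1:
        raise ValueError("both options must be at least 1")
    return osd_op_num_shards * osd_op_num_threads_per_shard
```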
Steve Taylor [Tue, 10 Jun 2014 18:42:55 +0000 (12:42 -0600)]
Fix for bug #6700
When preparing OSD disks with colocated journals, the initialization process
fails when using dmcrypt. The kernel fails to re-read the partition table after
the storage partition is created because the journal partition is already in use
by dmcrypt. This fix unmaps the journal partition from dmcrypt and allows the
partition table to be read.
Signed-off-by: Stephen F Taylor <steveftaylor@gmail.com>