From: Sage Weil
Date: Tue, 18 Mar 2014 20:09:29 +0000 (-0700)
Subject: doc/dev/cache-pool: describe the tiering agent
X-Git-Tag: v0.79~128
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=f1e3bc9a9bb6003659c852023c89db46767cc7b9;p=ceph.git

doc/dev/cache-pool: describe the tiering agent

Signed-off-by: Sage Weil
---

diff --git a/doc/dev/cache-pool.rst b/doc/dev/cache-pool.rst
index 3b079fb103550..af76c83ce1d21 100644
--- a/doc/dev/cache-pool.rst
+++ b/doc/dev/cache-pool.rst
@@ -7,6 +7,10 @@ Purpose
 Use a pool of fast storage devices (probably SSDs) and use it as a
 cache for an existing larger pool.
 
+Use a replicated pool as a front-end to service most I/O, and destage
+cold data to a separate erasure coded pool that does not currently
+(and cannot efficiently) handle the workload.
+
 We should be able to create and add a cache pool to an existing pool
 of data, and later remove it, without disrupting service or migrating
 data around.
@@ -17,9 +21,9 @@ Use cases
 Read-write pool, writeback
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We have an existing data pool and put a fast cache pool "in front" of it. Writes will
-go to the cache pool and immediately ack. We flush them back to the data pool based on
-some policy.
+We have an existing data pool and put a fast cache pool "in front" of
+it. Writes will go to the cache pool and immediately ack. We flush
+them back to the data pool based on the defined policy.
 
 Read-only pool, weak consistency
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -27,7 +31,7 @@ Read-only pool, weak consistency
 We have an existing data pool and add one or more read-only cache
 pools. We copy data to the cache pool(s) on read. Writes are
 forwarded to the original data pool. Stale data is expired from the
-cache pools based on some as-yet undetermined policy.
+cache pools based on the defined policy.
 
 This is likely only useful for specific applications with specific
 data access patterns. It may be a match for rgw, for example.
@@ -45,7 +49,7 @@ Direct all traffic for foo to foo-hot::
 
   ceph osd tier set-overlay foo foo-hot
 
-Set the target size and enable the tiering agent for foo-hit::
+Set the target size and enable the tiering agent for foo-hot::
 
   ceph osd pool set foo-hot hit_set_type bloom
   ceph osd pool set foo-hot hit_set_count 1
@@ -70,3 +74,110 @@ Read-only pools with lazy consistency::
 
   ceph osd tier cache-mode foo-west readonly
 
+
+Tiering agent
+-------------
+
+The tiering policy is defined as properties on the cache pool itself.
+
+HitSet metadata
+~~~~~~~~~~~~~~~
+
+First, the agent requires HitSet information to be tracked on the
+cache pool in order to determine which objects in the pool are being
+accessed. This is enabled with::
+
+  ceph osd pool set foo-hot hit_set_type bloom
+  ceph osd pool set foo-hot hit_set_count 1
+  ceph osd pool set foo-hot hit_set_period 3600   # 1 hour
+
+The supported HitSet types include 'bloom' (a bloom filter, the
+default), 'explicit_hash', and 'explicit_object'. The latter two
+explicitly enumerate accessed objects and are less memory efficient.
+They are there primarily for debugging and to demonstrate the
+pluggability of the infrastructure. For the bloom filter type, you
+can additionally define its false positive probability (the default
+is 0.05)::
+
+  ceph osd pool set foo-hot hit_set_fpp 0.15
+
+The hit_set_count and hit_set_period define how many HitSets to store
+and how much time each one should cover. Binning accesses over time
+allows Ceph to independently determine whether an object was accessed
+at least once and whether it was accessed more than once over some
+time period ("age" vs "temperature"). Note that the longer the period
+and the higher the count, the more RAM the ceph-osd process will
+consume. In particular, when the agent is actively flushing or
+evicting cache objects, all hit_set_count HitSets are loaded into RAM.
+
+Currently there is minimal benefit for hit_set_count > 1 since the
+agent does not yet act intelligently on that information.
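+
+For intuition, here is a toy Python sketch of time-binned bloom
+HitSets (this is illustrative only, not Ceph's implementation; the
+class and object names are invented)::
+
+  import hashlib
+  import math
+
+  # One HitSet covers one hit_set_period; a bloom filter records
+  # "was this object accessed during my interval?" compactly, at
+  # the cost of an occasional false positive (hit_set_fpp).
+  class BloomHitSet:
+      def __init__(self, expected_inserts, fpp):
+          # Standard bloom sizing: m bits and k hashes for a
+          # target false positive probability.
+          self.m = max(1, int(-expected_inserts * math.log(fpp)
+                              / math.log(2) ** 2))
+          self.k = max(1, round(self.m / expected_inserts * math.log(2)))
+          self.bits = bytearray((self.m + 7) // 8)
+
+      def _positions(self, name):
+          h = hashlib.sha256(name.encode()).digest()
+          a = int.from_bytes(h[:8], 'little')
+          b = int.from_bytes(h[8:16], 'little')
+          return [(a + i * b) % self.m for i in range(self.k)]
+
+      def insert(self, name):
+          for p in self._positions(name):
+              self.bits[p // 8] |= 1 << (p % 8)
+
+      def contains(self, name):
+          return all(self.bits[p // 8] & (1 << (p % 8))
+                     for p in self._positions(name))
+
+  # Keep hit_set_count filters, one per period. Counting how many
+  # bins an object appears in separates "accessed at least once"
+  # (age) from "accessed in many intervals" (temperature).
+  hit_sets = [BloomHitSet(10000, fpp=0.05) for _ in range(4)]
+  hit_sets[0].insert('rbd_data.123')   # touched this period only
+  for hs in hit_sets:
+      hs.insert('rbd_data.456')        # touched every period
+
+  temperature = lambda name: sum(hs.contains(name) for hs in hit_sets)
+  print(temperature('rbd_data.123'))   # 1 -> recent, but not hot
+  print(temperature('rbd_data.456'))   # 4 -> hot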
+
+Cache mode
+~~~~~~~~~~
+
+The most important policy is the cache mode::
+
+  ceph osd pool set foo-hot cache-mode writeback
+
+The supported modes are 'none', 'writeback', 'forward', and
+'readonly'. Most installations want 'writeback', which will write
+into the cache tier and only later flush updates back to the base
+tier. Similarly, any object that is read will be promoted into the
+cache tier.
+
+The 'forward' mode is intended for when the cache is being disabled
+and needs to be drained. No new objects will be promoted or written
+to the cache pool unless they are already present. A background
+operation can then do something like::
+
+  rados -p foo-hot cache-try-flush-evict-all
+  rados -p foo-hot cache-flush-evict-all
+
+to force all data to be flushed back to the base tier.
+
+The 'readonly' mode is intended for read-only workloads that do not
+require consistency to be enforced by the storage system. Writes will
+be forwarded to the base tier, but objects that are read will get
+promoted to the cache. No attempt is made by Ceph to ensure that the
+contents of the cache tier(s) are consistent in the presence of
+object updates.
+
+Cache sizing
+~~~~~~~~~~~~
+
+The agent performs two basic functions: flushing (writing 'dirty'
+cache objects back to the base tier) and evicting (removing cold and
+clean objects from the cache).
+
+The thresholds at which Ceph will flush or evict objects are
+specified relative to a 'target size' of the pool. For example::
+
+  ceph osd pool set foo-hot cache_target_dirty_ratio .4
+  ceph osd pool set foo-hot cache_target_full_ratio .8
+
+will begin flushing dirty objects when 40% of the target size is
+dirty and begin evicting clean objects when we reach 80% of the
+target size.
+
+The target size can be specified either in terms of objects or bytes::
+
+  ceph osd pool set foo-hot target_max_bytes 1000000000000  # 1 TB
+  ceph osd pool set foo-hot target_max_objects 1000000  # 1 million objects
+
+Note that if both limits are specified, Ceph will begin flushing or
+evicting when either threshold is triggered.
+
+Other tunables
+~~~~~~~~~~~~~~
+
+You can specify a minimum object age before a recently updated object
+is flushed to the base tier::
+
+  ceph osd pool set foo-hot cache_min_flush_age 600   # 10 minutes
+
+You can specify the minimum age of an object before it will be
+evicted from the cache tier::
+
+  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes
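+
+To make the interaction of these thresholds and ages concrete, here
+is a small illustrative Python sketch (this is not Ceph's agent; all
+names, and the mtime/atime bookkeeping, are invented for
+illustration)::
+
+  import time
+
+  def agent_actions(stats, conf, now=None):
+      # Either the byte or the object limit can trip a threshold,
+      # so take the worse of the two fullness fractions.
+      now = now if now is not None else time.time()
+
+      def frac(used, target):
+          return used / target if target else 0.0
+
+      dirty = max(frac(stats['dirty_bytes'], conf['target_max_bytes']),
+                  frac(stats['dirty_objects'], conf['target_max_objects']))
+      full = max(frac(stats['bytes'], conf['target_max_bytes']),
+                 frac(stats['objects'], conf['target_max_objects']))
+
+      actions = []
+      if dirty >= conf['cache_target_dirty_ratio']:
+          # Flush dirty objects, but skip anything updated more
+          # recently than cache_min_flush_age.
+          actions.append(('flush', lambda o:
+                          now - o['mtime'] >= conf['cache_min_flush_age']))
+      if full >= conf['cache_target_full_ratio']:
+          # Evict clean objects, but skip anything accessed more
+          # recently than cache_min_evict_age.
+          actions.append(('evict', lambda o:
+                          now - o['atime'] >= conf['cache_min_evict_age']))
+      return actions
+
+  conf = dict(target_max_bytes=10**12,        # 1 TB
+              target_max_objects=10**6,       # 1 million objects
+              cache_target_dirty_ratio=0.4,
+              cache_target_full_ratio=0.8,
+              cache_min_flush_age=600,
+              cache_min_evict_age=1800)
+  stats = dict(bytes=850 * 10**9, objects=450000,
+               dirty_bytes=420 * 10**9, dirty_objects=150000)
+  print([name for name, _ in agent_actions(stats, conf)])
+  # ['flush', 'evict']: 42% of target dirty, 85% of target full.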