Use a pool of fast storage devices (probably SSDs) and use it as a
cache for an existing larger pool.
+Use a replicated pool as a front-end to service most I/O, and destage
+cold data to a separate erasure coded pool that does not currently (and
+cannot efficiently) handle the workload.
+
We should be able to create and add a cache pool to an existing pool
of data, and later remove it, without disrupting service or migrating
data around.
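+
+For example (a sketch; 'foo' is the base pool and 'foo-hot' the cache
+pool), a cache tier can be attached with::
+
+ ceph osd tier add foo foo-hot            # make foo-hot a tier of foo
+ ceph osd tier set-overlay foo foo-hot    # send client I/O for foo through foo-hot
+
+and later, once it has been drained, detached with::
+
+ ceph osd tier remove-overlay foo
+ ceph osd tier remove foo foo-hot
+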
Read-write pool, writeback
~~~~~~~~~~~~~~~~~~~~~~~~~~
-We have an existing data pool and put a fast cache pool "in front" of it. Writes will
-go to the cache pool and immediately ack. We flush them back to the data pool based on
-some policy.
+We have an existing data pool and put a fast cache pool "in front" of
+it. Writes will go to the cache pool and immediately ack. We flush
+them back to the data pool based on the defined policy.
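+
+As a minimal illustration (assuming a writeback overlay like the one
+sketched above is in place), clients keep addressing the base pool
+while the cache pool services the I/O transparently::
+
+ rados -p foo put myobject ./somefile   # write to the base pool name...
+ rados -p foo-hot ls                    # ...the object lands in the cache pool
+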
Read-only pool, weak consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have an existing data pool and add one or more read-only cache
pools. We copy data to the cache pool(s) on read. Writes are
forwarded to the original data pool. Stale data is expired from the
-cache pools based on some as-yet undetermined policy.
+cache pools based on the defined policy.
This is likely only useful for specific applications with specific
data access patterns. It may be a match for rgw, for example.
 ceph osd tier set-overlay foo foo-hot
-Set the target size and enable the tiering agent for foo-hit::
+Set the target size and enable the tiering agent for foo-hot::
 ceph osd pool set foo-hot hit_set_type bloom
 ceph osd pool set foo-hot hit_set_count 1
 ceph osd tier cache-mode foo-west readonly
+
+Tiering agent
+-------------
+
+The tiering policy is defined as properties on the cache pool itself.
+
+HitSet metadata
+~~~~~~~~~~~~~~~
+
+First, the agent requires HitSet information to be tracked on the
+cache pool in order to determine which objects in the pool are being
+accessed. This is enabled with::
+
+ ceph osd pool set foo-hot hit_set_type bloom
+ ceph osd pool set foo-hot hit_set_count 1
+ ceph osd pool set foo-hot hit_set_period 3600 # 1 hour
+
+The supported HitSet types include 'bloom' (a bloom filter, the
+default), 'explicit_hash', and 'explicit_object'. The latter two
+explicitly enumerate accessed objects and are less memory efficient.
+They exist primarily for debugging and to demonstrate the
+pluggability of the infrastructure. For the bloom filter type, you
+can additionally set the false positive probability (the default is
+0.05)::
+
+ ceph osd pool set foo-hot hit_set_fpp 0.15
+
+The hit_set_period and hit_set_count define how much time each HitSet
+should cover and how many such HitSets to store. Binning accesses
+over time allows Ceph to independently determine whether an object was
+accessed at least once and whether it was accessed more than once over
+some time period ("age" vs "temperature"). Note that the longer the
+period and the higher the count, the more RAM will be consumed by the
+ceph-osd process. In particular, when the agent is active to flush or
+evict cache objects, all hit_set_count HitSets are loaded into RAM.
+
+Currently there is minimal benefit for hit_set_count > 1 since the
+agent does not yet act intelligently on that information.
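+
+For example, to bin the last hour of accesses into four 15-minute
+HitSets instead of a single one-hour HitSet (of limited benefit
+today, per the note above, but it illustrates the binning)::
+
+ ceph osd pool set foo-hot hit_set_count 4
+ ceph osd pool set foo-hot hit_set_period 900   # 4 x 900s = 1 hour of history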
+
+Cache mode
+~~~~~~~~~~
+
+The most important policy is the cache mode::
+
+ ceph osd tier cache-mode foo-hot writeback
+
+The supported modes are 'none', 'writeback', 'forward', and
+'readonly'. Most installations want 'writeback', which will write
+into the cache tier and only later flush updates back to the base
+tier. Similarly, any object that is read will be promoted into the
+cache tier.
+
+The 'forward' mode is intended for when the cache is being disabled
+and needs to be drained. No new objects will be promoted or written
+to the cache pool unless they are already present. A background
+operation can then do something like::
+
+ rados -p foo-hot cache-try-flush-evict-all
+ rados -p foo-hot cache-flush-evict-all
+
+to force all data to be flushed back to the base tier.
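+
+For example, a complete drain and removal of a writeback cache tier
+might look like this (a sketch)::
+
+ ceph osd tier cache-mode foo-hot forward   # stop promoting or writing new objects
+ rados -p foo-hot cache-flush-evict-all     # flush dirty objects, evict clean ones
+ ceph osd tier remove-overlay foo           # stop redirecting client I/O
+ ceph osd tier remove foo foo-hot           # detach the now-empty cache tier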
+
+The 'readonly' mode is intended for read-only workloads that do not
+require consistency to be enforced by the storage system. Writes will
+be forwarded to the base tier, but objects that are read will get
+promoted to the cache. No attempt is made by Ceph to ensure that the
+contents of the cache tier(s) are consistent in the presence of object
+updates.
+
+Cache sizing
+~~~~~~~~~~~~
+
+The agent performs two basic functions: flushing (writing 'dirty'
+cache objects back to the base tier) and evicting (removing cold and
+clean objects from the cache).
+
+The thresholds at which Ceph will flush or evict objects are specified
+relative to a 'target size' of the pool. For example::
+
+ ceph osd pool set foo-hot cache_target_dirty_ratio .4
+ ceph osd pool set foo-hot cache_target_full_ratio .8
+
+will begin flushing dirty objects when 40% of the target size is
+dirty, and begin evicting clean objects when we reach 80% of the
+target size.
+
+The target size can be specified either in terms of objects or bytes::
+
+ ceph osd pool set foo-hot target_max_bytes 1000000000000 # 1 TB
+ ceph osd pool set foo-hot target_max_objects 1000000 # 1 million objects
+
+Note that if both limits are specified, Ceph will begin flushing or
+evicting when either threshold is triggered.
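+
+For example, combining the values above: with a target size of 1 TB,
+a dirty ratio of .4 means flushing begins once roughly 400 GB of
+cache objects are dirty, and a full ratio of .8 means eviction begins
+once the cache holds roughly 800 GB.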
+
+Other tunables
+~~~~~~~~~~~~~~
+
+You can specify a minimum object age before a recently updated object is
+flushed to the base tier::
+
+ ceph osd pool set foo-hot cache_min_flush_age 600 # 10 minutes
+
+You can specify the minimum age of an object before it will be evicted from
+the cache tier::
+
+ ceph osd pool set foo-hot cache_min_evict_age 1800 # 30 minutes
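+
+Putting it all together, a complete writeback cache configuration
+might look like this (a sketch; every value here is illustrative and
+should be tuned to the workload)::
+
+ ceph osd tier add foo foo-hot
+ ceph osd tier cache-mode foo-hot writeback
+ ceph osd tier set-overlay foo foo-hot
+ ceph osd pool set foo-hot hit_set_type bloom
+ ceph osd pool set foo-hot hit_set_count 1
+ ceph osd pool set foo-hot hit_set_period 3600
+ ceph osd pool set foo-hot target_max_bytes 1000000000000
+ ceph osd pool set foo-hot cache_target_dirty_ratio .4
+ ceph osd pool set foo-hot cache_target_full_ratio .8
+ ceph osd pool set foo-hot cache_min_flush_age 600
+ ceph osd pool set foo-hot cache_min_evict_age 1800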
+