==================
This feature enables transitioning S3 objects to a remote cloud service as part
-of `<https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html>`
+of `object lifecycle <https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html>`_
via :ref:`storage_classes`. The transition is unidirectional: data cannot be
transitioned back from the remote zone. The goal of this feature is to enable
data transition to multiple cloud providers. The currently supported cloud
Cloud Storage Class Tier Configuration
--------------------------------------
-```
- {
- "access_key": <access>,
- "secret": <secret>,
- "endpoint": <endpoint>,
- "region": <region>,
- "host_style": <path | virtual>,
- "acls": [ { "type": <id | email | uri>,
- "source_id": <source_id>,
- "dest_id": <dest_id> } ... ],
- "target_path": <target_path>,
- "target_storage_class": <target-storage-class>,
- "multipart_sync_threshold": {object_size},
- "multipart_min_part_size": {part_size},
- "retain_head_object": <true | false>
- }
-```
-
-Cloud Transition Specific Configurables:
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
+
+ {
+ "access_key": <access>,
+ "secret": <secret>,
+ "endpoint": <endpoint>,
+ "region": <region>,
+ "host_style": <path | virtual>,
+ "acls": [ { "type": <id | email | uri>,
+ "source_id": <source_id>,
+ "dest_id": <dest_id> } ... ],
+ "target_path": <target_path>,
+ "target_storage_class": <target-storage-class>,
+ "multipart_sync_threshold": {object_size},
+ "multipart_min_part_size": {part_size},
+ "retain_head_object": <true | false>
+ }
+
+
+Cloud Transition Specific Configurables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``access_key`` (string)
refer to the "Versioned Objects" section below.
-S3 Specific Configurables:
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+S3 Specific Configurables
+~~~~~~~~~~~~~~~~~~~~~~~~~
Currently, cloud transition will work only with backends that are compatible with
-AWS S3. There are a few configurables that can be used to tweak behavior
-when accessing cloud services:
+the AWS S3 protocol. There are a few configurables that can be used to tweak behavior
+when accessing cloud services::
-```
- {
- "multipart_sync_threshold": {object_size},
- "multipart_min_part_size": {part_size}
- }
-```
+ {
+ "multipart_sync_threshold": {object_size},
+ "multipart_min_part_size": {part_size}
+ }
* ``multipart_sync_threshold`` (integer)
the zone/zonegroup changes will take effect once the changes are
committed with ``radosgw-admin period update --commit``.
-```
- # radosgw-admin zonegroup placement add --rgw-zonegroup={zone-group-name} \
- --placement-id={placement-id} \
- --storage-class={storage-class-name} \
- --tier-type=cloud-s3
-```
-
-For example:
-
-```
- # radosgw-admin zonegroup placement add --rgw-zonegroup=default \
- --placement-id=default-placement \
- --storage-class=CLOUDTIER --tier-type=cloud-s3
- [
- {
- "key": "default-placement",
- "val": {
- "name": "default-placement",
- "tags": [],
- "storage_classes": [
- "CLOUDTIER",
- "STANDARD"
- ],
- "tier_targets": [
- {
- "key": "CLOUDTIER",
- "val": {
- "tier_type": "cloud-s3",
- "storage_class": "CLOUDTIER",
- "retain_head_object": "false",
- "s3": {
- "endpoint": "",
- "access_key": "",
- "secret": "",
- "host_style": "path",
- "target_storage_class": "",
- "target_path": "",
- "acl_mappings": [],
- "multipart_sync_threshold": 33554432,
- "multipart_min_part_size": 33554432
- }
- }
- }
- ]
- }
- }
- ]
-```
+::
+
+ $ radosgw-admin zonegroup placement add --rgw-zonegroup={zone-group-name} \
+ --placement-id={placement-id} \
+ --storage-class={storage-class-name} \
+ --tier-type=cloud-s3
+
+For example::
+
+ $ radosgw-admin zonegroup placement add --rgw-zonegroup=default \
+ --placement-id=default-placement \
+ --storage-class=CLOUDTIER --tier-type=cloud-s3
+ [
+ {
+ "key": "default-placement",
+ "val": {
+ "name": "default-placement",
+ "tags": [],
+ "storage_classes": [
+ "CLOUDTIER",
+ "STANDARD"
+ ],
+ "tier_targets": [
+ {
+ "key": "CLOUDTIER",
+ "val": {
+ "tier_type": "cloud-s3",
+ "storage_class": "CLOUDTIER",
+ "retain_head_object": "false",
+ "s3": {
+ "endpoint": "",
+ "access_key": "",
+ "secret": "",
+ "host_style": "path",
+ "target_storage_class": "",
+ "target_path": "",
+ "acl_mappings": [],
+ "multipart_sync_threshold": 33554432,
+ "multipart_min_part_size": 33554432
+ }
+ }
+ }
+ ]
+ }
+ }
+ ]
.. note:: Once a storage class
- of ``--tier-type=cloud-s3`` or ``--tier-type=cloud-s3-glacier``,
+ of ``--tier-type=cloud-s3`` or ``--tier-type=cloud-s3-glacier``
is created it cannot be later modified to any other storage class type.
-The tier configuration can be then performed using the following command:
+The tier configuration can then be performed using the following command::
-```
- # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
- --placement-id={placement-id} \
- --storage-class={storage-class-name} \
- --tier-config={key}={val}[,{key}={val}]
-```
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
+ --placement-id={placement-id} \
+ --storage-class={storage-class-name} \
+ --tier-config={key}={val}[,{key}={val}]
The ``key`` in the configuration specifies the config variable to be updated, and
the ``val`` specifies its new value.
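The ``{key}={val}[,{key}={val}]`` form can be assembled programmatically. The following is a minimal Python sketch, not part of ``radosgw-admin``: the helper name ``build_tier_config`` and the endpoint value are hypothetical, and it assumes that values contain no commas. It joins the pairs without whitespace so that the shell passes the result as a single argument.

```python
def build_tier_config(settings):
    """Join key/value pairs into one --tier-config argument value."""
    parts = []
    for key, val in settings.items():
        # Write booleans in lowercase (true/false), matching the
        # retain_head_object values used in this document.
        if isinstance(val, bool):
            val = "true" if val else "false"
        parts.append(f"{key}={val}")
    # No whitespace around the commas, so the result is a single shell word.
    return ",".join(parts)


example = build_tier_config({
    "endpoint": "http://cloud.example.com:8080",  # hypothetical endpoint
    "multipart_sync_threshold": 44432,
    "retain_head_object": True,
})
print("--tier-config=" + example)
```

The printed string can then be appended to a ``radosgw-admin zonegroup placement modify`` command line.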
-For example:
-
-```
- # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
- --placement-id default-placement \
- --storage-class CLOUDTIER \
- --tier-config=endpoint=http://XX.XX.XX.XX:YY,\
- access_key=<access_key>,secret=<secret>, \
- multipart_sync_threshold=44432, \
- multipart_min_part_size=44432, \
- retain_head_object=true
-```
-
-Nested values can be accessed using periods. For example:
-
-```
- # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
- --placement-id={placement-id} \
- --storage-class={storage-class-name} \
- --tier-config=acls.source_id=${source-id}, \
- acls.dest_id=${dest-id}
-```
+For example::
+
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup default \
+ --placement-id default-placement \
+ --storage-class CLOUDTIER \
+ --tier-config=endpoint=http://XX.XX.XX.XX:YY,\
+ access_key=<access_key>,secret=<secret>, \
+ multipart_sync_threshold=44432, \
+ multipart_min_part_size=44432, \
+ retain_head_object=true
+
+Nested tier configuration values can be accessed using periods, much as nested
+JSON fields are addressed in tools like ``jq``. The period separator (``.``)
+applies only to keys within ``--tier-config``; it is unrelated to Ceph RGW
+naming patterns for realm, zonegroup, and zone.
+For example::
+
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
+ --placement-id={placement-id} \
+ --storage-class={storage-class-name} \
+ --tier-config=acls.source_id=${source-id}, \
+ acls.dest_id=${dest-id}
Configuration array entries can be accessed by specifying the specific entry to
be referenced enclosed in square brackets, and adding a new array entry can be
performed with an empty array ``[]``.
-For example, creating a new ``acl`` array entry:
+For example, creating a new ``acl`` array entry::
-```
- # radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
- --placement-id={placement-id} \
- --storage-class={storage-class-name} \
- --tier-config=acls[].source_id=${source-id}, \
- acls[${source-id}].dest_id=${dest-id}, \
- acls[${source-id}].type=email
-```
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup={zone-group-name} \
+ --placement-id={placement-id} \
+ --storage-class={storage-class-name} \
+ --tier-config=acls[].source_id=${source-id}, \
+ acls[${source-id}].dest_id=${dest-id}, \
+ acls[${source-id}].type=email
An entry can be removed by supplying ``--tier-config-rm={key}``.
-For example:
-
-```
- # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
- --placement-id default-placement \
- --storage-class CLOUDTIER \
- --tier-config-rm=acls.source_id=testid
-
- # radosgw-admin zonegroup placement modify --rgw-zonegroup default \
- --placement-id default-placement \
- --storage-class CLOUDTIER \
- --tier-config-rm=target_path
-```
-
-The storage class can be removed using the following command:
-
-```
- # radosgw-admin zonegroup placement rm --rgw-zonegroup={zone-group-name} \
- --placement-id={placement-id} \
- --storage-class={storage-class-name}
-```
-
-For example,
-
-```
- # radosgw-admin zonegroup placement rm --rgw-zonegroup default \
- --placement-id default-placement \
- --storage-class CLOUDTIER
- [
- {
- "key": "default-placement",
- "val": {
- "name": "default-placement",
- "tags": [],
- "storage_classes": [
- "STANDARD"
- ]
- }
- }
- ]
-```
+For example::
+
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup default \
+ --placement-id default-placement \
+ --storage-class CLOUDTIER \
+ --tier-config-rm=acls.source_id=testid
+
+ $ radosgw-admin zonegroup placement modify --rgw-zonegroup default \
+ --placement-id default-placement \
+ --storage-class CLOUDTIER \
+ --tier-config-rm=target_path
+
+The storage class can be removed using the following command::
+
+ $ radosgw-admin zonegroup placement rm --rgw-zonegroup={zone-group-name} \
+ --placement-id={placement-id} \
+ --storage-class={storage-class-name}
+
+For example::
+
+ $ radosgw-admin zonegroup placement rm --rgw-zonegroup default \
+ --placement-id default-placement \
+ --storage-class CLOUDTIER
+ [
+ {
+ "key": "default-placement",
+ "val": {
+ "name": "default-placement",
+ "tags": [],
+ "storage_classes": [
+ "STANDARD"
+ ]
+ }
+ }
+ ]
+
Object Modification and Limitations
-----------------------------------
+-----------------------------------
The cloud storage class, once configured, can be used like any other storage
-class when defining bucket lifecycle (LC) rules. For example,
+class when defining bucket lifecycle (LC) rules. For example::
-```
- <Transition>
- <StorageClass>CLOUDTIER</StorageClass>
- ....
+ <LifecycleConfiguration>
+ <Rule>
....
- </Transition>
-```
+ <Transition>
+ ....
+ <StorageClass>CLOUDTIER</StorageClass>
+ </Transition>
+ </Rule>
+ </LifecycleConfiguration>
Since the transition is unidirectional, when configuring S3
lifecycle rules, the cloud storage class should be specified
Due to API limitations, there is no way to preserve the original object
modification time and ETag, which are stored as metadata attributes
-on the destination objects, as shown below:
+on the destination objects, as shown below::
-```
- x-amz-meta-rgwx-source: rgw
- x-amz-meta-rgwx-source-etag: ed076287532e86365e841e92bfc50d8c
- x-amz-meta-rgwx-source-key: lc.txt
- x-amz-meta-rgwx-source-mtime: 1608546349.757100363
- x-amz-meta-rgwx-versioned-epoch: 0
-```
+ x-amz-meta-rgwx-source: rgw
+ x-amz-meta-rgwx-source-etag: ed076287532e86365e841e92bfc50d8c
+ x-amz-meta-rgwx-source-key: lc.txt
+ x-amz-meta-rgwx-source-mtime: 1608546349.757100363
+ x-amz-meta-rgwx-versioned-epoch: 0
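A client reading a transitioned object from the cloud endpoint can recover the original ETag and modification time from these attributes. The following Python sketch is illustrative only: the helper name ``source_attrs`` is hypothetical, the headers are assumed to be available as a plain dictionary, and the ``mtime`` value is assumed to be seconds since the Unix epoch with a fractional part.

```python
from datetime import datetime, timezone


def source_attrs(headers):
    """Return (source_etag, source_mtime) from the rgwx metadata headers."""
    etag = headers["x-amz-meta-rgwx-source-etag"]
    # Assumption: the value is seconds since the epoch with a fractional part.
    seconds = float(headers["x-amz-meta-rgwx-source-mtime"])
    mtime = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return etag, mtime


# Header values copied from the listing in this section.
headers = {
    "x-amz-meta-rgwx-source": "rgw",
    "x-amz-meta-rgwx-source-etag": "ed076287532e86365e841e92bfc50d8c",
    "x-amz-meta-rgwx-source-key": "lc.txt",
    "x-amz-meta-rgwx-source-mtime": "1608546349.757100363",
    "x-amz-meta-rgwx-versioned-epoch": "0",
}
etag, mtime = source_attrs(headers)
print(etag, mtime.isoformat())
```

Converting through ``float`` drops sub-microsecond precision, which is usually acceptable for display purposes.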
In order to allow cloud services to detect the source and map
user-defined ``x-amz-meta-`` attributes, two additional new
just store the object data being sent.
By default, post-transition, the source object gets deleted. But it is possible
-to retain its metadata but with updated values (including ``storage-class``
+to retain its metadata with updated values (including ``storage-class``
and ``object-size``) by setting the config option ``retain_head_object``
to true. However, a ``GET`` operation on such an object will still fail
-with an ``InvalidObjectState`` error.
-
-For example:
-
-```
- # s3cmd info s3://bucket/lc.txt
- s3://bucket/lc.txt (object):
- File size: 12
- Last mod: Mon, 21 Dec 2020 10:25:56 GMT
- MIME type: text/plain
- Storage: CLOUDTIER
- MD5 sum: ed076287532e86365e841e92bfc50d8c
- SSE: none
- Policy: none
- CORS: none
- ACL: M. Tester: FULL_CONTROL
- x-amz-meta-s3cmd-attrs: atime:1608466266/ctime:1597606156/gid:0/gname:root/md5:ed076287532e86365e841e92bfc50d8c/mode:33188/mtime:1597605793/uid:0/uname:root
-
- # s3cmd get s3://bucket/lc.txt lc_restore.txt
- download: 's3://bucket/lc.txt' -> 'lc_restore.txt' [1 of 1]
- ERROR: S3 error: 403 (InvalidObjectState)
-```
+with an ``InvalidObjectState`` error. Any other operation on the original
+source object affects only its metadata entry, leaving the transitioned
+object intact.
+
+For example::
+
+ $ s3cmd info s3://bucket/lc.txt
+ s3://bucket/lc.txt (object):
+ File size: 12
+ Last mod: Mon, 21 Dec 2020 10:25:56 GMT
+ MIME type: text/plain
+ Storage: CLOUDTIER
+ MD5 sum: ed076287532e86365e841e92bfc50d8c
+ SSE: none
+ Policy: none
+ CORS: none
+ ACL: M. Tester: FULL_CONTROL
+ x-amz-meta-s3cmd-attrs: atime:1608466266/ctime:1597606156/gid:0/gname:root/md5:ed076287532e86365e841e92bfc50d8c/mode:33188/mtime:1597605793/uid:0/uname:root
+
+ $ s3cmd get s3://bucket/lc.txt lc_restore.txt
+ download: 's3://bucket/lc.txt' -> 'lc_restore.txt' [1 of 1]
+ ERROR: S3 error: 403 (InvalidObjectState)
To avoid object name collisions across buckets, the source bucket name is
prepended to the target object name. If the object is versioned, the object's
``versionid`` is appended.
-Below is the object name format:
+Below is the object name format::
+
+ s3://<target_path>/<source_bucket_name>/<source_object_name>(-<source_object_version_id>)
-```
- s3://<target_path>/<source_bucket_name>/<source_object_name>(-<source_object_version_id>)
-```
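The naming rule can be expressed as a short Python sketch. The function name ``target_object_name`` and the sample values are hypothetical. The sketch only mirrors the format string shown in this section and is not taken from the RGW source.

```python
def target_object_name(target_path, bucket, key, version_id=None):
    """Build the destination name for a transitioned object."""
    # The source bucket name is prepended to avoid collisions across buckets.
    name = f"s3://{target_path}/{bucket}/{key}"
    # For versioned objects, the version ID is appended after a hyphen.
    if version_id:
        name += f"-{version_id}"
    return name


plain = target_object_name("tier-bucket", "bucket", "lc.txt")
versioned = target_object_name("tier-bucket", "bucket", "lc.txt", "v123")
print(plain)
print(versioned)
```

Here ``tier-bucket`` stands in for the configured ``target_path`` and ``v123`` for an object version ID.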
Versioned Objects
~~~~~~~~~~~~~~~~~
Restoring Objects
-----------------
The objects transitioned to cloud can now be restored. For more information, refer to
-[Restoring Objects from Cloud](https://docs.aws.amazon.com/AmazonS3/latest/dev/cloud-restore.html)
+`Restoring Objects from Cloud <https://docs.aws.amazon.com/AmazonS3/latest/dev/cloud-restore.html>`_.
Future Work