## Introduction
The [`cloud-transition`](https://docs.ceph.com/en/latest/radosgw/cloud-transition) feature enables data transition to a remote cloud service as part of Lifecycle Configuration via Storage Classes. However, the transition is unidirectional; data cannot be transitioned back from the remote zone.
The `cloud-restore` feature enables restoration of those transitioned objects from the remote cloud S3 endpoints back into RGW.
The objects can be restored either by using the S3 `restore-object` CLI or via `read-through`. The restored copies can be either temporary or permanent.
## S3 restore-object CLI

The goal here is to implement minimal functionality of the [`S3RestoreObject`](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html) API so that users can restore cloud-transitioned objects.
```sh
--key <value> (can be an object name or * for bulk restore) \
[--version-id <value>] \
--restore-request (structure) {
    // for temporary restore
    { "Days": integer }
    // if Days is not provided, the copy is considered permanent
}
```
This CLI may be extended in the future to include custom parameters (like target-bucket/storage-class, etc.) specific to RGW.
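
For illustration, the CLI might be exercised as below (the endpoint URL, bucket, and key are placeholder values, and exact option coverage depends on what RGW implements):

```shell
# Temporary restore: the object is restored for 7 days, then reverts
# to the cloud-transitioned stub (endpoint/bucket/key are examples).
aws --endpoint-url http://localhost:8000 s3api restore-object \
    --bucket mybucket \
    --key myobject \
    --restore-request '{"Days": 7}'

# Permanent restore: omitting "Days" makes the restored copy permanent.
aws --endpoint-url http://localhost:8000 s3api restore-object \
    --bucket mybucket \
    --key myobject \
    --restore-request '{}'
```
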
## read-through

As per the cloud-transition feature functionality, cloud-transitioned objects cannot be read; `GET` on those objects fails with an 'InvalidObjectState' error.
But using this restore feature, transitioned objects can be restored and read. New tier-config options `allow_read_through` and `read_through_restore_days` are added for this purpose. Only when `allow_read_through` is enabled will a `GET` on a transitioned object restore it from the S3 endpoint.
Note: The object copy restored via `read-through` is temporary and is retained only for the duration of `read_through_restore_days`.
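
These options would be set through the cloudtier storage-class tier-config, along the lines of the existing cloud-transition workflow (the zonegroup, placement id, and storage-class names below are placeholders):

```shell
# Enable read-through restore on an existing cloud-s3 storage class;
# copies restored via GET are retained for 7 days.
radosgw-admin zonegroup placement modify \
    --rgw-zonegroup default \
    --placement-id default-placement \
    --storage-class CLOUDTIER \
    --tier-config=allow_read_through=true,read_through_restore_days=7
```
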
## Design
* Similar to the cloud-transition feature, this feature currently works **only with S3-compatible cloud endpoints**.
* This feature works only for **cloud-transitioned objects**. In order to validate this, the `retain_head_object` option should be set to true so that the object's `HEAD` object can be verified before restoring the object.
* **Request flow:**
  * Once the `HEAD` object is verified, its cloudtier storage-class config details are fetched.
    Note: In case the cloudtier storage-class is deleted or updated, the object may not be restored.
  * RestoreStatus for the `HEAD` object is marked `RestoreAlreadyInProgress`.
  * The object restore is done asynchronously by issuing either an S3 `GET` or an S3 `RESTORE` request to the remote endpoint.
  * Once the object is restored, RestoreStatus is updated to `CloudRestored` and RestoreType is set to either `Temporary` or `Permanent`.
  * In case the operation fails, RestoreStatus is marked `RestoreFailed`.

* **New attrs:** Below are the new attrs added:
  * `user.rgw.restore-status`: <Restore operation status>
  * `user.rgw.restore-type`: <Type of restore>
  * `user.rgw.restored-at`: <Restoration time>
  * `user.rgw.restore-expiry-date`: <Expiration time in case of temporary copies>
  * `user.rgw.cloudtier_storage_class`: <CloudTier storage class used in case of temporarily restored copies>

```cpp
enum RGWRestoreStatus : uint8_t {
  None = 0,
  RestoreAlreadyInProgress = 1,
  CloudRestored = 2,
  RestoreFailed = 3
};

// Restore type recorded in user.rgw.restore-type
enum class RGWRestoreType : uint8_t {
  None = 0,
  Temporary = 1,
  Permanent = 2
};
```
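
As a sketch, the new head-stub attrs could be inspected with `radosgw-admin object stat` (bucket and object names are placeholders):

```shell
# Once a restore has run, the attrs section of the stat output should
# list user.rgw.restore-status, user.rgw.restore-type,
# user.rgw.restored-at, etc.
radosgw-admin object stat --bucket=mybucket --object=myobject
```
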
* **Response:**
  * The S3 `restore-object` CLI returns SUCCESS: either a 200 OK or a 202 Accepted status code.
    * If the object was not previously restored, RGW returns 202 Accepted in the response.
    * If the object was previously restored, RGW returns 200 OK in the response.
    * Special errors:
      * Code: RestoreAlreadyInProgress (Cause: object restore is already in progress)
      * Code: ObjectNotFound (if the object is not found at the cloud endpoint)
      * Code: I/O error (for any other I/O errors during restore)
  * A `GET` request continues to return an 'InvalidObjectState' error until the object is successfully restored.
    * S3 `head-object` can be used to verify whether the restore is still in progress.
    * Once the object is restored, `GET` returns the object data.
* **StorageClass**: By default, objects are restored to the `STANDARD` storage class. However, as per [AWS S3 Restore](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html), the storage class remains the same for restored objects. Hence for temporary copies, the `x-amz-storage-class` returned contains the original cloudtier storage class.
  * Note: A new tier-config option may be added to select the storage class to restore objects to.
* **mtime**: If the restored object is temporary, the object is still marked `RGWObj::CloudTiered` and its mtime is not changed, i.e., it remains set to the transition time. But if the object is a permanent copy, it is marked `RGWObj::Main` and its mtime is updated to the restore time (now()).
* **Lifecycle**:
  * `Temporary` copies are not subject to any further transition to the cloud. However (as is the case with cloud-transitioned objects), they can be deleted via regular LC expiration rules or via an external S3 delete request.
  * `Permanent` copies are treated like regular objects and are subject to any applicable LC rules.
* **Replication**: The restored objects (both temporary and permanent) are also replicated like regular objects and will be deleted across the zones post expiration.
* **VersionedObjects**: In the case of versioning, any cloud-transitioned object would have been non-current. Post restore, the same non-current object is updated with the downloaded data and its HEAD object is updated accordingly, as is the case with regular objects.
* **Temporary Object Expiry**: This is done via the Object Expirer.
  * When the object is restored as temporary, `user.rgw.restore-expiry-date` is set accordingly and the `delete_at` attr is updated with the same value.
  * The object is then added to the list used by the `ObjectExpirer`.
  * The `LC` worker thread scans through that list and, post expiry, resets the objects back to the cloud-transitioned state, i.e.:
    * HEAD object with size=0
    * new attrs removed
    * `delete_at` reset
  * Note: A new RGW option `rgw_restore_debug_interval` is added; when set, it is considered as the `Days` value (similar to `rgw_lc_debug_interval`).
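
A minimal sketch of how the expiry timestamp for a temporary copy could be derived, assuming `rgw_restore_debug_interval` substitutes for the 86400-second day exactly as `rgw_lc_debug_interval` does for LC (variable names are illustrative, not RGW code):

```shell
days=7
now=$(date +%s)
# rgw_restore_debug_interval (0 = unset): when set, each "Day" lasts
# this many seconds instead of 86400, mirroring rgw_lc_debug_interval.
debug_interval=0
if [ "$debug_interval" -gt 0 ]; then
  day_len=$debug_interval
else
  day_len=86400
fi
delete_at=$((now + days * day_len))
echo "expires in $((delete_at - now)) seconds"   # 7 days = 604800 seconds
```
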

* **FAILED Restore**: In case the restore operation fails:
  * The HEAD object is updated accordingly, i.e., the storage class is reset to the original cloud-tier storage class.
  * All the new attrs added are removed, except for `user.rgw.restore-status`, which is updated to `RestoreFailed`.
* **Check Restore Progress**: Users can issue S3 `head-object` request to check if the restore is done or still in progress for any object.
* **RGW down/restarts**: Since the restore operation is asynchronous, we need to keep track of the objects being restored. In case RGW goes down or restarts, this data will be used to retrigger in-progress restore requests or to do appropriate cleanup for the failed ones.
* **Compression**: If the placement target to which the objects are being restored has compression enabled, the data is compressed accordingly (bug2294512).
* **Encryption**: If the restored object is encrypted, the old SSE-related xattrs/keys from the HEAD stub are copied back into the object metadata (bug2294512).
* **Delete cloud object post restore**: Once the object is successfully restored, the object at the remote endpoint is still retained. However, we could choose to delete it for permanently restored copies by adding a new tier-config option.
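
Tying the above together, a hedged polling sketch (placeholder endpoint and names; the exact fields surfaced by RGW's `head-object` response may differ):

```shell
# While the restore is in progress, user.rgw.restore-status is
# RestoreAlreadyInProgress; head-object can be used to poll it.
aws --endpoint-url http://localhost:8000 s3api head-object \
    --bucket mybucket \
    --key myobject

# Once restored, a GET succeeds instead of failing with InvalidObjectState.
aws --endpoint-url http://localhost:8000 s3api get-object \
    --bucket mybucket \
    --key myobject /tmp/myobject.out
```
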
## Future work
* **Bulk Restore**: In the case of BulkRestore, some of the objects may not be restored. The user needs to manually cross-check which objects are restored and which are still InProgress.
* **Admin Ops**
* **Restore Notifications**
```cpp
  ret = processor.complete(accounted_size, etag, &mtime, set_mtime,
                           attrs, rgw::cksum::no_cksum, delete_at, nullptr, nullptr, nullptr,
                           (rgw_zone_set *)&zone_set, &canceled, rctx,
                           log_op ? rgw::sal::FLAG_LOG_OP : 0);
  if (ret < 0) {
    return ret;
  }
  // XXX: handle olh_epoch for versioned objects like in fetch_remote_obj
  return ret;
```
The new tier-config options are parsed in `update_params`:

```cpp
  // (tail of the existing retain_head_object handling)
      retain_head_object = false;
    }
  }
  if (config.exists("allow_read_through")) {
    string s = config["allow_read_through"];
    if (s == "true") {
      allow_read_through = true;
    } else {
      allow_read_through = false;
    }
  }
  if (config.exists("read_through_restore_days")) {
    r = conf_to_uint64(config, "read_through_restore_days", &read_through_restore_days);
    if (r < 0) {
      read_through_restore_days = DEFAULT_READ_THROUGH_RESTORE_DAYS;
    }
  }
  if (tier_type == "cloud-s3") {
    r = t.s3.update_params(config);
```

On removal (`clear_params`), the new options are reset to their defaults:

```cpp
  if (config.exists("retain_head_object")) {
    retain_head_object = false;
  }
  if (config.exists("allow_read_through")) {
    allow_read_through = false;
  }
  if (config.exists("read_through_restore_days")) {
    read_through_restore_days = DEFAULT_READ_THROUGH_RESTORE_DAYS;
  }
  if (tier_type == "cloud-s3") {
    t.s3.clear_params(config);
  }
```
```cpp
/*
 * A GET on a CloudTiered object either fetches the object for sync to
 * other zones or, in all other cases, tries to restore the object from
 * the remote cloud endpoint.
 */
int handle_cloudtier_obj(req_state* s, const DoutPrefixProvider *dpp, rgw::sal::Driver* driver,
                         rgw::sal::Attrs& attrs, bool sync_cloudtiered, std::optional<uint64_t> days,
                         bool restore_op, optional_yield y)
{
  int op_ret = 0;
  ldpp_dout(dpp, 20) << "reached handle cloud tier " << dendl;
  auto attr_iter = attrs.find(RGW_ATTR_MANIFEST);
  if (attr_iter == attrs.end()) {
    if (restore_op) {
      op_ret = -ERR_INVALID_OBJECT_STATE;
      s->err.message = "only cloud tier object can be restored";
      return op_ret;
    } else { // ignore for read-through
      return 0;
    }
  }

  RGWObjManifest m;
  try {
    decode(m, attr_iter->second);
    if (m.get_tier_type() != "cloud-s3") {
      ldpp_dout(dpp, 20) << "not a cloud tier object " << s->object->get_key().name << dendl;
      if (restore_op) {
        op_ret = -ERR_INVALID_OBJECT_STATE;
        s->err.message = "only cloud tier object can be restored";
        return op_ret;
      } else { // ignore for read-through
        return 0;
      }
    }
    RGWObjTier tier_config;
    m.get_tier_config(&tier_config);
    if (sync_cloudtiered) {
      // fetch object for sync and set cloud_tier attrs
      bufferlist t, t_tier;
      t.append("cloud-s3");
      attrs[RGW_ATTR_CLOUD_TIER_TYPE] = t;
      encode(tier_config, t_tier);
      attrs[RGW_ATTR_CLOUD_TIER_CONFIG] = t_tier;
      return op_ret;
    }
    attr_iter = attrs.find(RGW_ATTR_RESTORE_STATUS);
    rgw::sal::RGWRestoreStatus restore_status = rgw::sal::RGWRestoreStatus::None;
    if (attr_iter != attrs.end()) {
      bufferlist bl = attr_iter->second;
      auto iter = bl.cbegin();
      decode(restore_status, iter);
    }
    if (attr_iter == attrs.end() || restore_status == rgw::sal::RGWRestoreStatus::RestoreFailed) {
      // first-time restore, or the previous restore failed
      rgw::sal::Bucket* pbucket = s->bucket.get();

      std::unique_ptr<rgw::sal::PlacementTier> tier;
      rgw_placement_rule target_placement;
      target_placement.inherit_from(pbucket->get_placement_rule());
      attr_iter = attrs.find(RGW_ATTR_STORAGE_CLASS);
      if (attr_iter != attrs.end()) {
        target_placement.storage_class = attr_iter->second.to_str();
      }
      op_ret = driver->get_zone()->get_zonegroup().get_placement_tier(target_placement, &tier);
      ldpp_dout(dpp, 20) << "got tier placement handle, op_ret=" << op_ret
                         << " storage class " << target_placement.storage_class << dendl;
      if (op_ret < 0) {
        s->err.message = "failed to restore object";
        return op_ret;
      }
      rgw::sal::RadosPlacementTier* rtier = static_cast<rgw::sal::RadosPlacementTier*>(tier.get());
      tier_config.tier_placement = rtier->get_rt();
      if (!restore_op) {
        if (tier_config.tier_placement.allow_read_through) {
          days = tier_config.tier_placement.read_through_restore_days;
        } else { // read-through is not enabled
          op_ret = -ERR_INVALID_OBJECT_STATE;
          s->err.message = "Read through is not enabled for this config";
          return op_ret;
        }
      }
      // fill in the entry. XXX: maybe we can avoid this by passing only the necessary params
      rgw_bucket_dir_entry ent;
      ent.key.name = s->object->get_key().name;
      ent.meta.accounted_size = ent.meta.size = s->obj_size;
      ent.meta.etag = "";
      ceph::real_time mtime = s->object->get_mtime();
      uint64_t epoch = 0;
      op_ret = get_system_versioning_params(s, &epoch, NULL);
      if (op_ret < 0) {
        ldpp_dout(dpp, 20) << "failed to get versioning params, op_ret = " << op_ret << dendl;
        s->err.message = "failed to restore object";
        return op_ret;
      }
      op_ret = s->object->restore_obj_from_cloud(pbucket, tier.get(), target_placement, ent,
                                                 s->cct, tier_config, mtime, epoch, days,
                                                 dpp, y, s->bucket->get_info().flags);
      if (op_ret < 0) {
        ldpp_dout(dpp, 0) << "object " << ent.key.name << " fetch failed: " << op_ret << dendl;
        s->err.message = "failed to restore object";
        return op_ret;
      }
      ldpp_dout(dpp, 20) << "object " << ent.key.name << " fetch succeeded" << dendl;
      /* Even if the restore completes, the first read-through request returns
       * a timeout, since the object is actually downloaded asynchronously.
       */
      if (!restore_op) { // read-through
        op_ret = -ERR_REQUEST_TIMEOUT;
        ldpp_dout(dpp, 5) << "restore is still in progress, please check restore status and retry" << dendl;
        s->err.message = "restore is still in progress";
      }
      return op_ret;
    } else if (!restore_op && restore_status == rgw::sal::RGWRestoreStatus::RestoreAlreadyInProgress) {
      op_ret = -ERR_REQUEST_TIMEOUT;
      ldpp_dout(dpp, 5) << "restore is still in progress, please check restore status and retry" << dendl;
      s->err.message = "restore is still in progress";
    } else { // CloudRestored: return success
      return 0;
    }
  } catch (const buffer::end_of_buffer&) {
    // empty manifest; it's not cloud-tiered
    if (restore_op) {
      op_ret = -ERR_INVALID_OBJECT_STATE;
      s->err.message = "only cloud tier object can be restored";
    }
  } catch (const std::exception& e) {
    // ignore other decode errors
  }
  return op_ret;
}
```
```cpp
    } catch (const buffer::error&) {}
  }

  if (get_type() == RGW_OP_GET_OBJ && get_data) {
    // read-through path: no explicit Days value; the tier-config decides
    std::optional<uint64_t> days;
    op_ret = handle_cloudtier_obj(s, this, driver, attrs, sync_cloudtiered, days, false, y);
    if (op_ret < 0) {
      ldpp_dout(this, 4) << "Cannot get cloud tiered object: " << *s->object
                         << ". Failing with " << op_ret << dendl;
      goto done_err;
    }
  }
```
```cpp
  // JSON decode of the new tier-config fields
  JSONDecoder::decode_json("tier_type", tier_type, obj);
  JSONDecoder::decode_json("storage_class", storage_class, obj);
  JSONDecoder::decode_json("retain_head_object", retain_head_object, obj);
  JSONDecoder::decode_json("allow_read_through", allow_read_through, obj);
  JSONDecoder::decode_json("read_through_restore_days", read_through_restore_days, obj);
  if (tier_type == "cloud-s3") {
    JSONDecoder::decode_json("s3", t.s3, obj);
  }

  // JSON encode of the same fields
  encode_json("tier_type", tier_type, f);
  encode_json("storage_class", storage_class, f);
  encode_json("retain_head_object", retain_head_object, f);
  encode_json("allow_read_through", allow_read_through, f);
  encode_json("read_through_restore_days", read_through_restore_days, f);
  if (tier_type == "cloud-s3") {
    encode_json("s3", t.s3, f);
  }
```

```cpp
WRITE_CLASS_ENCODER(RGWZoneGroupPlacementTierS3)
```
```cpp
#define DEFAULT_READ_THROUGH_RESTORE_DAYS 1

struct RGWZoneGroupPlacementTier {
  std::string tier_type;
  std::string storage_class;
  bool retain_head_object = false;
  bool allow_read_through = false;
  uint64_t read_through_restore_days = DEFAULT_READ_THROUGH_RESTORE_DAYS;

  struct _tier {
    RGWZoneGroupPlacementTierS3 s3;
  } t;

  int clear_params(const JSONFormattable& config);
```
```cpp
  void encode(bufferlist& bl) const {
    ENCODE_START(2, 1, bl);  // struct_v bumped to 2 for the new fields
    encode(tier_type, bl);
    encode(storage_class, bl);
    encode(retain_head_object, bl);
    encode(allow_read_through, bl);
    encode(read_through_restore_days, bl);
    if (tier_type == "cloud-s3") {
      encode(t.s3, bl);
    }
    ENCODE_FINISH(bl);
  }

  void decode(bufferlist::const_iterator& bl) {
    DECODE_START(2, bl);
    decode(tier_type, bl);
    decode(storage_class, bl);
    decode(retain_head_object, bl);
    if (struct_v >= 2) {  // older encodings lack the read-through fields
      decode(allow_read_through, bl);
      decode(read_through_restore_days, bl);
    }
    if (tier_type == "cloud-s3") {
      decode(t.s3, bl);
    }
    DECODE_FINISH(bl);
  }
```