From: Jiffin Tony Thottan
Date: Fri, 21 Jun 2024 09:12:07 +0000 (+0530)
Subject: cloud restore: completing read through
X-Git-Tag: v20.0.0~916^2~1
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=c49aa6aa3b1f18f4c36977e91311f6f7d9ac2e45;p=ceph.git

cloud restore: completing read through

What is supported:

* Read-through for cloud-tiered objects via restore_obj_from_cloud.
* New tier-config options: users need to set allow_read_through to true and
  read_through_restore_days to more than 1 for this feature to work. Only
  objects transitioned with retain_head_object set are eligible.
* The first GET request fails with a restore-in-progress error; objects are
  downloaded asynchronously.
* The restored objects are temporary.
* Tested with `aws s3api get-object`, `aws s3api head-object` and `aws s3 cp`.

In addition, timeout errors are now sent for the first read-through request.
Also addressed lint warnings and other cleanup (review comments).

Signed-off-by: Jiffin Tony Thottan
---

diff --git a/src/doc/rgw/cloud-restore.md b/src/doc/rgw/cloud-restore.md
index 2a7d149dc6615..d54b18dfa50bc 100644
--- a/src/doc/rgw/cloud-restore.md
+++ b/src/doc/rgw/cloud-restore.md
@@ -2,13 +2,14 @@
 ## Introduction
 
-[`cloud-transition`](https://docs.ceph.com/en/latest/radosgw/cloud-transition) feature enables data transition to a remote cloud service as part of Lifecycle Configuration via Storage Classes. However the transition is unidirectional; data cannot be transitioned back from the remote zone.
+[`cloud-transition`](https://docs.ceph.com/en/latest/radosgw/cloud-transition) feature enables data transition to a remote cloud service as part of Lifecycle Configuration via Storage Classes. However, the transition is unidirectional; data cannot be transitioned back from the remote zone. The `cloud-restore` feature enables restoration of those transitioned objects from the remote cloud S3 endpoints back into RGW.
The objects can be restored either by using S3 `restore-object` CLI or via `read-through`. The restored copies can be either temporary or permanent. ## S3 restore-object CLI + The goal here is to implement minimal functionality of [`S3RestoreObject`](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html) API so that users can restore the cloud transitioned objects. ```sh @@ -17,41 +18,43 @@ aws s3api restore-object \ --key ( can be object name or * for Bulk restore) \ [--version-id ] \ --restore-request (structure) { - // for temporary restore + // for temporary restore { "Days": integer, } // if Days not provided, it will be considered as permanent copy } ``` -This CLI may be extended in future to include custom parameters (like target-bucket/storage-class etc) specific to RGW. +This CLI may be extended in future to include custom parameters (like target-bucket/storage-class etc) specific to RGW. ## read-through + As per the cloud-transition feature functionality, the cloud-transitioned objects cannot be read. `GET` on those objects fails with ‘InvalidObjectState’ error. -But using this restore feature, transitioned objects can be restored and read. New tier-config options `allow_read_through` and `read_through_restore_days` are added for the same. Only when `allow_read_through` is enabled, `GET` on the transitioned objects will restore the objects from the S3 endpoint. +But using this restore feature, transitioned objects can be restored and read. New tier-config options `allow_read_through` and `read_through_restore_days` are added for the same. Only when `allow_read_through` is enabled, `GET` on the transitioned objects will restore the objects from the S3 endpoint. Note: The object copy restored via `readthrough` is temporary and is retained only for the duration of `read_through_restore_days`. ## Design -* Similar to cloud-transition feature, this feature currently works for **only s3 compatible cloud endpoint**. 
+* Similar to the cloud-transition feature, this feature currently works **only with S3-compatible cloud endpoints**.
 * This feature works for only **cloud-transitioned objects**. In order to validate this, `retain_head_object` option should be set to true so that the object’s `HEAD` object can be verified before restoring the object.
 * **Request flow:**
-  * Once the `HEAD` object is verified, its cloudtier storage class config details are fetched.
+  * Once the `HEAD` object is verified, its cloudtier storage class config details are fetched. Note: In case the cloudtier storage-class is deleted or updated, the object may not be restored.
-  * RestoreStatus for the `HEAD` object is marked `RestoreAlreadyInProgress`
-  * Object Restore is done asynchronously by issuing either S3 `GET` or S3 `RESTORE` request to the remote endpoint.
-  * Once the object is restored, RestoreStaus is updated as `CloudRestored` and RestoreType is set to either `Temporary` or `Permanent`.
-  * Incase the operation fails, RestoreStatus is marked as `RestoreFailed`.
-
+  * RestoreStatus for the `HEAD` object is marked `RestoreAlreadyInProgress`.
+  * Object restore is done asynchronously by issuing either an S3 `GET` or an S3 `RESTORE` request to the remote endpoint.
+  * Once the object is restored, RestoreStatus is updated as `CloudRestored` and RestoreType is set to either `Temporary` or `Permanent`.
+  * In case the operation fails, RestoreStatus is marked as `RestoreFailed`.
+ * **New attrs:** Below are the new attrs being added - * `user.rgw.restore-status`: - * `user.rgw.restore-type`: - * `user.rgw.restored-at`: - * `user.rgw.restore-expiry-date`: - * `user.rgw.cloudtier_storage_class`: - ```sh + * `user.rgw.restore-status`: + * `user.rgw.restore-type`: + * `user.rgw.restored-at`: + * `user.rgw.restore-expiry-date`: + * `user.rgw.cloudtier_storage_class`: + +```cpp enum RGWRestoreStatus : uint8_t { None = 0, RestoreAlreadyInProgress = 1, @@ -63,58 +66,56 @@ Note: Incase the cloudtier storage-class is deleted/updated, the object may not Temporary = 1, Permanent = 2 }; - ``` +``` * **Response:** * `S3 restore-object CLI` returns SUCCESS - either the 200 OK or 202 Accepted status code. - * If the object is not previously restored, then RGW returns 202 Accepted in the response. - * If the object is previously restored, RGW returns 200 OK in the response. - * Special errors: + * If the object is not previously restored, then RGW returns 202 Accepted in the response. + * If the object is previously restored, RGW returns 200 OK in the response. + * Special errors: Code: RestoreAlreadyInProgress ( Cause: Object restore is already in progress.) Code: ObjectNotFound (if Object is not found in cloud endpoint) Code: I/O error (for any other I/O errors during restore) * `GET request` continues to return an ‘InvalidObjectState’ error till the object is successfully restored. - * S3 head-object can be used to verify if the restore is still in progress. - * Once the object is restored, GET will return the object data. - + * S3 head-object can be used to verify if the restore is still in progress. + * Once the object is restored, GET will return the object data. * **StorageClass**: By default, the objects are restored to `STANDARD` storage class. However, as per [AWS S3 Restore](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html) the storage-class remains the same for restored objects. 
Hence for the temporary copies, the `x-amz-storage-class` returned contains original cloudtier storage-class. - * Note: A new tier-config option may be added to select the storage-class to restore the objects to. + * Note: A new tier-config option may be added to select the storage-class to restore the objects to. * **mtime**: If the restored object is temporary, object is still marked `RGWObj::CloudTiered` and mtime is not changed i.e, still set to transition time. But in case the object is permanent copy, it is marked `RGWObj::Main` and mtime is updated to the restore time (now()). * **Lifecycle**: - * `Temporary` copies are not subjected to any further transition to the cloud. However (as is the case with cloud-transitioned objects) they can be deleted via regular LC expiration rules or via external S3 Delete request. - * `Permanent` copies are treated as any regular objects and are subjected to any LC rules applicable. + * `Temporary` copies are not subjected to any further transition to the cloud. However (as is the case with cloud-transitioned objects) they can be deleted via regular LC expiration rules or via external S3 Delete request. + * `Permanent` copies are treated as any regular objects and are subjected to any LC rules applicable. * **Replication**: The restored objects (both temporary and permanent) are also replicated like regular objects and will be deleted across the zones post expiration. * **VersionedObjects** : In case of versioning, if any object is cloud-transitioned, it would have been non-current. Post restore too, the same non-current object will be updated with the downloaded data and its HEAD object will be updated accordingly as the case with regular objects. -* **Temporary Object Expiry**: This is done via Object Expirer - * When the object is restored as temporary, `user.rgw.expiry-date` is set accordingly and `delete_at` attr is also updated with the same value. - * This object is then added to the list used by `ObjectExpirer`. 
-  * `LC` worker thread is used to scan through that list and post expiry, resets the objects back to cloud-transitioned state i.e,
-    * HEAD object with size=0
-    * new attrs removed
-    * `delete_at` reset
-  * Note: A new RGW option `rgw_restore_debug_interval` is added, which when set will be considered as `Days` value (similar to `rgw_lc_debug_interval`).
-
-* **FAILED Restore**: In case the restore operation fails,
-  * The HEAD object will be updated accordingly.. i.e, Storage-class is reset to the original cloud-tier storage class
-  * All the new attrs added will be removed , except for `user.rgw.restore-status` which will be updated as `RestoreFailed`
+* **Temporary Object Expiry**: This is done via the Object Expirer.
+  * When the object is restored as temporary, `user.rgw.expiry-date` is set accordingly and the `delete_at` attr is also updated with the same value.
+  * This object is then added to the list used by `ObjectExpirer`.
+  * The `LC` worker thread is used to scan through that list and, post expiry, reset the objects back to the cloud-transitioned state, i.e.,
+    * HEAD object with size=0
+    * new attrs removed
+    * `delete_at` reset
+  * Note: A new RGW option `rgw_restore_debug_interval` is added, which when set will be considered as the `Days` value (similar to `rgw_lc_debug_interval`).
+
+* **FAILED Restore**: In case the restore operation fails,
+  * the HEAD object will be updated accordingly, i.e., the storage-class is reset to the original cloud-tier storage class;
+  * all the new attrs added will be removed, except for `user.rgw.restore-status`, which will be updated as `RestoreFailed`.

 * **Check Restore Progress**: Users can issue S3 `head-object` request to check if the restore is done or still in progress for any object.

 * **RGW down/restarts** - Since the restore operation is asynchronous, we need to keep track of the objects being restored.
In case RGW is down/restarts, this data will be used to retrigger on-going restore requests or do appropriate cleanup for the failed requests.

-* **Compression** - If the placement-target to which the objects are being restored to has compression enabled, the data will be compressed accordingly (bug2294512)
+* **Compression** - If the placement-target to which the objects are being restored has compression enabled, the data will be compressed accordingly (bug2294512)

-* **Encryption** - If the restored object is encrypted, the old sse-related xattrs/keys from the HEAD stub will be copied back into object metadata (bug2294512)
+* **Encryption** - If the restored object is encrypted, the old SSE-related xattrs/keys from the HEAD stub will be copied back into the object metadata (bug2294512)

 * **Delete cloud object post restore** - Once the object is successfully restored, the object at the remote endpoint is still retained. However we could choose to delete it for permanent restored copies by adding new tier-config option.

-
 ## Future work

 * **Bulk Restore**: In the case of BulkRestore, some of the objects may not be restored. User needs to manually cross-check the objects to check the objects restored or InProgress.

@@ -124,4 +125,3 @@ Note: Incase the cloudtier storage-class is deleted/updated, the object may not
 * **Admin Ops**
 * **Restore Notifications**
-

diff --git a/src/rgw/driver/rados/rgw_rados.cc b/src/rgw/driver/rados/rgw_rados.cc
index db8a4afba9d23..868a798b79ab4 100644
--- a/src/rgw/driver/rados/rgw_rados.cc
+++ b/src/rgw/driver/rados/rgw_rados.cc
@@ -5313,9 +5313,9 @@ int RGWRados::restore_obj_from_cloud(RGWLCCloudTierCtx& tier_ctx,
   ret = processor.complete(accounted_size, etag, &mtime, set_mtime, attrs, rgw::cksum::no_cksum,
                            delete_at , nullptr, nullptr, nullptr, (rgw_zone_set *)&zone_set, &canceled,
                            rctx, log_op ?
rgw::sal::FLAG_LOG_OP : 0);
-  if (ret < 0) {
-    return ret;
-  }
+  if (ret < 0) {
+    return ret;
+  }

   // XXX: handle olh_epoch for versioned objects like in fetch_remote_obj
   return ret;

diff --git a/src/rgw/driver/rados/rgw_zone.cc b/src/rgw/driver/rados/rgw_zone.cc
index f9de570aa5445..7d5fe3bcb21b9 100644
--- a/src/rgw/driver/rados/rgw_zone.cc
+++ b/src/rgw/driver/rados/rgw_zone.cc
@@ -1355,6 +1355,20 @@ int RGWZoneGroupPlacementTier::update_params(const JSONFormattable& config)
       retain_head_object = false;
     }
   }
+  if (config.exists("allow_read_through")) {
+    string s = config["allow_read_through"];
+    if (s == "true") {
+      allow_read_through = true;
+    } else {
+      allow_read_through = false;
+    }
+  }
+  if (config.exists("read_through_restore_days")) {
+    r = conf_to_uint64(config, "read_through_restore_days", &read_through_restore_days);
+    if (r < 0) {
+      read_through_restore_days = DEFAULT_READ_THROUGH_RESTORE_DAYS;
+    }
+  }

   if (tier_type == "cloud-s3") {
     r = t.s3.update_params(config);
@@ -1368,6 +1382,12 @@ int RGWZoneGroupPlacementTier::clear_params(const JSONFormattable& config)
   if (config.exists("retain_head_object")) {
     retain_head_object = false;
   }
+  if (config.exists("allow_read_through")) {
+    allow_read_through = false;
+  }
+  if (config.exists("read_through_restore_days")) {
+    read_through_restore_days = DEFAULT_READ_THROUGH_RESTORE_DAYS;
+  }

   if (tier_type == "cloud-s3") {
     t.s3.clear_params(config);

diff --git a/src/rgw/rgw_op.cc b/src/rgw/rgw_op.cc
index b54805bdc7d4b..0ac92fe28777e 100644
--- a/src/rgw/rgw_op.cc
+++ b/src/rgw/rgw_op.cc
@@ -941,37 +941,131 @@ void handle_replication_status_header(
 }

 /*
- * GET on CloudTiered objects is processed only when sent from the sync client.
- * In all other cases, fail with `ERR_INVALID_OBJECT_STATE`.
+ * A GET on a cloud-tiered object is served directly only when sent by the
+ * sync client; in all other cases RGW will try to fetch the object from the
+ * remote cloud endpoint.
 */
-int handle_cloudtier_obj(rgw::sal::Attrs& attrs, bool sync_cloudtiered) {
+int handle_cloudtier_obj(req_state* s, const DoutPrefixProvider *dpp, rgw::sal::Driver* driver,
+                         rgw::sal::Attrs& attrs, bool sync_cloudtiered, std::optional<uint64_t> days,
+                         bool restore_op, optional_yield y)
+{
   int op_ret = 0;
+  ldpp_dout(dpp, 20) << "reached handle cloud tier " << dendl;
   auto attr_iter = attrs.find(RGW_ATTR_MANIFEST);
-  if (attr_iter != attrs.end()) {
-    RGWObjManifest m;
-    try {
-      decode(m, attr_iter->second);
-      if (m.get_tier_type() == "cloud-s3") {
-        if (!sync_cloudtiered) {
-          /* XXX: Instead send presigned redirect or read-through */
+  if (attr_iter == attrs.end()) {
+    if (restore_op) {
+      op_ret = -ERR_INVALID_OBJECT_STATE;
+      s->err.message = "only cloud tier object can be restored";
+      return op_ret;
+    } else { // ignore for read-through
+      return 0;
+    }
+  }
+  RGWObjManifest m;
+  try {
+    decode(m, attr_iter->second);
+    if (m.get_tier_type() != "cloud-s3") {
+      ldpp_dout(dpp, 20) << "not a cloud tier object " << s->object->get_key().name << dendl;
+      if (restore_op) {
+        op_ret = -ERR_INVALID_OBJECT_STATE;
+        s->err.message = "only cloud tier object can be restored";
+        return op_ret;
+      } else { // ignore for read-through
+        return 0;
+      }
+    }
+    RGWObjTier tier_config;
+    m.get_tier_config(&tier_config);
+    if (sync_cloudtiered) {
+      bufferlist t, t_tier;
+      t.append("cloud-s3");
+      attrs[RGW_ATTR_CLOUD_TIER_TYPE] = t;
+      encode(tier_config, t_tier);
+      attrs[RGW_ATTR_CLOUD_TIER_CONFIG] = t_tier;
+      return op_ret;
+    }
+    attr_iter = attrs.find(RGW_ATTR_RESTORE_STATUS);
+    rgw::sal::RGWRestoreStatus restore_status = rgw::sal::RGWRestoreStatus::None;
+    if (attr_iter != attrs.end()) {
+      bufferlist bl = attr_iter->second;
+      auto iter = bl.cbegin();
+      decode(restore_status, iter);
+    }
+    if (attr_iter == attrs.end() || restore_status == rgw::sal::RGWRestoreStatus::RestoreFailed) {
+      // first time restore or previous restore failed
+      rgw::sal::Bucket* pbucket = NULL;
+      pbucket = s->bucket.get();
+
+      std::unique_ptr<rgw::sal::PlacementTier> tier;
+      rgw_placement_rule target_placement;
+      target_placement.inherit_from(pbucket->get_placement_rule());
+      attr_iter = attrs.find(RGW_ATTR_STORAGE_CLASS);
+      if (attr_iter != attrs.end()) {
+        target_placement.storage_class = attr_iter->second.to_str();
+      }
+      op_ret = driver->get_zone()->get_zonegroup().get_placement_tier(target_placement, &tier);
+      ldpp_dout(dpp, 20) << "getting tier placement handle cloud tier" << op_ret <<
+        " storage class " << target_placement.storage_class << dendl;
+      if (op_ret < 0) {
+        s->err.message = "failed to restore object";
+        return op_ret;
+      }
+      rgw::sal::RadosPlacementTier* rtier = static_cast<rgw::sal::RadosPlacementTier*>(tier.get());
+      tier_config.tier_placement = rtier->get_rt();
+      if (!restore_op) {
+        if (tier_config.tier_placement.allow_read_through) {
+          days = tier_config.tier_placement.read_through_restore_days;
+        } else { // read-through is not enabled
           op_ret = -ERR_INVALID_OBJECT_STATE;
-        } else { // fetch object for sync and set cloud_tier attrs
-          bufferlist t, t_tier;
-          RGWObjTier tier_config;
-          m.get_tier_config(&tier_config);
-
-          t.append("cloud-s3");
-          attrs[RGW_ATTR_CLOUD_TIER_TYPE] = t;
-          encode(tier_config, t_tier);
-          attrs[RGW_ATTR_CLOUD_TIER_CONFIG] = t_tier;
+          s->err.message = "Read through is not enabled for this config";
+          return op_ret;
+        }
+      }
-    } catch (const buffer::end_of_buffer&) {
-      // ignore empty manifest; it's not cloud-tiered
-    } catch (const std::exception& e) {
+      // fill in the entry.
+      // XXX: Maybe we can avoid it by passing only necessary params
+      rgw_bucket_dir_entry ent;
+      ent.key.name = s->object->get_key().name;
+      ent.meta.accounted_size = ent.meta.size = s->obj_size;
+      ent.meta.etag = "";
+      ceph::real_time mtime = s->object->get_mtime();
+      uint64_t epoch = 0;
+      op_ret = get_system_versioning_params(s, &epoch, NULL);
+      ldpp_dout(dpp, 20) << "getting versioning params, tier placement handle cloud tier " << op_ret << dendl;
+      if (op_ret < 0) {
+        ldpp_dout(dpp, 20) << "failed to get versioning params, op_ret = " << op_ret << dendl;
+        s->err.message = "failed to restore object";
+        return op_ret;
+      }
+      op_ret = s->object->restore_obj_from_cloud(pbucket, tier.get(), target_placement, ent, s->cct, tier_config,
+                                                 mtime, epoch, days, dpp, y, s->bucket->get_info().flags);
+      if (op_ret < 0) {
+        ldpp_dout(dpp, 0) << "object " << ent.key.name << " fetch failed: " << op_ret << dendl;
+        s->err.message = "failed to restore object";
+        return op_ret;
+      }
+      ldpp_dout(dpp, 20) << "object " << ent.key.name << " fetch succeeded" << dendl;
+      /* Even when the restore has been initiated successfully, the first
+       * read-through request returns an error, because the object is
+       * downloaded asynchronously.
+       */
+      if (!restore_op) { // read-through
+        op_ret = -ERR_REQUEST_TIMEOUT;
+        ldpp_dout(dpp, 5) << "restore is still in progress, please check restore status and retry" << dendl;
+        s->err.message = "restore is still in progress";
+      }
+      return op_ret;
+    } else if ((!restore_op) && (restore_status == rgw::sal::RGWRestoreStatus::RestoreAlreadyInProgress)) {
+      op_ret = -ERR_REQUEST_TIMEOUT;
+      ldpp_dout(dpp, 5) << "restore is still in progress, please check restore status and retry" << dendl;
+      s->err.message = "restore is still in progress";
+    } else { // CloudRestored: return success
+      return 0;
+    }
+  } catch (const buffer::end_of_buffer&) {
+    // empty manifest; it's not cloud-tiered
+    if (restore_op) {
+      op_ret = -ERR_INVALID_OBJECT_STATE;
+      s->err.message = "only cloud tier object can be restored";
     }
+  } catch (const std::exception& e) {
   }
-  }
-  return op_ret;
 }

@@ -2366,15 +2460,12 @@ void RGWGetObj::execute(optional_yield y)
       } catch (const buffer::error&) {}
     }

-    if (get_type() == RGW_OP_GET_OBJ && get_data) {
-      op_ret = handle_cloudtier_obj(attrs, sync_cloudtiered);
+    std::optional<uint64_t> days;
+    op_ret = handle_cloudtier_obj(s, this, driver, attrs, sync_cloudtiered, days, false, y);
     if (op_ret < 0) {
       ldpp_dout(this, 4) << "Cannot get cloud tiered object: " << *s->object
-                         <<". Failing with " << op_ret << dendl;
-        if (op_ret == -ERR_INVALID_OBJECT_STATE) {
-          s->err.message = "This object was transitioned to cloud-s3";
-        }
+                         <<".
Failing with " << op_ret << dendl; goto done_err; } } diff --git a/src/rgw/rgw_zone.cc b/src/rgw/rgw_zone.cc index 8d8b44cd96155..1acaf9b3d4fb3 100644 --- a/src/rgw/rgw_zone.cc +++ b/src/rgw/rgw_zone.cc @@ -860,6 +860,8 @@ void RGWZoneGroupPlacementTier::decode_json(JSONObj *obj) JSONDecoder::decode_json("tier_type", tier_type, obj); JSONDecoder::decode_json("storage_class", storage_class, obj); JSONDecoder::decode_json("retain_head_object", retain_head_object, obj); + JSONDecoder::decode_json("allow_read_through", allow_read_through, obj); + JSONDecoder::decode_json("read_through_restore_days", read_through_restore_days, obj); if (tier_type == "cloud-s3") { JSONDecoder::decode_json("s3", t.s3, obj); @@ -897,6 +899,8 @@ void RGWZoneGroupPlacementTier::dump(Formatter *f) const encode_json("tier_type", tier_type, f); encode_json("storage_class", storage_class, f); encode_json("retain_head_object", retain_head_object, f); + encode_json("allow_read_through", allow_read_through, f); + encode_json("read_through_restore_days", read_through_restore_days, f); if (tier_type == "cloud-s3") { encode_json("s3", t.s3, f); diff --git a/src/rgw/rgw_zone_types.h b/src/rgw/rgw_zone_types.h index 13fce000c4124..d44761d7f5a95 100644 --- a/src/rgw/rgw_zone_types.h +++ b/src/rgw/rgw_zone_types.h @@ -543,9 +543,13 @@ struct RGWZoneGroupPlacementTierS3 { WRITE_CLASS_ENCODER(RGWZoneGroupPlacementTierS3) struct RGWZoneGroupPlacementTier { +#define DEFAULT_READ_THROUGH_RESTORE_DAYS 1 + std::string tier_type; std::string storage_class; bool retain_head_object = false; + bool allow_read_through = false; + uint64_t read_through_restore_days = 1; struct _tier { RGWZoneGroupPlacementTierS3 s3; @@ -555,10 +559,12 @@ struct RGWZoneGroupPlacementTier { int clear_params(const JSONFormattable& config); void encode(bufferlist& bl) const { - ENCODE_START(1, 1, bl); + ENCODE_START(2, 1, bl); encode(tier_type, bl); encode(storage_class, bl); encode(retain_head_object, bl); + encode(allow_read_through, 
bl); + encode(read_through_restore_days, bl); if (tier_type == "cloud-s3") { encode(t.s3, bl); } @@ -566,10 +572,14 @@ struct RGWZoneGroupPlacementTier { } void decode(bufferlist::const_iterator& bl) { - DECODE_START(1, bl); + DECODE_START(2, bl); decode(tier_type, bl); decode(storage_class, bl); decode(retain_head_object, bl); + if (struct_v >= 2) { + decode(allow_read_through, bl); + decode(read_through_restore_days, bl); + } if (tier_type == "cloud-s3") { decode(t.s3, bl); }